CN115687772A - Sequence recommendation method based on sequence dependence enhanced self-attention network - Google Patents

Sequence recommendation method based on sequence dependence enhanced self-attention network

Info

Publication number
CN115687772A
Authority
CN
China
Prior art keywords
sequence
matrix
information
attention
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211398166.4A
Other languages
Chinese (zh)
Inventor
贾兆红
张虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202211398166.4A priority Critical patent/CN115687772A/en
Publication of CN115687772A publication Critical patent/CN115687772A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a sequence recommendation method based on a sequence dependence enhanced self-attention network, which comprises the following steps: 1. constructing the data set and the representation used for sequence recommendation; 2. obtaining the feature representation of the interaction sequence; 3. obtaining the order-dependence information and the user-preference-change information of the interaction sequence; 4. capturing the various kinds of feature information of the interaction sequence through the order-dependence enhanced self-attention network; 5. performing sequence recommendation with the finally captured representation of the interaction sequence. When the relationships among interacted items are processed, the constructed order-dependence enhanced self-attention network model takes both the order-dependence information of the user's interacted items and the user-preference-change information into account, so the recommendation accuracy can be improved.

Description

Sequence recommendation method based on sequence dependence enhanced self-attention network
Technical Field
The invention relates to the field of recommendation, in particular to a sequence recommendation method based on a sequence dependence enhanced self-attention network.
Background
Sequence recommendation is an important research topic in the field of recommendation systems. Traditional recommendation systems model the interactions between users and items in a static manner, yet user interaction behaviour tends to be continuous, and user preferences and item popularity tend to change over time. Sequence recommendation therefore treats the interactions between a user and items as a dynamic interaction sequence and mines the user's preferences by considering the contextual connections among the items the user interacts with.
Traditional sequence recommendation algorithms mainly rely on Markov chain (MC) based methods, which assume that the next item a user interacts with depends only on the few most recently interacted items; because of this assumption, MC-based methods cannot capture higher-order dependencies between items. In recent years, with the development of deep learning, models such as the recurrent neural network (RNN) and the convolutional neural network (CNN) have been widely applied to sequence recommendation. RNNs capture the order dependencies among items through a recursive structure, but suffer from low efficiency and difficulty in retaining long-term dependencies; CNN models capture local features of the input sequence through convolution operations, but owing to their own limitations they capture only local rather than global information. Later, with the success of the Transformer model in various fields, researchers began to introduce self-attention into sequence recommendation; self-attention captures the global dependencies of the sequence well and lets the model focus on the previously interacted items that have a larger influence on future interactions. User behaviour tends to be contextual, and user preferences tend to change over time. For example, after purchasing a concert ticket a user may buy a plane ticket to the concert city and then book a hotel; a user who previously liked carbonated beverages may now prefer fruit-juice carbonated beverages. Conventional methods cannot handle such scenarios well, so their recommendation accuracy is limited.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a sequence recommendation method based on a sequence dependence enhanced self-attention network, so that the order dependence of a user's interaction sequence and the information about changes in user preference can be captured better, and the accuracy of recommending items to the user can be improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention relates to a sequence recommendation method based on a sequence dependence enhanced self-attention network, which is characterized by comprising the following steps:
Step one: acquire a user set U and, for any user u in U, acquire the interaction sequence S^u = (s_1^u, s_2^u, ..., s_{|S^u|}^u), where s_t^u denotes the t-th item interacted with by user u and |S^u| denotes the length of the interaction sequence of user u; the item set I is formed by the interaction sequences of all users;
Step two: construct the order-dependence enhanced self-attention network, which comprises an embedding layer, a position information layer, a GRU module, a self-attention module, a feedforward layer and a prediction layer;
Step 2.1: the embedding layer uses the item vector matrix M ∈ R^{|I|×d} to convert the interaction sequence S^u of user u into an embedded vector matrix E^u ∈ R^{|S^u|×d}, with E^u = [e_1^u; e_2^u; ...; e_{|S^u|}^u], where e_t^u denotes the embedded vector representation of s_t^u, |I| denotes the length of the item set I, and d denotes the dimension of the item embedding vectors;
Step 2.2: the position information layer uses a position matrix P to add position information to the embedded vector matrix E^u, yielding the position-embedded vector matrix X^u ∈ R^{|S^u|×d} with X^u = [x_1^u; x_2^u; ...; x_{|S^u|}^u], where x_t^u = e_t^u + p_t denotes the vector obtained by adding e_t^u and the t-th position vector p_t of the position matrix P;
Step 2.3: the GRU module processes the position-embedded vector matrix X^u with equations (1)-(4) to obtain the order-dependence information and user-preference-change information matrix H^u ∈ R^{|S^u|×d}:
r_t^u = σ(W_r x_t^u + U_r h_{t-1}^u)  (1)
z_t^u = σ(W_z x_t^u + U_z h_{t-1}^u)  (2)
h̃_t^u = tanh(W_h x_t^u + U_h (r_t^u ⊙ h_{t-1}^u))  (3)
h_t^u = (1 - z_t^u) ⊙ h_{t-1}^u + z_t^u ⊙ h̃_t^u  (4)
In equations (1)-(4), W_r and U_r denote the weight matrices of the reset gate; h_{t-1}^u denotes the state information of the (t-1)-th position; r_t^u denotes the intermediate vector of the reset gate at the t-th position; σ denotes the sigmoid activation function; ⊙ denotes the element-wise product; z_t^u denotes the intermediate vector of the update gate at the t-th position; W_z and U_z denote the weight matrices of the update gate; tanh denotes the hyperbolic tangent function; h̃_t^u denotes the candidate state information of the t-th position; W_h and U_h denote the weight matrices to be learned for the candidate state; h_t^u denotes the order-dependence information and user-preference-change information captured at the t-th position;
Step 2.4: the self-attention module processes the position-embedded vector matrix X^u and the order-dependence information and user-preference-change information matrix H^u with equations (5)-(7) to obtain the self-attention information matrix Y^u ∈ R^{|S^u|×d}:
e_{ti} = ((x_t^u W_1^Q + h_t^u W_2^Q)(x_i^u)^T) / √d  (5)
a_{ti} = exp(e_{ti}) / Σ_{j=1}^{|S^u|} exp(e_{tj})  (6)
y_t^u = Σ_{i=1}^{|S^u|} a_{ti}(x_i^u W_1^V + h_i^u W_2^V)  (7)
In equations (5)-(7), W_1^Q and W_2^Q are the two weight matrices of the self-attention query vector; W_1^V and W_2^V are the two weight matrices of the self-attention value vector, applied respectively to the embedded vector x_i^u of the i-th position and to the order-dependence and preference-change information vector h_i^u of the i-th position; e_{ti} denotes the attention score between the interacted item at the t-th position and the interacted item at the i-th position; a_{ti} is the weight obtained by normalizing the attention score between the interacted item at the t-th position and the interacted item at the i-th position; y_t^u denotes the output self-attention information vector of the t-th position;
Step 2.5: the feedforward layer processes y_t^u with equation (8) to obtain the feedforward information FFN(y_t^u):
FFN(y_t^u) = relu(y_t^u W_1 + b_1)W_2 + b_2  (8)
In equation (8), W_1 and W_2 are weight matrices, b_1 and b_2 are biases, and relu is the activation function;
Step 2.6: the layer normalization of equation (9) is applied to y_t^u to obtain the normalized information LN(y_t^u), which is fed into the feedforward layer to obtain the feedforward information FFN(LN(y_t^u)); after a residual connection with y_t^u, i.e. F_t^u = y_t^u + FFN(LN(y_t^u)), the intermediate sequence representation F_t^u of the t-th position is obtained:
LN(y_t^u) = α ⊙ (y_t^u - μ_t) / √(σ_t² + ε) + β  (9)
In equation (9), μ_t and σ_t² denote the mean and variance of y_t^u; α and β are the scaling factor and the offset; ε denotes a small constant;
Step 2.7: the intermediate sequence representation F_t^u of the t-th position is fed into the self-attention module again and processed together with the order-dependence information and user-preference-change information matrix H^u to obtain the stacked self-attention information matrix Y^u; the processing of step 2.5 and step 2.6 is then applied again to obtain the final sequence representation F̂_t^u of the t-th position;
Step 2.8: the prediction layer applies equation (10) to the final sequence representation F̂_t^u of the t-th position to compute, for the t-th position of user u after passing through the order-dependence enhanced self-attention network, the score r_{i,t} of the i-th candidate item:
r_{i,t} = F̂_t^u m_i^T  (10)
In equation (10), m_i denotes the embedded vector representation of the i-th item in the item vector matrix M; r_{i,t} denotes the relevance score of the i-th item; T denotes transposition;
Step 2.9: the binary cross-entropy objective function Loss is constructed with equation (11):
Loss = -Σ_{S^u∈S} Σ_{z=1}^{|S^u|} [ log(σ(r_{o_z,z})) + log(1 - σ(r_{o'_z,z})) ]  (11)
In equation (11), S denotes the set of interaction sequences of all users; o_z denotes the number of the expected positive sample at the z-th position, the positive sample being the next item predicted to be interacted with by user u; o'_z denotes the number of the negative sample corresponding to the z-th position, the negative sample being an item that does not appear in the interaction sequence of user u; r_{o_z,z} denotes the score of the positive sample numbered o_z at the z-th position; r_{o'_z,z} denotes the score of the negative sample numbered o'_z at the z-th position;
Step 2.10: the order-dependence enhanced self-attention network is trained by back propagation and gradient descent so as to minimize the objective function Loss and update the network parameters; training stops when the number of iterations reaches the maximum number of iterations, yielding the optimal recommendation model, which outputs the scores of the candidate items for an input interaction sequence; the top items with the highest scores are then selected for recommendation.
The electronic device of the invention comprises a memory and a processor, and is characterized in that the memory is used for storing a program that supports the processor in executing the sequence recommendation method, and the processor is configured to execute the program stored in the memory.
The invention also relates to a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the sequence recommendation method.
Compared with the prior art, the invention has the beneficial effects that:
1. On the basis of considering the absolute position information of the items, the method further captures the order-dependence information of the user interaction sequence and the information about changes in user preference with a gating-based recurrent neural network (GRU), which strengthens the model's ability to capture multi-faceted information; when recommending to users, changes in user preference can therefore be detected more readily and the required items recommended more accurately.
2. The invention combines the self-attention method with the GRU and integrates the multi-faceted information captured by the GRU into the self-attention method; the attention mechanism also makes the model focus on information useful for future interacted items, so the interference of noise items can be filtered out and more useful suggestions provided to users.
3. The invention adopts a feedforward layer, which effectively compensates for the model's weakness in capturing non-linear features through two linear transformations and an activation function and improves the fault tolerance of the model, thereby strengthening the generalization ability of the method and adapting it to various application scenarios.
4. By stacking the self-attention network several times, the method captures the various kinds of dependence information of the interaction sequence more fully; meanwhile, layer normalization, residual connection and dropout regularization are adopted to prevent over-fitting of the model during training, which improves the training effect of the model and the accuracy of the recommendations made to users.
Drawings
Fig. 1 is a model diagram of a sequence recommendation method based on an order-dependent enhanced self-attention network according to the present invention.
Detailed Description
In this embodiment, a sequence recommendation method based on an order-dependence enhanced self-attention network mainly uses a gating-based recurrent neural network (GRU) and a self-attention network to extract the various kinds of information and dependency relationships contained in an interaction sequence. As shown in fig. 1, the input of the model is the historical interaction sequence of the user; the sequence is passed through an embedding layer to obtain an embedded vector representation of each interacted item; each interacted item is then added to the embedding of its corresponding position; the result of the addition is fed into the GRU to extract order-dependence information and user-preference-change information; self-attention is then applied to the GRU output together with the original information to extract the multi-faceted information of the interaction sequence; a feedforward layer improves the model's ability to capture non-linearity; stacking several self-attention and feedforward layers improves the model's ability to capture the various kinds of information, while layer normalization and residual connections prevent over-fitting; finally, the scores of the candidate items are computed from the extracted representation of the user's interaction sequence, and the top items with the highest scores are recommended to the user. Specifically, the method comprises the following steps:
Step one: acquire a user set U and, for any user u in U, acquire the interaction sequence S^u = (s_1^u, s_2^u, ..., s_{|S^u|}^u), ordered by timestamp; s_t^u denotes the t-th item interacted with by user u, and |S^u| denotes the length of the interaction sequence of user u; the item set I is thus formed by the interaction sequences of all users. Users with fewer than 5 interacted items and items with fewer than 5 interactions are not considered.
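For illustration, a minimal Python sketch of this data-construction step is given below; the function name, the input format and the way the 5-interaction threshold is applied are assumptions made for the example, not part of the patented method.

```python
from collections import defaultdict

def build_sequences(interactions, min_count=5):
    """Build the per-user interaction sequences S^u, ordered by timestamp.

    `interactions` is an iterable of (user_id, item_id, timestamp) triples.
    Users with fewer than `min_count` interactions and items with fewer than
    `min_count` occurrences are discarded, as described in this embodiment.
    """
    item_freq = defaultdict(int)
    for _, item, _ in interactions:
        item_freq[item] += 1

    per_user = defaultdict(list)
    for user, item, ts in interactions:
        if item_freq[item] >= min_count:          # drop rare items
            per_user[user].append((ts, item))

    sequences = {}
    for user, events in per_user.items():
        if len(events) < min_count:               # drop inactive users
            continue
        events.sort(key=lambda e: e[0])           # order S^u by timestamp
        sequences[user] = [item for _, item in events]
    return sequences
```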
Step two: constructing an order dependence-based enhanced self-attention network as shown in fig. 1 includes: the system comprises an embedded layer, a position information layer, a GRU module, a self-attention module, a feedforward layer and a prediction layer;
step 2.1, the embedding layer utilizes the project vector matrix
Figure BDA0003933958960000054
Interaction sequence S of user y u Conversion to an embedded vector matrix
Figure BDA0003933958960000055
And is provided with
Figure BDA0003933958960000056
Wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003933958960000057
to represent
Figure BDA0003933958960000058
The embedded vector representation of (a); i represents the length of the item set I; d represents the dimension of the item embedding vector;
step 2.2, the position information layer utilizes the position matrix
Figure BDA0003933958960000059
For embedded vector matrix E u Carrying out superposition processing to obtain a vector matrix X after position embedding u And is and
Figure BDA00039339589600000510
the position information can be displayed to show the context of each item in the interaction sequence of the user u. Wherein, the first and the second end of the pipe are connected with each other,
Figure BDA00039339589600000511
to represent
Figure BDA00039339589600000512
And the t-th position vector in the position matrix P
Figure BDA00039339589600000513
The added vector is represented.
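A minimal PyTorch sketch of the embedding layer and the position information layer (steps 2.1 and 2.2) follows; the padding convention and the maximum sequence length handling are assumptions of the example.

```python
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    """Maps an item-id sequence to X^u = E^u + P (steps 2.1 and 2.2)."""

    def __init__(self, num_items, max_len, d):
        super().__init__()
        # item vector matrix M (index 0 reserved for padding, an assumption)
        self.item_emb = nn.Embedding(num_items + 1, d, padding_idx=0)
        # position matrix P
        self.pos_emb = nn.Embedding(max_len, d)

    def forward(self, seq):                        # seq: (batch, max_len) item ids
        positions = torch.arange(seq.size(1), device=seq.device)
        positions = positions.unsqueeze(0).expand_as(seq)
        return self.item_emb(seq) + self.pos_emb(positions)   # X^u
```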
Step 2.3: GRU module utilizes formula (1) -formula (4) to embed vector matrix X after position u Processing to obtain sequence dependence information and user preference change information matrix
Figure BDA00039339589600000514
Figure BDA00039339589600000515
Figure BDA00039339589600000516
Figure BDA00039339589600000517
Figure BDA00039339589600000518
In the formulae (1) to (4),W r and U r A weight matrix representing the reset gates is shown,
Figure BDA00039339589600000519
status information indicating the t-1 st position,
Figure BDA00039339589600000520
representing a t position middle vector of the reset gate, and sigma representing a sigmoid activation function; an element indicates that the number of lines,
Figure BDA00039339589600000521
represents the t-th position intermediate vector of the update gate; w z And U z A weight matrix representing the updated gate, tanh represents a hyperbolic tangent function,
Figure BDA00039339589600000522
candidate state information indicating the t-th position; w is a group of h And U h A weight matrix to be learned representing a candidate state;
Figure BDA00039339589600000523
indicating the order-dependent information and user preference variation information captured at the t-th position; the GRU module can well capture the sequence dependency relationship among the user interaction items and can instantly sense when the user preference changes so as to make better recommendation.
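A minimal PyTorch sketch of the GRU module of equations (1)-(4); an explicit cell is written out so that each line mirrors one equation, although torch.nn.GRU could equally be used. All class and variable names are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class GRUModule(nn.Module):
    """Computes H^u from X^u following equations (1)-(4)."""

    def __init__(self, d):
        super().__init__()
        self.W_r, self.U_r = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.W_z, self.U_z = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.W_h, self.U_h = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)

    def forward(self, x):                                       # x: (batch, seq_len, d)
        h = x.new_zeros(x.size(0), x.size(2))
        outputs = []
        for t in range(x.size(1)):
            x_t = x[:, t]
            r = torch.sigmoid(self.W_r(x_t) + self.U_r(h))      # eq. (1), reset gate
            z = torch.sigmoid(self.W_z(x_t) + self.U_z(h))      # eq. (2), update gate
            h_cand = torch.tanh(self.W_h(x_t) + self.U_h(r * h))  # eq. (3), candidate state
            h = (1 - z) * h + z * h_cand                        # eq. (4), output state
            outputs.append(h)
        return torch.stack(outputs, dim=1)                      # H^u: (batch, seq_len, d)
```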
Step 2.4: the self-attention module utilizes a vector matrix X after embedding the positions by using an expression (5) to an expression (7) u And order dependency information and user preference change information matrix H u Processing to obtain a self-attention information matrix
Figure BDA0003933958960000061
Figure BDA0003933958960000062
Figure BDA0003933958960000063
Figure BDA0003933958960000064
In the formula (5), the reaction mixture is,
Figure BDA0003933958960000065
is a self-attentive query vector weight matrix,
Figure BDA0003933958960000066
a weight matrix that is a vector of values from attention; e.g. of the type ti Representing attention scores of the interactive items at the t positions and the interactive items at the ith position; a (a) to ti A weight value that is an attention score of the interactive item at the t-th position and the interactive item at the i-th position;
Figure BDA0003933958960000067
and
Figure BDA0003933958960000068
respectively, the i-th position embedding vector
Figure BDA0003933958960000069
And the order dependency information and user preference variation information vector of the ith position
Figure BDA00039339589600000610
The weight matrix of (a) is determined,
Figure BDA00039339589600000611
an output self-attention information vector representing the t-th position;
Figure BDA00039339589600000612
mainly for preventing e in the input formula (6) ij The value is too large, resulting in the partial derivative approaching 0. Some items which are irrelevant to the items to be interacted with by the user in the future exist in the interaction sequence of the user, and the attention mechanism can be well usedInformation of interaction items having a large influence on the items that we want to interact with in the future is captured, and interference of such noise data is reduced.
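A minimal PyTorch sketch of the order-dependence enhanced self-attention of equations (5)-(7), following the reconstruction above (query mixed from x_t and h_t, value mixed from x_i and h_i). The causal mask that forbids attending to future positions is an added assumption, common in sequence recommendation but not stated explicitly here.

```python
import math
import torch
import torch.nn as nn

class OrderEnhancedSelfAttention(nn.Module):
    """Self-attention over X^u enhanced with H^u, sketching equations (5)-(7)."""

    def __init__(self, d):
        super().__init__()
        self.w_q1 = nn.Linear(d, d, bias=False)   # W_1^Q, applied to x_t
        self.w_q2 = nn.Linear(d, d, bias=False)   # W_2^Q, applied to h_t
        self.w_v1 = nn.Linear(d, d, bias=False)   # W_1^V, applied to x_i
        self.w_v2 = nn.Linear(d, d, bias=False)   # W_2^V, applied to h_i
        self.scale = math.sqrt(d)

    def forward(self, x, h, causal=True):          # x, h: (batch, seq_len, d)
        q = self.w_q1(x) + self.w_q2(h)
        v = self.w_v1(x) + self.w_v2(h)
        scores = torch.matmul(q, x.transpose(-2, -1)) / self.scale   # e_{ti}, eq. (5)
        if causal:                                  # only attend to positions i <= t
            future = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))
        a = torch.softmax(scores, dim=-1)           # a_{ti}, eq. (6)
        return torch.matmul(a, v)                   # Y^u, eq. (7)
```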
Step 2.5, the feedforward layer utilizes the pair of formula (8)
Figure BDA00039339589600000613
Processing to obtain a feed-forward information matrix FFN (y) i ):
FFN(y i )=relu(W 1 y i +b 1 )W 2 +b 2 (8)
In the formula (8), W 1 And W 2 Is a weight matrix; b 1 And b 2 Is an offset; relu is an activation function; the feedforward layer is composed of a linear change function and an activation function, the nonlinear characteristics of the model can be increased, and the fault tolerance capability and the generalization capability of the model are improved.
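A minimal PyTorch sketch of the feedforward layer of equation (8); the dropout rate is an assumed value.

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise feedforward layer: FFN(y) = relu(y W_1 + b_1) W_2 + b_2."""

    def __init__(self, d, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, d),      # W_1, b_1
            nn.ReLU(),
            nn.Linear(d, d),      # W_2, b_2
            nn.Dropout(dropout),  # dropout regularization mentioned in this embodiment
        )

    def forward(self, y):
        return self.net(y)
```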
Step 2.6, utilizing the pair of the formula (9)
Figure BDA00039339589600000614
After normalization operation is carried out, a normalized normalization information matrix is obtained
Figure BDA00039339589600000615
Then inputting the data into a feedforward layer for processing to obtain a feedforward information matrix
Figure BDA00039339589600000616
And is combined with
Figure BDA00039339589600000617
After residual error connection, a middle sequence list representation matrix of the t-th position is obtained
Figure BDA00039339589600000618
Figure BDA00039339589600000619
In the formula (9), the reaction mixture is,
Figure BDA00039339589600000620
and
Figure BDA00039339589600000621
represent
Figure BDA00039339589600000622
Mean and variance of; α and β are a scaling factor and an offset; e =1e -8 Representing a constant. By adopting layer normalization and residual connection, the overfitting problem in the model training process and the information loss problem caused by the deepening of the network depth can be effectively reduced.
Step 2.7, represent the t-th position middle order
Figure BDA00039339589600000623
Input into the attention module and is related to the sequence dependency information and the user preference change information matrix H u Processing the two together to obtain a laminated self-attention information matrix Y u Then, the processing of step 2.5 and step 2.6 is carried out to obtain the final sequence representation of the t-th position
Figure BDA0003933958960000071
The two-time processing enables the model to better capture information in various aspects of the user interaction sequence, and improves the recommendation accuracy of the model.
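Step 2.7 amounts to applying the block a second time; a small wrapper that stacks the blocks might look like this (the number of blocks and the way H^u is shared between blocks are assumptions of the sketch):

```python
import torch.nn as nn

class OrderEnhancedEncoder(nn.Module):
    """Stacks the attention + feedforward block; two blocks reproduce steps 2.4-2.7."""

    def __init__(self, d, num_blocks=2, dropout=0.2):
        super().__init__()
        self.blocks = nn.ModuleList([EncoderBlock(d, dropout) for _ in range(num_blocks)])

    def forward(self, x, h):
        out = x
        for block in self.blocks:
            out = block(out, h)        # H^u is fed to every block
        return out                     # final sequence representation
```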
Step 2.8, the prediction layer represents the t-th position final sequence by using the formula (10)
Figure BDA0003933958960000072
Calculating to obtain the score r of the ith position interactive item of the user u to the ith item after the t position interactive item passes through the sequential self-attention-dependent network i,t
Figure BDA0003933958960000073
In the formula (10), the compound represented by the formula (10),
Figure BDA0003933958960000074
an embedded vector representation representing the ith item in the item vector matrix M; r is i,t Representing the relevance score of the ith item; t represents transposition;
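A one-line sketch of the prediction layer of equation (10): the final representation is matched against every row of the item vector matrix M.

```python
import torch

def score_items(final_repr, item_matrix):
    """Relevance scores r_{i,t} = F_t^u m_i^T of equation (10).

    final_repr:  (batch, seq_len, d) final sequence representations
    item_matrix: (num_items, d)      item vector matrix M
    returns:     (batch, seq_len, num_items) scores over all candidate items
    """
    return torch.matmul(final_repr, item_matrix.transpose(0, 1))
```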
Step 2.9: the binary cross-entropy objective function Loss is constructed with equation (11):
Loss = -Σ_{S^u∈S} Σ_{z=1}^{|S^u|} [ log(σ(r_{o_z,z})) + log(1 - σ(r_{o'_z,z})) ]  (11)
In equation (11), S denotes the set of interaction sequences of all users; o_z denotes the number of the expected positive sample at the z-th position, the positive sample being the next item predicted to be interacted with by user u; o'_z denotes the number of the negative sample corresponding to the z-th position, the negative sample being an item that does not appear in the interaction sequence of user u; r_{o_z,z} denotes the score of the positive sample numbered o_z at the z-th position; r_{o'_z,z} denotes the score of the negative sample numbered o'_z at the z-th position. In this embodiment, the data set is divided into a training set, a validation set and a test set: the most recent interacted item of each user forms the test set, the second most recent forms the validation set, and the rest form the training set.
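A minimal sketch of the binary cross-entropy objective of equation (11) with one sampled negative item per position; the padding mask and the sampling scheme are assumptions of the example.

```python
import torch
import torch.nn.functional as F

def bce_loss(final_repr, item_emb, pos_ids, neg_ids, valid_mask):
    """Equation (11): -sum[ log sigma(r_pos) + log(1 - sigma(r_neg)) ].

    final_repr:       (batch, seq_len, d) final sequence representations
    item_emb:         nn.Embedding holding the item vector matrix M
    pos_ids, neg_ids: (batch, seq_len) positive / sampled negative item ids
    valid_mask:       (batch, seq_len) 1.0 for real positions, 0.0 for padding
    """
    r_pos = (final_repr * item_emb(pos_ids)).sum(-1)       # r_{o_z, z}
    r_neg = (final_repr * item_emb(neg_ids)).sum(-1)       # r_{o'_z, z}
    loss = -(F.logsigmoid(r_pos) + F.logsigmoid(-r_neg))   # log(1 - sigma(x)) = logsigmoid(-x)
    return (loss * valid_mask).sum() / valid_mask.sum()
```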
Step 2.10, training the order dependence-based enhanced self-attention network by using a back propagation and gradient descent method, wherein the gradient descent method adopts a learning rate of 0.001 and an exponential decay rate beta 1 =0.9,β 2 =0.98, adam optimization algorithm is used; and minimizing the Loss of the target function to update the network parameters, stopping training when the iteration times reach the maximum iteration times of 600, so as to obtain the score of the optimal recommendation model for outputting the candidate items of the input interaction sequence, and selecting the top items with the maximum score from the optimal recommendation model for recommendation.
In this embodiment, an electronic device includes a memory for storing a program that supports a processor to execute the above-described sequence recommendation method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program, and the computer program is executed by a processor to execute the steps of the sequence recommendation method.

Claims (3)

1. A sequence recommendation method based on a sequence dependence enhanced self-attention network, characterized by comprising the following steps:
Step one: acquire a user set U and, for any user u in U, acquire the interaction sequence S^u = (s_1^u, s_2^u, ..., s_{|S^u|}^u), where s_t^u denotes the t-th item interacted with by user u and |S^u| denotes the length of the interaction sequence of user u; the item set I is formed by the interaction sequences of all users;
Step two: construct the order-dependence enhanced self-attention network, which comprises an embedding layer, a position information layer, a GRU module, a self-attention module, a feedforward layer and a prediction layer;
Step 2.1: the embedding layer uses the item vector matrix M ∈ R^{|I|×d} to convert the interaction sequence S^u of user u into an embedded vector matrix E^u ∈ R^{|S^u|×d}, with E^u = [e_1^u; e_2^u; ...; e_{|S^u|}^u], where e_t^u denotes the embedded vector representation of s_t^u, |I| denotes the length of the item set I, and d denotes the dimension of the item embedding vectors;
Step 2.2: the position information layer uses a position matrix P to add position information to the embedded vector matrix E^u, yielding the position-embedded vector matrix X^u ∈ R^{|S^u|×d} with X^u = [x_1^u; x_2^u; ...; x_{|S^u|}^u], where x_t^u = e_t^u + p_t denotes the vector obtained by adding e_t^u and the t-th position vector p_t of the position matrix P;
Step 2.3: the GRU module processes the position-embedded vector matrix X^u with equations (1)-(4) to obtain the order-dependence information and user-preference-change information matrix H^u ∈ R^{|S^u|×d}:
r_t^u = σ(W_r x_t^u + U_r h_{t-1}^u)  (1)
z_t^u = σ(W_z x_t^u + U_z h_{t-1}^u)  (2)
h̃_t^u = tanh(W_h x_t^u + U_h (r_t^u ⊙ h_{t-1}^u))  (3)
h_t^u = (1 - z_t^u) ⊙ h_{t-1}^u + z_t^u ⊙ h̃_t^u  (4)
In equations (1)-(4), W_r and U_r denote the weight matrices of the reset gate; h_{t-1}^u denotes the state information of the (t-1)-th position; r_t^u denotes the intermediate vector of the reset gate at the t-th position; σ denotes the sigmoid activation function; ⊙ denotes the element-wise product; z_t^u denotes the intermediate vector of the update gate at the t-th position; W_z and U_z denote the weight matrices of the update gate; tanh denotes the hyperbolic tangent function; h̃_t^u denotes the candidate state information of the t-th position; W_h and U_h denote the weight matrices to be learned for the candidate state; h_t^u denotes the order-dependence information and user-preference-change information captured at the t-th position;
Step 2.4: the self-attention module processes the position-embedded vector matrix X^u and the order-dependence information and user-preference-change information matrix H^u with equations (5)-(7) to obtain the self-attention information matrix Y^u ∈ R^{|S^u|×d}:
e_{ti} = ((x_t^u W_1^Q + h_t^u W_2^Q)(x_i^u)^T) / √d  (5)
a_{ti} = exp(e_{ti}) / Σ_{j=1}^{|S^u|} exp(e_{tj})  (6)
y_t^u = Σ_{i=1}^{|S^u|} a_{ti}(x_i^u W_1^V + h_i^u W_2^V)  (7)
In equations (5)-(7), W_1^Q and W_2^Q are the two weight matrices of the self-attention query vector; W_1^V and W_2^V are the two weight matrices of the self-attention value vector, applied respectively to the embedded vector x_i^u of the i-th position and to the order-dependence and preference-change information vector h_i^u of the i-th position; e_{ti} denotes the attention score between the interacted item at the t-th position and the interacted item at the i-th position; a_{ti} is the weight obtained by normalizing the attention score between the interacted item at the t-th position and the interacted item at the i-th position; y_t^u denotes the output self-attention information vector of the t-th position;
Step 2.5: the feedforward layer processes y_t^u with equation (8) to obtain the feedforward information FFN(y_t^u):
FFN(y_t^u) = relu(y_t^u W_1 + b_1)W_2 + b_2  (8)
In equation (8), W_1 and W_2 are weight matrices, b_1 and b_2 are biases, and relu is the activation function;
Step 2.6: the layer normalization of equation (9) is applied to y_t^u to obtain the normalized information LN(y_t^u), which is fed into the feedforward layer to obtain the feedforward information FFN(LN(y_t^u)); after a residual connection with y_t^u, i.e. F_t^u = y_t^u + FFN(LN(y_t^u)), the intermediate sequence representation F_t^u of the t-th position is obtained:
LN(y_t^u) = α ⊙ (y_t^u - μ_t) / √(σ_t² + ε) + β  (9)
In equation (9), μ_t and σ_t² denote the mean and variance of y_t^u; α and β are the scaling factor and the offset; ε denotes a small constant;
Step 2.7: the intermediate sequence representation F_t^u of the t-th position is fed into the self-attention module again and processed together with the order-dependence information and user-preference-change information matrix H^u to obtain the stacked self-attention information matrix Y^u; the processing of step 2.5 and step 2.6 is then applied again to obtain the final sequence representation F̂_t^u of the t-th position;
Step 2.8: the prediction layer applies equation (10) to the final sequence representation F̂_t^u of the t-th position to compute, for the t-th position of user u after passing through the order-dependence enhanced self-attention network, the score r_{i,t} of the i-th candidate item:
r_{i,t} = F̂_t^u m_i^T  (10)
In equation (10), m_i denotes the embedded vector representation of the i-th item in the item vector matrix M; r_{i,t} denotes the relevance score of the i-th item; T denotes transposition;
Step 2.9: the binary cross-entropy objective function Loss is constructed with equation (11):
Loss = -Σ_{S^u∈S} Σ_{z=1}^{|S^u|} [ log(σ(r_{o_z,z})) + log(1 - σ(r_{o'_z,z})) ]  (11)
In equation (11), S denotes the set of interaction sequences of all users; o_z denotes the number of the expected positive sample at the z-th position, the positive sample being the next item predicted to be interacted with by user u; o'_z denotes the number of the negative sample corresponding to the z-th position, the negative sample being an item that does not appear in the interaction sequence of user u; r_{o_z,z} denotes the score of the positive sample numbered o_z at the z-th position; r_{o'_z,z} denotes the score of the negative sample numbered o'_z at the z-th position;
Step 2.10: the order-dependence enhanced self-attention network is trained by back propagation and gradient descent so as to minimize the objective function Loss and update the network parameters; training stops when the number of iterations reaches the maximum number of iterations, yielding the optimal recommendation model, which outputs the scores of the candidate items for an input interaction sequence; the top items with the highest scores are then selected for recommendation.
2. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that enables the processor to perform the sequence recommendation method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the sequence recommendation method of claim 1.
CN202211398166.4A 2022-11-09 2022-11-09 Sequence recommendation method based on sequence dependence enhanced self-attention network Pending CN115687772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211398166.4A CN115687772A (en) 2022-11-09 2022-11-09 Sequence recommendation method based on sequence dependence enhanced self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211398166.4A CN115687772A (en) 2022-11-09 2022-11-09 Sequence recommendation method based on sequence dependence enhanced self-attention network

Publications (1)

Publication Number Publication Date
CN115687772A true CN115687772A (en) 2023-02-03

Family

ID=85049622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211398166.4A Pending CN115687772A (en) 2022-11-09 2022-11-09 Sequence recommendation method based on sequence dependence enhanced self-attention network

Country Status (1)

Country Link
CN (1) CN115687772A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116645174A (en) * 2023-07-27 2023-08-25 山东省人工智能研究院 Personalized recommendation method based on decoupling multi-behavior characterization learning
CN116645174B (en) * 2023-07-27 2023-10-17 山东省人工智能研究院 Personalized recommendation method based on decoupling multi-behavior characterization learning
CN117172884A (en) * 2023-10-31 2023-12-05 上海为旌科技有限公司 Method, device, electronic equipment and storage medium for recommending places of interest

Similar Documents

Publication Publication Date Title
CN111079532B (en) Video content description method based on text self-encoder
US10540967B2 (en) Machine reading method for dialog state tracking
CN111339415B (en) Click rate prediction method and device based on multi-interactive attention network
CN113905391B (en) Integrated learning network traffic prediction method, system, equipment, terminal and medium
CN115687772A (en) Sequence recommendation method based on sequence dependence enhanced self-attention network
CN113673594B (en) Defect point identification method based on deep learning network
CN115082147B (en) Sequence recommendation method and device based on hypergraph neural network
CN111695779A (en) Knowledge tracking method, knowledge tracking device and storage medium
EP3411835B1 (en) Augmenting neural networks with hierarchical external memory
CN108182260B (en) Multivariate time sequence classification method based on semantic selection
CN114493755B (en) Self-attention sequence recommendation method fusing time sequence information
CN114519145A (en) Sequence recommendation method for mining long-term and short-term interests of users based on graph neural network
CN115222998B (en) Image classification method
CN112258262A (en) Conversation recommendation method based on convolution self-attention network
CN111178986B (en) User-commodity preference prediction method and system
CN114020964A (en) Method for realizing video abstraction by using memory network and gated cyclic unit
CN111027681B (en) Time sequence data processing model training method, data processing method, device and storage medium
Xing et al. Few-shot single-view 3d reconstruction with memory prior contrastive network
CN113821724B (en) Time interval enhancement-based graph neural network recommendation method
US20240037133A1 (en) Method and apparatus for recommending cold start object, computer device, and storage medium
CN111753995A (en) Local interpretable method based on gradient lifting tree
CN112069404A (en) Commodity information display method, device, equipment and storage medium
CN114511813B (en) Video semantic description method and device
CN113779244B (en) Document emotion classification method and device, storage medium and electronic equipment
CN114692012A (en) Electronic government affair recommendation method based on Bert neural collaborative filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination