CN113822742A - Recommendation method based on self-attention mechanism - Google Patents
Recommendation method based on self-attention mechanism
- Publication number
- CN113822742A (application CN202111098120.6A)
- Authority
- CN
- China
- Prior art keywords
- user
- layer
- item
- sequence
- recommendation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 28
- 230000007246 mechanism Effects 0.000 title claims abstract description 24
- 230000003993 interaction Effects 0.000 claims abstract description 61
- 238000012549 training Methods 0.000 claims abstract description 26
- 238000005457 optimization Methods 0.000 claims abstract description 17
- 238000012163 sequencing technique Methods 0.000 claims abstract description 7
- 238000007781 pre-processing Methods 0.000 claims abstract description 5
- 230000006870 function Effects 0.000 claims description 21
- 239000013598 vector Substances 0.000 claims description 17
- 230000004927 fusion Effects 0.000 claims description 16
- 230000007774 longterm Effects 0.000 claims description 11
- 239000011159 matrix material Substances 0.000 claims description 11
- 230000004913 activation Effects 0.000 claims description 9
- 238000012545 processing Methods 0.000 claims description 8
- 238000010606 normalization Methods 0.000 claims description 6
- 238000006243 chemical reaction Methods 0.000 claims description 4
- 230000001419 dependent effect Effects 0.000 claims description 4
- 230000006835 compression Effects 0.000 claims description 3
- 238000007906 compression Methods 0.000 claims description 3
- 230000002452 interceptive effect Effects 0.000 claims description 2
- 230000000694 effects Effects 0.000 abstract description 7
- 238000013528 artificial neural network Methods 0.000 description 5
- 238000004364 calculation method Methods 0.000 description 4
- 238000013136 deep learning model Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 230000006399 behavior Effects 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 230000000306 recurrent effect Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000005315 distribution function Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Accounting & Taxation (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Finance (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of item recommendation and discloses a recommendation method based on a self-attention mechanism that improves both the training efficiency and the personalized recommendation effect of a recommendation model. The method comprises the following steps: first, historical interaction information of a user is collected and preprocessed to form a training sample set; a recommendation model is then designed, the training sample set is taken as its input, and a squared loss function is adopted as the optimization target to train the model; finally, the trained recommendation model computes the interaction probability between the user and each item to be recommended, and the items are ranked by this probability to generate the user's recommendation candidate set.
Description
Technical Field
The invention relates to the field of item recommendation, and in particular to a recommendation method based on a self-attention mechanism.
Background
In the era of information explosion, the number of available products grows rapidly and users struggle to find content of interest within a short time; how to quickly and accurately generate personalized recommendations from a user's historical records has therefore become a current research hotspot. Recommendation algorithms help users discover content of interest, and the current mainstream approaches are collaborative-filtering recommendation algorithms, content-based recommendation algorithms, and recommendation algorithms based on deep learning models. Owing to their strong fitting and generalization ability, deep-learning-based recommendation algorithms have gradually become the dominant approach.
Among recommendation algorithms based on deep learning models, Jin et al. propose a neural-perception recommendation method that captures a user preference vector with a neural network and then infers the user's degree of preference for an item. The method takes into account the quality of the products associated with a store, holding that a user's purchase decision is determined jointly by personal interest and product quality. DeepCoNN adopts two parallel convolution modules: one focuses on learning user behavior from reviews written by the user, while the other network learns product attributes from product reviews; the extracted features are fed through convolution layers with different kernels, max-pooling layers, and fully connected layers to obtain a user representation X_u and a product representation Y_i, from which the user's expected rating for the product is finally computed.
The research above mainly makes recommendations from a user's historical information. The neural-perception method proposed by Jin et al. does not consider preference weights between a user and individual items: all interactions receive the same weight, and the information thus lost may impair the model's learning. Moreover, the method must additionally compute an overall store quality score from user feedback, and since store scores and reviews change frequently, this increases the training overhead of the model. DeepCoNN uses two feature-extraction modules, and for every user the item and product features must be obtained from the user reviews and product reviews through both modules; the computational cost is high, and learning from large amounts of text may even harm the recommendation effect.
In fact, the position at which an item appears in a user's record reflects the user's interest at the corresponding point in time, yet the research above neither exploits this position information in the sequence nor accounts for the degree of association between the items in the historical information.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a recommendation method based on a self-attention mechanism is provided, and the training efficiency and the personalized recommendation effect of a recommendation model are improved.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a recommendation method based on a self-attention mechanism comprises the following steps:
s1, training a recommendation model:
s11, collecting and preprocessing historical interaction information of a user to form a training sample set;
s12, designing a recommendation model, taking the training sample set as the input of the recommendation model, and adopting a square loss function as an optimization target to train the recommendation model;
s2, recommending according to the trained recommendation model:
and calculating through the trained recommendation model to obtain the interaction probability between the user and the item to be recommended, and sequencing according to the interaction probability to generate a recommendation candidate set of the user.
As a further optimization, step S11 specifically includes:
s111, acquiring a historical interaction record of a user and converting the historical interaction record into a user-item interaction matrix; items of the interaction matrix comprise user codes, item codes and item categories;
s112, carrying out 0 setting and filling on the item type information which is a null value;
s113, converting the user codes, the item codes and the item types into one-hot codes, and performing numerical compression processing;
s114, sequencing the interaction information of the users in the user-item interaction matrix according to a time sequence to obtain a historical interaction sequence of each user, wherein each item in the historical interaction sequence is represented by one-hot codes.
As a further optimization, in step S12, the proposed model of the design includes: the device comprises an input layer, a coding layer, a feature fusion layer and an output layer; the input layer is used for converting input data into a low-dimensional embedded representation; the coding layer is responsible for acquiring long-term and short-term dependence representation of a user historical interaction sequence; the feature fusion layer is used for fusing the sequence features and the item category features and converting the sequence features and the item category features into more dense features; and the output layer is responsible for generating a final result as the interaction probability of the user and the project by combining the features obtained by the user code embedding layer, the project code embedding layer and the feature fusion layer.
As a further optimization, in step S12, the input layer is used to convert the input data into a low-dimensional embedded representation, and includes:
the one-hot coding of the user code, the item category and the historical interaction sequence represents that a vector corresponding to one-hot index is obtained by looking up a table in a random initialization embedding table, so that the vector is converted into a low-dimensional embedding value:
The embedded representation of the item category isWherein T is the total number of item categories, let dc=32;
The embedded representation of the historical interaction sequence is: wherein ,only a maximum of l items are kept per sequence,for item coding, piThe position of the item is encoded.
As a further optimization, in step S12, the encoding layer is responsible for obtaining a long-short term dependency representation of the user history interaction sequence, including:
a Transformer module is adopted as the long-term dependency learning module, taking the embedded representation E_T of the historical interaction sequence as input to obtain the sequence feature h_L;
a GRU module is adopted as the short-term dependency learning module, taking the codes of the last k items of the historical interaction sequence as input; the output of the k-th GRU unit is selected as the short-term dependency representation, capturing the user's recent interest feature h_S.
As a further optimization, the Transformer module comprises a multi-head self-attention layer, a feedforward network layer and a normalization layer; the multi-head self-attention layer uses a plurality of self-attention modules to learn different hidden-layer representations; the feedforward network layer adopts the Gelu activation function; the normalization layer employs a residual network.
As a further optimization, in step S12, the feature fusion layer fuses the sequence features and the item category features and converts them into denser features, including:
the item category feature c_j, the sequence feature h_L obtained by the Transformer module and the output h_S of the GRU module are spliced into a new vector, and a multilayer perceptron then converts the spliced vector into an embedded representation of dimension d_m, where d_m = 128:
h_f = f(W_f [c_j ; h_L ; h_S] + b_f)
where [ ; ] denotes the splicing operation, and W_f and b_f are parameters that need to be learned.
As a further optimization, in step S12, the output layer is responsible for generating the final result, the interaction probability between the user and the item, by combining the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer, including:
the embedded representation e_(u_i) of user u_i, the embedded representation e_(d_j) of item d_j and the fused vector h_f are spliced, and the final output is then obtained through an MLP (multi-layer perceptron):
ŷ_ij = f(W_o [e_(u_i) ; e_(d_j) ; h_f] + b_o)
where ŷ_ij is the interaction probability of user u_i and item d_j, f is the activation function Relu, and W_o and b_o are parameters that need to be learned.
As a further optimization, in step S12, the recommendation model is trained with a squared loss function as the optimization target, including:
1) calculating the loss function:
L = Σ_(i,j) (y_ij − ŷ_ij)² + λ‖Φ‖²
where y_ij is the data label and ŷ_ij is the interaction probability of user u_i and item d_j; y_ij indicates whether user u_i and item d_j have interacted: if d_j is present in u_i's record then y_ij = 1, otherwise y_ij = 0; λ is the regularization coefficient controlling the degree of parameter regularization; Φ denotes the set of parameters that require regularization;
2) iterating the processing steps of the input layer, the coding layer, the feature fusion layer and the output layer by stochastic gradient descent until the training period ends, and taking the model with the minimum loss as the trained recommendation model.
The invention has the beneficial effects that:
the image feature extraction with higher calculation cost is avoided, deep sequence features are captured from user history by using a Transformer module with high parallelism, the training efficiency is high, and the captured deep sequence features are combined with the project attributes, so that the characterization capability of model embedding can be enriched;
the model parameters are updated by adopting a multi-head attention mechanism, so that the learning capability of the model is effectively improved, the recommendation effect is more accurate, and the recommendation content is more in line with the user interest.
Drawings
FIG. 1 is a flow chart of a recommendation method based on a self-attention mechanism in an embodiment;
fig. 2 is a diagram of a recommendation model structure in the embodiment.
Detailed Description
The invention aims to provide a recommendation method based on a self-attention mechanism that improves the training efficiency and the personalized recommendation effect of a recommendation model. First, historical interaction information of a user is collected and preprocessed to form a training sample set; a recommendation model is then designed, with the training sample set as its input and a squared loss function as the optimization target for training; finally, the trained recommendation model computes the interaction probability between the user and each item to be recommended, and the items are ranked by this probability to generate the user's recommendation candidate set.
Embodiment:
as shown in fig. 1, the recommendation method based on the self-attention mechanism in this embodiment includes the following implementation steps:
A. training a recommendation model:
a1, collecting and preprocessing user historical interaction information to form a training sample set, and specifically comprising the following steps A11-A14:
a11, acquiring a historical interaction record of the user and converting it into a user-item interaction matrix; entries of the interaction matrix comprise the user code, the item code and the item category. Through the conversion, the number u_i ∈ {u_1, u_2, ..., u_N} of each user, the number d_j ∈ {d_1, d_2, ..., d_M} of each item, and the category c_j ∈ {c_1, c_2, ..., c_T} corresponding to item d_j are obtained, where N is the number of users, M is the number of items, and T is the number of item categories.
A12, carrying out 0 setting and filling on the item type information which is null;
a13, converting the user code, the item code and the item category into one-hot codes and performing numerical compression; the processed numerical information consists of consecutive integers numbered from 0;
a14, sorting the interaction information of each user in the user-item interaction matrix in time order to obtain each user u_i's historical interaction sequence S_i = {s_1, s_2, ..., s_l}, where l is the length of the sequence and each item in the historical interaction sequence is represented by its one-hot code.
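Steps A11–A14 can be sketched in plain Python. This is a minimal illustration under assumed input layout (the `(user, item, category_or_None, timestamp)` tuple format and the function name `preprocess` are hypothetical, not from the patent): codes are consecutive integers from 0, null categories are filled with 0, and each user's history is sorted by time:

```python
from collections import defaultdict

def preprocess(records):
    """records: (user, item, category_or_None, timestamp) tuples (hypothetical layout).
    Sketch of A11-A14: integer codes starting from 0, null categories filled
    with 0, and per-user interaction sequences ordered by timestamp."""
    users, items, cats = {}, {}, {None: 0}   # category code 0 reserved for null values
    code = lambda table, key: table.setdefault(key, len(table))
    seqs = defaultdict(list)
    for u, d, c, t in records:
        seqs[code(users, u)].append((t, code(items, d), code(cats, c)))
    # sort each user's history chronologically, then drop the timestamps
    return {u: [(d, c) for _, d, c in sorted(s)] for u, s in seqs.items()}
```

For example, two interactions of one user with timestamps 2 and 1 come back in time order, with the null category mapped to 0.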
A2, designing a recommendation model, taking a training sample set as an input of the recommendation model, and taking a square loss function as an optimization target to train the recommendation model, wherein the method specifically comprises the following steps of A21-A22:
a21, model construction:
in the step, a recommendation model based on a self-attention mechanism for commodity recommendation is designed, and the model consists of an input layer, a coding layer, a feature fusion layer and an output layer; the input layer is used for converting data represented by one-hot into low-dimensional embedded representation, the coding layer acquires long-term and short-term dependence representation of a user history sequence through a Transformer and a GRU module, the feature fusion layer fuses sequence features and item category features and converts the sequence features into denser features, and the output layer generates a final result as interaction probability of a user and an item by combining the features obtained by the user embedding, the item embedding and the feature fusion layer.
The 4 levels of the model are specified as follows:
(1) an input layer:
at the input layer, the data represented by one-hot codes (including the user code, the item code and the item category) need to be converted into low-dimensional embedded values:
The embedded representation of the user code is E_U ∈ R^(N×d_u), where N is the total number of users; here the embedding dimension d_u = 128.
The embedded representation of the item code is E_D ∈ R^(M×d_d), where M is the total number of items; here the embedding dimension d_d = 128.
The embedded representation of the item category is E_C ∈ R^(T×d_c), where T is the number of item categories; here the embedding dimension d_c = 32.
The sequential order of the items reflects changes in user behavior, but since the Transformer module designed in the coding layer does not carry time-order information the way a recurrent neural network does, an additional position code is needed so that the model can learn the importance of position when processing sequence information; the position code is P = {p_1, p_2, ..., p_l}, where l is the maximum length of the sequence.
The data output of the input layer combines the position codes with the original item input codes: E_T = [e_1 + p_1, e_2 + p_2, ..., e_l + p_l], where at most l items are kept per sequence.
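The input-layer lookup can be illustrated with a toy sketch (dimensions shrunk for readability; the table and function names are illustrative only): each index selects a row of a randomly initialized embedding table, and the position code is added element-wise to the item embedding, keeping at most the last l items:

```python
import random

random.seed(0)
d, l = 4, 3            # embedding dim (d_d = 128 in the text) and max sequence length l
M = 10                 # number of items

# randomly initialized embedding tables, as in the input layer
item_emb = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(M)]
pos_emb  = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(l)]

def embed_sequence(seq):
    """E_T = [e_1 + p_1, ..., e_l + p_l]; longer sequences keep the last l items."""
    seq = seq[-l:]
    return [[item_emb[item][k] + pos_emb[i][k] for k in range(d)]
            for i, item in enumerate(seq)]

E_T = embed_sequence([2, 5, 7, 1])   # 4 items -> only the last 3 are kept
```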
(2) And (3) coding layer:
the combination of long-term and short-term dependence can learn the related information of short-term dependence on the basis of effectively acquiring the sequence characteristics, and effectively captures the recent interest characteristics of the user. The Transformer module has the advantage of long-term dependence capture, and the characteristic representation of the sequence is effectively obtained by learning terms and weights among the terms through a self-attention mechanism. In addition, the parallelism of the Transformer is high, and a serialization learning mode of a recurrent neural network is avoided.
Therefore, we choose a Transformer as the long-term dependent learning module at the coding layer. Although GRU (gated round robin unit) is not as effective in extracting long sequence features as the Transformer, the performance is almost consistent on short sequences, and the parameter amount of the GRU module is much smaller than that of the Transformer, so that the GRU module is more suitable for learning short-term dependence. Therefore, we choose the GRU module as the learning module for short term dependency at the coding level.
The Transformer consists of a multi-head attention layer, a feedforward network layer and a normalization layer. The key of the module is the multi-head attention layer, which is based on the self-attention mechanism and uses multiple heads to assist in learning representation vectors in different subspaces. The basic equation of the attention mechanism is:
Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V
where Q, K and V respectively denote the query, key and value: the degree of association between a query and a key determines the weight of the corresponding value. Since the invention uses a self-attention mechanism, Q, K and V are generated from the same input: Q = H W^Q, K = H W^K, V = H W^V.
Multi-head attention learns different hidden-layer features with multiple self-attention modules, specifically defined as follows:
h_i = Attention(H^(L−1) W_i^Q, H^(L−1) W_i^K, H^(L−1) W_i^V)
H^L = [h_1, h_2, ..., h_n] W^h
where H^L is the hidden-layer output of the L-th layer; each head computes its own attention weight distribution and then generates a new parameter matrix, and W_i^Q, W_i^K, W_i^V are independent weight matrices not shared across heads. Finally, the n head outputs are spliced together and converted through the weight matrix W^h to yield the multi-head attention output of layer L.
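As a rough pure-Python sketch (toy dimensions; the final W^h projection is omitted, so head outputs are simply concatenated), scaled dot-product attention and the multi-head combination can be written as:

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)

def multi_head(H, heads):
    """heads: list of (W_Q, W_K, W_V) per-head weight matrices; the head
    outputs are concatenated (the W^h projection is omitted for brevity)."""
    outs = [attention(matmul(H, WQ), matmul(H, WK), matmul(H, WV))
            for WQ, WK, WV in heads]
    return [sum((o[i] for o in outs), []) for i in range(len(H))]
```

With identity weight matrices and one-hot inputs, each output row is a convex combination of the value rows, so its entries sum to 1.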
The purpose of the feedforward network layer FFN is to give the model nonlinear modeling capability; it adopts the Gelu activation function:
FFN(x) = Gelu(x W_1 + b_1) W_2 + b_2
Compared with Relu, the Gelu activation introduces stochastic regularization and improves the convergence rate to some extent. Its expression is:
Gelu(x) = x Φ(x)
where Φ(x) is the cumulative distribution function of the standard Gaussian distribution, and W_1, W_2, b_1, b_2 are learned parameters shared across the Transformer layers.
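The Gelu expression above has an exact form via the error function, since Φ(x) = ½(1 + erf(x/√2)); a small sketch (function names are illustrative):

```python
import math

def gelu(x):
    """Gelu(x) = x * Phi(x), with Phi the standard normal CDF (exact via erf)."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    """Relu for comparison: hard zero below 0, identity above."""
    return max(0.0, x)
```

Unlike Relu, Gelu is smooth and slightly negative for small negative inputs, e.g. gelu(1) = Φ(1) ≈ 0.8413.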
In the normalization layer, a residual network is used to guarantee the learning effect of the deep network parameters; combining the multi-head attention layer MH and the feedforward network layer FFN, the overall flow of the Transformer is:
AN_L = LN(H_(L−1) + MH(H_(L−1)))
h_L = LN(FFN(AN_L) + AN_L)
the whole long-term dependent coding unit is composed of a plurality of transformers, and parameters between layers are shared, so that the whole parameter quantity of the model is greatly reduced, and the possibility of model expansion is provided.
Combining the input matrix E_T, the output of the first layer can be expressed as:
h_1 = LN(FFN(AN_0) + AN_0)
  = LN(FFN(LN(E_T + MH(E_T))) + LN(E_T + MH(E_T)))
Through the processing of each coding layer, the output h^L = {h_1, h_2, ..., h_l} of the L-th layer is obtained; here l = 100. The output vectors are summed to obtain the final long-term dependency representation h_L of the sequence.
The GRU module responsible for short-term dependency learning is composed of multiple GRU units. Each GRU unit contains a reset gate r_t and an update gate z_t, both of which depend on the hidden-layer variable h_(t−1) of the previous moment and the input x_t of the current moment:
r_t = σ(W_r x_t + U_r h_(t−1))
z_t = σ(W_z x_t + U_z h_(t−1))
r_t and z_t use the sigmoid activation function, and W_r, U_r, W_z, U_z are weight parameters to be learned. The candidate hidden layer h̃_t depends on the reset gate r_t, the hidden-layer variable h_(t−1) and the input x_t; it is analogous to the new information in long short-term memory, with r_t determining how much historical information is kept. When r_t is 0 the historical information is completely reset and the new information h̃_t depends only on x_t:
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_(t−1)))
The hidden-layer variable h_t of the current moment is computed from the update gate z_t, the hidden-layer variable h_(t−1) and the new information h̃_t; the value of z_t determines the forgetting rate of h_(t−1) and the retention rate of the new information h̃_t:
h_t = (1 − z_t) ⊙ h_(t−1) + z_t ⊙ h̃_t
The input of the GRU module is the codes {x_1, ..., x_k} of the last k items of the sequence, and the output of the k-th GRU unit is selected as the short-term dependency representation, i.e. h_S = h_k.
(3) Feature fusion layer:
The role of this layer is to convert and compress the obtained features into a denser embedded representation. First, the item category feature c_j, the sequence feature h_L obtained from the Transformer layer and the output h_S of the GRU module are spliced into a new vector; a multilayer perceptron then converts this vector into an embedded representation of dimension d_m, where d_m = 128. The process is as follows:
h_f = f(W_f [c_j ; h_L ; h_S] + b_f)
where [ ; ] denotes the splicing operation, and W_f and b_f are parameters that need to be learned.
(4) Output layer:
Given user u_i and item d_j, the interaction probability of u_i and d_j is computed from u_i, d_j and h_f. First the embedded representations e_(u_i) and e_(d_j) corresponding to u_i and d_j are spliced with h_f to form a new representation, and the final output is then obtained through an MLP (multi-layer perceptron):
ŷ_ij = f(W_o [e_(u_i) ; e_(d_j) ; h_f] + b_o)
where f is the activation function Relu.
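A minimal sketch of the output layer's splice-then-MLP step, reduced to a single dense layer with Relu (the names `W_o`, `b_o` and `interaction_prob` are illustrative, and a real model would use several layers):

```python
def mlp_layer(x, W, b, f):
    """One dense layer: f(W x + b); W is given as a list of rows."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def interaction_prob(e_u, e_d, h_f, W_o, b_o):
    """Splice the user embedding, item embedding and fused feature, then
    apply one Relu layer to produce the predicted interaction score."""
    x = e_u + e_d + h_f                               # vector concatenation
    return mlp_layer(x, W_o, b_o, lambda v: max(0.0, v))[0]
```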
The structure of the proposed model designed as above is shown in fig. 2.
A22, model training and saving:
In this step, the squared loss function is used as the model's optimization objective, iterative training is carried out for a specified number of epochs, and the best model obtained during training is saved.
Specifically, since the interaction label of a user and an item is either 1 or 0, the objective of the model is to fit the user-item interactions in the original data as closely as possible, so the squared loss function is chosen as the model's optimization objective. The loss function can be expressed as:
L = Σ_(i,j) (y_ij − ŷ_ij)² + λ‖Φ‖²
where y_ij is the data label indicating whether user u_i and item d_j have interacted: if d_j is present in u_i's record then y_ij = 1, otherwise y_ij = 0. λ is the regularization coefficient controlling the degree of parameter regularization, and Φ is the set of parameters that require regularization.
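A direct transcription of the loss expression above (treating the regularization term as a flat list of parameter values; the function name is illustrative):

```python
def squared_loss(y_true, y_pred, params, lam=0.01):
    """L = sum_ij (y_ij - yhat_ij)^2 + lambda * ||Phi||^2."""
    fit = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    reg = lam * sum(w * w for w in params)  # squared L2 norm of the parameters
    return fit + reg
```

For labels [1, 0], predictions [0.5, 0.5], one parameter of value 1.0 and λ = 0.1, the loss is 0.25 + 0.25 + 0.1 = 0.6.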
Training uses stochastic gradient descent with the Adam optimizer; the learning rate of the Transformer module is set to 0.001, that of the GRU module to 0.01, and the Dropout rate to 0.5. The training period epoch is initialized to 100, and within the set period the model with the lowest loss is saved as the model for subsequent deployment.
B. Recommending according to the trained recommendation model:
the method comprises the following steps of utilizing a trained recommendation model to carry out actual application recommendation, and acquiring a user u to be recommended in specific applicationiHistory of (3) calculating a user u to be recommendediAnd interaction probability values between the items. The items are processed according to the sequence of the probability values from large to smallAnd sorting, selecting a certain number of items (such as 20 items) as a recommendation set and pushing the recommendation set to the user.
Claims (9)
1. A recommendation method based on a self-attention mechanism is characterized by comprising the following steps:
s1, training a recommendation model:
s11, collecting and preprocessing historical interaction information of a user to form a training sample set;
s12, designing a recommendation model, taking the training sample set as the input of the recommendation model, and adopting a square loss function as an optimization target to train the recommendation model;
s2, recommending according to the trained recommendation model:
calculating, by the trained recommendation model, the interaction probability between the user and the items to be recommended, and sorting by interaction probability to generate the user's recommendation candidate set.
2. The self-attention mechanism-based recommendation method of claim 1,
step S11 specifically includes:
S111, acquiring the historical interaction records of users and converting them into a user-item interaction matrix, each entry of which comprises a user code, an item code and an item category;
S112, filling item-category fields that are null with 0;
S113, converting the user codes, item codes and item categories into one-hot codes and performing index compression;
S114, sorting each user's interactions in the user-item interaction matrix in chronological order to obtain each user's historical interaction sequence, in which each item is represented by its one-hot code.
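Steps S111-S114 can be sketched with plain Python (a minimal illustration under assumed record tuples `(user_id, item_id, category_or_None, timestamp)`; the function and field names are not from the patent):

```python
def preprocess(records):
    """Build per-user chronological item sequences from raw interaction
    records and remap all raw IDs to compact integer indices (the index
    compression of S113). Missing categories are filled with 0 (S112),
    so real categories start at index 1."""
    users, items = {}, {}
    cats = {None: 0}                     # null category maps to 0

    def idx(table, key):
        return table.setdefault(key, len(table))

    rows = [(idx(users, u), idx(items, i), idx(cats, c), t)
            for u, i, c, t in records]
    seqs = {}
    for u, i, c, t in sorted(rows, key=lambda r: r[3]):  # chronological
        seqs.setdefault(u, []).append((i, c))
    return seqs

# "alice" viewed item B (category "film") at t=1, then item A (no category) at t=3.
seqs = preprocess([("alice", "A", None, 3), ("alice", "B", "film", 1)])
# alice -> user 0; her sequence lists item B (category 1) before item A (category 0)
```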
3. A recommendation method based on the self-attention mechanism as claimed in claim 1 or 2,
In step S12, the designed recommendation model comprises an input layer, an encoding layer, a feature fusion layer and an output layer. The input layer converts the input data into low-dimensional embedded representations; the encoding layer obtains long-term and short-term dependency representations of the user's historical interaction sequence; the feature fusion layer fuses the sequence features with the item-category features and converts them into denser features; and the output layer combines the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer to produce the final result as the interaction probability between the user and the item.
4. A recommendation method based on the self-attention mechanism as claimed in claim 3,
In step S12, the input layer converts the input data into low-dimensional embedded representations as follows:
the one-hot codes of the user code, the item category and the historical interaction sequence are each converted to a low-dimensional embedding by looking up the vector at the corresponding one-hot index in a randomly initialized embedding table.
The embedded representation of the item categories is an embedding table of size T × d_c, where T is the total number of item categories and d_c = 32.
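The table lookup can be sketched in NumPy (a minimal illustration; T = 5 and the table name `E_C` are assumptions for the example, d_c = 32 as above):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_c = 5, 32                          # T item categories, d_c = 32 as above
E_C = rng.standard_normal((T, d_c))     # randomly initialized embedding table

def embed(one_hot_index):
    """One-hot index -> low-dimensional embedding, by table lookup.
    Multiplying the one-hot vector by the table is equivalent to
    selecting the row at the index, so a lookup suffices."""
    return E_C[one_hot_index]

vec = embed(3)                          # embedding of category 3
```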
5. A recommendation method based on the self-attention mechanism as claimed in claim 3,
In step S12, the encoding layer obtains long-term and short-term dependency representations of the user's historical interaction sequence as follows: a Transformer module is adopted as the long-term dependency learning module, taking the embedded representation E_T of the historical interaction sequence as input to obtain the sequence feature h_L;
a GRU module is adopted as the short-term dependency learning module, taking the embeddings of the last k items of the historical interaction sequence as input; the output of the k-th GRU unit is selected as the short-term dependency representation, capturing the user's recent-interest feature h_S.
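A minimal NumPy sketch of the GRU pass over the last-k embeddings (biases omitted for brevity; the toy weights and dimensions are illustrative, not the patent's parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_last_hidden(x_seq, Wz, Wr, Wh, Uz, Ur, Uh):
    """Run a single-layer GRU over the last-k item embeddings x_seq
    (shape [k, d]) and return the k-th unit's output h_S (shape [d])."""
    h = np.zeros(x_seq.shape[1])
    for x in x_seq:
        z = sigmoid(x @ Wz + h @ Uz)              # update gate
        r = sigmoid(x @ Wr + h @ Ur)              # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
        h = (1 - z) * h + z * h_tilde
    return h                                       # short-term feature h_S

rng = np.random.default_rng(1)
d, k = 8, 3                        # toy embedding size and window length
W = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]
h_S = gru_last_hidden(rng.standard_normal((k, d)), *W)
```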
6. A recommendation method based on the self-attention mechanism as claimed in claim 5,
The Transformer module comprises a multi-head self-attention layer, a feed-forward network layer and a normalization layer. The multi-head self-attention layer uses several self-attention modules to learn different hidden-layer representations; the feed-forward network layer uses the GELU activation function; and the normalization layer uses residual connections.
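The core of each head is scaled dot-product self-attention, which can be sketched as follows (a single head only; a multi-head layer runs several in parallel and concatenates the results; the toy dimensions are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape [n, d].
    Returns the attended outputs and the attention-weight matrix."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # rows sum to 1
    return A @ V, A

rng = np.random.default_rng(2)
n, d = 4, 8                                      # toy sequence length and width
X = rng.standard_normal((n, d))
out, A = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
```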
7. A recommendation method based on the self-attention mechanism as claimed in claim 5,
In step S12, the feature fusion layer fuses the sequence features with the item-category features and converts them into denser features as follows:
the item-category feature e_c, the sequence feature h_L obtained by the Transformer module and the output h_S of the GRU module are concatenated into a new vector, which a multilayer perceptron then converts into a d_m-dimensional embedded representation, where d_m = 128:

h_m = MLP([e_c ; h_L ; h_S])
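A NumPy sketch of the fusion step (the hidden width, the ReLU activation and the feature dimensions are assumptions for illustration; only the concatenation and the 128-dimensional output come from the text above):

```python
import numpy as np

def fuse(e_c, h_L, h_S, W1, W2):
    """Concatenate the category, long-term and short-term features, then
    map to a d_m = 128 dimensional fused vector with a 2-layer perceptron."""
    x = np.concatenate([e_c, h_L, h_S])
    return np.maximum(0.0, x @ W1) @ W2      # ReLU hidden layer, linear out

rng = np.random.default_rng(3)
e_c = rng.standard_normal(32)                # category embedding (d_c = 32)
h_L = rng.standard_normal(64)                # Transformer sequence feature
h_S = rng.standard_normal(64)                # GRU short-term feature
W1 = rng.standard_normal((32 + 64 + 64, 256)) * 0.05
W2 = rng.standard_normal((256, 128)) * 0.05
h_m = fuse(e_c, h_L, h_S, W1, W2)            # fused feature, shape (128,)
```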
8. The self-attention mechanism-based recommendation method of claim 7,
In step S12, the output layer combines the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer to produce the final result as the interaction probability between the user and the item:
the embedded representation e_ui of user u_i, the embedded representation e_dj of item d_j and the fused vector h_m are concatenated, and the final output ŷ_ij is obtained through an MLP.
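A NumPy sketch of the output layer (the sigmoid squashing is an assumption so the output reads as a probability; the patent only states that the final output is produced by an MLP; dimensions are illustrative):

```python
import numpy as np

def predict(e_u, e_d, h_m, W, w_out):
    """Concatenate the user embedding, item embedding and fused feature,
    pass through an MLP, and squash to (0, 1) with a sigmoid (assumed)."""
    x = np.concatenate([e_u, e_d, h_m])
    h = np.maximum(0.0, x @ W)               # ReLU hidden layer (assumed)
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))

rng = np.random.default_rng(4)
e_u = rng.standard_normal(32)                # user embedding
e_d = rng.standard_normal(32)                # item embedding
h_m = rng.standard_normal(128)               # fused feature from the layer above
W = rng.standard_normal((192, 64)) * 0.05
y_hat = predict(e_u, e_d, h_m, W, rng.standard_normal(64) * 0.05)
```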
9. A recommendation method based on the self-attention mechanism as claimed in claim 3,
in step S12, training the recommendation model by using the square loss function as the optimization target includes:
1) computing the loss function:

L = Σ_(i,j) (y_ij − ŷ_ij)² + λ‖Φ‖²

where y_ij is the data label and ŷ_ij is the predicted interaction probability of user u_i and item d_j; y_ij indicates whether user u_i has interacted with item d_j: y_ij = 1 if d_j appears in u_i's record, and y_ij = 0 otherwise; λ is the regularization coefficient controlling the degree of parameter regularization; Φ is the set of parameters to be regularized;
2) iterating the processing steps of the input layer, encoding layer, feature fusion layer and output layer by stochastic gradient descent until the training period ends, and taking the model with the minimum loss as the trained recommendation model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111098120.6A CN113822742B (en) | 2021-09-18 | 2021-09-18 | Recommendation method based on self-attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111098120.6A CN113822742B (en) | 2021-09-18 | 2021-09-18 | Recommendation method based on self-attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822742A true CN113822742A (en) | 2021-12-21 |
CN113822742B CN113822742B (en) | 2023-05-12 |
Family
ID=78914855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111098120.6A Active CN113822742B (en) | 2021-09-18 | 2021-09-18 | Recommendation method based on self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822742B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114493755A (en) * | 2021-12-28 | 2022-05-13 | 电子科技大学 | Self-attention sequence recommendation method fusing time sequence information |
CN114528434A (en) * | 2022-01-19 | 2022-05-24 | 华南理工大学 | IPTV live channel fusion recommendation method based on self-attention mechanism |
CN114615524A (en) * | 2022-02-18 | 2022-06-10 | 聚好看科技股份有限公司 | Server, training method of media asset recommendation network and media asset recommendation method |
CN114861783A (en) * | 2022-04-26 | 2022-08-05 | 北京三快在线科技有限公司 | Recommendation model training method and device, electronic equipment and storage medium |
CN114912984A (en) * | 2022-05-31 | 2022-08-16 | 重庆师范大学 | Self-attention-based time scoring context-aware recommendation method and system |
CN116521971A (en) * | 2022-01-19 | 2023-08-01 | 腾讯科技(深圳)有限公司 | Content recommendation method, apparatus, device, storage medium, and computer program product |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111429234A (en) * | 2020-04-16 | 2020-07-17 | 电子科技大学中山学院 | Deep learning-based commodity sequence recommendation method |
WO2021066903A1 (en) * | 2019-09-30 | 2021-04-08 | Microsoft Technology Licensing, Llc | Providing explainable product recommendation in a session |
CN112732936A (en) * | 2021-01-11 | 2021-04-30 | 电子科技大学 | Radio and television program recommendation method based on knowledge graph and user microscopic behaviors |
CN113268633A (en) * | 2021-06-25 | 2021-08-17 | 北京邮电大学 | Short video recommendation method |
- 2021-09-18 CN CN202111098120.6A patent/CN113822742B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021066903A1 (en) * | 2019-09-30 | 2021-04-08 | Microsoft Technology Licensing, Llc | Providing explainable product recommendation in a session |
CN111429234A (en) * | 2020-04-16 | 2020-07-17 | 电子科技大学中山学院 | Deep learning-based commodity sequence recommendation method |
CN112732936A (en) * | 2021-01-11 | 2021-04-30 | 电子科技大学 | Radio and television program recommendation method based on knowledge graph and user microscopic behaviors |
CN113268633A (en) * | 2021-06-25 | 2021-08-17 | 北京邮电大学 | Short video recommendation method |
Non-Patent Citations (2)
Title |
---|
WEINAN LI et al.: "Session-based recommendation with temporal convolutional network to balance numerical gaps" * |
DUAN Chao et al.: "A deep hybrid recommendation algorithm incorporating an attention mechanism" * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114493755A (en) * | 2021-12-28 | 2022-05-13 | 电子科技大学 | Self-attention sequence recommendation method fusing time sequence information |
CN114493755B (en) * | 2021-12-28 | 2022-10-14 | 电子科技大学 | Self-attention sequence recommendation method fusing time sequence information |
CN114528434A (en) * | 2022-01-19 | 2022-05-24 | 华南理工大学 | IPTV live channel fusion recommendation method based on self-attention mechanism |
CN116521971A (en) * | 2022-01-19 | 2023-08-01 | 腾讯科技(深圳)有限公司 | Content recommendation method, apparatus, device, storage medium, and computer program product |
CN114615524A (en) * | 2022-02-18 | 2022-06-10 | 聚好看科技股份有限公司 | Server, training method of media asset recommendation network and media asset recommendation method |
CN114615524B (en) * | 2022-02-18 | 2023-10-24 | 聚好看科技股份有限公司 | Training method of server and media asset recommendation network and media asset recommendation method |
CN114861783A (en) * | 2022-04-26 | 2022-08-05 | 北京三快在线科技有限公司 | Recommendation model training method and device, electronic equipment and storage medium |
CN114912984A (en) * | 2022-05-31 | 2022-08-16 | 重庆师范大学 | Self-attention-based time scoring context-aware recommendation method and system |
Also Published As
Publication number | Publication date |
---|---|
CN113822742B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113822742B (en) | Recommendation method based on self-attention mechanism | |
CN109299396B (en) | Convolutional neural network collaborative filtering recommendation method and system fusing attention model | |
CN112598462B (en) | Personalized recommendation method and system based on collaborative filtering and deep learning | |
CN110046656B (en) | Multi-mode scene recognition method based on deep learning | |
CN110083770B (en) | Sequence recommendation method based on deeper feature level self-attention network | |
CN111310063B (en) | Neural network-based article recommendation method for memory perception gated factorization machine | |
CN110276406B (en) | Expression classification method, apparatus, computer device and storage medium | |
CN111127165A (en) | Sequence recommendation method based on self-attention self-encoder | |
CN114693397B (en) | Attention neural network-based multi-view multi-mode commodity recommendation method | |
CN110659411B (en) | Personalized recommendation method based on neural attention self-encoder | |
CN112328900A (en) | Deep learning recommendation method integrating scoring matrix and comment text | |
CN112800344B (en) | Deep neural network-based movie recommendation method | |
CN114386534A (en) | Image augmentation model training method and image classification method based on variational self-encoder and countermeasure generation network | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
Tong et al. | Collaborative generative adversarial network for recommendation systems | |
CN114020964A (en) | Method for realizing video abstraction by using memory network and gated cyclic unit | |
CN114004220A (en) | Text emotion reason identification method based on CPC-ANN | |
CN113177141A (en) | Multi-label video hash retrieval method and device based on semantic embedded soft similarity | |
CN114549850A (en) | Multi-modal image aesthetic quality evaluation method for solving modal loss problem | |
CN114462420A (en) | False news detection method based on feature fusion model | |
CN114564651A (en) | Self-supervision recommendation method combined with contrast learning method | |
CN112860930A (en) | Text-to-commodity image retrieval method based on hierarchical similarity learning | |
CN113887836B (en) | Descriptive event prediction method integrating event environment information | |
CN115408603A (en) | Online question-answer community expert recommendation method based on multi-head self-attention mechanism | |
CN114139066A (en) | Collaborative filtering recommendation system based on graph neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||