CN113822742A - Recommendation method based on self-attention mechanism - Google Patents

Recommendation method based on self-attention mechanism

Info

Publication number
CN113822742A
CN113822742A (application CN202111098120.6A; granted publication CN113822742B)
Authority
CN
China
Prior art keywords
user
layer
item
sequence
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111098120.6A
Other languages
Chinese (zh)
Other versions
CN113822742B (en)
Inventor
田玲
闫科
康昭
惠孛
罗光春
张天舒
曾翰林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202111098120.6A priority Critical patent/CN113822742B/en
Publication of CN113822742A publication Critical patent/CN113822742A/en
Application granted granted Critical
Publication of CN113822742B publication Critical patent/CN113822742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of item recommendation and discloses a recommendation method based on a self-attention mechanism that improves the training efficiency and the personalized recommendation effect of a recommendation model. The method comprises the following steps: first, historical interaction information of users is collected and preprocessed to form a training sample set; then a recommendation model is designed, the training sample set is used as its input, and a squared loss function is adopted as the optimization target to train the model; finally, the trained recommendation model computes the interaction probability between a user and each candidate item, and the items are ranked by interaction probability to generate the user's recommendation candidate set.

Description

Recommendation method based on self-attention mechanism
Technical Field
The invention relates to the field of item recommendation, and in particular to a recommendation method based on a self-attention mechanism.
Background
In the era of information explosion, the number of goods grows rapidly and users find it difficult to locate content of interest within a short time, so how to use users' historical records to generate personalized recommendations quickly and accurately has become a current research hotspot. Recommendation algorithms can help users discover content they are interested in; the current mainstream methods include collaborative filtering recommendation algorithms, content-based recommendation algorithms, and recommendation algorithms based on deep learning models. Recommendation algorithms based on deep learning models have gradually become the dominant approach owing to their strong fitting and generalization capability.
With respect to recommendation algorithms based on deep learning models, Jin et al. proposed a neural-perception recommendation method that captures a user's preference vector with a neural network and then infers the user's degree of preference for an item. The method takes the quality of the goods associated with a store into account and assumes that a user's purchase decision is determined both by personal interest and by the quality of the goods. DeepCoNN adopts two parallel convolution modules: one focuses on learning user behavior from the reviews written by the user, while the other network learns product attributes from the product reviews; the extracted features are fed into convolution layers with different kernels, max-pooling layers and fully connected layers to obtain a user representation X_u and a product representation Y_i, from which the user's expected rating of the product is finally computed.
The above research mainly makes recommendations based on users' historical information. The neural-perception recommendation method proposed by Jin et al. does not consider preference weights between a user and different items: all interactions use the same weight, and the information lost in this way may affect the learning effect of the model. Meanwhile, the method has to additionally compute an overall quality score of the store from user feedback, and since the store's score and reviews change frequently, the training overhead of the model increases. DeepCoNN uses two feature extraction modules, so the features of each user and each product must be obtained from the user reviews and product reviews through both modules; the computation cost is high, and learning from a large amount of text may even hurt the recommendation effect.
In fact, the positions at which the chosen items appear in a user's record reflect the user's interest at the corresponding time; the above research neither uses this position information in the sequence nor considers the degree of association between items in the historical information.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a recommendation method based on a self-attention mechanism is provided, and the training efficiency and the personalized recommendation effect of a recommendation model are improved.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a recommendation method based on a self-attention mechanism comprises the following steps:
s1, training a recommendation model:
s11, collecting and preprocessing historical interaction information of a user to form a training sample set;
s12, designing a recommendation model, taking the training sample set as the input of the recommendation model, and adopting a square loss function as an optimization target to train the recommendation model;
s2, recommending according to the trained recommendation model:
and calculating through the trained recommendation model to obtain the interaction probability between the user and the item to be recommended, and sequencing according to the interaction probability to generate a recommendation candidate set of the user.
As a further optimization, step S11 specifically includes:
s111, acquiring the historical interaction records of users and converting them into a user-item interaction matrix, whose entries comprise user codes, item codes and item categories;
s112, filling item category information that is null with 0;
s113, converting the user codes, item codes and item categories into one-hot codes and performing numerical compression;
s114, sorting each user's interactions in the user-item interaction matrix in chronological order to obtain the historical interaction sequence of each user, wherein each item in the historical interaction sequence is represented by its one-hot code.
As a further optimization, the recommendation model designed in step S12 comprises an input layer, a coding layer, a feature fusion layer and an output layer. The input layer converts the input data into low-dimensional embedded representations; the coding layer is responsible for obtaining long-term and short-term dependency representations of the user's historical interaction sequence; the feature fusion layer fuses the sequence features with the item category features and converts them into denser features; and the output layer combines the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer to generate the final result as the interaction probability between the user and the item.
As a further optimization, in step S12 the input layer converts the input data into low-dimensional embedded representations as follows:
the one-hot codes of the user code, the item code, the item category and the historical interaction sequence are looked up in randomly initialized embedding tables, so that each one-hot index is mapped to its corresponding vector and thereby converted into a low-dimensional embedding:
the embedded representation of the user codes is E_u ∈ R^{N×d_u}, where N is the total number of users and d_u = 128;
the embedded representation of the item codes is E_d ∈ R^{M×d_d}, where M is the total number of items and d_d = 128;
the embedded representation of the item categories is E_c ∈ R^{T×d_c}, where T is the total number of item categories and d_c = 32;
the embedded representation of the historical interaction sequence is E_T = [e_1 + p_1, e_2 + p_2, ..., e_l + p_l] ∈ R^{l×d_d}, where at most l items are kept per sequence, e_i is the embedding of the item code and p_i is the position encoding of the item.
As a further optimization, in step S12 the coding layer is responsible for obtaining the long-term and short-term dependency representations of the user's historical interaction sequence:
a Transformer module is adopted as the long-term dependency learning module; it takes the embedded representation E_T of the historical interaction sequence as input and produces the sequence feature h_L;
a GRU module is adopted as the short-term dependency learning module; it takes the embeddings of the last k items of the historical interaction sequence as input and selects the output of the k-th GRU unit as the short-term dependency representation, capturing the user's recent-interest feature h_S.
As a further optimization, the Transformer module comprises a multi-head self-attention layer, a feed-forward network layer and a normalization layer; the multi-head self-attention layer uses a plurality of self-attention modules to learn different hidden-layer representations; the feed-forward network layer adopts the GELU activation function; the normalization layer employs a residual network.
As a further optimization, in step S12 the feature fusion layer fuses the sequence features with the item category features and converts them into denser features:
the item category embedding e_c, the sequence feature h_L obtained by the Transformer module and the output h_S of the GRU module are concatenated into a new vector, and a multilayer perceptron then converts the concatenated vector into an embedded representation h_f of dimension d_m, where d_m = 128:
h_f = MLP(e_c ⊕ h_L ⊕ h_S),
where ⊕ denotes the concatenation operation and the weights and biases of the multilayer perceptron are parameters that need to be learned.
As a further optimization, in step S12 the output layer combines the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer to generate the final result as the interaction probability between the user and the item:
the embedded representation e_{u_i} of user u_i, the embedded representation e_{d_j} of item d_j and the fused vector h_f are concatenated, and the final output ŷ_ij is then obtained through an MLP (multilayer perceptron):
ŷ_ij = f(W·(e_{u_i} ⊕ e_{d_j} ⊕ h_f) + b),
where ŷ_ij is the interaction probability of user u_i and item d_j, f is the ReLU activation function, and W and b are parameters that need to be learned.
As a further optimization, in step S12 the recommendation model is trained with the squared loss function as the optimization target:
1) the loss function is computed as
L = Σ_{(i,j)} (y_ij − ŷ_ij)² + λ‖Φ‖²,
where y_ij is the data label indicating whether user u_i and item d_j have interacted: if d_j appears in the record of u_i then y_ij = 1, otherwise y_ij = 0; ŷ_ij is the predicted interaction probability of user u_i and item d_j; λ is the regularization coefficient controlling the degree of parameter regularization; and Φ is the set of parameters that require regularization;
2) the processing steps of the input layer, coding layer, feature fusion layer and output layer are iterated with stochastic gradient descent until the training period ends, and the model with the minimum loss is taken as the trained recommendation model.
The invention has the following beneficial effects:
computationally expensive feature extraction is avoided; deep sequence features are captured from the user history with a highly parallel Transformer module, so training efficiency is high, and combining the captured deep sequence features with the item attributes enriches the representational capability of the model embeddings;
the model parameters are updated with a multi-head attention mechanism, which effectively improves the learning ability of the model, makes the recommendation more accurate, and makes the recommended content better match the user's interests.
Drawings
FIG. 1 is a flow chart of a recommendation method based on a self-attention mechanism in an embodiment;
fig. 2 is a diagram of a recommendation model structure in the embodiment.
Detailed Description
The purpose of the invention is to provide a recommendation method based on a self-attention mechanism that improves the training efficiency and the personalized recommendation effect of a recommendation model. First, historical interaction information of users is collected and preprocessed to form a training sample set; then a recommendation model is designed, the training sample set is used as its input, and a squared loss function is adopted as the optimization target to train the model; finally, the trained recommendation model computes the interaction probability between the user and each candidate item, and the items are ranked by interaction probability to generate the user's recommendation candidate set.
Embodiment:
as shown in fig. 1, the recommendation method based on the self-attention mechanism in this embodiment includes the following implementation steps:
A. training a recommendation model:
a1, collecting and preprocessing user historical interaction information to form a training sample set, and specifically comprising the following steps A11-A14:
a11, acquiring the historical interaction records of users and converting them into a user-item interaction matrix, whose entries comprise user codes, item codes and item categories; through this conversion, each user's number u_i ∈ {u_1, u_2, ..., u_N}, each item's number d_j ∈ {d_1, d_2, ..., d_M} and the category c_{d_j} ∈ {c_1, c_2, ..., c_T} corresponding to item d_j are obtained, where N is the number of users, M is the number of items, and T is the number of item categories.
A12, carrying out 0 setting and filling on the item type information which is null;
a13, converting the user code, the item code and the item type into one-hot codes and performing numerical compression processing; the processed numerical information is a continuous integer, and the serial numbers start from 0;
a14, sorting each user's interactions in the user-item interaction matrix in chronological order to obtain each user u_i's historical interaction sequence S^{u_i} = {s_1, s_2, ..., s_L}, where L is the length of the sequence and each item in the historical interaction sequence is represented by its one-hot code.
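As an illustration only, the preprocessing of steps A11-A14 could look like the following sketch; it assumes the raw log is a pandas DataFrame with user_id, item_id, category and timestamp columns, and all column and function names are hypothetical rather than taken from the patent:

```python
import pandas as pd

def build_training_sequences(log: pd.DataFrame, max_len: int = 100):
    """Hypothetical sketch of steps A11-A14."""
    log = log.copy()
    # A12: fill null item categories with 0
    log["category"] = log["category"].fillna(0)
    # A13: map user/item/category identifiers to consecutive integers starting from 0
    log["user"] = log["user_id"].astype("category").cat.codes
    log["item"] = log["item_id"].astype("category").cat.codes
    log["cat"] = log["category"].astype("category").cat.codes
    # A14: sort each user's interactions by time and keep at most max_len items
    log = log.sort_values(["user", "timestamp"])
    sequences = (
        log.groupby("user")["item"]
           .apply(lambda items: list(items)[-max_len:])
           .to_dict()
    )
    return log[["user", "item", "cat"]], sequences
```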
A2, designing a recommendation model, taking a training sample set as an input of the recommendation model, and taking a square loss function as an optimization target to train the recommendation model, wherein the method specifically comprises the following steps of A21-A22:
a21, model construction:
In this step, a recommendation model based on the self-attention mechanism is designed for product recommendation. The model consists of an input layer, a coding layer, a feature fusion layer and an output layer. The input layer converts the one-hot-represented data into low-dimensional embedded representations; the coding layer obtains long-term and short-term dependency representations of the user's historical sequence through a Transformer module and a GRU module; the feature fusion layer fuses the sequence features with the item category features and converts them into denser features; and the output layer combines the features obtained from the user embedding, the item embedding and the feature fusion layer to generate the final result as the interaction probability between the user and the item.
The four layers of the model are specified as follows:
(1) Input layer:
At the input layer, the one-hot-represented data (user code, item code, item category and historical interaction sequence) needs to be converted into low-dimensional embedded values:
the embedded representation of the user code is
Figure BDA0003269788620000052
Where N is the total number of users, here let the embedding dimension du=128。
The embedded representation of the item code is
Figure BDA0003269788620000053
Where M is the total number of items, let d be the embedding dimensiond=128。
The embedded representation of the item category is
Figure BDA0003269788620000054
Where T is the number of item categories, here let the embedding dimension dc=32。
The order of the items in a sequence reflects the change of user behavior, but because the Transformer module designed in the coding layer does not carry time-order information the way a recurrent neural network does, an additional position encoding is needed so that the model can learn the importance of position when processing sequence information; the position encoding is P = {p_1, p_2, ..., p_l}, where l is the maximum length of the sequence.
The output of the input layer combines the position codes with the original item input embeddings:
E_T = [e_1 + p_1, e_2 + p_2, ..., e_l + p_l] ∈ R^{l×d_d},
where at most l items are kept per sequence.
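A minimal PyTorch sketch of this input layer, under the dimensions stated above (d_u = d_d = 128, d_c = 32, l = 100); the class and variable names are illustrative assumptions, not the patent's implementation:

```python
import torch
import torch.nn as nn

class InputLayer(nn.Module):
    def __init__(self, n_users, n_items, n_cats, max_len=100, d_u=128, d_d=128, d_c=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, d_u)   # E_u
        self.item_emb = nn.Embedding(n_items, d_d)   # E_d
        self.cat_emb = nn.Embedding(n_cats, d_c)     # E_c
        self.pos_emb = nn.Embedding(max_len, d_d)    # position codes p_1 ... p_l

    def forward(self, user, item, cat, seq):
        # seq holds the (padded) item indices of the historical sequence, shape (batch, l)
        positions = torch.arange(seq.size(1), device=seq.device)
        e_t = self.item_emb(seq) + self.pos_emb(positions)   # E_T = item embedding + position code
        return self.user_emb(user), self.item_emb(item), self.cat_emb(cat), e_t
```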
(2) Coding layer:
the combination of long-term and short-term dependence can learn the related information of short-term dependence on the basis of effectively acquiring the sequence characteristics, and effectively captures the recent interest characteristics of the user. The Transformer module has the advantage of long-term dependence capture, and the characteristic representation of the sequence is effectively obtained by learning terms and weights among the terms through a self-attention mechanism. In addition, the parallelism of the Transformer is high, and a serialization learning mode of a recurrent neural network is avoided.
Therefore, we choose a Transformer as the long-term dependent learning module at the coding layer. Although GRU (gated round robin unit) is not as effective in extracting long sequence features as the Transformer, the performance is almost consistent on short sequences, and the parameter amount of the GRU module is much smaller than that of the Transformer, so that the GRU module is more suitable for learning short-term dependence. Therefore, we choose the GRU module as the learning module for short term dependency at the coding level.
The Transformer consists of a multi-head attention layer, a feed-forward network layer and a normalization layer. The key part of the module is the multi-head attention layer, which is based on the self-attention mechanism and uses multiple heads to learn representation vectors in different subspaces. The basic attention mechanism is:
Attention(Q, K, V) = softmax(QK^T / √d_k) V,
where Q, K and V denote the query, key and value respectively, i.e. the degree of association between a query and a key determines the weight of the corresponding value. Since the invention uses a self-attention mechanism, Q, K and V are generated from the same input: Q = HW^Q, K = HW^K, V = HW^V.
Multi-head attention uses several self-attention modules to learn different hidden-layer features, defined as:
h_i = Attention(H_{L−1} W_i^Q, H_{L−1} W_i^K, H_{L−1} W_i^V),
H_L = [h_1, h_2, ..., h_n] W^h,
where H_L is the hidden-layer output of the L-th layer. Each head computes its own attention weight distribution and produces a new parameter matrix; W_i^Q, W_i^K and W_i^V are independent weight matrices that are not shared between heads. Finally, the n heads are concatenated and transformed by the weight matrix W^h to give the multi-head attention output of the L-th layer.
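The multi-head self-attention described above could be sketched as follows; the head count and dimensions are assumptions, and a production implementation could equally use torch.nn.MultiheadAttention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)   # per-head W_i^Q stacked into one matrix
        self.w_k = nn.Linear(d_model, d_model)   # per-head W_i^K
        self.w_v = nn.Linear(d_model, d_model)   # per-head W_i^V
        self.w_h = nn.Linear(d_model, d_model)   # W^h that recombines the concatenated heads

    def forward(self, h):                         # h: (batch, l, d_model)
        b, l, _ = h.shape
        split = lambda x: x.view(b, l, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(h)), split(self.w_k(h)), split(self.w_v(h))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5   # QK^T / sqrt(d_k)
        out = F.softmax(scores, dim=-1) @ v                  # Attention(Q, K, V)
        out = out.transpose(1, 2).reshape(b, l, -1)          # concatenate the n heads
        return self.w_h(out)
```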
The purpose of the feed-forward network layer FFN is to give the model nonlinear modelling capability; it adopts the GELU activation function.
Compared with ReLU, the GELU activation introduces a stochastic regularization effect and improves convergence to some extent. The feed-forward layer and its activation are:
FFN(x) = Gelu(xW_1 + b_1)W_2 + b_2,
Gelu(x) = x·φ(x),
where φ(x) is the cumulative distribution function of the standard Gaussian distribution, and W_1, b_1, W_2, b_2 are learned parameters shared across the Transformer layers.
In the normalization layer, a residual network is used to guarantee the learning effect of the deep network parameters. Combined with the multi-head attention layer MH and the feed-forward network layer FFN, the overall flow of the Transformer is:
AN_L = LN(H_{L−1} + MH(H_{L−1})),
h_L = LN(FFN(AN_L) + AN_L).
The whole long-term dependency encoding unit is composed of several Transformers whose parameters are shared between layers, which greatly reduces the total number of model parameters and leaves room for scaling the model up.
Taking the input matrix E_T, the output of the first layer can be expressed as:
h_1 = LN(FFN(AN_0) + AN_0) = LN(FFN(LN(E_T + MH(E_T))) + LN(E_T + MH(E_T))).
After the processing of each coding layer, the output of the L-th layer is obtained; here l = 100. The output vectors are summed to obtain the final long-term dependency representation h_L of the sequence.
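A sketch of one encoder block following this flow, reusing the MultiHeadSelfAttention sketch above; the hidden size of the feed-forward layer, the number of stacked layers and the use of a single shared block instance are assumptions consistent with the parameter sharing mentioned here:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=128, n_heads=4, d_ff=512):
        super().__init__()
        self.mh = MultiHeadSelfAttention(d_model, n_heads)   # from the sketch above
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, h):
        an = self.ln1(h + self.mh(h))          # AN_L = LN(H_{L-1} + MH(H_{L-1}))
        return self.ln2(self.ffn(an) + an)     # h_L = LN(FFN(AN_L) + AN_L)

class LongTermEncoder(nn.Module):
    """Applies one shared TransformerBlock n_layers times and sums the outputs
    over sequence positions to obtain the long-term feature h_L."""
    def __init__(self, d_model=128, n_layers=2):
        super().__init__()
        self.block, self.n_layers = TransformerBlock(d_model), n_layers

    def forward(self, e_t):                    # e_t: (batch, l, d_model)
        h = e_t
        for _ in range(self.n_layers):
            h = self.block(h)                  # parameters shared between layers
        return h.sum(dim=1)                    # h_L: (batch, d_model)
```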
The GRU module responsible for short-term dependency learning is composed of several GRU units. Each GRU unit contains a reset gate r_t and an update gate z_t, both of which depend on the hidden-state variable h_{t−1} of the previous time step and the input x_t of the current time step:
r_t = σ(W_r x_t + U_r h_{t−1}),
z_t = σ(W_z x_t + U_z h_{t−1}),
where r_t and z_t use the sigmoid activation function σ and W_r, U_r, W_z, U_z are weight parameters to be learned. The candidate hidden state h̃_t depends on the reset gate r_t, the hidden-state variable h_{t−1} and the input x_t; it plays a role similar to the new information in long short-term memory. r_t determines how much historical information is kept: r_t = 0 means that the historical information is completely reset and the new information h̃_t depends only on x_t. The expression for h̃_t is:
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1})).
The hidden-state variable h_t of the current time step is computed from the update gate z_t, the hidden-state variable h_{t−1} and the new information h̃_t; z_t determines the forgetting rate of h_{t−1} and the retention rate of the new information h̃_t:
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t.
The input of the GRU module is the embedding of the last k items of the sequence, {e_{l−k+1}, ..., e_l}, and the output of the k-th GRU unit is selected as the short-term dependency representation, i.e. h_S = h_k.
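A sketch of this short-term module: the embeddings of the last k items are fed to a GRU and the hidden state after the k-th unit is taken as h_S (the value of k is an assumption):

```python
import torch.nn as nn

class ShortTermEncoder(nn.Module):
    def __init__(self, d_model=128, k=10):
        super().__init__()
        self.k = k
        self.gru = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, e_t):                    # e_t: (batch, l, d_model)
        tail = e_t[:, -self.k:, :]             # embeddings of the last k items of the sequence
        _, h_n = self.gru(tail)                # h_n: (1, batch, d_model), state after the k-th unit
        return h_n.squeeze(0)                  # h_S
```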
(3) Feature fusion layer:
The role of this layer is to transform and compress the obtained features into a denser embedded representation. First, the item category embedding e_c, the sequence feature h_L obtained from the Transformer layer and the output h_S of the GRU module are concatenated into a new vector; a multilayer perceptron then converts this vector into an embedded representation h_f of dimension d_m, where d_m = 128. The process is:
h_f = MLP(e_c ⊕ h_L ⊕ h_S),
where ⊕ denotes the concatenation operation and the weights and biases of the multilayer perceptron are parameters that need to be learned.
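A sketch of this fusion step: the item-category embedding, h_L and h_S are concatenated and an MLP maps them to the d_m = 128 dimensional representation h_f (the hidden width of the MLP is an assumption):

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    def __init__(self, d_c=32, d_model=128, d_m=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_c + 2 * d_model, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d_m))

    def forward(self, e_c, h_l, h_s):
        return self.mlp(torch.cat([e_c, h_l, h_s], dim=-1))   # h_f
```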
(4) Output layer:
Given user u_i and item d_j, the interaction probability of u_i and d_j is computed from u_i, d_j and h_f. The embedded representations e_{u_i} and e_{d_j} corresponding to u_i and d_j are first concatenated with the fused vector h_f into a new representation, and the final output ŷ_ij is then obtained through an MLP (multilayer perceptron):
ŷ_ij = f(W·(e_{u_i} ⊕ e_{d_j} ⊕ h_f) + b),
where f is the ReLU activation function and W and b are parameters that need to be learned.
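A sketch of this output step: the user embedding, the candidate-item embedding and the fused vector h_f are concatenated and passed through an MLP whose activation is ReLU, as stated above; the hidden width is an assumption:

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    def __init__(self, d_u=128, d_d=128, d_m=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_u + d_d + d_m, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.ReLU())   # f = ReLU

    def forward(self, e_u, e_d, h_f):
        # returns y_hat_ij, the predicted interaction score of user u_i and item d_j
        return self.mlp(torch.cat([e_u, e_d, h_f], dim=-1)).squeeze(-1)
```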
The structure of the proposed model designed as above is shown in fig. 2.
A22, model training and saving:
In this step, the squared loss function is used as the optimization objective of the model, iterative training is carried out for a specified number of epochs, and the best model obtained during training is saved.
Specifically, since the interaction probability of a user and an item is either 1 or 0, the objective of the model is to fit the interaction probabilities in the original data as closely as possible; therefore the squared loss function is chosen as the optimization objective of the model. The loss function can be expressed as:
L = Σ_{(i,j)} (y_ij − ŷ_ij)² + λ‖Φ‖²,
where y_ij is the data label indicating whether user u_i and item d_j have interacted: if d_j appears in the record of u_i then y_ij = 1, otherwise y_ij = 0; ŷ_ij is the predicted interaction probability of user u_i and item d_j. λ is the regularization coefficient controlling the degree of parameter regularization, and Φ is the set of parameters that need to be regularized.
Training uses stochastic gradient descent with the Adam optimizer; the learning rate of the Transformer module is set to 0.001, the learning rate of the GRU module is set to 0.01, and the Dropout regularization parameter is set to 0.5. The number of training epochs is initialized to 100, and the model with the lowest loss within the set period is saved as the model for subsequent deployment.
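A minimal training sketch matching A22, with squared loss against the 0/1 labels, the Adam optimizer and lowest-loss checkpointing; the data loader, the model interface and the weight-decay value standing in for λ are assumptions, and the distinct per-module learning rates quoted above could be realized with Adam parameter groups:

```python
import torch

def train(model, loader, epochs=100, lr=1e-3, weight_decay=1e-5, ckpt="best_model.pt"):
    # weight_decay plays the role of the regularization coefficient lambda over the parameter set Phi
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    best = float("inf")
    for epoch in range(epochs):
        total = 0.0
        for user, item, cat, seq, label in loader:        # label y_ij in {0, 1}
            y_hat = model(user, item, cat, seq)
            loss = ((label.float() - y_hat) ** 2).mean()  # squared loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total < best:                                   # keep the model with the lowest loss
            best = total
            torch.save(model.state_dict(), ckpt)
    return ckpt
```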
B. Recommending according to the trained recommendation model:
the method comprises the following steps of utilizing a trained recommendation model to carry out actual application recommendation, and acquiring a user u to be recommended in specific applicationiHistory of (3) calculating a user u to be recommendediAnd interaction probability values between the items. The items are processed according to the sequence of the probability values from large to smallAnd sorting, selecting a certain number of items (such as 20 items) as a recommendation set and pushing the recommendation set to the user.

Claims (9)

1. A recommendation method based on a self-attention mechanism is characterized by comprising the following steps:
s1, training a recommendation model:
s11, collecting and preprocessing historical interaction information of a user to form a training sample set;
s12, designing a recommendation model, taking the training sample set as the input of the recommendation model, and adopting a square loss function as an optimization target to train the recommendation model;
s2, recommending according to the trained recommendation model:
and calculating through the trained recommendation model to obtain the interaction probability between the user and the item to be recommended, and sequencing according to the interaction probability to generate a recommendation candidate set of the user.
2. The self-attention mechanism-based recommendation method of claim 1,
step S11 specifically includes:
s111, acquiring the historical interaction records of users and converting them into a user-item interaction matrix, whose entries comprise user codes, item codes and item categories;
s112, filling item category information that is null with 0;
s113, converting the user codes, item codes and item categories into one-hot codes and performing numerical compression;
s114, sorting each user's interactions in the user-item interaction matrix in chronological order to obtain the historical interaction sequence of each user, wherein each item in the historical interaction sequence is represented by its one-hot code.
3. A recommendation method based on the self-attention mechanism as claimed in claim 1 or 2,
in step S12, the designed recommendation model comprises an input layer, a coding layer, a feature fusion layer and an output layer; the input layer converts the input data into low-dimensional embedded representations; the coding layer is responsible for obtaining long-term and short-term dependency representations of the user's historical interaction sequence; the feature fusion layer fuses the sequence features with the item category features and converts them into denser features; and the output layer combines the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer to generate the final result as the interaction probability between the user and the item.
4. A recommendation method based on the self-attention mechanism as claimed in claim 3,
in step S12, the input layer converts the input data into low-dimensional embedded representations as follows:
the one-hot codes of the user code, the item code, the item category and the historical interaction sequence are looked up in randomly initialized embedding tables, so that each one-hot index is mapped to its corresponding vector and thereby converted into a low-dimensional embedding:
the embedded representation of the user codes is E_u ∈ R^{N×d_u}, where N is the total number of users and d_u = 128;
the embedded representation of the item codes is E_d ∈ R^{M×d_d}, where M is the total number of items and d_d = 128;
the embedded representation of the item categories is E_c ∈ R^{T×d_c}, where T is the total number of item categories and d_c = 32;
the embedded representation of the historical interaction sequence is E_T = [e_1 + p_1, e_2 + p_2, ..., e_l + p_l] ∈ R^{l×d_d}, where at most l items are kept per sequence, e_i is the embedding of the item code and p_i is the position encoding of the item.
5. A recommendation method based on the self-attention mechanism as claimed in claim 3,
in step S12, the coding layer is responsible for obtaining the long-term and short-term dependency representations of the user's historical interaction sequence, comprising: adopting a Transformer module as the long-term dependency learning module, which takes the embedded representation E_T of the historical interaction sequence as input and produces the sequence feature h_L;
adopting a GRU module as the short-term dependency learning module, which takes the embeddings of the last k items of the historical interaction sequence as input and selects the output of the k-th GRU unit as the short-term dependency representation, capturing the user's recent-interest feature h_S.
6. A recommendation method based on the self-attention mechanism as claimed in claim 5,
the Transformer module comprises a multi-head self-attention layer, a feed-forward network layer and a normalization layer; the multi-head self-attention layer uses a plurality of self-attention modules to learn different hidden-layer representations; the feed-forward network layer adopts the GELU activation function; the normalization layer employs a residual network.
7. A recommendation method based on the self-attention mechanism as claimed in claim 5,
in step S12, the feature fusion layer fuses the sequence features with the item category features and converts them into denser features, comprising:
concatenating the item category embedding e_c, the sequence feature h_L obtained by the Transformer module and the output h_S of the GRU module into a new vector, and then using a multilayer perceptron to convert the concatenated vector into an embedded representation h_f of dimension d_m, where d_m = 128, as follows:
h_f = MLP(e_c ⊕ h_L ⊕ h_S),
where ⊕ denotes the concatenation operation and the weights and biases of the multilayer perceptron are parameters that need to be learned.
8. The self-attention mechanism-based recommendation method of claim 7,
in step S12, the output layer combines the features obtained from the user-code embedding, the item-code embedding and the feature fusion layer to generate the final result as the interaction probability between the user and the item, comprising:
concatenating the embedded representation e_{u_i} of user u_i, the embedded representation e_{d_j} of item d_j and the fused vector h_f, and then obtaining the final output ŷ_ij through an MLP:
ŷ_ij = f(W·(e_{u_i} ⊕ e_{d_j} ⊕ h_f) + b),
where ŷ_ij is the interaction probability of user u_i and item d_j, f is the ReLU activation function, and W and b are parameters that need to be learned.
9. A recommendation method based on the self-attention mechanism as claimed in claim 3,
in step S12, training the recommendation model with the squared loss function as the optimization target comprises:
1) computing the loss function
L = Σ_{(i,j)} (y_ij − ŷ_ij)² + λ‖Φ‖²,
where y_ij is the data label indicating whether user u_i and item d_j have interacted: if d_j appears in the record of u_i then y_ij = 1, otherwise y_ij = 0; ŷ_ij is the predicted interaction probability of user u_i and item d_j; λ is the regularization coefficient controlling the degree of parameter regularization; and Φ is the set of parameters that need to be regularized;
2) iterating the processing steps of the input layer, the coding layer, the feature fusion layer and the output layer with stochastic gradient descent until the training period ends, and taking the model with the minimum loss as the trained recommendation model.
CN202111098120.6A 2021-09-18 2021-09-18 Recommendation method based on self-attention mechanism Active CN113822742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111098120.6A CN113822742B (en) 2021-09-18 2021-09-18 Recommendation method based on self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111098120.6A CN113822742B (en) 2021-09-18 2021-09-18 Recommendation method based on self-attention mechanism

Publications (2)

Publication Number Publication Date
CN113822742A true CN113822742A (en) 2021-12-21
CN113822742B CN113822742B (en) 2023-05-12

Family

ID=78914855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111098120.6A Active CN113822742B (en) 2021-09-18 2021-09-18 Recommendation method based on self-attention mechanism

Country Status (1)

Country Link
CN (1) CN113822742B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493755A (en) * 2021-12-28 2022-05-13 电子科技大学 Self-attention sequence recommendation method fusing time sequence information
CN114528434A (en) * 2022-01-19 2022-05-24 华南理工大学 IPTV live channel fusion recommendation method based on self-attention mechanism
CN114615524A (en) * 2022-02-18 2022-06-10 聚好看科技股份有限公司 Server, training method of media asset recommendation network and media asset recommendation method
CN114861783A (en) * 2022-04-26 2022-08-05 北京三快在线科技有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN114912984A (en) * 2022-05-31 2022-08-16 重庆师范大学 Self-attention-based time scoring context-aware recommendation method and system
CN116521971A (en) * 2022-01-19 2023-08-01 腾讯科技(深圳)有限公司 Content recommendation method, apparatus, device, storage medium, and computer program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method
WO2021066903A1 (en) * 2019-09-30 2021-04-08 Microsoft Technology Licensing, Llc Providing explainable product recommendation in a session
CN112732936A (en) * 2021-01-11 2021-04-30 电子科技大学 Radio and television program recommendation method based on knowledge graph and user microscopic behaviors
CN113268633A (en) * 2021-06-25 2021-08-17 北京邮电大学 Short video recommendation method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021066903A1 (en) * 2019-09-30 2021-04-08 Microsoft Technology Licensing, Llc Providing explainable product recommendation in a session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method
CN112732936A (en) * 2021-01-11 2021-04-30 电子科技大学 Radio and television program recommendation method based on knowledge graph and user microscopic behaviors
CN113268633A (en) * 2021-06-25 2021-08-17 北京邮电大学 Short video recommendation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEINAN LI et al.: "Session-based recommendation with temporal convolutional network to balance numerical gaps" *
段超 et al.: "融合注意力机制的深度混合推荐算法" (Deep hybrid recommendation algorithm fusing an attention mechanism) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493755A (en) * 2021-12-28 2022-05-13 电子科技大学 Self-attention sequence recommendation method fusing time sequence information
CN114493755B (en) * 2021-12-28 2022-10-14 电子科技大学 Self-attention sequence recommendation method fusing time sequence information
CN114528434A (en) * 2022-01-19 2022-05-24 华南理工大学 IPTV live channel fusion recommendation method based on self-attention mechanism
CN116521971A (en) * 2022-01-19 2023-08-01 腾讯科技(深圳)有限公司 Content recommendation method, apparatus, device, storage medium, and computer program product
CN114615524A (en) * 2022-02-18 2022-06-10 聚好看科技股份有限公司 Server, training method of media asset recommendation network and media asset recommendation method
CN114615524B (en) * 2022-02-18 2023-10-24 聚好看科技股份有限公司 Training method of server and media asset recommendation network and media asset recommendation method
CN114861783A (en) * 2022-04-26 2022-08-05 北京三快在线科技有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN114912984A (en) * 2022-05-31 2022-08-16 重庆师范大学 Self-attention-based time scoring context-aware recommendation method and system

Also Published As

Publication number Publication date
CN113822742B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN113822742B (en) Recommendation method based on self-attention mechanism
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
CN112598462B (en) Personalized recommendation method and system based on collaborative filtering and deep learning
CN110046656B (en) Multi-mode scene recognition method based on deep learning
CN110083770B (en) Sequence recommendation method based on deeper feature level self-attention network
CN111310063B (en) Neural network-based article recommendation method for memory perception gated factorization machine
CN110276406B (en) Expression classification method, apparatus, computer device and storage medium
CN111127165A (en) Sequence recommendation method based on self-attention self-encoder
CN114693397B (en) Attention neural network-based multi-view multi-mode commodity recommendation method
CN110659411B (en) Personalized recommendation method based on neural attention self-encoder
CN112328900A (en) Deep learning recommendation method integrating scoring matrix and comment text
CN112800344B (en) Deep neural network-based movie recommendation method
CN114386534A (en) Image augmentation model training method and image classification method based on variational self-encoder and countermeasure generation network
CN114896434B (en) Hash code generation method and device based on center similarity learning
Tong et al. Collaborative generative adversarial network for recommendation systems
CN114020964A (en) Method for realizing video abstraction by using memory network and gated cyclic unit
CN114004220A (en) Text emotion reason identification method based on CPC-ANN
CN113177141A (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN114549850A (en) Multi-modal image aesthetic quality evaluation method for solving modal loss problem
CN114462420A (en) False news detection method based on feature fusion model
CN114564651A (en) Self-supervision recommendation method combined with contrast learning method
CN112860930A (en) Text-to-commodity image retrieval method based on hierarchical similarity learning
CN113887836B (en) Descriptive event prediction method integrating event environment information
CN115408603A (en) Online question-answer community expert recommendation method based on multi-head self-attention mechanism
CN114139066A (en) Collaborative filtering recommendation system based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant