CN111062775B - Recommendation system recall method based on attention mechanism


Info

Publication number
CN111062775B
CN111062775B (application CN201911222216.1A)
Authority
CN
China
Prior art keywords
user
commodity
vector
layer
attention
Prior art date
Legal status
Active
Application number
CN201911222216.1A
Other languages
Chinese (zh)
Other versions
CN111062775A (en)
Inventor
郑子彬
李威琪
周晓聪
Current Assignee
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN201911222216.1A
Publication of CN111062775A
Application granted
Publication of CN111062775B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a recommendation system recall method based on an attention mechanism, comprising the following steps: extracting the user features and commodity features from the training samples, converting the user features into user embedded vectors and the commodity features into commodity embedded vectors; inputting the user embedded vectors and commodity embedded vectors into an attention mechanism model for training, learning a weight for each feature through the attention networks in the model, and computing the weighted sum of the embedded vectors of all features according to those weights to obtain a user characterization vector and a commodity characterization vector; calculating the inner product of the user characterization vector and the commodity characterization vector to obtain the matching degree of the user's willingness to purchase the commodity for the training sample, establishing a cross-entropy loss function of this matching degree, and minimizing the cross-entropy loss function until the attention mechanism model converges; and inputting the sample to be tested into the converged attention mechanism model, obtaining the matching degree of the user's willingness to purchase the commodity for the sample to be tested, and selecting the commodities whose matching degree falls within a preset interval as the recall result to be recommended. The invention enhances generalization and greatly reduces the computation required for recall recommendation.

Description

Recommendation system recall method based on attention mechanism
Technical Field
The invention relates to the field of computer recommendation systems, in particular to a recommendation system recall method based on an attention mechanism.
Background
With the improvement of living standards, consumers face more and more choices: where goods once came from a handful of vendors, today they come from tens of thousands, and finding the goods we need often takes a great deal of time; even what we find is not necessarily the best fit for us. A recommendation system can help us find the relevant commodities among the mass of goods and recommend those that suit us best. Recommendation systems are now used very widely and are ubiquitous in daily life. When shopping online, a user wants the products he intends to buy to be recommended; when listening to music, he wants to hear songs that suit his own taste; and when searching, he wants to find the intended result. Fast and accurate prediction of user preference is therefore the primary goal of a recommendation system.
Recall, as one of the stages of a recommendation system, must select hundreds or tens of related commodities from a very large pool and feed them into a ranking model. Unlike ranking, which demands high precision, recall amounts to a coarse ranking: it does not need to be highly precise, but it must quickly pick out, from a huge set of candidates, the commodities related to the user's search.
The earliest recall in recommendation systems was based on collaborative filtering, but collaborative filtering models users and commodities only by their IDs, which is equivalent to using the ID as the sole feature; it therefore suffers from the cold-start problem and cannot effectively exploit important information such as the other attributes of users and commodities. For feature-based modeling, the simplest choice is the LR model, which is easy to implement but ignores the feature combination problem.
The FM model was once widely used because it considers the feature combination problem, learning a weight for each feature combination, and it made the idea of feature vectorization popular across deep learning models. However, the FM model only performs low-order crosses between pairs of features and cannot capture higher-order feature crosses.
In recent years, with the development of deep learning, many deep models have been applied to recommendation systems. Deep recommendation models add activation functions such as sigmoid and tanh to provide nonlinearity, and a multi-layer neural network performs implicit multi-order feature crossing. The DeepFM model was proposed to combine low-order and high-order feature crosses, with remarkable effect. Because the high-order crosses of a neural network are implicit and not very interpretable, Google proposed the DCN (Deep & Cross Network) model, which combines explicit and implicit feature crossing. Since these crosses all operate at the element level, Microsoft further proposed the xDeepFM model, directing the study to feature crossing in the vector dimension. Feature combination is thus an important part of such models, but these models are complex; they are generally applied in the ranking stage and rarely in the recall stage.
The attention mechanism, first applied in natural language processing, can selectively extract the important information in long sentences and focus attention on it while ignoring the unimportant information, and the distribution of attention can differ from sample to sample. This property suits most fields, so the mechanism is widely used; however, there is still little research applying attention mechanisms to recommendation system models.
Disclosure of Invention
The main purpose of the invention is to provide a recommendation system recall method based on an attention mechanism, aiming at overcoming the above problems.
In order to achieve the above object, the present invention provides a recommendation system recall method based on an attention mechanism, comprising the following steps:
S10, extracting the user features and commodity features in the training samples, converting the user features into user embedded vectors, and converting the commodity features into commodity embedded vectors;
S20, inputting the user embedded vectors and the commodity embedded vectors into an attention mechanism model for training, learning the weight of each feature through the attention networks in the model, and computing the weighted sum of the embedded vectors of all features according to the weights to obtain a user characterization vector and a commodity characterization vector; calculating the inner product of the user characterization vector and the commodity characterization vector to obtain the matching degree of the user's willingness to purchase the commodity for the training sample, establishing a cross-entropy loss function of this matching degree, and minimizing the cross-entropy loss function until the attention mechanism model converges;
S30, inputting the sample to be tested into the converged attention mechanism model, obtaining the matching degree of the user's willingness to purchase the commodity for the sample to be tested, and selecting the commodities whose matching degree falls within a preset interval as the recall result to be recommended.
Preferably, the attention mechanism model is a bidirectional attention mechanism model comprising a multi-layer user attention network and a multi-layer commodity attention network; each layer of the user attention network comprises a two-layer feedforward neural network FNN and a normalization layer Softmax, each layer of the commodity attention network likewise comprises a two-layer feedforward neural network FNN and a normalization layer Softmax, and the user attention network and the commodity attention network are each in a layer-by-layer recursive relationship.
Preferably, the multi-layer user attention network in S20 comprises K layers of user attention network, in which the user characterization vector u^(k) is given by the formulas:

u^(k) = U_Attention(E_u, m^(k-1))

m^(k) = m^(k-1) + u^(k)

wherein the superscript (k) or (k-1) on a variable denotes the k-th or (k-1)-th layer of the attention network, and U_Attention denotes the user attention network; every layer has the same structure, whose specific operation is given by the formulas below. The input of the network is the set of embedded vectors of the user features E_u = {u_1, …, u_T} together with the output m^(k-1) of the previous layer; the output of the network is the user characterization vector u^(k) of this layer, and m^(k) is a storage vector holding the accumulated sum of the characterization vectors obtained by the previous k layers. After the input is obtained, the attention network first passes through the two-layer feedforward neural network FNN and the softmax normalization layer to obtain the attention weights a_t^(k), and this weight vector is used to compute the weighted average of the T user feature vectors, yielding the characterization vector u^(k) of this layer.

At the k-th layer, for t = 1, 2, 3, …, T, the weight a_t^(k) of the user's t-th embedded vector at this layer is obtained first:

h_t^(k) = tanh( (W_1^(k) u_t) ⊙ (W_2^(k) m^(k-1)) )

ĥ_t^(k) = w_3^(k) h_t^(k)

a_t^(k) = exp(ĥ_t^(k)) / Σ_{j=1}^{T} exp(ĥ_j^(k))

wherein W_1^(k), W_2^(k) and w_3^(k) are all network parameter matrices: W_1^(k) is the parameter matrix by which u_t, the embedded vector of the user's t-th feature, enters the neural network in the k-th layer user attention network; W_2^(k) is the parameter matrix by which m^(k-1), the storage vector output by the layer above, enters the neural network; and w_3^(k) is the parameter matrix acting on the hidden-layer variable. h_t^(k) is the hidden-layer vector obtained from the user's t-th feature, tanh is the activation function, and ⊙ is the element-wise vector multiplication, i.e., two vectors of the same length are multiplied element by element at the same positions to obtain a new vector. Multiplying h_t^(k) by w_3^(k), a matrix with a single row, yields a scalar ĥ_t^(k), which is then converted through softmax into the user's k-th-layer characterization-vector weight a_t^(k); e is the natural constant;

then, according to a_t^(k), the weighted sum of the user's embedded vectors is calculated to obtain the characterization vector u^(k) of the user's k-th layer:

u^(k) = Σ_{t=1}^{T} a_t^(k) u_t
Preferably, the multi-layer commodity attention network in S20 comprises K layers of commodity attention network, in which the commodity characterization vector v^(k) is given by the formulas:

v^(k) = V_Attention(E_v, m_v^(k-1))

m_v^(k) = m_v^(k-1) + v^(k)

wherein V_Attention denotes the commodity attention network, whose structure is the same as that of the user attention network: the weight a_n^(k) of each commodity's n-th embedded vector is obtained first, and the weighted sum of all commodity embedded vectors is then computed according to these weights to obtain the commodity characterization vector v^(k) of the k-th layer.

At the k-th layer, for n = 1, 2, 3, …, N, the weight a_n^(k) of the commodity's n-th embedded vector at this layer is obtained first:

h_n^(k) = tanh( (W_4^(k) v_n) ⊙ (W_5^(k) m_v^(k-1)) )

ĥ_n^(k) = w_6^(k) h_n^(k)

a_n^(k) = exp(ĥ_n^(k)) / Σ_{j=1}^{N} exp(ĥ_j^(k))

wherein W_4^(k), W_5^(k) and w_6^(k) are parameter matrices of the commodity attention network: W_4^(k) is the parameter matrix by which v_n, the embedded vector of the commodity's n-th feature, enters the neural network in the k-th layer commodity attention network; W_5^(k) is the parameter matrix by which m_v^(k-1), the storage vector output by the layer above, enters the neural network; and w_6^(k) is the parameter matrix acting on the hidden-layer variable. h_n^(k) is the hidden-layer vector obtained from the commodity's n-th feature; multiplying it by the single-row matrix w_6^(k) yields a scalar ĥ_n^(k), which is then converted through softmax into the commodity's k-th-layer characterization-vector weight a_n^(k);

then, according to a_n^(k), the weighted sum of the commodity embedded vectors is calculated to obtain the characterization vector v^(k) of the commodity's k-th layer:

v^(k) = Σ_{n=1}^{N} a_n^(k) v_n
Preferably, in step S20, calculating the inner product of the user characterization vector and the commodity characterization vector to obtain the matching degree of the user's willingness to purchase the commodity for the training sample specifically comprises:

the multi-layer user attention network splices the characterization vectors u^(k) of the user attention networks of all layers to obtain the final user characterization vector z_u = [u^(0); …; u^(K)];

the multi-layer commodity attention network splices the characterization vectors v^(k) of the commodity attention networks of all layers to obtain the final commodity characterization vector z_v = [v^(0); …; v^(K)];

the inner product of the final user characterization vector z_u and the final commodity characterization vector z_v is calculated to obtain the final matching degree of the user's willingness to purchase the commodity.
Preferably, the cross-entropy loss function of the matching degree of the user's willingness to purchase the commodity is specifically:

L = -(1/m) Σ_{i=1}^{m} [ y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i) ]

wherein m is the number of samples, y_i is the sample label, and ŷ_i is the predicted matching probability of sample i, i.e., the matching degree passed through a sigmoid. A sample with click behavior is treated as a positive sample and labeled 1; a sample without click behavior is treated as a negative sample and labeled 0. For each user, the pair <u,v+> formed with each clicked commodity is regarded as a positive sample pair, and the pair <u,v-> formed with an un-clicked commodity is regarded as a negative sample pair. Model training proceeds by minimizing the loss function L, i.e., continually narrowing the distance between positive sample pairs and widening the distance between negative sample pairs.
Preferably, S10 specifically comprises:

dividing the training samples into user data and commodity data, processing the user data into the set of sparse user vectors E_u^sparse = {x_u,1, x_u,2, …, x_u,T}, where T is the total number of user features, t indexes the current user feature, and the subscript u denotes the user; and processing the commodity data into the set of sparse commodity vectors E_v^sparse = {x_v,1, x_v,2, …, x_v,N}, where N is the total number of commodity features, n indexes the current commodity feature, and the subscript v denotes the commodity;

classifying the training sample data into categorical features and continuous features according to the attribute of the data. A categorical feature is encoded as a one-hot vector x_i: the vector length of x_i is the sum of the numbers of values of all features of the current training sample, the position of the categorical value takes the value 1 while all other positions are 0, and a feature dictionary is established for the position numbers of the feature values in the vector. A continuous feature uses the same vector length, stores the value of the continuous feature as the feature value at its position, with all other positions 0, and is thereby likewise encoded into a sparse vector.
Preferably, the attention mechanism model is a representation learning model.
Preferably, the vector lengths of the user attention network and the commodity attention network are equal.
Preferably, the training sample data is collected in the manner of click-through-rate (CTR) estimation.
The invention provides an attention-based recall method for recommendation systems. The basic process is: convert the features of the user and of the commodity into embedded vectors; find the important feature combinations through the attention mechanism network; compute the weighted sum of the embedded vectors of all features to obtain the respective characterization vectors of the user and the commodity in a shared space; and finally calculate the matching degree from the distance between the user and the commodity in that vector space. The highlights of the invention are:
First, the proposed method is a deep learning model, and it operates at the feature level: the model first turns each feature vector into a low-dimensional dense embedded vector, and the model output is equivalent to a weighted sum of all feature vectors. On the one hand, the model can automatically learn feature crosses without manual feature engineering. On the other hand, most existing recall models are tree models or simple discriminative models, because deep models are considered too complex: mainstream deep models such as DCN and DeepFM must compute cross combinations over all elements of every feature vector, which is computationally expensive, whereas at the feature level the number of combinations is small and the computation is light, so applying this deep learning model to recall is highly feasible.
Second, the present invention innovatively applies an attention mechanism to feature combination: for each sample, the attention mechanism finds the important feature combinations and ignores the many unimportant ones. The model takes the embedded vectors of the features as input, learns a set of attention weights through the neural network, one weight per feature, and finally computes the weighted sum of all features according to these weights to obtain the final vector; giving each feature a different weight realizes combinations of the feature vectors to different degrees. The attention mechanism model thus combines deep learning, the attention mechanism and feature engineering, which gives it great advantages.
Finally, the model in the invention is a representation learning model with strong generalization. It learns the characterization vectors of users and commodities in the same space, so it can serve various downstream tasks. Here the downstream task is recall, and an end-to-end model can be trained. Moreover, the model splits into two parts, a user model and a commodity model, which are learned simultaneously; it is thus a bidirectional attention mechanism model. In the prediction stage, the characterization vectors of all commodities can be computed independently and stored; then the characterization vector of each user is computed, and the top M commodities closest in the vector space are retrieved from the stored commodity vectors. Independent prediction avoids repeatedly computing the commodity characterization vectors and greatly reduces computation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of the overall structure of an embodiment of the bidirectional attention model when K=2;
Fig. 2 is a diagram of the user attention network structure of the k-th layer when T=3 according to the present invention;
Fig. 3 is a diagram of the commodity attention network structure of the k-th layer when N=3 according to the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
Referring to Figs. 1-3, the recommendation system recall method based on the attention mechanism provided by the invention comprises the following steps:
S10, extracting the user features and commodity features in the training samples, converting the user features into user embedded vectors, and converting the commodity features into commodity embedded vectors;
S20, inputting the user embedded vectors and the commodity embedded vectors into an attention mechanism model for training, learning the weight of each feature through the attention networks in the model, and computing the weighted sum of the embedded vectors of all features according to the weights to obtain a user characterization vector and a commodity characterization vector; calculating the inner product of the user characterization vector and the commodity characterization vector to obtain the matching degree of the user's willingness to purchase the commodity for the training sample, establishing a cross-entropy loss function of this matching degree, and minimizing the cross-entropy loss function until the attention mechanism model converges;
S30, inputting the sample to be tested into the converged attention mechanism model, obtaining the matching degree of the user's willingness to purchase the commodity for the sample to be tested, and selecting the commodities whose matching degree falls within a preset interval as the recall result to be recommended.
Preferably, the attention mechanism model is a bidirectional attention mechanism model comprising a multi-layer user attention network and a multi-layer commodity attention network; each layer of the user attention network comprises a two-layer feedforward neural network FNN and a normalization layer Softmax, each layer of the commodity attention network likewise comprises a two-layer feedforward neural network FNN and a normalization layer Softmax, and the user attention network and the commodity attention network are each in a layer-by-layer recursive relationship.
Preferably, the multi-layer user attention network in S20 comprises K layers of user attention network, in which the user characterization vector u^(k) is given by the formulas:

u^(k) = U_Attention(E_u, m^(k-1))

m^(k) = m^(k-1) + u^(k)

wherein the superscript (k) or (k-1) on a variable denotes the k-th or (k-1)-th layer of the attention network, and U_Attention denotes the user attention network; every layer has the same structure, whose specific operation is given by the formulas below. The input of the network is the set of embedded vectors of the user features E_u = {u_1, …, u_T} together with the output m^(k-1) of the previous layer; the output of the network is the user characterization vector u^(k) of this layer, and m^(k) is a storage vector holding the accumulated sum of the characterization vectors obtained by the previous k layers. After the input is obtained, the attention network first passes through the two-layer feedforward neural network FNN and the softmax normalization layer to obtain the attention weights a_t^(k), and this weight vector is used to compute the weighted average of the T user feature vectors, yielding the characterization vector u^(k) of this layer.

At the k-th layer, for t = 1, 2, 3, …, T, the weight a_t^(k) of the user's t-th embedded vector at this layer is obtained first:

h_t^(k) = tanh( (W_1^(k) u_t) ⊙ (W_2^(k) m^(k-1)) )

ĥ_t^(k) = w_3^(k) h_t^(k)

a_t^(k) = exp(ĥ_t^(k)) / Σ_{j=1}^{T} exp(ĥ_j^(k))

wherein W_1^(k), W_2^(k) and w_3^(k) are all network parameter matrices: W_1^(k) is the parameter matrix by which u_t, the embedded vector of the user's t-th feature, enters the neural network in the k-th layer user attention network; W_2^(k) is the parameter matrix by which m^(k-1), the storage vector output by the layer above, enters the neural network; and w_3^(k) is the parameter matrix acting on the hidden-layer variable. h_t^(k) is the hidden-layer vector obtained from the user's t-th feature, tanh is the activation function, and ⊙ is the element-wise vector multiplication, i.e., two vectors of the same length are multiplied element by element at the same positions to obtain a new vector. Multiplying h_t^(k) by w_3^(k), a matrix with a single row, yields a scalar ĥ_t^(k), which is then converted through softmax into the user's k-th-layer characterization-vector weight a_t^(k); e is the natural constant;

then, according to a_t^(k), the weighted sum of the user's embedded vectors is calculated to obtain the characterization vector u^(k) of the user's k-th layer:

u^(k) = Σ_{t=1}^{T} a_t^(k) u_t
Preferably, the multi-layer commodity attention network in S20 comprises K layers of commodity attention network, in which the commodity characterization vector v^(k) is given by the formulas:

v^(k) = V_Attention(E_v, m_v^(k-1))

m_v^(k) = m_v^(k-1) + v^(k)

wherein V_Attention denotes the commodity attention network, whose structure is the same as that of the user attention network: the weight a_n^(k) of each commodity's n-th embedded vector is obtained first, and the weighted sum of all commodity embedded vectors is then computed according to these weights to obtain the commodity characterization vector v^(k) of the k-th layer.

At the k-th layer, for n = 1, 2, 3, …, N, the weight a_n^(k) of the commodity's n-th embedded vector at this layer is obtained first:

h_n^(k) = tanh( (W_4^(k) v_n) ⊙ (W_5^(k) m_v^(k-1)) )

ĥ_n^(k) = w_6^(k) h_n^(k)

a_n^(k) = exp(ĥ_n^(k)) / Σ_{j=1}^{N} exp(ĥ_j^(k))

wherein W_4^(k), W_5^(k) and w_6^(k) are parameter matrices of the commodity attention network: W_4^(k) is the parameter matrix by which v_n, the embedded vector of the commodity's n-th feature, enters the neural network in the k-th layer commodity attention network; W_5^(k) is the parameter matrix by which m_v^(k-1), the storage vector output by the layer above, enters the neural network; and w_6^(k) is the parameter matrix acting on the hidden-layer variable. h_n^(k) is the hidden-layer vector obtained from the commodity's n-th feature; multiplying it by the single-row matrix w_6^(k) yields a scalar ĥ_n^(k), which is then converted through softmax into the commodity's k-th-layer characterization-vector weight a_n^(k);

then, according to a_n^(k), the weighted sum of the commodity embedded vectors is calculated to obtain the characterization vector v^(k) of the commodity's k-th layer:

v^(k) = Σ_{n=1}^{N} a_n^(k) v_n
Preferably, in step S20, calculating the inner product of the user characterization vector and the commodity characterization vector to obtain the matching degree of the user's willingness to purchase the commodity for the training sample specifically comprises:

the multi-layer user attention network splices the characterization vectors u^(k) of the user attention networks of all layers to obtain the final user characterization vector z_u = [u^(0); …; u^(K)];

the multi-layer commodity attention network splices the characterization vectors v^(k) of the commodity attention networks of all layers to obtain the final commodity characterization vector z_v = [v^(0); …; v^(K)];

the inner product of the final user characterization vector z_u and the final commodity characterization vector z_v is calculated to obtain the final matching degree of the user's willingness to purchase the commodity.
Preferably, the cross-entropy loss function of the matching degree of the user's willingness to purchase the commodity is specifically:

L = -(1/m) Σ_{i=1}^{m} [ y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i) ]

wherein m is the number of samples, y_i is the sample label, and ŷ_i is the predicted matching probability of sample i, i.e., the matching degree passed through a sigmoid. A sample with click behavior is treated as a positive sample and labeled 1; a sample without click behavior is treated as a negative sample and labeled 0. For each user, the pair <u,v+> formed with each clicked commodity is regarded as a positive sample pair, and the pair <u,v-> formed with an un-clicked commodity is regarded as a negative sample pair. Model training proceeds by minimizing the loss function L, i.e., continually narrowing the distance between positive sample pairs and widening the distance between negative sample pairs.
Preferably, S10 specifically comprises:

dividing the training samples into user data and commodity data, processing the user data into the set of sparse user vectors E_u^sparse = {x_u,1, x_u,2, …, x_u,T}, where T is the total number of user features, t indexes the current user feature, and the subscript u denotes the user; and processing the commodity data into the set of sparse commodity vectors E_v^sparse = {x_v,1, x_v,2, …, x_v,N}, where N is the total number of commodity features, n indexes the current commodity feature, and the subscript v denotes the commodity;

classifying the training sample data into categorical features and continuous features according to the attribute of the data. A categorical feature is encoded as a one-hot vector x_i: the vector length of x_i is the sum of the numbers of values of all features of the current training sample, the position of the categorical value takes the value 1 while all other positions are 0, and a feature dictionary is established for the position numbers of the feature values in the vector. A continuous feature uses the same vector length, stores the value of the continuous feature as the feature value at its position, with all other positions 0, and is thereby likewise encoded into a sparse vector.
Preferably, the attention mechanism model is a representation learning model.
Preferably, the vector lengths of the user attention network and the commodity attention network are equal.
Preferably, the training sample data is collected in the manner of click-through-rate (CTR) estimation.
Actual operation example:
the data of the CTR model is collected in a similar click rate estimation mode, the characteristics of each sample can be divided into two parts, one part is the characteristics of a user, such as gender, age and the like, the other part is the characteristics of goods, such as category, price and the like, each sample corresponds to a label, and the value of the label is 1 or 0, so that whether the user purchases the data (whether the user clicks or stores the data as the label in actual conditions) is indicated. I.e. each sample represents the purchase of a certain commodity by a certain user. The problem to be solved is a classification problem, a classification model is trained by the samples, the model is output to judge whether the user purchases the commodity, the model outputs a probability value of 0 to 1, the probability value represents the likelihood of the user purchasing the commodity, and the probability value is larger the probability value represents the likelihood of purchasing.
In the prediction stage, to recall M commodities from all commodities for a given user, the user's features are combined with the features of every commodity to form I samples, where I is the number of commodities. The I samples are input into the model to obtain I probability values, representing the probability that the user purchases each commodity. The probabilities are sorted, and the M commodities with the largest values, i.e., the M commodities the user is most likely to purchase, are recommended to the user.
Let us take the 4 samples in Table 1 as an example:

Sample | User | Gender | Age | Commodity | Category | Price | Label
1 | Zhang San | Male | 9 | Pencil | Stationery | 2 | 1
2 | Li Si | Male | 38 | Trousers | Clothing | 56 | 0
3 | Wang Wu | Female | 12 | Facial mask | Cosmetics | 35 | 1
4 | Zhao Liu | Female | 27 | Basketball | Sporting goods | 67 | 0

Table 1
In actual data preprocessing, we discard the "user" column and the "commodity" column.
The scheme comprises the following steps:
1) Embedding layer: convert the input sparse feature data of users and commodities into low-dimensional dense embedded vector representations, respectively;
2) Attention mechanism layer: take the embedded vectors of all features as input, learn the weight of each feature through the attention network, and compute the weighted sum of the embedded feature vectors according to these weights to obtain the respective characterization vectors of the user and the commodity;
3) Output layer: obtain the matching degree of the user and the commodity by calculating the inner product of their characterization vectors.
The specific operation of each step is described in detail below:
1) Embedding layer
At this layer, we turn the input user and commodity feature data into embedded vector representations, respectively. The features of the user and of the commodity are input separately, as can be seen from the model structure diagram, so they also need to be processed separately.
We first clarify the concepts of feature number and vocabulary. Take "gender" as an example of a feature: the number of values of this feature is 2, namely "gender=male" and "gender=female". The feature number is the number of features, and the vocabulary is the sum of the numbers of values of all features.
First, we need to process the data into sparse vectors E_u^sparse = {x_u,1, x_u,2, x_u,3, …, x_u,T} and E_v^sparse = {x_v,1, x_v,2, x_v,3, …, x_v,N}, where E_u^sparse is the set of sparse vectors of the user features, T is the number of user features, and the subscript u denotes the user; E_v^sparse is the set of sparse vectors of the commodity features, N is the number of commodity features, and the subscript v denotes the commodity.

The data generally comprise continuous-value features and categorical features. A categorical feature such as "gender" is typically encoded as a one-hot vector x_i; e.g., "gender=male" is coded as "[0, …, 0, 1, 0, …, 0]", with the vector length equal to the vocabulary size. A continuous-value feature such as "age=10" can be viewed as a kind of categorical feature and is likewise encoded as a sparse vector, e.g., "[0, …, 0, 10, 0, …, 0]". Each of these sparse vectors has a value at exactly one position (specifically, the value is 1 for a categorical feature, while a continuous feature keeps its own value) and 0 elsewhere. A feature dictionary needs to be built for the specific position numbers so that each position represents one feature.
Taking the samples in Tables 2 and 3 below as an example, the feature dictionaries of users and commodities are first established, and a position number is assigned to each feature:

Feature | Position number
Gender=male | 0
Gender=female | 1
Age | 2

Table 2

Feature | Position number
Category=stationery | 0
Category=clothing | 1
Category=cosmetics | 2
Category=sporting goods | 3
Price | 4

Table 3
Taking the user features as an example, the user feature dictionary contains 3 entries, namely "gender=male", "gender=female" and "age", so the vocabulary, i.e., the dictionary length, is 3, and the resulting sparse vectors also have length 3:

"gender=male": position number 0, sparse vector [1,0,0]
"gender=female": position number 1, sparse vector [0,1,0]
"age=10": position number 2, sparse vector [0,0,10]

Taking the commodity features as an example, the commodity feature dictionary contains 5 entries, namely "category=stationery", "category=clothing", "category=cosmetics", "category=sporting goods" and "price", so the vocabulary, i.e., the dictionary length, is 5, and the resulting sparse vectors also have length 5:

"category=stationery": position number 0, sparse vector [1,0,0,0,0]
"category=clothing": position number 1, sparse vector [0,1,0,0,0]
"category=cosmetics": position number 2, sparse vector [0,0,1,0,0]
"category=sporting goods": position number 3, sparse vector [0,0,0,1,0]
"price=2": position number 4, sparse vector [0,0,0,0,2]
Then we can obtain a sparse vector for each feature of sample 1:

"gender=male": [1,0,0];
"age=9": [0,0,9];
"category=stationery": [1,0,0,0,0];
"price=2": [0,0,0,0,2].

That is, E_u^sparse = {[1,0,0], [0,0,9]} and E_v^sparse = {[1,0,0,0,0], [0,0,0,0,2]}, where x_u,1 = [1,0,0] is the sparse vector of the user's 1st feature, "gender=male", and so on for the other three sparse vectors. A common practice is to integrate the sparse vectors of all features of a sample into a single vector; for example, the user features of sample 1 can be integrated into one length-3 vector [1, 0, 9]. But a vector encoded this way is high-dimensional and sparse, and if the vocabulary is large, as with some ID-class features, it cannot be trained effectively when fed directly into a neural network.
So, to reduce the dimensionality, we use another widely adopted method: transform these long sparse feature vectors into low-dimensional dense vectors (i.e., embedded vectors) by multiplying each sparse vector with an embedding matrix. Since a sparse vector has a value at only one position, the multiplication is equivalent to selecting one column of the embedding matrix and scaling it by that value; and since the value of a categorical feature is 1, each column of the embedding matrix can serve directly as the embedded vector of one feature, except that a continuous feature additionally multiplies its column by a number:

u_t = W_embed,u x_u,t

v_n = W_embed,v x_v,n

wherein u_t denotes the embedded vector of the user's t-th feature (the subscript u denoting the user) and v_n is the embedded vector of the commodity's n-th feature (the subscript v denoting the commodity); W_embed,u ∈ R^(d×L_u) is the user embedding matrix, W_embed,v ∈ R^(d×L_v) is the commodity embedding matrix, d is the embedded-vector length, and L_u and L_v are the vocabulary sizes of the user features and the commodity features, respectively. Because d << L_u and d << L_v (<< meaning far smaller), the original L_u- or L_v-dimensional vector is converted into a d-dimensional vector, achieving the purpose of shortening the vector. Both embedding matrices are parameters that the embedding layer needs to learn, optimized along with the other parameters of the network.

Finally, we obtain the set of embedded vectors of the user features and the set of embedded vectors of the commodity features, respectively:

E_u = {u_1, u_2, …, u_T}

E_v = {v_1, v_2, …, v_N}

wherein u_t is the embedded vector of the user's t-th feature, T is the number of user features, v_n is the embedded vector of the commodity's n-th feature, and N is the number of commodity features.
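The dimensionality reduction by embedding-matrix multiplication can be sketched in the same vein; this is a sketch under the assumption of random parameters (in training, the embedding matrices are learned):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # embedded-vector length (illustrative)
W_embed_u = rng.normal(size=(d, 3))          # user embedding matrix, vocabulary size 3
W_embed_v = rng.normal(size=(d, 5))          # commodity embedding matrix, vocabulary size 5

# Sparse vectors of sample 1 from the worked example above
E_u_sparse = [np.array([1., 0., 0.]), np.array([0., 0., 9.])]
E_v_sparse = [np.array([1., 0., 0., 0., 0.]), np.array([0., 0., 0., 0., 2.])]

# u_t = W_embed,u x_{u,t}: the one-hot vector selects one column of the
# matrix; a continuous feature additionally scales that column by its value.
E_u = np.stack([W_embed_u @ x for x in E_u_sparse])   # shape (T, d) = (2, 4)
E_v = np.stack([W_embed_v @ x for x in E_v_sparse])   # shape (N, d) = (2, 4)
```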
2) Attention mechanism layer
The attention mechanism can focus on the important information within a multi-layer attention network at every step. At the feature level of our model, each layer of the network represents a cross-combination between features, i.e., the attention mechanism can find important feature combinations and increase their weights. On the other hand, since the mechanism comprises multiple attention layers, higher-order feature crosses are derived layer by layer, so important high-order feature combinations can be extracted. By screening for the core feature combinations, the attention mechanism also reduces the amount of information to be processed.
Our approach attends to the features of the user and of the commodity simultaneously through multi-layer attention networks, both sides using the same network structure to extract important feature combinations. In this section, we explain the attention mechanism used in each attention layer; these layers ultimately make up the whole model. For simplicity, we omit the bias term b in the following formulas.
As can be seen from the model structure diagram, the attention mechanism model is divided into a left part and a right part: the user attention mechanism and the commodity attention mechanism. The user attention mechanism takes the embedded vectors of the user features as input and comprises multiple layers of user attention networks; the commodity attention mechanism takes the embedded vectors of the commodity features as input and comprises multiple layers of commodity attention networks. The attention networks are in a layer-by-layer recursive relationship: the input of the current layer is the output of the previous layer, and the output of the current layer serves as the input of the next layer. The number of layers can be chosen according to the characteristics of the data, keeping the user and commodity attention mechanisms at the same depth. Here, a two-layer attention network is used as an example to make up the attention mechanism.
[User attention mechanism]

The user attention mechanism aims to find important feature combinations among the features of the user, and is made up of K user attention networks. In the k-th layer attention network, the user characterization vector u^(k) is given by the formulas:

u^(k) = U_Attention(E_u, m^(k-1))

m^(k) = m^(k-1) + u^(k)

wherein the superscript (k) or (k-1) on a variable denotes the k-th or (k-1)-th layer of the attention network, and U_Attention denotes the user attention network; every layer has the same structure, whose specific operation is given by the following formulas. The input of the network is the set of embedded vectors of the user features E_u = {u_1, …, u_T} together with the output m^(k-1) of the previous layer; the output of the network is the user characterization vector u^(k) of this layer, and m^(k) is a storage vector that holds the accumulated sum of the characterization vectors obtained from the previous k layers. After the input is obtained, the attention network is normalized through a two-layer feedforward neural network (FNN) and a softmax layer to obtain the attention weights a_t^(k), and this weight vector is used to compute the weighted average of the T user feature vectors, yielding the characterization vector u^(k) of this layer.

At the k-th layer, for t = 1, 2, 3, …, T, the weight a_t^(k) of the user's t-th embedded vector at this layer is obtained first:

h_t^(k) = tanh( (W_1^(k) u_t) ⊙ (W_2^(k) m^(k-1)) )

ĥ_t^(k) = w_3^(k) h_t^(k)

a_t^(k) = exp(ĥ_t^(k)) / Σ_{j=1}^{T} exp(ĥ_j^(k))

wherein W_1^(k), W_2^(k) and w_3^(k) are all network parameter matrices: W_1^(k) is the parameter matrix by which u_t, the embedded vector of the user's t-th feature, enters the neural network in the k-th layer user attention network; W_2^(k) is the parameter matrix by which m^(k-1), the storage vector output by the layer above, enters the neural network; and w_3^(k) is the parameter matrix acting on the hidden-layer variable. h_t^(k) is the hidden-layer vector obtained from the user's t-th feature, tanh is the activation function, and ⊙ is the element-wise vector multiplication, i.e., two vectors of the same length are multiplied element by element at the same positions to obtain a new vector. Multiplying h_t^(k) by w_3^(k), a matrix with a single row, yields a scalar ĥ_t^(k), which is then converted through softmax into the user's k-th-layer characterization-vector weight a_t^(k); e is the natural constant.

Then, the weighted sum of the user's embedded vectors is calculated according to the obtained weights to obtain the characterization vector u^(k) of the user's k-th layer:

u^(k) = Σ_{t=1}^{T} a_t^(k) u_t
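As a concrete reference, one layer of this attention network can be sketched in numpy as follows; the parameter names W1, W2 and w3 are our own labels for the three weight matrices, and random values stand in for learned parameters:

```python
import numpy as np

def attention_layer(E, m_prev, W1, W2, w3):
    """One attention layer: weights via a two-layer FNN plus softmax,
    then a weighted sum of the embedded feature vectors.

    E      : (T, d) embedded feature vectors
    m_prev : (d,)   storage vector m^(k-1) from the previous layer
    W1, W2 : (d, d) parameter matrices for the features and the storage vector
    w3     : (d,)   single-row matrix mapping each hidden vector to a scalar
    """
    h = np.tanh((E @ W1.T) * (W2 @ m_prev))      # h_t = tanh((W1 u_t) ⊙ (W2 m^(k-1)))
    scores = h @ w3                              # one scalar per feature
    a = np.exp(scores) / np.exp(scores).sum()    # softmax -> attention weights a_t
    return a @ E                                 # u^(k): weighted sum, shape (d,)

rng = np.random.default_rng(1)
T, d = 3, 4
E_u = rng.normal(size=(T, d))
m0 = E_u.mean(axis=0)                            # u^(0) = mean of the embeddings
u1 = attention_layer(E_u, m0,
                     rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                     rng.normal(size=d))
```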
[Commodity attention mechanism]

The commodity attention mechanism aims to find important feature combinations among the features of the commodity; the whole network is composed of K commodity attention networks. In the k-th layer attention network, the characterization vector of the commodity is obtained by the formulas:

v^(k) = V_Attention(E_v, m_v^(k-1))

m_v^(k) = m_v^(k-1) + v^(k)

wherein V_Attention denotes the commodity attention network, whose structure is the same as that of the user attention network. The weight a_n^(k) of each commodity's n-th embedded vector is obtained first, and the weighted sum of all commodity embedded vectors is then computed according to these weights to obtain the commodity characterization vector v^(k) of the k-th layer.

At the k-th layer, for n = 1, 2, 3, …, N, the weight a_n^(k) of the commodity's n-th embedded vector at this layer is obtained first:

h_n^(k) = tanh( (W_4^(k) v_n) ⊙ (W_5^(k) m_v^(k-1)) )

ĥ_n^(k) = w_6^(k) h_n^(k)

a_n^(k) = exp(ĥ_n^(k)) / Σ_{j=1}^{N} exp(ĥ_j^(k))

wherein W_4^(k), W_5^(k) and w_6^(k) are parameter matrices of the commodity attention network (learned separately from those of the user network): W_4^(k) is the parameter matrix by which v_n, the embedded vector of the commodity's n-th feature, enters the neural network in the k-th layer commodity attention network; W_5^(k) is the parameter matrix by which m_v^(k-1), the storage vector output by the layer above, enters the neural network; and w_6^(k) is the parameter matrix acting on the hidden-layer variable. h_n^(k) is the hidden-layer vector obtained from the commodity's n-th feature; multiplying it by the single-row matrix w_6^(k) yields a scalar ĥ_n^(k), which is then converted through softmax into the commodity's k-th-layer characterization-vector weight a_n^(k).

Then, according to a_n^(k), the weighted sum of the commodity embedded vectors is calculated to obtain the characterization vector v^(k) of the commodity's k-th layer:

v^(k) = Σ_{n=1}^{N} a_n^(k) v_n
In particular, u^(0) and v^(0), the inputs to the layer-1 attention networks, are initialized to the means of the feature vectors, and m_u^(0) and m_v^(0) are set equal to u^(0) and v^(0), respectively:

u^(0) = (1/T) Σ_{t=1}^{T} u_t

v^(0) = (1/N) Σ_{n=1}^{N} v_n

m_u^(0) = u^(0)

m_v^(0) = v^(0)
3) Output layer
After the characterization vectors of the user and the commodity are obtained, a common way to measure how well a user matches a commodity is to take the inner product of the two vectors: the larger the inner product, the closer the two vectors are in the characterization space, and thus the higher the matching degree. The purpose of the output layer is to take the characterization vectors of the user and the commodity and calculate their matching degree.
Through the attention mechanism layer, we obtain the characterization vector of each network layer for the user and for the commodity; combining them finally yields the characterization vectors z_u and z_v of the user and the commodity, and the final matching degree is obtained by taking the inner product:

z_u = [u^(0); …; u^(K)]

z_v = [v^(0); …; v^(K)]

S = z_u · z_v

wherein [ ; ] is the splicing (concatenation) operation and · is the dot-product operation, i.e., two vectors of the same length are multiplied element by element at the same positions and summed to obtain the inner product of the two vectors.
Fig. 1 shows the overall structure of the model when K=2.
Model training uses the cross-entropy loss function, a widely used loss:

L = -(1/m) Σ_{i=1}^{m} [ y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i) ]

wherein m is the number of samples, y_i is the sample label (1 for a positive sample with click behavior, otherwise 0), and ŷ_i is the predicted matching probability of sample i, i.e., the matching degree S_i passed through a sigmoid. For each user, the pair <u,v+> formed with each clicked commodity is a positive sample pair; the un-clicked commodities are far too many, so several of them are randomly sampled to form negative sample pairs <u,v->. Model training is accomplished by minimizing the loss function L, i.e., continually narrowing the distance between positive sample pairs and widening the distance between negative sample pairs.
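A minimal sketch of the output layer and this loss, assuming the matching degree is squashed by a sigmoid as described above (function names are our own):

```python
import numpy as np

def match_score(user_outs, item_outs):
    """S = z_u · z_v, with z_u = [u^(0); ...; u^(K)] by concatenation."""
    return float(np.concatenate(user_outs) @ np.concatenate(item_outs))

def cross_entropy(scores, labels):
    """L = -(1/m) Σ [ y log ŷ + (1-y) log(1-ŷ) ] with ŷ = sigmoid(S)."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    y = np.asarray(labels, dtype=float)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())

# e.g. one positive pair and one randomly sampled negative pair:
rng = np.random.default_rng(3)
u = [rng.normal(size=4) for _ in range(3)]       # [u^(0), u^(1), u^(2)]
v_pos = [rng.normal(size=4) for _ in range(3)]
v_neg = [rng.normal(size=4) for _ in range(3)]
loss = cross_entropy([match_score(u, v_pos), match_score(u, v_neg)], [1, 0])
```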
In the prediction stage, the characterization vectors of all commodities are first calculated through the right half of the model, the commodity attention mechanism, and stored. For each user, the user's characterization vector is obtained through the left half of the model, the user attention mechanism; the matching degrees between the user and all commodities are then calculated, and the top P commodities with the highest matching degree are selected as the recall result. Recall rate may be chosen as the evaluation metric.
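The saving this enables can be seen in a sketch of the prediction stage: the commodity characterization vectors are computed once and cached, and recall for a user reduces to one matrix-vector product plus a top-P selection (names such as `recall_top_p` are illustrative):

```python
import numpy as np

def recall_top_p(z_user, Z_items, P):
    """Return the indices of the P commodities with the highest
    inner-product matching degree against one user vector."""
    scores = Z_items @ z_user            # matching degree for every commodity
    return np.argsort(-scores)[:P]       # indices of the P largest scores

rng = np.random.default_rng(4)
Z_items = rng.normal(size=(1000, 12))    # 1000 cached commodity vectors z_v
z_user = rng.normal(size=12)             # one user's vector z_u
top_items = recall_top_p(z_user, Z_items, P=5)
```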
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (9)

1. A recommendation system recall method based on an attention mechanism, comprising the steps of:
S10, extracting the user features and commodity features in the training samples, converting the user features into user embedded vectors, and converting the commodity features into commodity embedded vectors;
S20, inputting the user embedded vectors and the commodity embedded vectors into an attention mechanism model for training, learning the weight of each feature through the attention networks in the model, and computing the weighted sum of the embedded vectors of all features according to the weights to obtain a user characterization vector and a commodity characterization vector; calculating the inner product of the user characterization vector and the commodity characterization vector to obtain the matching degree of the user's willingness to purchase the commodity for the training sample, establishing a cross-entropy loss function of this matching degree, and minimizing the cross-entropy loss function until the attention mechanism model converges;
the multi-layer user attention network in S20 comprising K layers of user attention network, in which the user characterization vector u^(k) is given by the formulas:

u^(k) = U_Attention(E_u, m^(k-1))

m^(k) = m^(k-1) + u^(k)

wherein the superscript (k) or (k-1) on a variable denotes the k-th or (k-1)-th layer of the attention network, and U_Attention denotes the user attention network; every layer has the same structure, whose specific operation is given by the formulas below; the input of the network is the set of embedded vectors of the user features E_u = {u_1, …, u_T} together with the output m^(k-1) of the previous layer; the output of the network is the user characterization vector u^(k) of this layer, and m^(k) is a storage vector holding the accumulated sum of the characterization vectors obtained from the previous k layers; after the input is obtained, the attention network is normalized through the two-layer feedforward neural network FNN and the softmax layer to obtain the attention weights a_t^(k), and this weight vector is used to compute the weighted average of the T user feature vectors, yielding the characterization vector u^(k) of this layer;

at the k-th layer, for t = 1, 2, 3, …, T, the weight a_t^(k) of the user's t-th embedded vector at this layer is obtained first:

h_t^(k) = tanh( (W_1^(k) u_t) ⊙ (W_2^(k) m^(k-1)) )

ĥ_t^(k) = w_3^(k) h_t^(k)

a_t^(k) = exp(ĥ_t^(k)) / Σ_{j=1}^{T} exp(ĥ_j^(k))

wherein W_1^(k), W_2^(k) and w_3^(k) are all network parameter matrices: W_1^(k) is the parameter matrix by which u_t, the embedded vector of the user's t-th feature, enters the neural network in the k-th layer user attention network; W_2^(k) is the parameter matrix by which m^(k-1), the storage vector output by the layer above, enters the neural network; and w_3^(k) is the parameter matrix acting on the hidden-layer variable; h_t^(k) is the hidden-layer vector obtained from the user's t-th feature, tanh is the activation function, and ⊙ is the element-wise vector multiplication, i.e., two vectors of the same length are multiplied element by element at the same positions to obtain a new vector; multiplying h_t^(k) by w_3^(k), a matrix with a single row, yields a scalar ĥ_t^(k), which is then converted through softmax into the user's k-th-layer characterization-vector weight a_t^(k); e is a natural constant;

then, according to a_t^(k), the weighted sum of the user's embedded vectors is calculated to obtain the characterization vector u^(k) of the user's k-th layer:

u^(k) = Σ_{t=1}^{T} a_t^(k) u_t ;
S30, inputting the sample to be tested into the converged attention mechanism model, obtaining the matching degree of the user's willingness to purchase commodities for the sample to be tested, and selecting the commodities whose matching degree falls within a preset interval as the recall result to be recommended.
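The following is a minimal sketch, under the reconstruction of claim 1 above, of one pass through the K-layer user attention network: $h = \tanh(W_1 u_t) \odot \tanh(W_2 m)$, softmax weights from the single-row matrix $W_3$, and the storage-vector update $m^{(k)} = m^{(k-1)} + u^{(k)}$. The parameter shapes, the initial storage vector, and the sharing of one parameter set across layers (each layer would normally carry its own $W$'s) are all assumptions.

```python
# Sketch: one layer of the user attention network, applied K = 2 times.
import numpy as np

def user_attention_layer(U, m, W1, W2, W3):
    """U: (T, d) embedded feature vectors; m: (d,) storage vector."""
    H = np.tanh(U @ W1.T) * np.tanh(m @ W2.T)   # (T, h) hidden vectors; * is element-wise
    logits = H @ W3.ravel()                     # single-row W3 -> one scalar per feature
    a = np.exp(logits) / np.exp(logits).sum()   # softmax attention weights a_t^(k)
    u_k = a @ U                                 # weighted sum of the T embeddings
    return u_k, m + u_k                         # layer output and updated storage vector

rng = np.random.default_rng(2)
T, d, h = 5, 8, 16
U = rng.normal(size=(T, d))
m = U.mean(axis=0)                              # assumed initialization of the storage vector
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(h, d))
W3 = rng.normal(size=(1, h))

for _ in range(2):                              # K = 2 layers of identical structure
    u_k, m = user_attention_layer(U, m, W1, W2, W3)
print(u_k)
```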
2. The attention mechanism based recommendation system recall method of claim 1, wherein the attention mechanism model is a bi-directional attention mechanism model comprising a multi-layer user attention network and a multi-layer commodity attention network, each layer of the user attention network comprising a two-layer feedforward neural network FNN and a normalization layer Softmax, each layer of the commodity attention network comprising a two-layer feedforward neural network FNN and a normalization layer Softmax, the user attention network and the commodity attention network each being in a layer-by-layer recursive relationship.
3. The attention mechanism based recommendation system recall method of claim 2, wherein the multi-layer commodity attention network in S20 comprises a K-layer commodity attention network, in which the commodity characterization vector $v^{(k)}$ is given by:

$$v^{(k)} = \mathrm{V\_Attention}\big(\{v_n\}_{n=1}^{N},\; m^{(k-1)}\big)$$
$$m^{(k)} = m^{(k-1)} + v^{(k)}$$

wherein V_Attention denotes the commodity attention network, whose structure is the same as that of the user attention network; the weight $a_n^{(k)}$ of each commodity's $n$-th embedded vector is obtained first, and the weighted sum of all commodity embedded vectors according to these weights then yields the commodity characterization vector $v^{(k)}$ of the $k$-th layer;

at the $k$-th layer, for $n = 1, 2, 3, \dots, N$, the weight $a_n^{(k)}$ of the commodity's $n$-th embedded vector at that layer is first obtained:

$$h_n^{(k)} = \tanh\big(W_1^{(k)} v_n\big) \odot \tanh\big(W_2^{(k)} m^{(k-1)}\big)$$
$$\hat{a}_n^{(k)} = W_3^{(k)} h_n^{(k)}$$
$$a_n^{(k)} = \frac{e^{\hat{a}_n^{(k)}}}{\sum_{j=1}^{N} e^{\hat{a}_j^{(k)}}}$$

wherein $W_1^{(k)}$, $W_2^{(k)}$ and $W_3^{(k)}$ are parameter matrices of the commodity attention network: $W_1^{(k)}$ is the parameter matrix applied to the embedded vector $v_n$ of the commodity's $n$-th feature input to the $k$-th layer commodity attention network; $W_2^{(k)}$ is the parameter matrix applied to the storage vector $m^{(k-1)}$ output by the previous layer; and $h_n^{(k)}$ is the hidden layer vector obtained from the commodity's $n$-th feature. Multiplying $h_n^{(k)}$ by the single-row matrix $W_3^{(k)}$ yields the scalar $\hat{a}_n^{(k)}$, which is then converted by softmax into the final characterization weight $a_n^{(k)}$ of the commodity at the $k$-th layer;

then, according to the weights $a_n^{(k)}$, the weighted sum of the commodity embedded vectors is computed to obtain the commodity's characterization vector $v^{(k)}$ at the $k$-th layer:

$$v^{(k)} = \sum_{n=1}^{N} a_n^{(k)} v_n$$
4. The attention mechanism based recommendation system recall method of claim 1, wherein in S20 the inner product of the user characterization vector and the commodity characterization vector is calculated to obtain the matching degree of the training sample user's willingness to purchase the commodity, specifically as follows:
the multi-layer user attention network concatenates the characterization vectors $u^{(k)}$ of the user attention networks of all layers to obtain the final user characterization vector $z_u = [u^{(0)}; \dots; u^{(K)}]$;
the multi-layer commodity attention network concatenates the characterization vectors $v^{(k)}$ of the commodity attention networks of all layers to obtain the final commodity characterization vector $z_v = [v^{(0)}; \dots; v^{(K)}]$;
the inner product of the final user characterization vector $z_u$ and the final commodity characterization vector $z_v$ is calculated to obtain the final matching degree of the user's willingness to purchase the commodity.
5. The attention mechanism based recommendation system recall method of claim 1, wherein the cross entropy loss function of the matching degree of the user's willingness to purchase the commodity is specifically:

$$L = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log\hat{y}_i + (1-y_i)\log\left(1-\hat{y}_i\right)\right]$$

wherein m is the number of samples and $y_i$ is the sample label, samples with click behavior being treated as positive and labeled 1 and samples without click behavior being treated as negative and labeled 0, and $\hat{y}_i$ is the predicted matching degree of the $i$-th sample; for each user, each clicked commodity together with the user forms a positive sample pair <u, v^+>, and randomly sampled non-clicked commodities form negative sample pairs <u, v^->; model training is performed by minimizing the loss function L, i.e., continually narrowing the distance between positive sample pairs and widening the distance between negative sample pairs.
6. The attention mechanism based recommendation system recall method of claim 1, wherein in S10:
user data and commodity data are separated from the training samples; the user data are processed into sparse user vectors $\{u_t\}_{t=1}^{T}$, where T is the total number of user features, t indexes the current user feature, and u denotes the user; the commodity data are processed into sparse commodity vectors $\{v_n\}_{n=1}^{N}$, where N is the total number of commodity features, n indexes the current commodity feature, and v denotes the commodity;
the training sample data are classified into categorical features and continuous features according to the attributes of the data. For a categorical feature, a one-hot encoded vector $x_i$ is used: the vector length of $x_i$ is the sum of the sizes of all the features of the current training sample, the position of the category value is set to 1 and all other positions to 0, and a feature dictionary is built mapping each category value to its position index in the vector. For a continuous feature, the vector length is likewise the sum of the sizes of all the features of the current training sample, the value of the continuous feature is placed at its feature position and all other positions are 0, so that the feature is also encoded as a sparse vector.
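As an editorial illustration of the encoding in claim 6, here is a hedged sketch in which categorical features become one-hot vectors over the combined feature space and continuous features place their raw value at a reserved position. The tiny feature dictionary, the feature names, and the values are made up for the example.

```python
# Sketch: one-hot encoding for categorical features, value-at-position for continuous ones.
import numpy as np

feature_dict = {"gender=male": 0, "gender=female": 1, "city=SZ": 2, "age": 3}
DIM = len(feature_dict)            # sum of the sizes of all feature fields

def encode_categorical(name):
    x = np.zeros(DIM)
    x[feature_dict[name]] = 1.0    # category position set to 1, all others 0
    return x

def encode_continuous(name, value):
    x = np.zeros(DIM)
    x[feature_dict[name]] = value  # raw value at the feature's reserved position
    return x

print(encode_categorical("gender=female"))
print(encode_continuous("age", 0.27))   # e.g. a normalized age
```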
7. The attention mechanism based recommendation system recall method of claim 1, wherein the attention mechanism model is a representation learning model.
8. The attention mechanism based recommendation system recall method of claim 1, wherein the vector lengths of the user attention network and the commodity attention network are equal.
9. The attention mechanism based recommendation system recall method of claim 1, wherein the training sample data are collected from a click-through rate (CTR) estimation model.
CN201911222216.1A 2019-12-03 2019-12-03 Recommendation system recall method based on attention mechanism Active CN111062775B (en)



