CN116894476A - Multi-behavior attention self-supervision learning method based on double channels - Google Patents
- Publication number: CN116894476A (application CN202310823335.2A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/0895 (Computing arrangements based on biological models; Neural networks; Learning methods): Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06F17/16 (Electric digital data processing; Complex mathematical operations): Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F18/214 (Pattern recognition; Design or setup of recognition systems): Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/22 (Pattern recognition; Analysing): Matching criteria, e.g. proximity measures
- G06N3/0464 (Neural networks; Architecture): Convolutional networks [CNN, ConvNet]
- G06N3/048 (Neural networks; Architecture): Activation functions
Abstract
The application provides a dual-channel-based multi-behavior attention self-supervised learning method. The method can distinguish the preferences perceived from different user behaviors, and exploits the different behaviors through which users interact with items. The application's three self-supervised learning schemes not only enhance the dual-channel characterization results but also let the model obtain additional auxiliary supervision signals from self-supervised learning within and across the channels, thereby effectively alleviating the problem of sparse supervision signals.
Description
Technical Field
The application relates to the technical field of multi-behavior preference prediction based on dual-channel cross-behavior dependency modeling, in particular to a bidirectional encoder representation transformer, a graph neural network, and a self-supervised personalized behavior recommendation method; more particularly, to a dual-channel-based multi-behavior attention self-supervised learning method.
Background
In recent years, with the development of the mobile internet, the center of gravity of electronic commerce has shifted from the personal computer to the smartphone. Various mobile e-commerce platforms have emerged, recommendation based on user interaction has developed alongside them, and multi-behavior recommendation in particular has drawn wide attention in academia. The massive interaction data generated by millions of users offers an excellent opportunity to explore the latent intent behind users' various behaviors; at the same time, however, the sheer volume of data leaves individual users with sparse behavior-type information, so that personalized recommendation cannot be performed effectively. To solve this problem, multi-behavior recommendation has become one of the most popular research topics in e-commerce platform mining: it is an effective way to help users discover commodities of interest while also helping an e-commerce platform attract more potential users.
Intent-aware recommendation can often save the user time by requiring no input at all, since typing on a mobile device is harder than on a desktop computer; this improves user activity and the shopping experience. Users can also share items with friends, and such sharing behaviors carry user context information that generally reflects the user's interests. By mining this behavioral information deeply, users' deeper preferences for commodities can be revealed. A user's interaction behavior represents the interplay between the user and a commodity, and also exhibits both commodity features and user features. From the user's perspective, a user clicks commodities, adds them to the shopping cart, and finally checks out several commodities in the cart together; from the commodity's perspective, a commodity with multiple interaction types, such as purchases and favorites, is more attractive to users than one with only a single interaction type. In addition, users typically share items with friends. These distinctive behaviors make multi-behavior recommendation different from conventional recommender systems, so it is necessary to fully understand users' multi-behavior interaction information with respect to commodities and thereby develop new algorithms for multi-behavior recommendation.
Disclosure of Invention
The application aims to solve the problems in the prior art and provides a dual-channel-based multi-behavior attention self-supervised learning method.
The application is realized by the following technical scheme: a dual-channel-based multi-behavior attention self-supervised learning method comprising the following steps:
step one, acquiring commodity interaction data sets from Tmall and the CIKM 2019 e-commerce artificial intelligence challenge, and selecting T% of the data as the training data set and the remaining (100-T)% as the test data set, wherein the training data set comprises the histories of users' various interaction behaviors with commodities and with other users;
step two, the user set in the training data set is U, U = {u_1, u_2, ..., u_q, ..., u_N}, q ∈ {1, ..., N}, where u_q is the q-th user and N is the number of users; the commodity set is I, I = {i_1, i_2, ..., i_t, ..., i_T}, t ∈ {1, ..., T}, where i_t is the t-th commodity and T is the number of commodities; the behavior set is B, B = {b_1, b_2, ..., b_k, ..., b_K}, k ∈ {1, ..., K}, where b_k is the k-th behavior and K is the number of behaviors;
step three, from the sequence-channel perspective, constructing user-commodity interaction sequences according to each user's behavior history;
step four, from the sequence-channel perspective, the calculation method of BERT4Rec is introduced, since a deep bidirectional model outperforms a unidirectional one; the encoder's feed-forward output is GELU(hW + b), where GELU is the Gaussian error linear unit activation function, W denotes its weight matrix, and b the bias. softmax serves as the output activation function and normalizes the concatenated results of the various behavior sequences; different users have different behavior interaction sequences and therefore yield different encoding results;
step five, from the sequence-channel perspective, designing a self-supervised loss for the characterization results;
step six, from the graph-channel perspective, acquiring sufficient available information about the user; the user has various behaviors, including clicking, adding to cart, favoriting, and purchasing. Define G = (V, E), where the node set V contains the user set U and the commodity set I, i.e., V = U ∪ I, and E represents the different interaction behaviors (edges) between user nodes and item nodes; the embedding of the multi-behavior graph is composed of multiple behavior-subgraph embeddings, each behavior subgraph being denoted G_b = (V_b, E_b);
step seven, from the graph-channel perspective, the auxiliary behavior graph representations and the target behavior graph representation serve as the inputs of attention;
step eight, from the graph-channel perspective, designing in-channel self-supervised learning over the multi-behavior interaction graph, enhancing the supervision signals of the various behavior data through this self-supervised learning;
and step nine, from the joint perspective of the sequence channel and the graph channel, enhancing the supervision signal through self-supervised learning that combines the two channels.
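The activation and normalization functions named in step four (GELU and softmax) can be sketched in plain Python. These are the standard textbook definitions, offered as a hedged illustration; the patent's own formula images are not reproduced in this text.

```python
import math

def gelu(x):
    """Exact Gaussian Error Linear Unit: x * Phi(x), where Phi is the
    standard normal cumulative distribution function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(scores):
    """Numerically stable softmax over a list of scores: subtract the
    max before exponentiating so large inputs cannot overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

As in step four, softmax maps the concatenated behavior-sequence scores onto a probability distribution, while GELU acts as the feed-forward activation.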
Further, in step three, each element of the user's interaction behavior sequence is a triplet feature vector e = (u_q, b_k, i_x), denoting that user q interacted with item x through the k-th behavior. The user's multi-behavior sequence contains the interaction information of a single user; it is mapped to an initial embedding, forming the feature matrix E_q, which contains the commodities with which user q interacted through all behaviors. The feature vectors E_q^k of the auxiliary-behavior interaction sequences and the feature vector E_q^{k'} of the target-behavior interaction sequence serve as the inputs of the multi-behavior interaction-sequence dependency encoder.
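A minimal sketch of the sequence construction in step three, assuming interaction records arrive as (user, item, behavior, timestamp) tuples; the timestamp field is an assumption added for ordering, since the patent specifies only user-behavior-item triplets.

```python
from collections import defaultdict

def build_interaction_sequences(interactions):
    """Group (user, item, behavior, timestamp) records into per-user,
    chronologically ordered multi-behavior interaction sequences."""
    by_user = defaultdict(list)
    for user, item, behavior, ts in interactions:
        by_user[user].append((ts, item, behavior))
    # Sort each user's history by time, then drop the timestamp.
    return {u: [(item, beh) for _, item, beh in sorted(h)]
            for u, h in by_user.items()}

# Hypothetical toy log; user/item ids are illustrative only.
log = [
    ("u1", "i3", "click", 2),
    ("u1", "i1", "click", 1),
    ("u1", "i1", "buy",   3),
    ("u2", "i2", "cart",  1),
]
seqs = build_interaction_sequences(log)
```

Each resulting sequence is what would then be mapped to the initial embedding matrix E_q.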
Further, the feature vector of each auxiliary behavior and the feature vector of the target behavior are combined by attention. The calculation process is Q = E_q^{k'} W_Q and K = E_q^{k} W_K, where W_Q, W_K ∈ R^{d×n} are learnable weight matrices of the behavior vectors, and K^T denotes the transpose of K. The association matrix between auxiliary behavior k and target behavior k' is M^{k,k'} = Q K^T / √d. Each association matrix M^{k,k'} is normalized by softmax to obtain an attention score α^{k,k'} whose values lie in a probability-distribution interval; softmax thereby obtains the behavior most similar to the purchasing behavior by calculating similarity. The output is Ê_q^{k} = α^{k,k'} (E_q^{k} W_V), where W_V ∈ R^{d×n} is a learnable weight matrix of the behavior vector.
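The cross-behavior attention above has the form of scaled dot-product attention; a pure-Python sketch follows. The tiny identity weight matrices stand in for the learnable W_Q, W_K, W_V and are illustrative only.

```python
import math

def matmul(A, B):
    """Plain list-of-lists matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def softmax_rows(M):
    """Row-wise numerically stable softmax."""
    out = []
    for row in M:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def cross_behavior_attention(E_target, E_aux, W_Q, W_K, W_V):
    """Queries come from the target behavior, keys/values from an
    auxiliary behavior; returns (attention weights, weighted output)."""
    Q = matmul(E_target, W_Q)
    K = matmul(E_aux, W_K)
    V = matmul(E_aux, W_V)
    d = len(Q[0])
    scores = [[s / math.sqrt(d) for s in row]
              for row in matmul(Q, transpose(K))]
    alpha = softmax_rows(scores)  # association matrix; rows sum to 1
    return alpha, matmul(alpha, V)

I2 = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 identity weights (assumption)
alpha, out = cross_behavior_attention(I2, I2, I2, I2, I2)
```

Each row of `alpha` is the softmax-normalized association between one target-behavior position and all auxiliary-behavior positions.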
Further, in step five, different behaviors of the same user are treated as positive sample pairs {e_q^k, e_q^{k'}}, and behaviors of different users as negative sample pairs {e_q^k, e_{q'}^{k'}}. The self-supervised loss on user behavior is therefore L_SCL = Σ -log( exp(s(e_q^k, e_q^{k'})/τ) / Σ_{q'≠q} exp(s(e_q^k, e_{q'}^{k'})/τ) ), where s(·,·) denotes the computed cosine similarity.
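The positive/negative-pair loss in step five has the form of an InfoNCE objective with cosine similarity. A minimal sketch, assuming a default temperature of 0.2 (an arbitrary illustrative value):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a))
           * math.sqrt(sum(y * y for y in b)))
    return num / den

def info_nce(anchor, positive, negatives, tau=0.2):
    """-log of the positive pair's softmax probability against the
    negatives, with similarities scaled by temperature tau."""
    pos = math.exp(cosine(anchor, positive) / tau)
    den = pos + sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / den)
```

A well-aligned positive pair yields a smaller loss than a misaligned one, which is exactly the pressure that pulls a user's behavior views together.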
Further, in step six, graph convolution is used to learn the node characterizations of the graph and to aggregate and propagate node features. The graph-convolution process is as follows: for each behavior subgraph, an embedded adjacency matrix A_k is formed from the interaction matrix R_k as the bipartite block matrix A_k = [[0, R_k], [R_k^T, 0]]. Each A_k serves as the input of the behavior's normalized Laplacian matrix; the normalization process is Â_k = D_k^{-1/2} (A_k + I_k) D_k^{-1/2}, where D_k is the degree matrix of behavior k and I_k is the identity matrix. The output of the graph convolution passes through a threshold function sigmoid: H_k^{(l+1)} = sigmoid(Â_k H_k^{(l)} W_k), where H_k^{(l)} is the node feature matrix of layer l and W_k is the transformation matrix for behavior-view information transfer. The graph convolution stacks L layers, where L denotes the order of neighbor nodes acquired; through this node-information aggregation process, the features of the graph's nodes with respect to behavior k are acquired, and multi-behavior context information can be preserved.
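The symmetric normalization and sigmoid-gated propagation above can be sketched as follows. The self-loop term (A + I) and D^{-1/2} scaling follow common graph-convolution practice; the patent's own matrix images are not reproduced here, so treat this as an assumption-laden illustration.

```python
import math

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, with
    self-loops added so isolated nodes keep their own features."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in A_hat]
    d_inv = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[d_inv[i] * A_hat[i][j] * d_inv[j] for j in range(n)]
            for i in range(n)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gcn_layer(A_norm, H, W):
    """One propagation step H' = sigmoid(A_norm @ H @ W)."""
    AH = [[sum(a * h for a, h in zip(row, col)) for col in zip(*H)]
          for row in A_norm]
    AHW = [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)]
           for row in AH]
    return [[sigmoid(v) for v in row] for row in AHW]
```

Stacking L such layers aggregates information from L-hop neighbors, as described in the claim.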
Further, in step seven, the process of distinguishing, via attention, the influence strength of each auxiliary behavior graph on the target behavior graph is: M_G^{k,k'} = (H^{k'} W_Q)(H^{k} W_K)^T / √d, where W_Q ∈ R^{d×n} and W_K ∈ R^{d×n} are weight matrices of the behavior matrix that are continuously and iteratively updated, and M_G^{k,k'} is the attention-correlation-coefficient matrix. The attention calculation treats softmax(M_G^{k,k'}) as the weight multiplying the auxiliary behavior: Ĥ^{k} = softmax(M_G^{k,k'}) (H^{k} W_V), where W_V ∈ R^{d×n} is likewise a continuously and iteratively updated weight matrix of the behavior matrix. The auxiliary-behavior feature matrices of the target behavior serve as the final output of the cross-behavior interaction-graph attention encoder.
Further, in step eight, different behavior views of the same user are treated as positive sample pairs {h_q^k, h_q^{k'}}, and the behaviors of different users as negative sample pairs {h_q^k, h_{q'}^{k'}}. Mutual information between users is maximized through the self-supervised positive and negative sample pairs: L_GCL = Σ -log( exp(s(h_q^k, h_q^{k'})/τ) / Σ_{q'≠q} exp(s(h_q^k, h_{q'}^{k'})/τ) ). This maximizes the consistency of the two behavior views and the difference between different users' behaviors, enhancing the behavior-data supervision signals.
Further, in step nine, the sequence-channel and graph-channel representations of the same user are regarded as positive samples, denoted {e_q, h_q}; the sequence-channel and graph-channel representations of different users are regarded as negative samples, denoted {e_q, h_{q'}}. The cross-channel self-supervised loss is L_SGCL = Σ -log( exp(s(e_q, h_q)/τ) / Σ_{q'≠q} exp(s(e_q, h_{q'})/τ) ), where τ is a temperature coefficient balancing the learning strength between the two channels. The sum of all self-supervised losses serves as the final target loss: L_CL = L_SCL + L_GCL + L_SGCL, where L_SCL is the self-supervised loss of the multi-behavior interaction sequences in the sequence channel and L_GCL is the self-supervised loss of the multi-behavior interaction graph in the graph channel. The final loss function L_CL is thus composed of the per-behavior-pair sequence loss L_SCL, the graph-view loss L_GCL, and the sequence-graph loss L_SGCL.
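The assembly of the final objective L_CL = L_SCL + L_GCL + L_SGCL can be sketched with the same InfoNCE form as above; every embedding value below is a hypothetical toy number, not data from the patent.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a))
           * math.sqrt(sum(y * y for y in b)))
    return num / den

def info_nce(anchor, positive, negatives, tau=0.2):
    pos = math.exp(cosine(anchor, positive) / tau)
    den = pos + sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / den)

# Toy sequence-channel embeddings for two behaviors of user q and q'.
e_q_click, e_q_buy = [1.0, 0.1], [0.9, 0.2]
e_p_buy = [0.2, 0.8]
# Toy graph-channel embeddings for the same users.
h_q_click, h_q_buy = [0.8, 0.3], [0.7, 0.4]
h_p_buy = [0.3, 0.9]

L_SCL = info_nce(e_q_click, e_q_buy, [e_p_buy])    # within sequence channel
L_GCL = info_nce(h_q_click, h_q_buy, [h_p_buy])    # within graph channel
L_SGCL = info_nce(e_q_click, h_q_click, [h_p_buy]) # across the two channels
L_CL = L_SCL + L_GCL + L_SGCL                      # final target loss
```

The three terms correspond one-to-one to the sequence, graph, and cross-channel self-supervised losses of steps five, eight, and nine.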
The application provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the multi-behavior attention self-supervision learning method based on two channels when executing the computer program.
The present application proposes a computer readable storage medium for storing computer instructions which, when executed by a processor, implement the steps of the dual channel-based multi-behavioral attention self-supervised learning method.
The application has the following beneficial effects:
the application provides a multi-behavior attention self-supervision learning method based on double channels, which can distinguish the abilities of different user behaviors to perceive preferences. Different behaviors of user interaction with the project under different modes are utilized. The commonality of the multiple behaviors of the user is captured, and the three self-supervision learning modes not only enhance the double-channel characterization result, but also enable the model to obtain more auxiliary supervision signals in the self-supervision learning in the channels and among the channels, thereby effectively relieving the problem of sparse supervision signals.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall schematic diagram of a dual-channel-based multi-behavior attention self-monitoring learning method according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Personalized recommendation is an important component of e-commerce platforms. Recommender systems enhance collaborative filtering with neural networks to capture user preferences accurately and thereby achieve better recommendation performance. Traditional recommendation methods focus on the results of a single user behavior and ignore modeling that exploits a user's multiple kinds of interaction (clicking, adding to cart, purchasing). Although much research has focused on multi-behavior modeling, two important challenges remain: 1) identifying the multi-modal relationships among behaviors remains difficult because multi-behavior context information is omitted; 2) supervision signals remain sparse. To address these problems, the application proposes a dual-channel multi-behavior attention contrastive learning method: through the designed self-supervised learning, multi-behavior context information is extracted from a user's different types of interactions, behavior dependencies are provided for the user, and the multiple relations among different behaviors are obtained, enhancing the robustness of the model. The application designs dual-channel self-supervised learning to enhance the data supervision signal. Extensive experiments on two real-world datasets show that the method consistently outperforms state-of-the-art multi-behavior recommendation methods.
With reference to fig. 1, the application provides the dual-channel-based multi-behavior attention self-supervised learning method comprising steps one through nine as set forth in the technical scheme above; the calculation of each step is likewise as described above.
Examples
The application provides a general and flexible multi-behavior relation learning framework, namely a dual-channel-based multi-behavior attention self-supervised learning method. Specifically, the method first proposes a multi-behavior dependency encoder that learns the interdependencies of behaviors by incorporating type-specific behavior representations across different types of user-item interactions. The data-sparsity problem is then addressed through dual-channel multi-behavior self-supervised learning, and comprehensive learning over the multi-type behavior-pattern dependencies is performed to make recommendations. The designed dual-channel multi-behavior-dependent self-supervised learning model parameterizes each type of user-item interaction into an independent dependency representation, learning the user's individual behavior types in an embedding space, and enhances the data supervision signals with a cross-channel self-supervised learning paradigm.
The application provides a multi-behavior attention self-supervision learning method based on double channels, which comprises the following steps:
Step 1, acquiring commodity interaction data sets from Tmall and the CIKM 2019 e-commerce artificial intelligence challenge, with T% of the data selected as training data and the remaining (100-T)% as test data; the training data set comprises the histories of users' various interaction behaviors with commodities and with other users.
Step 2, the user set in the training set is U, U = {u_1, u_2, ..., u_q, ..., u_N}, q ∈ {1, ..., N}, where u_q is the q-th user and N is the number of users. The commodity set is I, I = {i_1, i_2, ..., i_t, ..., i_T}, t ∈ {1, ..., T}, where i_t is the t-th commodity and T is the number of commodities. The behavior set is B, B = {b_1, b_2, ..., b_k, ..., b_K}, k ∈ {1, ..., K}, where b_k is the k-th behavior and K is the number of behaviors.
Step 3, from the sequence-channel perspective, constructing user-commodity interaction sequences according to each user's behavior history. Each element of the user's interaction behavior sequence is a triplet feature vector e = (u_q, b_k, i_x), denoting that user q interacted with item x through the k-th behavior. The user's multi-behavior sequence contains the interaction information of a single user, and is therefore mapped to an initial embedding forming the feature matrix E_q, which contains the commodities with which user q interacted through all behaviors. The feature vectors E_q^k of the auxiliary-behavior interaction sequences and the feature vector E_q^{k'} of the target-behavior interaction sequence serve as the inputs of the multi-behavior interaction-sequence dependency encoder, whose calculation method is consistent with that of attention: Q = E_q^{k'} W_Q and K = E_q^{k} W_K, where W_Q, W_K ∈ R^{d×n} are learnable weight matrices of the behavior vectors; the association matrix M^{k,k'} = Q K^T / √d between auxiliary behavior k and target behavior k' is normalized by softmax to obtain attention scores α^{k,k'} lying in a probability-distribution interval, through which the behavior most similar to the purchasing behavior is obtained; the output is Ê_q^{k} = α^{k,k'} (E_q^{k} W_V), where W_V ∈ R^{d×n} is a learnable weight matrix of the behavior vector. To prevent the over-fitting problem while avoiding excessive computation-time cost, dropout is applied to obtain the final Ê_q^{k}.
Step 4, from the perspective of the sequence channel, because a deep bidirectional model outperforms a unidirectional one, the calculation method of BERT4Rec is introduced: $h=\mathrm{GELU}(W\hat{E}+b)$, where GELU is the Gaussian Error Linear Unit activation function, $W$ represents the weight matrix of the GELU layer, and $b$ represents the bias. Softmax is taken as the output activation function and normalizes the concatenated results of the various behavior sequences. Different users have different behavior interaction sequences and therefore yield different encoding results.
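As an illustrative sketch only (the tanh approximation of GELU and all names are assumptions, not the claimed implementation), the GELU feed-forward step with a softmax output activation over concatenated behavior encodings can be written as:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fuse_behaviors(encoded_seqs, W, b):
    """Concatenate the per-behavior encoding results, pass them through the
    GELU layer (weight W, bias b), and normalize the result with softmax
    as the output activation."""
    H = np.concatenate(encoded_seqs, axis=-1)
    z = gelu(H @ W + b)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```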
Step 5, considering the sequence channel, a self-supervised loss is designed for the characterization result. Specifically, different behaviors of the same user are treated as positive sample pairs $\{(\hat{E}_q^{k},\hat{E}_q^{k'})\mid q\in U\}$, and different behaviors between different users are treated as negative sample pairs $\{(\hat{E}_q^{k},\hat{E}_p^{k'})\mid q,p\in U,\ q\neq p\}$. The self-supervised loss on user behavior is therefore $L_{SCL}=\sum_{q\in U}-\log\frac{\exp\!\big(s(\hat{E}_q^{k},\hat{E}_q^{k'})\big)}{\sum_{p\in U}\exp\!\big(s(\hat{E}_q^{k},\hat{E}_p^{k'})\big)}$, where $s(\cdot,\cdot)$ represents the computed cosine similarity.
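The loss above has the familiar InfoNCE form; a minimal sketch, assuming cosine similarity as the score function and purely illustrative names:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def behavior_contrastive_loss(view_a, view_b):
    """InfoNCE-style loss: view_a[q] and view_b[q] are two behavior
    representations of the same user q (positive pair); pairs across
    different users serve as negatives. Both arrays are [num_users, d]."""
    n = view_a.shape[0]
    total = 0.0
    for q in range(n):
        sims = np.array([cosine(view_a[q], view_b[p]) for p in range(n)])
        e = np.exp(sims - sims.max())        # stable softmax numerator/denominator
        total += -np.log(e[q] / e.sum())     # positive pair against all pairs
    return total / n
```

Aligned views of the same users produce a lower loss than mismatched views, which is the intended supervision signal.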
Step 6, considering the graph channel, sufficient usable information about the user is acquired, since users have a variety of behaviors, including clicking, adding to the shopping cart, collecting, and purchasing. Define $G=(V,E)$, where the node set $V$ contains the user set and the item set, i.e. $u\in U$, $i\in I$, $(u,i)\in V$, and $E$ represents the different interaction behaviors between user nodes and item nodes. Furthermore, the embedding of the multi-behavior graph is composed of multiple behavior-subgraph embeddings, so a behavior-subgraph embedding can be represented as $G_b=(V_b,E_b)$; for example, the subgraph formed by the items a user has clicked is $G_{click}=(V_{click},E_{click})$, where $G_{click}$ represents the item view of user interactions through the click behavior, $V_{click}$ represents the user and item nodes connected by the click behavior, and $E_{click}$ represents the users' click behavior. Graph convolution is directed at learning node characterizations of the graph by aggregating and passing node features. Specifically, for each behavior subgraph an adjacency matrix $A_k$ is formed from the interaction matrix $R_k$: $A_k=\begin{pmatrix}0 & R_k\\ R_k^{\top} & 0\end{pmatrix}$. Each adjacency matrix $A_k$ is taken as the input of the behavior's normalized Laplacian matrix, with the normalization $\hat{A}_k=D_k^{-\frac{1}{2}}(A_k+I_k)D_k^{-\frac{1}{2}}$, where $D_k$ is the degree matrix of behavior $k$ and $I_k$ is the identity matrix of behavior $k$. Because graph convolution captures the high-order dependency relationships of user nodes well, it can better obtain the global characterization of all user nodes for the user's multi-behavior interaction graph.
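A minimal sketch of the adjacency construction and symmetric normalization described above; the bipartite block layout of $A_k$ and all names are assumptions made for illustration:

```python
import numpy as np

def normalized_adjacency(R_k):
    """Build the behavior-k adjacency matrix A_k from the user-item
    interaction matrix R_k ([num_users, num_items]) as a bipartite block
    matrix, then apply symmetric Laplacian normalization with self-loops."""
    n_u, n_i = R_k.shape
    A = np.zeros((n_u + n_i, n_u + n_i))
    A[:n_u, n_u:] = R_k               # user -> item edges
    A[n_u:, :n_u] = R_k.T             # item -> user edges
    A = A + np.eye(n_u + n_i)         # add the identity matrix I_k (self-loops)
    deg = A.sum(axis=1)               # diagonal of the degree matrix D_k
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # D_k^{-1/2} (A_k + I_k) D_k^{-1/2} via row and column scaling
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```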
The output of the graph convolution passes through an activation function: $H_k^{(l+1)}=\sigma\big(\hat{A}_k H_k^{(l)}W_k\big)$, where $H_k^{(l)}$ is the node feature matrix of layer $l$ of the nodes in the graph and $W_k$ is the transformation matrix for behavior-view information passing. The graph convolution stacks $L$ layers, where $L$ denotes the order of neighbor nodes acquired; through this node-information aggregation process, the features of the nodes with respect to behavior $k$ in the graph are obtained, and multi-behavior context information can be preserved.
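The layer-wise propagation rule can be sketched as follows (illustrative only; tanh stands in for the activation function, and the names are assumptions):

```python
import numpy as np

def graph_convolve(A_hat, H0, weights, act=np.tanh):
    """Propagate node features H0 through L graph-convolution layers,
    H^{l+1} = act(A_hat @ H^l @ W_l); `weights` holds the L per-layer
    transformation matrices W_k, so the result aggregates information
    from L-order neighbor nodes."""
    H = H0
    for W in weights:
        H = act(A_hat @ H @ W)
    return H
```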
Step 7, considering the graph channel, the auxiliary behavior graph $H^{k}$ and the target behavior graph $H^{k'}$ serve as the input of attention. The strength of the influence of the auxiliary behavior graph on the target behavior graph is distinguished through attention: $M^{k,k'}=\mathrm{softmax}\!\left(\frac{(H^{k}W_Q)(H^{k'}W_K)^{\top}}{\sqrt{d}}\right)$, where $W_Q\in\mathbb{R}^{d\times n}$ and $W_K\in\mathbb{R}^{d\times n}$ are weight matrices of the behavior matrix that can be continuously and iteratively updated, and $M^{k,k'}$ is the attention correlation-coefficient matrix. The attention calculation process is the same as in step 3 and is regarded as weight-multiplying the auxiliary behavior: $\hat{H}^{k}=M^{k,k'}(H^{k'}W_V)$, where $W_V\in\mathbb{R}^{d\times n}$ is a weight matrix of the behavior matrix that can be continuously and iteratively updated. $\hat{H}^{k}$, the auxiliary-behavior feature matrix of the target behavior, is taken as the final output of the cross-behavior interaction-graph attention encoder.
Step 8, considering the graph channel, self-supervised learning of the multi-behavior interaction graph within the channel is designed, and the supervision signals of the various behavior data are enhanced through this self-supervised learning. Different behavior views of the same user are treated as positive sample pairs $\{(\hat{H}_q^{k},\hat{H}_q^{k'})\mid q\in U\}$, and different behaviors of different users are treated as negative sample pairs $\{(\hat{H}_q^{k},\hat{H}_p^{k'})\mid q,p\in U,\ q\neq p\}$. The mutual information between users is maximized through the self-supervised positive and negative sample pairs: $L_{GCL}=\sum_{q\in U}-\log\frac{\exp\!\big(s(\hat{H}_q^{k},\hat{H}_q^{k'})\big)}{\sum_{p\in U}\exp\!\big(s(\hat{H}_q^{k},\hat{H}_p^{k'})\big)}$. This maximizes the consistency of the two behavior views of the same user and the difference between different users' behaviors, thereby enhancing the behavior-data supervision signals.
Step 9, considered from the perspective of both the sequence channel and the graph channel, the self-supervised learning combined across the two channels is more conducive to enhancing the supervision signals. The sequence-channel and graph-channel representations of the same user are regarded as a positive sample, denoted $(\hat{E}_q,\hat{H}_q)$; the sequence-channel and graph-channel representations of different users are regarded as negative samples, denoted $(\hat{E}_q,\hat{H}_p)$, $q\neq p$. The self-supervised loss is $L_{SGCL}=\sum_{q\in U}-\log\frac{\exp\!\big(s(\hat{E}_q,\hat{H}_q)/\tau\big)}{\sum_{p\in U}\exp\!\big(s(\hat{E}_q,\hat{H}_p)/\tau\big)}$, where $\tau$ is the temperature coefficient that balances the intensity of learning between the two channels. The sum of all self-supervised losses serves as the final target loss: $L_{CL}=L_{SCL}+L_{GCL}+L_{SGCL}$. $L_{SCL}$ is the self-supervised loss of the interaction sequences within the sequence channel mentioned in step 5, and $L_{GCL}$ is the self-supervised loss of the multi-behavior interaction graph in the graph channel mentioned in step 8. The present application thus combines the proposed self-supervised losses: the final total loss function $L_{CL}$ is composed of the per-behavior-pair sequence loss function $L_{SCL}$, the view loss function $L_{GCL}$, and the sequence-view loss function $L_{SGCL}$.
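The cross-channel contrastive term and the final summed objective can be sketched as follows (illustrative; the names and the InfoNCE form are assumptions):

```python
import numpy as np

def cross_channel_loss(seq_emb, graph_emb, tau=0.2):
    """L_SGCL sketch: the sequence-channel and graph-channel embeddings of
    the same user form the positive pair, those of different users the
    negatives; tau is the temperature balancing the two channels."""
    seq = seq_emb / (np.linalg.norm(seq_emb, axis=1, keepdims=True) + 1e-12)
    gra = graph_emb / (np.linalg.norm(graph_emb, axis=1, keepdims=True) + 1e-12)
    sims = (seq @ gra.T) / tau                       # cosine similarity over tau
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    return float(-np.log(np.diag(e) / e.sum(axis=1)).mean())

def total_loss(l_scl, l_gcl, l_sgcl):
    # Final target loss: L_CL = L_SCL + L_GCL + L_SGCL.
    return l_scl + l_gcl + l_sgcl
```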
The application provides an electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the dual-channel-based multi-behavior attention self-supervised learning method.
The present application proposes a computer-readable storage medium for storing computer instructions which, when executed by a processor, implement the steps of the dual-channel-based multi-behavior attention self-supervised learning method.
The memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state disk (SSD)), or the like.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be implemented by integrated logic circuits of hardware in a processor or instructions in software form. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The application has been described in detail with respect to a dual-channel-based multi-behavior attention self-supervised learning method, and specific examples are applied herein to illustrate the principles and embodiments of the application; the above examples serve only to aid in understanding the method and its core ideas. Meanwhile, since those skilled in the art will make variations in the specific embodiments and application scope in accordance with the ideas of the present application, this description should not be construed as limiting the present application.
Claims (10)
1. A multi-behavior attention self-supervision learning method based on double channels, characterized in that the method comprises the following steps:
step one, acquiring a commodity interaction data set from Tmall and the CIKM2019 e-commerce artificial intelligence challenge, and selecting T% of the data as the training data set and (1-T%) of the data as the test data set, wherein the training data set comprises various interaction behavior histories of users with commodities and between users;
step two, the user set in the training data set is $U=\{u_1,u_2,\dots,u_q,\dots,u_N\}$, $q\in\{1,\dots,N\}$, where $u_q$ is the $q$-th user and $N$ is the number of users; the commodity set is $I=\{i_1,i_2,\dots,i_t,\dots,i_T\}$, $t\in\{1,\dots,T\}$, where $i_t$ is the $t$-th commodity and $T$ is the number of commodities; the behavior set is $B=\{b_1,b_2,\dots,b_k,\dots,b_K\}$, $k\in\{1,\dots,K\}$, where $b_k$ is the $k$-th behavior and $K$ is the number of behaviors;
step three, constructing a user-commodity interaction sequence according to the behavior history of the user from the aspect of the sequence channel;
step four, from the perspective of the sequence channel, because a deep bidirectional model outperforms a unidirectional one, the calculation method of BERT4Rec is introduced, $h=\mathrm{GELU}(W\hat{E}+b)$, where GELU is the Gaussian Error Linear Unit activation function, $W$ represents the weight matrix of the GELU layer, and $b$ represents the bias; softmax is used as the output activation function to normalize the concatenated results of the various behavior sequences; different users have different behavior interaction sequences, resulting in different encoding results;
step five, considering from a sequence channel, designing self-supervision loss according to the characterization result;
step six, considering the graph channel, acquiring sufficient usable information about the user; the user has various behaviors, including clicking, adding to a shopping cart, collecting, and purchasing; define $G=(V,E)$, where the node set $V$ contains the user set and the item set, i.e. $u\in U$, $i\in I$, $(u,i)\in V$, and $E$ represents the different interaction behaviors between user nodes and item nodes; the embedding of the multi-behavior graph is composed of multiple behavior-subgraph embeddings, and a behavior-subgraph embedding is expressed as $G_b=(V_b,E_b)$;
step seven, considering the graph channel, taking the auxiliary behavior graph $H^{k}$ and the target behavior graph $H^{k'}$ as the input of attention;
step eight, considering the graph channel, designing self-supervision learning of a multi-behavior interaction graph in the channel, and enhancing various behavior data supervision signals through the self-supervision learning;
and step nine, from the consideration of a sequence channel and a graph channel, enhancing a supervision signal through self-supervision learning combined by two channels.
2. The method according to claim 1, characterized in that: in step three, each element in the user interaction behavior sequence is set as the feature vector of a triplet, $e_{q,x}^{k}$, representing user $q$ interacting with item $x$ through the $k$-th behavior; the multi-behavior sequence of the user contains the interaction information of a single user, and the user's multi-behavior interaction sequence is mapped into an initially embedded feature matrix $E_q$, which contains the commodities that user $q$ interacts with through all behaviors; the feature vector $E_q^{k}$ of an auxiliary-behavior interaction sequence and the feature vector $E_q^{k'}$ of the target-behavior interaction sequence serve as the input of the multi-behavior interaction-sequence-dependent encoder.
3. The method according to claim 2, characterized in that: the feature vector of each auxiliary behavior and the feature vector of the target behavior are calculated by $A^{k,k'}=\frac{(E_q^{k}W_Q)(E_q^{k'}W_K)^{\top}}{\sqrt{d}}$, where $W_Q,W_K\in\mathbb{R}^{d\times n}$ are learnable weight matrices of the behavior vectors, $(\cdot)^{\top}$ denotes the transpose, and $A^{k,k'}$ represents the association matrix between the auxiliary behavior $k$ and the target behavior $k'$; each association matrix is normalized by softmax to obtain attention scores $\alpha^{k,k'}=\mathrm{softmax}(A^{k,k'})$ whose values conform to a probability distribution; the softmax obtains the behavior most similar to the purchasing behavior by calculating cosine similarity; the attended output is $\hat{E}_q^{k}=\alpha^{k,k'}(E_q^{k'}W_V)$, where $W_V\in\mathbb{R}^{d\times n}$ is a learnable weight matrix of the behavior vectors.
4. The method according to claim 3, characterized in that: in step five, different behaviors of the same user are treated as positive sample pairs $\{(\hat{E}_q^{k},\hat{E}_q^{k'})\mid q\in U\}$, and different behaviors between different users as negative sample pairs $\{(\hat{E}_q^{k},\hat{E}_p^{k'})\mid q,p\in U,\ q\neq p\}$; whereby the self-supervised loss on user behavior is $L_{SCL}=\sum_{q\in U}-\log\frac{\exp\!\big(s(\hat{E}_q^{k},\hat{E}_q^{k'})\big)}{\sum_{p\in U}\exp\!\big(s(\hat{E}_q^{k},\hat{E}_p^{k'})\big)}$, where $s(\cdot,\cdot)$ represents the computed cosine similarity.
5. The method according to claim 4, characterized in that: in step six, graph convolution is used for learning node characterizations of the graph, aggregating and transmitting node features; the process of graph convolution is specifically: for each behavior subgraph an adjacency matrix $A_k$ is formed from the interaction matrix $R_k$, $A_k=\begin{pmatrix}0 & R_k\\ R_k^{\top} & 0\end{pmatrix}$; each adjacency matrix $A_k$ is taken as the input of the behavior's normalized Laplacian matrix, with the normalization $\hat{A}_k=D_k^{-\frac{1}{2}}(A_k+I_k)D_k^{-\frac{1}{2}}$, where $D_k$ is the degree matrix of behavior $k$ and $I_k$ is the identity matrix of behavior $k$; the output of the graph convolution passes through an activation function, $H_k^{(l+1)}=\sigma\big(\hat{A}_k H_k^{(l)}W_k\big)$, where $H_k^{(l)}$ is the node feature matrix of layer $l$ of the nodes in the graph and $W_k$ is the transformation matrix for behavior-view information transfer; the graph convolution stacks $L$ layers, where $L$ denotes the order of neighbor nodes acquired; through the node-information aggregation process, the features of the nodes with respect to behavior $k$ in the graph are obtained, and multi-behavior context information can be stored.
6. The method according to claim 5, characterized in that: in step seven, the process of distinguishing the influence strength of the auxiliary behavior graph on the target behavior graph through attention is $M^{k,k'}=\mathrm{softmax}\!\left(\frac{(H^{k}W_Q)(H^{k'}W_K)^{\top}}{\sqrt{d}}\right)$, where $W_Q\in\mathbb{R}^{d\times n}$ and $W_K\in\mathbb{R}^{d\times n}$ are weight matrices of the behavior matrix that can be continuously and iteratively updated, and $M^{k,k'}$ is the attention correlation-coefficient matrix; the attention calculation process is the same as in step three and is regarded as weight-multiplying the auxiliary behavior, $\hat{H}^{k}=M^{k,k'}(H^{k'}W_V)$, where $W_V\in\mathbb{R}^{d\times n}$ is a weight matrix of the behavior matrix that can be continuously and iteratively updated; $\hat{H}^{k}$, the auxiliary-behavior feature matrix of the target behavior, is taken as the final output of the cross-behavior interaction-graph attention encoder.
7. The method according to claim 6, characterized in that: in step eight, different behavior views of the same user are treated as positive sample pairs $\{(\hat{H}_q^{k},\hat{H}_q^{k'})\mid q\in U\}$, and different behaviors of different users are regarded as negative sample pairs $\{(\hat{H}_q^{k},\hat{H}_p^{k'})\mid q,p\in U,\ q\neq p\}$; the mutual information between users is maximized through the self-supervised positive and negative sample pairs, $L_{GCL}=\sum_{q\in U}-\log\frac{\exp\!\big(s(\hat{H}_q^{k},\hat{H}_q^{k'})\big)}{\sum_{p\in U}\exp\!\big(s(\hat{H}_q^{k},\hat{H}_p^{k'})\big)}$; the consistency of the two behavior views of the same user and the difference between different users' behaviors are maximized, and the behavior-data supervision signals are enhanced.
8. The method according to claim 7, characterized in that: in step nine, the sequence-channel and graph-channel representations of the same user are regarded as a positive sample, denoted $(\hat{E}_q,\hat{H}_q)$; the sequence-channel and graph-channel representations of different users are regarded as negative samples, denoted $(\hat{E}_q,\hat{H}_p)$, $q\neq p$; the self-supervised loss is $L_{SGCL}=\sum_{q\in U}-\log\frac{\exp\!\big(s(\hat{E}_q,\hat{H}_q)/\tau\big)}{\sum_{p\in U}\exp\!\big(s(\hat{E}_q,\hat{H}_p)/\tau\big)}$, where $\tau$ is the temperature coefficient that balances the intensity of learning between the two channels; the sum of all self-supervised losses serves as the final target loss, $L_{CL}=L_{SCL}+L_{GCL}+L_{SGCL}$; $L_{SCL}$ is the self-supervised loss of the multi-behavior interaction sequences in the sequence channel; $L_{GCL}$ is the self-supervised loss of the multi-behavior interaction graph in the graph channel; the final total loss function $L_{CL}$ is composed of the per-behavior-pair sequence loss function $L_{SCL}$, the view loss function $L_{GCL}$, and the sequence-view loss function $L_{SGCL}$.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1-8 when the computer program is executed.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310823335.2A CN116894476A (en) | 2023-07-06 | 2023-07-06 | Multi-behavior attention self-supervision learning method based on double channels |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116894476A true CN116894476A (en) | 2023-10-17 |
Family
ID=88313109
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117574844A (en) * | 2023-11-23 | 2024-02-20 | 华南理工大学 | Self-supervision learning DTCO process parameter performance specification feedback method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||