Disclosure of Invention
In view of the foregoing, it is necessary to provide a cross-domain recommendation method fusing tags and an attention mechanism, and a system implementing it, which diversifies the features available to the recommendation structure by analyzing user preferences in different domains.
A cross-domain recommendation method integrating labels and attention mechanisms comprises the following steps:
step one, selecting and constructing a cross-domain fusion label: fusing two similar domains into a new domain, taking the new domain as the source domain of the cross-domain recommendation, and carrying out weighted summation over the label vectors of the source domain and the target domain respectively to obtain a resource vector for each resource;
step two, according to an interest mining algorithm based on an attention mechanism, learning the time-sequence relation between a user and a resource through the long short-term memory algorithm LSTM, and introducing the attention mechanism to obtain the user's preference over a specified time period in the source field and the target field;
step three, according to a cross-domain label mapping algorithm based on a BP neural network, learning the label mapping between the source domain and the target domain through a three-layer BP neural network; after the user's source-domain preference vector is mapped to the target domain, it is weighted and summed with the user's preference vector in the target domain to obtain the user's comprehensive preference in the target domain;
and step four, according to a cross-domain recommendation algorithm fusing tag mapping and the attention mechanism, calculating the similarity between each resource vector the user has not browsed in the target domain and the user's comprehensive preference obtained from the source domain and the target domain, and recommending the top N items to the user.
Further, the selecting and constructing a cross-domain fusion label flow in the step one includes:
step 1-1, preprocessing data in a source field and a target field;
collecting commonly used labels in two similar fields A and B and in a target field C as custom labels DT, retrieving the resources related to the custom labels to obtain the resource labels RT corresponding to those resources, and removing repeated labels to obtain an RT-DT matrix, wherein each column of the matrix is a custom label DT vector and each row is a resource label RT vector, and the resulting custom label vectors are normalized to a uniform measure;
step 1-2, selecting and constructing a cross-domain fusion tag;
analyzing the similarity between the same custom tag DT vector and the same resource tag RT vector in the A and B fields, and calculating, with cosine similarity, the similarity of the same DT vector and the same RT vector across the A-field and B-field resources, as shown in formula (1):
wherein t_A and t_B represent the same DT or RT in the A field and the B field respectively, and their vector representations are the vectors compared; to ensure recommendation quality, labels with similarity below a threshold are culled;
in the tag vectors constructed for each field, the custom tags DT and the resource tags RT are filtered, and the same resource tag RT vectors are weighted and summed according to the users' interests in each field to obtain the A-B field tag vector matrix, as shown in formulas (2) and (3):
wherein t_AB is the cross-domain fusion tag and the summed terms are the vector representations of the corresponding labels.
The resource labels RT of each resource are ordered by the number of times each label has been marked by users; labels nearer the front are more closely related to the resource than labels nearer the back, and the labels of each resource are assigned weights according to formula (4):
The weight corresponding to each tag vector of every resource in the A-B field and the C field is obtained using formula (4), and the tag vectors are then weighted and summed to obtain the resource vector of each resource, as shown in formula (5):
wherein the summed terms are the tag vectors of each tag of the resource, and the result is the resource vector of the resource.
Further, the data preprocessing of the source domain and the target domain in the step 1-1 comprises the following steps:
step 1-1-1, respectively acquiring labels from the texts of the A-field and B-field resources using the TF-IDF technique, and extracting from them the m commonly used labels that appear in both fields, namely the custom labels DT, which represent resource characteristics well and are recorded as DTs; then retrieving the resources related to DTs in the A and B fields and reading off the label corresponding to each resource, namely the resource label RT, recorded as RTs;
step 1-1-2, collecting the N labels commonly used by users from the C field using the TF-IDF technique, recorded as DTt; then retrieving the resources related to DTt and reading off the label corresponding to each resource, recorded as RTt;
step 1-1-3, for the collected labels, performing word segmentation with the NLPIR Chinese word segmentation system, removing repeated labels, and counting the frequency with which each RT occurs in the resources corresponding to each DT, wherein the larger a component's value, the closer the relation between that RT and DT;
step 1-1-4, because the total number of resources retrieved by different DTs is different, dividing each component of all DT vectors by the largest component of the vector to obtain a DT vector with uniform metrics.
Further, the interest mining algorithm based on the attention mechanism in the second step includes the following steps:
learning the timing relationship between the user and the resources through the long short-term memory algorithm, assuming the memory cell layer is updated once per time step t, wherein x_t is the input to the memory cell layer at time t, W and U are weight matrices, and b is a bias vector; the method comprises the following steps:
step 2-1, at each time step t, multiplying the input information of the input gate by its weight and adding the bias, thereby calculating the input gate's control variable and the new input vector, as shown in formulas (6) and (7):
step 2-2, at each time step t, computing the forget gate value from its input information and weight, multiplying the previous memory cell state by the forget gate value, adding the product of the input gate's control variable and the candidate state, and thereby updating the memory cell state from its previous value to its new value, as shown in formulas (8) and (9):
step 2-3, with the memory cell state updated, calculating the value of the output gate, as shown in formulas (10) and (11):
step 2-4, calculating a loss function and minimizing it using a gradient descent method, as shown in formula (12):
After computation by the LSTM layer of the long short-term memory algorithm, the hidden state of each LSTM time step is output to the attention layer so as to capture the dependency relationships within the sequence; a weighted summation then yields the context vector representation corresponding to output position i, as shown in formulas (13) and (14), wherein h_i is the output of the i-th LSTM time step, α_ij is the normalized weight between the outputs of time steps i and j, and the similarity calculation function is a matrix transformation represented by W.
Further, according to the interest mining algorithm based on the attention mechanism, after all resource vectors of the histories of the user in the source domain and the target domain are calculated through the LSTM layer and the attention layer respectively, preference vectors of the user in the source domain and the target domain are obtained.
Further, the cross-domain label mapping algorithm based on the BP neural network in the third step comprises the following procedures:
the label mapping between different fields is learned through a three-layer BP neural network, firstly, n labels which are used most by a user are respectively collected in an A-B field and a C field, the weight of each label in the two fields is respectively calculated according to a formula (4), and then the feature vectors of the user in the A-B field and the C field are respectively calculated through the weights, wherein the feature vectors are specifically shown as a formula (15) and a formula (16):
wherein u_AB is the feature vector of the user in the A-B field, computed from the weight and the feature vector of each label in the A-B field, and u_C is the feature vector of the user in the C field, computed from the weight and the feature vector of each label in the C field;
then, taking u_AB as the input vector and u_C as the actual output vector, the mapping between u_AB and u_C is learned, as shown in formulas (17)-(21):
the mapping relation between the input layer and the hidden layer is shown in the formula (17):
the activation function through the hidden layer is shown in equation (18):
the mapping relationship from the hidden layer to the output layer is shown in the formula (19):
the activation function through the output layer is shown in equation (20):
the loss function of the BP neural network is shown as a formula (21):
wherein the symbols in formulas (15) and (16) are as defined above, and n, the number of hidden layer units of the BP neural network, is determined from the number of input layer units and the number of output layer units.
Further, the number n of hidden layer units of the BP neural network is determined by the golden section method.
Further, the cross-domain recommendation algorithm fusing the tag mapping and the attention mechanism in the fourth step specifically includes the following steps:
step 4-1, recording the user preference vector of the A-B field history after processing by the LSTM layer and the attention layer as u_s, the user preference vector of the C field history after processing by the LSTM network as u_t, and the mapping network between the source field and the target field as f, then weighting and summing f(u_s) and u_t to obtain the final user resource vector, as shown in formula (22), wherein u denotes the user resource vector, m denotes the number of target-field resources in the user history, and n denotes the number of source-field resources in the user history;
step 4-2, after the final user resource vector is obtained, calculating the similarity between the user resource vector and the target-field resource vectors in the resource library, as shown in formula (23):
wherein u denotes the user resource vector and r denotes a resource vector of the target field in the resource library.
Further, in the calculation of the final user resource vector, the amount the user has browsed in the target field directly affects the similarity computed for resource vectors the user has not browsed there. When the user has browsed little data in the target field, the user's preference in the target field can be obtained by mapping the user's source-field preference, from which the similarity to un-browsed target-field resource vectors is calculated; when the user's browsing data in the target field far exceeds that in the source field, there is no data sparsity, and the result is equivalent to obtaining the final recommendation directly from the browsed target-field resource vectors.
And a system for implementing a cross-domain recommendation method of a fusion tag and an attention mechanism, where the system is configured to implement a cross-domain recommendation method of a fusion tag and an attention mechanism according to any one of the above, and the system includes:
a tag-based cross-domain resource fusion algorithm module, configured to use the commonly used custom labels of the two similar fields A and B to obtain the resource labels of the resources corresponding to those custom labels, to analyze the similarity between the same custom label DT vector and the same resource label RT vector in the A and B fields, and to eliminate resource label vectors whose similarity is below a threshold; and to weight and sum, respectively, the resource tag vectors of the source field formed from the two similar fields A and B and of the target field C, obtaining the resource vector of each resource in the source field and the target field;
an interest mining algorithm module based on the attention mechanism, configured to obtain the user's preference vectors in the source field and the target field from all resource vectors of the user's history records in the source field and the target field, through deep learning with the LSTM layer of the long short-term memory algorithm and the attention layer;
the cross-domain label mapping algorithm module based on the BP neural network is used for mapping the preference vector of the user source domain to the target domain by using the three-layer BP neural network to obtain the comprehensive preference of the user in the target domain;
and the cross-domain recommendation algorithm module is used for combining the preferences of the user in the source domain and the target domain to obtain the comprehensive preferences of the user in the target domain, and then calculating the similarity between the resource vector which is not browsed in the target domain by the user and the comprehensive preferences of the user to obtain a recommendation list of the user in the target domain.
In the cross-domain recommendation method fusing tags and the attention mechanism, and its implementation system, the cross-domain fusion tags are constructed by fusing the item tags of two similar fields; the two similar fields are fused into a new field, which serves as the source field of the cross-domain recommendation, while a field with lower correlation serves as the target field, so that the data of the source field is richer. By introducing the attention mechanism, the user's current preference is calculated in the source field and the target field respectively, making the recommendation result more timely. The mapping relation between the user's overall preferences in the source field and the target field is obtained through the BP neural network and applied to the user's current preference, mapping it from the source field to the target field; combined with the target field's independent recommendation result, the final recommendation is more accurate.
Detailed Description
In this embodiment, a cross-domain recommendation method and an implementation system thereof that integrate a label and an attention mechanism are taken as an example, and the present invention will be described in detail below with reference to specific embodiments and accompanying drawings.
Referring to fig. 1 and fig. 2, a cross-domain recommendation method for fusing a tag and an attention mechanism is shown in an embodiment of the present invention.
The cross-domain resource recommendation is realized by using the labels and the attention mechanism, and the comprehensive preference of the user in the target domain is obtained by mapping the preference of the user in the source domain to the target domain and combining the preference of the user in the target domain.
Because of the sparsity of the data in a single field, the accuracy of recommendation is reduced, and if the data in a plurality of fields can be combined, the reliability of a recommendation result can be greatly improved. If the data of the user in the target field is sparse, the preference of the source field can be obtained through the mapping network to obtain the corresponding target field preference, and if the data of the user in the source field is sparse, the data of the user in the source field can be solved by fusing a similar field. If the data richness of the user in the target field is much higher than the data of the user in the fused source field, the preference of the user can be learned directly through the data of the user in the target field.
Two similar fields are fused into a new field by constructing a cross-field label vector, and the new field is used as a source field of cross-field recommendation. Firstly, extracting labels of each field by using a natural language processing method, customizing corresponding label vectors, fusing the label vectors with similarity exceeding a specified threshold in the two fields, and customizing cross-field fusion labels. Second, by introducing an attention mechanism in combination with LSTM, user preferences over a specified period of time are calculated in the source domain and the target domain, respectively. And thirdly, a BP neural network is used for learning a mapping relation between the source domain label and the target domain label, and data of the target domain and the source domain are combined, so that a recommendation result is more accurate. Finally, an overall framework of a cross-domain recommendation model is described, the obtained user source domain preferences are mapped to the target domain through a learned mapping network, and a final result is obtained by combining the user source domain preferences with the user target domain preferences.
1. Cross-domain resource fusion of labels
The fields are not isolated, a certain relation exists between the fields, and when the similarity exceeds a certain degree, the fields can be combined into a new field. The number of resources in the new domain is the sum of the number of resources in two similar domains, and the sum is used as a source domain recommended across domains, so that the information of the source domain can be enriched.
1.1 data Pre-processing
Let A and B be two similar fields, take the combined field of A and B as the source field, and take another field C with a larger span as the target field. The processing steps are as follows:
1) The TF-IDF technology is used for respectively acquiring labels from texts of the A and B domain resources, and then m labels which are commonly used by users and are simultaneously displayed in the two domains, namely custom labels (DT for short), are extracted from the acquired labels, and the labels can well represent the resource characteristics and are marked as DTs. Then, the resources related to DT are searched in A, B field, and then the label corresponding to each resource, i.e. the resource label (RT for short), is displayed and is recorded as the RTs.
2) The TF-IDF technique is used to collect the N labels commonly used by users from the C field, denoted DTt. The resources related to DTt are then retrieved, and the label corresponding to each resource is read off and recorded as RTt.
3) And for the collected labels, finishing word segmentation by using an NLPIR Chinese word segmentation system, removing repeated labels, and counting the occurrence frequency of each RT in the resources corresponding to each DT, wherein the larger the vector value is, the tighter the relation between RT and DT is.
4) Because the total number of resources retrieved by different DTs is different, dividing each component of all DT vectors by the largest component of the vector results in a uniformly measured DT vector.
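The normalization in steps 3) and 4) can be sketched as follows; the co-occurrence counts and tag names are illustrative, not the patent's data:

```python
def build_dt_vectors(cooccurrence):
    """Turn each custom tag's (DT's) vector of RT co-occurrence counts
    into a uniformly measured vector, per step 4): every component is
    divided by the vector's largest component, so DT vectors retrieved
    over different resource totals share a [0, 1] scale."""
    normalized = {}
    for dt, counts in cooccurrence.items():
        peak = max(counts)
        normalized[dt] = [c / peak for c in counts]
    return normalized

# Illustrative counts: keys are DT tags, entries are per-RT frequencies.
counts = {"history": [8, 2, 4], "sci-fi": [1, 5, 10]}
vectors = build_dt_vectors(counts)
```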
1.2 selection and construction of Cross-Domain fusion tags
We need to analyze the correlation of the same DT vector and the same RT vector across the A and B fields; only if the correlation is large enough can they be exploited for cross-domain recommendation. If only single components of the same DT vector or the same RT vector in the A field and the B field are similar, the two cannot be said to be highly correlated; but if all components are similar, their correlation is considered sufficiently high. The similarity of the same DT vector and the same RT vector across the A and B resources is calculated using cosine similarity, as shown in formula (1).
wherein t_A and t_B represent the same DT or RT in the A field and the B field respectively, and their vector representations are the vectors compared. To ensure recommendation quality, labels with similarity below a threshold are culled.
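A minimal sketch of this similarity screening, assuming formula (1) is the standard cosine similarity; the tag names, example vectors, and the 0.8 threshold are illustrative, not taken from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two tag vectors, as in formula (1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cull_dissimilar(pairs, threshold):
    """Keep only tags whose A-field and B-field vectors are similar
    enough to fuse; the threshold is a tunable assumption."""
    return [tag for tag, (va, vb) in pairs.items()
            if cosine_similarity(va, vb) >= threshold]

# Hypothetical tags with their A-field and B-field vectors.
pairs = {"adventure": ([1.0, 0.5], [0.9, 0.6]),
         "noir": ([1.0, 0.0], [0.0, 1.0])}
kept = cull_dissimilar(pairs, threshold=0.8)
```

Only tags that survive this screen participate in the fusion of formulas (2) and (3).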
After the label vector of each field is constructed and the DT and RT are screened, the same RT vector is weighted and summed according to the interests of the user in each field, and the A-B field label vector matrix is obtained, as shown in the formula (2) and the formula (3).
wherein t_AB is the cross-domain fusion tag and the summed terms are the vector representations of the corresponding labels.
The RT of the resource is ordered according to the number of times the tag is marked by the user, the preceding tag is more closely related to the resource than the following tag, and the tag of each resource is assigned a weight according to equation (4).
And (3) respectively using the formula (4) to obtain the weight corresponding to each tag vector of all the resources in the A-B field and the C field, and then carrying out weighted summation on the tag vectors to obtain the resource vector of each resource, wherein the weight is shown in the formula (5).
wherein the summed terms are the tag vectors of each tag of the resource, and the result is the resource vector of the resource; the process is shown in algorithm 1.
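The weighted summation of formulas (4) and (5) might be sketched as follows; since the exact rank decay of formula (4) is not reproduced in the text, a normalized 1/rank weighting is assumed here:

```python
def rank_weights(num_tags):
    """Hypothetical rank-based weights for formula (4): tags marked
    more often (earlier in the ordering) get larger weight; the
    exact decay of the patent's formula is not known, so a
    normalized 1/rank scheme is assumed."""
    raw = [1.0 / (rank + 1) for rank in range(num_tags)]
    total = sum(raw)
    return [w / total for w in raw]

def resource_vector(tag_vectors):
    """Weighted sum of a resource's ordered tag vectors, per formula (5)."""
    weights = rank_weights(len(tag_vectors))
    dim = len(tag_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, tag_vectors))
            for i in range(dim)]

# Two tag vectors, ordered by how often users marked each tag.
vec = resource_vector([[1.0, 0.0], [0.0, 1.0]])
```

Whatever decay is used, the weights sum to one, so the resource vector stays on the same scale as its tag vectors.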
2. Interest mining algorithm based on attention mechanism
The attention points of different people to the resources are different, some people may like the resources of the history subject, some people like the resources of the science fiction class, and some people like the resources of the reasoning class. To accurately discover the user's preferences, the present software uses LSTM to learn the timing relationship between the user and the resource. See in particular the model below.
Assume that the memory cell layer is updated once per time step t, and assume:
1) x_t is the input to the memory cell layer at time t, i.e. the user's browsing record.
2) W and U are weight matrices.
3) b is a bias vector. The method comprises the following steps:
(1) at each time step t, the input information of the input gate is multiplied by its weight and the bias is added, yielding the input gate's control variable and the new input vector, as shown in formulas (6) and (7).
(2) at each time step t, the forget gate value is computed from its input information and weight, the previous memory cell state is multiplied by the forget gate value, the product of the input gate's control variable and the candidate state is added, and the memory cell state is thereby updated from its previous value to its new value, as shown in formulas (8) and (9).
(3) with the memory cell state updated, the output gate value is then calculated, as shown in the following formulas (10) and (11).
(4) The loss function is calculated and minimized using a gradient descent method as shown in equation (12).
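Steps (1)-(3) follow the standard LSTM cell update; a scalar pure-Python sketch is given below. The parameter values are illustrative initializations, not trained weights from the method:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step: input gate i and candidate g (formulas
    (6)-(7)), forget gate f and cell update (formulas (8)-(9)),
    output gate o and hidden state h (formulas (10)-(11))."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # formula (6)
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # formula (7)
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # formula (8)
    c = f * c_prev + i * g                                   # formula (9)
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # formula (10)
    h = o * math.tanh(c)                                     # formula (11)
    return h, c

# Illustrative parameters: every weight and bias set to 0.5.
params = {k: 0.5 for k in
          ("wi", "ui", "bi", "wg", "ug", "bg",
           "wf", "uf", "bf", "wo", "uo", "bo")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, p=params)
```

In the method, x would be an embedded browsing record and the scalars become the matrix-vector products of W, U, and b.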
After the LSTM layer, the hidden state of each time step is an output result; the output of each time step is fed to the attention layer, which captures the dependency relationships within the sequence, and after weighted summation the context vector representation corresponding to output position i is obtained. The specific formulas are shown in formulas (13) and (14), wherein h_i is the output of the i-th LSTM time step, α_ij is the normalized weight between the outputs of time steps i and j, and the similarity calculation function is a matrix transformation represented by W. Finally, a fully connected layer yields the final output probability. This process is shown in algorithm 2.
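The attention layer of formulas (13) and (14) can be sketched as a softmax-weighted sum over hidden states; a plain dot-product score stands in here for the W-parameterized similarity function, which is an assumption:

```python
import math

def attention_context(hidden_states, query):
    """Score each LSTM hidden state against a query, softmax-normalize
    the scores into weights (formula (13)), and return the weighted
    sum as the context vector (formula (14))."""
    scores = [sum(q * h for q, h in zip(query, state))
              for state in hidden_states]
    peak = max(scores)                      # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights

# Two illustrative hidden states; the query favors the first one.
states = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention_context(states, query=[1.0, 0.0])
```

Hidden states most similar to the query dominate the context vector, which is what lets the model emphasize the browsing records relevant to the user's current interest.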
All resource vectors of the histories of the user in the source domain and the target domain respectively pass through the LSTM layer and the attention layer to obtain preference vectors of the user in the source domain and the target domain, the preference vectors of the user in the source domain are mapped to the target domain, and a final result is obtained by combining the preference of the user in the target domain, and the mapping process is specifically described in the next part.
3. Cross-domain label mapping algorithm based on BP neural network
There is also a certain correspondence between the interests of the user among different fields. For example, users who like to watch comedy are mostly biased to listen to rock-and-roll type music, while users who like to watch horror generally like to listen to music with more pronounced melody changes. The label mapping among different fields is learned through a three-layer BP neural network, firstly, n labels which are used most by a user are respectively collected in the A-B field and the C field, the weight of each label in the two fields is respectively calculated according to a formula (4), and then the feature vectors of the user in the A-B field and the C field are respectively calculated in a weighted mode, as shown in a formula (15) and a formula (16).
wherein u_AB is the feature vector of the user in the A-B field, computed from the weight and the feature vector of each label in the A-B field, and u_C is the feature vector of the user in the C field, computed from the weight and the feature vector of each label in the C field.
Then, taking u_AB as the input vector and u_C as the actual output vector, the mapping between u_AB and u_C is learned, as shown in formulas (17)-(21):
1) The mapping relation between the input layer and the hidden layer is shown in the formula (17):
2) The activation function through the hidden layer is shown in equation (18):
3) The mapping relationship from the hidden layer to the output layer is shown in the formula (19):
4) The activation function through the output layer is shown in equation (20):
5) The loss function of the BP neural network is shown as a formula (21):
wherein the symbols in formulas (15) and (16) are as defined above; the concrete model structure is shown in fig. 2.
In the BP neural network, the number of hidden layer units is directly related to the demands of the problem and to the numbers of input and output units: with too few units, too little information is captured and under-fitting occurs; with too many, training time grows, over-fitting becomes likely, and generalization suffers. The software therefore uses the golden section method to determine the number of hidden layer units from the number of input layer units and the number of output layer units. Finally, a gradient descent method is used to minimize the loss function. This process is shown in algorithm 3.
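A sketch of the three-layer forward pass and loss of formulas (17)-(21); since the original formulas are not reproduced, the sigmoid activations, the 1/2 factor in the squared-error loss, and all weight values are assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Three-layer forward pass: input -> hidden (formulas (17)-(18)),
    hidden -> output (formulas (19)-(20)); sigmoid activations assumed."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]

def mse_loss(pred, target):
    """Squared-error loss as in formula (21); the 1/2 factor is assumed."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))

# Illustrative weights mapping a 2-d source-field feature vector
# to a 2-d target-field feature vector through 3 hidden units.
w_hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
w_out = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
out = forward([1.0, 0.5], w_hidden, w_out)
loss = mse_loss(out, [1.0, 0.0])
```

Gradient descent on this loss, as the text states, is what trains the mapping network between u_AB and u_C.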
After mapping the preference vector of the user source domain to the target domain, the preference vector of the user in the target domain is weighted and summed with the preference vector of the user in the target domain in a certain manner to obtain the comprehensive preference of the user in the target domain, and the process is described in detail in the next part.
4. Cross-domain recommendation framework integrating tag mapping and attention mechanisms
Building on the first three sections, the cross-domain recommendation algorithm fusing tag mapping and the attention mechanism is now given. After the user's source-field preference vector is mapped to the target field, it must be weighted and summed with the user's target-field preference vector in a suitable way.
The point of cross-domain recommendation is to overcome the sparsity of the user's data in the target field. When the user's target-field data is rich enough relative to the source field, the target-field data should carry more weight, so the weights can be determined by the ratio of the user's resource counts in the source and target fields. The overall framework of the algorithm is shown in figure 1.
Record the user preference vector of the A-B field history after processing by the LSTM layer and the attention layer as u_s, the user preference vector of the C field history after processing by the LSTM network as u_t, and the mapping network between the source field and the target field as f; f(u_s) and u_t are weighted and summed to obtain the final user resource vector, as shown in formula (22).
wherein u denotes the user resource vector, m denotes the number of target-field resources in the user history, and n denotes the number of source-field resources in the user history. If the user has browsed little data in the target field, the target-field preference can be obtained by mapping the source-field preference; if the user's data in the target field far exceeds that in the source field, there is no data sparsity, and the result is equivalent to obtaining the final result directly from single-field recommendation in the target field.
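The count-ratio weighting described above might look like the following sketch, assuming formula (22) weights the two preference vectors by the resource counts of the target and source fields:

```python
def fuse_preferences(u_target, u_source_mapped, m, n):
    """Weight the target-field preference vector by the user's
    target-field resource count m, and the mapped source-field
    preference by the source-field count n; this count-ratio
    weighting is the assumption stated in the text, the exact
    form of formula (22) being unavailable."""
    total = m + n
    return [(m * t + n * s) / total
            for t, s in zip(u_target, u_source_mapped)]

# A sparse target-field history (m=2) leans on the mapped
# source-field preference (n=8).
u = fuse_preferences([1.0, 0.0], [0.0, 1.0], m=2, n=8)
```

As m grows relative to n, the fusion smoothly approaches single-field recommendation in the target field, matching the sparsity discussion above.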
After the final user resource vector is calculated, the similarity between it and the target-field resource vectors in the resource library is calculated using formula (23), and the top-N list is recommended to the user.
wherein u denotes the user resource vector and r denotes a resource vector of the target field in the resource library. This process is shown in algorithm 4.
Algorithm 4 combines the user's source-field preference with the target-field preference to obtain the user's comprehensive preference in the target field, calculates the similarity between that comprehensive preference and the resource vectors the user has not browsed in the target field, and recommends the top N items to the user. The field with richer resources has the greater influence on the final result.
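Algorithm 4's final ranking step can be sketched as follows, assuming formula (23) is cosine similarity over the un-browsed target-field resource vectors; resource names and vectors are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity, as assumed for formula (23)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def top_n(user_vector, unseen, n):
    """Rank un-browsed target-field resource vectors by similarity
    to the user's comprehensive preference and return the n best."""
    ranked = sorted(unseen.items(),
                    key=lambda item: cosine(user_vector, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical un-browsed resources in the target field.
unseen = {"r1": [1.0, 0.1], "r2": [0.0, 1.0], "r3": [0.9, 0.2]}
recs = top_n([1.0, 0.0], unseen, n=2)
```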
It should be noted that the above-mentioned embodiments are merely preferred embodiments of the present invention, and are not intended to limit the present invention, but various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.