CN111291261B - A cross-domain recommendation method and its implementation system that integrates labels and attention mechanisms


Info

Publication number: CN111291261B
Application number: CN202010068923.6A
Authority: CN (China)
Prior art keywords: domain, user, resource, vector, label
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111291261A
Inventors: 钱忠胜, 涂宇, 朱懿敏
Current assignees: Dragon Totem Technology Hefei Co., Ltd.; Shanghai Juhui Network Technology Co., Ltd.
Original assignee: Jiangxi University of Finance and Economics
Application filed by Jiangxi University of Finance and Economics; priority to CN202010068923.6A
Published as CN111291261A (application) and CN111291261B (grant)

Classifications

    • G06F16/9535 — Information retrieval; search customisation based on user profiles and personalisation
    • G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06F2216/03 — Indexing scheme for information retrieval; data mining
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present invention discloses a cross-domain recommendation method integrating tags and attention mechanisms, and an implementation system thereof. The method comprises: first, selecting and constructing cross-domain fusion tags, and computing weighted sums of the tag vectors of the source domain and of the target domain, respectively, to obtain resource vectors; second, obtaining the user's preferences in the source domain and the target domain with an interest-mining algorithm based on an attention mechanism; third, learning the tag mapping between the source domain and the target domain with a cross-domain tag-mapping algorithm based on a BP neural network, to obtain the user's comprehensive preference in the target domain; and finally, through a cross-domain recommendation algorithm integrating tag mapping and the attention mechanism, recommending to the user those target-domain items that are highly similar to the user's comprehensive preference. By jointly considering the user's preferences across domains, cross-domain recommendation alleviates the user cold-start problem in target-domain recommendation; moreover, analyzing the user's preferences in different domains makes the recommendation results more diverse.

Description

Cross-domain recommendation method integrating tags and attention mechanisms, and implementation system thereof

Technical Field

The present invention relates to the technical field of information recommendation methods and systems, and in particular to a cross-domain recommendation method integrating tags and attention mechanisms, and an implementation system thereof.

Background Art

With the rapid development of Internet technology, the number of social applications such as QQ, WeChat, and Weibo has grown rapidly, presenting people with a wide variety of information and greatly enriching daily life. This process, however, has brought some unavoidable problems, such as information overload and information disorientation. Personalized recommendation technology emerged to help each user obtain resources more effectively. At present, researchers apply personalized recommendation to resources in many domains: beyond movies, music, and sports, it also covers e-commerce, location-based services, health care, and more. The range of applications of personalized recommendation will only continue to widen.

Most traditional recommendation algorithms focus on users' explicit preferences for items, that is, numerical ratings. As e-commerce systems keep expanding, user rating data has become very sparse, and analyzing rating data alone is no longer sufficient to understand user needs. Users' implicit preferences, such as browsing history, click history, and tag information, carry rich information and can significantly improve recommendation results.

At present, most recommendation techniques are single-domain: they recommend to a user based only on that user's interests within one domain, and techniques that combine multiple domains are scarce. Single-domain recommendation often suffers from data sparsity, user cold start, and item cold start, which degrade the performance of the recommender system and reduce recommendation accuracy.

Summary of the Invention

In view of this, it is necessary to provide a cross-domain recommendation method integrating tags and attention mechanisms, and an implementation system thereof, which analyzes a user's preferences in different domains so as to make the recommendation results more diverse.

A cross-domain recommendation method integrating tags and attention mechanisms comprises the following steps:

Step 1: select and construct cross-domain fusion tags, merging two similar domains into a new domain that serves as the source domain for cross-domain recommendation; compute weighted sums of the tag vectors in the source domain and in the target domain, respectively, to obtain the resource vector of each resource.

Step 2: following the interest-mining algorithm based on the attention mechanism, learn the temporal relationship between users and resources with the long short-term memory (LSTM) algorithm and introduce an attention mechanism, obtaining the user's preferences in the source domain and the target domain within a specified time period.

Step 3: following the cross-domain tag-mapping algorithm based on a BP neural network, learn the tag mapping between the source domain and the target domain with a three-layer BP neural network; map the user's source-domain preference vector into the target domain and combine it, by weighted sum, with the user's target-domain preference vector to obtain the user's comprehensive preference in the target domain.

Step 4: with the cross-domain recommendation algorithm integrating tag mapping and the attention mechanism, compute, from the user's comprehensive preferences over the source and target domains, the similarity between the user's comprehensive preference and the vectors of target-domain resources the user has not yet browsed, and recommend the top N items to the user.

Furthermore, the process of selecting and constructing cross-domain fusion tags in Step 1 comprises:

Step 1-1: data preprocessing for the source domain and the target domain.

Collect the commonly used tags of two similar domains A and B and of the target domain C as custom tags (DT); retrieve the resources related to these custom tags and obtain the resource tags (RT) attached to those resources; after removing duplicate tags, build the RT-DT matrix, in which each column is a custom-tag (DT) vector and each row is a resource-tag (RT) vector; then put the remaining custom-tag vectors on a uniform scale.

Step 1-2: selection and construction of cross-domain fusion tags.

Analyze the similarity between the same custom-tag (DT) vectors and between the same resource-tag (RT) vectors of domains A and B, using cosine similarity to compare a given DT vector, or a given RT vector, across the resources of A and B, as shown in Equation (1):

$$\mathrm{sim}(t_A, t_B) = \frac{V(t_A)\cdot V(t_B)}{\lVert V(t_A)\rVert\,\lVert V(t_B)\rVert} \tag{1}$$

where $t_A$ and $t_B$ denote the same DT or RT in domain A and domain B respectively, and $V(t_A)$ and $V(t_B)$ are the vector representations of tags $t_A$ and $t_B$. To ensure recommendation quality, tags whose similarity falls below a threshold are removed.
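As a concrete illustration of the tag screening above, the sketch below computes Equation (1) and drops shared tags whose cross-domain similarity falls under the threshold; the tag names, vectors, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two tag vectors (Equation (1))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def filter_shared_tags(vecs_a, vecs_b, threshold=0.5):
    """Keep only tags whose domain-A and domain-B vectors agree closely.

    vecs_a / vecs_b map a tag name to its DT (or RT) vector in each domain;
    tags with similarity below `threshold` are removed, as the method
    prescribes for recommendation quality.
    """
    shared = set(vecs_a) & set(vecs_b)
    return {t for t in shared if cosine_sim(vecs_a[t], vecs_b[t]) >= threshold}

# Hypothetical example: "comedy" is used consistently in both domains,
# "classic" is not and gets filtered out.
A = {"comedy": [1.0, 0.9, 0.1], "classic": [1.0, 0.0, 0.0]}
B = {"comedy": [0.9, 1.0, 0.2], "classic": [0.0, 0.1, 1.0]}
kept = filter_shared_tags(A, B, threshold=0.5)
```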

In each domain for which tag vectors have been constructed, the custom tags DT and resource tags RT are screened, and the matching resource-tag (RT) vectors are summed with weights reflecting the user's interest in each domain, producing the A-B domain tag-vector matrix, as shown in Equations (2) and (3):

[Equations (2) and (3) are rendered only as images in the source; they give the interest-weighted sums of the matching RT vectors that form the A-B domain tag-vector matrix.]

where $t_{AB}$ denotes a cross-domain fusion tag and $V(t_{AB})$ is its vector representation.

The resource tags RT of each resource are sorted by the number of times users have annotated the resource with that tag; tags nearer the front are more closely tied to the resource than those behind them. Equation (4) assigns a weight to each tag of a resource:

[Equation (4) is rendered only as an image in the source; it assigns each tag of a resource a weight that decreases with the tag's rank in the annotation-count ordering.]

Equation (4) is applied to the A-B domain and to domain C to obtain the weight of every tag vector of every resource; the weighted sum of these tag vectors then gives the resource vector of each resource, as shown in Equation (5):

$$V(r) = \sum_{i} w_i\, V(t_i) \tag{5}$$

where $V(t_i)$ is the tag vector of the $i$-th tag of the resource and $V(r)$ is the resource vector.
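The per-tag weighting and the weighted sum of Equation (5) can be sketched as follows; since Equation (4) is not legible in the source, the linearly decreasing rank weights used here are an assumption, and the tag vectors are toy values.

```python
import numpy as np

def rank_weights(k):
    """Assumed rank-based weights: tags annotated more often come first and
    get linearly larger weights.  The exact form of Equation (4) is not
    reproduced in the source, so this linear scheme is illustrative only."""
    raw = np.arange(k, 0, -1, dtype=float)   # k, k-1, ..., 1
    return raw / raw.sum()                   # normalized to sum to 1

def resource_vector(tag_vectors):
    """Weighted sum of a resource's tag vectors (Equation (5)).
    `tag_vectors` is ordered by annotation count, most-used tag first."""
    V = np.asarray(tag_vectors, dtype=float)
    w = rank_weights(len(V))
    return w @ V

tags = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy tag vectors
vec = resource_vector(tags)                   # the resource vector V(r)
```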

Furthermore, the data preprocessing for the source domain and the target domain in Step 1-1 comprises the following steps:

Step 1-1-1: use TF-IDF to collect tags from the texts of the resources in domains A and B, then extract from them the m tags that users use widely and that appear in both domains, i.e., the custom tags DT, which represent resource features well; record them as DTs. Then retrieve the resources related to the DTs in domains A and B, and record the tags attached to each retrieved resource, i.e., the resource tags RT, as RTs.

Step 1-1-2: use TF-IDF to collect the N tags that users use widely in domain C, recorded as DTt; then retrieve the resources related to these DTs and record the tags attached to each resource as RTt.

Step 1-1-3: segment the collected tags with the NLPIR Chinese word-segmentation system and remove duplicates; count the frequency with which each RT occurs among the resources retrieved for each DT — the larger the vector component, the closer the tie between that RT and DT.

Step 1-1-4: since different DTs retrieve different total numbers of resources, divide every component of each DT vector by that vector's maximum component, yielding DT vectors on a uniform scale.
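A minimal sketch of steps 1-1-3 and 1-1-4: building the RT-DT frequency matrix and rescaling each DT (column) vector by its maximum component. The tag names are hypothetical.

```python
import numpy as np

# Toy corpus: for each DT (custom tag), the RTs attached to the resources it
# retrieves.  All tag names here are hypothetical.
retrieved = {
    "action":  ["fight", "hero", "fight", "explosion"],
    "romance": ["love", "hero", "love"],
}
rts = sorted({rt for tags in retrieved.values() for rt in tags})
dts = sorted(retrieved)

# RT-DT frequency matrix: each row is an RT vector, each column a DT vector
# (step 1-1-3); entry (i, j) counts how often RT i occurs among the
# resources retrieved for DT j.
M = np.zeros((len(rts), len(dts)))
for j, dt in enumerate(dts):
    for rt in retrieved[dt]:
        M[rts.index(rt), j] += 1

# Step 1-1-4: different DTs retrieve different numbers of resources, so each
# DT (column) vector is divided by its own maximum component.
M_norm = M / M.max(axis=0, keepdims=True)
```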

Furthermore, the interest-mining algorithm based on the attention mechanism in Step 2 comprises the following steps.

The long short-term memory algorithm is used to learn the temporal relationship between users and resources. Let $t$ be the time step at which the memory-cell layer is updated, $x_t$ the input to the memory-cell layer at time $t$, $W$ and $U$ the weight matrices, and $b$ the bias vector. Specifically:

Step 2-1: at each time step $t$, the input-gate inputs are multiplied by their weights and the bias is added, yielding the input-gate control variable $i_t$ and the new candidate input vector $\tilde{C}_t$, as shown in Equations (6) and (7):

$$i_t = \sigma\big(W_i\,[h_{t-1}, x_t] + b_i\big) \tag{6}$$

$$\tilde{C}_t = \tanh\big(W_C\,[h_{t-1}, x_t] + b_C\big) \tag{7}$$

where $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, and $\sigma$ is the gate activation function.

Step 2-2: at each time step $t$, compute the forget-gate control variable $f_t$, multiply it with the previous cell state, add the product of the input-gate control variable and the candidate state, and thereby update the memory-cell state from $C_{t-1}$ to $C_t$, as shown in Equations (8) and (9):

$$f_t = \sigma\big(W_f\,[h_{t-1}, x_t] + b_f\big) \tag{8}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{9}$$

Step 2-3: from the updated memory-cell state, compute the value of the output gate, as shown in Equations (10) and (11):

$$o_t = \sigma\big(W_o\,[h_{t-1}, x_t] + b_o\big) \tag{10}$$

$$h_t = o_t \odot \tanh(C_t) \tag{11}$$

Step 2-4: compute the loss function and minimize it with gradient descent, as shown in Equation (12):

[Equation (12), the loss function minimized by gradient descent, is rendered only as an image in the source.]

After computation in the LSTM layer, the hidden state at each time step is passed to the attention layer as its output, to capture the dependencies between sequence positions; a weighted sum then yields the context-vector representation $c_i$ of output position $i$, as described in Equations (13) and (14):

$$\alpha_{ij} = \frac{\exp\big(h_i^{\top} W h_j\big)}{\sum_{k}\exp\big(h_i^{\top} W h_k\big)} \tag{13}$$

$$c_i = \sum_{j} \alpha_{ij}\, h_j \tag{14}$$

where $h_i$ is the output of the LSTM at the $i$-th time step and $\alpha_{ij}$ is the normalized weight between the outputs of the $i$-th and $j$-th time steps; the similarity function is the matrix transformation represented by $W$.
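The forward pass of Equations (6)-(11) and the attention pooling of Equations (13)-(14) can be sketched in NumPy as follows; the dimensions, random weights, and toy input sequence are illustrative only, not the trained model of the invention.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                      # toy input / hidden dimensions

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate acting on [h_{t-1}, x_t]; randomly initialized
# here only to illustrate the forward pass.
W = {g: rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for g in "ifco"}
b = {g: np.zeros(d_h) for g in "ifco"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])          # input gate       (6)
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate state  (7)
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate      (8)
    c = f * c_prev + i * c_tilde              # state update     (9)
    o = sigmoid(W["o"] @ z + b["o"])          # output gate      (10)
    h = o * np.tanh(c)                        # hidden state     (11)
    return h, c

def attention(H, W_att):
    """Context vectors per Equations (13)-(14): bilinear scores h_i^T W h_j,
    softmax-normalized over j, then a weighted sum of the h_j."""
    scores = H @ W_att @ H.T
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ H

# Run a toy sequence of resource vectors through the LSTM, then attention.
xs = rng.normal(size=(5, d_in))
h, c = np.zeros(d_h), np.zeros(d_h)
H = []
for x_t in xs:
    h, c = lstm_step(x_t, h, c)
    H.append(h)
H = np.vstack(H)
C = attention(H, rng.normal(scale=0.1, size=(d_h, d_h)))
```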

Furthermore, following the attention-based interest-mining algorithm, all resource vectors in the user's source-domain and target-domain histories are processed by the LSTM layer and the attention layer, yielding the user's preference vectors in the source domain and the target domain.

Furthermore, the cross-domain tag-mapping algorithm based on a BP neural network in Step 3 proceeds as follows.

Tag mappings between domains are learned with a three-layer BP neural network. First, collect the n tags each user uses most in the A-B domain and in domain C; compute the weight of each tag in the two domains with Equation (4); then compute the user's feature vectors in the A-B and C domains by weighted sums, as shown in Equations (15) and (16):

$$U_{AB} = \sum_{i=1}^{n} w_i^{AB}\, V\big(t_i^{AB}\big) \tag{15}$$

$$U_{C} = \sum_{i=1}^{n} w_i^{C}\, V\big(t_i^{C}\big) \tag{16}$$

where $U_{AB}$ is the user's feature vector in the A-B domain, and $w_i^{AB}$ and $V(t_i^{AB})$ are the weight of each tag in the A-B domain and that tag's feature vector; $U_C$ is the user's feature vector in domain C, and $w_i^{C}$ and $V(t_i^{C})$ are the weight of each tag in domain C and that tag's feature vector.

Then, with $U_{AB}$ as the input vector and $U_C$ as the actual output vector, a BP neural network is used to learn the mapping between $U_{AB}$ and $U_C$, as shown in Equations (17) to (21):

The mapping from the input layer to the hidden layer is shown in Equation (17):

$$z = W^{(1)} U_{AB} + b^{(1)} \tag{17}$$

The activation function applied in the hidden layer is shown in Equation (18):

$$h = \sigma(z) \tag{18}$$

The mapping from the hidden layer to the output layer is shown in Equation (19):

$$z' = W^{(2)} h + b^{(2)} \tag{19}$$

The activation function applied in the output layer is shown in Equation (20):

$$\hat{U}_C = \sigma(z') \tag{20}$$

The loss function of the BP neural network is shown in Equation (21):

$$E = \frac{1}{2}\sum_{k}\left(U_{C,k} - \hat{U}_{C,k}\right)^{2} \tag{21}$$

In the equations above, $\sigma$ denotes the activation function [its specific form is given only as an image in the source]; $n$ is the number of hidden-layer units of the BP neural network, computed from the number of input-layer units $n_{in}$ and the number of output-layer units $n_{out}$ [the formula is likewise given only as an image in the source].
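A minimal NumPy sketch of the three-layer BP mapping network of Equations (17)-(21), with one backpropagation step; the sigmoid activation, layer sizes, learning rate, and toy $U_{AB}$/$U_C$ pair are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 3, 4          # toy layer sizes

W1 = rng.normal(scale=0.5, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_out, n_hid)); b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    z1 = W1 @ x + b1          # input -> hidden    (17)
    h = sigmoid(z1)           # hidden activation  (18)
    z2 = W2 @ h + b2          # hidden -> output   (19)
    y = sigmoid(z2)           # output activation  (20)
    return h, y

def loss(y, target):
    return 0.5 * np.sum((target - y) ** 2)        # squared error (21)

# One backpropagation step on a toy (U_AB, U_C) pair.
x = rng.random(n_in)          # stand-in for the user's A-B feature vector
t = rng.random(n_out)         # stand-in for the user's C feature vector
h, y = forward(x)
before = loss(y, t)

lr = 0.1
delta2 = (y - t) * y * (1 - y)                    # output-layer error
delta1 = (W2.T @ delta2) * h * (1 - h)            # hidden-layer error
W2 -= lr * np.outer(delta2, h); b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1
after = loss(forward(x)[1], t)                    # loss shrinks after the step
```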

Furthermore, the number $n$ of hidden-layer units of the BP neural network is determined by the golden-section method.
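A sketch of how the golden-section method can pick the hidden-unit count $n$. The quadratic stand-in for the validation-error curve (minimum at $n = 8$) and the search bounds are hypothetical; in practice $f(n)$ would retrain the BP network with round(n) hidden units and return its validation error.

```python
import math

def golden_section_min(f, lo, hi, tol=1.0):
    """Golden-section search for the minimizer of a unimodal function f on
    [lo, hi]; the bracket shrinks by the golden ratio each iteration."""
    phi = (math.sqrt(5) - 1) / 2          # ~0.618
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                       # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                             # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return round((a + b) / 2)             # nearest integer unit count

# Hypothetical validation-error curve, unimodal with its minimum at n = 8.
best_n = golden_section_min(lambda n: (n - 8.0) ** 2, 2.0, 20.0)
```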

Furthermore, the cross-domain recommendation algorithm integrating tag mapping and the attention mechanism in Step 4 specifically comprises the following steps.

Step 4-1: let $P_{AB}$ be the user-preference vector obtained by passing the A-B-domain history through the LSTM layer and the attention layer, let $P_C$ be the user-preference vector obtained by passing the C-domain history through the LSTM network, and let $f$ be the mapping network between the source domain and the target domain; $f(P_{AB})$ and $P_C$ are combined by the weighted sum of Equation (22) to obtain the final user resource vector:

$$U = \frac{N_t}{N_t + N_s}\, P_C + \frac{N_s}{N_t + N_s}\, f(P_{AB}) \tag{22}$$

where $U$ denotes the user's resource vector, $N_t$ the number of target-domain resources in the user's history, and $N_s$ the number of source-domain resources in the user's history.

Step 4-2: after the final user resource vector has been computed, compute the similarity between the user resource vector and the target-domain resource vectors in the resource library, as shown in Equation (23):

$$\mathrm{sim}(U, V(r)) = \frac{U \cdot V(r)}{\lVert U\rVert\,\lVert V(r)\rVert} \tag{23}$$

where $U$ denotes the user resource vector and $V(r)$ denotes a resource vector of the target domain in the resource library.
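Steps 4-1 and 4-2 can be sketched end to end as follows; the count-proportional weights standing in for Equation (22), together with the item names and vectors, are assumptions for illustration.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fused_preference(p_target, p_source_mapped, n_t, n_s):
    """Count-weighted combination of the target-domain preference and the
    mapped source-domain preference.  The exact weights of Equation (22) are
    an image in the source, so this proportional scheme is an assumption."""
    total = n_t + n_s
    return (n_t / total) * p_target + (n_s / total) * p_source_mapped

def recommend_top_n(user_vec, unseen, n=2):
    """Rank unbrowsed target-domain resources by cosine similarity (Eq. (23))
    and return the top-N names."""
    ranked = sorted(unseen, key=lambda name: cosine(user_vec, unseen[name]),
                    reverse=True)
    return ranked[:n]

# Toy data: a user with a sparse target-domain history (n_t=1) relies mostly
# on the mapped source-domain preference.  Names and vectors are illustrative.
p_c = np.array([1.0, 0.0])            # target-domain preference P_C
p_ab_mapped = np.array([0.0, 1.0])    # mapped source preference f(P_AB)
u = fused_preference(p_c, p_ab_mapped, n_t=1, n_s=3)
unseen = {"item1": np.array([0.1, 1.0]),
          "item2": np.array([1.0, 0.1]),
          "item3": np.array([1.0, 1.0])}
top = recommend_top_n(u, unseen, n=2)
```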

Furthermore, in computing the final user resource vector, the user's browsing volume in the target domain directly affects the similarity computed for the target-domain resource vectors the user has not browsed. When the user has browsed very little in the target domain, the user's target-domain preference can be obtained by mapping the source-domain preference, and the similarity to unbrowsed target-domain resource vectors computed from it. When the user's browsing data in the target domain far exceeds that in the source domain, data sparsity is not an issue, and the final recommendation is in effect derived directly from the resource vectors the user has browsed in the target domain.

Also provided is an implementation system for a cross-domain recommendation method integrating tags and attention mechanisms, used to implement the cross-domain recommendation method integrating tags and attention mechanisms described in any of the items above. The implementation system comprises:

a tag-based cross-domain resource-fusion algorithm module, which uses the widely used custom tags of two similar domains A and B to obtain the resource tags of the resources corresponding to those custom tags, analyzes the similarity of the matching custom-tag (DT) vectors and matching resource-tag (RT) vectors of domains A and B, and discards resource-tag vectors whose similarity is below a threshold; it then computes weighted sums of the resource-tag vectors of the source domain (composed of the two similar domains A and B) and of the target domain C, obtaining the resource vector of every resource in the source and target domains;

an attention-based interest-mining algorithm module, which takes all resource vectors in the user's source-domain and target-domain histories and, through deep learning in the LSTM layer and the attention layer, obtains the user's preference vectors in the source domain and the target domain;

a BP-neural-network-based cross-domain tag-mapping algorithm module, which uses a three-layer BP neural network to map the user's source-domain preference vector into the target domain, obtaining the user's comprehensive preference in the target domain; and

a cross-domain recommendation algorithm module integrating tag mapping and the attention mechanism, which combines the user's preferences in the source and target domains into the user's comprehensive target-domain preference, then computes the similarity between the vectors of target-domain resources the user has not browsed and that comprehensive preference, producing the user's recommendation list for the target domain.

In the above cross-domain recommendation method and implementation system, cross-domain fusion tags are constructed by fusing the item tags of two similar domains, merging the two into a new domain that serves as the source domain of cross-domain recommendation, with a less related domain as the target domain; this enriches the source-domain data. Introducing the attention mechanism to compute the user's current preferences separately in the source and target domains makes the recommendations more timely. The BP neural network learns the mapping between the user's overall preferences in the source and target domains; applying this mapping to the user's current preferences, mapping them from the source domain into the target domain, and combining the result with the target domain's own recommendations makes the final recommendations more accurate.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of the implementation system of the cross-domain recommendation method integrating tags and attention mechanisms according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of the three-layer BP neural network of the implementation system of the cross-domain recommendation method integrating tags and attention mechanisms according to an embodiment of the present invention.

Detailed Description

This embodiment takes the cross-domain recommendation method integrating tags and attention mechanisms, and its implementation system, as an example; the present invention is described in detail below with reference to specific embodiments and the drawings.

Referring to FIG. 1 and FIG. 2, an embodiment of the present invention provides a cross-domain recommendation method integrating tags and attention mechanisms.

Cross-domain resource recommendation is achieved with tags and attention mechanisms: the user's source-domain preferences are mapped into the target domain and combined with the user's target-domain preferences to obtain the user's comprehensive preference in the target domain.

Because single-domain data is sparse, recommendation accuracy suffers; combining data from several domains can greatly improve the reliability of the results. If the user's data in the target domain is sparse, the source-domain preferences can be mapped through the mapping network into corresponding target-domain preferences; if the user's data in the source domain is also sparse, this can be remedied by fusing in a similar domain. If the user's data in the target domain is much richer than in the fused source domain, the user's preferences can be learned directly from the target-domain data.

By constructing cross-domain label vectors, two similar domains are fused into a new domain that serves as the source domain for cross-domain recommendation. First, natural language processing is used to extract the labels of each domain and build the corresponding label vectors, and label vectors whose similarity across the two domains exceeds a specified threshold are fused into cross-domain fusion labels. Second, an attention mechanism combined with LSTM computes the user's preferences over a specified time period in the source domain and in the target domain respectively. Third, a BP neural network learns the mapping between source-domain labels and target-domain labels, so that target-domain and source-domain data can be combined to make the recommendation results more accurate. Finally, the overall framework of the cross-domain recommendation model is described: the user's source-domain preferences are mapped to the target domain through the learned mapping network and combined with the user's target-domain preferences to obtain the final result.

1. Cross-domain resource fusion based on labels

Domains are not isolated; there are often connections between them, and when their similarity exceeds a certain degree they can be combined into a new domain. The number of resources in this new domain is the sum of the resource counts of the two similar domains, and using it as the source domain for cross-domain recommendation enriches the source-domain information.

1.1 Data preprocessing

Let A and B be two similar domains. The domain formed by combining A and B is taken as the source domain, and another domain C with a larger span is taken as the target domain. The processing steps are as follows:

1) Use TF-IDF to collect labels from the texts of resources in domains A and B, then extract from them m labels that users commonly use and that appear in both domains, called customized labels (DT for short); these labels represent resource features well and are denoted DTs. Then retrieve the resources related to each DT in domains A and B, and record the labels attached to each resource, called resource labels (RT for short), denoted RTs.

2) Use TF-IDF to collect N labels commonly used by users in domain C, denoted DTt. Then retrieve the resources related to each DT and record the labels attached to each resource, denoted RTt.

3) Segment the collected labels with the NLPIR Chinese word segmentation system, remove duplicate labels, and count the frequency with which each RT appears among the resources of each DT; the larger the vector value, the closer the connection between the RT and the DT.

4) Since different DTs retrieve different total numbers of resources, divide every component of each DT vector by that vector's maximum component to obtain DT vectors on a uniform scale.
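The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the toy RT-by-DT frequency matrix and all variable names are assumptions.

```python
import numpy as np

# Hypothetical RT-by-DT frequency matrix: entry [i, j] counts how often
# resource tag i appears among the resources retrieved for customized tag j.
rt_dt = np.array([[8., 2., 0.],
                  [4., 6., 1.],
                  [0., 3., 9.]])

# Step 4: divide every component of each DT (column) vector by that
# vector's maximum component, giving DT vectors on a uniform scale.
dt_vectors = rt_dt / rt_dt.max(axis=0, keepdims=True)
print(dt_vectors.max(axis=0))  # every DT vector now peaks at 1.0
```

Each column then has a maximum component of 1, so DT vectors built from differently sized result sets become comparable.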

1.2 Selection and construction of cross-domain fusion labels

We need to analyze the correlation between the same DT vectors and the same RT vectors in domains A and B; only when the correlation is large enough can they be used for cross-domain recommendation. Similarity of a single component of the same DT or RT vector in domains A and B does not imply high correlation, but if all components are similar, the correlation can be considered high enough. Cosine similarity is used to compute the similarity of the same DT vector and the same RT vector across the resources of domains A and B, as shown in equation (1).

sim(t^A, t^B) = (v^A · v^B) / (‖v^A‖ ‖v^B‖)    (1)

where t^A and t^B denote the same DT or RT in domain A and domain B respectively, and v^A and v^B are the vector representations of t^A and t^B. To ensure recommendation quality, labels whose similarity falls below the threshold are removed.
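A brief sketch of the similarity filter described here; the example tag vectors and the 0.8 threshold are assumptions (the patent leaves the threshold value unspecified).

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity between two tag vectors, as in equation (1).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vectors for the same tag in domains A and B.
pairs = {"action": (np.array([1., 0.9, 0.1]), np.array([0.9, 1., 0.2])),
         "news":   (np.array([1., 0., 0.]),   np.array([0., 0., 1.]))}

threshold = 0.8  # assumed cutoff
kept = {t for t, (va, vb) in pairs.items() if cos_sim(va, vb) >= threshold}
print(kept)
```

Only tags whose A-domain and B-domain vectors agree component-wise survive the filter and become candidates for fusion.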

After the label vectors of each domain have been constructed and the DTs and RTs screened, the same RT vectors are weighted and summed according to the user's interest in each domain to obtain the A-B domain label vector matrix, as shown in equations (2) and (3).

[Equations (2) and (3) are rendered as images in the original: the interest-weighted sum of the shared RT vectors, and the resulting A-B domain label vector matrix.]

where t^{AB} denotes a cross-domain fusion label and v(t^{AB}) is the vector representation of t^{AB}.

The RTs of a resource are sorted by the number of times each label has been applied by users; labels nearer the front are more closely related to the resource than those behind. A weight is assigned to each label of a resource according to equation (4).

[Equation (4) is rendered as an image in the original: a rank-decreasing weight w_k assigned to the k-th label of a resource.]

Equation (4) is applied to the A-B domain and the C domain to obtain the weight of every label vector of every resource, and these label vectors are then weighted and summed to give the resource vector of each resource, as shown in equation (5).

v_r = Σ_{k=1}^{n} w_k · v(t_k)    (5)

where v(t_k) is the label vector of the k-th label of the resource and v_r is the resource vector of the resource. The process is shown in Algorithm 1.

Figure SMS_120
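Equation (5) can be sketched as below. Since equation (4) is only an image in the original, a normalized linear rank decay stands in for the tag weights as an assumption.

```python
import numpy as np

# Hypothetical label vectors for one resource's tags, ordered by how often
# users applied them (most-used first).
label_vectors = np.array([[1.0, 0.2, 0.0],
                          [0.4, 0.8, 0.1],
                          [0.0, 0.3, 0.9]])

# Rank-decreasing weights standing in for equation (4): earlier tags count
# more, and the weights are normalized to sum to 1 (assumed form).
n = len(label_vectors)
weights = np.arange(n, 0, -1, dtype=float)
weights /= weights.sum()

# Equation (5): the resource vector is the weighted sum of its label vectors.
resource_vector = weights @ label_vectors
print(resource_vector.shape)
```

The first tag contributes most, matching the statement that front-ranked labels are tied most closely to the resource.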

2. Interest mining algorithm based on the attention mechanism

Different people focus on different aspects of resources: some may like historical material, some science fiction, and others mysteries. To discover user preferences accurately, the system uses LSTM to learn the temporal relationship between users and resources, as described in the following model.

Assume each time step is t, at which the memory cell layer is updated, and assume that:

1) x_t is the input to the memory cell layer at time t, i.e., the user's browsing record.

2) W and U are weight matrices.

3) b is the bias vector. Specifically:

① At each time step t, multiply the input-gate inputs by their weights and add the bias to compute the input gate's control variable i_t and the new input vector c̃_t, as shown in equations (6) and (7).

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)    (6)

c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)    (7)

② At each time step t, compute the forget gate's control variable f_t and multiply it by the candidate state c_{t-1}; that is, multiply the forget-gate inputs by their weights, add the product of the input-gate inputs and their weights, and update the memory cell state from c_{t-1} to c_t, as shown in equations (8) and (9).

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)    (8)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t    (9)

③ With the memory cell state updated as above, the value of the output gate can be computed, as shown in equations (10) and (11).

o_t = σ(W_o x_t + U_o h_{t-1} + b_o)    (10)

h_t = o_t ⊙ tanh(c_t)    (11)

④ Compute the loss function and minimize it by gradient descent, as shown in equation (12).

[Equation (12) is rendered as an image in the original: the training loss minimized by gradient descent.]

After the LSTM layer, the hidden state at each time step is its output. Feeding each time step's LSTM output into the attention layer captures the dependencies between sequence positions; after a weighted summation we obtain the context vector c_i corresponding to output position i, as described in equations (13) and (14).

α_ij = exp(h_i^T W h_j) / Σ_k exp(h_i^T W h_k)    (13)

c_i = Σ_j α_ij h_j    (14)

where h_i is the output of the LSTM at the i-th time step and α_ij is the normalized weight between the outputs of the i-th and j-th time steps; the similarity function here is a matrix transformation, denoted W. A final fully connected layer then yields the output probability. The process is shown in Algorithm 2.

Figure SMS_156
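The attention pooling in equations (13) and (14) can be sketched over a set of precomputed hidden states. The toy hidden states, the identity choice for W, and attending from the last step are all assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax for the normalization in equation (13).
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical hidden states h_1..h_T from the LSTM layer (T steps, d dims).
H = np.array([[0.1, 0.3], [0.5, 0.2], [0.4, 0.9]])
W = np.eye(2)  # similarity transform of equation (13); identity assumed

i = len(H) - 1                 # attend from the last time step
scores = H @ W @ H[i]          # h_i^T W h_j for every j
alpha = softmax(scores)        # equation (13): normalized weights
context = alpha @ H            # equation (14): context vector c_i
print(alpha, context)
```

The context vector is a convex combination of all hidden states, weighted toward the steps most similar to the query step.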

All resource vectors in the user's history in the source and target domains are passed through the LSTM layer and the attention layer to obtain the user's preference vectors in the source and target domains. The source-domain preference vector is then mapped to the target domain and combined with the user's target-domain preferences to obtain the final result; the mapping process is detailed in the next section.

3. Cross-domain label mapping algorithm based on a BP neural network

User interests in different domains also exhibit a certain correspondence. For example, users who like comedies mostly prefer rock music, while users who like horror films generally prefer music with pronounced melodic changes. A three-layer BP neural network is used here to learn the label mapping between domains. First, the n labels most used by the user are collected in the A-B domain and the C domain respectively; the weight of each label in each domain is computed according to equation (4); then the user's feature vectors in the A-B and C domains are computed by weighting, as shown in equations (15) and (16).

u^{AB} = Σ_{i=1}^{n} w_i^{AB} · v_i^{AB}    (15)

u^{C} = Σ_{i=1}^{n} w_i^{C} · v_i^{C}    (16)

where u^{AB} is the user's feature vector in the A-B domain, and w_i^{AB} and v_i^{AB} are the weight of each label in the A-B domain and that label's feature vector; u^{C} is the user's feature vector in the C domain, and w_i^{C} and v_i^{C} are the weight of each label in the C domain and that label's feature vector.

Next, with u^{AB} as the input vector and u^{C} as the actual output vector, a BP neural network is used to learn the mapping between u^{AB} and u^{C}, as shown in equations (17)-(21):

1) The mapping from the input layer to the hidden layer is shown in equation (17):

z^{(1)} = W^{(1)} u^{AB} + b^{(1)}    (17)

2) The hidden layer's activation function is shown in equation (18):

h = f(z^{(1)})    (18)

3) The mapping from the hidden layer to the output layer is shown in equation (19):

z^{(2)} = W^{(2)} h + b^{(2)}    (19)

4) The output layer's activation function is shown in equation (20):

û^{C} = g(z^{(2)})    (20)

5) The loss function of the BP neural network is shown in equation (21):

E = (1/2) ‖u^{C} − û^{C}‖²    (21)

where, in equations (14) and (16), the relation shown as an image in the original holds; the specific model structure is shown in FIG. 2.

In a BP neural network, the number of hidden-layer units is directly related to the requirements of the problem and to the numbers of input and output units. Too few units capture too little information, causing underfitting; too many increase training time, invite overfitting, and hurt generalization. The golden-section method is used here to determine the number of hidden-layer units, i.e., by the formula rendered as an image in the original, where n_in is the number of input-layer units and n_out is the number of output-layer units. Finally, gradient descent is used to minimize the loss function. The process is shown in Algorithm 3.

Figure SMS_185
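A forward-pass sketch of the three-layer mapping network of equations (17)-(21). The golden-section rule for the hidden width appears only as an image in the original, so `round(0.618 * n_in)` is an assumption, as are the sigmoid activations, dimensions, and random weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_out = 8, 8                   # tag-feature dimensions (assumed)
# Golden-section heuristic for the hidden width (assumed exact form).
n_hidden = max(1, round(0.618 * n_in))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

u_ab = rng.normal(size=n_in)         # user feature vector in the A-B domain
hidden = sigmoid(W1 @ u_ab + b1)     # equations (17)-(18)
u_c_hat = sigmoid(W2 @ hidden + b2)  # equations (19)-(20)
u_c = rng.normal(size=n_out)         # hypothetical target feature vector
loss = 0.5 * np.sum((u_c - u_c_hat) ** 2)  # equation (21)
print(n_hidden, u_c_hat.shape)
```

Training would backpropagate this loss with gradient descent, as stated in the text.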

After the user's source-domain preference vector is mapped to the target domain, it is combined with the user's target-domain preference vector by a weighted summation to obtain the user's comprehensive preference in the target domain; this process is described in detail in the next section.

4. Cross-domain recommendation framework integrating label mapping and the attention mechanism

Based on the three preceding parts, the cross-domain recommendation algorithm integrating label mapping and the attention mechanism is given below. After the user's source-domain preference vector is mapped to the target domain, it must be combined with the user's target-domain preference vector by a weighted summation.

The significance of cross-domain recommendation lies in solving the sparsity of the user's data in the target domain. When the user's target-domain data is sufficiently rich relative to the source domain, more of the target-domain data is used; the weights can therefore be determined by the proportions of the user's resource counts in the source and target domains. The overall framework of the algorithm is shown in FIG. 1.

Let p_s be the user preference vector obtained after the A-B domain history passes through the LSTM layer and the attention layer, p_t the user preference vector obtained after the C domain history passes through the LSTM network, and M the mapping network between the source domain and the target domain. p_s and p_t are weighted and summed according to equation (22) to obtain the final user resource vector.

u = (N_t / (N_t + N_s)) · p_t + (N_s / (N_t + N_s)) · M(p_s)    (22)

where u denotes the user's resource vector, N_t the number of target-domain resources in the user's history, and N_s the number of source-domain resources in the user's history. If the user has browsed very little in the target domain, the target-domain preference is obtained mainly through the mapping of the source-domain preference; if the user's target-domain data far exceeds the source-domain data, data sparsity is absent and the final result is effectively that of single-domain recommendation.
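The count-proportional blend can be sketched as follows; the exact form of equation (22) is an image in the original, so this proportional weighting is an assumption consistent with the surrounding text, and the toy vectors and counts are hypothetical.

```python
import numpy as np

def combine(p_t, p_s_mapped, n_t, n_s):
    # Blend target-domain preference with the mapped source-domain
    # preference, weighting each by its share of the user's records
    # (assumed form of equation (22)).
    total = n_t + n_s
    return (n_t / total) * p_t + (n_s / total) * p_s_mapped

p_t = np.array([1.0, 0.0])           # target-domain preference vector
p_s = np.array([0.0, 1.0])           # source preference after mapping
u = combine(p_t, p_s, n_t=30, n_s=10)
print(u)
```

With 30 target-domain records against 10 source-domain records, the result leans toward the richer target-domain data, matching the stated behavior in both extreme cases.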

After the final user resource vector is computed, equation (23) is used to compute the similarity between the user resource vector and the resource vectors of the target domain in the resource library, and a top-N list is recommended to the user.

sim(u, v_r) = (u · v_r) / (‖u‖ ‖v_r‖)    (23)

where u denotes the user resource vector and v_r a resource vector of the target domain in the resource library. The process is shown in Algorithm 4.

Figure SMS_200

Algorithm 4 combines the user's source-domain preferences with the target-domain preferences to obtain the user's comprehensive preference in the target domain, computes its similarity with the vectors of target-domain resources the user has not browsed, and recommends the top N items to the user. The domain with richer resources has a greater influence on the final result.
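The final ranking step can be sketched as a cosine top-N over unbrowsed resources; the resource identifiers and vectors are hypothetical.

```python
import numpy as np

def top_n(user_vec, resources, n=2):
    # Rank unbrowsed target-domain resources by cosine similarity (eq. (23))
    # and return the identifiers of the n most similar ones.
    sims = {rid: float(np.dot(user_vec, v) /
                       (np.linalg.norm(user_vec) * np.linalg.norm(v)))
            for rid, v in resources.items()}
    return sorted(sims, key=sims.get, reverse=True)[:n]

user = np.array([0.9, 0.1])              # comprehensive preference vector
library = {"r1": np.array([1.0, 0.0]),   # hypothetical resource vectors
           "r2": np.array([0.0, 1.0]),
           "r3": np.array([0.7, 0.7])}
print(top_n(user, library, n=2))
```

Resources aligned with the user's comprehensive preference rank first, yielding the recommended top-N list.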

Also provided is an implementation system for the cross-domain recommendation method integrating labels and an attention mechanism, used to implement the cross-domain recommendation method described in any of the above items; the implementation system includes:

a label-based cross-domain resource fusion algorithm module, which uses the commonly used customized labels of two similar domains A and B to obtain the resource labels of the resources corresponding to those customized labels, analyzes the similarity of the same customized-label DT vectors and the same resource-label RT vectors in domains A and B, and removes resource-label vectors whose similarity is below the threshold; the resource-label vectors of the source domain (composed of the two similar domains A and B) and of the target domain C are each weighted and summed to obtain the resource vector of every resource in the source and target domains;

an attention-based interest mining algorithm module, which takes all resource vectors from the user's history in the source and target domains and, through learning by the long short-term memory (LSTM) layer and the attention layer, obtains the user's preference vectors in the source and target domains;

a BP-neural-network-based cross-domain label mapping algorithm module, which uses a three-layer BP neural network to map the user's source-domain preference vector to the target domain and obtain the user's comprehensive preference in the target domain;

a cross-domain recommendation algorithm module integrating label mapping and the attention mechanism, which combines the user's preferences in the source and target domains to obtain the user's comprehensive preference in the target domain, then computes the similarity between the vectors of target-domain resources the user has not browsed and the user's comprehensive preference, yielding the user's recommendation list in the target domain.

In the above cross-domain recommendation method and its implementation system, the item labels of two similar domains are fused to construct cross-domain fusion labels, merging the two similar domains into a new domain that serves as the source domain for cross-domain recommendation, while a domain with lower correlation serves as the target domain, making the source-domain data richer. Introducing the attention mechanism to compute the user's current preferences separately in the source and target domains makes the recommendation results more timely. The BP neural network captures the mapping between the user's overall preferences in the source and target domains; applying this mapping to the user's current preferences maps them from the source domain to the target domain, and combining them with the target domain's own recommendation results makes the final recommendation more accurate.

It should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (6)

1. A cross-domain recommendation method integrating labels and an attention mechanism, characterized by comprising the following steps: Step 1, select and construct cross-domain fusion labels, merging two similar domains into a new domain that serves as the source domain for cross-domain recommendation, and perform a weighted summation over the label vectors of the source domain and the target domain to obtain the resource vector of each resource; Step 2, according to the attention-based interest mining algorithm, learn the temporal relationship between users and resources through the long short-term memory algorithm LSTM and introduce the attention mechanism to obtain the user's preferences within a specified time period in the source and target domains; Step 3, according to the BP-neural-network-based cross-domain label mapping algorithm, learn the label mapping between the source and target domains through a three-layer BP neural network, map the user's source-domain preference vector to the target domain, and weight and sum it with the user's target-domain preference vector to obtain the user's comprehensive preference in the target domain;
Step 4, with the cross-domain recommendation algorithm integrating label mapping and the attention mechanism, compute, according to the user's comprehensive preferences in the source and target domains, the similarity between the vectors of target-domain resources the user has not browsed and the user's comprehensive preference, and recommend the top N items to the user; wherein the process of selecting and constructing cross-domain fusion labels in Step 1 comprises: Step 1-1, data preprocessing of the source and target domains: collect the commonly used labels in two similar domains A and B and in the target domain C as customized labels DT, retrieve the resources related to the customized labels to obtain the corresponding resource labels RT, and, after removing duplicate labels, obtain the RT-DT matrix, each column of which is a customized-label DT vector and each row a resource-label RT vector; apply a uniform scaling to the remaining customized-label vectors; Step 1-2, selection and construction of cross-domain fusion labels: analyze the similarity of the same customized-label DT vectors and the same resource-label RT vectors in domains A and B, using cosine similarity to compute the similarity of the same DT vector and the same RT vector across the resources of domains A and B, as shown in equation (1):
sim(t^A, t^B) = (v^A · v^B) / (‖v^A‖ ‖v^B‖)    (1)
where t^A and t^B denote the same DT or RT in domain A and domain B respectively, and v^A and v^B are the vector representations of t^A and t^B; to ensure recommendation quality, labels whose similarity falls below the threshold are removed;
in each domain for which label vectors have been constructed, screen the customized labels DT and resource labels RT, and weight and sum the same resource-label RT vectors according to the user's interest in each domain to obtain the A-B domain label vector matrix, as shown in equations (2) and (3):
[Equations (2) and (3) are rendered as images in the original: the interest-weighted sum of the shared RT vectors, and the resulting A-B domain label vector matrix.]
where t^{AB} denotes a cross-domain fusion label and v(t^{AB}) is the vector representation of t^{AB};
the resource labels RT of each resource are sorted by the number of times each label has been applied by users, labels nearer the front being more closely related to the resource than those behind; a weight is assigned to each label of a resource according to equation (4):
[Equation (4) is rendered as an image in the original: a rank-decreasing weight w_k assigned to the k-th label of a resource.]
equation (4) is applied to the A-B domain and the C domain to obtain the weight of every label vector of every resource, and these label vectors are weighted and summed to give the resource vector of each resource, as shown in equation (5):
v_r = Σ_{k=1}^{n} w_k · v(t_k)    (5)
where v(t_k) is the label vector of the k-th label of the resource and v_r is the resource vector of the resource;
the attention-based interest mining algorithm in Step 2 comprises the following steps: use the long short-term memory algorithm to learn the temporal relationship between users and resources, assuming the update interval of the memory cell layer is the time step t, x_t is the input to the memory cell layer at time t, W and U are weight matrices, and b is the bias vector; specifically:
Step 2-1, at each time step t, multiply the input-gate inputs by their weights and add the bias to compute the input gate's control variable i_t and the new input vector c̃_t, as shown in equations (6) and (7):
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)    (6)

c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)    (7)
Step 2-2, at each time step t, compute the forget gate's control variable f_t and multiply it by the candidate state c_{t-1}; that is, multiply the forget-gate inputs by their weights, add the product of the input-gate inputs and their weights, and update the memory cell state from c_{t-1} to c_t, as shown in equations (8) and (9):
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)    (8)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t    (9)
Step 2-3, with the updated memory cell state, continuously compute the value of the output gate, as shown in equations (10) and (11):
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)    (10)

h_t = o_t ⊙ tanh(c_t)    (11)
Step 2-4, compute the loss function and minimize it by gradient descent, as shown in equation (12):
[Equation (12) is rendered as an image in the original: the training loss minimized by gradient descent.]
after computation by the LSTM layer, the hidden state at each time step of the LSTM is output to the attention layer to capture the dependencies between sequence positions; after a weighted summation, the context vector c_i corresponding to output position i is obtained, as described in equations (13) and (14):
Figure QLYQS_51
Figure QLYQS_52
Figure QLYQS_51
Figure QLYQS_52
Figure QLYQS_53
Figure QLYQS_54
Figure QLYQS_53
Figure QLYQS_54
where $h_i$ is the output of the LSTM at the $i$-th time step, and $\alpha_{ij}$ is the normalized weight between the outputs of the $i$-th and $j$-th time steps; the similarity function is the matrix transformation parameterized by $W$;
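The attention step amounts to a softmax over bilinear similarity scores followed by a weighted sum of the hidden states. A minimal sketch, where the transformation matrix `W` and the toy hidden states are placeholders for the learned quantities:

```python
import numpy as np

def attention_context(H, W):
    """Context vectors per formulas (13)-(14).

    H: (T, d) matrix of LSTM hidden states h_1..h_T.
    W: (d, d) similarity transformation matrix.
    Returns S: (T, d), where S[i] is the context vector s_i.
    """
    scores = H @ W @ H.T                          # e_ij = h_i^T W h_j
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)     # formula (14): softmax over j
    return alpha @ H                              # formula (13): s_i = sum_j alpha_ij h_j

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 3))                       # 5 time steps, hidden size 3
S = attention_context(H, np.eye(3))               # identity W as a stand-in
```

Since each row of `alpha` sums to 1, every context vector is a convex combination of the hidden states and stays within their componentwise range.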
The cross-domain label mapping algorithm based on the BP neural network in step 3 includes the following process:

The label mapping between different domains is learned through a three-layer BP neural network. First, the n labels most used by the user are collected in the A-B domain and the C domain respectively; the weight of each label in the two domains is computed according to formula (4); the user's feature vectors in the A-B and C domains are then obtained by weighted summation, as shown in formulas (15) and (16):
$U_{AB} = \sum_{i=1}^{n} w_i^{AB} \, t_i^{AB}$  (15)

$U_{C} = \sum_{i=1}^{n} w_i^{C} \, t_i^{C}$  (16)
where $U_{AB}$ is the user's feature vector in the A-B domain, and $w_i^{AB}$ and $t_i^{AB}$ are the weight of the $i$-th label in the A-B domain and that label's feature vector; $U_{C}$ is the user's feature vector in the C domain, and $w_i^{C}$ and $t_i^{C}$ are the weight of the $i$-th label in the C domain and that label's feature vector;
Then, with $U_{AB}$ as the input vector and $U_{C}$ as the actual output vector, a BP neural network is used to learn the mapping between $U_{AB}$ and $U_{C}$, as shown in formulas (17) to (21):
The mapping from the input layer to the hidden layer is shown in formula (17):

$z^{(1)} = W^{(1)} U_{AB} + b^{(1)}$  (17)

The activation function of the hidden layer is shown in formula (18):

$a^{(1)} = \sigma(z^{(1)})$  (18)

The mapping from the hidden layer to the output layer is shown in formula (19):

$z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}$  (19)

The activation function of the output layer is shown in formula (20):

$\hat{U}_{C} = \sigma(z^{(2)})$  (20)

The loss function of the BP neural network is shown in formula (21):

$E = \dfrac{1}{2} \sum_{k} \left( U_{C,k} - \hat{U}_{C,k} \right)^2$  (21)
where, in formula (14) and formula (16), Figure QLYQS_81; $n$ is the number of hidden-layer units of the BP neural network, that is, Figure QLYQS_82, where $n_{in}$ is the number of input-layer units and $n_{out}$ is the number of output-layer units;
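The forward pass of formulas (17) through (20), with a squared-error loss in the spirit of formula (21), can be sketched as follows. The sigmoid activation, the layer sizes, and the random weights are assumptions for illustration; the patent's actual activation and hidden-unit count (chosen by the golden section method) are not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_forward(u_ab, W1, b1, W2, b2):
    """Three-layer BP forward pass per formulas (17)-(20)."""
    z1 = W1 @ u_ab + b1       # input -> hidden, formula (17)
    a1 = sigmoid(z1)          # hidden activation, formula (18)
    z2 = W2 @ a1 + b2         # hidden -> output, formula (19)
    return sigmoid(z2)        # output activation, formula (20)

def bp_loss(u_c, u_c_hat):
    """Squared-error loss in the spirit of formula (21)."""
    return 0.5 * np.sum((u_c - u_c_hat) ** 2)

# Illustrative shapes: 4-d input, 3 hidden units, 4-d output.
rng = np.random.default_rng(2)
n_in, n_hid, n_out = 4, 3, 4
W1, b1 = rng.normal(size=(n_hid, n_in)), np.zeros(n_hid)
W2, b2 = rng.normal(size=(n_out, n_hid)), np.zeros(n_out)
u_c_hat = bp_forward(np.array([0.5, 0.3, 0.2, 0.0]), W1, b1, W2, b2)
```

Training would backpropagate the gradient of `bp_loss` through these two layers; only the forward direction is shown.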
The cross-domain recommendation algorithm integrating label mapping and the attention mechanism in step 4 specifically includes the following steps:

Step 4-1: let $p_{AB}$ be the user preference vector obtained after the A-B-domain history records are processed by the LSTM layer and the attention layer, let $p_{C}$ be the user preference vector obtained after the C-domain history records are processed by the LSTM network, and let $f_{map}$ be the mapping network between the source domain and the target domain. According to formula (22), $f_{map}(p_{AB})$ and $p_{C}$ are summed with weights to obtain the final user resource vector:

$u = \dfrac{n_t \, p_{C} + n_s \, f_{map}(p_{AB})}{n_t + n_s}$  (22)
where $u$ denotes the user's resource vector, $n_t$ denotes the number of target-domain resources in the user's history records, and $n_s$ denotes the number of source-domain resources in the user's history records;
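One plausible reading of the count-weighted combination in step 4-1 (formula (22) is an image in the original, so the exact form below is an assumption) is a weighted average of the target-domain preference and the mapped source-domain preference:

```python
import numpy as np

def user_resource_vector(p_target, p_source_mapped, n_t, n_s):
    """Combine the target-domain preference p_target with the mapped
    source-domain preference p_source_mapped, weighted by the history
    counts n_t (target) and n_s (source); a sketch of formula (22)."""
    return (n_t * p_target + n_s * p_source_mapped) / (n_t + n_s)

p_c = np.array([1.0, 0.0])            # target-domain preference
p_ab_mapped = np.array([0.0, 1.0])    # source preference after the mapping network
u_cold = user_resource_vector(p_c, p_ab_mapped, 0, 4)   # cold start: all weight on source
u_warm = user_resource_vector(p_c, p_ab_mapped, 3, 1)   # rich target history dominates
```

This matches the behaviour described in claim 5: with little target-domain history ($n_t$ small) the mapped source preference dominates, and with abundant target-domain history the target preference dominates.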
Step 4-2, after the final user resource vector is obtained, compute the similarity between the user resource vector and the resource vectors of the target domain in the resource library, as shown in formula (23):

$sim(u, v) = \dfrac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$  (23)
where $u$ denotes the user resource vector and $v$ denotes a resource vector of the target domain in the resource library.
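Assuming the similarity of step 4-2 is cosine similarity (the formula is an image in the original), the final ranking step can be sketched as scoring every unbrowsed target-domain resource against the user vector; the resource names below are hypothetical.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between the user vector u and a resource vector v."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(u, resources, top_k=2):
    """Rank unbrowsed target-domain resources by similarity to u and
    return the top_k names; a sketch of the recommendation-list step."""
    scored = sorted(resources.items(),
                    key=lambda kv: cosine_sim(u, np.asarray(kv[1], dtype=float)),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

user = np.array([1.0, 0.0])
ranked = recommend(user, {"m1": [1, 0], "m2": [0, 1], "m3": [1, 1]})  # -> ['m1', 'm3']
```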
2. The cross-domain recommendation method integrating labels and the attention mechanism as claimed in claim 1, characterized in that the data preprocessing of the source domain and the target domain in step 1-1 comprises the following steps:

Step 1-1-1, use TF-IDF technology to collect tags from the text of resources in domains A and B respectively, then extract from the collected tags m tags that are commonly used by users and appear in both domains, namely the customized tags DT, which represent resource characteristics well and are recorded as DTs; then retrieve the resources related to each DT in domains A and B respectively, and display the tags corresponding to each resource, namely the resource tags RT, recorded as RTs;

Step 1-1-2, use TF-IDF technology to collect N tags commonly used by users from domain C, recorded as DTt; then retrieve the resources related to each DT and display the tags corresponding to each resource, recorded as RTt;

Step 1-1-3, for the collected tags, use the NLPIR Chinese word segmentation system to complete word segmentation and remove duplicate tags; count the frequency with which each RT appears in the resources corresponding to each DT; the larger the vector component, the closer the connection between the RT and the DT;

Step 1-1-4, since the total number of resources retrieved differs across DTs, divide each component of every DT vector by that vector's maximum component to obtain DT vectors on a unified scale.
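Steps 1-1-3 and 1-1-4 of the preprocessing reduce to counting RT frequencies per DT and max-normalizing the resulting vector. A minimal sketch with hypothetical resource names and tags:

```python
from collections import Counter

def dt_vector(retrieved, resource_tags, rt_vocab):
    """Frequency of each RT across the resources retrieved for one DT (step 1-1-3)."""
    counts = Counter(tag for r in retrieved for tag in resource_tags[r])
    return [counts.get(t, 0) for t in rt_vocab]

def max_normalize(vec):
    """Divide every component by the vector's maximum component (step 1-1-4)."""
    m = max(vec)
    return [x / m for x in vec] if m else vec

# Hypothetical: one DT retrieved resources r1 and r2 with these resource tags.
tags = {"r1": ["action", "classic"], "r2": ["action", "romance"]}
v = dt_vector(["r1", "r2"], tags, ["action", "classic", "romance"])  # -> [2, 1, 1]
```

Normalizing by the maximum component makes DT vectors comparable even when different DTs retrieve different numbers of resources.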
3. The cross-domain recommendation method integrating labels and the attention mechanism as claimed in claim 1, characterized in that, according to the interest mining algorithm based on the attention mechanism, all resource vectors of the user's history records in the source domain and the target domain are processed by the LSTM layer and the attention layer respectively to obtain the user's preference vectors in the source domain and the target domain.

4. The cross-domain recommendation method integrating labels and the attention mechanism as claimed in claim 1, characterized in that the number n of hidden-layer units of the BP neural network is determined by the golden section method.

5. The cross-domain recommendation method integrating labels and the attention mechanism as claimed in claim 1, characterized in that, in the calculation of the final user resource vector, the user's browsing volume in the target domain directly affects the similarity of the resource vectors that the user has not browsed in the target domain; when the amount of data the user has browsed in the target domain is very small, the user's preference in the target domain can be obtained by mapping the user's preference from the source domain, and the similarity of the unbrowsed target-domain resource vectors is computed from it; when the amount of data the user has browsed in the target domain is much larger than that in the source domain, data sparsity does not exist, which is equivalent to obtaining the final recommendation result directly from the browsed resource vectors in the target domain.

6. A system for implementing cross-domain recommendation integrating labels and the attention mechanism, used to implement the cross-domain recommendation method integrating labels and the attention mechanism according to any one of claims 1 to 5, characterized in that the system comprises:

a label-based cross-domain resource fusion algorithm module, which uses the commonly used customized labels of two similar domains A and B to obtain the resource labels of the resources corresponding to the customized labels, analyzes the similarity of the same customized-label DT vectors and of the same resource-label RT vectors in domains A and B, and removes the resource-label vectors whose similarity is below the threshold; the resource-label vectors of the source domain, composed of the two similar domains A and B, and of the target domain C are each summed with weights to obtain the resource vector of every resource in the source domain and the target domain;

an attention-based interest mining algorithm module, which uses all resource vectors of the user's history records in the source domain and the target domain and obtains the user's preference vectors in the source domain and the target domain through deep learning with the long short-term memory (LSTM) layer and the attention layer;

a cross-domain label mapping algorithm module based on a BP neural network, which uses a three-layer BP neural network to map the user's source-domain preference vector to the target domain and obtain the user's comprehensive preference in the target domain;

a cross-domain recommendation algorithm module integrating label mapping and the attention mechanism, which combines the user's preferences in the source domain and the target domain to obtain the user's comprehensive preference in the target domain, and then computes the similarity between the resource vectors that the user has not browsed in the target domain and the user's comprehensive preference to obtain the user's recommendation list in the target domain.
CN202010068923.6A 2020-01-21 2020-01-21 A cross-domain recommendation method and its implementation system that integrates labels and attention mechanisms Active CN111291261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010068923.6A CN111291261B (en) 2020-01-21 2020-01-21 A cross-domain recommendation method and its implementation system that integrates labels and attention mechanisms


Publications (2)

Publication Number Publication Date
CN111291261A CN111291261A (en) 2020-06-16
CN111291261B true CN111291261B (en) 2023-05-26

Family

ID=71018207


Country Status (1)

Country Link
CN (1) CN111291261B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813924B (en) * 2020-07-09 2021-04-09 四川大学 Class Detection Algorithm and System Based on Scalable Dynamic Selection and Attention Mechanism
CN111737582B (en) * 2020-07-29 2020-12-08 腾讯科技(深圳)有限公司 Content recommendation method and device
CN111931061B (en) * 2020-08-26 2023-03-24 腾讯科技(深圳)有限公司 Label mapping method and device, computer equipment and storage medium
CN112035743B (en) 2020-08-28 2021-10-15 腾讯科技(深圳)有限公司 Data recommendation method and device, computer equipment and storage medium
CN114422859B (en) * 2020-10-28 2024-01-30 贵州省广播电视信息网络股份有限公司 Deep learning-based ordering recommendation system and method for cable television operators
CN112100509B (en) * 2020-11-17 2021-02-09 腾讯科技(深圳)有限公司 Information recommendation method, device, server and storage medium
CN112464097B (en) * 2020-12-07 2023-06-06 广东工业大学 A cross-domain recommendation method and system for information fusion of multiple auxiliary domains
CN112417298B (en) * 2020-12-07 2021-06-29 中山大学 Cross-domain recommendation method and system based on a small number of overlapped users
CN112541132B (en) * 2020-12-23 2023-11-10 北京交通大学 Cross-domain recommendation method based on multi-view knowledge representation
CN113127737B (en) * 2021-04-14 2021-09-14 江苏科技大学 Personalized search method and search system integrating attention mechanism
CN114722275B (en) * 2022-03-25 2024-12-17 中国电子科技集团公司第七研究所 Label-based cross-modal resource recommendation method and system
CN114862514A (en) * 2022-05-10 2022-08-05 西安交通大学 A meta-learning-based method for user preference item recommendation
CN115098931B (en) * 2022-07-20 2022-12-16 江苏艾佳家居用品有限公司 Small sample analysis method for mining personalized requirements of indoor design of user
CN116012118B (en) * 2023-02-28 2023-08-29 荣耀终端有限公司 A product recommendation method and device
CN116070034B (en) * 2023-03-03 2023-11-03 江西财经大学 Graph convolution network recommendation method combining adaptive period and interest factor
CN116843394B (en) * 2023-09-01 2023-11-21 星河视效科技(北京)有限公司 AI-based advertisement pushing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138624A (en) * 2015-08-14 2015-12-09 北京矩道优达网络科技有限公司 Personalized recommendation method based on user data of on-line courses
CN107122399A (en) * 2017-03-16 2017-09-01 中国科学院自动化研究所 Combined recommendation system based on Public Culture knowledge mapping platform
CN107291815A (en) * 2017-05-22 2017-10-24 四川大学 Recommend method in Ask-Answer Community based on cross-platform tag fusion
JP2017204289A (en) * 2017-06-28 2017-11-16 凸版印刷株式会社 Electronic flyer recommendation system, electronic flyer recommendation server, and program
WO2018176413A1 (en) * 2017-03-31 2018-10-04 Microsoft Technology Licensing, Llc Providing news recommendation in automated chatting
CN110232153A (en) * 2019-05-29 2019-09-13 华南理工大学 A kind of cross-cutting recommended method based on content

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10659422B2 (en) * 2012-04-30 2020-05-19 Brightedge Technologies, Inc. Content management systems


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TA-BLSTM: Tag Attention-based Bidirectional Long Short-Term Memory for Service Recommendation in Mashup Creation;Min Shi;《2019 International Joint Conference on Neural Networks (IJCNN)》;20190719;全文 *
A Survey of Research on Recommender Systems Based on Deep Learning; Huang Liwei et al.; Chinese Journal of Computers; 2018-03-05 (No. 07); full text *

Also Published As

Publication number Publication date
CN111291261A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN111291261B (en) A cross-domain recommendation method and its implementation system that integrates labels and attention mechanisms
Wang et al. Market2Dish: Health-aware food recommendation
Wang et al. A sentiment‐enhanced hybrid recommender system for movie recommendation: a big data analytics framework
CN108256093B (en) A Collaborative Filtering Recommendation Algorithm Based on User's Multi-interest and Interest Change
CN108363804B (en) Local model weighted fusion Top-N movie recommendation method based on user clustering
CN105224699B (en) News recommendation method and device
Pei et al. Interacting attention-gated recurrent networks for recommendation
CN109543109B (en) A Recommendation Algorithm Integrating Time Window Technology and Rating Prediction Model
US8818917B2 (en) Multi-social network discovery system and method
CN107357793A (en) Information recommendation method and device
Zhu et al. A graph-oriented model for hierarchical user interest in precision social marketing
CN110490686A (en) A kind of building of commodity Rating Model, recommended method and system based on Time Perception
Lee et al. Dynamic item recommendation by topic modeling for social networks
Yu et al. Spectrum-enhanced pairwise learning to rank
Kanaujia et al. A framework for development of recommender system for financial data analysis
Nasir et al. A survey and taxonomy of sequential recommender systems for e-commerce product recommendation
USRE45770E1 (en) Adaptive recommendation explanations
CN109670914B (en) Product recommendation method based on time dynamic characteristics
Hoang et al. Academic event recommendation based on research similarity and exploring interaction between authors
Yang et al. Collaborative meta-path modeling for explainable recommendation
Liu et al. CFDA: collaborative filtering with dual autoencoder for recommender system
Wang et al. BERT-based aggregative group representation for group recommendation
Wang et al. Factorization meets memory network: Learning to predict activity popularity
CN114862518A (en) A method for building a smart clothing recommendation model and a smart clothing recommendation method
Qiu et al. Multi-view hybrid recommendation model based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231013

Address after: 200120, Room 505, 5th Floor, Building 8, No. 399 Jianyun Road, Pudong New Area, Shanghai

Patentee after: Shanghai Juhui Network Technology Co.,Ltd.

Address before: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee before: Dragon totem Technology (Hefei) Co.,Ltd.

Effective date of registration: 20231013

Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Dragon totem Technology (Hefei) Co.,Ltd.

Address before: 330000 No.169 Shuanggang East Street, Nanchang Economic and Technological Development Zone, Jiangxi Province

Patentee before: JIANGXI University OF FINANCE AND ECONOMICS
