WO2023035506A1 - A feature combination recommendation algorithm framework incorporating sequence information (一种融合了序列信息的特征组合推荐算法框架)

Publication number
WO2023035506A1
WO2023035506A1 PCT/CN2021/142966 CN2021142966W WO2023035506A1 WO 2023035506 A1 WO2023035506 A1 WO 2023035506A1 CN 2021142966 W CN2021142966 W CN 2021142966W WO 2023035506 A1 WO2023035506 A1 WO 2023035506A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
item
sub
information
items
Prior art date
Application number
PCT/CN2021/142966
Other languages
English (en)
French (fr)
Inventor
陈心童
傅剑文
韩弘炀
章建森
周文彬
Original Assignee
天翼电子商务有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天翼电子商务有限公司
Priority to JP2023513552A (published as JP2023545896A)
Publication of WO2023035506A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Definitions

  • The invention relates to Internet recommendation systems, and in particular to a feature combination recommendation algorithm framework incorporating sequence information.
  • An Internet recommendation system is a system that uses algorithms to find items a user may be interested in and recommends them to the user. It has become an indispensable part of Internet products such as e-commerce, advertising, and audio and video services.
  • Click-through rate (CTR) estimation in a recommendation system means judging how likely a user is to click on an item based on information such as user attributes, historical behavior, and material attributes.
  • Internet products display materials with a high click probability to users in order to promote user conversion, improve satisfaction, and help the product acquire new users and keep existing users active.
  • This patent proposes a feature combination recommendation algorithm framework incorporating sequence information. Through an end-to-end model framework, it characterizes user preferences and item features more precisely and improves the accuracy of CTR estimation in the recommendation system.
  • The technical problem to be solved by the present invention is to overcome the defects of the prior art by providing a feature combination recommendation algorithm framework incorporating sequence information.
  • To solve this technical problem, the present invention provides the following technical solution:
  • The present invention provides a feature combination recommendation algorithm framework incorporating sequence information, comprising the following steps:
  • User interest sub-network: uses the user clicked-item sequence data described above. The item ids and feature information of the user's click sequence are input and mapped into dense vectors by the embedding layer; a sum pooling layer then expresses the user's historical interest preference. At the same time, the user's feature information and the candidate item are mapped into dense vectors by the embedding layer, then concatenated with the user's historical interest preference and compressed, constructing the network relationship among user information, user historical interest, and the target item; the network parameters are then trained through fully connected layers and a sigmoid function, as shown in Figure 3. The loss function is given by formula PCTCN2021142966-appb-000001, where:
  • n is the total number of samples
  • y is the label indicating whether a click occurred
  • p is the probability value of the output layer
  • Feature combination sub-network: uses the click classification sample data described above. For each exposed <user, item> sample pair, supplemented by user attribute information, item attribute information, and user-item interaction information, binary classification learning is performed with whether the user clicked as the label. The input feature information is mapped into dense vectors by the shared embedding layer, feature combination crossing is realized by a conventional DNN, and classification learning is finally performed through a sigmoid function, as shown in Figure 4. The loss function is given by formula PCTCN2021142966-appb-000002, where:
  • n is the total number of samples
  • y is the label indicating whether a click occurred
  • p is the probability value of the output layer
  • Each sub-network learns part of the information of the recommendation scenario: the user interest sub-network learns the user's past interest preferences while optimizing the representations of user features and item features, and the feature combination sub-network learns the relationship between the high-order cross features of items and users and the click-through rate, likewise optimizing the representations of user features and item features;
  • To improve the learning effect, the framework learns the two sub-networks jointly. First, the two sub-networks share the embedding mapping layer, so that interest sequence information and single-point estimation information are fused in the embedding layer, improving the accuracy of the embedding representation; second, the final loss layer takes a weighted sum of the losses of the two sub-networks: loss = loss_ctr + α * loss_interest;
  • where α can be adjusted according to the data characteristics of the scenario and the learning effect
  • During model training, the two sub-networks in the framework are trained simultaneously through joint learning. For offline inference, where timeliness requirements are low, the whole framework can be used for accurate inference.
  • For online inference, where the system requires high timeliness and the interests expressed by the user's click sequence keep changing, the user interest sub-network can be used alone to infer and update the recommendation results, tracking the user's short-term interest changes more closely while enabling lightweight online deployment;
  • The items are sorted in descending order of the inferred click probability, and the K items with the highest probability are stored in hbase as the ranking result;
  • When the user logs in, the front end obtains the ranking result stored in hbase through an interface and displays the items in order.
  • The user's historical click sequence is incorporated into the CTR estimation model, which better captures user preferences and effectively improves the precision of user characterization
  • Fig. 1 is the implementation flowchart of the present invention
  • Fig. 2 is a schematic diagram of user-item interaction sequence data of the present invention
  • Fig. 3 is a schematic diagram of the user interest sub-network of the present invention.
  • Fig. 4 is a schematic diagram of the feature combination sub-network of the present invention.
  • Fig. 5 is a schematic diagram of the recommendation framework of the present invention.
  • the present invention is shown in Figures 1-5.
  • The present invention provides a feature combination recommendation algorithm framework incorporating sequence information, comprising the following steps:
  • User interest sub-network: uses the user clicked-item sequence data described above. The item ids and feature information of the user's click sequence are input and mapped into dense vectors by the embedding layer.
  • A sum pooling layer then expresses the user's historical interest preference.
  • The user's feature information and the candidate item are mapped into dense vectors by the embedding layer, then concatenated with the user's historical interest preference and compressed, constructing the network relationship among user information, user historical interest, and the target item; the network parameters are then trained through fully connected layers and a sigmoid function.
  • The loss function is given by formula PCTCN2021142966-appb-000003, where:
  • n is the total number of samples
  • y is the label indicating whether a click occurred
  • p is the probability value of the output layer
  • Feature combination sub-network: uses the click classification sample data described above. For each exposed <user, item> sample pair, binary classification learning is performed with whether the user clicked as the label.
  • The input feature information is mapped into dense vectors by the shared embedding layer, feature combination crossing is realized by a conventional DNN, and classification learning is finally performed through a sigmoid function, as shown in Figure 4.
  • The loss function is given by formula PCTCN2021142966-appb-000004, where:
  • n is the total number of samples
  • y is the label indicating whether a click occurred
  • p is the probability value of the output layer
  • Each sub-network learns part of the information of the recommendation scenario: the user interest sub-network learns the user's past interest preferences while optimizing the representations of user features and item features, and the feature combination sub-network learns the relationship between the high-order cross features of items and users and the click-through rate, likewise optimizing the representations of user features and item features.
  • To improve the learning effect, the framework learns the two sub-networks jointly.
  • The two sub-networks share the embedding mapping layer, so that interest sequence information and single-point estimation information are fused in the embedding layer, improving the accuracy of the embedding representation.
  • The final loss layer takes a weighted sum of the losses of the two sub-networks: loss = loss_ctr + α * loss_interest;
  • where α can be adjusted according to the data characteristics of the scenario and the learning effect
  • During model training, the two sub-networks in the framework are trained simultaneously through joint learning.
  • For offline inference, timeliness requirements are low, and the whole framework can be used for accurate inference.
  • For online inference, the system requires high timeliness, and the interests expressed by the user's click sequence keep changing.
  • The user interest sub-network can therefore be used alone to infer and update the recommendation results, tracking the user's short-term interest changes more closely while enabling lightweight online deployment;
  • The items are sorted in descending order of the inferred click probability, and the K items with the highest probability are stored in hbase as the ranking result;
  • When the user logs in, the front end obtains the ranking result stored in hbase through an interface and displays the items in order.
  • the example process is as follows:


Abstract

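The present invention discloses a feature combination recommendation method incorporating sequence information. The recommendation method comprises: collecting interaction logs between users and items; organizing the collected log information into structured data comprising two parts; feeding the two parts of data into the framework of the method for learning, the two parts corresponding respectively to two sub-networks in the framework; jointly learning the two sub-networks within the framework through a shared embedding mapping; performing click-through rate estimation and online serving; sorting the inferred user-item click probabilities in descending order and storing the K items with the highest probability in hbase as the ranking result; and, when the user logs in, having the front end obtain the ranking result stored in hbase through an interface and display the items in order.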

Description

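A feature combination recommendation algorithm framework incorporating sequence information
Technical Field
The present invention relates to Internet recommendation systems, and in particular to a feature combination recommendation algorithm framework incorporating sequence information.
Background
An Internet recommendation system is a system that uses algorithms to find items a user may be interested in and recommends them to the user. It has become an indispensable part of Internet products such as e-commerce, advertising, and audio and video services.
Click-through rate (CTR) estimation in a recommendation system means judging how likely a user is to click on an item based on information such as the user's attributes, historical behavior, and the attributes of the material. Internet products display materials with a high click probability to users in order to promote user conversion, improve satisfaction, and help the product acquire new users and keep existing users active.
For this scenario, this patent proposes a feature combination recommendation algorithm framework incorporating sequence information. Through an end-to-end model framework, it characterizes user preferences and item features more precisely and improves the accuracy of CTR estimation in the recommendation system.
Summary of the Invention
The technical problem to be solved by the present invention is to overcome the defects of the prior art by providing a feature combination recommendation algorithm framework incorporating sequence information.
To solve the above technical problem, the present invention provides the following technical solution:
The present invention provides a feature combination recommendation algorithm framework incorporating sequence information, comprising the following steps:
S1. Collect interaction logs between users and items, including the items E exposed to user U, the clicked items C (where C∈E), and the timestamp of each behavior; also save user feature information (such as age, gender, and region) and item feature information (such as category, brand, and shop);
S2. Organize the collected log information into structured data comprising two parts:
1) User clicked-item sequence data: sort each user's exposed items (clicked or not clicked) by click time, and truncate in units of k clicked items to generate sample data, as shown in Figure 2;
2) Click classification sample data: use <user ID, item ID> as the sample primary key, where samples exposed but not clicked are labeled 0 and samples exposed and clicked are labeled 1; also organize user feature information, item feature information, and user-item interaction information as sample features;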
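The two structured datasets of step S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log record layout `(user, item, timestamp, clicked)`, the function name `build_datasets`, and the choice to keep only clicked items in each sequence are all hypothetical and are not specified by the patent text.

```python
from collections import defaultdict


def build_datasets(logs, k):
    """Build (sequences, labels) from exposure logs.

    logs: list of (user, item, timestamp, clicked) tuples (assumed layout).
    k:    number of clicked items per sequence sample.
    """
    labels = {}                 # <user, item> primary key -> 0 (no click) or 1 (click)
    clicks = defaultdict(list)  # user -> [(timestamp, item), ...] for clicked items
    for user, item, ts, clicked in logs:
        key = (user, item)
        # A pair is labeled 1 if it was clicked in any exposure, else 0.
        labels[key] = max(labels.get(key, 0), int(clicked))
        if clicked:
            clicks[user].append((ts, item))

    sequences = []
    for user, events in clicks.items():
        ordered = [item for _, item in sorted(events)]  # order by click time
        for start in range(0, len(ordered), k):         # truncate in units of k
            sequences.append((user, ordered[start:start + k]))
    return sequences, labels
```

In a real pipeline each sample would also carry the user, item, and interaction features mentioned in the step; they are omitted here to keep the sketch short.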
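S3. Feed the two parts of data above into the algorithm framework for learning; the two parts correspond respectively to two sub-networks in the framework:
1) User interest sub-network: uses the user clicked-item sequence data described above. The item ids and feature information of the user's click sequence are input and mapped into dense vectors by the embedding layer; a sum pooling layer then expresses the user's historical interest preference. At the same time, the user's feature information and the candidate item are mapped into dense vectors by the embedding layer, then concatenated with the user's historical interest preference and compressed, constructing the network relationship among user information, user historical interest, and the target item; the network parameters are then trained through fully connected layers and a sigmoid function, as shown in Figure 3. The loss function is:
Figure PCTCN2021142966-appb-000001
where n is the total number of samples, y is the label indicating whether a click occurred, and p is the probability value of the output layer;
2) Feature combination sub-network: uses the click classification sample data described above. For each exposed <user, item> sample pair, supplemented by the user's attribute information, the item's attribute information, and user-item interaction information, binary classification learning is performed with whether the user clicked as the label. The input feature information is mapped into dense vectors by the shared embedding layer, feature combination crossing is realized by a conventional DNN, and classification learning is finally performed through a sigmoid function, as shown in Figure 4. The loss function is:
Figure PCTCN2021142966-appb-000002
where n is the total number of samples, y is the label indicating whether a click occurred, and p is the probability value of the output layer;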
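The two loss formulas survive here only as image references, so their exact form cannot be read from this text. Given a sigmoid output p, a binary click label y, and n samples, one common choice consistent with those symbol definitions would be the binary cross-entropy below. This is an assumption for illustration, not a transcription of the figures; in particular the 1/n normalization and the sample index i are not confirmed by the source.

```latex
\mathrm{loss} = -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\Bigr]
```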
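S4. Jointly learn the two sub-networks above within the framework through the shared embedding mapping, as shown in Figure 5;
Each sub-network learns part of the information of the recommendation scenario: the user interest sub-network learns the user's past interest preferences while optimizing the representations of user features and item features, and the feature combination sub-network learns the relationship between the high-order cross features of items and users and the click-through rate, likewise optimizing the representations of user features and item features;
To improve the learning effect, the framework learns the two sub-networks jointly. First, the two sub-networks share the embedding mapping layer, so that interest sequence information and single-point estimation information are fused in the embedding layer, improving the accuracy of the embedding representation; second, the final loss layer takes a weighted sum of the losses of the two sub-networks:
loss = loss_ctr + α * loss_interest
where α can be adjusted according to the data characteristics of the scenario and the learning effect;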
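The weighted sum in S4 can be illustrated with a small standard-library sketch. It assumes that each sub-network's loss is a mean binary cross-entropy (the patent's own loss formulas are only available as images), and the names `bce` and `joint_loss` are hypothetical.

```python
import math


def bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy (assumed form of each sub-network loss)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


def joint_loss(ctr_labels, ctr_probs, interest_labels, interest_probs, alpha):
    """loss = loss_ctr + alpha * loss_interest, as in step S4."""
    loss_ctr = bce(ctr_labels, ctr_probs)                 # feature combination sub-network
    loss_interest = bce(interest_labels, interest_probs)  # user interest sub-network
    return loss_ctr + alpha * loss_interest
```

Because both terms are computed from outputs that pass through the same embedding layer, minimizing this single scalar updates the shared embeddings with gradients from both sub-networks; α controls how strongly the interest task influences them.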
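S5. CTR estimation and online serving:
During model training, the two sub-networks in the framework are trained simultaneously through joint learning. For offline inference, where timeliness requirements are low, the whole framework can be used for accurate inference. For online inference, where the system requires high timeliness and the interests expressed by the user's click sequence keep changing, the user interest sub-network can be used alone to infer and update the recommendation results, tracking the user's short-term interest changes more closely while enabling lightweight online deployment;
S6. Sort the inferred user-item click probabilities in descending order, and store the K items with the highest probability in hbase as the ranking result;
S7. When the user logs in, the front end obtains the ranking result stored in hbase through an interface and displays the items in order.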
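Steps S6 and S7 amount to a top-K selection followed by a keyed store and lookup. The sketch below is a minimal illustration in which a plain dictionary stands in for the hbase table; the names `result_store`, `store_ranking`, and `fetch_ranking` are hypothetical and no real hbase client API is used.

```python
import heapq

# Hypothetical stand-in for the hbase table: user id -> ranked item ids.
result_store = {}


def rank_top_k(click_probs, k):
    """Return the k (item, probability) pairs with the highest probability, in descending order."""
    return heapq.nlargest(k, click_probs.items(), key=lambda pair: pair[1])


def store_ranking(user_id, click_probs, k):
    """S6: keep only the top-k items for this user as the ranking result."""
    result_store[user_id] = [item for item, _ in rank_top_k(click_probs, k)]


def fetch_ranking(user_id):
    """S7: what the front end would read at login (empty list if nothing is stored)."""
    return result_store.get(user_id, [])
```

Precomputing and storing only K items per user keeps the login-time read to a single key lookup, which fits the lightweight serving goal described in S5.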
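Compared with the prior art, the beneficial effects of the present invention are as follows:
1. The user's historical click sequence is incorporated into the CTR estimation model, better capturing the user's interest preferences and effectively improving the precision of user characterization;
2. The algorithm framework retains the capability of the feature combination model while also incorporating user interest; by having multiple types of models share the representation mapping layer, item representations are learned more precisely, improving the accuracy of the recommendation model;
3. During training the two sub-networks are fused and learned jointly; offline serving uses the whole framework for inference, while online serving uses only the user interest sub-network, achieving lightweight deployment while maintaining recommendation accuracy.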
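Brief Description of the Drawings
The drawings are provided for further understanding of the present invention and form part of the specification; together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is an implementation flowchart of the present invention;
Fig. 2 is a schematic diagram of the user-item interaction sequence data of the present invention;
Fig. 3 is a schematic diagram of the user interest sub-network of the present invention;
Fig. 4 is a schematic diagram of the feature combination sub-network of the present invention;
Fig. 5 is a schematic diagram of the recommendation framework of the present invention.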
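Detailed Description
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the invention and are not intended to limit it.
Embodiment 1
As shown in Figures 1-5, the present invention provides a feature combination recommendation algorithm framework incorporating sequence information, comprising the following steps:
S1. Collect interaction logs between users and items, including the items E exposed to user U, the clicked items C (where C∈E), and the timestamp of each behavior. Also save user feature information (such as age, gender, and region) and item feature information (such as category, brand, and shop);
S2. Organize the collected log information into structured data comprising two parts:
1) User clicked-item sequence data. Sort each user's exposed items (clicked or not clicked) by click time, and truncate in units of k clicked items to generate sample data, as shown in Figure 2;
2) Click classification sample data. Use <user ID, item ID> as the sample primary key, where samples exposed but not clicked are labeled 0 and samples exposed and clicked are labeled 1. Also organize user feature information, item feature information, and user-item interaction information as sample features;
S3. Feed the two parts of data above into the algorithm framework for learning; the two parts correspond respectively to two sub-networks in the framework:
1) User interest sub-network: uses the user clicked-item sequence data described above. The item ids and feature information of the user's click sequence are input and mapped into dense vectors by the embedding layer. A sum pooling layer then expresses the user's historical interest preference. At the same time, the user's feature information and the candidate item are mapped into dense vectors by the embedding layer, then concatenated with the user's historical interest preference and compressed, constructing the network relationship among user information, user historical interest, and the target item; the network parameters are then trained through fully connected layers and a sigmoid function, as shown in Figure 3. The loss function is:
Figure PCTCN2021142966-appb-000003
where n is the total number of samples, y is the label indicating whether a click occurred, and p is the probability value of the output layer;
2) Feature combination sub-network: uses the click classification sample data described above. For each exposed <user, item> sample pair, supplemented by the user's attribute information, the item's attribute information, and user-item interaction information, binary classification learning is performed with whether the user clicked as the label. The input feature information is mapped into dense vectors by the shared embedding layer, feature combination crossing is realized by a conventional DNN, and classification learning is finally performed through a sigmoid function, as shown in Figure 4. The loss function is:
Figure PCTCN2021142966-appb-000004
where n is the total number of samples, y is the label indicating whether a click occurred, and p is the probability value of the output layer;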
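The forward pass of the user interest sub-network described in S3 (embedding lookup, sum pooling, concatenation, fully connected layers, sigmoid) can be sketched with plain Python lists. Everything concrete here is an assumption for illustration: a single hidden layer with ReLU stands in for the "compress" and fully connected stages, the embedding table is a dictionary, and all function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sum_pool(vectors):
    """Element-wise sum of equally sized vectors (the sum pooling layer)."""
    return [sum(column) for column in zip(*vectors)]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def user_interest_forward(embedding, clicked_ids, user_feature_ids, candidate_id,
                          w_hidden, b_hidden, w_out, b_out):
    # Historical interest: sum-pooled embeddings of the clicked item sequence.
    interest = sum_pool([embedding[i] for i in clicked_ids])
    # User features and the candidate item go through the same embedding table.
    user_vec = [v for f in user_feature_ids for v in embedding[f]]
    x = interest + user_vec + list(embedding[candidate_id])  # concatenation
    # Assumed single hidden layer with ReLU as the compression step.
    hidden = [max(0.0, h) for h in dense(x, w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(logit)  # predicted click probability p
```

Sum pooling makes the interest vector independent of the sequence length, so sequences truncated to k clicks and shorter tails can share the same downstream layers.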
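S4. Jointly learn the two sub-networks above within the framework through the shared embedding mapping, as shown in Figure 5;
Each sub-network learns part of the information of the recommendation scenario: the user interest sub-network learns the user's past interest preferences while optimizing the representations of user features and item features, and the feature combination sub-network learns the relationship between the high-order cross features of items and users and the click-through rate, likewise optimizing the representations of user features and item features.
To improve the learning effect, the framework learns the two sub-networks jointly. First, the two sub-networks share the embedding mapping layer, so that interest sequence information and single-point estimation information are fused in the embedding layer, improving the accuracy of the embedding representation. Second, the final loss layer takes a weighted sum of the losses of the two sub-networks:
loss = loss_ctr + α * loss_interest
where α can be adjusted according to the data characteristics of the scenario and the learning effect;
S5. CTR estimation and online serving:
During model training, the two sub-networks in the framework are trained simultaneously through joint learning. For offline inference, where timeliness requirements are low, the whole framework can be used for accurate inference. For online inference, where the system requires high timeliness and the interests expressed by the user's click sequence keep changing, the user interest sub-network can be used alone to infer and update the recommendation results, tracking the user's short-term interest changes more closely while enabling lightweight online deployment;
S6. Sort the inferred user-item click probabilities in descending order, and store the K items with the highest probability in hbase as the ranking result;
S7. When the user logs in, the front end obtains the ranking result stored in hbase through an interface and displays the items in order.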
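Specifically, an example process is as follows:
1. Collect interaction records between users and items, including exposure records and click records, as well as user features and item features.
2. Generate two structured sub-datasets from the interaction records: the sequence data of items clicked by users, together with the corresponding user features and item features; and the click classification sample data, including the exposed-but-not-clicked/clicked labels as well as user features, item features, and user-item interaction features.
3. Feed the two structured sub-datasets into the two sub-networks of the framework and train the model using the framework's joint training scheme, with α set to 0.2.
4. Save the embedding representations and the parameters of each sub-network obtained from the training in step 3.
5. For offline serving, input the features of the users and items to be predicted into the framework and output the estimated CTR of each user for each item; for online serving, input the features of the users and items to be predicted into the user interest sub-network and output the updated CTR estimate.
6. Sort the TOP100 items with the highest CTR in descending order and display them to the user as the final ranking result.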
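Steps 5 and 6 of the example process can be sketched as a single serving function that picks the model by serving mode and returns the top 100 items. The two models are passed in as plain callables `(user, item) -> probability`; how the full framework combines its two sub-network outputs at inference time is not specified in the text, so that part is left abstract. The name `serve` and its parameters are hypothetical.

```python
def serve(user, candidates, full_framework, interest_subnet, online, top_n=100):
    """Score candidates and return the top_n item ids in descending CTR order.

    full_framework, interest_subnet: callables (user, item) -> click probability
    (assumed interface). Online requests use only the lighter interest sub-network;
    offline jobs use the whole framework.
    """
    model = interest_subnet if online else full_framework
    scored = [(item, model(user, item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:top_n]]
```

With fewer than 100 candidates the function simply returns all of them in ranked order.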
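Finally, it should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.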

Claims (1)

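  1. A feature combination recommendation algorithm framework incorporating sequence information, characterized by comprising the following steps:
    S1. collecting interaction logs between users and items, including the items E exposed to user U, the clicked items C (where C∈E), and the timestamp of each behavior; and saving user feature information (such as age, gender, and region) and item feature information (such as category, brand, and shop);
    S2. organizing the collected log information into structured data comprising two parts:
    1) user clicked-item sequence data: sorting each user's exposed items (clicked or not clicked) by click time, and truncating in units of k clicked items to generate sample data, as shown in Figure 2;
    2) click classification sample data: using <user ID, item ID> as the sample primary key, where samples exposed but not clicked are labeled 0 and samples exposed and clicked are labeled 1; and organizing user feature information, item feature information, and user-item interaction information as sample features;
    S3. feeding the two parts of data above into the algorithm framework for learning, the two parts corresponding respectively to two sub-networks in the framework:
    1) a user interest sub-network, which uses the user clicked-item sequence data described above: the item ids and feature information of the user's click sequence are input and mapped into dense vectors by the embedding layer; a sum pooling layer expresses the user's historical interest preference; at the same time, the user's feature information and the candidate item are mapped into dense vectors by the embedding layer, then concatenated with the user's historical interest preference and compressed, constructing the network relationship among user information, user historical interest, and the target item; the network parameters are then trained through fully connected layers and a sigmoid function, as shown in Figure 3; the loss function is:
    Figure PCTCN2021142966-appb-100001
    where n is the total number of samples, y is the label indicating whether a click occurred, and p is the probability value of the output layer;
    2) a feature combination sub-network, which uses the click classification sample data described above: for each exposed <user, item> sample pair, supplemented by the user's attribute information, the item's attribute information, and user-item interaction information, binary classification learning is performed with whether the user clicked as the label; the input feature information is mapped into dense vectors by the shared embedding layer, feature combination crossing is realized by a conventional DNN, and classification learning is finally performed through a sigmoid function, as shown in Figure 4; the loss function is:
    Figure PCTCN2021142966-appb-100002
    where n is the total number of samples, y is the label indicating whether a click occurred, and p is the probability value of the output layer;
    S4. jointly learning the two sub-networks above within the framework through the shared embedding mapping, as shown in Figure 5;
    each sub-network learns part of the information of the recommendation scenario, wherein the user interest sub-network learns the user's past interest preferences while optimizing the representations of user features and item features, and the feature combination sub-network learns the relationship between the high-order cross features of items and users and the click-through rate, likewise optimizing the representations of user features and item features;
    to improve the learning effect, the framework learns the two sub-networks jointly; first, the two sub-networks share the embedding mapping layer, so that interest sequence information and single-point estimation information are fused in the embedding layer, improving the accuracy of the embedding representation; second, the final loss layer takes a weighted sum of the losses of the two sub-networks:
    loss = loss_ctr + α * loss_interest
    where α can be adjusted according to the data characteristics of the scenario and the learning effect;
    S5. click-through rate estimation and online serving:
    during model training, the two sub-networks in the framework are trained simultaneously through joint learning; for offline inference, where timeliness requirements are low, the whole framework can be used for accurate inference; for online inference, where the system requires high timeliness and the interests expressed by the user's click sequence keep changing, the user interest sub-network can be used alone to infer and update the recommendation results, tracking the user's short-term interest changes more closely while enabling lightweight online deployment;
    S6. sorting the inferred user-item click probabilities in descending order, and storing the K items with the highest probability in hbase as the ranking result;
    S7. when the user logs in, the front end obtaining the ranking result stored in hbase through an interface and displaying the items in order.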
PCT/CN2021/142966 2021-09-07 2021-12-30 一种融合了序列信息的特征组合推荐算法框架 WO2023035506A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023513552A JP2023545896A (ja) 2021-09-07 2021-12-30 シーケンス情報を融合した特徴組み合わせ推薦アルゴリズムフレームワーク

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111046834.2A CN115774810A (zh) 2021-09-07 2021-09-07 一种融合了序列信息的特征组合推荐算法框架
CN202111046834.2 2021-09-07

Publications (1)

Publication Number Publication Date
WO2023035506A1 true WO2023035506A1 (zh) 2023-03-16

Family

ID=85388456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142966 WO2023035506A1 (zh) 2021-09-07 2021-12-30 一种融合了序列信息的特征组合推荐算法框架

Country Status (3)

Country Link
JP (1) JP2023545896A (zh)
CN (1) CN115774810A (zh)
WO (1) WO2023035506A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029357A (zh) * 2023-03-29 2023-04-28 荣耀终端有限公司 训练样本生成、模型训练、点击率评估方法及电子设备
CN116720003A (zh) * 2023-08-08 2023-09-08 腾讯科技(深圳)有限公司 排序处理方法、装置、计算机设备、及存储介质
CN116887001A (zh) * 2023-09-06 2023-10-13 四川中电启明星信息技术有限公司 融合社会属性信息的短视频推送方法、装置及电子设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117708437B (zh) * 2024-02-05 2024-04-16 四川日报网络传媒发展有限公司 一种个性化内容的推荐方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095464A1 (en) * 2017-09-25 2019-03-28 Equifax Inc. Dual deep learning architecture for machine-learning systems
CN110555743A (zh) * 2018-05-31 2019-12-10 阿里巴巴集团控股有限公司 商品对象推荐方法、装置及电子设备
CN112035746A (zh) * 2020-09-01 2020-12-04 湖南大学 一种基于时空序列图卷积网络的会话推荐方法
US20210027146A1 (en) * 2018-10-23 2021-01-28 Tencent Technology (Shenzhen) Company Limited Method and apparatus for determining interest of user for information item
CN112541806A (zh) * 2020-12-18 2021-03-23 北京航天云路有限公司 一种基于异质信息网络的推荐方法及装置
CN112905876A (zh) * 2020-03-16 2021-06-04 腾讯科技(深圳)有限公司 基于深度学习的信息推送方法、装置和计算机设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095464A1 (en) * 2017-09-25 2019-03-28 Equifax Inc. Dual deep learning architecture for machine-learning systems
CN110555743A (zh) * 2018-05-31 2019-12-10 阿里巴巴集团控股有限公司 商品对象推荐方法、装置及电子设备
US20210027146A1 (en) * 2018-10-23 2021-01-28 Tencent Technology (Shenzhen) Company Limited Method and apparatus for determining interest of user for information item
CN112905876A (zh) * 2020-03-16 2021-06-04 腾讯科技(深圳)有限公司 基于深度学习的信息推送方法、装置和计算机设备
CN112035746A (zh) * 2020-09-01 2020-12-04 湖南大学 一种基于时空序列图卷积网络的会话推荐方法
CN112541806A (zh) * 2020-12-18 2021-03-23 北京航天云路有限公司 一种基于异质信息网络的推荐方法及装置

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029357A (zh) * 2023-03-29 2023-04-28 荣耀终端有限公司 训练样本生成、模型训练、点击率评估方法及电子设备
CN116029357B (zh) * 2023-03-29 2023-08-15 荣耀终端有限公司 训练样本生成、模型训练、点击率评估方法及电子设备
CN116720003A (zh) * 2023-08-08 2023-09-08 腾讯科技(深圳)有限公司 排序处理方法、装置、计算机设备、及存储介质
CN116720003B (zh) * 2023-08-08 2023-11-10 腾讯科技(深圳)有限公司 排序处理方法、装置、计算机设备、及存储介质
CN116887001A (zh) * 2023-09-06 2023-10-13 四川中电启明星信息技术有限公司 融合社会属性信息的短视频推送方法、装置及电子设备
CN116887001B (zh) * 2023-09-06 2023-12-15 四川中电启明星信息技术有限公司 融合社会属性信息的短视频推送方法、装置及电子设备

Also Published As

Publication number Publication date
CN115774810A (zh) 2023-03-10
JP2023545896A (ja) 2023-11-01


Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2023513552; Country of ref document: JP; Kind code of ref document: A
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21956675; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE