CN111310050B

CN111310050B - Recommendation method based on multilayer attention

Info

Publication number: CN111310050B
Application number: CN202010123053.8A
Authority: CN
Inventors: 何晓明; 王娜
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2023-04-18
Anticipated expiration: 2040-02-27
Also published as: WO2021169367A1; CN111310050A

Abstract

The invention discloses a recommendation method based on multi-layer attention. The method comprises: obtaining historical behaviors of a user to be recommended and generating a user behavior sequence from the historical behaviors; determining a recommendation score for each item in a preset item set based on the user behavior sequence and a trained recommendation network model; and determining, according to the recommendation scores, a recommended item for the user to be recommended and pushing the recommended item to that user. The method takes the user's historical behaviors as input and learns their contextual features through the recommendation network model in order to determine the recommended items, which can improve recommendation accuracy.

Description

A recommendation method based on multi-layer attention

Technical Field

The present invention relates to the field of artificial intelligence, and in particular to a recommendation method based on multi-layer attention.

Background Art

With the continuous development of mobile communication technology and the deepening of big data services, a wide range of Internet applications has emerged, and the resulting surge of information has made information overload an urgent problem. Examples include the hundreds of millions of videos on short-video platforms such as Douyin, Kuaishou, and Tencent Weishi, and the overwhelming volume of product data on e-commerce platforms such as Taobao, JD.com, and Amazon. However, the information available to Internet applications is multi-source, heterogeneous, unevenly distributed, and large in scale. For a recommendation strategy, data that appears rich is in fact subject to strong limitations: (1) classic collaborative filtering methods cannot exploit the deep features of users and items; (2) content-based recommendation methods require effective feature extraction, while traditional shallow models rely on hand-designed features whose effectiveness and scalability are limited, which restricts the performance of the recommendation algorithm; (3) users provide far less explicit feedback than implicit feedback, so recommendation based on explicit feedback applies to only a limited range of scenarios.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a recommendation method based on multi-layer attention that addresses the deficiencies of the prior art.

To solve the above technical problem, the present invention adopts the following technical solution:

A recommendation method based on multi-layer attention, the method comprising:

obtaining historical behaviors of a user to be recommended, and generating a user behavior sequence from the historical behaviors;

determining a recommendation score for each item in a preset item set based on the user behavior sequence and a trained recommendation network model;

determining, according to the recommendation scores, a recommended item for the user to be recommended, and pushing the recommended item to the user to be recommended.

In the recommendation method based on multi-layer attention, obtaining the historical behaviors of the user to be recommended and generating the user behavior sequence from the historical behaviors specifically includes:

obtaining the historical behaviors of the user to be recommended, wherein each historical behavior includes an item identifier and a behavior time;

sorting the historical behaviors by behavior time to obtain the user behavior sequence.

In the recommendation method based on multi-layer attention, determining the recommendation score for each item in the preset item set based on the user behavior sequence and the trained recommendation network model specifically includes:

for each item in the preset item set, obtaining the item vector corresponding to the item;

generating an item sequence based on the user behavior sequence and the item vector, and inputting the item sequence into the trained recommendation network model, so that the recommendation network model outputs the recommendation score corresponding to the item.

In the recommendation method based on multi-layer attention, the training process of the recommendation network model includes:

dividing a training user behavior sequence into a first behavior sequence and a second behavior sequence, wherein the second behavior sequence includes the last behavior record in the training user behavior sequence;

masking the first behavior sequence according to a preset masking strategy to obtain a masked first behavior sequence;

outputting, by a mask network model to be trained, a generated behavior sequence corresponding to the masked first behavior sequence;

training the recommendation network model to be trained based on the generated behavior sequence and the second behavior sequence to obtain the trained recommendation network model.

In the recommendation method based on multi-layer attention, the recommendation network model includes a multi-layer attention structure, and training the recommendation network model to be trained based on the generated behavior sequence and the second behavior sequence to obtain the trained recommendation network model specifically includes:

dividing the generated behavior sequence and the second behavior sequence each into several equal sub-vectors;

inputting the equally divided generated behavior sequence and second behavior sequence into the multi-layer attention structure, which outputs an attention score of the generated behavior sequence relative to the second behavior sequence;

correcting the network parameters of the recommendation network model to be trained based on the attention score to obtain the trained recommendation network model.

In the recommendation method based on multi-layer attention, the mask network model includes a multi-layer attention structure, and outputting, by the mask network model to be trained, the generated behavior sequence corresponding to the masked first behavior sequence specifically includes:

inputting the masked first behavior sequence into the multi-layer attention structure, which outputs the generated behavior sequence.

In the recommendation method based on multi-layer attention, the mask network model further includes a recognition structure, and after the mask network model to be trained outputs the generated behavior sequence corresponding to the masked first behavior sequence, the method further includes:

inputting the generated behavior sequence into the recognition structure, which outputs a generated item identifier;

correcting the network parameters of the mask network model to be trained based on the generated item identifier, so as to train the mask network model to be trained.

In the recommendation method based on multi-layer attention, correcting the network parameters of the recommendation network model to be trained based on the attention score to obtain the trained recommendation network model specifically includes:

correcting the network parameters of the recommendation network model to be trained based on the attention score and, once the network parameters of the recommendation network model satisfy a preset condition, inputting the first behavior sequence and the second behavior sequence into the recommendation network model whose network parameters satisfy the preset condition;

outputting, by the recommendation network model whose network parameters satisfy the preset condition, the attention score corresponding to the second behavior sequence;

correcting, based on that attention score, the network parameters of the recommendation network model whose network parameters satisfy the preset condition, to obtain the trained recommendation network model.

A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any of the multi-layer attention-based recommendation methods described above.

A terminal device, comprising: a processor, a memory, and a communication bus; the memory stores a computer-readable program executable by the processor;

the communication bus provides the connection and communication between the processor and the memory;

the processor, when executing the computer-readable program, implements the steps of any of the multi-layer attention-based recommendation methods described above.

Beneficial effects: compared with the prior art, the present invention provides a recommendation method based on multi-layer attention. The method obtains the historical behaviors of the user to be recommended and generates a user behavior sequence from them; determines a recommendation score for each item in a preset item set based on the user behavior sequence and the trained recommendation network model; and determines, according to the recommendation scores, the recommended item for the user to be recommended and pushes it to that user. The invention takes the user's historical behaviors as input and learns their contextual features through the recommendation network model in order to determine the recommended items, which can improve recommendation accuracy.

Brief Description of the Drawings

FIG. 1 is a flow chart of the multi-layer attention-based recommendation method provided by the present invention.

FIG. 2 is a schematic flow diagram of the multi-layer attention-based recommendation method provided by the present invention.

FIG. 3 is a schematic flow diagram of the training process of the recommendation network model to be trained in the multi-layer attention-based recommendation method provided by the present invention.

FIG. 4 is a schematic structural diagram of the terminal device provided by the present invention.

Detailed Description

The present invention provides a recommendation method based on multi-layer attention. To make the purpose, technical solution, and effects of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural. It should be further understood that the term "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to that element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes any one of, and all combinations of, one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be read as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as they are here, are not to be interpreted in an idealized or overly formal sense.

This embodiment provides a recommendation method based on multi-layer attention. The method can be applied to an electronic device, which may take various forms, for example a mobile phone, a tablet computer, a palmtop computer, or a personal digital assistant (PDA). The functions of the method can be implemented by a processor in the electronic device calling program code, and the program code can be stored in a computer storage medium; the electronic device therefore includes at least a processor and a storage medium.

As shown in FIG. 1 and FIG. 2, this embodiment provides a recommendation method based on multi-layer attention, and the method may include the following steps:

S10: Obtain the historical behaviors of the user to be recommended, and generate a user behavior sequence from the historical behaviors.

Specifically, the historical behaviors of the user to be recommended may be the records, kept by an Internet application (for example, the Taobao application or the Tencent Video application), of all the behaviors that user has performed in the application. Because Internet applications target different markets, the user behaviors they record may also differ: a shopping website records each user's purchases, while a video website records each user's viewing history. In every Internet application, however, each user (user_id) can have multiple behavior records (event), and each behavior record consists of an item identifier (item_id) and a behavior time (timestamp). Note that all the historical behaviors obtained for the user to be recommended are of the same behavior type, for example all purchases or all video views.

Further, since every historical behavior includes a behavior time, once all historical behaviors have been obtained they can be sorted by behavior time to obtain the user behavior sequence. The behavior time of every historical behavior in the user behavior sequence is earlier than the moment at which the historical behaviors of the user to be recommended are obtained, which may be the moment the user logs in to the Internet application. Accordingly, in one implementation of this embodiment, obtaining the historical behaviors of the user to be recommended and generating the user behavior sequence from them specifically includes:

S11: Obtain the historical behaviors of the user to be recommended, wherein each historical behavior includes an item identifier and a behavior time.

S12: Sort the historical behaviors by behavior time to obtain the user behavior sequence.

Specifically, the user to be recommended may have one or more historical behaviors, so after they are obtained they can be organized as [user_id: event¹→event²→event³......→eventⁿ], where user_id is the user identifier of the user to be recommended, event is a behavior record, and n is the number of historical behaviors. Since each behavior record includes an item identifier and a behavior time, the historical behaviors can be arranged by behavior time to obtain the user behavior sequence of the user to be recommended, which can be expressed as [user_id: item_id¹→item_id²→item_id³......→item_idⁿ], where item_id is an item identifier and n is the number of historical behaviors.
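Steps S11 and S12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` type, the `build_behavior_sequence` function, the integer timestamps, and the `cutoff` parameter (standing in for the moment the behaviors are obtained, such as login time) are hypothetical names that do not appear in the source.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One behavior record: an item identifier plus a behavior time."""
    item_id: str
    timestamp: int  # assumption: behavior time stored as an integer


def build_behavior_sequence(events: List[Event], cutoff: int) -> List[str]:
    """Return item_ids ordered by behavior time, keeping only events before cutoff."""
    past = [e for e in events if e.timestamp < cutoff]
    past.sort(key=lambda e: e.timestamp)  # S12: sort by behavior time
    return [e.item_id for e in past]


# Hypothetical records for a single user_id, deliberately out of order.
events = [
    Event("item_c", 30),
    Event("item_a", 10),
    Event("item_b", 20),
    Event("item_d", 99),  # later than the cutoff, so it is excluded
]
sequence = build_behavior_sequence(events, cutoff=50)
print(sequence)  # ['item_a', 'item_b', 'item_c']
```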

S20: Determine a recommendation score for each item in a preset item set based on the user behavior sequence and the trained recommendation network model.

Specifically, the recommendation network model is trained in advance and is used to determine, from the user behavior sequence formed by the historical behaviors of the user to be recommended, that user's recommendation score for a candidate item. In other words, the inputs of the recommendation network model are the user behavior sequence and a candidate item, and its output is the score of that candidate item. The preset item set includes all items contained in the Internet application to which the recommendation method is applied; for example, if the Internet application is Taobao, the preset item set is all goods sold on Taobao. Thus, in one implementation of this embodiment, determining the recommendation score for each item in the preset item set based on the user behavior sequence and the trained recommendation network model is specifically: for each item in the preset item set, obtaining the item vector corresponding to the item; generating an item sequence based on the user behavior sequence and the item vector; and inputting the item sequence into the trained recommendation network model, so that the recommendation network model outputs the recommendation score corresponding to the item.
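The per-item scoring loop of step S20 might look like the sketch below. Two things here are assumptions rather than details from the source: the item sequence is formed by appending the candidate's item vector to the vectors of the user's history, and `toy_model` is a placeholder scorer (a dot product with the mean history vector) standing in for the trained recommendation network model. All names are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


def score_all_items(
    history_vectors: List[Vector],
    item_vectors: Dict[str, Vector],
    model: Callable[[List[Vector]], float],
) -> Dict[str, float]:
    """Score every item in the preset item set against one user's history."""
    scores = {}
    for item_id, item_vector in item_vectors.items():
        # Assumption: the item sequence is the history followed by the candidate.
        item_sequence = history_vectors + [item_vector]
        scores[item_id] = model(item_sequence)
    return scores


def toy_model(item_sequence: List[Vector]) -> float:
    """Placeholder scorer, NOT the trained network: mean history vector dot candidate."""
    *history, candidate = item_sequence
    dim = len(candidate)
    mean = [sum(v[d] for v in history) / len(history) for d in range(dim)]
    return sum(m * c for m, c in zip(mean, candidate))


# Hypothetical two-dimensional item vectors for a three-item preset item set.
item_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
history = [item_vectors["a"], item_vectors["a"]]
scores = score_all_items(history, item_vectors, toy_model)
```

With a real model, the items with the highest scores would then be the candidates to push to the user.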

In one implementation of this embodiment, as shown in FIG. 3, the training process of the recommendation network model includes:

H10: Divide the training user behavior sequence into a first behavior sequence and a second behavior sequence, wherein the second behavior sequence includes the last behavior record in the training user behavior sequence.

H20: Mask the first behavior sequence according to a preset masking strategy to obtain a masked first behavior sequence.

H30: Output, by the mask network model to be trained, a generated behavior sequence corresponding to the masked first behavior sequence.

H40: Train the recommendation network model to be trained based on the generated behavior sequence and the second behavior sequence to obtain the trained recommendation network model.

Specifically, in step H10, the training user behavior sequence belongs to a preset training sample set. The training sample set includes several training user behavior sequences, and all of them correspond to the same behavior category, for example all shopping or all video watching. Each training user behavior sequence includes the item identifiers of multiple behavior records, arranged in chronological order of behavior time; for details, refer to the process of obtaining the user behavior sequence of the user to be recommended, which is not repeated here. Note that the first behavior sequence includes all behavior records in the training user behavior sequence except the last one, and the second behavior sequence includes the last behavior record in the training user behavior sequence.
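The split in step H10 is small enough to show directly. The function name `split_sequence` and the requirement of at least two records are assumptions made for this sketch.

```python
from typing import List, Tuple


def split_sequence(sequence: List[str]) -> Tuple[List[str], List[str]]:
    """Split into (all records except the last, the last record)."""
    if len(sequence) < 2:
        # Assumption: a sequence needs at least two records to be split usefully.
        raise ValueError("need at least two behavior records")
    return sequence[:-1], sequence[-1:]


first, second = split_sequence(["item_1", "item_2", "item_3", "item_4", "item_5"])
print(first)   # ['item_1', 'item_2', 'item_3', 'item_4']
print(second)  # ['item_5']
```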

Further, in one implementation of this embodiment, before the recommendation network model to be trained, Nxet_item, is trained on the preset training sample set, the training sample set can be expanded to increase the diversity of its training data and thereby improve the accuracy of the recommendation network model trained on it. The expansion may include: for each training user behavior sequence in the training sample set, obtaining the last behavior record in the sequence and replacing its item identifier with a randomly selected item identifier that differs from the original, which yields the negative sample corresponding to that training user behavior sequence. The random item identifier is selected from a preset item library, and the behavior formed from the random item identifier and the behavior record has the same behavior type as the last behavior record. For example, if the last behavior record is a purchase, then the behavior formed from the random item identifier is also a purchase.

For example, if the training user behavior sequence is item_id¹→item_id²→item_id³→item_id⁴→item_id⁵ and this sequence is taken as the positive sample, then the negative sample generated from it can be item_id¹→item_id²→item_id³→item_id⁴→rand_item, where rand_item is the randomly selected item identifier.
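The negative-sample construction described above can be sketched as follows. The function name `make_negative_sample`, the `item_pool` list standing in for the preset item library, and the explicit random generator are hypothetical choices for illustration.

```python
import random
from typing import List


def make_negative_sample(
    sequence: List[str], item_pool: List[str], rng: random.Random
) -> List[str]:
    """Copy the sequence and swap its last item for a different random item."""
    last = sequence[-1]
    candidates = [item for item in item_pool if item != last]
    if not candidates:
        raise ValueError("item pool has no item different from the last one")
    return sequence[:-1] + [rng.choice(candidates)]


item_pool = ["item_1", "item_2", "item_3", "item_4", "item_5", "item_6", "item_7"]
positive = ["item_1", "item_2", "item_3", "item_4", "item_5"]
negative = make_negative_sample(positive, item_pool, random.Random(0))
```

The negative sample shares every record with the positive sample except the last, so a model trained on both has to rely on the final item to tell them apart.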

Further, in step H20, the preset masking strategy is set in advance and determines which item identifiers in the training user behavior sequence are to be masked. In this embodiment, the masking strategy randomly selects item identifiers from the training user behavior sequence with a first preset probability and then processes each selected item identifier according to a masking rule. The first preset probability may be, for example, 15%. Under the masking rule, a selected item identifier is replaced with the "[mask]" token with a second preset probability, replaced with another randomly chosen item identifier with a third preset probability, and left unchanged with a fourth preset probability, where the second, third, and fourth preset probabilities sum to 1. For example, the second preset probability is 80%, the third is 10%, and the fourth is 10%.

For example, let the training user behavior sequence be item_id¹→item_id²→item_id³→item_id⁴→item_id⁵, and suppose the masking strategy selects item_id². The masking rule is then applied to the selected position: with 80% probability item_id² is replaced with the "[mask]" token, giving the masked sequence item_id¹→[mask]→item_id³→item_id⁴→item_id⁵; with 10% probability item_id² is kept unchanged, giving item_id¹→item_id²→item_id³→item_id⁴→item_id⁵; and with 10% probability item_id² is replaced with a randomly selected item identifier, giving item_id¹→rand_item→item_id³→item_id⁴→item_id⁵. It follows that any given item identifier in a training user behavior sequence is randomly replaced with a probability of only 1.5% (that is, 10% of 15%), so the distribution of user interests contained in the user behavior sequence is not affected.
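A minimal sketch of this masking strategy is given below, using the example probabilities from the text (15% selection, then 80% / 10% / 10%). The function name `mask_sequence`, the `item_pool` argument, and the returned `labels` dictionary (the original identifiers at selected positions, kept as a possible reconstruction target) are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

MASK = "[mask]"


def mask_sequence(
    sequence: List[str],
    item_pool: List[str],
    rng: random.Random,
    select_p: float = 0.15,  # first preset probability
    mask_p: float = 0.8,     # second preset probability: replace with [mask]
    random_p: float = 0.1,   # third preset probability: replace with another item
) -> Tuple[List[str], Dict[int, str]]:
    """Return the masked sequence and the original items at selected positions."""
    masked = list(sequence)
    labels: Dict[int, str] = {}
    for pos, item in enumerate(sequence):
        if rng.random() >= select_p:
            continue  # position not selected
        labels[pos] = item
        roll = rng.random()
        if roll < mask_p:
            masked[pos] = MASK
        elif roll < mask_p + random_p:
            masked[pos] = rng.choice([i for i in item_pool if i != item])
        # Otherwise (fourth preset probability) the item is left unchanged.
    return masked, labels


item_pool = ["item_1", "item_2", "item_3", "item_4", "item_5", "item_6"]
sequence = ["item_1", "item_2", "item_3", "item_4", "item_5"]
masked, labels = mask_sequence(sequence, item_pool, random.Random(0))

# Chance that a given position is randomly replaced: 15% selected, then 10% swapped.
random_replace_p = 0.15 * 0.1
```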

Further, since each training user behavior sequence in the training sample set corresponds to one negative sample, after negative sampling and masking each training user behavior sequence yields two user behavior sequences: item_id¹→[mask]→item_id³→...→item_idⁿ⁻¹→item_idⁿ, obtained by masking the positive sample, and item_id¹→item_id²→[mask]→...→item_idⁿ⁻¹→rand_item, obtained by masking the negative sample. In other words, the user behavior sequence of each user user_id in the training sample set corresponds to two user behavior sequences after negative sampling and masking, which can be expressed as:

进一步，在所述步骤H30中，所述待训练的掩码网络模型Masked_lm用于将掩码后用户行为序列通过自注意分数重建，使得重建之后的生成行为序列包含语义特征。可以理解的是，通过待训练的掩码网络模型可以生成掩码后的第一行为序列对应的生成行为序列。此外，为了可以将掩码后的第一行为序列输入至待训练的掩码网络模型，需要将每个物品标识用一个向量来标识，以将第一行为序列以及第二行为序列均转换为行为向量，其中，第一行为序列可以转换为第一行为向量，第二行为序列可以转为第二行为序列。Further, in the step H30, the masked network model Masked_lm to be trained is used to reconstruct the masked user behavior sequence through the self-attention score, so that the generated behavior sequence after reconstruction contains semantic features. It can be understood that the generated behavior sequence corresponding to the masked first behavior sequence can be generated by the masked network model to be trained. In addition, in order to input the masked first behavior sequence into the masked network model to be trained, each item identification needs to be identified with a vector to convert the first behavior sequence and the second behavior sequence into behavior vectors, wherein the first behavior sequence can be converted into a first behavior vector, and the second behavior sequence can be converted into a second behavior sequence.

In one implementation of this embodiment, each item identifier is converted into a continuous dense vector of fixed length, and the dense vectors of all item identifiers have the same length. The conversion may proceed as follows: for each item identifier in the training user behavior sequence, the one-hot encoding vector of the item identifier is multiplied by a preset dense initialization matrix; the product is one row of the initialization matrix, and this row is taken as the continuous dense vector corresponding to the item identifier. In this way, every item identifier is mapped to a dense vector of fixed length. The one-hot encoding vector of an item identifier is obtained by encoding the position of the item identifier in the corresponding training user behavior sequence: the position of the item identifier is represented by 1, and the remaining positions are represented by 0.

For example, assume that the one-hot encoding vector corresponding to the item identifier is [0 0 0 1 0], and that the preset dense initialization matrix is:

Then the continuous dense vector corresponding to the item identifier is:

It is worth noting that, during training of the untrained recommendation network model, the preset dense initialization matrix is updated continuously as gradient descent iterates over the network parameters, so that the actual meaning carried by an item is gradually encoded in the continuous dense vector to which the item is mapped.
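The lookup performed by the one-hot product can be illustrated with a small sketch. The matrix values below are hypothetical placeholders (the matrix of the example is not reproduced in this text), and the helper names `one_hot` and `vec_mat` are chosen only for illustration.

```python
def one_hot(index, size):
    """One-hot vector with a 1 at `index` and 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

# Hypothetical 5 x 3 dense initialization matrix: one row per position.
embedding = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]

# [0 0 0 1 0] selects the fourth row of the matrix.
dense = vec_mat(one_hot(3, 5), embedding)
```

Because every other entry of the one-hot vector is 0, the product simply copies one row, which is why the multiplication can be replaced by a direct row lookup in practice.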

Further, in one implementation of this embodiment, the mask network model to be trained comprises a multi-layer attention structure, which reconstructs the masked first behavior sequence using self-attention scores to obtain the generated behavior sequence corresponding to the first behavior sequence. Accordingly, outputting the generated behavior sequence corresponding to the masked first behavior sequence based on the mask network model to be trained specifically comprises:

H30a. Input the masked first behavior sequence into the multi-layer attention structure, and output the generated behavior sequence through the multi-layer attention structure.

Specifically, the multi-layer attention structure adopts a multi-head attention mechanism, which computes attention in multiple semantic spaces; using multiple semantic spaces helps to improve the expressive capability of the mask network model. The first behavior sequence is converted in advance into the first behavior vector by the conversion process described above. In addition, the multi-layer attention structure can be expressed as:

where h₀ is the first behavior vector corresponding to the masked first behavior sequence, and h₃ is the generated behavior vector output by the multi-layer attention structure; the generated behavior vector is the generated behavior sequence output by the multi-layer attention structure.

Further, in one implementation of this embodiment, inputting the masked first behavior sequence into the multi-layer attention structure and outputting the generated behavior sequence through the multi-layer attention structure may specifically comprise the following three steps:

Step 1: Divide the first behavior vector equally into h parts. That is, each item vector encoding w_i in the first behavior sequence is divided into w_i = {w_i1, w_i2, ..., w_ih}. For a behavior sequence containing N items, h₀ ∈ R^(N×K) becomes, after the equal division, h₀ ∈ R^(hN×K/h), where N is the number of items, K is the length of the dense vector, and h is the number of equal parts.
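The equal division of one item vector can be sketched as below. The function name `split_heads` and the sample values are hypothetical; the sketch assumes that K is divisible by h.

```python
def split_heads(vector, h):
    """Split one item vector of length K into h sub-vectors of length K // h."""
    k = len(vector)
    if k % h != 0:
        raise ValueError("vector length must be divisible by h")
    step = k // h
    return [vector[i * step:(i + 1) * step] for i in range(h)]

w_i = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical item vector, K = 8
parts = split_heads(w_i, 4)      # h = 4 sub-vectors w_i1 .. w_i4, each of length 2
```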

Step 2: The sub-vectors obtained by the division are mapped into different semantic spaces by the multi-layer attention structure, and each semantic space can learn a different aspect of the actual meaning of the items. For example, a wool sweater and a wool blanket may be highly similar in a first semantic space but only weakly similar in a second semantic space, where the first semantic space represents the material of the items and the second represents their purpose. In this embodiment, the attention scores of the semantic spaces are computed in parallel, so that dividing into semantic spaces adds no computation compared with not dividing. The mapping performed by the multi-layer attention structure is given by formula (1), and the computation of the self-attention score is given by formula (2); formulas (1) and (2) are as follows:

$Q = QW_i^Q,\quad K = KW_i^K,\quad V = VW_i^V \qquad (1)$

where W_i^Q denotes the mapping matrix of the query vector, W_i^K denotes the mapping matrix of the key vector, and W_i^V denotes the mapping matrix of the value vector; K denotes the size of the first behavior vector, h denotes the number of semantic spaces, and K/h indicates that the first behavior vector is divided equally into h parts, that is, into h semantic spaces. Q, K and V all denote the first behavior vector, and d_k denotes the dimension of a vector in a semantic space.
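Formula (2) is referenced above but does not appear in this text. Assuming it is the standard scaled dot-product attention, which is consistent with the role given to d_k, it would read as follows. This is an assumption for the reader's orientation, not a reproduction of the original formula; note also that K inside the formula is the key matrix, not the vector size.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \qquad (2)
```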

Step 3: After the attention scores in the h semantic spaces have been computed, the outputs of the h semantic spaces are concatenated, as follows:

$\mathrm{MultiHead} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^0, \quad \text{where } \mathrm{head}_i = \mathrm{Attention}(Q, K, V)$

where W⁰ denotes the mapping parameter, head₁, ..., head_h each correspond to one semantic space, and h is the number of semantic spaces.
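The three steps can be put together in a compact sketch. It assumes scaled dot-product attention inside each semantic space and per-head mapping matrices of shape K × K/h; the function names, the slice-selecting matrices and the identity output mapping are hypothetical choices made so that the result is easy to inspect.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention within one semantic space (assumed form)."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)

def multi_head(x, w_q, w_k, w_v, w_out):
    """x: N x K; w_q, w_k, w_v: h matrices of shape K x K/h; w_out: K x K."""
    heads = [
        attention(matmul(x, w_q[i]), matmul(x, w_k[i]), matmul(x, w_v[i]))
        for i in range(len(w_q))
    ]
    # Concatenate the h head outputs item by item, then apply the output mapping.
    concat = [[value for head in heads for value in head[n]] for n in range(len(x))]
    return matmul(concat, w_out)

# Hypothetical data: N = 2 items, K = 4, h = 2 semantic spaces.
x = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
first_half = [[1, 0], [0, 1], [0, 0], [0, 0]]    # selects dimensions 0-1
second_half = [[0, 0], [0, 0], [1, 0], [0, 1]]   # selects dimensions 2-3
maps = [first_half, second_half]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = multi_head(x, maps, maps, maps, identity)
```

With the slice-selecting matrices used here, each head sees exactly one of the equal parts of every item vector, which mirrors the equal division of Step 1; learned mapping matrices would replace them in a trained model.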

For example, suppose the first behavior sequence is [a, b, c, d] and the positive sample is [e]. After masking, the first behavior sequence becomes [a, Mask, c, d]; vector conversion yields the first behavior vector, and dividing the first behavior vector equally into h parts yields h semantic spaces. Suppose the vectors in one of the semantic spaces are [E_a, E_mask, E_c, E_d]. These vectors are mapped to query vectors, key vectors and value vectors, where the query vectors are [Q_a, Q_mask, Q_c, Q_d], the key vectors are [K_a, K_mask, K_c, K_d], and the value vectors are [V_a, V_mask, V_c, V_d].

The attention score matrix is obtained from the query vectors and the key vectors, where Q_m denotes Q_mask and K_m denotes K_mask; the numbers in the matrix are the resulting attention scores.

The attention score matrix is then multiplied by the value vectors, giving the value vectors weighted by the attention scores. For example, after the multi-head attention mechanism, the vector E_a in this semantic space becomes [0.4*V_a+0.2*V_mask+0.2*V_c+0.2*V_d], E_mask becomes [0.3*V_a+0.3*V_mask+0.2*V_c+0.2*V_d], E_c becomes [0.1*V_a+0.3*V_mask+0.3*V_c+0.3*V_d], and E_d becomes [0.3*V_a+0.2*V_mask+0.3*V_c+0.2*V_d].
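The weighting in this example can be checked numerically. The sketch below applies the weights quoted in the example to hypothetical two-dimensional value vectors, and also shows how such weights could be computed from queries and keys, assuming the scaled dot-product form with a row-wise softmax; all vector values and function names are illustrative.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Row i holds the attention of query i over all keys (each row sums to 1)."""
    d_k = len(keys[0])
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys])
        for query in queries
    ]

def apply_weights(weights, values):
    """Weighted sum of the value vectors for every row of weights."""
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in weights
    ]

# Weights quoted in the example, rows and columns ordered a, mask, c, d.
quoted_weights = [
    [0.4, 0.2, 0.2, 0.2],
    [0.3, 0.3, 0.2, 0.2],
    [0.1, 0.3, 0.3, 0.3],
    [0.3, 0.2, 0.3, 0.2],
]
# Hypothetical 2-dimensional value vectors V_a, V_mask, V_c, V_d.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
outputs = apply_weights(quoted_weights, values)
# outputs[0] is 0.4*V_a + 0.2*V_mask + 0.2*V_c + 0.2*V_d, roughly [0.6, 0.4]

# Weights computed from hypothetical queries and keys rather than quoted.
computed = attention_weights([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Each row of the weight matrix sums to 1, so every output vector is a convex combination of the value vectors, including the value vector of the masked position.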

Further, in one implementation of this embodiment, in order to train the mask network model to be trained, after the generated behavior vector is output by the multi-layer attention structure, the generated encoding corresponding to the "[Mask]" tag or to the replaced item is looked up in the generated behavior vector, and the network parameters of the mask network model to be trained are corrected based on this generated encoding. Accordingly, the mask network model further comprises a recognition structure, and after outputting the generated behavior sequence corresponding to the masked first behavior sequence based on the mask network model to be trained, the method further comprises:

Specifically, the recognition structure may be a softmax layer. The softmax layer determines the probability that the generated encoding is the true encoding at the corresponding position of the training user behavior sequence, and the network parameters of the mask network model to be trained are corrected based on this probability, thereby training the mask network model. The softmax function corresponding to the softmax layer can be expressed as:

In the softmax function, M is the generated encoding corresponding to the "[Mask]" tag or to the replaced item in the generated behavior sequence, and the remaining term is the encoding matrix corresponding to all items in the training sample set.
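Since the softmax expression itself is not reproduced in this text, the following sketch assumes one common form: the generated encoding M is scored against every row of the item encoding matrix by a dot product, and the scores are normalized with softmax. The function name `masked_item_probabilities` and all numeric values are hypothetical.

```python
import math

def masked_item_probabilities(m, item_matrix):
    """Probability of each item given the generated encoding m (assumed form)."""
    logits = [sum(a * b for a, b in zip(m, row)) for row in item_matrix]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical encoding matrix: one row per item of the training sample set.
item_matrix = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
m = [0.9, 0.1]  # hypothetical generated encoding at the masked position

probs = masked_item_probabilities(m, item_matrix)
predicted = probs.index(max(probs))  # index of the most probable item
```

During training, the probability assigned to the true item at the masked position would drive the parameter correction, for example through a cross-entropy loss; that choice of loss is likewise an assumption.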

Further, in step H40, the recommendation network model comprises a multi-layer attention structure; training the recommendation network model to be trained based on the generated behavior sequence and the second behavior sequence to obtain the trained recommendation network model specifically comprises:

Specifically, before the second behavior sequence is divided equally into several sub-vectors, it must first be converted into a second behavior vector. This conversion is the same as the conversion of the first behavior sequence into the first behavior vector, to which reference is made. The generated behavior sequence is produced by the mask network model to be trained from the first behavior vector. The multi-layer attention structure is a left-to-right unidirectional attention model that predicts whether a user with a given behavior history will act on a new item. Taking an e-commerce website as an example, if a user has bought a wool coat, jeans, snow boots and a plush toy, then when this user visits the website we need to predict the next item she will buy, for example a new wool coat. In addition, it follows from the expansion of the preset training sample set that each training user behavior sequence in the set corresponds to one negative sample. The second behavior sequence may therefore be the last item identifier in the training user behavior sequence, or the last item identifier in the negative sample corresponding to the training user behavior sequence. Using both positive and negative samples as the second behavior sequence increases the diversity of the training samples of the recommendation network model to be trained, and thus improves the accuracy of the trained recommendation network model.

Further, the recommendation network model to be trained computes the attention score between the target item vector and the historical behavior vector, and learns the user's interest preferences from the user's historical behavior sequence. Its multi-layer attention structure is the same as the multi-layer attention structure in the mask network model to be trained, but the input items differ: the input items of this multi-layer attention structure are the second behavior sequence and the generated behavior sequence output by the mask network model to be trained, whereas the input item of the mask network model to be trained is the masked first behavior sequence. The processing of this multi-layer attention structure also comprises three steps. Step 1 and Step 3 are the same as Step 1 and Step 3 of the multi-layer attention structure in the mask network model to be trained and are not repeated here; only Step 2 is described.

Step 2 may proceed as follows: the attention score between the second behavior sequence and the generated behavior sequence is computed. Its inputs are Q = I_target, the vector of the second behavior sequence after equal division into k parts, and K = V = I_hist, the vector of the generated behavior sequence after equal division into k parts. The resulting attention score is the self-attention score of the second behavior sequence and the generated behavior sequence.
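A minimal sketch of this target-to-history attention within a single semantic space is given below. It assumes the scaled dot-product form; the function name `target_attention` and the vectors are hypothetical, and how the weights are reduced to a single scalar score is not specified in this passage, so the sketch returns both the weights and the attended vector.

```python
import math

def target_attention(target, history):
    """Attention of one target item (Q = I_target) over the history (K = V = I_hist)."""
    d_k = len(target)
    scores = [sum(t * x for t, x in zip(target, item)) / math.sqrt(d_k) for item in history]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attended vector: history items weighted by their relevance to the target.
    context = [sum(w * item[j] for w, item in zip(weights, history)) for j in range(d_k)]
    return weights, context

i_hist = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical generated behavior vectors
i_target = [1.0, 0.0]                           # hypothetical target item vector
weights, context = target_attention(i_target, i_hist)
```

History items that point in the same direction as the target receive larger weights, which is how the user's interest in the target item is read off the behavior history.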

Further, the recommendation network model to be trained is trained jointly with the mask network model to be trained, but is used on its own once training is complete. The mask network model to be trained takes training samples carrying mask tags as input, whereas the user behavior sequences obtained in actual use carry no mask tags. Therefore, to make the trained recommendation network model adapt to user behavior sequences without mask tags, once the network parameters of the recommendation network model to be trained meet a preset condition, user behavior sequences without mask tags may be used to further correct the network parameters of the recommendation network model whose network parameters meet the preset condition.

Accordingly, in one implementation of this embodiment, correcting the network parameters of the recommendation network model to be trained based on the attention score to obtain the trained recommendation network model specifically comprises: correcting the network parameters of the recommendation network model to be trained based on the attention score, and, when the network parameters of the recommendation network model meet the preset condition, inputting the first behavior sequence and the second behavior sequence into the recommendation network model whose network parameters meet the preset condition; outputting the attention score corresponding to the second behavior sequence through the recommendation network model whose network parameters meet the preset condition; and correcting, based on that attention score, the network parameters of the recommendation network model whose network parameters meet the preset condition, to obtain the trained recommendation network model.

S30. Determine the recommended item corresponding to the user to be recommended according to the recommendation scores, and push the recommended item to the user to be recommended.

Specifically, the recommendation score is the self-attention score between an item to be recommended and the user's historical behavior sequence. Each item to be recommended in the preset item set corresponds to one self-attention score, and this self-attention score is taken as the recommendation score of that item. After the recommendation scores of all items to be recommended have been obtained, they are compared to determine the recommended item corresponding to the user to be recommended. The recommended item may be the item with the highest recommendation score among all items to be recommended.
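The selection in step S30 amounts to ranking the items by score. A minimal sketch, with a hypothetical function name `recommend` and made-up scores:

```python
def recommend(scores, top_n=1):
    """Return the ids of the top_n items with the highest recommendation score."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in ranked[:top_n]]

# Hypothetical recommendation scores for a preset item set.
scores = {"wool_coat": 0.62, "jeans": 0.21, "snow_boots": 0.17}
best = recommend(scores)             # the single highest-scoring item
top_two = recommend(scores, top_n=2)
```

Returning a list also covers the case where more than one item is pushed to the user; with `top_n=1` it reduces to the highest-scoring item described above.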

Based on the above multi-layer attention-based recommendation method, this embodiment provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the multi-layer attention-based recommendation method described in the above embodiment.

Based on the above multi-layer attention-based recommendation method, the present invention also provides a terminal device which, as shown in FIG. 4, comprises at least one processor 20, a display screen 21 and a memory 22, and may further comprise a communications interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communications interface 23 communicate with one another through the bus 24. The display screen 21 is configured to display the user guidance interface preset in the initial setting mode. The communications interface 23 can transmit information. The processor 20 can call logic instructions in the memory 22 to execute the method of the above embodiment.

In addition, the logic instructions in the memory 22 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.

The memory 22, as a computer-readable storage medium, may be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. By running the software programs, instructions or modules stored in the memory 22, the processor 20 executes functional applications and data processing, that is, implements the method of the above embodiment.

The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the terminal device. In addition, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc; it may also be a transient storage medium.

In addition, the specific process by which the processor loads and executes the instructions in the above storage medium and terminal device has been described in detail in the method above and is not repeated here.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multi-layer attention-based recommendation method, the method comprising:

acquiring historical behaviors of a user to be recommended, and generating a user behavior sequence according to the historical behaviors;

determining a recommendation score corresponding to each article in a preset article set based on the user behavior sequence and the trained recommendation network model;

determining a recommended article corresponding to the user to be recommended according to the recommendation score, and pushing the recommended article to the user to be recommended;

wherein the recommendation network model comprises a multi-layer attention structure, and the training process of the recommendation network model comprises:

dividing the training user behavior sequence into a first behavior sequence and a second behavior sequence, wherein the second behavior sequence comprises the last behavior record in the training user behavior sequence;

performing mask processing on the first behavior sequence according to a preset mask strategy to obtain a masked first behavior sequence;

outputting, based on a mask network model to be trained, a generated behavior sequence corresponding to the masked first behavior sequence;

equally dividing the generated behavior sequence and the second behavior sequence into a plurality of sub-vectors respectively;

inputting the divided generated behavior sequence and the divided second behavior sequence into the multi-layer attention structure, and outputting, through the multi-layer attention structure, an attention score of the generated behavior sequence relative to the second behavior sequence;

correcting network parameters of the recommendation network model to be trained based on the attention score, and, when the network parameters of the recommendation network model meet a preset condition, inputting the first behavior sequence and the second behavior sequence into the recommendation network model whose network parameters meet the preset condition;

outputting the attention score corresponding to the second behavior sequence through the recommendation network model whose network parameters meet the preset condition;

and correcting, based on the attention score, the network parameters of the recommendation network model whose network parameters meet the preset condition, to obtain the trained recommendation network model.

2. The multi-layer attention-based recommendation method according to claim 1, wherein the obtaining of the historical behaviors of the user to be recommended and the generating of the user behavior sequence according to the historical behaviors specifically comprise:

acquiring historical behaviors of a user to be recommended, wherein each historical behavior comprises an article identifier and behavior time;

and sequencing the historical behaviors according to the behavior time to obtain a user behavior sequence.

3. The multi-layer attention-based recommendation method according to claim 1, wherein the determining of a recommendation score corresponding to each article in a preset article set based on the user behavior sequence and the trained recommendation network model specifically comprises:

for each article in a preset article set, acquiring an article vector corresponding to the article;

and generating an article sequence based on the user behavior sequence and the article vector, and inputting the article sequence into a trained recommendation network model so as to output a recommendation score corresponding to the article through the recommendation network model.

4. The multi-layer attention-based recommendation method according to claim 1, wherein the mask network model comprises a multi-layer attention structure; the outputting of the generated behavior sequence corresponding to the masked first behavior sequence based on the mask network model to be trained specifically comprises:

inputting the masked first behavior sequence into the multi-layer attention structure, and outputting the generated behavior sequence through the multi-layer attention structure.

5. The multi-layer attention-based recommendation method according to claim 4, wherein the mask network model further comprises a recognition structure, and after outputting the generated behavior sequence corresponding to the masked first behavior sequence based on the mask network model to be trained, the method further comprises:

inputting the generated behavior sequence into the recognition structure, and outputting a generated article identifier through the recognition structure;

and correcting the network parameters of the mask network model to be trained based on the generated article identifier so as to train the mask network model to be trained.

6. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the multi-layer attention-based recommendation method according to any one of claims 1-5.

7. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;

the communication bus realizes connection communication between the processor and the memory;

the processor, when executing the computer readable program, implements the steps of the multi-layer attention-based recommendation method according to any one of claims 1-5.