WO2022151649A1 - Deep interest network-based topic recommendation method and apparatus - Google Patents

Deep interest network-based topic recommendation method and apparatus

Info

Publication number
WO2022151649A1
WO2022151649A1 PCT/CN2021/099766 CN2021099766W WO2022151649A1 WO 2022151649 A1 WO2022151649 A1 WO 2022151649A1 CN 2021099766 W CN2021099766 W CN 2021099766W WO 2022151649 A1 WO2022151649 A1 WO 2022151649A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
item
topic
vector
deep interest
Application number
PCT/CN2021/099766
Other languages
French (fr)
Chinese (zh)
Inventor
刘志杰
陈鑫晶
蔡淇森
Original Assignee
稿定(厦门)科技有限公司
Application filed by 稿定(厦门)科技有限公司
Publication of WO2022151649A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • the invention relates to the technical field of deep learning, and in particular, to a topic recommendation method based on a deep learning network, a computer-readable storage medium, a computer device, and a topic recommendation device based on a deep interest network.
  • the user-portrait method is often used: first, the user's preference scores for different topics are computed according to rules; then topics with higher preference scores are displayed with priority to complete the topic recommendation. However, this method depends heavily on the tags assigned to each topic: to improve the accuracy of topic recommendation, considerable manpower and material resources must be spent on building high-quality tags.
  • an object of the present invention is to propose a topic recommendation method based on a deep interest network, which can accurately recommend topics to users without establishing labels corresponding to topics, thereby reducing the manpower and material resources consumed in the topic recommendation process.
  • a second object of the present invention is to provide a computer-readable storage medium.
  • the third object of the present invention is to propose a computer device.
  • the fourth object of the present invention is to provide a topic recommendation device based on a deep interest network.
  • the embodiment of the first aspect of the present invention proposes a topic recommendation method based on a deep interest network, including the following steps: acquiring user information and historical click data of users, and generating training data according to the user information and the historical click data; performing model training according to the training data to obtain a deep interest capture model; obtaining item information corresponding to an item, inputting the item information into the deep interest capture model so that the model outputs the corresponding item vector, and calculating the topic vector according to the item vector corresponding to each item; obtaining the user's click data to be analyzed, and inputting the to-be-analyzed click data into the deep interest capture model so that the model outputs a corresponding user vector; performing similarity retrieval according to the user vector and the topic vector, determining a topic recommendation list according to the retrieval result, and pushing the topic recommendation list to the user.
  • in the topic recommendation method based on the deep interest network according to the embodiment of the present invention, first, user information and historical click data of the user are obtained, and training data is generated according to the user information and the historical click data; then, model training is performed according to the training data to obtain a deep interest capture model; then, item information corresponding to the item is obtained and input into the deep interest capture model, so as to output a corresponding item vector through the deep interest capture model, and the topic vector is calculated according to the item vector corresponding to each item; then, the user's click data to be analyzed is obtained and input into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model; finally, similarity retrieval is performed according to the user vector and the topic vector, the topic recommendation list is determined according to the retrieval result, and the topic recommendation list is pushed to the user. In this way, topics can be accurately recommended to users without establishing labels corresponding to topics, and the manpower and material resources required in the topic recommendation process are reduced.
  • topic recommendation method based on the deep interest network proposed according to the above embodiments of the present invention may also have the following additional technical features:
  • the user's historical click data includes item information and time information corresponding to each historical click behavior of the user, and ranking information among various historical click behaviors.
  • the training data includes discrete features, continuous features, and sequence features; the discrete features include time information, user attribute information, and item classification information; the continuous features include statistics of the item classifications the user has historically clicked; and the sequence features include the item information sequence corresponding to the user's historical click behavior.
  • the training data further includes a sample time feature, and generating training data according to the user information and the historical click data includes: generating a training sample according to the user information and the historical click data, calculating the time difference between the training sample and the current time, and determining whether the time difference is greater than a preset time threshold, so as to use the determination result as the sample time feature.
  • generating training data according to the user information and the historical click data includes: counting the number of clicks corresponding to each item, determining the negative sample selection probability corresponding to each item according to the statistical result, and randomly selecting negative samples according to the negative sample selection probability corresponding to each item.
  • determining the topic recommendation list according to the retrieval result includes: clustering topics with the k-means clustering algorithm to generate multiple topic categories; generating a list of topics to be recommended according to the retrieval result; and breaking up the list of topics to be recommended according to the multiple topic categories and a sliding-window scattering method, so as to generate the final topic recommendation list.
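The clustering step can be illustrated with a minimal sketch, assuming topics are clustered on their topic vectors with plain Lloyd-style k-means iterations. The function names, the choice of the first k points as initial centroids, and the iteration cap are illustrative assumptions; the text above only names the k-means algorithm.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Return one cluster label per point (hypothetical helper, not from the patent)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: the first k points seed the centroids, which keeps the sketch deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

The resulting labels can serve as the topic categories consumed by the scattering step; a production system would more likely rely on an established clustering library.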
  • the embodiment of the second aspect of the present invention provides a computer-readable storage medium storing a topic recommendation program based on a deep interest network; when executed by a processor, the program implements the above-mentioned topic recommendation method based on a deep interest network.
  • by storing the topic recommendation program based on the deep interest network, the storage medium enables the processor, when executing the program, to implement the above-mentioned topic recommendation method based on the deep interest network, so that topics can be accurately recommended to users without establishing labels corresponding to topics, thereby reducing the manpower and material resources required in the process of topic recommendation.
  • a third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the above-mentioned topic recommendation method based on a deep interest network is implemented.
  • the deep interest network-based topic recommendation program is stored in the memory, so that when the processor executes the program, the above-mentioned topic recommendation method based on the deep interest network is implemented; topics can thus be accurately recommended to users without establishing labels corresponding to topics, thereby reducing the manpower and material resources required in the process of topic recommendation.
  • a fourth aspect of the present invention provides a topic recommendation device based on a deep interest network, including: an acquisition module, used to acquire user information and the user's historical click data, and to generate training data according to the user information and the historical click data; a training module, used to perform model training according to the training data to obtain a deep interest capture model; an interest capture module, used to obtain item information corresponding to an item, input the item information into the deep interest capture model so as to output the corresponding item vector through the deep interest capture model, and calculate the topic vector according to the item vector corresponding to each item; the interest capture module is also used to obtain the user's click data to be analyzed and input it into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model; and a recommendation module, configured to perform similarity retrieval according to the user vector and the topic vector, determine a topic recommendation list according to the retrieval result, and push the topic recommendation list to the user.
  • the acquisition module is configured to acquire user information and historical click data of the user, and generate training data according to the user information and the historical click data; the training module is used to perform model training according to the training data to obtain a deep interest capture model; the interest capture module is used to obtain the item information corresponding to the item and input it into the deep interest capture model, so as to output the corresponding item vector through the deep interest capture model, and to calculate the topic vector according to the item vector corresponding to each item; the interest capture module is also used to obtain the user's click data to be analyzed and input it into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model; the recommendation module is used to perform similarity retrieval according to the user vector and the topic vector, determine the topic recommendation list according to the retrieval result, and push the topic recommendation list to the user. Thus, topics can be accurately recommended to the user without establishing labels corresponding to topics, and the manpower and material resources required in the process of topic recommendation are reduced.
  • topic recommendation device based on the deep interest network proposed according to the above embodiments of the present invention may also have the following additional technical features:
  • the user's historical click data includes item information and time information corresponding to each historical click behavior of the user, and ranking information among various historical click behaviors.
  • FIG. 1 is a schematic flowchart of a topic recommendation method based on a deep interest network according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a deep interest capture model according to an embodiment of the present invention.
  • FIG. 3 is a schematic block diagram of a topic recommendation apparatus based on a deep interest network according to an embodiment of the present invention.
  • in the topic recommendation method based on the deep interest network, first, user information and the user's historical click data are obtained, and training data is generated according to the user information and the historical click data; then, model training is carried out according to the training data to obtain a deep interest capture model; then, the item information corresponding to the item is obtained and input into the deep interest capture model, so as to output the corresponding item vector through the deep interest capture model, and the topic vector is calculated according to the item vector corresponding to each item; then, the click data to be analyzed of the user is obtained and input into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model; finally, similarity retrieval is performed between the user vector and the topic vector, the topic recommendation list is determined according to the retrieval result, and the topic recommendation list is pushed to the user.
  • the topic recommendation method based on a deep interest network includes the following steps:
  • S101 Obtain user information and historical click data of the user, and generate training data according to the user information and historical click data.
  • the user's historical click data includes the user's exposure log and click behavior log, wherein the exposure log records whether an item was exposed to a user on a given day, and the click behavior log records information about the user's corresponding click behavior.
  • the user's historical click data includes item information corresponding to each historical click behavior of the user, time information, and ranking information among various historical click behaviors.
  • the user's historical click data contains only the information corresponding to the user's click behavior, not the exposure information. It should be noted that, in actual scenarios, pages have different depths and the click-through rate (CTR) varies greatly between pages of different depths, with deeper pages tending to have higher CTRs; therefore, to avoid the impact of CTR differences caused by page depth, the historical click data does not include exposure information.
  • the training data includes discrete features, continuous features, and sequence features; the discrete features include time information, user attribute information, and item classification information; the continuous features include statistics of the item classifications the user has historically clicked; and the sequence features include the item information sequence corresponding to the user's historical click behavior.
  • the discrete features include date information (for example, the day of the week, whether it is a working day, whether it is a working period, etc.), item ID, item classification ID, and item attribute ID, etc.
  • continuous features include the number of times the user has historically clicked on different attributes; such features can be input directly as continuous values without processing. Sequence features include the user's historical click item ID sequence and historical click category ID sequence.
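As an illustration of the three feature groups described above, the following minimal sketch builds one feature record from a user's click history. The record layout, the field names, and the rule that a working day means Monday to Friday (holidays ignored) are assumptions made for the example and are not specified in the text.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

Click = Tuple[str, str, datetime]  # (item_id, category_id, click_time), hypothetical layout


def build_features(history: List[Click], now: datetime) -> Dict[str, object]:
    """Assemble discrete, continuous, and sequence features for one user."""
    ordered = sorted(history, key=lambda click: click[2])  # oldest click first
    return {
        # Discrete date features derived from the request time.
        "day_of_week": now.weekday(),         # 0 = Monday
        "is_working_day": now.weekday() < 5,  # assumption: Mon-Fri, holidays ignored
        # Continuous features: historical click counts per category, used as raw values.
        "category_click_counts": dict(Counter(c for _, c, _ in ordered)),
        # Sequence features: item IDs and category IDs in click order.
        "item_id_sequence": [item for item, _, _ in ordered],
        "category_id_sequence": [c for _, c, _ in ordered],
    }
```

The ordering of clicks by time stands in for the ranking information among historical click behaviors.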
  • the training data further includes a sample time feature, and generating the training data according to the user information and the historical click data includes: generating a training sample according to the user information and the historical click data, calculating the time difference between the training sample and the current time, and judging whether the time difference is greater than a preset time threshold, so as to use the judgment result as the sample time feature.
  • the training data contains time information (the contextual features include the day of the week, whether it is a working day, etc.). During model training, if the training samples are fully shuffled in time, the training process becomes unstable: because of the time factor, individual samples can strongly influence the model, and the closer a sample is to the test date, the greater its effect. The sample time feature is therefore added to keep the model training process stable.
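The sample time feature reduces to a single comparison, sketched below. The function name and the 0/1 encoding of the judgment result are assumptions; the text only states that the result of the threshold comparison is used as the feature.

```python
from datetime import datetime, timedelta


def sample_time_feature(sample_time: datetime, now: datetime, threshold: timedelta) -> int:
    """Return 1 if the sample is older than the preset threshold, otherwise 0."""
    return int(now - sample_time > threshold)
```

For example, with a seven-day threshold a sample generated nine days ago is flagged with 1, while one generated two days ago is flagged with 0.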
  • training data is generated based on user information and historical click data, including: counting the number of clicks corresponding to each item, determining the negative sample selection probability corresponding to each item according to the statistical result, and randomly selecting negative samples according to the negative sample selection probability corresponding to each item.
  • there are many ways to select negative samples, for example, directly selecting a preset number of negative samples from the positive samples in a random manner; preferably, the number of clicks corresponding to each item is counted in the above-mentioned manner to determine the probability that the item is selected as a negative sample, so that more popular items have a higher probability of being selected as negative samples, which makes the final trained model more accurate.
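A minimal sketch of popularity-weighted negative sampling follows. It assumes the selection probability is directly proportional to an item's click count and that sampling is done with replacement while excluding the positive items; the text states only that the probability is determined from the click statistics, so both choices are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Set


def negative_sampling_probs(clicked_items: List[str]) -> Dict[str, float]:
    """Map each item to a selection probability proportional to its click count."""
    counts = Counter(clicked_items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def sample_negatives(
    probs: Dict[str, float], exclude: Set[str], n: int, rng: random.Random
) -> List[str]:
    """Draw n negatives (with replacement), skipping items in `exclude`."""
    candidates = [item for item in probs if item not in exclude]
    if not candidates:
        return []
    weights = [probs[item] for item in candidates]
    return rng.choices(candidates, weights=weights, k=n)
```

Passing a seeded `random.Random` instance keeps the draw reproducible during experiments.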
  • to avoid the excessive computation of the softmax function, the sequence feature is cast as a binary classification problem: the label is 1 when the input item ID is the item the user clicks next, and 0 otherwise.
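The binary labelling can be sketched as follows, assuming each prefix of the click sequence is paired with the next clicked item (label 1) and with a set of negative candidates (label 0). The sample layout and function name are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[List[str], str, int]  # (history item IDs, candidate item ID, label)


def build_binary_samples(click_sequence: List[str], negative_candidates: List[str]) -> List[Sample]:
    """Turn one click sequence into next-click binary classification samples."""
    samples: List[Sample] = []
    for t in range(1, len(click_sequence)):
        history = click_sequence[:t]
        next_item = click_sequence[t]
        samples.append((history, next_item, 1))  # the item actually clicked next
        for candidate in negative_candidates:
            if candidate != next_item:
                samples.append((history, candidate, 0))  # any other item
    return samples
```

Each sample then needs only a single sigmoid output, rather than a softmax over the full item vocabulary.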
  • the sequence feature structure is shown in Table 1:
  • FIG. 2 is a schematic structural diagram of a deep interest capture model according to an embodiment of the present invention.
  • the sequence features, discrete features, and continuous features are concatenated and passed through a BatchNormalization layer, then input to multiple fully connected layers; each fully connected layer is followed by a BatchNormalization layer and the Dice activation function, and the user vector is finally obtained.
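The Dice activation named above can be sketched for a single unit over one mini-batch, assuming it follows its usual definition in the Deep Interest Network literature: a sigmoid gate on the batch-standardized input blends the identity with a scaled identity. Here `alpha` is a fixed constant for simplicity (it is normally a learned parameter), and the batch statistics are computed on the fly; both are assumptions of this sketch.

```python
import math
from typing import List


def dice(batch: List[float], alpha: float = 0.25, eps: float = 1e-8) -> List[float]:
    """Apply Dice to one unit's pre-activations across a mini-batch."""
    mean = sum(batch) / len(batch)
    var = sum((s - mean) ** 2 for s in batch) / len(batch)
    out = []
    for s in batch:
        # Gate p(s): sigmoid of the standardized input.
        p = 1.0 / (1.0 + math.exp(-(s - mean) / math.sqrt(var + eps)))
        # Blend: p * s + (1 - p) * alpha * s.
        out.append(p * s + (1.0 - p) * alpha * s)
    return out
```

Because the gate is centered on the batch mean, the rectification point adapts to the input distribution instead of being fixed at zero.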
  • the Adagrad optimizer is used, the initial learning rate is 0.1, the learning rate decays to 1/2 of the original value every 50,000 steps, and the Batch size is 128.
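The learning-rate schedule stated above can be written as a one-line function. This sketch assumes a staircase schedule, in which the rate is halved at exact multiples of 50,000 steps rather than decayed continuously; the function and parameter names are illustrative.

```python
def learning_rate(
    step: int, initial: float = 0.1, decay_steps: int = 50_000, factor: float = 0.5
) -> float:
    """Staircase decay: multiply the rate by `factor` once every `decay_steps` steps."""
    return initial * factor ** (step // decay_steps)
```

Under this assumption the rate stays at 0.1 for the first 50,000 steps, then drops to 0.05, then to 0.025, and so on.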
  • L2 regularization is applied to the Embedding layer and the DNN layer, and the regularization loss is added to the loss function for optimization.
  • S103 Obtain item information corresponding to the item, and input the item information into the deep interest capture model, so as to output the corresponding item vector through the deep interest capture model, and calculate the topic vector according to the item vector corresponding to each item.
  • each topic will contain a different number of items, and the topic is a collection of items of the same category; for example, if the topic is sports, the items corresponding to the topic may include: football, basketball, swimming, etc.
  • there can be various ways to calculate the topic vector according to the item vector corresponding to each item; for example, after the item vectors are obtained, average pooling is performed on the item vectors of all items under the topic, and the pooling result is used as the topic vector.
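Average pooling over a topic's item vectors can be sketched in a few lines. The function name is hypothetical, and the vectors are assumed to share the same dimension.

```python
from typing import List


def topic_vector(item_vectors: List[List[float]]) -> List[float]:
    """Average-pool the item vectors of one topic into a single topic vector."""
    if not item_vectors:
        raise ValueError("a topic must contain at least one item")
    n = len(item_vectors)
    # zip(*...) groups the values of each dimension across all items.
    return [sum(dim) / n for dim in zip(*item_vectors)]
```

Since the topic vector lives in the same space as the item vectors, it can be compared directly with a user vector produced by the same model.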
  • S104 Acquire click data of the user to be analyzed, and input the click data to be analyzed into a deep interest capture model, so as to output a corresponding user vector through the deep interest capture model.
  • S105 Perform similarity retrieval according to the user vector and the topic vector, determine the topic recommendation list according to the retrieval result, and push the topic recommendation list to the user.
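A minimal sketch of the retrieval step follows, assuming cosine similarity and an exhaustive scan over all topics. The text does not name the similarity measure or the search structure, so both are assumptions, as are the function names; a large catalogue would typically use an approximate nearest-neighbor index instead.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def retrieve_topics(
    user_vector: List[float], topic_vectors: Dict[str, List[float]], top_k: int
) -> List[str]:
    """Return the IDs of the top_k topics most similar to the user vector."""
    ranked = sorted(
        topic_vectors,
        key=lambda topic_id: cosine(user_vector, topic_vectors[topic_id]),
        reverse=True,
    )
    return ranked[:top_k]
```

The ranked list returned here corresponds to the list of topics to be recommended, before any category-based scattering is applied.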
  • determining the topic recommendation list according to the retrieval result includes: clustering topics with the k-means clustering algorithm to generate multiple topic categories; generating a list of topics to be recommended according to the retrieval result; and breaking up the list of topics to be recommended with a sliding-window scattering method to generate the final topic recommendation list.
  • the category sequence of the topic is A
  • if the size of the sliding window is 3, the topic categories placed in any three adjacent positions must not repeat.
  • the categories in the first sliding window are A, A, A, with position indices starting from 0. The entries at positions 1 and 2 need to be broken up, so the list is traversed from position 3 toward the end.
  • the first different category is B, so the A at position 1 and the B at position 3 are exchanged, and the topic category sequence becomes A
  • the categories in the first sliding window become A, B, A, so position 2 needs to be processed; traversing from position 4 toward the end, the first different category is the C at position 4,
  • so position 2 and position 4 are exchanged, and the topic category ID sequence becomes A
  • the category sequence in the second sliding window is B, C, A, so no processing is required and the window continues to slide forward; the third window is C, A, A, so the A at position 4 needs to be processed and is exchanged with position 5, and the topic category ID sequence becomes A
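The swap-based scattering walked through above can be sketched as a greedy pass over the list. This is an illustrative variant rather than the exact procedure of the text: it scans forward from the position immediately after the conflicting entry, and it leaves an entry in place when no suitable replacement exists. The `(topic_id, category)` layout and the function name are assumptions.

```python
from typing import List, Tuple

Topic = Tuple[str, str]  # (topic_id, category), hypothetical layout


def scatter(topics: List[Topic], window: int = 3) -> List[Topic]:
    """Greedily swap entries so that categories within a window do not repeat."""
    result = list(topics)
    for i in range(1, len(result)):
        # Categories of the previous (window - 1) positions.
        recent = {category for _, category in result[max(0, i - window + 1):i]}
        if result[i][1] not in recent:
            continue  # no conflict at this position
        # Look ahead for the first entry whose category is not in the window.
        for j in range(i + 1, len(result)):
            if result[j][1] not in recent:
                result[i], result[j] = result[j], result[i]
                break
    return result
```

For a hypothetical input with categories A, A, A, B, C, B (not the sequence used in the text, which is not fully reproduced here), the pass yields A, B, C, A, B, A; the final window still repeats A because no other category remains to swap in.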
  • in the topic recommendation method based on the deep interest network according to the embodiment of the present invention, first, user information and historical click data of the user are obtained, and training data is generated according to the user information and the historical click data; then, model training is performed according to the training data to obtain a deep interest capture model; then, item information corresponding to the item is obtained and input into the deep interest capture model, so as to output the corresponding item vector through the deep interest capture model, and the topic vector is calculated according to the item vector corresponding to each item; then, the click data to be analyzed of the user is obtained and input into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model; finally, similarity retrieval is performed according to the user vector and the topic vector, the topic recommendation list is determined according to the retrieval result, and the topic recommendation list is pushed to the user.
  • the embodiments of the present invention provide a computer-readable storage medium on which a topic recommendation program based on a deep interest network is stored; when the program is executed by a processor, the above-mentioned topic recommendation method based on a deep interest network is implemented.
  • by storing the topic recommendation program based on the deep interest network, the storage medium enables the processor, when executing the program, to implement the above-mentioned topic recommendation method based on the deep interest network, so that topics can be accurately recommended to users without establishing labels corresponding to topics, thereby reducing the manpower and material resources required in the process of topic recommendation.
  • the embodiments of the present invention provide a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the above-mentioned topic recommendation method based on a deep interest network is implemented.
  • the deep interest network-based topic recommendation program is stored in the memory, so that when the processor executes the program, the above-mentioned topic recommendation method based on the deep interest network is implemented; topics can thus be accurately recommended to users without establishing labels corresponding to topics, thereby reducing the manpower and material resources required in the process of topic recommendation.
  • the embodiment of the present invention proposes a topic recommendation device based on a deep interest network; the device includes: an acquisition module 10, a training module 20, an interest capture module 30, and a recommendation module 40.
  • the acquisition module 10 is used to acquire user information and historical click data of the user, and generate training data according to the user information and historical click data;
  • the training module 20 is used for model training according to the training data to obtain a deep interest capture model
  • the interest capture module 30 is used to obtain the item information corresponding to the item and input the item information into the deep interest capture model, so as to output the corresponding item vector through the deep interest capture model, and to calculate the topic vector according to the item vector corresponding to each item;
  • the interest capture module 30 is further configured to obtain the click data to be analyzed of the user, and input the click data to be analyzed into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model;
  • the recommendation module 40 is configured to perform similarity retrieval according to the user vector and the topic vector, determine the topic recommendation list according to the retrieval result, and push the topic recommendation list to the user.
  • the user's historical click data includes item information corresponding to each historical click behavior of the user, time information, and ranking information among various historical click behaviors.
  • an acquisition module is set to acquire user information and historical click data of users, and to generate training data according to the user information and the historical click data; the training module is used to perform model training according to the training data to obtain a deep interest capture model; the interest capture module is used to obtain the item information corresponding to the item and input it into the deep interest capture model, so that the corresponding item vector is output through the deep interest capture model and the topic vector is calculated according to the item vector corresponding to each item; the interest capture module is also used to obtain the user's click data to be analyzed and input it into the deep interest capture model, so as to output the corresponding user vector through the deep interest capture model; the recommendation module is configured to perform similarity retrieval according to the user vector and the topic vector, determine a topic recommendation list according to the retrieval result, and push the topic recommendation list to the user. Thus, topics can be accurately recommended to the user without establishing labels corresponding to topics, and the manpower and material resources required in the topic recommendation process are reduced.
  • embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not preclude the presence of a plurality of such elements.
  • the invention can be implemented by means of hardware comprising several different components and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
  • the use of the words first, second, and third, etc. do not denote any order. These words can be interpreted as names.
  • first and second are only used for description purposes, and cannot be interpreted as indicating or implying relative importance or the number of indicated technical features. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • “plurality” means two or more, unless otherwise expressly and specifically defined.
  • the terms “installed”, “connected”, “fixed” and other terms should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integrated one; it can be a mechanical connection or an electrical connection; it can be a direct connection or an indirect connection through an intermediate medium; and it can be the internal connection of two elements or the interaction relationship between two elements.
  • a first feature being “on” or “under” a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary.
  • the first feature being “above” or “over” the second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature.
  • a first feature being “below” or “beneath” a second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a deep interest network-based topic recommendation method and apparatus, a medium, and a device. The deep interest network-based topic recommendation method comprises: obtaining user information and historical click data of a user, and generating training data; obtaining, by training, a deep interest capture model; obtaining item information corresponding to an item, outputting a corresponding item vector by means of the deep interest capture model, and calculating a topic vector according to an item vector corresponding to each item; obtaining click data to be analyzed of the user, and outputting a corresponding user vector by means of the deep interest capture model; and performing similarity retrieval according to the user vector and the topic vector, determining a topic recommendation list according to the retrieval result, and pushing the topic recommendation list to the user. The application can precisely perform topic recommendation on a user without establishing labels corresponding to a topic, thereby reducing manpower and material resources consumed during topic recommendation.

Description

Deep interest network-based topic recommendation method and apparatus
Technical Field
The present invention relates to the technical field of deep learning, and in particular to a topic recommendation method based on a deep learning network, a computer-readable storage medium, a computer device, and a topic recommendation apparatus based on a deep interest network.
Background
In the related art, user profiling is commonly used when topics need to be recommended to a user. That is, the user's preference score for each topic is first computed with rule-based statistics, and the topics under the user's most preferred categories are then displayed first to complete the recommendation. However, this approach depends heavily on the tags attached to each topic, so improving recommendation accuracy requires substantial manpower and material resources to build high-quality tags.
Summary of the Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the above technology. To this end, one object of the present invention is to propose a topic recommendation method based on a deep interest network that can accurately recommend topics to users without building tags for the topics, thereby reducing the manpower and material resources consumed in topic recommendation.
A second object of the present invention is to propose a computer-readable storage medium.
A third object of the present invention is to propose a computer device.
A fourth object of the present invention is to propose a topic recommendation apparatus based on a deep interest network.
To achieve the above object, an embodiment of the first aspect of the present invention proposes a topic recommendation method based on a deep interest network, comprising the following steps: acquiring user information and historical click data of a user, and generating training data according to the user information and the historical click data; performing model training according to the training data to obtain a deep interest capture model; acquiring item information corresponding to an item, inputting the item information into the deep interest capture model so that the model outputs a corresponding item vector, and calculating a topic vector according to the item vector corresponding to each item; acquiring click data of the user to be analyzed, and inputting that click data into the deep interest capture model so that the model outputs a corresponding user vector; and performing similarity retrieval according to the user vector and the topic vector, determining a topic recommendation list according to the retrieval result, and pushing the topic recommendation list to the user.
According to the topic recommendation method based on a deep interest network of the embodiment of the present invention, user information and historical click data of a user are first acquired, and training data is generated according to the user information and the historical click data. Next, model training is performed according to the training data to obtain a deep interest capture model. Then, item information corresponding to an item is acquired and input into the deep interest capture model so that the model outputs a corresponding item vector, and a topic vector is calculated according to the item vector corresponding to each item. Next, click data of the user to be analyzed is acquired and input into the deep interest capture model so that the model outputs a corresponding user vector. Finally, similarity retrieval is performed according to the user vector and the topic vector, a topic recommendation list is determined according to the retrieval result, and the topic recommendation list is pushed to the user. Topics can thus be recommended accurately to users without building tags for the topics, reducing the manpower and material resources consumed in topic recommendation.
In addition, the topic recommendation method based on a deep interest network proposed according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, the historical click data of the user includes, for each historical click, the corresponding item information and time information, together with the ordering among the historical clicks.
Optionally, the training data includes discrete features, continuous features, and sequence features, wherein the discrete features include time information, user attribute information, and item classification information, the continuous features include statistics on the categories of items the user has historically clicked, and the sequence features include the sequence of item information corresponding to the user's historical clicks.
Optionally, the training data further includes a sample time feature, wherein generating training data according to the user information and the historical click data includes: generating a training sample according to the user information and the historical click data, calculating the time difference between the training sample and the current time, and judging whether the time difference is greater than a preset time threshold, so that the judgment result is used as the sample time feature.
Optionally, generating training data according to the user information and the historical click data includes: counting the number of clicks on each item, determining a negative sample selection probability for each item according to the counts, and randomly selecting negative samples according to the negative sample selection probability of each item.
Optionally, determining the topic recommendation list according to the retrieval result includes: clustering the topics with the k-means clustering algorithm to generate a plurality of topic categories; and generating a list of topics to be recommended according to the retrieval result, and scattering that list according to the plurality of topic categories and a sliding-window scattering method to generate the final topic recommendation list.
To achieve the above object, an embodiment of the second aspect of the present invention proposes a computer-readable storage medium storing a topic recommendation program based on a deep interest network; when executed by a processor, the program implements the topic recommendation method based on a deep interest network described above.
According to the computer-readable storage medium of the embodiment of the present invention, a topic recommendation program based on a deep interest network is stored so that a processor executing the program implements the topic recommendation method described above. Topics can thus be recommended accurately to users without building tags for the topics, reducing the manpower and material resources consumed in topic recommendation.
To achieve the above object, an embodiment of the third aspect of the present invention proposes a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the topic recommendation method based on a deep interest network described above is implemented.
According to the computer device of the embodiment of the present invention, the memory stores a topic recommendation program based on a deep interest network so that the processor executing the program implements the topic recommendation method described above. Topics can thus be recommended accurately to users without building tags for the topics, reducing the manpower and material resources consumed in topic recommendation.
To achieve the above object, an embodiment of the fourth aspect of the present invention proposes a topic recommendation apparatus based on a deep interest network, comprising: an acquisition module configured to acquire user information and historical click data of a user, and to generate training data according to the user information and the historical click data; a training module configured to perform model training according to the training data to obtain a deep interest capture model; an interest capture module configured to acquire item information corresponding to an item, input the item information into the deep interest capture model so that the model outputs a corresponding item vector, and calculate a topic vector according to the item vector corresponding to each item, the interest capture module being further configured to acquire click data of the user to be analyzed and input that click data into the deep interest capture model so that the model outputs a corresponding user vector; and a recommendation module configured to perform similarity retrieval according to the user vector and the topic vector, determine a topic recommendation list according to the retrieval result, and push the topic recommendation list to the user.
According to the topic recommendation apparatus based on a deep interest network of the embodiment of the present invention, the acquisition module acquires user information and historical click data of a user and generates training data according to them; the training module performs model training according to the training data to obtain a deep interest capture model; the interest capture module acquires item information corresponding to an item, inputs it into the deep interest capture model so that the model outputs a corresponding item vector, and calculates a topic vector according to the item vector corresponding to each item; the interest capture module also acquires click data of the user to be analyzed and inputs it into the deep interest capture model so that the model outputs a corresponding user vector; and the recommendation module performs similarity retrieval according to the user vector and the topic vector, determines a topic recommendation list according to the retrieval result, and pushes the topic recommendation list to the user. Topics can thus be recommended accurately to users without building tags for the topics, reducing the manpower and material resources consumed in topic recommendation.
In addition, the topic recommendation apparatus based on a deep interest network proposed according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, the historical click data of the user includes, for each historical click, the corresponding item information and time information, together with the ordering among the historical clicks.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a topic recommendation method based on a deep interest network according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a deep interest capture model according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a topic recommendation apparatus based on a deep interest network according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
In the related art, topic recommendation depends heavily on the tags attached to each topic, so improving recommendation accuracy requires substantial manpower and material resources to build high-quality tags. According to the topic recommendation method based on a deep interest network of the embodiment of the present invention, user information and historical click data of a user are first acquired, and training data is generated according to the user information and the historical click data. Next, model training is performed according to the training data to obtain a deep interest capture model. Then, item information corresponding to an item is acquired and input into the deep interest capture model so that the model outputs a corresponding item vector, and a topic vector is calculated according to the item vector corresponding to each item. Next, click data of the user to be analyzed is acquired and input into the deep interest capture model so that the model outputs a corresponding user vector. Finally, similarity retrieval is performed according to the user vector and the topic vector, a topic recommendation list is determined according to the retrieval result, and the topic recommendation list is pushed to the user. Topics can thus be recommended accurately to users without building tags for the topics, reducing the manpower and material resources consumed in topic recommendation.
For a better understanding of the above technical solutions, exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention will be understood more thoroughly and so that its scope is fully conveyed to those skilled in the art.
For a better understanding of the above technical solutions, they are described in detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic flowchart of a topic recommendation method based on a deep interest network according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
S101: acquire user information and historical click data of a user, and generate training data according to the user information and the historical click data.
The historical click data of the user may be selected in various ways.
As one example, the historical click data includes the user's exposure log and click behavior log, where the exposure log records whether a given item was exposed to a given user on a given day, and the click behavior log records the information corresponding to the user's clicks.
As another example, the historical click data includes, for each historical click, the corresponding item information and time information, together with the ordering among the historical clicks. That is, the historical click data contains only information about the user's clicks and no exposure information. Note that in practice different pages sit at different depths, so click-through rates vary widely across depths, and pages reached at greater depth tend to have higher click-through rates. To avoid this depth-induced difference in click-through rate, the historical click data excludes exposure information.
The training data may be set up in various ways.
As one example, the training data includes discrete features, continuous features, and sequence features. The discrete features include time information, user attribute information, and item classification information; the continuous features include statistics on the categories of items the user has historically clicked; and the sequence features include the sequence of item information corresponding to the user's historical clicks. Specifically, the discrete features include date information (for example, the day of the week, whether the day is a working day, and whether the time falls within working hours), item ID, item category ID, item attribute ID, and so on. Because discrete features take relatively few distinct values, they can be one-hot encoded. The continuous features include the number of times the user has historically clicked on each attribute; these can be fed in directly as continuous values without further processing. The sequence features include the sequence of item IDs and the sequence of category IDs that the user has historically clicked.
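As a minimal sketch of the one-hot encoding mentioned above, the following Python snippet encodes the date-related discrete features. The function names are hypothetical, and treating Monday to Friday as working days (ignoring public holidays) is an assumption made only for illustration; the source does not specify how the working-day flag is derived.

```python
from datetime import date


def one_hot(index, size):
    """Return a list of length `size` with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_date_features(d):
    """Encode day of week as 7 one-hot slots plus an is-working-day flag.

    Assumption: Monday-Friday count as working days; holidays are ignored.
    """
    weekday = d.weekday()  # Monday == 0 ... Sunday == 6
    is_workday = 1 if weekday < 5 else 0
    return one_hot(weekday, 7) + [is_workday]


# A Friday: slot 4 is set and the working-day flag is 1.
features = encode_date_features(date(2021, 6, 11))
```

Continuous features such as per-attribute click counts would simply be appended to this list as raw numbers.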
In some embodiments, the training data further includes a sample time feature, wherein generating training data according to the user information and the historical click data includes: generating a training sample according to the user information and the historical click data, calculating the time difference between the training sample and the current time, and judging whether the time difference is greater than a preset time threshold, so that the judgment result is used as the sample time feature.
It can be understood that the training data contains time information, that is, the context features include the day of the week, whether the day is a working day, and so on. As a result, even if the training samples are thoroughly shuffled during training, the training process may still be unstable. Because of this time dependence, the samples strongly influence the model, and the closer a sample is to the test date, the greater its effect. The sample time feature is therefore added to keep model training stable.
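The sample time feature described above can be sketched as a simple binary flag. The function name and the seven-day default threshold are assumptions for illustration; the source only states that a preset time threshold is used.

```python
from datetime import datetime, timedelta


def sample_time_feature(sample_time, now, threshold=timedelta(days=7)):
    """Return 1 if the sample is older than `threshold` relative to `now`, else 0.

    The 7-day default is a hypothetical value, not taken from the source.
    """
    return 1 if (now - sample_time) > threshold else 0


now = datetime(2021, 6, 11, 12, 0)
recent_flag = sample_time_feature(datetime(2021, 6, 10, 9, 30), now)
old_flag = sample_time_feature(datetime(2021, 5, 1, 9, 30), now)
```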
In some embodiments, generating training data according to the user information and the historical click data includes:
counting the number of clicks on each item, determining a negative sample selection probability for each item according to the counts, and randomly selecting negative samples according to the negative sample selection probability of each item. It can be understood that negative samples must be selected during training for the model to train properly, and they may be selected in various ways. For example, a preset number of negative samples may be drawn directly at random from the positive samples. Preferably, the number of clicks on each item is counted as described above to determine the probability that the item is selected as a negative sample, so that more popular items are more likely to be chosen as negative samples, which improves the accuracy of the trained model.
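A minimal sketch of popularity-weighted negative sampling follows. The source says only that the selection probability is determined from click counts; making it directly proportional to the count, and excluding the positive item from the candidates, are assumptions of this sketch, and the function names are hypothetical.

```python
import random
from collections import Counter


def negative_sampling_probs(clicked_item_ids):
    """Map each item to a selection probability proportional to its click count."""
    counts = Counter(clicked_item_ids)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def sample_negatives(probs, positive_item, k, rng):
    """Draw k negatives (with replacement), weighted by popularity.

    Assumption: the positive item itself is never returned as a negative.
    """
    candidates = [item for item in probs if item != positive_item]
    weights = [probs[item] for item in candidates]
    return rng.choices(candidates, weights=weights, k=k)


probs = negative_sampling_probs([1, 1, 1, 2, 3])
negatives = sample_negatives(probs, positive_item=2, k=4, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is convenient when regenerating training sets.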
In some embodiments, to avoid the heavy computation of the softmax function, the sequence feature is cast as a binary classification problem: when the input item ID is the ID of the item the user clicks next, the label is 1; otherwise it is 0. Specifically, assuming the sequence of items clicked by the user is [1, 2, 3, 4, 5, 6], the sequence features are constructed as shown in Table 1:
Figure PCTCN2021099766-appb-000001
Table 1
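Since Table 1 survives only as an image reference here, its exact layout is not reproduced. The following is a hedged sketch of how such binary-labelled samples might be built from a click sequence: each prefix of the sequence is paired with the true next item (label 1) and with some other items (label 0). The tuple layout and the `negatives_for` callback are hypothetical.

```python
def build_sequence_samples(clicks, negatives_for):
    """Build (history, candidate_item, label) triples from a click sequence.

    `negatives_for(history, next_item)` is a hypothetical callback that
    returns candidate items the user did not click next.
    """
    samples = []
    for t in range(1, len(clicks)):
        history = clicks[:t]
        next_item = clicks[t]
        samples.append((history, next_item, 1))  # true next click
        for neg in negatives_for(history, next_item):
            samples.append((history, neg, 0))  # not the next click
    return samples


# Illustration only: always use item 99 as the single negative.
samples = build_sequence_samples([1, 2, 3, 4, 5, 6], lambda history, nxt: [99])
```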
S102: perform model training according to the training data to obtain a deep interest capture model.
For ease of understanding, take FIG. 2 as an example. FIG. 2 is a schematic structural diagram of a deep interest capture model according to an embodiment of the present invention. As shown in FIG. 2, in this embodiment the sequence features, discrete features, and continuous features are concatenated; the concatenated result passes through a BatchNormalization layer and is then fed into several fully connected layers, each followed by a BatchNormalization layer and a Dice activation function, finally yielding the user vector.
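For readers unfamiliar with the Dice activation named above, here is a minimal pure-Python sketch for a single unit over one mini-batch, following the usual formulation f(s) = p(s)·s + (1 − p(s))·α·s, where p(s) is a sigmoid of the batch-standardized input. In practice α is a learned parameter; the fixed value below, and the use of batch statistics only (no moving averages for inference), are simplifying assumptions.

```python
import math


def dice(batch, alpha=0.25, eps=1e-8):
    """Apply the Dice activation to a list of pre-activations for one unit.

    alpha is normally learned; 0.25 here is an arbitrary illustrative value.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((s - mean) ** 2 for s in batch) / n
    out = []
    for s in batch:
        # Gate in (0, 1): sigmoid of the standardized input.
        p = 1.0 / (1.0 + math.exp(-(s - mean) / math.sqrt(var + eps)))
        out.append(p * s + (1.0 - p) * alpha * s)
    return out


activated = dice([-1.0, 1.0])
```

With α = 1 the function reduces to the identity, which is a quick sanity check on an implementation.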
In some embodiments, the model is trained with the Adagrad optimizer, an initial learning rate of 0.1, a learning rate that is halved every 50,000 steps, and a batch size of 128. To make training more stable, L2 regularization is applied to both the Embedding layer and the DNN layers, and the regularization loss is added to the loss function for joint optimization.
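The learning-rate schedule stated above can be written as a small function. Whether the decay is applied as a step function (as here) or smoothly between multiples of 50,000 steps is not stated in the source; the staircase form is an assumption.

```python
def learning_rate(step, initial_lr=0.1, decay_steps=50_000, decay_rate=0.5):
    """Staircase decay: halve the learning rate every `decay_steps` steps."""
    return initial_lr * decay_rate ** (step // decay_steps)


# Learning rates at a few training steps.
schedule = [learning_rate(s) for s in (0, 49_999, 50_000, 100_000)]
```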
S103: acquire item information corresponding to an item, input the item information into the deep interest capture model so that the model outputs a corresponding item vector, and calculate a topic vector according to the item vector corresponding to each item.
It can be understood that each topic contains a varying number of items, and a topic is a collection of items of the same category. For example, if the topic is sports, its items may include football, basketball, swimming, and so on. The topic vector may be calculated from the item vectors in various ways. For example, once the item vectors are obtained, the item vectors of all items under the topic are average-pooled, and the pooled result is used as the topic vector.
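Average pooling of item vectors into a topic vector amounts to an element-wise mean. A minimal sketch, with a hypothetical function name and made-up two-dimensional vectors:

```python
def topic_vector(item_vectors):
    """Element-wise mean of the item vectors belonging to one topic."""
    n = len(item_vectors)
    dim = len(item_vectors[0])
    return [sum(vec[d] for vec in item_vectors) / n for d in range(dim)]


# Two illustrative item vectors for a single topic.
sports = topic_vector([[1.0, 2.0], [3.0, 4.0]])
```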
S104: acquire click data of the user to be analyzed, and input that click data into the deep interest capture model so that the model outputs a corresponding user vector.
S105: perform similarity retrieval according to the user vector and the topic vector, determine a topic recommendation list according to the retrieval result, and push the topic recommendation list to the user.
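The source does not say which similarity measure or index is used in S105. As one possible sketch, the snippet below ranks topics by cosine similarity to the user vector with a brute-force scan; cosine similarity, the function names, and the example vectors are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_topics(user_vec, topic_vecs, top_k):
    """Return the ids of the top_k topics most similar to the user vector."""
    ranked = sorted(
        topic_vecs.items(),
        key=lambda kv: cosine(user_vec, kv[1]),
        reverse=True,
    )
    return [topic_id for topic_id, _ in ranked[:top_k]]


topics = {"sports": [1.0, 0.0], "music": [0.0, 1.0], "mixed": [1.0, 1.0]}
top2 = retrieve_topics([0.9, 0.1], topics, top_k=2)
```

For a large topic catalogue an approximate nearest-neighbour index would typically replace the linear scan.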
在一些实施例中,根据检索结果确定专题推荐列表,包括:根据kmeas聚类算法对专题进行聚类,以生成多个专题类别;根据检索结果生成待推荐专题列表,并根据多个专题类别和滑窗打散法对待推荐专题列表进行打散处理,以生成最终专题推荐列表。In some embodiments, determining the topic recommendation list according to the retrieval result includes: clustering topics according to the kmeas clustering algorithm to generate multiple topic categories; generating a topic list to be recommended according to the retrieval result, and The sliding window breaking method is used to break up the list of recommended topics to generate the final recommended topic list.
It can be understood that, after the candidate topic list is generated from the retrieval result, several topics of the same category may fall within the same window of that list, which degrades the user experience. Therefore, to safeguard the user experience, the candidate topic list is scattered using the sliding-window scattering method together with the topic clustering result, so that the topics within any one window belong to different categories; the result is the final topic recommendation list.
Specifically, as shown in Table 2:
Figure PCTCN2021099766-appb-000002
Table 2
As shown in Table 2, suppose the topic ranking obtained for user 001 is 1|2|3|4|5|6|7|8|9 and the corresponding category sequence is A|A|A|B|C|B|B|D|D. Assume the sliding window has size 3, meaning that the topics placed in any 3 adjacent positions must not repeat a category. Then:
Step 1: the categories in the first window are A, A, A. Counting position indices from 0, the entries at positions 1 and 2 need to be scattered. Traversing from position 3 onward, the first different category is B, so the A at position 1 is swapped with the B at position 3; the category sequence becomes A|B|A|A|C|B|B|D|D, and positions 1 and 3 of the topic ID sequence are swapped accordingly.
Step 2: the categories in the first window are now A, B, A, so position 2 needs handling. Traversing from position 4 onward, the first different category is the C at position 4, so positions 2 and 4 are swapped; the category sequence becomes A|B|C|A|A|B|B|D|D and the topic ID sequence becomes 1|4|5|2|3|6|7|8|9.
Step 3: the category sequence in the second window is B, C, A, which needs no handling, so the window continues to slide forward. The third window is C, A, A, so the A at position 4 needs handling and is swapped with position 5; the category sequence becomes A|B|C|A|B|A|B|D|D and the topic ID sequence becomes 1|4|5|2|6|3|7|8|9. The process continues in this way until the whole sequence has been processed or the scattered length reaches a threshold.
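The worked example above can be reproduced with the following sketch of a sliding-window scatter. It is one reading of the steps, not a definitive implementation: "first different category" is interpreted as the first later entry whose category does not occur in the preceding window - 1 positions, and the names `scatter_by_window` and `max_positions` (standing in for the scattered-length threshold) are hypothetical.

```python
def scatter_by_window(topic_ids, categories, window=3, max_positions=None):
    """Reorder topics so that nearby positions avoid repeating a category.

    topic_ids / categories: parallel lists in ranked order.
    max_positions: optional cap on how many leading positions are processed.
    Returns the reordered (ids, categories) as new lists.
    """
    ids = list(topic_ids)
    cats = list(categories)
    limit = len(ids) if max_positions is None else min(max_positions, len(ids))
    for i in range(limit):
        # Categories already placed in the window that ends at position i.
        recent = set(cats[max(0, i - window + 1):i])
        if cats[i] not in recent:
            continue
        # Scan forward for the first entry whose category is not in the window.
        for j in range(i + 1, len(ids)):
            if cats[j] not in recent:
                ids[i], ids[j] = ids[j], ids[i]
                cats[i], cats[j] = cats[j], cats[i]
                break
        # If no such entry exists, position i is left unchanged.
    return ids, cats


ranked_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ranked_cats = list("AAABCBBDD")
# Processing only the first five positions stops where Step 3 of the example stops.
partial_ids, partial_cats = scatter_by_window(ranked_ids, ranked_cats, 3, max_positions=5)
full_ids, full_cats = scatter_by_window(ranked_ids, ranked_cats, 3)
```

Under this reading, `partial_ids` equals the sequence 1|4|5|2|6|3|7|8|9 given in Step 3, and running to the end of the list leaves no window of three with a repeated category for this input. Swapping rather than removing and reinserting keeps the displaced topic as close as possible to its original rank.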
In summary, in the deep interest network-based topic recommendation method according to the embodiments of the present invention, user information and the user's historical click data are first obtained, and training data is generated from the user information and the historical click data; next, a model is trained on the training data to obtain a deep interest capture model; then, the item information corresponding to each item is obtained and input into the deep interest capture model, which outputs the corresponding item vector, and the topic vector is computed from the item vectors; next, the user's click data to be analyzed is obtained and input into the deep interest capture model, which outputs the corresponding user vector; finally, similarity retrieval is performed on the user vector and the topic vectors, the topic recommendation list is determined from the retrieval result, and the topic recommendation list is pushed to the user. Topics can thus be recommended to users accurately without building labels for the topics, which reduces the manpower and material resources consumed in the topic recommendation process.
To implement the above embodiments, an embodiment of the present invention provides a computer-readable storage medium on which a deep interest network-based topic recommendation program is stored; when executed by a processor, the program implements the deep interest network-based topic recommendation method described above.
With the computer-readable storage medium according to the embodiment of the present invention, the deep interest network-based topic recommendation program is stored so that a processor executing it implements the method described above. Topics can thus be recommended to users accurately without building labels for the topics, which reduces the manpower and material resources consumed in the topic recommendation process.
To implement the above embodiments, an embodiment of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the deep interest network-based topic recommendation method described above.
With the computer device according to the embodiment of the present invention, the memory stores the deep interest network-based topic recommendation program so that the processor, when executing it, implements the method described above. Topics can thus be recommended to users accurately without building labels for the topics, which reduces the manpower and material resources consumed in the topic recommendation process.
To implement the above embodiments, an embodiment of the present invention provides a deep interest network-based topic recommendation apparatus. As shown in FIG. 3, the apparatus includes an acquisition module 10, a training module 20, an interest capture module 30, and a recommendation module 40.
The acquisition module 10 is configured to obtain user information and the user's historical click data, and to generate training data from the user information and the historical click data;
the training module 20 is configured to train a model on the training data to obtain a deep interest capture model;
the interest capture module 30 is configured to obtain the item information corresponding to each item and input it into the deep interest capture model, which outputs the corresponding item vector, and to compute the topic vector from the item vectors;
the interest capture module 30 is further configured to obtain the user's click data to be analyzed and input it into the deep interest capture model, which outputs the corresponding user vector;
the recommendation module 40 is configured to perform similarity retrieval on the user vector and the topic vectors, determine the topic recommendation list from the retrieval result, and push the topic recommendation list to the user.
In some embodiments, the user's historical click data includes, for each historical click, the item information and time information of that click, together with the ordering information among the historical clicks.
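One possible in-memory layout for such click data is sketched below. The type `ClickEvent`, its field names, and the idea of deriving the ordering information by sorting on the timestamp are illustrative assumptions; the embodiment does not state how the ordering is recorded.

```python
from typing import List, NamedTuple, Tuple


class ClickEvent(NamedTuple):
    """One historical click: which item was clicked and when (hypothetical fields)."""
    item_id: str
    timestamp: float  # e.g. seconds since the epoch


def build_click_history(events: List[ClickEvent]) -> List[Tuple[int, str, float]]:
    """Attach a rank to every click so the ordering among clicks is explicit.

    Returns (rank, item_id, timestamp) tuples, oldest click first.
    """
    ordered = sorted(events, key=lambda event: event.timestamp)
    return [(rank, event.item_id, event.timestamp)
            for rank, event in enumerate(ordered)]


history = build_click_history([
    ClickEvent("poster_17", 1610700300.0),
    ClickEvent("banner_02", 1610700000.0),
])
```

The item-id part of this ordered list corresponds to the kind of item information sequence that the sequence features of the training data are built from.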
It should be noted that the above description of the deep interest network-based topic recommendation method of FIG. 1 applies equally to the deep interest network-based topic recommendation apparatus and is not repeated here.
In summary, in the deep interest network-based topic recommendation apparatus according to the embodiments of the present invention, the acquisition module obtains user information and the user's historical click data and generates training data from them; the training module trains a model on the training data to obtain a deep interest capture model; the interest capture module obtains the item information corresponding to each item and inputs it into the deep interest capture model, which outputs the corresponding item vector, and computes the topic vector from the item vectors; the interest capture module also obtains the user's click data to be analyzed and inputs it into the deep interest capture model, which outputs the corresponding user vector; and the recommendation module performs similarity retrieval on the user vector and the topic vectors, determines the topic recommendation list from the retrieval result, and pushes the topic recommendation list to the user. Topics can thus be recommended to users accurately without building labels for the topics, which reduces the manpower and material resources consumed in the topic recommendation process.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that, in the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Although preferred embodiments of the present invention have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to cover them as well.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality of" means two or more, unless expressly and specifically defined otherwise.
In the present invention, unless expressly specified and limited otherwise, terms such as "mounted", "coupled", "connected", and "fixed" are to be understood broadly; for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In the present invention, unless expressly specified and limited otherwise, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, a first feature being "over", "above", or "on top of" a second feature may mean that the first feature is directly above or obliquely above the second feature, or may merely mean that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or may merely mean that the first feature is at a lower level than the second feature.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (10)

  1. A deep interest network-based topic recommendation method, characterized by comprising the following steps:
    obtaining user information and the user's historical click data, and generating training data from the user information and the historical click data;
    training a model on the training data to obtain a deep interest capture model;
    obtaining item information corresponding to each item, inputting the item information into the deep interest capture model so that the deep interest capture model outputs the corresponding item vector, and computing a topic vector from the item vector corresponding to each item;
    obtaining the user's click data to be analyzed, and inputting the click data to be analyzed into the deep interest capture model so that the deep interest capture model outputs the corresponding user vector;
    performing similarity retrieval on the user vector and the topic vector, determining a topic recommendation list from the retrieval result, and pushing the topic recommendation list to the user.
  2. The deep interest network-based topic recommendation method according to claim 1, characterized in that the user's historical click data includes the item information and time information corresponding to each historical click of the user, and the ordering information among the historical clicks.
  3. The deep interest network-based topic recommendation method according to claim 1, characterized in that the training data includes discrete features, continuous features, and sequence features;
    wherein the discrete features include time information, user attribute information, and item category information; the continuous features include statistics on the categories of items the user has historically clicked; and the sequence features include the sequence of item information corresponding to the user's historical clicks.
  4. The deep interest network-based topic recommendation method according to claim 3, characterized in that the training data further includes a sample time feature, wherein generating training data from the user information and the historical click data comprises:
    generating training samples from the user information and the historical click data, computing the time difference between each training sample and the current time, and determining whether the time difference is greater than a preset time threshold, the result of the determination being used as the sample time feature.
  5. The deep interest network-based topic recommendation method according to any one of claims 1-4, characterized in that generating training data from the user information and the historical click data comprises:
    counting the number of times each item has been clicked, determining a negative-sample selection probability for each item from the counts, and randomly selecting negative samples according to the negative-sample selection probability of each item.
  6. The deep interest network-based topic recommendation method according to any one of claims 1-4, characterized in that determining the topic recommendation list from the retrieval result comprises:
    clustering the topics with the k-means clustering algorithm to generate multiple topic categories;
    generating a candidate topic list from the retrieval result, and scattering the candidate topic list according to the multiple topic categories and a sliding-window scattering method to generate the final topic recommendation list.
  7. A computer-readable storage medium, characterized in that a deep interest network-based topic recommendation program is stored thereon, and the program, when executed by a processor, implements the deep interest network-based topic recommendation method according to any one of claims 1-6.
  8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the deep interest network-based topic recommendation method according to any one of claims 1-6.
  9. A deep interest network-based topic recommendation apparatus, characterized by comprising:
    an acquisition module configured to obtain user information and the user's historical click data, and to generate training data from the user information and the historical click data;
    a training module configured to train a model on the training data to obtain a deep interest capture model;
    an interest capture module configured to obtain item information corresponding to each item, input the item information into the deep interest capture model so that the deep interest capture model outputs the corresponding item vector, and compute a topic vector from the item vector corresponding to each item;
    the interest capture module being further configured to obtain the user's click data to be analyzed and input the click data to be analyzed into the deep interest capture model so that the deep interest capture model outputs the corresponding user vector;
    a recommendation module configured to perform similarity retrieval on the user vector and the topic vector, determine a topic recommendation list from the retrieval result, and push the topic recommendation list to the user.
  10. The deep interest network-based topic recommendation apparatus according to claim 9, characterized in that the user's historical click data includes the item information and time information corresponding to each historical click of the user, and the ordering information among the historical clicks.
PCT/CN2021/099766 2021-01-15 2021-06-11 Deep interest network-based topic recommendation method and apparatus WO2022151649A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110054841.0A CN112800097A (en) 2021-01-15 2021-01-15 Special topic recommendation method and device based on deep interest network
CN202110054841.0 2021-01-15

Publications (1)

Publication Number Publication Date
WO2022151649A1 true WO2022151649A1 (en) 2022-07-21

Family

ID=75809601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099766 WO2022151649A1 (en) 2021-01-15 2021-06-11 Deep interest network-based topic recommendation method and apparatus

Country Status (2)

Country Link
CN (2) CN113688167A (en)
WO (1) WO2022151649A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587261A (en) * 2022-12-09 2023-01-10 思创数码科技股份有限公司 Government affair resource catalog recommendation method and system
CN115828107A (en) * 2023-01-09 2023-03-21 深圳市云积分科技有限公司 Model training method and device based on offline environment
CN115952359A (en) * 2023-03-10 2023-04-11 特斯联科技集团有限公司 Recommendation system recall method and device, electronic equipment and storage medium
CN116385048A (en) * 2023-06-06 2023-07-04 山东政信大数据科技有限责任公司 Intelligent marketing method and system for agricultural products
CN116521908A (en) * 2023-06-28 2023-08-01 图林科技(深圳)有限公司 Multimedia content personalized recommendation method based on artificial intelligence
CN117493677A (en) * 2023-11-10 2024-02-02 成达文化科技(广州)有限公司 Personalized search information recommendation system and method based on user portraits

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688167A (en) * 2021-01-15 2021-11-23 稿定(厦门)科技有限公司 Deep interest capture model construction method and device based on deep interest network
CN113657975B (en) * 2021-09-03 2024-03-26 西安稻叶山供应链管理有限公司 Marketing method and system based on Internet E-commerce live broadcast platform
CN114119142A (en) * 2021-11-11 2022-03-01 北京沃东天骏信息技术有限公司 Information recommendation method, device and system
CN114218476B (en) * 2021-11-12 2022-10-04 深圳前海鹏影数字软件运营有限公司 Content recommendation method and device and terminal equipment
CN114187036B (en) * 2021-11-30 2022-10-11 深圳市喂车科技有限公司 Internet advertisement intelligent recommendation management system based on behavior characteristic recognition
CN114567811B (en) * 2022-02-28 2024-02-09 广州欢聊网络科技有限公司 Multi-modal model training method, system and related equipment for voice sequencing

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182621A (en) * 2017-12-07 2018-06-19 合肥美的智能科技有限公司 The Method of Commodity Recommendation and device for recommending the commodity, equipment and storage medium
US20190387067A1 (en) * 2016-09-14 2019-12-19 Oath Inc. Baseline Interest Profile for Recommendations Using a Geographic Location
CN111046285A (en) * 2019-12-11 2020-04-21 拉扎斯网络科技(上海)有限公司 Recommendation sequencing determination method, device, server and storage medium
CN111310056A (en) * 2020-03-11 2020-06-19 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN111368210A (en) * 2020-05-27 2020-07-03 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence and electronic equipment
CN111767459A (en) * 2019-10-16 2020-10-13 北京京东尚科信息技术有限公司 Item recommendation method and device
CN112800097A (en) * 2021-01-15 2021-05-14 稿定(厦门)科技有限公司 Special topic recommendation method and device based on deep interest network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11269891B2 (en) * 2014-08-21 2022-03-08 Affectomatics Ltd. Crowd-based scores for experiences from measurements of affective response
CN110162700B (en) * 2019-04-23 2024-06-25 腾讯科技(深圳)有限公司 Training method, device and equipment for information recommendation and model and storage medium
CN111125521A (en) * 2019-12-13 2020-05-08 上海喜马拉雅科技有限公司 Information recommendation method, device, equipment and storage medium
CN111651669A (en) * 2020-05-20 2020-09-11 拉扎斯网络科技(上海)有限公司 Information recommendation method and device, electronic equipment and computer-readable storage medium
CN111737578B (en) * 2020-06-22 2024-04-02 陕西师范大学 Recommendation method and system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587261A (en) * 2022-12-09 2023-01-10 思创数码科技股份有限公司 Government affair resource catalog recommendation method and system
CN115587261B (en) * 2022-12-09 2023-04-07 思创数码科技股份有限公司 Government affair resource catalog recommendation method and system
CN115828107A (en) * 2023-01-09 2023-03-21 深圳市云积分科技有限公司 Model training method and device based on offline environment
CN115952359A (en) * 2023-03-10 2023-04-11 特斯联科技集团有限公司 Recommendation system recall method and device, electronic equipment and storage medium
CN116385048A (en) * 2023-06-06 2023-07-04 山东政信大数据科技有限责任公司 Intelligent marketing method and system for agricultural products
CN116385048B (en) * 2023-06-06 2023-08-22 山东政信大数据科技有限责任公司 Intelligent marketing method and system for agricultural products
CN116521908A (en) * 2023-06-28 2023-08-01 图林科技(深圳)有限公司 Multimedia content personalized recommendation method based on artificial intelligence
CN116521908B (en) * 2023-06-28 2024-01-09 图林科技(深圳)有限公司 Multimedia content personalized recommendation method based on artificial intelligence
CN117493677A (en) * 2023-11-10 2024-02-02 成达文化科技(广州)有限公司 Personalized search information recommendation system and method based on user portraits
CN117493677B (en) * 2023-11-10 2024-05-28 成达文化科技(广州)有限公司 Personalized search information recommendation system and method based on user portraits

Also Published As

Publication number Publication date
CN113688167A (en) 2021-11-23
CN112800097A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
WO2022151649A1 (en) Deep interest network-based topic recommendation method and apparatus
WO2022141861A1 (en) Emotion classification method and apparatus, electronic device, and storage medium
JP5423030B2 (en) Determining words related to a word set
WO2017121314A1 (en) Information recommendation method and apparatus
CN103258025B (en) Generate the method for co-occurrence keyword, the method that association search word is provided and system
CN106021362A (en) Query picture characteristic representation generation method and device, and picture search method and device
CN111126495B (en) Model training method, information prediction device, storage medium and equipment
CN108345601B (en) Search result ordering method and device
JP2013522720A (en) Determination of word information entropy
CN110971659A (en) Recommendation message pushing method and device and storage medium
CN112749330B (en) Information pushing method, device, computer equipment and storage medium
CN109902823B (en) Model training method and device based on generation countermeasure network
CN106599047B (en) Information pushing method and device
JP2015504564A (en) Classification of attribute data intervals
CN110059221B (en) Video recommendation method, electronic device and computer readable storage medium
JP2009163615A (en) Co-clustering device, co-clustering method, co-clustering program, and recording-medium recording co-clustering program
CN110765348B (en) Hot word recommendation method and device, electronic equipment and storage medium
WO2021244583A1 (en) Data cleaning method, apparatus and device, program, and storage medium
EP2573685A1 (en) Ranking of heterogeneous information objects
CN108804577B (en) Method for estimating interest degree of information tag
CN107391577B (en) Work label recommendation method and system based on expression vector
CN111914950B (en) Unsupervised cross-modal retrieval model training method based on depth dual variational hash
CN110737805A (en) Method and device for processing graph model data and terminal equipment
CN112199582A (en) Content recommendation method, device, equipment and medium
CN112163614A (en) Anchor classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21918845; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 21918845; Country of ref document: EP; Kind code of ref document: A1
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 221123)
122 Ep: pct application non-entry in european phase. Ref document number: 21918845; Country of ref document: EP; Kind code of ref document: A1