WO2022198983A1 - Conversation recommendation method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2022198983A1
Authority
WO
WIPO (PCT)
Prior art keywords
item, candidate, attribute, preference, value
Prior art date
Application number
PCT/CN2021/122167
Other languages
French (fr)
Chinese (zh)
Inventor
赵朋朋
田鑫涛
郝永静
Original Assignee
苏州大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州大学
Publication of WO2022198983A1 publication Critical patent/WO2022198983A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/335 Filtering based on additional data, e.g. user or group profiles
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present invention relates to the field of dialogue recommendation, in particular to a dialogue recommendation method, device, electronic device and storage medium.
  • Conversational Recommender System is a recommender system that can actively obtain preference attributes from users and use the attributes to recommend items.
  • existing dialogue recommendation can use the preference attributes currently asked of the user for recommendation, and it can ask the user for preference attributes or recommend items while considering the historical items the user has interacted with, but it ignores the impact of the order of interaction between the user and historical items on recommendation, that is, the importance of the sequence of interaction history items, which makes it difficult for existing dialogue recommendation methods to conduct dialogue recommendation with users efficiently and accurately.
  • the purpose of the present invention is to provide a dialogue recommendation method, device, electronic device and storage medium, which can use the historical interaction sequence reflecting the user's historical item preferences to train the recommendation network model and generate the candidate item set, thereby ensuring that the dialogue recommendation can conduct targeted dialogues with users and improving the efficiency and accuracy of dialogue recommendation.
  • the present invention provides a dialogue recommendation method, including:
  • a dialog recommendation is made to the user using the action decision.
  • the performing a dialogue recommendation to the user by using the action decision includes:
  • the attribute of the candidate item with the largest predicted preference value is sent to the user terminal, and the feedback data is received;
  • the updating the candidate item set using the preference item attribute includes:
  • an item that does not have the preference item attribute is removed from the candidate item set.
  • generating a candidate attribute set using the interactive predicted value and the candidate item, and calculating the preference predicted value of each candidate item attribute in the candidate attribute set using the interactive predicted value includes:
  • the present invention also provides a dialogue recommendation device, comprising:
  • a candidate item set generation module configured to generate a candidate item set by using the item preference value and the items that the user has not interacted with
  • a first calculation module configured to update the candidate item set using the preference item attribute when the preference item attribute sent by the user is received, and to use the item correlation value to calculate the interaction prediction value of each candidate item in the updated candidate item set;
  • a second calculation module configured to generate a candidate attribute set using the interactive predicted value and the candidate item, and use the interactive predicted value to calculate the preference predicted value of each candidate item attribute in the candidate attribute set;
  • the dialogue recommendation module is used for inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and performing dialogue recommendation to the user.
  • the present invention also provides an electronic device, comprising:
  • the processor is configured to implement the above-mentioned dialogue recommendation method when executing the computer program.
  • the present invention further provides a storage medium, where computer-executable instructions are stored in the storage medium, and when the computer-executable instructions are loaded and executed by a processor, the above-mentioned dialog recommendation method is implemented.
  • the present invention provides a dialogue recommendation method, comprising: acquiring a historical interaction sequence between a user and items, and inputting the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate item preference values, wherein each item includes item attributes; generating a candidate item set using the item preference values and the items that the user has not interacted with; when a preference item attribute sent by the user is received, updating the candidate item set using the preference item attribute, and using the item correlation values to calculate the interaction prediction value of each candidate item in the updated candidate item set; generating a candidate attribute set using the interaction prediction values and the candidate items, and using the interaction prediction values to calculate the preference prediction value of each candidate item attribute in the candidate attribute set; and inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and making dialogue recommendations to the user.
  • this method first obtains the historical interaction sequence between the user and items. Since the sequence reflects both the user's preference for historical items and the order of interaction, using the historical interaction sequence in the training of the recommendation network model ensures that the candidate item set generated by the model also takes the user's historical preferences and interaction order into account, producing a candidate item set that better matches the user's preferences. This in turn ensures that the candidate item set can be used in dialogue recommendation to conduct targeted dialogues with users, effectively reducing the number of dialogue rounds while improving the accuracy of dialogue recommendation.
  • the present invention also provides a dialogue recommendation device, an electronic device and a storage medium, which have the above beneficial effects.
  • FIG. 1 is a flowchart of a dialog recommendation method provided by an embodiment of the present invention
  • FIG. 3 is a structural block diagram of a dialogue recommendation system based on historical interaction sequences provided by an embodiment of the present invention.
  • Conversational Recommender System is a recommender system that can actively obtain preference attributes from users and use the attributes to recommend items.
  • in the related art, dialogue recommendation can only make recommendations using the preference attributes currently asked of the user, and it is difficult to take the user's historical item preferences into account, making it hard to conduct dialogue recommendation with users efficiently and accurately.
  • the present invention provides a dialogue recommendation method, which can use the historical interaction sequence reflecting the user's historical item preferences to train the recommendation network model and generate the candidate item set, thereby ensuring that the dialogue recommendation can conduct targeted dialogues with the user and improving the efficiency and accuracy of dialogue recommendation. Please refer to FIG. 1.
  • FIG. 1 is a flowchart of a dialog recommendation method provided by an embodiment of the present invention. The method may include:
  • S101 Obtain a historical interaction sequence between a user and an item, and input the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block, and a prediction layer for training to generate an item preference value; the item includes an item attribute.
  • the historical interaction sequence is the record of the user's past interactions with items, with the items sorted by the time of interaction; an interaction can be a click-to-view, a favorite, a purchase, etc. Since the historical interaction sequence includes both the user's historical item preferences and the order in which the user interacted with historical items, the embodiment of the present invention integrates it into the training of the recommendation network model, ensuring that the model captures the user's historical preferences together with the sequential characteristics of the historical items. Dialogue recommendation can then conduct targeted dialogues according to the user's historical preferences, which effectively improves the efficiency and accuracy of dialogue recommendation. It should be noted that the embodiments of the present invention do not limit the specific item.
  • the item is a specific item, for example, a physical item such as a book, or a virtual item such as a movie.
  • This embodiment of the present invention also does not limit the specific item attributes.
  • the item attributes reflect a certain feature of the item.
  • the item attribute can be the type of the book, such as novel, biography, or textbook; it can also be an attribute indicating whether the book is popular, or another item attribute related to books.
  • This embodiment of the present invention also does not limit the number of item attributes that an item can contain, and an item can have one or more item attributes.
  • the recommendation network model used in the embodiment of the present invention is based on a deep learning neural network.
  • the embodiments of the present invention do not limit the specific structures and learning methods of the embedding layer, the self-attention block, and the prediction layer in the recommendation network model, and users may refer to related technologies of deep learning neural networks.
  • the embodiment of the present invention does not limit the number of items in the historical interaction sequence; it may contain one or more items. Since the length of the historical interaction sequence differs between users, the sequence can be converted into a training sequence of preset length to facilitate training of the recommendation network model. Understandably, when the historical interaction sequence contains too few items, it is difficult to compute a reliable preference for the user, so a minimum number of items can be set for the historical interaction sequence; when the sequence contains fewer items than this minimum, network model training is not performed.
  • the embodiments of the present invention do not limit the specific value of the minimum number of items, which can be set by the user according to actual application requirements.
  • the minimum number of items may be 10. It should be noted that the embodiment of the present invention does not limit the specific value of the preset length, which can be set according to actual application requirements.
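As an illustration, the preset-length conversion described above can be sketched as follows; the concrete length values and the left-padding scheme are assumptions, not specified by the text:

```python
import numpy as np

MIN_ITEMS = 10   # minimum number of items (the text suggests 10 as an example)
MAX_LEN = 8      # hypothetical preset training-sequence length

def make_training_sequence(history, max_len=MAX_LEN, min_items=MIN_ITEMS, pad_id=0):
    """Truncate long histories to the most recent max_len items and
    left-pad short ones with pad_id; reject histories that are too short."""
    if len(history) < min_items:
        return None  # too few interactions to compute a reliable preference
    seq = history[-max_len:]                       # keep the most recent items
    padded = [pad_id] * (max_len - len(seq)) + list(seq)
    return np.array(padded)

seq = make_training_sequence(list(range(1, 13)))   # 12 interactions, ids 1..12
```

With 12 interactions and a preset length of 8, only the 8 most recent item ids survive; a 3-item history falls below the minimum and is rejected.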
  • the historical interaction sequence is input into the recommendation network model including the embedding layer, the self-attention block and the prediction layer for training, and the process of generating the item preference value may include:
  • Step 11 Use the historical interaction sequence to generate a training sequence of preset length.
  • Step 12 Input all items and training sequences into the embedding layer, and output the integrated embedding matrix.
  • a training sequence embedding matrix E ∈ R^{n×d} is obtained, where n is the preset sequence length and d is the embedding dimension. The training sequence embedding matrix E is integrated with a learnable position embedding matrix P ∈ R^{n×d} to obtain the integrated embedding matrix Ê = E + P.
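A minimal sketch of this integration step, with random matrices standing in for the learned embeddings (the values of n, d and the padding convention are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_items = 8, 16, 100                     # sequence length, embedding dim, vocabulary size

item_emb = rng.normal(size=(num_items + 1, d))   # row 0 reserved for padding
P = rng.normal(size=(n, d))                      # position embedding (learnable in training; random here)

def integrated_embedding(seq):
    """E_hat = E + P: look up the item embedding for each position of the
    training sequence and add the position embedding row-wise."""
    E = item_emb[seq]                            # (n, d)
    return E + P

seq = np.array([0, 0, 0, 5, 7, 9, 11, 13])       # left-padded training sequence
E_hat = integrated_embedding(seq)
```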
  • Step 13 Input the integrated embedding matrix into the self-attention block for iterative feature learning to generate a learning matrix.
  • the embodiment of the present invention does not limit the specific learning method of the self-attention block, and the user may refer to the related technology of self-attention.
  • the embodiments of the present invention also do not limit the specific structure of the self-attention block.
  • the self-attention block includes a self-attention layer and a point-wise feed-forward network.
  • the present invention also does not limit whether to use multiple self-attention blocks for stacking computation. It can be understood that the more layers are stacked, the more features the self-attention block can learn.
  • the self-attention layer has three matrices: Q (query), K (key) and V (value), all of which come from the same input to the self-attention block.
  • the dot product between Q and K is calculated first; to prevent the dot-product result from being too large, it is divided by a scale factor √d, where d is the dimension of the query and key vectors. Finally, the result is normalized into a probability distribution through a softmax operation and multiplied by the matrix V to obtain a weighted-sum representation. Scaled dot-product attention is thus defined as: Attention(Q, K, V) = softmax(QKᵀ/√d)·V.
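The scaled dot-product attention just described can be sketched in plain numpy; the causal mask shown here anticipates the masking operation discussed next and is an assumption about the exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, with an optional
    boolean mask that blocks Q_i from attending to K_j where mask is False."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) scaled dot products
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # masked positions get ~0 weight
    return softmax(scores, axis=-1) @ V

n, d = 4, 8
rng = np.random.default_rng(1)
Q = K = V = rng.normal(size=(n, d))                  # self-attention: one shared input
causal = np.tril(np.ones((n, n), dtype=bool))        # allow attending to j <= i only
out = scaled_dot_product_attention(Q, K, V, causal)
```

Under the causal mask the first position can attend only to itself, so its output equals its own value vector.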
  • to keep the self-attention layer from seeing future items while it learns the complete training sequence, a mask operation can be used to block the connection between Q_i and K_j (j > i). After the self-attention layer outputs the learning result S_k based on the first k items, a point-wise two-layer feed-forward network can be used to turn the self-attention layer from a linear model into a nonlinear one. Meanwhile, to learn more complex item transitions, iterative feature learning can be performed by stacking self-attention blocks; the embodiment of the present invention does not limit the specific number of stacked layers, which can be set according to actual application requirements. The b-th (b > 1) self-attention block can be defined as: S^(b) = SA(F^(b−1)), F^(b) = FFN(S^(b)), where:
  • SA: the self-attention layer
  • FFN: the point-wise feed-forward network
  • ReLU: the rectified linear unit activation
  • W Q , W K , W V , W 1 , W 2 ⁇ R d ⁇ d are all learnable matrices
  • b 1 , b 2 are d-dimensional vectors.
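A sketch of the point-wise two-layer feed-forward network using the learnable parameters W1, W2, b1, b2 named above; the random weights are placeholders for learned values:

```python
import numpy as np

d = 8
rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # learnable matrices (random stand-ins)
b1, b2 = np.zeros(d), np.zeros(d)                          # d-dimensional bias vectors

def relu(x):
    return np.maximum(0.0, x)

def ffn(x):
    """Point-wise feed-forward network applied independently to each position:
    FFN(x) = ReLU(x W1 + b1) W2 + b2."""
    return relu(x @ W1 + b1) @ W2 + b2

S = rng.normal(size=(4, d))   # output of the self-attention layer
F = ffn(S)
```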
  • a multi-layer neural network has strong feature learning ability, but simply adding more network layers can cause problems such as overfitting and longer training time: as the network becomes deeper, the risk of vanishing gradients grows and model performance degrades.
  • the above situation can be alleviated by residual connections.
  • the process of residual connection can be as follows: normalize the input x of the self-attention layer and the feed-forward network, perform a dropout operation on the output of the self-attention layer and the feed-forward network, and finally add the original input x to the output after dropout as the final output.
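The residual-connection procedure just described can be sketched as follows; the dropout rate and the inverted-dropout scaling are assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-8):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def residual_block(x, sublayer, drop_rate=0.5, rng=None, training=False):
    """g(x) = x + Dropout(sublayer(LayerNorm(x))): normalize the input,
    apply the sub-layer (self-attention or FFN), drop units during training,
    and add the original input x back as the residual."""
    y = sublayer(layer_norm(x))
    if training and rng is not None:
        keep = (rng.random(y.shape) > drop_rate).astype(y.dtype)
        y = y * keep / (1.0 - drop_rate)   # inverted dropout: rescale kept units
    return x + y

x = np.arange(8.0).reshape(2, 4)
out = residual_block(x, lambda z: z * 0.0)  # sub-layer that outputs zero
```

With a zero sub-layer the residual path carries the input through unchanged, illustrating why residual connections ease the training of deeper stacks.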
  • Step 14 Input the learning matrix into the prediction layer for matrix factorization, and calculate the initial item preference value.
  • the learning matrix is input into the MF layer (Matrix Factorization layer) to perform matrix factorization and calculate the item correlation value of item i, which predicts whether item i is a recommendable item.
  • the item-related value can be calculated by the following formula:
  • r i,k represents the relevance of item i becoming the next (recommendable) item based on the first k items; N ∈ R^{|V|×d} is a learnable item embedding matrix, with |V| the number of items and d the embedding dimension.
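A toy sketch of this prediction-layer computation; the shape of the item embedding matrix N and the argmax selection are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
num_items, d = 50, 8
N = rng.normal(size=(num_items, d))   # item embedding matrix (assumed shape |V| x d)
F_k = rng.normal(size=(d,))           # self-attention output for the first k items

def item_relevance(F_k, N):
    """r_{i,k} = F_k . N_i: the inner product of the sequence representation
    with each item embedding scores item i as a candidate next item."""
    return N @ F_k                    # one relevance score per item

r = item_relevance(F_k, N)
best = int(np.argmax(r))              # most relevant candidate next item
```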
  • Step 15 Optimize the recommendation network model with the binary cross-entropy loss function until its output value is minimized, and take the initial item preference values at the minimum as the item preference values.
  • the embodiments of the present invention do not limit the specific process of network optimization using a binary cross entropy loss function (Binary cross entropy), and reference may be made to related technologies.
  • the binary cross-entropy loss function can be expressed as L = −Σ_t [ e_t·log σ(r_t) + (1 − e_t)·log(1 − σ(r_t)) ], where σ is the sigmoid function
  • e t represents the expected output of the network at time step t
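A minimal sketch of a binary cross-entropy loss under the usual sigmoid formulation (the exact expression is assumed here, as the text does not spell it out):

```python
import numpy as np

def bce_loss(r, e, eps=1e-9):
    """Binary cross-entropy over time steps:
    L = -sum_t [ e_t*log(sigmoid(r_t)) + (1 - e_t)*log(1 - sigmoid(r_t)) ],
    where e_t is the expected output at step t (1 for the true next item)."""
    p = 1.0 / (1.0 + np.exp(-r))      # sigmoid turns raw scores into probabilities
    return -np.sum(e * np.log(p + eps) + (1 - e) * np.log(1 - p + eps))

r = np.array([4.0, -3.0, 5.0])        # scores for positive/negative samples
e = np.array([1.0, 0.0, 1.0])         # expected outputs
loss = bce_loss(r, e)
```

Confident, correct scores yield a loss close to zero, which is what the optimization in Step 15 drives toward.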
  • the item preference value can represent the user's interest preference for the item
  • the item preference value is used as the indicator for sorting items when generating the candidate item set. It can be understood that, to predict items the user may be interested in in the future, this embodiment generates the candidate item set from the items the user has not interacted with, which can be obtained as the difference between the total set containing all items and the historical item set corresponding to the historical interaction sequence.
  • in dialogue recommendation, the dialogue needs to be initiated by the user first sending a preference item attribute.
  • the embodiment of the present invention does not limit the specific process for the user to send the preference item attribute.
  • the user can select from a preset item attribute list and send the selection data, or the user can send a dialogue recommendation request, in response to which the dialogue recommendation system sends the item attribute list for selection.
  • the embodiment of the present invention does not limit the specific method of using the preference item attribute and the candidate item set to update the candidate item set. It can be understood that the items that do not have the preference item attribute can be removed from the candidate item set; equivalently, only the candidate items that have the preference item attribute are retained.
  • updating the candidate item set with the preference item attribute may include:
  • Step 21 Remove the items that do not have the preference item attribute from the candidate item set.
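Step 21 can be sketched with a toy catalogue; the items and their attribute sets below are hypothetical:

```python
# item id -> set of item attributes (hypothetical toy catalogue)
items = {
    1: {"novel", "popular"},
    2: {"textbook"},
    3: {"novel"},
    4: {"biography", "popular"},
}

def update_candidates(candidates, preferred_attr):
    """Step 21: keep only the candidate items that carry the user's
    preferred item attribute, removing all others."""
    return {v for v in candidates if preferred_attr in items[v]}

cand = update_candidates({1, 2, 3, 4}, "novel")
```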
  • the following specifically describes the process of using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set.
  • before the dialogue starts, the set of attributes accepted by the user and the set of attributes rejected by the user are both empty
  • the candidate item set V cand contains items that the user has not interacted with
  • the candidate attribute set P cand is also empty.
  • the preference item attribute can be used to first remove the items that do not have the preference item attribute from the candidate item set, updating the candidate item set; this can be expressed as V_cand ← {v ∈ V_cand : the preference item attribute ∈ P_v}, where P_v is the attribute set of item v.
  • the output of the multi-layer self-attention block, obtained by inputting the initial training sequence V′_u of the historical interaction sequence into the b-layer self-attention block, can then be exploited to calculate the interaction prediction value of each candidate item in the candidate item set.
  • the calculation process of the interactive predicted value can be expressed as:
  • item attributes belong to items, and when users like certain items they also have preferences for certain attributes of those items; therefore, after obtaining the interaction prediction values of the candidate items, the preference prediction value of each candidate item attribute can also be calculated to determine which candidate item attributes the user prefers.
  • the embodiment of the present invention does not limit the specific method of calculating the attribute preference prediction value of candidate items. It can be understood that multiple items can have the same item attribute.
  • for example, the interaction prediction values of the items to which an item attribute belongs can be averaged, and the average used as the preference prediction value of that item attribute. Alternatively, the information entropy of the candidate item attribute can be calculated and used as its preference prediction value, where information entropy is a measure of removed uncertainty: the lower the probability of an event, the more information it provides when it occurs. A candidate item attribute with high information entropy can efficiently filter the candidate item set, so in the embodiment of the present invention the preference prediction value of the candidate item attribute is calculated via information entropy.
  • however, the calculation of information entropy is expensive, so the candidate items can be sorted by interaction prediction value, the candidate attribute set can be generated from the first preset number of candidate items with the largest interaction prediction values, and the information entropy can then be computed only over this candidate attribute set. It should be noted that the embodiment of the present invention does not limit the specific value of the preset number, which can be set according to actual application requirements.
  • the process of using the interactive predicted value and the candidate item to generate a candidate attribute set, and using the interactive predicted value to calculate the preference predicted value of each candidate item attribute in the candidate attribute set may include:
  • Step 31 Generate a candidate attribute set using the item attributes contained in the first preset number of candidate items, taken in descending order of interaction prediction value;
  • the candidate item set V cand is sorted in descending order of interaction prediction value, and the candidate attribute set is generated from the attributes of the first L candidate items.
  • Step 32 Calculate the information entropy of each candidate item attribute in the candidate attribute set by using the interactive prediction value of the candidate item, and use the information entropy as the preference prediction value of the candidate item attribute.
  • the information entropy of the candidate item attribute can be calculated by the following formula:
  • σ(·) is the sigmoid function (activation function), s v is the interaction prediction value of candidate item v, and V p represents the set of items containing the item attribute p.
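A hedged sketch of the weighted information entropy: each candidate item is weighted by the sigmoid of its interaction prediction value, and the entropy of the resulting split is returned. The exact weighting formula is an assumption in the spirit of the text, not the patent's verbatim definition:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attribute_entropy(scores, has_attr):
    """Weighted information entropy of one candidate attribute p.
    scores: interaction prediction values s_v of the candidate items;
    has_attr: boolean mask, True where the item contains attribute p.
    Items are weighted by sigmoid(s_v) rather than treated equally."""
    w = sigmoid(np.asarray(scores, dtype=float))
    prob = w[has_attr].sum() / w.sum()      # weighted share of candidates with p
    if prob in (0.0, 1.0):
        return 0.0                          # p cannot split the candidate set
    return float(-prob * np.log2(prob) - (1 - prob) * np.log2(1 - prob))

scores = [2.0, 1.0, -1.0, 0.5]
h_half = attribute_entropy(scores, np.array([True, False, True, False]))
h_all = attribute_entropy(scores, np.array([True, True, True, True]))
```

An attribute contained by every candidate has entropy 0 and cannot filter anything, matching the observation in the text; an attribute splitting the weighted candidates roughly in half scores near the maximum of 1 bit.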
  • the embodiment of the present invention uses weighted entropy, assigning higher weights to important candidate items in the information entropy calculation instead of treating each item equally. Simply put, if most of the candidate items contain attribute p, then asking about p will do little to filter the candidate items and p is not a suitable attribute; conversely, an attribute that splits the candidate items well can quickly filter them.
  • the policy network is based on a deep reinforcement learning network, which can generate an action decision for the current dialogue round according to the input state and the user's historical dialogue results, and then use the action decision to recommend this round of dialogue.
  • the embodiments of the present invention do not limit the reinforcement learning process of the policy network, and reference may be made to the related technologies of the deep reinforcement learning network.
  • the embodiments of the present invention also do not limit the network optimization method of the policy network, and reference may also be made to the related technologies of the deep reinforcement learning network.
  • the calculated candidate item set and candidate attribute set are input into the policy network for reinforcement learning, and the process of recommending dialogue to the user may include:
  • Step 41 Input the candidate item set and the candidate attribute set into the policy network for reinforcement learning to generate action decisions.
  • the policy network determines through reinforcement learning whether the action of the current dialogue round is to ask or to recommend.
  • the policy network involves four values, namely state, action, reward and policy.
  • the state contains the dialogue history and the length of the current candidate item set.
  • the dialogue history is encoded by s his , the size of which is the maximum round T of dialogue recommendation, and each dimension represents the user's dialogue history in the t-th round.
  • the user's conversation history can be represented by a special value. It should be noted that the present invention does not limit specific special values, which can be set according to actual application requirements.
  • the embodiment of the present invention includes two actions, ie, asking a ask and recommending a rec .
  • the intermediate reward r t in round t is the weighted sum of these five reward terms.
  • the embodiment of the present invention does not limit the specific setting method of the above-mentioned reward value, which can be set according to actual application requirements.
  • after the current dialogue state s is input into the policy network, the policy network outputs the action decision values Q(s, a) for the two actions, and the action of this round of dialogue can then be determined according to the action decision values.
  • the policy network uses standard deep Q-learning to optimize the network.
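The choice between the two actions given Q(s, a) can be sketched as follows; the epsilon-greedy exploration is an assumption, since the text only states that standard deep Q-learning is used:

```python
import numpy as np

ACTIONS = ("ask", "recommend")   # a_ask and a_rec from the text

def choose_action(q_values, epsilon=0.1, rng=None):
    """Epsilon-greedy choice between asking an attribute and recommending an
    item, given the policy network's action decision values Q(s, a)."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return ACTIONS[rng.integers(len(ACTIONS))]   # explore uniformly
    return ACTIONS[int(np.argmax(q_values))]         # exploit the larger Q-value

action = choose_action(np.array([0.3, 0.9]), epsilon=0.0)
```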
  • Step 42 Use action decision to make dialogue recommendation to the user.
  • the process of recommending dialogues to users by using action decisions may include:
  • Step 51 when the action decision is an item recommendation, send the candidate item with the largest interaction prediction value to the user terminal, and receive feedback data;
  • Step 52 When the feedback data indicates that the candidate item is accepted, exit the dialogue recommendation
  • Step 53 When the feedback data indicates that the candidate item is rejected, remove the candidate item from the candidate item set, and use the reduced candidate item set to re-execute the step of calculating, with the item correlation values, the interaction prediction value of each candidate item in the updated candidate item set;
  • Step 54 when the action decision is an attribute query, send the attribute of the candidate item with the largest preference prediction value to the user terminal, and receive the feedback data;
  • Step 55 Use the feedback data to verify the items in the candidate item set, remove the items that fail the verification, and finally use the reduced candidate item set to re-execute the step of calculating, with the item correlation values, the interaction prediction value of each candidate item in the updated candidate item set.
  • the candidate item attribute p i ⁇ P cand with the largest preference prediction value is selected from the candidate item attribute set P cand .
  • the candidate item can be verified by using the candidate item attribute in the feedback data.
  • when the user accepts the attribute, the verification process is: add the attribute to the set of attributes accepted by the user, and update the candidate item set accordingly
  • when the user rejects the attribute, the verification process is: add the attribute to the set of attributes rejected by the user, and update the candidate item set accordingly
  • when the dialogue recommendation cannot find an item acceptable to the user, the dialogue will continue for several rounds.
  • the dialogue rounds can be limited by a preset threshold, and when the dialogue round reaches the preset threshold, the dialogue recommendation is exited.
  • using the action decision to make a dialogue recommendation to the user may further include:
  • Step 61 Determine whether the dialogue round corresponding to the action decision is greater than a preset threshold
  • the present invention does not limit the specific value of the preset threshold, which can be set according to actual application requirements.
  • Step 62 If yes, exit the dialogue recommendation
  • Step 63 If not, execute the step of recommending dialogue to the user by using the action decision.
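Steps 41 through 63 can be summarized in a skeleton dialogue loop; `policy`, `user`, and the state interface below are hypothetical stand-ins for the policy network, the user terminal, and the candidate item/attribute sets:

```python
def run_dialogue(policy, user, state, max_rounds=15):
    """Skeleton of Steps 41-63: each round the policy picks an action; a
    recommendation either ends the dialogue or shrinks the candidate set,
    an attribute query filters it, and the loop stops after the preset
    round threshold (Steps 61-63)."""
    for t in range(max_rounds):                      # Step 61: bound the rounds
        action = policy(state)                       # Step 41: ask or recommend
        if action == "recommend":
            item = state.best_item()                 # Step 51: largest prediction value
            if user.accepts(item):                   # Step 52: accepted -> exit
                return item
            state.remove_item(item)                  # Step 53: rejected -> filter
        else:
            attr = state.best_attribute()            # Step 54: top preference attribute
            state.filter_by(attr, user.likes(attr))  # Step 55: verify candidates
    return None                                      # Step 62: threshold reached

class ToyState:
    def __init__(self, items): self.items = list(items)
    def best_item(self): return self.items[0]
    def remove_item(self, it): self.items.remove(it)
    def best_attribute(self): return "popular"
    def filter_by(self, attr, liked): pass           # no-op in this toy sketch

class ToyUser:
    def __init__(self, target): self.target = target
    def accepts(self, item): return item == self.target
    def likes(self, attr): return True

result = run_dialogue(lambda s: "recommend", ToyUser(3), ToyState([1, 2, 3]))
```

With a policy that always recommends, the loop discards rejected items until it reaches the user's target item.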
  • the method first obtains the historical interaction sequence between the user and items. Since the sequence reflects the user's preference for historical items and the order of interaction, using the historical interaction sequence in the training of the recommendation network model ensures that the candidate item set generated by the model also considers the user's historical preferences and interaction order, producing a candidate item set more in line with the user's preferences. This ensures that the candidate item set can be used in dialogue recommendation to conduct targeted dialogues with users, effectively reducing the number of dialogue rounds while improving the accuracy of dialogue recommendation.
  • the following describes a dialogue recommendation apparatus, electronic device, and storage medium provided by the embodiments of the present invention.
  • the dialogue recommendation apparatus, electronic equipment, and storage medium described below and the dialogue recommendation method described above may refer to each other correspondingly.
  • FIG. 2 is a structural block diagram of a dialogue recommendation apparatus provided by an embodiment of the present invention.
  • the apparatus may include:
  • the recommendation network module 201 is used to obtain the historical interaction sequence between the user and items, input the historical interaction sequence into the recommendation network model including the embedding layer, the self-attention block and the prediction layer for training, and generate the item preference value; wherein an item contains item attributes;
  • a candidate item set generation module 202 configured to generate a candidate item set by utilizing the item preference value and items that the user has not interacted with;
  • the first calculation module 203 is configured to update the candidate item set using the preference item attribute when receiving the preference item attribute sent by the user, and use the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set;
  • the second calculation module 204 is configured to generate a candidate attribute set using the interactive predicted value and the candidate item, and use the interactive predicted value to calculate the preference predicted value of each candidate item attribute in the candidate attribute set;
  • the dialogue recommendation module 205 is used for inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and performing dialogue recommendation to the user.
  • the recommendation network module 201 may include:
  • the self-attention block sub-module is used to input the integrated embedding matrix into the self-attention block for iterative feature learning and generate a learning matrix;
  • the prediction layer sub-module is used to input the learning matrix into the prediction layer for matrix decomposition and calculate the initial item preference value.
  • the dialogue recommendation module 205 may include:
  • the reinforcement learning sub-module is used to input the candidate item set and the candidate attribute set into the policy network for reinforcement learning to generate action decisions;
  • the dialogue recommendation submodule is used to make dialogue recommendations to users using action decisions.
  • the dialogue recommendation sub-module may include:
  • a first sending unit configured to send the candidate item with the largest interaction prediction value to the user terminal when the action decision is an item recommendation, and receive feedback data;
  • a first processing unit configured to quit the dialogue recommendation when the feedback data indicates that the candidate item is accepted;
  • the second processing unit is configured to, when the feedback data indicates that the candidate item is rejected, remove the candidate item from the candidate item set, and use the candidate item set after removal to execute the step of calculating the interaction prediction value of each candidate item in the updated candidate item set by using the item correlation value;
  • a second sending unit configured to send the item attribute with the largest preference prediction value to the user terminal when the action decision is an attribute query, and receive feedback data;
  • the third processing unit is used to verify the items in the candidate item set by using the feedback data, remove the items that fail the verification, and finally use the candidate item set after removal to execute the step of calculating the interaction prediction value of each candidate item in the updated candidate item set by using the item correlation value.
  • the dialogue recommendation module 205 may further include:
  • the dialogue round judgment sub-module is used to judge whether the dialogue round corresponding to the action decision is greater than the preset threshold; if so, exit the dialogue recommendation; if not, execute the step of using the action decision to recommend the dialogue to the user.
  • the first computing module 203 includes:
  • the removal operation submodule is used to remove, from the candidate item set, the items that do not have the preference item attribute.
  • the second computing module 204 includes:
  • the candidate attribute set generation sub-module is used to generate a candidate attribute set from the item attributes contained in the first preset number of candidate items, taken in descending order of the interaction prediction value;
  • the second calculation submodule is used for calculating the information entropy of each candidate item attribute in the candidate attribute set by using the interactive prediction value of the candidate item, and using the information entropy as the preference prediction value of the candidate item attribute.
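As a rough sketch of how the attribute scoring described above could work, the snippet below builds the candidate attribute set from the top-k items by interaction prediction value and scores each attribute by information entropy. The top-k cutoff, the normalization of item scores into a probability, and the binary-entropy formula are assumptions made for illustration; the document states only that the entropy is computed from the candidate items' interaction prediction values.

```python
import math

def attribute_entropy_scores(item_scores, item_attributes, top_k=10):
    """Score candidate attributes by information entropy.

    item_scores: dict item_id -> interaction prediction value (assumed non-negative)
    item_attributes: dict item_id -> set of attribute ids
    Returns: dict attribute_id -> entropy-based preference prediction value.
    """
    # Candidate attribute set: attributes of the top_k items by predicted score.
    top_items = sorted(item_scores, key=item_scores.get, reverse=True)[:top_k]
    candidate_attrs = set().union(*(item_attributes[i] for i in top_items))

    total = sum(item_scores[i] for i in top_items) or 1.0
    scores = {}
    for attr in candidate_attrs:
        # Share of prediction-value mass held by top items carrying this attribute.
        p = sum(item_scores[i] for i in top_items if attr in item_attributes[i]) / total
        # Binary entropy: maximal when the attribute splits the candidates evenly,
        # so asking the user about it is most informative.
        scores[attr] = 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return scores
```

An attribute shared by all highly scored candidates gets entropy 0, so the policy would gain nothing by asking about it; an attribute that splits the candidates roughly in half scores highest.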
  • FIG. 3 is a structural block diagram of a dialogue recommendation system based on a historical interaction sequence provided by an embodiment of the present invention.
  • SeqCR (Sequential Conversation Recommender)
  • the Policy Network module is used to complete the functions of the dialogue recommendation module 205 in the above embodiment, such as receiving feedback, updating preferences, and passing updated candidate items and attributes (Update Candidate Items and Attributes) to the Sequential module.
  • the Sequential module is used to complete the function of the recommendation network module 201 in the above embodiment, and includes the Embedding Layer, the Self-attention Block and the Prediction Layer, where the self-attention block consists of a self-attention layer and a Feed-Forward Network. The Scoring module is used to complete the functions of the candidate item set generation module 202, the first calculation module 203 and the second calculation module 204 in the above embodiment, namely Item Scoring (calculating the interaction prediction value) and Attribute Scoring (calculating the preference prediction value). User represents the user, who can receive queries sent by the policy network and provide feedback to the policy network.
  • the processor is configured to implement the steps of the above dialogue recommendation method when executing the computer program.
  • since the embodiments of the electronic device part correspond to the embodiments of the dialogue recommendation method part, for the embodiments of the electronic device part, reference may be made to the description of the embodiments of the dialogue recommendation method part, which will not be repeated here.
  • Embodiments of the present invention further provide a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the dialog recommendation method of any of the foregoing embodiments are implemented.
  • since the embodiments of the storage medium part correspond to the embodiments of the dialogue recommendation method part, for the embodiments of the storage medium part, reference may be made to the description of the embodiments of the dialogue recommendation method part, which will not be repeated here.
  • a software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A conversation recommendation method and apparatus, an electronic device, and a storage medium. A historical sequence is used to improve the recommendation efficiency. The method comprises: obtaining a historical interaction sequence between a user and an item, inputting the historical interaction sequence into a recommendation network model comprising an embedding layer, a self-attention block, and a prediction layer for training, and generating an item preference value, wherein the item comprises an item attribute; generating a candidate item set by using the item preference value and items having no interaction with the user; when a preference item attribute sent by the user is received, updating the candidate item set by using the preference item attribute, and calculating an interaction prediction value of each candidate item in the updated candidate item set by using an item correlation value; generating a candidate attribute set by using the interaction prediction value and the candidate item, and calculating a preference prediction value of each candidate item attribute in the candidate attribute set by using the interaction prediction value; and inputting the calculated candidate item set and the calculated candidate attribute set into a policy network for reinforcement learning, and carrying out conversation recommendation to the user.

Description

Dialogue recommendation method, apparatus, electronic device and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on March 23, 2021, with application number 202110308759.6 and entitled "Dialogue recommendation method, apparatus, electronic device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of dialogue recommendation, and in particular to a dialogue recommendation method, apparatus, electronic device and storage medium.
Background
A conversational recommender system (CRS) is a recommender system that can actively obtain preference attributes from a user and use these attributes to recommend items. In the related art, dialogue recommendation can make recommendations using the preference attributes currently asked of the user, and it can consider the historical items that the user has interacted with when asking the user for preference attributes or recommending items, but it ignores the influence of the order of interaction between the user and historical items on the recommendation, that is, the importance of the sequence of historical interaction items. As a result, existing dialogue recommendation methods find it difficult to conduct dialogue recommendation with users efficiently and accurately.
Summary of the Invention
The purpose of the present invention is to provide a dialogue recommendation method, apparatus, electronic device and storage medium that can use the historical interaction sequence, which reflects the user's historical item preferences, to train the recommendation network model and generate the candidate item set, thereby ensuring that dialogue recommendation can conduct targeted dialogues with the user and improving the efficiency and accuracy of dialogue recommendation.
To solve the above technical problem, the present invention provides a dialogue recommendation method, including:
obtaining a historical interaction sequence between a user and items, and inputting the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate item preference values, wherein an item contains item attributes;
generating a candidate item set by using the item preference values and items that the user has not interacted with;
when a preference item attribute sent by the user is received, updating the candidate item set by using the preference item attribute, and calculating an interaction prediction value of each candidate item in the updated candidate item set by using item correlation values;
generating a candidate attribute set by using the interaction prediction values and the candidate items, and calculating a preference prediction value of each candidate item attribute in the candidate attribute set by using the interaction prediction values;
and inputting the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and performing dialogue recommendation to the user.
Optionally, inputting the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate item preference values, includes:
generating a training sequence of a preset length by using the historical interaction sequence;
inputting all items and the training sequence into the embedding layer, and outputting an integrated embedding matrix;
inputting the integrated embedding matrix into the self-attention block for iterative feature learning, to generate a learning matrix;
inputting the learning matrix into the prediction layer for matrix decomposition, and calculating initial item preference values;
and performing network optimization on the recommendation network model by using a binary cross-entropy loss function until the output value of the binary cross-entropy loss function is minimal, and taking the initial item preference values at the minimal output value as the item preference values.
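The binary cross-entropy optimization mentioned in the last step can be illustrated with a minimal sketch. Pairing each positive (actually interacted) next item with one sampled negative item is an assumption borrowed from common self-attentive sequential recommenders and is not spelled out in this document:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(pos_logits, neg_logits, eps=1e-12):
    """Binary cross-entropy over predicted logits: positive logits score the
    item the user actually interacted with next, negative logits score a
    sampled item the user did not interact with."""
    loss = 0.0
    for p, n in zip(pos_logits, neg_logits):
        # Push sigmoid(p) toward 1 and sigmoid(n) toward 0.
        loss += -math.log(sigmoid(p) + eps) - math.log(1.0 - sigmoid(n) + eps)
    return loss / len(pos_logits)
```

Training drives this loss toward its minimum, at which point the model's item prediction values are taken as the item preference values.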
Optionally, inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning and performing dialogue recommendation to the user includes:
inputting the candidate item set and the candidate attribute set into the policy network for the reinforcement learning, to generate an action decision;
and performing dialogue recommendation to the user by using the action decision.
Optionally, performing dialogue recommendation to the user by using the action decision includes:
when the action decision is an item recommendation, sending the candidate item with the largest interaction prediction value to the user terminal, and receiving feedback data;
when the feedback data indicates that the candidate item is accepted, exiting the dialogue recommendation;
when the feedback data indicates that the candidate item is rejected, removing the candidate item from the candidate item set, and using the candidate item set after removal to execute the step of calculating the interaction prediction value of each candidate item in the updated candidate item set by using the item correlation values;
when the action decision is an attribute query, sending the candidate item attribute with the largest preference prediction value to the user terminal, and receiving the feedback data;
and verifying the items in the candidate item set by using the feedback data, removing the items that fail the verification, and finally using the candidate item set after removal to execute the step of calculating the interaction prediction value of each candidate item in the updated candidate item set by using the item correlation values.
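The recommend-or-ask loop described by these optional steps can be sketched as follows. The `policy`, `scorer` and `user` interfaces are invented for illustration and do not correspond to concrete modules of this method:

```python
def run_conversation(policy, scorer, user, candidate_items, max_turns=15):
    """One user session: each turn the policy either recommends the highest-
    scored candidate item or asks about the highest-scored attribute, then
    the candidate set is pruned according to the user's feedback."""
    for _turn in range(max_turns):
        if not candidate_items:
            return None
        item_scores = scorer.score_items(candidate_items)
        attr_scores = scorer.score_attributes(candidate_items, item_scores)
        action = policy.decide(item_scores, attr_scores)  # "recommend" or "ask"
        if action == "recommend":
            best = max(item_scores, key=item_scores.get)
            if user.accepts_item(best):
                return best                    # accepted: exit the dialogue
            candidate_items.discard(best)      # rejected: remove and re-score
        else:
            attr = max(attr_scores, key=attr_scores.get)
            liked = user.likes_attribute(attr)
            # keep only candidates consistent with the feedback
            candidate_items = {i for i in candidate_items
                               if (attr in scorer.attributes_of(i)) == liked}
    return None                                # turn budget exhausted
```

The turn budget plays the role of the preset round threshold: when it is exceeded, the dialogue exits without a recommendation.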
Optionally, before performing dialogue recommendation to the user by using the action decision, the method further includes:
judging whether the dialogue round corresponding to the action decision is greater than a preset threshold;
if so, exiting the dialogue recommendation;
if not, executing the step of performing dialogue recommendation to the user by using the action decision.
Optionally, updating the candidate item set by using the preference item attribute includes:
removing, from the candidate item set, the items that do not have the preference item attribute.
Optionally, generating a candidate attribute set by using the interaction prediction values and the candidate items, and calculating a preference prediction value of each candidate item attribute in the candidate attribute set by using the interaction prediction values, includes:
generating a candidate attribute set from the item attributes contained in the first preset number of candidate items, taken in descending order of the interaction prediction values;
and calculating the information entropy of each candidate item attribute in the candidate attribute set by using the interaction prediction values of the candidate items, and taking the information entropy as the preference prediction value of the candidate item attribute.
The present invention further provides a dialogue recommendation apparatus, including:
a recommendation network module, configured to obtain the historical interaction sequence between a user and items, and input the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate a candidate item set and item correlation values; wherein an item contains item attributes;
a candidate item set generation module, configured to generate a candidate item set by using the item preference values and items that the user has not interacted with;
a first calculation module, configured to, when a preference item attribute sent by the user is received, update the candidate item set by using the preference item attribute, and calculate an interaction prediction value of each candidate item in the updated candidate item set by using the item correlation values;
a second calculation module, configured to generate a candidate attribute set by using the interaction prediction values and the candidate items, and calculate a preference prediction value of each candidate item attribute in the candidate attribute set by using the interaction prediction values;
and a dialogue recommendation module, configured to input the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and perform dialogue recommendation to the user.
The present invention further provides an electronic device, including:
a memory, configured to store a computer program;
and a processor, configured to implement the above dialogue recommendation method when executing the computer program.
The present invention further provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the above dialogue recommendation method.
The present invention provides a dialogue recommendation method, including: obtaining a historical interaction sequence between a user and items, and inputting the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate item preference values, wherein an item contains item attributes; generating a candidate item set by using the item preference values and items that the user has not interacted with; when a preference item attribute sent by the user is received, updating the candidate item set by using the preference item attribute, and calculating an interaction prediction value of each candidate item in the updated candidate item set by using item correlation values; generating a candidate attribute set by using the interaction prediction values and the candidate items, and calculating a preference prediction value of each candidate item attribute in the candidate attribute set by using the interaction prediction values; and inputting the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and performing dialogue recommendation to the user.
It can be seen that this method first obtains the historical interaction sequence between the user and items. Because this sequence reflects both the user's preference for historical items and the order of interaction, using the historical interaction sequence in training the recommendation network model ensures that the candidate item set generated by the model takes both the user's historical preferences and the interaction order into account, yielding a candidate item set that better matches the user's preferences. This in turn ensures that dialogue recommendation can use the candidate item set to conduct targeted dialogues with the user, effectively reducing the number of dialogue rounds while improving the accuracy of dialogue recommendation. The present invention further provides a dialogue recommendation apparatus, an electronic device and a storage medium, which have the above beneficial effects.
Brief Description of the Drawings
To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings required in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are merely embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a dialogue recommendation method provided by an embodiment of the present invention;
FIG. 2 is a structural block diagram of a dialogue recommendation apparatus provided by an embodiment of the present invention;
FIG. 3 is a structural block diagram of a dialogue recommendation system based on historical interaction sequences provided by an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
A conversational recommender system (CRS) is a recommender system that can actively obtain preference attributes from a user and use these attributes to recommend items. In the related art, dialogue recommendation can only make recommendations using the preference attributes currently asked of the user; it is difficult for it to take the user's historical item preferences into account, so it can only blindly ask the user about preference attributes or recommend items, which ultimately makes it difficult for existing dialogue recommendation methods to conduct dialogue recommendation with users efficiently and accurately. In view of this, the present invention provides a dialogue recommendation method, which can use the historical interaction sequence reflecting the user's historical item preferences to train the recommendation network model and generate the candidate item set, thereby ensuring that dialogue recommendation can conduct targeted dialogues with the user and improving the efficiency and accuracy of dialogue recommendation. Please refer to FIG. 1, which is a flowchart of a dialogue recommendation method provided by an embodiment of the present invention. The method may include:
S101: Obtain a historical interaction sequence between a user and items, and input the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate item preference values, wherein an item contains item attributes.
The historical interaction sequence is a record of the user's past interactions with items, and the items in the historical interaction sequence are sorted in the chronological order of the user's interactions with them; an interaction can be a click-to-view, a favorite, a purchase, and so on. Since the historical interaction sequence contains both the user's historical preferences for items and the order in which the user interacted with historical items, the embodiment of the present invention integrates the historical interaction sequence into the training of the recommendation network model, which ensures that the recommendation network model can fuse both the user's historical preferences and the sequential characteristics of interactions with historical items. This in turn ensures that dialogue recommendation can conduct targeted dialogues according to the user's historical preferences, effectively improving the efficiency and accuracy of dialogue recommendation. It should be noted that the embodiment of the present invention does not limit what an item specifically is; it can be understood that an item is a concrete object, which may be a physical object such as a book, or a virtual object such as a movie. The embodiment of the present invention also does not limit the specific item attributes. It can be understood that an item attribute reflects some characteristic of the item; for example, when the item is a book, the item attribute may be the genre of the book, such as novel, biography or textbook, it may be an attribute indicating whether the book is a best-seller, or it may be another book-related attribute. The embodiment of the present invention also does not limit the number of item attributes an item can contain; an item may have one or more item attributes.
It should be noted that the recommendation network model used in the embodiment of the present invention is based on a deep-learning neural network. The embodiment of the present invention does not limit the specific structures and learning methods of the embedding layer, the self-attention block and the prediction layer in the recommendation network model; the reader may refer to the related art of deep-learning neural networks.
Further, the embodiment of the present invention does not limit the number of items that the historical interaction sequence can contain; the historical interaction sequence may contain one or more items. Since the lengths of the historical interaction sequences of different users differ, to facilitate the training of the recommendation network model, the historical interaction sequence can be converted into a training sequence of a preset length. It can be understood that when the historical interaction sequence contains too few items, it will be difficult to compute a reliable preference for the user; therefore a minimum item count can be set for the historical interaction sequence, and when the number of items contained in the historical interaction sequence is less than the minimum item count, network model training will not be performed. The embodiment of the present invention does not limit the specific value of the minimum item count, which can be set according to actual application requirements; in one possible case, the minimum item count may be 10. It should be noted that the embodiment of the present invention also does not limit the specific value of the preset length, which can be set according to actual application requirements.
在一种可能的情况中,将历史交互序列输入至包含嵌入层、自注意块及预测层的推荐网络模型中进行训练,生成项目偏好值的过程,可以包括:In a possible situation, the historical interaction sequence is input into the recommendation network model including the embedding layer, the self-attention block and the prediction layer for training, and the process of generating the item preference value may include:
步骤11:利用历史交互序列生成预设长度的训练序列。Step 11: Use the historical interaction sequence to generate a training sequence of preset length.
Specifically, for a user's historical interaction sequence V_u = (v_1, v_2, …, v_{|V_u|}), an initial training sequence V′_u consisting of its first |V_u| − 1 items can first be generated. Since historical interaction sequences differ in length between users, the initial training sequence can be converted into a fixed-length training sequence s = (s_1, s_2, s_3, …, s_n), where n denotes the preset length of the new sequence. For a user with few historical visits, the length of the initial training sequence is smaller than n; in that case, the initial training sequence can be left-padded with a preset padding item, the number of padded entries being n − (|V_u| − 1). Conversely, it can be understood that if the user's historical sequence is longer than n, only the n most recently interacted items are retained.
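The left-padding and truncation described above can be sketched as follows (an illustrative sketch only; the padding item ID 0 and the example sequences are assumptions, not part of the claimed embodiment):

```python
def to_fixed_length(seq, n, pad_item=0):
    """Left-pad a short interaction sequence to length n, or keep only
    the n most recent items if it is longer than n."""
    if len(seq) >= n:
        return list(seq[-n:])                    # keep the n most recent interactions
    return [pad_item] * (n - len(seq)) + list(seq)  # left-pad with the padding item

# A short history is left-padded; a long one is truncated from the left.
print(to_fixed_length([5, 7, 9], 5))             # [0, 0, 5, 7, 9]
print(to_fixed_length([1, 2, 3, 4, 5, 6], 5))    # [2, 3, 4, 5, 6]
```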
Step 12: input all items and the training sequence into the embedding layer, and output an integrated embedding matrix.
Specifically, in the embedding layer, all items are first embedded to obtain an item embedding matrix M ∈ R^{|V|×d}, where d denotes the embedding dimension. This dimension can be set arbitrarily; the larger it is, the more latent features the embedding matrix can carry. A lookup operation is then performed on the item embedding matrix to generate the training-sequence embedding matrix E ∈ R^{n×d} of the training sequence s, where the lookup is E_i = M_{s_i}: E_i denotes the vector in the training-sequence embedding matrix corresponding to the i-th element s_i of the training sequence s, and M_{s_i} denotes the vector corresponding to s_i in the item embedding matrix. Finally, the training-sequence embedding matrix E is integrated with a learnable position embedding matrix P ∈ R^{n×d} to obtain the integrated embedding matrix:
E^s = E + P
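The lookup E_i = M_{s_i} and the integration E^s = E + P can be illustrated with NumPy (the matrix sizes, random initialization, and sample sequence are assumptions for illustration; in the embodiment, M and P are learned):

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, n, d = 100, 8, 16           # |V|, preset sequence length, embedding dimension

M = rng.normal(size=(num_items, d))    # item embedding matrix, |V| x d
P = rng.normal(size=(n, d))            # learnable position embedding matrix, n x d

s = np.array([0, 0, 0, 5, 7, 9, 11, 3])  # a left-padded training sequence of item IDs

E = M[s]                               # lookup: row i of E is M[s_i], shape n x d
E_s = E + P                            # integrated embedding matrix E^s = E + P
print(E_s.shape)                       # (8, 16)
```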
Step 13: input the integrated embedding matrix into the self-attention blocks for iterative feature learning, and generate a learning matrix.
It should be noted that the embodiments of the present invention do not limit the specific learning method of the self-attention block; users may refer to the related technologies of self-attention. The embodiments of the present invention likewise do not limit the specific structure of the self-attention block; in one possible case, a self-attention block comprises a self-attention layer and a point-wise feed-forward network. The present invention also does not limit whether multiple self-attention blocks are stacked for computation; it can be understood that the more layers are stacked, the more features the self-attention blocks can learn.
In one possible case, the self-attention layer operates on three matrices, Q (query), K (key), and V (value), all derived from the same input to the self-attention block. In the computation of the self-attention block, the dot product of Q and K is calculated first; to prevent the dot-product result from becoming too large, it is divided by the scale factor √d, where d is the dimension of the query and key vectors. Finally, a softmax operation normalizes the result into a probability distribution, which is multiplied by the matrix V to obtain a weighted-sum representation. Attention is thus defined via a scaled dot product as:
Attention(Q, K, V) = softmax(QK^T / √d) V
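The scaled dot-product attention above can be sketched as follows (a minimal single-head NumPy sketch; the shapes and random inputs are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # divide by the scale factor sqrt(d)
    return softmax(scores, axis=-1) @ V       # weighted sum of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)                              # (4, 8)
```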
In the embodiments of the present invention, when predicting the (k+1)-th item, only the first k items in the training sequence should be used in the computation. The self-attention layer, however, learns over the complete training sequence; a mask operation can therefore be used to block the connections between Q_i and K_j (i < j) in the self-attention layer. After the self-attention layer outputs the learning result S_k based on the first k items, a point-wise two-layer feed-forward network can be applied to turn the self-attention layer from a linear model into a nonlinear one. Meanwhile, in order to learn more complex item transitions, iterative feature learning can be performed by stacking self-attention blocks; the embodiments of the present invention do not limit the specific number of stacked layers, which can be set according to actual application requirements. The b-th (b > 1) self-attention block can be defined as:
SA(F^{(b−1)}) = Attention(F^{(b−1)}W^Q, F^{(b−1)}W^K, F^{(b−1)}W^V)
F_k^{(b)} = FFN(S_k^{(b)}) = RELU(S_k^{(b)}W_1 + b_1)W_2 + b_2
where SA (Self-Attention) denotes the self-attention layer and FFN (Feed-Forward Network) denotes the feed-forward network; F_k^{(b)} denotes the learning matrix of the b-th self-attention block based on the first k (k ∈ {1, 2, 3, …, n}) items; RELU (Rectified Linear Unit) is the rectified linear activation function; and S_k^{(b)} denotes the output of the self-attention layer aggregating the first k items in the b-th self-attention block. W^Q, W^K, W^V, W_1, W_2 ∈ R^{d×d} are all learnable matrices, and b_1, b_2 are d-dimensional vectors.
For the first self-attention block, it can be defined as S^{(1)} = SA(E^s) and F^{(1)} = FFN(S^{(1)}).
Multi-layer neural networks have strong feature-learning ability, but simply adding more network layers leads to problems such as overfitting and longer training time: as the network deepens, the risk of vanishing gradients grows and model performance degrades. This can be alleviated by residual connections. Specifically, the residual connection process can be as follows: normalize the input x of the self-attention layer and of the feed-forward network, apply a dropout operation to the output of the self-attention layer and of the feed-forward network, and finally add the original input x to the post-dropout output as the final output.
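The causal mask (blocking the Q_i–K_j connections with i < j) and the residual connection can be sketched together as follows (an illustrative sketch; the layer normalization is simplified and dropout is omitted, since it is only active during training):

```python
import numpy as np

def causal_attention(Q, K, V):
    """Self-attention with a mask so position k only aggregates the first k items."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)  # positions with j > i
    scores = np.where(mask, -1e9, scores)                  # blocked before the softmax
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

def residual_block(x, sublayer):
    """Residual connection: normalize the input, apply the sub-layer,
    then add the original input x back (dropout omitted in this sketch)."""
    h = (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-8)
    return x + sublayer(h)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))
out = residual_block(X, lambda h: causal_attention(h, h, h))
print(out.shape)   # (5, 8)
```

With the mask in place, the first position can only attend to itself, so the first output row of `causal_attention` equals the first value row, which is exactly the "first k items only" behavior described above.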
Step 14: input the learning matrix into the prediction layer for matrix factorization, and calculate the initial item relevance values.
Specifically, after b self-attention blocks, F_k^{(b)} can be used for item prediction. F_k^{(b)} is input into the MF (Matrix Factorization) layer, which performs matrix factorization to calculate the relevance value of item i, so as to predict whether item i is a recommendable item. The item relevance value can be calculated by the following formula:
r_{i,k} = F_k^{(b)} N_i^T
where r_{i,k} denotes the relevance of item i becoming the next item (i.e., being recommendable) given the first k items, and N ∈ R^{|V|×d} is an item embedding matrix.
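The MF-layer scoring r_{i,k} = F_k^{(b)} N_i^T amounts to a matrix–vector product against the item embedding matrix; a sketch under assumed shapes (random stand-ins for the learned F_k^{(b)} and N):

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, d = 100, 16
F_k = rng.normal(size=(d,))            # learned representation after the first k items
N = rng.normal(size=(num_items, d))    # item embedding matrix used by the MF layer

r = N @ F_k                            # r[i] = relevance r_{i,k} = F_k . N_i
top = np.argsort(-r)[:5]               # the 5 most relevant next-item candidates
print(r.shape, top.shape)              # (100,) (5,)
```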
Step 15: optimize the recommendation network model with a binary cross-entropy loss function until the output value of the loss function is minimized, and take the initial item preference values at that minimum as the item preference values.
It should be noted that the embodiments of the present invention do not limit the specific process of network optimization using the binary cross-entropy loss function, for which related technologies may be consulted. In one possible case, the binary cross-entropy loss function can be expressed as:
L = −Σ_{t=1}^{n} [ log σ(r_{e_t,t}) + Σ_{v′∉s} log(1 − σ(r_{v′,t})) ]
where e_t denotes the expected output of the network at time step t and s_t denotes the actual output of the network at time step t. Since the training sequence may contain the preset padding item, when s_t is the padding item, the expected output e_t should also be set to the padding item. When s_t is a normal item, e_t should be set as e_t = s_{t+1}. When t = n, i.e., the training sequence has reached its end at time step t, e_t = v_{|V_u|}, the last item of the historical interaction sequence.
S102: generate a candidate item set using the item preference values and the items the user has not interacted with.
Since an item preference value can express the user's interest in an item, in the embodiments of the present invention the item preference values are used as the indicator for generating the candidate item set, i.e., for ranking items. It can be understood that, in order to predict items the user may be interested in in the future, the candidate item set is generated from items the user has not interacted with; these items can be obtained as the set difference between the total set of all items and the set of historical items corresponding to the historical interaction sequence.
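The set difference described above can be sketched with plain Python sets (the item IDs are hypothetical):

```python
# Candidate item set: all items minus the user's historical items.
all_items = set(range(1, 11))   # hypothetical item IDs 1..10
history = {2, 5, 7}             # items from the historical interaction sequence
v_cand = all_items - history    # items the user has never interacted with
print(sorted(v_cand))           # [1, 3, 4, 6, 8, 9, 10]
```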
S103: when a preference item attribute sent by the user is received, update the candidate item set using the preference item attribute, and calculate the interaction prediction value of each candidate item in the updated candidate item set using the item relevance values.
In dialogue recommendation, the dialogue is initiated by the user first sending a preferred item attribute. The embodiments of the present invention do not limit the specific process by which the user sends the preference item attribute: for example, the user may pick from a preset list of item attributes and send the selection data; alternatively, the user may send a dialogue recommendation request, whereupon the dialogue recommendation system sends the list of item attributes to the user for selection and finally receives the selection data sent by the user.
Further, the embodiments of the present invention do not limit the specific manner of updating the candidate item set with the preference item attribute: it can be understood that the items in the candidate item set that do not have the preference item attribute may be removed, or, equivalently, only the items that have the preference item attribute may be retained.
In one possible case, updating the candidate item set with the preference item attribute may include:
Step 21: remove the items in the candidate item set that do not have the preference item attribute.
The following describes in detail the process of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item relevance values. In one possible case, specifically, before the dialogue starts, the set of attributes accepted by the user, P_acc, and the set of attributes rejected by the user, P_rej, are both empty; the candidate item set V_cand contains the items the user has never interacted with, and the candidate attribute set P_cand is also empty. After a preference item attribute p_0 sent by the user is received, the items without the preference item attribute can first be removed from the candidate item set to update it, which can be expressed as:
V_cand = V_cand ∩ V_{p_0}
where V_{p_0} denotes the set of items carrying the preference item attribute p_0. After the candidate item set is updated, the output F_n^{(b)} of the stacked self-attention blocks — obtained by feeding the initial training sequence V′_u of the historical interaction sequence through the b self-attention blocks — can be used to calculate the interaction prediction value of each candidate item in the candidate item set:
s_v = F_n^{(b)} N_v^T,  v ∈ V_cand
When the policy network determines that the current dialogue round is an item recommendation, the preset number n of candidate items V_rec with the largest interaction prediction values are extracted from V_cand. If the user accepts one of these items, the dialogue recommendation ends; otherwise, the recommended items should be removed from the candidate item set, i.e., V_cand = V_cand − V_rec.
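The extraction of the top-n candidates and their removal after a rejected recommendation can be sketched as follows (the scores and item IDs are hypothetical):

```python
def recommend_round(v_cand, scores, n):
    """Pick the n candidates with the largest interaction prediction values.
    If the user rejects them, the caller removes them: v_cand -= set(v_rec)."""
    return sorted(v_cand, key=lambda v: scores[v], reverse=True)[:n]

scores = {1: 0.9, 3: 0.2, 4: 0.7, 6: 0.5}   # interaction prediction values s_v
v_cand = {1, 3, 4, 6}
v_rec = recommend_round(v_cand, scores, 2)
print(v_rec)                 # [1, 4]
v_cand -= set(v_rec)         # user rejected: shrink the candidate set
print(sorted(v_cand))        # [3, 6]
```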
S104: generate a candidate attribute set using the interaction prediction values and the candidate items, and calculate the preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction values.
Since item attributes belong to items, and a user who likes certain items also tends to prefer some of those items' attributes, the preference prediction values of the candidate item attributes can also be calculated after the interaction prediction values of the candidate items are obtained, so as to determine the candidate item attributes the user prefers.
It should be noted that the embodiments of the present invention do not limit the specific manner of calculating the preference prediction values of the candidate item attributes. It can be understood that multiple items may share the same item attribute; in that case, the average of the interaction prediction values of the items carrying the attribute can be calculated and used as the attribute's preference prediction value. Alternatively, the information entropy of a candidate item attribute can be calculated and used as its preference prediction value, where information entropy is a measure of the elimination of information uncertainty: the lower the probability of an event, the more information entropy its occurrence provides. Considering that using information entropy as the preference prediction value of candidate item attributes allows the candidate item set to be filtered efficiently, in the embodiments of the present invention the preference prediction values of candidate item attributes can be calculated by way of information entropy.
It can be understood that computing information entropy is relatively expensive. The candidate items can therefore first be ranked by their interaction prediction values, the candidate attribute set can be generated from the preset number of candidate items with the largest interaction prediction values, and the information entropy can finally be computed over this candidate attribute set. It should be noted that the embodiments of the present invention do not limit the specific value of the preset number, which can be set according to actual application requirements.
In one possible case, the process of generating the candidate attribute set using the interaction prediction values and the candidate items, and calculating the preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction values, may include:
Step 31: in descending order of interaction prediction value, generate the candidate attribute set from the item attributes contained in the top preset number of candidate items.
Specifically, the candidate item set V_cand is sorted in descending order of interaction prediction value, and the candidate attribute set is generated from the first L candidate items:
P_{v_{1:L}} = ∪_{i=1}^{L} P_{v_i}
P_cand = P_{v_{1:L}} − (P_acc ∪ P_rej)
where P_{v_{1:i}} denotes the attribute set of the first i candidate items in V_cand, P_{v_i} denotes the attribute set of the i-th candidate item in V_cand, and P_acc and P_rej denote the sets of attributes accepted and rejected by the user, respectively.
Step 32: calculate the information entropy of each candidate item attribute in the candidate attribute set using the interaction prediction values of the candidate items, and use the information entropy as the preference prediction value of the candidate item attribute.
Specifically, the information entropy of a candidate item attribute can be calculated by the following formula:
f_att(P_cand, V_cand) = −prob(p) · log_2(prob(p)),  p ∈ P_cand
prob(p) = Σ_{v∈V_p} σ(s_v) / Σ_{v∈V_cand} σ(s_v)
where σ is the sigmoid (activation) function, s_v is the interaction prediction value of candidate item v, and V_p denotes the set of items containing item attribute p. By computing information entropy in this weighted form, the embodiments of the present invention assign higher weights to important candidate items instead of treating every item equally. Put simply, if many of the candidate items contain attribute p, then attribute p can hardly filter the candidate items and is therefore not a suitable attribute to ask about; the weighted entropy ensures that, once the user responds about a candidate item attribute, that attribute can rapidly filter the candidate items.
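The weighted-entropy computation above — prob(p) as the sigmoid-score mass of the items carrying p over all candidates, then −prob(p)·log_2(prob(p)) — can be sketched as follows (the scores and item IDs are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_entropy(p_items, cand_scores):
    """Weight-entropy score of attribute p: -prob(p) * log2(prob(p)), where
    prob(p) = sum of sigmoid(s_v) over items with p / sum over all candidates."""
    total = sum(sigmoid(s) for s in cand_scores.values())
    mass = sum(sigmoid(cand_scores[v]) for v in p_items)
    prob = mass / total
    if prob in (0.0, 1.0):
        return 0.0   # an attribute held by none or by all candidates cannot filter
    return -prob * math.log2(prob)

cand_scores = {1: 2.0, 2: 0.5, 3: -1.0}          # interaction prediction values s_v
print(weighted_entropy({1, 2, 3}, cand_scores))  # 0.0: shared by every candidate
print(weighted_entropy({1}, cand_scores))        # > 0: can actually split the set
```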
S105: input the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and make a dialogue recommendation to the user.
It should be noted that the policy network is based on a deep reinforcement learning network; according to the input state and the user's historical dialogue results, it generates an action decision for the current dialogue round, which can then be used to make this round's dialogue recommendation. The embodiments of the present invention do not limit the reinforcement learning process of the policy network, for which the related technologies of deep reinforcement learning networks may be consulted; nor do they limit the network optimization method of the policy network, for which the same related technologies may likewise be consulted.
In one possible case, the process of inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning and making a dialogue recommendation to the user may include:
Step 41: input the candidate item set and the candidate attribute set into the policy network for reinforcement learning, and generate an action decision.
Specifically, the policy network determines through reinforcement learning whether the action of the current dialogue round is to ask a question or to make a recommendation. The policy network involves four notions: state, action, reward, and policy. The state contains the dialogue history and the length of the current candidate item set. The dialogue history is encoded by s_his, whose size is the maximum number of dialogue-recommendation rounds T, each dimension representing the user's dialogue history in round t. Special values can be used to represent the user's dialogue history; it should be noted that the present invention does not limit the specific special values, which can be set according to actual application requirements. In one possible case, −2 may indicate a failed recommendation, −1 that the user rejected the queried item attribute, 2 a successful recommendation, and 1 that the user accepted the queried item attribute. The length of the current candidate item set is encoded by s_len. The two states can be concatenated into the total state:
s = s_his ⊕ s_len
The embodiments of the present invention include two actions: asking, a_ask, and recommending, a_rec.
Five kinds of rewards are adopted in the embodiments of the present invention:
(1) r_rec_suc: a strong positive reward if the recommendation succeeds;
(2) r_rec_fail: a strong negative reward if the recommendation fails;
(3) r_ask_suc: a mild positive reward if the queried attribute is accepted by the user;
(4) r_ask_fail: a negative reward if the queried attribute is rejected by the user;
(5) r_quit: a strong negative reward if the user quits or the maximum number of rounds is reached.
The intermediate reward r_t in round t is the weighted sum of these five rewards.
It should be noted that the embodiments of the present invention do not limit the specific settings of the above reward values, which can be set according to actual application requirements.
After the current dialogue state s is input into the policy network, the policy network outputs action decision values Q(s, a) for the two actions, from which the action of the current dialogue round can be determined. In the embodiments of the present invention, the policy network is optimized with standard deep Q-learning.
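The state encoding s = s_his ⊕ s_len and a greedy choice over the two action values Q(s, a) can be sketched as follows (T, the history codes, and the stand-in Q-values are assumptions; in the embodiment, Q(s, a) comes from the trained deep Q-network):

```python
import numpy as np

T = 15                                  # assumed maximum number of dialogue rounds
s_his = np.zeros(T)                     # per-round codes: +-1 for asked attributes, +-2 for recommendations
s_his[0], s_his[1] = 1, -1              # round 1: attribute accepted; round 2: attribute rejected
s_len = np.array([42.0])                # encoded size of the current candidate item set
s = np.concatenate([s_his, s_len])      # total state: s = s_his (+) s_len

q_values = {'ask': 0.3, 'recommend': 0.8}   # stand-in for the Q-network output Q(s, a)
action = max(q_values, key=q_values.get)    # greedy action for this round
print(s.shape, action)                      # (16,) recommend
```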
Step 42: make a dialogue recommendation to the user according to the action decision.
Specifically, the process of making a dialogue recommendation to the user according to the action decision may include:
Step 51: when the action decision is an item recommendation, send the candidate items with the largest interaction prediction values to the user terminal, and receive feedback data;
Step 52: when the feedback data indicates acceptance of a candidate item, exit the dialogue recommendation;
Step 53: when the feedback data indicates rejection of the candidate items, remove those candidate items from the candidate item set, and with the resulting candidate item set perform the step of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item relevance values.
Specifically, when the policy network determines that the current dialogue round is an item recommendation, the preset number n of candidate items V_rec with the largest interaction prediction values are extracted from V_cand. If the user accepts one of these items, the dialogue recommendation ends; otherwise, the recommended items should be removed from the candidate item set, i.e., V_cand = V_cand − V_rec.
Step 54: when the action decision is an attribute query, send the candidate item attribute with the largest preference prediction value to the user terminal, and receive feedback data;
Step 55: verify the items in the candidate item set using the feedback data, remove the items that fail the verification, and with the resulting candidate item set perform the step of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item relevance values.
Specifically, when the policy network determines that the current dialogue round is an item attribute query, the candidate item attribute p_i ∈ P_cand with the largest preference prediction value is selected from the candidate attribute set P_cand.
When the feedback data sent back by the user is received, the candidate items can be verified using the candidate item attribute in the feedback data. When the feedback data indicates that the user accepted the candidate item attribute p, the verification process is: update the set of attributes accepted by the user, P_acc = P_acc ∪ {p}, and update the candidate item set, V_cand = V_cand ∩ V_p. When the feedback data indicates that the user rejected the candidate item attribute p, the verification process is: update the set of attributes rejected by the user, P_rej = P_rej ∪ {p}, and update the candidate item set, V_cand = V_cand − V_p.
Finally, it can be understood that when the dialogue recommendation keeps failing to produce an item acceptable to the user, it will continue for many rounds. To ensure that the number of dialogue rounds does not grow without bound, the rounds can be limited by a preset threshold: when the number of rounds reaches the preset threshold, the dialogue recommendation is exited.
In one possible case, before making a dialogue recommendation to the user according to the action decision, the method may further include:
Step 61: determine whether the dialogue round corresponding to the action decision is greater than a preset threshold; it should be noted that the present invention does not limit the specific value of the preset threshold, which can be set according to actual application requirements.
Step 62: if so, exit the dialogue recommendation;
Step 63: if not, perform the step of making a dialogue recommendation to the user according to the action decision.
Based on the above embodiment, the method first obtains the historical interaction sequence between the user and the items. Since this sequence reflects both the user's preferences for historical items and the order of interaction, using it to train the recommendation network model ensures that the candidate item set generated by the model takes both into account, yielding candidate sets that better match the user's preferences. This in turn ensures that the candidate item set can be used during the conversation to question the user in a targeted way, which effectively reduces the number of dialogue rounds while improving the accuracy of the dialogue recommendation.
The following introduces a dialogue recommendation apparatus, an electronic device, and a storage medium provided by embodiments of the present invention; the apparatus, device, and medium described below and the dialogue recommendation method described above may be referred to in correspondence with each other.
Please refer to FIG. 2, which is a structural block diagram of a dialogue recommendation apparatus provided by an embodiment of the present invention. The apparatus may include:
a recommendation network module 201, configured to obtain a historical interaction sequence between a user and items, and input the historical interaction sequence into a recommendation network model comprising an embedding layer, a self-attention block, and a prediction layer for training, generating item preference values; wherein the items contain item attributes;
a candidate item set generation module 202, configured to generate a candidate item set using the item preference values and the items that the user has not interacted with;
a first calculation module 203, configured to, when a preferred item attribute sent by the user is received, update the candidate item set using the preferred item attribute, and calculate an interaction prediction value of each candidate item in the updated candidate item set using the item correlation values;
a second calculation module 204, configured to generate a candidate attribute set using the interaction prediction values and the candidate items, and calculate a preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction values;
a dialogue recommendation module 205, configured to input the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and make dialogue recommendations to the user.
Optionally, the recommendation network module 201 may include:
a training sequence generation sub-module, configured to generate a training sequence of preset length from the historical interaction sequence;
an embedding layer sub-module, configured to input all items and the training sequence into the embedding layer and output an integrated embedding matrix;
a self-attention block sub-module, configured to input the integrated embedding matrix into the self-attention block for iterative feature learning, generating a learning matrix;
a prediction layer sub-module, configured to input the learning matrix into the prediction layer for matrix factorization and calculate initial item preference values;
a network optimization sub-module, configured to optimize the recommendation network model using a binary cross-entropy loss function until the output value of the binary cross-entropy loss function is minimal, and take the initial item preference values at that minimum as the item preference values.
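The binary cross-entropy objective used by the network optimization sub-module can be written down directly. A minimal NumPy sketch under the usual convention that positives are observed next items and negatives are sampled, with the model itself abstracted into its sigmoid scores:

```python
import numpy as np

def bce_loss(scores, labels, eps=1e-12):
    """Binary cross-entropy over predicted item-preference scores.

    scores -- sigmoid outputs in (0, 1), one per (position, item) pair
    labels -- 1 for the observed next item, 0 for a sampled negative item
    """
    scores = np.clip(scores, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(labels * np.log(scores) + (1 - labels) * np.log(1 - scores))
```

As expected, confident correct predictions give a smaller loss than uninformative ones, which is what the optimization drives toward.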
Optionally, the dialogue recommendation module 205 may include:
a reinforcement learning sub-module, configured to input the candidate item set and the candidate attribute set into the policy network for reinforcement learning, generating an action decision;
a dialogue recommendation sub-module, configured to make a dialogue recommendation to the user using the action decision.
Optionally, the dialogue recommendation sub-module may include:
a first sending unit, configured to, when the action decision is an item recommendation, send the candidate item with the largest interaction prediction value to the user terminal and receive feedback data;
a first processing unit, configured to exit the dialogue recommendation when the feedback data indicates that the candidate item is accepted;
a second processing unit, configured to, when the feedback data indicates that the candidate item is rejected, remove the candidate item from the candidate item set and, using the candidate item set after removal, perform the step of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item correlation values;
a second sending unit, configured to, when the action decision is an attribute query, send the item attribute with the largest preference prediction value to the user terminal and receive feedback data;
a third processing unit, configured to verify the items in the candidate item set using the feedback data, remove the items that fail verification, and finally, using the candidate item set after removal, perform the step of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item correlation values.
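The two branches handled by the sending and processing units can be sketched as one turn of the conversation loop. This is an illustrative simplification — `run_turn`, the `user` callable, and the score dictionaries are hypothetical names, and the attribute-feedback verification step is elided:

```python
def run_turn(action, candidates, item_scores, attr_scores, user):
    """One conversation turn driven by the policy network's action decision.

    action      -- "recommend" (item recommendation) or "ask" (attribute query)
    candidates  -- current candidate item set
    item_scores -- item -> interaction prediction value
    attr_scores -- attribute -> preference prediction value
    user        -- callable(kind, value) -> bool, simulating user feedback
    """
    if action == "recommend":
        best = max(candidates, key=item_scores.get)    # largest interaction prediction value
        if user("item", best):
            return "quit", candidates                  # accepted: exit the dialogue
        return "continue", candidates - {best}         # rejected: remove it and rescore
    best_attr = max(attr_scores, key=attr_scores.get)  # largest preference prediction value
    user("attribute", best_attr)  # returned feedback drives candidate verification (elided here)
    return "continue", candidates
```

Accepting the recommended item ends the dialogue; rejecting it shrinks the candidate set so the interaction prediction values can be recomputed on the next turn.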
Optionally, the dialogue recommendation module 205 may further include:
a dialogue round judgment sub-module, configured to determine whether the dialogue round corresponding to the action decision is greater than a preset threshold; if so, exit the dialogue recommendation; if not, perform the step of making a dialogue recommendation to the user using the action decision.
Optionally, the first calculation module 203 includes:
a removal operation sub-module, configured to remove from the candidate item set the preference items that do not have the preferred item attribute.
Optionally, the second calculation module 204 includes:
a candidate attribute set generation sub-module, configured to generate the candidate attribute set from the item attributes contained in a preset number of top candidate items, taken in descending order of interaction prediction value;
a second calculation sub-module, configured to calculate the information entropy of each candidate item attribute in the candidate attribute set using the interaction prediction values of the candidate items, and take the information entropy as the preference prediction value of the candidate item attribute.
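One common way to realize the entropy scoring described above is to normalize the candidate items' interaction prediction values into the probability mass carried by items having the attribute, and score the attribute by the binary entropy of that split. This is a sketch of that idea, not the patent's exact formula, and the function and argument names are assumptions:

```python
import math

def attribute_entropy(attr, candidates, item_scores, item_attrs):
    """Preference prediction value of an attribute as an information entropy term.

    Normalizes the interaction prediction values of the candidate items into a
    probability p that a preferred item carries the attribute, then returns the
    binary entropy -p*log(p) - (1-p)*log(1-p). An attribute that splits the
    candidates most evenly (p near 0.5) scores highest, i.e. asking about it
    is most informative.
    """
    total = sum(item_scores[v] for v in candidates)
    with_attr = sum(item_scores[v] for v in candidates if attr in item_attrs[v])
    p = with_attr / total
    if p == 0.0 or p == 1.0:
        return 0.0  # attribute carried by none or all candidates: asking gains nothing
    return -p * math.log(p) - (1 - p) * math.log(1 - p)
```

With two equally scored candidates of which exactly one carries the attribute, the entropy reaches its maximum of log 2; an attribute no candidate carries scores zero.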
Based on the above embodiments, please refer to FIG. 3, which is a structural block diagram of a dialogue recommendation system based on historical interaction sequences provided by an embodiment of the present invention. In this system, SeqCR (Sequential Conversation Recommender), the Policy Network Module implements the functions of the dialogue recommendation module 205 of the above embodiment, for example receiving feedback, updating preferences (Update preference), and sending candidate item and attribute updates (Update Candidate items and attributes) to the Sequential Module.
The Sequential Module implements the functions of the recommendation network module 201 of the above embodiment, comprising an Embedding Layer, a Self-attention Block, and a Prediction Layer, where the Self-attention Block consists of a Self-attention layer and a Feed Forward Network. The Scoring Module implements the functions of the candidate item set generation module 202, the first calculation module 203, and the second calculation module 204, namely Item Scoring (calculating interaction prediction values) and Attribute Scoring (calculating preference prediction values). User denotes the user, who receives queries sent by the policy network and provides feedback to it.
An embodiment of the present invention further provides an electronic device, comprising:
a memory for storing a computer program;
a processor configured to implement the steps of the dialogue recommendation method described above when executing the computer program.
Since the embodiments of the electronic device correspond to the embodiments of the dialogue recommendation method, for the former please refer to the description of the latter; details are not repeated here.
An embodiment of the present invention further provides a storage medium having a computer program stored thereon; when the computer program is executed by a processor, the steps of the dialogue recommendation method of any of the above embodiments are implemented.
Since the embodiments of the storage medium correspond to the embodiments of the dialogue recommendation method, for the former please refer to the description of the latter; details are not repeated here.
The various embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts that are the same or similar among the embodiments, reference may be made to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for relevant details reference may be made to the description of the method.
Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the above description has generally set out the components and steps of each example in terms of their functionality. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functionality for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The dialogue recommendation method, apparatus, electronic device, and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can be made to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (10)

  1. A dialogue recommendation method, characterized by comprising:
    obtaining a historical interaction sequence between a user and items, and inputting the historical interaction sequence into a recommendation network model comprising an embedding layer, a self-attention block, and a prediction layer for training, generating item preference values, wherein the items contain item attributes;
    generating a candidate item set using the item preference values and the items that the user has not interacted with;
    when a preferred item attribute sent by the user is received, updating the candidate item set using the preferred item attribute, and calculating an interaction prediction value of each candidate item in the updated candidate item set using the item correlation values;
    generating a candidate attribute set using the interaction prediction values and the candidate items, and calculating a preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction values;
    inputting the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and making a dialogue recommendation to the user.
  2. The dialogue recommendation method according to claim 1, wherein inputting the historical interaction sequence into the recommendation network model comprising the embedding layer, the self-attention block, and the prediction layer for training, generating the item preference values, comprises:
    generating a training sequence of preset length from the historical interaction sequence;
    inputting all the items and the training sequence into the embedding layer, and outputting an integrated embedding matrix;
    inputting the integrated embedding matrix into the self-attention block for iterative feature learning, generating a learning matrix;
    inputting the learning matrix into the prediction layer for matrix factorization, and calculating initial item preference values;
    optimizing the recommendation network model using a binary cross-entropy loss function until the output value of the binary cross-entropy loss function is minimal, and taking the initial item preference values at that minimum as the item preference values.
  3. The dialogue recommendation method according to claim 1, wherein inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning and making a dialogue recommendation to the user comprises:
    inputting the candidate item set and the candidate attribute set into the policy network for the reinforcement learning, generating an action decision;
    making a dialogue recommendation to the user using the action decision.
  4. The dialogue recommendation method according to claim 3, wherein making a dialogue recommendation to the user using the action decision comprises:
    when the action decision is an item recommendation, sending the candidate item with the largest interaction prediction value to the user terminal, and receiving feedback data;
    when the feedback data indicates that the candidate item is accepted, exiting the dialogue recommendation;
    when the feedback data indicates that the candidate item is rejected, removing the candidate item from the candidate item set, and, using the candidate item set after removal, performing the step of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item correlation values;
    when the action decision is an attribute query, sending the candidate item attribute with the largest preference prediction value to the user terminal, and receiving the feedback data;
    verifying the items in the candidate item set using the feedback data, removing the items that fail the verification, and finally, using the candidate item set after removal, performing the step of calculating the interaction prediction value of each candidate item in the updated candidate item set using the item correlation values.
  5. The dialogue recommendation method according to claim 3, wherein before making a dialogue recommendation to the user using the action decision, the method further comprises:
    determining whether the dialogue round corresponding to the action decision is greater than a preset threshold;
    if so, exiting the dialogue recommendation;
    if not, performing the step of making a dialogue recommendation to the user using the action decision.
  6. The dialogue recommendation method according to claim 1, wherein updating the candidate item set using the preferred item attribute comprises:
    removing from the candidate item set the preference items that do not have the preferred item attribute.
  7. The dialogue recommendation method according to any one of claims 1 to 6, wherein generating the candidate attribute set using the interaction prediction values and the candidate items, and calculating the preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction values, comprises:
    generating the candidate attribute set from the item attributes contained in a preset number of top candidate items, taken in descending order of the interaction prediction values;
    calculating the information entropy of each candidate item attribute in the candidate attribute set using the interaction prediction values of the candidate items, and taking the information entropy as the preference prediction value of the candidate item attribute.
  8. A dialogue recommendation apparatus, characterized by comprising:
    a recommendation network module, configured to obtain a historical interaction sequence between a user and items, and input the historical interaction sequence into a recommendation network model comprising an embedding layer, a self-attention block, and a prediction layer for training, generating item preference values, wherein the items contain item attributes;
    a candidate item set generation module, configured to generate a candidate item set using the item preference values and the items that the user has not interacted with;
    a first calculation module, configured to, when a preferred item attribute sent by the user is received, update the candidate item set using the preferred item attribute, and calculate an interaction prediction value of each candidate item in the updated candidate item set using the item correlation values;
    a second calculation module, configured to generate a candidate attribute set using the interaction prediction values and the candidate items, and calculate a preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction values;
    a dialogue recommendation module, configured to input the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and make a dialogue recommendation to the user.
  9. An electronic device, characterized by comprising:
    a memory for storing a computer program;
    a processor configured to implement the dialogue recommendation method according to any one of claims 1 to 7 when executing the computer program.
  10. A storage medium, characterized in that the storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the dialogue recommendation method according to any one of claims 1 to 7.
PCT/CN2021/122167 2021-03-23 2021-09-30 Conversation recommendation method and apparatus, electronic device, and storage medium WO2022198983A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110308759.6 2021-03-23
CN202110308759.6A CN112925892B (en) 2021-03-23 2021-03-23 Dialogue recommendation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022198983A1 true WO2022198983A1 (en) 2022-09-29

Family

ID=76175614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122167 WO2022198983A1 (en) 2021-03-23 2021-09-30 Conversation recommendation method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112925892B (en)
WO (1) WO2022198983A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925892B (en) * 2021-03-23 2023-08-15 苏州大学 Dialogue recommendation method and device, electronic equipment and storage medium
CN113487379B (en) * 2021-06-24 2023-01-13 上海淇馥信息技术有限公司 Product recommendation method and device based on conversation mode and electronic equipment
CN113468420B (en) * 2021-06-29 2024-04-05 杭州摸象大数据科技有限公司 Product recommendation method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114154A1 (en) * 2003-06-23 2004-12-29 University College Dublin, National University Of Ireland, Dublin A retrieval system and method
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110390108A (en) * 2019-07-29 2019-10-29 中国工商银行股份有限公司 Task exchange method and system based on deeply study
CN111797321A (en) * 2020-07-07 2020-10-20 山东大学 Personalized knowledge recommendation method and system for different scenes
CN112925892A (en) * 2021-03-23 2021-06-08 苏州大学 Conversation recommendation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11470165B2 (en) * 2018-02-13 2022-10-11 Ebay, Inc. System, method, and medium for generating physical product customization parameters based on multiple disparate sources of computing activity

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114154A1 (en) * 2003-06-23 2004-12-29 University College Dublin, National University Of Ireland, Dublin A retrieval system and method
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110390108A (en) * 2019-07-29 2019-10-29 中国工商银行股份有限公司 Task exchange method and system based on deeply study
CN111797321A (en) * 2020-07-07 2020-10-20 山东大学 Personalized knowledge recommendation method and system for different scenes
CN112925892A (en) * 2021-03-23 2021-06-08 苏州大学 Conversation recommendation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112925892A (en) 2021-06-08
CN112925892B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
WO2022198983A1 (en) Conversation recommendation method and apparatus, electronic device, and storage medium
EP3711000B1 (en) Regularized neural network architecture search
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
CN110457589B (en) Vehicle recommendation method, device, equipment and storage medium
CN111506820B (en) Recommendation model, recommendation method, recommendation device, recommendation equipment and recommendation storage medium
CN113705811B (en) Model training method, device, computer program product and equipment
JP6819355B2 (en) Recommendation generation
US11423307B2 (en) Taxonomy construction via graph-based cross-domain knowledge transfer
CN108921342B (en) Logistics customer loss prediction method, medium and system
TW202001749A (en) Arbitrage identification method and device
WO2010048758A1 (en) Classification of a document according to a weighted search tree created by genetic algorithms
Wang et al. Learning to augment for casual user recommendation
CN115130542A (en) Model training method, text processing device and electronic equipment
CN111010595B (en) New program recommendation method and device
CN116401522A (en) Financial service dynamic recommendation method and device
CN113362852A (en) User attribute identification method and device
CN111445032A (en) Method and device for decision processing by using business decision model
CN116308551A (en) Content recommendation method and system based on digital financial AI platform
WO2023009766A1 (en) Evaluating output sequences using an auto-regressive language model neural network
US20220261683A1 (en) Constraint sampling reinforcement learning for recommendation systems
KR102612986B1 (en) Online recomending system, method and apparatus for updating recommender based on meta-leaining
Candan et al. Non stationary operator selection with island models
CN116776870B (en) Intention recognition method, device, computer equipment and medium
CN113779396B (en) Question recommending method and device, electronic equipment and storage medium
KR102618066B1 (en) Method, device and system for strengthening military security based on natural language process and image compare in soldier based community application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932592

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE