WO2022198983A1 - Conversation recommendation method and apparatus, electronic device, and storage medium - Google Patents

Conversation recommendation method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022198983A1
Authority
WO
WIPO (PCT)
Prior art keywords
item
candidate
attribute
preference
value
Prior art date
Application number
PCT/CN2021/122167
Other languages
English (en)
Chinese (zh)
Inventor
赵朋朋
田鑫涛
郝永静
Original Assignee
苏州大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州大学 filed Critical 苏州大学
Publication of WO2022198983A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present invention relates to the field of dialogue recommendation, in particular to a dialogue recommendation method, device, electronic device and storage medium.
  • Conversational Recommender System is a recommender system that can actively obtain preference attributes from users and use the attributes to recommend items.
  • in the related art, dialogue recommendation can use the preference attributes currently elicited from the user for recommendation, and it can ask the user for preference attributes or recommend items while considering the historical items that the user has interacted with, but it ignores the impact of the order of interaction between the user and the historical items on recommendation, that is, the importance of the sequence of interaction history items, which makes it difficult for existing dialogue recommendation methods to conduct dialogue recommendation with users efficiently and accurately.
  • the purpose of the present invention is to provide a dialogue recommendation method, device, electronic device and storage medium, which can use the historical interaction sequence reflecting the user's historical item preferences to train the recommendation network model and generate the candidate item set, thereby ensuring that the dialogue recommendation can conduct targeted dialogues with users and improving the efficiency and accuracy of dialogue recommendation.
  • the present invention provides a dialogue recommendation method, including:
  • a dialog recommendation is made to the user using the action decision.
  • the performing a dialogue recommendation to the user by using the action decision includes:
  • the attribute of the candidate item with the largest predicted preference value is sent to the user terminal, and the feedback data is received;
  • the updating the candidate item set using the preference item attribute includes:
  • an item that does not have the preference item attribute is removed from the candidate item set.
  • generating a candidate attribute set using the interactive predicted value and the candidate item, and calculating the preference predicted value of each candidate item attribute in the candidate attribute set using the interactive predicted value includes:
  • the present invention also provides a dialogue recommendation device, comprising:
  • a candidate item set generation module configured to generate a candidate item set by using the item preference value and the items that the user has not interacted with;
  • a first calculation module configured to update the candidate item set using the preference item attribute when receiving the preference item attribute sent by the user, and to use the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set;
  • a second calculation module configured to generate a candidate attribute set using the interactive predicted value and the candidate item, and use the interactive predicted value to calculate the preference predicted value of each candidate item attribute in the candidate attribute set;
  • the dialogue recommendation module is used for inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and performing dialogue recommendation to the user.
  • the present invention also provides an electronic device, comprising:
  • the processor is configured to implement the above-mentioned dialogue recommendation method when executing the computer program.
  • the present invention further provides a storage medium, where computer-executable instructions are stored in the storage medium, and when the computer-executable instructions are loaded and executed by a processor, the above-mentioned dialog recommendation method is implemented.
  • the present invention provides a dialogue recommendation method, comprising: acquiring a historical interaction sequence between a user and an item, and inputting the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block and a prediction layer for training, to generate an item preference value, wherein the item includes an item attribute; generating a candidate item set using the item preference value and the items that the user has not interacted with; when the preference item attribute sent by the user is received, updating the candidate item set using the preference item attribute, and using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set; generating a candidate attribute set using the interactive prediction value and the candidate items, and using the interactive prediction value to calculate the preference prediction value of each candidate item attribute in the candidate attribute set; and inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and making a dialogue recommendation to the user.
  • this method first obtains the historical interaction sequence between the user and the item. Since the sequence reflects both the user's preference for historical items and the order of interaction, the method uses the historical interaction sequence in the training of the recommendation network model. This ensures that the candidate item set generated by the model also takes the user's historical item preferences and interaction order into account, so a candidate item set that better matches the user's preferences can be generated, thereby ensuring that dialogue recommendation can use the candidate item set to conduct targeted dialogues with users. This can effectively reduce the number of dialogue rounds and at the same time improve the accuracy of dialogue recommendation.
  • the present invention also provides a dialogue recommendation device, an electronic device and a storage medium, which have the above beneficial effects.
  • FIG. 1 is a flowchart of a dialog recommendation method provided by an embodiment of the present invention
  • FIG. 3 is a structural block diagram of a dialogue recommendation system based on historical interaction sequences provided by an embodiment of the present invention.
  • Conversational Recommender System is a recommender system that can actively obtain preference attributes from users and use the attributes to recommend items.
  • in the related art, dialogue recommendation can only make recommendations using the preference attributes currently elicited from the user, and it is difficult to take the user's historical item preferences into account, which makes it difficult to make conversational recommendations to users efficiently and accurately.
  • the present invention provides a dialogue recommendation method, which can use the historical interaction sequence reflecting the user's historical item preferences to train the recommendation network model and generate the candidate item set, thereby ensuring that the dialogue recommendation can conduct dialogues with the user in a targeted manner and improving the efficiency and accuracy of dialogue recommendation. Please refer to FIG. 1.
  • FIG. 1 is a flowchart of a dialog recommendation method provided by an embodiment of the present invention. The method may include:
  • S101 Obtain a historical interaction sequence between a user and an item, and input the historical interaction sequence into a recommendation network model including an embedding layer, a self-attention block, and a prediction layer for training to generate an item preference value; the item includes an item attribute.
  • the historical interaction sequence is the record of the user's past interactions with items, and the items in the historical interaction sequence are sorted according to the time order in which the user interacted with them; an interaction can be clicking to view, adding to favorites, purchasing, and so on. Since the historical interaction sequence includes both the user's historical preference for items and the order in which the user interacted with historical items, the embodiment of the present invention integrates the historical interaction sequence into the training of the recommendation network model. This ensures that the recommendation network model can simultaneously incorporate the user's historical preferences and the sequential characteristics of the historical items, which in turn ensures that the dialogue recommendation can conduct targeted dialogues according to the user's historical preferences and can effectively improve the efficiency and accuracy of dialogue recommendation. It should be noted that the embodiments of the present invention do not limit the specific item.
  • the item is a specific item, for example, a physical item such as a book, or a virtual item such as a movie.
  • This embodiment of the present invention also does not limit the specific item attributes.
  • the item attributes reflect a certain feature of the item.
  • the item attribute can be the type of the book, such as novels, biographies, textbooks, etc.; it can also be an item attribute indicating whether the book is popular, or other item attributes related to books.
  • This embodiment of the present invention also does not limit the number of item attributes that an item can contain, and an item can have one or more item attributes.
  • the recommendation network model used in the embodiment of the present invention is based on a deep learning neural network.
  • the embodiments of the present invention do not limit the specific structures and learning methods of the embedding layer, the self-attention block, and the prediction layer in the recommendation network model, and users may refer to related technologies of deep learning neural networks.
  • the embodiment of the present invention does not limit the number of items that the historical interaction sequence can include; it can include one or more items. Since the lengths of different users' historical interaction sequences differ, the historical interaction sequence can be converted into a training sequence of preset length to facilitate training of the recommendation network model. It is understandable that when the historical interaction sequence contains too few items, it is difficult to compute a reliable preference for the user, so a minimum number of items can be set for the historical interaction sequence; when the number of items in the historical interaction sequence is less than the minimum number, network model training is not performed.
  • the embodiments of the present invention do not limit the specific value of the minimum number of items, which can be set by the user according to actual application requirements.
  • the minimum number of items may be 10. It should be noted that the embodiment of the present invention does not limit the specific value of the preset length, which can be set according to actual application requirements.
  • the historical interaction sequence is input into the recommendation network model including the embedding layer, the self-attention block and the prediction layer for training, and the process of generating the item preference value may include:
  • Step 11 Use the historical interaction sequence to generate a training sequence of preset length.
  • Step 12 Input all items and training sequences into the embedding layer, and output the integrated embedding matrix.
  • Specifically, an item embedding matrix M ∈ R^{|V|×d} is created for all items, where d is the latent dimension, and by looking up the items of the training sequence in M, the training sequence embedding matrix E ∈ R^{n×d} is obtained.
  • the training sequence embedding matrix E is integrated with a learnable position embedding matrix P ∈ R^{n×d} to obtain the integrated embedding matrix: Ê = E + P.
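  • As an illustration only (not part of the patent text), a minimal PyTorch sketch of this embedding step follows; the class name, the padding convention, and all hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

class SeqEmbedding(nn.Module):
    """Embedding-layer sketch: item embedding plus learnable position embedding."""
    def __init__(self, num_items: int, max_len: int, d: int):
        super().__init__()
        # M in R^{|V| x d}; index 0 reserved for padding (an assumption)
        self.item_emb = nn.Embedding(num_items + 1, d, padding_idx=0)
        # P in R^{n x d}, a learnable position embedding
        self.pos_emb = nn.Embedding(max_len, d)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, n) item ids of the fixed-length training sequence
        n = seq.size(1)
        positions = torch.arange(n, device=seq.device).unsqueeze(0)
        # integrated embedding matrix: E_hat = E + P
        return self.item_emb(seq) + self.pos_emb(positions)
```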
  • Step 13 Input the integrated embedding matrix into the self-attention block for iterative feature learning to generate a learning matrix.
  • the embodiment of the present invention does not limit the specific learning method of the self-attention block, and the user may refer to the related technology of self-attention.
  • the embodiments of the present invention also do not limit the specific structure of the self-attention block.
  • the self-attention block includes a self-attention layer and a point-to-point feedforward network.
  • the present invention also does not limit whether to use multiple self-attention blocks for stacking computation. It can be understood that the more layers are stacked, the more features the self-attention block can learn.
  • the self-attention layer has three matrices: Q (query), K (key) and V (value), all of which come from the same input to the self-attention block.
  • the dot product between Q and K is calculated first; to prevent the dot-product result from being too large, it is divided by a scale factor √d, where d is the dimension of the query and key vectors. The result is then normalized into a probability distribution through a softmax operation and multiplied by the matrix V to obtain the weighted-sum representation. Attention is thus defined by the scaled dot product as: Attention(Q, K, V) = softmax(QKᵀ/√d)·V.
  • because of the sequential nature of next-item prediction, the self-attention layer must not learn from the complete training sequence when predicting the next item; for this, a mask operation can be used to block the connection between Q_i and K_j in the self-attention layer for j > i. After the self-attention layer outputs the learning result S_k based on the first k items, a point-wise two-layer feedforward network can be used to convert the self-attention layer from a linear model into a nonlinear model. Meanwhile, in order to learn more complex item transformations, iterative feature learning can be performed by stacking self-attention blocks. The embodiment of the present invention does not limit the specific number of stacked layers, which can be set according to actual application requirements. The b-th (b > 1) self-attention block can be defined as: S^{(b)} = SA(F^{(b-1)}), F_k^{(b)} = FFN(S_k^{(b)}), where FFN(x) = ReLU(xW_1 + b_1)W_2 + b_2, with:
  • SA: Self-Attention;
  • FFN: Feed-Forward Network;
  • ReLU: Rectified Linear Unit;
  • W_Q, W_K, W_V, W_1, W_2 ∈ R^{d×d} are all learnable matrices;
  • b_1, b_2 are d-dimensional vectors.
  • a multi-layer neural network has strong feature learning ability, but simply adding more network layers can lead to problems such as overfitting and longer training time, because as the network becomes deeper, the risk of vanishing gradients also increases and model performance degrades.
  • the above situation can be alleviated by residual connections.
  • the process of a residual connection can be as follows: normalize the input x of the self-attention layer and of the feedforward network, perform a dropout operation on the output of the self-attention layer and of the feedforward network, and finally add the original input x to the output after the dropout operation as the final output, as sketched below.
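  • For illustration, a minimal PyTorch sketch of one self-attention block as described above (causal mask, point-wise two-layer feedforward network, and the normalize/dropout/residual pattern); all names and the dropout rate are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionBlock(nn.Module):
    """One block: causal self-attention + point-wise two-layer FFN,
    each wrapped as LayerNorm -> sublayer -> Dropout -> residual add."""
    def __init__(self, d: int, dropout: float = 0.2):
        super().__init__()
        self.W_q = nn.Linear(d, d, bias=False)   # W_Q
        self.W_k = nn.Linear(d, d, bias=False)   # W_K
        self.W_v = nn.Linear(d, d, bias=False)   # W_V
        self.ffn1 = nn.Linear(d, d)              # W_1, b_1
        self.ffn2 = nn.Linear(d, d)              # W_2, b_2
        self.ln_attn = nn.LayerNorm(d)
        self.ln_ffn = nn.LayerNorm(d)
        self.drop = nn.Dropout(dropout)
        self.d = d

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d)
        h = self.ln_attn(x)
        Q, K, V = self.W_q(h), self.W_k(h), self.W_v(h)
        scores = Q @ K.transpose(-2, -1) / (self.d ** 0.5)
        # mask the connection between Q_i and K_j for j > i (no future items)
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        s = F.softmax(scores, dim=-1) @ V
        x = x + self.drop(s)                               # residual connection
        f = self.ffn2(F.relu(self.ffn1(self.ln_ffn(x))))   # ReLU(xW1+b1)W2+b2
        return x + self.drop(f)                            # residual connection
```

  • Stacking several such blocks, as the embodiment suggests, performs the iterative feature learning that produces the learning matrix F^{(b)}.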
  • Step 14 Input the learning matrix into the prediction layer for matrix factorization, and calculate the initial item preference value.
  • Specifically, the learning matrix is input into the MF layer (Matrix Factorization layer) to perform matrix factorization and calculate the item correlation value of item i, which predicts whether item i can be the next recommended item.
  • the item correlation value can be calculated by the following formula: r_{i,k} = F_k^{(b)} · N_i^T,
  • where r_{i,k} represents the relevance of item i becoming the next (recommendable) item based on the first k items, F_k^{(b)} is the k-th row of the learning matrix output by the b-th self-attention block, and N ∈ R^{|V|×d} is the item embedding matrix of the MF layer.
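  • A minimal sketch of this prediction step, assuming N is a separate learnable matrix (the text does not state whether N is shared with the embedding layer's matrix M):

```python
import torch

def item_correlation_values(F_b: torch.Tensor, N: torch.Tensor) -> torch.Tensor:
    """Prediction-layer sketch: r_{i,k} = F_k^{(b)} . N_i^T.

    F_b: (n, d) output of the last self-attention block for one user sequence.
    N:   (|V|, d) item embedding matrix used by the MF layer.
    Returns r: (n, |V|), where r[k, i] scores item i as the next item
    given the first k+1 items of the sequence.
    """
    return F_b @ N.T
```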
  • Step 15 Use the binary cross-entropy loss function to optimize the recommendation network model until the output value of the binary cross-entropy loss function is minimized, and take the initial item preference value at the minimum as the item preference value.
  • the embodiments of the present invention do not limit the specific process of network optimization using a binary cross entropy loss function (Binary cross entropy), and reference may be made to related technologies.
  • the binary cross-entropy loss function can be expressed as: L = -Σ_t [ log σ(r_{e_t,t}) + Σ_{j∉V_u} log(1 - σ(r_{j,t})) ],
  • where e_t represents the expected output of the network at time step t, σ is the sigmoid function, and j is a sampled item that the user has not interacted with.
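  • A sketch of this loss as reconstructed above, assuming one sampled negative item per time step (the exact negative-sampling scheme is not specified in the text):

```python
import torch

def bce_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy sketch with one negative per time step.

    pos_scores: (n,) r_{e_t, t} for the ground-truth next item at each step t.
    neg_scores: (n,) r_{j, t} for a sampled item j the user has not interacted with.
    """
    eps = 1e-24  # numerical-stability constant (an assumption)
    return -(torch.log(torch.sigmoid(pos_scores) + eps)
             + torch.log(1 - torch.sigmoid(neg_scores) + eps)).sum()
```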
  • the item preference value can represent the user's interest preference for the item
  • the item preference value is used as the indicator for sorting items when generating the candidate item set. It can be understood that, in order to predict the items that the user may be interested in in the future, this embodiment of the present invention uses the items that the user has not interacted with to generate the candidate item set; these items can be obtained by taking the difference between the total set containing all items and the historical item set corresponding to the historical interaction sequence, as sketched below.
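  • A minimal sketch of candidate item set generation, assuming item preference values are available as a dictionary produced by the trained model:

```python
def initial_candidate_set(all_items, history, item_preference):
    """Candidate item set sketch: items the user has not interacted with,
    ranked by the item preference value from the trained model."""
    cand = set(all_items) - set(history)   # difference set of total and historical items
    return sorted(cand, key=lambda v: item_preference[v], reverse=True)
```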
  • in dialogue recommendation, the dialogue needs to be initiated by the user first sending a preference item attribute.
  • the embodiment of the present invention does not limit the specific process for the user to send the preference item attribute.
  • for example, the user can select from a preset item attribute list and send the selection data, or the user can send a dialogue recommendation request and the dialogue recommendation system can return the attribute list for the user to select from.
  • the embodiment of the present invention does not limit the specific method of using the preference item attribute and the candidate item set to generate the candidate attribute set. It can be understood that the items that do not have the preference item attribute can be removed from the candidate item set, or, equivalently, only the candidate items that have the preference item attribute can be retained.
  • updating the candidate item set with the preference item attribute may include:
  • Step 21 Remove the items that do not have the preference item attribute in the candidate item set.
  • the following specifically describes the process of using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set.
  • before the dialogue starts, the set of attributes accepted by the user and the set of attributes rejected by the user are both empty;
  • the candidate item set V_cand contains the items that the user has not interacted with;
  • the candidate attribute set P_cand is also empty.
  • after receiving the preference item attribute p_u, the items that do not have the preference item attribute can first be removed from the candidate item set to update it, which can be specifically expressed as: V_cand = {v ∈ V_cand | p_u ∈ P_v}, where P_v denotes the attribute set of item v.
  • the output of the multi-layer self-attention block can then be exploited to calculate the interactive prediction value of each candidate item in the candidate item set; this output F^{(b)} is obtained by inputting the initial training sequence V′_u of the historical interaction sequence into the b stacked self-attention blocks.
  • the calculation process of the interactive prediction value can be expressed as: s_v = F_n^{(b)} · N_v^T, v ∈ V_cand, where n is the last position of the training sequence.
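  • A sketch of this update-and-score step, assuming item attributes are stored as sets; the function and variable names are assumptions:

```python
import torch

def update_and_score(cand, item_attrs, pref_attr, F_b_last, N):
    """Drop candidates lacking the preference attribute, then score the rest.

    cand:       list of candidate item ids.
    item_attrs: dict item_id -> set of attribute ids (P_v).
    pref_attr:  the preference item attribute sent by the user.
    F_b_last:   (d,) tensor F_n^{(b)}, last position of the self-attention output.
    N:          (|V|, d) item embedding matrix.
    Returns the updated candidate list and {item_id: s_v}.
    """
    cand = [v for v in cand if pref_attr in item_attrs[v]]       # V_cand update
    scores = {v: float(F_b_last @ N[v]) for v in cand}           # s_v = F_n^{(b)} N_v^T
    return cand, scores
```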
  • item attributes belong to items, and when users like certain items they also have preferences for certain attributes of those items, so after obtaining the interactive prediction values of the candidate items, the preference prediction value of each candidate item attribute can also be calculated to determine the candidate item attributes that the user prefers.
  • the embodiment of the present invention does not limit the specific method of calculating the preference prediction value of a candidate item attribute. It can be understood that multiple items can have the same item attribute;
  • for example, the interaction prediction values of the items to which an item attribute belongs can be calculated and their average used as the preference prediction value of that item attribute;
  • alternatively, the information entropy of the candidate item attribute can be calculated and used as its preference prediction value, where information entropy is a measure of the removal of information uncertainty: the lower the probability of an event, the more information it provides when it occurs.
  • asking about an attribute with high information entropy can efficiently filter the candidate item set; therefore, in the embodiment of the present invention, the preference prediction value of a candidate item attribute can be calculated by calculating its information entropy.
  • the amount of calculation of information entropy is large, so the candidate items can be sorted by interactive prediction value, the candidate attribute set can be generated using a preset number of candidate items with the largest interactive prediction values, and the information entropy calculation is then carried out only on this candidate attribute set. It should be noted that the embodiment of the present invention does not limit the specific value of the preset number, which can be set according to actual application requirements.
  • the process of using the interactive predicted value and the candidate item to generate a candidate attribute set, and using the interactive predicted value to calculate the preference predicted value of each candidate item attribute in the candidate attribute set may include:
  • Step 31 Generate a candidate attribute set using the item attributes contained in the first preset number of candidate items in descending order of interactive prediction value;
  • specifically, the candidate item set V_cand is sorted in descending order of interactive prediction value, and the candidate attribute set is generated using the first L candidate items: P_cand = ∪_{v ∈ top-L(V_cand)} P_v.
  • Step 32 Calculate the information entropy of each candidate item attribute in the candidate attribute set by using the interactive prediction value of the candidate item, and use the information entropy as the preference prediction value of the candidate item attribute.
  • the information entropy of a candidate item attribute can be calculated by the following formulas: w_v = σ(s_v) / Σ_{v′∈V_cand} σ(s_{v′}); prob(p) = Σ_{v ∈ V_cand ∩ V_p} w_v; H(p) = -prob(p) · log₂ prob(p), where:
  • σ is the sigmoid function (activation function);
  • s_v is the interactive prediction value of candidate item v;
  • V_p represents the items containing the item attribute p.
  • the embodiment of the present invention uses weighted entropy to assign higher weights to important candidate items when calculating the information entropy, instead of treating each item equally. To put it simply, if most of the candidate items contain attribute p, it will be difficult for attribute p to filter the candidate items, and p is not a suitable attribute to ask about; conversely, an attribute that clearly splits the candidate items can quickly filter them.
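  • A sketch of the weighted-entropy attribute scoring under the reconstruction above; the top-L truncation mirrors the efficiency note, and L is an assumed parameter:

```python
import math

def attribute_entropy(cand_scores, item_attrs, top_l=10):
    """Weighted-entropy sketch for attribute scoring.

    cand_scores: dict item_id -> interactive prediction value s_v.
    item_attrs:  dict item_id -> set of attribute ids.
    Only the top-L candidates by s_v contribute, to keep the computation cheap.
    """
    top = sorted(cand_scores, key=cand_scores.get, reverse=True)[:top_l]
    sig = {v: 1 / (1 + math.exp(-cand_scores[v])) for v in top}   # sigma(s_v)
    total = sum(sig.values())
    p_cand = set().union(*(item_attrs[v] for v in top))           # candidate attribute set
    scores = {}
    for p in p_cand:
        # prob(p): normalized weight of the candidates containing attribute p
        prob = sum(w for v, w in sig.items() if p in item_attrs[v]) / total
        scores[p] = -prob * math.log2(prob) if prob > 0 else 0.0  # H(p)
    return scores
```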
  • the policy network is based on a deep reinforcement learning network, which can generate an action decision for the current dialogue round according to the input state and the user's historical dialogue results, and then use the action decision to recommend this round of dialogue.
  • the embodiments of the present invention do not limit the reinforcement learning process of the policy network, and reference may be made to the related technologies of the deep reinforcement learning network.
  • the embodiments of the present invention also do not limit the network optimization method of the policy network, and reference may also be made to the related technologies of the deep reinforcement learning network.
  • the calculated candidate item set and candidate attribute set are input into the policy network for reinforcement learning, and the process of recommending dialogue to the user may include:
  • Step 41 Input the candidate item set and the candidate attribute set into the policy network for reinforcement learning to generate action decisions.
  • through reinforcement learning, the policy network determines whether the action of the current dialogue round is to ask or to recommend.
  • the policy network involves four values, namely state, action, reward and policy.
  • the state contains the dialogue history and the length of the current candidate item set.
  • the dialogue history is encoded by s his , the size of which is the maximum round T of dialogue recommendation, and each dimension represents the user's dialogue history in the t-th round.
  • the user's conversation history can be represented by a special value. It should be noted that the present invention does not limit specific special values, which can be set according to actual application requirements.
  • the embodiment of the present invention includes two actions, i.e., asking (a_ask) and recommending (a_rec).
  • the intermediate reward r_t in round t is the weighted sum of five reward components.
  • the embodiment of the present invention does not limit the specific setting method of the above-mentioned reward value, which can be set according to actual application requirements.
  • after inputting the current dialogue state s into the policy network, the policy network outputs the action decision values Q(s, a) for the two actions, and the action of this dialogue round can then be determined according to the action decision values.
  • the policy network uses standard deep Q-learning to optimize the network.
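  • A sketch of the policy network and action selection, assuming a two-layer MLP and a state vector formed by concatenating s_his with the candidate-set length; the hidden width and all names are assumptions. Standard deep Q-learning, as noted above, would fit this network with targets y = r_t + γ·max_a′ Q(s′, a′):

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps the dialogue state to Q-values for the two actions a_ask and a_rec."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),   # Q(s, a_ask), Q(s, a_rec)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def choose_action(qnet: PolicyNetwork, s_his, cand_len: int) -> str:
    # state = dialogue-history encoding s_his + current candidate-set length
    state = torch.tensor([float(x) for x in s_his] + [float(cand_len)])
    q = qnet(state)
    return "ask" if q[0] >= q[1] else "rec"
```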
  • Step 42 Use action decision to make dialogue recommendation to the user.
  • the process of recommending dialogues to users by using action decisions may include:
  • Step 51 when the action decision is an item recommendation, send the candidate item with the largest interaction prediction value to the user terminal, and receive feedback data;
  • Step 52 When the feedback data indicates that the candidate item is accepted, exit the dialogue recommendation
  • Step 53 When the feedback data indicates that the candidate item is rejected, the candidate item is removed from the candidate item set, and the candidate item set after removal is used to re-execute the step of using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set;
  • Step 54 when the action decision is an attribute query, send the attribute of the candidate item with the largest preference prediction value to the user terminal, and receive the feedback data;
  • Step 55 Use the feedback data to verify the items in the candidate item set, remove the items that fail the verification, and finally use the candidate item set after removal to re-execute the step of using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set.
  • specifically, the candidate item attribute p_i ∈ P_cand with the largest preference prediction value is selected from the candidate attribute set P_cand.
  • the candidate item can be verified by using the candidate item attribute in the feedback data.
  • when the user accepts the queried attribute, the verification process is: update the attribute set accepted by the user, P_acc = P_acc ∪ {p_i}, and update the candidate item set, V_cand = {v ∈ V_cand | p_i ∈ P_v};
  • when the user rejects the queried attribute, the verification process is: update the attribute set rejected by the user, P_rej = P_rej ∪ {p_i}, and update the candidate item set, V_cand = {v ∈ V_cand | p_i ∉ P_v}.
  • when the dialogue recommendation cannot yet obtain an item acceptable to the user, the dialogue will continue for several rounds.
  • the dialogue rounds can be limited by a preset threshold, and when the dialogue round reaches the preset threshold, the dialogue recommendation is exited.
  • using the action decision to make a dialogue recommendation to the user may further include:
  • Step 61 Determine whether the dialogue round corresponding to the action decision is greater than a preset threshold
  • the present invention does not limit the specific value of the preset threshold, which can be set according to actual application requirements.
  • Step 62 If yes, exit the dialogue recommendation
  • Step 63 If not, execute the step of recommending dialogue to the user by using the action decision.
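  • Putting the pieces together, a sketch of the dialogue loop with the preset round threshold; `user` is a hypothetical simulator interface (not part of the patent), and `rescore` stands in for the scoring steps above:

```python
def run_dialogue(qnet, s_his, cand, item_attrs, user, rescore, max_rounds=15):
    """Dialogue-loop sketch (Steps 41-63). `user` exposes recommend(item) -> bool
    and ask(attr) -> bool; `rescore` recomputes interactive prediction values
    and attribute preference prediction values for the current candidates."""
    for _ in range(max_rounds):               # preset threshold on dialogue rounds
        if not cand:
            break                             # nothing left to recommend
        cand_scores, attr_scores = rescore(cand)
        action = choose_action(qnet, s_his, len(cand))   # policy-network sketch above
        if action == "rec" or not attr_scores:
            item = max(cand_scores, key=cand_scores.get)  # largest interactive prediction value
            if user.recommend(item):
                return item                   # accepted: exit the dialogue
            cand.remove(item)                 # rejected: remove and re-score
        else:
            attr = max(attr_scores, key=attr_scores.get)  # largest preference prediction value
            accepted = user.ask(attr)
            # keep only the items consistent with the feedback (verification)
            cand = [v for v in cand if (attr in item_attrs[v]) == accepted]
    return None                               # round limit reached: exit the dialogue
```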
  • the method first obtains the historical interaction sequence between the user and the item. Since the sequence reflects both the user's preference for historical items and the order of interaction, the method uses the historical interaction sequence in the training of the recommendation network model, which ensures that the candidate item set generated by the recommendation network model also takes the user's historical item preferences and interaction order into account and better matches the user's preferences. This in turn ensures that dialogue recommendation can use the candidate item set to conduct targeted dialogues with users, which can effectively reduce the number of dialogue rounds of dialogue recommendation and at the same time improve its accuracy.
  • the following describes a dialogue recommendation apparatus, electronic device, and storage medium provided by the embodiments of the present invention.
  • the dialogue recommendation apparatus, electronic equipment, and storage medium described below and the dialogue recommendation method described above may refer to each other correspondingly.
  • FIG. 2 is a structural block diagram of a dialogue recommendation apparatus provided by an embodiment of the present invention.
  • the apparatus may include:
  • the recommendation network module 201 is used to obtain the historical interaction sequence between the user and the item, and to input the historical interaction sequence into the recommendation network model including the embedding layer, the self-attention block and the prediction layer for training, and generate the item preference value; wherein the item includes item attributes;
  • a candidate item set generation module 202 configured to generate a candidate item set by utilizing the item preference value and items that the user has not interacted with;
  • the first calculation module 203 is configured to update the candidate item set using the preference item attribute when receiving the preference item attribute sent by the user, and use the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set;
  • the second calculation module 204 is configured to generate a candidate attribute set using the interactive predicted value and the candidate item, and use the interactive predicted value to calculate the preference predicted value of each candidate item attribute in the candidate attribute set;
  • the dialogue recommendation module 205 is used for inputting the calculated candidate item set and candidate attribute set into the policy network for reinforcement learning, and performing dialogue recommendation to the user.
  • the recommending network module 201 may include:
  • the self-attention block sub-module is used to input the integrated embedding matrix into the self-attention block for iterative feature learning and generate a learning matrix;
  • the prediction layer sub-module is used to input the learning matrix into the prediction layer for matrix decomposition and calculate the initial item preference value
  • the dialogue recommendation module 205 may include:
  • the reinforcement learning sub-module is used to input the candidate item set and the candidate attribute set into the policy network for reinforcement learning to generate action decisions;
  • the dialogue recommendation submodule is used to make dialogue recommendations to users using action decisions.
  • the dialogue recommendation sub-module may include:
  • a first sending unit configured to send the candidate item with the largest interaction prediction value to the user terminal when the action decision is an item recommendation, and receive feedback data;
  • a first processing unit configured to quit the dialogue recommendation when the feedback data indicates that the candidate item is accepted
  • the second processing unit is configured to remove the candidate item from the candidate item set when the feedback data indicates that the candidate item is rejected, and to use the candidate item set after removal to re-execute the step of using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set;
  • a second sending unit configured to send the item attribute with the largest preference prediction value to the user terminal when the action decision is an attribute query, and receive feedback data;
  • the third processing unit is used to verify the items in the candidate item set using the feedback data, remove the items that fail the verification, and finally use the candidate item set after removal to re-execute the step of using the item correlation value to calculate the interactive prediction value of each candidate item in the updated candidate item set.
  • the dialogue recommendation module 205 may further include:
  • the dialogue round judgment sub-module is used to judge whether the dialogue round corresponding to the action decision is greater than the preset threshold; if so, exit the dialogue recommendation; if not, execute the step of using the action decision to recommend the dialogue to the user.
  • the first computing module 203 includes:
  • the removal operation submodule is used to remove the items that do not have the preference item attribute from the candidate item set.
  • the second computing module 204 includes:
  • the candidate attribute set generation sub-module is used to generate a candidate attribute set by using the item attributes contained in the previous preset number of candidate items in the order of the interactive prediction value from large to small;
  • the second calculation submodule is used for calculating the information entropy of each candidate item attribute in the candidate attribute set by using the interactive prediction value of the candidate item, and using the information entropy as the preference prediction value of the candidate item attribute.
  • FIG. 3 is a structural block diagram of a dialogue recommendation system based on a historical interaction sequence provided by an embodiment of the present invention.
  • SeqCR Sequential Conversation Recommender
  • the Policy Network module is used to complete the functions of the dialogue recommendation module 205 in the above embodiment, such as receiving feedback (Feedback), updating preferences (Update Preference), and updating candidate items and attributes (Update Candidate Items and Attributes) for the Sequential module.
  • the Sequential module is used to complete the functions of the recommendation network module 201 in the above embodiment, and includes the Embedding Layer, the Self-attention Block and the Prediction Layer, wherein the self-attention block includes a Self-attention layer and a Feed Forward Network. The Scoring module is used to complete the functions of the candidate item set generation module 202, the first calculation module 203 and the second calculation module 204 in the above embodiment, such as Item Scoring (calculating the interactive prediction value) and Attribute Scoring (calculating the preference prediction value). User represents the user, who can receive the query sent by the policy network and provide feedback to the policy network.
  • the processor is configured to implement the steps of the above dialogue recommendation method when executing the computer program.
  • the embodiments of the electronic device part correspond to the embodiments of the dialogue recommendation method part, the embodiments of the electronic device part refer to the description of the embodiments of the dialogue recommendation method part, which will not be repeated here.
  • Embodiments of the present invention further provide a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the dialog recommendation method of any of the foregoing embodiments are implemented.
  • the embodiments of the storage medium part correspond to the embodiments of the dialogue recommendation method part, the embodiments of the storage medium part refer to the description of the embodiments of the dialogue recommendation method part, which will not be repeated here.
  • a software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a conversation recommendation method and apparatus, an electronic device, and a storage medium, in which a historical sequence is used to improve recommendation efficiency. The method comprises the steps of: obtaining a historical interaction sequence between a user and an item, inputting the historical interaction sequence into a recommendation network model comprising an embedding layer, a self-attention block and a prediction layer for training, and generating an item preference value, the item comprising an item attribute; generating a candidate item set using the item preference value and the items the user has not interacted with; when a preference item attribute sent by the user is received, updating the candidate item set using the preference item attribute, and calculating an interaction prediction value of each candidate item in the updated candidate item set using an item correlation value; generating a candidate attribute set using the interaction prediction value and the candidate items, and calculating a preference prediction value of each candidate item attribute in the candidate attribute set using the interaction prediction value; and inputting the calculated candidate item set and candidate attribute set into a policy network for reinforcement learning, and making a conversation recommendation to the user.
PCT/CN2021/122167 2021-03-23 2021-09-30 Conversation recommendation method and apparatus, electronic device and storage medium WO2022198983A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110308759.6 2021-03-23
CN202110308759.6A CN112925892B (zh) 2021-03-23 Dialogue recommendation method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022198983A1 true WO2022198983A1 (fr) 2022-09-29

Family

ID=76175614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122167 WO2022198983A1 (fr) Conversation recommendation method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112925892B (fr)
WO (1) WO2022198983A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925892B (zh) * 2021-03-23 2023-08-15 苏州大学 Dialogue recommendation method and apparatus, electronic device and storage medium
CN113487379B (zh) * 2021-06-24 2023-01-13 上海淇馥信息技术有限公司 Dialogue-based product recommendation method and apparatus, and electronic device
CN113468420B (zh) * 2021-06-29 2024-04-05 杭州摸象大数据科技有限公司 Product recommendation method and system


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11470165B2 (en) * 2018-02-13 2022-10-11 Ebay, Inc. System, method, and medium for generating physical product customization parameters based on multiple disparate sources of computing activity

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114154A1 (fr) * 2003-06-23 2004-12-29 University College Dublin, National University Of Ireland, Dublin Retrieval system and method
CN110008409A (zh) * 2019-04-12 2019-07-12 苏州市职业大学 Sequence recommendation method, apparatus and device based on a self-attention mechanism
CN110390108A (zh) * 2019-07-29 2019-10-29 中国工商银行股份有限公司 Task-oriented interaction method and system based on deep reinforcement learning
CN111797321A (zh) * 2020-07-07 2020-10-20 山东大学 Personalized knowledge recommendation method and system for different scenarios
CN112925892A (zh) * 2021-03-23 2021-06-08 苏州大学 Dialogue recommendation method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN112925892A (zh) 2021-06-08
CN112925892B (zh) 2023-08-15

Similar Documents

Publication Publication Date Title
WO2022198983A1 Conversation recommendation method and apparatus, electronic device and storage medium
EP3711000B1 Regularized neural network architecture search
CN109299396B Convolutional neural network collaborative filtering recommendation method and system incorporating an attention model
CN110457589B Vehicle recommendation method, apparatus, device and storage medium
CN111506820B Recommendation model, method, apparatus, device and storage medium
CN111989696A Neural network for scalable continual learning in domains with sequentially learned tasks
CN113705811B Model training method, apparatus, computer program product and device
US11423307B2 Taxonomy construction via graph-based cross-domain knowledge transfer
CN108921342B Logistics customer churn prediction method, medium and system
TW202001749A Cash-out identification method and apparatus
WO2010048758A1 Classification of a document according to a weighted search tree created by genetic algorithms
Wang et al. Learning to augment for casual user recommendation
CN115130542A Model training method, text processing method, apparatus and electronic device
US20240046922A1 Systems and methods for dynamically updating machine learning models that provide conversational responses
CN111010595B New program recommendation method and apparatus
CN116401522A Dynamic financial service recommendation method and apparatus
CN113362852A User attribute identification method and apparatus
CN111445032A Method and apparatus for decision processing using a business decision model
CN116308551A Content recommendation method and system based on a digital finance AI platform
WO2023009766A1 Evaluating output sequences using an auto-regressive language model neural network
US20220261683A1 Constraint sampling reinforcement learning for recommendation systems
KR102612986B1 Online recommendation system, and meta-learning-based recommender update method and apparatus
Candan et al. Non stationary operator selection with island models
CN116776870B Intention recognition method, apparatus, computer device and medium
CN113779396B Question recommendation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932592

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE