CN113111241B - Multi-turn conversation method based on conversation history and reinforcement learning in game conversation


Info

Publication number
CN113111241B
CN113111241B (application CN202110378191.5A)
Authority
CN
China
Prior art keywords
conversation
history
turn
opponent
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110378191.5A
Other languages
Chinese (zh)
Other versions
CN113111241A (en)
Inventor
庄越挺
汤斯亮
程广钊
谭炽烈
肖俊
李晓林
蒋韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Tongdun Holdings Co Ltd
Original Assignee
Zhejiang University ZJU
Tongdun Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Tongdun Holdings Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202110378191.5A
Publication of CN113111241A
Application granted
Publication of CN113111241B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation
    • G06F16/90332Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/907Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/908Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a multi-turn dialogue method based on dialogue history and reinforcement learning in game dialogue, belonging to the field of intelligent agents and reinforcement learning models. The method comprises the following steps: first, multi-turn dialogue is treated as a finitely repeated game process, completed multi-turn dialogues are stored, and a historical dialogue information base is constructed; then, in a new multi-turn dialogue, an opponent action estimation model is built on a memory network, the turns of the current dialogue are used to retrieve the historical dialogue information base, and an estimate vector of the opponent's next-step strategy is generated through multi-step estimation; finally, an encoding-decoding model fuses the information of the current dialogue with the estimate vector and produces the next response. During multi-turn dialogue, the estimate vector derived from the past dialogue history is fused with the response vector of the current dialogue history, so that historical information is exploited more fully and the dialogue robot (intelligent agent) gains stronger adaptability and makes better responses.

Description

Multi-turn conversation method based on conversation history and reinforcement learning in game conversation
Technical Field
The invention relates to the field of intelligent agents and reinforcement learning models, in particular to a method for multi-turn dialogue of an intelligent agent.
Background
A virtual assistant or chat-partner system with sufficient intelligence once seemed a fantasy existing only in science-fiction movies. In recent years, however, human-machine dialogue has received increasing attention from researchers because of its potential and attractive commercial value. With the development of big data and deep learning techniques, building an automated human-machine dialogue system to serve as a personal assistant or chat partner is no longer far-fetched. Dialogue systems now draw attention across many fields, and continuous progress in deep learning has greatly promoted their development. For dialogue systems, deep learning can exploit large amounts of data to learn feature representations and reply-generation strategies with only a small amount of manual engineering. Today, conversational "big data" is easily accessible over the network, making it feasible to learn how to reply to almost any input and to build data-driven, open-domain dialogue systems between humans and computers. Deep learning techniques have, moreover, proven effective at capturing complex patterns in large data and underpin many research areas, such as computer vision, natural language processing, and recommendation systems.
From an application point of view, dialogue systems can be roughly divided into two categories: (1) task-oriented systems; (2) non-task-oriented systems (chat-type dialogue systems). Real-world dialogue tasks (such as bargaining and negotiation) are challenging: an opponent typically exhibits different patterns, and the dialogue spans many turns, though the number of turns is finite. However, current research rarely uses previous interactions (historical information).
Multi-turn conversation can be viewed as a finitely repeated game, and the conversation history comprises two parts: the first is the completed multi-turn conversations (referred to as the past conversation history), and the second is the turns already conducted in the current multi-turn conversation (referred to as the current conversation history). Existing dialogue systems focus only on exploiting the current conversation history and ignore the past conversation history. How to fully exploit historical information and respond better during a game conversation is therefore a pressing technical problem.
Each past conversation history is a complete dialogue process, and the historical information base stores complete conversations against different opponents, so this historical information is clearly valuable. In a new multi-turn conversation (e.g., conversational gaming, bargaining), these past conversation histories can be exploited to infer the opponent's type and policy so as to respond better.
Disclosure of Invention
The invention aims to provide a multi-turn conversation method based on conversation history and reinforcement learning in game conversation, so that the intelligent agent can adapt quickly in multi-turn conversation and infer the opponent's type and strategy faster in order to respond.
In order to achieve the purpose of the invention, the invention specifically adopts the following technical scheme:
a multi-turn conversation method based on conversation history and reinforcement learning in game conversation comprises the following steps:
S1: taking multi-turn conversation as a finitely repeated game process, storing the completed multi-turn conversations, and constructing a past conversation history information base;
S2: in a current multi-turn conversation that is under way but not yet finished, taking the turns already conducted in the current multi-turn conversation as the current conversation history, and retrieving from the past conversation history information base several complete multi-turn conversations most similar to the current conversation history as past history data; then, in an opponent action estimation model built on a memory network framework, using the current conversation history as the query and the past history data as the queried content, and generating an estimate vector of the opponent's subsequent actions through multi-step reasoning;
the opponent action estimation model is trained in advance, so that the output estimate vector of the opponent's subsequent actions approximates the actual vector of those actions;
S3: inputting the current conversation history and the estimate vector of the opponent's subsequent actions into a trained encoding-decoding model, which produces the next response.
Preferably, the opponent action estimation model is a one-step opponent action estimation model, whose output is an estimate vector representing the opponent's next action.
Preferably, the opponent action estimation model is a multi-step opponent action estimation model, whose output is an estimate vector representing all subsequent actions of the opponent in the current multi-turn dialogue.
Preferably, when a new multi-turn dialogue starts, the first few turns are answered directly by the multi-turn dialogue model without relying on the current dialogue history; in the remaining turns, the turns already conducted in the current multi-turn dialogue are taken as the current dialogue history, and the next response is produced according to S2 and S3.
Further, when a new multi-turn dialogue starts, the turns answered directly by the multi-turn dialogue model are the first 3 to 5 turns.
Preferably, in the encoding-decoding model, the vector obtained from the current dialogue history and the estimate vector of the opponent's subsequent actions are fused and encoded, then decoded by a neural network into natural language or an action to produce the next response.
Further, in the encoding-decoding model, the fusion encoding is performed either by concatenating the vectors directly or by fusing them through a self-attention mechanism.
Preferably, in the encoding-decoding model, the encoding part adopts a hierarchy-based encoder and the decoding part adopts a multilayer feedforward neural network.
Preferably, when training the opponent action estimation model, the current dialogue history is input into the model to generate an estimate vector of the opponent's subsequent actions, while the subsequent actions of each multi-turn dialogue in the past history data are input into a FusionNet neural network to generate the actual vector of those actions; the two vectors are driven arbitrarily close by optimizing the model parameters.
Preferably, the multi-turn conversations are task-type conversations and chat-type conversations.
During multi-turn conversation, the estimate vector derived from the past conversation history is fused with the response vector of the current conversation history, so that historical information is exploited more fully and the conversation robot (intelligent agent) gains stronger adaptability and makes better responses. The method serves as an architecture into which prior multi-turn dialogue methods or models can be fully integrated; it can also be combined with the latest research in the dialogue field, and thus has good extensibility.
Drawings
Fig. 1 is a flow chart of a method for multiple rounds of dialogue based on dialogue history and reinforcement learning in a gaming dialogue.
FIG. 2 is a diagram of a one-step opponent estimation model.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
In a gaming dialogue (e.g., bargaining), the type of opponent faced may differ from one encounter to the next, and the opponent's strategy may vary. In a new dialogue, quickly inferring the opponent's type and policy and making the most favorable response is a challenge.
The method provided by the invention is particularly applicable when the game dialogues, or the opponents involved, exhibit diverse strategies and types. In repeated games, the game history is the primary basis for making decisions and defeating an opponent. Historical information is a special form of knowledge from which to judge the opponent's type, infer incomplete information, and predict the opponent's behavior; absent other additional information about the opponent, it is arguably the only basis. Interaction in the real world (e.g., bargaining and negotiation) is a challenging task: the opponent often behaves differently, and the interaction usually spans multiple but finitely many turns. However, current research rarely uses previous interactions (historical information). Facing a wide variety of opponents or strategies, enabling the agent to adapt quickly is an important problem. Many new policies are integrations or variants of old policies, so historical information can be used to make the agent quickly adaptive.
A finitely repeated game is a primary (stage) game repeated a finite number of times, and multi-turn conversation can be regarded as such a process. The conversation history comprises two types: the first is the completed multi-turn conversations (called the past conversation history), and the second is the turns already conducted in the current multi-turn conversation (called the current conversation history). For a multi-turn conversation, the available information therefore includes not only the current conversation history but also the past conversation history. The invention exploits both types of historical information simultaneously, giving the conversation robot (intelligent agent) stronger adaptability and better responses. The implementation of the invention is explained in detail below.
Referring to fig. 1, in a preferred embodiment of the present invention, a method for multiple rounds of dialogue based on dialogue history and reinforcement learning in a gaming dialogue is provided, which comprises the following steps:
s1: and taking multiple rounds of conversations as a limited repeated game process, collecting and storing the completed multiple rounds of conversations in the agent, and constructing a historical information base of the previous conversations. In consideration of the limitation of storage capacity, complete rounds of conversations can be screened, the final scores can be obtained after the whole limited repeated game is ended, and the typical complete rounds of conversations with high scores are selected and stored. The conversation history which is finished before is used as the basis for the decision of the intelligent agent, so that the prior complete conversation information is stored and marked at the moment when the prior conversation history information base is constructed, and the rest current conversation history can be used for the inquiry of the next step.
S2: In a multi-turn conversation that is under way but not finished (denoted the current multi-turn conversation), take the turns already conducted as the current conversation history, and retrieve from the past conversation history information base several complete multi-turn conversations most similar to the current conversation history as past history data; during retrieval, only the first m turns of each past history are compared (m being the number of turns already conducted in the current conversation). The complete multi-turn conversations with the highest similarity to the current conversation history are taken as the retrieval results. The similarity can be computed with text-similarity techniques: first convert the conversation text into word vectors (word embeddings), then compute the cosine similarity between the vectors; other similarity measures may of course also be used. An Opponent Action Estimation (OAE) model is then constructed and trained on a Memory Network framework; in this model, the current conversation history serves as the query and the past history data as the queried content, and an estimate vector of the opponent's subsequent actions is generated through multi-step reasoning. A sketch of the retrieval step follows.
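A hedged sketch of the retrieval step, assuming a placeholder `embed` function in place of the actual word-embedding model; only the first m turns of each stored dialogue are compared, with cosine similarity as described above.

```python
import numpy as np

def embed(turns):
    """Stub sentence encoder returning one vector per turn.

    Stands in for the word-embedding step described above; a real system
    would use trained embeddings, so the hash-seeded vectors here are
    purely illustrative."""
    rng = np.random.default_rng(abs(hash(tuple(turns))) % (2**32))
    return rng.standard_normal((len(turns), 64))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_top_k(current_history, past_dialogues, k=5):
    """Return the k completed dialogues whose first m turns are most
    similar to the current m-turn history (m = len(current_history))."""
    m = len(current_history)
    query = embed(current_history).mean(axis=0)
    scored = []
    for dialogue in past_dialogues:
        if len(dialogue) < m:
            continue  # too short to compare against m turns
        vec = embed(dialogue[:m]).mean(axis=0)  # only the first m turns
        scored.append((cosine(query, vec), dialogue))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]
```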
It should be noted that the opponent action estimation model must be trained before actual use, so that the output estimate vector of the opponent's subsequent actions approximates the actual vector of those actions, i.e., the two vectors become arbitrarily close.
In this embodiment, when the opponent action estimation model is trained, the current conversation history in the training data is input into the model (i.e., the memory network framework) to generate an estimate vector of the opponent's subsequent actions, while the subsequent actions of each multi-turn conversation in the past history data are input into a FusionNet neural network to generate the actual vector of those actions; the two vectors are driven arbitrarily close by optimizing the model parameters. If the number of turns already conducted in the current conversation is m, the subsequent actions fed to the FusionNet are the (m+1)-th turn alone, or the (m+1)-th turn together with all later turns of the completed multi-turn conversation. A sketch of this training step follows.
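A minimal sketch of this training step. The patent requires only that the two vectors be driven together by optimizing model parameters; the mean-squared-error objective and joint optimization of both networks are assumptions here.

```python
import torch
import torch.nn as nn

def train_step(oae, fusion_net, optimizer, history_batch, follow_up_batch):
    """One optimization step: drive the OAE estimate toward the encoded
    actual follow-up action. `oae` and `fusion_net` are any nn.Modules
    mapping their inputs to fixed-size vectors of the same dimension;
    `optimizer` is assumed to cover the parameters of both."""
    estimate = oae(history_batch)          # estimated follow-up action vector
    actual = fusion_net(follow_up_batch)   # actual vector from the FusionNet
    loss = nn.functional.mse_loss(estimate, actual)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```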
Which subsequent actions of the opponent are estimated depends on what the agent needs to predict: if m turns have been conducted in the current multi-turn dialogue, predicting only the (m+1)-th turn is called One-step Opponent Action Estimation (O-OAE), while predicting the (m+1)-th turn and all later actions is called Multi-step Opponent Action Estimation (M-OAE).
In the multi-step reasoning of the invention, the opponent action estimation model is therefore constructed and trained on a memory network framework. As shown in Fig. 2, the current conversation history serves as the query and the past history data as the queried content, and the memory network can reason in three steps (or more). The reasoning proceeds as follows: first, word vectors of the past history and the current history are obtained through an encoding matrix, and a softmax operation computes their similarity, yielding relevance weights over the past history; the past history is then re-encoded with a different encoding matrix and combined with the relevance weights in a weighted sum, which completes one reasoning step. Multi-step reasoning repeats these operations, with each step using a different encoding matrix for the past history. Finally, the estimate vector of the opponent's subsequent actions is generated. A minimal sketch follows.
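The following is a minimal end-to-end memory network in the spirit of this description, assuming pre-encoded query and memory vectors. The per-hop encoding matrices and softmax relevance weights follow the text above; the additive update `u + o` is a common memory-network convention and is an assumption here.

```python
import torch
import torch.nn as nn

class OpponentActionEstimator(nn.Module):
    """Minimal multi-hop memory network in the spirit of the OAE module.

    The current history (pre-encoded as `query`) attends over the K
    retrieved past dialogues (pre-encoded as `memories`); each hop uses
    its own input/output encoding matrices, as described above.
    """
    def __init__(self, dim=64, hops=3):
        super().__init__()
        self.A = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(hops)])
        self.C = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(hops)])
        self.out = nn.Linear(dim, dim)  # projects to the estimate vector

    def forward(self, query, memories):
        # query: (batch, dim); memories: (batch, K, dim)
        u = query
        for A, C in zip(self.A, self.C):
            keys = A(memories)                                   # encode past history
            weights = torch.softmax(
                torch.einsum("bd,bkd->bk", u, keys), dim=-1)     # relevance weights
            values = C(memories)                                 # second encoding matrix
            u = u + torch.einsum("bk,bkd->bd", weights, values)  # one reasoning step
        return self.out(u)  # estimate vector of the opponent's subsequent actions
```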
S3: Input the current conversation history, together with the estimate vector of the opponent's subsequent actions output by the opponent action estimation model in S2, into a trained encoding-decoding (Encoder-Decoder) model, which produces the next response.
In S1 to S3 above, the current multi-turn conversation relies on the turns already conducted as the current conversation history; when a new multi-turn conversation starts, however, the first few turns carry too little information, so the current conversation history is unreliable. Therefore, when a new multi-turn conversation starts, the first few turns can be answered directly by the multi-turn dialogue model, without responding based on the current conversation history; in the remaining turns, the turns already conducted are taken as the current conversation history, and the next response is produced according to S2 and S3. Here, the multi-turn dialogue model refers to the agent the conversation robot possessed before the invention is applied, which can generate responses according to existing methods and models.
The number of turns m answered directly by the multi-turn dialogue model at the start of a new conversation can be determined from the total number of turns and is typically set to 3 to 5. The first 3 to 5 turns are answered with existing methods and models; afterwards, under the S2-S3 framework, the K past conversation histories most similar to the current m-turn history are retrieved from the past conversation history information base (K being tuned in practice), and estimation then proceeds with the opponent action estimation model.
The encoding-decoding model encodes the current conversation history into a vector, fuses it with the estimate vector of the opponent's subsequent actions obtained in S2, and then decodes with a neural network into natural language or an action (depending on the form of the dialogue) to produce the next response. The fusion encoding can be performed in different ways: for example, the vectors can be concatenated directly (concat) or fused through a self-attention mechanism (self-attention). The encoding-decoding model may take various specific forms as long as it performs this function. In the encoding-decoding model of this embodiment, the encoding part adopts a hierarchy-based encoder and the decoding part a multilayer feedforward neural network: the current history is encoded by the hierarchical encoder, fused with the estimate vector, and the next action is finally generated by the multilayer feedforward network. The invention is thus a framework into which prior multi-turn dialogue methods or models can be fully integrated, and which can also be combined with the latest research in the dialogue field. Different kinds of game problems have different payoff matrices (payoff functions), but the opponent action estimation model is independent of the specific game problem; the module can be reused as long as the past history and the current history belong to the same game problem. A sketch of the encode-fuse-decode step is given below.
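The following sketch illustrates the encode-fuse-decode step with a hierarchical (word-level then turn-level) GRU encoder, concatenation fusion, and a feedforward decoder over a discrete action space; the vocabulary size, dimensions, and action-logit output format are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionResponder(nn.Module):
    """Encode-fuse-decode sketch: a hierarchical encoder (a GRU over the
    words of each turn, then a GRU over turns) summarizes the current
    history; the summary is concatenated with the OAE estimate; and a
    feedforward decoder scores the next action. Concatenation is one of
    the two fusion options named above; all sizes are illustrative.
    """
    def __init__(self, vocab_size=10_000, dim=64, n_actions=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.word_rnn = nn.GRU(dim, dim, batch_first=True)  # within each turn
        self.turn_rnn = nn.GRU(dim, dim, batch_first=True)  # across turns
        self.decoder = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, n_actions))

    def forward(self, turns, estimate):
        # turns: (batch, n_turns, n_words) token ids; estimate: (batch, dim)
        b, t, w = turns.shape
        words = self.emb(turns).view(b * t, w, -1)
        _, h = self.word_rnn(words)             # (1, b*t, dim) per-turn vectors
        _, h = self.turn_rnn(h.view(b, t, -1))  # (1, b, dim) history vector
        fused = torch.cat([h.squeeze(0), estimate], dim=-1)  # concat fusion
        return self.decoder(fused)              # logits over candidate actions
```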
The multi-turn conversation method provided by the invention exploits historical information more fully and improves the adaptability and response accuracy of the conversation robot (intelligent agent). Current conversation robots fall mainly into task-type and chat-type robots.
A task-type dialogue robot aims to help the user complete a specific task (such as ordering food or booking tickets), and the fewer the dialogue turns the better. The invention makes full use of the past history information base and, guided by historical information, addresses targeting and adaptability, shortening the number of conversation turns, helping the user finish the task faster, and improving the user experience. Moreover, the invention can migrate quickly across different kinds of dialogue robots; for example, a ticket-booking robot can be rapidly built from a meal-ordering dialogue robot.
A chat-type conversation robot mainly chats with users, but current chat robots chiefly suffer from monotonous responses, repetitive language, overly short sessions, and the like. Unlike other chat robots based on retrieval models, the invention can provide more diverse responses through a rich historical information base; the multi-step reasoning model (the opponent action estimation model) gives the conversation robot simple logical reasoning and problem-transfer abilities, and making responses in specific styles for different user types renders the robot more intelligent and humanized.
Practical application results show that the multi-turn conversation method based on conversation history and reinforcement learning in game conversation provided by the invention gives both kinds of conversation robots stronger adaptability and better responses.
The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical scheme obtained by adopting the mode of equivalent replacement or equivalent transformation is within the protection scope of the invention.

Claims (9)

1. A multi-turn conversation method based on conversation history and reinforcement learning in game conversation, characterized by comprising the following steps:
S1: taking multi-turn conversation as a finitely repeated game process, storing the completed multi-turn conversations, and constructing a past conversation history information base;
S2: in a current multi-turn conversation that is under way but not yet finished, taking the turns already conducted in the current multi-turn conversation as the current conversation history, and retrieving from the past conversation history information base several complete multi-turn conversations most similar to the current conversation history as past history data; then, in an opponent action estimation model built on a memory network framework, using the current conversation history as the query and the past history data as the queried content, and generating an estimate vector of the opponent's subsequent actions through multi-step reasoning;
the opponent action estimation model being trained in advance so that the output estimate vector of the opponent's subsequent actions approximates the actual vector of those actions; when training the opponent action estimation model, the current conversation history is input into the model to generate an estimate vector of the opponent's subsequent actions, while the subsequent actions of each multi-turn conversation in the past history data are input into a FusionNet neural network to generate the actual vector of those actions, the two vectors being driven arbitrarily close by optimizing the model parameters;
S3: inputting the current conversation history and the estimate vector of the opponent's subsequent actions into a trained encoding-decoding model, which produces the next response.
2. The method of claim 1, wherein the opponent action estimation model is a one-step opponent action estimation model that outputs an estimation vector representing the next action of the opponent.
3. The method of claim 1, wherein the opponent action estimation model is a multi-step opponent action estimation model that outputs an estimation vector representing all subsequent actions of the opponent in the current multi-turn conversation.
4. The method of claim 1, wherein, when a new multi-turn conversation starts, the first several turns are answered directly according to the multi-turn conversation model without responding based on the current conversation history; and in the remaining conversation turns, the turns already conducted in the current multi-turn conversation are taken as the current conversation history, and the next response is produced according to S2 and S3.
5. The method of claim 4, wherein, when a new multi-turn conversation starts, the turns answered directly according to the multi-turn conversation model are the first 3 to 5 turns.
6. A multi-turn dialogue method based on dialogue history and reinforcement learning in a gaming dialogue as recited in claim 1, wherein, in the coding-decoding model, the vector obtained from the current dialogue history and the estimated vector of the opponent's subsequent actions are encoded in a fused manner, then decoded into natural language or actions by a neural network to produce the next response.
7. A multiple-turn dialogue method based on dialogue history and reinforcement learning in gaming dialogues as recited in claim 6, wherein in the encoding-decoding model, the fusion encoding is performed by splicing vectors directly or by a self-attention mechanism.
8. The method for multiple rounds of dialogue based on dialogue history and reinforcement learning in gaming dialogue as recited in claim 1, wherein, in the encoding-decoding model, the encoding component employs a hierarchy-based encoder and the decoding component employs a multi-layer feed-forward neural network.
9. A method of multiple rounds of conversation based on conversation history and reinforcement learning in a gaming conversation as claimed in claim 1, wherein said multiple rounds of conversation are task-type conversations and chat-type conversations.
CN202110378191.5A 2021-04-08 2021-04-08 Multi-turn conversation method based on conversation history and reinforcement learning in game conversation Active CN113111241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110378191.5A CN113111241B (en) 2021-04-08 2021-04-08 Multi-turn conversation method based on conversation history and reinforcement learning in game conversation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110378191.5A CN113111241B (en) 2021-04-08 2021-04-08 Multi-turn conversation method based on conversation history and reinforcement learning in game conversation

Publications (2)

Publication Number Publication Date
CN113111241A (en) 2021-07-13
CN113111241B (en) 2022-12-06

Family

ID=76715391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110378191.5A Active CN113111241B (en) 2021-04-08 2021-04-08 Multi-turn conversation method based on conversation history and reinforcement learning in game conversation

Country Status (1)

Country Link
CN (1) CN113111241B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800294A (en) * 2019-01-08 2019-05-24 中国科学院自动化研究所 Autonomous evolution Intelligent dialogue method, system, device based on physical environment game

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885756B (en) * 2016-09-30 2020-05-08 华为技术有限公司 Deep learning-based dialogue method, device and equipment
CN108681610B (en) * 2018-05-28 2019-12-10 山东大学 generating type multi-turn chatting dialogue method, system and computer readable storage medium
CN111414460B (en) * 2019-02-03 2024-01-19 北京邮电大学 Multi-round dialogue management method and device combining memory storage and neural network
CN110188167B (en) * 2019-05-17 2021-03-30 北京邮电大学 End-to-end dialogue method and system integrating external knowledge
CN112115247B (en) * 2020-09-07 2023-10-10 中国人民大学 Personalized dialogue generation method and system based on long-short-time memory information

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800294A (en) * 2019-01-08 2019-05-24 中国科学院自动化研究所 Autonomous evolution Intelligent dialogue method, system, device based on physical environment game

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GRS: A Generation-Retrieval Dialogue Model for Intelligent Customer Service in the E-commerce Domain; Guo Xiaozhe et al.; Journal of East China Normal University (Natural Science Edition); 2020-09-25 (No. 05); full text *

Also Published As

Publication number Publication date
CN113111241A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN108681610B (en) generating type multi-turn chatting dialogue method, system and computer readable storage medium
CN110119467B (en) Project recommendation method, device, equipment and storage medium based on session
CN110427490B (en) Emotional dialogue generation method and device based on self-attention mechanism
US20190197402A1 (en) Adding deep learning based ai control
US20180314942A1 (en) Scalable framework for autonomous artificial intelligence characters
Gorniak et al. Situated language understanding as filtering perceived affordances
CN114443827A (en) Local information perception dialogue method and system based on pre-training language model
US20180314963A1 (en) Domain-independent and scalable automated planning system using deep neural networks
Dai et al. A survey on dialog management: Recent advances and challenges
CN113220856A (en) Multi-round dialogue system based on Chinese pre-training model
Wang et al. Skill-based hierarchical reinforcement learning for target visual navigation
Mitsopoulos et al. Toward a psychology of deep reinforcement learning agents using a cognitive architecture
CN113111241B (en) Multi-turn conversation method based on conversation history and reinforcement learning in game conversation
CN117573834A (en) Multi-robot dialogue method and system for software-oriented instant service platform
CN112418421A (en) Roadside end pedestrian trajectory prediction algorithm based on graph attention self-coding model
CN116841708A (en) Multi-agent reinforcement learning method based on intelligent planning
Musilek et al. Enhanced learning classifier system for robot navigation
Cordier et al. Diluted near-optimal expert demonstrations for guiding dialogue stochastic policy optimisation
Vriend Artificial intelligence and economic theory
CN113743605A (en) Method for searching smoke and fire detection network architecture based on evolution method
Pang et al. Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation
Dsouza et al. Optimizing MRC Tasks: Understanding and Resolving Ambiguities
Rohmatillah et al. Advances and Challenges in Multi-Domain Task-Oriented Dialogue Policy Optimization
Saha et al. Transfer Learning based Task-oriented Dialogue Policy for Multiple Domains using Hierarchical Reinforcement Learning
Kim et al. DESEM: Depthwise Separable Convolution-Based Multimodal Deep Learning for In-Game Action Anticipation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant