CN113436752B - Semi-supervised multi-round medical dialogue reply generation method and system - Google Patents

Semi-supervised multi-round medical dialogue reply generation method and system

Info

Publication number
CN113436752B
Authority
CN
China
Prior art keywords
dialogue
round
state
inference
semi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110577272.8A
Other languages
Chinese (zh)
Other versions
CN113436752A (en)
Inventor
任昭春
任鹏杰
陈竹敏
李冬冬
马军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202110577272.8A priority Critical patent/CN113436752B/en
Publication of CN113436752A publication Critical patent/CN113436752A/en
Application granted granted Critical
Publication of CN113436752B publication Critical patent/CN113436752B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the field of conversational information processing, and provides a semi-supervised multi-round medical dialogue reply generation method and system. The method comprises: inputting the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply of the first round of dialogue; and, in the second and subsequent rounds, inputting the current round's patient question together with the reply of the previous round into the semi-supervised medical dialogue model to obtain the reply of the corresponding round, until the patient inputs no new question. The semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a reply generator; the context encoder encodes the received information and feeds it into the prior state tracker and the prior policy network, the prior state tracker continuously tracks the user's physical state, the prior policy network generates the corresponding doctor action, and the reply generator generates the corresponding reply according to the physical state and the doctor action.

Description

一种半监督的多轮医疗对话回复生成方法及系统A semi-supervised multi-round medical dialogue response generation method and system

技术领域Technical Field

本发明属于对话式信息处理领域,尤其涉及一种半监督的多轮医疗对话回复生成方法及系统。The present invention belongs to the field of conversational information processing, and in particular relates to a semi-supervised multi-round medical conversation response generation method and system.

背景技术Background Art

本部分的陈述仅仅是提供了与本发明相关的背景技术信息,不必然构成在先技术。The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

同时为了解决开放领域的信息需求和高度垂直领域的专业需求,会话范式被用来将人们与信息联系起来。现有的对话系统可分为两大类:面向任务的和开放域对话系统。以任务为导向的对话系统旨在帮助人们完成特定的任务。例如日程安排,订餐馆,查询天气。开放域对话系统主要是与人们聊天,用于满足人们对信息和娱乐的需求。不同于医疗问答,真实医学场景中的对话更可能包含多轮交互。因为患者需要通过对话的上下文来表达他/她的症状、他/她正在服用的药物和他/她的病史。这一特性使得显式状态追踪变得不可或缺,其提供了比隐状态表示更具指示性和可解释性的信息。考虑到医学对话的特殊性,医学推理能力(例如是否开药,开什么药治疗疾病,询问何种症状)也是医疗诊断中不可或缺的特性。In order to address both the information needs of open domains and the professional needs of highly vertical domains, the conversational paradigm is used to connect people with information. Existing dialogue systems can be divided into two categories: task-oriented and open-domain dialogue systems. Task-oriented dialogue systems are designed to help people complete specific tasks. For example, scheduling, booking restaurants, and checking the weather. Open-domain dialogue systems are mainly used to chat with people to meet people's needs for information and entertainment. Unlike medical Q&A, conversations in real medical scenarios are more likely to contain multiple rounds of interactions. Because the patient needs to express his/her symptoms, the medicines he/she is taking, and his/her medical history through the context of the conversation. This feature makes explicit state tracking indispensable, which provides more indicative and interpretable information than hidden state representation. Considering the particularity of medical dialogue, medical reasoning capabilities (such as whether to prescribe medicine, what medicine to prescribe to treat the disease, and what symptoms to ask) are also indispensable features in medical diagnosis.

现有的医疗对话方法是基于任务导向的对话范式构建,遵循的是患者表达症状的,对话系统返回诊断结果(即确定病人患了什么疾病)的范式。其取得了很好的效果。但这些方法只聚焦于诊断这一单一领域,无法满足实际应用中病人的多种需求,而且其需要大量人工标注的状态和动作。当对话数据高度机密或数据规模巨大时是无法实现的,并且这些工作受限于训练数据规模的影响,甚至无法使用生成式的方法来生成回复,只能通过模板的方式来组成回复。一些任务型对话的方法可以应用于医疗对话中的状态追踪,但是其依旧无法应对无充分标注数据的情景。为了减轻任务导向对话系统对于数据标注的需求,Jin等和Zhang等都使用了半监督的学习方法来进行状态追踪,但忽视了对话主体的推理能力,即未建模医师的动作。Liang等提出一种利用未完全标注的数据来训练任务导向对话系统中的特定模块的方法,但是无法在训练时刻推理出未标注的标签,致使其在医疗对话系统中同时无状态和动作标注的情况下提升有限。发明人发现,这些方法都未考虑从大规模医疗知识中进行检索,未能生成富含知识的回复,在医学对话这种对于推理能力有很强需求的场景中表现很差。Existing medical dialogue methods are based on the task-oriented dialogue paradigm, following the paradigm that the patient expresses symptoms and the dialogue system returns the diagnosis result (i.e., determines what disease the patient has). It has achieved good results. However, these methods only focus on the single field of diagnosis, which cannot meet the various needs of patients in actual applications, and require a large number of manually labeled states and actions. This is impossible when the dialogue data is highly confidential or the data scale is huge, and these works are limited by the impact of the training data scale, and even generative methods cannot be used to generate replies, and replies can only be composed through templates. Some task-based dialogue methods can be applied to state tracking in medical dialogues, but they still cannot cope with scenarios without sufficient labeled data. In order to reduce the demand for data labeling in task-oriented dialogue systems, Jin et al. and Zhang et al. both used semi-supervised learning methods for state tracking, but ignored the reasoning ability of the dialogue subject, that is, the actions of the physician were not modeled. Liang et al. proposed a method to use incompletely labeled data to train specific modules in task-oriented dialogue systems, but could not infer unlabeled labels at the time of training, resulting in limited improvement in medical dialogue systems without both state and action labeling. The inventors found that these methods did not consider retrieval from large-scale medical knowledge, failed to generate knowledge-rich responses, and performed poorly in scenarios such as medical dialogues that have strong demands for reasoning ability.

发明内容Summary of the invention

为了解决上述背景技术中存在的技术问题,本发明提供一种半监督的多轮医疗对话回复生成方法及系统,其同时考虑了病人状态和医师动作,使得对话系统同时具备了建模用户身体状态和医学推理的能力。In order to solve the technical problems existing in the above-mentioned background technology, the present invention provides a semi-supervised multi-round medical dialogue response generation method and system, which takes into account both the patient status and the doctor's actions, so that the dialogue system has the ability to model the user's physical status and medical reasoning at the same time.

为了实现上述目的,本发明采用如下技术方案:In order to achieve the above object, the present invention adopts the following technical solution:

本发明的第一个方面提供一种半监督的多轮医疗对话回复生成方法。A first aspect of the present invention provides a semi-supervised method for generating responses to multi-round medical conversations.

一种半监督的多轮医疗对话回复生成方法,其包括:A semi-supervised multi-round medical dialogue response generation method, comprising:

将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;Input the patient's questions in the first round of dialogue into the semi-supervised medical dialogue model to obtain the responses in the first round of dialogue;

在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;In the second and subsequent rounds of dialogue, the patient's questions in the current round and the responses in the previous round are input into the semi-supervised medical dialogue model to obtain responses for the corresponding round of dialogue until the patient has no new questions input;

其中,半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器,上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中,先验状态追踪器用于不断追踪用户的身体状态,先验策略网络用于生成医师相应的动作,回复生成器用于根据身体状态及医师动作,生成对应的回复;Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori policy network, an inference policy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori policy network. The prior state tracker is used to continuously track the user's physical state. The priori policy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

本发明的第二个方面提供一种半监督的多轮医疗对话回复生成系统。A second aspect of the present invention provides a semi-supervised multi-turn medical dialogue response generation system.

一种半监督的多轮医疗对话回复生成系统,其包括:A semi-supervised multi-turn medical dialogue response generation system, comprising:

第一轮对话回复生成模块,其用于将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;A first-round dialogue response generation module, which is used to input the patient's questions in the first-round dialogue into the semi-supervised medical dialogue model to obtain the responses to the first-round dialogue;

第二轮及其后对话回复生成模块,其用于在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;The second and subsequent dialogue response generation module is used to input the patient's questions in the current round and the responses in the previous round of dialogue into the semi-supervised medical dialogue model in the second and subsequent dialogues, and obtain the responses in the corresponding round of dialogues until the patient has no new questions input;

其中,半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器,上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中,先验状态追踪器用于不断追踪用户的身体状态,先验策略网络用于生成医师相应的动作,回复生成器用于根据身体状态及医师动作,生成对应的回复;Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori policy network, an inference policy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori policy network. The prior state tracker is used to continuously track the user's physical state. The priori policy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

本发明的第三个方面提供一种计算机可读存储介质。A third aspect of the present invention provides a computer-readable storage medium.

一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。A computer-readable storage medium stores a computer program, which, when executed by a processor, implements the steps in the semi-supervised multi-round medical dialogue response generation method as described above.

本发明的第四个方面提供一种计算机设备。A fourth aspect of the present invention provides a computer device.

一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the steps in the semi-supervised multi-round medical dialogue response generation method as described above are implemented.

与现有技术相比,本发明的有益效果是:Compared with the prior art, the present invention has the following beneficial effects:

(1)本发明在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入,显式建模了用户的身体状态以及医师的动作,使用text span来进行表示,提升了模型对于病人生理状态建模和医疗推理的能力。(1) In the second and subsequent rounds of dialogue, the present invention inputs the patient's questions in the current round and the replies in the previous round of dialogue into the semi-supervised medical dialogue model to obtain replies for the corresponding round of dialogue until the patient has no new questions to input. The user's physical state and the doctor's actions are explicitly modeled and represented by text spans, thereby improving the model's ability to model the patient's physiological state and conduct medical reasoning.

(2)本发明在模型层面,将用户的身体状态和医师动作当做隐变量,并且提出了存在中间标注(即监督)和不存在中间标注(即无监督)的情况下,模型的训练方法。该方法大大减小了对话模型对于标注数据的依赖。(2) At the model level, the present invention treats the user's physical state and the doctor's actions as latent variables, and proposes a model training method with and without intermediate annotations (i.e., supervision). This method greatly reduces the dependence of the dialogue model on labeled data.

(3)本发明提出在策略网络学习的过程中,使用追踪到的病人状态从大规模医疗知识图谱中进行检索,显式的状态,动作和医疗知识图谱中的推理路径提升了对话系统生成回复的可解释性。(3) The present invention proposes to use the tracked patient status to retrieve from a large-scale medical knowledge graph during the process of policy network learning. The explicit status, action and reasoning path in the medical knowledge graph improves the interpretability of the responses generated by the dialogue system.

(4)在模型训练上,本发明提出了两阶段层叠推理的方法,提升了监督训练数据较少的情况下的稳定性。(4) In model training, the present invention proposes a two-stage cascading reasoning method to improve the stability when there is less supervised training data.

本发明附加方面的优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本发明的实践了解到。Advantages of additional aspects of the present invention will be given in part in the following description, and in part will become obvious from the following description, or will be learned through practice of the present invention.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

构成本发明的一部分的说明书附图用来提供对本发明的进一步理解,本发明的示意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。The accompanying drawings in the specification, which constitute a part of the present invention, are used to provide a further understanding of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute improper limitations on the present invention.

图1(a)是本发明实施例的监督数据训练;FIG. 1( a ) is a diagram of supervised data training according to an embodiment of the present invention;

图1(b)是本发明实施例的无监督数据训练;FIG1( b ) is an unsupervised data training of an embodiment of the present invention;

图1(c)是本发明实施例的测试阶段使用的模块;FIG1( c ) is a module used in the test phase of an embodiment of the present invention;

图2是本发明实施例的医疗对话系统具体实施方法;FIG2 is a specific implementation method of the medical dialogue system according to an embodiment of the present invention;

图3是本发明实施例的训练过程中的模型示意图。FIG. 3 is a schematic diagram of a model during a training process according to an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合附图与实施例对本发明作进一步说明。The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

应该指出,以下详细说明都是例示性的,旨在对本发明提供进一步的说明。除非另有指明,本文使用的所有技术和科学术语具有与本发明所属技术领域的普通技术人员通常理解的相同含义。It should be noted that the following detailed descriptions are all illustrative and intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as those commonly understood by those skilled in the art to which the present invention belongs.

需要注意的是,这里所使用的术语仅是为了描述具体实施方式,而非意图限制根据本发明的示例性实施方式。如在这里所使用的,除非上下文另外明确指出,否则单数形式也意图包括复数形式,此外,还应当理解的是,当在本说明书中使用术语“包含”和/或“包括”时,其指明存在特征、步骤、操作、器件、组件和/或它们的组合。It should be noted that the terms used herein are only for describing specific embodiments and are not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, it indicates the presence of features, steps, operations, devices, components and/or combinations thereof.

术语解释:Terminology explanation:

编码器-解码器(Encoder-Decoder):一种神经网络结构,功能是将一个词序列编码后再解码转换成另一个词序列,主要用于机器翻译,对话系统等。Encoder-Decoder: A neural network structure that encodes a word sequence and then decodes it into another word sequence. It is mainly used in machine translation, dialogue systems, etc.

编码(encoding):将词序列表示成一个连续向量。Encoding: Representing a word sequence as a continuous vector.

解码(decoding):将一个连续向量表示成目标序列。Decoding: Represent a continuous vector as a target sequence.

期望(Expectation)：试验中每次可能的结果乘以其结果概率的总和，本发明中使用E[·]的形式进行表示。Expectation: the sum over every possible outcome of an experiment of that outcome multiplied by its probability; in the present invention it is written in the form E[·].

KL散度(KL Divergence):是两个概率分布之间的差别的非对称性度量,本发明采用KL(·||·)的形式表示,其计算公式如下:KL Divergence: It is an asymmetric measure of the difference between two probability distributions. The present invention uses the form of KL(·||·), and its calculation formula is as follows:

$\mathrm{KL}(q\,\|\,p)=\sum_{i} q(i)\log\frac{q(i)}{p(i)}$

其中q,p表示两个离散分布,q(i),p(i)分别表示分布q,p第i项概率值。Where q and p represent two discrete distributions, and q(i) and p(i) represent the probability values of the i-th item of distribution q and p respectively.

隐变量(Latent variable)：潜变量，或称隐变量、潜在变量，在统计学中表示不可观测的随机变量，与观测变量相对。Latent variable: also called a hidden variable; in statistics, a random variable that cannot be observed directly, as opposed to an observed variable.

训练阶段(Train):神经网络模型的训练阶段接收训练数据作为输入,通过训练样本来不断调整神经网络模型中的参数。Training phase (Train): The training phase of the neural network model receives training data as input and continuously adjusts the parameters in the neural network model through training samples.

测试阶段(Test)：神经网络模型在训练过后，在测试阶段通过训练过的神经网络模型参数输出输入数据对应的标签等信息。后面我们亦称之为部署阶段。Test phase: after the neural network model has been trained, the trained model parameters are used in the test phase to output the labels and other information corresponding to the input data. Later we also refer to this as the deployment phase.

实施例一Embodiment 1

本实施例提供了一种半监督的多轮医疗对话回复生成方法,其包括:This embodiment provides a semi-supervised multi-round medical dialogue response generation method, which includes:

将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;Input the patient's questions in the first round of dialogue into the semi-supervised medical dialogue model to obtain the responses in the first round of dialogue;

在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;In the second and subsequent rounds of dialogue, the patient's questions in the current round and the responses in the previous round are input into the semi-supervised medical dialogue model to obtain responses for the corresponding round of dialogue until the patient has no new questions input;

其中,半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器,上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中,先验状态追踪器用于不断追踪用户的身体状态,先验策略网络用于生成医师相应的动作,回复生成器用于根据身体状态及医师动作,生成对应的回复;Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori policy network, an inference policy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori policy network. The prior state tracker is used to continuously track the user's physical state. The priori policy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

上下文编码器用于对接收到的信息进行编码;对于第一轮对话的病人的问题进行直接编码;对于第二轮及其后对话病人的问题及相应上一轮对话的回复,编码形成上下文信息并输入至先验状态追踪器,推理状态追踪器,先验策略网络,推理策略网络和回复生成器五个模块中。The context encoder is used to encode the received information; the patient's questions in the first round of dialogue are directly encoded; for the patient's questions in the second and subsequent rounds of dialogue and the corresponding replies in the previous round of dialogue, the context information is encoded and input into the five modules of prior state tracker, reasoning state tracker, prior strategy network, reasoning strategy network and reply generator.

The input to the prior state tracker is the state instance $\hat S_{t-1}$ sampled from the output probability distribution $q_\phi(S_{t-1})$ of the inference state tracker in the previous dialogue round; its output is the probability distribution $p_\theta(S_t)$.

The input to the inference state tracker is the state instance $\hat S_{t-1}$ sampled from the output probability distribution $q_\phi(S_{t-1})$ of the inference state tracker in the previous dialogue round, together with the physician reply $R_t$ of the current round; its output is the probability distribution $q_\phi(S_t)$.

The input to the prior policy network is the state instance $\hat S_t$ sampled from the output probability distribution $q_\phi(S_t)$ of the inference state tracker in the current dialogue round, together with the external medical knowledge graph $G$; its output is the probability distribution $p_\theta(A_t)$.

The input to the inference policy network is the state instance $\hat S_t$ sampled from the output probability distribution $q_\phi(S_t)$ of the inference state tracker in the current dialogue round, together with the physician reply $R_t$ of the current round; its output is the probability distribution $q_\phi(A_t)$.

The input to the reply generator falls into two cases, the training phase and the test phase (i.e. deployment). In the training phase it receives the context encoding $c_t$ together with the instances $\hat S_t$ and $\hat A_t$ sampled from the probability distributions $q_\phi(S_t)$ and $q_\phi(A_t)$; in the test phase it receives $c_t$ together with the instances sampled from $p_\theta(S_t)$ and $p_\theta(A_t)$. Its output is the dialogue reply $R_t$.

在实际部署阶段,在每个对话轮中,给定病人的表述,医疗对话系统采用先验状态追踪器不断追踪用户的身体状态,并且使用先验策略网络生成医师相应的动作,最后回复生成器结合从先验状态追踪器和先验策略网络采样得到的状态以及动作生成对应的回复,对应图1(c)过程。对话进程一直持续到病人无新的问题输入,即病人主动结束当前对话。In the actual deployment stage, in each dialogue round, given the patient's statement, the medical dialogue system uses the prior state tracker to continuously track the user's physical state, and uses the prior policy network to generate the doctor's corresponding actions. Finally, the reply generator combines the state and actions sampled from the prior state tracker and the prior policy network to generate the corresponding reply, corresponding to the process in Figure 1(c). The dialogue process continues until the patient has no new questions to input, that is, the patient actively ends the current dialogue.

医疗对话系统有两个关键特征:患者状态(症状、药物等)和医师动作(治疗、诊断等)。这两个特征使得医疗对话系统比其他知识密集型对话场景更加复杂。与任务导向对话系统相似,医学对话生成过程拆分为以下三个阶段:Medical dialogue systems have two key features: patient status (symptoms, medications, etc.) and physician actions (treatment, diagnosis, etc.). These two features make medical dialogue systems more complex than other knowledge-intensive dialogue scenarios. Similar to task-oriented dialogue systems, the medical dialogue generation process is divided into the following three stages:

(1)病人状态追踪:对于给定的对话历史,对话系统追踪状态的身体状态(state);(1) Patient state tracking: For a given dialogue history, the dialogue system tracks the physical state of the patient.

(2)医师策略学习:给定病人状态和对话历史,对话系统给出当前医师的动作(action);(2) Physician strategy learning: Given the patient status and dialogue history, the dialogue system gives the current physician’s action;

(3)医疗回复生成:给定对话历史,追踪到的状态以及预测的动作,给出流畅并准确的自然语言回复。(3) Medical response generation: Given the conversation history, tracked states, and predicted actions, give fluent and accurate natural language responses.

对于存在标注数据的场景,在对话的第t轮,病人给出问题或者描述自己的症状Ut,后医疗对话系统接收前一轮的回复Rt-1,当前轮问题Ut和前一轮追踪到的状态St-1,然后输出当前轮的状态St,后再利用Rt-1UtSt输出当前轮医师应采取的动作At,最后生成自然语言形式的回复Rt反馈给病人。但是在医疗对话系统中,很多情况下,病人的生理状态和医师的动作是不存在标注的。故我们将状态和动作都视为隐变量,并且考虑到state贯穿整个对话过程,所以使用一个序列的词来表示;医师动作亦是如此,即医师的回复中可能包含多个关键词。实际操作过程中,状态和动作的长度被设置为固定的长度分别为|S|和|A|。并且状态存在一个初始值,为“<pad><pad>...<pad>”,其中“<pad>”表示一个填充词。State和Action的设计细节如下:For scenarios with labeled data, in the tth round of the conversation, the patient asks a question or describes his symptoms U t , and then the medical dialogue system receives the previous round's reply R t-1 , the current round's question U t and the state St-1 tracked in the previous round, and then outputs the current round's state St t , and then uses R t-1 U t St t to output the action At that the physician should take in the current round, and finally generates a natural language reply R t to feed back to the patient. However, in medical dialogue systems, in many cases, the patient's physiological state and the physician's action are not labeled. Therefore, we regard both state and action as hidden variables, and considering that state runs through the entire conversation process, we use a sequence of words to represent it; the same is true for physician actions, that is, the physician's reply may contain multiple keywords. In actual operation, the length of state and action is set to a fixed length of |S| and |A| respectively. And the state has an initial value, which is "<pad><pad>...<pad>", where "<pad>" represents a filler word. The design details of State and Action are as follows:

state的设计:state用于记录整个对话过程中的对话系统所获取到的用户身体状态的信息,其使用一个序列的词来表示,例如“感冒发热咳嗽夜汗......”,并且其初始化为“<pad><pad>......<pad>”。State design: state is used to record the information about the user's physical state obtained by the dialogue system during the entire dialogue process. It is represented by a sequence of words, such as "cold, fever, cough, night sweats...", and is initialized to "<pad><pad>......<pad>".

action的设计:action用于表示医师回复的概要,其亦使用一个序列的表示,例如“999感冒灵颗粒急支糖浆......”。Action design: Action is used to express the summary of the doctor's reply, which is also expressed as a sequence, such as "999 Ganmao Ling Granules Jizhi Syrup..."
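For illustration only, a minimal Python sketch of the fixed-length state/action text spans described above, initialized with "<pad>" tokens; the helper names and span lengths are assumptions, not the patent's implementation:

```python
PAD = "<pad>"

def init_span(length):
    """A text span is a fixed-length list of words, initialized with <pad> tokens."""
    return [PAD] * length

def update_span(span, new_words):
    """Overwrite the span with the newest tracked words, keeping its fixed length."""
    words = (new_words + span)[: len(span)]
    return words + [PAD] * (len(span) - len(words))

state = init_span(8)                                   # e.g. |S| = 8
state = update_span(state, ["感冒", "发热", "咳嗽", "夜汗"])
action = update_span(init_span(4), ["999感冒灵颗粒", "急支糖浆"])
print(state)   # ['感冒', '发热', '咳嗽', '夜汗', '<pad>', '<pad>', '<pad>', '<pad>']
print(action)  # ['999感冒灵颗粒', '急支糖浆', '<pad>', '<pad>']
```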

半监督医疗对话模型包含了六个模块，分别为上下文编码器(context encoder)，先验状态追踪器(prior state tracker)，推理状态追踪器(inference state tracker)，先验策略网络(prior policy network)，推理策略网络(inference policy network)和回复生成器(response generator)。在一整个医疗对话中，往往包含多次交互，以下过程经历多轮直至对话结束。The semi-supervised medical dialogue model consists of six modules: a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a response generator. A complete medical dialogue usually involves multiple interactions, and the following process is repeated over multiple rounds until the dialogue ends.

其中先验状态追踪器,推理状态追踪器用于病人状态追踪,其中推理状态追踪器只在训练阶段执行;先验策略网络,推理策略网络用于医师策略学习,其中推理策略网络只在训练阶段执行;回复生成器用于医疗回复生成。下面主要从无监督的角度,即使用无监督数据Du,对应图1(b)来描述各个模块的输入输出。The prior state tracker and the reasoning state tracker are used for patient state tracking, and the reasoning state tracker is only executed in the training phase; the prior policy network and the reasoning policy network are used for physician policy learning, and the reasoning policy network is only executed in the training phase; the response generator is used for medical response generation. The following mainly describes the input and output of each module from an unsupervised perspective, that is, using unsupervised data Du , corresponding to Figure 1(b).

In round $t$, the context encoder is a GRU-based (or LSTM-, Transformer- or BERT-based) encoder that takes the previous round's reply $R_{t-1}$ and the current round's patient question $U_t$ as input and outputs a continuous vector $c_t$ representing the dialogue context.

In round $t$, given the previous reply $R_{t-1}$ and the current patient question $U_t$ as input, the context encoder first uses a bidirectional GRU encoder to obtain a word-level sequence representation $H_t=\{h_{t,1},h_{t,2},\dots,h_{t,M+N}\}$ and then outputs a vector $c_t$ representing the dialogue context, where $M$ and $N$ are the lengths of the $R_{t-1}$ and $U_t$ sequences:

$H_t=\mathrm{BiGRU}\big([e(R_{t-1});e(U_t)]\big)$

$c_t=\mathrm{attn}(H_t)$

where $e(\cdot)$ denotes the word embedding of each word of $R_{t-1}$ and $U_t$, the BiGRU encoder is initialized with the context representation $c_{t-1}$ of the previous round, and attn[17] denotes the attention operation.
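For illustration only, a minimal PyTorch sketch of a BiGRU context encoder with attention pooling over the concatenated tokens of $R_{t-1}$ and $U_t$; the dimensions are arbitrary and the initialization from the previous round's context vector $c_{t-1}$ is omitted, so this is an assumption-laden sketch rather than the patented encoder:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)            # scores each token position

    def forward(self, tokens):
        # tokens: (batch, M+N) word ids of the concatenation [R_{t-1}; U_t]
        H, _ = self.bigru(self.emb(tokens))             # H_t: (batch, M+N, 2*hidden)
        weights = torch.softmax(self.attn(H), dim=1)    # attention over positions
        c_t = (weights * H).sum(dim=1)                  # context vector c_t: (batch, 2*hidden)
        return H, c_t

enc = ContextEncoder(vocab_size=1000)
H, c = enc(torch.randint(1, 1000, (2, 12)))             # batch of 2, 12 tokens each
print(H.shape, c.shape)                                  # torch.Size([2, 12, 256]) torch.Size([2, 256])
```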

The prior state tracker takes the context encoder output and the previous-round state $\hat S_{t-1}$ as input and uses a GRU-based decoder to output a sequence of words, i.e. the state $S_t$. The inference state tracker adopts a structure similar to the prior state tracker but additionally takes the current round's reply $R_t$ as input, and likewise outputs a word sequence. We use $p_\theta^{S}$ and $q_\phi^{S}$ to denote the prior state tracker and the inference state tracker, and abbreviate their generation distributions as $p_\theta(S_t)$ and $q_\phi(S_t)$.

Both the prior state tracker and the inference state tracker have an encoder-decoder structure. Without supervision the states of all dialogue rounds are unknown, and the state of each round depends on the state of the previous round as input, so we sample $\hat S_{t-1}$ from $q_\phi(S_{t-1})$ and feed it into both the prior state tracker and the inference state tracker.

The prior state tracker first encodes the sampled $\hat S_{t-1}$ into $e^{s}_{t-1}$ and initializes its decoder with $W_p[e^{s}_{t-1};c_t]$, where $W_p$ is a trainable parameter. At the $i$-th decoding step it outputs the hidden state $h^{s}_{t,i}$, and sequential decoding yields the prior distribution of $S_t$:

$p_\theta(S_t\mid \hat S_{t-1},c_t)=\prod_{i=1}^{|S|}p(s_{t,i}\mid s_{t,<i},\hat S_{t-1},c_t)=\prod_{i=1}^{|S|}\mathrm{softmax}\big(\mathrm{MLP}(h^{s}_{t,i})\big)\big[s_{t,i}\big]$

其中MLP表示的是多层感知机(Multi-Layer Perceptron)。|S|为状态text span的长度。Where MLP stands for Multi-Layer Perceptron. |S| is the length of the state text span.

The inference state tracker has a structure similar to that of the prior state tracker. It likewise uses a GRU encoder to encode $\hat S_{t-1}$ into $e^{s}_{t-1}$, and additionally encodes $R_t$ into $e^{r}_{t}$. It initializes its decoder with $W_q[e^{s}_{t-1};e^{r}_{t};c_t]$, where $W_q$ is a trainable parameter. At the $i$-th decoding step it outputs $\tilde h^{s}_{t,i}$, and sequential decoding yields the approximate posterior distribution of $S_t$:

$q_\phi(S_t\mid \hat S_{t-1},R_t,c_t)=\prod_{i=1}^{|S|}\mathrm{softmax}\big(\mathrm{MLP}(\tilde h^{s}_{t,i})\big)\big[s_{t,i}\big]$
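For illustration only, a minimal PyTorch sketch of the prior state tracker's decoding loop as reconstructed above (a GRU decoder initialized from the encoded previous state and the context vector, with an MLP/softmax output at each of the $|S|$ steps); the inference state tracker would differ only in additionally encoding $R_t$. All names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class PriorStateTracker(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, span_len=8):
        super().__init__()
        self.span_len = span_len                         # |S|: fixed length of the state span
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.prev_enc = nn.GRU(emb_dim, hidden, batch_first=True)   # encodes sampled S_{t-1}
        self.init_proj = nn.Linear(2 * hidden, hidden)               # plays the role of W_p
        self.decoder = nn.GRUCell(emb_dim, hidden)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, vocab_size))

    def forward(self, prev_state_ids, c_t, bos_id=1):
        # prev_state_ids: (batch, |S|) ids of the sampled previous state; c_t: (batch, hidden)
        _, e_prev = self.prev_enc(self.emb(prev_state_ids))          # (1, batch, hidden)
        h = torch.tanh(self.init_proj(torch.cat([e_prev.squeeze(0), c_t], dim=-1)))
        word = torch.full((prev_state_ids.size(0),), bos_id, dtype=torch.long)
        dists = []
        for _ in range(self.span_len):                   # greedy decoding of the state span
            h = self.decoder(self.emb(word), h)
            p_i = torch.softmax(self.mlp(h), dim=-1)     # p(s_{t,i} | s_{t,<i}, S_{t-1}, c_t)
            word = p_i.argmax(dim=-1)
            dists.append(p_i)
        return torch.stack(dists, dim=1)                 # (batch, |S|, vocab)

tracker = PriorStateTracker(vocab_size=1000)
dists = tracker(torch.randint(1, 1000, (2, 8)), torch.randn(2, 256))
print(dists.shape)                                        # torch.Size([2, 8, 1000])
```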

The prior policy network takes the context encoder output, the current round's state $S_t$ and the external medical knowledge graph $G$ as input, and uses a GRU-based decoder to output a sequence of words, i.e. the action $A_t$. The inference policy network has a similar structure: it takes the context encoder output and $S_t$, and additionally takes the current round's reply $R_t$ as input, then outputs a sequence of words. We use $p_\theta^{A}$ and $q_\phi^{A}$ to denote the prior policy network and the inference policy network, abbreviated as $p_\theta(A_t)$ and $q_\phi(A_t)$.

The prior policy network and the inference policy network also have an encoder-decoder structure. Both of them sample the state instance $\hat S_t$ from $q_\phi(S_t)$.

Before introducing the two policy networks, we first introduce a knowledge-graph retrieval operation qsub and a knowledge-graph encoding operation RGAT[15]. qsub uses the tracked state to retrieve a subgraph $G_n$ from the medical knowledge graph $G$: starting from the entities in the state, it extracts all nodes and edges reachable within $n$ hops and connects all nodes that appear in the state, so that $G_n$ is fully connected. RGAT is a graph encoding method that takes edge types into account and, after several rounds of propagation, produces an embedding for each node, i.e. a vector representation in a continuous space. We use $\{g_1,\dots,g_{|G_n|}\}$ to denote the node representations of the encoded $G_n$, where $|G_n|$ is the number of nodes in $G_n$.
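For illustration only, a minimal Python sketch of the n-hop subgraph retrieval qsub (breadth-first expansion from the entities mentioned in the tracked state); the adjacency-list graph format is an assumption, and the step that fully connects the state entities, as well as the RGAT encoding, are omitted:

```python
from collections import deque

def qsub(graph, state_entities, n_hops=2):
    """Retrieve the subgraph G_n reachable within n hops from the entities in the state.

    graph: dict mapping node -> list of (relation, neighbor) edges.
    Returns the set of kept nodes and the list of kept edges.
    """
    seeds = [e for e in state_entities if e in graph]
    nodes, edges = set(seeds), []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == n_hops:
            continue
        for rel, nbr in graph.get(node, []):
            edges.append((node, rel, nbr))
            if nbr not in nodes:
                nodes.add(nbr)
                frontier.append((nbr, depth + 1))
    return nodes, edges

# Toy medical graph: symptom -> disease -> drug.
G = {"发热": [("symptom_of", "感冒")],
     "咳嗽": [("symptom_of", "感冒")],
     "感冒": [("treated_by", "999感冒灵颗粒"), ("treated_by", "急支糖浆")]}
nodes, edges = qsub(G, ["发热", "咳嗽", "乏力"], n_hops=2)
print(nodes)   # {'发热', '咳嗽', '感冒', '999感冒灵颗粒', '急支糖浆'}
```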

The prior policy network uses a GRU encoder to encode the sampled $\hat S_t$ into $e^{s}_{t}$, and then initializes its decoder from $e^{s}_{t}$, the context vector $c_t$ and the RGAT-encoded subgraph $G_n$. At the $i$-th decoding step it outputs the hidden state $h^{a}_{t,i}$. The decoding process consists of two parts: one generates from the vocabulary, and the other copies from the retrieved knowledge subgraph $G_n$:

$p_{\mathrm{gen}}(a_{t,i})=\mathrm{softmax}\big(\mathrm{MLP}(h^{a}_{t,i})\big)\big[a_{t,i}\big]$

$p_{\mathrm{copy}}(a_{t,i})=\frac{1}{Z_A}\sum_{j=1}^{|G_n|} I(e_j,a_{t,i})\exp\big(g_j^{\top}W_c\,h^{a}_{t,i}\big)$

where $e_j$ denotes the $j$-th node in $G_n$, $g_j$ denotes the embedding of the $j$-th node in $G_n$, and $Z_A$ is the shared normalization term over generation and copying. $I(e_j,a_{t,i})=1$ when $e_j=a_{t,i}$, and $I(e_j,a_{t,i})=0$ otherwise.

则At的先验分布可以表示为:Then the prior distribution of At can be expressed as:

$p_\theta(A_t\mid \hat S_t,c_t,G)=\prod_{i=1}^{|A|}\big(p_{\mathrm{gen}}(a_{t,i})+p_{\mathrm{copy}}(a_{t,i})\big)$
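For illustration only, a minimal PyTorch sketch of the generate-or-copy scoring reconstructed above: generation scores come from the decoder hidden state, copy scores from a bilinear match against the node embeddings of $G_n$, and both share one normalizer. Shapes, names and the bilinear form are assumptions:

```python
import torch
import torch.nn as nn

class GenerateOrCopy(nn.Module):
    """Mix vocabulary generation with copying nodes of the retrieved subgraph G_n."""
    def __init__(self, hidden, vocab_size):
        super().__init__()
        self.gen = nn.Linear(hidden, vocab_size)    # scores for generating from the vocabulary
        self.copy = nn.Linear(hidden, hidden)       # bilinear map g_j^T W_c h

    def forward(self, h, node_emb, node_to_vocab):
        # h: (batch, hidden) decoder state; node_emb: (|G_n|, hidden) RGAT embeddings g_j
        # node_to_vocab: (|G_n|,) vocabulary id of each graph node e_j
        gen_scores = self.gen(h)                                    # (batch, vocab)
        copy_scores = node_emb @ self.copy(h).transpose(0, 1)       # (|G_n|, batch)
        copy_full = torch.zeros_like(gen_scores)
        copy_full.index_add_(1, node_to_vocab, copy_scores.transpose(0, 1).exp())
        probs = gen_scores.exp() + copy_full                        # shared normalizer Z_A
        return probs / probs.sum(dim=-1, keepdim=True)

layer = GenerateOrCopy(hidden=16, vocab_size=50)
p = layer(torch.randn(2, 16), torch.randn(5, 16), torch.tensor([3, 7, 7, 12, 20]))
print(p.shape, float(p.sum(dim=-1)[0]))                             # torch.Size([2, 50]) ~1.0
```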

The inference policy network uses a GRU encoder to encode $\hat S_t$ into $e^{s}_{t}$ and to encode $R_t$ into $e^{r}_{t}$, and then initializes its decoder from $[e^{s}_{t};e^{r}_{t};c_t]$. At the $i$-th decoding step it outputs $\tilde h^{a}_{t,i}$. To strengthen the influence of $R_t$ on the result, the approximate posterior distribution of $A_t$ only considers the direct generation probability:

$q_\phi(A_t\mid \hat S_t,R_t,c_t)=\prod_{i=1}^{|A|}\mathrm{softmax}\big(\mathrm{MLP}(\tilde h^{a}_{t,i})\big)\big[a_{t,i}\big]$

The reply generator is a GRU-based decoder that takes the context encoder output $c_t$, together with $S_t$ and $A_t$, as input and outputs the medical reply $R_t$. We use $p_\theta^{R}$ to denote the reply generator, and abbreviate its distribution as $p(R_t\mid S_t,A_t,c_t)$.

During the unsupervised training phase the reply generator only uses the outputs of the inference state tracker and the inference policy network. In unsupervised training we sample $\hat S_t$ and $\hat A_t$ from $q_\phi(S_t)$ and $q_\phi(A_t)$ respectively, encode them into $e^{s}_{t}$ and $e^{a}_{t}$, and initialize the decoder of the reply generator from $[e^{s}_{t};e^{a}_{t};c_t]$. At the $i$-th decoding step it outputs $h^{r}_{t,i}$, and the output probability of $R_t$ is:

$p(R_t\mid \hat S_t,\hat A_t,c_t)=\prod_{i=1}^{|R|}\big(p_{\mathrm{gen}}(r_{t,i})+p_{\mathrm{copy}}(r_{t,i})\big)$

where $p_{\mathrm{gen}}(r_{t,i})$ denotes the probability of generating from the vocabulary, $p_{\mathrm{copy}}(r_{t,i})$ denotes the probability of copying from $\hat S_t$, $\hat A_t$, $R_{t-1}$ and $U_t$, and $|R|$ is the length of the reply.

对于监督数据和无监督数据的训练损失函数分别为Lsup和Lun,其中Lun为:The training loss functions for supervised data and unsupervised data are L sup and L un respectively, where L un is:

$L_{un}=-\mathbb{E}_{q_\phi(S_t)\,q_\phi(A_t)}\big[\log p(R_t\mid \hat S_t,\hat A_t,c_t)\big]+\mathrm{KL}\big(q_\phi(S_t)\,\|\,p_\theta(S_t)\big)+\mathrm{KL}\big(q_\phi(A_t)\,\|\,p_\theta(A_t)\big)$

其中E[·]表示期望，KL(·||·)表示KL散度(Kullback-Leibler divergence)。Here E[·] denotes expectation, and KL(·||·) denotes the KL divergence (Kullback-Leibler divergence).
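For illustration only, a minimal PyTorch sketch of the reconstructed unsupervised objective $L_{un}$ for per-token categorical distributions; the tensor shapes and the factorized KL over decoding steps are assumptions:

```python
import torch

def kl_categorical(q, p, eps=1e-12):
    # q, p: (batch, steps, vocab) per-step categorical distributions
    return (q * ((q + eps).log() - (p + eps).log())).sum(dim=-1).sum(dim=-1).mean()

def unsupervised_loss(reply_logprob, q_state, p_state, q_action, p_action):
    """L_un = -E_q[log p(R_t | S_t, A_t, c_t)] + KL(q(S_t)||p(S_t)) + KL(q(A_t)||p(A_t))."""
    recon = -reply_logprob.mean()                 # reply_logprob: (batch,) log p(R_t | ...)
    return recon + kl_categorical(q_state, p_state) + kl_categorical(q_action, p_action)

# Toy check with random distributions.
B, S, A, V = 2, 4, 3, 10
q_s = torch.softmax(torch.randn(B, S, V), dim=-1)
p_s = torch.softmax(torch.randn(B, S, V), dim=-1)
q_a = torch.softmax(torch.randn(B, A, V), dim=-1)
p_a = torch.softmax(torch.randn(B, A, V), dim=-1)
loss = unsupervised_loss(torch.full((B,), -5.0), q_s, p_s, q_a, p_a)
print(float(loss))   # >= 5.0, since both KL terms are non-negative
```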

Considering the instability of training when the proportion of supervised data is small, i.e. that the prior policy network is easily misled by erroneous states sampled from the prior state tracker, the present invention proposes a two-stage cascaded inference training method that splits $L_{un}$ into several training parts. Since the policy networks depend on the output of the state trackers, the state-tracking objective is optimized first and the remaining modules are then optimized jointly, which improves stability during training. $L_{un}$ is split into the two training objectives $L_s$ and $L_a$:

$L_s=\mathrm{KL}\big(q_\phi(S_t)\,\|\,p_\theta(S_t)\big)-\mathbb{E}_{q_\phi(S_t)}\big[\log p(R_t\mid \hat S_t,c_t)\big]$

$L_a=\mathrm{KL}\big(q_\phi(A_t)\,\|\,p_\theta(A_t)\big)-\mathbb{E}_{q_\phi(S_t)\,q_\phi(A_t)}\big[\log p(R_t\mid \hat S_t,\hat A_t,c_t)\big]$

在第一训练阶段,最小化Ls提升模型状态追踪性能,第二阶段最小化Ls+La以维持状态追踪效果以及训练模型的策略学习能力。我们将其命名为两阶段层叠推理训练方法。In the first training phase, minimizing L s improves the model state tracking performance, and in the second phase, minimizing L s + L a maintains the state tracking effect and the policy learning ability of the training model. We name it the two-stage cascade reasoning training method.
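For illustration only, a sketch of how the two-stage cascaded schedule could be driven by global_step; the stage boundary is an assumed hyperparameter, not a value given in the patent:

```python
def two_stage_loss(loss_s, loss_a, global_step, stage_one_steps=10000):
    """Two-stage cascaded inference training: first optimize state tracking only (L_s),
    then keep L_s and add policy learning (L_s + L_a)."""
    if global_step < stage_one_steps:
        return loss_s                  # stage 1: stabilize the state trackers
    return loss_s + loss_a             # stage 2: add policy learning while keeping tracking

# Usage inside a training loop (global_step counts parameter updates):
# loss = two_stage_loss(L_s, L_a, global_step)
# loss.backward(); optimizer.step()
```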

图3训练过程中的模型示意图,global_step为一个整数用于记录训练经过轮数。Figure 3 is a schematic diagram of the model during training. global_step is an integer used to record the number of training rounds.

在半监督场景下,用于模型训练的对话数据存在监督和无监督数据两个部分,下面我们分别介绍对于监督数据Da和无监督数据Du的训练方法。In the semi-supervised scenario, the conversation data used for model training consists of supervised and unsupervised data. Below we introduce the training methods for supervised data Da and unsupervised data Du respectively.

(a)对于监督数据Da (a) For the supervised data Da

从Da采样训练样本构成训练所需要的小批量(即mini-batch),得到数据Rt-1,Ut,St-1,St,At,Rt。将对应输入数据送入上述的6个模块,对应图1中的(a)。采用Negative LogLikelihood(NLL)Loss来进行训练。实际的训练损失函数为:The training samples are sampled from Da to form the mini-batch required for training, and the data Rt -1 , Ut , St -1 , St , At , and Rt are obtained. The corresponding input data is sent to the above 6 modules, corresponding to (a) in Figure 1. Negative LogLikelihood (NLL) Loss is used for training. The actual training loss function is:

$L_{sup}=-\big[\log p_\theta(S_t)+\log q_\phi(S_t)+\log p_\theta(A_t)+\log q_\phi(A_t)+\log p(R_t\mid S_t,A_t,c_t)\big]$

(b)对于无监督数据Du (b) For the unsupervised data Du

Training samples are drawn from $D_u$ to form the mini-batches needed for training, giving the data $R_{t-1}$, $U_t$, $R_t$. The intermediate annotations $S_{t-1}$, $S_t$, $A_t$ are all missing because the data is unlabeled. We sample $\hat S_{t-1}$ from $q_\phi(S_{t-1})$ and feed it into the prior state tracker and the inference state tracker. We then sample $\hat S_t$ from $q_\phi(S_t)$ and feed it into the prior policy network and the inference policy network, and sample $\hat A_t$ from $q_\phi(A_t)$. Finally, the reply generator combines $\hat S_t$ and $\hat A_t$ with $R_{t-1}$ and $U_t$ to generate the reply $R_t$. The above process corresponds to (b) in Figure 1. The training loss is $L_{un}$ ($L_s+L_a$ can also be used as the training loss to improve training stability).

对于整个训练数据集D={Da,Du},具体训练步骤如下:For the entire training data set D = {D a , Du }, the specific training steps are as follows:

Step1:假设监督数据Da占全部训练数据D的比例为α(0≤α≤1),选择0-1之间的随机数,如果小于α转Step2,如果大于α转Step3。Step 1: Assume that the proportion of the supervised data Da to the total training data D is α (0≤α≤1), select a random number between 0 and 1. If it is less than α, go to Step 2; if it is greater than α, go to Step 3.

Step2:采用监督数据训练模型，对应(a)方式，训练loss为Lsup，梯度下降更新参数后转Step4。Step 2: use supervised data to train the model, corresponding to method (a), with training loss Lsup; update the parameters by gradient descent and then go to Step 4.

Step3:采用无监督数据训练模型，对应(b)方式，训练loss为Lun，梯度下降更新参数后转Step4。Step 3: use unsupervised data to train the model, corresponding to method (b), with training loss Lun; update the parameters by gradient descent and then go to Step 4.

Step4:判断模型是否收敛,若收敛则转Step5,否则转Step1。Step 4: Determine whether the model converges. If so, go to Step 5, otherwise go to Step 1.

Step5:保存模型权重,结束训练,如图3所示。Step 5: Save the model weights and end the training, as shown in Figure 3.
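For illustration only, a minimal Python sketch of the Step 1 to Step 5 schedule: with probability α a supervised mini-batch is drawn from Da (loss Lsup), otherwise an unsupervised mini-batch from Du (loss Lun), until convergence. The model and optimizer interfaces (supervised_loss, unsupervised_loss, converged, save_weights) are assumed placeholders for the modules described above:

```python
import random

def train(model, sup_batches, unsup_batches, optimizer, alpha, max_steps=100000):
    """Semi-supervised training schedule corresponding to Step 1 - Step 5."""
    for step in range(max_steps):
        if random.random() < alpha:                  # Step 2: supervised data D_a
            batch = next(sup_batches)
            loss = model.supervised_loss(batch)      # L_sup (NLL over S_t, A_t, R_t)
        else:                                        # Step 3: unsupervised data D_u
            batch = next(unsup_batches)
            loss = model.unsupervised_loss(batch)    # L_un (or L_s + L_a)
        optimizer.zero_grad()
        loss.backward()                              # gradient descent update
        optimizer.step()
        if model.converged():                        # Step 4: convergence check
            break
    model.save_weights()                             # Step 5: save model weights
```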

使用目前工业界和学术界内公开的医疗对话数据集,训练得到半监督医疗对话模型。其中,对于采样得到的监督数据和无监督数据,送入模型中,算出对应的损失函数后进行梯度下降,优化模型参数。The semi-supervised medical dialogue model is trained using the currently public medical dialogue datasets in the industry and academia. The sampled supervised data and unsupervised data are fed into the model, and the corresponding loss function is calculated and then gradient descent is performed to optimize the model parameters.

After model training is completed, all model parameters are fixed and the inference state tracker and the inference policy network can be discarded. At this point the model can be applied to real dialogue scenarios. As shown in Figure 2, given a patient question as input, the context encoder, the prior state tracker, the prior policy network and the reply generator work in sequence (at this point the reply generator only takes the state $\hat S_t$ sampled from the prior state tracker output $p_\theta(S_t)$ and the action $\hat A_t$ sampled from the prior policy network output $p_\theta(A_t)$ as input), and finally a reply is generated and returned to the user. The dialogue system keeps interacting with the patient: in each dialogue round the prior state tracker first uses the state of the previous round as input and then updates the tracked physical state of the patient; if no new patient question is received after a waiting period, the current session ends.
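For illustration only, a minimal Python sketch of the deployment loop described above: only the context encoder, prior state tracker, prior policy network and reply generator are used, the state is carried across rounds, and the session ends when no new patient question arrives. All module interfaces are assumptions:

```python
def dialogue_session(encoder, prior_tracker, prior_policy, generator, get_user_input,
                     pad_state, knowledge_graph):
    """Multi-round inference: the inference-side modules are discarded after training."""
    state, prev_reply = pad_state, ""
    while True:
        question = get_user_input()               # None / empty when the patient stops asking
        if not question:
            break                                 # end of the current session
        H, c_t = encoder(prev_reply, question)    # encode R_{t-1} and U_t
        state = prior_tracker.sample(state, c_t)                     # S_t ~ p_theta(S_t)
        action = prior_policy.sample(state, c_t, knowledge_graph)    # A_t ~ p_theta(A_t)
        prev_reply = generator.generate(state, action, c_t)          # R_t
        yield prev_reply
```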

实施例二Embodiment 2

一种半监督的多轮医疗对话回复生成系统,其包括:A semi-supervised multi-turn medical dialogue response generation system, comprising:

第一轮对话回复生成模块,其用于将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;A first-round dialogue response generation module, which is used to input the patient's questions in the first-round dialogue into the semi-supervised medical dialogue model to obtain the responses to the first-round dialogue;

第二轮及其后对话回复生成模块,其用于在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;The second and subsequent dialogue response generation module is used to input the patient's questions in the current round and the responses in the previous round of dialogue into the semi-supervised medical dialogue model in the second and subsequent dialogues, and obtain the responses in the corresponding round of dialogues until the patient has no new questions input;

其中，半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器，上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中，先验状态追踪器用于不断追踪用户的身体状态，先验策略网络用于生成医师相应的动作，回复生成器用于根据身体状态及医师动作，生成对应的回复；Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori strategy network, an inference strategy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori strategy network. The priori state tracker is used to continuously track the user's physical state. The priori strategy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

本实施例中的各个模块,与实施例一中的各个步骤一一对应,其具体实施过程相同,此处不再累述。Each module in this embodiment corresponds to each step in the first embodiment one by one, and the specific implementation process is the same, which will not be repeated here.

实施例三Embodiment 3

本实施例提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。This embodiment provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps in the semi-supervised multi-round medical dialogue response generation method as described above.

实施例四Embodiment 4

本实施例提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the steps in the semi-supervised multi-round medical dialogue response generation method as described above are implemented.

Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

A person of ordinary skill in the art will understand that all or part of the processes of the above embodiment methods may be implemented by instructing the relevant hardware through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (8)

1. A semi-supervised multi-round medical dialogue reply generation method, characterized in that it comprises: inputting the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply for the first round; in the second and subsequent rounds, inputting the patient's question of the current round together with the reply of the previous round into the semi-supervised medical dialogue model to obtain the reply for the corresponding round, until the patient enters no new question; wherein the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network, and a reply generator; the context encoder encodes the received information and feeds it to the prior state tracker and the prior policy network; the prior state tracker continuously tracks the user's physical state; the inputs of the prior policy network are the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue and the external medical knowledge graph G; the decoding process of the prior policy network comprises two parts, one generating from the vocabulary and the other copying from the retrieved knowledge graph Gn:
[The generation and copy probability formulas are rendered as images in the original filing (FDA0004126833400000011, FDA0004126833400000012): the first gives the probability of generating the action token from the vocabulary, the second the probability of copying it from the retrieved knowledge graph Gn.]

wherein the dialogue context is represented by a continuous-space vector (symbol rendered as an image); the prior policy network uses a GRU autoencoder to encode the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue into a hidden representation; MLP denotes a multi-layer perceptron; a further image-rendered symbol denotes the output of the prior policy network at the i-th decoding step; ej denotes the j-th node in Gn, and gj denotes the word embedding of the j-th node in Gn; ZA is the normalization term of the generation–copy distribution; I(ej, At,i) = 1 when ej = At,i, and I(ej, At,i) = 0 otherwise (the individual symbols appear as images FDA0004126833400000013–FDA0004126833400000019 in the filing);

the prior policy network is used to generate the physician's actions and outputs a probability distribution (formula rendered as images FDA0004126833400000021 and FDA0004126833400000022 in the original filing);
wherein |A| denotes the length of the action;

the reply generator is used to generate the corresponding reply according to the physical state and the physician's action;

the inference state tracker is used to infer the user's physical state, and the inference policy network is used to infer the physician's action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model;

during unsupervised training, two instances are sampled from two probability distributions (defined below as the output distributions of the inference state tracker and the inference policy network in the current round), encoded, and used to initialize the decoder of the reply generator; given the decoder output at the i-th decoding step, the output probability of Rt is obtained as (the corresponding symbols appear as images FDA0004126833400000023–FDA00041268334000000211 in the filing):

[Formula rendered as an image in the original filing (FDA00041268334000000212): the output probability of Rt.]
wherein the first term denotes the probability of generating from the vocabulary and the second the probability of copying from the sampled instance, Rt-1, and Ut; |R| is the length of the reply; Rt denotes the physician's reply of the current round, Rt-1 the physician's reply of the previous round, and Ut the question of the current round; the remaining image-rendered symbols (FDA00041268334000000213–FDA00041268334000000219 in the filing) denote, respectively, the instance sampled from its probability distribution, the output probability distribution of the inference state tracker in the current round of dialogue, and the output probability distribution of the inference policy network in the current round of dialogue;
according to the two-stage cascaded inference training method, the training loss function Lun on unsupervised data is split into two training objectives, Ls and La; because the policy network depends on the output of the state tracker, the inference state tracker and the inference policy network are optimized first, and the remaining modules are then optimized simultaneously, where

[Loss formulas rendered as images in the original filing (FDA0004126833400000031, FDA0004126833400000032, FDA0004126833400000033).]

wherein E[·] denotes expectation and KL(·||·) denotes the Kullback–Leibler divergence; At denotes the action the physician should take in the current round; St denotes the state output for the current round; St-1 denotes the state tracked in the previous round; further image-rendered symbols (FDA0004126833400000034–FDA0004126833400000037) denote the output probability distribution of the inference state tracker in the previous round of dialogue, the output probability distribution of the inference state tracker in the current round of dialogue, the prior distribution of At, and the reply generator;

the first stage minimizes Ls to improve the model's state-tracking performance, and the second stage minimizes Ls + La to maintain the state-tracking effect while training the model's policy-learning ability (a schematic sketch of this two-stage schedule follows the claims).
2. The semi-supervised multi-round medical dialogue reply generation method according to claim 1, characterized in that the inference state tracker and the inference policy network are both encoder-decoder structures.

3. The semi-supervised multi-round medical dialogue reply generation method according to claim 1, characterized in that the prior state tracker and the prior policy network are both encoder-decoder structures.

4. The semi-supervised multi-round medical dialogue reply generation method according to claim 1, characterized in that the reply generator is a GRU-based decoder.

5. A semi-supervised multi-round medical dialogue reply generation system, characterized in that it comprises: a first-round dialogue reply generation module, configured to input the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply for the first round; a second-and-subsequent-round dialogue reply generation module, configured to, in the second and subsequent rounds, input the patient's question of the current round together with the reply of the previous round into the semi-supervised medical dialogue model to obtain the reply for the corresponding round, until the patient enters no new question; wherein the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network, and a reply generator; the context encoder encodes the received information and feeds it to the prior state tracker and the prior policy network; the prior state tracker continuously tracks the user's physical state; the inputs of the prior policy network are the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue and the external medical knowledge graph G; the decoding process of the prior policy network comprises two parts, one generating from the vocabulary and the other copying from the retrieved knowledge graph Gn:
[The generation and copy probability formulas are rendered as images in the original filing (FDA0004126833400000041, FDA0004126833400000042): the first gives the probability of generating the action token from the vocabulary, the second the probability of copying it from the retrieved knowledge graph Gn.]

wherein the dialogue context is represented by a continuous-space vector (symbol rendered as an image); the prior policy network uses a GRU autoencoder to encode the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue into a hidden representation; MLP denotes a multi-layer perceptron; a further image-rendered symbol denotes the output of the prior policy network at the i-th decoding step; ej denotes the j-th node in Gn, and gj denotes the word embedding of the j-th node in Gn; ZA is the normalization term of the generation–copy distribution; I(ej, At,i) = 1 when ej = At,i, and I(ej, At,i) = 0 otherwise (the individual symbols appear as images FDA0004126833400000043–FDA0004126833400000052 in the filing);

the prior policy network is used to generate the physician's actions and outputs a probability distribution (formula rendered as images FDA0004126833400000053 and FDA0004126833400000054 in the original filing);
wherein |A| denotes the length of the action;

the reply generator is used to generate the corresponding reply according to the physical state and the physician's action;

the inference state tracker is used to infer the user's physical state, and the inference policy network is used to infer the physician's action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model;

during unsupervised training, two instances are sampled from two probability distributions (defined below as the output distributions of the inference state tracker and the inference policy network in the current round), encoded, and used to initialize the decoder of the reply generator; given the decoder output at the i-th decoding step, the output probability of Rt is obtained as (the corresponding symbols appear as images FDA0004126833400000055–FDA00041268334000000513 in the filing):

[Formula rendered as an image in the original filing (FDA00041268334000000514): the output probability of Rt.]
wherein the first term denotes the probability of generating from the vocabulary and the second the probability of copying from the sampled instance, Rt-1, and Ut; |R| is the length of the reply; Rt denotes the physician's reply of the current round, Rt-1 the physician's reply of the previous round, and Ut the question of the current round; the remaining image-rendered symbols (FDA00041268334000000515–FDA00041268334000000521 in the filing) denote, respectively, the instance sampled from its probability distribution, the output probability distribution of the inference state tracker in the current round of dialogue, and the output probability distribution of the inference policy network in the current round of dialogue;
according to the two-stage cascaded inference training method, the training loss function Lun on unsupervised data is split into two training objectives, Ls and La; because the policy network depends on the output of the state tracker, the inference state tracker and the inference policy network are optimized first, and the remaining modules are then optimized simultaneously, where

[Loss formulas rendered as images in the original filing (FDA0004126833400000061, FDA0004126833400000062, FDA0004126833400000063).]

wherein E[·] denotes expectation and KL(·||·) denotes the Kullback–Leibler divergence; At denotes the action the physician should take in the current round; St denotes the state output for the current round; St-1 denotes the state tracked in the previous round; further image-rendered symbols (FDA0004126833400000064–FDA0004126833400000067) denote the output probability distribution of the inference state tracker in the previous round of dialogue, the output probability distribution of the inference state tracker in the current round of dialogue, the prior distribution of At, and the reply generator;

the first stage minimizes Ls to improve the model's state-tracking performance, and the second stage minimizes Ls + La to maintain the state-tracking effect while training the model's policy-learning ability.
6. The semi-supervised multi-round medical dialogue reply generation system according to claim 5, characterized in that the inference state tracker and the inference policy network are both encoder-decoder structures, and the prior state tracker and the prior policy network are both encoder-decoder structures.

7. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the steps of the semi-supervised multi-round medical dialogue reply generation method according to any one of claims 1-4.

8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the semi-supervised multi-round medical dialogue reply generation method according to any one of claims 1-4.
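
For readers prototyping the two-stage cascaded training recited in claim 1 (and mirrored in claim 5), the sketch below shows only the optimization schedule. It assumes PyTorch-style optimizers and differentiable `loss_s` / `loss_a` callables computing Ls and La on a batch of unlabeled dialogues; the exact loss definitions (image-rendered formulas in the filing) are not reproduced, and the epoch counts are placeholders.

```python
def two_stage_cascaded_training(unlabeled_batches, loss_s, loss_a,
                                optimizer_inference, optimizer_remaining,
                                stage1_epochs=1, stage2_epochs=1):
    """Two-stage schedule splitting the unsupervised objective L_un into L_s and L_a."""
    # Stage 1: optimize only the inference state tracker and the inference
    # policy network (parameters held by optimizer_inference) by minimizing
    # L_s, because the policy network depends on the state tracker's output.
    for _ in range(stage1_epochs):
        for batch in unlabeled_batches:
            optimizer_inference.zero_grad()
            loss_s(batch).backward()
            optimizer_inference.step()

    # Stage 2: optimize the remaining modules (parameters held by
    # optimizer_remaining) with L_s + L_a, which keeps the state-tracking
    # behaviour while training the policy-learning ability.
    for _ in range(stage2_epochs):
        for batch in unlabeled_batches:
            optimizer_remaining.zero_grad()
            (loss_s(batch) + loss_a(batch)).backward()
            optimizer_remaining.step()
```
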
CN202110577272.8A 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system Active CN113436752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110577272.8A CN113436752B (en) 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110577272.8A CN113436752B (en) 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system

Publications (2)

Publication Number Publication Date
CN113436752A CN113436752A (en) 2021-09-24
CN113436752B true CN113436752B (en) 2023-04-28

Family

ID=77802906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110577272.8A Active CN113436752B (en) 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system

Country Status (1)

Country Link
CN (1) CN113436752B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111710150A (en) * 2020-05-14 2020-09-25 国网江苏省电力有限公司南京供电分公司 A method for detecting abnormal electricity consumption data based on adversarial self-encoding network
CN111797220A (en) * 2020-07-30 2020-10-20 腾讯科技(深圳)有限公司 Dialog generation method and device, computer equipment and storage medium
CN111897941A (en) * 2020-08-14 2020-11-06 腾讯科技(深圳)有限公司 Dialog generation method, network training method, device, storage medium and equipment
CN112464645A (en) * 2020-10-30 2021-03-09 中国电力科学研究院有限公司 Semi-supervised learning method, system, equipment, storage medium and semantic analysis method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309275B (en) * 2018-03-15 2024-06-14 北京京东尚科信息技术有限公司 Dialog generation method and device
CN109582767B (en) * 2018-11-21 2024-05-17 北京京东尚科信息技术有限公司 Dialogue system processing method, device, equipment and readable storage medium
CN109977212B (en) * 2019-03-28 2020-11-24 清华大学深圳研究生院 Reply content generation method of conversation robot and terminal equipment
CN109992657B (en) * 2019-04-03 2021-03-30 浙江大学 A Conversational Question Generation Method Based on Enhanced Dynamic Reasoning
CN109933661B (en) * 2019-04-03 2020-12-18 上海乐言信息科技有限公司 Semi-supervised question-answer pair induction method and system based on deep generation model
CN110297895B (en) * 2019-05-24 2021-09-17 山东大学 Dialogue method and system based on free text knowledge
CN110321417B (en) * 2019-05-30 2021-06-11 山东大学 Dialog generation method, system, readable storage medium and computer equipment
CN111428483B (en) * 2020-03-31 2022-05-24 华为技术有限公司 Voice interaction method, device and terminal device
CN111767383B (en) * 2020-07-03 2022-07-08 思必驰科技股份有限公司 Conversation state tracking method, system and man-machine conversation method
CN112164476A (en) * 2020-09-28 2021-01-01 华南理工大学 A method for generating medical consultation dialogue based on multitasking and knowledge guidance
CN112289467B (en) * 2020-11-17 2022-08-02 中山大学 Low-resource scene migratable medical inquiry dialogue system and method


Also Published As

Publication number Publication date
CN113436752A (en) 2021-09-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant