CN113436752B - Semi-supervised multi-round medical dialogue reply generation method and system - Google Patents

Semi-supervised multi-round medical dialogue reply generation method and system

Info

Publication number
CN113436752B
Authority
CN
China
Prior art keywords
dialogue
round
state
inference
semi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110577272.8A
Other languages
Chinese (zh)
Other versions
CN113436752A (en)
Inventor
任昭春
任鹏杰
陈竹敏
李冬冬
马军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202110577272.8A priority Critical patent/CN113436752B/en
Publication of CN113436752A publication Critical patent/CN113436752A/en
Application granted granted Critical
Publication of CN113436752B publication Critical patent/CN113436752B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the field of conversational information processing, and provides a semi-supervised multi-round medical dialogue reply generation method and system. The method comprises: inputting the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply of the first round of dialogue; and, in the second and subsequent rounds, inputting the current round's patient question together with the reply of the previous round into the semi-supervised medical dialogue model to obtain the reply of the corresponding round, until the patient inputs no new question. The semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a reply generator; the context encoder encodes the received information and feeds it into the prior state tracker and the prior policy network, the prior state tracker continuously tracks the user's physical state, the prior policy network generates the corresponding doctor action, and the reply generator generates the corresponding reply according to the physical state and the doctor action.

Description

一种半监督的多轮医疗对话回复生成方法及系统A semi-supervised multi-round medical dialogue response generation method and system

技术领域Technical Field

本发明属于对话式信息处理领域,尤其涉及一种半监督的多轮医疗对话回复生成方法及系统。The present invention belongs to the field of conversational information processing, and in particular relates to a semi-supervised multi-round medical conversation response generation method and system.

背景技术Background Art

本部分的陈述仅仅是提供了与本发明相关的背景技术信息,不必然构成在先技术。The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

同时为了解决开放领域的信息需求和高度垂直领域的专业需求,会话范式被用来将人们与信息联系起来。现有的对话系统可分为两大类:面向任务的和开放域对话系统。以任务为导向的对话系统旨在帮助人们完成特定的任务。例如日程安排,订餐馆,查询天气。开放域对话系统主要是与人们聊天,用于满足人们对信息和娱乐的需求。不同于医疗问答,真实医学场景中的对话更可能包含多轮交互。因为患者需要通过对话的上下文来表达他/她的症状、他/她正在服用的药物和他/她的病史。这一特性使得显式状态追踪变得不可或缺,其提供了比隐状态表示更具指示性和可解释性的信息。考虑到医学对话的特殊性,医学推理能力(例如是否开药,开什么药治疗疾病,询问何种症状)也是医疗诊断中不可或缺的特性。In order to address both the information needs of open domains and the professional needs of highly vertical domains, the conversational paradigm is used to connect people with information. Existing dialogue systems can be divided into two categories: task-oriented and open-domain dialogue systems. Task-oriented dialogue systems are designed to help people complete specific tasks. For example, scheduling, booking restaurants, and checking the weather. Open-domain dialogue systems are mainly used to chat with people to meet people's needs for information and entertainment. Unlike medical Q&A, conversations in real medical scenarios are more likely to contain multiple rounds of interactions. Because the patient needs to express his/her symptoms, the medicines he/she is taking, and his/her medical history through the context of the conversation. This feature makes explicit state tracking indispensable, which provides more indicative and interpretable information than hidden state representation. Considering the particularity of medical dialogue, medical reasoning capabilities (such as whether to prescribe medicine, what medicine to prescribe to treat the disease, and what symptoms to ask) are also indispensable features in medical diagnosis.

现有的医疗对话方法是基于任务导向的对话范式构建,遵循的是患者表达症状的,对话系统返回诊断结果(即确定病人患了什么疾病)的范式。其取得了很好的效果。但这些方法只聚焦于诊断这一单一领域,无法满足实际应用中病人的多种需求,而且其需要大量人工标注的状态和动作。当对话数据高度机密或数据规模巨大时是无法实现的,并且这些工作受限于训练数据规模的影响,甚至无法使用生成式的方法来生成回复,只能通过模板的方式来组成回复。一些任务型对话的方法可以应用于医疗对话中的状态追踪,但是其依旧无法应对无充分标注数据的情景。为了减轻任务导向对话系统对于数据标注的需求,Jin等和Zhang等都使用了半监督的学习方法来进行状态追踪,但忽视了对话主体的推理能力,即未建模医师的动作。Liang等提出一种利用未完全标注的数据来训练任务导向对话系统中的特定模块的方法,但是无法在训练时刻推理出未标注的标签,致使其在医疗对话系统中同时无状态和动作标注的情况下提升有限。发明人发现,这些方法都未考虑从大规模医疗知识中进行检索,未能生成富含知识的回复,在医学对话这种对于推理能力有很强需求的场景中表现很差。Existing medical dialogue methods are based on the task-oriented dialogue paradigm, following the paradigm that the patient expresses symptoms and the dialogue system returns the diagnosis result (i.e., determines what disease the patient has). It has achieved good results. However, these methods only focus on the single field of diagnosis, which cannot meet the various needs of patients in actual applications, and require a large number of manually labeled states and actions. This is impossible when the dialogue data is highly confidential or the data scale is huge, and these works are limited by the impact of the training data scale, and even generative methods cannot be used to generate replies, and replies can only be composed through templates. Some task-based dialogue methods can be applied to state tracking in medical dialogues, but they still cannot cope with scenarios without sufficient labeled data. In order to reduce the demand for data labeling in task-oriented dialogue systems, Jin et al. and Zhang et al. both used semi-supervised learning methods for state tracking, but ignored the reasoning ability of the dialogue subject, that is, the actions of the physician were not modeled. Liang et al. proposed a method to use incompletely labeled data to train specific modules in task-oriented dialogue systems, but could not infer unlabeled labels at the time of training, resulting in limited improvement in medical dialogue systems without both state and action labeling. The inventors found that these methods did not consider retrieval from large-scale medical knowledge, failed to generate knowledge-rich responses, and performed poorly in scenarios such as medical dialogues that have strong demands for reasoning ability.

发明内容Summary of the invention

为了解决上述背景技术中存在的技术问题,本发明提供一种半监督的多轮医疗对话回复生成方法及系统,其同时考虑了病人状态和医师动作,使得对话系统同时具备了建模用户身体状态和医学推理的能力。In order to solve the technical problems existing in the above-mentioned background technology, the present invention provides a semi-supervised multi-round medical dialogue response generation method and system, which takes into account both the patient status and the doctor's actions, so that the dialogue system has the ability to model the user's physical status and medical reasoning at the same time.

为了实现上述目的,本发明采用如下技术方案:In order to achieve the above object, the present invention adopts the following technical solution:

本发明的第一个方面提供一种半监督的多轮医疗对话回复生成方法。A first aspect of the present invention provides a semi-supervised method for generating responses to multi-round medical conversations.

一种半监督的多轮医疗对话回复生成方法,其包括:A semi-supervised multi-round medical dialogue response generation method, comprising:

将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;Input the patient's questions in the first round of dialogue into the semi-supervised medical dialogue model to obtain the responses in the first round of dialogue;

在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;In the second and subsequent rounds of dialogue, the patient's questions in the current round and the responses in the previous round are input into the semi-supervised medical dialogue model to obtain responses for the corresponding round of dialogue until the patient has no new questions input;

其中,半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器,上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中,先验状态追踪器用于不断追踪用户的身体状态,先验策略网络用于生成医师相应的动作,回复生成器用于根据身体状态及医师动作,生成对应的回复;Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori policy network, an inference policy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori policy network. The prior state tracker is used to continuously track the user's physical state. The priori policy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

本发明的第二个方面提供一种半监督的多轮医疗对话回复生成系统。A second aspect of the present invention provides a semi-supervised multi-turn medical dialogue response generation system.

一种半监督的多轮医疗对话回复生成系统,其包括:A semi-supervised multi-turn medical dialogue response generation system, comprising:

第一轮对话回复生成模块,其用于将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;A first-round dialogue response generation module, which is used to input the patient's questions in the first-round dialogue into the semi-supervised medical dialogue model to obtain the responses to the first-round dialogue;

第二轮及其后对话回复生成模块,其用于在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;The second and subsequent dialogue response generation module is used to input the patient's questions in the current round and the responses in the previous round of dialogue into the semi-supervised medical dialogue model in the second and subsequent dialogues, and obtain the responses in the corresponding round of dialogues until the patient has no new questions input;

其中,半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器,上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中,先验状态追踪器用于不断追踪用户的身体状态,先验策略网络用于生成医师相应的动作,回复生成器用于根据身体状态及医师动作,生成对应的回复;Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori policy network, an inference policy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori policy network. The prior state tracker is used to continuously track the user's physical state. The priori policy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

本发明的第三个方面提供一种计算机可读存储介质。A third aspect of the present invention provides a computer-readable storage medium.

一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。A computer-readable storage medium stores a computer program, which, when executed by a processor, implements the steps in the semi-supervised multi-round medical dialogue response generation method as described above.

本发明的第四个方面提供一种计算机设备。A fourth aspect of the present invention provides a computer device.

一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the steps in the semi-supervised multi-round medical dialogue response generation method as described above are implemented.

与现有技术相比,本发明的有益效果是:Compared with the prior art, the present invention has the following beneficial effects:

(1)本发明在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入,显式建模了用户的身体状态以及医师的动作,使用text span来进行表示,提升了模型对于病人生理状态建模和医疗推理的能力。(1) In the second and subsequent rounds of dialogue, the present invention inputs the patient's questions in the current round and the replies in the previous round of dialogue into the semi-supervised medical dialogue model to obtain replies for the corresponding round of dialogue until the patient has no new questions to input. The user's physical state and the doctor's actions are explicitly modeled and represented by text spans, thereby improving the model's ability to model the patient's physiological state and conduct medical reasoning.

(2)本发明在模型层面,将用户的身体状态和医师动作当做隐变量,并且提出了存在中间标注(即监督)和不存在中间标注(即无监督)的情况下,模型的训练方法。该方法大大减小了对话模型对于标注数据的依赖。(2) At the model level, the present invention treats the user's physical state and the doctor's actions as latent variables, and proposes a model training method with and without intermediate annotations (i.e., supervision). This method greatly reduces the dependence of the dialogue model on labeled data.

(3)本发明提出在策略网络学习的过程中,使用追踪到的病人状态从大规模医疗知识图谱中进行检索,显式的状态,动作和医疗知识图谱中的推理路径提升了对话系统生成回复的可解释性。(3) The present invention proposes to use the tracked patient status to retrieve from a large-scale medical knowledge graph during the process of policy network learning. The explicit status, action and reasoning path in the medical knowledge graph improves the interpretability of the responses generated by the dialogue system.

(4)在模型训练上,本发明提出了两阶段层叠推理的方法,提升了监督训练数据较少的情况下的稳定性。(4) In model training, the present invention proposes a two-stage cascading reasoning method to improve the stability when there is less supervised training data.

本发明附加方面的优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本发明的实践了解到。Advantages of additional aspects of the present invention will be given in part in the following description, and in part will become obvious from the following description, or will be learned through practice of the present invention.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

构成本发明的一部分的说明书附图用来提供对本发明的进一步理解,本发明的示意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。The accompanying drawings in the specification, which constitute a part of the present invention, are used to provide a further understanding of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute improper limitations on the present invention.

图1(a)是本发明实施例的监督数据训练;FIG. 1( a ) is a diagram of supervised data training according to an embodiment of the present invention;

图1(b)是本发明实施例的无监督数据训练;FIG1( b ) is an unsupervised data training of an embodiment of the present invention;

图1(c)是本发明实施例的测试阶段使用的模块;FIG1( c ) is a module used in the test phase of an embodiment of the present invention;

图2是本发明实施例的医疗对话系统具体实施方法;FIG2 is a specific implementation method of the medical dialogue system according to an embodiment of the present invention;

图3是本发明实施例的训练过程中的模型示意图。FIG. 3 is a schematic diagram of a model during a training process according to an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合附图与实施例对本发明作进一步说明。The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

应该指出,以下详细说明都是例示性的,旨在对本发明提供进一步的说明。除非另有指明,本文使用的所有技术和科学术语具有与本发明所属技术领域的普通技术人员通常理解的相同含义。It should be noted that the following detailed descriptions are all illustrative and intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as those commonly understood by those skilled in the art to which the present invention belongs.

需要注意的是,这里所使用的术语仅是为了描述具体实施方式,而非意图限制根据本发明的示例性实施方式。如在这里所使用的,除非上下文另外明确指出,否则单数形式也意图包括复数形式,此外,还应当理解的是,当在本说明书中使用术语“包含”和/或“包括”时,其指明存在特征、步骤、操作、器件、组件和/或它们的组合。It should be noted that the terms used herein are only for describing specific embodiments and are not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, it indicates the presence of features, steps, operations, devices, components and/or combinations thereof.

术语解释:Terminology explanation:

编码器-解码器(Encoder-Decoder):一种神经网络结构,功能是将一个词序列编码后再解码转换成另一个词序列,主要用于机器翻译,对话系统等。Encoder-Decoder: A neural network structure that encodes a word sequence and then decodes it into another word sequence. It is mainly used in machine translation, dialogue systems, etc.

编码(encoding):将词序列表示成一个连续向量。Encoding: Representing a word sequence as a continuous vector.

解码(decoding):将一个连续向量表示成目标序列。Decoding: Represent a continuous vector as a target sequence.

期望(Expectation)：试验中每次可能的结果乘以其结果概率的总和，本发明中使用E[·]的形式进行表示。Expectation: the sum over every possible outcome of an experiment of that outcome multiplied by its probability; in the present invention it is written in the form E[·].

KL散度(KL Divergence):是两个概率分布之间的差别的非对称性度量,本发明采用KL(·||·)的形式表示,其计算公式如下:KL Divergence: It is an asymmetric measure of the difference between two probability distributions. The present invention uses the form of KL(·||·), and its calculation formula is as follows:

$\mathrm{KL}(q\,\|\,p)=\sum_{i} q(i)\log\frac{q(i)}{p(i)}$

其中q,p表示两个离散分布,q(i),p(i)分别表示分布q,p第i项概率值。Where q and p represent two discrete distributions, and q(i) and p(i) represent the probability values of the i-th item of distribution q and p respectively.

隐变量(Latent variable)：潜变量，或称隐变量、潜在变量，在统计学中表示不可观测的随机变量，与观测变量相对。Latent variable: also called a hidden variable; in statistics, a random variable that cannot be observed directly, as opposed to an observed variable.

训练阶段(Train):神经网络模型的训练阶段接收训练数据作为输入,通过训练样本来不断调整神经网络模型中的参数。Training phase (Train): The training phase of the neural network model receives training data as input and continuously adjusts the parameters in the neural network model through training samples.

测试阶段(Test)：神经网络模型在训练过后，在测试阶段通过训练过的神经网络模型参数输出输入数据对应的标签等信息。后面我们亦称之为部署阶段。Test phase: after the neural network model has been trained, the trained model parameters are used in the test phase to output the labels and other information corresponding to the input data. Later we also refer to this as the deployment phase.

实施例一Embodiment 1

本实施例提供了一种半监督的多轮医疗对话回复生成方法,其包括:This embodiment provides a semi-supervised multi-round medical dialogue response generation method, which includes:

将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;Input the patient's questions in the first round of dialogue into the semi-supervised medical dialogue model to obtain the responses in the first round of dialogue;

在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;In the second and subsequent rounds of dialogue, the patient's questions in the current round and the responses in the previous round are input into the semi-supervised medical dialogue model to obtain responses for the corresponding round of dialogue until the patient has no new questions input;

其中,半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器,上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中,先验状态追踪器用于不断追踪用户的身体状态,先验策略网络用于生成医师相应的动作,回复生成器用于根据身体状态及医师动作,生成对应的回复;Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori policy network, an inference policy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori policy network. The prior state tracker is used to continuously track the user's physical state. The priori policy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

上下文编码器用于对接收到的信息进行编码;对于第一轮对话的病人的问题进行直接编码;对于第二轮及其后对话病人的问题及相应上一轮对话的回复,编码形成上下文信息并输入至先验状态追踪器,推理状态追踪器,先验策略网络,推理策略网络和回复生成器五个模块中。The context encoder is used to encode the received information; the patient's questions in the first round of dialogue are directly encoded; for the patient's questions in the second and subsequent rounds of dialogue and the corresponding replies in the previous round of dialogue, the context information is encoded and input into the five modules of prior state tracker, reasoning state tracker, prior strategy network, reasoning strategy network and reply generator.

The input to the prior state tracker is the state instance $\hat S_{t-1}$ sampled from the output probability distribution $q_\phi(S_{t-1})$ of the inference state tracker in the previous dialogue round; its output is the probability distribution $p_\theta(S_t)$.

The input to the inference state tracker is the state instance $\hat S_{t-1}$ sampled from the output probability distribution $q_\phi(S_{t-1})$ of the inference state tracker in the previous dialogue round, together with the physician reply $R_t$ of the current round; its output is the probability distribution $q_\phi(S_t)$.

The input to the prior policy network is the state instance $\hat S_t$ sampled from the output probability distribution $q_\phi(S_t)$ of the inference state tracker in the current dialogue round, together with the external medical knowledge graph $G$; its output is the probability distribution $p_\theta(A_t)$.

The input to the inference policy network is the state instance $\hat S_t$ sampled from the output probability distribution $q_\phi(S_t)$ of the inference state tracker in the current dialogue round, together with the physician reply $R_t$ of the current round; its output is the probability distribution $q_\phi(A_t)$.

The input to the reply generator falls into two cases, the training phase and the test phase (i.e. deployment). In the training phase it receives the context encoding $c_t$ together with the instances $\hat S_t$ and $\hat A_t$ sampled from the probability distributions $q_\phi(S_t)$ and $q_\phi(A_t)$; in the test phase it receives $c_t$ together with the instances sampled from $p_\theta(S_t)$ and $p_\theta(A_t)$. Its output is the dialogue reply $R_t$.

在实际部署阶段,在每个对话轮中,给定病人的表述,医疗对话系统采用先验状态追踪器不断追踪用户的身体状态,并且使用先验策略网络生成医师相应的动作,最后回复生成器结合从先验状态追踪器和先验策略网络采样得到的状态以及动作生成对应的回复,对应图1(c)过程。对话进程一直持续到病人无新的问题输入,即病人主动结束当前对话。In the actual deployment stage, in each dialogue round, given the patient's statement, the medical dialogue system uses the prior state tracker to continuously track the user's physical state, and uses the prior policy network to generate the doctor's corresponding actions. Finally, the reply generator combines the state and actions sampled from the prior state tracker and the prior policy network to generate the corresponding reply, corresponding to the process in Figure 1(c). The dialogue process continues until the patient has no new questions to input, that is, the patient actively ends the current dialogue.

医疗对话系统有两个关键特征:患者状态(症状、药物等)和医师动作(治疗、诊断等)。这两个特征使得医疗对话系统比其他知识密集型对话场景更加复杂。与任务导向对话系统相似,医学对话生成过程拆分为以下三个阶段:Medical dialogue systems have two key features: patient status (symptoms, medications, etc.) and physician actions (treatment, diagnosis, etc.). These two features make medical dialogue systems more complex than other knowledge-intensive dialogue scenarios. Similar to task-oriented dialogue systems, the medical dialogue generation process is divided into the following three stages:

(1)病人状态追踪:对于给定的对话历史,对话系统追踪状态的身体状态(state);(1) Patient state tracking: For a given dialogue history, the dialogue system tracks the physical state of the patient.

(2)医师策略学习:给定病人状态和对话历史,对话系统给出当前医师的动作(action);(2) Physician strategy learning: Given the patient status and dialogue history, the dialogue system gives the current physician’s action;

(3)医疗回复生成:给定对话历史,追踪到的状态以及预测的动作,给出流畅并准确的自然语言回复。(3) Medical response generation: Given the conversation history, tracked states, and predicted actions, give fluent and accurate natural language responses.

对于存在标注数据的场景,在对话的第t轮,病人给出问题或者描述自己的症状Ut,后医疗对话系统接收前一轮的回复Rt-1,当前轮问题Ut和前一轮追踪到的状态St-1,然后输出当前轮的状态St,后再利用Rt-1UtSt输出当前轮医师应采取的动作At,最后生成自然语言形式的回复Rt反馈给病人。但是在医疗对话系统中,很多情况下,病人的生理状态和医师的动作是不存在标注的。故我们将状态和动作都视为隐变量,并且考虑到state贯穿整个对话过程,所以使用一个序列的词来表示;医师动作亦是如此,即医师的回复中可能包含多个关键词。实际操作过程中,状态和动作的长度被设置为固定的长度分别为|S|和|A|。并且状态存在一个初始值,为“<pad><pad>...<pad>”,其中“<pad>”表示一个填充词。State和Action的设计细节如下:For scenarios with labeled data, in the tth round of the conversation, the patient asks a question or describes his symptoms U t , and then the medical dialogue system receives the previous round's reply R t-1 , the current round's question U t and the state St-1 tracked in the previous round, and then outputs the current round's state St t , and then uses R t-1 U t St t to output the action At that the physician should take in the current round, and finally generates a natural language reply R t to feed back to the patient. However, in medical dialogue systems, in many cases, the patient's physiological state and the physician's action are not labeled. Therefore, we regard both state and action as hidden variables, and considering that state runs through the entire conversation process, we use a sequence of words to represent it; the same is true for physician actions, that is, the physician's reply may contain multiple keywords. In actual operation, the length of state and action is set to a fixed length of |S| and |A| respectively. And the state has an initial value, which is "<pad><pad>...<pad>", where "<pad>" represents a filler word. The design details of State and Action are as follows:

state的设计:state用于记录整个对话过程中的对话系统所获取到的用户身体状态的信息,其使用一个序列的词来表示,例如“感冒发热咳嗽夜汗......”,并且其初始化为“<pad><pad>......<pad>”。State design: state is used to record the information about the user's physical state obtained by the dialogue system during the entire dialogue process. It is represented by a sequence of words, such as "cold, fever, cough, night sweats...", and is initialized to "<pad><pad>......<pad>".

action的设计:action用于表示医师回复的概要,其亦使用一个序列的表示,例如“999感冒灵颗粒急支糖浆......”。Action design: Action is used to express the summary of the doctor's reply, which is also expressed as a sequence, such as "999 Ganmao Ling Granules Jizhi Syrup..."
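For illustration only, a minimal Python sketch of the fixed-length state/action text spans described above, initialized with "<pad>" tokens; the helper names and span lengths are assumptions, not the patent's implementation:

```python
PAD = "<pad>"

def init_span(length):
    """A text span is a fixed-length list of words, initialized with <pad> tokens."""
    return [PAD] * length

def update_span(span, new_words):
    """Overwrite the span with the newest tracked words, keeping its fixed length."""
    words = (new_words + span)[: len(span)]
    return words + [PAD] * (len(span) - len(words))

state = init_span(8)                                   # e.g. |S| = 8
state = update_span(state, ["感冒", "发热", "咳嗽", "夜汗"])
action = update_span(init_span(4), ["999感冒灵颗粒", "急支糖浆"])
print(state)   # ['感冒', '发热', '咳嗽', '夜汗', '<pad>', '<pad>', '<pad>', '<pad>']
print(action)  # ['999感冒灵颗粒', '急支糖浆', '<pad>', '<pad>']
```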

半监督医疗对话模型包含了六个模块，分别为上下文编码器(context encoder)，先验状态追踪器(prior state tracker)，推理状态追踪器(inference state tracker)，先验策略网络(prior policy network)，推理策略网络(inference policy network)和回复生成器(response generator)。在一整个医疗对话中，往往包含多次交互，以下过程经历多轮直至对话结束。The semi-supervised medical dialogue model consists of six modules: a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a response generator. A complete medical dialogue usually involves multiple interactions, and the following process is repeated over multiple rounds until the dialogue ends.

其中先验状态追踪器,推理状态追踪器用于病人状态追踪,其中推理状态追踪器只在训练阶段执行;先验策略网络,推理策略网络用于医师策略学习,其中推理策略网络只在训练阶段执行;回复生成器用于医疗回复生成。下面主要从无监督的角度,即使用无监督数据Du,对应图1(b)来描述各个模块的输入输出。The prior state tracker and the reasoning state tracker are used for patient state tracking, and the reasoning state tracker is only executed in the training phase; the prior policy network and the reasoning policy network are used for physician policy learning, and the reasoning policy network is only executed in the training phase; the response generator is used for medical response generation. The following mainly describes the input and output of each module from an unsupervised perspective, that is, using unsupervised data Du , corresponding to Figure 1(b).

In round $t$, the context encoder is a GRU-based (or LSTM-, Transformer- or BERT-based) encoder that takes the previous round's reply $R_{t-1}$ and the current round's patient question $U_t$ as input and outputs a continuous vector $c_t$ representing the dialogue context.

In round $t$, given the previous reply $R_{t-1}$ and the current patient question $U_t$ as input, the context encoder first uses a bidirectional GRU encoder to obtain a word-level sequence representation $H_t=\{h_{t,1},h_{t,2},\dots,h_{t,M+N}\}$ and then outputs a vector $c_t$ representing the dialogue context, where $M$ and $N$ are the lengths of the $R_{t-1}$ and $U_t$ sequences:

$H_t=\mathrm{BiGRU}\big([e(R_{t-1});e(U_t)]\big)$

$c_t=\mathrm{attn}(H_t)$

where $e(\cdot)$ denotes the word embedding of each word of $R_{t-1}$ and $U_t$, the BiGRU encoder is initialized with the context representation $c_{t-1}$ of the previous round, and attn[17] denotes the attention operation.
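For illustration only, a minimal PyTorch sketch of a BiGRU context encoder with attention pooling over the concatenated tokens of $R_{t-1}$ and $U_t$; the dimensions are arbitrary and the initialization from the previous round's context vector $c_{t-1}$ is omitted, so this is an assumption-laden sketch rather than the patented encoder:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)            # scores each token position

    def forward(self, tokens):
        # tokens: (batch, M+N) word ids of the concatenation [R_{t-1}; U_t]
        H, _ = self.bigru(self.emb(tokens))             # H_t: (batch, M+N, 2*hidden)
        weights = torch.softmax(self.attn(H), dim=1)    # attention over positions
        c_t = (weights * H).sum(dim=1)                  # context vector c_t: (batch, 2*hidden)
        return H, c_t

enc = ContextEncoder(vocab_size=1000)
H, c = enc(torch.randint(1, 1000, (2, 12)))             # batch of 2, 12 tokens each
print(H.shape, c.shape)                                  # torch.Size([2, 12, 256]) torch.Size([2, 256])
```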

The prior state tracker takes the context encoder output and the previous-round state $\hat S_{t-1}$ as input and uses a GRU-based decoder to output a sequence of words, i.e. the state $S_t$. The inference state tracker adopts a structure similar to the prior state tracker but additionally takes the current round's reply $R_t$ as input, and likewise outputs a word sequence. We use $p_\theta^{S}$ and $q_\phi^{S}$ to denote the prior state tracker and the inference state tracker, and abbreviate their generation distributions as $p_\theta(S_t)$ and $q_\phi(S_t)$.

Both the prior state tracker and the inference state tracker have an encoder-decoder structure. Without supervision the states of all dialogue rounds are unknown, and the state of each round depends on the state of the previous round as input, so we sample $\hat S_{t-1}$ from $q_\phi(S_{t-1})$ and feed it into both the prior state tracker and the inference state tracker.

The prior state tracker first encodes the sampled $\hat S_{t-1}$ into $e^{s}_{t-1}$ and initializes its decoder with $W_p[e^{s}_{t-1};c_t]$, where $W_p$ is a trainable parameter. At the $i$-th decoding step it outputs the hidden state $h^{s}_{t,i}$, and sequential decoding yields the prior distribution of $S_t$:

$p_\theta(S_t\mid \hat S_{t-1},c_t)=\prod_{i=1}^{|S|}p(s_{t,i}\mid s_{t,<i},\hat S_{t-1},c_t)=\prod_{i=1}^{|S|}\mathrm{softmax}\big(\mathrm{MLP}(h^{s}_{t,i})\big)\big[s_{t,i}\big]$

其中MLP表示的是多层感知机(Multi-Layer Perceptron)。|S|为状态text span的长度。Where MLP stands for Multi-Layer Perceptron. |S| is the length of the state text span.

The inference state tracker has a structure similar to that of the prior state tracker. It likewise uses a GRU encoder to encode $\hat S_{t-1}$ into $e^{s}_{t-1}$, and additionally encodes $R_t$ into $e^{r}_{t}$. It initializes its decoder with $W_q[e^{s}_{t-1};e^{r}_{t};c_t]$, where $W_q$ is a trainable parameter. At the $i$-th decoding step it outputs $\tilde h^{s}_{t,i}$, and sequential decoding yields the approximate posterior distribution of $S_t$:

$q_\phi(S_t\mid \hat S_{t-1},R_t,c_t)=\prod_{i=1}^{|S|}\mathrm{softmax}\big(\mathrm{MLP}(\tilde h^{s}_{t,i})\big)\big[s_{t,i}\big]$
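For illustration only, a minimal PyTorch sketch of the prior state tracker's decoding loop as reconstructed above (a GRU decoder initialized from the encoded previous state and the context vector, with an MLP/softmax output at each of the $|S|$ steps); the inference state tracker would differ only in additionally encoding $R_t$. All names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class PriorStateTracker(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, span_len=8):
        super().__init__()
        self.span_len = span_len                         # |S|: fixed length of the state span
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.prev_enc = nn.GRU(emb_dim, hidden, batch_first=True)   # encodes sampled S_{t-1}
        self.init_proj = nn.Linear(2 * hidden, hidden)               # plays the role of W_p
        self.decoder = nn.GRUCell(emb_dim, hidden)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, vocab_size))

    def forward(self, prev_state_ids, c_t, bos_id=1):
        # prev_state_ids: (batch, |S|) ids of the sampled previous state; c_t: (batch, hidden)
        _, e_prev = self.prev_enc(self.emb(prev_state_ids))          # (1, batch, hidden)
        h = torch.tanh(self.init_proj(torch.cat([e_prev.squeeze(0), c_t], dim=-1)))
        word = torch.full((prev_state_ids.size(0),), bos_id, dtype=torch.long)
        dists = []
        for _ in range(self.span_len):                   # greedy decoding of the state span
            h = self.decoder(self.emb(word), h)
            p_i = torch.softmax(self.mlp(h), dim=-1)     # p(s_{t,i} | s_{t,<i}, S_{t-1}, c_t)
            word = p_i.argmax(dim=-1)
            dists.append(p_i)
        return torch.stack(dists, dim=1)                 # (batch, |S|, vocab)

tracker = PriorStateTracker(vocab_size=1000)
dists = tracker(torch.randint(1, 1000, (2, 8)), torch.randn(2, 256))
print(dists.shape)                                        # torch.Size([2, 8, 1000])
```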

The prior policy network takes the context encoder output, the current round's state $S_t$ and the external medical knowledge graph $G$ as input, and uses a GRU-based decoder to output a sequence of words, i.e. the action $A_t$. The inference policy network has a similar structure: it takes the context encoder output and $S_t$, and additionally takes the current round's reply $R_t$ as input, then outputs a sequence of words. We use $p_\theta^{A}$ and $q_\phi^{A}$ to denote the prior policy network and the inference policy network, abbreviated as $p_\theta(A_t)$ and $q_\phi(A_t)$.

The prior policy network and the inference policy network also have an encoder-decoder structure. Both of them sample the state instance $\hat S_t$ from $q_\phi(S_t)$.

Before introducing the two policy networks, we first introduce a knowledge-graph retrieval operation qsub and a knowledge-graph encoding operation RGAT[15]. qsub uses the tracked state to retrieve a subgraph $G_n$ from the medical knowledge graph $G$: starting from the entities in the state, it extracts all nodes and edges reachable within $n$ hops and connects all nodes that appear in the state, so that $G_n$ is fully connected. RGAT is a graph encoding method that takes edge types into account and, after several rounds of propagation, produces an embedding for each node, i.e. a vector representation in a continuous space. We use $\{g_1,\dots,g_{|G_n|}\}$ to denote the node representations of the encoded $G_n$, where $|G_n|$ is the number of nodes in $G_n$.
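For illustration only, a minimal Python sketch of the n-hop subgraph retrieval qsub (breadth-first expansion from the entities mentioned in the tracked state); the adjacency-list graph format is an assumption, and the step that fully connects the state entities, as well as the RGAT encoding, are omitted:

```python
from collections import deque

def qsub(graph, state_entities, n_hops=2):
    """Retrieve the subgraph G_n reachable within n hops from the entities in the state.

    graph: dict mapping node -> list of (relation, neighbor) edges.
    Returns the set of kept nodes and the list of kept edges.
    """
    seeds = [e for e in state_entities if e in graph]
    nodes, edges = set(seeds), []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == n_hops:
            continue
        for rel, nbr in graph.get(node, []):
            edges.append((node, rel, nbr))
            if nbr not in nodes:
                nodes.add(nbr)
                frontier.append((nbr, depth + 1))
    return nodes, edges

# Toy medical graph: symptom -> disease -> drug.
G = {"发热": [("symptom_of", "感冒")],
     "咳嗽": [("symptom_of", "感冒")],
     "感冒": [("treated_by", "999感冒灵颗粒"), ("treated_by", "急支糖浆")]}
nodes, edges = qsub(G, ["发热", "咳嗽", "乏力"], n_hops=2)
print(nodes)   # {'发热', '咳嗽', '感冒', '999感冒灵颗粒', '急支糖浆'}
```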

The prior policy network uses a GRU encoder to encode the sampled $\hat S_t$ into $e^{s}_{t}$, and then initializes its decoder from $e^{s}_{t}$, the context vector $c_t$ and the RGAT-encoded subgraph $G_n$. At the $i$-th decoding step it outputs the hidden state $h^{a}_{t,i}$. The decoding process consists of two parts: one generates from the vocabulary, and the other copies from the retrieved knowledge subgraph $G_n$:

$p_{\mathrm{gen}}(a_{t,i})=\mathrm{softmax}\big(\mathrm{MLP}(h^{a}_{t,i})\big)\big[a_{t,i}\big]$

$p_{\mathrm{copy}}(a_{t,i})=\frac{1}{Z_A}\sum_{j=1}^{|G_n|} I(e_j,a_{t,i})\exp\big(g_j^{\top}W_c\,h^{a}_{t,i}\big)$

where $e_j$ denotes the $j$-th node in $G_n$, $g_j$ denotes the embedding of the $j$-th node in $G_n$, and $Z_A$ is the shared normalization term over generation and copying. $I(e_j,a_{t,i})=1$ when $e_j=a_{t,i}$, and $I(e_j,a_{t,i})=0$ otherwise.

则At的先验分布可以表示为:Then the prior distribution of At can be expressed as:

$p_\theta(A_t\mid \hat S_t,c_t,G)=\prod_{i=1}^{|A|}\big(p_{\mathrm{gen}}(a_{t,i})+p_{\mathrm{copy}}(a_{t,i})\big)$
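For illustration only, a minimal PyTorch sketch of the generate-or-copy scoring reconstructed above: generation scores come from the decoder hidden state, copy scores from a bilinear match against the node embeddings of $G_n$, and both share one normalizer. Shapes, names and the bilinear form are assumptions:

```python
import torch
import torch.nn as nn

class GenerateOrCopy(nn.Module):
    """Mix vocabulary generation with copying nodes of the retrieved subgraph G_n."""
    def __init__(self, hidden, vocab_size):
        super().__init__()
        self.gen = nn.Linear(hidden, vocab_size)    # scores for generating from the vocabulary
        self.copy = nn.Linear(hidden, hidden)       # bilinear map g_j^T W_c h

    def forward(self, h, node_emb, node_to_vocab):
        # h: (batch, hidden) decoder state; node_emb: (|G_n|, hidden) RGAT embeddings g_j
        # node_to_vocab: (|G_n|,) vocabulary id of each graph node e_j
        gen_scores = self.gen(h)                                    # (batch, vocab)
        copy_scores = node_emb @ self.copy(h).transpose(0, 1)       # (|G_n|, batch)
        copy_full = torch.zeros_like(gen_scores)
        copy_full.index_add_(1, node_to_vocab, copy_scores.transpose(0, 1).exp())
        probs = gen_scores.exp() + copy_full                        # shared normalizer Z_A
        return probs / probs.sum(dim=-1, keepdim=True)

layer = GenerateOrCopy(hidden=16, vocab_size=50)
p = layer(torch.randn(2, 16), torch.randn(5, 16), torch.tensor([3, 7, 7, 12, 20]))
print(p.shape, float(p.sum(dim=-1)[0]))                             # torch.Size([2, 50]) ~1.0
```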

The inference policy network uses a GRU encoder to encode $\hat S_t$ into $e^{s}_{t}$ and to encode $R_t$ into $e^{r}_{t}$, and then initializes its decoder from $[e^{s}_{t};e^{r}_{t};c_t]$. At the $i$-th decoding step it outputs $\tilde h^{a}_{t,i}$. To strengthen the influence of $R_t$ on the result, the approximate posterior distribution of $A_t$ only considers the direct generation probability:

$q_\phi(A_t\mid \hat S_t,R_t,c_t)=\prod_{i=1}^{|A|}\mathrm{softmax}\big(\mathrm{MLP}(\tilde h^{a}_{t,i})\big)\big[a_{t,i}\big]$

The reply generator is a GRU-based decoder that takes the context encoder output $c_t$, together with $S_t$ and $A_t$, as input and outputs the medical reply $R_t$. We use $p_\theta^{R}$ to denote the reply generator, and abbreviate its distribution as $p(R_t\mid S_t,A_t,c_t)$.

During the unsupervised training phase the reply generator only uses the outputs of the inference state tracker and the inference policy network. In unsupervised training we sample $\hat S_t$ and $\hat A_t$ from $q_\phi(S_t)$ and $q_\phi(A_t)$ respectively, encode them into $e^{s}_{t}$ and $e^{a}_{t}$, and initialize the decoder of the reply generator from $[e^{s}_{t};e^{a}_{t};c_t]$. At the $i$-th decoding step it outputs $h^{r}_{t,i}$, and the output probability of $R_t$ is:

$p(R_t\mid \hat S_t,\hat A_t,c_t)=\prod_{i=1}^{|R|}\big(p_{\mathrm{gen}}(r_{t,i})+p_{\mathrm{copy}}(r_{t,i})\big)$

where $p_{\mathrm{gen}}(r_{t,i})$ denotes the probability of generating from the vocabulary, $p_{\mathrm{copy}}(r_{t,i})$ denotes the probability of copying from $\hat S_t$, $\hat A_t$, $R_{t-1}$ and $U_t$, and $|R|$ is the length of the reply.

对于监督数据和无监督数据的训练损失函数分别为Lsup和Lun,其中Lun为:The training loss functions for supervised data and unsupervised data are L sup and L un respectively, where L un is:

$L_{un}=-\mathbb{E}_{q_\phi(S_t)\,q_\phi(A_t)}\big[\log p(R_t\mid \hat S_t,\hat A_t,c_t)\big]+\mathrm{KL}\big(q_\phi(S_t)\,\|\,p_\theta(S_t)\big)+\mathrm{KL}\big(q_\phi(A_t)\,\|\,p_\theta(A_t)\big)$

其中E[·]表示期望，KL(·||·)表示KL散度(Kullback-Leibler divergence)。Here E[·] denotes expectation, and KL(·||·) denotes the KL divergence (Kullback-Leibler divergence).
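For illustration only, a minimal PyTorch sketch of the reconstructed unsupervised objective $L_{un}$ for per-token categorical distributions; the tensor shapes and the factorized KL over decoding steps are assumptions:

```python
import torch

def kl_categorical(q, p, eps=1e-12):
    # q, p: (batch, steps, vocab) per-step categorical distributions
    return (q * ((q + eps).log() - (p + eps).log())).sum(dim=-1).sum(dim=-1).mean()

def unsupervised_loss(reply_logprob, q_state, p_state, q_action, p_action):
    """L_un = -E_q[log p(R_t | S_t, A_t, c_t)] + KL(q(S_t)||p(S_t)) + KL(q(A_t)||p(A_t))."""
    recon = -reply_logprob.mean()                 # reply_logprob: (batch,) log p(R_t | ...)
    return recon + kl_categorical(q_state, p_state) + kl_categorical(q_action, p_action)

# Toy check with random distributions.
B, S, A, V = 2, 4, 3, 10
q_s = torch.softmax(torch.randn(B, S, V), dim=-1)
p_s = torch.softmax(torch.randn(B, S, V), dim=-1)
q_a = torch.softmax(torch.randn(B, A, V), dim=-1)
p_a = torch.softmax(torch.randn(B, A, V), dim=-1)
loss = unsupervised_loss(torch.full((B,), -5.0), q_s, p_s, q_a, p_a)
print(float(loss))   # >= 5.0, since both KL terms are non-negative
```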

Considering the instability of training when the proportion of supervised data is small, i.e. that the prior policy network is easily misled by erroneous states sampled from the prior state tracker, the present invention proposes a two-stage cascaded inference training method that splits $L_{un}$ into several training parts. Since the policy networks depend on the output of the state trackers, the state-tracking objective is optimized first and the remaining modules are then optimized jointly, which improves stability during training. $L_{un}$ is split into the two training objectives $L_s$ and $L_a$:

$L_s=\mathrm{KL}\big(q_\phi(S_t)\,\|\,p_\theta(S_t)\big)-\mathbb{E}_{q_\phi(S_t)}\big[\log p(R_t\mid \hat S_t,c_t)\big]$

$L_a=\mathrm{KL}\big(q_\phi(A_t)\,\|\,p_\theta(A_t)\big)-\mathbb{E}_{q_\phi(S_t)\,q_\phi(A_t)}\big[\log p(R_t\mid \hat S_t,\hat A_t,c_t)\big]$

在第一训练阶段,最小化Ls提升模型状态追踪性能,第二阶段最小化Ls+La以维持状态追踪效果以及训练模型的策略学习能力。我们将其命名为两阶段层叠推理训练方法。In the first training phase, minimizing L s improves the model state tracking performance, and in the second phase, minimizing L s + L a maintains the state tracking effect and the policy learning ability of the training model. We name it the two-stage cascade reasoning training method.
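For illustration only, a sketch of how the two-stage cascaded schedule could be driven by global_step; the stage boundary is an assumed hyperparameter, not a value given in the patent:

```python
def two_stage_loss(loss_s, loss_a, global_step, stage_one_steps=10000):
    """Two-stage cascaded inference training: first optimize state tracking only (L_s),
    then keep L_s and add policy learning (L_s + L_a)."""
    if global_step < stage_one_steps:
        return loss_s                  # stage 1: stabilize the state trackers
    return loss_s + loss_a             # stage 2: add policy learning while keeping tracking

# Usage inside a training loop (global_step counts parameter updates):
# loss = two_stage_loss(L_s, L_a, global_step)
# loss.backward(); optimizer.step()
```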

图3训练过程中的模型示意图,global_step为一个整数用于记录训练经过轮数。Figure 3 is a schematic diagram of the model during training. global_step is an integer used to record the number of training rounds.

在半监督场景下,用于模型训练的对话数据存在监督和无监督数据两个部分,下面我们分别介绍对于监督数据Da和无监督数据Du的训练方法。In the semi-supervised scenario, the conversation data used for model training consists of supervised and unsupervised data. Below we introduce the training methods for supervised data Da and unsupervised data Du respectively.

(a)对于监督数据Da (a) For the supervised data Da

从Da采样训练样本构成训练所需要的小批量(即mini-batch),得到数据Rt-1,Ut,St-1,St,At,Rt。将对应输入数据送入上述的6个模块,对应图1中的(a)。采用Negative LogLikelihood(NLL)Loss来进行训练。实际的训练损失函数为:The training samples are sampled from Da to form the mini-batch required for training, and the data Rt -1 , Ut , St -1 , St , At , and Rt are obtained. The corresponding input data is sent to the above 6 modules, corresponding to (a) in Figure 1. Negative LogLikelihood (NLL) Loss is used for training. The actual training loss function is:

$L_{sup}=-\big[\log p_\theta(S_t)+\log q_\phi(S_t)+\log p_\theta(A_t)+\log q_\phi(A_t)+\log p(R_t\mid S_t,A_t,c_t)\big]$

(b)对于无监督数据Du (b) For the unsupervised data Du

Training samples are drawn from $D_u$ to form the mini-batches needed for training, giving the data $R_{t-1}$, $U_t$, $R_t$. The intermediate annotations $S_{t-1}$, $S_t$, $A_t$ are all missing because the data is unlabeled. We sample $\hat S_{t-1}$ from $q_\phi(S_{t-1})$ and feed it into the prior state tracker and the inference state tracker. We then sample $\hat S_t$ from $q_\phi(S_t)$ and feed it into the prior policy network and the inference policy network, and sample $\hat A_t$ from $q_\phi(A_t)$. Finally, the reply generator combines $\hat S_t$ and $\hat A_t$ with $R_{t-1}$ and $U_t$ to generate the reply $R_t$. The above process corresponds to (b) in Figure 1. The training loss is $L_{un}$ ($L_s+L_a$ can also be used as the training loss to improve training stability).

对于整个训练数据集D={Da,Du},具体训练步骤如下:For the entire training data set D = {D a , Du }, the specific training steps are as follows:

Step1:假设监督数据Da占全部训练数据D的比例为α(0≤α≤1),选择0-1之间的随机数,如果小于α转Step2,如果大于α转Step3。Step 1: Assume that the proportion of the supervised data Da to the total training data D is α (0≤α≤1), select a random number between 0 and 1. If it is less than α, go to Step 2; if it is greater than α, go to Step 3.

Step2:采用监督数据训练模型，对应(a)方式，训练loss为Lsup，梯度下降更新参数后转Step4。Step 2: use supervised data to train the model, corresponding to method (a), with training loss Lsup; update the parameters by gradient descent and then go to Step 4.

Step3:采用无监督数据训练模型，对应(b)方式，训练loss为Lun，梯度下降更新参数后转Step4。Step 3: use unsupervised data to train the model, corresponding to method (b), with training loss Lun; update the parameters by gradient descent and then go to Step 4.

Step4:判断模型是否收敛,若收敛则转Step5,否则转Step1。Step 4: Determine whether the model converges. If so, go to Step 5, otherwise go to Step 1.

Step5:保存模型权重,结束训练,如图3所示。Step 5: Save the model weights and end the training, as shown in Figure 3.
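For illustration only, a minimal Python sketch of the Step 1 to Step 5 schedule: with probability α a supervised mini-batch is drawn from Da (loss Lsup), otherwise an unsupervised mini-batch from Du (loss Lun), until convergence. The model and optimizer interfaces (supervised_loss, unsupervised_loss, converged, save_weights) are assumed placeholders for the modules described above:

```python
import random

def train(model, sup_batches, unsup_batches, optimizer, alpha, max_steps=100000):
    """Semi-supervised training schedule corresponding to Step 1 - Step 5."""
    for step in range(max_steps):
        if random.random() < alpha:                  # Step 2: supervised data D_a
            batch = next(sup_batches)
            loss = model.supervised_loss(batch)      # L_sup (NLL over S_t, A_t, R_t)
        else:                                        # Step 3: unsupervised data D_u
            batch = next(unsup_batches)
            loss = model.unsupervised_loss(batch)    # L_un (or L_s + L_a)
        optimizer.zero_grad()
        loss.backward()                              # gradient descent update
        optimizer.step()
        if model.converged():                        # Step 4: convergence check
            break
    model.save_weights()                             # Step 5: save model weights
```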

使用目前工业界和学术界内公开的医疗对话数据集,训练得到半监督医疗对话模型。其中,对于采样得到的监督数据和无监督数据,送入模型中,算出对应的损失函数后进行梯度下降,优化模型参数。The semi-supervised medical dialogue model is trained using the currently public medical dialogue datasets in the industry and academia. The sampled supervised data and unsupervised data are fed into the model, and the corresponding loss function is calculated and then gradient descent is performed to optimize the model parameters.

After model training is completed, all model parameters are fixed and the inference state tracker and the inference policy network can be discarded. At this point the model can be applied to real dialogue scenarios. As shown in Figure 2, given a patient question as input, the context encoder, the prior state tracker, the prior policy network and the reply generator work in sequence (at this point the reply generator only takes the state $\hat S_t$ sampled from the prior state tracker output $p_\theta(S_t)$ and the action $\hat A_t$ sampled from the prior policy network output $p_\theta(A_t)$ as input), and finally a reply is generated and returned to the user. The dialogue system keeps interacting with the patient: in each dialogue round the prior state tracker first uses the state of the previous round as input and then updates the tracked physical state of the patient; if no new patient question is received after a waiting period, the current session ends.
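For illustration only, a minimal Python sketch of the deployment loop described above: only the context encoder, prior state tracker, prior policy network and reply generator are used, the state is carried across rounds, and the session ends when no new patient question arrives. All module interfaces are assumptions:

```python
def dialogue_session(encoder, prior_tracker, prior_policy, generator, get_user_input,
                     pad_state, knowledge_graph):
    """Multi-round inference: the inference-side modules are discarded after training."""
    state, prev_reply = pad_state, ""
    while True:
        question = get_user_input()               # None / empty when the patient stops asking
        if not question:
            break                                 # end of the current session
        H, c_t = encoder(prev_reply, question)    # encode R_{t-1} and U_t
        state = prior_tracker.sample(state, c_t)                     # S_t ~ p_theta(S_t)
        action = prior_policy.sample(state, c_t, knowledge_graph)    # A_t ~ p_theta(A_t)
        prev_reply = generator.generate(state, action, c_t)          # R_t
        yield prev_reply
```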

实施例二Embodiment 2

一种半监督的多轮医疗对话回复生成系统,其包括:A semi-supervised multi-turn medical dialogue response generation system, comprising:

第一轮对话回复生成模块,其用于将第一轮对话中病人的问题输入至半监督医疗对话模型,得到第一轮对话的回复;A first-round dialogue response generation module, which is used to input the patient's questions in the first-round dialogue into the semi-supervised medical dialogue model to obtain the responses to the first-round dialogue;

第二轮及其后对话回复生成模块,其用于在第二轮及其后对话中,将当前轮病人的问题及上一轮对话的回复输入至半监督医疗对话模型中,得到相应轮对话的回复,直至病人无新的问题输入;The second and subsequent dialogue response generation module is used to input the patient's questions in the current round and the responses in the previous round of dialogue into the semi-supervised medical dialogue model in the second and subsequent dialogues, and obtain the responses in the corresponding round of dialogues until the patient has no new questions input;

其中，半监督医疗对话模型包括上下文编码器、先验状态追踪器、推理状态追踪器、先验策略网络、推理策略网络和回复生成器，上下文编码器用于对接收到的信息进行编码并输入至先验状态追踪器和先验策略网络中，先验状态追踪器用于不断追踪用户的身体状态，先验策略网络用于生成医师相应的动作，回复生成器用于根据身体状态及医师动作，生成对应的回复；Among them, the semi-supervised medical dialogue model includes a context encoder, a priori state tracker, an inference state tracker, a priori strategy network, an inference strategy network and a reply generator. The context encoder is used to encode the received information and input it into the priori state tracker and the priori strategy network. The priori state tracker is used to continuously track the user's physical state. The priori strategy network is used to generate the doctor's corresponding actions. The reply generator is used to generate corresponding replies according to the physical state and the doctor's actions.

推理状态追踪器用于推理出用户的身体状态,推理策略网络用于推理出医师动作;推理状态追踪器和推理策略网络仅仅只在半监督医疗对话模型的训练阶段执行。The inference state tracker is used to infer the user's physical state, and the inference strategy network is used to infer the doctor's actions; the inference state tracker and the inference strategy network are only executed during the training phase of the semi-supervised medical dialogue model.

本实施例中的各个模块,与实施例一中的各个步骤一一对应,其具体实施过程相同,此处不再累述。Each module in this embodiment corresponds to each step in the first embodiment one by one, and the specific implementation process is the same, which will not be repeated here.

实施例三Embodiment 3

本实施例提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。This embodiment provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps in the semi-supervised multi-round medical dialogue response generation method as described above.

实施例四Embodiment 4

本实施例提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上述所述的半监督的多轮医疗对话回复生成方法中的步骤。This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the steps in the semi-supervised multi-round medical dialogue response generation method as described above are implemented.

Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

A person of ordinary skill in the art will understand that all or part of the processes of the above embodiment methods may be implemented by instructing the relevant hardware through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (8)

1. A semi-supervised multi-round medical dialogue reply generation method, characterized in that it comprises: inputting the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply for the first round; in the second and subsequent rounds, inputting the patient's question of the current round together with the reply of the previous round into the semi-supervised medical dialogue model to obtain the reply for the corresponding round, until the patient enters no new question; wherein the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network, and a reply generator; the context encoder encodes the received information and feeds it to the prior state tracker and the prior policy network; the prior state tracker continuously tracks the user's physical state; the inputs of the prior policy network are the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue and the external medical knowledge graph G; the decoding process of the prior policy network comprises two parts, one generating from the vocabulary and the other copying from the retrieved knowledge graph Gn:
[The generation and copy probability formulas are rendered as images in the original filing (FDA0004126833400000011, FDA0004126833400000012): the first gives the probability of generating the action token from the vocabulary, the second the probability of copying it from the retrieved knowledge graph Gn.]

wherein the dialogue context is represented by a continuous-space vector (symbol rendered as an image); the prior policy network uses a GRU autoencoder to encode the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue into a hidden representation; MLP denotes a multi-layer perceptron; a further image-rendered symbol denotes the output of the prior policy network at the i-th decoding step; ej denotes the j-th node in Gn, and gj denotes the word embedding of the j-th node in Gn; ZA is the normalization term of the generation–copy distribution; I(ej, At,i) = 1 when ej = At,i, and I(ej, At,i) = 0 otherwise (the individual symbols appear as images FDA0004126833400000013–FDA0004126833400000019 in the filing);

the prior policy network is used to generate the physician's actions and outputs a probability distribution (formula rendered as images FDA0004126833400000021 and FDA0004126833400000022 in the original filing);
wherein |A| denotes the length of the action;

the reply generator is used to generate the corresponding reply according to the physical state and the physician's action;

the inference state tracker is used to infer the user's physical state, and the inference policy network is used to infer the physician's action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model;

during unsupervised training, two instances are sampled from two probability distributions (defined below as the output distributions of the inference state tracker and the inference policy network in the current round), encoded, and used to initialize the decoder of the reply generator; given the decoder output at the i-th decoding step, the output probability of Rt is obtained as (the corresponding symbols appear as images FDA0004126833400000023–FDA00041268334000000211 in the filing):

[Formula rendered as an image in the original filing (FDA00041268334000000212): the output probability of Rt.]
wherein the first term denotes the probability of generating from the vocabulary and the second the probability of copying from the sampled instance, Rt-1, and Ut; |R| is the length of the reply; Rt denotes the physician's reply of the current round, Rt-1 the physician's reply of the previous round, and Ut the question of the current round; the remaining image-rendered symbols (FDA00041268334000000213–FDA00041268334000000219 in the filing) denote, respectively, the instance sampled from its probability distribution, the output probability distribution of the inference state tracker in the current round of dialogue, and the output probability distribution of the inference policy network in the current round of dialogue;
according to the two-stage cascaded inference training method, the training loss function Lun on unsupervised data is split into two training objectives, Ls and La; because the policy network depends on the output of the state tracker, the inference state tracker and the inference policy network are optimized first, and the remaining modules are then optimized simultaneously, where

[Loss formulas rendered as images in the original filing (FDA0004126833400000031, FDA0004126833400000032, FDA0004126833400000033).]

wherein E[·] denotes expectation and KL(·||·) denotes the Kullback–Leibler divergence; At denotes the action the physician should take in the current round; St denotes the state output for the current round; St-1 denotes the state tracked in the previous round; further image-rendered symbols (FDA0004126833400000034–FDA0004126833400000037) denote the output probability distribution of the inference state tracker in the previous round of dialogue, the output probability distribution of the inference state tracker in the current round of dialogue, the prior distribution of At, and the reply generator;

the first stage minimizes Ls to improve the model's state-tracking performance, and the second stage minimizes Ls + La to maintain the state-tracking effect while training the model's policy-learning ability (a schematic sketch of this two-stage schedule follows the claims).
2. The semi-supervised multi-round medical dialogue reply generation method according to claim 1, characterized in that the inference state tracker and the inference policy network are both encoder-decoder structures.

3. The semi-supervised multi-round medical dialogue reply generation method according to claim 1, characterized in that the prior state tracker and the prior policy network are both encoder-decoder structures.

4. The semi-supervised multi-round medical dialogue reply generation method according to claim 1, characterized in that the reply generator is a GRU-based decoder.

5. A semi-supervised multi-round medical dialogue reply generation system, characterized in that it comprises: a first-round dialogue reply generation module, configured to input the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply for the first round; a second-and-subsequent-round dialogue reply generation module, configured to, in the second and subsequent rounds, input the patient's question of the current round together with the reply of the previous round into the semi-supervised medical dialogue model to obtain the reply for the corresponding round, until the patient enters no new question; wherein the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network, and a reply generator; the context encoder encodes the received information and feeds it to the prior state tracker and the prior policy network; the prior state tracker continuously tracks the user's physical state; the inputs of the prior policy network are the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue and the external medical knowledge graph G; the decoding process of the prior policy network comprises two parts, one generating from the vocabulary and the other copying from the retrieved knowledge graph Gn:
[The generation and copy probability formulas are rendered as images in the original filing (FDA0004126833400000041, FDA0004126833400000042): the first gives the probability of generating the action token from the vocabulary, the second the probability of copying it from the retrieved knowledge graph Gn.]

wherein the dialogue context is represented by a continuous-space vector (symbol rendered as an image); the prior policy network uses a GRU autoencoder to encode the state instance sampled from the output probability distribution of the inference state tracker in the current round of dialogue into a hidden representation; MLP denotes a multi-layer perceptron; a further image-rendered symbol denotes the output of the prior policy network at the i-th decoding step; ej denotes the j-th node in Gn, and gj denotes the word embedding of the j-th node in Gn; ZA is the normalization term of the generation–copy distribution; I(ej, At,i) = 1 when ej = At,i, and I(ej, At,i) = 0 otherwise (the individual symbols appear as images FDA0004126833400000043–FDA0004126833400000052 in the filing);

the prior policy network is used to generate the physician's actions and outputs a probability distribution (formula rendered as images FDA0004126833400000053 and FDA0004126833400000054 in the original filing);
wherein |A| denotes the length of the action;

the reply generator is used to generate the corresponding reply according to the physical state and the physician's action;

the inference state tracker is used to infer the user's physical state, and the inference policy network is used to infer the physician's action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model;

during unsupervised training, two instances are sampled from two probability distributions (defined below as the output distributions of the inference state tracker and the inference policy network in the current round), encoded, and used to initialize the decoder of the reply generator; given the decoder output at the i-th decoding step, the output probability of Rt is obtained as (the corresponding symbols appear as images FDA0004126833400000055–FDA00041268334000000513 in the filing):

[Formula rendered as an image in the original filing (FDA00041268334000000514): the output probability of Rt.]
wherein the first term denotes the probability of generating from the vocabulary and the second the probability of copying from the sampled instance, Rt-1, and Ut; |R| is the length of the reply; Rt denotes the physician's reply of the current round, Rt-1 the physician's reply of the previous round, and Ut the question of the current round; the remaining image-rendered symbols (FDA00041268334000000515–FDA00041268334000000521 in the filing) denote, respectively, the instance sampled from its probability distribution, the output probability distribution of the inference state tracker in the current round of dialogue, and the output probability distribution of the inference policy network in the current round of dialogue;
according to the two-stage cascaded inference training method, the training loss function Lun on unsupervised data is split into two training objectives, Ls and La; because the policy network depends on the output of the state tracker, the inference state tracker and the inference policy network are optimized first, and the remaining modules are then optimized simultaneously, where

[Loss formulas rendered as images in the original filing (FDA0004126833400000061, FDA0004126833400000062, FDA0004126833400000063).]

wherein E[·] denotes expectation and KL(·||·) denotes the Kullback–Leibler divergence; At denotes the action the physician should take in the current round; St denotes the state output for the current round; St-1 denotes the state tracked in the previous round; further image-rendered symbols (FDA0004126833400000064–FDA0004126833400000067) denote the output probability distribution of the inference state tracker in the previous round of dialogue, the output probability distribution of the inference state tracker in the current round of dialogue, the prior distribution of At, and the reply generator;

the first stage minimizes Ls to improve the model's state-tracking performance, and the second stage minimizes Ls + La to maintain the state-tracking effect while training the model's policy-learning ability.
6. The semi-supervised multi-round medical dialogue reply generation system according to claim 5, characterized in that the inference state tracker and the inference policy network are both encoder-decoder structures, and the prior state tracker and the prior policy network are both encoder-decoder structures.

7. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the steps of the semi-supervised multi-round medical dialogue reply generation method according to any one of claims 1-4.

8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the semi-supervised multi-round medical dialogue reply generation method according to any one of claims 1-4.
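
For readers prototyping the two-stage cascaded training recited in claim 1 (and mirrored in claim 5), the sketch below shows only the optimization schedule. It assumes PyTorch-style optimizers and differentiable `loss_s` / `loss_a` callables computing Ls and La on a batch of unlabeled dialogues; the exact loss definitions (image-rendered formulas in the filing) are not reproduced, and the epoch counts are placeholders.

```python
def two_stage_cascaded_training(unlabeled_batches, loss_s, loss_a,
                                optimizer_inference, optimizer_remaining,
                                stage1_epochs=1, stage2_epochs=1):
    """Two-stage schedule splitting the unsupervised objective L_un into L_s and L_a."""
    # Stage 1: optimize only the inference state tracker and the inference
    # policy network (parameters held by optimizer_inference) by minimizing
    # L_s, because the policy network depends on the state tracker's output.
    for _ in range(stage1_epochs):
        for batch in unlabeled_batches:
            optimizer_inference.zero_grad()
            loss_s(batch).backward()
            optimizer_inference.step()

    # Stage 2: optimize the remaining modules (parameters held by
    # optimizer_remaining) with L_s + L_a, which keeps the state-tracking
    # behaviour while training the policy-learning ability.
    for _ in range(stage2_epochs):
        for batch in unlabeled_batches:
            optimizer_remaining.zero_grad()
            (loss_s(batch) + loss_a(batch)).backward()
            optimizer_remaining.step()
```
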
CN202110577272.8A 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system Active CN113436752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110577272.8A CN113436752B (en) 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110577272.8A CN113436752B (en) 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system

Publications (2)

Publication Number Publication Date
CN113436752A CN113436752A (en) 2021-09-24
CN113436752B true CN113436752B (en) 2023-04-28

Family

ID=77802906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110577272.8A Active CN113436752B (en) 2021-05-26 2021-05-26 Semi-supervised multi-round medical dialogue reply generation method and system

Country Status (1)

Country Link
CN (1) CN113436752B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111710150A (en) * 2020-05-14 2020-09-25 国网江苏省电力有限公司南京供电分公司 A method for detecting abnormal electricity consumption data based on adversarial self-encoding network
CN111797220A (en) * 2020-07-30 2020-10-20 腾讯科技(深圳)有限公司 Dialog generation method and device, computer equipment and storage medium
CN111897941A (en) * 2020-08-14 2020-11-06 腾讯科技(深圳)有限公司 Dialog generation method, network training method, device, storage medium and equipment
CN112464645A (en) * 2020-10-30 2021-03-09 中国电力科学研究院有限公司 Semi-supervised learning method, system, equipment, storage medium and semantic analysis method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309275B (en) * 2018-03-15 2024-06-14 北京京东尚科信息技术有限公司 Dialog generation method and device
CN109582767B (en) * 2018-11-21 2024-05-17 北京京东尚科信息技术有限公司 Dialogue system processing method, device, equipment and readable storage medium
CN109977212B (en) * 2019-03-28 2020-11-24 清华大学深圳研究生院 Reply content generation method of conversation robot and terminal equipment
CN109992657B (en) * 2019-04-03 2021-03-30 浙江大学 A Conversational Question Generation Method Based on Enhanced Dynamic Reasoning
CN109933661B (en) * 2019-04-03 2020-12-18 上海乐言信息科技有限公司 Semi-supervised question-answer pair induction method and system based on deep generation model
CN110297895B (en) * 2019-05-24 2021-09-17 山东大学 Dialogue method and system based on free text knowledge
CN110321417B (en) * 2019-05-30 2021-06-11 山东大学 Dialog generation method, system, readable storage medium and computer equipment
CN111428483B (en) * 2020-03-31 2022-05-24 华为技术有限公司 Voice interaction method, device and terminal device
CN111767383B (en) * 2020-07-03 2022-07-08 思必驰科技股份有限公司 Conversation state tracking method, system and man-machine conversation method
CN112164476A (en) * 2020-09-28 2021-01-01 华南理工大学 A method for generating medical consultation dialogue based on multitasking and knowledge guidance
CN112289467B (en) * 2020-11-17 2022-08-02 中山大学 Low-resource scene migratable medical inquiry dialogue system and method


Also Published As

Publication number Publication date
CN113436752A (en) 2021-09-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant