CN113436752B

CN113436752B - Semi-supervised multi-round medical dialogue reply generation method and system

Info

Publication number: CN113436752B
Application number: CN202110577272.8A
Authority: CN
Inventors: 任昭春; 任鹏杰; 陈竹敏; 李冬冬; 马军
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2023-04-28
Anticipated expiration: 2041-05-26
Also published as: CN113436752A

Abstract

The invention belongs to the field of conversational information processing, and provides a semi-supervised multi-round medical conversation reply generation method and system. The method comprises the steps of inputting the problems of a patient in a first round of dialogue into a semi-supervised medical dialogue model to obtain replies of the first round of dialogue; in the second round and the later conversations, inputting the problems of the current round of patients and the replies of the previous round of conversations into a semi-supervised medical conversation model to obtain replies of the corresponding round of conversations until the patients have no new problem input; the semi-supervised medical dialogue model comprises a context encoder, a priori state tracker, an inference strategy state tracker, a priori strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the information into the priori state tracker and the priori strategy network, the priori state tracker is used for continuously tracking the physical state of a user, the priori strategy network is used for generating a doctor action, and the reply generator is used for generating a corresponding reply according to the physical state and the doctor action.

Description

Semi-supervised multi-round medical dialogue reply generation method and system

Technical Field

The invention belongs to the field of conversational information processing, and particularly relates to a semi-supervised multi-round medical conversation reply generation method and system.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

At the same time, session paradigms are used to relate people to information in order to address information needs in open areas and professional needs in highly vertical areas. Existing dialog systems can be divided into two main categories: task oriented and open domain dialog systems. Task oriented dialog systems are intended to help people accomplish specific tasks. Such as scheduling, ordering restaurants, querying weather. Open domain dialog systems are mainly chatting with people to meet the needs of people for information and entertainment. Unlike medical questions and answers, conversations in real medical scenes are more likely to involve multiple rounds of interaction. Because the patient needs to express his/her symptoms, the medication he/she is taking, and his/her medical history by the context of the conversation. This feature makes explicit state tracking indispensable, providing more indicative and interpretable information than hidden state representations. Taking into account the specificity of medical dialogs, medical reasoning capabilities (e.g., whether to prescribe a drug, what to prescribe a drug to treat a disease, what to ask for symptoms) are also an indispensable feature in medical diagnostics.

Existing medical dialog methods are constructed based on task-oriented dialog paradigms, following which the patient expresses symptoms, and the dialog system returns a paradigm of diagnostic results (i.e., determining what disease the patient suffers from). It achieves good effect. However, these methods focus on diagnosing only a single field, and cannot meet various requirements of patients in practical applications, and require a large number of states and actions to be manually marked. When dialogue data is highly confidential or data-large, it is not possible to achieve, and these works are limited by the size of training data, even if the reply cannot be generated using a generative method, it can only be composed by means of templates. Some task-based dialog methods can be applied to state tracking in medical dialogs, but they still cannot cope with situations where there is insufficient annotation data. In order to alleviate the requirement of the task oriented dialog system for data annotation, jin and Zhang and the like both use semi-supervised learning methods for state tracking, but neglect the reasoning ability of the dialog body, i.e. the actions of the unmodeled physician. Liang et al propose a method of training specific modules in a task oriented dialog system using incompletely labeled data, but cannot infer unlabeled labels at the training time, resulting in limited improvement in the medical dialog system with simultaneous stateless and action labeling. The inventors have found that none of these methods allow for retrieval from large scale medical knowledge, fail to generate knowledge-rich replies, and perform poorly in situations where medical conversations are a strong requirement for reasoning.

Disclosure of Invention

In order to solve the technical problems in the background art, the invention provides a semi-supervised multi-round medical dialogue reply generation method and system, which simultaneously consider the state of a patient and the action of a doctor, so that the dialogue system has the capability of modeling the physical state of the user and medical reasoning.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

a first aspect of the present invention provides a semi-supervised multi-round medical session reply generation method.

A semi-supervised multi-round medical session reply generation method, comprising:

inputting the problems of the patient in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;

in the second round and the later conversations, inputting the problems of the current round of patients and the replies of the previous round of conversations into a semi-supervised medical conversation model to obtain replies of the corresponding round of conversations until the patients have no new problem input;

the semi-supervised medical dialogue model comprises a context encoder, a priori state tracker, an inference state tracker, a priori strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the information into the priori state tracker and the priori strategy network, the priori state tracker is used for continuously tracking the physical state of a user, the priori strategy network is used for generating corresponding actions of a doctor, and the reply generator is used for generating corresponding replies according to the physical state and the actions of the doctor;

the reasoning state tracker is used for reasoning the physical state of the user, and the reasoning strategy network is used for reasoning the action of the doctor; the inference state tracker and the inference policy network are only executed during the training phase of the semi-supervised medical dialogue model.

A second aspect of the invention provides a semi-supervised multi-round medical session reply generation system.

A semi-supervised multi-round medical session reply generation system, comprising:

the first-round dialogue reply generation module is used for inputting the problems of the patient in the first-round dialogue into the semi-supervised medical dialogue model to obtain replies of the first-round dialogue;

the second round and the later dialogue reply generation module are used for inputting the problems of the current round of patients and the replies of the previous round of dialogue into the semi-supervised medical dialogue model in the second round and the later dialogue, so as to obtain replies of the corresponding round of dialogue until the patients have no new problem input;

A third aspect of the present invention provides a computer-readable storage medium.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a semi-supervised multi-round medical session reply generation method as described above.

A fourth aspect of the invention provides a computer device.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a semi-supervised multi-round medical session reply generation method as described above when the program is executed.

Compared with the prior art, the invention has the beneficial effects that:

(1) In the second round and the later dialogue, the invention inputs the problems of the current round of patients and the replies of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the replies of the corresponding round of dialogue until the patients have no new problem input, explicitly models the physical state of the user and the actions of doctors, uses text span to express, and improves the capability of the model for modeling the physiological state of the patients and medical reasoning.

(2) The invention takes the physical state of a user and the action of a doctor as hidden variables at the model level, and provides a training method of the model under the condition that intermediate labels (i.e. supervision) exist and intermediate labels (i.e. no supervision) exist. The method greatly reduces the dependence of the dialogue model on the annotation data.

(3) In the process of strategy network learning, the invention uses the tracked patient state to retrieve from the large-scale medical knowledge graph, and the explicit state, action and reasoning path in the medical knowledge graph promote the interpretability of the response generated by the dialogue system.

(4) On model training, the invention provides a two-stage stacked reasoning method, which improves the stability under the condition of less supervision training data.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 (a) is a supervised data training of an embodiment of the present invention;

FIG. 1 (b) is an unsupervised data training of an embodiment of the present invention;

FIG. 1 (c) is a module used in the testing phase of an embodiment of the present invention;

FIG. 2 is a diagram of a method for implementing a medical dialogue system according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a model during training of an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Term interpretation:

an Encoder-Decoder (Encoder-Decoder) is a neural network structure, which is used to encode a word sequence, then decode it, and transform it into another word sequence, and is mainly used for machine translation and dialogue system.

Coding (encoding): the word sequence is represented as a continuous vector.

Decoding (decoding): a continuous vector is represented as a target sequence.

It is desirable (Expectation) that the results each time possible in the test are multiplied by the sum of the probabilities of the results, expressed in terms of E in the present invention.

KL Divergence (KL diversity): is an asymmetry measure of the difference between two probability distributions, the invention adopts KL (& |) i·) the form representation of the compound, the calculation formula is as follows:

where q, p represent two discrete distributions, q (i), and p (i) represent the distributions q, p, respectively, the ith probability value.

Hidden variable (variable): latent variables, or latent variables, represent statistically unobservable random variables as opposed to observed variables.

Training phase (Train): the training phase of the neural network model receives training data as input and continuously adjusts parameters in the neural network model through the training samples.

Test phase (Test): after the neural network model is trained, information such as labels corresponding to input data is output through the trained bible network model parameters in a test stage. Hereafter we will also refer to as deployment phase.

Example 1

The embodiment provides a semi-supervised multi-round medical dialogue reply generation method, which comprises the following steps:

The context encoder is used for encoding the received information; direct coding of the questions of the patient of the first session; for the second round and the following dialogue patient questions and corresponding previous round dialogue replies, the codes form context information and are input into five modules of a priori state tracker, an inference state tracker, a priori strategy network, an inference strategy network and a reply generator.

The input signal to the a priori state tracker is: from the probability distribution of the output of the inference state tracker in the previous dialog

Sampled state instance->

Output probability distribution->

The input signals of the inference state tracker are: from the probability distribution of the output of the inference state tracker in the previous dialog

Sampled state instance->

And physician reply R for the current round _t Output probability distribution

The input signals to the a priori policy network are: from the output probability distribution of the inference state tracker in the current round of dialogue

Sampled state instance->

And an external medical knowledge graph G, outputting probability distribution +.>

The input signals of the inference policy network are: from the output probability distribution of the inference state tracker in the current round of dialogue

Sampled state instance->

And physician reply R for the current round _t Output probability distribution->

The input signal of the reply generator is: the reply generator input is divided into two cases, a training phase and a testing phase (i.e., when deployed), the training phase receiving

From probability distribution->

Sample in sample>

As input; the test phase receives->

From probability distribution->

Sample in sample>

As an input there is provided,outputting dialogue reply information R _t 。

In the actual deployment phase, given the representation of the patient in each dialog wheel, the medical dialog system continuously tracks the physical state of the user by using the prior state tracker, generates corresponding actions of the doctor by using the prior policy network, and finally generates corresponding replies by combining the states and actions sampled from the prior state tracker and the prior policy network, corresponding to the process of fig. 1 (c). The session continues until the patient has no new problem input, i.e., the patient actively ends the current session.

The medical dialog system has two key features: patient status (symptoms, medication, etc.) and physician actions (treatment, diagnosis, etc.). These two features make the medical dialog system more complex than other knowledge-intensive dialog scenarios. Similar to the task oriented dialog system, the medical dialog generation process splits into three phases:

(1) Patient state tracking: for a given dialog history, the dialog system tracks the physical state (state) of the state;

(2) Physician policy learning: given the patient status and the dialogue history, the dialogue system gives the current physician's actions (actions);

(3) Medical reply generation: given the dialog history, the tracked states and the predicted actions, a fluent and accurate natural language reply is given.

For the scene with labeling data, at the t-th round of dialogue, the patient gives a question or describes his own symptoms U _t The post-medical dialogue system receives the reply R of the previous round _t-1 Current wheel problem U _t And the state S tracked by the previous round _t-1 Then output the state S of the current wheel _t Then re-use R _t-1 U _t S _t Output action A to be taken by current round doctor _t Finally, generating a reply R in the form of natural language _t And fed back to the patient. In medical dialog systems, however, there are many cases where the physiological state of the patient and the actions of the physician are not annotated. We regard both states and actions as hidden variables and let the state go through the entire dialog process, taking into accountRepresented by a sequence of words; the same is true of physician actions, i.e., the physician's response may include multiple keywords. In the actual operation, the lengths of the states and actions are set to be fixed lengths of |s| and |a| respectively. And the state has an initial value of'<pad><pad>...<pad>", wherein'<pad>"means a filler word. The details of the State and Action design are as follows:

state design: state is used to record information of the physical state of the user acquired by the dialog system throughout the course of the dialog, which is expressed using one sequence of words, for example, "cold fever cough night sweat.+ -. And which is initialized to" < pad > < pad > ".

The design of action-action is used to represent a summary of physician replies, which also uses a sequence representation, such as "999 common cold granule acute branch syrup.

The semi-supervised medical dialog model includes six modules, a context encoder (context encoder), a priori state tracker (prior state tracker), an inferential state tracker (inference state tracker), a priori policy network (prior policy network), an inferential policy network (inference state tracker), and a reply generator (response generator), respectively. In an entire medical session, which often involves multiple interactions, the following process goes through multiple rounds until the session ends.

Wherein the prior state tracker, the inference state tracker is for patient state tracking, wherein the inference state tracker is only executed during a training phase; the prior strategy network is used for the strategy learning of the doctor, wherein the reasoning strategy network is only executed in a training stage; the reply generator is for medical reply generation. From an unsupervised point of view, i.e. using unsupervised data D ^u The input and output of each module is described with respect to fig. 1 (b).

At round t, the context encoder is a GRU (or LSTM, transducer, bert) based encoder that receives the reply R of the previous round _t-1 Problem U of patient of current wheel _t As input, and output a continuous space vector

To represent the dialog context.

At round t, give the previous round to revert to R _t-1 Problem U of patient of current wheel _t As an input, the context Encoder first uses bi-directional GRU Encoder encoding to derive a representation H of sequence word granularity _t ＝{h _t,1 ,h _t,2 ,…,h _t，M+N And outputs a vector

To represent the dialog context. Wherein M and N each represent R _t-1 and U_t Sequence length.

wherein

R represents _t-1 Word embedding (embedding) of the i-th word, this biglu encoder uses the contextual representation of the last moment +.>

Initialization, attn [17 ]]Indicated is the action operation.

A priori state tracker receiving the context encoder output and the state of the previous time instant

As input, a GRU-based decoder is then used to output a sequence of words, namely +.>

The inference state tracker adopts a structure similar to that of the prior state tracker, but additionally accepts the reply R of the current round _t As input, it outputs a word sequence, i.e

We use +.>

and />

Representing a priori state tracker and inferential state tracker, respectively, generating a probability distribution abbreviated as +.>

and />

Both the a priori state tracker and the inferred state tracker are Encoder-Decoder (Encoder-Decoder) structures. In the case of unsupervised information, the state of all the dialog turns is agnostic and the state of the latter turn needs to be dependent on the state of the former turn as input, so we follow

Sampling to get->

Into an a priori state tracker and an inferred state tracker.

The prior state tracker firstly obtains the sampling

Coding as->

Use->

Initializing a decoder of a priori state tracker, wherein +.>

Is a training parameter. At the ith decoding time, output +.>

Then the sequence is decoded to obtain S _t Is:

wherein MLP denotes a Multi-Layer Perceptron (Multi-Layer Perceptron). And S is the length of the state text span.

The inference state tracker is similar in structure to the a priori state tracker, which will also use the GRU Encoder

Coding as->

In addition to coding R _t Is->

Use of->

Initializing the decoder, wherein->

Is a training parameter. At the ith decoding time, output +.>

Then the sequence is decoded to obtain S _t Is a similar posterior distribution of (1): />

A priori policy network for receiving the output of the context encoder, S of the current round _t And external medical knowledge G as input, then outputting a sequence of words using a GRU-based decoder, i.e

The inference policy network is similar in structure and receives +.>

S _t And additionally receives the current wheel reply R _t As an output, a sequence of words is then output, i.e. +.>

We use

and />

Respectively representing a priori policy network and an inferential policy network, abbreviated as +.>

and />

The prior policy network and the inference policy network are also the structures of the Encoder-Decoder. Wherein the prior policy network is slave

Sampling to get->

Inference policy network from->

Sampling to get->

Before introducing two strategy networks, firstly introducing a knowledge-graph searching operation qsub and a knowledge-graph coding operation RGAT 15]. The state obtained by the qsub from the G using tracking is retrieved from the medical knowledge graph G to obtain a sub graph G _n Extracting all nodes and edges reachable by n-step jump from state as starting point, and connecting all nodes appearing in state to ensure graph G _n Is fully connected. RGAT is a graph coding method which combines the types of edges, and obtains the ebedding representation of the node after multiple propagation, namely, a vector representation on a continuous space. We use

Represents G _n Encoded node representation, wherein |G _n I is G _n The number of nodes in the network.

The prior policy network will use GRU Encoder

Coding as->

Later use +.>

To initialize the decoder, and at the ith decoding time, output +.>

The decoding process comprises two parts, one is generated from the word list, and the other is the knowledge graph G obtained from the retrieval _n Copy of (a) is made.

wherein e_j Represents G _n The j-th node g _j Representation of

The j-th node of the list. Z is Z _A To generate

And (5) copying the regular term. At e _j ＝A _t,i In case I (e) _j ,A _t,i ) =1, otherwise I (e _j ,A _t,i )＝0。

Then A _t The a priori distribution of (c) can be expressed as:

inference policy network uses GRU Encoder coding

Coding as->

Coding R _t Is->

Is used later

Initializing the decoder to output +.>

To strengthen R _t For the effect of the results, for A _t Approximating the posterior distribution we consider only the direct probability of generation./>

The reply generator is a GRU-based decoder that receives the context encoder output

S _t and A_t As input, then output a medical reply R _t . Use->

Representing a reply generator, abbreviated +_>

The reply generator uses only the output of the inference state tracker and the inference policy network during the unsupervised training phase. During the unsupervised training, we follow

and />

Sampling to obtain +.>

and />

Encode it as +.>

And

the decoder of the reply generator is initialized later to +.>

Output at the ith decoding time

Then R is obtained _t The output probability of (2) is:

wherein

Representing the probability generated from the vocabulary, +.>

Representing from->

R _t-1 and U_t And R is the length of the reply.

Training loss functions for supervised and unsupervised data are L respectively ^sup and L^un, wherein L^un The method comprises the following steps:

wherein E represents the desired value of, KL (||·) Table KL divergence (Kullback-Leibler divergence).

Considering the instability that exists in training with a small proportion of supervision data, i.e. the a priori policy network is vulnerable to state errors from errors sampled by a priori state tracker. The invention provides a two-stage cascade reasoning training method, which comprises the following steps of ^un The training parts, respectively, are optimized first, since the policy network depends on the output of the state tracker

And simultaneously optimizing the rest modules to improve the stability in the training process. L (L) ^un Is split intoL ^s and L^a Two training goals:

in the first training phase, minimize L ^s Improving the state tracking performance of the model and minimizing L in the second stage ^s +L ^a To maintain state tracking effects and policy learning ability of the training model. We name this as a two-stage stacked inference training approach.

FIG. 3 is a schematic diagram of a model during training, global_step being an integer for recording the number of training passes.

In a semi-supervised scenario, dialogue data for model training has two parts, supervised and unsupervised data, respectively, we will describe below for the supervised data D ^a And unsupervised data D ^u Is a training method of (a).

(a) For supervision data D ^a

From D ^a Sampling training samples to form small batches (i.e. mini-batch) required by training to obtain data R _t-1 ,U _t ,S _t-1 ,S _t ,A _t ，R _t . The corresponding input data is fed to the above-mentioned 6 modules, corresponding to (a) in fig. 1. Negative Log Likelihood (NLL) Loss was used for training. The actual training loss function is:

(b) For unsupervised data D ^u

From D ^u Sampling training samples to form small batches (i.e. mini-batch) required by training to obtain data R _t - ₁ ,U _t ,R _t . Intermediate annotation data S _t-1 ,S _t ,A _t The marks are absent. We follow from

Sampling to get->

Will be->

Is sent to->

and />

Is a kind of medium. And then, from->

and />

Respectively sampling to obtain->

and />

Respectively as->

and />

Is input to the computer. And then from->

Sampling to get->

Finally->

Binding R _t-1 ,U _t Generating a reply R _t The above process corresponds to (b) in fig. 1. Training loss is L ^un (also optionally L ^s +L ^a As training loss to improve training stability).

For the whole training dataset d= { D ^a ，D ^u Specific training steps are as follows:

step1 assume supervisor data D ^a The proportion of the training data D is alpha (0 is less than or equal to alpha is less than or equal to 1), the random number between 0 and 1 is selected, if the random number is smaller than alpha to Step2, and if the random number is larger than alpha to Step3.

Step2, training a model by using the supervision data, wherein the training loss is L corresponding to the mode (a) ^sup The gradient falls to update the parameters and then goes to Step4.

Step3, training a model by using the supervision data, wherein the training loss is L in a corresponding mode (b) ^un The gradient falls to update the parameters and then goes to Step4.

Step4, judging whether the model converges, if so, turning to Step5, otherwise turning to Step1.

Step5, saving the model weight, and ending training, as shown in fig. 3.

The semi-supervised medical dialogue model is trained using medical dialogue datasets disclosed in the current industry and academia. And sending the sampled supervision data and the sampled non-supervision data into a model, calculating a corresponding loss function, then carrying out gradient descent, and optimizing model parameters.

After model training is completed, the parameters of the model are all fixed, and the inference state tracker and the inference policy network can be discarded. At this point, the model can be applied to the actual dialog scene. As shown in FIG. 2, in a given patient problem input model, the context encoder, the prior state tracker, the prior policy network, and the reply generator work sequentially (at this point the reply generator uses only the prior state tracker output)

And the output of the a priori policy network ∈>

As input), and finally generates a reply back to the user. The dialogue system continuously interacts with the patient, the prior state tracker uses the state of the previous moment as input in each dialogue round, updates the tracked physical state of the patient, waits for a period of time to receive no new problem of the patient, and ends the current session.

Example two

the semi-supervised medical dialogue model comprises a context encoder, a priori state tracker, an inference strategy state tracker, a priori strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the information into the priori state tracker and the priori strategy network, the priori state tracker is used for continuously tracking the physical state of a user, the priori strategy network is used for generating corresponding actions of a doctor, and the reply generator is used for generating corresponding replies according to the physical state and the actions of the doctor;

The modules in this embodiment are in one-to-one correspondence with the steps in the first embodiment, and the implementation process is the same, which is not described here again.

Example III

The present embodiment provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a semi-supervised multi-round medical session reply generation method as described above.

Example IV

The present embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the semi-supervised multi-round medical dialogue reply generation method as described above when executing the program.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random access Memory (Random AccessMemory, RAM), or the like.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of generating a semi-supervised multi-round medical session reply, comprising:

the semi-supervised medical dialogue model comprises a context encoder, a priori state tracker, an inference strategy state tracker, a priori strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the information into the priori state tracker and the priori strategy network, the priori state tracker is used for continuously tracking the physical state of a user, and input signals of the priori strategy network are as follows: a state instance obtained by sampling from the output probability distribution of an inference state tracker in the current round of dialogue and an external medical knowledge graph G;

the decoding process of the prior strategy network comprises two parts, one part is generated from the word list, and the other part is obtained from the retrieved knowledge graph G _n Copy of (a):

wherein ,

representing the dialogue context as a continuous space vector; the a priori policy network uses the GRU self-encoder to apply +.>

Coding as->

Representing +.>

Sampled incoming state realExamples are; MLP represents a multi-layer perceptron, the +.>

Representing the output of the prior strategy network at the ith decoding moment; e, e _j Represents G _n The j-th node g _j Representation->

Word embedding of the j-th node in the list; z is Z _A Generating a regularized item of the copy; at e _j ＝A _t,i In the case of (a), I (e _j ,A _t,i ) =1, otherwise I (e _j ,A _t,i )＝0；

The prior strategy network is used for generating doctor actions and outputting probability distribution

Where |a| represents the length of the action;

the reply generator is used for generating a corresponding reply according to the physical state and the action of the doctor;

the reasoning state tracker is used for reasoning the physical state of the user, and the reasoning strategy network is used for reasoning the action of the doctor; the inference state tracker and the inference policy network are only executed during the training phase of the semi-supervised medical dialogue model;

during the unsupervised training process, from

and />

Sampling to obtain +.>

and />

Encode it as +.>

and />

Later, the decoder of the reply generator is initialized to +.>

Output +.>

Then R is obtained _t The output probability of (2) is:

wherein ,

representing the probability generated from the vocabulary, +.>

Representing from->

R _t-1 and U_t The probability of the copy of the file, |R| is the length of the reply; />

Representing the probability distribution->

An instance obtained by sampling; r is R _t Physician replies indicating the current round; r is R _t-1 Representing a physician reply of a previous round; u (U) _t Representing a current wheel problem; />

Representing the output probability distribution of the inference state tracker in the current round of dialogue; />

Representing the output probability distribution of the inference strategy network in the current round of dialogue;

training loss function L of unsupervised data according to two-stage stacked reasoning training method ^un Splitting into L ^s and L^a The two training targets, because the strategy network depends on the output of the state tracker, firstly optimize the reasoning state tracker and the reasoning strategy network, and then optimize the rest modules at the same time;

wherein ,

wherein E represents the desired value of, KL (||·) table KL divergence (Kullback-Leibler divergence); a is that _t Representing the action that the current round of physician should take; s is S _t Representing the state of the output current wheel; s is S _t-1 Representing the state tracked by the previous round;

representing the output probability distribution of the inference state tracker in the previous dialog; />

Representation A _t Is a priori distributed of (a);

a representation reply generator;

first stage minimizing L ^s Improving the state tracking performance of the model and minimizing L in the second stage ^s +L ^a To maintain state tracking effects and policy learning ability of the training model.

2. The semi-supervised multi-round medical session reply generation method of claim 1, wherein the inference state tracker and the inference policy network are both encoder-decoder structures.

3. The semi-supervised multi-round medical session reply generation method of claim 1, wherein the a priori state tracker and the a priori policy network are both encoder-decoder structures.

4. The semi-supervised, multi-round medical session reply generation method of claim 1, wherein the reply generator is a GRU-based decoder.

5. A semi-supervised multi-round medical session reply generation system, comprising:

wherein ,

Coding as->

Representing +.>

Sampled state instances; MLP represents a multi-layer perceptron, the +.>

Where |a| represents the length of the action;

during the unsupervised training process, from

and />

Sampling to obtain +.>

and />

Encode it as +.>

and />

Later, the decoder of the reply generator is initialized to +.>

Output +.>

Then R is obtained _t The output probability of (2) is:

wherein ,

representing the probability generated from the vocabulary, +.>

Representing from->

Representing the probability distribution->

wherein ,

Representation A _t Is a priori distributed of (a);

a representation reply generator;

6. The semi-supervised multi-round medical session reply generation system of claim 5, wherein the inference state tracker and the inference policy network are both encoder-decoder structures; both the a priori state tracker and the a priori policy network are encoder-decoder structures.

7. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps in the semi-supervised multi-round medical session reply generation method of any of claims 1-4.

8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the semi-supervised multi-round medical session reply generation method of any of claims 1-4.