CN113436752A - Semi-supervised multi-round medical dialogue reply generation method and system - Google Patents
- Publication number: CN113436752A (application number CN202110577272.8A)
- Authority
- CN
- China
- Prior art keywords
- round
- semi
- supervised
- reply
- medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H80/00—ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Abstract
The invention belongs to the field of conversational information processing, and provides a semi-supervised multi-round medical dialogue reply generation method and system. The patient's question in the first round of dialogue is input into a semi-supervised medical dialogue model to obtain the reply of the first round; in the second and subsequent rounds, the patient's question in the current round and the reply of the previous round are input into the semi-supervised medical dialogue model to obtain the reply of the corresponding round, until the patient has no new question to input. The semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a reply generator. The context encoder encodes the received information and feeds it to the prior state tracker and the prior policy network; the prior state tracker continuously tracks the user's body state; the prior policy network generates a doctor action; and the reply generator generates the corresponding reply according to the body state and the doctor action.
Description
Technical Field
The invention belongs to the field of conversational information processing, and particularly relates to a semi-supervised multi-round medical conversation reply generation method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
To address both the information needs of the open world and the professional needs of highly vertical domains, a conversational paradigm is used to connect people with information. Existing dialogue systems can be divided into two main categories: task-oriented and open-domain dialogue systems. Task-oriented dialogue systems are intended to help people accomplish specific tasks, such as scheduling, booking restaurants, or querying the weather. Open-domain dialogue systems are mainly used to chat with people and meet their needs for information and entertainment. Unlike medical question answering, a conversation in a real medical scenario usually involves multiple rounds of interaction, as the patient needs to express his or her symptoms, current medications, and medical history over the course of the conversation. This property makes explicit state tracking indispensable, since it provides more indicative and interpretable information than hidden state representations. Given the specificity of medical conversation, medical reasoning ability (e.g., whether to prescribe drugs, which drugs to prescribe for a disease, which symptoms to ask about) is also an indispensable characteristic of medical diagnosis.
Existing medical dialogue methods are built on the task-oriented dialogue paradigm, following a pattern in which the patient expresses symptoms and the dialogue system returns a diagnosis (i.e., determines what disease the patient has), which works well within its scope. However, these methods focus on the single task of diagnosis, which cannot meet the multiple needs of patients in practical applications, and they require a large number of manually labeled states and actions. This is infeasible when the dialogue data is highly confidential or very large in scale; such work is limited by the size of the training data, and in some cases generative methods cannot be used at all, so replies can only be composed from templates. Some task-oriented dialogue methods can be applied to state tracking in medical dialogues, but they still do not address scenarios without sufficient annotated data. To alleviate the task-oriented dialogue system's requirement for data annotation, Jin et al. and Zhang et al. both use semi-supervised learning for state tracking, but ignore the reasoning ability of the dialogue agent, i.e., they do not model the physician's actions. Liang et al. propose a method to train specific modules of a task-oriented dialogue system with incompletely labeled data, but it cannot infer the missing labels during training, which limits its benefit when a medical dialogue system has neither state nor action annotations. The inventors have found that none of these methods can retrieve from large-scale medical knowledge, generate knowledge-rich replies, or perform well in scenarios with a strong need for reasoning ability, such as medical dialogues.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a semi-supervised multi-round medical dialogue reply generation method and system, which simultaneously consider the patient state and the doctor action, so that the dialogue system has the capabilities of modeling the body state of a user and medical reasoning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a first aspect of the invention provides a semi-supervised multi-round medical dialog reply generation method.
A semi-supervised, multi-round medical dialog reply generation method, comprising:
inputting the patient's question in the first round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;
in the second round of dialogue and subsequent rounds, inputting the patient's question in the current round and the reply of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the corresponding round of dialogue, until the patient has no new question to input;
the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the encoded information into the prior state tracker and the prior strategy network, the prior state tracker is used for continuously tracking the body state of a user, the prior strategy network is used for generating corresponding actions of a doctor, and the reply generator is used for generating corresponding replies according to the body state and the actions of the doctor;
the reasoning state tracker is used for reasoning the physical state of the user, and the reasoning strategy network is used for reasoning the actions of the doctor; the inference state tracker and the inference policy network are only executed during the training phase of the semi-supervised medical dialogue model.
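As a structural illustration only, the train-time/test-time split of the six modules described above can be sketched as follows; the class and method names are assumptions for illustration, not the patent's implementation:

```python
from dataclasses import dataclass

@dataclass
class SemiSupervisedMedicalDialogModel:
    """Illustrative layout of the six modules; inference (posterior)
    modules are executed only during the training phase."""
    training: bool = True
    modules: tuple = (
        "context_encoder",
        "prior_state_tracker",
        "inference_state_tracker",    # training only
        "prior_policy_network",
        "inference_policy_network",   # training only
        "reply_generator",
    )

    def active_modules(self):
        # At deployment, drop the two inference modules.
        if self.training:
            return list(self.modules)
        return [m for m in self.modules if not m.startswith("inference")]
```

At deployment, only the prior state tracker, prior policy network, and reply generator remain on the reply-generation path, which matches the claim that the inference modules run only during training.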
A second aspect of the invention provides a semi-supervised, multi-round medical session reply generation system.
A semi-supervised, multi-round medical session reply generation system, comprising:
a first-round dialogue reply generation module, used for inputting the patient's question in the first round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;
a second-and-subsequent-round dialogue reply generation module, used for inputting, in the second round of dialogue and subsequent rounds, the patient's question in the current round and the reply of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the corresponding round of dialogue, until the patient has no new question to input;
the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the encoded information into the prior state tracker and the prior strategy network, the prior state tracker is used for continuously tracking the body state of a user, the prior strategy network is used for generating corresponding actions of a doctor, and the reply generator is used for generating corresponding replies according to the body state and the actions of the doctor;
the reasoning state tracker is used for reasoning the physical state of the user, and the reasoning strategy network is used for reasoning the actions of the doctor; the inference state tracker and the inference policy network are only executed during the training phase of the semi-supervised medical dialogue model.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the semi-supervised multi-round medical dialog reply generation method as described above.
A fourth aspect of the invention provides a computer apparatus.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps in the semi-supervised multiple round medical session reply generation method as described above.
Compared with the prior art, the invention has the beneficial effects that:
(1) In the second and subsequent rounds of dialogue, the patient's question in the current round and the reply of the previous round are input into the semi-supervised medical dialogue model to obtain the reply of the corresponding round, until the patient has no new question to input. The user's body state and the doctor's actions are explicitly modeled and represented as text spans, which improves the model's ability to model the patient's physiological state and to perform medical reasoning.
(2) At the model level, the invention treats the user's body state and the doctor's action as hidden variables, and provides training methods for the model both when intermediate labels exist (i.e., supervised) and when they do not (i.e., unsupervised). This greatly reduces the dialogue model's dependence on annotated data.
(3) The invention proposes that, during policy-network learning, the tracked patient state be used to retrieve from a large-scale medical knowledge graph; the explicit states, actions, and reasoning paths in the medical knowledge graph improve the interpretability of the replies generated by the dialogue system.
(4) In model training, the invention provides a two-stage stacking reasoning method, which improves the stability under the condition of less supervised training data.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and together with the description serve to explain the invention and not to limit the invention.
FIG. 1(a) illustrates training with supervised data according to an embodiment of the present invention;
FIG. 1(b) illustrates training with unsupervised data according to an embodiment of the present invention;
FIG. 1(c) is a block diagram of the modules used in the testing phase according to an embodiment of the present invention;
FIG. 2 illustrates a method for implementing the medical dialog system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a model in a training process according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Interpretation of terms:
Encoder-Decoder: a neural network structure that encodes a word sequence and decodes it into another word sequence; mainly used for machine translation, dialogue systems, etc.
Encoding (encoding): the sequence of words is represented as a continuous vector.
Decoding (decoding): a continuous vector is converted into the target word sequence.
Expectation (expectation): the sum, over each possible outcome of an experiment, of that outcome multiplied by its probability; denoted E[·] in the present invention.
KL Divergence (KL divergence): an asymmetric measure of the difference between two probability distributions, denoted KL(·‖·), computed as:

KL(q‖p) = Σ_i q(i) · log( q(i) / p(i) )

where q and p denote two discrete distributions, and q(i) and p(i) denote the probability of the i-th item under q and p, respectively.
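A minimal sketch of this discrete KL computation (the small `eps` smoothing term is an assumption added for numerical safety; it is not part of the formula):

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) = sum_i q(i) * log(q(i) / p(i)) for discrete distributions
    q and p given as aligned lists of probabilities."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))
```

Note the asymmetry: in general kl_divergence(q, p) differs from kl_divergence(p, q), which is why the measure is called asymmetric.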
Latent variable (latent variable): also called a hidden variable; a random variable that is not directly observable, as opposed to an observed variable.
Training phase (Train): the training stage of the neural network model receives training data as input, and parameters in the neural network model are continuously adjusted through training samples.
Test phase (Test): after the neural network model is trained, in the testing phase the trained parameters of the neural network model are used to output the label information and the like corresponding to the input data. Hereafter this is also referred to as the deployment phase.
Example one
The embodiment provides a semi-supervised multi-round medical dialogue reply generation method, which comprises the following steps:
inputting the patient's question in the first round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;
in the second round of dialogue and subsequent rounds, inputting the patient's question in the current round and the reply of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the corresponding round of dialogue, until the patient has no new question to input;
the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior strategy network, an inference strategy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the encoded information into the prior state tracker and the prior strategy network, the prior state tracker is used for continuously tracking the body state of a user, the prior strategy network is used for generating corresponding actions of a doctor, and the reply generator is used for generating corresponding replies according to the body state and the actions of the doctor;
the reasoning state tracker is used for reasoning the physical state of the user, and the reasoning strategy network is used for reasoning the actions of the doctor; the inference state tracker and the inference policy network are only executed during the training phase of the semi-supervised medical dialogue model.
The context encoder is used for encoding the received information. The patient's question in the first round of dialogue is encoded directly; for the patient's questions in the second and subsequent rounds, together with the corresponding replies of the previous round, the encoding forms context information, which is fed into five modules: the prior state tracker, the inference state tracker, the prior policy network, the inference policy network, and the reply generator.
The input signals of the prior state tracker are: a state instance sampled from the output probability distribution q(S_{t-1}) of the inference state tracker in the previous dialogue round; it outputs the probability distribution p(S_t).
The input signals of the inference state tracker are: a state instance sampled from the output probability distribution q(S_{t-1}) of the inference state tracker in the previous dialogue round, and the physician's reply R_t of the current round; it outputs the probability distribution q(S_t).
The input signals of the prior policy network are: a state instance sampled from the output probability distribution q(S_t) of the inference state tracker in the current dialogue round, and the external medical knowledge graph G; it outputs the probability distribution p(A_t).
The input signals of the inference policy network are: a state instance sampled from the output probability distribution q(S_t) of the inference state tracker in the current dialogue round, and the physician's reply R_t of the current round; it outputs the probability distribution q(A_t).
The input signals of the reply generator are divided into two cases, the training phase and the testing phase (i.e., deployment). In the training phase it receives the dialogue context together with state and action instances sampled from the posterior distributions q(S_t) and q(A_t); in the testing phase it receives the dialogue context together with instances sampled from the prior distributions p(S_t) and p(A_t). In both cases it outputs the dialogue reply R_t.
In the actual deployment phase, in each dialogue round, given the patient's utterance, the medical dialogue system continuously tracks the user's body state using the prior state tracker and generates the corresponding physician actions using the prior policy network; finally, the reply generator generates the corresponding reply by combining the state and action sampled from the prior state tracker and the prior policy network, corresponding to the process of FIG. 1(c). The dialogue continues until the patient enters no new question, i.e., the patient actively ends the current session.
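The deployment loop can be sketched as follows; `track_state`, `choose_action`, and `generate_reply` are hypothetical stand-ins for the prior state tracker, prior policy network, and reply generator, with trivial keyword logic in place of the neural modules:

```python
def track_state(prev_state, question):
    # Placeholder tracker: collect known symptom keywords from the question.
    known = {"fever", "cough", "cold", "headache"}
    words = [w.strip(".,?!") for w in question.lower().split()]
    found = [w for w in words if w in known and w not in prev_state.split()]
    return (prev_state + " " + " ".join(found)).strip()

def choose_action(state):
    # Placeholder policy: ask for symptoms until the state is non-empty.
    return "prescribe" if state else "ask_symptoms"

def generate_reply(state, action):
    return f"[{action}] state: {state}" if state else f"[{action}]"

def run_dialog(questions):
    state, replies = "", []
    for question in questions:   # loop ends when the patient asks nothing new
        state = track_state(state, question)
        action = choose_action(state)
        replies.append(generate_reply(state, action))
    return replies
```

The point of the sketch is the control flow: per round, the state is updated first, the action is chosen from the state, and the reply is produced from both, exactly the track-then-act-then-generate order described above.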
The medical dialog system has two key features: patient status (symptoms, medications, etc.) and physician actions (treatments, diagnoses, etc.). These two features make the medical dialog system more complex than other knowledge-intensive dialog scenarios. Similar to the task-oriented dialog system, the medical dialog generation process is split into the following three phases:
(1) Patient state tracking: given the dialogue history, the dialogue system tracks the patient's physical state (state);
(2) Physician policy learning: given the patient state and dialogue history, the dialogue system gives the current physician's action (action);
(3) Medical response generation: given the dialogue history, the tracked state, and the predicted action, a fluent and accurate natural language response is produced.
For the scenario with labeled data, in the t-th round of the session, after the patient gives a question or describes his or her own symptoms U_t, the medical dialogue system receives the reply R_{t-1} of the previous round, the question U_t of the current round, and the state S_{t-1} tracked in the previous round, and then outputs the state S_t of the current round; it then uses R_{t-1}, U_t, and S_t to output the action A_t that the current-round physician should take, and finally generates a reply R_t in natural-language form and feeds it back to the patient. In medical dialogue systems, however, there are many cases with no annotation of the patient's physiological state or the physician's actions. We treat both state and action as hidden variables. Since the state runs through the whole dialogue process, we represent it as a sequence of words; the same holds for physician actions, which may include multiple keywords from the physician's reply. In actual operation, the lengths of states and actions are set to fixed lengths |S| and |A|, respectively, and the state is initialized to "<pad> <pad> ... <pad>", where "<pad>" denotes a filler word. The details of the state and action design are as follows:
State design: the state is used to record the information about the user's physical state acquired by the dialogue system throughout the dialogue process. It is represented as a sequence of words, such as "cold fever cough night sweats", and is initialized to "<pad> <pad> ... <pad>".
Action design: an action indicates a summary of the physician's reply, and likewise uses a sequence representation, such as "999 Ganmaoling granule yangtao syrup".
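The fixed-length text-span representation of states and actions described above can be sketched as follows (the helper name `to_fixed_span` is an assumption for illustration):

```python
def to_fixed_span(text, length, pad="<pad>"):
    """Render a state or action as a fixed-length text span:
    truncate to `length` tokens, then pad with filler words."""
    tokens = text.split()[:length]
    return tokens + [pad] * (length - len(tokens))
```

An empty initial state then comes out as a span of pure filler words, matching the "<pad> <pad> ... <pad>" initialization above.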
The semi-supervised medical dialogue model comprises six modules: a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network, and a reply generator (response generator). A complete medical session typically involves multiple interactions, and the following process repeats over multiple rounds until the session is complete.
The prior state tracker and the inference state tracker are used to track the patient's state, the inference state tracker being executed only in the training phase; the prior policy network and the inference policy network are used for physician policy learning, the inference policy network being executed only in the training phase; the reply generator is used for medical reply generation. The following description is mainly from the unsupervised point of view, i.e., using unlabeled data D_u; the input and output of each module are described with reference to FIG. 1(b).
In round t, the context encoder is a GRU-based encoder (an LSTM-, Transformer-, or BERT-based encoder could also be used) that receives the reply R_{t-1} of the previous round and the question U_t of the patient in the current round as input, and outputs a continuous-space vector C_t to represent the context of the conversation.
At the t-th round, given the previous-round reply R_{t-1} and the current-round patient question U_t as input, the context encoder first uses a bidirectional GRU encoder to obtain word-granularity representations H_t = {h_{t,1}, h_{t,2}, ..., h_{t,M+N}}, and outputs a vector C_t to represent the context of the conversation, where M and N denote the lengths of the sequences R_{t-1} and U_t, respectively.
Each input token is represented by its word embedding; the BiGRU encoder is initialized with the context representation of the previous round, and C_t is obtained by applying the attention operation attn [17] over H_t.
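The attention operation attn that pools the per-token representations H_t into a single context vector is not spelled out here; assuming a standard dot-product attention (an assumption, since only the operation name is given), it might look like:

```python
import math

def attn(query, keys):
    """Dot-product attention: softmax over query-key scores, then a
    weighted sum of the key vectors as the pooled context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                           # shift for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
```

With a query strongly aligned to one token's hidden state, the pooled vector concentrates on that token, which is the intended selectivity of the attention pooling.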
The prior state tracker receives the context encoder output and the state of the previous round as input, and then employs a GRU-based decoder to output a sequence of words, i.e., the state S_t. The inference state tracker adopts a structure similar to that of the prior state tracker, but additionally accepts the reply R_t of the current round as input, and likewise outputs a sequence of words. We denote the probability distributions generated by the prior state tracker and the inference state tracker as p(S_t | C_t, S_{t-1}) and q(S_t | C_t, S_{t-1}, R_t), respectively.
Both the prior state tracker and the inference state tracker are encoder-decoder structures. In the unsupervised case, the states of all dialogue rounds are unobserved, and the state of the next round depends on the state of the previous round as input; we therefore sample a state instance from q(S_{t-1}) and feed it to both the prior state tracker and the inference state tracker.
The prior state tracker first encodes the sampled state S_{t-1} into a vector with a GRU encoder and uses it, through trained parameters, to initialize the decoder of the prior state tracker. At the i-th decoding step the decoder outputs a hidden state, from which the i-th state word s_{t,i} is predicted via an MLP; decoding the whole sequence gives the prior distribution of S_t:

p(S_t | C_t, S_{t-1}) = Π_{i=1}^{|S|} p(s_{t,i} | s_{t,<i}, C_t, S_{t-1})

where MLP denotes a multi-layer perceptron and |S| is the length of the state text span.
The inference state tracker is similar in structure to the prior state tracker: it also encodes S_{t-1} with a GRU encoder, additionally encodes R_t, and uses both, through trained parameters, to initialize its decoder. At the i-th decoding step it outputs the i-th state word, giving the approximate posterior distribution of S_t:

q(S_t | C_t, S_{t-1}, R_t) = Π_{i=1}^{|S|} q(s_{t,i} | s_{t,<i}, C_t, S_{t-1}, R_t)
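When state labels are unobserved, a concrete state instance is drawn token by token from the tracker's per-step output distributions before being fed to the next module. A sketch under the assumption of independent categorical sampling at each decoding step (function and variable names are illustrative):

```python
import random

def sample_span(step_dists, vocab, seed=0):
    """Sample one word per decoding step from its categorical distribution
    over `vocab`, yielding a concrete state (or action) text span."""
    rng = random.Random(seed)
    span = []
    for dist in step_dists:            # one distribution per decoding step
        r, acc = rng.random(), 0.0
        chosen = vocab[-1]             # fallback against rounding error
        for word, p in zip(vocab, dist):
            acc += p
            if r < acc:
                chosen = word
                break
        span.append(chosen)
    return span
```

In the real model the per-step distributions are conditioned on previously sampled words; the independence here is a simplification to show the sampling mechanics only.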
The prior policy network receives the context encoder output, the current round's state S_t, and external medical knowledge G as input, and then uses a GRU-based decoder to output a word sequence, i.e., the physician action A_t. The inference policy network has a similar structure: it receives S_t and additionally receives the current round's reply R_t as input, likewise outputting a word sequence. We refer to their generation distributions as the prior distribution p(A_t | ·) and the approximate posterior distribution q(A_t | ·), respectively.
The prior policy network and the inference policy network are also encoder-decoder structures; the prior policy network samples A_t from the prior distribution, and the inference policy network samples A_t from the approximate posterior distribution.
Before introducing the two policy networks, we introduce a knowledge-graph retrieval operation qsub and a knowledge-graph encoding operation RGAT [15]. qsub uses the tracked state to retrieve a subgraph G_n from the medical knowledge graph G: taking the state entities as starting points, it extracts all nodes and edges reachable within n hops, and connects all nodes appearing in the state so that G_n is guaranteed to be connected. RGAT is a graph encoding method that incorporates edge types; after several rounds of propagation it obtains an embedding representation of each node, i.e., a vector representation in a continuous space. We denote the encoded node representations of G_n by {g_1, …, g_L}, where L is the number of nodes in G_n.
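The qsub retrieval is essentially an n-hop expansion from the entities in the tracked state. A sketch follows, under the assumption that the graph is stored as an adjacency map of (relation, neighbor) pairs; the extra edges the patent adds between state entities to keep G_n connected are noted but omitted:

```python
from collections import deque

def qsub(graph, state_entities, n):
    """Retrieve the subgraph G_n: every node reachable from the tracked
    state's entities within n hops, plus the edges among those nodes.
    `graph` maps a node to a list of (relation, neighbor) pairs.
    (The patent additionally links the state entities to each other so
    that G_n is connected; that step is omitted in this sketch.)"""
    reached = set(state_entities)
    frontier = deque((e, 0) for e in state_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:                        # do not expand beyond n hops
            continue
        for _, neighbor in graph.get(node, []):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, depth + 1))
    # keep only edges whose endpoints both fall inside G_n
    edges = [(u, rel, v) for u in reached
             for rel, v in graph.get(u, []) if v in reached]
    return reached, edges
```

The medical triples below are illustrative examples, not data from the patent.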
The prior policy network encodes S_t into a vector with a GRU encoder and uses it, together with trainable parameters, to initialize its decoder, outputting a hidden state at the i-th decoding step. The decoding process comprises two parts: one generates words from the vocabulary, and the other copies from the retrieved knowledge subgraph G_n.
where e_j denotes the j-th node in G_n, g_j denotes the embedding of the j-th node, and Z_A is the normalization term over the generation and copy scores. The indicator I(e_j, A_{t,i}) equals 1 when e_j = A_{t,i} and 0 otherwise. The prior distribution of A_t can then be expressed as:

p(A_t | S_t, R_{t-1}, U_t, G_n) = ∏_{i=1}^{|A|} p(a_{t,i} | a_{t,<i}, S_t, R_{t-1}, U_t, G_n), where each token probability mixes generation from the vocabulary with copying of nodes e_j from G_n.
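The generate-or-copy decoding can be sketched as mixing a vocabulary distribution with copy scores over the nodes of G_n. The gate `p_copy_gate` and the mapping `node_to_vocab` are illustrative stand-ins for the patent's normalization term Z_A and indicator I(e_j, A_{t,i}):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_or_copy(h_dec, W_vocab, node_embs, node_to_vocab, p_copy_gate):
    """Mix a vocabulary-generation distribution with a copy distribution
    over knowledge-subgraph nodes. `node_to_vocab[j]` gives the vocabulary
    id of node e_j, mirroring the indicator I(e_j, A_{t,i})."""
    p_gen = softmax(W_vocab @ h_dec)              # generate from vocabulary
    copy_scores = softmax(node_embs @ h_dec)      # score each node embedding g_j
    p = (1.0 - p_copy_gate) * p_gen
    for j, vocab_id in enumerate(node_to_vocab):  # add copy mass onto vocab ids
        p[vocab_id] += p_copy_gate * copy_scores[j]
    return p
```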
The inference policy network encodes S_t with a GRU encoder, additionally encodes R_t, and uses both encodings to initialize its decoder, outputting a hidden state at the i-th decoding step. To strengthen the influence of R_t on the result, the approximate posterior distribution of A_t considers only the direct generation probability (without copying).
The reply generator is a GRU-based decoder that receives the context encoder output, S_t, and A_t as input and then outputs the medical reply R_t.
During the unsupervised training phase, the reply generator uses only the outputs of the inference state tracker and the inference policy network. In unsupervised training, we sample S_t and A_t from their approximate posterior distributions, encode them into vectors, use these vectors to initialize the decoder of the reply generator, and output a hidden state at the i-th decoding step. The output probability of R_t is then:

p(R_t | S_t, A_t, R_{t-1}, U_t) = ∏_{i=1}^{|R|} [ p_gen(r_{t,i}) + p_copy(r_{t,i}) ]

where p_gen denotes the probability of generating the token from the vocabulary, p_copy denotes the probability of copying it from R_{t-1} and U_t, and |R| is the length of the reply.
The training loss functions for supervised and unsupervised data are L_sup and L_un, where L_un is:

L_un = -E_{q(S_t) q(A_t)}[log p(R_t | S_t, A_t, R_{t-1}, U_t)] + KL(q(S_t) || p(S_t)) + KL(q(A_t) || p(A_t))
where E[·] denotes expectation and KL(· || ·) denotes the KL divergence.
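The KL terms appearing in L_un can be illustrated for categorical distributions; this is a generic sketch, not the patent's loss code, which operates on the tracker and policy distributions:

```python
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two categorical distributions, as used to pull the
    prior tracker/policy distributions toward the posterior ones."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    # eps guards against log(0) for zero-probability entries
    return float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))
```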
Considering the instability of training when the proportion of supervised data is small, namely that the prior policy network is easily misled by erroneous states sampled from the prior state tracker, the invention provides a two-stage stacked inference training method that splits L_un into several training parts. Since the policy network depends on the output of the state tracker, the state tracker is optimized first, and the remaining modules are then optimized together, improving stability during training. L_un is split into two training objectives, L_s and L_a:
in the first training phase, L is minimizedsImproving model state tracking performance, minimizing L in the second stages+LaTo maintain the state tracking effect and the strategy learning ability of the training model. We name it as a two-stage stacked inference training method.
Fig. 3 is a schematic diagram of the model during training, where global_step is an integer that records the number of training steps.
In the semi-supervised scenario, the dialogue data for model training consists of a supervised part and an unsupervised part; below we separately describe the training methods for the supervised data D_a and the unsupervised data D_u.
(a) For the supervised data D_a:
Training samples are drawn from D_a to form the mini-batches required for training, yielding the data R_{t-1}, U_t, S_{t-1}, S_t, A_t, R_t. The corresponding inputs are fed into the six modules above, corresponding to (a) in Fig. 1. Training uses the negative log-likelihood (NLL) loss. The actual training loss function is:
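The NLL loss over a labelled target sequence can be sketched as the mean negative log-probability of each gold token; this is a generic token-level NLL, and how the patent weights the per-module terms in L_sup is not specified here:

```python
import numpy as np

def nll_loss(probs, targets, eps=1e-12):
    """Mean negative log-likelihood of the target token at each step.
    `probs[i]` is the model's output distribution at decoding step i,
    `targets[i]` is the index of the gold token at that step."""
    picked = np.array([probs[i][t] for i, t in enumerate(targets)])
    return float(-np.mean(np.log(picked + eps)))
```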
(b) For the unsupervised data D_u:
Training samples are drawn from D_u to form the mini-batches required for training, yielding the data R_{t-1}, U_t, R_t; the intermediate annotations S_{t-1}, S_t, A_t are unobserved. We first sample S_{t-1} from the previous round's state distribution and feed it into the prior state tracker and the inference state tracker. Next, S_t is sampled from the prior and approximate posterior distributions, respectively, and serves as input to the prior policy network and the inference policy network. A_t is then sampled from the inference policy network, and finally S_t and A_t are combined with R_{t-1} and U_t to generate the reply R_t. This procedure corresponds to (b) in Fig. 1. The training loss is L_un (L_s + L_a may also be chosen as the training loss to improve training stability).
For the entire training dataset D = {D_a, D_u}, the specific training steps are as follows:
step1 hypothesis supervisory data DaThe proportion of the total training data D is alpha (alpha is more than or equal to 0 and less than or equal to 1), random numbers between 0 and 1 are selected, if the random numbers are less than alpha to Step2, and if the random numbers are more than alpha to Step 3.
Step 2: Train the model on supervised data, corresponding to mode (a), with training loss L_sup; perform gradient descent, update the parameters, and go to Step 4.
Step 3: Train the model on unsupervised data, corresponding to mode (b), with training loss L_un; perform gradient descent, update the parameters, and go to Step 4.
Step 4: Judge whether the model has converged; if so, go to Step 5, otherwise return to Step 1.
Step 5: Save the model weights and end training, as shown in Fig. 3.
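Steps 1-5 can be sketched as the following loop; the `supervised_step`/`unsupervised_step` callbacks and the convergence check are placeholders for the actual gradient updates:

```python
import random

def train(alpha, supervised_step, unsupervised_step, converged,
          max_steps=100_000, seed=0):
    """Semi-supervised training loop: each iteration draws a random number
    in [0, 1); below alpha it takes a supervised step (loss L_sup),
    otherwise an unsupervised step (loss L_un), until the convergence
    check passes (Step 4), at which point the weights would be saved
    (Step 5). Returns the step at which training stopped."""
    rng = random.Random(seed)
    for step in range(max_steps):
        if rng.random() < alpha:     # Step 2: supervised mini-batch
            supervised_step()
        else:                        # Step 3: unsupervised mini-batch
            unsupervised_step()
        if converged(step):          # Step 4: convergence check
            return step              # Step 5: save model weights here
    return max_steps
```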
The semi-supervised medical dialogue model is trained on medical dialogue datasets currently published by industry and academia: the sampled supervised and unsupervised data are fed into the model, the corresponding loss functions are computed, gradient descent is performed, and the model parameters are optimized.
After model training is completed, all model parameters are fixed, and the inference state tracker and the inference policy network can be discarded. The model can then be applied to real dialogue scenarios. As shown in Fig. 2, given the patient question as input, the context encoder, the prior state tracker, the prior policy network, and the reply generator run in sequence (at this point the reply generator uses only the output of the prior state tracker and the output of the prior policy network as input), and finally a reply is generated and returned to the user. The dialogue system interacts with the patient continuously: in each dialogue round, the prior state tracker takes the state of the previous round as input and then updates the tracked patient state; if no new patient question is received after a period of time, the current dialogue ends.
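The deployment-time flow of one dialogue round can be sketched with the four retained modules as placeholder callables; this is a structural sketch only, with the real modules being the trained networks:

```python
def respond(question, prev_reply, prev_state,
            context_encoder, prior_state_tracker, prior_policy,
            reply_generator):
    """One dialogue round at deployment time: the inference state tracker
    and inference policy network have been discarded, so only the prior
    path runs: encode context -> track patient state -> choose physician
    action -> generate the reply."""
    ctx = context_encoder(prev_reply, question)
    state = prior_state_tracker(ctx, prev_state)   # updated patient state S_t
    action = prior_policy(ctx, state)              # physician action A_t
    reply = reply_generator(ctx, state, action)    # medical reply R_t
    return reply, state
```

At each round the returned `state` is carried over as `prev_state`, matching the patent's description of the prior state tracker consuming the previous round's state.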
Example two
A semi-supervised multi-round medical dialogue reply generation system, comprising:
the first-round dialogue reply generation module is used for inputting the patient's question in the first round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;
the second-round and subsequent dialogue reply generation module is used for inputting, in the second and subsequent rounds, the patient's question of the current round and the reply of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the corresponding round of dialogue, until the patient inputs no new question;
the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a reply generator, wherein the context encoder is used for encoding received information and inputting the encoded information into the prior state tracker and the prior policy network, the prior state tracker is used for continuously tracking the physical state of the user, the prior policy network is used for generating the corresponding doctor action, and the reply generator is used for generating the corresponding reply according to the physical state and the doctor action;
the inference state tracker is used for inferring the physical state of the user, and the inference policy network is used for inferring the doctor action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model.
Each module in this embodiment corresponds one-to-one to a step in the first embodiment; the specific implementation process is the same and will not be repeated here.
Example three
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the semi-supervised multi-round medical dialogue reply generation method described above.
Example four
The present embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the semi-supervised multi-round medical dialogue reply generation method described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A semi-supervised multi-round medical dialogue reply generation method is characterized by comprising the following steps:
inputting the patient's question in the first round of dialogue into a semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;
in the second and subsequent rounds of dialogue, inputting the patient's question of the current round and the reply of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the corresponding round of dialogue, until the patient inputs no new question;
wherein the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a reply generator, the context encoder is used for encoding received information and inputting the encoded information into the prior state tracker and the prior policy network, the prior state tracker is used for continuously tracking the physical state of the user, the prior policy network is used for generating the doctor action, and the reply generator is used for generating the corresponding reply according to the physical state and the doctor action;
the inference state tracker is used for inferring the physical state of the user, and the inference policy network is used for inferring the doctor action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model.
2. The semi-supervised, multi-round medical dialog reply generation method of claim 1, wherein the inference state tracker and the inference policy network are both encoder-decoder architectures.
3. The semi-supervised multiple round of medical dialog reply generation method of claim 1, wherein the a priori state tracker and the a priori policy network are both encoder-decoder structures.
4. The semi-supervised multi-round medical dialogue reply generation method of claim 1, wherein the reply generator is a GRU-based decoder.
5. The semi-supervised, multi-round medical dialog reply generation method of claim 1, wherein the semi-supervised medical dialog model is trained using supervised and unsupervised data.
6. The semi-supervised multi-round medical dialogue reply generation method of claim 5, wherein the training loss function for unsupervised data is split into two training objectives, L_s and L_a; in a first stage L_s is minimized to improve the model's state-tracking performance, and in a second stage L_s + L_a is minimized to maintain the state-tracking effect while training the model's policy-learning ability.
7. A semi-supervised, multi-round medical dialog reply generation system, comprising:
the first-round dialogue reply generation module is used for inputting the patient's question in the first round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the first round of dialogue;
the second-round and subsequent dialogue reply generation module is used for inputting, in the second and subsequent rounds, the patient's question of the current round and the reply of the previous round of dialogue into the semi-supervised medical dialogue model to obtain the reply of the corresponding round of dialogue, until the patient inputs no new question;
wherein the semi-supervised medical dialogue model comprises a context encoder, a prior state tracker, an inference state tracker, a prior policy network, an inference policy network and a reply generator, the context encoder is used for encoding received information and inputting the encoded information into the prior state tracker and the prior policy network, the prior state tracker is used for continuously tracking the physical state of the user, the prior policy network is used for generating the corresponding doctor action, and the reply generator is used for generating the corresponding reply according to the physical state and the doctor action;
the inference state tracker is used for inferring the physical state of the user, and the inference policy network is used for inferring the doctor action; the inference state tracker and the inference policy network are executed only during the training phase of the semi-supervised medical dialogue model.
8. The semi-supervised, multi-round medical dialog reply generation system of claim 7, wherein the inference state tracker and the inference policy network are both encoder-decoder architectures; both the a priori state tracker and the a priori policy network are encoder-decoder structures.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the semi-supervised multi-round medical dialog reply generation method of any one of claims 1-6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the semi-supervised multi-round medical dialogue reply generation method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110577272.8A CN113436752B (en) | 2021-05-26 | 2021-05-26 | Semi-supervised multi-round medical dialogue reply generation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113436752A true CN113436752A (en) | 2021-09-24 |
CN113436752B CN113436752B (en) | 2023-04-28 |
Family
ID=77802906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110577272.8A Active CN113436752B (en) | 2021-05-26 | 2021-05-26 | Semi-supervised multi-round medical dialogue reply generation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113436752B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109582767A (en) * | 2018-11-21 | 2019-04-05 | 北京京东尚科信息技术有限公司 | Conversational system processing method, device, equipment and readable storage medium storing program for executing |
CN109933661A (en) * | 2019-04-03 | 2019-06-25 | 上海乐言信息科技有限公司 | It is a kind of that the semi-supervised question and answer of model are generated to inductive method and system based on depth |
CN109977212A (en) * | 2019-03-28 | 2019-07-05 | 清华大学深圳研究生院 | Talk with the reply content generation method and terminal device of robot |
CN109992657A (en) * | 2019-04-03 | 2019-07-09 | 浙江大学 | A kind of interactive problem generation method based on reinforcing Dynamic Inference |
CN110297895A (en) * | 2019-05-24 | 2019-10-01 | 山东大学 | A kind of dialogue method and system based on free text knowledge |
CN110309275A (en) * | 2018-03-15 | 2019-10-08 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus that dialogue generates |
CN110321417A (en) * | 2019-05-30 | 2019-10-11 | 山东大学 | A kind of dialogue generation method, system, readable storage medium storing program for executing and computer equipment |
CN111428483A (en) * | 2020-03-31 | 2020-07-17 | 华为技术有限公司 | Voice interaction method and device and terminal equipment |
CN111710150A (en) * | 2020-05-14 | 2020-09-25 | 国网江苏省电力有限公司南京供电分公司 | Abnormal electricity consumption data detection method based on countermeasure self-coding network |
CN111767383A (en) * | 2020-07-03 | 2020-10-13 | 苏州思必驰信息科技有限公司 | Conversation state tracking method, system and man-machine conversation method |
CN111797220A (en) * | 2020-07-30 | 2020-10-20 | 腾讯科技(深圳)有限公司 | Dialog generation method and device, computer equipment and storage medium |
CN111897941A (en) * | 2020-08-14 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Dialog generation method, network training method, device, storage medium and equipment |
CN112164476A (en) * | 2020-09-28 | 2021-01-01 | 华南理工大学 | Medical consultation conversation generation method based on multitask and knowledge guidance |
CN112289467A (en) * | 2020-11-17 | 2021-01-29 | 中山大学 | Low-resource scene migratable medical inquiry dialogue system and method |
CN112464645A (en) * | 2020-10-30 | 2021-03-09 | 中国电力科学研究院有限公司 | Semi-supervised learning method, system, equipment, storage medium and semantic analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN113436752B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109033068B (en) | Method and device for reading and understanding based on attention mechanism and electronic equipment | |
Liu et al. | Tree-structured decoding for solving math word problems | |
JP7068296B2 (en) | Deep neural network model for processing data through multiple language task hierarchies | |
KR20230128492A (en) | Explainable Transducers Transducers | |
CN113254610B (en) | Multi-round conversation generation method for patent consultation | |
CN116415654A (en) | Data processing method and related equipment | |
Su et al. | Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network. | |
CN114841122A (en) | Text extraction method combining entity identification and relationship extraction, storage medium and terminal | |
CN111985205A (en) | Aspect level emotion classification model | |
CN114528898A (en) | Scene graph modification based on natural language commands | |
Dai et al. | A survey on dialog management: Recent advances and challenges | |
Gulyaev et al. | Goal-oriented multi-task bert-based dialogue state tracker | |
Magalhães et al. | Creating deep neural networks for text classification tasks using grammar genetic programming | |
Lin et al. | A survey on compositional generalization in applications | |
Garg et al. | Can language models capture graph semantics? from graphs to language model and vice-versa | |
CN112463935A (en) | Open domain dialogue generation method and model with strong generalized knowledge selection | |
CN112364659A (en) | Unsupervised semantic representation automatic identification method and unsupervised semantic representation automatic identification device | |
Lee et al. | Crossmodal clustered contrastive learning: Grounding of spoken language to gesture | |
CN113436752A (en) | Semi-supervised multi-round medical dialogue reply generation method and system | |
Tang et al. | Improving mild cognitive impairment prediction via reinforcement learning and dialogue simulation | |
Singh et al. | A neural architecture search for automated multimodal learning | |
CN113723079A (en) | Method for hierarchical modeling contribution-aware context for long-distance dialog state tracking | |
CN113486180A (en) | Remote supervision relation extraction method and system based on relation hierarchy interaction | |
Shi et al. | Deep Modular Co-Attention Shifting Network for Multimodal Sentiment Analysis | |
CN113742467B (en) | Method and device for generating dialogue state of hierarchical selection slot phase context |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||