CN116383364A

CN116383364A - Medical question-answering reply method and system based on doctor feedback and reinforcement learning

Info

Publication number: CN116383364A
Application number: CN202310600962.XA
Authority: CN
Inventors: 王振宇
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2023-05-26
Filing date: 2023-05-26
Publication date: 2023-07-04
Anticipated expiration: 2043-05-26
Also published as: CN116383364B

Abstract

The invention discloses a medical question-answering reply method based on doctor feedback and reinforcement learning, which belongs to the interdisciplinary field of natural language processing and medicine and health. Doctor feedback and reinforcement learning further improve the professional accuracy and empathy of the model, giving it a more professional medical question-answering capability and replies that are both humanized and professionally accurate. The method comprises the following steps: S1, pre-train on a large-scale medical document data set to obtain a Chinese medical text large model; S2, perform supervised fine-tuning on the Chinese medical text large model with a large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model; S3, input patient questions into the Chinese medical question-answering reference model to generate multiple machine replies, have a doctor annotate medical feedback on these replies, and train an automatic medical reply effect evaluation model; and S4, obtain a medical question-answer reply generation model by further training with a reinforcement learning method.

Description

Medical question-answering reply method and system based on doctor feedback and reinforcement learning

Technical Field

The invention relates to the interdisciplinary field of natural language processing and medicine and health, and in particular to a medical question-answering reply method based on doctor feedback and reinforcement learning. The invention also relates to a system implementing the method.

Background

The field of medical question answering currently has the following problems:

Before ChatGPT, methods for generating medical question-answer replies fell mainly into two types: retrieval-based methods and generative methods. However, the content that retrieval-based methods can search is limited, while the models used by traditional generative methods are small and cannot understand or respond to complex human needs. For example, when a patient describes a condition in non-professional terms, additional models are needed to help produce the answer.

Since the release of ChatGPT, it has achieved excellent results on many natural language understanding and generation tasks and on a large number of other downstream tasks, and in practical use it is regarded as understanding people's needs well and showing empathy. However, we find that it does not yet answer medical questions well: the medical field is special and answers require professional expertise, whereas ChatGPT incorporates feedback only from ordinary people and lacks some medical knowledge. A more intelligent medical question-answering method is therefore needed to solve these problems.

Disclosure of Invention

The invention aims to provide a medical question-answering reply method based on doctor feedback and reinforcement learning that further improves the professional accuracy and empathy of the model through doctor feedback and reinforcement learning, so that the model has a more professional medical question-answering capability and produces humanized, professionally accurate replies.

Another object of the invention is to provide a medical question-answering reply system based on doctor feedback and reinforcement learning that can generate medical question-answer replies with high professional accuracy.

The first technical solution adopted by the invention is as follows:

A medical question-answering reply method based on doctor feedback and reinforcement learning comprises the following steps:

S1, pre-train on a large-scale medical document data set to obtain a Chinese medical text large model;

S2, further perform supervised fine-tuning on the Chinese medical text large model obtained in step S1 using a large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model;

S3, input patient questions into the Chinese medical question-answering reference model obtained in step S2 to generate multiple machine replies, have a doctor annotate medical feedback on these replies, and train an automatic medical reply effect evaluation model based on the doctor's medical feedback annotations;

S4, obtain a medical question-answer reply generation model by further training with a reinforcement learning method.

Further, step S1 comprises the following steps:

S1.1, collect medical documents in the medical field, organize them into medical texts, and integrate the medical texts into a large-scale medical document data set;

S1.2, input each medical text of the medical document data set obtained in step S1.1 into a medical input embedding layer one at a time, and preprocess it into a medical text matrix;

S1.3, input the preprocessed medical text matrix into an Encoder consisting of a stack of Encoder layers, which outputs a medical contextual semantic matrix of the same size, and create a historical medical reply matrix;

S1.4, input the medical contextual semantic matrix and the historical medical reply matrix obtained in step S1.3 into a Decoder consisting of a stack of Decoder layers, and add the medical reply vector output by the Decoder to the historical medical reply matrix;

S1.5, pass each medical reply vector generated by the decoder through an output layer to output medical words in real time, finally forming the medical text output;

S1.6, combine the four parts of steps S1.2 to S1.5 (the medical input embedding layer, the encoder, the decoder and the output layer) to obtain the Chinese medical text large model.
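
As an illustration of how steps S1.3 to S1.5 fit together at generation time, the loop below encodes once and then decodes step by step, appending each medical reply vector to the historical medical reply matrix. It is a minimal sketch under stated assumptions: the `decoder` and `output_layer` callables, the `start` vector, the `<eos>` end token and `max_len` are hypothetical placeholders, and the toy stand-ins at the bottom exist only to make the control flow runnable.

```python
from typing import Callable

Vector = list[float]

def generate(context: list[Vector], start: Vector,
             decoder: Callable[[list[Vector], list[Vector]], list[Vector]],
             output_layer: Callable[[Vector], str],
             end_token: str = "<eos>", max_len: int = 20) -> list[str]:
    """Autoregressive decoding loop for steps S1.3 to S1.5 (illustrative)."""
    history = [start]                        # S1.3: create the historical medical reply matrix
    words: list[str] = []
    for _ in range(max_len):
        hidden = decoder(context, history)   # S1.4: run the Decoder stack
        reply_vec = hidden[-1]               # last row = medical reply vector
        history.append(reply_vec)            # S1.4: add it to the history matrix
        word = output_layer(reply_vec)       # S1.5: map the vector to a medical word
        if word == end_token:
            break
        words.append(word)
    return words

# Hypothetical toy stand-ins: the "decoder" shifts every history row by the
# context value, and the "output layer" looks the result up in a tiny table.
def toy_decoder(ctx: list[Vector], hist: list[Vector]) -> list[Vector]:
    return [[v + ctx[0][0] for v in row] for row in hist]

def toy_output(vec: Vector) -> str:
    return {1: "fever", 2: "cough"}.get(round(vec[0]), "<eos>")

print(generate([[1.0]], [0.0], toy_decoder, toy_output))  # ['fever', 'cough']
```

The `max_len` cap simply guarantees termination when the output layer never emits the end token.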

Further, the medical input embedding layer in step S1.2 comprises a medical word embedding layer, a linear transformation layer and a relative position embedding layer: the medical word embedding layer converts each medical word into a medical word vector by looking it up in a medical vocabulary, and the medical word vectors are stacked row by row into a medical text matrix; the linear transformation layer shortens each row of the medical text matrix through a linear transformation; and the relative position embedding layer adds the relative position information learned by the model to each row of the medical text matrix.

Further, in step S1.3, each Encoder layer of the Encoder outputs a hidden layer matrix of the same size as its input; each Encoder layer comprises two sub-layers, a multi-head attention mechanism layer and a fully connected layer, and the output of each sub-layer has a residual connection and is layer-normalized.

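Further, each attention head of the multi-head attention mechanism layer of the Encoder layer contains three matrices to be learned, $W^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W^K \in \mathbb{R}^{d_{model} \times d_k}$ and $W^V \in \mathbb{R}^{d_{model} \times d_v}$, where $d_{model}$ is the length of each row of the medical text matrix, and $d_k$ and $d_v$ are the dimensions of each row vector of the medical text matrix after linear transformation by $W^Q$ or $W^K$, and by $W^V$, respectively. The input medical text matrix is multiplied by $W^Q$, $W^K$ and $W^V$ in turn to obtain the three matrices Q, K and V, on which the output operation is performed to obtain the head output $\mathrm{Attention}(Q, K, V)$.

The output operation formula is:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V;$$

where $K^{T}$ is the transpose of $K$, and $\mathrm{softmax}$ is the normalized exponential function.

The outputs of the multiple attention heads are concatenated to obtain the output of the multi-head attention mechanism layer, calculated as:

$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i), \quad i = 1, \dots, h$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}$$

where $h$ is the number of attention heads; $Q_i$, $K_i$ and $V_i$ are the Q, K and V matrices of the $i$-th attention head; $\mathrm{head}_i$ is the output matrix of the $i$-th attention head, of size $n \times d_v$; $\mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)$ is the concatenated output matrix of the $h$ attention heads, of size $n \times h d_v$; $n$ is the number of words; $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ is a matrix to be learned; $\mathrm{Concat}$ is the horizontal concatenation function; and $\mathrm{MultiHead}(Q, K, V)$ is the output of the multi-head attention mechanism layer.

The fully connected layer is a two-layer fully connected neural network, and its output $\mathrm{FFN}(X)$ is calculated as:

$$\mathrm{FFN}(X) = \max(0,\, XW_1 + b_1)\,W_2 + b_2$$

where $X$ is the input matrix; $W_1 \in \mathbb{R}^{d_{model} \times d_{ff}}$, $b_1 \in \mathbb{R}^{d_{ff}}$, $W_2 \in \mathbb{R}^{d_{ff} \times d_{model}}$ and $b_2 \in \mathbb{R}^{d_{model}}$ are parameters of the model to be trained; and $d_{ff}$ is the number of neurons in the middle hidden layer.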
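
A minimal pure-Python sketch of the attention formulas above: per-head projections $XW^Q$, $XW^K$, $XW^V$, scaled dot-product attention, horizontal concatenation and the $W^O$ projection. The matrix sizes, the random initialisation and the helper names (`matmul`, `attention`, `multi_head`) are illustrative assumptions, not values from the invention; a real model would use a tensor library.

```python
import math
import random

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's maximum for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single head: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def multi_head(x: Matrix, heads: list[tuple[Matrix, Matrix, Matrix]], w_o: Matrix) -> Matrix:
    """Concat(head_1, ..., head_h) W^O, where head_i uses its own (W^Q, W^K, W^V)."""
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv)) for wq, wk, wv in heads]
    # horizontal concatenation: row i of the result joins row i of every head output
    concat = [[val for out in outputs for val in out[i]] for i in range(len(x))]
    return matmul(concat, w_o)

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d_model, h = 4, 8, 2            # illustrative sizes
d_k = d_v = d_model // h
x = rand_matrix(n, d_model, rng)   # stands in for a medical text matrix
heads = [(rand_matrix(d_model, d_k, rng), rand_matrix(d_model, d_k, rng),
          rand_matrix(d_model, d_v, rng)) for _ in range(h)]
w_o = rand_matrix(h * d_v, d_model, rng)
y = multi_head(x, heads, w_o)
print(len(y), len(y[0]))  # 4 8: same size as the input
```

Because $W^O$ maps the $h d_v$ concatenated columns back to $d_{model}$, the layer output has the same size as its input, which is what allows Encoder layers to be stacked.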

Further, step S2 comprises the following steps:

S2.1, collect question-answer data sets in the medical field and integrate them into a large-scale medical question-answering dialogue data set;

S2.2, based on the large-scale medical question-answering dialogue data set, perform supervised fine-tuning on the pre-trained Chinese medical text large model to obtain a Chinese medical question-answering reference model.

Further, step S3 comprises the following steps:

S3.1, based on the Chinese medical question-answering reference model, for a given input patient question $x$, generate $K$ machine replies $\{y_1, y_2, \dots, y_K\}$, where $y_m$ is the $m$-th machine reply;

S3.2, the doctor ranks the $K$ machine replies $\{y_1, \dots, y_K\}$ by quality to obtain the ordered machine replies $y_{(1)} \succ y_{(2)} \succ \dots \succ y_{(K)}$; the ordered replies are split by the combination method into $C_K^2$ doctor feedback comparison pairs $(y_w, y_l)$, each of which is combined with the patient question $x$ to obtain a doctor feedback comparison pair containing the patient question, $(x, y_w, y_l)$;
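
The split of one doctor ranking into $C_K^2$ comparison pairs can be sketched with `itertools.combinations`, which preserves input order, so each pair comes out as $(x, y_w, y_l)$ with the preferred reply first. The function name and reply identifiers are hypothetical.

```python
from itertools import combinations
from math import comb

def comparison_pairs(question: str, ranked_replies: list[str]) -> list[tuple[str, str, str]]:
    """Turn one doctor ranking (best first) into (x, y_w, y_l) comparison pairs.

    combinations() keeps the input order, so in every pair the first reply was
    ranked higher by the doctor (y_w) and the second lower (y_l).
    """
    return [(question, better, worse) for better, worse in combinations(ranked_replies, 2)]

ranked = ["reply_B", "reply_A", "reply_D", "reply_C"]  # hypothetical ranking, K = 4
pairs = comparison_pairs("patient question x", ranked)
print(len(pairs), comb(len(ranked), 2))  # 6 6
print(pairs[0])  # ('patient question x', 'reply_B', 'reply_A')
```

Repeating this for $N$ patient questions (step S3.3) yields $N \cdot C_K^2$ pairs in total.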

S3.3, repeat steps S3.1 to S3.2 for $N$ patient questions to obtain $N \cdot C_K^2$ doctor feedback comparison pairs containing the patient questions, which together form the doctor feedback data set $D$;

S3.4, train an automatic medical reply effect evaluation model based on the doctor feedback data set; the loss function of the automatic medical reply effect evaluation model is calculated as:

$$\mathrm{loss}(\theta) = -\frac{1}{C_K^2}\, \mathbb{E}_{(x, y_w, y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right];$$

where $r_\theta(x, y)$ is the scalar score output by the automatic medical reply effect evaluation model with parameters $\theta$ for patient question $x$ and a single machine reply $y$; $\sigma$ is the sigmoid function; and $D$ denotes the doctor feedback data set.
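
The loss above can be evaluated for one patient question as follows, assuming the scores $r_\theta(x, y_w)$ and $r_\theta(x, y_l)$ have already been computed. Averaging over that question's $C_K^2$ pairs supplies the $1/C_K^2$ factor; averaging the result over questions would then approximate the expectation over $D$. The function names are illustrative, and `log_sigmoid` is written in a numerically stable form.

```python
import math

def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def ranking_loss(scored_pairs: list[tuple[float, float]]) -> float:
    """Pairwise loss for ONE patient question.

    scored_pairs holds (r_theta(x, y_w), r_theta(x, y_l)) for all C(K, 2) pairs of
    that question, so dividing by len(scored_pairs) applies the 1 / C(K, 2) factor.
    """
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in scored_pairs) / len(scored_pairs)

print(ranking_loss([(2.0, 2.0)]))                                # log 2: equal scores
print(ranking_loss([(3.0, 0.0)]) < ranking_loss([(0.0, 3.0)]))   # True
```

The loss falls as the model scores the doctor-preferred reply further above the other one, which is how the evaluation model learns to reproduce the doctors' ordering.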

Further, step S4 comprises the following steps:

S4.1, back up the Chinese medical question-answering reference model obtained in step S2 and name the backup the RL model;

S4.2, randomly sample a patient question $x$ from the large-scale medical question-answering dialogue data set and use the RL model to generate a machine reply $y$;

S4.3, take the sampled patient question $x$ and its corresponding machine reply $y$ as input to the automatic medical reply effect evaluation model obtained in step S3 to obtain its output score $r_\theta(x, y)$;

S4.4, based on a reinforcement learning algorithm, update the parameters of the RL model using the score $r_\theta(x, y)$ output by the automatic evaluation model in step S4.3;

S4.5, repeat steps S4.2 to S4.4 multiple times; the final RL model obtained is the trained medical question-answer reply generation model.
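
The patent does not name the reinforcement learning algorithm used in step S4.4. As a toy-scale illustration of the S4.1 to S4.5 loop only, the sketch below swaps in plain REINFORCE on a tabular policy (InstructGPT-style pipelines usually use PPO instead). Everything in it is hypothetical: each question has three fixed candidate replies, the `REWARD` table stands in for the evaluation model $r_\theta$, and the reward is reduced by a KL penalty $\beta \log(\pi^{RL}/\pi^{SFT})$ in the spirit of the first term of the S4.4 objective; the pre-training term is omitted.

```python
import math
import random

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy setup: each patient question has three fixed candidate replies.
# REWARD[q][y] stands in for the score r_theta(x, y) of the step-S3 evaluation model.
REWARD = {"q1": [1.0, 0.2, -0.5], "q2": [-0.3, 0.8, 0.1]}
# Frozen reference (SFT) policy: uniform over the candidates.
SFT_LOGITS = {q: [0.0, 0.0, 0.0] for q in REWARD}

def train_rl(steps: int = 2000, lr: float = 0.05, beta: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    # S4.1: the RL model starts as a copy of the reference model
    rl_logits = {q: list(v) for q, v in SFT_LOGITS.items()}
    questions = list(REWARD)
    for _ in range(steps):                                    # S4.5: repeat S4.2 to S4.4
        q = rng.choice(questions)                             # S4.2: sample a patient question
        probs = softmax(rl_logits[q])
        y = rng.choices(range(len(probs)), weights=probs)[0]  # ... and a machine reply
        ref = softmax(SFT_LOGITS[q])
        # S4.3: score the reply, then subtract the KL penalty towards the reference model
        shaped = REWARD[q][y] - beta * (math.log(probs[y]) - math.log(ref[y]))
        # S4.4: REINFORCE step; d log pi(y) / d logit_i = 1[i == y] - probs[i]
        for i in range(len(probs)):
            rl_logits[q][i] += lr * shaped * ((1.0 if i == y else 0.0) - probs[i])
    return rl_logits

final = train_rl()
for q, logits in final.items():
    print(q, [round(p, 3) for p in softmax(logits)])
```

After training, each question's probability mass shifts towards its highest-scoring reply, while the KL term keeps the policy from drifting arbitrarily far from the reference model.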

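Further, in step S4.4, the objective function $\mathrm{objective}(\phi)$ of the reinforcement learning algorithm is:

$$\mathrm{objective}(\phi) = \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right] + \gamma\, \mathbb{E}_{x \sim D_{pretrain}}\left[\log \pi_\phi^{RL}(x)\right]$$

where $\pi_\phi^{RL}$ is the RL model with parameters $\phi$; $D_{\pi_\phi^{RL}}$ is the data distribution for reinforcement learning; $\pi^{SFT}$ is the supervised fine-tuned model; $D_{pretrain}$ is the distribution of the training data used in pre-training; $\beta$ is the KL reward coefficient; and $\gamma$ is the pre-training loss coefficient.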
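
Given sampled quantities, the objective above can be estimated by Monte Carlo averages. The sketch assumes each RL sample already carries its reward $r_\theta(x, y)$ together with the log-probabilities of $y$ under $\pi_\phi^{RL}$ and $\pi^{SFT}$, and that $\log \pi_\phi^{RL}(x)$ has been computed for a batch of pre-training texts; the function and argument names are hypothetical.

```python
def rlhf_objective(rl_samples: list[tuple[float, float, float]],
                   pretrain_logprobs: list[float],
                   beta: float, gamma: float) -> float:
    """Monte Carlo estimate of objective(phi).

    rl_samples: (r_theta(x, y), log pi_RL(y|x), log pi_SFT(y|x)) for (x, y) drawn with the RL model.
    pretrain_logprobs: log pi_RL(x) for texts x drawn from the pre-training distribution.
    """
    rl_term = sum(r - beta * (lp_rl - lp_sft) for r, lp_rl, lp_sft in rl_samples) / len(rl_samples)
    pretrain_term = sum(pretrain_logprobs) / len(pretrain_logprobs)
    return rl_term + gamma * pretrain_term

print(rlhf_objective([(1.0, -2.0, -2.5)], [-3.0], beta=0.1, gamma=0.5))  # approximately -0.55
```

Setting $\beta = \gamma = 0$ reduces the estimate to the mean reward, which makes the roles of the KL penalty and the pre-training term easy to isolate.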

The second technical solution adopted by the invention is as follows:

A medical question-answering reply system based on doctor feedback and reinforcement learning, comprising:

a large-scale medical document data set module: used to collect documents in the medical field, organize them into medical texts, and integrate the medical texts into a medical document data set;

a Chinese medical text large model module: used to store the Chinese medical text large model;

a Chinese medical question-answering reference model module: used to store the Chinese medical question-answering reference model;

an automatic medical reply effect evaluation model module: used to evaluate the trained Chinese medical question-answering reference model;

a model training module: used to train the medical question-answer reply generation model according to the large-scale medical document data set module, the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module;

a medical question-answer reply generation model module: used to store the trained medical question-answer reply generation model;

the automatic medical reply effect evaluation model module is connected to the Chinese medical question-answering reference model module and to the model training module, and the model training module is also connected to the Chinese medical question-answering reference model module and to the medical question-answer reply generation model module.

Compared with the prior art, the invention has the following beneficial effects:

1. In the medical question-answering reply method based on doctor feedback and reinforcement learning, a large-scale medical document data set is used for pre-training to obtain a Chinese medical text large model; the Chinese medical text large model is further supervised fine-tuned on a large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model; patient questions are input into the reference model to generate multiple machine replies, a doctor annotates medical feedback on these replies, and an automatic medical reply effect evaluation model is trained on the doctor's feedback annotations; finally, a medical question-answer reply generation model is obtained by further training with a reinforcement learning method. Large-scale medical document pre-training followed by large-scale medical dialogue fine-tuning gives the reference model a more professional medical question-answering capability, and doctor feedback combined with reinforcement learning further improves the model's professional accuracy and empathy. This addresses the mechanical, non-professional replies of traditional medical question-answer reply generation methods and yields replies that are humanized and professionally accurate.

2. In the medical question-answering reply system based on doctor feedback and reinforcement learning, the model training module trains on the large-scale medical document data set module to obtain the Chinese medical text large model module, and supervised fine-tuning of the Chinese medical text large model module yields the Chinese medical question-answering reference model module; the automatic medical reply effect evaluation model module is trained using the Chinese medical question-answering reference model module; the model training module then trains the medical question-answer reply generation model according to the reference model module and the automatic medical reply effect evaluation model module and stores it in the medical question-answer reply generation model module, which can therefore generate medical question-answer replies with high professional accuracy.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a schematic flow diagram of the steps of the method of the invention;

FIG. 2 is a schematic diagram of the model architecture of the Chinese medical text large model in the method of the invention;

FIG. 3 is a schematic diagram of the Encoder layer structure of the Chinese medical text large model in the method of the invention;

FIG. 4 is a schematic diagram of the Decoder layer structure of the Chinese medical text large model in the method of the invention;

FIG. 5 is a block diagram of the module connections of the system of the invention.

Description of the embodiments

The technical solution of the invention is described in further detail below with reference to specific embodiments, but the invention is not limited thereto.

As shown in FIGS. 1 to 4, the medical question-answering reply method based on doctor feedback and reinforcement learning of the invention comprises the following steps:

S1, pre-train on a large-scale medical document data set to obtain a Chinese medical text large model.

Further, step S1 comprises the following steps:

S1.1, collect medical documents in the medical field, organize the collected documents into medical texts, and integrate the medical texts into a large-scale medical document data set, in which each sample is one medical text.

S1.2, input each medical text of the medical document data set obtained in step S1.1 into the medical input embedding layer one at a time, and preprocess it into a medical text matrix.

The medical input embedding layer comprises a medical word embedding layer, a linear transformation layer and a relative position embedding layer. The medical word embedding layer converts each medical word into a medical word vector by looking it up in a medical vocabulary, and the medical word vectors are stacked row by row into the medical text matrix; the linear transformation layer shortens each row of the medical text matrix through a linear transformation to reduce the amount of computation; and the relative position embedding layer adds the relative position information learned by the model to each row of the medical text matrix, representing the relative positions of the words in the medical text.

S1.3, input the preprocessed medical text matrix into an Encoder consisting of a stack of Encoder layers, which outputs a medical contextual semantic matrix of the same size, and create a historical medical reply matrix.

Each Encoder layer outputs a hidden layer matrix of the same size as its input. Each Encoder layer comprises two sub-layers, a multi-head attention mechanism layer and a fully connected layer; each sub-layer has a residual connection around it, and the residual sum is then layer-normalized, as shown in FIG. 3.

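Each attention head of the multi-head attention mechanism layer of the Encoder layer contains three matrices to be learned, $W^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W^K \in \mathbb{R}^{d_{model} \times d_k}$ and $W^V \in \mathbb{R}^{d_{model} \times d_v}$, where $d_{model}$ is the length of each row of the medical text matrix, and $d_k$ and $d_v$ are the dimensions of each row vector of the medical text matrix after linear transformation by $W^Q$ or $W^K$, and by $W^V$, respectively; generally $d_k = d_v = d_{model}/h$ is chosen. The input medical text matrix is multiplied by $W^Q$, $W^K$ and $W^V$ in turn to obtain the matrices Q, K and V:

$$Q = XW^Q, \quad K = XW^K, \quad V = XW^V$$

where $X \in \mathbb{R}^{n \times d_{model}}$ is the input matrix and $n$ is the number of words input at one time.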

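The output operation is then performed to obtain the output of a single attention head, $\mathrm{Attention}(Q, K, V)$. The output operation formula is:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V;$$

where $K^{T}$ is the transpose of $K$, and $\mathrm{softmax}$ is the normalized exponential function, a mathematical function whose input-output relationship is:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $e$ is the base of the natural logarithm, $z$ denotes each row vector of the input matrix, and $z_i$ is the $i$-th element of $z$.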
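
The softmax definition translates directly into code, but evaluating $e^{z_i}$ literally overflows for large scores. Because softmax is unchanged when the same constant is subtracted from every $z_j$, implementations commonly subtract $\max_j z_j$ first; the sketch below contrasts the two forms (function names are illustrative).

```python
import math

def softmax_naive(z: list[float]) -> list[float]:
    """Direct transcription of e^{z_i} / sum_j e^{z_j}."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(z: list[float]) -> list[float]:
    """Same function, computed after subtracting max(z) from every element."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

row = [1.0, 2.0, 3.0]
print(softmax_naive(row))
print(softmax_stable(row))  # same values up to rounding
try:
    softmax_naive([1000.0, 1001.0])
except OverflowError:
    print("naive version overflows")
print(softmax_stable([1000.0, 1001.0]))  # identical to softmax of [0.0, 1.0]
```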

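The outputs of the multiple attention heads are concatenated to obtain the output of the multi-head attention mechanism layer:

$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i), \quad i = 1, \dots, h$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}$$

where $h$ is the number of attention heads; $Q_i$, $K_i$ and $V_i$ are the Q, K and V matrices of the $i$-th attention head; $\mathrm{head}_i$ is the output matrix of the $i$-th attention head, of size $n \times d_v$; $\mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)$ is the concatenated output matrix of the $h$ attention heads, of size $n \times h d_v$; $n$ is the number of words; $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ is a matrix to be learned; $\mathrm{Concat}$ is the horizontal concatenation function; and $\mathrm{MultiHead}(Q, K, V)$ is the output of the multi-head attention mechanism layer.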

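The fully connected layer is a two-layer fully connected neural network, and its output $\mathrm{FFN}(X)$ is calculated as:

$$\mathrm{FFN}(X) = \max(0,\, XW_1 + b_1)\,W_2 + b_2$$

where $X$ is the input matrix, namely the output of the preceding multi-head attention mechanism layer after residual summation and normalization; $W_1 \in \mathbb{R}^{d_{model} \times d_{ff}}$, $b_1 \in \mathbb{R}^{d_{ff}}$, $W_2 \in \mathbb{R}^{d_{ff} \times d_{model}}$ and $b_2 \in \mathbb{R}^{d_{model}}$ are parameters of the model to be trained; and $d_{ff}$ is the number of neurons in the middle hidden layer.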
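
One Encoder sub-layer with its residual connection and layer normalization can be sketched as below, using the fully connected layer as the sub-layer. Assumptions: ReLU, i.e. $\max(0, \cdot)$, is taken as the activation; the layer normalization has no learnable gain or bias and uses $\epsilon = 10^{-5}$; the sizes, random weights and helper names are illustrative only.

```python
import math
import random

Matrix = list[list[float]]

def layer_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalise each row to zero mean and unit variance (gain 1, bias 0 assumed)."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def ffn(x: Matrix, w1: Matrix, b1: list[float], w2: Matrix, b2: list[float]) -> Matrix:
    """FFN(X) = max(0, X W1 + b1) W2 + b2, applied row by row."""
    out = []
    for row in x:
        hidden = [max(0.0, sum(r * w for r, w in zip(row, col)) + b)
                  for col, b in zip(zip(*w1), b1)]
        out.append([sum(hv * w for hv, w in zip(hidden, col)) + b
                    for col, b in zip(zip(*w2), b2)])
    return out

def add_and_norm(x: Matrix, sublayer_out: Matrix) -> Matrix:
    """Residual connection followed by layer normalisation."""
    summed = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, sublayer_out)]
    return layer_norm(summed)

rng = random.Random(0)
n, d_model, d_ff = 3, 8, 16
x = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n)]
w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_ff)] for _ in range(d_model)]
w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
out = add_and_norm(x, ffn(x, w1, [0.0] * d_ff, w2, [0.0] * d_model))
print(len(out), len(out[0]))  # 3 8: same size as the input
```

The same `add_and_norm` wrapper applies unchanged to the multi-head attention sub-layer, since both sub-layers preserve the $n \times d_{model}$ shape.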

In pre-training, the goal for the whole Chinese medical text large model is that, once the decoder has finished generating, the historical medical reply matrix should be as similar as possible to the medical text matrix. The pre-training loss is defined over the following quantities: $D$, the large-scale medical document data set; $M_\theta$, the Chinese medical text large model with parameters $\theta$; $\hat{Y} = M_\theta(X)$, the medical reply matrix generated by the model $M_\theta$ during pre-training; and $X$, the medical text matrix obtained by preprocessing a medical text.

S1.4, input the medical contextual semantic matrix and the historical medical reply matrix obtained in step S1.3 into a Decoder consisting of a stack of Decoder layers, and add the medical reply vector output by the Decoder to the historical medical reply matrix.

The medical reply vector output by the Decoder is taken from the last row of the matrix output by the last Decoder layer; each Decoder layer outputs a hidden layer matrix of the same size as its input.

Referring to FIG. 4, each Decoder layer is structured similarly to the Encoder layer: it comprises two multi-head attention mechanism layers and one fully connected layer, each with a residual (add) connection, and the residual sum of each sub-layer is then layer-normalized.

The multi-head attention mechanism layers of the Decoder layer have the same structure as the multi-head attention mechanism layer of the Encoder layer and differ only in their inputs:

the input of the first multi-head attention mechanism layer is the historical medical reply matrix (for the first Decoder layer) or the hidden layer matrix output by the previous Decoder layer;

the second multi-head attention mechanism layer has two inputs: its Q matrix is computed from the output matrix of the first multi-head attention mechanism layer, and its K and V matrices are computed from the medical contextual semantic matrix.

The fully connected layer of the Decoder layer has the same structure as the fully connected layer of the Encoder layer.

S1.5, pass each medical reply vector generated by the decoder through the output layer to output medical words in real time, finally forming the medical text output. The output layer comprises a linear transformation layer, a softmax operation layer and a medical word embedding layer:

the linear transformation layer transforms the medical reply vector to the uniform length of the medical word vectors;

the softmax operation layer converts the resulting vector into probabilities;

the medical word embedding layer is identical to the medical word embedding layer of the medical input embedding layer.
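
A sketch of the output layer under one reading of the text: the linear layer maps the reply vector to word-vector length, each vocabulary entry is then scored by a dot product with the shared medical word embedding table (an assumed weight-tying interpretation of "identical to the input embedding layer"), and softmax turns the scores into probabilities. The toy vocabulary, embeddings, weights and the greedy choice of the most probable word are all hypothetical.

```python
import math

def softmax(z: list[float]) -> list[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(reply_vec, w_out, embedding_table, vocab):
    """Return the most probable medical word and the full probability vector."""
    # Linear transformation layer: reply vector -> length of a medical word vector
    projected = [sum(v * w for v, w in zip(reply_vec, col)) for col in zip(*w_out)]
    # Score each vocabulary entry against the shared word-embedding table (assumed tying)
    logits = [sum(p * e for p, e in zip(projected, emb)) for emb in embedding_table]
    # Softmax operation layer: scores -> probabilities
    probs = softmax(logits)
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs

vocab = ["fever", "cough", "headache"]                   # hypothetical vocabulary
embedding_table = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # one 2-d word vector per entry
w_out = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]             # maps a 3-d reply vector to 2-d
word, probs = output_layer([0.0, 2.0, 5.0], w_out, embedding_table, vocab)
print(word)  # cough
```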

S1.6, referring to FIG. 2, combine the medical input embedding layer, the encoder, the decoder and the output layer of steps S1.2 to S1.5 to obtain the Chinese medical text large model.

S2, further perform supervised fine-tuning on the Chinese medical text large model obtained in step S1 using the large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model.

Further, step S2 comprises the following steps:

S2.1, collect question-answer data sets in the medical field and integrate them into a large-scale medical question-answering dialogue data set.

The large-scale medical question-answering dialogue data set consists of input historical medical question-answer dialogues and output doctor replies:

(1) an input historical medical question-answer dialogue spanning several rounds, made up of the patient's questions $q_t$ and the doctor's replies $a_t$ to the corresponding questions, where $t = 1, \dots, T$ and $T$ is the number of consecutive question-answer rounds; each item $i$ of the input consists of several words $w_1, w_2, \dots, w_{n_i}$, where $n_i$ is the number of words in that item;

(2) an output doctor reply consisting of several words, $w_1, w_2, \dots, w_m$.
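
One way to turn a multi-round consultation into supervised fine-tuning samples of the form described above is sketched below: each doctor turn becomes a target reply and all earlier turns form its input. This per-turn splitting, the `(role, words)` turn format and the `SFTExample` name are assumptions for illustration; the patent only specifies the input and output structure.

```python
from dataclasses import dataclass

@dataclass
class SFTExample:
    source: list[str]   # words of the historical question-answer dialogue (model input)
    target: list[str]   # words of the doctor reply the model should fit (output)

def split_dialogue(turns: list[tuple[str, list[str]]]) -> list[SFTExample]:
    """turns: chronological (role, words) pairs, role being "patient" or "doctor".

    Every doctor turn after the first turn becomes one target; the words of all
    earlier turns, concatenated in order, form its input.
    """
    examples = []
    for t, (role, words) in enumerate(turns):
        if role == "doctor" and t > 0:
            history = [w for _, ws in turns[:t] for w in ws]
            examples.append(SFTExample(source=history, target=list(words)))
    return examples

turns = [
    ("patient", ["headache"]),
    ("doctor", ["how", "long"]),
    ("patient", ["three", "days"]),
    ("doctor", ["see", "a", "doctor"]),
]
for ex in split_dialogue(turns):
    print(ex.source, "->", ex.target)
```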

Supervised fine-tuning specifically means inputting the historical medical question-answer dialogue into the Chinese medical text large model, changing the generation target to the corresponding output doctor reply, and training for several epochs to fit the doctor replies. The corresponding loss function is defined over the following quantities: $D_{SFT}$, the large-scale medical question-answering dialogue data set; $M_\theta$, the Chinese medical question-answering reference model with parameters $\theta$; $\hat{Y}$, the medical reply matrix predicted by the model $M_\theta$; and $Y$, the output doctor reply matrix after preprocessing.

S3, input patient questions into the Chinese medical question-answering reference model obtained in step S2 to generate multiple machine replies, have a doctor annotate medical feedback on these replies, and train an automatic medical reply effect evaluation model based on the doctor's medical feedback annotations.

Further, step S3 comprises the following steps:

S3.1, based on the Chinese medical question-answering reference model, for a given input patient question $x$, generate $K$ machine replies $\{y_1, y_2, \dots, y_K\}$, where $y_m$ is the $m$-th machine reply.

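S3.2, the doctor ranks the $K$ machine replies $\{y_1, \dots, y_K\}$ by quality to obtain the ordered machine replies $y_{(1)} \succ y_{(2)} \succ \dots \succ y_{(K)}$; the ordered replies are split by the combination method into $C_K^2$ doctor feedback comparison pairs $(y_w, y_l)$, each of which is combined with the patient question $x$ to obtain a doctor feedback comparison pair containing the patient question, $(x, y_w, y_l)$.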

S3.3, repeat steps S3.1 to S3.2 for $N$ patient questions to obtain $N \cdot C_K^2$ doctor feedback comparison pairs containing the patient questions, which together form the doctor feedback data set.

Here, doctor feedback labeling specifically means that, for each input patient question $x$, the doctor ranks the $K$ machine replies $\{y_1, \dots, y_K\}$ by quality to obtain the ordered machine replies $y_{(1)} \succ y_{(2)} \succ \dots \succ y_{(K)}$.

For example, questions are sampled from each department, and several doctors are invited to manually rank the $K$ machine replies to each patient question $x$ from best to worst, using reference criteria such as the fluency, coherence, accuracy and professionalism of the answers; the ranking reflects only the doctors' preferences and follows no strict standard. This yields several ranked lists of machine replies for each question, and the ordering that occurs most often among them is taken as the final doctor feedback result $y_{(1)} \succ y_{(2)} \succ \dots \succ y_{(K)}$.
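
Selecting the final doctor feedback result as the ordering that occurs most often can be sketched with `collections.Counter`. Ties are broken by first occurrence, which is the documented behaviour of `Counter.most_common`; this tie rule is an assumption, since the patent does not say how ties are handled. Reply identifiers are hypothetical.

```python
from collections import Counter

def final_ranking(doctor_rankings: list[list[str]]) -> list[str]:
    """Return the ordering submitted most often (ties: first ordering encountered)."""
    counts = Counter(tuple(r) for r in doctor_rankings)
    best_order, _ = counts.most_common(1)[0]
    return list(best_order)

rankings = [            # hypothetical rankings of K = 3 replies by three doctors
    ["y2", "y1", "y3"],
    ["y1", "y2", "y3"],
    ["y2", "y1", "y3"],
]
print(final_ranking(rankings))  # ['y2', 'y1', 'y3']
```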

The doctor feedback data set is constructed as follows: the doctor feedback result is split into $C_K^2$ doctor feedback comparison pairs $(x, y_w, y_l)$, each comparison pair being one sample of the doctor feedback data set, for $N \cdot C_K^2$ samples in total, where $y_w$ is the machine reply the doctor prefers between $y_w$ and $y_l$, and $C$ denotes the combination operation, $C_K^2 = \frac{K(K-1)}{2}$.

S3.4, train an automatic medical reply effect evaluation model based on the doctor feedback data set.

The automatic medical reply effect evaluation model is used to fit the doctors' evaluations and to score the generated machine replies. It is trained starting from the Chinese medical question-answering reference model: specifically, the output layer is replaced with a linear layer that maps the medical reply vector output by the decoder to a scalar score.
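
The replacement head amounts to a single linear map from the medical reply vector to a scalar. In the sketch below, the decoder output is assumed to be already computed, and `w` and `b` are hypothetical names for the linear layer's weight and bias.

```python
def reward_head(decoder_output: list[list[float]], w: list[float], b: float) -> float:
    """Map the medical reply vector (last row of the final Decoder layer output) to a scalar."""
    reply_vec = decoder_output[-1]
    return sum(v * wi for v, wi in zip(reply_vec, w)) + b

# Toy decoder output for a (question, reply) pair: 2 rows of width 2
score = reward_head([[1.0, 1.0], [0.5, 2.0]], w=[2.0, 1.0], b=0.5)
print(score)  # 3.5
```

Unlike the generation output layer, the head returns one unnormalized number per (question, reply) pair, which is exactly what the pairwise ranking loss compares.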

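The loss function of the automatic medical reply effect evaluation model is calculated as:

$$\mathrm{loss}(\theta) = -\frac{1}{C_K^2}\, \mathbb{E}_{(x, y_w, y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right];$$

where $r_\theta(x, y)$ is the scalar score output by the automatic medical reply effect evaluation model with parameters $\theta$ for patient question $x$ and a single machine reply $y$; $\sigma$ is the sigmoid function; and $D$ denotes the doctor feedback data set.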

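S4, obtain a medical question-answer reply generation model by further training with a reinforcement learning method.

Further, step S4 comprises the following steps:

S4.1, back up the Chinese medical question-answering reference model obtained in step S2 and name the backup the RL model, which is used for the reinforcement learning iterations.

S4.2, randomly sample a patient question $x$ from the large-scale medical question-answering dialogue data set and use the RL model to generate a machine reply $y$.

S4.3, take the sampled patient question $x$ and its corresponding machine reply $y$ as input to the automatic medical reply effect evaluation model obtained in step S3 to obtain its output score $r_\theta(x, y)$.

S4.4, based on a reinforcement learning algorithm, update the parameters of the RL model using the score $r_\theta(x, y)$.

S4.5, repeat steps S4.2 to S4.4 multiple times; the final RL model obtained is the trained medical question-answer reply generation model.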

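Further, in step S4.4, the reinforcement learning algorithm updates the parameters of the RL model so as to maximize a combined objective function $\mathrm{objective}(\phi)$:

$$\mathrm{objective}(\phi) = \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right] + \gamma\, \mathbb{E}_{x \sim D_{pretrain}}\left[\log \pi_\phi^{RL}(x)\right]$$

where $\pi_\phi^{RL}$ is the RL model with parameters $\phi$; $D_{\pi_\phi^{RL}}$ is the data distribution for reinforcement learning; $\pi^{SFT}$ is the supervised fine-tuned model; $D_{pretrain}$ is the distribution of the training data used in pre-training; $\beta$ is the KL reward coefficient; and $\gamma$ is the pre-training loss coefficient.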

Through large-scale medical document pre-training and large-scale medical question-answering dialogue fine-tuning, the medical question-answering reply method based on doctor feedback and reinforcement learning provided by the invention gives the Chinese medical question-answering reference model a more professional medical question-answering capability, and doctor feedback combined with reinforcement learning further improves the model's professional accuracy and empathy. This addresses the mechanical, non-professional replies of traditional medical question-answer reply generation methods: the resulting model outperforms traditional medical question-answering models in the fluency and professionalism of its answers, performs better on the medical question-answer generation task, and produces replies that are humanized and professionally accurate.

Referring to FIG. 5, the medical question-answering reply system based on doctor feedback and reinforcement learning is characterized as follows:

The model training module trains on the large-scale medical document data set module to obtain the Chinese medical text large model module; supervised fine-tuning of the Chinese medical text large model module yields the Chinese medical question-answering reference model module; the Chinese medical question-answering reference model module is used to train the automatic medical reply effect evaluation model module; and the model training module trains the medical question-answer reply generation model according to the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module and stores it in the medical question-answer reply generation model module, which can then generate medical question-answer replies with high professional accuracy.

The foregoing description of the preferred embodiments of the invention is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims

1. The medical question-answering method based on doctor feedback and reinforcement learning is characterized by comprising the following steps of:

2. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 1, wherein step S1 comprises the following steps:

3. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 2, wherein the medical input embedding layer in step S1.2 includes a medical word embedding layer, a linear transformation layer and a relative position embedding layer. The medical word embedding layer converts each medical word into a medical word vector by looking it up in a medical word list, and the medical word vectors are stacked row by row to obtain a medical text matrix; the linear transformation layer reduces the length of each row of the medical text matrix by a linear transformation; and the relative position embedding layer adds the relative position information learned by the model to each row of the medical text matrix.
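The three-stage input embedding of claim 3 can be sketched as below. This is a minimal illustration, not the applicant's implementation: all sizes (`VOCAB_SIZE`, `EMB_DIM`, `MODEL_DIM`, `MAX_LEN`) and the function name `medical_input_embedding` are hypothetical, and the weights are random stand-ins for trained parameters. The claim does not say how the relative position information is encoded, so as a simplifying assumption the sketch adds one learned vector per row index, which is strictly an absolute-position scheme.

```python
import numpy as np

# Hypothetical sizes: medical word list size, word-vector length,
# reduced row length after the linear transformation, maximum text length.
VOCAB_SIZE, EMB_DIM, MODEL_DIM, MAX_LEN = 1000, 64, 32, 128

rng = np.random.default_rng(0)
word_table = rng.normal(size=(VOCAB_SIZE, EMB_DIM))               # medical word list -> word vectors
proj = rng.normal(size=(EMB_DIM, MODEL_DIM)) / np.sqrt(EMB_DIM)  # linear transformation layer
pos_table = rng.normal(size=(MAX_LEN, MODEL_DIM)) * 0.02          # learned per-row position vectors (assumption)


def medical_input_embedding(token_ids):
    """Map a sequence of medical-word ids to an (n, MODEL_DIM) matrix."""
    ids = np.asarray(token_ids)
    text_matrix = word_table[ids]           # stack word vectors row by row: (n, EMB_DIM)
    reduced = text_matrix @ proj            # shorten each row: (n, MODEL_DIM)
    return reduced + pos_table[: len(ids)]  # add position information to each row
```

For example, `medical_input_embedding([3, 17, 42])` returns a 3 × 32 matrix whose first row is `word_table[3] @ proj + pos_table[0]`.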

4. The method according to claim 2, wherein in step S1.3, each Encoder layer of the Encoder outputs a hidden layer matrix of the same size as the input, each Encoder layer includes two sub-layers of a multi-headed attention mechanism layer and a full connection layer, and the output of each sub-layer has a residual connection and is subjected to layer normalization.
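The layer structure of claim 4 (two sub-layers, each followed by a residual connection and layer normalization, with the output the same size as the input) can be sketched as follows. This is a hedged illustration: `layer_norm` and `encoder_layer` are hypothetical names, the normalization omits the learned gain and bias, and the two sub-layers are passed in as arbitrary size-preserving functions rather than the trained attention and fully connected layers.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain/bias here)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def encoder_layer(x, attention, feed_forward):
    """One Encoder layer: each sub-layer is wrapped as LayerNorm(x + sublayer(x)).

    `attention` and `feed_forward` map an (n, d) matrix to an (n, d) matrix,
    so the layer's output has the same size as its input.
    """
    h = layer_norm(x + attention(x))        # sub-layer 1: multi-head attention
    return layer_norm(h + feed_forward(h))  # sub-layer 2: fully connected layer
```

Because normalization is applied after each residual sum, this is the post-normalization arrangement the claim describes; stacking several such calls gives an Encoder whose hidden matrices all keep the input shape.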

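5. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 4, wherein each attention head of the multi-head attention mechanism layer of the Encoder layer includes three matrices to be learned, $W^Q \in \mathbb{R}^{d \times d_k}$, $W^K \in \mathbb{R}^{d \times d_k}$ and $W^V \in \mathbb{R}^{d \times d_v}$, where $d$ is the length of each row of the medical text matrix and $d_k$ and $d_v$ are the per-head dimensions. Each medical text matrix $X$ is multiplied by $W^Q$, $W^K$ and $W^V$ to obtain $Q = XW^Q$, $K = XW^K$ and $V = XW^V$, and the output is computed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V;$$

where $K^{T}$ is the transpose of $K$, and softmax is the normalized exponential function;

$$\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i);$$

where $h$ is the number of attention heads; $Q_i$, $K_i$ and $V_i$ are the Q, K and V matrices of the $i$-th attention head; $\mathrm{head}_1$ denotes the output matrix of the 1st attention head, of size $n \times d_v$; $\mathrm{head}_h$ denotes the output matrix of the $h$-th attention head, of size $n \times d_v$; $n$ is the number of words; $W^{O}$ is a matrix to be learned; Concat is the horizontal concatenation function; and $\mathrm{MultiHead}(X)$ is the output of the multi-head attention mechanism layer;

the fully connected layer FFN is computed as:

$$\mathrm{FFN}(x) = \max(0,\; xW_1 + b_1)\,W_2 + b_2;$$

where $x$ is the input matrix; $W_1 \in \mathbb{R}^{d \times d_{ff}}$, $b_1$, $W_2 \in \mathbb{R}^{d_{ff} \times d}$ and $b_2$ are all parameters of the model to be trained; and $d_{ff}$ is the number of neurons in the middle hidden layer.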
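The attention and fully connected computations described in claim 5 can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the per-head weights are passed as lists, and the FFN uses a ReLU activation, which is the standard Transformer choice but is not confirmed by the extracted claim text.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention for one head: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V


def multi_head(X, W_q, W_k, W_v, W_o):
    """W_q, W_k, W_v: lists of h per-head (d, d_k) matrices; W_o: (h * d_k, d)."""
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o  # horizontal concatenation, then W^O


def ffn(x, W1, b1, W2, b2):
    """Fully connected layer, assuming ReLU: max(0, x W1 + b1) W2 + b2."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2
```

With $n = 5$ words, $d = 16$ and $h = 4$ heads of size $d_k = 4$, `multi_head` returns a 5 × 16 matrix, so the sub-layer output matches the input size required by claim 4.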

6. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 1, wherein step S2 comprises the following steps:

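7. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 5, wherein step S3 comprises the following steps:

S3.1 input a patient question $x$ into the Chinese medical question-answering reference model to generate $M$ machine replies $Y = \{y_1, y_2, \ldots, y_M\}$, where $y_m$ is the $m$-th machine reply;

S3.2 the doctor ranks the $M$ machine replies in the machine reply vector $Y$ by quality to obtain the ranked machine replies $Y'$, and $Y'$ is divided by permutation and combination into $\binom{M}{2}$ pairs, giving doctor feedback comparison pairs that contain the patient question, $(x, y_w, y_l)$, where $y_w$ is ranked above $y_l$;

S3.3 repeat steps S3.1 to S3.2 for $N$ patient questions to obtain the doctor feedback dataset $D$;

S3.4 train the automatic medical reply effect evaluation model on $D$ with the loss function:

$$\mathrm{loss}(\theta) = -\frac{1}{\binom{M}{2}}\, \mathbb{E}_{(x, y_w, y_l) \sim D}\left[\log\left(\sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right)\right];$$

where $r_\theta(x, y)$ is the scalar output score of the model with parameters $\theta$ for the patient question $x$ and a single machine reply $y$; $D$ is the doctor feedback dataset; $\sigma$ is the Sigmoid function; and $\log$ is the natural logarithm.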
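Steps S3.2 to S3.4 of claim 7 can be sketched in plain Python as follows. This is a minimal illustration, not the applicant's code: `comparison_pairs`, `log_sigmoid` and `pairwise_ranking_loss` are hypothetical names, and `score` is a stand-in for the evaluation model's scalar output $r_\theta(x, y)$. Averaging over the pairs of one question reproduces the $1/\binom{M}{2}$ normalization; a real evaluation model would compute this loss on batches and backpropagate through `score`.

```python
import math
from itertools import combinations


def comparison_pairs(question, ranked_replies):
    """Turn a doctor's ranking (best first) into C(M, 2) (x, y_w, y_l) triples."""
    # combinations() preserves input order, so the first element is always the better reply
    return [(question, better, worse) for better, worse in combinations(ranked_replies, 2)]


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def pairwise_ranking_loss(score, pairs):
    """Mean of -log sigmoid(r(x, y_w) - r(x, y_l)) over the comparison pairs."""
    total = sum(log_sigmoid(score(x, y_w) - score(x, y_l)) for x, y_w, y_l in pairs)
    return -total / len(pairs)
```

For three ranked replies `["a", "b", "c"]`, `comparison_pairs` yields the three pairs (a, b), (a, c) and (b, c). A model that scores every reply equally gets a loss of $\ln 2 \approx 0.693$, and the loss falls below that as the scores agree more with the doctor's ranking.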

8. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 7, wherein step S4 comprises the following steps:

generate a machine reply $y$ using the RL model;

S4.3 the extracted patient question $x$ and its corresponding machine reply $y$;

S4.4 update the parameters of the RL model;

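9. The medical question-answering method based on doctor feedback and reinforcement learning according to claim 8, wherein in step S4.4, the objective function $\mathrm{objective}(\phi)$ of the reinforcement learning algorithm is:

$$\mathrm{objective}(\phi) = \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\left(\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right)\right] + \gamma\, \mathbb{E}_{x \sim D_{pretrain}}\left[\log\left(\pi_\phi^{RL}(x)\right)\right];$$

where $\pi_\phi^{RL}$ is the RL model; $D_{\pi_\phi^{RL}}$ is the distribution of the data for reinforcement learning; $\pi^{SFT}$ is the supervised training model; $D_{pretrain}$ is the distribution of the training data during pre-training; $\beta$ is the KL reward coefficient; and $\gamma$ is the pre-training loss coefficient.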
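A Monte Carlo estimate of the claim 9 objective can be sketched as below. This is a hedged illustration: `rl_objective` and its arguments are hypothetical, the inputs are assumed to be precomputed (evaluation-model scores, and sequence log-probabilities summed over reply tokens under the RL and supervised models), and the sketch only evaluates the objective. The claim text visible here does not name the optimizer used to maximize it over $\phi$.

```python
def rl_objective(rewards, logp_rl, logp_sft, logp_pretrain, beta, gamma):
    """Sample estimate of objective(phi), to be maximized.

    rewards[i]       : r_theta(x_i, y_i) from the evaluation model
    logp_rl[i]       : log pi_RL(y_i | x_i), summed over reply tokens
    logp_sft[i]      : log pi_SFT(y_i | x_i) for the same reply
    logp_pretrain[j] : log pi_RL(x_j) for texts drawn from the pre-training data
    """
    # Reward minus the KL penalty log(pi_RL / pi_SFT), averaged over RL samples
    rl_term = sum(r - beta * (a - b) for r, a, b in zip(rewards, logp_rl, logp_sft)) / len(rewards)
    # Pre-training log-likelihood term, averaged over pre-training samples
    ptx_term = sum(logp_pretrain) / len(logp_pretrain)
    return rl_term + gamma * ptx_term
```

For example, with rewards `[1.0, 2.0]`, RL log-probabilities `[-3.0, -4.0]`, supervised log-probabilities `[-3.5, -4.5]`, pre-training log-likelihoods `[-10.0, -20.0]`, $\beta = 0.1$ and $\gamma = 0.5$, the estimate is $1.45 - 7.5 = -6.05$. Raising $\beta$ penalizes replies that drift further from the supervised model, and raising $\gamma$ weights retention of pre-training knowledge more heavily.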

10. A system for implementing the medical question-answering method based on doctor feedback and reinforcement learning of claim 1, comprising: