CN116383364A - Medical question-answering reply method and system based on doctor feedback and reinforcement learning - Google Patents
- Publication number
- CN116383364A (application CN202310600962.XA)
- Authority
- CN
- China
- Prior art keywords
- medical
- model
- question
- answering
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a medical question-answering reply method based on doctor feedback and reinforcement learning, belonging to the technical field at the intersection of natural language processing and healthcare. Doctor feedback and reinforcement learning further improve the professional accuracy and empathy of the model, so that it has a more professional medical question-answering capability and its replies are both humanized and professionally accurate. The method comprises the following steps: S1, pre-training on a large-scale medical document data set to obtain a Chinese medical text large model; S2, performing supervised fine-tuning of the Chinese medical text large model on a large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model; S3, inputting patient questions into the Chinese medical question-answering reference model to generate a plurality of machine replies, having doctors annotate the machine replies with medical feedback, and training an automatic medical reply effect evaluation model on these annotations; and S4, further training with a reinforcement learning method to obtain a medical question-answer reply generation model.
Description
Technical Field
The invention relates to the technical field at the intersection of natural language processing and healthcare, and in particular to a medical question-answering reply method based on doctor feedback and reinforcement learning. The invention also relates to a system for implementing the method.
Background
The field of medical question answering currently has the following shortcomings:
Before ChatGPT, medical question-answer replies were generated mainly by two kinds of methods: retrieval-based methods and generative methods. However, the content that a retrieval-based method can retrieve is limited, and the models used by traditional generative methods are small, so neither can understand and answer complex human needs. For example, patients describe their diseases in non-professional terms, and additional models must be added to assist in answering.
Since its release, ChatGPT has achieved excellent results on many natural language understanding and generation tasks and on a large number of other downstream tasks, and in practical applications it is considered to understand human needs well and to show empathy. However, we find that it still does not answer medical questions well: the medical field is special and answers require a certain expertise, whereas ChatGPT only incorporates feedback from ordinary people and lacks a certain amount of medical knowledge. A more intelligent medical question-answering reply method is therefore needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a medical question-answering reply method based on doctor feedback and reinforcement learning, in which doctor feedback and reinforcement learning further improve the professional accuracy and empathy of the model, so that the model has a more professional medical question-answering capability and its replies are humanized and professionally accurate.
Another object of the present invention is to provide a medical question-answering reply system based on doctor feedback and reinforcement learning, by which medical question-answer replies with high professional accuracy can be generated.
The first technical solution adopted by the invention is as follows:
a medical question-answering reply method based on doctor feedback and reinforcement learning, comprising the following steps:
S1, pre-training on a large-scale medical document data set to obtain a Chinese medical text large model;
S2, performing supervised fine-tuning of the Chinese medical text large model obtained in step S1 on a large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model;
S3, inputting patient questions into the Chinese medical question-answering reference model obtained in step S2 to generate a plurality of machine replies, having a doctor annotate the plurality of machine replies with medical feedback, and training an automatic medical reply effect evaluation model based on the doctor's medical feedback annotations;
S4, further training with a reinforcement learning method to obtain a medical question-answer reply generation model.
Further, the step S1 includes the following steps:
S1.1, collecting medical documents in the medical field, arranging the collected medical documents into medical texts, and integrating the medical texts into a large-scale medical document data set;
S1.2, inputting each medical text in the medical document data set obtained in step S1.1 into a medical input embedding layer one by one, and preprocessing the medical text into a medical text matrix;
S1.3, inputting the preprocessed medical text matrix into an Encoder formed by stacking a plurality of Encoder layers, outputting a medical context semantic matrix of the same size, and creating a historical medical reply matrix;
S1.4, inputting the medical context semantic matrix and the historical medical reply matrix obtained in step S1.3 into a Decoder formed by stacking a plurality of Decoder layers, and appending the medical reply vector output by the Decoder to the historical medical reply matrix;
S1.5, passing each medical reply vector generated by the Decoder through an output layer to output medical words in real time, finally forming the medical text output;
S1.6, combining the four parts of steps S1.2 to S1.5, namely the medical input embedding layer, the Encoder, the Decoder and the output layer, to obtain the Chinese medical text large model.
Further, the medical input embedding layer in the step S1.2 includes a medical word embedding layer, a linear transformation layer and a relative position embedding layer, where the medical word embedding layer converts each medical word into a medical word vector by looking up a medical word list, and the medical word vectors are stacked longitudinally and connected in parallel to obtain a medical text matrix, the linear transformation layer reduces the length of each line of the medical text matrix by linear transformation, and the relative position embedding layer adds the relative position information learned by the model to each line of the medical text matrix.
Further, in the step S1.3, each Encoder layer of the Encoder outputs a hidden layer matrix with the same size as the input, each Encoder layer includes two sub-layers, i.e., a multi-head attention mechanism layer and a full connection layer, and the output of each sub-layer has a residual connection and is subjected to layer normalization processing.
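Further, each attention head of the multi-head attention mechanism layer of the Encoder layer comprises three matrices to be learned, $W^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W^K \in \mathbb{R}^{d_{model} \times d_k}$ and $W^V \in \mathbb{R}^{d_{model} \times d_v}$, where $d_{model}$ is the length of each row of the medical text matrix, and $d_k$ and $d_v$ are the dimensions of the vectors obtained after each row of the medical text matrix is linearly transformed by $W^Q$ or $W^K$, and by $W^V$, respectively. The input medical text matrix is multiplied by $W^Q$, $W^K$ and $W^V$ one by one to obtain the three matrices Q, K and V respectively, and an output operation is then performed to obtain the output $\mathrm{Attention}(Q, K, V)$. The output operation formula is:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V;$$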
the outputs of the plurality of attention heads are concatenated to obtain the output of the multi-head attention mechanism layer, and the output calculation formula is:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^O, \qquad \mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$$
where $h$ is the number of attention heads, $Q_i$, $K_i$, $V_i$ are the Q, K, V matrices of the $i$-th attention head, $\mathrm{head}_i$ denotes the output matrix of the $i$-th attention head, of size $n \times d_v$, $n$ is the number of words, $W^O \in \mathbb{R}^{h d_v \times d_{model}}$ is a matrix to be learned, $\mathrm{Concat}$ is the horizontal concatenation function, and $\mathrm{MultiHead}(Q, K, V)$ is the output of the multi-head attention mechanism layer;
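the fully connected layer is a two-layer fully connected neural network, and the output $\mathrm{FFN}(X)$ of the fully connected layer is calculated as:
$$\mathrm{FFN}(X) = \max(0,\, XW_1 + b_1)\,W_2 + b_2$$
where $X$ is the input matrix, $W_1 \in \mathbb{R}^{d_{model} \times d_{ff}}$, $b_1$, $W_2 \in \mathbb{R}^{d_{ff} \times d_{model}}$ and $b_2$ are all parameters of the model to be trained, and $d_{ff}$ is the number of neurons in the middle hidden layer.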
Further, the step S2 includes the following steps:
S2.1, collecting question-answer data sets in the medical field, and integrating them into a large-scale medical question-answering dialogue data set;
S2.2, based on the large-scale medical question-answering dialogue data set, performing supervised fine-tuning of the pre-trained Chinese medical text large model to obtain a Chinese medical question-answering reference model.
Further, the step S3 includes the following steps:
S3.1, based on the Chinese medical question-answering reference model, for a given input patient question $x$, generating $M$ machine replies $\{y_1, y_2, \dots, y_M\}$, where $y_m$ is the $m$-th machine reply;
S3.2, a doctor ranks the $M$ machine replies $\{y_1, y_2, \dots, y_M\}$ by quality to obtain the ordered machine replies $y_{(1)} \succ y_{(2)} \succ \dots \succ y_{(M)}$; the ordered replies are divided by permutation and combination into $C_M^2$ doctor feedback comparison pairs, which are combined with the patient question $x$ to obtain doctor feedback comparison pairs containing the patient question, $(x, y_w, y_l)$;
S3.3, repeating steps S3.1 to S3.2 for $N$ patient questions to obtain $N \cdot C_M^2$ doctor feedback comparison pairs containing the patient question, thereby constructing a doctor feedback data set;
S3.4, training an automatic medical reply effect evaluation model based on the doctor feedback data set, wherein the loss function of the automatic medical reply effect evaluation model is calculated as:
$$\mathrm{loss}(\theta) = -\frac{1}{C_M^2}\, E_{(x, y_w, y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$
where $r_\theta(x, y)$ is the scalar score output by the automatic medical reply effect evaluation model with model parameters $\theta$ for a patient question $x$ and a single machine reply $y$, $y_w$ is the reply of the pair that the doctor ranks higher and $y_l$ the other reply, $\sigma$ is the sigmoid function, and $D$ denotes the doctor feedback data set.
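The pairwise loss of step S3.4 can be illustrated with a minimal Python sketch. It assumes the common pairwise ranking form, in which the loss for one comparison pair is the negative log-sigmoid of the score difference between the reply the doctor prefers and the other reply; the function names `pairwise_ranking_loss` and `dataset_loss` are hypothetical and the scores are plain floats rather than outputs of a trained model.

```python
import math


def pairwise_ranking_loss(score_preferred, score_rejected):
    # -log(sigmoid(r_w - r_l)), computed as log(1 + exp(-(r_w - r_l)))
    # in a numerically stable way for both signs of the difference.
    diff = score_preferred - score_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def dataset_loss(score_pairs):
    # score_pairs: list of (score of preferred reply, score of other reply),
    # one entry per doctor feedback comparison pair. Averaging over pairs is
    # an assumption made for this sketch.
    return sum(pairwise_ranking_loss(w, l) for w, l in score_pairs) / len(score_pairs)


# The loss is small when the preferred reply already scores higher,
# and large when the evaluation model ranks the pair the wrong way round.
print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))
```

Minimizing this quantity pushes the evaluation model to assign a higher scalar score to the reply the doctor ranked higher in each comparison pair.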
Further, the step S4 includes the following steps:
S4.1, making a copy of the Chinese medical question-answering reference model obtained in step S2, and naming the copy the RL model;
S4.2, randomly extracting a patient question $x$ from the large-scale medical question-answering dialogue data set, and generating one machine reply $y$ with the RL model;
S4.3, taking the extracted patient question $x$ and its corresponding machine reply $y$ as one input to the automatic evaluation model obtained in step S3, and obtaining the score $r$ output by the automatic evaluation model;
S4.4, based on a reinforcement learning algorithm, updating the parameters of the RL model with the score $r$ output by the automatic evaluation model in step S4.3;
S4.5, repeating steps S4.2 to S4.4 a plurality of times; the final RL model obtained is the trained medical question-answer reply generation model.
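Further, in step S4.4, the objective function $\mathrm{objective}(\phi)$ of the reinforcement learning algorithm is:
$$\mathrm{objective}(\phi) = E_{(x, y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right] + \gamma\, E_{x \sim D_{pretrain}}\left[\log \pi_\phi^{RL}(x)\right]$$
where $\pi_\phi^{RL}$ is the RL model, $D_{\pi_\phi^{RL}}$ is the distribution of the data used for reinforcement learning, $\pi^{SFT}$ is the supervised training model, $D_{pretrain}$ is the distribution of the training data used during pre-training, $\beta$ is the KL reward coefficient, and $\gamma$ is the pre-training loss coefficient.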
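As an illustration of how the quantities named above combine, the following Python sketch evaluates an objective of the widely used form "evaluation score minus a KL penalty against the supervised model, plus a weighted pre-training log-likelihood term". This form is an assumption consistent with the KL reward coefficient and pre-training loss coefficient named in the text; the function names and the plain-float inputs are hypothetical.

```python
def rl_sample_objective(reward, logp_rl, logp_sft, beta):
    # reward:   score from the automatic evaluation model for (question, reply)
    # logp_rl:  log-probability of the reply under the RL model
    # logp_sft: log-probability of the same reply under the supervised model
    # beta:     KL reward coefficient
    return reward - beta * (logp_rl - logp_sft)


def rl_objective(rl_batch, pretrain_logps, beta, gamma):
    # rl_batch:       list of (reward, logp_rl, logp_sft) for sampled replies
    # pretrain_logps: RL-model log-likelihoods of pre-training texts
    # gamma:          pre-training loss coefficient
    rl_term = sum(rl_sample_objective(r, p, q, beta) for r, p, q in rl_batch) / len(rl_batch)
    pretrain_term = sum(pretrain_logps) / len(pretrain_logps)
    return rl_term + gamma * pretrain_term


# One sampled reply with score 1.0 that the RL model finds more likely
# than the supervised model does, plus one pre-training text.
value = rl_objective([(1.0, -1.0, -2.0)], [-3.0], beta=0.5, gamma=0.1)
```

The KL term discourages the RL model from drifting far from the supervised model while it chases higher evaluation scores, and the pre-training term keeps its general language-modelling ability from degrading.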
The second technical solution adopted by the invention is as follows:
a medical question-answering reply system based on doctor feedback and reinforcement learning, comprising:
a large-scale medical document data set module: used for collecting documents in the medical field, arranging them into medical texts and integrating the medical texts into a medical document data set;
a Chinese medical text large model module: used for storing the Chinese medical text large model;
a Chinese medical question-answering reference model module: used for storing the Chinese medical question-answering reference model;
an automatic medical reply effect evaluation model module: used for evaluating the trained Chinese medical question-answering reference model;
a model training module: used for training the medical question-answer reply generation model according to the large-scale medical document data set module, the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module;
a medical question-answer reply generation model module: used for storing the trained medical question-answer reply generation model;
the automatic medical reply effect evaluation model module is connected with the Chinese medical question-answering reference model module and the model training module, and the model training module is also connected with the Chinese medical question-answering reference model module and the medical question-answer reply generation model module.
Compared with the prior art, the invention has the following beneficial effects:
1. In the medical question-answering reply method based on doctor feedback and reinforcement learning, a large-scale medical document data set is used for pre-training to obtain a Chinese medical text large model; the Chinese medical text large model is then supervised fine-tuned on a large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model; patient questions are input into the Chinese medical question-answering reference model to generate a plurality of machine replies, a doctor annotates the machine replies with medical feedback, and an automatic medical reply effect evaluation model is trained on these annotations; finally, a medical question-answer reply generation model is obtained by further training with a reinforcement learning method. Pre-training on large-scale medical documents and fine-tuning on large-scale medical question-answering dialogues give the Chinese medical question-answering reference model a more professional medical question-answering capability, and doctor feedback and reinforcement learning further improve the professional accuracy and empathy of the model. This overcomes the mechanical and non-professional replies of traditional medical question-answer reply generation methods, so that the replies are humanized and professionally accurate.
2. In the medical question-answering reply system based on doctor feedback and reinforcement learning, the model training module trains on the large-scale medical document data set module to obtain the Chinese medical text large model module; the large-scale medical document data set module is used for supervised fine-tuning of the Chinese medical text large model module to obtain the Chinese medical question-answering reference model module; the automatic medical reply effect evaluation model module is trained by means of the Chinese medical question-answering reference model module; and the model training module trains the medical question-answer reply generation model according to the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module and stores it in the medical question-answer reply generation model module, by which medical question-answer replies with high professional accuracy can be generated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic flow diagram of the steps of the method of the present invention;
FIG. 2 is a schematic diagram of a model architecture of a large model of Chinese medical text of the method of the present invention;
FIG. 3 is a schematic view of the Encoder layer structure of the large model of Chinese medical text of the method of the present invention;
FIG. 4 is a schematic diagram of the Decoder layer structure of the large model of Chinese medical text according to the method of the present invention;
FIG. 5 is a block diagram of the connection of the modules of the system of the present invention.
Description of the embodiments
The technical scheme of the present invention will be described in further detail below with reference to the specific embodiments, but the present invention is not limited thereto.
As shown in fig. 1 to 4, a medical question-answering method based on doctor feedback and reinforcement learning of the present invention includes the following steps:
S1, pre-training on a large-scale medical document data set to obtain a Chinese medical text large model.
Further, the method comprises the following steps:
S1.1, collecting medical documents in the medical field, arranging the collected medical documents into medical texts, and integrating the medical texts into a large-scale medical document data set, wherein each sample in the medical document data set is a medical text.
S1.2, inputting each medical text in the medical document data set obtained in step S1.1 into a medical input embedding layer one by one, and preprocessing the medical text into a medical text matrix.
The medical input embedding layer comprises a medical word embedding layer, a linear transformation layer and a relative position embedding layer. The medical word embedding layer converts each medical word into a medical word vector by looking it up in a medical word list, and the medical word vectors are stacked vertically to obtain the medical text matrix. The linear transformation layer shortens each row of the medical text matrix through a linear transformation in order to reduce the amount of calculation. The relative position embedding layer adds relative position information learned by the model to each row of the medical text matrix, to represent the relative positions of the words in the medical text.
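The three sub-layers of the input embedding can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the word list, the sizes `D_EMBED`, `D_MODEL` and `MAX_LEN`, and the random weights are all hypothetical, and a learned per-position table is used as a stand-in for the relative position embedding, whose exact form is not given in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = {"<unk>": 0, "fever": 1, "cough": 2, "headache": 3}  # toy medical word list
D_EMBED, D_MODEL, MAX_LEN = 16, 8, 32                        # hypothetical sizes

word_table = rng.normal(size=(len(VOCAB), D_EMBED))  # medical word embedding layer
w_proj = rng.normal(size=(D_EMBED, D_MODEL))         # linear transformation layer
pos_table = rng.normal(size=(MAX_LEN, D_MODEL))      # stand-in for learned position information


def embed(words):
    # Look up each word, stack the word vectors row by row,
    # shorten every row, then add position information to every row.
    ids = [VOCAB.get(w, VOCAB["<unk>"]) for w in words]
    x = word_table[ids]                 # (n_words, D_EMBED)
    x = x @ w_proj                      # (n_words, D_MODEL), shorter rows
    return x + pos_table[: len(words)]  # (n_words, D_MODEL)


medical_text_matrix = embed(["fever", "cough", "headache"])
print(medical_text_matrix.shape)
```

Each row of the resulting matrix corresponds to one word of the input text, which is the layout the Encoder operates on.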
S1.3, inputting the preprocessed medical text matrix into an Encoder formed by stacking a plurality of Encoder layers, outputting a medical context semantic matrix of the same size, and creating a historical medical reply matrix.
Each Encoder layer of the Encoder outputs a hidden layer matrix of the same size as its input. Each Encoder layer comprises two sub-layers, namely a multi-head attention mechanism layer and a fully connected layer. The output of each sub-layer has a residual connection and, after the residual connection, is subjected to layer normalization, as shown in FIG. 3.
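Each attention head of the multi-head attention mechanism layer of the Encoder layer comprises three matrices to be learned, $W^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W^K \in \mathbb{R}^{d_{model} \times d_k}$ and $W^V \in \mathbb{R}^{d_{model} \times d_v}$, where $d_{model}$ is the length of each row of the medical text matrix, and $d_k$ and $d_v$ are the dimensions of the vectors obtained after each row of the medical text matrix is linearly transformed by $W^Q$ or $W^K$, and by $W^V$, respectively; generally $d_k = d_v = d_{model}/h$ is chosen, where $h$ is the number of attention heads. The input medical text matrix is multiplied by $W^Q$, $W^K$ and $W^V$ one by one to obtain the matrices Q, K and V respectively;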
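An output operation is then performed to obtain the output $\mathrm{Attention}(Q, K, V)$ of a single attention head. The output operation formula is:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $K^{T}$ is the transpose of $K$ and $\mathrm{softmax}$ is the normalized exponential function, a mathematical function whose input-output relationship is:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
where $e$ is the base of the natural logarithm and $z_i$ denotes the $i$-th element of each row vector $z$ of the input matrix.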
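A single attention head can be sketched in NumPy as below. The sketch assumes the standard scaled dot-product form, in which the product of Q and the transpose of K is divided by the square root of the key dimension before the row-wise softmax; the scaling factor, the toy sizes and the random weights are assumptions, not values from the text.

```python
import numpy as np


def softmax(z):
    # Row-wise normalized exponential; subtracting the row maximum
    # leaves the result unchanged and avoids overflow.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention_head(x, w_q, w_k, w_v):
    # x: (n_words, d_model); w_q, w_k: (d_model, d_k); w_v: (d_model, d_v)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_words, n_words); scaling is assumed
    return softmax(scores) @ v               # (n_words, d_v)


rng = np.random.default_rng(0)
n_words, d_model, d_k, d_v = 4, 8, 2, 2      # toy sizes
x = rng.normal(size=(n_words, d_model))
head = attention_head(
    x,
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_v)),
)
print(head.shape)
```

Each output row is a weighted average of the rows of V, with weights given by how strongly that word's query matches every word's key.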
The outputs of the plurality of attention heads are concatenated to obtain the output of the multi-head attention mechanism layer, and the output calculation formula is:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^O, \qquad \mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$$
where $h$ is the number of attention heads, $Q_i$, $K_i$, $V_i$ are the Q, K, V matrices of the $i$-th attention head, $\mathrm{head}_i$ denotes the output matrix of the $i$-th attention head, of size $n \times d_v$, $n$ is the number of words, $W^O \in \mathbb{R}^{h d_v \times d_{model}}$ is a matrix to be learned, $\mathrm{Concat}$ is the horizontal concatenation function, and $\mathrm{MultiHead}(Q, K, V)$ is the output of the multi-head attention mechanism layer.
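The concatenation of several heads can be illustrated with the following NumPy sketch. It assumes scaled dot-product heads and a final learned matrix (here `w_o`) that maps the concatenated rows back to the input row length; the head count, sizes and random weights are hypothetical.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(x, heads, w_o):
    # heads: list of (w_q, w_k, w_v) triples, one per attention head
    outputs = []
    for w_q, w_k, w_v in heads:
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        outputs.append(softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v)
    concat = np.concatenate(outputs, axis=-1)  # side by side: (n_words, h * d_v)
    return concat @ w_o                        # back to (n_words, d_model)


rng = np.random.default_rng(0)
n_words, d_model, h = 4, 8, 2
d_k = d_v = d_model // h                       # assumed choice of head size
heads = [
    tuple(rng.normal(size=(d_model, d)) for d in (d_k, d_k, d_v))
    for _ in range(h)
]
w_o = rng.normal(size=(h * d_v, d_model))
x = rng.normal(size=(n_words, d_model))
out = multi_head_attention(x, heads, w_o)
print(out.shape)
```

Because the output has the same shape as the input, the layer can be wrapped in a residual connection and stacked, as the Encoder description requires.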
The fully connected layer is a two-layer fully connected neural network, and the output $\mathrm{FFN}(X)$ of the fully connected layer is calculated as:
$$\mathrm{FFN}(X) = \max(0,\, XW_1 + b_1)\,W_2 + b_2$$
where $X$ is the input matrix, namely the output of the preceding multi-head attention mechanism layer after residual summation and layer normalization, $W_1 \in \mathbb{R}^{d_{model} \times d_{ff}}$, $b_1$, $W_2 \in \mathbb{R}^{d_{ff} \times d_{model}}$ and $b_2$ are all parameters of the model to be trained, and $d_{ff}$ is the number of neurons in the middle hidden layer.
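The fully connected sub-layer with its residual connection and layer normalization can be sketched as follows. The ReLU activation between the two layers is an assumption (the text only states that the network has two fully connected layers), the layer normalization omits the learnable gain and bias for brevity, and all sizes and weights are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize every row to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def feed_forward(x, w1, b1, w2, b2):
    hidden = np.maximum(0.0, x @ w1 + b1)  # (n_words, d_ff); ReLU is assumed
    return hidden @ w2 + b2                # (n_words, d_model)


def ffn_sublayer(x, w1, b1, w2, b2):
    # Residual connection followed by layer normalization.
    return layer_norm(x + feed_forward(x, w1, b1, w2, b2))


rng = np.random.default_rng(0)
n_words, d_model, d_ff = 4, 8, 32
x = rng.normal(size=(n_words, d_model))
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
y = ffn_sublayer(x, w1, b1, w2, b2)
print(y.shape)
```

The hidden width `d_ff` is the only size that differs from the row length, so the sub-layer preserves the shape of the hidden layer matrix.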
In the pre-training process, the historical medical reply matrix fully generated by the decoder should be highly similar to the medical text matrix; this is the pre-training objective of the whole Chinese medical text large model. The objective is expressed in terms of the following quantities: the large-scale medical document data set, the Chinese medical text large model with its parameters, the medical reply matrix generated by the model during pre-training, and the medical text matrix obtained by preprocessing the medical text.
The medical reply vector output by the Decoder is taken from the last row of the matrix output by the last Decoder layer.
Each Decoder layer outputs a hidden layer matrix of the same size as its input.
Referring to FIG. 4, each Decoder layer has a structure similar to that of the Encoder layer and comprises two multi-head attention mechanism layers and one fully connected layer, each with a residual connection; the residual-summed output of each sub-layer is subjected to a layer normalization operation.
The multi-head attention mechanism layers of the Decoder layer are identical in structure to the multi-head attention mechanism layer of the Encoder layer and differ only in their inputs:
the input of the first multi-head attention mechanism layer is the historical medical reply matrix or a hidden layer matrix;
the second multi-head attention mechanism layer has two inputs: the Q matrix is obtained from the output matrix of the first multi-head attention mechanism layer, and the K and V matrices are obtained from the medical context semantic matrix.
The fully connected layer of the Decoder layer is identical to the fully connected layer of the Encoder layer in structure.
S1.5, outputting medical words in real time by passing each medical reply vector generated by the decoder through an output layer, and finally forming a medical text output;
the output layer comprises a linear transformation layer, a softmax operation layer and a medical word embedding layer.
The linear transformation layer transforms the length of the medical reply vector to the uniform length of the medical word vector;
the softmax operation layer converts the obtained vector into probability;
the medical word embedding layer is identical to the medical word embedding layer of the input embedding layer.
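One way to read the three output sub-layers is sketched below in NumPy: the reply vector is projected to the length of a medical word vector, compared with every row of the shared word embedding table, converted to probabilities with softmax, and the most probable word is emitted. Scoring against the shared table and picking the maximum are assumptions made for illustration; the word list, sizes and weights are hypothetical.

```python
import numpy as np

WORDS = ["<eos>", "fever", "cough", "rest"]          # toy medical word list
rng = np.random.default_rng(0)
d_model, d_embed = 8, 16                             # hypothetical sizes
word_table = rng.normal(size=(len(WORDS), d_embed))  # shared with the input embedding layer
w_out = rng.normal(size=(d_model, d_embed))          # linear transformation layer


def output_layer(reply_vector):
    projected = reply_vector @ w_out       # now as long as a medical word vector
    logits = word_table @ projected        # one similarity score per word
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax operation layer
    return WORDS[int(probs.argmax())], probs


word, probs = output_layer(rng.normal(size=d_model))
print(word)
```

Calling `output_layer` once per medical reply vector yields the words one at a time, which matches the real-time word-by-word output described in step S1.5.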
S1.6, referring to FIG. 2, the four parts of steps S1.2 to S1.5, namely the medical input embedding layer, the Encoder, the Decoder and the output layer, are combined to obtain the Chinese medical text large model.
S2, performing supervised fine-tuning of the Chinese medical text large model obtained in step S1 on the large-scale medical question-answering dialogue data set to obtain a Chinese medical question-answering reference model.
Further, the method comprises the following steps:
S2.1, collecting question-answer data sets in the medical field, and integrating them into a large-scale medical question-answering dialogue data set.
Each sample of the large-scale medical question-answering dialogue data set comprises an input historical medical question-answer dialogue and an output doctor reply:
(1) The input consists of several rounds of historical medical question-answer dialogue, where $q_t$ denotes a question from the patient, $a_t$ denotes the doctor's reply to the corresponding question, and $T$ denotes the number of consecutive question-answer rounds. Each item in the input consists of several words $w_1, \dots, w_n$, where $n$ is the number of words in that item;
S2.2, based on the large-scale medical question-answering dialogue data set, performing supervised fine-tuning of the pre-trained Chinese medical text large model to obtain a Chinese medical question-answering reference model.
Specifically, in supervised fine-tuning the input historical medical question-answer dialogue is fed into the Chinese medical text large model, the generation target is changed to the corresponding output doctor reply, and training is run for several epochs so that the model fits the doctor replies. The corresponding loss function is expressed in terms of the following quantities: the large-scale medical question-answering dialogue data set, the Chinese medical question-answering reference model with its parameters, the medical reply matrix predicted by the model, and the doctor reply matrix obtained by preprocessing the output doctor reply.
S3, inputting patient questions into the Chinese medical question-answering reference model obtained in step S2 to generate a plurality of machine replies, having a doctor annotate the plurality of machine replies with medical feedback, and training an automatic medical reply effect evaluation model based on the doctor's medical feedback annotations.
Further, the method comprises the following steps:
S3.1, given an input patient question $x$, generating $M$ machine replies $(y_1, \dots, y_M)$ with the Chinese medical question-answering reference model, where $y_m$ is the $m$-th machine reply.
S3.2, the doctor ranks the $M$ machine replies $(y_1, \dots, y_M)$ by quality to obtain the ranked machine replies; the ranked machine replies are divided by pairwise combination into $C_M^2$ doctor feedback comparison pairs, and each pair is then combined with the patient question $x$ to obtain a doctor feedback comparison pair $(x, y_i, y_j)$ containing the patient question.
S3.3, repeating steps S3.1 to S3.2 for $N$ patient questions to obtain $N \cdot C_M^2$ doctor feedback comparison pairs containing patient questions, from which the doctor feedback dataset is constructed.
The doctor feedback annotation specifically means that, for each input patient question, the doctor ranks the $M$ machine replies $(y_1, \dots, y_M)$ by quality to obtain the ranked machine replies.
For example, several doctors sampled from each department are invited to manually rank, for each input patient question, the $M$ machine replies $(y_1, \dots, y_M)$ by quality, using reference criteria such as the fluency, coherence, accuracy and professionalism of the reply; the ranking reflects only the doctors' preferences and follows no strict standard. Multiple rankings of the machine replies are thus obtained for each question, and the ranking that occurs most often among them is taken as the final doctor feedback result.
The doctor feedback dataset is specifically obtained by dividing each doctor feedback result into $C_M^2$ doctor feedback comparison pairs $(x, y_i, y_j)$; each comparison pair is one sample of the doctor feedback dataset, giving $N \cdot C_M^2$ samples in total, where $y_w$ is the machine reply that the doctor considers better between $y_i$ and $y_j$, and $C$ denotes the combination operation.
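The construction of the doctor feedback dataset described above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, replies are represented by plain strings, and resolving ties in favour of the ranking seen first is an assumption the text does not address.

```python
from collections import Counter
from itertools import combinations

def majority_ranking(rankings):
    """Return the ranking (best reply first) that occurs most often among the doctors."""
    counts = Counter(tuple(r) for r in rankings)
    return list(counts.most_common(1)[0][0])

def comparison_pairs(question, ranked_replies):
    """Split one ranked list (best first) into C(M, 2) samples (question, better, worse)."""
    # combinations() keeps the input order, so the first element of each pair ranks higher.
    return [(question, better, worse)
            for better, worse in combinations(ranked_replies, 2)]

# Toy example (assumed data): three doctors rank three machine replies A, B, C.
doctor_rankings = [["B", "A", "C"], ["B", "A", "C"], ["A", "B", "C"]]
final_ranking = majority_ranking(doctor_rankings)
samples = comparison_pairs("patient question", final_ranking)
```

With M = 3 replies, one question contributes C(3, 2) = 3 comparison samples.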
S3.4, training an automatic medical reply effect evaluation model based on the doctor feedback dataset.
The automatic medical reply effect evaluation model is used to fit the doctors' evaluations and to score the generated machine replies. It is trained on the basis of the Chinese medical question-answering reference model; specifically, the output layer is replaced by a linear layer that maps the medical reply vector output by the decoder to a scalar score.
The loss function of the automatic medical reply effect evaluation model is calculated as follows:
where $r_\theta(x, y)$ is the scalar score output by the automatic medical reply effect evaluation model with parameters $\theta$ for a patient question $x$ and a single machine reply $y$, and $D$ denotes the labels of the doctor feedback dataset.
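One loss consistent with these definitions is the pairwise ranking loss of Ouyang et al., "Training language models to follow instructions with human feedback", which appears among this document's non-patent citations. The sketch below assumes that form: for each comparison pair it penalises the evaluation model when the reply the doctor preferred does not receive the higher score. Averaging over pairs and the name `pairwise_ranking_loss` are assumptions.

```python
import math

def pairwise_ranking_loss(score_pairs):
    """Average of -log(sigmoid(r_better - r_worse)) over comparison pairs.

    Each element of score_pairs is (r_better, r_worse): the scalar scores the
    evaluation model gives to the doctor-preferred reply and to the other reply.
    """
    total = 0.0
    for r_better, r_worse in score_pairs:
        diff = r_better - r_worse
        # -log(sigmoid(d)) equals log(1 + exp(-d)); this form avoids overflow for large |d|.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(score_pairs)

# Toy example (assumed scores): the model agrees with the doctor on the first pair only.
example_loss = pairwise_ranking_loss([(2.0, 0.5), (0.1, 1.3)])
```

Minimising this quantity pushes the score of the preferred reply above the score of the other reply in every pair.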
S4, further training with a reinforcement learning method to obtain the medical question-answer reply generation model.
S4.1, making a copy of the Chinese medical question-answering reference model obtained in step S2, and naming the copy the RL model, which is used for the reinforcement learning iterations.
S4.2, randomly drawing a patient question $x$ from the large-scale medical question-answering dialogue dataset, and generating a machine reply $y$ with the RL model.
S4.3, taking the drawn patient question $x$ and its corresponding machine reply $y$ as the input of the automatic evaluation model obtained in step S3, and obtaining the score $r_\theta(x, y)$ output by the automatic evaluation model.
S4.4, based on a reinforcement learning algorithm, updating the parameters of the RL model by using the score $r_\theta(x, y)$ output by the automatic evaluation model in step S4.3.
Further, in step S4.4, the reinforcement learning algorithm aims to update the parameters of the RL model so as to maximize a combined objective function, whose formula is:
where $\pi_\phi^{RL}$ is the RL model, $D_{\pi_\phi^{RL}}$ is the distribution of the data used for reinforcement learning, $\pi^{SFT}$ is the supervised fine-tuned model, $D_{pretrain}$ is the distribution of the training data used during pre-training, $\beta$ is the KL reward coefficient, and $\gamma$ is the pre-training loss coefficient.
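One objective consistent with these definitions is the combined objective of Ouyang et al., "Training language models to follow instructions with human feedback", which appears among this document's non-patent citations. The following is a sketch under that assumption rather than a confirmed transcription; the notation is assumed, with $\phi$ the RL model parameters, $r_\theta$ the score of the evaluation model, $\pi_\phi^{RL}$ the RL model, $\pi^{SFT}$ the supervised fine-tuned model, $\beta$ the KL reward coefficient and $\gamma$ the pre-training loss coefficient:

```latex
\mathrm{objective}(\phi) =
  \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{RL}}}
    \left[ r_\theta(x, y)
      - \beta \log \frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)} \right]
  + \gamma \, \mathbb{E}_{x \sim D_{pretrain}}
    \left[ \log \pi_\phi^{RL}(x) \right]
```

Under this reading, the first term rewards replies that the evaluation model scores highly, the $\beta$ term discourages the RL model from drifting away from the supervised fine-tuned model, and the $\gamma$ term mixes in the likelihood of the pre-training data.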
S4.5, repeating steps S4.2 to S4.4 a plurality of times; the final RL model obtained is the trained medical question-answer reply generation model.
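Steps S4.1 to S4.5 can be illustrated with a toy loop. This is a sketch under strong simplifying assumptions: there is a single patient question, the "model" is a softmax over a fixed set of candidate replies, the evaluation model is a fixed list of scores, and a plain REINFORCE update with a KL-style penalty stands in for the reinforcement learning algorithm, which the text does not name. All function names are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def penalized_reward(score, logp_rl, logp_sft, beta):
    """Evaluation score minus a penalty for drifting from the reference model."""
    return score - beta * (logp_rl - logp_sft)

def reinforce_step(logits, reply, reward, lr):
    """One REINFORCE update: d log pi(reply) / d logit_i = 1[i == reply] - pi_i."""
    probs = softmax(logits)
    return [v + lr * reward * ((1.0 if i == reply else 0.0) - p)
            for i, (v, p) in enumerate(zip(logits, probs))]

def train_rl(scores, steps=500, lr=0.1, beta=0.05, seed=0):
    rng = random.Random(seed)
    sft_logits = [0.0] * len(scores)       # frozen reference model
    rl_logits = list(sft_logits)           # S4.1: copy the reference model as the RL model
    log_sft = [math.log(p) for p in softmax(sft_logits)]
    for _ in range(steps):                 # S4.5: repeat S4.2 to S4.4
        probs = softmax(rl_logits)
        # S4.2: the RL model generates (here: samples) a reply.
        reply = rng.choices(range(len(scores)), weights=probs)[0]
        # S4.3: the evaluation model scores the reply.
        reward = penalized_reward(scores[reply], math.log(probs[reply]),
                                  log_sft[reply], beta)
        # S4.4: update the RL model parameters with the score.
        rl_logits = reinforce_step(rl_logits, reply, reward, lr)
    return softmax(rl_logits)

final_probs = train_rl([0.1, 0.9, 0.3])
```

In a full system the candidate replies would be generated text and the update would act on the parameters of the language model; the loop structure is the part this sketch is meant to show.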
The medical question-answering reply method based on doctor feedback and reinforcement learning provided by the invention gives the Chinese medical question-answering reference model a highly professional medical question-answering capability through large-scale medical document pre-training and large-scale medical question-answering dialogue fine-tuning, and further improves the professional accuracy and empathy of the model through doctor feedback and reinforcement learning. This addresses the mechanical and unprofessional replies produced by traditional medical question-answering reply generation methods. The method is superior to traditional medical question-answering models in the fluency and professionalism of its replies, performs better on the medical question-answering reply generation task, and produces replies that are more humanized and have high professional accuracy.
Referring to Fig. 5, a medical question-answering reply system based on doctor feedback and reinforcement learning comprises:
a large-scale medical document dataset module, used for collecting documents in the medical field, organizing them into medical texts, and integrating the medical texts into a medical document dataset;
a Chinese medical text large model module, used for storing the Chinese medical text large model;
a Chinese medical question-answering reference model module, used for storing the Chinese medical question-answering reference model;
an automatic medical reply effect evaluation model module, used for evaluating the trained Chinese medical question-answering reference model;
a model training module, used for training the medical question-answer reply generation model according to the large-scale medical document dataset module, the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module;
a medical question-answer reply generation model module, used for storing the trained medical question-answer reply generation model;
the automatic medical reply effect evaluation model module is connected with the Chinese medical question-answering reference model module and the model training module, and the model training module is also connected with the Chinese medical question-answering reference model module and the medical question-answer reply generation model module.
The model training module pre-trains on the large-scale medical document dataset module to obtain the Chinese medical text large model module, and performs supervised fine-tuning on the Chinese medical text large model module to obtain the Chinese medical question-answering reference model module; the Chinese medical question-answering reference model module is used for training the automatic medical reply effect evaluation model module; the model training module trains the medical question-answer reply generation model according to the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module, and the trained model is stored in the medical question-answer reply generation model module, which can generate medical question-answer replies with high professional accuracy.
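The dependencies between the modules can be written down as a small graph, which makes the training order explicit. The sketch below is only an illustration: the identifiers are hypothetical names for the modules listed above, and `graphlib` from the Python standard library (3.9 or later) is used to derive an order in which every module is built after the modules it depends on.

```python
from graphlib import TopologicalSorter

# Each key is a module; its value is the set of modules it is built from.
dependencies = {
    "medical_text_large_model": {"medical_document_dataset"},
    "qa_reference_model": {"medical_text_large_model"},
    "reply_evaluation_model": {"qa_reference_model"},
    "reply_generation_model": {"qa_reference_model", "reply_evaluation_model"},
}

# static_order() yields every module after all of its dependencies.
training_order = list(TopologicalSorter(dependencies).static_order())
```

The resulting order matches steps S1 to S4: pre-training, supervised fine-tuning, evaluation model training, then reinforcement learning.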
The foregoing describes only preferred embodiments of the invention; all modifications, equivalents and alternatives falling within the spirit and scope of the invention are intended to be covered.
Claims (10)
1. A medical question-answering reply method based on doctor feedback and reinforcement learning, characterized by comprising the following steps:
S1, pre-training on a large-scale medical document dataset to obtain a Chinese medical text large model;
S2, performing further supervised fine-tuning on the Chinese medical text large model obtained in step S1 by using a large-scale medical question-answering dialogue dataset to obtain a Chinese medical question-answering reference model;
S3, inputting patient questions into the Chinese medical question-answering reference model obtained in step S2 to generate a plurality of machine replies, having doctors annotate the plurality of machine replies with feedback, and training an automatic medical reply effect evaluation model based on the doctor feedback annotations;
S4, further training with a reinforcement learning method to obtain a medical question-answer reply generation model.
2. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 1, wherein step S1 comprises the following steps:
S1.1, collecting documents in the medical field, organizing the collected documents into medical texts, and integrating the medical texts into a large-scale medical document dataset;
S1.2, inputting each medical text in the medical document dataset obtained in step S1.1 into a medical input embedding layer one by one, and preprocessing it into a medical text matrix;
S1.3, inputting the preprocessed medical text matrix into an Encoder composed of a plurality of stacked Encoder layers, outputting a medical contextual semantic matrix of the same size, and creating a historical medical reply matrix;
S1.4, inputting the medical contextual semantic matrix and the historical medical reply matrix obtained in step S1.3 into a Decoder composed of a plurality of stacked Decoder layers, and appending the medical reply vector output by the Decoder to the historical medical reply matrix;
S1.5, passing each medical reply vector generated by the Decoder through an output layer to output medical words in real time, which finally form the medical text output;
S1.6, combining the four parts in steps S1.2 to S1.5, namely the medical input embedding layer, the Encoder, the Decoder and the output layer, to obtain the Chinese medical text large model.
3. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 2, wherein the medical input embedding layer in step S1.2 includes a medical word embedding layer, a linear transformation layer and a relative position embedding layer; the medical word embedding layer converts each medical word into a medical word vector by looking it up in a medical vocabulary, and the medical word vectors are stacked vertically to obtain the medical text matrix; the linear transformation layer reduces the length of each row of the medical text matrix by a linear transformation; and the relative position embedding layer adds the relative position information learned by the model to each row of the medical text matrix.
4. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 2, wherein in step S1.3, each Encoder layer of the Encoder outputs a hidden-layer matrix of the same size as its input; each Encoder layer includes two sub-layers, a multi-head attention mechanism layer and a fully connected layer, and the output of each sub-layer has a residual connection and is layer-normalized.
5. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 4, wherein each attention head of the multi-head attention mechanism layer of the Encoder layer includes three matrices to be learned, $W^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W^K \in \mathbb{R}^{d_{model} \times d_k}$ and $W^V \in \mathbb{R}^{d_{model} \times d_v}$, where $d_{model}$ is the length of each row of the medical text matrix, and $d_k$ and $d_v$ are the dimensions of the vectors obtained after each row of the medical text matrix is linearly transformed by $W^Q$ or $W^K$, and by $W^V$, respectively; the input medical text matrix is multiplied by $W^Q$, $W^K$ and $W^V$ one by one to obtain the three matrices Q, K and V respectively, and an output operation is then performed to obtain the output; the output operation formula is:
the outputs of the plurality of attention heads are concatenated to obtain the output of the multi-head attention mechanism layer, which is calculated as follows:
where $h$ is the number of attention heads, $Q_i$, $K_i$ and $V_i$ are the Q, K and V matrices of the $i$-th attention head, $\mathrm{head}_i$ denotes the output matrix of the $i$-th attention head, of size $n \times d_v$, $n$ is the number of words, $W^O$ is a matrix to be learned, $\mathrm{Concat}$ is the horizontal concatenation function, and $\mathrm{MultiHead}(Q, K, V)$ is the output of the multi-head attention mechanism layer;
the fully connected layer is a two-layer fully connected neural network, and the output of the fully connected layer is calculated as follows:
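The multi-head attention computation in claim 5 can be illustrated with a small sketch. It assumes the scaled dot-product form of Vaswani et al., "Attention is all you need", which appears among this document's non-patent citations: each head computes softmax(Q K^T / sqrt(d_k)) V, and the head outputs are concatenated horizontally and multiplied by a learned output matrix. Matrices are plain nested lists, and the function names and toy weights are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention_head(x, w_q, w_k, w_v):
    """One head: project x to Q, K, V, then weight the rows of V by softmax(Q K^T / sqrt(d_k))."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)

def multi_head(x, heads, w_o):
    """Concatenate the outputs of all heads row by row, then apply the output matrix."""
    outputs = [attention_head(x, *weights) for weights in heads]
    concat = [sum((o[i] for o in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

# Toy example (assumed weights): 2 words, d_model = d_k = d_v = 2, 2 heads.
identity = [[1.0, 0.0], [0.0, 1.0]]
single = attention_head(identity, identity, identity, identity)
w_o = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
combined = multi_head(identity, [(identity,) * 3, (identity,) * 3], w_o)
```

The output of each head has one row per word and $d_v$ columns, so concatenating $h$ heads gives $h \cdot d_v$ columns before the output matrix is applied.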
6. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 1, wherein step S2 comprises the following steps:
S2.1, collecting question-answer datasets in the medical field, and integrating them into a large-scale medical question-answering dialogue dataset;
S2.2, based on the large-scale medical question-answering dialogue dataset, performing supervised fine-tuning on the pre-trained Chinese medical text large model to obtain the Chinese medical question-answering reference model.
7. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 5, wherein step S3 comprises the following steps:
S3.1, given an input patient question $x$, generating $M$ machine replies $(y_1, \dots, y_M)$ with the Chinese medical question-answering reference model, where $y_m$ is the $m$-th machine reply;
S3.2, the doctor ranking the $M$ machine replies $(y_1, \dots, y_M)$ by quality to obtain the ranked machine replies, dividing the ranked machine replies by pairwise combination into $C_M^2$ doctor feedback comparison pairs, and then combining each pair with the patient question $x$ to obtain a doctor feedback comparison pair $(x, y_i, y_j)$ containing the patient question;
S3.3, repeating steps S3.1 to S3.2 for $N$ patient questions to obtain $N \cdot C_M^2$ doctor feedback comparison pairs containing patient questions, from which the doctor feedback dataset is constructed;
S3.4, training an automatic medical reply effect evaluation model based on the doctor feedback dataset, wherein the loss function of the automatic medical reply effect evaluation model is calculated as follows:
8. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 7, wherein step S4 comprises the following steps:
S4.1, making a copy of the Chinese medical question-answering reference model obtained in step S2, and naming the copy the RL model;
S4.2, randomly drawing a patient question $x$ from the large-scale medical question-answering dialogue dataset, and generating a machine reply $y$ with the RL model;
S4.3, taking the drawn patient question $x$ and its corresponding machine reply $y$ as the input of the automatic evaluation model obtained in step S3, and obtaining the score $r_\theta(x, y)$ output by the automatic evaluation model;
S4.4, based on a reinforcement learning algorithm, updating the parameters of the RL model by using the score $r_\theta(x, y)$ output by the automatic evaluation model in step S4.3;
S4.5, repeating steps S4.2 to S4.4 a plurality of times, the final RL model obtained being the trained medical question-answer reply generation model.
9. The medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 8, wherein in step S4.4, the objective function of the reinforcement learning algorithm is given by the following formula:
10. A system for implementing the medical question-answering reply method based on doctor feedback and reinforcement learning according to claim 1, comprising:
a large-scale medical document dataset module, used for collecting documents in the medical field, organizing them into medical texts, and integrating the medical texts into a medical document dataset;
a Chinese medical text large model module, used for storing the Chinese medical text large model;
a Chinese medical question-answering reference model module, used for storing the Chinese medical question-answering reference model;
an automatic medical reply effect evaluation model module, used for evaluating the trained Chinese medical question-answering reference model;
a model training module, used for training the medical question-answer reply generation model according to the large-scale medical document dataset module, the Chinese medical question-answering reference model module and the automatic medical reply effect evaluation model module;
a medical question-answer reply generation model module, used for storing the trained medical question-answer reply generation model;
the automatic medical reply effect evaluation model module is connected with the Chinese medical question-answering reference model module and the model training module, and the model training module is also connected with the Chinese medical question-answering reference model module and the medical question-answer reply generation model module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310600962.XA CN116383364B (en) | 2023-05-26 | 2023-05-26 | Medical question-answering reply method and system based on doctor feedback and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116383364A (en) | 2023-07-04 |
CN116383364B (en) | 2023-09-12 |
Family ID: 86980890
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310600962.XA (Active, CN116383364B (en)) | 2023-05-26 | 2023-05-26 | Medical question-answering reply method and system based on doctor feedback and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116383364B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116955576A (en) * | 2023-09-21 | 2023-10-27 | 神州医疗科技股份有限公司 | Question-answer reply method, system and equipment based on human feedback and reinforcement learning |
CN117198505A (en) * | 2023-08-23 | 2023-12-08 | 深圳大学 | Deep learning language model fine tuning method for clinical medicine decision assistance |
CN117709441A (en) * | 2024-02-06 | 2024-03-15 | 云南联合视觉科技有限公司 | Method for training professional medical large model through gradual migration field |
CN118013016A (en) * | 2024-03-12 | 2024-05-10 | 华南理工大学 | Human-like value alignment method and system based on multidimensional feedback reinforcement learning |
CN118095402A (en) * | 2024-04-29 | 2024-05-28 | 浙江实在智能科技有限公司 | Reward model training method and system based on human feedback reinforcement learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200097814A1 (en) * | 2018-09-26 | 2020-03-26 | MedWhat.com Inc. | Method and system for enabling interactive dialogue session between user and virtual medical assistant |
CN111274362A (en) * | 2020-02-01 | 2020-06-12 | 武汉大学 | Dialogue generation method based on transformer architecture |
CN112559702A (en) * | 2020-11-10 | 2021-03-26 | 西安理工大学 | Transformer-based natural language problem generation method in civil construction information field |
CN114611527A (en) * | 2022-03-01 | 2022-06-10 | 华南理工大学 | User personality perception task-oriented dialogue strategy learning method |
Non-Patent Citations (2)
Title |
---|
Ashish Vaswani et al.: "Attention is all you need", Retrieved from the Internet <URL:https://arxiv.org/abs/1706.03762> * |
Long Ouyang et al.: "Training language models to follow instructions with human feedback", Retrieved from the Internet <URL:https://arxiv.org/abs/2203.02155> * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117198505A (en) * | 2023-08-23 | 2023-12-08 | 深圳大学 | Deep learning language model fine tuning method for clinical medicine decision assistance |
CN116955576A (en) * | 2023-09-21 | 2023-10-27 | 神州医疗科技股份有限公司 | Question-answer reply method, system and equipment based on human feedback and reinforcement learning |
CN116955576B (en) * | 2023-09-21 | 2024-07-02 | 神州医疗科技股份有限公司 | Question-answer reply method, system and equipment based on human feedback and reinforcement learning |
CN117709441A (en) * | 2024-02-06 | 2024-03-15 | 云南联合视觉科技有限公司 | Method for training professional medical large model through gradual migration field |
CN117709441B (en) * | 2024-02-06 | 2024-05-03 | 云南联合视觉科技有限公司 | Method for training professional medical large model through gradual migration field |
CN118013016A (en) * | 2024-03-12 | 2024-05-10 | 华南理工大学 | Human-like value alignment method and system based on multidimensional feedback reinforcement learning |
CN118013016B (en) * | 2024-03-12 | 2024-08-13 | 华南理工大学 | Human-like value alignment method and system based on multidimensional feedback reinforcement learning |
CN118095402A (en) * | 2024-04-29 | 2024-05-28 | 浙江实在智能科技有限公司 | Reward model training method and system based on human feedback reinforcement learning |
CN118095402B (en) * | 2024-04-29 | 2024-07-26 | 浙江实在智能科技有限公司 | Reward model training method and system based on human feedback reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN116383364B (en) | 2023-09-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||