CN116521836A

CN116521836A - Biomedical extractive question-answering method based on dynamic routing and answer voting

Info

Publication number: CN116521836A
Application number: CN202310330245.XA
Authority: CN
Inventors: 杨鹏; 胡中坚; 梁增玉; 裴宏梅
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2023-03-30
Filing date: 2023-03-30
Publication date: 2023-08-01

Abstract

The invention discloses a biomedical extractive question-answering method based on dynamic routing and answer voting, comprising the following steps: collecting and preprocessing a data set and converting it into the format required by the extractive question-answering task; designing a new routing algorithm that dynamically routes the hidden layers of the pre-trained model and dynamically assigns them appropriate weights; in the prediction stage, adopting a voting mechanism that takes the similarity between answers into account to select the answer most likely to be correct; and adopting a pre-fine-tuning method that first trains the model on a general corpus so as to improve its performance on biomedical question-answering tasks. The invention adds dynamic routing and answer voting on top of ALBERT and pre-fine-tunes the model, thereby effectively improving model performance. Good performance is obtained while retaining the advantage of a small parameter count.

Description

Biomedical extractive question-answering method based on dynamic routing and answer voting

Technical Field

The invention relates to a biomedical extractive question-answering method based on dynamic routing and answer voting, and belongs to the technical field of the Internet and artificial intelligence.

Background

Artificial intelligence is a highly challenging field, and extractive question answering is one of its active research problems, with high research value. From an academic perspective, extractive question answering expects a machine to extract the answer to a question from a paragraph after deeply understanding and analyzing the semantics of that paragraph. From a practical perspective, extractive question answering has wide application in daily life. For example, intelligent customer service uses question-answering technology to answer users' questions. It is also important in other fields, such as biomedical question answering.

Biomedical knowledge acquisition is an important task in information retrieval. Both professionals and the public need help acquiring and understanding biomedical concepts. In recent years, with the development of network technology and the accumulation of big data, healthcare services such as online medical information retrieval and biomedical question answering have increasingly emerged. Biomedical question answering is a subtask of natural language processing (NLP) in the biomedical field. It extracts answers from biomedical text and aids knowledge retrieval. It is an important and challenging branch of question answering. Many existing biomedical question-answering methods are based on pre-trained language models, which achieve high performance on a variety of NLP tasks. The BERT series in particular has driven the development of pre-trained models in the NLP field. Later models such as BioBERT are pre-trained on biomedical corpora, which effectively improves the performance of pre-trained models on biomedical tasks; their structure is similar to BERT, and they are mainly trained on large amounts of biomedical text. Pre-trained models have become an almost unavoidable topic in every NLP task domain.

However, much existing work does not fully exploit the hidden-layer knowledge of the pre-trained model. For example, it uses only the output of the last hidden layer, or it combines the outputs of several hidden layers with fixed weights. This approach is inflexible and does not make good use of the hidden-layer knowledge that pre-trained models learn on a large corpus. In addition, a model often produces a set of candidate answers for a question. Most existing methods do not consider the information implied by the similarity between one answer and the others. For example, an answer that is similar to many of the other answers is more likely to be correct. Such implicit information should not be ignored.

Therefore, the invention provides a biomedical extractive question-answering method based on dynamic routing and answer voting. The dynamic routing mechanism adaptively makes full use of hidden-layer knowledge: a dynamic routing algorithm is designed that dynamically assigns appropriate weights to several hidden layers and adjusts those weights automatically instead of fixing them, so that multiple hidden layers are routed reasonably. The answer voting strategy selects the best answer more reliably: an answer voting module computes a voting score that takes the similarity between answers into account, so the answer score is no longer just the prediction score but a combination of the prediction score and the voting score. In addition, to improve performance on biomedical questions, the model is pre-fine-tuned on the SQuAD dataset, and a dynamic learning rate mechanism is introduced during pre-fine-tuning.

Disclosure of Invention

In view of the problems and shortcomings of the prior art, the invention provides a biomedical extractive question-answering method based on dynamic routing and answer voting. The method dynamically routes several hidden layers of the pre-trained model, making more reasonable use of the hidden-layer knowledge that the model learned on a large-scale corpus. In addition, for the group of answers predicted by the model, the similarity among the answers is taken into account, and this implicit information is used to produce a more appropriate answer.

To achieve the above object, the technical scheme of the invention is as follows: a biomedical extractive question-answering method based on dynamic routing and answer voting, comprising a dynamic routing mechanism and an answer voting strategy. The method consists of three main steps:

Step 1: collect and preprocess the data sets.

Since the task is biomedical extractive question answering, we first collected the BioASQ factoid 4b, 5b, 6b, 9b and other biomedical datasets, which are available from the BioASQ challenge. We then collected the public SQuAD dataset, which is used to pre-train the model and improve its performance, while the BioASQ datasets are used for model training and testing. We converted the BioASQ datasets into a format similar to SQuAD; after this processing, the BioASQ biomedical data can be used for the extractive question-answering task.
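The conversion described above can be sketched as follows. This is a minimal illustration, not the inventors' preprocessing code: the BioASQ field names ("body", "type", "snippets", "exact_answer"), the helper name bioasq_to_squad, and the choice to keep only snippets that literally contain an answer string are all assumptions made for this example.

```python
def bioasq_to_squad(bioasq, title="BioASQ"):
    """Convert BioASQ-style factoid questions into a SQuAD-style dictionary.

    Assumed input fields per question: "id", "body", "type",
    "snippets" (each with "text") and "exact_answer".
    """
    paragraphs = []
    for q in bioasq["questions"]:
        if q.get("type") != "factoid":
            continue
        # exact_answer may be a flat list or a list of synonym lists.
        answers = []
        for a in q.get("exact_answer", []):
            answers.extend(a if isinstance(a, list) else [a])
        for n, snippet in enumerate(q.get("snippets", [])):
            context = snippet["text"]
            lowered = context.lower()
            found = []
            for a in answers:
                start = lowered.find(a.lower())
                if start >= 0:
                    # Keep the surface form exactly as it appears in the context.
                    found.append({"text": context[start:start + len(a)],
                                  "answer_start": start})
            if found:  # drop snippets that do not contain any answer string
                paragraphs.append({
                    "context": context,
                    "qas": [{"id": f"{q['id']}_{n:03d}",
                             "question": q["body"],
                             "answers": found}],
                })
    return {"data": [{"title": title, "paragraphs": paragraphs}]}


# Toy input written for this example (not taken from a BioASQ release).
example = {"questions": [{
    "id": "q1", "type": "factoid",
    "body": "Which gene is mutated in cystic fibrosis?",
    "snippets": [{"text": "Cystic fibrosis is caused by mutations in the CFTR gene."},
                 {"text": "The disease affects the lungs."}],
    "exact_answer": [["CFTR"]],
}]}
squad = bioasq_to_squad(example)
```

Each retained snippet becomes one SQuAD-style paragraph with a single question, so a standard extractive question-answering training loop can consume the result without further changes.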

Step 2: train the model. First, the model input is encoded by the embedding layer into an embedded vector. The embedded vector is encoded by the encoding layer to obtain the hidden-layer outputs. The hidden layers to be routed are selected, and their outputs are fed into the dynamic routing module to obtain the dynamically routed output. This output passes through a linear layer with an output dimension of 2. The linear-layer output passes through a pre-output layer to obtain a group of pre-output answers and their prediction scores. The pre-output answers are then fed into the answer voting module to obtain the corresponding voting scores. The prediction score and the voting score are combined into an answer score, which is fed into the output layer to obtain the final answer. The details are as follows:

The model input sequence (paragraph + question) first passes through the embedding layer and is converted into an embedded vector E, which is then input into the encoding layer:

$H_{all} = \mathrm{Encoder}(E)$

where $H_{all} = (H_1, H_2, \ldots, H_{12})$ represents the outputs of the encoder hidden layers and $H_i$ represents the output of the i-th hidden layer. The encoding layer is a Transformer encoder.

Dynamic routing: the last three hidden layers are routed. Their outputs $H_{10}$, $H_{11}$, $H_{12}$ are used as inputs to the dynamic routing module, giving:

$H = \mathrm{Dynamic\_Routing}(H_{10}, H_{11}, H_{12})$

where H represents the output of the dynamic routing module.

After dynamic routing, a linear layer is connected, and the output dimension of the linear layer is 2.

$L = \mathrm{Linear}(H)$

where L represents the output of the linear layer.

Answer voting: L is input to the pre-output layer:

$A', P = \mathrm{pre\_output}(L)$

where $A'$ represents the pre-output answers and P represents the corresponding prediction scores.
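The text does not spell out how the pre-output layer turns L into candidate answers. A common reading in extractive question answering, assumed here, is that the two output dimensions are per-token start and end logits, and that candidates are the highest-scoring spans. The sketch below follows that assumption; the parameters n_best and max_len and the product-of-probabilities score are illustrative choices, not details from the source.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pre_output(logits, tokens, n_best=3, max_len=5):
    """Return the n_best candidate spans and their prediction scores.

    logits: one [start_logit, end_logit] pair per token (assumed layout of L).
    """
    p_start = softmax([pair[0] for pair in logits])
    p_end = softmax([pair[1] for pair in logits])
    spans = []
    for i in range(len(tokens)):
        # Only spans that end at or after their start and are short enough.
        for j in range(i, min(i + max_len, len(tokens))):
            spans.append((p_start[i] * p_end[j], " ".join(tokens[i:j + 1])))
    spans.sort(key=lambda s: s[0], reverse=True)
    top = spans[:n_best]
    return [text for _, text in top], [score for score, _ in top]


# Toy logits made up for this example.
tokens = ["mutations", "in", "the", "CFTR", "gene"]
logits = [[0.1, 0.0], [0.0, 0.1], [0.5, 0.2], [3.0, 2.5], [0.2, 2.0]]
answers, scores = pre_output(logits, tokens)
```

Returning several candidates rather than only the best span is what makes the later voting step possible, since voting compares candidates with one another.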

$A'$ is input to the answer voting module:

$V = \mathrm{answer\_voting}(A')$

where V represents the corresponding voting scores. The prediction and voting scores are combined with weights $w_1$ and $w_2$:

$S = w_1 P + w_2 V$

where S represents the final answer score. The final answer is then obtained through the output layer:

$A = \mathrm{output}(S)$

where A represents the final answer.
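The score combination and final selection can be illustrated with a few lines of code. This is a sketch under assumptions: the weights w1 = w2 = 0.5 and the toy scores are invented for the example, and the output layer is assumed to pick the candidate with the highest combined score.

```python
def output(answers, pred_scores, vote_scores, w1=0.5, w2=0.5):
    """Combine prediction and voting scores as S = w1*P + w2*V, then pick the best answer."""
    scores = [w1 * p + w2 * v for p, v in zip(pred_scores, vote_scores)]
    best = max(range(len(answers)), key=scores.__getitem__)
    return answers[best], scores


# Toy candidates and scores made up for this example.
candidates = ["CFTR gene", "CFTR", "the lungs"]
P = [0.40, 0.35, 0.25]   # prediction scores
V = [0.25, 0.50, 0.00]   # voting scores
final_answer, S = output(candidates, P, V)
```

In this toy case the candidate ranked second by prediction score alone wins after voting, which is the effect the voting score is meant to have.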

Pre-training the model on the public SQuAD dataset improves its performance. The model is then fine-tuned on the BioASQ biomedical dataset and is ready for testing on the BioASQ test set.

Step 3: test the model. The test set is input into the model, model prediction is started, and the test results are obtained once the model has run.

Compared with the prior art, the invention has the following beneficial effects:

The biomedical field requires domain-specific knowledge, and the performance of existing artificial intelligence techniques such as question answering still needs improvement, which limits the application of artificial intelligence in the biomedical field to some extent. The invention adopts dynamic routing and answer voting and combines them with pre-training on a public corpus, thereby greatly improving the model's performance on biomedical question answering and facilitating the application of NLP question-answering technology in the biomedical field.

(1) The invention adopts a dynamic routing mechanism to adaptively and fully utilize hidden layer knowledge of the pre-training model. A dynamic routing algorithm is designed, which can dynamically give proper weights to a plurality of hidden layers and automatically adjust the weights instead of fixed weights. The algorithm can reasonably route a plurality of hidden layers, thereby fully utilizing the knowledge learned by the pre-training model on a large-scale corpus.

(2) The answer voting strategy of the invention can better select the optimal answer. An answer voting module is designed to calculate a voting score that takes into account the similarity between answers. The answer score is no longer just a predictive score, but is made up of a predictive score and a voting score. Such implicit information may help the model better generate the appropriate answer.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

FIG. 2 is a general block diagram of a method according to an embodiment of the present invention;

FIG. 3 is an explanatory diagram of the dynamic routing method of the present invention;

FIG. 4 is a comparison of prediction samples with the baseline model.

Detailed Description

The invention is further illustrated below in conjunction with specific examples in order to enhance the understanding and appreciation of the invention.

Example 1: a biomedical extraction type question-answering method based on dynamic routing and answer voting firstly needs to collect biomedical data sets, then processes the biomedical data sets into an extraction type question-answering task format, and collects public data sets for pre-training. The model is pre-trained and then subjected to fine tuning test on biomedical tasks. The specific implementation steps of the invention are as follows:

step 1, collecting and preprocessing a data set. The bioasqfactor 4b, bioasqfactor 5b, bioasqfactor 6b,BioASQfactoid 9b biomedical datasets were first collected, and these datasets were found on the BioASQ challenge. The squiad public dataset was then collected again, the squiad dataset was used for pre-training the model, while the data set of bio asq was processed into a decimated question-and-answer task format for training and testing of the model.

And 2, training a model. The implementation of this step can be divided into the following sub-steps:

in the substep 2-1, the model inputs the sequence and paragraph + problem, firstly, the sequence is converted into an embedded vector E through an embedded layer, and then the embedded vector is input into an encoding layer:

H _all ＝Encoder(E)

wherein H is _all ＝(H ₁ ,H ₂ ,…,H ₁₂ ) Representing the output of the encoder hidden layer. H _i Representing the output of the ith hidden layer.

Encoding layer: the encoding layer is the encoder structure of the Transformer, an important part of which is the attention mechanism. For a given query matrix Q, key matrix K and value matrix V, the attention mechanism is calculated as follows:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$

$\mathrm{head}_i = \mathrm{Attention}(QW_i^{q}, KW_i^{k}, VW_i^{v})$

$\mathrm{MultiHead}(Q, K, V) = [\mathrm{head}_1, \ldots, \mathrm{head}_h]W^{o}$

where $W_i^{q}$, $W_i^{k}$, $W_i^{v}$ and $W^{o}$ are learnable parameters, $\sqrt{d_k}$ is the scaling factor ($d_k$ being the dimension of the key vectors), and h is the number of attention heads. The multi-head attention layer followed by a feed-forward layer forms the encoder structure.
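To make the attention computation concrete, the following sketch implements single-head scaled dot-product attention on plain Python lists. It is illustrative only: it omits the learned projections and the concatenation over heads, and the toy matrices are invented for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


# Toy matrices made up for this example: one query, two key/value pairs.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
context = attention(Q, K, V)
```

Because the query matches the first key more closely, the output row lies nearer to the first value row than to the second.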

Substep 2-2: the last three hidden layers are routed. Their outputs $H_{10}$, $H_{11}$, $H_{12}$ are used as inputs to the dynamic routing module, giving:

$H = \mathrm{Dynamic\_Routing}(H_{10}, H_{11}, H_{12})$

where H represents the output of the dynamic routing module.

Dynamic routing algorithm: first, three initial weights b1, b2, b3 are defined, along with the number of iterations r. Each iteration proceeds as follows. Three routing values k1, k2, k3 are calculated from b1, b2, b3 using softmax, so that k1, k2, k3 sum to 1. The output vectors H12, H11, H10 of the last three hidden layers are flattened into one-dimensional vectors H12_f, H11_f, H10_f. The weighted sum x = k1×H12_f + k2×H11_f + k3×H10_f is calculated, and y is obtained from x by the nonlinear squash function. Then b1, b2, b3 are updated: y multiplied by the transpose of H12_f, plus the original b1, is the new b1, and the new b2 and b3 are obtained similarly from H11_f and H10_f. After the iterations are completed, H12, H11, H10 are multiplied by their respective updated routing values and summed to give the output of the dynamic routing, i.e. k1×H12 + k2×H11 + k3×H10.

Here the squash function is the one defined in the paper "Dynamic Routing Between Capsules".
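The routing procedure above can be sketched in plain Python as follows. Several details are assumptions made for this illustration rather than statements from the source: the initial weights are set to zero, r defaults to 3, and the final routing values are recomputed with softmax from the weights after the last update.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(v):
    """Squash non-linearity: v * (|v|^2 / (1 + |v|^2)) / |v|."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def dynamic_routing(layers, r=3):
    """Route hidden-layer outputs, e.g. layers = [H12, H11, H10].

    Each layer is a seq_len x hidden matrix given as a list of lists.
    """
    flat = [[x for row in h for x in row] for h in layers]  # H12_f, H11_f, H10_f
    b = [0.0] * len(layers)  # initial weights b1, b2, b3 (zeros: an assumption)
    for _ in range(r):
        k = softmax(b)  # routing values, summing to 1
        x = [sum(k[i] * flat[i][d] for i in range(len(flat)))
             for d in range(len(flat[0]))]
        y = squash(x)
        # b_i <- b_i + y . H_i_f (dot product of y with the flattened layer)
        b = [b[i] + sum(yd * fd for yd, fd in zip(y, flat[i]))
             for i in range(len(flat))]
    k = softmax(b)  # routing values from the updated weights (assumption)
    rows, cols = len(layers[0]), len(layers[0][0])
    H = [[sum(k[i] * layers[i][a][c] for i in range(len(layers)))
          for c in range(cols)] for a in range(rows)]
    return H, k


# Toy layer outputs made up for this example: two layers agree, one disagrees.
H12 = [[1.0, 0.0], [0.0, 1.0]]
H11 = [[1.0, 0.0], [0.0, 1.0]]
H10 = [[-1.0, 0.0], [0.0, -1.0]]
H, k = dynamic_routing([H12, H11, H10])
```

In the toy input the two agreeing layers reinforce each other through the dot-product update, so they end up with larger routing values than the dissenting layer. This is the sense in which the weights are adjusted automatically rather than fixed.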

Substep 2-3: after dynamic routing, a linear layer is connected, and the output dimension of the linear layer is 2.

$L = \mathrm{Linear}(H)$

where L represents the output of the linear layer.

Substep 2-4, answer voting: L is input to the pre-output layer:

$A', P = \mathrm{pre\_output}(L)$

$A'$ is input to the answer voting module:

$V = \mathrm{answer\_voting}(A')$

where V represents the corresponding voting scores.

$S = w_1 P + w_2 V$

$A = \mathrm{output}(S)$

where A represents the final answer.

For any two answers $x_i$ and $x_j$ in a set, the answer voting score is calculated from the following quantities: $|x_i \cap x_j|$ is the number of words common to $x_i$ and $x_j$, $|x_i|$ is the number of words in $x_i$, and N is the number of answers in each set.
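The exact voting formula is not reproduced in this text, so the sketch below uses one plausible form built only from the quantities defined above: each answer's score is the sum over the other answers of |x_i ∩ x_j| / |x_i|, divided by N. This formula, the lowercased whitespace tokenization, and the function body are assumptions for illustration, not the patented formula.

```python
def answer_voting(answers):
    """Voting score per answer, ASSUMED form: (1/N) * sum_{j != i} |x_i ∩ x_j| / |x_i|."""
    tokens = [a.lower().split() for a in answers]
    n = len(answers)
    votes = []
    for i, xi in enumerate(tokens):
        total = 0.0
        for j, xj in enumerate(tokens):
            if i != j and xi:
                # Shared distinct words, normalised by the length of x_i.
                total += len(set(xi) & set(xj)) / len(xi)
        votes.append(total / n)
    return votes


# Toy candidate set made up for this example.
votes = answer_voting(["CFTR gene", "CFTR", "the lungs"])
```

Under this assumed form, an answer whose words recur in many other candidates receives a high vote, while an isolated candidate receives zero.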

Step 3: test the model. We first pre-fine-tuned the model on the SQuAD dataset and chose the better-performing SQuAD checkpoint as the initial checkpoint for fine-tuning on BioASQ. After fine-tuning on BioASQ, the model was evaluated on the test set. To demonstrate the advantages of our model, FIG. 4 compares prediction samples from our model with those of the baseline model.
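The checkpoint selection described here, together with the dynamically reduced learning rate mentioned for pre-fine-tuning, can be sketched as a small simulation. The schedule is not specified in the text; the reduce-on-plateau rule, the initial rate 3e-5, the factor 0.5, the patience of 1 and the validation scores are all assumptions invented for this example.

```python
def pre_fine_tune(eval_scores, lr=3e-5, factor=0.5, patience=1):
    """Track the best checkpoint and shrink the learning rate when validation stalls.

    eval_scores: validation score after each epoch (higher is better).
    Returns (index of best epoch, learning rate used in each epoch).
    """
    best_score, best_epoch, stale = float("-inf"), -1, 0
    lrs = []
    for epoch, score in enumerate(eval_scores):
        lrs.append(lr)  # rate that was in effect during this epoch
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # reduce the rate for the following epochs
                stale = 0
    return best_epoch, lrs


# Hypothetical validation scores on SQuAD, made up for this example.
best_epoch, lrs = pre_fine_tune([70.0, 74.5, 74.1, 75.2, 75.0])
```

The checkpoint saved at best_epoch would then serve as the starting point for fine-tuning on BioASQ.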

In summary, the present invention uses a dynamic routing mechanism to adaptively fully utilize hidden layer knowledge of a pre-training model. We have devised a dynamic routing algorithm that dynamically assigns appropriate weights to multiple hidden layers and automatically adjusts the weights rather than fixed weights. The algorithm can reasonably route multiple hidden layers. The invention provides an answer voting strategy which can better select the optimal answer. An answer voting module is designed to calculate a voting score that takes into account the similarity between answers. The answer score is no longer just a predictive score, but is made up of a predictive score and a voting score. Such implicit information may help the model better generate the appropriate answer.

It should be understood that the above embodiments merely illustrate the present invention and do not limit its scope. After reading the present invention, modifications of equivalent form made by those skilled in the art all fall within the scope defined by the appended claims.

Claims

1. A biomedical extractive question-answering method based on dynamic routing and answer voting, characterized in that the method comprises the following steps:

step 1, collecting and preprocessing a data set,

step 2, training the model,

step 3, testing the model.

2. The biomedical extractive question-answering method based on dynamic routing and answer voting according to claim 1, characterized in that in step 1, collecting and preprocessing a data set, the BioASQ factoid 4b, BioASQ factoid 5b, BioASQ factoid 6b and BioASQ factoid 9b biomedical data sets are first collected, and the public SQuAD data set is then collected, wherein the SQuAD data set is used for pre-training the model, and the BioASQ data sets are processed into the extractive question-answering task format for training and testing the model.

3. The biomedical extractive question-answering method based on dynamic routing and answer voting according to claim 1, characterized in that in step 2, training the model, the model is first pre-trained on the SQuAD general corpus, with a mechanism for dynamically reducing the learning rate adopted during pre-training, and is then fine-tuned on the BioASQ data set, specifically as follows:

substep 2-1, the model input sequence (paragraph + question) is first converted into an embedded vector E by the embedding layer, and the embedded vector is then input into the encoding layer:

$H_{all} = \mathrm{Encoder}(E)$

wherein $H_{all} = (H_1, H_2, \ldots, H_{12})$ represents the outputs of the encoder hidden layers, and $H_i$ represents the output of the i-th hidden layer;

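encoding layer: the encoding layer is the encoder structure of the Transformer, an important part of which is the attention mechanism; for a given query matrix Q, key matrix K and value matrix V, the attention mechanism is calculated as follows:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$

$\mathrm{head}_i = \mathrm{Attention}(QW_i^{q}, KW_i^{k}, VW_i^{v})$

$\mathrm{MultiHead}(Q, K, V) = [\mathrm{head}_1, \ldots, \mathrm{head}_h]W^{o}$

wherein $W_i^{q}$, $W_i^{k}$, $W_i^{v}$ and $W^{o}$ are learnable parameters, $\sqrt{d_k}$ is the scaling factor ($d_k$ being the dimension of the key vectors), h is the number of attention heads, and the multi-head attention layer followed by a feed-forward layer forms the encoder structure;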

substep 2-2, the last three hidden layers are routed, and their outputs $H_{10}$, $H_{11}$, $H_{12}$ are used as inputs to the dynamic routing module, giving:

$H = \mathrm{Dynamic\_Routing}(H_{10}, H_{11}, H_{12})$

wherein H represents the output of the dynamic routing module;

dynamic routing algorithm: first, three initial weights b1, b2, b3 are defined, and then the number of iterations r; in each iteration, three routing values k1, k2, k3 are calculated from b1, b2, b3 using softmax, such that k1, k2, k3 sum to 1; the output vectors H12, H11, H10 of the last three hidden layers are flattened into one-dimensional vectors H12_f, H11_f, H10_f; the weighted sum x = k1×H12_f + k2×H11_f + k3×H10_f is calculated, and y is obtained from x by the nonlinear squash function; b1, b2, b3 are then updated, with y multiplied by the transpose of H12_f plus the original b1 taken as the new b1, and the new b2 and b3 obtained similarly; after the iterations are completed, H12, H11, H10 are multiplied by their respective updated routing values and summed as the output of the dynamic routing, i.e. k1×H12 + k2×H11 + k3×H10,

wherein the squash function is the squash function defined in the paper "Dynamic Routing Between Capsules",

substep 2-3, after dynamic routing, a linear layer is connected, the output dimension of the linear layer being 2,

$L = \mathrm{Linear}(H)$

wherein L represents the output of the linear layer,

substep 2-4, answer voting: L is input to the pre-output layer,

$A', P = \mathrm{pre\_output}(L)$

wherein $A'$ represents the pre-output answers and P represents the corresponding prediction scores,

$A'$ is input to the answer voting module,

$V = \mathrm{answer\_voting}(A')$

wherein V represents the corresponding voting scores,

$S = w_1 P + w_2 V$

wherein S represents the final answer score, and the final answer is then obtained through the output layer,

$A = \mathrm{output}(S)$

wherein A represents the final answer,

and for any two answers $x_i$ and $x_j$, the answer voting score is calculated from the following quantities: $|x_i \cap x_j|$ is the number of words common to $x_i$ and $x_j$, $|x_i|$ is the number of words in $x_i$, and N is the number of answers in each set.

4. The biomedical extractive question-answering method based on dynamic routing and answer voting according to claim 1, characterized in that in step 3, testing the model, the test set is input into the model, model prediction is started, and the test results are obtained after the model has run.