CN116521836A - Biomedical extraction type question-answering method based on dynamic routing and answer voting - Google Patents


Publication number
CN116521836A
Authority
CN
China
Prior art keywords
answer
model
output
voting
layer
Prior art date
Legal status
Pending
Application number
CN202310330245.XA
Other languages
Chinese (zh)
Inventor
杨鹏
胡中坚
梁增玉
裴宏梅
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202310330245.XA
Publication of CN116521836A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a biomedical extractive question-answering method based on dynamic routing and answer voting, comprising the following steps: collecting and preprocessing data sets and converting them into the format required by the extractive question-answering task; designing a new routing algorithm that dynamically routes the hidden layers of the pre-trained model and dynamically assigns appropriate weights to them; in the prediction stage, using a voting mechanism that takes the similarity between candidate answers into account to select the answer most likely to be correct; and using a pre-fine-tuning method that first trains the model on a general corpus to improve its performance on biomedical question-answering tasks. The invention adds dynamic routing and answer voting on top of ALBERT and pre-fine-tunes the model, thereby effectively improving model performance while retaining the advantage of a small number of parameters.

Description

Biomedical extraction type question-answering method based on dynamic routing and answer voting
Technical Field
The invention relates to a biomedical extraction type question-answering method based on dynamic routing and answer voting, and belongs to the technical field of Internet and artificial intelligence.
Background
Artificial intelligence is a highly challenging field, and extractive question answering is one of its active research problems, with high research value. From an academic perspective, extractive question answering expects a machine to extract the answer to a question from a paragraph after understanding and analyzing the semantics that the paragraph carries. From a practical perspective, it has wide application value in daily life; for example, intelligent customer service uses question-answering techniques to answer users' questions. It is also important in other fields, such as biomedical question answering.
Biomedical knowledge acquisition is an important task in information retrieval. Both professionals and the public need help in acquiring and understanding biomedical concepts. In recent years, with the development of network technology and the accumulation of big data, healthcare services have increasingly emerged, including online medical information retrieval and biomedical question answering. Biomedical question answering is a subtask of natural language processing (NLP) in the biomedical field. It extracts answers from biomedical text and supports knowledge retrieval. It is an important and challenging branch of question answering. Many existing biomedical question-answering methods are based on pre-trained language models, which achieve high performance on a variety of NLP tasks. The BERT series in particular has driven the development of pre-trained models in NLP. Later models such as BioBERT have a structure similar to BERT but are pre-trained on large biomedical corpora, which effectively improves the performance of pre-trained models on biomedical tasks. Pre-trained models have become an almost unavoidable topic in every NLP task domain.
However, much existing work does not fully exploit the hidden-layer knowledge of the pre-trained model. For example, some methods use only the output of the last hidden layer, while others combine the outputs of several hidden layers with fixed weights. Such approaches are inflexible and do not make good use of the hidden-layer knowledge that the pre-trained model has learned from a large corpus. In addition, a model usually predicts a set of candidate answers for one question. Most existing methods ignore the information implied by the similarity between one answer and the others. For example, an answer that is similar to many of the other answers is likely to be the correct one. Such implicit information should not be ignored.
Therefore, the invention provides a biomedical extraction type question-answering method based on dynamic routing and answer voting. The dynamic routing mechanism adaptively exploits hidden-layer knowledge: a dynamic routing algorithm is designed that assigns appropriate weights to several hidden layers and adjusts those weights automatically rather than fixing them. The answer voting strategy selects the best answer more reliably: an answer voting module computes a voting score that takes the similarity between answers into account, so that the answer score is no longer the prediction score alone but a combination of the prediction score and the voting score. In addition, to improve performance on biomedical questions, the model is first pre-fine-tuned on the SQuAD data set, and a dynamic learning rate mechanism is introduced during pre-fine-tuning.
Disclosure of Invention
In view of the problems and shortcomings of the prior art, the invention provides a biomedical extraction type question-answering method based on dynamic routing and answer voting. The method dynamically routes several hidden layers of the pre-trained model, thereby making more reasonable use of the hidden-layer knowledge that the model has learned from a large-scale corpus. In addition, for the group of answers predicted by the model, the similarity among the answers is taken into account, and this implicit information is used to select a more suitable answer.
To achieve the above object, the technical scheme of the invention is as follows: a biomedical extraction type question-answering method based on dynamic routing and answer voting, comprising a dynamic routing mechanism and an answer voting strategy. The method consists of three main steps:
Step 1, collecting and preprocessing the data sets.
Because the task is biomedical extractive question answering, the biomedical data sets BioASQ factoid 4b, BioASQ factoid 5b, BioASQ factoid 6b and BioASQ factoid 9b, among others, are first collected from the BioASQ challenge. The public SQuAD data set is then collected and used to pre-train the model in order to improve its performance, while the BioASQ data sets are used for model training and testing. The BioASQ data sets are converted into a format similar to SQuAD, after which they can be used for the extractive question-answering task.
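The conversion of BioASQ data into a SQuAD-like format can be illustrated with a minimal sketch. The text does not specify the conversion procedure, so everything here is an assumption: the field names ("questions", "body", "type", "snippets", "exact_answer"), the rule of keeping only snippets that literally contain an answer string, and the function name bioasq_to_squad are all hypothetical.

```python
def _flatten(x):
    """Flatten a string, a list of strings, or nested lists of synonyms."""
    if isinstance(x, str):
        return [x]
    out = []
    for item in x:
        out.extend(_flatten(item))
    return out


def bioasq_to_squad(bioasq: dict) -> dict:
    """Convert BioASQ-style factoid questions into a SQuAD-like structure.

    Assumed input layout: {"questions": [{"id", "type", "body",
    "exact_answer", "snippets": [{"text"}]}]}. This layout is an assumption.
    """
    paragraphs = []
    for q in bioasq.get("questions", []):
        if q.get("type") != "factoid":
            continue
        answers = _flatten(q.get("exact_answer", []))
        for n, snippet in enumerate(q.get("snippets", [])):
            context = snippet["text"]
            found = []
            for ans in answers:
                # Case-insensitive search for the answer string in the snippet.
                start = context.lower().find(ans.lower())
                if start >= 0:
                    found.append({
                        "text": context[start:start + len(ans)],
                        "answer_start": start,
                    })
            # Extractive QA needs the answer span to occur in the context.
            if found:
                paragraphs.append({
                    "context": context,
                    "qas": [{
                        "id": f"{q['id']}_{n:03d}",
                        "question": q["body"],
                        "answers": found,
                    }],
                })
    return {"version": "bioasq-squad-like",
            "data": [{"title": "BioASQ", "paragraphs": paragraphs}]}
```

Each retained snippet becomes one SQuAD-style paragraph with a single question, so the same question may appear several times with different contexts.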
Step 2, training the model. The model input is first encoded by the embedding layer into an embedded vector, and the embedded vector is encoded by the coding layer to obtain the outputs of the hidden layers. The hidden layers to be routed are selected, and their outputs are fed into the dynamic routing module to obtain the dynamically routed output. This output passes through a linear layer with an output dimension of 2. The linear-layer output passes through a pre-output layer to obtain a group of pre-output answers and their prediction scores. The pre-output answers are then fed into the answer voting module to obtain their voting scores. The prediction score and the voting score are combined into an answer score, which is fed into the output layer to obtain the final answer. The details are as follows:
The model input sequence (paragraph + question) is first converted by the embedding layer into the embedded vector $E$, which is then fed into the coding layer:
$H_{all} = \mathrm{Encoder}(E)$
where $H_{all} = (H_1, H_2, \ldots, H_{12})$ denotes the outputs of the encoder hidden layers and $H_i$ denotes the output of the $i$-th hidden layer. The coding layer is a Transformer encoder.
Dynamic routing: the last three hidden layers are routed. Their outputs $H_{10}$, $H_{11}$, $H_{12}$ are used as inputs to the dynamic routing module:
$H = \mathrm{Dynamic\_Routing}(H_{10}, H_{11}, H_{12})$
where $H$ denotes the output of the dynamic routing module.
The dynamic routing module is followed by a linear layer with an output dimension of 2:
$L = \mathrm{Linear}(H)$
where $L$ denotes the output of the linear layer.
Answer voting: $L$ is fed into the pre-output layer:
$A', P = \mathrm{pre\_output}(L)$
where $A'$ denotes the pre-output answers and $P$ denotes the corresponding prediction scores.
$A'$ is fed into the answer voting module:
$V = \mathrm{answer\_voting}(A')$
where $V$ denotes the corresponding voting scores.
$S = w_1 \cdot P + w_2 \cdot V$
where $S$ denotes the final answer score and $w_1$, $w_2$ weight the prediction score and the voting score. The final answer is then obtained through the output layer:
$A = \mathrm{output}(S)$
where $A$ denotes the final answer.
Pre-training the model on the public SQuAD data set improves its performance. The model is then fine-tuned on the BioASQ biomedical data set and tested on the BioASQ test set.
Step 3, model testing. The test set is fed into the model, the model runs its predictions, and the test results are obtained.
Compared with the prior art, the invention has the following beneficial effects:
The biomedical field requires domain-specific knowledge, and the performance of existing artificial intelligence techniques such as question answering in this field still needs improvement, which limits their application in the biomedical field to some extent. The invention adopts dynamic routing and answer voting and combines them with pre-training on a public corpus, thereby greatly improving the performance of the model on biomedical question answering, which benefits the application of NLP question-answering technology in the biomedical field.
(1) The invention adopts a dynamic routing mechanism to adaptively and fully utilize hidden layer knowledge of the pre-training model. A dynamic routing algorithm is designed, which can dynamically give proper weights to a plurality of hidden layers and automatically adjust the weights instead of fixed weights. The algorithm can reasonably route a plurality of hidden layers, thereby fully utilizing the knowledge learned by the pre-training model on a large-scale corpus.
(2) The answer voting strategy of the invention can better select the optimal answer. An answer voting module is designed to calculate a voting score that takes into account the similarity between answers. The answer score is no longer just a predictive score, but is made up of a predictive score and a voting score. Such implicit information may help the model better generate the appropriate answer.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a general block diagram of a method according to an embodiment of the present invention;
FIG. 3 is an explanatory diagram of the dynamic routing method of the present invention;
FIG. 4 compares prediction samples of the present model with those of the baseline model.
Detailed Description
The invention is further illustrated below in conjunction with specific examples in order to enhance the understanding and appreciation of the invention.
Example 1: In a biomedical extraction type question-answering method based on dynamic routing and answer voting, biomedical data sets are first collected and processed into the extractive question-answering task format, and a public data set is collected for pre-training. The model is pre-trained and then fine-tuned and tested on the biomedical task. The specific implementation steps are as follows:
Step 1, collecting and preprocessing the data sets. The BioASQ factoid 4b, BioASQ factoid 5b, BioASQ factoid 6b and BioASQ factoid 9b biomedical data sets are first collected from the BioASQ challenge. The public SQuAD data set is then collected and used for pre-training the model, while the BioASQ data sets are processed into the extractive question-answering task format for training and testing the model.
Step 2, training the model. This step is divided into the following sub-steps:
Sub-step 2-1: the model input sequence (paragraph + question) is first converted by the embedding layer into the embedded vector $E$, which is then fed into the coding layer:
$H_{all} = \mathrm{Encoder}(E)$
where $H_{all} = (H_1, H_2, \ldots, H_{12})$ denotes the outputs of the encoder hidden layers and $H_i$ denotes the output of the $i$-th hidden layer.
Coding layer: the coding layer is the encoder structure of the Transformer, an important part of which is the attention mechanism. For a given query matrix $Q$, key matrix $K$ and value matrix $V$, the attention mechanism is calculated as follows:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
$\mathrm{head}_i = \mathrm{Attention}(QW_i^{q}, KW_i^{k}, VW_i^{v})$
$\mathrm{MultiHead}(Q, K, V) = [\mathrm{head}_1, \ldots, \mathrm{head}_h]W^{o}$
where $W_i^{q}$, $W_i^{k}$, $W_i^{v}$ and $W^{o}$ are learnable parameters, $\sqrt{d_k}$ is the scaling factor, and $h$ is the number of attention heads. The multi-head attention layer followed by a feed-forward layer forms the encoder structure.
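The attention computation inside a single head can be sketched as standard scaled dot-product attention, which is the form used by the Transformer encoder. This is a minimal illustration only: it omits the learned projection matrices and the concatenation over heads, and the function name and the use of plain Python lists are assumptions made for readability.

```python
import math


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: list of query vectors, K: list of key vectors, V: list of value
    vectors (K and V have the same length). Returns one output per query.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Numerically stable softmax over the keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        outputs.append([
            sum(weights[t] * V[t][c] for t in range(len(V)))
            for c in range(len(V[0]))
        ])
    return outputs
```

In a multi-head layer this function would be applied once per head to projected copies of the inputs, and the head outputs would be concatenated and projected again.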
Sub-step 2-2: the last three hidden layers are routed. Their outputs $H_{10}$, $H_{11}$, $H_{12}$ are used as inputs to the dynamic routing module:
$H = \mathrm{Dynamic\_Routing}(H_{10}, H_{11}, H_{12})$
where $H$ denotes the output of the dynamic routing module.
Dynamic routing algorithm: first, three initial weights $b_1$, $b_2$, $b_3$ and the number of iterations $r$ are defined. Each iteration proceeds as follows. Softmax is applied to $b_1$, $b_2$, $b_3$ to obtain three routing values $k_1$, $k_2$, $k_3$, so that $k_1 + k_2 + k_3 = 1$. The outputs $H_{12}$, $H_{11}$, $H_{10}$ of the last three hidden layers are flattened into one-dimensional vectors $H_{12\_f}$, $H_{11\_f}$, $H_{10\_f}$. The weighted sum $x = k_1 H_{12\_f} + k_2 H_{11\_f} + k_3 H_{10\_f}$ is calculated, and $y$ is obtained by applying the nonlinear squash function to $x$. The weights are then updated: the product of $y$ and the transpose of $H_{12\_f}$ is added to the original $b_1$ to give the new $b_1$, and the new $b_2$ and $b_3$ are obtained in the same way from $H_{11\_f}$ and $H_{10\_f}$. After the iterations are complete, $H_{12}$, $H_{11}$, $H_{10}$ are multiplied by their respective updated routing values and summed to give the output of dynamic routing, i.e. $H = k_1 H_{12} + k_2 H_{11} + k_3 H_{10}$.
Here the squash function is the one defined in the paper "Dynamic Routing Between Capsules".
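The routing procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not a definitive implementation. The following points are assumptions: the initial weights are set to zero, the final routing values are recomputed by softmax after the last weight update, and each hidden-layer output is represented as a plain list of rows. The squash function follows the form given in "Dynamic Routing Between Capsules".

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def squash(vec):
    """squash(s) = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    sq_norm = sum(v * v for v in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * v for v in vec]


def dynamic_routing(layers, iterations=3):
    """Route several hidden-layer outputs into one weighted combination.

    layers: e.g. [H12, H11, H10], each a seq_len x hidden list of lists.
    Returns (H, k): the routed output and the final routing values.
    """
    flat = [[v for row in h for v in row] for h in layers]  # H12_f, H11_f, H10_f
    b = [0.0] * len(layers)  # initial weights b1..b3 (zero init is an assumption)
    for _ in range(iterations):
        k = softmax(b)  # routing values, sum to 1
        x = [sum(k[i] * flat[i][d] for i in range(len(flat)))
             for d in range(len(flat[0]))]
        y = squash(x)
        # b_i <- b_i + y . H_i_f (agreement between y and each flattened layer)
        b = [b[i] + sum(yd * fd for yd, fd in zip(y, flat[i]))
             for i in range(len(flat))]
    k = softmax(b)  # updated routing values (recomputing here is an assumption)
    rows, cols = len(layers[0]), len(layers[0][0])
    routed = [[sum(k[i] * layers[i][r][c] for i in range(len(layers)))
               for c in range(cols)] for r in range(rows)]
    return routed, k
```

A layer whose flattened output agrees with the squashed weighted sum receives a larger weight in the next iteration, which is how the weights adapt to the input instead of remaining fixed.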
Sub-step 2-3: the dynamic routing module is followed by a linear layer with an output dimension of 2:
$L = \mathrm{Linear}(H)$
where $L$ denotes the output of the linear layer.
Sub-step 2-4, answer voting: $L$ is fed into the pre-output layer:
$A', P = \mathrm{pre\_output}(L)$
where $A'$ denotes the pre-output answers and $P$ denotes the corresponding prediction scores.
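How a two-dimensional linear output can yield a group of candidate answers with prediction scores is sketched below. The text does not describe the internals of the pre-output layer, so this follows the common extractive question-answering convention and is an assumption throughout: the two output dimensions are read as per-token start and end logits, a span score is the product of its start and end probabilities, and the parameters top_k and max_answer_len are hypothetical.

```python
import math


def _softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pre_output(logits, tokens, top_k=5, max_answer_len=30):
    """Turn per-token [start_logit, end_logit] pairs into candidate answers.

    Returns (answers, scores): the top_k spans as strings and their
    prediction scores, sorted from highest to lowest score.
    """
    start = _softmax([row[0] for row in logits])
    end = _softmax([row[1] for row in logits])
    candidates = []
    for i, p_start in enumerate(start):
        # Only spans that end at or after their start and are not too long.
        for j in range(i, min(i + max_answer_len, len(tokens))):
            candidates.append((p_start * end[j], i, j))
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:top_k]
    answers = [" ".join(tokens[i:j + 1]) for _, i, j in top]
    scores = [score for score, _, _ in top]
    return answers, scores
```

The returned answers and scores play the roles of the pre-output answers and prediction scores that are passed on to the voting step.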
$A'$ is fed into the answer voting module:
$V = \mathrm{answer\_voting}(A')$
where $V$ denotes the corresponding voting scores.
$S = w_1 \cdot P + w_2 \cdot V$
where $S$ denotes the final answer score and $w_1$, $w_2$ weight the prediction score and the voting score. The final answer is then obtained through the output layer:
$A = \mathrm{output}(S)$
where $A$ denotes the final answer.
For any two answers $x_i$ and $x_j$, the answer voting score is calculated from the following quantities: $|x_i \cap x_j|$ denotes the number of words that $x_i$ and $x_j$ have in common, $|x_i|$ denotes the number of words in $x_i$, and $N$ denotes the number of answers in each set.
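The voting step can be illustrated with a minimal sketch. The exact voting formula is not reproduced in this text, so the form used here is an assumption built only from the quantities defined above: the voting score of answer $x_i$ is taken as the sum over the other answers $x_j$ of $|x_i \cap x_j| / |x_i|$, divided by $N$. The weights w1 = w2 = 0.5, the whitespace tokenization and the function names are likewise hypothetical.

```python
def voting_scores(answers):
    """Assumed form: V_i = (1/N) * sum_{j != i} |x_i ∩ x_j| / |x_i|."""
    n = len(answers)
    words = [a.lower().split() for a in answers]
    scores = []
    for i, wi in enumerate(words):
        if not wi:
            scores.append(0.0)
            continue
        total = 0.0
        for j, wj in enumerate(words):
            if i == j:
                continue
            common = len(set(wi) & set(wj))  # number of shared words
            total += common / len(wi)
        scores.append(total / n)
    return scores


def select_answer(answers, pred_scores, w1=0.5, w2=0.5):
    """Combine scores as S = w1 * P + w2 * V and return the best answer."""
    votes = voting_scores(answers)
    final = [w1 * p + w2 * v for p, v in zip(pred_scores, votes)]
    best = max(range(len(answers)), key=final.__getitem__)
    return answers[best], final
```

With candidates such as "CFTR gene", "CFTR", "the CFTR gene" and one unrelated answer, the unrelated answer receives a voting score of zero, so a slightly higher prediction score alone is not enough for it to be selected.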
Step 3, model testing. The model is first pre-fine-tuned on the SQuAD data set, and a better-performing checkpoint on SQuAD is chosen as the initial checkpoint for fine-tuning on BioASQ. After fine-tuning on BioASQ, the model is evaluated on the test set. To demonstrate the advantages of the model, FIG. 4 compares its predicted samples with those of the baseline model.
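Checkpoint selection during pre-fine-tuning, together with a dynamically reduced learning rate, can be sketched as a simple control loop. The text states only that the learning rate is reduced dynamically and that a better-performing checkpoint is kept; the reduce-on-plateau rule, the parameters factor and patience, the starting learning rate, and the callables train_epoch and evaluate are all assumptions introduced for illustration.

```python
def pre_fine_tune(train_epoch, evaluate, epochs=5, lr=3e-5, factor=0.5, patience=1):
    """Run pre-fine-tuning epochs, tracking the best epoch and shrinking lr.

    train_epoch(lr): hypothetical callable that trains for one epoch.
    evaluate(): hypothetical callable returning a validation score.
    Returns (best_epoch, best_score, history) where history holds
    (epoch, lr_used, score) tuples.
    """
    best_score, best_epoch, stale = float("-inf"), None, 0
    history = []
    for epoch in range(epochs):
        train_epoch(lr)
        score = evaluate()
        history.append((epoch, lr, score))
        if score > best_score:
            # A real training loop would save the model checkpoint here.
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # reduce the learning rate after a plateau
                stale = 0
    return best_epoch, best_score, history
```

The checkpoint from the returned best epoch would then serve as the starting point for fine-tuning on BioASQ.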
In summary, the present invention uses a dynamic routing mechanism to adaptively fully utilize hidden layer knowledge of a pre-training model. We have devised a dynamic routing algorithm that dynamically assigns appropriate weights to multiple hidden layers and automatically adjusts the weights rather than fixed weights. The algorithm can reasonably route multiple hidden layers. The invention provides an answer voting strategy which can better select the optimal answer. An answer voting module is designed to calculate a voting score that takes into account the similarity between answers. The answer score is no longer just a predictive score, but is made up of a predictive score and a voting score. Such implicit information may help the model better generate the appropriate answer.
It should be understood that the above embodiments are intended only to illustrate the present invention and not to limit its scope. After reading the present invention, modifications in equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.

Claims (4)

1. A biomedical extraction type question-answering method based on dynamic routing and answer voting, characterized in that the method comprises the following steps:
step 1, collecting and preprocessing the data sets,
step 2, training the model,
step 3, testing the model.
2. The biomedical extraction type question-answering method based on dynamic routing and answer voting according to claim 1, characterized in that in step 1, collecting and preprocessing the data sets specifically comprises: first collecting the BioASQ factoid 4b, BioASQ factoid 5b, BioASQ factoid 6b and BioASQ factoid 9b biomedical data sets; then collecting the public SQuAD data set, which is used for pre-training the model; and processing the BioASQ data sets into the extractive question-answering task format for training and testing the model.
3. The biomedical extraction type question-answering method based on dynamic routing and answer voting according to claim 1, characterized in that in step 2, the model is trained by first pre-training it on the SQuAD general corpus, using a mechanism that dynamically reduces the learning rate during pre-training, and then fine-tuning it on the BioASQ data set, specifically as follows:
sub-step 2-1, the model input sequence (paragraph + question) is first converted by the embedding layer into the embedded vector $E$, which is then fed into the coding layer:
$H_{all} = \mathrm{Encoder}(E)$
where $H_{all} = (H_1, H_2, \ldots, H_{12})$ denotes the outputs of the encoder hidden layers and $H_i$ denotes the output of the $i$-th hidden layer;
coding layer: the coding layer is the encoder structure of the Transformer, an important part of which is the attention mechanism, which for a given query matrix $Q$, key matrix $K$ and value matrix $V$ is calculated as follows:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
$\mathrm{head}_i = \mathrm{Attention}(QW_i^{q}, KW_i^{k}, VW_i^{v})$
$\mathrm{MultiHead}(Q, K, V) = [\mathrm{head}_1, \ldots, \mathrm{head}_h]W^{o}$
where $W_i^{q}$, $W_i^{k}$, $W_i^{v}$ and $W^{o}$ are learnable parameters, $\sqrt{d_k}$ is the scaling factor, $h$ is the number of attention heads, and the multi-head attention layer followed by a feed-forward layer forms the encoder structure;
sub-step 2-2, the last three hidden layers are routed, and their outputs $H_{10}$, $H_{11}$, $H_{12}$ are used as inputs to the dynamic routing module:
$H = \mathrm{Dynamic\_Routing}(H_{10}, H_{11}, H_{12})$
where $H$ denotes the output of the dynamic routing module;
dynamic routing algorithm: first, three initial weights $b_1$, $b_2$, $b_3$ and the number of iterations $r$ are defined; in each iteration, softmax is applied to $b_1$, $b_2$, $b_3$ to obtain three routing values $k_1$, $k_2$, $k_3$ such that $k_1 + k_2 + k_3 = 1$; the outputs $H_{12}$, $H_{11}$, $H_{10}$ of the last three hidden layers are flattened into one-dimensional vectors $H_{12\_f}$, $H_{11\_f}$, $H_{10\_f}$; the weighted sum $x = k_1 H_{12\_f} + k_2 H_{11\_f} + k_3 H_{10\_f}$ is calculated, and $y$ is obtained by applying the nonlinear squash function to $x$; $b_1$, $b_2$, $b_3$ are then updated, the product of $y$ and the transpose of $H_{12\_f}$ being added to the original $b_1$ to give the new $b_1$, and the new $b_2$ and $b_3$ being obtained in the same way; after the iterations are complete, $H_{12}$, $H_{11}$, $H_{10}$ are multiplied by their respective updated routing values and summed to give the output of dynamic routing, i.e. $H = k_1 H_{12} + k_2 H_{11} + k_3 H_{10}$,
where the squash function is the one defined in the paper "Dynamic Routing Between Capsules",
sub-step 2-3, the dynamic routing module is followed by a linear layer with an output dimension of 2,
$L = \mathrm{Linear}(H)$
where $L$ denotes the output of the linear layer,
sub-step 2-4, answer voting: $L$ is fed into the pre-output layer,
$A', P = \mathrm{pre\_output}(L)$
where $A'$ denotes the pre-output answers and $P$ denotes the corresponding prediction scores,
$A'$ is fed into the answer voting module,
$V = \mathrm{answer\_voting}(A')$
where $V$ denotes the corresponding voting scores,
$S = w_1 \cdot P + w_2 \cdot V$
where $S$ denotes the final answer score, and the final answer is then obtained through the output layer,
$A = \mathrm{output}(S)$
where $A$ denotes the final answer,
for any two answers $x_i$ and $x_j$, the answer voting score is calculated from the following quantities: $|x_i \cap x_j|$ denotes the number of words that $x_i$ and $x_j$ have in common, $|x_i|$ denotes the number of words in $x_i$, and $N$ denotes the number of answers in each set.
4. The biomedical extraction type question-answering method based on dynamic routing and answer voting according to claim 1, characterized in that in step 3, model testing, the test set is fed into the model, the model runs its predictions, and the test results are obtained.
CN202310330245.XA 2023-03-30 2023-03-30 Biomedical extraction type question-answering method based on dynamic routing and answer voting Pending CN116521836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310330245.XA CN116521836A (en) 2023-03-30 2023-03-30 Biomedical extraction type question-answering method based on dynamic routing and answer voting

Publications (1)

Publication Number Publication Date
CN116521836A (en) 2023-08-01

Family

ID=87394936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310330245.XA Pending CN116521836A (en) 2023-03-30 2023-03-30 Biomedical extraction type question-answering method based on dynamic routing and answer voting

Country Status (1)

Country Link
CN (1) CN116521836A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination