CN111259127B - Long text answer selection method based on transfer learning sentence vector

Info

Publication number: CN111259127B
Application number: CN202010043764.4A
Authority: CN (China)
Legal status: Active
Prior art keywords: answer, network, layer, pool, question
Other languages: Chinese (zh)
Other versions: CN111259127A
Inventors: 张引, 王炜
Current and original assignee: Zhejiang University (ZJU)
Application filed by Zhejiang University; priority to CN202010043764.4A
Publication of application CN111259127A; application granted; publication of grant CN111259127B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/35 Clustering; Classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning


Abstract

The invention discloses a long text answer selection method based on transfer learning sentence vectors. A two-stage approach builds a transfer learning sentence vector network and a training prediction network: the former comprises a twin network structure, an attention aggregation structure and a classification layer; the latter comprises a twin network structure and a distance metric layer. First, the method needs no word segmentation of the data set text sequences and takes complete question and answer sentences directly as input, avoiding the error propagation introduced by word segmentation tools. Second, the second-stage training prediction network has a simple structure and high computational efficiency. Finally, a transfer learning method combined with the twin network structure and an attention mechanism yields sentence vector model weights that are semantically closer, providing sentence-level semantic vectors for the second-stage training prediction network. The method outperforms both traditional methods and common deep learning networks, with especially pronounced gains on long text data.

Description

Long text answer selection method based on transfer learning sentence vector
Technical Field
The invention relates to pre-trained language models and attention mechanisms in natural language processing and deep learning, and in particular to a long text answer selection method based on transfer learning sentence vectors.
Background
The internet has developed rapidly over the years, and information platforms of every kind have appeared in "blowout" fashion. According to incomplete statistics from the Hootsuite and We Are Social websites, the number of netizens worldwide had broken through 3.5 billion by 2019, and 45% of the world population are users of social media. The data show that network users increased by 439 million from 2018 to 2019, and social media users increased by 348 million within the year. These vast figures show that the global network has reached a highly developed stage, bringing countless pieces of internet knowledge and information with it. A large number of websites carrying network information flood the internet environment, and how to retrieve and use that information effectively is a real problem, which makes the existence of search engines very important. Computer storage and computing speed have entered a golden age: computing power and storage capacity are no longer the stumbling block to search engine development, and with the arrival of high-performance computing and high-performance storage, how to retrieve the most relevant results efficiently and accurately has become the research focus of search engines.
To address this research focus, the difficult problem of accurately retrieving the most relevant information from massive documents must be overcome. Looking back at the history of search, the first generation search engine, Archie, was mainly used to find files distributed across hosts. When the World Wide Web appeared, EINet Galaxy (Tradewave Galaxy) followed and functioned as the earliest web portal. Through successive generations of search engine technology, and under the competition dominated by large internet companies such as Baidu, Google and Microsoft with their Baidu, Google and Bing search engines, how to search accurately remains a continuing research hotspot. With the rise of the artificial intelligence wave, machine learning and deep learning methods have brought new solutions to image recognition, natural language processing, speech recognition and other fields. Facing the reality that the results recalled by search engines are often unsatisfactory and many of them require secondary screening and filtering by the searcher, automatic question answering technology has emerged.
Answer selection is an important step in automatic question answering and is widely applied in daily life; for example, Xiaomi's XiaoAI, Apple's Siri and Microsoft's XiaoIce, among others, are commercial products of automatic question answering technology. In task-oriented automatic question answering, a robot assistant built on this technology can greatly free both hands: a series of tasks can be controlled and completed by voice commands alone. In chit-chat automatic question answering, a chat robot can brighten a dull day. In modern medicine, automatic question answering can establish a more convenient and efficient communication channel between doctors and patients. Improving the accuracy of question answering is therefore essential in this field, and the answer selection technology that is central to retrieval-based automatic question answering also plays a very important role in the search engines described above.
Existing answer selection methods generally use a twin network structure to model the question text and the answer text separately, and finally judge whether a question matches an answer through a similarity measure such as the cosine distance. However, traditional methods focus mainly on short text matching tasks, lack research on long text scenarios, and struggle with the "semantic migration" and "semantic gap" problems of long text applications. Moreover, because question-answer data in the medical field generally have short questions and long answers, the matching quality and recall precision of existing answer selection methods cannot meet online requirements. To better perform answer selection on long text data, the main technical difficulties are:
1. how to design a model that can represent long text sequences;
2. how to exploit external knowledge and introduce a transfer learning method to improve recall precision;
3. how to design evaluation indices that quantify model performance.
Disclosure of Invention
In order to solve the above problems, the invention provides a long text answer selection method based on transfer learning sentence vectors, in which BERT serves as the feature extraction layer to model long text data, and a two-stage scheme of transfer learning followed by training prediction is adopted. First, the question and answer text sequences are taken as input and handled in BERT's input format without additional word segmentation, avoiding the error propagation that word segmentation introduces. Second, a transfer learning method, assisted by a twin network structure and an attention aggregation structure, makes the question and answer sentence vectors obtained by transfer learning semantically closer. Finally, sentence vectors of the texts are obtained during training prediction by initializing from the transfer-learned model weight parameters, the semantic similarity of question and answer sentence vectors is computed simply with a distance metric, and the simplified training prediction network structure yields higher recall efficiency and a lower GPU memory footprint.
In order to achieve the purpose, the invention adopts the following technical scheme:
a long text answer selection method based on a transfer learning sentence vector comprises the following steps:
1) crawlers designed with XPath are used to crawl doctor-patient question and answer data from the consultation forum, and the data are cleaned; answers in the doctor-patient question-answer data are taken as positive samples; for questions in the doctor-patient question-answer data, related answers are retrieved and recalled with a Lucene index tool and taken as negative samples; a pointwise answer selection data set is constructed from the obtained positive and negative samples and divided into a transfer learning data set and a training prediction data set at a ratio between 27:1 and 8:1;
2) establishing a transfer learning sentence vector network comprising a twin network structure, an attention aggregation structure and a classification layer, wherein the twin network structure comprises paired input, feature extraction and pooling layers, and the attention aggregation structure comprises an attention layer and an aggregation network layer; the feature extraction layer adopts a BERT model initialized by loading whole-word-masking BERT weights; after feature extraction, the pooled output takes the mean value, and the features are aggregated sequentially through the attention layer and the aggregation network layer; the aggregation output vector and the BERT pooling output vector are spliced and input into the classification layer for binary classification output;
training the transfer learning sentence vector network with the transfer learning data set obtained in step 1), matching the binary values of whether the question and the answer match against the real labels using the MRR (mean reciprocal rank) and Precision@K evaluation index method, and selecting the network parameters corresponding to the model with the highest matching score to obtain the BertAttTL transfer learning sentence vector model;
3) establishing a training prediction network comprising a twin network structure and a distance metric layer, wherein the twin network structure comprises paired input, feature extraction and pooling layers; the feature extraction layer adopts a BERT model, and the BERT model and pooling layer parameters in the training prediction network are initialized with the weight parameters of the BertAttTL transfer learning sentence vector model obtained in step 2); question and answer sentence vectors are output through the pooling layer and input into the distance metric layer to obtain their semantic similarity, which is split at a threshold into a binary similar/dissimilar value output as the prediction content; the training prediction network is trained with the training prediction data set obtained in step 1), the finally obtained binary values are matched against the real labels using the MRR (mean reciprocal rank) and Precision@K evaluation index method, and the network parameters corresponding to the model with the highest matching score are selected to obtain the trained training prediction network;
4) inputting the question to be processed and the answer texts into the training prediction network obtained in step 3), and outputting the binary classification values of all candidate answers to obtain the final answer to the question to be processed.
Further, the MRR and Precision@K evaluation index method is specifically as follows:
the output of the transfer learning sentence vector network or the training prediction network is expressed as pred = [p1, p2, ..., pn], where pi is the predicted value, 0 or 1, of the i-th candidate answer, 0 denoting dissimilar and 1 similar, and n is the number of test samples in the sample set; the real label data are expressed as label = [t1, t2, ..., tn], where ti is the real label, 0 or 1, of the i-th candidate answer, with the same coding; for all candidate answers to a question, the binary classification values obtained through the transfer learning sentence vector network or the training prediction network are sorted, giving the rank rank_i of the correct answer for the i-th question.
The MRR calculation formula is as follows:
MRR = (1/|Q|) * Σ_{i=1}^{|Q|} 1/rank_i
wherein Q is the question set, and |Q| represents the number of all questions;
precision @ K is calculated as:
Precision@K = Num(true answers) / Sum(related K answers)
wherein Precision represents the precision, K represents the number of answers considered in the index (taking the values 1, 2 and 3 in the present invention), Num(true answers) represents the number of correct answers, and Sum(related K answers) represents the total number of recalled related answers.
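For concreteness, the two indices can be sketched in Python as follows; the function names and the assumption that candidate answers arrive pre-sorted by model score are illustrative, not part of the claims.

```python
from typing import List, Sequence

def rank_of_correct(sorted_labels: Sequence[int]) -> int:
    """1-based rank of the correct answer once candidates are sorted by score."""
    return list(sorted_labels).index(1) + 1

def mrr(ranks: List[int]) -> float:
    """MRR = (1/|Q|) * sum of 1/rank_i over the |Q| questions."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def precision_at_k(sorted_labels: Sequence[int], k: int) -> float:
    """Num(true answers) among the top K recalled answers / answers considered."""
    top = list(sorted_labels)[:k]
    return sum(top) / len(top)
```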
Furthermore, the transfer learning sentence vector network comprises a twin network structure, an attention aggregation structure and a classification layer; the twin network structure comprises paired input, feature extraction and pooling layers, and the attention aggregation structure comprises an attention layer and an aggregation network layer. The attention layer adds an attention mechanism to the twin network structure: the context of the question enriches the semantic representation of the answer text, the context of the answer enriches the semantic representation of the question text, and this question-answer semantic interaction effectively improves the matching quality. After the attention mechanism, the aggregation network layer further deepens the model's representation of the similar and dissimilar parts of the question and the answer through its comparison layer and aggregation layer, effectively improving matching on top of the attention mechanism. The feature extraction layer is modeled with BERT, initialized with whole-word-masking BERT weights;
Paired samples are input into the twin network structure, with the paired input layers corresponding to the two text sequences Question and Answer; the question and answer texts are processed according to BERT's input formats [CLS] + Question + [SEP] and [CLS] + Answer + [SEP]. After BERT feature modeling, the pooling layer averages the outputs of the 12 encoder layers to obtain pooled outputs of unified dimensionality: the question pooling output Q pool and the answer pooling output A pool, each of dimension length 768;
The question pooling output Q pool and the answer pooling output A pool are input into the attention layer, and the attention mechanism yields the question semantic alignment vector Z2 and the answer semantic alignment vector Z2'. Q pool, A pool, Z2 and Z2' are input to the aggregation network layer. For the question, Q pool and Z2 are transformed as [Q pool, Z2], [Q pool, Q pool - Z2], [Q pool, Q pool * Z2], spliced through one layer of linear transformation into the vector [O1, O2, O3]; the spliced vector passes through a further linear transformation with a Dropout mechanism to give the question attention aggregation output Fused_Q. Similarly, for the answer, A pool and Z2' pass through the aggregation network layer to give the answer attention aggregation output Fused_A.
Fused_Q, Fused_A, Q pool and A pool are further spliced into [Q pool, A pool, |Q pool - A pool|, Q pool * A pool, Fused_Q, Fused_A]; the spliced vector is input into the classification layer, and Softmax classification yields the prediction output pred = [p1, p2, ..., pn], where pi is the predicted value, 0 or 1, of the i-th candidate answer, 0 denoting dissimilar and 1 similar, and n is the number of test samples in the sample set.
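A minimal PyTorch sketch of one plausible reading of this structure follows; the patent does not pin down the attention formula, so the token-level dot-product cross attention, the shared projection weights, and the layer sizes (768-dimensional hidden states, three parallel linear layers, one fusion layer with Dropout) are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAggregation(nn.Module):
    """Sketch of the attention layer + aggregation network layer."""
    def __init__(self, hidden: int = 768, dropout: float = 0.1):
        super().__init__()
        # one linear layer per transformed pair -> O1, O2, O3
        self.proj = nn.ModuleList([nn.Linear(2 * hidden, hidden) for _ in range(3)])
        self.fuse = nn.Linear(3 * hidden, hidden)  # splices [O1, O2, O3]
        self.drop = nn.Dropout(dropout)

    def align(self, hx, hy):
        """Each token of x attends over the tokens of y, then mean-pool."""
        scores = torch.bmm(hx, hy.transpose(1, 2))      # [B, Lx, Ly]
        z = torch.bmm(F.softmax(scores, dim=-1), hy)    # [B, Lx, H]
        return z.mean(dim=1)                            # [B, H]

    def aggregate(self, pool, z):
        """[pool, z], [pool, pool - z], [pool, pool * z] -> linear -> splice -> Fused."""
        pairs = [torch.cat([pool, z], -1),
                 torch.cat([pool, pool - z], -1),
                 torch.cat([pool, pool * z], -1)]
        o = torch.cat([p(x) for p, x in zip(self.proj, pairs)], -1)  # [O1, O2, O3]
        return self.drop(self.fuse(o))

    def forward(self, hq, ha, q_pool, a_pool):
        z2 = self.align(hq, ha)    # question semantic alignment vector Z2
        z2p = self.align(ha, hq)   # answer semantic alignment vector Z2'
        fused_q = self.aggregate(q_pool, z2)
        fused_a = self.aggregate(a_pool, z2p)
        # final splice fed to the two-way Softmax classification layer
        return torch.cat([q_pool, a_pool, (q_pool - a_pool).abs(),
                          q_pool * a_pool, fused_q, fused_a], dim=-1)
```

The resulting 6 x 768 splice would then feed a `nn.Linear(6 * 768, 2)` plus Softmax as the classification layer.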
Further, the semantic similarity calculation method in step 3) adopts any one of the cosine distance, the Manhattan distance, the Euclidean metric and the dot product metric.
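The four candidate metrics admit a compact sketch (the sign convention of negating the two distances so that larger always means more similar, and the threshold value, are assumptions):

```python
import torch
import torch.nn.functional as F

def similarity(q: torch.Tensor, a: torch.Tensor, metric: str = "cosine") -> torch.Tensor:
    """Semantic similarity of pooled sentence vectors q, a of shape [batch, 768]."""
    if metric == "cosine":
        return F.cosine_similarity(q, a, dim=-1)
    if metric == "manhattan":
        return -(q - a).abs().sum(dim=-1)   # negated distance: larger = more similar
    if metric == "euclidean":
        return -(q - a).norm(dim=-1)
    if metric == "dot":
        return (q * a).sum(dim=-1)
    raise ValueError(f"unknown metric: {metric}")

def predict_similar(q, a, metric="cosine", threshold=0.5):
    """Split the similarity at a threshold into the binary similar/dissimilar value."""
    return (similarity(q, a, metric) > threshold).long()
```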
The invention has the following beneficial effects:
(1) the word representations of the long text data are obtained with the pre-trained language model BERT from natural language processing; no additional word segmentation stage is needed, which avoids inaccurate segmentation by word segmentation tools and hence the semantic error propagation that inaccurate segmentation causes;
(2) a two-stage method is designed: the first stage uses a transfer learning method to exploit large-scale parallel corpus knowledge, the second stage uses a simple training prediction network with high model inference efficiency, and integrating the two-stage tasks yields high answer selection recall precision;
(3) for large-batch answer retrieval scenarios, directly obtaining the sentence vectors of all text sequences effectively avoids the time-consuming pairwise computation of a pre-trained language model and is therefore more efficient (see the sketch after this list). For example, when a pre-trained language model computes the matching scores of one question against m answers, the question must be paired with each answer and fed into the model for every computation, so the question is repeatedly encoded m times and the question and answers are encoded 2 * m times in total; in a large-scale retrieval scenario the value of m is very large, and the extra time overhead is considerable. The present method only needs the sentence vectors of the question and of all the answers, i.e. the question is encoded once and the answers m times, m + 1 encodings in total; compared with the 2 * m encodings this cuts nearly half of the encoding time, so the efficiency is higher;
(4) the invention adopts the pre-trained language model BERT as the feature extractor, can effectively model the semantics of long text data, and avoids the "semantic migration" and "semantic gap" phenomena that existing answer selection methods exhibit on long text data.
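The m + 1 encoding pattern mentioned in point (3) can be sketched as below, with `encode` standing in for the trained twin-network BERT plus mean pooling and `metric` for the distance measurement layer (both assumed callables):

```python
import torch

def score_all(encode, question, answers, metric):
    """Score one question against m answers with m + 1 encoder passes
    (1 for the question, m for the answers) instead of the 2 * m passes
    a pair-wise cross-encoder would need."""
    q_vec = encode(question)                            # encoded once
    a_vecs = torch.stack([encode(a) for a in answers])  # encoded m times
    return metric(q_vec.unsqueeze(0).expand_as(a_vecs), a_vecs)
```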
Drawings
FIG. 1 is a diagram of a transfer learning model architecture for a long text answer selection method based on transfer learning sentence vectors;
FIG. 2 is a diagram of the training prediction model structure of the long text answer selection method based on transfer learning sentence vectors.
Detailed Description
The present invention is described in detail below with reference to specific examples.
Because question-answer data in the medical field generally have short questions and long answers, the matching quality and recall precision of existing answer selection methods cannot meet online requirements; experimental verification shows that the long text answer selection method based on transfer learning sentence vectors provided by the invention can effectively handle the long text answer selection problem.
As shown in FIG. 1, the transfer learning sentence vector network of the long text answer selection method provided by the present invention includes an input layer, a feature extraction layer, an attention aggregation network layer and a classification layer, where the feature extraction layer adopts BERT for modeling, initialized with whole-word-masking BERT weights;
the input layer corresponds to two text sequences of questions and answers, and the two texts are processed according to the input format [ CLS ] + Question + [ SEP ], [ CLS ] + Answer + [ SEP ] of BERT. After the BERT characteristic modeling, averaging the outputs of 12 layers of pooling layers to obtain pooling outputs with uniform dimensionality, wherein the dimensionality length is 768 dimensions; the attention aggregation network layer obtains semantic alignment output by two text sequences through an attention mechanism, an alignment vector Z2 and a pooling output Z1 are transformed through [ Z1, Z2], [ Z1, Z1-Z2], [ Z1, Z1 x Z2], and are spliced through one layer of linear transformation to obtain [ O1, O2 and O3], the spliced vector is subjected to one layer of linear transformation and uses a Dropout mechanism to obtain question attention aggregation output FusedQ and answer attention aggregation output FusedA, the two are spliced with the pooling output to obtain [ Q pool, A pool, | Q pool-A pool |, Q pool a pool, Fused Q and Fused A ], the predicted output is obtained through Softmax classification, and the semantic transfer learning sentence network training is obtained.
As shown in FIG. 2, the training prediction network adopted by the long text answer selection method provided by the present invention includes an input layer, a feature extraction layer and a distance measurement layer; the feature extraction layer adopts BERT and is initialized with the transfer learning weight parameters trained in step three;
the input layer corresponds to two text sequences of questions and answers, and the two texts are processed according to the input format [ CLS ] + Question + [ SEP ], [ CLS ] + Answer + [ SEP ] of BERT. After the BERT characteristic modeling, averaging the outputs of 12 layers of pooling layers to obtain pooling outputs with uniform dimensionality, wherein the dimensionality length is 768 dimensions; initializing by using the transfer learning weight parameters trained in the step 3) to obtain sentence vectors with more similar semantics, calculating the similarity of the two sentence vectors by adopting cosine distance, Manhattan distance, Euler measurement and point multiplication measurement, and segmenting the similarity by using a threshold to obtain two classification values whether the similarity is similar or not.
In an embodiment of the present invention, answer selection is performed on long text question-and-answer data by using the above transfer learning sentence vector network and training prediction network, and the steps are as follows:
step one, a crawler frame is constructed through Python and XPATH, doctor-patient question and answer data are captured for medical inquiry platforms such as a Sanjiu health network, webpage labels outside texts, such as < div > and the like, are removed through a certain rule method, duplication of the data is removed, about 575 thousands of pieces of doctor-patient question and answer data are finally obtained through processing, and the doctor-patient question and answer data are stored in a warehouse according to a (question, disease description and disease answer) triple form.
Step two: related answers are recalled for each question with the Lucene tool, producing a relevance-sorted set of 500 negative-sample answers; one negative sample is drawn from ranks 1 to 5, one from ranks 5 to 50, one from ranks 50 to 100, and one from ranks 100 to 500. For questions with fewer than 100 recalled related answers, the draw from the 100th to 500th band is dropped when constructing the candidate answer set. From the 4,354,417 labeled samples, a small-sample data set is drawn by topic category as the training prediction data set, comprising a training set of 120,000, a validation set of 20,000 and a test set of 20,000; the labeled data are split at 8:1 of the total to form the transfer learning data set, which has no overlap with the training prediction data set.
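The banded negative sampling can be sketched as follows (band boundaries follow the text; the uniform draw within each band is an assumption):

```python
import random

BANDS = [(0, 5), (5, 50), (50, 100), (100, 500)]  # rank bands of the Lucene recall

def sample_negatives(recalled):
    """Draw one negative answer per rank band of the relevance-sorted recall;
    the deepest band contributes nothing when fewer answers were recalled."""
    negatives = []
    for lo, hi in BANDS:
        band = recalled[lo:min(hi, len(recalled))]
        if band:
            negatives.append(random.choice(band))
    return negatives
```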
In one embodiment of the present invention, the corpus format is as follows:
(Corpus sample omitted: each record contains a Question field and an Answer field.)
wherein Question represents a Question text and Answer represents an Answer text.
Step three: a transfer learning sentence vector network is built with PyTorch and initialized with whole-word-masking BERT weights; the network comprises an input layer, a feature extraction layer, an attention aggregation network layer and a classification layer. Training and prediction are carried out on the transfer learning data set obtained in step two, finally yielding a sentence vector model weight file whose semantic vectors are closer.
The loss function of the transfer learning sentence vector network training adopts cross entropy loss:
loss = -y * log y'
where y represents the true label of whether the answer to the question matches, and y' is the model prediction vector of whether the sample data matches.
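In PyTorch this corresponds to nn.CrossEntropyLoss applied to the two-way logits of the classification layer (a sketch with random stand-in logits):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()               # implements -y * log y'
logits = torch.randn(8, 2, requires_grad=True)  # stand-in classification-layer outputs
labels = torch.randint(0, 2, (8,))              # 1 = question/answer match, 0 = no match
loss = criterion(logits, labels)
loss.backward()
```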
In the test set, consider one question q with 3 answers [a1, a2, a3], prediction vector pred = [0.71, 0.68, 0.35] and real label = [0, 1, 0]. With |Q| = 1 and a threshold of 0.5 splitting the predictions, pred = [1, 1, 0]; against the real label, the correct answer is predicted correctly. Sorting the answers by prediction probability, the correct (second) answer has the second-highest probability and is ranked second, i.e. rank_i = 2, so MRR = 1/2 = 0.5. According to the Precision@K formula with K = 1, 2 and 3: when K = 1, Num(true answers) = 0, so Precision@1 = 0; when K = 2, Num(true answers) = 1 and Sum(related K answers) = 2, so Precision@2 = 0.5; when K = 3, Num(true answers) = 1 and Sum(related K answers) = 3, so Precision@3 = 1/3 ≈ 0.33. This example covers only one question with several answers; the test set contains many questions, and the final result indices are averaged over the number of questions.
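Running the index sketch given earlier (rank_of_correct, mrr, precision_at_k) on this example reproduces these numbers:

```python
scores = [0.71, 0.68, 0.35]   # predicted probabilities for a1, a2, a3
labels = [0, 1, 0]            # a2 is the correct answer
order = sorted(range(len(scores)), key=lambda i: -scores[i])
sorted_labels = [labels[i] for i in order]                 # -> [0, 1, 0]

print(mrr([rank_of_correct(sorted_labels)]))               # 0.5
print([precision_at_k(sorted_labels, k) for k in (1, 2, 3)])
# [0.0, 0.5, 0.3333...]
```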
Step four: a training prediction network is built with PyTorch and initialized with the transfer learning sentence vector network weights from step three; the network comprises an input layer, a feature extraction layer and a distance measurement layer, and training and prediction are carried out on the small-sample training prediction data set obtained in step two.
The loss function of the training prediction network adopts the mean square error loss:

loss = (y - y')²
where y represents the true label of whether the answer to the question matches, and y' is the model prediction vector of whether the sample data matches.
After the question sentence vector and the answer sentence vector are obtained, a cosine similarity classifier calculates the semantic similarity of the two sentence vectors:

cos(q, a) = (q · a) / (||q|| · ||a||)

For example, for a question sentence vector [1, 1, 0, 0, 1] and an answer sentence vector [0, 1, 1, 0], the similarity is computed by this formula.
A pred prediction result is obtained for all samples in the test set and compared with the real labels, and the indices on the test set are obtained according to the MRR and Precision@K (K = 1, 2, 3) calculation formulas.
Step five: the model trained in step four performs inference on the test set data, and the obtained predicted values are finally split at a threshold to determine whether the answers to the questions are semantically similar.
Compared with the prior art: first, the invention needs no word segmentation of the data set text sequences and takes complete question and answer sentences directly as input, avoiding the error propagation introduced by word segmentation tools. Second, the second-stage training prediction network has a simple structure and high computational efficiency. Finally, a transfer learning method combined with a twin network structure and an attention mechanism yields sentence vector model weights that are semantically closer, providing sentence-level semantic vectors for the second-stage training prediction network; the method outperforms both traditional methods and common deep learning networks, with especially pronounced gains on long text data. To evaluate the performance of the model objectively, it is compared with other models, including Siamese RNN, QACNN, DEATT, Cam, Seq Match Seq and ESIM. The evaluation indices adopted in this embodiment are MRR, Precision@1, Precision@2 and Precision@3, which measure the similarity between the question and the recalled answers; larger values mean better performance. As shown in Table 1, the invention integrates the two-stage tasks, has higher answer selection recall precision, and outperforms all comparison models. As shown in Table 2, compared with the pre-trained language model BERT, the inference stage of the method takes only 0.5 seconds, so its efficiency is high.
TABLE 1. Recall precision results of the comparative experiments

Model | MRR | Precision@1 | Precision@2 | Precision@3
Siamese RNN | 0.571769 | 0.311137 | 0.580483 | 0.833433
QACNN | 0.612844 | 0.363327 | 0.650470 | 0.873225
DEATT | 0.525945 | 0.258348 | 0.508098 | 0.745051
Cam | 0.636339 | 0.415917 | 0.656469 | 0.827634
Seq Match Seq | 0.631340 | 0.407518 | 0.651070 | 0.828834
ESIM | 0.523529 | 0.254749 | 0.505299 | 0.743251
The invention | 0.739136 | 0.543491 | 0.818636 | 0.971406
TABLE 2. Inference time of the invention compared with the pre-trained language model

Model | Inference time (number of answers m = 4)
Pre-trained language model BERT | 4.5 seconds
The invention | 0.5 seconds
The above example shows only one embodiment of the present invention, and while its description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (2)

1. A long text answer selection method based on a transfer learning sentence vector is characterized by comprising the following steps:
1) obtaining authoritative doctor-patient question-answer data, and taking the answers in the doctor-patient question-answer data as positive samples; for the questions in the doctor-patient question-answer data, retrieving and recalling related answers with a Lucene index tool and taking them as negative samples; constructing an answer selection data set from the obtained positive and negative samples, and dividing it into a transfer learning data set and a training prediction data set at a ratio between 27:1 and 8:1;
2) establishing a transfer learning sentence vector network comprising a twin network structure, an attention aggregation structure and a classification layer, wherein the twin network structure comprises paired input, feature extraction and pooling layers, and the attention aggregation structure comprises an attention layer and an aggregation network layer; the feature extraction layer adopts a BERT model initialized by loading whole-word-masking BERT weights; after feature extraction, the pooled output takes the mean value, and the features are aggregated sequentially through the attention layer and the aggregation network layer; the aggregation output vector and the BERT pooling output vector are spliced and input into the classification layer for binary classification output;
training the transfer learning sentence vector network with the transfer learning data set obtained in step 1): paired samples are input into the twin network structure, with the paired input layers corresponding to the two text sequences Question and Answer; the question and answer texts are processed according to BERT's input formats [CLS] + Question + [SEP] and [CLS] + Answer + [SEP]; after BERT feature modeling, the outputs of the 12 encoder layers are averaged to obtain pooled outputs of unified dimensionality: the question pooling output Q pool and the answer pooling output A pool, each of dimension length 768;
the question pooling output Q pool and the answer pooling output A pool are input into the attention layer, and the attention mechanism yields the question semantic alignment vector Z2 and the answer semantic alignment vector Z2'; Q pool, A pool, Z2 and Z2' are input to the aggregation network layer; for the question, Q pool and Z2 are transformed as [Q pool, Z2], [Q pool, Q pool - Z2], [Q pool, Q pool * Z2], spliced through one layer of linear transformation into the vector [O1, O2, O3]; the spliced vector passes through a further linear transformation with a Dropout mechanism to give the question attention aggregation output Fused_Q; similarly, for the answer, A pool and Z2' pass through the aggregation network layer to give the answer attention aggregation output Fused_A;
Fused_Q, Fused_A, Q pool and A pool are further spliced into [Q pool, A pool, |Q pool - A pool|, Q pool * A pool, Fused_Q, Fused_A]; the spliced vector is input into the classification layer, and Softmax classification yields the prediction output pred = [p1, p2, ..., pn], where pi is the predicted value, 0 or 1, of the i-th candidate answer, 0 denoting dissimilar and 1 similar, and n is the number of test samples in the sample set;
matching the binary values of whether the question and the answer match against the real labels using the MRR (mean reciprocal rank) and Precision@K evaluation index method, and selecting the network parameters corresponding to the model with the highest matching score to obtain the BertAttTL transfer learning sentence vector model; the MRR and Precision@K evaluation index method is specifically as follows:
expressing the output of the transfer learning sentence vector network or the training prediction network as pred = [p1, p2, ..., pn], wherein pi is the predicted value, 0 or 1, of the i-th candidate answer, 0 denoting dissimilar, 1 similar, and n is the number of test samples in the sample set; expressing the real label data as label = [t1, t2, ..., tn], wherein ti is the real label, 0 or 1, of the i-th candidate answer, 0 denoting dissimilar, 1 similar; for all candidate answers to a question, sorting the binary classification values obtained through the transfer learning sentence vector network or the training prediction network, giving the rank rank_i of the correct answer for the i-th question;
The MRR calculation formula is as follows:
MRR = (1/|Q|) * Σ_{i=1}^{|Q|} 1/rank_i
wherein Q is the question set, and |Q| represents the number of all questions;
precision @ K is calculated as:
Precision@K = Num(true answers) / Sum(related K answers)
wherein Precision represents Precision, K represents the number of answers considered in the index, the values are 1,2 and 3, num (true answers) represents the number of correct answers, and Sum (related K answers) represents the total number of recalled related answers;
3) establishing a training prediction network comprising a twin network structure and a distance measurement layer, wherein the twin network structure comprises paired input, feature extraction and pooling layers; the feature extraction layer adopts a BERT model, and the BERT model and pooling layer parameters in the training prediction network are initialized with the weight parameters of the BertAttTL transfer learning sentence vector model obtained in step 2); question and answer sentence vectors are output through the pooling layer and input into the distance measurement layer to obtain their semantic similarity, which is split at a threshold into a binary similar/dissimilar value output as the prediction content; the training prediction network is trained with the training prediction data set obtained in step 1), the finally obtained binary values are matched against the real labels using the MRR (mean reciprocal rank) and Precision@K evaluation index method, and the network parameters corresponding to the model with the highest matching score are selected to obtain the trained training prediction network;
4) inputting the question to be processed and the answer texts into the training prediction network obtained in step 3), and outputting the binary classification values of all candidate answers to obtain the final answer to the question to be processed.
2. The method as claimed in claim 1, wherein the semantic similarity calculation in step 3) adopts any one of the cosine distance, the Manhattan distance, the Euclidean metric and the dot product metric.
CN202010043764.4A 2020-01-15 2020-01-15 Long text answer selection method based on transfer learning sentence vector Active CN111259127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010043764.4A CN111259127B (en) 2020-01-15 2020-01-15 Long text answer selection method based on transfer learning sentence vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010043764.4A CN111259127B (en) 2020-01-15 2020-01-15 Long text answer selection method based on transfer learning sentence vector

Publications (2)

Publication Number Publication Date
CN111259127A CN111259127A (en) 2020-06-09
CN111259127B true CN111259127B (en) 2022-05-31

Family

ID=70946960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010043764.4A Active CN111259127B (en) 2020-01-15 2020-01-15 Long text answer selection method based on transfer learning sentence vector

Country Status (1)

Country Link
CN (1) CN111259127B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831789B (en) * 2020-06-17 2023-10-24 广东工业大学 Question-answering text matching method based on multi-layer semantic feature extraction structure
CN112966518B (en) * 2020-12-22 2023-12-19 西安交通大学 High-quality answer identification method for large-scale online learning platform
CN114691815A (en) * 2020-12-25 2022-07-01 科沃斯商用机器人有限公司 Model training method and device, electronic equipment and storage medium
CN112800196B (en) * 2021-01-18 2024-03-01 南京明略科技有限公司 FAQ question-answering library matching method and system based on twin network
CN112784130B (en) * 2021-01-27 2022-05-27 杭州网易云音乐科技有限公司 Twin network model training and measuring method, device, medium and equipment
CN112667799B (en) * 2021-03-15 2021-06-01 四川大学 Medical question-answering system construction method based on language model and entity matching
CN113221530B (en) * 2021-04-19 2024-02-13 杭州火石数智科技有限公司 Text similarity matching method and device, computer equipment and storage medium
CN113159187B (en) * 2021-04-23 2024-06-14 北京金山数字娱乐科技有限公司 Classification model training method and device and target text determining method and device
CN113987156B (en) * 2021-12-21 2022-03-22 飞诺门阵(北京)科技有限公司 Long text generation method and device and electronic equipment
CN114693396A (en) * 2022-02-28 2022-07-01 广州华多网络科技有限公司 Address information matching method and device, equipment, medium and product thereof
CN114757208B (en) * 2022-06-10 2022-10-21 荣耀终端有限公司 Question and answer matching method and device
CN116720503A (en) * 2023-03-13 2023-09-08 吉林省元启科技有限公司 On-line learning system answer discrimination method based on tree analysis coding
CN116522165B (en) * 2023-06-27 2024-04-02 武汉爱科软件技术股份有限公司 Public opinion text matching system and method based on twin structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846126A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 Generation, question and answer mode polymerization, device and the equipment of related question polymerization model
CN110532397A (en) * 2019-07-19 2019-12-03 平安科技(深圳)有限公司 Answering method, device, computer equipment and storage medium based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678830B2 (en) * 2018-05-31 2020-06-09 Fmr Llc Automated computer text classification and routing using artificial intelligence transfer learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846126A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 Generation, question and answer mode polymerization, device and the equipment of related question polymerization model
CN110532397A (en) * 2019-07-19 2019-12-03 平安科技(深圳)有限公司 Answering method, device, computer equipment and storage medium based on artificial intelligence

Also Published As

Publication number Publication date
CN111259127A (en) 2020-06-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant