WO2022088672A1 - BERT-based machine reading comprehension method, apparatus, device, and storage medium - Google Patents

BERT-based machine reading comprehension method, apparatus, device, and storage medium

Info

Publication number
WO2022088672A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
question
vector information
trained
text
Prior art date
Application number
PCT/CN2021/097422
Other languages
English (en)
French (fr)
Inventor
侯丽
刘翔
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022088672A1 publication Critical patent/WO2022088672A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/166 - Editing, e.g. inserting or deleting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a BERT-based machine reading comprehension method, apparatus, computer device, and computer-readable storage medium.
  • Machine reading comprehension is an important part of artificial intelligence technology. With the rise of deep learning in the past few years, machine reading comprehension, which requires machines to answer questions based on a given context, has attracted more and more attention. In particular, with the development of pre-trained language models represented by BERT (Bidirectional Encoder Representations from Transformers), machine reading comprehension tasks have developed rapidly, mainly shifting from focusing on limited text to combining external knowledge, and from focusing on specific snippets to a comprehensive understanding of the context. However, in practical application scenarios, a single question often corresponds to documents retrieved by multiple search engines, that is, the information of multiple documents needs to be integrated to predict the answer.
  • BERT: Bidirectional Encoder Representations from Transformers, a pre-trained language model.
  • For example, in the Question Answering dataset of MS MARCO (Microsoft MAchine Reading Comprehension), each record contains one question and ten candidate documents, and one or two of these ten candidate documents contain the answer to the question.
  • When reading comprehension is performed directly on the long text obtained by splicing the ten candidate documents, the ROUGE-L value of the resulting answer is about 0.48, which is lower than the ROUGE-L value of about 0.56 obtained by performing reading comprehension directly on the single document that contains the answer; the difference between the two is about 0.08 points.
  • ROUGE-L combines ROUGE (Recall-Oriented Understudy for Gisting Evaluation, a set of metrics for evaluating automatic summarization and machine translation) with L (the longest common subsequence). The ROUGE-L value is a common metric for evaluating answer quality in the field of machine reading comprehension; the larger the ROUGE-L value, the better the quality of the predicted answer.
  • Using a BERT-based pre-trained language model alleviates, to a certain extent, the drawback that existing models have a limited input length in multi-document scenarios, but its accuracy is lower than the accuracy of reading comprehension on a single document.
  • The main purpose of this application is to provide a BERT-based machine reading comprehension method, device, computer equipment, and computer-readable storage medium, which aims to solve the problem that, although existing models using BERT-based pre-trained language models alleviate the limited input length in multi-document scenarios to a certain extent, their answer accuracy is lower than that of single-document reading comprehension.
  • the present application provides a BERT-based machine reading comprehension method, and the BERT-based machine reading comprehension method includes the following steps:
  • the present application also provides a BERT-based machine reading comprehension device, where the BERT-based machine reading comprehension device includes:
  • the first generation module is used to obtain a first question to be trained and a plurality of candidate documents, and to combine the first question with each candidate document respectively to generate question-document pairs to be trained;
  • the second generation module is used to train a first preset pre-trained language model according to the question-document pairs to be trained, to generate a document sorting model;
  • the third generation module is used to train a preset multi-document answer prediction model according to the question-document pairs to be trained, to generate a reading comprehension model;
  • a first acquisition module, configured to acquire a question-document pair to be predicted, wherein the question-document pair to be predicted includes a second question and a plurality of candidate documents corresponding to the second question;
  • an output module, configured to output, based on the document sorting model and according to the question-document pair to be predicted, the target document corresponding to the second question;
  • a second acquisition module, configured to obtain, based on the reading comprehension model and according to the second question and the target document, the target text in the target document output by the reading comprehension model, and to use the target text as the reading comprehension answer to the second question.
  • The present application also provides a computer device, the computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein, when the computer program is executed by the processor, the following steps are implemented:
  • the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, wherein when the computer program is executed by a processor, the following steps are implemented:
  • the present application provides a BERT-based machine reading comprehension method, device, computer equipment, and computer-readable storage medium.
  • A first question to be trained and a plurality of candidate documents are acquired, and the first question is combined with each candidate document respectively to generate question-document pairs to be trained; a first preset pre-trained language model is trained according to the question-document pairs to be trained to generate a document sorting model; a preset multi-document answer prediction model is trained according to the question-document pairs to be trained to generate a reading comprehension model; a question-document pair to be predicted is acquired, wherein the question-document pair to be predicted includes a second question and a plurality of candidate documents corresponding to the second question; based on the document sorting model and according to the question-document pair to be predicted, the target document corresponding to the second question is output; and based on the reading comprehension model and according to the second question and the target document, the target text in the target document output by the reading comprehension model is obtained, and the target text is used as the reading comprehension answer to the second question.
  • In the document sorting model, the correlation between the question and the candidate documents is captured, so that the candidate documents are first sorted by their scores, and the document with the highest score is output as the input document of the reading comprehension model.
  • the multi-document reading comprehension problem is converted into a single-document reading comprehension problem, and the interference of extracting answers during reading comprehension is reduced, thereby improving the accuracy of multi-document reading comprehension answers.
  • FIG. 1 is a schematic flowchart of a BERT-based machine reading comprehension method provided by an embodiment of the present application
  • Fig. 2 is a schematic flow chart of sub-steps of the BERT-based machine reading comprehension method in Fig. 1;
  • Fig. 3 is a schematic flow chart of sub-steps of the BERT-based machine reading comprehension method in Fig. 1;
  • Fig. 4 is a schematic flow chart of sub-steps of the BERT-based machine reading comprehension method in Fig. 1;
  • FIG. 5 is a schematic block diagram of a BERT-based machine reading comprehension device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • Embodiments of the present application provide a BERT-based machine reading comprehension method, apparatus, computer device, and computer-readable storage medium.
  • the BERT-based machine reading comprehension method can be applied to a computer device, and the computer device can be an electronic device such as a notebook computer and a desktop computer.
  • FIG. 1 is a schematic flowchart of a BERT-based machine reading comprehension method provided by an embodiment of the present application.
  • the BERT-based machine reading comprehension method includes steps S101 to S106.
  • Step S101 acquiring a first question to be trained and a plurality of candidate documents, and combining the first question with each candidate document respectively to generate a pair of question documents to be trained.
  • a first question to be trained and multiple candidate documents are obtained, and the first question and each candidate document are combined.
  • For example, the number of candidate documents to be trained is 10.
  • Each of the 10 candidate documents is combined with the first question to obtain a corresponding question-document pair, and the resulting multiple question-document pairs are used as the question-document pairs to be trained.
  • the question-document pairs to be trained include multiple question-document pairs, and the number of candidate documents is the same as the number of question-document pairs. For example, if the number of candidate documents is 10, then the number of question document pairs is 10.
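  • As an illustrative sketch only (the function and variable names below are hypothetical and not from the patent), the pairing in step S101 can be expressed as follows.

```python
# Minimal sketch of step S101: pair one question with each of its candidate
# documents. All names here are illustrative assumptions.
from typing import List, Tuple

def make_question_document_pairs(question: str,
                                 candidate_docs: List[str]) -> List[Tuple[str, str]]:
    # One (question, document) pair per candidate document, so 10 candidate
    # documents yield 10 question-document pairs to be trained.
    return [(question, doc) for doc in candidate_docs]

pairs = make_question_document_pairs(
    "first question to be trained",
    [f"candidate document {i}" for i in range(1, 11)],
)
assert len(pairs) == 10
```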
  • Step S102 Train a first preset pre-trained language model according to the problem document pair to be trained, and generate a document sorting model.
  • For example, the generated question-document pairs to be trained are input into the first preset pre-trained language model; using the common word features of each question-document pair in the question-document pairs to be trained, the probability value of each question-document pair is obtained; the corresponding loss function is obtained through the probability values of the question-document pairs; and the model parameters of the first preset pre-trained language model are updated through the loss function to generate the document sorting model.
  • step S102 includes: sub-step S1021 to sub-step S1028 .
  • Sub-step S1021 Determine the first text vector information of the pair of question documents to be trained according to the dictionary file and the pair of question documents to be trained.
  • For example, the first preset pre-trained language model includes a dictionary file vocab.txt; the first question and each candidate document in the question-document pair to be trained are segmented through the dictionary file vocab.txt, and the segmented first question is spliced with each candidate document to obtain the corresponding first text sequence.
  • the first text sequence includes the identification type of the first text sequence, and the segmentation position symbols of the first question and each candidate document.
  • the obtained first text sequence is vectorized to obtain the corresponding text vector information.
  • The determining of the first text vector information of the question-document pair to be trained according to the dictionary file and the question-document pair to be trained includes: performing word segmentation on the question-document pair to be trained according to the dictionary file, to obtain the first question sequence of the first question in the question-document pair to be trained and the document sequence of each document; splicing the first question sequence and the document sequences to generate the corresponding first text sequence; and performing feature vector conversion on the first text sequence to obtain the corresponding first text vector information.
  • the first question and each candidate document in the pair of training question documents are segmented according to words by the dictionary file vocab.txt, to obtain the first question sequence of the first question and the candidate document sequence of each candidate document,
  • the first question sequence includes a plurality of words tokens_a
  • each candidate document sequence includes a plurality of words tokens_b.
  • the obtained first question sequence and each candidate document sequence are spliced to obtain a corresponding first text sequence.
  • The obtained first question sequence and each candidate document sequence are spliced, the splicing positions are marked, the starting position of the first question sequence is marked with [CLS], and this [CLS] is used as the semantic symbol of the first text sequence, with [SEP] as the separator between the first question sequence and each candidate document sequence.
  • the multiple candidate document sequences include a first candidate document sequence and a second candidate document sequence
  • The concatenated first text sequence is then [CLS] first question sequence [SEP] first candidate document sequence [SEP] second candidate document sequence [SEP], and so on.
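  • The splicing described above can be sketched roughly as follows; the whitespace tokenisation is only a placeholder for the vocab.txt-based segmentation, and the example strings are hypothetical.

```python
# Rough sketch of building the first text sequence:
# [CLS] question [SEP] doc1 [SEP] doc2 [SEP] ...
# Whitespace splitting stands in for the dictionary-file segmentation described above.
def splice_sequences(question_tokens, candidate_doc_token_lists):
    sequence = ["[CLS]"] + question_tokens + ["[SEP]"]
    for doc_tokens in candidate_doc_token_lists:
        sequence += doc_tokens + ["[SEP]"]
    return sequence

first_text_sequence = splice_sequences(
    "what is bert".split(),
    ["bert is a pre-trained language model".split(),
     "transformers use self attention".split()],
)
print(first_text_sequence[:6])   # ['[CLS]', 'what', 'is', 'bert', '[SEP]', 'bert']
```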
  • Each word in the first text sequence is converted with the pre-trained word feature vector information to obtain the corresponding first text vector information, wherein the first text vector information includes, for each word in the text sequence, the summed vector of its semantic vector information, position vector information, and segment representation vector information.
  • the pre-trained word feature vector information is converted to represent the first text sequence as a series of vectors capable of expressing text semantics.
  • For example, the candidate document sequence is "you help me" (你帮我) or "I help you" (我帮你). The bigrams of "you help me" are, in order: "you", "you help", "help", "help me", "me"; the bigrams of "I help you" are, in order: "I", "I help", "help", "help you", "you".
  • A dictionary can thus be constructed: {"you": 1, "you help": 2, "help": 3, "help me": 4, "I/me": 5, "I help": 6, "help you": 7} (in the original Chinese, "I" and "me" are the same character 我).
  • Under this dictionary, the vectorized result of "you help me" is represented as [1, 1, 1, 1, 1, 0, 0], and the vectorized result of "I help you" is represented as [1, 0, 1, 0, 1, 1, 1].
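  • A minimal sketch of this bigram vectorisation, using the original Chinese characters so that the dictionary has exactly the seven entries listed above (all code here is illustrative, not from the patent):

```python
# Build a unigram+bigram vocabulary from "你帮我" ("you help me") and "我帮你"
# ("I help you"), then represent each text as a binary vector over it.
def bigram_features(chars):
    grams = []
    for i, c in enumerate(chars):
        grams.append(c)                          # unigram, e.g. "你" ("you")
        if i + 1 < len(chars):
            grams.append(c + chars[i + 1])       # bigram, e.g. "你帮" ("you help")
    return grams

t1, t2 = list("你帮我"), list("我帮你")

vocab = {}
for g in bigram_features(t1) + bigram_features(t2):
    vocab.setdefault(g, len(vocab))              # seven distinct unigrams/bigrams

def binary_vector(chars):
    vec = [0] * len(vocab)
    for g in bigram_features(chars):
        vec[vocab[g]] = 1
    return vec

print(binary_vector(t1))   # [1, 1, 1, 1, 1, 0, 0]
print(binary_vector(t2))   # [1, 0, 1, 0, 1, 1, 1]
```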
  • Sub-step S1022 Acquire first text semantic vector information corresponding to the first text vector information according to the self-attention network model and the first text vector information.
  • the first preset pre-trained language model includes a multi-head attention network model
  • the acquired text vector information is input into the multi-head attention network model
  • The multi-head attention network model obtains, for each word in the input text vector information, a vector representation that fuses the context information.
  • From these vector representations, the first text semantic vector information output by the multi-head attention network model is obtained.
  • The acquiring of the first text semantic vector information corresponding to the first text vector information according to the self-attention network model and the first text vector information includes: inputting the first text vector information into the self-attention network model to obtain the text semantic vector information of each semantic space of the first text vector information; and obtaining, according to the text semantic vector information of each semantic space, the first text semantic vector information output by the self-attention network model.
  • For example, the acquired first text vector information is input into the multi-head attention network model, where the multi-head attention network model includes a first linear mapping layer, and the text vector information is mapped through the first linear mapping layer to semantic vectors in different semantic spaces, capturing semantic information of different dimensions.
  • That is, C = Concat(head_1, ..., head_h)W, where Concat is the vector splicing operation, W is the linear term that maps the different semantic spaces back to the initial semantic space, and C is the text semantic vector output by the multi-head self-attention network model.
  • the spliced vector information is mapped back to the original semantic space through the first linear mapping layer to obtain the output first text semantic vector information.
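  • A minimal numpy sketch of this multi-head self-attention step is given below: the input vectors are projected, attention is computed in each semantic subspace (head), the heads are concatenated, and a linear term W maps the result back to the initial semantic space as C = Concat(head_1, ..., head_h)W. The dimensions and random weights are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    # X: (seq_len, d_model) text vector information for one spliced sequence
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                     # linear mappings into subspaces
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)          # one semantic subspace (head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)   # scaled dot-product attention
        heads.append(softmax(scores) @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo           # C = Concat(head_1..head_h) W

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 16, 4
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads).shape)  # (6, 16)
```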
  • Sub-step S1023 Based on the jieba word segmentation tool, obtain the position feature vector information of the first question and each candidate document in the question-document pair to be trained.
  • For example, the first preset pre-trained language model includes jieba (a Chinese word segmentation tool).
  • Through jieba word segmentation, the features of all the words (for example, their part of speech) in the first text sequence of the document pair to be trained are extracted.
  • For example, the position features of the first question and each candidate document in the first text sequence of the question-document pair to be trained are extracted by the jieba tool.
  • the first text sequence includes semantic symbols of the first question sequence and segmentation position symbols of each candidate document sequence, wherein the semantic symbols are used as the starting position symbols of the first question sequence.
  • The semantic symbols of the first question sequence and the segmentation position symbols of each candidate document sequence in the first text sequence are identified by the jieba tool, and the position feature of the first question and the position feature of each candidate document are obtained.
  • One-hot encoding is performed on the obtained position features of the first question and the position features of each candidate document to obtain corresponding position feature vector information.
  • One-hot encoding is one-bit-effective encoding: it mainly uses an N-bit state register to encode N states, each state has its own independent register bit, and only one bit is valid at any time.
  • One-hot encoding represents categorical variables as binary vectors: the categorical values are first mapped to integer values, and each integer value is then represented as a binary vector that is zero everywhere except at the index of the integer, which is marked as 1.
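  • An illustrative one-hot encoding of such categorical features (the segment labels below are hypothetical):

```python
# N states are encoded with an N-bit vector in which only the bit at the
# state's index is 1; all other bits are 0.
def one_hot(index, num_states):
    vec = [0] * num_states
    vec[index] = 1
    return vec

# Hypothetical segment labels: 0 = question, 1 = first candidate document,
# 2 = second candidate document.
position_ids = [0, 0, 0, 1, 1, 2, 2]
position_feature_vectors = [one_hot(p, num_states=3) for p in position_ids]
print(position_feature_vectors[0])   # [1, 0, 0]
print(position_feature_vectors[3])   # [0, 1, 0]
```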
  • Sub-step S1024 Determine the same word feature and non-common word feature of the first question and each candidate document in the pair of question documents to be trained, and obtain corresponding word feature vector information.
  • For example, after the first text sequence of the document pair to be trained is obtained, the first text sequence includes each word tokens_a of the first question and each word tokens_b of each candidate document; each word tokens_b in each candidate document is matched against each word tokens_a of the first question, so as to obtain the common word features and non-common word features of the first question and each candidate document.
  • When the common word features and non-common word features are obtained, the common word features and non-common word features are binarized to obtain the word feature vector information corresponding to the common word features and non-common word features. For example, the acquired words with common word features are marked as 1, and the words with non-common word features are marked as 0.
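  • A small sketch of this common-word / non-common-word marking (the token lists are hypothetical):

```python
# Mark each document token as 1 if it also appears in the question (common
# word feature) and 0 otherwise (non-common word feature).
def common_word_features(question_tokens, document_tokens):
    question_vocab = set(question_tokens)
    return [1 if tok in question_vocab else 0 for tok in document_tokens]

tokens_a = ["what", "is", "bert"]                      # question words
tokens_b = ["bert", "is", "a", "language", "model"]    # candidate document words
print(common_word_features(tokens_a, tokens_b))        # [1, 1, 0, 0, 0]
```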
  • Sub-step S1025 Obtain named entity feature vector information of the pair of question documents to be trained according to the Chinese word segmentation tool and the pair of question documents to be trained.
  • the first preset pre-trained language model includes a Chinese word segmentation tool (stanford corenlp), and the named entity in the text pair to be trained is determined by the Chinese word segmentation tool.
  • Named entities are person names, institution names, place names, and all other entities identified by names, such as numbers, dates, currencies, addresses, and so on.
  • a first text sequence of text pairs to be trained is acquired, and the first text sequence includes the word tokens_a of the first question and each word tokens_b of each candidate document.
  • the words corresponding to the named entities in each word tokens_a of the first question and the words corresponding to the named entities in each word tokens_b of each candidate document are determined, and the words corresponding to the named entities are subjected to one-hot encoding processing, Obtain the corresponding named entity feature vector information.
  • One-hot encoding is one-bit-effective encoding: it mainly uses an N-bit state register to encode N states, each state has its own independent register bit, and only one bit is valid at any time.
  • One-hot encoding represents categorical variables as binary vectors: the categorical values are first mapped to integer values, and each integer value is then represented as a binary vector that is zero everywhere except at the index of the integer, which is marked as 1.
  • Sub-step S1026, according to the first text semantic vector information, the position feature vector information, the same word feature vector information and the named entity feature vector information, obtain the first text of the question document pair to be trained Semantic feature vector information.
  • When the first text semantic vector information, position feature vector information, word feature vector information, and named entity feature vector information of the data to be trained are acquired, the first text semantic vector information, position feature vector information, word feature vector information, and named entity feature vector information are superimposed to obtain the first text semantic feature vector information of the question-document pair to be trained.
  • the semantic feature vector information includes the semantic feature vector information associated with the first question and each candidate document pair.
  • For example, the first text semantic vector information, position feature vector information, word feature vector information, and named entity feature vector information of the data to be trained are added together and stored in a unified file, and the corresponding first text semantic feature vector information is obtained.
  • Sub-step S1027 Obtain a corresponding first loss function according to the first text semantic feature vector information.
  • The association vector information of each candidate document and the first question is obtained from the text semantic feature vector information, wherein, in the association vector information, the element corresponding to a candidate document is 1 if that candidate document contains the answer to the first question, and 0 otherwise.
  • the semantic vector of each candidate document is acquired from the text semantic vector information.
  • The semantic vector of each candidate document is linearly transformed to obtain the probability score value of each candidate document, and the obtained probability score values of the multiple candidate documents are composed into multi-dimensional vector information.
  • the log_softmax value is obtained according to the calculation of the multi-dimensional vector information. Through the log_softmax value and the associated vector information, the corresponding first loss function is obtained.
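  • A numpy sketch of such a ranking loss is given below: each candidate document's semantic vector is linearly transformed into a score, log_softmax is taken over the ten scores, and the loss is formed against the association vector. The exact normalisation is an assumption, not the patent's formula.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def document_ranking_loss(doc_semantic_vectors, w, association_vector):
    scores = doc_semantic_vectors @ w                   # one score per candidate document
    log_probs = log_softmax(scores)
    # negative log-likelihood of the documents marked as containing the answer
    return -(association_vector * log_probs).sum() / max(association_vector.sum(), 1.0)

rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(10, 16))                    # 10 candidate documents
w = rng.normal(size=16)                                 # linear transformation
association = np.zeros(10)
association[3] = 1.0                                    # the 4th document contains the answer
print(document_ranking_loss(doc_vecs, w, association))
```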
  • Sub-step S1028 Update the model parameters of the first preset pre-trained language model according to the first loss function to generate a document ranking model.
  • the corresponding model parameters are obtained through a back-propagation mechanism, and the model parameters of the first preset pre-trained language model are updated through the model parameters to generate a corresponding document sorting model.
  • Step S103 training a preset multi-document answer prediction model according to the question document to be trained to generate a reading comprehension model.
  • For example, the question-document pair to be trained includes a first question and a plurality of candidate documents; a target candidate document that contains the answer to the first question is determined among the plurality of candidate documents, and the target candidate document and the first question form a new question-document pair.
  • a preset multi-document answer prediction model is trained according to the second text semantic vector information, and a corresponding reading comprehension model is generated.
  • step S103 includes: sub-step S1031 to sub-step S1033 .
  • Sub-step S1031 Determine, among the multiple candidate documents of the question-document pair to be trained, the target candidate document that is most similar to the answer to the first question, and form a new question-document pair from the first question and the target candidate document.
  • For example, a question-document pair to be trained is obtained, and the question-document pair to be trained includes a first question and multiple candidate documents; a marked candidate document among the multiple candidate documents is obtained, the marked candidate document is used as the target candidate document, and the target candidate document and the first question form a new question-document pair.
  • Sub-step S1032 Obtain second text semantic vector information of the new question document pair according to the second preset pre-trained language model.
  • For example, the second preset pre-trained language model includes a dictionary file vocab.txt; the first question and the target candidate document in the new question-document pair are segmented through the dictionary file vocab.txt, and the segmented first question and target candidate document are spliced to obtain the corresponding second text sequence.
  • the second text sequence includes the identification type of the second text sequence, and the segmentation position symbols of the first question and the target candidate document.
  • the obtained second text sequence is vectorized to obtain the corresponding second text vector information.
  • the first question and the target candidate document in the pair of training question documents are segmented by the dictionary file vocab.txt according to words, and the first question sequence of the first question and the target candidate document sequence of the target candidate document are obtained, for example , the first question sequence includes multiple word tokens_a, and the target candidate document sequence includes multiple word tokens_b.
  • the obtained first question sequence and target candidate document sequence are spliced to obtain a corresponding second text sequence.
  • The obtained first question sequence and target candidate document sequence are spliced, the splicing positions are marked, the starting position of the first question sequence is marked with [CLS], and this [CLS] is used as the semantic symbol of the second text sequence.
  • [SEP] is used as the separation symbol between the first question sequence and the target candidate document sequence.
  • the concatenated second text sequence is [CLS] first question sequence [SEP] target candidate document sequence [SEP].
  • When the second text sequence is obtained, each word in the second text sequence is converted using the pre-trained word feature vector information to obtain the corresponding second text vector information, wherein the second text vector information includes, for each word in the second text sequence, the summed vector of its semantic vector information, position vector information, and segment representation vector information.
  • the pre-trained word feature vector information is converted to represent the second text sequence as a series of vectors capable of expressing the semantics of the second text.
  • For example, the target candidate document sequence is "you help me" (你帮我) or "I help you" (我帮你).
  • The bigrams of "you help me" are, in order: "you", "you help", "help", "help me", "me"; the bigrams of "I help you" are, in order: "I", "I help", "help", "help you", "you". A dictionary can thus be constructed: {"you": 1, "you help": 2, "help": 3, "help me": 4, "I/me": 5, "I help": 6, "help you": 7} (in the original Chinese, "I" and "me" are the same character 我).
  • Under this dictionary, the vectorized result of "you help me" is represented as [1, 1, 1, 1, 1, 0, 0], and the vectorized result of "I help you" is represented as [1, 0, 1, 0, 1, 1, 1].
  • The second preset pre-trained language model includes a multi-head attention network model; the acquired second text vector information is input into the multi-head attention network model, the multi-head attention network model obtains, for each word in the input second text vector information, a vector representation that fuses the context information, and the second text semantic vector information output by the multi-head attention network model is obtained.
  • For example, the acquired second text vector information is input into the multi-head attention network model, where the multi-head attention network model includes a first linear mapping layer, and the second text vector information is mapped through the first linear mapping layer to semantic vectors in different semantic spaces, capturing semantic information of different dimensions.
  • That is, C = Concat(head_1, ..., head_h)W, where Concat is the vector splicing operation, W is the linear term that maps the different semantic spaces back to the initial semantic space, and C is the second text semantic vector output by the multi-head self-attention network model.
  • the spliced vector information is mapped back to the original semantic space through the first linear mapping layer to obtain the output second text semantic vector information.
  • Sub-step S1033 Train a preset multi-document answer prediction model according to the second text semantic vector information and the preset labeled answer document, and generate a corresponding reading comprehension model.
  • a preset multi-question document answer prediction model is trained on the second text semantic vector and the preset labeled answer document.
  • the preset multi-question document answer prediction model is a preset multi-document machine reading comprehension answer prediction model
  • the preset multi-document machine reading comprehension answer prediction model is trained by using the second text semantic vector information and the pre-labeled answer document to obtain the multi-document machine reading comprehension answer prediction model.
  • the target candidate document corresponding to the second text semantic vector information has multiple answer starting position probabilities and multiple answer ending position probabilities, as well as the starting position probabilities and answer ending position probabilities of the preset labeled answer document.
  • The training of the preset multi-document answer prediction model according to the second text semantic vector information and the preset labeled answer document to generate the corresponding reading comprehension model includes: inputting the second text semantic vector information and the preset labeled answer document into the preset multi-document machine answer prediction model, and obtaining the answer start position probability and answer end position probability of the target document in the second text semantic vector information, as well as the answer start position probability and answer end position probability of the preset labeled answer document; obtaining the corresponding second loss function according to the answer start position probability and answer end position probability of the target document and the answer start position probability and answer end position probability of the preset labeled answer document; and updating the model parameters of the preset multi-document answer prediction model according to the second loss function and the back-propagation mechanism, to generate the corresponding reading comprehension model.
  • For example, the second text semantic vector information and the preset labeled answer document are input into the preset multi-document machine answer prediction model, and the preset multi-document machine answer prediction model calculates the answer start position probability and answer end position probability of each word of the target candidate document in the second text semantic vector information, as well as the answer start position probability and answer end position probability of the preset labeled answer document, where the answer start position probability and answer end position probability of the labeled answer are 1.
  • That is, the answer start position probability and answer end position probability of each word in the target candidate document in the second text semantic vector information are calculated.
  • According to the answer start position and answer end position of the first question in the preset labeled answer document, and the answer start position probability and answer end position probability of each word of the target candidate document in the second text semantic vector information, the corresponding loss function is obtained.
  • In the corresponding loss formula, L_ANS denotes the loss function, log denotes the logarithm, and N denotes the number of samples.
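  • The second loss function can be sketched as the usual span-extraction cross-entropy over answer start and end positions, averaged over the N samples; this exact form is an assumption rather than a reproduction of the patent's formula.

```python
import numpy as np

def span_loss(start_probs, end_probs, start_labels, end_labels):
    # start_probs, end_probs: (N, seq_len) predicted probabilities per token
    # start_labels, end_labels: (N,) labelled answer start / end indices
    N = len(start_labels)
    loss = 0.0
    for i in range(N):
        loss += -np.log(start_probs[i, start_labels[i]]) \
                - np.log(end_probs[i, end_labels[i]])
    return loss / N

rng = np.random.default_rng(2)
logits = rng.normal(size=(3, 2, 20))                    # 3 samples, sequence length 20
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(span_loss(probs[:, 0], probs[:, 1],
                np.array([4, 7, 0]), np.array([6, 9, 3])))
```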
  • Step S104 Acquire a question-document pair to be predicted, wherein the question-document pair to be predicted includes a second question and a plurality of candidate documents corresponding to the second question.
  • a pair of question documents to be predicted is obtained.
  • For example, the question-document pair to be predicted includes a second question and a plurality of candidate documents corresponding to the second question; the second question is combined with each of its candidate documents to obtain the corresponding second question-document pairs, wherein the question-document pair to be predicted includes a plurality of question-document pairs.
  • Step S105 Based on the document sorting model, output the target document corresponding to the second question according to the pair of question documents to be predicted.
  • For example, the document sorting model includes a dictionary file vocab.txt, through which the second question in the document pair to be predicted and each candidate document corresponding to the second question are segmented according to words, to obtain the second question sequence of the second question and the candidate document sequence of each candidate document; for example, the second question sequence includes a plurality of words tokens_a, and each candidate document sequence includes a plurality of words tokens_b.
  • the obtained second question sequence and each candidate document sequence are spliced to obtain the corresponding second text sequence.
  • The obtained second question sequence and each candidate document sequence are spliced, the splicing positions are marked, the starting position of the second question sequence is marked with [CLS], and this [CLS] is used as the semantic symbol of the second text sequence.
  • [SEP] is used as the separation symbol between the second question sequence and each candidate document sequence.
  • For example, the multiple candidate document sequences include a first candidate document sequence and a second candidate document sequence, and the spliced second text sequence is [CLS] second question sequence [SEP] first candidate document sequence [SEP] second candidate document sequence [SEP], and so on.
  • Each word in the second text sequence is converted with the pre-trained word feature vector information to obtain the corresponding second text vector information, wherein the second text vector information includes, for each word in the text sequence, the summed vector of its semantic vector information, position vector information, and segment representation vector information.
  • the pre-trained word feature vector information is converted to represent the second text sequence as a series of vectors capable of expressing text semantics.
  • For example, the candidate document sequence is "you help me" (你帮我) or "I help you" (我帮你). The bigrams of "you help me" are, in order: "you", "you help", "help", "help me", "me"; the bigrams of "I help you" are, in order: "I", "I help", "help", "help you", "you".
  • A dictionary can thus be constructed: {"you": 1, "you help": 2, "help": 3, "help me": 4, "I/me": 5, "I help": 6, "help you": 7} (in the original Chinese, "I" and "me" are the same character 我).
  • Under this dictionary, the vectorized result of "you help me" is represented as [1, 1, 1, 1, 1, 0, 0], and the vectorized result of "I help you" is represented as [1, 0, 1, 0, 1, 1, 1].
  • For example, the document sorting model includes a multi-head attention network model; the acquired second text vector information is input into the multi-head attention network model, which includes a first linear mapping layer through which the text vector information is mapped to semantic vectors in different semantic spaces, capturing semantic information of different dimensions.
  • That is, C = Concat(head_1, ..., head_h)W, where Concat is the vector splicing operation, W is the linear term that maps the different semantic spaces back to the initial semantic space, and C is the text semantic vector output by the multi-head self-attention network model.
  • the spliced vector information is mapped back to the original semantic space through the first linear mapping layer to obtain the output second text semantic vector information.
  • The document ranking model includes jieba (a Chinese word segmentation tool), through which the position features of the second question and each candidate document in the second text sequence are extracted.
  • the second text sequence includes semantic symbols of the second question sequence and segmentation position symbols of each candidate document sequence, wherein the semantic symbols are used as the starting position symbols of the second question sequence.
  • The semantic symbols of the second question sequence and the segmentation position symbols of each candidate document sequence in the second text sequence are identified by the jieba tool, so as to obtain the position feature of the second question and the position feature of each candidate document.
  • the obtained position feature of the second question and the position feature of each candidate document are processed by one-hot encoding to obtain the corresponding position feature vector information.
  • One-hot encoding is one-bit-effective encoding: it mainly uses an N-bit state register to encode N states, each state has its own independent register bit, and only one bit is valid at any time. One-hot encoding represents categorical variables as binary vectors: the categorical values are first mapped to integer values, and each integer value is then represented as a binary vector that is zero everywhere except at the index of the integer, which is marked as 1.
  • For example, the second text sequence includes each word tokens_a of the second question and each word tokens_b of each candidate document; each word tokens_b in each candidate document is matched against each word tokens_a of the second question, so as to obtain the common word features and non-common word features of the second question and each candidate document.
  • When the common word features and non-common word features are obtained, the common word features and non-common word features are binarized to obtain the word feature vector information corresponding to the common word features and non-common word features. For example, the acquired words with common word features are marked as 1, and the words with non-common word features are marked as 0.
  • the document ranking model includes a Chinese word segmentation tool (stanford corenlp), which determines the named entities in the document pair to be predicted.
  • Named entities are person names, institution names, place names, and all other entities identified by names, such as numbers, dates, currencies, addresses, and so on.
  • a second text sequence of question document pairs to be predicted is obtained, and the second text sequence includes the word tokens_a of the second question and each word tokens_b of each candidate document.
  • the words corresponding to the named entities in each word tokens_a of the second question and the words corresponding to the named entities in each word tokens_b of each candidate document are determined, and the words corresponding to the named entities are subjected to one-hot encoding processing, Obtain the corresponding named entity feature vector information.
  • One-hot encoding is one-bit-effective encoding: it mainly uses an N-bit state register to encode N states, each state has its own independent register bit, and only one bit is valid at any time.
  • One-hot encoding represents categorical variables as binary vectors: the categorical values are first mapped to integer values, and each integer value is then represented as a binary vector that is zero everywhere except at the index of the integer, which is marked as 1.
  • When the second text semantic vector information, position feature vector information, word feature vector information, and named entity feature vector information of the question-document pair to be predicted are acquired, they are superimposed to obtain the text semantic feature vector information of the question-document pair to be predicted.
  • the semantic feature vector information includes the semantic feature vector information associated with the second question and each candidate document pair.
  • For example, the second text semantic vector information, position feature vector information, word feature vector information, and named entity feature vector information of the question-document pair to be predicted are added together and stored in a unified file, to obtain the corresponding second text semantic feature vector information.
  • Based on the second text semantic feature vector information, the candidate document corresponding to the second question with the highest score is used as the target document, and the target document output by the document ranking model is obtained.
  • Step S106 based on the reading comprehension model, according to the second question and the target document, obtain the target text in the target document output by the reading comprehension model, and use the target text as the second question reading comprehension answers.
  • For example, a second question-document pair is generated from the target document and the second question, and the second question-document pair is input into the preset reading comprehension model.
  • The preset reading comprehension model determines, for each word in the target document, the probability that the word is the answer start position for the second question and the probability that the word is the answer end position for the second question; according to these probabilities, the answer start position and answer end position of the second question in the target document are obtained; according to the answer start position and answer end position of the second question, the target text in the target document is determined, so as to obtain the target text output by the reading comprehension model, and the target text is used as the reading comprehension answer to the second question.
  • step S106 includes: sub-step S1061 to sub-step S1064.
  • Sub-step S1061 form the second question and the target document into a corresponding second question document pair, and input them into the input layer of the reading comprehension model.
  • the reading comprehension model includes an input layer
  • The acquired target document and the second question form a second question-document pair.
  • the second question document pair is input into the input layer of the reading comprehension model
  • the feature information of the second question and the target document is extracted through the input layer.
  • the second question and the target document are word-segmented to obtain the corresponding second question sequence and the target document sequence
  • the second question sequence and the target document sequence are spliced to obtain the corresponding target text sequence.
  • Sub-step S1062 based on the probability prediction layer of the reading comprehension model, predict the starting position probabilities and the ending position probabilities of multiple answers corresponding to the second question in the target document.
  • An example is to predict the starting position probability and the ending position probability of the answer corresponding to the second question in the target text through the probability prediction layer of the reading comprehension model.
  • Sub-step S1063 based on the probability comparison layer of the reading comprehension model, compare a plurality of the answer start position probabilities and the answer end position probabilities, and determine the target start position with the highest probability and the target end position with the highest probability .
  • For example, through the probability comparison layer of the reading comprehension model, the probability that each word is the answer start position for the second question and the probability that each word is the answer end position for the second question are compared, and the first word with the highest answer start position probability is determined.
  • The position of the first word in the target document is determined and used as the target start position; the second word with the highest answer end position probability is determined, its position in the target document is determined, and the position of the second word in the target document is taken as the target end position.
  • Sub-step S1064 Based on the output layer of the reading comprehension model, acquire the target text corresponding to the target start position and the target end position in the target document output by the output layer.
  • the corresponding target text is determined.
  • the part between the target start position and the target end position in the target document is used as the target text.
  • When the target text in the target document is determined, the target text is output through the output layer of the reading comprehension model, so as to obtain the target text output by the reading comprehension model.
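  • Sub-steps S1062 to S1064 can be sketched as follows: take the token with the highest answer start probability and, at or after it, the token with the highest answer end probability, and return the text between them as the target text. Constraining the end to lie after the start is an assumption; the token list and probabilities are illustrative.

```python
import numpy as np

def extract_answer(target_doc_tokens, start_probs, end_probs):
    start = int(np.argmax(start_probs))                 # target start position
    end = start + int(np.argmax(end_probs[start:]))     # target end position (>= start)
    return " ".join(target_doc_tokens[start:end + 1])   # target text

tokens  = ["bert", "was", "released", "by", "google", "in", "2018"]
start_p = np.array([0.05, 0.05, 0.05, 0.05, 0.60, 0.05, 0.15])
end_p   = np.array([0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.60])
print(extract_answer(tokens, start_p, end_p))           # google in 2018
```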
  • In this way, the document sorting model adds part-of-speech tagging information, information on whether the characters in the document appear in the question, and named entity recognition information to capture the correlation between the question and the multiple candidate documents, so that the candidate documents are first sorted by their scores, and the document with the highest score is output as the input document of the reading comprehension model.
  • the multi-document reading comprehension problem is converted into a single-document reading comprehension problem, and the interference of extracting answers during reading comprehension is reduced, thereby improving the accuracy of multi-document reading comprehension answers.
  • FIG. 5 is a schematic block diagram of a BERT-based machine reading comprehension apparatus provided by an embodiment of the present application.
  • the BERT-based machine reading comprehension device 400 includes: a first generation module 401, a second generation module 402, a third generation module 403, a first acquisition module 404, an output module 405, and a second acquisition module 406.
  • the first generation module 401 is used to obtain a first question to be trained and a plurality of candidate documents, and combine the first question with each candidate document respectively to generate a pair of question documents to be trained;
  • the second generation module 402 is configured to train a first preset pre-trained language model according to the problem document pair to be trained, and generate a document sorting model;
  • the third generation module 403 is used for training a preset multi-document answer prediction model according to the question document to be trained, and generating a reading comprehension model;
  • the first obtaining module 404 is configured to obtain a pair of question documents to be predicted, wherein the pair of question documents to be predicted includes a second question and a plurality of candidate documents corresponding to the second question;
  • An output module 405, configured to output the target document corresponding to the second question according to the pair of question documents to be predicted based on the document sorting model
  • the second obtaining module 406 is configured to obtain, based on the reading comprehension model, according to the second question and the target document, the target text in the target document output by the reading comprehension model, and use the target text as A reading comprehension answer to the second question.
  • the second generation module 402 is specifically also used for:
  • according to the first text semantic vector information, the position feature vector information, the same word feature vector information, and the named entity feature vector information, the first text semantic feature vector information of the question-document pair to be trained is obtained;
  • the model parameters of the first preset pre-trained language model are updated according to the first loss function to generate a document ranking model.
  • the second generation module 402 is specifically also used for:
  • the first text semantic vector information output by the self-attention network model is acquired.
  • the third generation module 403 is also specifically used for:
  • a preset multi-document answer prediction model is trained according to the second text semantic vector information and the preset labeled answer document, and a corresponding reading comprehension model is generated.
  • the third generation module 403 is also specifically used for:
  • the corresponding second loss function is obtained
  • the model parameters of the preset multi-document answer prediction model are updated to generate a corresponding reading comprehension model.
  • the second obtaining module 406 is also specifically used for:
  • the second question and the target document are formed into a corresponding second question document pair, and input into the input layer of the reading comprehension model;
  • the target text corresponding to the target start position and the target end position in the target document output by the output layer is acquired.
  • the apparatuses provided in the above embodiments may be implemented in the form of a computer program, and the computer program may be executed on the computer device as shown in FIG. 6 .
  • FIG. 6 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • the computer device may be a terminal.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium can store operating systems and computer programs.
  • the computer program includes program instructions that, when executed, can cause the processor to execute any BERT-based machine reading comprehension method.
  • the processor is used to provide computing and control capabilities to support the operation of the entire computer equipment.
  • The internal memory provides an environment for running the computer program in the non-volatile storage medium; when the computer program is executed by the processor, it can cause the processor to execute any BERT-based machine reading comprehension method.
  • the network interface is used for network communication, such as sending assigned tasks.
  • FIG. 6 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • The processor may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the processor is configured to run a computer program stored in the memory to implement the following steps:
  • a preset multi-document answer prediction model is trained to generate a reading comprehension model
  • When the processor implements the training of the first preset pre-trained language model according to the question-document pair to be trained to generate the document sorting model, it is used to implement:
  • according to the first text semantic vector information, the position feature vector information, the same word feature vector information, and the named entity feature vector information, the first text semantic feature vector information of the question-document pair to be trained is obtained;
  • the model parameters of the first preset pre-trained language model are updated according to the first loss function to generate a document ranking model.
  • When the processor implements the determination of the first text vector information of the question-document pair to be trained according to the dictionary file and the question-document pair to be trained, it is used to implement:
  • When the processor implements the acquisition of the first text semantic vector information corresponding to the first text vector information according to the self-attention network model and the first text vector information, it is used to implement:
  • the first text semantic vector information output by the self-attention network model is acquired.
  • When the processor implements the training of the preset multi-document answer prediction model according to the question-document pair to be trained to generate the reading comprehension model, it is used to implement:
  • a preset multi-document answer prediction model is trained according to the second text semantic vector information and the preset labeled answer document, and a corresponding reading comprehension model is generated.
  • when the processor trains the preset multi-document answer prediction model according to the second text semantic vector information and the preset labeled answer document to generate the corresponding reading comprehension model, it is configured to implement:
  • the corresponding second loss function is obtained according to the answer start position probability and answer end position probability of the target document and those of the preset labeled answer document;
  • the model parameters of the preset multi-document answer prediction model are updated according to the second loss function and the back-propagation mechanism to generate the corresponding reading comprehension model.
  • when the processor, based on the reading comprehension model and according to the second question and the target document, obtains the target text in the target document output by the reading comprehension model and outputs the target text, it is configured to implement:
  • the second question and the target document are formed into a corresponding second question document pair, and input into the input layer of the reading comprehension model;
  • the target text corresponding to the target start position and the target end position in the target document output by the output layer is acquired.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; the computer program includes program instructions, and for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the BERT-based machine reading comprehension method of the present application.
  • the computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk provided on the computer device, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), or the like.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function, and the like, and the storage data area may store data created according to the use of the blockchain node, and the like.
  • the blockchain referred to in this application is a new application mode of computer technologies such as the storage of the text ranking model and the reading comprehension model, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • a blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A BERT-based machine reading comprehension method, apparatus, computer device, and computer-readable storage medium, relating to the technical fields of artificial intelligence and neural networks. The method includes: training a first preset pre-trained language model according to question-document pairs to be trained to generate a document ranking model (S102); training a preset multi-document answer prediction model according to the question-document pairs to be trained to generate a reading comprehension model (S103); based on the document ranking model, outputting the target document corresponding to the second question according to the question-document pair to be predicted (S105); and, based on the reading comprehension model and according to the second question and the target document, obtaining the target text in the target document output by the reading comprehension model and taking the target text as the reading comprehension answer to the second question (S106). This converts a multi-document reading comprehension problem into a single-document reading comprehension problem and reduces interference when extracting the answer during reading comprehension, thereby improving the accuracy of multi-document reading comprehension answers.

Description

基于BERT的机器阅读理解方法、装置、设备及存储介质
本申请要求于2020年10月29日提交中国专利局、申请号为2020111873810、发明名称为“基于BERT的机器阅读理解方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种基于BERT的机器阅读理解方法、装置、计算机设备及计算机可读存储介质。
背景技术
机器阅读理解是人工智能技术的重要组成部分,过去几年里,随着深度学习的出现,机器阅读理解(其要求机器基于给定的上下文回答问题)已经赢得了越来越广泛的关注,尤其是随着以BERT(Bidirectional Encoder Representations from Transformers预训练语言模型)为代表的预训练语言模型的发展,机器阅读理解任务都有着飞速的发展,主要体现为从关注限定文本到结合外部知识,从关注特定片段到对上下文的全面理解。然而在实际的应用场景中,常常会面临单个问题对应多个搜索引擎检索获得的文档,即需要整合多个文档的信息来预测答案。
发明人发现目前对于多文档阅读理解,多采用多个文档进行拼接成一个长文档,再采用文档滑动窗口,将长文档拆分为固定长度的多个段文本,每个文本段均与问题进行拼接后进行阅读理解,对同一个问题选择多个文本段中得分最高的答案作为该问题的阅读理解答案。例如,MS MARCO(Microsoft MAchine Reading Comprehension微软机器阅读理解)的Question Answering(智能问答)数据集是一个国际上权威的阅读理解数据集,其每条记录包含一个问题,十个候选文档,这十个候选答案有一个或者两个是包含问题答案文档。使用基于bert预训练语言模型在测试集上进行对比测试,直接使用十个候选文档拼接的长文本进行阅读理解的答案ROUGEL值大概为0.48,而比直接在含有答案的单个文档中进行阅读理解的答案ROUGEL值大概为0.56,两者相差约0.08分。其中,ROUGEL包括ROUGE(Recall-Oriented Understudy for Gisting Evaluation评估自动文摘以及机器翻译的一组指标)和L(longest common subsequence,最长公共子序列),ROUGEL值是机器阅读理解领域通用的答案质量评价指标,ROUGEL值越大,代表预测答案质量越好。使用基于bert预训练语言模型在一定程度上解决了现有模型对于多文档场景的输入长度受限的缺点,但其准确率与对单个文档进行阅读理解的准确率较低。
发明内容
本申请的主要目的在于提供一种基于BERT的机器阅读理解方法、装置、计算机设备及计算机可读存储介质,旨在解决现有使用基于bert预训练语言模型在一定程度上解决了现有模型对于多文档场景的输入长度受限的缺点,但其准确率与对单个文档进行阅读理解的准确率较低的技术问题。
第一方面,本申请提供一种基于BERT的机器阅读理解方法,所述基于BERT的机器阅读理解方法包括以下步骤:
获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
第二方面,本申请还提供一种基于BERT的机器阅读理解装置,所述基于BERT的机器阅读理解装置包括:
第一生成模块,用于获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;第二生成模块,用于根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;第三生成模块,用于根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;第一获取模块,用于获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;输出模块,用于基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;第二获取模块,用于基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
第三方面,本申请还提供一种计算机设备,所述计算机设备包括处理器、存储器、以及存储在所述存储器上并可被所述处理器执行的计算机程序,其中所述计算机程序被所述处理器执行时,实现如下步骤:
获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;基于所述阅读理解模型,根据所述第二问题和所述目标文档, 获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
第四方面,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,其中所述计算机程序被处理器执行时,实现如下步骤:
获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
本申请提供一种基于BERT的机器阅读理解方法、装置、计算机设备及计算机可读存储介质,通过获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案,实现在文档排序模型中,加入词性标注信息、文档中的字符是否在问题中出现的信息,以及命名实体识别信息,来捕捉问题与多个候选文档之间的相关性,从而先对候选文档进行得分排序,输出得分最高的一个文档,作为阅读理解模型的输入文档。以此来把多文档阅读理解问题转换为单文档阅读理解问题,降低阅读理解时抽取答案的干扰,从而提高多文档阅读理解答案的准确性。
附图说明
图1为本申请实施例提供的一种基于BERT的机器阅读理解方法的流程示意图;
图2为图1中的基于BERT的机器阅读理解方法的子步骤流程示意图;
图3为图1中的基于BERT的机器阅读理解方法的子步骤流程示意图;
图4为图1中的基于BERT的机器阅读理解方法的子步骤流程示意图;
图5为本申请实施例提供的一种基于BERT的机器阅读理解装置的示意性框图;
图6为本申请一实施例涉及的计算机设备的结构示意框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供一种基于BERT的机器阅读理解方法、装置、计算机设备及计算机可读存储介质。其中,该基于BERT的机器阅读理解方法可应用于计算机设备中,该计算机设备可以是笔记本电脑、台式电脑等电子设备。
请参照图1,图1为本申请的实施例提供的一种基于BERT的机器阅读理解方法的流程示意图。
如图1所示,该基于BERT的机器阅读理解方法包括步骤S101至步骤S106。
步骤S101、获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对。
示范例的，获取待训练的第一问题和多个候选文档，将该第一问题与各个候选文档进行组合。例如，待训练的多个候选文档为10个文档时，将这10个候选文档中的任意一个分别与第一问题进行组合，得到对应的问题文档对，将得到的多个问题文档对作为待训练问题文档对。其中，待训练问题文档对包括多个问题文档对，候选文档的数量与问题文档对的数量相同。例如，候选文档的数量为10，则问题文档对的数量为10。
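A minimal Python sketch of this pairing step, assuming the question and the candidate documents are plain strings; the function and variable names are illustrative:

    def build_question_document_pairs(question, candidate_documents):
        """Pair one training question with each candidate document; the number of
        pairs equals the number of candidate documents (e.g. 10 pairs for 10 docs)."""
        return [(question, doc) for doc in candidate_documents]

    # ten candidate documents would yield ten question-document pairs to be trained
    pairs = build_question_document_pairs("问题文本", ["候选文档1……", "候选文档2……"])
    print(len(pairs))  # 2 in this toy example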
步骤S102、根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型。
示范例的,通过生成的待训练问题文档对,将该待训练问题文档对输入到第一预置预训练语言模型中,通过该待训练问题文档对中每一对问题文档对的共同字词特征,得到每一对问题文档对的概率值,通过每一对问题文档对的概率值得到对应的损失函数,并通过该损失函数更新第一预置预训练语言模型的模型参数,生成文档排序模型。
在一实施例中,具体地,参照图2,步骤S102包括:子步骤S1021至子步骤S1028。
子步骤S1021、根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息。
示范例的,第一预置预训练语言模型包括字典文件vocab.txt,通过该字典文件vocab.txt将待训练问题文档对中的第一问题和各个候选文档进行切分,将切分后的第一问题和各个候选文档进行拼接,得到对应的第一文本序列。其中,第一文本序列中包括第一文本序列的标识类型,以及第一问题和各个候选文档的分割位置符号。将得到的第一文本序列进行向量化表示,得到对应的文本向量信息。
在一实施例中,所述根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息,包括:根据所述字典对所述待训练问题文档对进行字词切分,得到所述待训练问题文档对中第一问题的第一问题序列以及所述各个文档的文档序 列;将所述第一问题序列和所述文档序列进行拼接,生成对应的第一文本序列;将所述第一文本序列进行特征向量转换,得到对应的第一文本向量信息。
示范例的,通过该字典文件vocab.txt对待训练问题文档对中的第一问题和各个候选文档按照字词进行切分,得到第一问题的第一问题序列和各个候选文档的候选文档序列,例如,第一问题序列包括多个单词tokens_a,各个候选文档序列包括多个单词tokens_b。将得到的第一问题序列和各个候选文档序列进行拼接,得到对应的第一文本序列。例如,将得到的第一问题序列和各个候选文档序列进行拼接,对该拼接的位置进行标记,将第一问题序列的开始位置用[CLS]进行标记,该[CLS]作为第一文本序列的语义符号。将[SEP]作为第一问题序列与候选文档序列或各个候选文档序列之间的分割符号。例如,多个候选文档序列包括第一候选文档序列和第二候选文档序列,拼接得到的第一文本序列为[CLS]第一问题序列[SEP]第一候选文档序列[SEP]第二文档候选序列[SEP]等。
在得到第一文本序列时,将第一文本序列中每个单词用预训练的单词特征向量信息转换,得到对应的第一文本向量信息,其中第一文本向量信息包括文本序列中每个单词的语义向量信息、位置向量信息、分段表示的加和向量信息。示范性的,该预训练的单词特征向量信息转换为将第一文本序列表示成一系列能够表达文本语义的向量。例如,候选文档序列为“你帮我”或“我帮你”,“你帮我”的二元语法依次为:“你,你帮,帮,帮我,我”;“我帮你”的二元语法依次为:“我,我帮,帮,帮你,你”,从而可以构造一个字典{“你”:1,“你帮”:2,“帮”:3,“帮我”:4,“我”:5,“我帮”:6,“帮你”:7},通过字典将“你帮我”向量化结果表示为[1,1,1,1,1,0,0];将“我帮你”向量化结果表示为[1,0,1,0,1,1,1]。
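A small runnable Python sketch that reproduces the bigram-dictionary vectorization example above and the [CLS]/[SEP] concatenation of the first text sequence; it illustrates the idea only and is not the exact tokenizer of the disclosure:

    def char_ngrams(text):
        """Interleaved unigrams and bigrams, e.g. '你帮我' -> ['你','你帮','帮','帮我','我']."""
        grams = []
        for i, ch in enumerate(text):
            grams.append(ch)
            if i + 1 < len(text):
                grams.append(text[i:i + 2])
        return grams

    def vectorize(text, vocab):
        """Binary presence vector over the dictionary order."""
        grams = set(char_ngrams(text))
        return [1 if term in grams else 0 for term in vocab]

    vocab = ["你", "你帮", "帮", "帮我", "我", "我帮", "帮你"]
    assert vectorize("你帮我", vocab) == [1, 1, 1, 1, 1, 0, 0]
    assert vectorize("我帮你", vocab) == [1, 0, 1, 0, 1, 1, 1]

    # first text sequence: [CLS] question [SEP] doc1 [SEP] doc2 [SEP] ...
    sequence = ["[CLS]"] + list("第一问题") + ["[SEP]"] + list("候选文档一") + ["[SEP]"]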
子步骤S1022、根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息。
示范例的,该第一预置预训练语言模型包括多头注意力网络模型,将获取到的文本向量信息输入到多头注意力网络模型中,该多头注意力网络模型获取输入文本向量中融合上下文信息的每个单词所对应的向量表示,获取多头注意力网络模型输出的第一文本语义向量信息。
在一实施例中,所述根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息,包括:将所述第一文本向量信息输入所述自注意力网络模型,得到所述第一文本向量信息的各个语义空间的文本语义向量信息;根据所述各个语义空间的文本语义向量信息,获取所述自注意力网络模型输出的第一文本语义向量信息。
示范性的，将获取到的第一文本向量信息输入到多头注意力网络模型中，该多头注意力网络模型包括第一线性映射层，通过该第一线性映射层将该文本向量信息映射到不同语义空间的语义向量，捕捉不同维度的语义信息。例如，第一线性映射层中的线性项公式为 Q'_i = QW_i^Q，K'_i = KW_i^K，V'_i = VW_i^V，其中Q为查询值、K为键值、V为值向量，i为映射到第i个语义空间的线性项，Q'_i、K'_i、V'_i为第i个语义空间的语义向量。
通过在不同语义空间的语义向量上进行self-attention操作，输出各个语义空间的文本语义向量，即 head_i = Attention(Q'_i, K'_i, V'_i) = softmax(Q'_i·K'_i^T/√d_k)·V'_i，其中d_k为单个语义空间的向量维度。在得到不同语义空间的文本语义向量时，将不同语义空间的文本语义向量进行拼接，例如 C = Concat(head_1, ……, head_i)·W，其中，Concat为向量拼接操作，W为不同语义空间映射回初始语义空间的线性项，C为多头自注意力网络模型输出的文本语义向量。将拼接后的向量信息通过第一线性映射层映射回原语义空间，得到输出的第一文本语义向量信息。
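For illustration, a self-contained NumPy sketch of the multi-head self-attention described above (per-space projections Q'_i, K'_i, V'_i, scaled dot-product attention, concatenation of the heads and the final linear map W); the dimensions and the random weights are assumptions:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
        """X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
        d_model = X.shape[1]
        d_k = d_model // num_heads
        Q, K, V = X @ Wq, X @ Wk, X @ Wv              # Q' = Q·W^Q, K' = K·W^K, V' = V·W^V
        heads = []
        for i in range(num_heads):                    # one slice per semantic space
            q = Q[:, i * d_k:(i + 1) * d_k]
            k = K[:, i * d_k:(i + 1) * d_k]
            v = V[:, i * d_k:(i + 1) * d_k]
            attn = softmax(q @ k.T / np.sqrt(d_k))    # scaled dot-product self-attention
            heads.append(attn @ v)                    # head_i
        return np.concatenate(heads, axis=-1) @ Wo    # Concat(head_1, ..., head_h)·W

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))                       # 5 tokens, d_model = 8
    Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
    print(multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (5, 8)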
子步骤S1023、基于结巴工具,获取所述待训练问题文档对中第一问题和所述各个候选文档的位置特征向量信息。
示范例的,该第一预置预训练语言模型包括jieba(结巴工具),结巴工具比如为结巴分词,通过结巴分词把待训练问题文档对的第一文本序列中所有字词的词语都提取出来,通过该结巴工具提取待训练问题文档对的第一文本序列中第一问题和各个候选文档的位置特征。例如,第一文本序列中包括第一问题序列的语义符号和各个候选文档序列的分割位置符号,其中,将语义符号作为第一问题序列的开始位置符号。通过结巴工具识别第一问题序列中的第一问题序列的语义符号和各个候选文档序列的分割位置符号,得到第一问题的位置特征和各个候选文档的位置特征。将得到的第一问题的位置特征和各个候选文档的位置特征进行one-hot编码处理,得到对应的位置特征向量信息。其中,one-hot编码为一位有效编码,主要是采用N位状态寄存器来对N个状态进行编码,每个状态都由他独立的寄存器位,并且在任意时候只有一位有效,且one-hot编码是分类变量作为二进制向量的表示。首先将分类值映射到整数值,然后,每个整数值被表示为二进制向量,除了整数的索引之外,它都是零值,它被标记为1。
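A minimal Python sketch of the one-hot encoding described here; the example labels are hypothetical position tags, intended only to show the mapping from integer index to binary vector:

    def one_hot(labels):
        """Map each categorical value to an integer index, then represent it as a
        binary vector that is all zeros except for a 1 at that index."""
        categories = sorted(set(labels))
        index = {c: i for i, c in enumerate(categories)}
        return [[1 if index[lab] == i else 0 for i in range(len(categories))]
                for lab in labels]

    print(one_hot(["question", "question", "doc", "sep", "doc"]))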
子步骤S1024、确定所述待训练问题文档对中所述第一问题与所述各个候选文档的相同字词特征和非共同字词特征,得到对应的字词特征向量信息。
示范例的,确定待训练问题文档对中第一问题和各个候选文档的相同字词特征,例如,在得到待训练问题文档对的第一文本序列,该第一文本序列包括第一问题的各个单词tokens_a和各个候选文档的各个单词tokens_b,各个候选文档中的任意一个单词tokens_b与第一问题的各个单词tokens_a进行匹配,从而得到第一问题与各个候选文档的共同字词特征和非共同字词特征。在获取到共同字词特征和非共同字词特征时,对该共同字词特征和非共同字词特征进行二值化处理,得到共同字词特征和非共同字词特征对应的字词向量特征信息。例如,将获取到的共同字词特征的字词标记为1,将非共同字词特征的字词标记为0。
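A minimal Python sketch of the common-word (same-word) feature: each document token is marked 1 if it also appears in the question and 0 otherwise (the tokens below are illustrative):

    def same_word_feature(question_tokens, document_tokens):
        """1 for document tokens that also occur in the question, else 0."""
        question_vocab = set(question_tokens)
        return [1 if tok in question_vocab else 0 for tok in document_tokens]

    print(same_word_feature(["相对论", "是", "谁", "提出", "的"],
                            ["爱因斯坦", "提出", "了", "相对论"]))  # [0, 1, 0, 1]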
子步骤S1025、根据所述中文分词工具和所述待训练问题文档对,获取所述待训练问题文档对的命名实体特征向量信息。
实施例的,第一预置预训练语言模型中包括中文分词工具(stanford corenlp),通过该中文分词工具确定待训练文本对中的命名实体。命名实体为人名、机构名、地名以及其他所有以名称为标识的实体,如,数字、日期、货币、地址等。例如,获取待训练文本对的第一文本序列,第一文本序列中包括第一问题的单词tokens_a和各个候选文档的各个单词tokens_b。根据中文分词工具确定第一问题的各个单词tokens_a中命名实体对应的字词,以及各个候选文档的各个单词tokens_b中命名实体对应的字词,将命名实体对应的字词进行one-hot编码处理,得到对应的命名实体特征向量信息。其中,one-hot编码为一位有效编码,主要是采用N位状态寄存器来对N个状态进行编码,每个状态都由他独立的寄存器位,并且在任意时候只有一位有效,且one-hot编码是分类变量作为二进制向量的表示。首先将分类值映射到整数值,然后,每个整数值被表示为二进制向量,除了整数的索引之外,它都是零值,它被标记为1。
子步骤S1026、根据所述第一文本语义向量信息、所述位置特征向量信息、所述相同字词特征向量信息和所述命名实体特征向量信息,得到所述待训练问题文档对的第一文本语义特征向量信息。
实施例的,在获取到待训练数据的第一文本语义向量信息、位置特征向量信息、字词特征向量信息和命名实体特征向量信息时,将待训练数据的第一文本语义向量信息、位置特征向量信息、字词特征向量信息和命名实体特征向量信息进行叠加,得到待训练问题文档对的文本语义特征向量信息,语义特征向量信息包括第一问题与各个候选文档对关联的语义特征向量信息。例如,将待训练数据的第一文本语义向量信息、位置特征向量信息、字词特征向量信息和命名实体特征向量信息追加特征向量存储在统一的文件中,得到对应的第一文本语义特征向量信息。
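A minimal Python sketch of fusing the per-token features described above; treating the combination as per-token concatenation of the semantic, position, same-word and named-entity features is an assumption about how the appended feature vectors are stored together:

    def fuse_features(semantic, position, same_word, named_entity):
        """Concatenate the four per-token feature vectors into one feature vector
        per token (one reading of appending the features into a unified record)."""
        return [list(s) + list(p) + [w] + list(n)
                for s, p, w, n in zip(semantic, position, same_word, named_entity)]

    fused = fuse_features([[0.1, 0.2], [0.3, 0.4]],   # semantic vectors
                          [[1, 0], [0, 1]],           # one-hot position features
                          [1, 0],                     # same-word 0/1 feature
                          [[0, 1], [1, 0]])           # one-hot named-entity features
    print(fused)  # [[0.1, 0.2, 1, 0, 1, 0, 1], [0.3, 0.4, 0, 1, 0, 1, 0]]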
子步骤S1027、根据所述第一文本语义特征向量信息,得到对应的第一损失函数。
实施例的,从文本语义特征向量信息中获取各个候选文档与第一问题的关联向量信息,其中,关联向量信息为各个候选文档中包含第一问题的答案则对应的位置的元素为1,否则为0。在获取到第一文本语义向量信息时,从该文本语义向量信息中获取各个候选文档的语义向量。基于该第一预置预训练语言模型的第一线性映射层对各个候选文档的语义向量做线性变换,得到各个候选文档的概率得分值,将得到的多个候选文档的概率得分值组成多维向量信息。根据计算该多维向量信息得到log_softmax值。通过log_softmax值与关联向量信息,得到对应的第一损失函数。
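A NumPy sketch of the document-scoring loss outlined above: a linear layer turns each candidate document's semantic vector into a score, log_softmax is taken over the candidates, and the loss is the negative log-likelihood of the candidate(s) whose relevance element is 1; the exact reduction over multiple relevant documents is an assumption:

    import numpy as np

    def log_softmax(scores):
        scores = scores - scores.max()
        return scores - np.log(np.exp(scores).sum())

    def ranking_loss(doc_vectors, w, relevance):
        """doc_vectors: (num_docs, hidden) semantic vector of each candidate document;
        w: (hidden,) linear scoring weights; relevance: 0/1 per candidate document,
        1 where the document contains the answer to the first question."""
        scores = doc_vectors @ w                  # one scalar score per candidate document
        log_probs = log_softmax(scores)           # log_softmax over the candidates
        relevance = np.asarray(relevance, dtype=float)
        return -(relevance * log_probs).sum() / max(relevance.sum(), 1.0)

    rng = np.random.default_rng(1)
    print(float(ranking_loss(rng.normal(size=(10, 16)), rng.normal(size=16),
                             [0, 0, 1, 0, 0, 0, 0, 0, 0, 0])))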
子步骤S1028、根据所述第一损失函数更新所述第一预置预训练语言模型的模型参数,生成文档排序模型。
实施例的,在得到第一损失函数时,通过反向传播机制,得到对应的模型参数,通过该模型参数更新第一预置预训练语言模型的模型参数,生成对应的文档排序模型。
步骤S103、根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型。
示范例的,获取待训练问题文档对,该待训练问题文档对包括第一问题和多个候选文档,确定该多个候选文档中包含第一问题的答案的目标候选文档,将该目标候选文档与第一问题组成新的问题文档对。将新的问题文档对输入到第二预置预训练语言模型中,通过该第二预置预训练语言模型对新的问题文档对进行语义映射,得到新的问题文档对的第二文本语义向量信息。根据该第二文本语义向量信息训练预置多文档答案预测模型,生成对应的阅读理解模型。
在一实施例中,具体地,参照图3,步骤S103包括:子步骤S1031至子步骤S1033。
子步骤S1031、确定所述待训练问题文档对的多个候选文档中与所述第一问题的答案最相似的目标候选文档,并将所述第一问题与所述目标候选文档组成新的问题文档对。
示范例的，获取到待训练问题文档对，该待训练问题文档对包括第一问题和多个候选文档，获取多个候选文档中带标记的候选文档，将该带标记的候选文档作为目标候选文档，将该目标候选文档与第一问题组成新的问题文档对。
子步骤S1032、根据第二预置预训练语言模型,得到所述新的问题文档对的第二文本语义向量信息。
示范例的,第二预置预训练语言模型包括字典文件vocab.txt,通过该字典文件vocab.txt将新的问题文档对中的第一问题和目标候选文档进行切分,将切分后的第一问题和目标候选文档进行拼接,得到对应的第二文本序列。其中,第二文本序列中包括第二文本序列的标识类型,以及第一问题和目标候选文档的分割位置符号。将得到的第二文本序列进行向量化表示,得到对应的第二文本向量信息。
例如,通过该字典文件vocab.txt对待训练问题文档对中的第一问题和目标候选文档按照字词进行切分,得到第一问题的第一问题序列和目标候选文档的目标候选文档序列,例如,第一问题序列包括多个单词tokens_a,目标候选文档序列包括多个单词tokens_b。将得到的第一问题序列和目标候选文档序列进行拼接,得到对应的第二文本序列。例如,将得到的第一问题序列和目标候选文档序列进行拼接,对该拼接的位置进行标记,将第一问题序列的开始位置用[CLS]进行标记,该[CLS]作为第二文本序列的语义符号。将[SEP]作为第一问题序列与目标候选文档序列之间的分割符号。例如,拼接得到的第二文本序列为[CLS]第一问题序列[SEP]目标候选文档序列[SEP]。
在得到第二文本序列时,将第二文本序列中每个单词用预训练的单词特征向量信息转换,得到对应的第二文本向量信息,其中第二文本向量信息包括第二文本序列中每个单词的语义向量信息、位置向量信息、分段表示的加和向量信息。示范性的,该预训练的单词特征向量信息转换为将第二文本序列表示成一系列能够表达第二文本语义的向量。例如,目标候选文档序列为“你帮我”或“我帮你”,“你帮我”的二元语法依次为:“你,你帮,帮,帮我,我”;“我帮你”的二元语法依次为:“我,我帮,帮,帮你,你”,从而可以构造一个字典{“你”:1,“你帮”:2,“帮”:3,“帮我”:4,“我”:5, “我帮”:6,“帮你”:7},“你帮我”向量化结果表示为[1,1,1,1,1,0,0];“我帮你”向量化结果表示为[1,0,1,0,1,1,1]。
该第二预置预训练语言模型包括多头注意力网络模型,将获取到的第二文本向量信息输入到多头注意力网络模型中,该多头注意力网络模型获取输入第二文本向量信息中融合上下文信息的每个单词所对应的向量表示,获取多头注意力网络模型输出的第二文本语义向量信息。
例如，将获取到的第二文本向量信息输入到多头注意力网络模型中，该多头注意力网络模型包括第一线性映射层，通过该第一线性映射层将该第二文本向量信息映射到不同语义空间的语义向量，捕捉不同维度的语义信息。例如，第一线性映射层中的线性项公式为 Q'_i = QW_i^Q，K'_i = KW_i^K，V'_i = VW_i^V，其中Q为查询值、K为键值、V为值向量，i为映射到第i个语义空间的线性项，Q'_i、K'_i、V'_i为第i个语义空间的语义向量。
通过在不同语义空间的语义向量上进行self-attention操作，输出各个语义空间的文本语义向量，即 head_i = Attention(Q'_i, K'_i, V'_i) = softmax(Q'_i·K'_i^T/√d_k)·V'_i，其中d_k为单个语义空间的向量维度。在得到不同语义空间的文本语义向量时，将不同语义空间的文本语义向量进行拼接，例如 C = Concat(head_1, ……, head_i)·W，其中，Concat为向量拼接操作，W为不同语义空间映射回初始语义空间的线性项，C为多头自注意力网络模型输出的第二文本语义向量。将拼接后的向量信息通过第一线性映射层映射回原语义空间，得到输出的第二文本语义向量信息。
子步骤S1033、根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型。
示范例的,在获取到第二文本语句向量信息时,将该第二文本语义向量和预置带标签答案文档训练预置多问文档答案预测模型。该预置多问文档答案预测模型为预置多文档机器阅读理解答案预测模型,通过该第二文本语义向量信息和预置带标签答案文档训练预置多文档机器阅读理解答案预测模型,得到该第二文本语义向量信息对应的目标候选文档的多个答案起始位置概率和多个答案结束位置概率概率,以及预置带标签答案文档的起始位置概率和答案结束位置概率。根据目标候选文档的多个答案起始位置概率和多个答案结束位置概率概率,以及预置带标签答案文档的答案起始位置概率和答案结束位置概率更新预置多文档机器阅读理解答案预测模型,生成对应的阅读理解模型。
在一实施例中,所述根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型,包括:将所述第二文本语义向量信息和预置带标签答案文档输入预置多文档机器答案预测模型,得到所述第二文本语义向量信息中目标文档的答案起始位置概率和答案结尾位置概率,以及所述预置带标签答案文档的答案起始位置概率和答案结尾位置概率;根据所述目标文档的答案起始位置概率和答案结尾位置概率,以及预置带标签答案文档的答案起始位置概率和答案结尾位置概率,得到对应 的第二损失函数;根据所述损第二失函数和反向传播机制,更新所述预置多文档答案预测模型的模型参数,生成对应的阅读理解模型。
实施例的,将第二文本语义向量信息和与预置带标签答案文档输入预置多文档机器答案预测模型中,通过预置多文档机器答案预测模型计算第二文本语义向量信息中目标候选文档的各个单词的答案起始位置概率和答案结尾位置概率,以及预置带标签答案文档的答案起始位置概率和答案结尾位置概率,其中,答案起始位置概率和答案结尾位置概率为1。例如,基于预置多文档机器答案预测模型中的答案起始位置概率公式和答案结束位置概率公式,计算得到第二文本语义向量信息中目标候选文档的各个单词的答案起始位置概率和答案结尾位置概率。在获取到预置带标签答案文档的答案起始位置概率和结尾位置概率时,基于答案起始位置概率公式Ps=soft max(WsC)和答案结束位置概率公式Pe=soft max(WeC),得到第二文本语义向量信息中目标候选文档的各个单词的答案起始位置概率和答案结尾位置概率,其中,Ps为目标候选文档的各个单词为第一问题的答案起始位置概率,Pe为目标候选文档的各个单词为第一问题的答案结束位置概率,Ws为预置带标签答案文档为第一问题的答案起始位置概率,We为为预置带标签答案文档为为第一问题的答案结束位置概率,其中,C为常量。
获取预置带标签答案文档中为第一问题的答案起始位置和为第一问题的答案结束位置,根据预置带标签答案文档中为第一问题的答案起始位置和为第一问题的答案结束位置,以及第二文本语义向量信息中目标候选文档的各个单词的答案起始位置概率和答案结尾位置概率,得到对应的损失函数。例如,基于损失公式
L_ANS = -(1/N)·Σ_{i=1}^{N} [log Ps(y_i^s) + log Pe(y_i^e)]，其中，L_ANS为损失函数，log为对数，y_i^s为预置带标签答案文档中第i个样本为第一问题的答案起始位置，y_i^e为预置带标签答案文档中第i个样本为第一问题的答案结束位置，Ps、Pe分别为答案起始位置概率和答案结束位置概率，N为样本数量。在得到损失函数时，通过对该损失函数进行反向传播机制，得到对应的模型参数，通过该模型参数更新预置多文档答案预测模型的模型参数，生成对应的阅读理解模型。
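A NumPy sketch of the answer-span objective for a single question-document pair: start and end probabilities (Ps, Pe) are obtained from the token representations via softmax, and the loss is the negative log-probability of the labeled start and end positions (averaging over the N samples is done outside this function); the weight shapes are assumptions:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def span_loss(H, w_s, w_e, start_gold, end_gold):
        """H: (seq_len, hidden) token representations of one question-document pair;
        w_s, w_e: (hidden,) start/end scoring weights; start_gold, end_gold: labeled
        answer start and end token indices from the labeled answer document."""
        p_start = softmax(H @ w_s)   # Ps: start-position probability of every token
        p_end = softmax(H @ w_e)     # Pe: end-position probability of every token
        return -(np.log(p_start[start_gold]) + np.log(p_end[end_gold]))

    rng = np.random.default_rng(2)
    H = rng.normal(size=(20, 16))
    print(float(span_loss(H, rng.normal(size=16), rng.normal(size=16), 4, 7)))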
步骤S104、获取待预测问题文档对，其中，所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档。
示范例的,获取待预测问题文档对,该待预测问题文档对包括第二问题和第二问题对应的多个候选文档,将第二问题与各个第二问题的候选文档进行组合,得到对应的第二问题文档对,其中,待预测问题文档对包括多个问题文档对。
步骤S105、基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档。
示范例的,文档排序模型包括字典文件vocab.txt,通过该字典文件vocab.txt对待预测问题文档对中的第二问题和第二问题对应的各个候选文档按照字词进行切分,得到第二问题的第二问题序列和各个候选文档的候选文档序列,例如,第二问题序列包括多个单词tokens_a,各个候选文档序列包括多个单词tokens_b。将得到的第二问题序列和各个候选文 档序列进行拼接,得到对应的第二文本序列。例如,将得到的第二问题序列和各个候选文档序列进行拼接,对该拼接的位置进行标记,将第二问题序列的开始位置用[CLS]进行标记,该[CLS]作为第二文本序列的语义符号。将[SEP]作为第二问题序列与候选文档序列或各个候选文档序列之间的分割符号。例如,多个候选文档序列包括第一候选文档序列和第二候选文档序列,拼接得到的第二文本序列为[CLS]第二问题序列[SEP]第一候选文档序列[SEP]第二文档候选序列[SEP]等。
在得到第二文本序列时,将第二文本序列中每个单词用预训练的单词特征向量信息转换,得到对应的第二文本向量信息,其中第二文本向量信息包括文本序列中每个单词的语义向量信息、位置向量信息、分段表示的加和向量信息。示范性的,该预训练的单词特征向量信息转换为将第二文本序列表示成一系列能够表达文本语义的向量。例如,候选文档序列为“你帮我”或“我帮你”,“你帮我”的二元语法依次为:“你,你帮,帮,帮我,我”;“我帮你”的二元语法依次为:“我,我帮,帮,帮你,你”,从而可以构造一个字典{“你”:1,“你帮”:2,“帮”:3,“帮我”:4,“我”:5,“我帮”:6,“帮你”:7},通过字典将“你帮我”向量化结果表示为[1,1,1,1,1,0,0];将“我帮你”向量化结果表示为[1,0,1,0,1,1,1]。
该文档排序模型包括多头注意力网络模型，将获取到的第二文本向量信息输入到多头注意力网络模型中，该多头注意力网络模型包括第一线性映射层，通过该第一线性映射层将该文本向量信息映射到不同语义空间的语义向量，捕捉不同维度的语义信息。例如，第一线性映射层中的线性项公式为 Q'_i = QW_i^Q，K'_i = KW_i^K，V'_i = VW_i^V，其中Q为查询值、K为键值、V为值向量，i为映射到第i个语义空间的线性项，Q'_i、K'_i、V'_i为第i个语义空间的语义向量。
通过在不同语义空间的语义向量上进行self-attention操作，输出各个语义空间的文本语义向量，即 head_i = Attention(Q'_i, K'_i, V'_i) = softmax(Q'_i·K'_i^T/√d_k)·V'_i，其中d_k为单个语义空间的向量维度。在得到不同语义空间的文本语义向量时，将不同语义空间的文本语义向量进行拼接，例如 C = Concat(head_1, ……, head_i)·W，其中，Concat为向量拼接操作，W为不同语义空间映射回初始语义空间的线性项，C为多头自注意力网络模型输出的文本语义向量。将拼接后的向量信息通过第一线性映射层映射回原语义空间，得到输出的第二文本语义向量信息。
该文档排序模型包括jieba(结巴工具),通过该结巴工具提取第二文本序列中第二问题和各个候选文档的位置特征。例如,第二文本序列中包括第二问题序列的语义符号和各个候选文档序列的分割位置符号,其中,将语义符号作为第二问题序列的开始位置符号。通过结巴工具识别第二问题序列中的第二问题序列的语义符号和各个候选文档序列的分割位置符号,得到第二问题的位置特征和各个候选文档的位置特征。将得到的第二问题的位置特征和各个候选文档的位置特征进行one-hot编码处理,得到对应的位置特征向量信 息。其中,one-hot编码为一位有效编码,主要是采用N位状态寄存器来对N个状态进行编码,每个状态都由他独立的寄存器位,并且在任意时候只有一位有效,且one-hot编码是分类变量作为二进制向量的表示。首先将分类值映射到整数值,然后,每个整数值被表示为二进制向量,除了整数的索引之外,它都是零值,它被标记为1。
确定待预测问题文档对中第二问题和各个候选文档的相同字词特征,例如,在得到待预测问题文档对的第二文本序列,该第二文本序列包括第二问题的各个单词tokens_a和各个候选文档的各个单词tokens_b,各个候选文档中的任意一个单词tokens_b与第一问题的各个单词tokens_a进行匹配,从而得到第二问题与各个候选文档的共同字词特征和非共同字词特征。在获取到共同字词特征和非共同字词特征时,对该共同字词特征和非共同字词特征进行二值化处理,得到共同字词特征和非共同字词特征对应的字词向量特征信息。例如,将获取到的共同字词特征的字词标记为1,将非共同字词特征的字词标记为0。
文档排序模型中包括中文分词工具(stanford corenlp),通过该中文分词工具确定待预测问题文档对中的命名实体。命名实体为人名、机构名、地名以及其他所有以名称为标识的实体,如,数字、日期、货币、地址等。例如,获取待预测问题文档对的第二文本序列,第二文本序列中包括第二问题的单词tokens_a和各个候选文档的各个单词tokens_b。根据文分词工具确定第二问题的各个单词tokens_a中命名实体对应的字词,以及各个候选文档的各个单词tokens_b中命名实体对应的字词,将命名实体对应的字词进行one-hot编码处理,得到对应的命名实体特征向量信息。其中,one-hot编码为一位有效编码,主要是采用N位状态寄存器来对N个状态进行编码,每个状态都由他独立的寄存器位,并且在任意时候只有一位有效,且one-hot编码是分类变量作为二进制向量的表示。首先将分类值映射到整数值,然后,每个整数值被表示为二进制向量,除了整数的索引之外,它都是零值,它被标记为1。
在获取到待预测问题文档对的第二文本语义向量信息、位置特征向量信息、字词特征向量信息和命名实体特征向量信息时,将待预测问题文档对的第二文本语义向量信息、位置特征向量信息、字词特征向量信息和命名实体特征向量信息进行叠加,得到待预测问题文档对的文本语义特征向量信息,语义特征向量信息包括第二问题与各个候选文档对关联的语义特征向量信息。例如,将待预测问题文档对的第二文本语义向量信息、位置特征向量信息、字词特征向量信息和命名实体特征向量信息追加特征向量存储在统一的文件中,得到对应的第二文本语义特征向量信息。根据文档排序模型的模型参数计算第二文本语义特征向量信息,预测第二问题对应的各个候选文档对的得分值,确定得分值最高的二问题对应的候选文档,将该得分值最高的二问题对应的候选文档作为目标文档,获取文档排序模型输出的目标文档。
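A minimal Python sketch of this selection step: given one ranking score per candidate document, the highest-scoring candidate becomes the single target document passed on to the reading comprehension model (the scores and documents below are illustrative):

    def select_target_document(doc_scores, candidate_documents):
        """Return the candidate document with the highest ranking-model score."""
        best = max(range(len(doc_scores)), key=lambda i: doc_scores[i])
        return candidate_documents[best]

    print(select_target_document([0.1, 0.7, 0.2], ["文档A", "文档B", "文档C"]))  # 文档B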
步骤S106、基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
示范例的,在获取到第二问题的目标文档时,将该目标文档与第二问题生成第二问题文档对,将第二问题文档对输入到预置阅读理解模型中,将通过该预置阅读理解模型确定该目标文档中各个单词为第二问题的答案起始位置和该答案结束位置概率,根据该目标文档中各个单词为第二问题的答案结束位置概率和该目标文档中各个单词为第二问题的答案起始位置概率,得到目标文档中为第二问题的答案起始位置和答案结束位置。根据第二问题的答案起始位置和答案结束位置,确定目标文档中的目标文本,从而获取阅读理解模型输出的目标文本,将该目标文本作为第二问题的阅读理解答案。
在一实施例中,具体地,参照图4,步骤S106包括:子步骤S1061至子步骤S1064。
子步骤S1061、将所述第二问题和所述目标文档组成对应的第二问题文档对,并输入到所述阅读理解模型的输入层中。
示范例的,阅读理解模型包括输入层,将获取到的第二问题的目标文档对和第二问题组成第二问题文档对,并将第二问题文档对输入到阅读理解模型的输入层中,通过该输入层提取第二问题和目标文档的特征信息。例如,将第二问题和目标文档进行字词切分,得到对应的第二问题序列和目标文档序列,并将第二问题序列和目标文档序列进行拼接,得到对应的目标文本序列。
子步骤S1062、基于所述阅读理解模型的概率预测层,预测所述目标文档中所述第二问题对应的多个答案起始位置概率和答案结束位置概率。
示范例的，通过阅读理解模型的概率预测层，预测目标文档中第二问题对应答案的起始位置概率和结束位置概率。示范性的，概率预测层获取目标文本序列中目标文档的各个单词为第二问题的答案起始位置概率和各个单词为第二问题的答案结束位置概率，例如，通过答案开始位置概率公式 Ps = softmax(Ws·C) 和答案结束位置概率公式 Pe = softmax(We·C)，分别得到各个单词为第二问题的答案起始位置概率和各个单词为第二问题的答案结束位置概率。
子步骤S1063、基于所述阅读理解模型的概率比对层,比对多个所述答案起始位置概率和所述答案结束位置概率,确定概率最高的目标起始位置和概率最高的目标结束位置。
示范例的,通过阅读理解模型的概率比对层,比对各个单词为第二问题的答案起始位置概率和各个单词为第二问题的答案结束位置概率,确定答案起始位置概率最高的第一单词,确定该第一单词在目标文档中的位置,并将第一单词在目标文档中的位置作为目标起始位置;确定答案结束位置概率最高的第二单词,确定第二单词在目标文档中的位置,并将第二单词在目标文档中的位置作为目标结束位置。
子步骤S1064、基于所述阅读理解模型的输出层,获取所述输出层输出的所述目标文档中所述目标起始位置和所述目标结束位置对应的目标文本。
示范例的,在确定目标文档中的目标起始位置和目标结束位置时,确定对应的目标文本。例如,将目标文档中的目标起始位置至目标结束位置中间的部分作为目标文本。在确 定目标文档中的目标文本时,通过阅读理解模型的输出层将目标文本输出,从而获取阅读理解模型的输出的目标文本。
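A minimal Python sketch of extracting the target text once the start-position and end-position probabilities have been predicted for every token of the target document; the swap guard for an inverted span is an assumption beyond the original description:

    def extract_answer(tokens, start_probs, end_probs):
        """Take the most probable start token and the most probable end token and
        return the span between them (inclusive) as the answer text."""
        start = max(range(len(start_probs)), key=lambda i: start_probs[i])
        end = max(range(len(end_probs)), key=lambda i: end_probs[i])
        if end < start:  # simple guard against an inverted span
            start, end = end, start
        return "".join(tokens[start:end + 1])

    tokens = list("爱因斯坦在1915年提出广义相对论")
    start_probs = [0.02] * len(tokens); start_probs[5] = 0.9
    end_probs = [0.02] * len(tokens); end_probs[9] = 0.9
    print(extract_answer(tokens, start_probs, end_probs))  # 1915年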
在本申请实施例中,实现在文档排序模型中,加入词性标注信息、文档中的字符是否在问题中出现的信息,以及命名实体识别信息,来捕捉问题与多个候选文档之间的相关性,从而先对候选文档进行得分排序,输出得分最高的一个文档,作为阅读理解模型的输入文档。以此来把多文档阅读理解问题转换为单文档阅读理解问题,降低阅读理解时抽取答案的干扰,从而提高多文档阅读理解答案的准确性。
请参照图5,图5为本申请实施例提供的一种基于BERT的机器阅读理解装置的示意性框图。
如图5所示,该基于BERT的机器阅读理解装置400,包括:第一生成模块401、第二生成模块402、第三生成模块403、第一获取模块404、输出模块405、第二获取模块406。
第一生成模块401,用于获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;
第二生成模块402,用于根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;
第三生成模块403,用于根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;
第一获取模块404,用于获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;
输出模块405,用于基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;
第二获取模块406,用于基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
其中,第二生成模块402具体还用于:
根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息;
根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息;
基于结巴工具,获取所述待训练问题文档对中第一问题和所述各个候选文档的位置特征向量信息;
确定所述待训练问题文档对中所述第一问题与所述各个候选文档的相同字词特征和非共同字词特征,得到对应的字词特征向量信息;
根据所述中文分词工具和所述待训练问题文档对,获取所述待训练问题文档对的命名实体特征向量信息;
根据所述第一文本语义向量信息、所述位置特征向量信息、所述相同字词特征向量信息和所述命名实体特征向量信息,得到所述待训练问题文档对的第一文本语义特征向量信息;
根据所述第一文本语义特征向量信息,得到对应的第一损失函数;
根据所述第一损失函数更新所述第一预置预训练语言模型的模型参数,生成文档排序模型。
其中,第二生成模块402具体还用于:
根据所述字典对所述待训练问题文档对进行字词切分,得到所述待训练问题文档对中第一问题的第一问题序列以及所述各个文档的文档序列;
将所述第一问题序列和所述文档序列进行拼接,生成对应的第一文本序列;
将所述第一文本序列进行特征向量转换，得到对应的第一文本向量信息。
其中，第二生成模块402具体还用于：
将所述第一文本向量信息输入所述自注意力网络模型,得到所述第一文本向量信息的各个语义空间的文本语义向量信息;
根据所述各个语义空间的文本语义向量信息,获取所述自注意力网络模型输出的第一文本语义向量信息。
其中，第三生成模块403具体还用于：
确定所述待训练问题文档对的多个候选文档中与所述第一问题的答案最相似的目标候选文档,并将所述第一问题与所述目标候选文档组成新的问题文档对;
根据第二预置预训练语言模型,得到所述新的问题文档对的第二文本语义向量信息;
根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型。
其中，第三生成模块403具体还用于：
将所述第二文本语义向量信息和预置带标签答案文档输入预置多文档机器答案预测模型,得到所述第二文本语义向量信息中目标文档的答案起始位置概率和答案结尾位置概率,以及所述预置带标签答案文档的答案起始位置概率和答案结尾位置概率;
根据所述目标文档的答案起始位置概率和答案结尾位置概率,以及预置带标签答案文档的答案起始位置概率和答案结尾位置概率,得到对应的第二损失函数;
根据所述第二损失函数和反向传播机制，更新所述预置多文档答案预测模型的模型参数，生成对应的阅读理解模型。
其中,第二获取模块406具体还用于:
将所述第二问题和所述目标文档组成对应的第二问题文档对,并输入到所述阅读理解模型的输入层中;
基于所述阅读理解模型的概率预测层,预测所述目标文档中所述第二问题对应的多个答案起始位置概率和答案结束位置概率;
基于所述阅读理解模型的概率比对层,比对多个所述答案起始位置概率和所述答案结束位置概率,确定概率最高的目标起始位置和概率最高的目标结束位置;
基于所述阅读理解模型的输出层,获取所述输出层输出的所述目标文档中所述目标起始位置和所述目标结束位置对应的目标文本。
需要说明的是,所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的装置和各模块及单元的具体工作过程,可以参考前述基于BERT的机器阅读理解方法实施例中的对应过程,在此不再赘述。
上述实施例提供的装置可以实现为一种计算机程序的形式,该计算机程序可以在如图6所示的计算机设备上运行。
请参阅图6,图6为本申请实施例提供的一种计算机设备的结构示意性框图。该计算机设备可以为终端。
如图6所示,该计算机设备包括通过系统总线连接的处理器、存储器和网络接口,其中,存储器可以包括非易失性存储介质和内存储器。
非易失性存储介质可存储操作系统和计算机程序。该计算机程序包括程序指令,该程序指令被执行时,可使得处理器执行任意一种基于BERT的机器阅读理解方法。
处理器用于提供计算和控制能力,支撑整个计算机设备的运行。
内存储器为非易失性存储介质中的计算机程序的运行提供环境,该计算机程序被处理器执行时,可使得处理器执行任意一种基于BERT的机器阅读理解方法。
该网络接口用于进行网络通信,如发送分配的任务等。本领域技术人员可以理解,图6中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
应当理解的是,处理器可以是中央处理单元(Central Processing Unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。其中,通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
其中,在一个实施例中,所述处理器用于运行存储在存储器中的计算机程序,以实现如下步骤:
获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;
根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;
根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;
获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;
基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;
基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
在一个实施例中,所述处理器根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型实现时,用于实现:
根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息;
根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息;
基于结巴工具,获取所述待训练问题文档对中第一问题和所述各个候选文档的位置特征向量信息;
确定所述待训练问题文档对中所述第一问题与所述各个候选文档的相同字词特征和非共同字词特征,得到对应的字词特征向量信息;
根据所述中文分词工具和所述待训练问题文档对,获取所述待训练问题文档对的命名实体特征向量信息;
根据所述第一文本语义向量信息、所述位置特征向量信息、所述相同字词特征向量信息和所述命名实体特征向量信息,得到所述待训练问题文档对的第一文本语义特征向量信息;
根据所述第一文本语义特征向量信息,得到对应的第一损失函数;
根据所述第一损失函数更新所述第一预置预训练语言模型的模型参数,生成文档排序模型。
在一个实施例中,所述处理器根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息实现时,用于实现:
根据所述字典对所述待训练问题文档对进行字词切分,得到所述待训练问题文档对中第一问题的第一问题序列以及所述各个文档的文档序列;
将所述第一问题序列和所述文档序列进行拼接,生成对应的第一文本序列;
将所述第一文本序列进行特征向量转换,得到对应的第一文本向量信息。
在一个实施例中,所述处理器根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息实现时,用于实现:
将所述第一文本向量信息输入所述自注意力网络模型,得到所述第一文本向量信息的各个语义空间的文本语义向量信息;
根据所述各个语义空间的文本语义向量信息,获取所述自注意力网络模型输出的第一文本语义向量信息。
在一个实施例中,所述处理器根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型实现时,用于实现:
确定所述待训练问题文档对的多个候选文档中与所述第一问题的答案最相似的目标候选文档,并将所述第一问题与所述目标候选文档组成新的问题文档对;
根据第二预置预训练语言模型,得到所述新的问题文档对的第二文本语义向量信息;
根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型。
在一个实施例中,所述处理器所述根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型实现时,用于实现:
将所述第二文本语义向量信息和预置带标签答案文档输入预置多文档机器答案预测模型,得到所述第二文本语义向量信息中目标文档的答案起始位置概率和答案结尾位置概率,以及所述预置带标签答案文档的答案起始位置概率和答案结尾位置概率;
根据所述目标文档的答案起始位置概率和答案结尾位置概率,以及预置带标签答案文档的答案起始位置概率和答案结尾位置概率,得到对应的第二损失函数;
根据所述第二损失函数和反向传播机制，更新所述预置多文档答案预测模型的模型参数，生成对应的阅读理解模型。
在一个实施例中,所述处理器基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本实现时,用于实现:
将所述第二问题和所述目标文档组成对应的第二问题文档对,并输入到所述阅读理解模型的输入层中;
基于所述阅读理解模型的概率预测层,预测所述目标文档中所述第二问题对应的多个答案起始位置概率和答案结束位置概率;
基于所述阅读理解模型的概率比对层,比对多个所述答案起始位置概率和所述答案结束位置概率,确定概率最高的目标起始位置和概率最高的目标结束位置;
基于所述阅读理解模型的输出层,获取所述输出层输出的所述目标文档中所述目标起始位置和所述目标结束位置对应的目标文本。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序中包括程序指令,所述程序指令被执行时所实现的方法可参照本申请基于BERT的机器阅读理解方法的各个实施例。
其中,所述计算机可读存储介质可以是前述实施例所述的计算机设备的内部存储单元,例如所述计算机设备的硬盘或内存。所述计算机可读存储介质也可以是所述计算机设备的外部存储设备,例如所述计算机设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。所述计算机可读 存储介质可以是非易失性,也可以是易失性。
进一步地,所述计算机可读存储介质可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序等;存储数据区可存储根据区块链节点的使用所创建的数据等。
本申请所指区块链为文本排序模型和阅读理解模型的存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (20)

  1. 一种基于BERT的机器阅读理解方法,其中,包括:
    获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;
    根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;
    根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;
    获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;
    基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;
    基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
  2. 如权利要求1所述的基于BERT的机器阅读理解方法,其中,所述第一预置预训练语言模型包括字典文件、自注意力网络模型、结巴工具和中文分词工具;所述根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型,包括:
    根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息;
    根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息;
    基于结巴工具,获取所述待训练问题文档对中第一问题和所述各个候选文档的位置特征向量信息;
    确定所述待训练问题文档对中所述第一问题与所述各个候选文档的相同字词特征和非共同字词特征,得到对应的字词特征向量信息;
    根据所述中文分词工具和所述待训练问题文档对,获取所述待训练问题文档对的命名实体特征向量信息;
    根据所述第一文本语义向量信息、所述位置特征向量信息、所述相同字词特征向量信息和所述命名实体特征向量信息,得到所述待训练问题文档对的第一文本语义特征向量信息;
    根据所述第一文本语义特征向量信息,得到对应的第一损失函数;
    根据所述第一损失函数更新所述第一预置预训练语言模型的模型参数,生成文档排序模型。
  3. 如权利要求2所述的基于BERT的机器阅读理解方法,其中,所述根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息,包括:
    根据所述字典对所述待训练问题文档对进行字词切分,得到所述待训练问题文档对中第一问题的第一问题序列以及所述各个文档的文档序列;
    将所述第一问题序列和所述文档序列进行拼接,生成对应的第一文本序列;
    将所述第一文本序列进行特征向量转换,得到对应的第一文本向量信息。
  4. 如权利要求2所述的基于BERT的机器阅读理解方法,其中,所述根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息,包括:
    将所述第一文本向量信息输入所述自注意力网络模型,得到所述第一文本向量信息的各个语义空间的文本语义向量信息;
    根据所述各个语义空间的文本语义向量信息,获取所述自注意力网络模型输出的第一文本语义向量信息。
  5. 如权利要求1所述的基于BERT的机器阅读理解方法,其中,所述根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型,包括:
    确定所述待训练问题文档对的多个候选文档中与所述第一问题的答案最相似的目标候选文档,并将所述第一问题与所述目标候选文档组成新的问题文档对;
    根据第二预置预训练语言模型,得到所述新的问题文档对的第二文本语义向量信息;
    根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型。
  6. 如权利要求5所述的基于BERT的机器阅读理解方法,其中,所述根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型,包括:
    将所述第二文本语义向量信息和预置带标签答案文档输入预置多文档机器答案预测模型,得到所述第二文本语义向量信息中目标文档的答案起始位置概率和答案结尾位置概率,以及所述预置带标签答案文档的答案起始位置概率和答案结尾位置概率;
    根据所述目标文档的答案起始位置概率和答案结尾位置概率,以及预置带标签答案文档的答案起始位置概率和答案结尾位置概率,得到对应的第二损失函数;
    根据所述第二损失函数和反向传播机制，更新所述预置多文档答案预测模型的模型参数，生成对应的阅读理解模型。
  7. 如权利要求1所述的基于BERT的机器阅读理解方法,其中,基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,包括:
    将所述第二问题和所述目标文档组成对应的第二问题文档对,并输入到所述阅读理解模型的输入层中;
    基于所述阅读理解模型的概率预测层,预测所述目标文档中所述第二问题对应的多个答案起始位置概率和答案结束位置概率;
    基于所述阅读理解模型的概率比对层,比对多个所述答案起始位置概率和所述答案结束位置概率,确定概率最高的目标起始位置和概率最高的目标结束位置;
    基于所述阅读理解模型的输出层,获取所述输出层输出的所述目标文档中所述目标起始位置和所述目标结束位置对应的目标文本。
  8. 一种基于BERT的机器阅读理解装置,其中,所述装置包括:
    第一生成模块,用于获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;
    第二生成模块,用于根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;
    第三生成模块,用于根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;
    第一获取模块,用于获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;
    输出模块,用于基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;
    第二获取模块,用于基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
  9. 一种计算机设备,其中,所述计算机设备包括处理器、存储器、以及存储在所述存储器上并可被所述处理器执行的计算机程序,其中所述计算机程序被所述处理器执行时,实现如下步骤:
    获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;
    根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;
    根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;
    获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;
    基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;
    基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
  10. 如权利要求9所述的计算机设备,其中,所述第一预置预训练语言模型包括字典文件、自注意力网络模型、结巴工具和中文分词工具;所述根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型,包括:
    根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息;
    根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息;
    基于结巴工具,获取所述待训练问题文档对中第一问题和所述各个候选文档的位置特征向量信息;
    确定所述待训练问题文档对中所述第一问题与所述各个候选文档的相同字词特征和非共同字词特征,得到对应的字词特征向量信息;
    根据所述中文分词工具和所述待训练问题文档对,获取所述待训练问题文档对的命名实体特征向量信息;
    根据所述第一文本语义向量信息、所述位置特征向量信息、所述相同字词特征向量信息和所述命名实体特征向量信息,得到所述待训练问题文档对的第一文本语义特征向量信息;
    根据所述第一文本语义特征向量信息,得到对应的第一损失函数;
    根据所述第一损失函数更新所述第一预置预训练语言模型的模型参数,生成文档排序模型。
  11. 如权利要求10所述的计算机设备,其中,所述根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息,包括:
    根据所述字典对所述待训练问题文档对进行字词切分,得到所述待训练问题文档对中第一问题的第一问题序列以及所述各个文档的文档序列;
    将所述第一问题序列和所述文档序列进行拼接,生成对应的第一文本序列;
    将所述第一文本序列进行特征向量转换,得到对应的第一文本向量信息。
  12. 如权利要求10所述的计算机设备,其中,所述根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息,包括:
    将所述第一文本向量信息输入所述自注意力网络模型,得到所述第一文本向量信息的各个语义空间的文本语义向量信息;
    根据所述各个语义空间的文本语义向量信息,获取所述自注意力网络模型输出的第一文本语义向量信息。
  13. 如权利要求9所述的计算机设备,其中,所述根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型,包括:
    确定所述待训练问题文档对的多个候选文档中与所述第一问题的答案最相似的目标候选文档,并将所述第一问题与所述目标候选文档组成新的问题文档对;
    根据第二预置预训练语言模型,得到所述新的问题文档对的第二文本语义向量信息;
    根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型。
  14. 如权利要求13所述的计算机设备,其中,所述根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模型,生成对应的阅读理解模型,包括:
    将所述第二文本语义向量信息和预置带标签答案文档输入预置多文档机器答案预测模型,得到所述第二文本语义向量信息中目标文档的答案起始位置概率和答案结尾位置概率,以及所述预置带标签答案文档的答案起始位置概率和答案结尾位置概率;
    根据所述目标文档的答案起始位置概率和答案结尾位置概率,以及预置带标签答案文档的答案起始位置概率和答案结尾位置概率,得到对应的第二损失函数;
    根据所述第二损失函数和反向传播机制，更新所述预置多文档答案预测模型的模型参数，生成对应的阅读理解模型。
  15. 如权利要求9所述的计算机设备,其中,基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,包括:
    将所述第二问题和所述目标文档组成对应的第二问题文档对,并输入到所述阅读理解模型的输入层中;
    基于所述阅读理解模型的概率预测层,预测所述目标文档中所述第二问题对应的多个答案起始位置概率和答案结束位置概率;
    基于所述阅读理解模型的概率比对层,比对多个所述答案起始位置概率和所述答案结束位置概率,确定概率最高的目标起始位置和概率最高的目标结束位置;
    基于所述阅读理解模型的输出层,获取所述输出层输出的所述目标文档中所述目标起始位置和所述目标结束位置对应的目标文本。
  16. 一种计算机可读存储介质,其中,所述计算机可读存储介质上存储有计算机程序,其中所述计算机程序被处理器执行时,实现如下步骤:
    获取待训练的第一问题和多个候选文档,将所述第一问题分别与各个候选文档组合,生成待训练问题文档对;
    根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型;
    根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型;
    获取待预测问题文档对,其中,所述待预测问题文档对包括第二问题和所述第二问题对应的多个候选文档;
    基于所述文档排序模型,根据所述待预测问题文档对,输出所述第二问题对应的目标文档;
    基于所述阅读理解模型,根据所述第二问题和所述目标文档,获取所述阅读理解模型输出所述目标文档中的目标文本,并将所述目标文本作为所述第二问题的阅读理解答案。
  17. 如权利要求16所述的计算机可读存储介质,其中,所述第一预置预训练语言模型包括字典文件、自注意力网络模型、结巴工具和中文分词工具;所述根据所述待训练问题文档对训练第一预置预训练语言模型,生成文档排序模型,包括:
    根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息;
    根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息;
    基于结巴工具,获取所述待训练问题文档对中第一问题和所述各个候选文档的位置特征向量信息;
    确定所述待训练问题文档对中所述第一问题与所述各个候选文档的相同字词特征和非共同字词特征,得到对应的字词特征向量信息;
    根据所述中文分词工具和所述待训练问题文档对,获取所述待训练问题文档对的命名实体特征向量信息;
    根据所述第一文本语义向量信息、所述位置特征向量信息、所述相同字词特征向量信息和所述命名实体特征向量信息,得到所述待训练问题文档对的第一文本语义特征向量信息;
    根据所述第一文本语义特征向量信息,得到对应的第一损失函数;
    根据所述第一损失函数更新所述第一预置预训练语言模型的模型参数,生成文档排序模型。
  18. 如权利要求17所述的计算机可读存储介质,其中,所述根据所述字典文件和所述待训练问题文档对,确定所述待训练问题文档对的第一文本向量信息,包括:
    根据所述字典对所述待训练问题文档对进行字词切分,得到所述待训练问题文档对中第一问题的第一问题序列以及所述各个文档的文档序列;
    将所述第一问题序列和所述文档序列进行拼接,生成对应的第一文本序列;
    将所述第一文本序列进行特征向量转换,得到对应的第一文本向量信息。
  19. 如权利要求17所述的计算机可读存储介质,其中,所述根据所述自注意力网络模型和所述第一文本向量信息,获取所述第一文本向量信息对应的第一文本语义向量信息,包括:
    将所述第一文本向量信息输入所述自注意力网络模型,得到所述第一文本向量信息的各个语义空间的文本语义向量信息;
    根据所述各个语义空间的文本语义向量信息,获取所述自注意力网络模型输出的第一文本语义向量信息。
  20. 如权利要求16所述的计算机可读存储介质,其中,所述根据所述待训练问题文档对训练预置多文档答案预测模型,生成阅读理解模型,包括:
    确定所述待训练问题文档对的多个候选文档中与所述第一问题的答案最相似的目标候选文档,并将所述第一问题与所述目标候选文档组成新的问题文档对;
    根据第二预置预训练语言模型,得到所述新的问题文档对的第二文本语义向量信息;
    根据所述第二文本语义向量信息和预置带标签答案文档训练预置多文档答案预测模 型,生成对应的阅读理解模型。
PCT/CN2021/097422 2020-10-29 2021-05-31 基于bert的机器阅读理解方法、装置、设备及存储介质 WO2022088672A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011187381.0 2020-10-29
CN202011187381.0A CN112464641B (zh) 2020-10-29 2020-10-29 基于bert的机器阅读理解方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022088672A1 true WO2022088672A1 (zh) 2022-05-05

Family

ID=74834226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097422 WO2022088672A1 (zh) 2020-10-29 2021-05-31 基于bert的机器阅读理解方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN112464641B (zh)
WO (1) WO2022088672A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818644A (zh) * 2022-06-27 2022-07-29 北京云迹科技股份有限公司 文本模板生成方法、装置、设备及存储介质
CN114926272A (zh) * 2022-06-16 2022-08-19 平安科技(深圳)有限公司 基于端到端模型的行为逾期预测方法、系统、设备和介质
CN115169368A (zh) * 2022-09-07 2022-10-11 北京沃丰时代数据科技有限公司 基于多文档的机器阅读理解方法及装置
CN115269807A (zh) * 2022-08-17 2022-11-01 北京中科深智科技有限公司 一种基于问题类型识别的问答对联合生成模型
CN115525773A (zh) * 2022-10-10 2022-12-27 北京智源人工智能研究院 知识图谱补全模型的训练方法和装置
CN115587175A (zh) * 2022-12-08 2023-01-10 阿里巴巴达摩院(杭州)科技有限公司 人机对话及预训练语言模型训练方法、系统及电子设备
CN116312915A (zh) * 2023-05-19 2023-06-23 之江实验室 一种电子病历中药物术语标准化关联方法及系统
CN117521659A (zh) * 2024-01-04 2024-02-06 西安电子科技大学 基于语义增强预训练孪生网络的中文实体链接方法和系统

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464641B (zh) * 2020-10-29 2023-01-03 平安科技(深圳)有限公司 基于bert的机器阅读理解方法、装置、设备及存储介质
CN113010655B (zh) * 2021-03-18 2022-12-16 华南理工大学 一种机器阅读理解的回答与干扰项生成方法、装置
CN113204611A (zh) * 2021-04-06 2021-08-03 北京百度网讯科技有限公司 建立阅读理解模型的方法、阅读理解方法及对应装置
CN113159187B (zh) * 2021-04-23 2024-06-14 北京金山数字娱乐科技有限公司 分类模型训练方法及装置、目标文本确定方法及装置
CN113407685B (zh) * 2021-05-14 2024-09-06 北京金山数字娱乐科技有限公司 一种样本筛选方法及装置
CN113191159B (zh) * 2021-05-25 2023-01-20 广东电网有限责任公司广州供电局 一种机器阅读理解方法、装置、设备和存储介质
CN113515620A (zh) * 2021-07-20 2021-10-19 云知声智能科技股份有限公司 电力设备技术标准文档排序方法、装置、电子设备和介质
CN113688876B (zh) * 2021-07-30 2023-08-22 华东师范大学 一种基于lda和bert的金融文本机器阅读理解方法
CN113590787B (zh) * 2021-07-30 2024-07-26 胡昌然 一种机器阅读理解方法、装置、计算机设备及计算机可读存储介质
CN113779360A (zh) * 2021-08-18 2021-12-10 深圳技术大学 基于多头问答模型的解题方法、装置、设备及存储介质
CN113722436A (zh) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 文本信息提取方法、装置、计算机设备及存储介质
CN113836268A (zh) * 2021-09-24 2021-12-24 北京百度网讯科技有限公司 文档理解方法及装置、电子设备和介质
CN113837294B (zh) * 2021-09-27 2023-09-01 平安科技(深圳)有限公司 模型训练及调用方法、装置、计算机设备、存储介质
CN115905459A (zh) * 2022-03-07 2023-04-04 北京有限元科技有限公司 问题答案的预测方法、装置及存储介质
CN114638365B (zh) * 2022-05-17 2022-09-06 之江实验室 一种机器阅读理解推理方法及装置、电子设备、存储介质
CN115309910B (zh) * 2022-07-20 2023-05-16 首都师范大学 语篇要素和要素关系联合抽取方法、知识图谱构建方法
CN115455160B (zh) * 2022-09-02 2024-08-06 腾讯科技(深圳)有限公司 一种多文档阅读理解方法、装置、设备及存储介质
CN116720008B (zh) * 2023-08-11 2024-01-09 之江实验室 一种机器阅读方法、装置、存储介质及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358072A1 (en) * 2015-06-05 2016-12-08 Google Inc. Reading comprehension neural networks
CN110765254A (zh) * 2019-10-21 2020-02-07 北京理工大学 一种融合多视角答案重排序的多文档问答系统模型
CN110866102A (zh) * 2019-11-07 2020-03-06 浪潮软件股份有限公司 检索处理方法
CN111046152A (zh) * 2019-10-12 2020-04-21 平安科技(深圳)有限公司 Faq问答对自动构建方法、装置、计算机设备及存储介质
CN111460089A (zh) * 2020-02-18 2020-07-28 北京邮电大学 一种多段落阅读理解候选答案排序方法和装置
CN112464641A (zh) * 2020-10-29 2021-03-09 平安科技(深圳)有限公司 基于bert的机器阅读理解方法、装置、设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5048598B2 (ja) * 2008-06-25 2012-10-17 ヤフー株式会社 テキスト抽出装置、そのシステム、その方法、および、そのプログラム
CN110096699B (zh) * 2019-03-20 2023-06-09 华南师范大学 基于语义的机器阅读理解的候选答案筛选方法和系统
CN110647629B (zh) * 2019-09-20 2021-11-02 北京理工大学 一种多粒度答案排序的多文档机器阅读理解方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358072A1 (en) * 2015-06-05 2016-12-08 Google Inc. Reading comprehension neural networks
CN111046152A (zh) * 2019-10-12 2020-04-21 平安科技(深圳)有限公司 Faq问答对自动构建方法、装置、计算机设备及存储介质
CN110765254A (zh) * 2019-10-21 2020-02-07 北京理工大学 一种融合多视角答案重排序的多文档问答系统模型
CN110866102A (zh) * 2019-11-07 2020-03-06 浪潮软件股份有限公司 检索处理方法
CN111460089A (zh) * 2020-02-18 2020-07-28 北京邮电大学 一种多段落阅读理解候选答案排序方法和装置
CN112464641A (zh) * 2020-10-29 2021-03-09 平安科技(深圳)有限公司 基于bert的机器阅读理解方法、装置、设备及存储介质

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926272B (zh) * 2022-06-16 2023-05-12 平安科技(深圳)有限公司 基于端到端模型的行为逾期预测方法、系统、设备和介质
CN114926272A (zh) * 2022-06-16 2022-08-19 平安科技(深圳)有限公司 基于端到端模型的行为逾期预测方法、系统、设备和介质
CN114818644A (zh) * 2022-06-27 2022-07-29 北京云迹科技股份有限公司 文本模板生成方法、装置、设备及存储介质
CN115269807A (zh) * 2022-08-17 2022-11-01 北京中科深智科技有限公司 一种基于问题类型识别的问答对联合生成模型
CN115169368A (zh) * 2022-09-07 2022-10-11 北京沃丰时代数据科技有限公司 基于多文档的机器阅读理解方法及装置
CN115169368B (zh) * 2022-09-07 2022-11-29 北京沃丰时代数据科技有限公司 基于多文档的机器阅读理解方法及装置
CN115525773A (zh) * 2022-10-10 2022-12-27 北京智源人工智能研究院 知识图谱补全模型的训练方法和装置
CN115587175A (zh) * 2022-12-08 2023-01-10 阿里巴巴达摩院(杭州)科技有限公司 人机对话及预训练语言模型训练方法、系统及电子设备
CN115587175B (zh) * 2022-12-08 2023-03-14 阿里巴巴达摩院(杭州)科技有限公司 人机对话及预训练语言模型训练方法、系统及电子设备
CN116312915A (zh) * 2023-05-19 2023-06-23 之江实验室 一种电子病历中药物术语标准化关联方法及系统
CN116312915B (zh) * 2023-05-19 2023-09-19 之江实验室 一种电子病历中药物术语标准化关联方法及系统
CN117521659A (zh) * 2024-01-04 2024-02-06 西安电子科技大学 基于语义增强预训练孪生网络的中文实体链接方法和系统
CN117521659B (zh) * 2024-01-04 2024-03-26 西安电子科技大学 基于语义增强预训练孪生网络的中文实体链接方法和系统

Also Published As

Publication number Publication date
CN112464641A (zh) 2021-03-09
CN112464641B (zh) 2023-01-03

Similar Documents

Publication Publication Date Title
WO2022088672A1 (zh) 基于bert的机器阅读理解方法、装置、设备及存储介质
CN108959246B (zh) 基于改进的注意力机制的答案选择方法、装置和电子设备
CN108829757B (zh) 一种聊天机器人的智能服务方法、服务器及存储介质
US11468239B2 (en) Joint intent and entity recognition using transformer models
CN111814466A (zh) 基于机器阅读理解的信息抽取方法、及其相关设备
WO2020224219A1 (zh) 中文分词方法、装置、电子设备及可读存储介质
CN111506714A (zh) 基于知识图嵌入的问题回答
WO2022088671A1 (zh) 自动问答方法、装置、设备及存储介质
WO2022142011A1 (zh) 一种地址识别方法、装置、计算机设备及存储介质
CN110209832B (zh) 上下位关系的判别方法、系统和计算机设备
WO2020233131A1 (zh) 问答处理方法、装置、计算机设备和存储介质
CN110765277B (zh) 一种基于知识图谱的移动端的在线设备故障诊断方法
CN112287069B (zh) 基于语音语义的信息检索方法、装置及计算机设备
CN112925898B (zh) 基于人工智能的问答方法、装置、服务器及存储介质
CN111599340A (zh) 一种多音字读音预测方法、装置及计算机可读存储介质
WO2024109619A1 (zh) 敏感数据识别方法、装置、设备及计算机存储介质
CN112052329A (zh) 文本摘要生成方法、装置、计算机设备及可读存储介质
Das et al. Sentence embedding models for similarity detection of software requirements
CN115827819A (zh) 一种智能问答处理方法、装置、电子设备及存储介质
CN116304307A (zh) 一种图文跨模态检索网络训练方法、应用方法及电子设备
CN113420119B (zh) 基于知识卡片的智能问答方法、装置、设备及存储介质
CN113204956B (zh) 多模型训练方法、摘要分段方法、文本分段方法及装置
CN113515593A (zh) 基于聚类模型的话题检测方法、装置和计算机设备
CN114842982B (zh) 一种面向医疗信息系统的知识表达方法、装置及系统
CN115630652A (zh) 客服会话情感分析系统、方法及计算机系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21884415

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21884415

Country of ref document: EP

Kind code of ref document: A1