WO2023134085A1 - Question answer prediction method and prediction apparatus, electronic device, and storage medium - Google Patents

Question answer prediction method and prediction apparatus, electronic device, and storage medium Download PDF

Info

Publication number
WO2023134085A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate
question
original
text
vector
Prior art date
Application number
PCT/CN2022/090750
Other languages
English (en)
Chinese (zh)
Inventor
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023134085A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06F 40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a question answer prediction method, a prediction device, an electronic device, and a storage medium.
  • Machine reading comprehension aims to allow machines to find answers to questions in a given text. This is a basic application scenario in natural language processing. Machine reading comprehension is widely used in question answering and dialogue systems.
  • the embodiment of the present application proposes a method for predicting answers to questions, including:
  • the original topic data includes original article data and original question data to be answered;
  • The associated data includes a question mark vector, candidate mark vectors corresponding to each of the candidate texts, and association values; wherein the association value is used to characterize the relevance between the original question data and each of the candidate texts;
  • the embodiment of the present application proposes a device for predicting answers to questions, including:
  • the acquisition module is used to acquire the original topic data to be predicted; the original topic data includes original article data and original question data to be answered;
  • An encoding module configured to encode the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector;
  • An attention screening module configured to perform attention screening processing on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts
  • An association module configured to associate the original question data with each of the candidate texts to obtain associated data; wherein the associated data includes question mark vectors, candidate mark vectors corresponding to each of the candidate texts, and association values ; Wherein, the association value is used to characterize the association between the original question data and each of the candidate texts;
  • An answer screening module configured to perform answer screening processing on the question mark vector, the plurality of candidate texts, and each of the candidate mark vectors to obtain a confidence degree corresponding to each of the candidate texts; wherein, the confidence degree Used to characterize the probability that the candidate text contains a candidate answer;
  • a processing module configured to determine a candidate position according to the correlation value, the confidence level and a preset prediction threshold; wherein the candidate position is the position of the candidate answer;
  • a matching module configured to match corresponding candidate texts according to the candidate positions to obtain the candidate answers.
  • the embodiment of the present application provides an electronic device, including:
  • the program is stored in the memory, and the processor executes the at least one program to implement a method for predicting an answer to a question; wherein the method for predicting an answer to a question includes:
  • the original topic data includes original article data and original question data to be answered;
  • The associated data includes a question mark vector, candidate mark vectors corresponding to each of the candidate texts, and association values; wherein the association value is used to characterize the relevance between the original question data and each of the candidate texts;
  • The embodiment of the present application provides a storage medium, which is a computer-readable storage medium storing computer-executable instructions; the computer-executable instructions are used to cause a computer to carry out a method for predicting the answer to a question; wherein the method for predicting the answer to the question includes:
  • the original topic data includes original article data and original question data to be answered;
  • The associated data includes a question mark vector, candidate mark vectors corresponding to each of the candidate texts, and association values; wherein the association value is used to characterize the relevance between the original question data and each of the candidate texts;
  • The question answer prediction method, prediction device, electronic device, and storage medium proposed in this application can, through the attention screening mechanism, the association processing, and the answer screening processing, delete useless text information that has nothing to do with the answer and effectively select the parts of the article that are relevant to the question, thereby improving the accuracy of the predicted answer.
  • Fig. 1 is a flowchart of the question answer prediction method provided by an embodiment of the present application;
  • Fig. 2 is a flowchart of a specific method of step S300 in Fig. 1;
  • Fig. 3 is a flowchart of a specific method of step S330 in Fig. 2;
  • Fig. 4 is a flowchart of a specific method of step S400 in Fig. 1;
  • Fig. 5 is a flowchart of a specific method of step S430 in Fig. 4;
  • Fig. 6 is a flowchart of a specific method of step S500 in Fig. 1;
  • Fig. 7 is a flowchart of a specific method of step S530 in Fig. 6;
  • Fig. 8 is a block diagram of the question answer prediction device provided by an embodiment of the present application;
  • Fig. 9 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence; artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Natural language processing (NLP) uses computers to process, understand, and use human languages (such as Chinese and English). NLP is a branch of artificial intelligence and an interdisciplinary subject between computer science and linguistics, often known as computational linguistics. Natural language processing includes syntax analysis, semantic analysis, text understanding, and so on, and is often used in technical fields such as machine translation, recognition of handwritten and printed characters, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining. It involves data mining, machine learning, knowledge acquisition, knowledge engineering, and artificial intelligence research related to language processing, as well as linguistics research related to language computing.
  • Medical cloud refers to a medical and health services cloud platform created with "cloud computing", based on new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data, and the Internet of Things, combined with medical technology.
  • The platform realizes the sharing of medical resources and the expansion of medical coverage.
  • The medical cloud improves the efficiency of medical institutions and makes it more convenient for residents to seek medical treatment. For example, hospital appointment registration, electronic medical records, and medical insurance are all products of the combination of cloud computing and the medical field. The medical cloud also has the advantages of data security, information sharing, dynamic expansion, and overall layout.
  • BERT: Bidirectional Encoder Representations from Transformers.
  • The BERT model further increases the generalization ability of word vector models, fully describes character-level, word-level, sentence-level, and even inter-sentence relationship features, and is built on the Transformer architecture.
  • Token Embeddings are the word vectors; the first token is the [CLS] mark, which can be used for subsequent classification tasks;
  • Segment Embeddings are used to distinguish the two sentences, because pre-training involves not only language modeling but also classification tasks that take two sentences as input;
  • Position Embeddings: the position vectors here are not the trigonometric functions of the Transformer, but are learned by BERT during training.
  • BERT directly trains the Position Embeddings to retain position information: each position is randomly initialized as a vector that joins model training, finally yielding an embedding containing position information. When finally combining the Position Embeddings with the word embeddings, BERT adds them directly.
  • CLS layer (classification): the CLS layer is part of the BERT model and is used for downstream classification tasks, mainly single-text classification tasks and sentence-pair classification tasks.
  • The BERT model inserts a [CLS] symbol in front of the text and uses the output vector corresponding to this symbol as the semantic representation of the entire text for text classification. Intuitively, this symbol, which carries no obvious semantic information of its own, fuses the semantic information of each character and word in the text more "fairly" than the characters and words already present in the text.
  • The actual application scenarios of the sentence-pair classification task include question answering (judging whether a question matches an answer) and sentence matching (whether two sentences express the same meaning).
  • For the sentence-pair classification task, in addition to adding the [CLS] symbol and using its corresponding output as the semantic representation of the text, the BERT model uses a [SEP] symbol as a separator between the two input sentences and appends two different segment vectors to the two sentences to distinguish them. An input-format example is given below.
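  • For example, a question and a candidate answer can be fed to BERT as the single sequence "[CLS] question tokens [SEP] candidate tokens [SEP]": the output at [CLS] summarizes the pair for classification, while [SEP] and the two segment vectors mark where one sentence ends and the other begins.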
  • Sigmoid function: the sigmoid function is a common S-shaped function in biology, also known as the S-shaped growth curve. In information science, because it is monotonically increasing and has a monotonically increasing inverse function, it is often used as an activation function for neural networks.
  • The sigmoid function is also called the logistic function. It is used for the output of hidden-layer neural units; its value range is (0,1), so it can map a real number into the interval (0,1) and can be used for binary classification. It works well when the feature differences are complex or not particularly large.
  • Softmax function: the softmax function is a normalized exponential function that can "compress" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector such that each element lies in the range (0,1) and all elements sum to 1. This function is often used in multi-classification problems.
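  • For reference, the sigmoid function can be written as σ(x) = 1 / (1 + e^(-x)), and the softmax function as softmax(z)_i = e^(z_i) / Σ_k e^(z_k) for i = 1, …, K.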
  • Attention mechanism: in layman's terms, the attention mechanism focuses on the important points and ignores other, unimportant factors. Attention is divided into spatial attention and temporal attention; the former is used in image processing and the latter in natural language processing. The attention mechanism in this application is the temporal attention mechanism of natural language processing. The principle of attention is to calculate the matching degree between the current input sequence and the output vector: the higher the matching degree, the higher the relative score of the focus point. The matching-degree weights calculated by attention are limited to the current sequence pair.
  • AI: artificial intelligence.
  • the embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Machine reading comprehension (MRC) is a task that tests how well a machine understands natural language by asking it to answer questions based on a given context; it has the potential to revolutionize the interaction between humans and machines. MRC is widely used in question answering and dialogue systems.
  • The embodiments of the present application propose a question answer prediction method, a prediction device, an electronic device, and a storage medium, which can effectively select the parts of an article related to the question and efficiently delete useless text information, thereby improving the accuracy of predicted answers.
  • the question answer prediction method, prediction device, electronic device, and storage medium provided in the embodiments of the present application are specifically described through the following embodiments. First, the question answer prediction method in the embodiments of the present application is described.
  • the method for predicting the answer to a question provided in the embodiment of the present application relates to the technical field of artificial intelligence.
  • the method for predicting the answer to the question provided by the embodiment of the present application can be applied to the terminal, can also be applied to the server, and can also be software running on the terminal or the server.
  • the terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart watch;
  • The server end can be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms; the software can be an application that implements the question answer prediction method, but is not limited to the above forms.
  • The embodiments of the present application can be used in many general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • some embodiments of the present application provide a method for predicting answers to questions, including step S100 , step S200 , step S300 , step S400 , step S500 , step S600 and step S700 . These seven steps are described in detail below, and it should be understood that the method for predicting the answer to a question includes but is not limited to these seven steps.
  • Step S100 Obtain the original topic data to be predicted; wherein, the original topic data includes original article data and original question data to be answered.
  • the original topic data may be medical data or other text data. If the original topic data is medical data, the original topic data can be obtained through the medical cloud server, or can be obtained through other channels.
  • Step S200 Encoding the original article data and original question data according to the preset first pre-training model to obtain question encoding vectors and article encoding vectors.
  • the preset first pre-training model may be a BERT model or other neural network models, which is not specifically limited in this application.
  • the letter Q is used to represent the original question data
  • the letter P is used to represent the original article data.
  • The original question data Q and the original article data P are input into the BERT model, and the hidden states of the last layer of the BERT model are used as the encoding results of the original question data Q and the original article data P, yielding the question encoding vector and the article encoding vector, recorded as H_Q and H_P respectively; this accomplishes the encoding of the original question data and the original article data. A sketch of this step is shown below.
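  • The following is a minimal sketch of step S200, assuming the HuggingFace transformers library; the bert-base-chinese checkpoint and the example inputs are illustrative assumptions, not the patent's exact configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode(text: str) -> torch.Tensor:
    """Return the last-layer hidden states of the BERT encoder."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state.squeeze(0)  # shape: (seq_len, hidden)

H_Q = encode("original question text")  # question encoding vector H_Q
H_P = encode("original article text")   # article encoding vector H_P
```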
  • Step S300 Perform attention screening on the question encoding vector and the article encoding vector to obtain multiple candidate texts.
  • Step S400 Perform association processing on the original question data and each candidate text to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each candidate text, and an associated value; wherein, the associated value is used to represent the original question The correlation between the data and each candidate text.
  • Step S500 Perform answer screening processing on the question mark vector, multiple candidate texts and each candidate mark vector to obtain the corresponding confidence of each candidate text; where the confidence is used to represent the probability that the candidate text contains the candidate answer.
  • Step S600 Determine the candidate position according to the correlation value, the confidence level and the preset prediction threshold; where the candidate position is the position of the candidate answer.
  • Step S700 Match the corresponding candidate texts according to the candidate positions to obtain candidate answers.
  • Score_1 represents the correlation value;
  • Score_2 represents the confidence level.
  • The target score value Score is obtained by combining the correlation value Score_1 and the confidence degree Score_2 according to formula (1).
  • The candidate position is determined according to the target score value Score and the preset prediction threshold. If the target score value Score is greater than or equal to the preset prediction threshold, the candidate text is considered to contain the candidate answer at the candidate position; otherwise, the candidate text is considered not to contain the candidate answer, and a null character is output.
  • the corresponding candidate text is matched according to the candidate position, and the candidate answer corresponding to the original question data can be obtained.
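  • A minimal sketch of this decision step follows; combining the two scores by simple addition is an illustrative assumption, since the text does not reproduce formula (1).

```python
def decide(score1: float, score2: float, threshold: float, candidate: str) -> str:
    """Steps S600/S700: threshold the combined target score and either
    return the matched candidate answer or a null character."""
    score = score1 + score2  # illustrative stand-in for formula (1)
    return candidate if score >= threshold else ""  # "" = null character
```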
  • The question answer prediction method of the embodiment of the present application uses the preset first pre-training model to encode the original article data and the original question data in the original topic data to obtain the question encoding vector and the article encoding vector; it then performs attention screening on the question encoding vector and the article encoding vector to screen out multiple candidate texts; next, the original question data is associated with each candidate text to obtain the association value that characterizes the relevance between the original question data and each candidate text, the question mark vector corresponding to the original question data, and the candidate mark vector corresponding to each candidate text; answer screening is then performed on the question mark vector, the candidate texts, and the candidate mark vectors to obtain the confidence degree that characterizes the probability of each candidate text containing the candidate answer; finally, candidate positions are determined according to the association value, the confidence degree, and the preset prediction threshold, so as to determine the candidate answers.
  • Through the attention screening mechanism, the association processing, and the answer screening processing, useless text information that has nothing to do with the answer can be deleted, and the parts of the article related to the question can be effectively selected, thereby improving the accuracy of the predicted answer.
  • the original article data includes a plurality of original texts
  • each original text includes a plurality of text words
  • the original question data includes a plurality of question words.
  • Step S300 includes step S310, step S320, and step S330. These three steps will be described in detail below with reference to Fig. 2. It should be understood that step S300 includes but is not limited to steps S310 to S330.
  • Step S310 Perform an attention operation on the question encoding vector and the article encoding vector according to the preset first attention model to obtain an attention matrix; wherein the attention matrix includes a plurality of attention values, and each attention value is used to represent how important each text word is to a question word.
  • In step S310 of some embodiments, the original question data and the original article data are encoded through the above step S200; after the question encoding vector H_Q and the article encoding vector H_P are obtained, the preset first attention model performs an attention operation on the question encoding vector H_Q and the article encoding vector H_P.
  • The structure of the first attention model can adopt the match attention method, shown in formula (2), which takes the form:
  • A = SoftMax(H_P · W · (H_Q)^T + b ⊗ e)   (2)
  • where A represents the calculated attention matrix, SoftMax represents the softmax function, W is a weight matrix, b represents a bias, and e is a unit vector that broadcasts the bias.
  • The attention matrix of the question encoding vector H_Q and the article encoding vector H_P can be calculated by formula (2).
  • The structure of the first attention model can also use the cross attention calculation method. Its update formula is similar to the self-attention of the Transformer, likewise using the three matrices Q, K, and V to calculate the attention result; the difference is that here K and V are calculated from H_Q, and Q is calculated from H_P. The attention result of the final operation has the same format as that of match attention.
  • The dimension of the attention matrix A is consistent with the dimension of H_P(H_Q)^T, and the attention matrix A records the attention that each token embedding of the original question data pays to the original article data. That is, A_{i,j} can represent the importance of the j-th text word in the original article data to the i-th question word of the original question data, and the column sum A_{*,j} = Σ_i A_{i,j} can characterize the importance of the j-th text word in the original article data to the entire original question data.
  • Step S320 Obtain a preset first attention threshold.
  • Step S330 Screen the original text according to the first attention threshold and the attention matrix to obtain multiple candidate texts.
  • In step S330 of some embodiments, since A_{i,j} can represent the importance of the j-th text word in the original article data to the i-th question word of the original question data, the first attention threshold can be used to screen the original texts to obtain multiple candidate texts, where P_A denotes a candidate text.
  • Step S330 includes but is not limited to step S331 and step S332, which will be described in detail below in conjunction with Fig. 3.
  • Step S331 Calculate the attention value of the same text word on the original question data according to the attention matrix, and obtain the corresponding text attention value; wherein, the text attention value is used to represent the importance of the text word to the original question data.
  • Step S332 If the text attention value is greater than the first attention threshold, obtain the original text corresponding to the text word to obtain the corresponding candidate text.
  • the text attention value is less than or equal to the preset first attention threshold, it means that the text word is not important enough for the original question data, and it is judged as useless text information.
  • Since A_{i,j} can represent the importance of the j-th text word in the original article data to the i-th question word of the original question data, the column sum A_{*,j} = Σ_i A_{i,j} can characterize the importance of the j-th text word in the original article data to the entire original question data.
  • The attention values of the same text word with respect to the original question data are accumulated to obtain the corresponding text attention value, namely A_{*,j}. If A_{*,j} is greater than the preset first attention threshold, the original text corresponding to that text word is obtained and used as one of the multiple candidate texts. This operation is repeated to obtain multiple candidate texts; a sketch is shown below.
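  • The following is a minimal sketch of this attention screening step, assuming PyTorch; the match-attention parametrization, the threshold value, and the use of per-token sentence indices are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def attention_screen(H_Q, H_P, W, b, sentence_ids, threshold=0.5):
    """Select candidate texts from the article by accumulated attention.

    H_Q: (len_Q, d) question encoding; H_P: (len_P, d) article encoding.
    W: (d, d) weight matrix; b: bias broadcast over the scores, playing
    the role of the b ⊗ e term in formula (2).
    sentence_ids: (len_P,) sentence index of each article token.
    """
    # A[i, j]: importance of article word j to question word i, per formula (2).
    A = F.softmax(H_Q @ W @ H_P.T + b, dim=-1)   # (len_Q, len_P)
    word_importance = A.sum(dim=0)               # A_{*,j} = sum_i A_{i,j}
    keep = word_importance > threshold           # first attention threshold
    # An original text becomes a candidate if any of its words is kept.
    return sorted({int(s) for s, k in zip(sentence_ids, keep) if k})
```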
  • Step S400 includes step S410, step S420, and step S430. These three steps are described in detail below; it should be understood that step S400 includes but is not limited to steps S410 to S430.
  • Step S410 Input the original question data and each candidate text into a preset second pre-training model; wherein, the second pre-training model includes a first neural network and a second neural network.
  • Step S420 Classifying and labeling the original question data and each candidate text through the first neural network to obtain a question label vector and a candidate label vector for each candidate text.
  • Step S430 Perform mapping and classification processing on the question label vector and each candidate label vector through the second neural network to obtain corresponding correlation values.
  • In some embodiments, the second pre-training model can again adopt the BERT model, but its parameters differ from those of the BERT model used as the aforementioned first pre-training model. The first neural network is the CLS layer of BERT.
  • The question mark vector and the candidate mark vector are input into the second neural network for fine-tuning, mapping, and classification processing, and the association value Score_1 corresponding to the candidate text is obtained.
  • The association value Score_1 lies between 0 and 1; the higher the score, the greater the probability that the candidate answer is in the candidate text.
  • The association value Score_1 thus measures the relevance between the original question data and the candidate text.
  • the second neural network includes a fully connected layer and an activation classification layer.
  • Step S430 includes step S431 and step S432; these two steps will be described in detail below, and it should be understood that step S430 includes but is not limited to step S431 and step S432.
  • Step S431 Perform fully-connected processing on the question mark vector and each candidate mark vector through the fully-connected layer to obtain corresponding fully-connected values.
  • Step S432 Perform activation classification processing on the fully-connected values through the activation classification layer to obtain corresponding associated values.
  • In some embodiments, the second neural network includes a fully connected layer and an activation classification layer, and the activation classification layer is a sigmoid function.
  • The output of the last CLS layer of the BERT model is fed into the fully connected network for fine-tuning to obtain the fully connected value, and the fully connected value is then passed through a sigmoid function layer, which outputs the judgment score, giving the association value. A sketch of this head is shown below.
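  • A minimal sketch of this relevance head, assuming PyTorch; the hidden size and the use of the [CLS] hidden state as input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelevanceHead(nn.Module):
    """Second neural network: a fully connected layer followed by a
    sigmoid activation, applied to the [CLS] output of the second
    pre-training model."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.fc = nn.Linear(hidden_size, 1)  # fully connected layer
        self.act = nn.Sigmoid()              # activation classification layer

    def forward(self, cls_hidden: torch.Tensor) -> torch.Tensor:
        # cls_hidden: (batch, hidden_size) -> Score_1 in (0, 1)
        return self.act(self.fc(cls_hidden)).squeeze(-1)
```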
  • step S500 includes step S510 , step S520 and step S530 . It should be understood that step S500 includes but is not limited to these three steps, which will be described in detail below.
  • Step S510 Perform attention screening on the question label vector and each candidate label vector by using the preset second attention model to obtain a plurality of texts to be detected.
  • Step S520 Obtain a vector to be detected corresponding to the text to be detected according to the text to be detected and the candidate marker vectors.
  • Step S530 Perform screening and prediction processing on the text to be detected and the vector to be detected through the preset answer prediction model to obtain the corresponding confidence level of the text to be detected.
  • The structure of the second attention model may or may not be consistent with that of the foregoing first attention model; in either case, the calculation process of the attention result is similar.
  • The question mark vector and the candidate mark vector continue to be processed by BERT encoding and are then screened through a layer of attention structure to obtain multiple texts to be detected. Then, according to each text to be detected, matching is performed against the candidate mark vectors to obtain the vector to be detected corresponding to that text. Finally, the vectors to be detected and the texts to be detected are input into the answer prediction model to obtain the confidence corresponding to each text to be detected.
  • the answer prediction model includes a first fully connected multi-head network and a second fully connected multi-head network.
  • Step S530 includes but not limited to step S531, step S532 and step S533.
  • Step S531 Perform start prediction processing on the text to be detected and the vector to be detected through the first fully connected multi-head network to obtain the start prediction position of the text to be detected and the start mark position of the vector to be detected;
  • Step S532 Perform end prediction processing on the text to be detected and the vector to be detected through the second fully connected multi-head network to obtain the predicted end position of the text to be detected and the end mark position of the vector to be detected.
  • Step S533 Obtain the confidence corresponding to the text to be detected according to the start prediction position, the start mark position, the end prediction position and the end mark position.
  • In some embodiments, the hidden results corresponding to the candidate text P_A are input into two fully connected multi-head networks (the first fully connected multi-head network and the second fully connected multi-head network) with the same structure but different parameters.
  • The two multi-head networks predict the start position and the end position of the candidate answer respectively.
  • The output score for judging the start position is denoted s_i, and the output score for judging the end position is denoted e_i. Denoting the subscripts of the maxima output by the two fully connected multi-head networks as start and end respectively, the candidate answer is the text span from position start to position end.
  • The text to be detected and the vector to be detected undergo start prediction processing through the first fully connected multi-head network to obtain the start prediction position of the text to be detected and the start mark position of the vector to be detected, and undergo end prediction processing through the second fully connected multi-head network to obtain the end prediction position of the text to be detected and the end mark position of the vector to be detected.
  • Score_2 = (s_start - s_1) + (e_end - e_1)   (3)
  • The goal of Score_2 is similar to that of Score_1, but the starting point is different: the confidence Score_2 represents the confidence of the extracted answer.
  • In some embodiments, the target score value Score is obtained from Score_1 and Score_2. If the target score value Score is greater than the preset prediction threshold, the candidate answer span obtained above is used as the candidate answer; if the target score value Score is less than the preset prediction threshold, it means that there is no answer, and a null character is output. A sketch of this span prediction and decision step is shown below.
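  • A minimal sketch of the two span-prediction heads and the confidence of formula (3), assuming PyTorch; treating position 1 as the reference position and adding Score_1 and Score_2 are illustrative assumptions, since formula (1) is not reproduced in the text.

```python
import torch
import torch.nn as nn

class SpanHeads(nn.Module):
    """Two fully connected heads with the same structure but different
    parameters, scoring answer start and end positions."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.start_head = nn.Linear(hidden_size, 1)  # first multi-head network
        self.end_head = nn.Linear(hidden_size, 1)    # second multi-head network

    def forward(self, hidden: torch.Tensor):
        # hidden: (seq_len, hidden_size) for one candidate text P_A
        s = self.start_head(hidden).squeeze(-1)  # start scores s_i
        e = self.end_head(hidden).squeeze(-1)    # end scores e_i
        return s, e

def extract_span(tokens, s, e, score1, threshold):
    start, end = int(s.argmax()), int(e.argmax())
    # Formula (3): confidence of the span relative to reference position 1.
    score2 = (s[start] - s[1]) + (e[end] - e[1])
    if score1 + score2 >= threshold:  # illustrative stand-in for formula (1)
        return tokens[start:end + 1]  # candidate answer span
    return ""                         # null character: no answer
```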
  • some embodiments of the present application also propose a prediction device 800 for question answers, including an acquisition module 810, an encoding module 820, an attention screening module 830, an association module 840, and an answer screening module 850 , a processing module 860 and a matching module 870.
  • the obtaining module 810 is used to obtain the original topic data to be predicted; the original topic data includes original article data and original question data to be answered.
  • the encoding module 820 is configured to encode the original article data and the original question data according to the preset first pre-training model to obtain question encoding vectors and article encoding vectors.
  • the attention screening module 830 is used to perform attention screening processing on the question coding vector and the article coding vector to obtain a plurality of candidate texts;
  • the association module 840 is used to associate the original question data with each candidate text to obtain associated data; wherein the associated data includes a question mark vector, a candidate mark vector corresponding to each candidate text, and an associated value; wherein, the associated value is used for Characterize the correlation between the original question data and each candidate text.
  • The answer screening module 850 is used to perform answer screening processing on the question mark vector, the multiple candidate texts, and each candidate mark vector to obtain the confidence degree corresponding to each candidate text; wherein the confidence degree is used to represent the probability that the candidate text contains the candidate answer.
  • the processing module 860 is configured to determine the candidate position according to the correlation value, the confidence level and the preset prediction threshold; wherein the candidate position is the position of the candidate answer.
  • the matching module 870 is configured to match corresponding candidate texts according to candidate positions to obtain candidate answers.
  • The question answer prediction device 800 of the embodiment of the present application encodes the original article data and the original question data in the original topic data through the preset first pre-training model to obtain the question encoding vector and the article encoding vector; it then performs attention screening on the question encoding vector and the article encoding vector to screen out multiple candidate texts; next, the original question data is associated with each candidate text to obtain the association value that characterizes the relevance between the original question data and each candidate text, the question mark vector corresponding to the original question data, and the candidate mark vector corresponding to each candidate text; answer screening is then performed on the question mark vector, the candidate texts, and the candidate mark vectors to obtain the confidence degree that characterizes the probability of each candidate text containing the candidate answer; finally, the candidate position is determined according to the association value, the confidence degree, and the preset prediction threshold, so as to determine the candidate answer.
  • Through the attention screening mechanism, the association processing, and the answer screening processing, useless text information that has nothing to do with the answer can be deleted, and the parts of the article related to the question can be effectively selected.
  • the device for predicting the answer to the question in the embodiment of the present application corresponds to the method for predicting the answer to the question mentioned above.
  • For the specific prediction process, please refer to the above question answer prediction method, which will not be repeated here.
  • the embodiment of the present application also provides an electronic device, including:
  • The program is stored in the memory, and the processor executes the at least one program to implement the question answer prediction method of the present application, wherein the method includes: obtaining the original topic data to be predicted, where the original topic data includes the original article data and the original question data to be answered; encoding the original article data and the original question data according to the preset first pre-training model to obtain the question encoding vector and the article encoding vector; performing attention screening on the question encoding vector and the article encoding vector to obtain multiple candidate texts; associating the original question data with each candidate text to obtain associated data, wherein the associated data includes the question mark vector, the candidate mark vector corresponding to each candidate text, and the association value, and the association value is used to characterize the relevance between the original question data and each candidate text; performing answer screening on the question mark vector, the multiple candidate texts, and each candidate mark vector to obtain the confidence corresponding to each candidate text, wherein the confidence is used to characterize the probability that the candidate text contains the candidate answer; determining the candidate position according to the association value, the confidence, and the preset prediction threshold, where the candidate position is the position of the candidate answer; and matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  • Figure 9 illustrates the hardware structure of an electronic device in another embodiment, the electronic device includes:
  • The processor 910 may be implemented by a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to realize the technical solutions provided by the embodiments of the present application;
  • the memory 920 may be implemented in the form of a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 920 can store operating systems and other application programs.
  • The relevant program codes are stored in the memory 920 and called by the processor 910 to execute the question answer prediction method of the embodiments of the present application;
  • the input/output interface 930 is used to realize information input and output
  • the communication interface 940 is used to realize the communication interaction between the device and other devices, and the communication can be realized through a wired method (such as USB, network cable, etc.), or can be realized through a wireless method (such as a mobile network, WIFI, Bluetooth, etc.);
  • bus 950 to transfer information between various components of the device (eg, processor 910, memory 920, input/output interface 930, and communication interface 940);
  • the processor 910 , the memory 920 , the input/output interface 930 and the communication interface 940 are connected to each other within the device through the bus 950 .
  • The embodiment of the present application also provides a storage medium, which is a computer-readable storage medium storing computer-executable instructions; the computer-executable instructions are used to cause a computer to execute the question answer prediction method.
  • The question answer prediction method includes: obtaining the original topic data to be predicted, wherein the original topic data includes the original article data and the original question data to be answered; encoding the original article data and the original question data according to the preset first pre-training model to obtain the question encoding vector and the article encoding vector; performing attention screening on the question encoding vector and the article encoding vector to obtain multiple candidate texts; associating the original question data with each candidate text to obtain associated data, wherein the associated data includes the question mark vector, the candidate mark vector corresponding to each candidate text, and the association value, and the association value is used to characterize the relevance between the original question data and each candidate text; performing answer screening on the question mark vector, the multiple candidate texts, and each candidate mark vector to obtain the confidence corresponding to each candidate text, wherein the confidence is used to characterize the probability that the candidate text contains the candidate answer; determining the candidate position according to the association value, the confidence, and the preset prediction threshold; and matching the corresponding candidate text according to the candidate position to obtain the candidate answer.
  • the computer readable storage medium can be nonvolatile or volatile.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" is used to describe the association relationship of associated objects and indicates that three relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character "/" generally indicates that the contextual objects are in an "or" relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • For example, at least one (item) of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of units is only a logical function division; in actual implementation, there may be other division methods.
  • For example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes multiple instructions to make an electronic device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods in the various embodiments of the present application.
  • The aforementioned storage medium includes: USB flash drives, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store programs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a question answer prediction method and prediction apparatus, an electronic device, and a storage medium. The method comprises: acquiring original article data to be predicted and original question data to be answered; encoding the original article data and the original question data according to a preset first pre-training model to obtain a question encoding vector and an article encoding vector; performing attention screening on the question encoding vector and the article encoding vector to obtain a plurality of candidate texts; performing association processing on the original question data and the candidate texts to obtain a question mark vector, a candidate mark vector, and an association value; performing answer screening on the question mark vector, the candidate texts, and the candidate mark vector to obtain a corresponding confidence; determining a candidate position according to the association value, the confidence, and a preset prediction threshold; and matching a corresponding candidate text according to the candidate position to obtain a candidate answer. The method can improve the accuracy of predicting an answer to a question.
PCT/CN2022/090750 2022-01-11 2022-04-29 Question answer prediction method and prediction apparatus, electronic device, and storage medium WO2023134085A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210025867.7A CN114416962A (zh) 2022-01-11 2022-01-11 Question answer prediction method, prediction device, electronic device, and storage medium
CN202210025867.7 2022-01-11

Publications (1)

Publication Number Publication Date
WO2023134085A1 true WO2023134085A1 (fr) 2023-07-20

Family

ID=81272360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090750 2022-01-11 2022-04-29 Question answer prediction method and prediction apparatus, electronic device, and storage medium WO2023134085A1 (fr)

Country Status (2)

Country Link
CN (1) CN114416962A (fr)
WO (1) WO2023134085A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540730A (zh) * 2023-10-10 2024-02-09 鹏城实验室 Text annotation method and device, computer equipment, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491433A (zh) * 2018-02-09 2018-09-04 平安科技(深圳)有限公司 Chat response method, electronic device, and storage medium
US20180300314A1 (en) * 2017-04-12 2018-10-18 Petuum Inc. Constituent Centric Architecture for Reading Comprehension
CN110647629A (zh) * 2019-09-20 2020-01-03 北京理工大学 Multi-document machine reading comprehension method with multi-granularity answer ranking
CN113486203A (zh) * 2021-07-09 2021-10-08 平安科技(深圳)有限公司 Data processing method and device based on a question answering platform, and related equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300314A1 (en) * 2017-04-12 2018-10-18 Petuum Inc. Constituent Centric Architecture for Reading Comprehension
CN108491433A (zh) * 2018-02-09 2018-09-04 平安科技(深圳)有限公司 Chat response method, electronic device, and storage medium
CN110647629A (zh) * 2019-09-20 2020-01-03 北京理工大学 Multi-document machine reading comprehension method with multi-granularity answer ranking
CN113486203A (zh) * 2021-07-09 2021-10-08 平安科技(深圳)有限公司 Data processing method and device based on a question answering platform, and related equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540730A (zh) * 2023-10-10 2024-02-09 鹏城实验室 Text annotation method and device, computer equipment, and storage medium

Also Published As

Publication number Publication date
CN114416962A (zh) 2022-04-29

Similar Documents

Publication Publication Date Title
CN110795543B (zh) Unstructured data extraction method and device based on deep learning, and storage medium
CN110737801B (zh) Content classification method and device, computer equipment, and storage medium
CN111753060A (zh) Information retrieval method, device, equipment, and computer-readable storage medium
CN112131883B (zh) Language model training method and device, computer equipment, and storage medium
CN113255320A (zh) Entity relation extraction method and device based on syntax trees and a graph attention mechanism
CN114897060B (zh) Training method and device for a sample classification model, and sample classification method and device
CN113705315B (zh) Video processing method, device, equipment, and storage medium
CN114358007A (zh) Multi-label recognition method and device, electronic device, and storage medium
CN111881292B (zh) Text classification method and device
CN113239169A (zh) Artificial-intelligence-based answer generation method, device, equipment, and storage medium
CN113128431B (zh) Video clip retrieval method, device, medium, and electronic device
CN112632244A (zh) Human-machine dialogue optimization method, device, computer equipment, and storage medium
CN114416995A (zh) Information recommendation method, device, and equipment
CN111145914B (zh) Method and device for determining text entities of a lung cancer clinical disease database
CN115827819A (zh) Intelligent question answering processing method and device, electronic device, and storage medium
CN113392265A (zh) Multimedia processing method, device, and equipment
CN114691864A (zh) Text classification model training method and device, and text classification method and device
WO2023134085A1 (fr) Question answer prediction method and prediction apparatus, electronic device, and storage medium
CN114490949A (zh) Document retrieval method, device, equipment, and medium based on the BM25 algorithm
CN114091452A (zh) Adapter-based transfer learning method, device, equipment, and storage medium
CN113449081A (zh) Text feature extraction method, device, computer equipment, and storage medium
CN111445545B (zh) Text-to-texture-map method, device, storage medium, and electronic device
CN117493491A (zh) Natural language processing method and system based on machine learning
CN116956925A (zh) Electronic medical record named entity recognition method and device, electronic device, and storage medium
CN114491076B (zh) Data enhancement method, device, equipment, and medium based on a domain knowledge graph

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919725

Country of ref document: EP

Kind code of ref document: A1