CN115455160A - Multi-document reading comprehension method, apparatus, device and storage medium

Multi-document reading comprehension method, apparatus, device and storage medium

Info

Publication number: CN115455160A
Application number: CN202211071561.1A
Authority: CN (China)
Prior art keywords: answer, model, training, representation, reordering
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN115455160B (English)
Inventor: Yang Tao (杨韬)
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd

Events:
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211071561.1A
Publication of CN115455160A
Application granted
Publication of CN115455160B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F16/35: Clustering; Classification
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods


Abstract

The embodiment of the application provides a multi-document reading comprehension method, apparatus, device and storage medium, which obtain the semantic features shared among the reply documents from which the same answer was extracted, thereby improving the ranking quality of an answer reordering model and the accuracy of multi-document reading comprehension. The method comprises the following steps: acquiring a target question and a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model; calling the answer extraction model to obtain a set of predicted answers given by the plurality of reply documents for the target question; grouping identical predicted answers to obtain at least one answer set; calling the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set; calling the answer reordering model to obtain a prediction score for the predicted answer corresponding to each answer representation set; and outputting a final answer to the target question according to the prediction scores. The application can be applied to the field of artificial intelligence.

Description

Multi-document reading comprehension method, apparatus, device and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a multi-document reading comprehension method, apparatus, device, and storage medium.
Background
Machine Reading Comprehension (MRC) is an important task in the field of Natural Language Processing (NLP). Its aim is to enable a machine to extract relevant information and knowledge from a given question and article so as to obtain an answer. Compared with basic natural language processing tasks such as Named Entity Recognition (NER) and relation extraction, MRC is a more complex, higher-level task: it demands a deeper understanding of semantics and extracts more textual information. It can be applied in many fields, such as search engines.
Search engines are a very important way for people to obtain information: a large number of users search through them, and a large share of those queries are question-and-answer style queries. However, a conventional web search engine generally performs a "matching" task, i.e., it matches the query question against candidate documents by relevance rather than understanding the question more precisely.
Intelligent question-answering technology remedies exactly this limitation of traditional search engines. The user submits a natural language query to the system, and the system directly returns an answer that meets the user's need, reducing manual effort and turning the process of acquiring information and knowledge into a question-and-answer interaction. For the user, this delivers the answer in the shortest time and improves the search experience; for the content provider, an answer displayed at the top gains more exposure and traffic, which benefits the content ecosystem. Therefore, a solution is needed that can improve the accuracy of multi-document reading comprehension.
Disclosure of Invention
The embodiment of the application provides a multi-document reading comprehension method, apparatus, device and storage medium, which classify the predicted answers extracted from a plurality of reply documents and obtain the semantic features shared among the reply documents from which the same answer was extracted, thereby improving the ranking quality of the answer reordering model and the accuracy of multi-document reading comprehension.
In view of this, an aspect of the present application provides a multi-document reading comprehension method, including: acquiring a target question and a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model;
calling the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question;
grouping identical predicted answers in the predicted answer set to obtain at least one answer set;
calling the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and each answer representation in the at least one answer representation set is a representation vector of the corresponding predicted answer;
calling the answer reordering model to obtain the prediction scores of the prediction answers corresponding to the answer representation set;
and outputting a final answer corresponding to the target question according to the prediction score.
Another aspect of the present application provides a multi-document reading comprehension apparatus, including an acquisition module, a processing module and an output module, wherein the acquisition module is used for acquiring a target question and a plurality of reply documents corresponding to the target question and establishing an answer extraction model and an answer reordering model;
the processing module is used for calling the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question; grouping identical predicted answers in the predicted answer set to obtain at least one answer set; calling the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer representation in the at least one answer representation set is a representation vector of the predicted answer corresponding to each answer set in the at least one answer set; and calling the answer reordering model to obtain the prediction score of the predicted answer corresponding to each answer representation set;
and the output module is used for outputting the final answer corresponding to the target question according to the prediction score.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the processing module is specifically configured to invoke the answer reordering model to obtain a first output vector set of the target question and each reply document in a first reply document set, where the first reply document set includes the reply documents corresponding to a first answer set in the at least one answer set;
selecting, from the first output vector set, the vectors of the characters of the predicted answer corresponding to the first reply document set and pooling them to obtain an intermediate answer representation set of the predicted answer in the first answer set;
performing self-attention processing on the intermediate answer representation set to obtain a first answer representation of the predicted answer corresponding to the first answer set;
and repeating the steps to obtain the answer representation of the predicted answer corresponding to each answer set in the at least one answer set, and classifying the answer representation into the at least one answer representation set, wherein the first answer representation is included in the at least one answer representation set.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the processing module is specifically configured to splice the characters of the target question with the characters of each reply document in the first reply document set, using a start character and a separator character, to obtain a first word sequence set;
and inputting the first word sequence set into the answer reordering model to obtain the first output vector set.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the processing module is specifically configured to perform self-attention processing on each intermediate answer representation in the intermediate answer representation set to obtain a self-attention prediction score set;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
and carrying out a weighted summation of the intermediate answer representations, weighted by the normalized prediction scores in the normalized prediction score set, to obtain the first answer representation.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the obtaining module is further configured to obtain a training sample set, and establish an initial answer extraction model and an initial answer reordering model, where the training sample set includes a question sample set and a reply document sample set corresponding to the question sample set;
the device also comprises a training module which is used for training the initial answer extraction model and the initial answer reordering model by utilizing the training sample set to obtain the answer extraction model and the answer reordering model.
In a possible design, in another implementation manner of another aspect of the embodiment of the present application, the initial answer extraction model and the initial answer reordering model share the same coding layer, and the training module is specifically configured to call the coding layer to obtain a second output vector corresponding to the training sample set;
inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
inputting the training answer set into the initial answer reordering model to obtain a second loss value;
and reversely adjusting the weighting parameters of the initial answer extraction model and the initial answer reordering model by utilizing the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the training module is specifically configured to input the second output vector into the initial answer extraction model to obtain the training answer set;
obtaining, from the second output vector, the prediction scores of the start word and the end word corresponding to each training answer in the training answer set;
converting the predicted score of the start word into a first probability value by using an activation function, and converting the predicted score of the end word into a second probability value by using the activation function;
obtaining the first loss value according to the first probability value and the second probability value;
the training module is specifically configured to input the training answer set into the initial answer reordering model to obtain a training answer representation set corresponding to the training answer set;
acquiring a training prediction score set corresponding to each training answer in the training answer representation set;
converting each training prediction score in the training prediction score set into a third probability value by using an activation function;
and obtaining the second loss value by using the cross entropy and the third probability value.
Another aspect of the present application provides a computer device, comprising: a memory, a processor, and a bus system;
wherein, the memory is used for storing programs;
a processor for executing the program in the memory, the processor being configured to perform the method of the above aspects according to instructions in the program;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.
According to the technical scheme, the embodiment of the application has the following advantages: the predicted answers extracted from the plurality of reply documents are classified, and the semantic features shared among the reply documents from which the same answer was extracted are obtained, so that the ranking quality of the answer reordering model is improved, and the accuracy of multi-document reading comprehension is improved.
Drawings
FIG. 1 is a schematic flow diagram of a question-answering system;
FIG. 2 is a schematic diagram of an architecture of an application system of the multi-document reading comprehension method in an embodiment of the present application;
FIG. 3 is a diagram illustrating an architecture of the answer extraction model and the answer reordering model in an embodiment of the present application;
FIG. 4a is a schematic diagram of a training architecture of the answer extraction model and the answer reordering model in an embodiment of the present application;
FIG. 4b is a diagram illustrating a network architecture of the answer extraction model and the answer reordering model in an embodiment of the present application;
FIG. 5a is a schematic diagram of another training architecture of the answer extraction model and the answer reordering model in an embodiment of the present application;
FIG. 5b is a schematic diagram of another network architecture of the answer extraction model and the answer reordering model in an embodiment of the present application;
FIG. 6 is a diagram of an embodiment of the multi-document reading comprehension method in an embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of a multi-document reading comprehension apparatus in an embodiment of the present application;
FIG. 8 is a schematic view of an embodiment of a multi-document reading comprehension apparatus in an embodiment of the present application;
FIG. 9 is a schematic view of an embodiment of a multi-document reading comprehension apparatus in an embodiment of the present application;
FIG. 10 is a schematic view of an embodiment of a multi-document reading comprehension apparatus in an embodiment of the present application.
Detailed Description
The embodiment of the application provides a multi-document reading comprehension method, apparatus, device and storage medium, which classify the predicted answers extracted from a plurality of reply documents and obtain the semantic features shared among the reply documents from which the same answer was extracted, thereby improving the ranking quality of the answer reordering model and the accuracy of multi-document reading comprehension.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Moreover, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Since the present application involves a number of terms of art, these terms are explained first.
Word segmentation: the recombination of a continuous character sequence into a word sequence according to certain specifications. It lets the computer simulate a human's understanding of a sentence so as to recognize the words in it.
Entity words refer to things that can exist independently and serve as the basis of all attributes, i.e., words that can represent an entity. Nouns and pronouns, such as "wife", are entity words.
Intention words make the objective to be achieved clearly recognizable; they are words that can express a question. For example, "who" and "where" are intention words.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
Deep Learning (DL) is a branch of machine Learning, an algorithm that attempts to perform high-level abstraction of data using multiple processing layers that contain complex structures or consist of multiple nonlinear transformations.
Neural Networks (NN): a deep learning model in the fields of machine learning and cognitive science that mimics the structure and function of biological neural networks.
Information extraction: extracting specific event or fact information from natural language text, helping users automatically classify, extract and reconstruct massive content. The specific event or fact information generally includes entities, relationships and events. For example, extracting the time, place and key people from news, or the product name, development time and performance indicators from technical documents. Because information extraction can pull out the information frames and fact information that users are interested in from natural language, it is widely applied in knowledge graphs, information retrieval, question-answering systems, sentiment analysis and text mining. Information extraction mainly comprises three subtasks: entity extraction and linking, relation extraction, and event extraction. Entity extraction and linking corresponds to named entity recognition. Relation extraction is triple extraction and is mainly used to extract the relationships between entities. Event extraction is equivalent to the extraction of a multivariate relation.
Relation Extraction (RE): given an entity pair and text containing that entity pair, the aim is to determine the semantic relationship of the entity pair based on the text. For example, given the entity pair (Nation M, national president) and the text ("Presidential candidate A defeated presidential candidate B in the most recent general election, becoming the next president of Nation M ..."), the user wishes to identify that the relationship between the entities "presidential candidate A" and "Nation M" is "national president". In relation extraction, a set of relations, such as "national president", is typically predefined.
Relationship Classification (RC), a modeling manner of relationship extraction, that is, converting relationship extraction into a Classification problem, where each relationship corresponds to a category.
A Question Answering system (QA): given a text and a question, it identifies the location of the answer to the question within the text.
Machine Reading Comprehension (MRC) is an important task in the field of Natural Language Processing (NLP). Its aim is to allow a machine to extract relevant information and knowledge from a given question and article so as to obtain an answer. Compared with basic natural language processing tasks such as Named Entity Recognition (NER) and relation extraction, MRC is a more complex, higher-level task: it demands a deeper understanding of semantics and extracts more textual information. It can be applied in a variety of fields, such as search engines. Search engines are a very important way for people to obtain information: a large number of users search through them, and a large share of those queries are question-and-answer style queries. However, a conventional web search engine generally performs a "matching" task, i.e., it matches the query question against candidate documents by relevance rather than understanding the question more precisely. Intelligent question-answering technology remedies exactly this limitation of traditional search engines. The user submits a natural language query to the system, and the system directly returns an answer that meets the user's need, reducing manual effort and turning the process of acquiring information and knowledge into a question-and-answer interaction. For the user, this delivers the answer in the shortest time and improves the search experience; for the content provider, an answer displayed at the top gains more exposure and traffic, which benefits the content ecosystem. Current mainstream intelligent question-answering technology mainly comprises three important modules: an article retrieval module, an answer extraction module and an answer reordering module. The specific process can be as shown in fig. 1:
Input query characters are acquired and filtered by question-answering intention to obtain the specific query question; relevant reply documents are then retrieved from the paragraph index library by the article retrieval module; and the multiple documents are then fed into the multi-document answer extraction module and the answer reordering module to obtain the final answer. The answer extraction module extracts an answer from each retrieved paragraph, and the answer reordering module ranks all the extracted answers uniformly and finally selects the answer with the highest score as the final answer to the query question (query). That is, answer extraction and answer reordering run as a pipeline: extraction first, reordering second. Answer reordering is generally based on handcrafted features: for example, the number of reply documents from which an answer was extracted, the positions of the answer in the reply documents, and the scores of the answer extraction module are used to train a Logistic Regression (LR) linear model or a Gradient Boosting Decision Tree (GBDT) model. However, in this process, the reply documents from which the answers are extracted are treated independently of one another, and the semantic features of the reply documents are not used.
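The following Python sketch illustrates the retrieve-extract-rerank pipeline described above; all component names (index, extractor, reranker and their methods) are hypothetical placeholders rather than an API defined by this application.

    # Minimal sketch of the retrieve -> extract -> rerank pipeline (assumed interfaces).
    def answer_question(query, index, extractor, reranker, top_n=10):
        documents = index.retrieve(query, top_n)        # article retrieval module
        candidates = [extractor.extract(query, doc)     # one answer per reply document
                      for doc in documents]
        scored = reranker.score(query, candidates)      # unified answer reordering
        return max(scored, key=lambda c: c.score)       # highest-scoring final answer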
In order to solve this problem, the application provides the following technical scheme: acquiring a target question and a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model; calling the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question; grouping identical predicted answers in the predicted answer set to obtain at least one answer set; calling the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and each answer representation in the at least one answer representation set is a representation vector of the corresponding predicted answer; calling the answer reordering model to obtain the prediction score of the predicted answer corresponding to each answer representation set; and outputting a final answer to the target question according to the prediction scores. In this way, the predicted answers extracted from the plurality of reply documents are classified, and the semantic features shared among the reply documents from which the same answer was extracted are obtained, so that the ranking quality of the answer reordering model is improved, and the accuracy of multi-document reading comprehension is improved.
The method provided by the present application is applied to the system architecture shown in fig. 2. Fig. 2 is a schematic diagram of the system architecture in the embodiment of the present application. As shown in fig. 2, the system architecture includes a server and a terminal device, and a client (e.g., a search engine or a social software applet) is deployed on the terminal device. The client may run on the terminal device in browser form or as an independent application (APP); the specific presentation form of the client is not limited here.

The server involved in the application may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data and artificial intelligence platforms. The terminal device may be a smartphone, tablet computer, notebook computer, palmtop computer, personal computer, smart television, smart watch, vehicle-mounted device, wearable device, and the like, but is not limited thereto. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which the application does not limit. The numbers of servers and terminal devices are also not limited. The scheme provided by the application may be completed independently by the terminal device, independently by the server, or by the terminal device and the server in cooperation; the application places no particular limit on this.

The reply documents referred to in the present application may be stored in a database. A database can be regarded as an electronic filing cabinet, i.e., a place for storing electronic files, in which users can add, query, update and delete data. A "database" is a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of any application. A Database Management System (DBMS) is computer software designed for managing databases and generally has basic functions such as storage, retrieval, security assurance and backup. Database management systems may be classified by the database model they support, such as relational or Extensible Markup Language (XML); by the type of computer they support, such as server cluster or mobile phone; by the query language used, such as Structured Query Language (SQL) or XQuery; by performance emphasis, such as maximum size or maximum operating speed; or by other classification schemes. Regardless of the classification used, some DBMSs can span categories, for example, supporting multiple query languages simultaneously.
It is understood that in the specific implementation of the present application, related data such as response documents, when the above embodiments of the present application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
It is understood that, in the embodiment of the present application, before performing multi-document reading understanding, a training method of an answer extraction model and an answer reordering model may also be provided in the embodiment of the present application. In an exemplary scheme, the method for training the answer extraction model and the answer reordering model provided in the embodiment of the present application can be executed by a computer device. An implementation environment of the method for training the answer extraction model and the answer reordering model provided in the embodiment of the present application is introduced below, and fig. 3 is a schematic diagram of an implementation environment of the method for training the answer extraction model and the answer reordering model provided in the embodiment of the present application. Referring to fig. 3, the implementation environment includes a terminal device 101 and a server 102. The terminal device 101 and the server 102 can be connected directly or indirectly through wired or wireless communication, and the application is not limited herein.
In some embodiments, the terminal device 101 is a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smart watch, smart voice interaction device, smart home appliance, vehicle-mounted terminal, or the like, but is not limited thereto. A supporting client is installed and runs on the terminal device 101; the client may run on the terminal device 101 in browser form or as an independent application (APP), and the specific presentation form of the client is not limited here. In some embodiments, the server 102 is an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data and artificial intelligence platforms. The server 102 is used to provide background services for the supported application programs. In some embodiments, the server 102 undertakes the primary computing tasks and the terminal device 101 undertakes secondary computing tasks, e.g., the terminal device 101 provides sample data to the server 102 and the server 102 runs the training process of the models; or the server 102 and the terminal device 101 perform cooperative computing using a distributed computing architecture.
It is understood that the number of the terminal devices 101 may be more or less. For example, the number of the terminal devices 101 may be only one, or the number of the terminal devices 101 may be several tens or hundreds, or more. That is, the number and the device type of the terminal device 101 are not limited in the embodiment of the present application.
Next, the training architecture of the method for training the answer extraction model and the answer reordering model provided in the embodiment of the present application is introduced. For example, in the training architecture shown in fig. 4a, a training sample set for the answer extraction model is acquired, and the real extracted answers in the training sample set serve as supervision data for training the answer extraction model. After the training sample set is obtained, the overall training process may be as follows: the training sample set is input into the answer extraction model, which outputs predicted answers and other results; a first loss between the predicted answers and the real extracted answers is then calculated, and the parameters of the answer extraction model are adjusted in reverse according to the first loss value, so as to train the answer extraction model until the training termination condition is reached. The predicted answers are then grouped so that identical predicted answers fall into the same set, yielding a plurality of predicted answer sets; for each of these sets, a final answer representation of its answer is obtained through the representation layer and coding layer of the answer reordering model; the final answer representations are input into the answer reordering model to obtain the predicted score of the predicted answer corresponding to each set, and a second loss is calculated using the predicted scores and the real answers. Finally, the parameters of the answer reordering model are adjusted in reverse according to the second loss, so as to train the answer reordering model until the training termination condition is reached. It is understood that, for the training architecture shown in fig. 4a, the network architecture of the answer extraction model and the answer reordering model can be as shown in fig. 4b.
It can be understood that, in order to improve the training effect of the answer extraction model and the answer reordering model, another training architecture is introduced below. In the training architecture shown in fig. 5a, a training sample set for the answer extraction model is acquired, and the real extracted answers in the training sample set serve as supervision data for training the answer extraction model. After the training sample set is obtained, the overall training process may be as follows: the training sample set is input into the answer extraction model, which outputs predicted answers and other results, and a first loss is calculated from the predicted answers and the real extracted answers; the predicted answers are grouped so that identical predicted answers fall into the same set, yielding a plurality of predicted answer sets; for each predicted answer set, a final answer representation of its answer is obtained through the representation layer and coding layer of the answer reordering model; the final answer representations are input into the answer reordering model to obtain the predicted score of the predicted answer corresponding to each set, and a second loss is calculated using the predicted scores and the real answers. Finally, the sum of the second loss and the first loss value is taken as the overall loss value of the answer extraction model and the answer reordering model, and the parameters of both models are adjusted in reverse according to the overall loss value, so as to train the answer extraction model and the answer reordering model jointly until the training termination condition is reached. It is understood that, for the training architecture shown in fig. 5a, the network structure of the answer extraction model and the answer reordering model can be as shown in fig. 5b. In this process, the answer extraction model calculates a prediction score and a probability value for each character (i.e., token) through the formulas logits_i = v·emb_i and P_i = softmax(logits_i), then selects a predicted answer according to the per-character prediction scores and probability values, and determines the probability of the start position (i.e., start token) and the probability of the end position (i.e., end token) of the predicted answer, namely the start character and the end character of the predicted answer; the first loss of the answer extraction model is then obtained according to the formula Loss1 = -log P_start - log P_end. Meanwhile, the answer reordering model calculates its loss by the formula Loss2 = -log(Prob_label), and the total loss of the answer extraction model and the answer reordering model is Loss1 + Loss2.
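As a minimal sketch (not the patented implementation), the joint objective above can be written in PyTorch as follows; the separate start/end scoring vectors and the tensor shapes are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    # token_embs: (seq_len, hidden) encoder outputs for one question+document pair
    # rerank_scores: (num_answers,) reordering scores over the grouped answers
    def joint_loss(token_embs, v_start, v_end, start_idx, end_idx,
                   rerank_scores, gold_answer_idx):
        start_logits = token_embs @ v_start                   # logits_i = v . emb_i
        end_logits = token_embs @ v_end
        p_start = F.softmax(start_logits, dim=-1)[start_idx]  # P_i = softmax(logits_i)
        p_end = F.softmax(end_logits, dim=-1)[end_idx]
        loss1 = -torch.log(p_start) - torch.log(p_end)        # Loss1
        loss2 = F.cross_entropy(rerank_scores.unsqueeze(0),   # Loss2 = -log(Prob_label)
                                torch.tensor([gold_answer_idx]))
        return loss1 + loss2                                  # joint backward pass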
Answer extraction in this embodiment refers to finding the answer A to a question, given the question Q and one or more text paragraphs P (P1, P2, P3, ..., Pn). Machine reading comprehension means that, given a text paragraph (Paragraph) and a question (Question), the answer (Answer) is obtained. It is generally assumed that the answer is contained in the text, so the goal of the machine reading comprehension task is to obtain a span (start, end), where start represents the position of the answer's first character in the paragraph and end represents the position of the answer's last character in the paragraph. Natural language understanding is performed on the question and the corresponding text, and the answer to the question is predicted from the text. In the extractive reading comprehension task, the answer is a contiguous string occurring in the original text: the answer must be a span within the text. Machine reading comprehension has several modes, i.e., different types of questions and different answer types. Generally there are three types of questions: simple questions, which can be answered with simple facts, where the answer is usually an entity and short; somewhat more complex narrative questions, with somewhat longer answers; and complex questions, usually about a viewpoint or opinion. Answer reordering means ranking the answers obtained in answer extraction uniformly and finally selecting the answer with the highest score as the answer to the given question Q.
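For illustration only, a simple way to realize the span formulation above is to pick the (start, end) pair with the highest combined probability, subject to start <= end and an assumed maximum answer length:

    # Sketch of inference-time span selection under the (start, end) formulation.
    def best_span(start_probs, end_probs, max_len=30):
        best, best_score = (0, 0), 0.0
        for s in range(len(start_probs)):
            for e in range(s, min(s + max_len, len(end_probs))):
                score = start_probs[s] * end_probs[e]   # joint span probability
                if score > best_score:
                    best, best_score = (s, e), score
        return best   # character positions of the answer span in the paragraph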
In this embodiment, the answer extraction model and the answer reordering model may each be a GBDT model, a linear model or a deep model, which is not limited here.
It should be noted that the implementation environment of the multi-document reading comprehension method provided in the embodiment of the present application may be the same as or different from the implementation environment of the training method of the answer extraction model and the answer reordering model; the embodiment of the present application does not limit this.
With reference to fig. 6, taking a terminal device as the execution subject and using the answer extraction model and answer reordering model obtained with the training architecture shown in fig. 5a, an embodiment of the multi-document reading comprehension method in the present application is described below and includes:
601. Acquire a target question and a plurality of reply documents corresponding to the target question, and establish an answer extraction model and an answer reordering model.
The terminal device acquires the target question entered by the user through its input device, and then acquires a plurality of reply documents corresponding to the target question through article retrieval. Meanwhile, the terminal device may itself be provided with the answer extraction model and the answer reordering model, or be connected to a server on which these models are deployed. In this embodiment, the answer extraction model and the answer reordering model can be obtained by training with the training method shown in fig. 4a or fig. 4b.
In this embodiment, article retrieval may also be called text recall and may proceed as follows. After the user enters the target question through the input interface of the terminal and completes the input, the target question is sent to the server. After receiving the target question, the server recalls candidate texts (i.e., reply documents) according to the target question to obtain at least one corresponding reply document. A reply document may be a text in a knowledge base, where the knowledge base may be a database previously stored on the server by the user, or a text outside the knowledge base, such as newly published news on a web page or an article from an official account. In this process, after receiving the target question, the server may perform word segmentation on the target question to obtain its keywords and determine the weight of each keyword in the target question. Having obtained the keywords, the server may determine the relevance between each keyword and a reply document. With the keyword weights and the keyword-document relevances in hand, the server can compute a weighted sum of them to obtain the relevance score of the target question with respect to each reply document. The server may then sort the reply documents in descending order of these relevance scores, take the first N documents, and use them as the reply documents for subsequent processing.
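A minimal sketch of this scoring step, assuming the keyword weighting and keyword-document relevance functions are given (e.g., TF-IDF-style statistics), might be:

    # Sketch: weighted sum of keyword-document relevances, then keep the top N.
    def rank_documents(keywords, keyword_weights, documents, relevance, n=6):
        scored = []
        for doc in documents:
            score = sum(keyword_weights[kw] * relevance(kw, doc)
                        for kw in keywords)
            scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)  # descending relevance
        return [doc for _, doc in scored[:n]]                # first N reply documents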
602. Call the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question.
The terminal device inputs the target question and the plurality of reply documents into the answer extraction model to obtain the predicted answer set of the reply documents for the target question, i.e., an answer is extracted from each reply document. In an exemplary scheme, if the target question is "Who launched the Chenqiao Mutiny in history?", six reply documents may be obtained through article retrieval, each corresponding to one answer, and the predicted answer set may be as follows: four reply documents yield the answer "Zhao Kuangyin", one yields "Zhao Guangyi", and one yields "Chai Zongxun".
In this embodiment, before the terminal device inputs the target question and the plurality of reply documents into the answer extraction model, data preprocessing is also required. First, the target question and the reply documents are tokenized, i.e., each sentence is converted into a character-level sequence. The target question sequence and a reply document sequence are then concatenated, separated by [SEP], with [CLS] added at the beginning, forming "[CLS] target question [SEP] reply document [SEP]", and the concatenated sequence is padded. After preprocessing, when the length of the input question-plus-document sequence exceeds the maximum sequence length of the BERT network in the answer extraction model, the reply document is split into several segments with a certain stride, each concatenated with the question separately; adjacent segments share an overlapping part of a certain length, so that splitting degrades the semantics of the complete reply document as little as possible. Suppose the target question is "Who launched the Chenqiao Mutiny in history?", reply document 1 is "Song Taizu Zhao Kuangyin established the Northern Song dynasty after launching the Chenqiao Mutiny", and reply document 2 is "the Chenqiao Mutiny is the name of the mutiny by which the founding emperor of the Northern Song dynasty came to power". The word sequences obtained after data preprocessing are then the character-level sequences "[CLS] target question [SEP] reply document 1 [SEP]" and "[CLS] target question [SEP] reply document 2 [SEP]", each character being one token and each sequence being padded to the fixed length. After the terminal device processes the target question and a reply document into a word sequence, the word sequence is input into the coding layer of the answer extraction model to obtain the output vectors corresponding to the characters of the target question and the reply document, and the predicted answers are extracted from the output vectors to obtain the predicted answer set. It is understood that the start word of a predicted answer is the token corresponding to the first character of the predicted answer in the word sequence (also called the start token), and the end word of a predicted answer is the token corresponding to its last character (also called the end token). In an exemplary scheme, the predicted answer from reply document 1 above is "Zhao Kuangyin"; its start word is the token of the character "Zhao" and its end word is the token of the final character "yin".
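A minimal sketch of this preprocessing, with illustrative maximum-length and overlap values, could look as follows:

    # Sketch: character-level tokenization, [CLS]/[SEP] concatenation, padding,
    # and splitting over-long documents into overlapping segments.
    def build_sequences(question, document, max_len=512, overlap=128):
        q, d = list(question), list(document)      # character-level tokens
        budget = max_len - len(q) - 3              # room for [CLS] and two [SEP]s
        sequences, start = [], 0
        while True:
            segment = d[start:start + budget]
            seq = ["[CLS]"] + q + ["[SEP]"] + segment + ["[SEP]"]
            seq += ["[PAD]"] * (max_len - len(seq))   # padding
            sequences.append(seq)
            if start + budget >= len(d):
                break
            start += budget - overlap              # adjacent segments overlap
        return sequences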
In this embodiment, the answer extraction model may adopt the idea of the attention mechanism, training separate predictors for the start position and the end position of the answer, so as to extract the predicted answer. It is understood that the answer extraction model may also extract the predicted answer in other ways, which is not limited here.
603. Group identical predicted answers in the predicted answer set to obtain at least one answer set.
After the terminal device obtains the predicted answer set corresponding to the target question, it groups identical predicted answers to obtain at least one answer set. In the exemplary scheme of step 602, the predicted answer set contains six predicted answers; the four predicted answers "Zhao Kuangyin" are assigned to one answer set, "Zhao Guangyi" to another, and "Chai Zongxun" to a third.
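This grouping step amounts to building a map from answer string to the documents that produced it; a minimal sketch:

    # Sketch: group identical predicted answers into answer sets.
    from collections import defaultdict

    def group_answers(predicted):                 # iterable of (answer, doc_id)
        answer_sets = defaultdict(list)
        for answer, doc_id in predicted:
            answer_sets[answer].append(doc_id)    # same answer -> same set
        return answer_sets  # e.g. {"Zhao Kuangyin": [1, 2, 3, 4], "Zhao Guangyi": [5], ...}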
604. Call the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and each answer representation in the at least one answer representation set is a representation vector of the corresponding predicted answer.
The terminal device calls the answer reordering model to obtain a first output vector set of the target question and each reply document in a first reply document set, where the first reply document set comprises the reply documents corresponding to a first answer set in the at least one answer set; selects, from the first output vector set, the vectors of the characters of the predicted answer corresponding to the first reply document set and pools them to obtain an intermediate answer representation set of the predicted answer in the first answer set; performs self-attention processing on the intermediate answer representation set to obtain a first answer representation of the predicted answer corresponding to the first answer set; and repeats these steps to obtain the answer representation of the predicted answer corresponding to each answer set in the at least one answer set, classifying them into the at least one answer representation set, where the first answer representation is included in the at least one answer representation set.
Specifically, when obtaining the first output vector set, the terminal device may adopt the data preprocessing method in step 602, i.e., the characters of the target question and the characters of each reply document in the first reply document set are spliced using a start character and a separator character to obtain a first word sequence set; the first word sequence set is input into the answer reordering model to obtain the first output vector set. The terminal device then selects, from the first output vector set, the vectors of the characters of the predicted answer corresponding to the first reply document set and pools them to obtain the intermediate answer representation set of the predicted answer in the first answer set. Next, self-attention processing is performed on each intermediate answer representation in the intermediate answer representation set to obtain a self-attention prediction score set; the self-attention prediction score set is normalized to obtain a normalized prediction score set; and the intermediate answer representations are weighted-summed, using the normalized prediction scores as weights, to obtain the first answer representation.
In an exemplary scheme, suppose the answer corresponding to the first answer set is "Zhao Kuangyin", its corresponding reply documents are reply document 1 and reply document 2, the target question is "Who launched the Chenqiao Mutiny in history?", reply document 1 is "Song Taizu Zhao Kuangyin established the Northern Song dynasty after launching the Chenqiao Mutiny", and reply document 2 is "the Chenqiao Mutiny is the name of the mutiny by which the founding emperor of the Northern Song dynasty came to power". The word sequences obtained after data preprocessing are the character-level sequences "[CLS] target question [SEP] reply document 1 [SEP]" and "[CLS] target question [SEP] reply document 2 [SEP]". If the answer corresponding to the second answer set is "Zhao Guangyi", with corresponding reply document 3 "Zhao Guangyi and his brother established the Northern Song dynasty after launching the Chenqiao Mutiny", the word sequence of the question and reply document 3 is "[CLS] target question [SEP] reply document 3 [SEP]". If the answer corresponding to the third answer set is "Chai Zongxun", with corresponding reply document 4 "Chai Zongxun became the ruler of a fallen state after the Chenqiao Mutiny", the word sequence of the target question and reply document 4 is "[CLS] target question [SEP] reply document 4 [SEP]". These word sequences are then input into the coding layer of the answer reordering model to obtain the output vector set of each word sequence. In this embodiment, to make it easy for the terminal device to locate the predicted answer in the output vector set, the predicted answer extracted by the answer extraction model may be labeled during data preprocessing. Taking the first answer set, "Zhao Kuangyin", as an example, the terminal device intercepts the vectors of "Zhao Kuangyin" in each word sequence of the first answer set and pools them to obtain the intermediate answer representations of the answer corresponding to the first answer set.
In this embodiment, the pooling process may use formula 1:
the equation 1 is: v i =avg_pooling(V t ,V t+1 ,…,V t+m ) (ii) a Wherein, the V t ,V t+1 ,…,V t+m Output vector sequence for indicating the predicted answer correspondence in a reply document, such as the output vector (V) corresponding to "Zhang Kuangyin" in reply document 1 t ,V t+1 ,V t+2 ) (ii) a The V is i Indicating an intermediate answer representation for the ith reply document. In the first answer set, the terminal device acquires 2 intermediate answer representations, namely the intermediate answer representation V corresponding to the 'zhangKuangyin' in the answer document 1 1 Intermediate answer characterization V corresponding to "Zhang-Kuangyin" in answer document 2 2
Then, after obtaining the intermediate answer representations, the terminal device obtains the answer representation of the predicted answer corresponding to the first answer set by using a self-attention processing mechanism. In this embodiment, the terminal device may apply the self-attention processing mechanism according to Equations 2 to 4:
the equation 2 is: s. the i =V T tanh(WV i +B);
Equation 3 is: a_k = exp(S_k) / Σ_j exp(S_j);
the equation 4 is: v answer =a 1 V 1 +a 2 V 2 +…+a k V i );
Wherein S_i in Equation 2 denotes the self-attention score corresponding to each intermediate answer representation, V^T denotes the transpose of the parameter vector V, W is a parameter matrix, and B is a parameter vector; a_k in Equation 3 denotes the normalized self-attention score obtained by normalizing the self-attention scores corresponding to the intermediate answer representations; V_answer in Equation 4 denotes the answer representation of the predicted answer corresponding to the answer set. In an exemplary scheme, for the first answer set the terminal device obtains 2 intermediate answer representations, namely the intermediate answer representation V_1 corresponding to "Zhao Kuangyin" in answer document 1 and the intermediate answer representation V_2 corresponding to "Zhao Kuangyin" in answer document 2. After processing with the self-attention mechanism, the self-attention score corresponding to "Zhao Kuangyin" in answer document 1 is 80, and the self-attention score corresponding to "Zhao Kuangyin" in answer document 2 is also 80; the normalized self-attention scores are therefore 0.5 and 0.5; and the answer representation of "Zhao Kuangyin" for the first answer set is 0.5 V_1 + 0.5 V_2.
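A compact sketch of Equations 2 to 4 under the reading above; the parameter shapes are illustrative assumptions:

```python
import numpy as np

# Sketch of Equations 2-4: score each intermediate answer representation,
# normalize the scores, and take the weighted sum as the answer representation.
def answer_representation(reps, W, B, v):
    scores = v @ np.tanh(W @ reps.T + B[:, None])  # Eq. 2: S_i = V^T tanh(W V_i + B)
    a = np.exp(scores - scores.max())
    a = a / a.sum()                                # Eq. 3: normalized scores a_k
    return (a[:, None] * reps).sum(axis=0)         # Eq. 4: V_answer = sum_k a_k V_k

reps = np.random.randn(2, 768)   # V_1 and V_2 for "Zhao Kuangyin"
W = np.random.randn(128, 768)
B = np.random.randn(128)
v = np.random.randn(128)
V_answer = answer_representation(reps, W, B, v)
```

Equal scores, as in the example above, yield equal weights of 0.5, so V_answer reduces to the plain average of V_1 and V_2.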
By analogy, the terminal device performs the same processing on the other answers, thereby obtaining the answer representation sets corresponding to the target question.
605. And calling the answer reordering model to obtain the prediction scores of the predicted answers corresponding to the answer representation set.
And the terminal device inputs each answer representation in the answer representation set into the answer reordering model to obtain the prediction score of each type of predicted answer.
In this embodiment, the terminal device may adopt the following Equation 5 to calculate the prediction score from the answer representation:
Equation 5 is: logits = W V_answer; wherein logits denotes the prediction score of the predicted answer corresponding to the answer representation.
It can be understood that, in the process of training the answer reordering model, the terminal device may further obtain a corresponding loss value according to the predicted answer; specifically, the following Equations 6 and 7 may be used:
Equation 6 is: Prob = softmax(logits);
Equation 7 is: Loss = -log(Prob_label);
Wherein Equation 6 indicates that the prediction scores are converted into probability values through the softmax activation function, and Equation 7 indicates that cross-entropy processing is performed on the probability values to obtain the loss value of the answer reordering model; Prob_label is the probability value corresponding to the label, the label being 0 or 1.
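Read together, Equations 5 to 7 amount to a linear scoring layer followed by a softmax cross-entropy loss over the candidate answers. The sketch below illustrates this reading; the shapes and the index-based label are assumptions:

```python
import numpy as np

# Sketch of Equations 5-7: a linear layer scores each answer representation,
# softmax turns the scores into probabilities, and the negative log-probability
# of the labeled (correct) answer is the reordering loss.
def reordering_loss(answer_reps, W, label):
    logits = answer_reps @ W              # Eq. 5: logits = W V_answer, per answer
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()           # Eq. 6: Prob = softmax(logits)
    return -np.log(probs[label])          # Eq. 7: Loss = -log(Prob_label)

answer_reps = np.random.randn(3, 768)     # e.g. the three candidate answers above
W = np.random.randn(768)
loss = reordering_loss(answer_reps, W, label=0)
# At inference time (step 606), the final answer is the highest-scoring one:
final_index = int(np.argmax(answer_reps @ W))
```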
606. And outputting a final answer corresponding to the target question according to the prediction score.
And the terminal device selects the answer with the highest prediction score and outputs it as the final answer corresponding to the target question.
In this embodiment, the predicted answers extracted from the multiple answer documents are classified, and the semantic features between the answer documents from which the same answer is extracted are obtained, so that the ranking effect of the answer reordering model is improved, and the accuracy of multi-document reading comprehension is improved.
Referring to fig. 7, fig. 7 is a schematic view of an embodiment of a multiple document reading and understanding apparatus 20 in the embodiment of the present application, and the multiple document reading and understanding apparatus 20 includes:
an obtaining module 201, configured to obtain a target question and a plurality of answer documents corresponding to the target question, and to establish an answer extraction model and an answer reordering model;
the processing module 202 is configured to invoke the answer extraction model to obtain a predicted answer set of the plurality of answer documents for the target question; classify the predicted answer set according to identical predicted answers to obtain at least one answer set; invoke the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and each answer representation in the at least one answer representation set is a representation vector of the corresponding predicted answer; and invoke the answer reordering model to obtain the prediction score of the predicted answer corresponding to each answer representation set;
and the output module 203 is configured to output a final answer corresponding to the target question according to the prediction score.
In the embodiment of the application, a multi-document reading and understanding apparatus is provided. By adopting the apparatus, the predicted answers extracted from the multiple answer documents are classified, and the semantic features between the answer documents from which the same answer is extracted are obtained, so that the ranking effect of the answer reordering model is improved, and the accuracy of multi-document reading comprehension is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 7, in another embodiment of the multiple document reading and understanding apparatus 20 provided by the embodiment of the present application,
the processing module 202 is specifically configured to invoke the answer reordering model to obtain a first output vector set of the target question and each answer document in a first answer document set, where the first answer document set includes an answer document corresponding to a first answer set in the at least one answer set;
selecting character strings of the predicted answers corresponding to the first answer document set from the first output vector set to perform pooling to obtain an intermediate answer representation set of each predicted answer in the first answer set;
performing self-attention processing on the intermediate answer representation set to obtain a first answer representation of the predicted answer corresponding to the first answer set;
and repeating the steps to obtain the answer representation of the predicted answer corresponding to each answer set in the at least one answer set, and classifying the answer representation into the at least one answer representation set, wherein the first answer representation is included in the at least one answer representation set.
In the embodiment of the application, a multi-document reading and understanding apparatus is provided. By adopting the apparatus, identical answers are grouped into one answer set; after the answer spans in the word sequences of the answer documents in the same set are pooled, the pooled representation vectors are subjected to self-attention processing to obtain the final answer representation of the same answer. The contexts of the different answer documents are thereby combined, so that the final answer representation carries rich semantic features, and the accuracy of multi-document reading comprehension is improved.
Optionally, on the basis of the embodiment corresponding to fig. 7, in another embodiment of the multiple document reading and understanding apparatus 20 provided in the embodiment of the present application, the processing module 202 is specifically configured to splice each character of the target question with each character of each answer document in the first answer document set by using a start character and an interval character to obtain a first word sequence set;
and inputting the first word sequence set into the answer reordering model to obtain the first output vector set.
In the embodiment of the application, a multi-document reading and understanding apparatus is provided. By adopting the apparatus, an output vector is generated for each character, which improves the representation of the answer and the accuracy of multi-document reading comprehension.
Optionally, on the basis of the embodiment corresponding to fig. 7, in another embodiment of the multi-document reading and understanding apparatus 20 provided in the embodiment of the present application, the processing module 202 is specifically configured to perform self-attention processing on each intermediate answer representation in the intermediate answer representation set to obtain a self-attention prediction score set;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
and carrying out weighted summation on all the normalized prediction scores in the normalized prediction score set to obtain the first answer representation.
In the embodiment of the application, a multi-document reading and understanding apparatus is provided. By adopting the apparatus, the intermediate answer representations are weighted by the normalized prediction scores and summed, so that the contexts of the different answer documents are fused, the final answer representation carries rich semantic features, and the accuracy of multi-document reading comprehension is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 7, in another embodiment of the multiple document reading and understanding apparatus 20 provided by the embodiment of the present application,
the acquisition module is also used for acquiring a training sample set, and establishing an initial answer extraction model and an initial answer reordering model, wherein the training sample set comprises a question sample set and a reply document sample set corresponding to the question sample set;
as shown in fig. 8, the apparatus further includes a training module 204 for training the initial answer extraction model and the initial answer reordering model by using the training sample set to obtain the answer extraction model and the answer reordering model.
In the embodiment of the application, a multi-document reading and understanding apparatus is provided. By adopting the apparatus, model training is performed on the training sample set, and during training the predicted answers extracted from the multiple answer documents are likewise classified, so that the semantic features between the answer documents from which the same answer is extracted are obtained, thereby improving the ranking effect of the answer reordering model and the accuracy of multi-document reading comprehension.
Optionally, on the basis of the embodiment corresponding to fig. 8, in another embodiment of the multi-document reading and understanding apparatus 20 provided in the embodiment of the present application, the initial answer extraction model and the initial answer reordering model share the same coding layer, and the training module 204 is specifically configured to call the coding layer to obtain a second output vector corresponding to the training sample set;
inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
inputting the training answer set into the initial answer reordering model to obtain a second loss value;
and reversely adjusting the weighting parameters of the initial answer extraction model and the initial answer reordering model by utilizing the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
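A hedged sketch of this joint training step follows; the module interfaces, names, and batch layout are illustrative assumptions rather than the patent's specification:

```python
import torch

# Sketch of joint training with a shared coding layer: the extraction model and
# the reordering model each produce a loss, and the sum of the two losses drives
# a single backward pass that adjusts the weight parameters of both models.
def joint_training_step(encoder, extraction_head, reordering_head,
                        optimizer, batch):
    hidden = encoder(batch["input_ids"])                    # shared coding layer
    loss1, train_answers = extraction_head(hidden, batch)   # first loss value
    loss2 = reordering_head(hidden, train_answers, batch)   # second loss value
    total_loss = loss1 + loss2       # sum of the first and second loss values
    optimizer.zero_grad()
    total_loss.backward()            # reversely adjust both models' weights
    optimizer.step()
    return total_loss.item()
```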
In the embodiment of the application, a multi-document reading and understanding apparatus is provided. By adopting the apparatus, the answer extraction model and the answer reordering model share the coding layer, so that the two models can be jointly trained, which improves the effect of the multi-document reading comprehension model.
Optionally, on the basis of the embodiment corresponding to fig. 8, in another embodiment of the multi-document reading and understanding apparatus 20 provided in the embodiment of the present application, the training module 204 is specifically configured to input the second output vector into the initial answer extraction model to obtain the training answer set;
obtaining, from the second output vector, the prediction scores of the start word and the end word corresponding to each training answer in the training answer set;
converting the predicted score of the start word into a first probability value by using an activation function, and converting the predicted score of the end word into a second probability value by using the activation function;
obtaining the first loss value according to the first probability value and the second probability value;
the training module 204 is specifically configured to input the training answer set into the initial answer reordering model to obtain a training answer representation set corresponding to the training answer set;
acquiring a training prediction score set corresponding to each training answer in the training answer representation set;
converting each training prediction score in the training prediction score set into a third probability value by using an activation function;
and obtaining the second loss value by using the cross entropy and the third probability value.
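The first loss value can be read as a standard span-extraction loss over start-word and end-word positions. The sketch below illustrates that reading; treating the activation function as softmax and averaging the two terms are assumptions:

```python
import torch
import torch.nn.functional as F

# Sketch of the first loss value: project the encoder output to start-word and
# end-word prediction scores, convert them to probabilities with an activation
# function, and penalize the labeled start/end positions of each training answer.
def extraction_loss(hidden, span_proj, start_label, end_label):
    scores = span_proj(hidden)                       # (seq_len, 2)
    start_logp = F.log_softmax(scores[:, 0], dim=0)  # first probability values
    end_logp = F.log_softmax(scores[:, 1], dim=0)    # second probability values
    return -(start_logp[start_label] + end_logp[end_label]) / 2

hidden = torch.randn(64, 768)        # toy encoder outputs for one word sequence
span_proj = torch.nn.Linear(768, 2)
loss1 = extraction_loss(hidden, span_proj, start_label=14, end_label=16)
```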
In the embodiment of the application, a multi-document reading and understanding device is provided. By adopting the device, the answer extraction model and the answer reordering model share the coding layer, and the effect of the multi-document reading understanding model is improved.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 300 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing an application 342 or data 344. The memory 332 and the storage media 330 may be transient storage or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processing unit 322 may be configured to communicate with the storage medium 330 to execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 9.
Referring to fig. 10, for convenience of description, only the portions related to the embodiment of the present application are shown; for undisclosed details, refer to the method portion of the embodiment of the present application. In the embodiment of the present application, the terminal device being a smartphone is taken as an example for description:
fig. 10 is a block diagram illustrating a partial structure of a smartphone related to a terminal device provided in an embodiment of the present application. Referring to fig. 10, the smart phone includes: radio Frequency (RF) circuitry 410, memory 420, input unit 430, display unit 440, sensor 450, audio circuitry 460, wireless fidelity (WiFi) module 470, processor 480, and power supply 490. Those skilled in the art will appreciate that the smartphone configuration shown in fig. 10 is not limiting and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the smart phone in detail with reference to fig. 10:
the RF circuit 410 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 480; in addition, the data for designing uplink is transmitted to the base station. In general, RF circuitry 410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 410 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communication (GSM), general Packet Radio Service (GPRS), code Division Multiple Access (CDMA), wideband Code Division Multiple Access (WCDMA), long Term Evolution (LTE), email, short Message Service (SMS), etc.
The memory 420 may be used to store software programs and modules, and the processor 480 executes various functional applications and data processing of the smart phone by operating the software programs and modules stored in the memory 420. The memory 420 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phone book, etc.) created according to the use of the smartphone, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the smartphone. Specifically, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also called a touch screen, may collect touch operations of a user on or near the touch panel 431 (e.g., operations of the user on or near the touch panel 431 using any suitable object or accessory such as a finger or a stylus) and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 431 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, and sends the touch point coordinates to the processor 480, and can receive and execute commands sent by the processor 480. In addition, the touch panel 431 may be implemented by various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 430 may include other input devices 432 in addition to the touch panel 431. In particular, other input devices 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 440 may be used to display information input by the user or information provided to the user and various menus of the smartphone. The display unit 440 may include a display panel 441, and optionally, the display panel 441 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 431 can cover the display panel 441, and when the touch panel 431 detects a touch operation on or near the touch panel 431, the touch operation is transmitted to the processor 480 to determine the type of the touch event, and then the processor 480 provides a corresponding visual output on the display panel 441 according to the type of the touch event. Although in fig. 10, the touch panel 431 and the display panel 441 are two independent components to implement the input and output functions of the smart phone, in some embodiments, the touch panel 431 and the display panel 441 may be integrated to implement the input and output functions of the smart phone.
The smartphone may also include at least one sensor 450, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 441 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 441 and/or the backlight when the smartphone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration) for recognizing the attitude of the smartphone, and related functions (such as pedometer and tapping) for vibration recognition; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the smart phone, further description is omitted here.
The audio circuit 460, speaker 461, and microphone 462 may provide an audio interface between the user and the smartphone. The audio circuit 460 may transmit the electrical signal converted from the received audio data to the speaker 461, which converts the electrical signal into a sound signal for output; on the other hand, the microphone 462 converts collected sound signals into electrical signals, which are received by the audio circuit 460 and converted into audio data. The audio data is then output to the processor 480 for processing and may subsequently be sent via the RF circuit 410 to, for example, another smartphone, or output to the memory 420 for further processing.
WiFi belongs to short-distance wireless transmission technology, and the smart phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 470, and provides wireless broadband internet access for the user. Although fig. 10 shows the WiFi module 470, it is understood that it does not belong to the essential constitution of the smartphone and can be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 480 is a control center of the smart phone, connects various parts of the entire smart phone by using various interfaces and lines, and performs various functions of the smart phone and processes data by operating or executing software programs and/or modules stored in the memory 420 and calling data stored in the memory 420, thereby integrally monitoring the smart phone. Optionally, processor 480 may include one or more processing units; optionally, the processor 480 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, and the like, and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into processor 480.
The smart phone further includes a power source 490 (e.g., a battery) for supplying power to various components, and optionally, the power source may be logically connected to the processor 480 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
Although not shown, the smart phone may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
The steps performed by the terminal device in the above-described embodiment may be based on the terminal device structure shown in fig. 10.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the method described in the foregoing embodiments.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (10)

1. A method for reading and understanding multiple documents, comprising:
acquiring a target question and a plurality of answer documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model;
calling the answer extraction model to obtain a predicted answer set of the plurality of answer documents to the target question;
classifying the predicted answer set according to identical predicted answers to obtain at least one answer set;
calling the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and each answer representation in the at least one answer representation set is a representation vector of the corresponding predicted answer;
calling the answer reordering model to obtain the prediction scores of the predicted answers corresponding to the answer representation set;
and outputting a final answer corresponding to the target question according to the prediction score.
2. The method of claim 1, wherein said calling the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set comprises:
calling the answer reordering model to obtain a first output vector set of the target question and each answer document in a first answer document set, wherein the first answer document set comprises answer documents corresponding to a first answer set in at least one answer set;
selecting character strings of predicted answers corresponding to the first answer document set from the first output vector set to perform pooling to obtain an intermediate answer representation set of each predicted answer in the first answer set;
performing self-attention processing on the intermediate answer representation set to obtain a first answer representation of the predicted answer corresponding to the first answer set;
and repeating the steps to obtain answer representations of the predicted answers corresponding to the answer sets in the at least one answer set, and classifying the answer representations into the at least one answer representation set, wherein the first answer representation is included in the at least one answer representation set.
3. The method of claim 2, wherein said calling the answer reordering model to obtain a first output vector set of the target question and each answer document in the first answer document set comprises:
splicing each character of the target question with each character of each answer document in the first answer document set by using a start character and an interval character to obtain a first word sequence set;
and inputting the first word sequence set into the answer reordering model to obtain the first output vector set.
4. The method according to claim 2, wherein said performing self-attention processing on the intermediate answer representation set to obtain a first answer representation of the predicted answer corresponding to the first answer set comprises:
performing self-attention processing on each intermediate answer representation in the intermediate answer representation set to obtain a self-attention prediction score set;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
and carrying out weighted summation on all the normalized prediction scores in the normalized prediction score set to obtain the first answer representation.
5. The method according to any one of claims 1 to 4, further comprising:
acquiring a training sample set, and establishing an initial answer extraction model and an initial answer reordering model, wherein the training sample set comprises a question sample set and a reply document sample set corresponding to the question sample set;
and training the initial answer extraction model and the initial answer reordering model by using the training sample set to obtain the answer extraction model and the answer reordering model.
6. The method of claim 5, wherein the initial answer extraction model and the initial answer reordering model share the same coding layer, and training the initial answer extraction model and the initial answer reordering model with the training sample set to obtain the answer extraction model and the answer reordering model comprises:
calling the coding layer to obtain a second output vector corresponding to the training sample set;
inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
inputting the training answer set into the initial answer reordering model to obtain a second loss value;
and reversely adjusting the weight parameters of the initial answer extraction model and the initial answer reordering model by using the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
7. The method of claim 6, wherein said inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set comprises:
inputting the second output vector into the initial answer extraction model to obtain the training answer set;
acquiring, from the second output vector, the prediction scores of the start word and the end word corresponding to each training answer in the training answer set;
converting the prediction scores of the start words into first probability values by using an activation function, and converting the prediction scores of the end words into second probability values by using the activation function;
obtaining the first loss value according to the first probability value and the second probability value;
the inputting the training answer set into the initial answer reordering model to obtain a second loss value includes:
inputting the training answer set into the initial answer reordering model to obtain a training answer representation set corresponding to the training answer set;
acquiring a training prediction score set corresponding to each training answer in the training answer representation set;
converting each training prediction score in the set of training prediction scores into a third probability value by using an activation function;
and obtaining the second loss value by using the cross entropy and the third probability value.
8. A multi-document reading and understanding apparatus, comprising:
an acquisition module, configured to acquire a target question and a plurality of answer documents corresponding to the target question, and to establish an answer extraction model and an answer reordering model;
a processing module, configured to call the answer extraction model to obtain a predicted answer set of the plurality of answer documents for the target question; classify the predicted answer set according to identical predicted answers to obtain at least one answer set; call the answer reordering model to obtain at least one answer representation set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and each answer representation in the at least one answer representation set is a representation vector of the corresponding predicted answer; and call the answer reordering model to obtain the prediction score of the predicted answer corresponding to each answer representation set;
and an output module, configured to output the final answer corresponding to the target question according to the prediction score.
9. A computer device, comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, so as to perform the method of any one of claims 1 to 7 according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN202211071561.1A 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium Active CN115455160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211071561.1A CN115455160B (en) 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211071561.1A CN115455160B (en) 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115455160A true CN115455160A (en) 2022-12-09
CN115455160B CN115455160B (en) 2024-08-06

Family

ID=84301618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211071561.1A Active CN115455160B (en) 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115455160B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120077178A1 (en) * 2008-05-14 2012-03-29 International Business Machines Corporation System and method for domain adaptation in question answering
CN103124980A (en) * 2010-09-24 2013-05-29 国际商业机器公司 Providing answers to questions including assembling answers from multiple document segments
CN103229120A (en) * 2010-09-28 2013-07-31 国际商业机器公司 Providing answers to questions using hypothesis pruning
US20150026106A1 (en) * 2012-02-23 2015-01-22 National Institute Of Information And Communcations Technology Non-factoid question-answering system and computer program
US20160125075A1 (en) * 2014-11-05 2016-05-05 International Business Machines Corporation Answer interactions in a question-answering environment
US20160253596A1 (en) * 2015-02-26 2016-09-01 International Business Machines Corporation Geometry-directed active question selection for question answering systems
CN110287298A (en) * 2019-05-30 2019-09-27 南京邮电大学 A kind of automatic question answering answer selection method based on question sentence theme
CN111460092A (en) * 2020-03-11 2020-07-28 中国电子科技集团公司第二十八研究所 Multi-document-based automatic complex problem solving method
CN112015760A (en) * 2020-10-20 2020-12-01 中国人民解放军国防科技大学 Automatic question-answering method and device based on candidate answer set reordering and storage medium
CN112464641A (en) * 2020-10-29 2021-03-09 平安科技(深圳)有限公司 BERT-based machine reading understanding method, device, equipment and storage medium
CN112905768A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Data interaction method, device and storage medium
CN114428845A (en) * 2022-01-28 2022-05-03 广州华多网络科技有限公司 Intelligent customer service automatic response method and device, equipment, medium and product thereof
CN114757152A (en) * 2022-04-21 2022-07-15 焦点教育科技有限公司 Method for acquiring and printing wrong questions in teaching scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Jiahua et al., "A Reading Comprehension System Based on Multiple Passages and Multiple Answers", Journal of Chinese Information Processing, 31 December 2018, pages 103-111 *

Also Published As

Publication number Publication date
CN115455160B (en) 2024-08-06

Similar Documents

Publication Publication Date Title
CN111931501B (en) Text mining method based on artificial intelligence, related device and equipment
CN110162770A (en) A kind of word extended method, device, equipment and medium
CN109710732B (en) Information query method, device, storage medium and electronic equipment
WO2021147421A1 (en) Automatic question answering method and apparatus for man-machine interaction, and intelligent device
US20160085757A1 (en) Information display device, information display method, and storage medium
CN109063000A (en) Question sentence recommended method, customer service system and computer readable storage medium
CN111597804B (en) Method and related device for training entity recognition model
CN111353299B (en) Dialog scene determining method based on artificial intelligence and related device
CN115022098B (en) Artificial intelligence safety target range content recommendation method, device and storage medium
CN111194457A (en) Patent evaluation determination method, patent evaluation determination device, and patent evaluation determination program
CN112749252A (en) Text matching method based on artificial intelligence and related device
KR101694727B1 (en) Method and apparatus for providing note by using calculating degree of association based on artificial intelligence
CN114328852A (en) Text processing method, related device and equipment
CN101763211A (en) System for analyzing semanteme in real time and controlling related operation
CN111553163A (en) Text relevance determining method and device, storage medium and electronic equipment
CN114428842A (en) Method and device for expanding question-answer library, electronic equipment and readable storage medium
CN113342944B (en) Corpus generalization method, apparatus, device and storage medium
KR101955920B1 (en) Search method and apparatus using property language
CN112307198A (en) Method for determining abstract of single text and related device
CN113822038A (en) Abstract generation method and related device
CN112328783A (en) Abstract determining method and related device
CN115168568B (en) Data content identification method, device and storage medium
CN115599903B (en) Object tag acquisition method and device, electronic equipment and storage medium
CN115455160B (en) Multi-document reading and understanding method, device, equipment and storage medium
CN112364649B (en) Named entity identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant