CN115455160B - Multi-document reading and understanding method, device, equipment and storage medium - Google Patents

Multi-document reading and understanding method, device, equipment and storage medium

Info

Publication number
CN115455160B
CN115455160B (application CN202211071561.1A)
Authority
CN
China
Prior art keywords
answer
training
model
reordering
predicted
Prior art date
Legal status
Active
Application number
CN202211071561.1A
Other languages
Chinese (zh)
Other versions
CN115455160A (en)
Inventor
杨韬
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211071561.1A priority Critical patent/CN115455160B/en
Publication of CN115455160A publication Critical patent/CN115455160A/en
Application granted granted Critical
Publication of CN115455160B publication Critical patent/CN115455160B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a multi-document reading and understanding method, device, equipment and storage medium, which acquire semantic features shared among the reply documents from which the same answer is extracted, thereby improving the ranking quality of the answer reordering model and the accuracy of multi-document reading comprehension. The method comprises the following steps: acquiring a target question and a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model; invoking the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question; classifying the predicted answer set by identical predicted answers to obtain at least one answer set; invoking the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set; invoking the answer reordering model to obtain the prediction score of the predicted answer corresponding to each answer characterization set; and outputting a final answer to the target question according to the prediction scores. The application can be applied to the field of artificial intelligence.

Description

Multi-document reading and understanding method, device, equipment and storage medium
Technical Field
The application relates to the field of artificial intelligence, and in particular to a multi-document reading and understanding method, device, equipment and storage medium.
Background
Machine reading comprehension (MRC) is an important task in the field of natural language processing (NLP) that aims to let machines extract relevant information and knowledge from given questions and articles to obtain answers. Compared with basic NLP tasks such as named entity recognition (NER) and relation extraction, MRC is a more complex, higher-level task: it demands deeper semantic understanding and extracts richer text information. It can be applied to a variety of fields, such as search engines.
Search engines are a very important way for people to obtain information: large numbers of users issue queries through search engines, and many of these are question-answering queries. However, conventional web search engines typically only do "matching", i.e., relevance-matching the queried question against candidate reply documents, rather than understanding the question more precisely.
Intelligent question-answering techniques can compensate for exactly this limitation of conventional search engines. The user submits a natural language query to the system, and the system directly returns an answer that meets the user's need, reducing the cost of manual involvement and turning the process of acquiring information and knowledge into a one-question-one-answer interaction. For the user, this helps obtain the answer in the shortest time and improves the search experience; for the content provider, having its answer displayed at the top brings more exposure and traffic, helping build the content ecosystem. A scheme that can improve the accuracy of multi-document reading comprehension is therefore in high demand.
Disclosure of Invention
The embodiment of the application provides a multi-document reading and understanding method, device, equipment and storage medium, which classify the predicted answers extracted from a plurality of reply documents and acquire semantic features shared among the reply documents from which the same answer is extracted, thereby improving the ranking quality of the answer reordering model and the accuracy of multi-document reading comprehension.
In view of this, one aspect of the present application provides a multi-document reading and understanding method, including: acquiring a target question and a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model;
invoking the answer extraction model to obtain a predicted answer set of the plurality of answer documents to the target question;
classifying the predicted answer sets according to the same predicted answer to obtain at least one answer set;
Invoking the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to a predicted answer, and the answer characterization in the at least one answer characterization set is a characterization vector of the predicted answer;
Invoking the answer reordering model to obtain the prediction scores of the prediction answers corresponding to the answer characterization set;
And outputting a final answer corresponding to the target question according to the prediction score.
In another aspect, the present application provides a multi-document reading and understanding apparatus, comprising: the acquisition module is used for acquiring a target question, a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model;
The processing module is used for calling the answer extraction model to acquire a predicted answer set of the plurality of answer documents for the target question; classifying the predicted answer set according to identical predicted answers to obtain at least one answer set; invoking the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to a predicted answer and the answer characterization in the at least one answer characterization set is a characterization vector of that predicted answer; and invoking the answer reordering model to obtain the prediction scores of the predicted answers corresponding to the answer characterization set;
And the output module is used for outputting a final answer corresponding to the target question according to the prediction score.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the processing module is specifically configured to invoke the answer reordering model to obtain the target question and a first output vector set of each answer document in a first answer document set, where the first answer document set includes answer documents corresponding to the first answer set in the at least one answer set;
Selecting a character string of the predicted answer corresponding to the first answer document set from the first output vector set, and pooling to obtain an intermediate answer characterization set of each predicted answer in the first answer set;
performing self-attention processing on the intermediate answer characterization set to obtain a first answer characterization of a predicted answer corresponding to the first answer set;
and so on, traversing to obtain the answer characterization of the predicted answer corresponding to each answer set in the at least one answer set, and classifying the answer characterizations into the at least one answer characterization set, wherein the first answer characterization is contained in the at least one answer characterization set.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is specifically configured to splice each character of the target question with each character of each reply document in the first reply document set, using the start character and the separator character, to obtain a first word sequence set;
the first word sequence set is input into the answer reordering model to obtain the first output vector set.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the processing module is specifically configured to perform self-attention processing on each intermediate answer token in the set of intermediate answer tokens to obtain a set of self-attention prediction scores;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
And carrying out weighted summation on each normalized prediction score in the normalized prediction score set to obtain the first answer representation.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the obtaining module is further configured to obtain a training sample set, and establish an initial answer extraction model and an initial answer reordering model, where the training sample set includes a question sample set and an answer document sample set corresponding to the question sample set;
The device also comprises a training module for training the initial answer extraction model and the initial answer reordering model by using the training sample set to obtain the answer extraction model and the answer reordering model.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the initial answer extraction model and the initial answer reordering model share the same coding layer, and the training module is specifically configured to invoke the coding layer to obtain a second output vector corresponding to the training sample set;
Inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
Inputting the training answer set into the initial answer reordering model to obtain a second loss value;
And reversely adjusting the weight parameters of the initial answer extraction model and the initial answer reordering model by using the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the training module is specifically configured to input the second output vector into the initial answer extraction model to obtain the training answer set;
obtain, from the second output vector, the prediction score of the start word and the prediction score of the end word corresponding to each training answer in the training answer set;
convert the prediction score of the start word into a first probability value using an activation function, and convert the prediction score of the end word into a second probability value using the activation function;
and obtain the first loss value according to the first probability value and the second probability value;
the training module is specifically configured to input the training answer set into the initial answer reordering model to obtain a training answer characterization set corresponding to the training answer set;
Acquiring a training prediction score set corresponding to each training answer in the training answer characterization set;
converting each training predictive score in the training predictive score set into a third probability value using an activation function;
And obtaining the second loss value by using the cross entropy and the third probability value.
Another aspect of the present application provides a computer apparatus comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory and to perform the methods of the above aspects according to the instructions in the program code;
The bus system is used to connect the memory and the processor to communicate the memory and the processor.
Another aspect of the application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the methods of the above aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the above aspects.
From the above technical solutions, the embodiment of the present application has the following advantages: and classifying the predicted answers extracted from the plurality of answer documents, and acquiring semantic features among the answer documents from which the same answer is extracted, so that the ordering effect in an answer reordering model is improved, and the accuracy of reading and understanding of the multiple documents is improved.
Drawings
FIG. 1 is a schematic flow diagram of a question-answering system;
FIG. 2 is a schematic diagram of an application system architecture of a multi-document reading and understanding method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system architecture of an answer extraction model and an answer reordering model according to an embodiment of the application;
FIG. 4a is a schematic diagram of a training architecture of an answer extraction model and an answer reordering model according to an embodiment of the application;
FIG. 4b is a schematic diagram of a network architecture of the answer extraction model and the answer reordering model according to the embodiment of the application;
FIG. 5a is a schematic diagram of another training architecture of the answer extraction model and the answer reordering model according to the embodiment of the application;
FIG. 5b is a schematic diagram of another network architecture of the answer extraction model and the answer reordering model according to the embodiment of the application;
FIG. 6 is a schematic diagram of one embodiment of a multi-document reading understanding method in an embodiment of the application;
FIG. 7 is a schematic diagram of an embodiment of a multi-document reading and understanding device in accordance with an embodiment of the present application;
FIG. 8 is a schematic diagram of an embodiment of a multiple document reading and understanding device in accordance with an embodiment of the application;
FIG. 9 is a schematic diagram of an embodiment of a multiple document reading and understanding device in accordance with an embodiment of the present application;
FIG. 10 is a schematic diagram of an embodiment of a multi-document reading and understanding device in an embodiment of the application.
Detailed Description
The embodiment of the application provides a multi-document reading and understanding method, device, equipment and storage medium, which are used for classifying predicted answers extracted from a plurality of answer documents and acquiring semantic features among the answer documents from which the same answer is extracted, so that the ordering effect in an answer reordering model is improved, and the accuracy of multi-document reading and understanding is improved.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
In view of the many terms involved in the present application, these terms will be described first.
Word segmentation: recombining continuous character sequences into word sequences according to a certain specification, enabling a computer to simulate a person's understanding of a sentence and thereby achieve word recognition.
Entity words: an entity is something that can exist independently and serves as the basis of all attributes and as a universal primitive; entity words are words that can denote entities. Nouns and pronouns are entity words, e.g., "Zhang San" and "wife" are entity words.
Intention words: words that express the goal to be achieved, i.e., words that mark a question, such as "who" and "where".
Machine learning (ML) is a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Deep learning (DL) is a branch of machine learning: algorithms that attempt to abstract data at a high level using multiple processing layers that contain complex structures or consist of multiple nonlinear transformations.
Neural networks (Neural Network, NN), a deep learning model that mimics the structure and function of biological neural networks in the field of machine learning and cognitive sciences.
Information extraction (IE): extracting specific event or fact information from natural language text, helping users automatically classify, extract and reconstruct massive content. The specific event or fact information generally includes entities, relations, and events: for example, extracting time, place, and key persons from news, or product name, release time, and performance indicators from technical documents. Because information extraction can pull the information frames and fact information of interest out of natural language, it is widely applied in knowledge graphs, information retrieval, question-answering systems, sentiment analysis, and text mining. Information extraction mainly comprises three subtasks: entity extraction and linking, relation extraction, and event extraction. Entity extraction and linking is named entity recognition; relation extraction is triple extraction, mainly used for extracting the relations between entities; event extraction corresponds to the extraction of a multi-ary relation.
Relation extraction (RE): given an entity pair and text containing the entity pair, determine the semantic relation of the entity pair based on the text. For example, given the entity pair (Country M, president) and the text "Presidential candidate A defeated presidential candidate B in the recent general election, becoming the next president of Country M", the user wishes to identify the relation between the entities "presidential candidate A" and "Country M" as "president of the country". In relation extraction, a set of relations, such as "president of the country", is typically predefined.
Relationship classification (Relation Classification, RC), a modeling approach to relationship extraction, namely converting relationship extraction into classification problems, where each relationship corresponds to a class.
Question answering (QA): given a piece of text and a question, a question-answering system can identify the location of the answer within the text.
Machine reading comprehension (MRC), as introduced in the Background, lets machines extract relevant information and knowledge from given questions and articles to obtain answers, and intelligent question answering built on it compensates for the matching-only limitation of conventional search engines. The traditional intelligent question-answering technology at present mainly comprises three important modules: an article retrieval module, an answer extraction module, and an answer reordering module; the specific flow can be as shown in fig. 1:
The input query text is acquired and filtered by question-answer intent to obtain the specific query question; the article retrieval module then retrieves related reply documents from the paragraph index library; the multiple documents are then input into the multi-document answer extraction module and the answer reordering module to obtain the final answer. The answer extraction module extracts an answer from each retrieved paragraph, and the answer reordering module ranks all the extracted answers uniformly, finally selecting the highest-scoring answer as the final answer to the query question (query). That is, answer extraction and answer reordering run as a pipeline: extraction first, then reordering. Answer reordering is generally based on hand-crafted features, such as the number of reply documents from which an answer was extracted, the position of the answer in its reply document, and the score from the answer extraction module, which are used to train a logistic regression (LR) linear model or a gradient boosting decision tree (GBDT) model. In this process, however, the reply documents from which the respective answers are extracted are treated as independent of each other, and their mutual semantic features are not utilized.
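For concreteness, the following is a minimal sketch of such a hand-crafted-feature reranker, assuming scikit-learn; the three features are those named above, while the toy training data and the probability call are purely illustrative.

```python
from sklearn.linear_model import LogisticRegression

# One row per candidate answer: [number of reply documents extracting it,
# position of the answer in its document, answer extraction module score].
# Toy, hypothetical data, not taken from the patent.
X = [[4, 0, 0.9], [1, 120, 0.4], [1, 300, 0.2]]
y = [1, 0, 0]  # 1 = correct final answer, 0 = incorrect

reranker = LogisticRegression().fit(X, y)
print(reranker.predict_proba([[3, 10, 0.8]])[0, 1])  # score for a new candidate
```

A GBDT model could be substituted for the logistic regression without changing the surrounding flow; the key point is that such features treat each candidate's documents in isolation.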
In order to solve the problem, the application provides the following technical scheme: acquiring a target question, a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model; invoking the answer extraction model to obtain a predicted answer set of the plurality of answer documents to the target question; classifying the predicted answer sets according to the same predicted answer to obtain at least one answer set; invoking the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to a predicted answer, and the answer characterization in the at least one answer characterization set is a characterization vector of the predicted answer; invoking the answer reordering model to obtain the prediction scores of the prediction answers corresponding to the answer characterization set; and outputting a final answer corresponding to the target question according to the prediction score. Therefore, the predicted answers extracted from the plurality of answer documents are classified, semantic features among the answer documents from which the same answer is extracted are obtained, and therefore the ordering effect in the answer reordering model is improved, and the accuracy of reading and understanding of the multiple documents is improved.
The method provided by the application is applied to the system architecture shown in fig. 2. Fig. 2 is a schematic diagram of the system architecture in the embodiment of the application; as shown in fig. 2, the system architecture includes a server and a terminal device, and a client (e.g., a search engine or a social-software applet) is deployed on the terminal device. The client may run on the terminal device in a browser or as a standalone application (APP), among other forms; the specific presentation form of the client is not limited herein. The server involved in the application may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms. The terminal device may be, but is not limited to, a smartphone, tablet computer, notebook computer, palmtop computer, personal computer, smart television, smart watch, vehicle-mounted device, wearable device, and the like. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein; nor are the numbers of servers and terminal devices limited. The scheme provided by the application may be completed independently by the terminal device, independently by the server, or by the terminal device and the server in cooperation, and is not specifically limited. The reply documents referred to in the present application may be stored in a database. A database can be viewed as an electronic filing cabinet, i.e., a place to store electronic files, on whose data a user can perform operations such as adding, querying, updating, and deleting. A database is a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of applications. A database management system (DBMS) is computer software designed for managing databases, generally with basic functions such as storage, retrieval, security, and backup. Database management systems may be classified by the database model they support, such as relational or extensible markup language (XML); by the type of computer supported, such as server cluster or mobile phone; by the query language used, such as structured query language (SQL) or XQuery; by their performance emphasis, such as maximum scale or maximum operating speed; or by other schemes. Regardless of the classification used, some DBMSs can span categories, for example supporting multiple query languages simultaneously.
It will be appreciated that in the specific embodiments of the present application, related data such as reply documents are referred to, and when the above embodiments of the present application are applied to specific products or technologies, user approval or consent is required, and the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions.
It can be understood that, before multi-document reading comprehension is performed, the embodiment of the present application also provides a training method for the answer extraction model and the answer reordering model. In an exemplary embodiment, this training method can be executed by computer equipment. Next, the implementation environment of the training method is introduced; fig. 3 is a schematic diagram of the implementation environment of the training method of the answer extraction model and the answer reordering model provided by the embodiment of the application. Referring to fig. 3, the implementation environment includes a terminal device 101 and a server 102, which can be directly or indirectly connected through wired or wireless communication; the present application is not limited herein.
In some embodiments, the terminal device 101 is, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart voice interaction device, smart home appliance, vehicle-mounted terminal, and the like. A client is installed and runs on the terminal device 101; the client may run in a browser or as a standalone application (APP), and its specific presentation form is not limited herein. In some embodiments, the server 102 is an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and basic cloud computing services such as big data and artificial intelligence platforms. The server 102 is configured to provide background services for the application. In some embodiments, the server 102 takes on the primary computing work and the terminal device 101 the secondary computing work, e.g., the terminal device 101 provides sample data to the server 102 and the server 102 runs the model training process; or the server 102 and the terminal device 101 perform cooperative computing using a distributed computing architecture.
It will be appreciated that the number of terminal devices 101 described above may be greater or lesser. For example, the terminal apparatus 101 may be only one, or the terminal apparatus 101 may be several tens or hundreds, or more. I.e. the number and device type of the terminal devices 101 are not limited in the embodiment of the present application.
Next, a training architecture for the training method of the answer extraction model and the answer reordering model provided by the embodiment of the present application is described. In the training architecture shown in fig. 4a, a training sample set of the answer extraction model is obtained, and the ground-truth extracted answers in the training sample set are used as supervision data for supervised training of the answer extraction model. After the training sample set is obtained, the whole training process can be as follows: the training sample set is input into the answer extraction model to output estimated answers and other results; a first loss is then computed from the estimated answers and the ground-truth extracted answers, and the parameters of the answer extraction model are adjusted by backpropagation according to the first loss value until the training-end condition is reached. The estimated answers are grouped by identical answer text to obtain a plurality of estimated answer sets; for each estimated answer set, the final answer characterization of its answer is obtained through the representation layer and coding layer of the answer reordering model, the final answer characterization is input into the answer reordering model to obtain the estimated score of the corresponding estimated answer, and a second loss is computed from the estimated scores and the true answer; finally, the parameters of the answer reordering model are adjusted by backpropagation according to the second loss until the training-end condition is reached. It will be appreciated that for the training architecture shown in fig. 4a, the network architecture of the answer extraction model and the answer reordering model can be as shown in fig. 4b.
It can be understood that, to improve the training effect of the answer extraction model and the answer reordering model, another training architecture of the training method provided by the embodiment of the application is introduced below. In the training architecture shown in fig. 5a, a training sample set of the answer extraction model is obtained, and the ground-truth extracted answers in the training sample set are used as supervision data for supervised training. After the training sample set is obtained, the whole training process can be as follows: the training sample set is input into the answer extraction model to output estimated answers and other results, and a first loss is computed from the estimated answers and the ground-truth extracted answers; the estimated answers are grouped by identical answer text to obtain a plurality of estimated answer sets; for each estimated answer set, the final answer characterization of its answer is obtained through the representation layer and coding layer of the answer reordering model, the final answer characterization is input into the answer reordering model to obtain the estimated score of the corresponding estimated answer, and a second loss is computed from the estimated scores and the true answer; finally, the sum of the first loss and the second loss is taken as the overall loss of the answer extraction model and the answer reordering model, and the parameters of both models are adjusted by backpropagation according to the overall loss until the training-end condition is reached. It will be appreciated that for the training architecture shown in fig. 5a, the network architecture of the answer extraction model and the answer reordering model can be as shown in fig. 5b. In this process, the answer extraction model computes the prediction score and probability value of each character (i.e., token) through the formulas logits_i = W V_emb_i and P_i = softmax(logits_i), where V_emb_i is the output vector of the i-th token and W is a parameter matrix; it then selects the predicted answer according to these scores and probabilities, and determines the probability of the start position (start token) and of the end position (end token) of the predicted answer, i.e., of its start character and end character. The first loss of the answer extraction model is then obtained according to the formula Loss1 = -log P_start - log P_end; meanwhile, the second loss of the answer reordering model is computed by the formula Loss2 = -log(Prob_label), so the overall loss of the answer extraction model and the answer reordering model is Loss1 + Loss2.
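As a sketch of these two loss terms and their sum only (assuming PyTorch; the tensor sizes, variable names, and example positions are illustrative assumptions, not the patent's):

```python
import torch
import torch.nn.functional as F

def extraction_loss(start_logits, end_logits, start_pos, end_pos):
    # P_i = softmax(logits_i); Loss1 = -log P_start - log P_end
    return -(F.log_softmax(start_logits, dim=-1)[start_pos]
             + F.log_softmax(end_logits, dim=-1)[end_pos])

def rerank_loss(answer_logits, label_idx):
    # Loss2 = -log(Prob_label): cross entropy over the candidate answer scores
    return F.cross_entropy(answer_logits.unsqueeze(0), torch.tensor([label_idx]))

# Joint objective: the sum Loss1 + Loss2 is backpropagated through both models.
start_logits = torch.randn(64, requires_grad=True)   # per-token start scores
end_logits = torch.randn(64, requires_grad=True)     # per-token end scores
answer_logits = torch.randn(3, requires_grad=True)   # one score per answer set
loss = extraction_loss(start_logits, end_logits, 10, 12) + rerank_loss(answer_logits, 0)
loss.backward()
```

Because the two models share the same encoder in this architecture, the single backward pass over the summed loss updates both of them.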
Answer extraction in this embodiment refers to finding the answer A to a given question Q from one or more text fragments P (P1, P2, P3, ..., Pn). Machine reading comprehension means: given a text fragment (Paragraph) and a Question, obtain the Answer. It is generally assumed that the answer is contained in the text, so the goal of the machine reading comprehension task is to predict a span (start, end), where start is the position of the answer's first character in the paragraph and end is the position of its last character. Natural language understanding is performed on the question and the corresponding short text, and the answer to the question is predicted from the text. The answer to an extractive reading comprehension task is a continuous string that appears in the text, i.e., the answer must be a range within the text. Machine reading comprehension has several modes, i.e., different question types lead to different answer types; in general there are three kinds of questions: simple questions that can be answered with simple facts, whose answer is usually an entity and is brief; slightly more complex narrative questions, whose answers are somewhat longer; and complex questions, usually about a viewpoint or opinion. Answer reordering means ranking all the answers obtained in answer extraction uniformly and finally selecting the highest-scoring one as the Answer to the given question Q.
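The text does not spell out how the span (start, end) is decoded from per-token scores; the following is one common decoding rule, sketched under our own assumptions (a length cap and the constraint start <= end):

```python
import torch

def decode_span(start_logits, end_logits, max_answer_len=30):
    # Score every (start, end) pair, keep start <= end and cap the span
    # length, then return the best-scoring pair of token indices.
    n = start_logits.size(0)
    scores = start_logits[:, None] + end_logits[None, :]
    valid = torch.triu(torch.ones(n, n, dtype=torch.bool))             # start <= end
    valid &= ~torch.triu(torch.ones(n, n, dtype=torch.bool),
                         diagonal=max_answer_len)                      # length cap
    scores = scores.masked_fill(~valid, float("-inf"))
    best = scores.argmax().item()
    return divmod(best, n)  # (start index, end index) within the paragraph

start, end = decode_span(torch.randn(128), torch.randn(128))
```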
In this embodiment, the answer extraction model and the answer reordering model may each be a GBDT model, a linear model, or a deep model, which is not limited herein.
It should be noted that, the implementation environment of the multi-document reading and understanding method provided by the embodiment of the present application may be the same as or different from the implementation environment of the training methods of the answer extraction model and the answer reordering model, which is not limited in the embodiment of the present application.
With reference to the foregoing description, the multi-document reading and understanding method of the present application is described below. Referring to fig. 6, taking the terminal device as the execution body and an answer extraction model and answer reordering model trained with the training architecture shown in fig. 5a as an example, one embodiment of the multi-document reading and understanding method in the embodiment of the application includes:
601. Acquire the target question and a plurality of reply documents corresponding to the target question, and establish an answer extraction model and an answer reordering model.
The terminal equipment acquires the target question input by the user through its input device, and then obtains a plurality of reply documents corresponding to the target question through article retrieval. Meanwhile, the terminal equipment may itself be provided with the answer extraction model and the answer reordering model, or be connected to a server on which they are deployed. In this embodiment, the answer extraction model and the answer reordering model may be obtained by training with the training method shown in fig. 4a or fig. 4b.
In this embodiment, article retrieval may also be referred to as text recall, and may specifically proceed as follows: after the user inputs the target question through the input interface of the terminal, the target question is sent to the server. Upon receiving the target question, the server performs recall processing over the texts to be recalled (i.e., candidate reply documents) to obtain at least one corresponding reply document. A reply document may be text in a knowledge base (a database stored in the server in advance by the user), or text outside the knowledge base, such as recently reported news on a web page or text from a public account. In this process, after receiving the target question, the server can perform word segmentation on it to obtain its keywords and determine the weight of each keyword. With the keywords obtained, the server can then determine the relatedness between each keyword and each reply document. Having obtained the keyword weights and the keyword-document relatedness, the server can compute their weighted sum to obtain the relevance score between the target question and each reply document. The reply documents are then sorted in descending order of relevance score, and the first N documents are taken as the reply documents for subsequent processing.
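A minimal sketch of this recall-and-rank step, with the keyword weighting and the keyword-document relatedness function left abstract (the text does not fix either, so both are supplied as parameters here):

```python
def relevance_score(keyword_weights, doc, relatedness_fn):
    # Weighted sum of each keyword's weight and its relatedness to the document.
    return sum(w * relatedness_fn(k, doc) for k, w in keyword_weights.items())

def recall_top_n(keyword_weights, candidate_docs, relatedness_fn, n=6):
    scored = sorted(candidate_docs,
                    key=lambda d: relevance_score(keyword_weights, d, relatedness_fn),
                    reverse=True)          # descending relevance
    return scored[:n]                      # the first N reply documents
```

For instance, relatedness_fn could be a TF-IDF or BM25-style statistic; the weighted sum mirrors the relevance score described above.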
602. Call the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question.
The terminal device inputs the target question and the plurality of reply documents into the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question; i.e., one answer is extracted from each reply document. In an exemplary scenario, if the target question is "Who launched the Chenqiao Mutiny in history?", 6 reply documents can be obtained through article retrieval, each yielding one answer, and the predicted answer set may be as follows: four reply documents yield the answer "Zhao Kuangyin", one reply document yields "Zhao Guangyi", and one reply document yields "Chai Zongxun".
In this embodiment, the terminal device also needs to perform data preprocessing before inputting the target question and the reply documents into the answer extraction model. First, the target question and the reply documents are tokenized, i.e., each sentence is converted into a character-level sequence. The target question sequence is then concatenated with each reply document sequence, separated by [SEP], with [CLS] added at the beginning, forming "[CLS] target question [SEP] reply document [SEP]"; the concatenated sequence is then packed. After preprocessing, when the length of the question-plus-document sequence exceeds the maximum sequence length specified by the BERT network in the answer extraction model, the reply document is split into several segments with a certain step length, each segment concatenated with the question; overlapping parts of a certain length are kept between adjacent segments so that splitting cuts the semantics of the complete document as little as possible. Suppose the target question is "Who launched the Chenqiao Mutiny in history?", reply document 1 is "Song Taizu Zhao Kuangyin, who established the Northern Song after launching the Chenqiao Mutiny", and reply document 2 is "The Chenqiao Mutiny is the name of the event in which the founding emperor of the Northern Song launched a coup". The word sequences obtained after data preprocessing are then: for the target question and reply document 1, "[CLS] Who launched the Chenqiao Mutiny in history? [SEP] Song Taizu Zhao Kuangyin, who established the Northern Song after launching the Chenqiao Mutiny [SEP]"; for the target question and reply document 2, "[CLS] Who launched the Chenqiao Mutiny in history? [SEP] The Chenqiao Mutiny is the name of the event in which the founding emperor of the Northern Song launched a coup [SEP]" (in the original Chinese, each individual character is one token). After the target question and the reply documents are processed into word sequences, the terminal equipment inputs the word sequences into the coding layer of the answer extraction model to obtain the output vector corresponding to each character of the target question and the reply documents, and answer prediction is performed on the output vectors to obtain the predicted answer set. It will be appreciated that the start word of a predicted answer is the token corresponding to its first character in the word sequence (the start token), and the end word is the token corresponding to its last character (the end token). In an exemplary scheme, the predicted answer obtained from reply document 1 is "Zhao Kuang Yin"; its start word is the token [Zhao] and its end word is the token [Yin].
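A sketch of this preprocessing, assuming Python; max_len and the step length are arbitrary illustrative values (the text only ties them to the BERT network's maximum sequence length):

```python
def build_windows(question, document, max_len=512, stride=128):
    # Character-level tokenization, then "[CLS] question [SEP] chunk [SEP]";
    # long documents are split into overlapping windows.
    q, d = list(question), list(document)
    budget = max_len - len(q) - 3            # room left for document characters
    assert budget > 0, "question too long for the window size"
    windows, start = [], 0
    while True:
        chunk = d[start:start + budget]
        windows.append(["[CLS]"] + q + ["[SEP]"] + chunk + ["[SEP]"])
        if start + budget >= len(d):         # whole document covered
            break
        start += stride                      # overlap preserves span semantics
    return windows
```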
In this embodiment, the answer extraction model may use a self-attention mechanism to separately learn the start position and end position of the predicted answer, thereby extracting the predicted answer. It will be appreciated that the answer extraction model may also extract the predicted answer in other ways, which is not limited herein.
603. Classify the predicted answer set by identical predicted answers to obtain at least one answer set.
After acquiring the predicted answer set corresponding to the target question, the terminal equipment classifies it by identical predicted answers to obtain at least one answer set. In the exemplary scenario of step 602, the predicted answer set contains 6 predicted answers, 4 of which are "Zhao Kuangyin"; those four predicted answers are assigned to one answer set, while "Zhao Guangyi" is assigned to another answer set and "Chai Zongxun" to a third.
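Step 603 is a plain grouping operation; a sketch with the example data above (the document ids are invented for illustration):

```python
from collections import defaultdict

# (reply document id, predicted answer) pairs from step 602's example.
predicted = [(1, "Zhao Kuangyin"), (2, "Zhao Kuangyin"), (3, "Zhao Guangyi"),
             (4, "Chai Zongxun"), (5, "Zhao Kuangyin"), (6, "Zhao Kuangyin")]

answer_sets = defaultdict(list)
for doc_id, answer in predicted:
    answer_sets[answer].append(doc_id)
# -> three answer sets: {"Zhao Kuangyin": [1, 2, 5, 6],
#    "Zhao Guangyi": [3], "Chai Zongxun": [4]}
```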
604. Invoke the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and the answer characterization in the at least one answer characterization set is the characterization vector of that predicted answer.
The terminal equipment invokes the answer reordering model to acquire a first output vector set of each answer document in the target question and a first answer document set, wherein the first answer document set comprises answer documents corresponding to a first answer set in the at least one answer set; selecting a character string of the predicted answer corresponding to the first answer document set from the first output vector set, and pooling to obtain an intermediate answer characterization set of each predicted answer in the first answer set; performing self-attention processing on the intermediate answer characterization set to obtain a first answer characterization of a predicted answer corresponding to the first answer set; and so on, traversing to obtain answer characterizations of the predicted answers corresponding to each answer set in the at least one answer set, and classifying the answer characterizations into the at least one answer characterizations set, wherein the first answer characterizations are contained in the at least one answer characterizations set.
Specifically, when obtaining the first output vector set, the terminal device may use the data preprocessing method described under step 602 above, i.e., splice each character of the target question with each character of each reply document in the first answer document set, using the start character and the separator character, to obtain the first word sequence set; the first word sequence set is input into the answer reordering model to obtain the first output vector set. The terminal device then selects the character strings of the predicted answer corresponding to the first answer document set from the first output vector set and pools them to obtain the intermediate answer characterization set of each predicted answer in the first answer set. Next, self-attention processing is performed on each intermediate answer characterization in the intermediate answer characterization set to obtain a self-attention prediction score set; the self-attention prediction score set is normalized to obtain a normalized prediction score set; and the normalized prediction scores are weighted and summed to obtain the first answer characterization.
In an exemplary scenario, assume the answer corresponding to the first answer set is "Zhao Kuangyin" and the corresponding reply documents are reply document 1 and reply document 2, with the target question "Who launched the Chenqiao Mutiny in history?", reply document 1 "Song Taizu Zhao Kuangyin, who established the Northern Song after launching the Chenqiao Mutiny", and reply document 2 "The Chenqiao Mutiny is the name of the event in which the founding emperor of the Northern Song launched a coup". The word sequences obtained after data preprocessing are, as in step 602, "[CLS] target question [SEP] reply document 1 [SEP]" and "[CLS] target question [SEP] reply document 2 [SEP]", tokenized character by character. If the answer corresponding to the second answer set is "Zhao Guangyi", with corresponding reply document 3 "Zhao Guangyi and his brother established the Northern Song after launching the Chenqiao Mutiny", the word sequence of the target question and reply document 3 is "[CLS] target question [SEP] reply document 3 [SEP]". If the answer corresponding to the third answer set is "Chai Zongxun", with corresponding reply document 4 "Chai Zongxun was the monarch whose state fell after the Chenqiao Mutiny", the word sequence of the target question and reply document 4 is "[CLS] target question [SEP] reply document 4 [SEP]". These word sequences are then input into the coding layer of the answer reordering model to obtain the output vector set of each word sequence. In this embodiment, to make it convenient for the terminal device to locate the predicted answer within the output vector set, the predicted answer extracted by the answer extraction model may be labeled during data preprocessing. Taking the first answer set ("Zhao Kuangyin") as the example, the terminal device intercepts the "Zhao Kuangyin" span in each word sequence of the first answer set and performs pooling to obtain the intermediate answer characterization of the answer corresponding to the first answer set.
In this embodiment, the pooling process may use formula 1:
The formula 1 is: V_i = avg_pooling(V_t, V_{t+1}, …, V_{t+m}); where V_t, V_{t+1}, …, V_{t+m} denotes the output vector sequence corresponding to the predicted answer in a given reply document, for example the output vectors (V_t, V_{t+1}, V_{t+2}) corresponding to "Zhao Kuangyin" in reply document 1, and V_i denotes the intermediate answer characterization for the i-th reply document. For the first answer set, the terminal device obtains two intermediate answer characterizations: the characterization V_1 corresponding to "Zhao Kuangyin" in reply document 1, and the characterization V_2 corresponding to "Zhao Kuangyin" in reply document 2.
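As a concrete illustration of formula 1, the following is a minimal sketch assuming PyTorch and an already-computed encoder output; the tensor shapes, span positions, and helper name are illustrative assumptions, not part of the embodiment.

```python
import torch

def intermediate_answer_characterization(outputs: torch.Tensor,
                                         start: int, end: int) -> torch.Tensor:
    """Average-pool the encoder output vectors covering the predicted
    answer span: V_i = avg_pooling(V_t, ..., V_{t+m})."""
    return outputs[start:end + 1].mean(dim=0)

# Toy example: a 64-token word sequence encoded into 768-dim vectors,
# with the predicted answer occupying positions 12..14.
encoder_out = torch.randn(64, 768)
v_1 = intermediate_answer_characterization(encoder_out, start=12, end=14)
```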
After obtaining the intermediate answer characterizations, the terminal device uses a self-attention processing mechanism to obtain the answer characterization of the predicted answer corresponding to the first answer set. In this embodiment, the self-attention processing mechanism adopted by the terminal device may use formulas 2 to 4:
The formula 2 is: S_i = V^T · tanh(W·V_i + B);
The formula 3 is: a_i = exp(S_i) / (exp(S_1) + exp(S_2) + … + exp(S_k)); where k is the number of intermediate answer characterizations;
The formula 4 is: V_answer = a_1·V_1 + a_2·V_2 + … + a_k·V_k;
Wherein, S_i in formula 2 denotes the self-attention score corresponding to each intermediate answer characterization, V^T denotes the transpose of the parameter vector V, W is a parameter matrix, and B is a parameter vector. In formula 3, a_i denotes the normalized self-attention score obtained by normalizing the self-attention score of the i-th intermediate answer characterization. V_answer in formula 4 denotes the answer characterization of the predicted answer corresponding to the answer set. Continuing the exemplary scenario above: the self-attention score corresponding to "Zhao Kuangyin" in reply document 1 is 80, and the self-attention score corresponding to "Zhao Kuangyin" in reply document 2 is also 80; after normalization, each score becomes 0.5, so the answer characterization of "Zhao Kuangyin" for the first answer set is 0.5·V_1 + 0.5·V_2.
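The self-attention of formulas 2 to 4 can be sketched as follows. This is a hedged illustration in PyTorch; the layer sizes and the module name are assumptions, not names fixed by the embodiment.

```python
import torch

class AnswerSelfAttention(torch.nn.Module):
    """Additive self-attention over intermediate answer characterizations."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.W = torch.nn.Linear(hidden, hidden)         # parameter matrix W and bias B
        self.v = torch.nn.Linear(hidden, 1, bias=False)  # plays the role of V^T

    def forward(self, reps: torch.Tensor) -> torch.Tensor:
        """reps: (k, hidden) stacked characterizations V_1..V_k."""
        scores = self.v(torch.tanh(self.W(reps))).squeeze(-1)  # formula 2: S_i
        weights = torch.softmax(scores, dim=0)                 # formula 3: a_i
        return (weights.unsqueeze(-1) * reps).sum(dim=0)       # formula 4: V_answer

attn = AnswerSelfAttention()
v_answer = attn(torch.randn(2, 768))  # e.g. V_1 and V_2 for "Zhao Kuangyin"
```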
The terminal device performs the same processing on the other answer sets, thereby obtaining the answer characterization set corresponding to the target question.
605. And calling the answer reordering model to obtain the prediction scores of the prediction answers corresponding to the answer characterization set.
The terminal device inputs each answer characterization in the answer characterization set into the answer reordering model to obtain the prediction scores of the corresponding predicted answers.
In this embodiment, the terminal device may calculate the prediction score from the answer characterization using the following formula 5:
The formula 5 is: logits = W·V_answer; wherein logits is used to indicate the prediction score of the predicted answer corresponding to the answer characterization.
It can be understood that, in the process of training the answer reordering model, the terminal device may further obtain a corresponding loss value from the predicted answers, specifically using the following formula 6 and formula 7:
The formula 6 is: Prob = softmax(logits);
The formula 7 is: Loss = -log(Prob_label);
Wherein, formula 6 indicates that the prediction scores are converted into probability values through an activation function, and formula 7 indicates that cross-entropy processing is performed on these probability values to obtain the loss value of the answer reordering model; Prob_label is the probability value corresponding to the answer labeled 1, the labels taking the values 0 or 1.
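Formulas 5 to 7 together amount to a linear scoring layer followed by softmax and cross entropy at training time. The sketch below assumes PyTorch; the candidate count, hidden size, and gold-label index are illustrative assumptions.

```python
import torch

hidden = 768
score_layer = torch.nn.Linear(hidden, 1, bias=False)  # parameter matrix W of formula 5

answer_reps = torch.randn(3, hidden)                   # V_answer for 3 candidate answers
logits = score_layer(answer_reps).squeeze(-1)          # formula 5: logits = W·V_answer

probs = torch.softmax(logits, dim=0)                   # formula 6: Prob = softmax(logits)
gold = torch.tensor(0)                                 # index of the answer labeled 1
loss = -torch.log(probs[gold])                         # formula 7: Loss = -log(Prob_label)
```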
606. And outputting a final answer corresponding to the target question according to the prediction score.
The terminal device selects the answer with the highest prediction score and outputs it as the final answer corresponding to the target question.
In this embodiment, the predicted answers extracted from the multiple answer documents are classified, and semantic features between the answer documents from which the same answer was extracted are acquired, so that the ordering effect of the answer reordering model is improved, and the accuracy of multi-document reading and understanding is improved.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating an embodiment of a multi-document reading and understanding apparatus according to an embodiment of the present application, the multi-document reading and understanding apparatus 20 includes:
An obtaining module 201, configured to obtain a target question and a plurality of reply documents corresponding to the target question, and to establish an answer extraction model and an answer reordering model;
a processing module 202, configured to invoke the answer extraction model to obtain a predicted answer set of the plurality of reply documents for the target question; classify the predicted answer set according to the same predicted answer to obtain at least one answer set; invoke the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, wherein each answer set in the at least one answer set corresponds to one predicted answer, and the answer characterizations in the at least one answer characterization set are characterization vectors of the predicted answers; and invoke the answer reordering model to obtain the prediction scores of the predicted answers corresponding to the answer characterization set;
and the output module 203 is configured to output a final answer corresponding to the target question according to the prediction score.
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, the predicted answers extracted from the plurality of answer documents are classified, and semantic features between the answer documents from which the same answer is extracted are acquired, so that the ordering effect of the answer reordering model is improved, and the accuracy of multi-document reading and understanding is improved.
Alternatively, in another embodiment of the multiple document reading and understanding apparatus 20 provided in the embodiment of the present application based on the embodiment corresponding to fig. 7 described above,
The processing module 202 is specifically configured to invoke the answer reordering model to obtain a first output vector set of the target question and each answer document in a first answer document set, where the first answer document set includes answer documents corresponding to a first answer set in the at least one answer set;
Selecting a character string of the predicted answer corresponding to the first answer document set from the first output vector set, and pooling to obtain an intermediate answer characterization set of each predicted answer in the first answer set;
performing self-attention processing on the intermediate answer characterization set to obtain a first answer characterization of a predicted answer corresponding to the first answer set;
and so on, traversing to obtain answer characterizations of the predicted answers corresponding to each answer set in the at least one answer set, and classifying the answer characterizations into the at least one answer characterization set, wherein the first answer characterization is contained in the at least one answer characterization set.
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, identical predicted answers are grouped into one answer set, the answer span in each word sequence of the answer documents in the same set is pooled, and the pooled characterization vectors undergo self-attention processing to obtain the final answer characterization of that answer; in this way the contexts of the different answer documents are all incorporated, the final answer characterization carries rich semantic features, and the accuracy of multi-document reading and understanding is improved.
Optionally, based on the embodiment corresponding to fig. 7 described above, in another embodiment of the multi-document reading and understanding device 20 provided in the embodiment of the present application, the processing module 202 is specifically configured to splice each character of the target question with each character of each reply document in the first reply document set, using the start character and the interval character, to obtain a first word sequence set;
the first word sequence set is input into the answer reordering model to obtain the first output vector set.
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, the representation of the output vector of each character can be improved, so that the accuracy of reading and understanding of multiple documents is improved.
Optionally, in another embodiment of the multi-document reading and understanding device 20 according to the embodiment of fig. 7, the processing module 202 is specifically configured to perform self-attention processing on each intermediate answer token in the intermediate answer token set to obtain a self-attention prediction score set;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
And carrying out weighted summation on each normalized prediction score in the normalized prediction score set to obtain the first answer representation.
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, the intermediate answer characterizations are weighted by the normalized prediction scores and summed, so that the context of each answer document is incorporated, the final answer characterization carries rich semantic features, and the accuracy of multi-document reading and understanding is improved.
Alternatively, in another embodiment of the multiple document reading and understanding apparatus 20 provided in the embodiment of the present application based on the embodiment corresponding to fig. 7 described above,
The acquisition module is also used for acquiring a training sample set and establishing an initial answer extraction model and an initial answer reordering model, wherein the training sample set comprises a question sample set and an answer document sample set corresponding to the question sample set;
As shown in fig. 8, the apparatus further includes a training module 204 for training the initial answer extraction model and the initial answer reordering model by using the training sample set, to obtain the answer extraction model and the answer reordering model.
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, model training is carried out on the training sample set, and during training the predicted answers extracted from the plurality of answer documents are classified, so that semantic features among the answer documents of the same answer are obtained, the ordering effect of the reordering network is improved, and the accuracy of multi-document reading and understanding is improved.
Optionally, based on the embodiment corresponding to fig. 8, in another embodiment of the multi-document reading and understanding device 20 provided by the embodiment of the present application, the initial answer extraction model and the initial answer reordering model share the same coding layer, and the training module 204 is specifically configured to invoke the coding layer to obtain a second output vector corresponding to the training sample set;
Inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
Inputting the training answer set into the initial answer reordering model to obtain a second loss value;
And reversely adjusting the weight parameters of the initial answer extraction model and the initial answer reordering model by using the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
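A hedged sketch of this joint training step is shown below; encoder, extraction_head, reordering_head, optimizer, and the batch fields are placeholders for whatever concrete modules an implementation uses, not names fixed by this embodiment.

```python
def joint_training_step(encoder, extraction_head, reordering_head,
                        optimizer, batch):
    """One joint update: both heads consume the shared coding layer's output,
    and the summed loss adjusts all weight parameters by back propagation."""
    encoder_out = encoder(batch["word_sequences"])       # second output vector
    loss1, train_answers = extraction_head(encoder_out, batch["labels"])
    loss2 = reordering_head(encoder_out, train_answers, batch["labels"])
    total = loss1 + loss2          # sum of the first and second loss values
    optimizer.zero_grad()
    total.backward()               # reversely adjust both models' weights
    optimizer.step()
    return total.item()
```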
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, the answer extraction model and the answer reordering model share the coding layer, so that the two models can be trained jointly, improving the effect of both the answer extraction model and the answer reordering model.
Optionally, based on the embodiment corresponding to fig. 8, in another embodiment of the multi-document reading and understanding device 20 provided in the embodiment of the present application, the training module 204 is specifically configured to input the output vector into the initial answer extraction model to obtain the training answer set;
acquiring, from the output vector, the prediction score of a start word and the prediction score of an end word corresponding to each training answer in the training answer set;
converting the prediction score of the start word into a first probability value by using an activation function, and converting the prediction score of the end word into a second probability value by using the activation function;
obtaining the first loss value according to the first probability value and the second probability value;
The training module 204 is specifically configured to input the training answer set into the initial answer reordering model to obtain a training answer characterization set corresponding to the training answer set;
Acquiring a training prediction score set corresponding to each training answer in the training answer characterization set;
converting each training prediction score in the training prediction score set into a third probability value using an activation function;
And obtaining the second loss value by using the cross entropy and the third probability value.
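The start-word/end-word steps above, which yield the first loss value, can be sketched as follows, assuming PyTorch, softmax as the activation function, and illustrative shapes and span positions.

```python
import torch

seq_len = 64
start_scores = torch.randn(seq_len)   # prediction scores for the start word
end_scores = torch.randn(seq_len)     # prediction scores for the end word

start_probs = torch.softmax(start_scores, dim=0)   # first probability value
end_probs = torch.softmax(end_scores, dim=0)       # second probability value

gold_start, gold_end = 12, 14         # labeled answer span
loss1 = -(torch.log(start_probs[gold_start]) + torch.log(end_probs[gold_end]))
```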
The embodiment of the application provides a multi-document reading and understanding device. By adopting the device, the answer extraction model and the answer reordering model share the coding layer, and meanwhile, the effects of the answer extraction model and the answer reordering model are improved.
Referring to fig. 9, fig. 9 is a schematic diagram of a server structure according to an embodiment of the present application. The server 300 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may be transitory or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 9.
The multi-document reading and understanding device provided by the application can be applied to a terminal device. Referring to fig. 10, for convenience of explanation only the parts relevant to the embodiment of the application are shown; for specific technical details that are not disclosed, please refer to the method parts of the embodiments of the application. In the embodiments of the application, the terminal device is described taking a smartphone as an example:
Fig. 10 is a block diagram showing a part of the structure of a smartphone related to the terminal device provided by an embodiment of the present application. Referring to fig. 10, the smartphone includes: radio frequency (RF) circuitry 410, a memory 420, an input unit 430, a display unit 440, a sensor 450, audio circuitry 460, a wireless fidelity (WiFi) module 470, a processor 480, and a power supply 490. Those skilled in the art will appreciate that the smartphone structure shown in fig. 10 does not constitute a limitation on the smartphone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The following describes each component of the smart phone in detail with reference to fig. 10:
The RF circuit 410 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, after receiving downlink information from the base station, it passes the information to the processor 480 for processing, and it sends uplink data to the base station. In general, the RF circuitry 410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 410 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short message service (SMS), and the like.
The memory 420 may be used to store software programs and modules, and the processor 480 may perform various functional applications and data processing of the smartphone by executing the software programs and modules stored in the memory 420. The memory 420 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebooks, etc.) created according to the use of the smart phone, etc. In addition, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the smartphone. In particular, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also referred to as a touch screen, may collect touch operations on or near it by a user (e.g., operations of the user on or near the touch panel 431 using any suitable object or accessory such as a finger or a stylus) and drive the corresponding connection device according to a predetermined program. Optionally, the touch panel 431 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 480; it can also receive commands from the processor 480 and execute them. In addition, the touch panel 431 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 431, the input unit 430 may include other input devices 432, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick.
The display unit 440 may be used to display information input by the user or information provided to the user, as well as various menus of the smartphone. The display unit 440 may include a display panel 441; optionally, the display panel 441 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 431 may cover the display panel 441; when the touch panel 431 detects a touch operation on or near it, the operation is transmitted to the processor 480 to determine the type of the touch event, and the processor 480 then provides a corresponding visual output on the display panel 441 according to the type of the touch event. Although in fig. 10 the touch panel 431 and the display panel 441 are two separate components implementing the input and output functions of the smartphone, in some embodiments the touch panel 431 and the display panel 441 may be integrated to implement the input and output functions of the smartphone.
The smartphone may also include at least one sensor 450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 441 according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel 441 and/or the backlight when the smartphone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the attitude of the smartphone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer and tapping); other sensors that may also be configured on the smartphone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail here.
The audio circuit 460, speaker 461, and microphone 462 provide an audio interface between the user and the smartphone. The audio circuit 460 may transmit the electrical signal converted from received audio data to the speaker 461, which converts it into a sound signal for output; conversely, the microphone 462 converts collected sound signals into electrical signals, which the audio circuit 460 receives and converts into audio data. The audio data is output to the processor 480 for processing and then sent via the RF circuit 410 to, for example, another smartphone, or output to the memory 420 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 470, the smartphone can help the user send and receive e-mails, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although fig. 10 shows the WiFi module 470, it can be understood that it is not an essential part of the smartphone and may be omitted as needed without changing the essence of the invention.
The processor 480 is the control center of the smartphone; it connects the various parts of the entire smartphone using various interfaces and lines, and performs the various functions and data processing of the smartphone by running or executing software programs and/or modules stored in the memory 420 and invoking data stored in the memory 420, thereby monitoring the smartphone as a whole. Optionally, the processor 480 may include one or more processing units; optionally, the processor 480 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, and the like, with a modem processor, which primarily handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 480.
The smart phone also includes a power supply 490 (e.g., a battery) for powering the various components, optionally in logical communication with the processor 480 through a power management system that performs functions such as managing charge, discharge, and power consumption.
Although not shown, the smart phone may further include a camera, a bluetooth module, etc., which will not be described herein.
The steps performed by the terminal device in the above-described embodiments may be based on the terminal device structure shown in fig. 10.
Embodiments of the present application also provide a computer-readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the method as described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product comprising a program which, when run on a computer, causes the computer to perform the method described in the previous embodiments.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. A multiple document reading understanding method, comprising:
Acquiring a target question, a plurality of reply documents corresponding to the target question, and establishing an answer extraction model and an answer reordering model;
invoking the answer extraction model to obtain a predicted answer set of the plurality of answer documents for the target question;
classifying the predicted answer sets according to the same predicted answer to obtain at least one answer set;
Invoking the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, including: invoking the answer reordering model to acquire a first output vector set of the target question and each answer document in a first answer document set, wherein the first answer document set comprises answer documents corresponding to a first answer set in the at least one answer set; selecting the character strings of the predicted answers corresponding to the first answer document set from the first output vector set, and pooling to obtain an intermediate answer characterization set of each predicted answer in the first answer set; performing self-attention processing on the intermediate answer characterization set to obtain a first answer characterization of a predicted answer corresponding to the first answer set; and so on, traversing to obtain answer characterizations of predicted answers corresponding to each answer set in the at least one answer set, and classifying the answer characterizations into the at least one answer characterization set, wherein the first answer characterization is contained in the at least one answer characterization set, each answer set in the at least one answer set corresponds to one predicted answer, and the answer characterizations in the at least one answer characterization set are characterization vectors of the predicted answers;
Invoking the answer reordering model to obtain the prediction scores of the prediction answers corresponding to the answer characterization set;
And outputting a final answer corresponding to the target question according to the prediction score.
2. The method of claim 1, wherein invoking the answer reordering model to acquire the first output vector set of the target question and each reply document in the first reply document set comprises:
Splicing each character of the target question with each character of each reply document in the first reply document set by using the start character and the interval character to obtain a first word sequence set;
and inputting the first word sequence set into the answer reordering model to obtain the first output vector set.
3. The method of claim 1, wherein the self-attentive processing of the set of intermediate answer tokens to obtain a first answer token of a predicted answer corresponding to the first answer set comprises:
Performing self-attention processing on each intermediate answer token in the intermediate answer token set to obtain a self-attention prediction score set;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
and carrying out weighted summation on each normalized prediction score in the normalized prediction score set to obtain the first answer representation.
4. A method according to any one of claims 1 to 3, further comprising:
acquiring a training sample set, and establishing an initial answer extraction model and an initial answer reordering model, wherein the training sample set comprises a question sample set and an answer document sample set corresponding to the question sample set;
And training the initial answer extraction model and the initial answer reordering model by using the training sample set to obtain the answer extraction model and the answer reordering model.
5. The method of claim 4, wherein the initial answer extraction model and the initial answer reordering model share a same coding layer, wherein training the initial answer extraction model and the initial answer reordering model using the training sample set to obtain the answer extraction model and the answer reordering model comprises:
Invoking the coding layer to acquire a second output vector corresponding to the training sample set;
inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
Inputting the training answer set into the initial answer reordering model to obtain a second loss value;
And reversely adjusting weight parameters of the initial answer extraction model and the initial answer reordering model by using the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
6. The method of claim 5, wherein inputting the output vector into the initial answer extraction model to obtain a first loss value and a training answer set comprises:
inputting the output vector into the initial answer extraction model to obtain the training answer set;
acquiring, from the output vector, the prediction score of a start word and the prediction score of an end word corresponding to each training answer in the training answer set;
Converting the prediction score of the start word into a first probability value by using an activation function, and converting the prediction score of the end word into a second probability value by using the activation function;
Obtaining the first loss value according to the first probability value and the second probability value;
The step of inputting the training answer set into the initial answer reordering model to obtain a second loss value includes:
Inputting the training answer set into the initial answer reordering model to obtain a training answer characterization set corresponding to the training answer set;
acquiring a training prediction score set corresponding to each training answer in the training answer characterization set;
converting each training prediction score in the training prediction score set to a third probability value using an activation function;
and obtaining the second loss value by using the cross entropy and the third probability value.
7. A multiple document reading and understanding device, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a target question, a plurality of reply documents corresponding to the target question, an answer extraction model and an answer reordering model;
the processing module is used for calling the answer extraction model to obtain a predicted answer set of the plurality of answer documents for the target question; classifying the predicted answer set according to the same predicted answer to obtain at least one answer set; invoking the answer reordering model to obtain at least one answer characterization set corresponding to the at least one answer set, including: invoking the answer reordering model to acquire a first output vector set of the target question and each answer document in a first answer document set, wherein the first answer document set comprises answer documents corresponding to a first answer set in the at least one answer set; selecting the character strings of the predicted answers corresponding to the first answer document set from the first output vector set, and pooling to obtain an intermediate answer characterization set of each predicted answer in the first answer set; performing self-attention processing on the intermediate answer characterization set to obtain a first answer characterization of a predicted answer corresponding to the first answer set; and so on, traversing to obtain answer characterizations of predicted answers corresponding to each answer set in the at least one answer set, and classifying the answer characterizations into the at least one answer characterization set, wherein the first answer characterization is contained in the at least one answer characterization set, each answer set in the at least one answer set corresponds to one predicted answer, and the answer characterizations in the at least one answer characterization set are characterization vectors of the predicted answers; and invoking the answer reordering model to obtain the prediction scores of the predicted answers corresponding to the answer characterization set;
and the output module is used for outputting a final answer corresponding to the target question according to the prediction score.
8. The apparatus according to claim 7, wherein the processing module is specifically configured to:
Splicing each character of the target question with each character of each reply document in the first reply document set by using the start character and the interval character to obtain a first word sequence set;
and inputting the first word sequence set into the answer reordering model to obtain the first output vector set.
9. The apparatus according to claim 7, wherein the processing module is specifically configured to:
Performing self-attention processing on each intermediate answer token in the intermediate answer token set to obtain a self-attention prediction score set;
normalizing the self-attention prediction score set to obtain a normalized prediction score set;
and carrying out weighted summation on each normalized prediction score in the normalized prediction score set to obtain the first answer representation.
10. The apparatus according to any one of claims 7 to 9, further comprising: a training module;
The acquisition module is further used for acquiring a training sample set and establishing an initial answer extraction model and an initial answer reordering model, wherein the training sample set comprises a question sample set and an answer document sample set corresponding to the question sample set;
The training module is configured to train the initial answer extraction model and the initial answer reordering model to obtain the answer extraction model and the answer reordering model by using the training sample set.
11. The apparatus of claim 10, wherein the initial answer extraction model and the initial answer reordering model share the same coding layer, the training module being specifically configured to:
Invoking the coding layer to acquire a second output vector corresponding to the training sample set;
inputting the second output vector into the initial answer extraction model to obtain a first loss value and a training answer set;
Inputting the training answer set into the initial answer reordering model to obtain a second loss value;
And reversely adjusting weight parameters of the initial answer extraction model and the initial answer reordering model by using the sum of the first loss value and the second loss value to obtain the answer extraction model and the answer reordering model.
12. The apparatus according to claim 11, wherein the training module is specifically configured to:
inputting the output vector into the initial answer extraction model to obtain the training answer set;
acquiring, from the output vector, the prediction score of a start word and the prediction score of an end word corresponding to each training answer in the training answer set;
Converting the prediction score of the start word into a first probability value by using an activation function, and converting the prediction score of the end word into a second probability value by using the activation function;
Obtaining the first loss value according to the first probability value and the second probability value;
The training module is specifically further configured to:
Inputting the training answer set into the initial answer reordering model to obtain a training answer characterization set corresponding to the training answer set;
acquiring a training prediction score set corresponding to each training answer in the training answer characterization set;
converting each training prediction score in the training prediction score set to a third probability value using an activation function;
and obtaining the second loss value by using the cross entropy and the third probability value.
13. A computer device, comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor being for executing a program in the memory, the processor being for executing the method of any one of claims 1 to 6 according to instructions in program code;
The bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
14. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 6.
15. A computer program product, the computer program product comprising computer instructions stored in a computer readable storage medium; a processor of a computer device reads the computer instructions from the computer readable storage medium, the processor executing the computer instructions, causing the computer device to perform the method of any one of claims 1 to 6.
CN202211071561.1A 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium Active CN115455160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211071561.1A CN115455160B (en) 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211071561.1A CN115455160B (en) 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115455160A (en) 2022-12-09
CN115455160B (en) 2024-08-06

Family

ID=84301618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211071561.1A Active CN115455160B (en) 2022-09-02 2022-09-02 Multi-document reading and understanding method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115455160B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905768A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Data interaction method, device and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002773B2 (en) * 2010-09-24 2015-04-07 International Business Machines Corporation Decision-support application and system for problem solving using a question-answering system
EP2616974A4 (en) * 2010-09-24 2016-03-02 Ibm Lexical answer type confidence estimation and application
EP2622428A4 (en) * 2010-09-28 2017-01-04 International Business Machines Corporation Providing answers to questions using hypothesis pruning
JP5825676B2 (en) * 2012-02-23 2015-12-02 国立研究開発法人情報通信研究機構 Non-factoid question answering system and computer program
US9400956B2 (en) * 2014-11-05 2016-07-26 International Business Machines Corporation Answer interactions in a question-answering environment
US10387430B2 (en) * 2015-02-26 2019-08-20 International Business Machines Corporation Geometry-directed active question selection for question answering systems
CN110287298A (en) * 2019-05-30 2019-09-27 南京邮电大学 A kind of automatic question answering answer selection method based on question sentence theme
CN111460092B (en) * 2020-03-11 2022-11-29 中国电子科技集团公司第二十八研究所 Multi-document-based automatic complex problem solving method
CN112015760B (en) * 2020-10-20 2021-01-29 中国人民解放军国防科技大学 Automatic question-answering method and device based on candidate answer set reordering and storage medium
CN112464641B (en) * 2020-10-29 2023-01-03 平安科技(深圳)有限公司 BERT-based machine reading understanding method, device, equipment and storage medium
CN114428845A (en) * 2022-01-28 2022-05-03 广州华多网络科技有限公司 Intelligent customer service automatic response method and device, equipment, medium and product thereof
CN114757152A (en) * 2022-04-21 2022-07-15 焦点教育科技有限公司 Method for acquiring and printing wrong questions in teaching scene

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905768A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Data interaction method, device and storage medium

Also Published As

Publication number Publication date
CN115455160A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
US10832131B2 (en) Semantic similarity for machine learned job posting result ranking model
KR101988151B1 (en) Forecast user needs for specific contexts
CN111553162B (en) Intention recognition method and related device
CN111931501B (en) Text mining method based on artificial intelligence, related device and equipment
CN109947858B (en) Data processing method and device
CN115022098B (en) Artificial intelligence safety target range content recommendation method, device and storage medium
CN109710732A (en) Information query method, device, storage medium and electronic equipment
CN111597804A (en) Entity recognition model training method and related device
CN112749252A (en) Text matching method based on artificial intelligence and related device
CN113569572A (en) Text entity generation method, model training method and device
CN112307198B (en) Method and related device for determining abstract of single text
CN113822038B (en) Abstract generation method and related device
CN110781274A (en) Question-answer pair generation method and device
CN116975295B (en) Text classification method and device and related products
KR101955920B1 (en) Search method and apparatus using property language
CN113342944A (en) Corpus generalization method, apparatus, device and storage medium
CN112328783A (en) Abstract determining method and related device
CN115455160B (en) Multi-document reading and understanding method, device, equipment and storage medium
CN115168568B (en) Data content identification method, device and storage medium
CN115599903B (en) Object tag acquisition method and device, electronic equipment and storage medium
CN113763929A (en) Voice evaluation method and device, electronic equipment and storage medium
CN113569043A (en) Text category determination method and related device
CN113704482A (en) Template determination method and related device for knowledge graph
CN111581549B (en) Corpus collection method, device and storage medium based on artificial intelligence
CN111475732B (en) Information processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant