CN113792550A - Method and device for determining predicted answer and method and device for reading and understanding - Google Patents

Method and device for determining predicted answer and method and device for reading and understanding

Info

Publication number
CN113792550A
Authority
CN
China
Prior art keywords
word, graph network, feature vector, initial, sample
Prior art date
Legal status
Pending
Application number
CN202111110989.8A
Other languages
Chinese (zh)
Inventor
潘璋
李长亮
李小龙
Current Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202111110989.8A
Publication of CN113792550A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a method and a device for determining a predicted answer, and a method and a device for reading and understanding. The method for determining a predicted answer includes: converting the value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, where each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability of each dimension represents the probability that the predicted label of the corresponding word unit is each of at least one candidate label; determining the predicted label of the word unit corresponding to each dimension based on the at least one prediction probability of that dimension; and determining the predicted answer based on the predicted labels of the word units. Through sequence labeling, a predicted label can be determined for every word unit and the predicted answer can be assembled from those labels, so that when the model parameters are adjusted, the predicted labels of a correct predicted answer are pushed closer to the correct labels, which improves both the training efficiency and the prediction accuracy of the reading understanding model.

Description

Method and device for determining predicted answer and method and device for reading and understanding
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for determining a predicted answer, a method and an apparatus for reading and understanding, a computing device, and a computer-readable storage medium.
Background
Machine reading understanding is research dedicated to teaching machines to read human language and understand its meaning, and with the development of natural language processing technology it has become a popular direction in that field. The machine reading understanding task focuses on understanding a text and learning the relevant information from it, so that questions related to the text can be answered.
In the prior art, a machine is mainly taught to understand text by constructing a model to be trained and training it until a reading understanding model that meets the requirements is obtained, so that the reading understanding model can complete a reading understanding task as accurately as possible. Specifically, a sample question and a sample answer may be input into the model to be trained as a training sample, the model outputs a predicted answer, and the model is optimized according to the difference between the predicted answer and the sample answer, thereby obtaining the desired reading understanding model.
However, the above method only considers the correlation between questions and answers, which is a single, limited signal: the same question may apply to different texts and yield a different answer for each of them. In addition, the method determines the predicted answer directly from the sample question and the sample answer, treating them as one undifferentiated whole, so the predicted answer has low accuracy, which in turn increases the number of training iterations required. As a result, training a reading understanding model in this way is inefficient, and the trained reading understanding model may perform the reading understanding task with low accuracy.
Disclosure of Invention
In view of the above, the present disclosure provides a method for determining a predicted answer. The application also relates to a reading understanding model training method, a reading understanding method, a predicted answer determining device, a reading understanding model training device, a reading understanding device, a computing device and a computer readable storage medium, which are used for solving the technical defects in the prior art.
According to a first aspect of embodiments of the present application, there is provided a method for training a reading understanding model, including:
constructing an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers through a graph construction network layer of a reading understanding model;
inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain a predicted answer;
training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached.
According to a second aspect of embodiments of the present application, there is provided a reading understanding method including:
constructing an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer through a graph construction network layer of a reading understanding model, wherein the reading understanding model is trained by the method of the first aspect;
inputting the target text, the target question and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
and inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain an answer of the target question.
According to a third aspect of embodiments of the present application, there is provided a method for determining a predicted answer, including:
converting the value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label;
determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
and determining a predicted answer based on the predicted label of the word unit corresponding to each dimension.
According to a fourth aspect of embodiments of the present application, there is provided another reading and understanding method, including:
constructing an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer through a graph construction network layer of a reading understanding model, wherein the reading understanding model is obtained by training with the method of the first aspect or the third aspect;
inputting the target text, the target question and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determining a target hidden layer feature vector;
converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label;
determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
and determining an answer of the target question based on the label of the word unit corresponding to each dimension.
According to a fifth aspect of embodiments of the present application, there is provided an apparatus for determining a predicted answer, including:
the first conversion module is configured to convert a value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that a prediction label of the word unit corresponding to each dimension is at least one label;
a first determining module configured to determine a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
a second determining module configured to determine a predicted answer based on the predicted label of the word unit corresponding to each dimension.
According to a sixth aspect of embodiments of the present application, there is provided a reading and understanding apparatus, comprising:
a graph network construction module configured to construct an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer through a graph construction network layer of a reading understanding model, wherein the reading understanding model is trained by the method of the first aspect or the third aspect;
a text processing module configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a third determination module configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine a target hidden layer feature vector;
a second conversion module configured to convert a value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one probability corresponding to each dimension represents a probability that a label of the word unit corresponding to each dimension is at least one label;
a fourth determining module configured to determine a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
a fifth determining module configured to determine an answer to the target question based on the label of the word unit corresponding to each dimension.
According to a seventh aspect of embodiments of the present application, there is provided a training apparatus for reading an understanding model, including:
a first graph network construction module configured to construct, through a graph construction network layer of a reading understanding model, an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers;
a first text processing module configured to input the sample text segment, the sample question, and the sample answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
the prediction module is configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain a predicted answer;
a training module configured to train the reading understanding model based on a difference between the predicted answer and the sample answer until a training stop condition is reached.
According to an eighth aspect of embodiments of the present application, there is provided a reading and understanding apparatus including:
a second graph network construction module configured to construct an initial first graph network of the target text and the target answer and an initial second graph network of the target question and the target answer through a graph construction network layer of the reading understanding model;
a second text processing module configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a determination module configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine an answer to the target question.
According to a ninth aspect of embodiments of the present application, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the method of the first, second, third or fourth aspect when executing the instructions.
According to a tenth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method of the first, second, third or fourth aspect described above.
According to an eleventh aspect of embodiments of the present application, there is provided a chip storing computer instructions that, when executed by the chip, implement the steps of the method according to the first, second, third or fourth aspect.
In the embodiment of the application, an initial first graph network of a sample text fragment and a sample answer, and an initial second graph network of a sample question and the sample answer, are constructed through the graph construction network layer of a reading understanding model; the sample text fragment, the sample question and the sample answer are input into the text processing layer of the reading understanding model, and attention values are added to the nodes and edges of the initial first graph network and the initial second graph network respectively, to obtain a first graph network and a second graph network; the first graph network and the second graph network are input into the graph convolution network layer of the reading understanding model to obtain a predicted answer; and the reading understanding model is trained based on the difference between the predicted answer and the sample answer until a training stop condition is reached. In this way, the association among the sample text fragment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained in combination with this association, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
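To make the training flow above concrete, the following is a minimal Python sketch of one training step. The layer and attribute names (graph_construction_layer, text_processing_layer, graph_conv_layer, sample_answer.labels) and the cross-entropy loss are assumptions for illustration only; the embodiment does not prescribe these names or this loss.

```python
import torch.nn.functional as F

def train_step(model, optimizer, sample_fragment, sample_question, sample_answer):
    """One sketched training step following the flow of the embodiment (names are assumed)."""
    # Graph construction network layer: build the two initial graph networks.
    init_g1 = model.graph_construction_layer(sample_fragment, sample_answer)
    init_g2 = model.graph_construction_layer(sample_question, sample_answer)

    # Text processing layer: add attention values to the nodes and edges of both graphs.
    g1, g2 = model.text_processing_layer(
        sample_fragment, sample_question, sample_answer, init_g1, init_g2)

    # Graph convolution network layer: obtain scores for the predicted answer.
    predicted = model.graph_conv_layer(g1, g2)

    # Train on the difference between the predicted answer and the sample answer.
    loss = F.cross_entropy(predicted, sample_answer.labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```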
In the embodiment of the application, the value of each dimension of a target hidden layer feature vector is converted into at least one prediction probability through a sequence labeling function, where each dimension of the target hidden layer feature vector corresponds to a word unit and the at least one prediction probability of each dimension represents the probability that the predicted label of the corresponding word unit is each of at least one candidate label; the predicted label of the word unit corresponding to each dimension is determined based on the at least one prediction probability of that dimension; and the predicted answer is determined based on the predicted labels of the word units. Through sequence labeling, a predicted label can be determined for every word unit and the predicted answer can be assembled from those labels, so that when the model parameters are adjusted, the predicted labels of a correct predicted answer are pushed closer to the correct labels; this improves both the training efficiency and the use accuracy of the reading understanding model.
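A minimal sketch of this sequence-labeling step is given below, assuming a two-label scheme (1 = the word unit belongs to the answer, 0 = it does not) and a sigmoid as the sequence labeling function; the label set, the function choice, and the example sentence and values are all illustrative assumptions, not the application's prescribed implementation.

```python
import torch

def predict_answer(target_hidden: torch.Tensor, word_units: list) -> str:
    """target_hidden holds one value per word unit (one dimension per word unit)."""
    # Assumed sequence labeling function: a sigmoid maps each dimension's value to the
    # probability that the corresponding word unit's predicted label is 1.
    p_answer = torch.sigmoid(target_hidden)
    # Predicted label per word unit: 1 if the answer probability dominates, else 0.
    pred_labels = (p_answer >= 0.5).long()
    # The predicted answer is assembled from the word units whose predicted label is 1.
    return "".join(w for w, lab in zip(word_units, pred_labels.tolist()) if lab == 1)

# Hypothetical hidden values for the word units of "Li Bai is called the Poet Immortal".
units = ["Li", "Bai", "is", "called", "the", "Poet", "Immortal"]
hidden = torch.tensor([2.1, 1.8, -1.0, -0.7, -1.2, -0.9, -1.1])
print(predict_answer(hidden, units))  # -> "LiBai"
```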
Drawings
FIG. 1 is a block diagram of a computing device according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for training a reading understanding model according to an embodiment of the present application;
FIG. 3 is a diagram of the data flow between the layers of the reading understanding model during model training according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an initial third graph network provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an initial first graph network provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an initial fourth graph network provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an initial second graph network provided by an embodiment of the present application;
FIG. 8 is a process flow diagram of a reading understanding model training method applied to multiple-choice questions according to an embodiment of the present application;
FIG. 9 is a flow chart of a reading understanding method provided by an embodiment of the present application;
FIG. 10 is a diagram of the data flow between the layers of the reading understanding model when the model is applied, according to an embodiment of the present application;
FIG. 11 is a schematic diagram of another initial first graph network provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of another initial second graph network provided by an embodiment of the present application;
FIG. 13 is a process flow diagram of a reading understanding model applied to a multiple-choice question provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of a reading comprehension model training apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a reading and understanding apparatus according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if," as used herein, may be interpreted as "responsive to a determination," depending on the context.
First, the noun terms to which one or more embodiments of the present application relate are explained.
The Bert model: the Bidirectional Encoder reconstruction from Transformer is a dynamic word vector technology, a Bidirectional Transformer model is adopted to train a non-labeled data set, and characteristic information of preceding and following words is comprehensively considered, so that the problems of one word ambiguity and the like can be better solved.
GCN model: graph connected Network, Graph convolution Network model, can be used to extract features of a Graph.
Word vector: a representation of a word is intended to enable a computer to process the word.
Word embedding: refers to the process of embedding a high-dimensional space with the number of all words into a continuous vector space with a much lower dimension, each word or phrase being mapped as a vector on the real number domain.
Word unit: before any actual processing of the input text, it needs to be segmented into language units such as words, punctuation marks, numbers or letters, which are called word units. For English text, a word unit can be a word, a punctuation mark, a number, etc.; for Chinese text, the smallest word unit can be a word, a punctuation mark, a number, etc.
word2 vec: a method for word embedding processing is an efficient word vector training method constructed by Mikolov on the basis of Bengio Neural Network Language Model (NNLM). Namely, the method can be used for carrying out word embedding processing on the text to obtain a word vector of the text.
A first word unit: in a model training stage of reading and understanding a model, a first word unit is a word unit obtained after word segmentation processing is carried out on a sample text fragment; in the stage of executing the reading understanding task by the reading understanding model, the first word unit is a word unit obtained after word segmentation processing is carried out on the target text.
The first word unit group: a word unit group composed of a plurality of first word units.
A second word unit: in a model training stage of reading and understanding the model, the second word unit is a word unit obtained after word segmentation processing is carried out on the sample problem; in the stage of executing the reading understanding task by the reading understanding model, the second word unit is a word unit obtained after performing word segmentation processing on the target problem.
The second word unit group: a word unit group composed of a plurality of second word units.
A third word unit: in the model training stage of the reading understanding model, the second word unit is a word unit obtained after the word segmentation processing is carried out on the sample answer; in the stage of executing the reading understanding task by the reading understanding model, the second word unit is a word unit obtained after performing word segmentation processing on the target answer.
The third word unit group: a word unit group composed of a plurality of third word units.
A first feature vector: in a model training stage of the reading understanding model, a first feature vector is a vector obtained after word embedding processing is carried out on a first word unit in a sample text fragment; in the stage of executing the reading understanding task by the reading understanding model, the first feature vector is a vector obtained by performing word embedding processing on a first word unit of the target text.
First feature vector group: and the characteristic vector group is formed by a plurality of first characteristic vectors.
Second feature vector: in a model training stage of reading and understanding the model, the second feature vector is a vector obtained after word embedding processing is carried out on a second word unit in the sample problem; in the stage of executing the reading understanding task by the reading understanding model, the first feature vector is a vector obtained by performing word embedding processing on the second word unit of the target problem.
Second feature vector group: and the feature vector group is formed by a plurality of second feature vectors.
The third feature vector: in the model training stage of the reading understanding model, the third feature vector is a vector obtained after word embedding processing is carried out on a third word unit in the sample answer; in the stage of executing the reading understanding task by the reading understanding model, the third feature vector is a vector obtained by performing word embedding processing on a third word unit of the target answer.
Third feature vector group: and the characteristic vector group is formed by a plurality of third characteristic vectors.
Initial first graph network: in a model training stage of reading and understanding a model, an initial first graph network is a graph network for representing the incidence relation between a sample text fragment and a sample answer; in the stage of executing the reading understanding task by the reading understanding model, the initial first graph network is a graph network which represents the incidence relation between the target text and the target answer.
Initial second graph network: in a model training stage of reading and understanding the model, the initial second graph network is a graph network for representing the incidence relation between the sample question and the sample answer; in the stage of executing the reading understanding task by the reading understanding model, the initial second graph network is a graph network for representing the incidence relation between the target question and the target answer.
Initial third graph network: in a model training stage of reading and understanding the model, the initial third graph network is a graph network for representing the dependency relationship among word units in the sample text segment; in the stage of executing the reading understanding task by the reading understanding model, the initial third graph network is a graph network for representing the dependency relationship between word units in the target text.
Initial fourth graph network: in a model training phase of reading and understanding the model, the initial third graph network is a graph network for representing the dependency relationship between word units in the sample problem; in the stage of executing the reading understanding task by the reading understanding model, the initial third graph network is a graph network for characterizing the dependency relationship between word units in the target problem.
First graph network: initial first graph network including attention values of nodes and attention values of edges
Second graph network: initial second graph network comprising attention values of nodes and attention values of edges
First hidden layer feature vector: and the vector representation of the first graph network is obtained after the convolution processing is carried out on the first graph network through the graph convolution network layer.
Second hidden layer feature vector: and the second graph network is subjected to convolution processing by the graph convolution network layer to obtain vector representation of the second graph network.
Target hidden layer feature vector: and combining the first hidden layer feature vector and the second hidden layer feature vector to obtain a vector representation.
The present application provides a reading comprehension model training method, and the present application also relates to a reading comprehension model training device, a computing device, and a computer readable storage medium, which are described in detail in the following embodiments one by one.
FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may execute the steps of the reading understanding model training method shown in FIG. 2. FIG. 2 shows a flowchart of a reading understanding model training method according to an embodiment of the present application, which includes steps 202 to 210.
Step 202, an initial first graph network of the sample text fragments and the sample answers and an initial second graph network of the sample questions and the sample answers are constructed through the graph construction network layer of the reading understanding model.
The reading understanding model is used for executing the reading understanding task: given a text, a question and candidate answers, it can output the correct answer to the question. The sample answer is the correct answer to the sample question corresponding to the sample text fragment. The sample text fragment may be any text fragment obtained by splitting the sample text.
The initial first graph network is used for representing the incidence relation between the sample text fragments and the sample answers, and the initial second graph network is used for representing the incidence relation between the sample questions and the sample answers.
In some embodiments, the training data set may be constructed in advance from a plurality of sample texts, a plurality of sample questions, and a plurality of sample answers.
As an example, there is a correspondence among the plurality of sample texts, the plurality of sample questions and the plurality of sample answers. Since a sample text is usually a chapter-level text, its data size is relatively large and it is relatively difficult for the model to process, so each sample text can be split into paragraphs or sentences to obtain a plurality of sample text fragments, and every sample text fragment of a sample text corresponds to the sample question and the sample answer of that sample text. A plurality of sample text fragments, sample questions and sample answers can therefore be stored in the training data set together with the correspondence among them, and one sample text fragment, one sample question and one sample answer in this correspondence may be referred to as a group of training data.
As another example, since a sample text is usually a chapter-level text with a large data size that is difficult for the model to process, each sample text may first be split into paragraphs or sentences to obtain a plurality of sample text fragments. Taking a reference sample text as an example, its sample text fragments may be called reference sample text fragments, the sample question corresponding to the reference sample text may be called the reference sample question, and the sample answer corresponding to both the reference sample text and the reference sample question may be called the reference sample answer. The reference sample text fragments are then matched against the reference sample question to determine a plurality of first similarities, and against the reference sample answer to determine a plurality of second similarities. Reference sample text fragments whose first similarity and second similarity are both greater than a similarity threshold are retained; such a fragment can be regarded as strongly associated with both the reference sample question and the reference sample answer, so the retained reference sample text fragment, the reference sample question and the reference sample answer can be taken as one group of training data. Performing the above processing on each sample text yields multiple groups of training data, in which the sample text fragment of each group is highly relevant to its corresponding sample question and sample answer.
By the two exemplary ways described above, a training data set including multiple sets of training data may be created, and the multiple sets of training data may be obtained from the training data set and input into the graph building network layer of the reading understanding model.
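The fragment filtering described in the second example can be sketched as follows. The similarity measure and the threshold value are illustrative assumptions only, since the embodiment does not prescribe a specific similarity function:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Placeholder text-similarity measure; any suitable measure could be substituted.
    return SequenceMatcher(None, a, b).ratio()

def build_training_groups(fragments, question, answer, threshold=0.3):
    """Keep fragments whose similarity to both the reference question and answer exceeds the threshold."""
    groups = []
    for fragment in fragments:
        first_similarity = similarity(fragment, question)   # fragment vs. reference sample question
        second_similarity = similarity(fragment, answer)     # fragment vs. reference sample answer
        if first_similarity > threshold and second_similarity > threshold:
            groups.append((fragment, question, answer))       # one group of training data
    return groups
```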
Illustratively, referring to FIG. 3, a sample text fragment, a sample question and a sample answer may be input into the graph construction network layer of the reading understanding model to construct an initial first graph network based on the sample text fragment and the sample answer, and an initial second graph network based on the sample question and the sample answer.
In implementation, the specific implementation of constructing, through the graph construction network layer of the reading understanding model, an initial first graph network of the sample text fragment and the sample answer and an initial second graph network of the sample question and the sample answer may include: constructing an initial third graph network based on the dependency relationships among the word units in the sample text fragment, and constructing an initial fourth graph network based on the dependency relationships among the word units in the sample question; and then constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer.
Wherein the initial third graph network is used for characterizing the dependency relationship between word units in the sample text segment. The initial fourth graph network is used to characterize dependencies between word units in the sample problem.
That is, an initial third graph network reflecting the dependency relationship between word units in the sample text segment may be constructed first, and then the first graph network may be constructed based on the initial third graph network and according to the association relationship between the sample answer and the sample text segment. And constructing an initial fourth graph network reflecting the dependency relationship among word units in the sample question, and constructing a second graph network according to the incidence relationship between the sample answer and the sample question on the basis of the initial fourth graph network.
Therefore, the incidence relation between the word unit of the sample text segment and the word unit of the sample answer can be clearly described through the first graph network, the incidence relation between the word unit of the sample question and the word unit of the sample answer can be clearly described through the second graph network, the incidence relation between the word unit of the sample question and the word unit of the sample answer is preliminarily obtained, and preparation is made for further subsequent use.
In some embodiments, constructing the initial third graph network based on the dependencies between word units in the sample text segments may include: taking word units in the sample text fragment as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample text fragment to obtain the initial third graph network.
That is, the initial third graph network that characterizes the dependency relationship between word units in the sample text segment may be constructed by taking the word units in the sample text segment as nodes and the dependency relationship between the word units as edges. Therefore, the association relationship among the word units in the sample text segment can be preliminarily determined, and the learning of the model on the relationship among the word units in the sample text segment can be strengthened.
As an example, dependency analysis may be performed on the sample text segment through the Stanford CoreNLP (Natural Language Processing) toolkit, so as to obtain the dependency relationships among the multiple word units in the sample text segment.
Illustratively, performing dependency analysis on the sample text segment "I love my country" with Stanford CoreNLP identifies "I" as the subject, "love" as the predicate and "my country" as the object, and yields the pairwise dependency relationships among the word units of the segment. Connecting the word units that have a dependency relationship with each other gives the initial third graph network shown in FIG. 4.
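A minimal sketch of building such an initial third graph network is shown below, assuming the dependency pairs have already been produced by a parser such as Stanford CoreNLP; the word units and edges listed here are illustrative, not an actual parse:

```python
import networkx as nx

# Word units of the sample text segment (nodes) and their dependency pairs (edges);
# in practice the pairs would come from a dependency parser such as Stanford CoreNLP.
word_units = ["I", "love", "my", "country"]
dependencies = [("I", "love"), ("love", "country"), ("my", "country")]

initial_third_graph = nx.Graph()
initial_third_graph.add_nodes_from(word_units)     # word units as nodes
initial_third_graph.add_edges_from(dependencies)   # dependency relationships as edges
```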
In some embodiments, the specific implementation of constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer may include: taking the word unit in the sample answer as the target node and, based on the incidence relation between the word unit in the sample answer and the word unit in the sample text segment, connecting the target node with a node in the initial third graph network to obtain the initial first graph network.
That is, the word unit in the sample answer may be used as a target node, and the target node may be connected to a node corresponding to the word unit of the sample text segment in the initial third graph network, so that the initial first graph network representing the association relationship between the word unit of the sample text segment and the word unit of the sample answer may be obtained, and the model may preliminarily learn the association relationship between the sample text segment and the sample answer.
As an example, a target node corresponding to a word unit in the sample answer may be connected to a node corresponding to each word unit in the sample text segment. Or, as another example, the target node corresponding to the word unit in the sample answer may be connected to the node in the initial third graph network, which has an association relationship with the target node.
Illustratively, taking the sample text segment as "I love my country" and the sample answer as "my country", each word unit of the sample answer is connected to each node in the initial third graph network, and the initial first graph network shown in FIG. 5 is obtained; the bold nodes in FIG. 5 are the target nodes.
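Continuing the graph-building sketch, the word units of the sample answer can be added as target nodes and connected to every node of the initial third graph network, following the first connection strategy described above; all names and values remain illustrative:

```python
import networkx as nx

# Initial third graph network from the previous sketch: word units of the text fragment
# as nodes, dependency pairs as edges (illustrative values).
word_units = ["I", "love", "my", "country"]
initial_third_graph = nx.Graph([("I", "love"), ("love", "country"), ("my", "country")])

# Each word unit of the sample answer becomes a separate target node (kept distinct from
# text-fragment nodes carrying the same characters) and is connected to every node.
initial_first_graph = initial_third_graph.copy()
for unit in ["my", "country"]:
    target = ("answer", unit)
    initial_first_graph.add_node(target, is_target=True)
    for node in word_units:
        initial_first_graph.add_edge(target, node)
```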
In some embodiments, the constructing the initial fourth graph network based on the dependencies between word units in the sample question may include: taking word units in the sample problem as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
That is, the initial fourth graph network that characterizes the dependency relationship between word units in the sample problem may be constructed by taking the word units in the sample problem as nodes and the dependency relationship between the word units as edges. Therefore, the incidence relation among the word units in the sample problem can be preliminarily determined, and the learning of the model on the relation among the word units in the sample problem can be strengthened.
As an example, the dependency analysis of the sample problem can be performed by the Stanford Core NLP algorithm, and the dependency relationship between multiple word units in the sample problem can be obtained.
As an example, by performing dependency analysis on the sample question "Whom do I love" through the Stanford CoreNLP algorithm, we can obtain "I" as the subject, "love" as the predicate and "whom" as the object, together with the dependency relationships among "I", "love" and "whom": "I" and "love" have a dependency relationship, "love" and "whom" have a dependency relationship, and "I" and "whom" have a dependency relationship. Based on these dependency relationships, the initial fourth graph network shown in FIG. 6 can be obtained.
In some embodiments, the specific implementation of constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer may include: taking the word unit in the sample answer as the target node and, based on the incidence relation between the word unit in the sample answer and the word unit in the sample question, connecting the target node with a node in the initial fourth graph network to obtain the initial second graph network.
That is, the word unit in the sample answer may be used as the target node, and the target node may be connected to the node corresponding to the word unit of the sample question in the initial fourth graph network, so that the initial second graph network representing the association relationship between the word unit of the sample question and the word unit of the sample answer may be obtained, and the model may preliminarily learn the association relationship between the sample question and the sample answer.
As an example, a target node corresponding to a word unit in the sample answer may be connected to a node corresponding to each word unit in the sample question. Or, as another example, a target node corresponding to the word unit in the sample answer may be connected to a node in the initial fourth graph network, which has an association relationship with the target node.
Illustratively, taking the sample question as "Whom do I love" and the sample answer as "my country", each word unit of the sample answer may be connected to each node in the initial fourth graph network, so that the initial second graph network shown in FIG. 7 can be obtained; the bold nodes in FIG. 7 are the target nodes.
In the embodiment of the application, the reading understanding model can be trained by fully utilizing the incidence relation between the sample text segment and the sample answer and the incidence relation between the sample text segment and the sample question, so that the accuracy of the reading understanding task executed by the reading understanding model can be improved.
Step 204, inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model, and respectively obtaining a first feature vector group, a second feature vector group and a third feature vector group.
As one example, the feature extraction layer may be used to extract features of the input text. The first feature vector group is the feature vector group obtained after the sample text fragment passes through the feature extraction layer, the second feature vector group is the feature vector group obtained after the sample question passes through the feature extraction layer, and the third feature vector group is the feature vector group obtained after the sample answer passes through the feature extraction layer. The first feature vector group includes a plurality of first feature vectors, each corresponding to one word unit in the sample text fragment; the second feature vector group includes a plurality of second feature vectors, each corresponding to one word unit in the sample question; and the third feature vector group includes a plurality of third feature vectors, each corresponding to one word unit in the sample answer.
For example, referring to fig. 3, a sample text segment, a sample question, and a sample answer may be input into a feature extraction layer of the reading understanding model to determine a first feature vector group, a second feature vector group, and a third feature vector group, respectively.
In implementation, the specific implementation of this step may include: performing word segmentation processing on the sample text segment, the sample question and the sample answer to respectively obtain a first word unit group, a second word unit group and a third word unit group; performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively; and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
In the embodiment of the present application, the feature extraction layer may include a word embedding processing function and an encoding function. As one example, the feature extraction layer may include a word embedding processing module and an encoding module.
Illustratively, the feature extraction layer may employ the structure of the Bert model. Because the feature vector obtained by the Bert model is the feature vector combined with full-text semantic information, the feature vectors of sample text fragments, sample questions and word units in sample answers can be more fully utilized, and the accuracy of reading and understanding the model can be improved.
As an example, taking the sample text segment as an example: if the sample text segment is a Chinese text, each character may be divided into a word unit and each punctuation mark may be divided into a word unit; if the sample text segment is a foreign-language text, a word can be divided into a word unit and a phrase can be divided into a word unit; if there are numbers in the sample text segment, the numbers can each be divided into a word unit individually.
Exemplarily, assuming that the sample text segment is the seven-character Chinese sentence "Li Bai is called the Poet Immortal", seven first word units are obtained, one for each character of the sentence.
As an example, a word embedding process may be performed on each first word unit in the first word unit group in a one-hot encoding manner to obtain a word vector of each first word unit, a word embedding process may be performed on each second word unit in the second word unit group to obtain a word vector of each second word unit, and a word embedding process may be performed on each third word unit in the third word unit group to obtain a word vector of each third word unit.
As another example, word embedding processing may be performed on each first word unit in the first word unit group in a word2vec coding manner to obtain a word vector of each first word unit, word embedding processing may be performed on each second word unit in the second word unit group to obtain a word vector of each second word unit, and word embedding processing may be performed on each word unit in the third word unit group to obtain a word vector of each third word unit.
As an example, each first word vector, each second word vector and each third word vector are encoded, so that for each first word unit a vector representation that incorporates the full-text semantic information of the sample text fragment, i.e. a first feature vector, is obtained; for each second word unit a vector representation that incorporates the full-text semantic information of the sample question, i.e. a second feature vector, is obtained; and for each third word unit a vector representation that incorporates the full-text semantic information of the sample answer, i.e. a third feature vector, is obtained. The first feature vector group, the second feature vector group and the third feature vector group can then be obtained.
Illustratively, taking the sample answer "Li Bai" as an example, "Li Bai" is input into the feature extraction layer and segmented into the word units "Li" and "Bai"; word embedding processing is performed on "Li" and "Bai" respectively to obtain a word vector for each, and the two word vectors are then encoded, giving a third feature vector for "Li" that is combined with the word vector of "Bai" and a third feature vector for "Bai" that is combined with the word vector of "Li". Assuming the third feature vector corresponding to "Li" is x and the third feature vector corresponding to "Bai" is y, the third feature vector group may be xy. Similarly, inputting the sample text segment "Li Bai is called the Poet Immortal" into the feature extraction layer outputs the first feature vector of each word unit in the sample text segment, and inputting the sample question "Who is called the Poet Immortal" outputs the second feature vector of each word unit in the sample question.
Through the above feature extraction, a first feature vector that accurately reflects the semantics of each word unit in the sample text segment, a second feature vector that accurately reflects the semantics of each word unit in the sample question, and a third feature vector that accurately reflects the semantics of each word unit in the sample answer can be obtained. In other words, the reading understanding model is trained with more accurate feature vectors, which improves the accuracy of the trained model.
It should be noted that, in the embodiment of the present application, the feature extraction layer may adopt the structure of a BERT model that has been pre-trained and then fine-tuned on a reading comprehension task, so that the obtained first feature vector group, second feature vector group and third feature vector group more accurately reflect the semantic features of the sample text segment, the sample question and the sample answer respectively, which improves both the training speed and the accuracy of the model in use.
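The following sketch shows one way to obtain such contextual feature vector groups with a pre-trained Chinese BERT checkpoint from the Hugging Face transformers package; the checkpoint name and the use of this particular package are assumptions, since the embodiment only requires a BERT-style feature extraction layer:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode(text: str) -> torch.Tensor:
    """Return one contextual feature vector per word unit of the input text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)
    # Drop the [CLS] and [SEP] positions so each remaining row corresponds to one word unit.
    return outputs.last_hidden_state[0, 1:-1]

first_feature_group = encode("李白被称为诗仙")   # sample text segment
second_feature_group = encode("谁被称为诗仙")    # sample question
third_feature_group = encode("李白")             # sample answer
```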
Step 206, inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
Wherein the first graph network is an initial first graph network that includes attention values of nodes and attention values of edges. The second graph network is an initial second graph network that includes attention values for nodes and attention values for edges.
As an example, the attention layer may employ the structure of the attention layer of the BERT model. Alternatively, the attention layer may adopt any other structure including a model of an attention mechanism, which is not limited in this embodiment of the present application.
As an example, in this step, the first feature vector group, the second feature vector group, the third feature vector group, the initial first graph network and the initial second graph network may be input into the attention layer of the reading understanding model; attention values are added to the nodes and edges of the initial first graph network based on the first feature vector group and the third feature vector group to obtain the first graph network, and attention values are added to the nodes and edges of the initial second graph network based on the second feature vector group and the third feature vector group to obtain the second graph network. Exemplarily, referring to fig. 3, the first feature vector group, the second feature vector group, the third feature vector group, the initial first graph network and the initial second graph network may be input into the attention layer of the reading understanding model; attention values are added to the nodes and edges included in the initial first graph network based on the first feature vector group and the third feature vector group, so as to obtain the first graph network; and attention values are added to the nodes and edges included in the initial second graph network based on the second feature vector group and the third feature vector group, so as to obtain the second graph network.
Or, as another example, in this step, the first feature vector group, the second feature vector group and the third feature vector group may be input into the attention layer of the reading understanding model; attention values of the nodes and edges included in the initial first graph network are obtained based on the first feature vector group and the third feature vector group, and these attention values are added to the initial first graph network to obtain the first graph network; attention values of the nodes and edges included in the initial second graph network are obtained based on the second feature vector group and the third feature vector group, and these attention values are added to the initial second graph network to obtain the second graph network.
In implementation, the specific implementation of this step may include: adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors; adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
As an example, the initial first graph network characterizes an association between a sample text segment and a sample answer, the first set of feature vectors is a feature representation of the sample text segment, and the third set of feature vectors is a feature representation of the sample answer, so that attention values can be added to nodes and edges of the initial first graph network according to the first set of feature vectors and the third set of feature vectors. Similarly, the initial second graph network represents the incidence relation between the sample question and the sample answer, the second feature vector group is the feature representation of the sample question, and the third feature vector group is the feature representation of the sample answer, so that the attention values can be added to the nodes and the edges of the initial second graph network according to the second feature vector group and the third feature vector group.
The nodes in the initial first graph network are word units of the sample text segments and the sample answers, so that attention values can be added to the nodes and the edges of the initial first graph network at the attention level according to the first feature vector group and the third feature vector group, and the association relationship between the sample text segments and the sample answers can be further captured. Similarly, the nodes in the initial second graph network are word units of the sample question and the sample answer, so that the attention values can be added to the nodes and the edges of the initial second graph network at the attention level according to the second feature vector group and the third feature vector group, and the association relationship between the sample question and the sample answer can be further captured. Therefore, the reading understanding model can further learn the incidence relation among the sample text fragments, the sample answers and the sample questions, and the accuracy of the reading understanding model for processing the reading understanding task is improved.
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors may include: taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network; taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network; determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge; based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
That is, the first feature vector in the first feature vector group may be used as the attention value of the node corresponding to the word unit of the sample text segment in the initial first graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the sample answer in the initial first graph network. And determining an attention value of an edge between word units of the sample text fragment in the initial first graph network according to the first feature vector group, and determining an attention value of an edge between word units of the sample text fragment and word units of the sample answer in the initial first graph network according to the first feature vector group and the third feature vector group. Therefore, the incidence relation between word units in the sample text segment and the incidence relation between the sample text segment and the sample answers can be further learned, and the accuracy of the reading understanding model obtained through training is convenient to improve.
As an example, for two first nodes connected by an edge, attention calculation may be performed on the first feature vectors of the word units corresponding to the two first nodes to obtain the attention value of the edge. Specifically, the attention calculation multiplies the two first feature vectors and normalizes the result to obtain the attention value. Referring to fig. 5, there is an edge between "I" and "love" in fig. 5, and "I" and "love" are word units in the sample text segment; the first feature vector of the word unit "I" and the first feature vector of "love" may be obtained from the first feature vector group, the two first feature vectors are multiplied, and the product is normalized, so that the attention value of the edge between "I" and "love" is obtained.
As an example, for a first node and a second node where an edge exists, attention calculation may be performed on a first feature vector of a word unit corresponding to the first node and a third feature vector of the word unit corresponding to the second node, and an attention value of the edge may be obtained. Specifically, the attention calculation on the first feature vector and the third feature vector is to multiply the first feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 5, an edge exists between "me" and "ancestor" in fig. 5, and "me" is a word unit in a sample text segment, and "ancestor" is a word unit in a sample answer, a first feature vector of the word unit "me" may be obtained from a first feature vector group, and a third feature vector of the "ancestor" may be obtained from a third feature vector group, and the first feature vector of "me" and the third feature vector of "ancestor" may be multiplied, and the product is normalized, so that an attention value of the edge between "me" and "ancestor" may be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 5 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors may include: taking a second feature vector in the second feature vector group as the attention value of a third node in the initial second graph network, where the third node is a node corresponding to a word unit of the sample question in the initial second graph network; taking a third feature vector in the third feature vector group as the attention value of a fourth node in the initial second graph network, where the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network; determining, based on the second feature vector group, an attention value between two third nodes of the initial second graph network connected by an edge and using it as the attention value of the edge; and determining, based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node of the initial second graph network connected by an edge and using it as the attention value of the edge.
That is, the second feature vector in the second feature vector group may be used as the attention value of the node corresponding to the word unit of the sample question in the initial second graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the sample answer in the initial second graph network. And determining an attention value of an edge between word units of the sample question in the initial second graph network according to the second feature vector group, and determining an attention value of an edge between word units of the sample question and word units of the sample answer in the initial second graph network according to the second feature vector group and the third feature vector group. Therefore, the incidence relation between word units in the sample question and the incidence relation between the sample question and the sample answer can be further learned, and the accuracy of the reading understanding model obtained through training is convenient to improve.
As an example, for two third nodes where an edge exists, attention calculation may be performed on the second feature vectors of the word units corresponding to the two third nodes, and the attention value of the edge may be obtained. Specifically, the attention calculation on the two second feature vectors is to multiply the two second feature vectors and normalize the result to obtain the attention value. Illustratively, referring to fig. 7, an edge exists between "i" and "who" in fig. 7, and "i" and "who" are word units in the sample question, a second feature vector of the word unit "i" may be obtained from the second feature vector group, and a second feature vector of "who" may be obtained from the second feature vector group, and the second feature vector of "i" and the second feature vector of "who" may be multiplied, and normalization processing may be performed on the products, and a value of attention of the edge between "i" and "who" may be obtained.
As an example, for a third node and a fourth node where an edge exists, attention calculation may be performed on a second feature vector of a word unit corresponding to the third node and a third feature vector of a word unit corresponding to the fourth node, and an attention value of the edge may be obtained. Specifically, the attention calculation for the second feature vector and the third feature vector is to multiply the second feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 7, there is an edge between "who" and "country" in fig. 7, and "who" is a word unit in the sample question, "country" is a word unit in the sample answer, a second feature vector of the word unit "who" can be obtained from the second feature vector group, and a third feature vector of "country" can be obtained from the third feature vector group, the second feature vector of "who" and the third feature vector of "country" can be multiplied, the product is normalized, and the attention value of the edge between "who" and "country" can be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 7 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
Note that, in the embodiment of the present application, attention calculation may be performed on two feature vectors by the following formula (1).
attention = softmax( Q · K^T / √d_k )        (1)

Wherein, in formula (1), attention represents the attention value, softmax(·) is a normalization function, Q and K respectively represent the two feature vectors, d_k is a constant, and T denotes the matrix transpose.
For example, referring to fig. 7, an edge exists between "who" and "country" in fig. 7, and "who" is a word unit in the sample question, "country" is a word unit in the sample answer, a second feature vector of the word unit "who" may be obtained as Q from the second feature vector group, and a third feature vector of "country" may be obtained as K from the third feature vector group, and the attention value of the edge between "who" and "country" may be determined by the above formula (1).
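A small numpy sketch of formula (1) is shown below. Here the softmax is taken over the scores of one node's feature vector against the feature vectors of all nodes it shares edges with, which is one reasonable reading of the normalization in formula (1); the function name and vector shapes are illustrative assumptions:

```python
import numpy as np

def edge_attention_values(q: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Formula (1): attention = softmax(Q K^T / sqrt(d_k)).
    q    -- feature vector of one node, shape (d_k,)
    keys -- feature vectors of the nodes it shares edges with, shape (n, d_k)
    Returns one normalized attention value per edge."""
    d_k = q.shape[-1]
    scores = keys @ q / np.sqrt(d_k)   # Q K^T, one score per edge
    scores -= scores.max()             # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()     # softmax normalization

# e.g. the question word unit "who" against the answer word units "ancestor" and "country"
q = np.random.rand(768)
keys = np.random.rand(2, 768)
print(edge_attention_values(q, keys))  # two edge attention values that sum to 1
```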
In the embodiment of the application, the incidence relation among the sample text segment, the sample question and the sample answer can be further captured through the attention layer, the incidence relation is converted into the attention value and is given to the initial first graph network and the initial second graph network, the first graph network and the second graph network are obtained, the model further learns the incidence relation among the sample text segment, the sample question and the sample answer, and the accuracy of the reading understanding model obtained through training can be improved.
It should be noted that, steps 204 to 206 are steps of inputting the sample text segment, the sample question, and the sample answer into the text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a specific implementation of the first graph network and the second graph network.
And step 208, inputting the first graph network and the second graph network into a graph volume network layer of the reading understanding model to obtain a predicted answer.
As an example, the graph convolution network layer may be a GCN model.
Illustratively, referring to fig. 3, the first graph network and the second graph network may be input into a graph volume network layer of the reading understanding model to obtain the predicted answer.
In an implementation, inputting the first graph network and the second graph network into the graph volume network layer of the reading understanding model, and obtaining a specific implementation of the predicted answer may include: determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network; carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector; determining the predicted answer based on the target hidden layer feature vector.
As an example, the first hidden layer feature vector is a vector representation of the first graph network obtained by performing convolution processing on the first graph network through the graph convolution network layer, and can be regarded as a graph feature vector of the first graph network. The second hidden layer feature vector is a vector representation of the second graph network obtained after the second graph network is subjected to convolution processing by the graph convolution network layer, and can be regarded as a graph feature vector of the second graph network. The target hidden layer feature vector is a vector representation obtained by combining vector representations of the first graph network and the second graph network.
In some embodiments, the graph network may be convolved at the graph convolution network layer by the following equation (2).
h_i^(l+1) = σ( Σ_{j ∈ N_i} C_ij · W_j^(l) · h_j^(l) + b_j^(l) )        (2)

Wherein, in formula (2), i represents the i-th node in the graph network and j represents the j-th node in the graph network; h_i^(l+1) represents the feature vector input to the (l+1)-th convolutional layer for the i-th node; σ(·) represents a nonlinear transfer function, which may be a ReLU activation function; N_i represents node i together with all nodes connected to node i; h_j^(l) represents the feature vector input to the l-th convolutional layer for the j-th node; C_ij represents the attention value of the edge between the i-th node and the j-th node; W_j^(l) represents the weight of the j-th node at the l-th convolutional layer; and b_j^(l) represents the intercept of the j-th node at the l-th convolutional layer.
As an example, the graph convolution network layer may include a plurality of convolutional layers. Each convolutional layer includes a preset weight parameter matrix, and the weight of each node in each convolutional layer may be an initial weight in that weight parameter matrix; similarly, each convolutional layer may include a preset intercept parameter matrix, and the intercept of each node in each convolutional layer may be an initial intercept in that intercept parameter matrix. In the subsequent training process, the weight parameter matrix and the intercept parameter matrix of each convolutional layer can be adjusted according to the training situation.
For example, taking the first graph network as an example and assuming that the graph convolution network layer includes two convolutional layers: in the first convolutional layer, the feature vector of each node in the first graph network is used as the input, the weight parameter matrix and the intercept parameter matrix of the first convolutional layer are used as preset parameters, and formula (2) gives the feature vector input to the second convolutional layer for each node, that is, the feature vector of each node after one convolution. In the second convolutional layer, these feature vectors are used as the input, the weight parameter matrix and the intercept parameter matrix of the second convolutional layer are used as preset parameters, and formula (2) gives the feature vector of each node after two convolutions. The feature vectors of the nodes in the first graph network obtained after the two convolutions are then spliced to obtain the first hidden layer feature vector of the first graph network.
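A compact numpy sketch of one graph convolution in the spirit of formula (2) is given below. For simplicity it uses a single weight matrix and intercept per convolutional layer rather than per-node parameters, and the matrix C of edge attention values plays the role of C_ij; all names and sizes are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(h, C, W, b):
    """One graph convolution in the spirit of formula (2).
    h -- node feature vectors input to layer l, shape (n, d_in)
    C -- edge attention values; C[i, j] is nonzero only where an edge exists
         (a nonzero diagonal keeps each node's own feature in the sum over N_i)
    W -- weight matrix of layer l, shape (d_in, d_out)
    b -- intercept of layer l, shape (d_out,)
    Returns the node feature vectors input to layer l + 1."""
    return relu(C @ (h @ W) + b)

def graph_hidden_vector(h, C, layers):
    """Apply the convolutional layers in turn and splice the resulting node
    vectors into a single hidden layer feature vector for the whole graph."""
    for W, b in layers:
        h = gcn_layer(h, C, W, b)
    return h.reshape(-1)

# Toy first graph network: 3 nodes, 768-dimensional features, two convolutional layers.
rng = np.random.default_rng(0)
h0 = rng.random((3, 768))
C = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.7],
              [0.0, 0.7, 1.0]])
layers = [(rng.random((768, 64)) * 0.01, np.zeros(64)),
          (rng.random((64, 64)) * 0.01, np.zeros(64))]
first_hidden_vector = graph_hidden_vector(h0, C, layers)   # first hidden layer feature vector
```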
As an example, when the first hidden layer feature vector and the second hidden layer feature vector are subjected to weighted summation, a weight of the first hidden layer feature vector and a weight of the second hidden layer feature vector may be the same or different, and may be set by a user according to actual needs or may be set by a computing device as a default, which is not limited in this embodiment of the present application.
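As a small follow-on sketch, the weighted summation of the two hidden layer feature vectors can be written as below; the equal weights are just one assumed setting and could differ or be learned:

```python
import numpy as np

def fuse_hidden_vectors(first_hidden, second_hidden, w_first=0.5, w_second=0.5):
    """Weighted summation of the first and second hidden layer feature vectors
    to obtain the target hidden layer feature vector."""
    return w_first * np.asarray(first_hidden) + w_second * np.asarray(second_hidden)

target_hidden_vector = fuse_hidden_vectors(np.ones(128), np.zeros(128))  # toy inputs
```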
In this way, the potential association relationships among the nodes in the first graph network and among the nodes in the second graph network can be captured, so that the reading understanding model can learn the potential association relationships among the sample text segment, the sample question and the sample answer, which improves the accuracy of the model.
In some embodiments, determining a specific implementation of the predicted answer based on the target hidden layer feature vector may include: converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label; determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension; and determining the predicted answer based on the predicted label of the word unit corresponding to each dimension.
As an example, the sequence annotation function is a function used in performing sequence annotation, and can map an input vector into at least one-dimensional probabilities, that is, at least one probability can be obtained for each vector. The Sequence Tagging may be referred to as Sequence Tagging, and after the probability corresponding to the vector of each dimension is determined through a Sequence Tagging function, a preset tag may be tagged to each word unit according to the probability.
As an example, the labels may be B, I and O. B (Begin) represents the answer beginning word, i.e. the first word of the answer; I (Inside) represents the answer middle and ending words, i.e. the second through last words of the answer; O (Outside) represents a non-answer word, i.e. a word that is not part of the answer.
It should be noted that the length of the target hidden layer feature vector is the same as the length of the sample text segment, that is, the dimension of the target hidden layer feature vector and the number of word units of the sample text segment can be considered to be the same.
Exemplarily, assuming that the sample text segment is "I love my country" with the six word units "I", "love", "my", "of", "ancestor" and "country", the target hidden layer feature vector is a 6-dimensional vector whose dimensions correspond to these word units in order. Each dimension of the target hidden layer feature vector is converted into 3 prediction probabilities, one for each of the labels "B", "I" and "O". For example, for the word unit "I", assuming that the computed prediction probabilities are 0.2, 0.3 and 0.5 respectively, the probability of the label "O" is the largest, so the prediction label corresponding to "I" is "O". Similarly, the prediction labels corresponding to the six word units can be determined as "O", "O", "O", "O", "B" and "I" respectively. Since the label "B" represents the answer beginning word and the label "I" represents the answer middle and ending word, "ancestor" and "country", i.e. "my country", can be taken as the predicted answer.
The prediction label of each word unit can be determined by sequence labeling, and the predicted answer can be determined from the prediction labels; when the model parameters are adjusted, the predicted labels are driven closer to the correct labels, which improves the training efficiency and accuracy of the reading understanding model.
As an example, the at least one label includes the answer beginning word, the answer middle and ending words, and the non-answer word, and determining the predicted answer based on the prediction label of the word unit corresponding to each dimension may include: taking the word unit labeled as the answer beginning word and the word units labeled as the answer middle and ending words as the predicted answer.
That is to say, the answer beginning word and the answer middle and ending words can be spliced to obtain the predicted answer.
Continuing with the above example, if the label of the word unit "ancestor" is B and the label of the word unit "country" is I, then, since the label "B" represents the answer beginning word and the label "I" represents the answer middle and ending word, "ancestor" and "country" are spliced and "my country" is determined as the predicted answer.
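The following sketch illustrates the sequence labeling decode described above: a softmax turns the per-word-unit scores into prediction probabilities for the labels B, I and O, and the word units labeled B or I are spliced into the predicted answer. The scores, word units and function names are illustrative assumptions:

```python
import numpy as np

LABELS = ["B", "I", "O"]   # answer beginning word, answer middle/ending word, non-answer word

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_answer(scores, word_units):
    """scores -- one row of 3 values per word unit (one per dimension of the
    target hidden layer feature vector after the labeling projection).
    Returns the predicted answer spliced from the 'B' and 'I' word units."""
    probs = softmax(np.asarray(scores, dtype=np.float64))   # prediction probabilities
    tags = [LABELS[i] for i in probs.argmax(axis=-1)]       # prediction label per word unit
    return "".join(w for w, t in zip(word_units, tags) if t in ("B", "I"))

word_units = ["我", "爱", "我", "的", "祖", "国"]            # "I love my country"
scores = [[0.2, 0.3, 0.5], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6],
          [0.1, 0.2, 0.7], [0.9, 0.05, 0.05], [0.1, 0.85, 0.05]]
print(decode_answer(scores, word_units))                    # -> "祖国" ("my country")
```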
Step 210, training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached.
In some embodiments, a difference between the predicted answer and the sample answer may be determined by a loss function, and the reading understanding model may be trained based on the difference.
As an example, training the reading understanding model based on the difference value mainly means adjusting the parameters of the graph convolution network layer based on the difference value, so that the predicted answer and the sample answer become closer in subsequent training. For example, assuming that the sample answer is "my country", if during training the probability of the "O" label is the highest for the word unit "country", the parameters need to be adjusted so that the probability of the "I" label becomes the highest for "country".
For example, referring to fig. 3, a difference value may be determined based on the predicted answer and the sample answer, and a parameter of the graph convolution network layer may be adjusted based on the difference value.
In some embodiments, the training the reading understanding model based on the difference between the predicted answer and the sample answer until reaching the training stop condition may include: if the difference value is smaller than a preset threshold value, stopping training the reading understanding model; and if the difference is larger than or equal to the preset threshold, continuing to train the reading understanding model.
It should be noted that the preset threshold may be set by a user according to actual needs, or may be set by default by a computing device, which is not limited in this embodiment of the application.
That is to say, the reading understanding model is trained based on the difference between the predicted answer and the sample answer. If the difference value is smaller than the preset threshold, the current model parameters can be considered to substantially meet the requirement and the reading understanding model can be considered trained, so training can be stopped. If the difference value is greater than or equal to the preset threshold, the gap between the predicted answer and the sample answer is still large and the current model parameters do not yet meet the requirement, so the reading understanding model needs to continue to be trained.
Determining whether to continue training the reading understanding model according to the relation between the difference value and the preset threshold makes it possible to accurately grasp how well the model has been trained, which improves both the training efficiency of the model and the accuracy of the model on the reading comprehension task.
In other embodiments, reaching the training stop condition may include: each time a predicted answer is obtained, recording that one iteration of training has been carried out; counting the number of training iterations, and if the number of iterations is greater than a count threshold, determining that the training stop condition is reached.

It should be noted that the count threshold may be set by a user according to actual needs, or may be set by default by the computing device, which is not limited in this embodiment of the application.

As an example, each time a predicted answer is obtained, one iteration of training has been performed, so one may be added to the recorded number of iterations, and the number is checked after each iteration. If the number of iterations is greater than the count threshold, the reading understanding model has been trained sufficiently, that is, the training stop condition is reached, and continuing to train may not bring further improvement, so training can be stopped. If the number of iterations is less than or equal to the count threshold, the reading understanding model may not yet have been trained enough to meet the actual requirement, so training can continue based on the difference value between the predicted answer and the sample answer.

Determining whether to continue training the reading understanding model according to the relation between the number of training iterations and the count threshold reduces unnecessary iterations, reduces the computing resources consumed by iterative training, and improves the training efficiency of the model.
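A hedged PyTorch-style training skeleton combining the two stop conditions above is sketched below; the model, data, loss choice, learning rate and both thresholds are assumptions rather than values required by the embodiment:

```python
import torch

def train(model, batches, loss_threshold=0.05, max_iterations=10000, lr=1e-4):
    """Train the reading understanding model until the difference (loss) between the
    predicted answer labels and the sample answer labels falls below loss_threshold,
    or the number of training iterations exceeds max_iterations."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    iterations = 0
    for sample, label_ids in batches:
        logits = model(sample)                      # per word unit: scores for B / I / O
        loss = criterion(logits.view(-1, 3), label_ids.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        iterations += 1
        if loss.item() < loss_threshold or iterations > max_iterations:
            break
    return model
```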
In the embodiment of the application, an initial first graph network of a sample text fragment and a sample answer is constructed by reading a graph construction network layer of an understanding model, and an initial second graph network of a sample question and the sample answer is constructed; inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network; inputting the first graph network and the second graph network into a graph volume network layer of the reading understanding model to obtain a predicted answer; training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached. By the method, the incidence relation among the sample text segment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained by combining the incidence relation among the sample text segment, the sample question and the sample answer, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
The following description will further describe the reading understanding model training method provided in the present application with reference to fig. 8 by taking an example of an application of the reading understanding model training method in a reading understanding task. Fig. 8 shows a processing flow chart of a training method applied to a reading understanding model of a choice question according to an embodiment of the present application, which may specifically include the following steps:
step 802, sample text fragments, sample questions and sample answers are obtained.
For example, assume that the sample text segment is "I love my country", the sample question is a choice question, "Whom do I love", whose options include "my country", "father", "mother" and "family", and the sample answer is "my country".
And step 804, inputting the sample text segment, the sample question and the sample answer into a graph construction network layer of the reading understanding model, and constructing an initial third graph network based on the dependency relationship among word units in the sample text segment.
In implementation, words in the sample text segment may be used as nodes to obtain a plurality of nodes, and based on the dependency relationship between word units in the sample text segment, the nodes having the dependency relationship are connected to obtain the initial third graph network.
For example, referring to fig. 4, the nodes of the initial third graph network include the word units "I", "love", "my", "of", "ancestor" and "country" in the sample text segment. According to the dependency relationships among these six word units, it can be determined that there is an edge between the first "I" and "love", an edge between the second "I" and "of", an edge between the second "I" and "ancestor", an edge between "love" and "ancestor", and an edge between "ancestor" and "country".
Step 806, based on the association relationship between the word units in the sample answer and the word units in the sample text segment, taking the word units in the sample answer as target nodes, and connecting the target nodes with the nodes in the initial third graph network to obtain an initial first graph network.
For example, referring to fig. 5, the initial first graph network may be obtained by determining "ancestor" as a target node, "determining" nation "as a target node, connecting the target node" ancestor "to each node in the initial third graph network, and connecting the target node" nation "to each node in the initial third graph network.
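A networkx-based sketch of steps 804 and 806 is shown below: the word units of the sample text segment become nodes of the initial third graph network, dependency pairs become its edges, and every word unit of the sample answer is added as a target node connected to every text node to form the initial first graph network. The edge list is an assumed parse consistent with fig. 4, and all names are illustrative:

```python
import networkx as nx

def build_initial_graphs(text_units, dependency_edges, answer_units):
    """Return (initial third graph network, initial first graph network)."""
    third = nx.Graph()
    third.add_nodes_from((i, {"word": w}) for i, w in enumerate(text_units))
    third.add_edges_from(dependency_edges)            # pairs (i, j) with a dependency

    first = third.copy()
    for k, unit in enumerate(answer_units):
        target = ("answer", k)                        # target node for an answer word unit
        first.add_node(target, word=unit)
        for i in range(len(text_units)):
            first.add_edge(target, i)                 # connect to every text node
    return third, first

text_units = ["我", "爱", "我", "的", "祖", "国"]      # "I love my country"
dependency_edges = [(0, 1), (1, 4), (2, 3), (2, 4), (4, 5)]   # assumed, matching fig. 4
answer_units = ["祖", "国"]                            # sample answer "my country"
third_net, first_net = build_initial_graphs(text_units, dependency_edges, answer_units)
```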
Step 808, inputting the sample text segment, the sample question and the sample answer into a graph construction network layer of the reading understanding model, and constructing an initial fourth graph network based on the dependency relationship among word units in the sample question.
In implementation, the words in the sample problem can be used as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
For example, referring to fig. 6, the nodes of the initial fourth graph network include word units "i", "love", "who" in the sample problem, and according to the dependencies between these three word units, it can be determined that there is an edge between "i" and "love", "who", respectively, and there is an edge between "love" and "who".
Step 810, based on the association relationship between the word unit in the sample answer and the word unit in the sample question, taking the word unit in the sample answer as a target node, and connecting the target node with a node in the initial fourth graph network to obtain the initial second graph network.
For example, referring to fig. 7, the initial second graph network may be obtained by determining "ancestor" as a target node, "determining" nation "as a target node, connecting the target node" ancestor "to each node in the initial fourth graph network, and connecting the target node" nation "to each node in the initial fourth graph network.
It should be noted that steps 802 to 810 are a more specific description of step 202, and their implementation process is the same as that of step 202; for the specific implementation, reference may be made to the related description of step 202, which is not repeated here.
Step 812, performing word segmentation on the sample text segment to obtain a first word unit group, performing word segmentation on the sample question to obtain a second word unit group, and performing word segmentation on the sample answer to obtain a third word unit group.
Continuing with the above example, segmenting the sample text segment yields the first word unit group: "I", "love", "my", "of", "ancestor" and "country". Similarly, segmenting the sample question yields the second word unit group: "I", "love" and "who". Segmenting the sample answer yields the third word unit group: "ancestor" and "country".
Step 814, performing word embedding processing on the first word unit group, the second word unit group, and the third word unit group to obtain a first word vector group, a second word vector group, and a third word vector group, respectively.
Taking the sample answer "my country" as an example, word embedding may be performed to obtain a word vector for each of its word units; assume that the third word vector corresponding to "ancestor" is x and the third word vector corresponding to "country" is y. Similarly, performing word embedding on the sample text segment "I love my country" outputs the first word vector of each word unit in the sample text segment, and performing word embedding on the sample question "Whom do I love" outputs the second word vector of each word unit in the sample question.
Step 816, encoding the first word vector group, the second word vector group, and the third word vector group to obtain the first feature vector group, the second feature vector group, and the third feature vector group, respectively.
Continuing with the above example, encoding "ancestor" and "country" in the sample answer yields the third feature vector of "ancestor" and the third feature vector of "country". Similarly, encoding "I", "love" and "who" in the sample question yields the second feature vector of each of these word units, and encoding "I", "love", "my", "of", "ancestor" and "country" in the sample text segment yields the first feature vector of each of these word units.
It should be noted that steps 812 to 816 are a more specific description of step 204, and their implementation process is the same as that of step 204; for the specific implementation, reference may be made to the related description of step 204, which is not repeated here.
Step 818, adding attention values to the nodes and edges of the initial first graph network through the attention layer based on the first feature vector group and the third feature vector group to obtain a first graph network.
As an example, a first feature vector in the first feature vector group may be used as an attention value of a first node in the initial first graph network, where the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network; taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network; determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge; based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
Illustratively, referring to fig. 5: for two first nodes connected by an edge, there is an edge between "I" and "love" in fig. 5, and "I" and "love" are word units in the sample text segment; the first feature vector of the word unit "I" and the first feature vector of "love" may be obtained from the first feature vector group, the two vectors are multiplied, and the product is normalized to obtain the attention value of the edge between "I" and "love". For a first node and a second node connected by an edge, there is an edge between "I" and "ancestor" in fig. 5, where "I" is a word unit in the sample text segment and "ancestor" is a word unit in the sample answer; the first feature vector of the word unit "I" may be obtained from the first feature vector group and the third feature vector of "ancestor" from the third feature vector group, the two vectors are multiplied, and the product is normalized to obtain the attention value of the edge between "I" and "ancestor".
By the above manner, the attention value of each edge and the attention value of each node in fig. 5 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
And 820, adding attention values to the nodes and edges of the initial second graph network through the attention layer based on the second feature vector group and the third feature vector group to obtain a second graph network.
As an example, a second feature vector in the second feature vector group is taken as the attention value of a third node in the initial second graph network, where the third node is a node corresponding to a word unit of the sample question in the initial second graph network; a third feature vector in the third feature vector group is taken as the attention value of a fourth node in the initial second graph network, where the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network; based on the second feature vector group, an attention value between two third nodes of the initial second graph network connected by an edge is determined and used as the attention value of the edge; and based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node of the initial second graph network connected by an edge is determined and used as the attention value of the edge.
Illustratively, referring to fig. 7, for two third nodes with edges, an edge exists between "i" and "who" in fig. 7, and "i" and "who" are word units in the sample question, a second feature vector of the word unit "i" may be obtained from the second feature vector group, and a second feature vector of "who" may be obtained from the second feature vector group, and the second feature vector of "i" and the second feature vector of "who" may be multiplied, and the product is normalized, so that the attention value of the edge between "i" and "who" may be obtained. For the third node and the fourth node where an edge exists, an edge exists between "who" and "country" in fig. 7, and "who" is a word unit in the sample question, "country" is a word unit in the sample answer, a second feature vector of the word unit "who" can be obtained from the second feature vector group, and a third feature vector of "country" can be obtained from the third feature vector group, the second feature vector of "who" and the third feature vector of "country" can be multiplied, the product is normalized, and the attention value of the edge between "who" and "country" can be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 7 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
It should be noted that steps 812 to 820 are a more specific description of step 206, and their implementation process is the same as that of step 206; for the specific implementation, reference may be made to the related description of step 206, which is not repeated here.
Step 822, inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determining a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network through the graph convolution network layer.
Step 824, performing weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector.
And step 826, converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function.
Each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one prediction probability corresponding to each dimension represents the probability that a prediction label of the word unit corresponding to each dimension is at least one label. In addition, the length of the target hidden layer feature vector is the same as the length of the sample text segment, that is, the dimension of the target hidden layer feature vector and the number of word units of the sample text segment can be considered to be the same.
Exemplarily, assuming that the target hidden layer feature vector is a 6-dimensional vector whose dimensions correspond in order to the word units "I", "love", "my", "of", "ancestor" and "country", each dimension of the target hidden layer feature vector is converted into 3 prediction probabilities, one for each of the labels "B", "I" and "O". For example, for the word unit "I", assume that the computed prediction probabilities are 0.2, 0.3 and 0.5 respectively.
Step 828, determining a prediction label of the word unit corresponding to each dimension based on the at least one prediction probability corresponding to each dimension.
Continuing the above example, since 0.5 is the largest of the three probabilities, it can be determined that the prediction label corresponding to "I" is "O".
Step 830, taking the word unit corresponding to the answer beginning word and the word units corresponding to the answer middle and ending words as the predicted answer.
Continuing the above example, assume that the prediction labels corresponding to the 6 word units are determined to be "O", "O", "O", "O", "B" and "I" respectively. Since the label "B" represents the answer beginning word and the label "I" represents the answer middle and ending word, "ancestor" and "country", i.e. "my country", can be taken as the predicted answer.
It should be noted that steps 822 to 830 are a more specific description of step 208, and their implementation process is the same as that of step 208; for the specific implementation, reference may be made to the related description of step 208, which is not repeated here.
Step 832, training the reading understanding model based on the difference between the predicted answer and the sample answer.
And 834, stopping training the reading understanding model if the loss value is smaller than a preset threshold value.
Step 836, if the loss value is greater than or equal to the preset threshold, continuing to train the reading understanding model.
It should be noted that steps 832 to 836 are a more specific description of step 210, and their implementation process is the same as that of step 210; for the specific implementation, reference may be made to the related description of step 210, which is not repeated here.
In the embodiment of the application, an initial first graph network of a sample text fragment and a sample answer is constructed by reading a graph construction network layer of an understanding model, and an initial second graph network of a sample question and the sample answer is constructed; inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group; inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network; inputting the first graph network and the second graph network into a graph volume network layer of the reading understanding model to obtain a predicted answer; training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached. By the method, the incidence relation among the sample text segment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained by combining the incidence relation among the sample text segment, the sample question and the sample answer, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
Referring to fig. 9, fig. 9 shows a flowchart of a reading understanding method provided according to an embodiment of the present application, including steps 902 to 908.
And 902, constructing an initial first graph network of the target text and the target answer and constructing an initial second graph network of the target question and the target answer by reading the graph construction network layer of the understanding model.
As an example, if the target question is a choice question, the target answer may be a text obtained by splicing multiple options; if the target question is a short answer, the target answer may be a keyword in the target text.
Exemplarily, assume that the target text is "Li Bai was good at writing poetry and is called the Poet Immortal", and the target question is a choice question, "Which poet is called the Poet Immortal", with the three options "Li Bai", "Du Fu" and "Su Shi". The three options may be spliced as the target answer, so the target answer may be "Li Bai Du Fu Su Shi".
Illustratively, assume that the target question is a short-answer question, "Which poet is called the Poet Immortal", and that the target text is "'Bring in the Wine' is written in a bold and unconstrained language, expressing optimistic confidence as well as indignation at social reality; it is a work of the Poet Immortal Li Bai". Keywords such as "Bring in the Wine", "optimistic confidence", "Poet Immortal" and "Li Bai" may be extracted from the target text, and the spliced keywords may be used as the target answer.
As an example, an initial first graph network is used for representing the association relationship between the target text and the target answer, and an initial second graph network is used for representing the association relationship between the target question and the target answer.
Illustratively, referring to fig. 10, a target text, a target question and a target answer may be input into a graph building network layer of the reading understanding model, an initial first graph network is obtained based on the target text and the target answer, and an initial second graph network is obtained based on the target question and the target answer.
In implementation, if the text length of the target text is smaller than the length threshold, the specific implementation of constructing an initial first graph network of the target text and the target answer and constructing an initial second graph network of the target question and the target answer by reading the graph construction network layer of the understanding model may include: and constructing an initial third graph network based on the dependency relationship among the word units in the target text, and constructing an initial fourth graph network based on the dependency relationship among the word units in the target question. And constructing the initial first graph network based on the incidence relation between the initial third graph network and the target answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the target answer.
And the initial third graph network is used for representing the dependency relationship between word units in the target text. The initial fourth graph network is used to characterize the dependencies between word units in the target problem.
That is, if the text length of the target text is smaller than the length threshold, the reading understanding model may process the target text, and may first construct an initial third graph network reflecting the dependency relationship between word units in the target text, and then construct a first graph network according to the association relationship between the target answer and the target text on the basis of the initial third graph network. And constructing an initial fourth graph network reflecting the dependency relationship among word units in the target question, and constructing a second graph network according to the incidence relationship between the target answer and the target question on the basis of the initial fourth graph network.
It should be noted that the length threshold may be set by a user according to actual needs, or may be set by default by a device, which is not limited in this embodiment of the application.
In some embodiments, constructing the initial third graph network based on dependencies between word units in the target text may include: taking word units in the target text as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target text to obtain the initial third graph network.
That is to say, the word units in the target text are taken as nodes, the dependency relationship between the word units is taken as an edge, and the initial third graph network which represents the dependency relationship between the word units in the target text can be constructed.
As an example, dependency analysis may be performed on the target text through the Stanford CoreNLP algorithm to obtain the dependency relationships among the multiple word units in the target text.
Illustratively, taking the target text "I love my country" as an example, performing dependency analysis on it through the Stanford CoreNLP algorithm can identify "I" as the subject, "love" as the predicate, and "my country" as the object, and can give the dependency relationships among the word units "I", "love", "I", "ancestor" and "country". For example, "I" and "love" have a dependency relationship in the target text, "I" and "ancestor" have a dependency relationship, "love" and "ancestor" have a dependency relationship, and "ancestor" and "country" have a dependency relationship; based on these dependency relationships, the initial third graph network shown in fig. 4 is obtained.
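To make this construction concrete, the following is a minimal sketch that builds such a dependency graph with networkx. The word units and dependency pairs below are illustrative assumptions (in practice they would come from the dependency analysis described above, e.g. Stanford CoreNLP), and the use of networkx itself is not mandated by the text.

```python
import networkx as nx

def build_dependency_graph(word_units, dependency_pairs):
    """Build a graph network with one node per word unit and one edge per
    dependency relationship between word units (the initial third graph network)."""
    graph = nx.Graph()
    # Index nodes by position so repeated characters such as "I" remain distinct nodes.
    for idx, word in enumerate(word_units):
        graph.add_node(idx, word=word)
    for head, dependent in dependency_pairs:
        graph.add_edge(head, dependent)
    return graph

# Hypothetical word units and dependency pairs for "I love my country".
words = ["I", "love", "I", "ancestor", "country"]
deps = [(0, 1), (0, 3), (1, 3), (3, 4)]
initial_third_graph = build_dependency_graph(words, deps)
```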
In some embodiments, the specific implementation of constructing the initial first graph network based on the association relationship between the initial third graph network and the target answer may include: and connecting the target node with a node in the initial third graph network by taking the word unit in the target answer as a target node based on the incidence relation between the word unit in the target answer and the word unit in the target text to obtain the initial first graph network.
That is, the word unit in the target answer may be used as the target node, and the target node may be connected to the node corresponding to the word unit of the target text in the initial third graph network, so that the initial first graph network representing the association relationship between the word unit of the target text and the word unit of the target answer may be obtained.
As an example, a target node corresponding to a word unit in the target answer may be connected to a node corresponding to each word unit in the target text. Or, as another example, a target node corresponding to the word unit in the target answer may be connected to a node in the initial third graph network, which has an association relationship with the target node.
Illustratively, taking the target text "I love my country" and a choice-type target question as an example, and assuming that the options are "ancestor country" and "home county", the target answer contains the word units "ancestor", "country", "home" and "county". Each of these word units may be taken as a target node: the "ancestor" node is connected to each node in the initial third graph network, the "country" node is connected to each node in the initial third graph network, the "home" node is connected to each node in the initial third graph network, and the "county" node is connected to each node in the initial third graph network. The initial first graph network shown in fig. 11 may thus be obtained, and the bold nodes in fig. 11 are the target nodes.
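Continuing the networkx sketch above, the answer word units can then be added as target nodes and connected to the text nodes. Connecting every target node to every text node is only one of the two options the text mentions, and the helper name is hypothetical.

```python
def add_answer_target_nodes(text_graph, answer_word_units):
    """Return an initial first graph network: copy the text dependency graph and
    connect one target node per answer word unit to every text node."""
    graph = text_graph.copy()
    text_nodes = list(graph.nodes)
    for j, word in enumerate(answer_word_units):
        target = f"answer_{j}"                 # keep answer nodes distinct from text nodes
        graph.add_node(target, word=word, is_target=True)
        for node in text_nodes:
            graph.add_edge(target, node)
    return graph

initial_first_graph = add_answer_target_nodes(
    initial_third_graph, ["ancestor", "country", "home", "county"])
```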
In some embodiments, the constructing the initial fourth graph network based on the dependencies between word units in the target problem may include: taking word units in the target problem as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target problem to obtain the initial fourth graph network.
That is, the initial fourth graph network that characterizes the dependency relationship between word units in the target problem may be constructed by taking the word units in the target problem as nodes and taking the dependency relationship between the word units as edges.
As an example, dependency analysis may be performed on the target question through the Stanford CoreNLP algorithm to obtain the dependency relationships among the multiple word units in the target question.
As an example, performing dependency analysis on the target question "who do I love" through the Stanford CoreNLP algorithm can identify "I" as the subject, "love" as the predicate, and "who" as the object, and can give the dependency relationships among "I", "love" and "who". For example, "I" and "love" have a dependency relationship in the target question, "love" and "who" have a dependency relationship, and "I" and "who" have a dependency relationship; based on these dependency relationships and referring to fig. 6, the initial fourth graph network shown in fig. 6 can be obtained.
In some embodiments, the specific implementation of constructing the initial second graph network based on the association relationship between the initial fourth graph network and the target answer may include: and connecting the target node with a node in the initial fourth graph network by taking the word unit in the target answer as a target node based on the incidence relation between the word unit in the target answer and the word unit in the target question to obtain the initial second graph network.
That is, the word unit in the target answer may be used as the target node, and the target node may be connected to the node corresponding to the word unit of the target question in the initial fourth graph network, so that the initial second graph network representing the association relationship between the word unit of the target question and the word unit of the target answer may be obtained.
As an example, a target node corresponding to a word unit in the target answer may be connected to a node corresponding to each word unit in the target question. Or, as another example, a target node corresponding to the word unit in the target answer may be connected to a node in the initial fourth graph network, which has an association relationship with the target node.
Illustratively, taking the target question "who do I love" and a target answer containing the word units "ancestor", "country", "home" and "county" as an example, each of these word units may be taken as a target node: the "ancestor" node is connected to each node in the initial fourth graph network, the "country" node is connected to each node in the initial fourth graph network, the "home" node is connected to each node in the initial fourth graph network, and the "county" node is connected to each node in the initial fourth graph network. The initial second graph network shown in fig. 12 may thus be obtained, and the bold nodes in fig. 12 are the target nodes.
In the embodiment of the application, the association relationship between the target text and the target answer and the association relationship between the target question and the target answer can be fully utilized, so that the accuracy of the reading understanding task executed by the reading understanding model can be improved.
It should be noted that the above description takes the case where the text length of the target text is smaller than the length threshold as an example. If the target text is a chapter-level text, that is, the text length of the target text is greater than or equal to the length threshold, the reading understanding model may not be able to process the target text directly. Therefore, the target text may be segmented or split into sentences to obtain a plurality of target text segments, and then an initial first graph network of each target text segment and the target answer is constructed by the above method. For example, if the target text is divided into 3 target text segments, 3 initial first graph networks may be constructed.
Step 904, inputting the target text, the target question and the target answer into a feature extraction layer of the reading understanding model, and respectively obtaining a first feature vector group, a second feature vector group and a third feature vector group.
Wherein, the feature extraction layer can be used for extracting features of the input text.
As an example, the first feature vector group is the feature vector group obtained after the target text passes through the feature extraction layer, the second feature vector group is the feature vector group obtained after the target question passes through the feature extraction layer, and the third feature vector group is the feature vector group obtained after the target answer passes through the feature extraction layer. The first feature vector group comprises a plurality of first feature vectors, and each first feature vector corresponds to one word unit in the target text; the second feature vector group comprises a plurality of second feature vectors, and each second feature vector corresponds to one word unit in the target question; the third feature vector group comprises a plurality of third feature vectors, and each third feature vector corresponds to one word unit in the target answer.
For example, referring to fig. 10, a target text, a target question and a target answer may be input into a feature extraction layer of a reading understanding model to determine a first feature vector group, a second feature vector group and a third feature vector group, respectively.
In implementation, if the text length of the target text is smaller than the length threshold, the specific implementation of this step may include: performing word segmentation processing on the target text, the target question and the target answer to respectively obtain a first word unit group, a second word unit group and a third word unit group; performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively; and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
In the embodiment of the present application, the feature extraction layer may include a word embedding processing function and an encoding function. As one example, the feature extraction layer may include a word embedding processing module and an encoding module.
Illustratively, the feature extraction layer may employ the structure of the BERT model. Because the feature vectors obtained by the BERT model incorporate full-text semantic information, the feature vectors of the word units in the target text, the target question and the target answer can be more fully utilized, and the accuracy of the reading understanding model can be improved.
As an example, taking a target text as an example, if the target text is a chinese text, a word may be divided into a word unit, and a punctuation mark may be divided into a word unit; if the target text is a foreign language text, a word can be divided into a word unit, and a phrase can be divided into a word unit; if the target text has the numbers, the numbers can be divided into word units separately.
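A rough word-segmentation helper following these rules might look as follows. It is an illustrative sketch rather than the segmenter actually used by the model: each Chinese character, each run of digits, each foreign-language word and each punctuation mark is treated as one word unit.

```python
import re

def split_word_units(text):
    """Split text into word units: one Chinese character, one run of digits,
    one alphabetic word, or one punctuation mark per unit."""
    pattern = r"[\u4e00-\u9fff]|[0-9]+|[A-Za-z]+|[^\sA-Za-z0-9\u4e00-\u9fff]"
    return re.findall(pattern, text)

print(split_word_units("李白701年出生。"))   # ['李', '白', '701', '年', '出', '生', '。']
```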
Exemplarily, assuming that the target text is the Chinese sentence "Li Bai wrote countless poems in his life and is called the Poet Immortal", character-level word segmentation yields a plurality of first word units corresponding to the characters "Li", "Bai", "one", "life", "write", "poem", "no", "count", "by", "call", "as", "poem" and "immortal".
As an example, word embedding processing may be performed on each first word unit in the first word unit group in a one-hot encoding manner to obtain a word vector of each first word unit, word embedding processing may be performed on each second word unit in the second word unit group to obtain a word vector of each second word unit, and word embedding processing may be performed on each third word unit in the third word unit group to obtain a word vector of each third word unit.
As another example, word embedding processing may be performed on each first word unit in the first word unit group in a word2vec coding manner to obtain a word vector of each first word unit, word embedding processing may be performed on each second word unit in the second word unit group to obtain a word vector of each second word unit, and word embedding processing may be performed on each word unit in the third word unit group to obtain a word vector of each third word unit.
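As a concrete illustration of the one-hot option, a minimal sketch follows; the vocabulary here is built from the input itself purely for demonstration, whereas a real system would use a fixed vocabulary.

```python
import numpy as np

def one_hot_word_vectors(word_units):
    """Map each word unit to a one-hot word vector over the distinct word units."""
    vocab = {w: i for i, w in enumerate(dict.fromkeys(word_units))}
    identity = np.eye(len(vocab))
    return [identity[vocab[w]] for w in word_units]

vectors = one_hot_word_vectors(["我", "爱", "我", "祖", "国"])
print(vectors[0])   # one-hot vector shared by both occurrences of "我"
```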
As an example, each first word vector, each second word vector and each third word vector are encoded. In this way, a vector representation of each first word unit fused with the full-text semantic information of the target text, that is, a first feature vector, can be obtained; a vector representation of each second word unit fused with the full-text semantic information of the target question, that is, a second feature vector, can be obtained; and a vector representation of each third word unit fused with the full-text semantic information of the target answer, that is, a third feature vector, can be obtained. The first feature vector group, the second feature vector group and the third feature vector group can then be obtained.
Illustratively, taking the target question "who do I love" as an example, "who do I love" is input into the feature extraction layer and segmented into the word units "I", "love" and "who". Word embedding processing is performed on "I", "love" and "who" respectively to obtain the word vector of "I", the word vector of "love" and the word vector of "who". The word vectors of "I", "love" and "who" are then encoded, so that a second feature vector of "I" combining the word vectors of "love" and "who" can be obtained, a second feature vector of "love" combining the word vectors of "I" and "who" can be obtained, and a second feature vector of "who" combining the word vectors of "I" and "love" can be obtained. Similarly, inputting the target text "I love my country" into the feature extraction layer can output a first feature vector of each word unit in the target text, and inputting the target answer into the feature extraction layer can output a third feature vector of each word unit in the target answer.
In the embodiment of the application, the feature extraction layer may adopt the structure of a BERT model that is pre-trained and then fine-tuned with a reading understanding task, so that the obtained first feature vector group, second feature vector group and third feature vector group can more accurately reflect the semantics of the target text, the target question and the target answer respectively, and the training speed and the accuracy of the model in use can be improved.
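Under the assumption that the feature extraction layer follows a standard pre-trained BERT encoder (the checkpoint name below is only an example, not the model named by the text), the per-word-unit feature vectors could be obtained roughly as follows with the Hugging Face transformers library:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def extract_feature_vectors(text):
    """Return one contextualized feature vector per token of the input text
    (including the [CLS]/[SEP] tokens, which would be stripped or aligned with
    the word units in practice)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[0]        # shape: (num_tokens, hidden_size)

first_feature_vectors = extract_feature_vectors("我爱我祖国")   # target text
second_feature_vectors = extract_feature_vectors("我爱谁")      # target question
third_feature_vectors = extract_feature_vectors("祖国家乡")     # target answer
```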
In the above description, the text length of the target text is smaller than the length threshold, and when the text length of the target text is smaller than the length threshold, the reading understanding model can process the target text, so that the target text can be directly subjected to word segmentation. In other embodiments, if the target text is a chapter-level text, that is, the text length of the target text is greater than or equal to the length threshold, the reading understanding model may not be able to process the target text, and therefore, the target text may be segmented or sentence-wise processed to obtain a plurality of target text segments, and then the first feature vector group of each target text segment is extracted through the feature extraction layer. For example, if the target text is divided into 3 target text segments, 3 first feature vector groups may be extracted, where the 3 first feature vector groups are respectively used to represent the semantics of the 3 target text segments. Moreover, the method for extracting the first feature vector group of the target text segment is the same as the above-mentioned method for extracting the first feature vector group of the target text, and the details are not repeated herein in this embodiment.
Step 906, inputting the first feature vector group, the second feature vector group, and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network.
Wherein the first graph network is an initial first graph network that includes attention values of nodes and attention values of edges. The second graph network is an initial second graph network that includes attention values for nodes and attention values for edges.
As an example, the attention layer may employ the structure of the attention layer of the BERT model. Alternatively, the attention layer may adopt any other structure including a model of an attention mechanism, which is not limited in this embodiment of the present application.
As an example, in this step, a first feature vector group, a second feature vector group, a third feature vector group, an initial first graph network, and an initial second graph network may be input into an attention layer of the reading understanding model, an attention value is added to a node and an edge of the initial first graph network based on the first feature vector group and the second feature vector group to obtain the first graph network, and an attention value is added to a node and an edge of the initial second graph network based on the second feature vector group and the third feature vector group to obtain the second graph network. Exemplarily, referring to fig. 10, a first feature vector group, a second feature vector group, a third feature vector group, an initial first graph network, and an initial second graph network may be input into an attention layer of a reading understanding model, and an attention value is added to nodes and edges included in the initial first graph network based on the first feature vector group and the second feature vector group, so as to obtain a first graph network; and adding attention values to the nodes and edges included in the initial second graph network based on the second feature vector group and the third feature vector group to obtain a second graph network.
Or, as another example, in this step, the first feature vector group, the second feature vector group, and the third feature vector group may be input into an attention layer of the reading understanding model, an attention value of a node and an edge included in the initial first graph network is obtained based on the first feature vector group and the second feature vector group, and the attention value is added to the initial first graph network to obtain the first graph network; and obtaining attention values of nodes and edges included in the initial second graph network based on the second feature vector group and the third feature vector group, and adding the attention values to the initial second graph network to obtain the second graph network.
In implementation, if the text length of the target text is smaller than the length threshold, the specific implementation of this step may include: adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors; adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
As an example, the initial first graph network represents the association relationship between the target text and the target answer, the first feature vector group is the feature representation of the target text, and the third feature vector group is the feature representation of the target answer, so that the attention values can be added to the nodes and edges of the initial first graph network according to the first feature vector group and the third feature vector group. Similarly, the initial second graph network represents the incidence relation between the target question and the target answer, the second feature vector group is the feature representation of the target question, and the third feature vector group is the feature representation of the target answer, so that the attention values can be added to the nodes and edges of the initial second graph network according to the second feature vector group and the third feature vector group.
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors may include: taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the target text in the initial first graph network; taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the target answer in the initial first graph network; determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge; based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
That is, the first feature vector in the first feature vector group may be used as the attention value of the node corresponding to the word unit of the target text in the initial first graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the target answer in the initial first graph network. And determining an attention value of an edge between word units of the target text in the initial first graph network according to the first feature vector group, and determining an attention value of an edge between word units of the target text and word units of the target answer in the initial first graph network according to the first feature vector group and the third feature vector group.
As an example, for two first nodes between which an edge exists, attention calculation may be performed on the first feature vectors of the word units corresponding to the two first nodes to obtain the attention value of the edge. Specifically, the attention calculation on the two first feature vectors is to multiply the two first feature vectors and normalize the result to obtain the attention value. Illustratively, referring to fig. 11, an edge exists between "I" and "love" in fig. 11, and "I" and "love" are word units in the target text. The first feature vector of the word unit "I" may be obtained from the first feature vector group, the first feature vector of "love" may be obtained from the first feature vector group, the first feature vector of "I" and the first feature vector of "love" may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "I" and "love" may be obtained.
As an example, for a first node and a second node where an edge exists, attention calculation may be performed on a first feature vector of a word unit corresponding to the first node and a third feature vector of the word unit corresponding to the second node, and an attention value of the edge may be obtained. Specifically, the attention calculation on the first feature vector and the third feature vector is to multiply the first feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 11, there is an edge between "i" and "home" in fig. 11, and "i" is a word unit in the target text, and "home" is a word unit in the target answer, a first feature vector of the word unit "i" may be obtained from a first feature vector group, and a third feature vector of "home" may be obtained from a third feature vector group, and the first feature vector of "i" and the third feature vector of "home" may be multiplied, and the product is normalized, so that the attention value of the edge between "i" and "home" may be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 11 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
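A minimal sketch of the edge attention computation described above follows (the node attention values are simply the feature vectors themselves). The text only states that the product of the two feature vectors is normalized, so the scaling and sigmoid normalization used here in place of formula (1) are assumptions.

```python
import numpy as np

def edge_attention_value(vec_a, vec_b):
    """Attention value of an edge between two nodes: dot product of their feature
    vectors, scaled by the vector dimension and squashed into (0, 1)."""
    score = float(np.dot(vec_a, vec_b)) / np.sqrt(len(vec_a))
    return 1.0 / (1.0 + np.exp(-score))

# e.g. the edge between the text word unit "I" and the answer word unit "home":
# attention = edge_attention_value(first_vec_of_I, third_vec_of_home)
```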
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial second graph network based on the second feature vector group and the third feature vector group may include: taking a second feature vector in the second feature vector group as the attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the target question in the initial second graph network; taking a third feature vector in the third feature vector group as the attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the target answer in the initial second graph network; determining, based on the second feature vector group, an attention value between two third nodes of the initial second graph network between which an edge exists and using it as the attention value of the edge; and determining, based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node of the initial second graph network between which an edge exists and using it as the attention value of the edge.
That is, the second feature vector in the second feature vector group may be used as the attention value of the node corresponding to the word unit of the target question in the initial second graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the target answer in the initial second graph network. And determining an attention value of an edge between word units of the target question in the initial second graph network according to the second feature vector group, and determining an attention value of an edge between word units of the target question and word units of the target answer in the initial second graph network according to the second feature vector group and the third feature vector group.
As an example, for two third nodes where an edge exists, attention calculation may be performed on the second feature vectors of the word units corresponding to the two third nodes, and the attention value of the edge may be obtained. Specifically, the attention calculation on the two second feature vectors is to multiply the two second feature vectors and normalize the result to obtain the attention value. Illustratively, referring to fig. 12, an edge exists between "i" and "who" in fig. 12, and "i" and "who" are word units in the target question, a second feature vector of the word unit "i" may be obtained from the second feature vector group, and a second feature vector of "who" may be obtained from the second feature vector group, and the second feature vector of "i" and the second feature vector of "who" may be multiplied, and normalization processing may be performed on the products, and a value of attention of the edge between "i" and "who" may be obtained.
As an example, for a third node and a fourth node where an edge exists, attention calculation may be performed on a second feature vector of a word unit corresponding to the third node and a third feature vector of a word unit corresponding to the fourth node, and an attention value of the edge may be obtained. Specifically, the attention calculation for the second feature vector and the third feature vector is to multiply the second feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 12, there is an edge between "who" and "home" in fig. 12, and "who" is a word unit in the target question, "home" is a word unit in the target answer, a second feature vector of the word unit "who" can be obtained from a second feature vector group, and a third feature vector of "home" can be obtained from a third feature vector group, the second feature vector of "who" and the third feature vector of "home" can be multiplied, the product is normalized, and the attention value of the edge between "who" and "home" can be obtained.
In the above manner, the attention value of each edge and the attention value of each node in fig. 12 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
In the embodiment of the present application, attention calculation may be performed on the two feature vectors by using the above formula (1), and for specific implementation, reference may be made to the relevant description of step 206, which is not described herein again.
It should be noted that, the determination of the first graph network is described as an example in the above, where the text length of the target text is smaller than the length threshold, that is, the first feature vector group corresponds to the target text. In other embodiments, for a target text, if the target text is split into a plurality of target text segments, and the first feature vector group is a feature vector group of the target text segment, attention values may be added to nodes and edges of the initial first graph network corresponding to the target text segment based on the first feature vector group of each target text segment and the third feature vector group of the target answer.
For example, if the target text is divided into 3 target text segments, 3 first feature vector groups may be extracted, generating 3 initial first graph networks. For the reference initial first graph network, which is generated based on the reference target text segment and the target answer, attention values may be added to nodes and edges of the reference initial first graph network according to the first feature vector group of the reference target text segment and the third feature vector group of the target answer, so as to obtain the reference first graph network. The reference target text segment is any one of a plurality of text segments, a reference initial first graph network corresponds to the reference target text segment, and the reference first graph network corresponds to the reference target text segment. Similarly, 3 first graph networks can be obtained in the above manner. In addition, the implementation process of adding the attention values to the nodes and edges of the initial first graph network corresponding to the target text segment is the same as the implementation process of adding the attention values to the nodes and edges of the initial first graph network, and reference may be specifically made to the related description of the foregoing embodiment in this step, which is not described herein again.
It should be noted that, steps 904 to 906 are steps of inputting the target text, the target question, and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a specific implementation of the first graph network and the second graph network.
Step 908, inputting the first graph network and the second graph network into the graph convolution network layer of the reading understanding model to obtain an answer to the target question.
As an example, the graph convolution network layer may be a GCN (Graph Convolutional Network) model.
Illustratively, referring to fig. 10, the first graph network and the second graph network may be input into the graph convolution network layer of the reading understanding model to obtain the answer.
In implementation, if the text length of the target text is smaller than the length threshold, the first graph network is a graph network that reflects the association relationship between the target text and the target answer, and the specific implementation of inputting the first graph network and the second graph network into the graph convolution network layer of the reading understanding model to obtain the answer may include: determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network; carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector; and determining the answer based on the target hidden layer feature vector.
The first hidden layer feature vector is a vector representation of the first graph network obtained after the convolution processing is carried out on the first graph network through the graph convolution network layer. The second hidden layer feature vector is a vector representation of the second graph network obtained after the second graph network is subjected to convolution processing through the graph convolution network layer.
As an example, a first graph network may be input into a graph convolution network layer for convolution processing to obtain a first hidden layer feature vector, and a second graph network may be input into the graph convolution network layer for convolution processing to obtain a second hidden layer feature vector.
It should be noted that, in the graph convolution network layer, the graph network may be subjected to convolution processing through the above formula (2), and specific implementation may refer to the relevant description of step 208, which is not described herein again in this embodiment of the present application.
As an example, when the first hidden layer feature vector and the second hidden layer feature vector are subjected to weighted summation, a weight of the first hidden layer feature vector and a weight of the second hidden layer feature vector may be the same or different, and may be set by a user according to actual needs or may be set by a computing device as a default, which is not limited in this embodiment of the present application.
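Since formula (2) is not reproduced in this excerpt, the sketch below assumes the common Kipf & Welling form of a graph-convolution layer and shows the subsequent weighted summation of the two hidden layer feature vectors; the shapes of the two hidden vectors are assumed to have been aligned beforehand.

```python
import numpy as np

def gcn_layer(adjacency, node_features, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ node_features @ weight, 0.0)

def target_hidden_vector(first_hidden, second_hidden, w1=0.5, w2=0.5):
    """Weighted summation of the first and second hidden layer feature vectors;
    equal weights are only a default, since the weights are configurable."""
    return w1 * first_hidden + w2 * second_hidden
```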
In some embodiments, determining a specific implementation of the answer based on the target hidden-layer feature vector may include: converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label; determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension; and determining the answer based on the label of the word unit corresponding to each dimension.
As an example, the sequence annotation function is a function used in performing sequence annotation, and can map an input vector into at least one-dimensional probabilities, that is, at least one probability can be obtained for each vector.
For example, the target hidden layer feature vector may be used as an input of a sequence labeling function, and through calculation of the sequence labeling function, a probability corresponding to each dimension of the target hidden layer feature vector may be obtained.
As an example, the label may be B, I, O. Wherein, B represents the initial word of the answer, namely the first word of the answer; i represents an ending word in the middle of the answer, namely the second character to the last character of the answer; o denotes a non-answer word, i.e., a word that is not an answer.
It should be noted that the length of the target hidden layer feature vector is the same as the length of the target text.
Illustratively, taking the target text "I love my country" as an example, the target hidden layer feature vector is a 6-dimensional vector, and the 6 dimensions correspond to the 6 word units of the target text respectively. Each dimension in the target hidden layer feature vector is converted into 3 probabilities, and each probability corresponds to the possibility of one of the labels "B", "I" and "O". For example, for the word unit "love", assuming that the calculated probabilities are 0.2, 0.3 and 0.5 respectively, it can be determined that the probability of the label being "O" is the largest, so the label corresponding to "love" is "O". Similarly, assume that the labels determined for the other word units are all "O", except that "ancestor" is labeled "B" and "country" is labeled "I". Since the label "B" represents the beginning word of the answer and the label "I" represents a middle or ending word of the answer, "ancestor" and "country" can be considered to be the answer.
As an example, the at least one tag includes an answer beginning word, an answer middle ending word, and a non-answer word, and determining a specific implementation of the answer based on the tag of the word unit corresponding to each dimension may include: and taking the word unit corresponding to the head word of the answer and the word unit corresponding to the middle ending word of the answer as the answers.
That is to say, the initial word of the answer and the final word in the middle of the answer can be spliced to obtain the answer.
Continuing the above example, the word units "ancestor" and "country" may be spliced to obtain the answer.
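A minimal sketch of this decoding step follows, assuming the per-dimension probabilities are ordered as ("B", "I", "O") as in the example above; the word units and probabilities are illustrative.

```python
import numpy as np

LABELS = ("B", "I", "O")

def decode_answer(word_units, label_probabilities):
    """Choose the most probable label per word unit and splice the word units
    labelled "B" (answer beginning word) or "I" (answer middle/ending word)."""
    labels = [LABELS[int(np.argmax(p))] for p in label_probabilities]
    return "".join(w for w, tag in zip(word_units, labels) if tag in ("B", "I"))

words = ["我", "爱", "我", "祖", "国"]                       # illustrative word units
probs = [[0.1, 0.1, 0.8], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7],
         [0.5, 0.3, 0.2], [0.3, 0.6, 0.1]]
print(decode_answer(words, probs))                           # -> "祖国"
```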
It should be noted that, the above description is given by taking an example in which the text length of the target text is smaller than the length threshold, that is, the first graph network corresponds to the entire target text. In other embodiments, for a target text, if the target text is split into a plurality of target text segments, the first graph network corresponds to the target text segments, and the answer obtained by inputting the first graph network and the second graph network into the graph convolution network layer corresponds to the target text segments, but the answer is not necessarily a correct answer to the target question. Thus, in this case, each target text segment may get one answer, and then multiple answers may be obtained, and then the correct answer to the target question may be determined from the multiple answers.
As an example, the answer with the highest frequency of occurrence among the plurality of answers may be taken as the answer to the target question. For example, assuming that the target text is divided into 10 target text segments, and each pair of a first graph network and the second graph network is input into the graph convolution network layer for processing, 10 answers may be obtained; the answer that occurs the largest number of times among the 10 answers may be used as the answer to the target question.
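A sketch of this voting step, choosing the most frequent answer among the per-segment predictions (the example answers are illustrative):

```python
from collections import Counter

def vote_answer(segment_answers):
    """Return the answer that occurs most frequently among the answers
    predicted for the individual target text segments."""
    return Counter(segment_answers).most_common(1)[0][0]

print(vote_answer(["祖国", "祖国", "家乡"]))   # -> "祖国"
```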
By the above method, the association relationships among the target text, the target question and the target answer can be effectively extracted by utilizing the feature vectors of the target text, the target question and the target answer, and the answer to the target question is determined through the reading understanding model by combining the association relationships among the target text, the target question and the target answer, so that the accuracy of the reading understanding task executed by the reading understanding model can be improved.
The following will further describe the reading understanding method provided by the present application by taking its application to a choice question as an example with reference to fig. 13. Fig. 13 shows a processing flow chart of a reading understanding method applied to a choice question according to an embodiment of the present application, which may specifically include the following steps:
step 1302: and acquiring a target text, a target question and a target answer.
In the embodiment of the present application, the form of the target question and the text length of the target text are not limited, and the reading and understanding method is described in the embodiment by taking the target question as a choice question and taking the text length of the target text smaller than the length threshold as an example.
For example, the target text is "I love my country", the target question is "who do I love", and the target answer includes two options, namely "ancestor country" and "home county".
Step 1304: and inputting the target text, the target question and the target answer into a graph construction network layer of the reading understanding model, and constructing an initial third graph network based on the dependency relationship among word units in the target text.
For example, taking the target text "I love my country" as an example, performing dependency analysis on it through the Stanford CoreNLP algorithm can identify "I" as the subject, "love" as the predicate, and "my country" as the object, and can give the dependency relationships among the word units "I", "love", "I", "ancestor" and "country". For example, "I" and "love" have a dependency relationship in the target text, "I" and "ancestor" have a dependency relationship, "love" and "ancestor" have a dependency relationship, and "ancestor" and "country" have a dependency relationship; based on these dependency relationships, the initial third graph network shown in fig. 4 is obtained.
Step 1306: and connecting the target node with a node in the initial third graph network by taking the word unit in the target answer as the target node based on the incidence relation between the word unit in the target answer and the word unit in the target text to obtain the initial first graph network.
Continuing with the above example, the word units in the target answer may be taken as target nodes: the "ancestor" node is connected to each node in the initial third graph network, the "country" node is connected to each node in the initial third graph network, the "home" node is connected to each node in the initial third graph network, and the "county" node is connected to each node in the initial third graph network. The initial first graph network shown in fig. 11 may thus be obtained, where the bold nodes in fig. 11 are the target nodes.
Step 1308: and inputting the target text, the target question and the target answer into a graph construction network layer of the reading understanding model, and constructing an initial fourth graph network based on the dependency relationship among word units in the target question.
Continuing with the above example, performing dependency analysis on the target question "who do I love" through the Stanford CoreNLP algorithm can identify "I" as the subject, "love" as the predicate, and "who" as the object, and can give the dependency relationships among "I", "love" and "who". For example, "I" and "love" have a dependency relationship in the target question, "love" and "who" have a dependency relationship, and "I" and "who" have a dependency relationship; based on these dependency relationships and referring to fig. 6, the initial fourth graph network shown in fig. 6 can be obtained.
Step 1310: and connecting the target node with a node in the initial fourth graph network by taking the word unit in the target answer as the target node based on the incidence relation between the word unit in the target answer and the word unit in the target question to obtain the initial second graph network.
Continuing with the above example, the word units in the target answer may be taken as target nodes: the "ancestor" node is connected to each node in the initial fourth graph network, the "country" node is connected to each node in the initial fourth graph network, the "home" node is connected to each node in the initial fourth graph network, and the "county" node is connected to each node in the initial fourth graph network. The initial second graph network shown in fig. 12 may thus be obtained, where the bold nodes in fig. 12 are the target nodes.
Step 1312: inputting a target text, a target question and a target answer into a feature extraction layer of a reading understanding model, performing word segmentation processing on the target text to obtain a first word unit group, performing word segmentation processing on the target question to obtain a second word unit group, and performing word segmentation processing on the target answer to obtain a third word unit group.
Continuing with the above example, performing word segmentation on the target text yields the first word unit group, whose word units are "I", "love", "I", "ancestor" and "country" respectively. Similarly, performing word segmentation on the target question yields the second word unit group, whose word units are "I", "love" and "who" respectively, and performing word segmentation on the target answer yields the third word unit group, whose word units are "ancestor", "country", "home" and "county" respectively.
And 1314, performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to respectively obtain a first word vector group, a second word vector group and a third word vector group.
Continuing the above example, performing word embedding processing on each first word unit in the first word unit group can obtain the first word vectors of "I", "love", "I", "ancestor" and "country" respectively. Similarly, performing word embedding processing on each second word unit in the second word unit group can obtain the second word vectors of "I", "love" and "who" respectively, and performing word embedding processing on each third word unit in the third word unit group can obtain the third word vectors of "ancestor", "country", "home" and "county" respectively.
Step 1316, encoding the first word vector group, the second word vector group and the third word vector group to obtain a first feature vector group, a second feature vector group and a third feature vector group, respectively.
Continuing with the above example, encoding the word vectors of "I", "love" and "who" can obtain a second feature vector of "I" combining the word vectors of "love" and "who", a second feature vector of "love" combining the word vectors of "I" and "who", and a second feature vector of "who" combining the word vectors of "I" and "love". Similarly, a first feature vector of each word unit in the target text and a third feature vector of each word unit in the target answer can be obtained.
Step 1318, adding attention values to the nodes and edges of the initial first graph network through the attention layer based on the first feature vector group and the third feature vector group, and obtaining the first graph network.
Continuing the above example, the feature vector of each node in fig. 11 may be taken as the attention value of that node. In fig. 11, an edge exists between "I" and "love", and "I" and "love" are word units in the target text; the first feature vector of the word unit "I" may be obtained from the first feature vector group, the first feature vector of "love" may be obtained from the first feature vector group, the first feature vector of "I" and the first feature vector of "love" may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "I" and "love" may be obtained. An edge also exists between "I" and "home", where "I" is a word unit in the target text and "home" is a word unit in the target answer; the first feature vector of the word unit "I" may be obtained from the first feature vector group, the third feature vector of "home" may be obtained from the third feature vector group, the first feature vector of "I" and the third feature vector of "home" may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "I" and "home" may be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 11 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
Step 1320, adding attention values to the nodes and edges of the initial second graph network through the attention layer based on the second feature vector group and the third feature vector group to obtain a second graph network.
Continuing the above example, the feature vector of each node in fig. 12 may be taken as the attention value of that node. In fig. 12, an edge exists between "I" and "who", and "I" and "who" are word units in the target question; the second feature vector of the word unit "I" may be obtained from the second feature vector group, the second feature vector of "who" may be obtained from the second feature vector group, the second feature vector of "I" and the second feature vector of "who" may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "I" and "who" may be obtained. An edge also exists between "who" and "home", where "who" is a word unit in the target question and "home" is a word unit in the target answer; the second feature vector of the word unit "who" may be obtained from the second feature vector group, the third feature vector of "home" may be obtained from the third feature vector group, the second feature vector of "who" and the third feature vector of "home" may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "who" and "home" may be obtained.
In the above manner, the attention value of each edge and the attention value of each node in fig. 12 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
Step 1322 is to input the first graph network and the second graph network into the graph convolution network layer of the reading understanding model, and determine a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network through the graph convolution network layer.
As an example, a first graph network may be input into a graph convolution network layer for convolution processing to obtain a first hidden layer feature vector, and a second graph network may be input into the graph convolution network layer for convolution processing to obtain a second hidden layer feature vector.
And step 1324, performing weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector.
Step 1326, converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function.
As an example, the sequence annotation function is a function used in performing sequence annotation, and can map an input vector into at least one-dimensional probabilities, that is, at least one probability can be obtained for each vector.
For example, the target hidden layer feature vector may be used as an input of a sequence labeling function, and through calculation of the sequence labeling function, a probability corresponding to each dimension of the target hidden layer feature vector may be obtained.
Continuing with the above example, the target text "I love my country" includes 6 word units, so the target hidden layer feature vector is a 6-dimensional vector, and the 6 dimensions correspond to the 6 word units of the target text respectively. Each dimension in the target hidden layer feature vector is converted into 3 prediction probabilities, and each probability corresponds to the possibility of one of the labels "B", "I" and "O". For example, for the word unit "ancestor", assuming that the calculated prediction probabilities are 0.5, 0.3 and 0.2 respectively, 0.5 is the probability that the label of the word unit "ancestor" is "B", 0.3 is the probability that the label is "I", and 0.2 is the probability that the label is "O". For the word unit "country", assuming that the calculated prediction probabilities are 0.3, 0.6 and 0.1 respectively, 0.3 is the probability that the label of the word unit "country" is "B", 0.6 is the probability that the label is "I", and 0.1 is the probability that the label is "O".
At step 1328, a prediction label of the word unit corresponding to each dimension is determined based on the at least one prediction probability corresponding to each dimension.
Continuing with the above example, since 0.5 is the largest among the prediction probabilities corresponding to the word unit "ancestor" and 0.5 is the probability that its label is "B", the prediction label corresponding to the word unit "ancestor" can be determined to be "B"; since 0.6 is the largest among the prediction probabilities corresponding to the word unit "country" and 0.6 is the probability that its label is "I", the prediction label corresponding to "country" can be determined to be "I".
Step 1330, the word unit corresponding to the head word of the answer and the word unit corresponding to the end word in the middle of the answer are used as the answer of the target question.
Continuing the above example, assume that the labels determined for the word units of the target text "I love my country" are all "O", except that "ancestor" corresponds to the label "B" and "country" corresponds to the label "I". Since the label "B" represents the beginning word of the answer and the label "I" represents a middle or ending word of the answer, it can be determined that the answer to the target question is the option obtained by splicing the word units "ancestor" and "country".
By the method, the association relation among the target text, the target question and the target answer can be effectively extracted by utilizing the characteristic vectors of the target text, the target question and the target answer, the answer of the target question is determined through the reading understanding model by combining the association relation among the target text, the target question and the target answer, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
Corresponding to the above method embodiment, the present application further provides an embodiment of a training apparatus for reading and understanding a model, and fig. 14 illustrates a schematic structural diagram of a training apparatus for reading and understanding a model according to an embodiment of the present application. As shown in fig. 14, the apparatus may include:
a first graph network construction module 1402 configured to construct an initial first graph network of sample text fragments and sample answers by reading graph construction network layers of the understanding model, and construct an initial second graph network of sample questions and the sample answers;
a first text processing module 1404 configured to input the sample text segment, the sample question, and the sample answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a prediction module 1406 configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain a predicted answer;
a training module 1408 configured to train the reading understanding model based on a difference between the predicted answer and the sample answer until a training stop condition is reached.
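Taken together, the four modules describe a conventional supervised training loop. The sketch below is only illustrative: the method names build_graphs, add_attention and graph_conv_predict, the loss_fn and optimizer_step callables, and the concrete thresholds of the two stop conditions are hypothetical stand-ins for the layers and conditions described in this embodiment.

def train_reading_model(samples, model, loss_fn, optimizer_step,
                        loss_threshold=1e-3, max_iterations=10000):
    iterations = 0
    for text, question, answer in samples:
        # Module 1402: graph construction network layer builds the initial graphs.
        g1_init, g2_init = model.build_graphs(text, question, answer)
        # Module 1404: text processing layer adds attention values to nodes and edges.
        g1, g2 = model.add_attention(text, question, answer, g1_init, g2_init)
        # Module 1406: graph convolution network layer outputs the predicted answer.
        predicted = model.graph_conv_predict(g1, g2)
        # Module 1408: train on the difference between predicted and sample answer.
        loss = loss_fn(predicted, answer)
        optimizer_step(loss)
        iterations += 1
        # Stop conditions described later: difference below a preset threshold,
        # or the number of iterations exceeding a count threshold.
        if loss < loss_threshold or iterations > max_iterations:
            break
    return model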
Optionally, the first text processing module 1404 configured to:
inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group;
inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
Optionally, the first text processing module 1404 configured to:
performing word segmentation processing on the sample text segment, the sample question and the sample answer to respectively obtain a first word unit group, a second word unit group and a third word unit group;
performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively;
and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
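The three steps above form a standard text-to-feature pipeline (word segmentation, word embedding, encoding). The sketch below is a simplified illustration: per-character segmentation, a toy embedding table and an identity encoder are assumptions standing in for the tokenizer, the one-hot/word2vec embeddings and the encoder mentioned elsewhere in this application.

def segment(text):
    # For Chinese text, each character (including punctuation marks and digits)
    # is treated as a separate word unit.
    return list(text)

def embed(word_units, embedding_table, dim=8):
    # Look up a word vector for each word unit; unknown units get a zero vector.
    return [embedding_table.get(unit, [0.0] * dim) for unit in word_units]

def encode(word_vectors):
    # Placeholder for the encoder that fuses full-text semantic information
    # into each word unit's representation; here it returns the vectors unchanged.
    return word_vectors

def text_to_features(text, embedding_table):
    word_units = segment(text)                          # word unit group
    word_vectors = embed(word_units, embedding_table)   # word vector group
    return word_units, encode(word_vectors)             # feature vector group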
Optionally, a first graph network building module 1402 configured to:
constructing an initial third graph network based on the dependency relationship among the word units in the sample text segment, and constructing an initial fourth graph network based on the dependency relationship among the word units in the sample question;
and constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer.
Optionally, the first graph network building module 1402 is configured to:
taking word units in the sample text fragment as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample text fragment to obtain the initial third graph network.
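In other words, the initial third graph network is simply the sample text segment's word units as nodes plus one edge per dependency relation. A minimal sketch, assuming the dependency relations are already available as (head index, dependent index) pairs, for example from a dependency parser such as the Stanford CoreNLP toolkit mentioned later in the claims:

def build_dependency_graph(word_units, dependencies):
    # word_units: list of word units; dependencies: list of (head_idx, dep_idx) pairs.
    nodes = list(range(len(word_units)))                # one node per word unit
    edges = set()
    for head, dep in dependencies:
        edges.add((min(head, dep), max(head, dep)))     # connect nodes with a dependency
    return {"nodes": nodes, "labels": list(word_units), "edges": edges}

# The initial fourth graph network is built in the same way from the sample question.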
Optionally, the first graph network building module 1402 is configured to:
and connecting the target node with a node in the initial third graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample text segment to obtain the initial first graph network.
Optionally, the first graph network building module 1402 is configured to:
taking word units in the sample problem as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
Optionally, the first graph network building module 1402 is configured to:
and connecting the target node with a node in the initial fourth graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample question to obtain the initial second graph network.
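The initial first and second graph networks are then obtained by adding the answer word units as target nodes and connecting them to the corresponding nodes of the third and fourth graph networks. In the sketch below the association relation is approximated by an exact word-unit match; the actual matching criterion is not spelled out here, so this rule is an assumption.

def attach_answer_nodes(base_graph, answer_units):
    # Copy the base graph (initial third or fourth graph network) and add one
    # target node per answer word unit, connected to matching base nodes.
    graph = {"nodes": list(base_graph["nodes"]),
             "labels": list(base_graph["labels"]),
             "edges": set(base_graph["edges"])}
    for unit in answer_units:
        target = len(graph["nodes"])
        graph["nodes"].append(target)
        graph["labels"].append(unit)
        for idx, label in enumerate(base_graph["labels"]):
            if label == unit:                           # assumed association relation
                graph["edges"].add((idx, target))
    return graph

# initial_first_graph  = attach_answer_nodes(initial_third_graph,  answer_units)
# initial_second_graph = attach_answer_nodes(initial_fourth_graph, answer_units)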
Optionally, the first text processing module 1404 configured to:
adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors;
adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
Optionally, the first text processing module 1404 configured to:
taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network;
taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network;
determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge;
based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
Optionally, the first text processing module 1404 configured to:
taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the sample problem in the initial second graph network;
taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network;
determining an attention value between two third nodes of the initial second graph network with edges as the attention value of the edges based on the second feature vector group;
based on the third feature vector group, determining an attention value between a third node and a fourth node of the initial second graph network where the edge exists and using the attention value as the attention value of the edge.
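Put briefly, a node's attention value is the feature vector of its word unit, while an edge's attention value is computed from the feature vectors of its two endpoints (claim 25 below gives the scaled dot-product form). The sketch below illustrates this with a scalar edge score; the sigmoid squashing used in place of the softmax normalization and the helper names are assumptions.

import math

def edge_attention(q, k):
    # Scaled dot product of the two endpoint feature vectors.
    score = sum(a * b for a, b in zip(q, k)) / math.sqrt(len(k))
    return 1.0 / (1.0 + math.exp(-score))   # stand-in for the softmax normalization

def add_attention_values(graph, feature_vectors):
    # Node attention value: the node's own feature vector.
    node_attention = {i: feature_vectors[i] for i in graph["nodes"]}
    # Edge attention value: attention between the two endpoint feature vectors.
    edge_attention_values = {(i, j): edge_attention(feature_vectors[i], feature_vectors[j])
                             for (i, j) in graph["edges"]}
    return {"nodes": graph["nodes"], "edges": graph["edges"],
            "node_attention": node_attention, "edge_attention": edge_attention_values}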
Optionally, the prediction module 1406 is configured to:
determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network;
carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector;
determining the predicted answer based on the target hidden layer feature vector.
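The graph convolution itself follows a per-node update (claim 4 below gives one formula): each node aggregates the feature vectors of itself and its neighbours, with each neighbour's contribution weighted by the corresponding edge attention value, followed by a ReLU; the hidden layer feature vectors of the two graphs are then combined by a weighted sum. The single-layer sketch below, the equal weighting coefficients and the omission of per-node weight matrices are simplifying assumptions.

def graph_conv_layer(graph, feature_vectors):
    # One graph convolution step over an attention-weighted graph.
    hidden = {}
    for i in graph["nodes"]:
        agg = list(feature_vectors[i])                      # include the node itself
        for (a, b) in graph["edges"]:
            if i in (a, b):
                j = b if a == i else a
                c = graph["edge_attention"][(a, b)]         # attention value of the edge
                agg = [x + c * y for x, y in zip(agg, feature_vectors[j])]
        hidden[i] = [max(0.0, x) for x in agg]              # ReLU activation
    return hidden

def combine_hidden(first_hidden, second_hidden, alpha=0.5, beta=0.5):
    # Weighted summation of the (flattened) first and second hidden layer feature
    # vectors to obtain the target hidden layer feature vector.
    return [alpha * a + beta * b for a, b in zip(first_hidden, second_hidden)]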
Optionally, the prediction module 1406 is configured to:
converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label;
determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
and determining the predicted answer based on the predicted label of the word unit corresponding to each dimension.
Optionally, the prediction module 1406 is configured to:
the at least one label comprises an answer initial word, an answer intermediate or ending word and a non-answer word, and the word unit corresponding to the answer initial word and the word units corresponding to the answer intermediate or ending words are used as the predicted answer.
Optionally, the training module 1408 is configured to:
if the difference value is smaller than a preset threshold value, stopping training the reading understanding model;
and if the difference is larger than or equal to the preset threshold, continuing to train the reading understanding model.
Optionally, the training module 1408 is configured to:
recording one iteration of training each time the predicted answer is obtained;

and counting the number of iterations of the iterative training, and if the number of iterations is greater than a count threshold, determining that the training stop condition is reached.
In the embodiment of the application, an initial first graph network of a sample text segment and a sample answer and an initial second graph network of a sample question and the sample answer are constructed through a graph construction network layer of a reading understanding model; the sample text segment, the sample question and the sample answer are input into a text processing layer of the reading understanding model, and attention values are added to the nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network; the first graph network and the second graph network are input into a graph convolution network layer of the reading understanding model to obtain a predicted answer; and the reading understanding model is trained based on the difference between the predicted answer and the sample answer until a training stop condition is reached. In this way, the association relations among the sample text segment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained in combination with these association relations, and the accuracy of the reading understanding task performed by the reading understanding model can be improved.
The above is an illustrative scheme of the training apparatus for a reading understanding model in this embodiment. It should be noted that the technical solution of the training apparatus for the reading understanding model and the technical solution of the training method for the reading understanding model belong to the same concept, and for details not described in detail in the technical solution of the training apparatus, reference may be made to the description of the technical solution of the training method for the reading understanding model.
Corresponding to the above method embodiment, the present application further provides an embodiment of a reading and understanding apparatus, and fig. 15 shows a schematic structural diagram of a reading and understanding apparatus provided in an embodiment of the present application. As shown in fig. 15, the apparatus may include:
a second graph network construction module 1502 configured to construct, through a graph construction network layer of a reading understanding model, an initial first graph network of the target text and the target answer and an initial second graph network of the target question and the target answer;
a second text processing module 1504, configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a determining module 1506 configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine an answer to the target question.
Optionally, a second text processing module 1504 configured to:
inputting the target text, the target question and the target answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group;
inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
Optionally, a second text processing module 1504 configured to:
performing word segmentation processing on the target text, the target question and the target answer to respectively obtain a first word unit group, a second word unit group and a third word unit group;
performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively;
and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
Optionally, second graph network building module 1502 is configured to:
constructing an initial third graph network based on the dependency relationship among the word units in the target text, and constructing an initial fourth graph network based on the dependency relationship among the word units in the target question;
and constructing the initial first graph network based on the incidence relation between the initial third graph network and the target answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the target answer.
Optionally, the second graph network constructing module 1502 is configured to:
taking word units in the target text as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target text to obtain the initial third graph network.
Optionally, the second graph network constructing module 1502 is configured to:
and connecting the target node with a node in the initial third graph network by taking the word unit in the target answer as a target node based on the incidence relation between the word unit in the target answer and the word unit in the target text to obtain the initial first graph network.
Optionally, the second graph network constructing module 1502 is configured to:
taking word units in the target problem as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target problem to obtain the initial fourth graph network.
Optionally, the second graph network constructing module 1502 is configured to:
and connecting the target node with a node in the initial fourth graph network by taking the word unit in the target answer as a target node based on the incidence relation between the word unit in the target answer and the word unit in the target question to obtain the initial second graph network.
Optionally, a second text processing module 1504 configured to:
adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors;
adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
Optionally, a second text processing module 1504 configured to:
taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the target text in the initial first graph network;
taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the target answer in the initial first graph network;
determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge;
based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
Optionally, a second text processing module 1504 configured to:
taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the target problem in the initial second graph network;
taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the target answer in the initial second graph network;
determining an attention value between two third nodes of the initial second graph network with edges as the attention value of the edges based on the second feature vector group;
based on the third feature vector group, determining an attention value between a third node and a fourth node of the initial second graph network where the edge exists and using the attention value as the attention value of the edge.
Optionally, the determining module 1506 is configured to:
determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network;
carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector;
determining the answer based on the target hidden layer feature vector.
Optionally, the determining module 1506 is configured to:
converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label;
determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
and determining the answer based on the label of the word unit corresponding to each dimension.
Optionally, the determining module 1506 is configured to:
the at least one label comprises an answer initial word, an answer intermediate or ending word and a non-answer word, and the word unit corresponding to the answer initial word and the word units corresponding to the answer intermediate or ending words are used as the answer.
In the embodiment of the application, an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer are constructed through a graph construction network layer of a reading understanding model; the target text, the target question and the target answer are input into a text processing layer of the reading understanding model, and attention values are added to the nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network; and the first graph network and the second graph network are input into a graph convolution network layer of the reading understanding model to obtain an answer to the target question. In this way, the feature vectors of the target text, the target question and the target answer can be effectively utilized to extract the association relations among them, the answer to the target question is determined by the reading understanding model in combination with these association relations, and the accuracy of the reading understanding task performed by the reading understanding model can be improved.
The above is a schematic scheme of a reading and understanding device of the embodiment. It should be noted that the technical solution of the reading and understanding apparatus and the technical solution of the reading and understanding method belong to the same concept, and details that are not described in detail in the technical solution of the reading and understanding apparatus can be referred to the description of the technical solution of the reading and understanding method.
It should be noted that the components recited in the apparatus claims should be understood as functional modules necessary to implement the steps of the program flow or of the method, and each functional module does not necessarily correspond to an actual physical division or separation. An apparatus claim defined by such a set of functional modules should be understood as a functional-module framework that implements the solution mainly by means of the computer program described in the specification, rather than as a physical device that implements the solution mainly by means of hardware.
An embodiment of the present application further provides a computing device, which includes a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor executes the instructions to implement the steps of the reading understanding model training method, or implement the steps of the reading understanding method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above-mentioned reading and understanding model training method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the above-mentioned reading and understanding model training method. Alternatively, the technical solution of the computing device and the technical solution of the reading and understanding method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the reading and understanding method.
An embodiment of the present application further provides a computer-readable storage medium, which stores computer instructions that, when executed by a processor, implement the steps of the reading understanding model training method as described above, or implement the steps of the reading and understanding method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above-mentioned reading and understanding model training method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned reading and understanding model training method. Alternatively, the technical solution of the storage medium and the technical solution of the reading and understanding method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the reading and understanding method.
The embodiment of the application discloses a chip, which stores computer instructions, and the instructions are executed by a processor to implement the steps of the reading understanding model training method as described above, or implement the steps of the reading understanding method as described above.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (33)

1. A method for determining a predicted answer, the method comprising:
converting the value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label;
determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
and determining a predicted answer based on the predicted label of the word unit corresponding to each dimension.
2. The method of claim 1, wherein prior to converting the values of each dimension of the target hidden layer feature vector into at least one prediction probability by a sequence labeling function, further comprising:
determining a first hidden layer feature vector of a first graph network and a second hidden layer feature vector of a second graph network through a graph convolution network layer;
and carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain the target hidden layer feature vector.
3. The method of claim 2, wherein the graph convolution network layer is a GCN model.
4. A method as claimed in claim 2 or 3, wherein said graph convolution network layer convolves said first graph network by:

$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in N_i} C_{ij}\, h_j^{(l)} W_j^{(l)} + b_j^{(l)}\Big)$$

wherein i represents the i-th node in the first graph network, j represents the j-th node in the first graph network, $h_i^{(l+1)}$ represents the feature vector input to the i-th node at the (l+1)-th convolutional layer, $\sigma(\cdot)$ represents a nonlinear transfer function, here a ReLU activation function, $N_i$ represents node i and all nodes connected to node i, $h_j^{(l)}$ represents the feature vector of the j-th node input at the l-th convolutional layer, $C_{ij}$ represents the attention value of the edge between the i-th node and the j-th node, $W_j^{(l)}$ represents the weight of the j-th node at the l-th convolutional layer, and $b_j^{(l)}$ represents the intercept of the j-th node at the l-th convolutional layer.
5. The method of claim 2 or 3, wherein the graph convolution network layer comprises a plurality of convolutional layers, wherein the convolutional layers comprise a preset weight parameter matrix, and the weight of each node in each convolutional layer is an initial weight in the weight parameter matrix; or the convolutional layers comprise a preset intercept parameter matrix, and the intercept of each node in each convolutional layer is an initial intercept in the intercept parameter matrix.
6. The method of claim 2, wherein prior to determining, by the graph convolution network layer, the first hidden layer feature vector of the first graph network and the second hidden layer feature vector of the second graph network, further comprising:
constructing an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers by reading a graph construction network layer of an understanding model;
inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
after the predicted answer is determined based on the predicted label of the word unit corresponding to each dimension, the method further comprises the following steps:
training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached.
7. The method of claim 6, wherein the text processing layer comprises a feature extraction layer and an attention layer; inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network, including:
inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group;
inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
8. The method of claim 7, wherein the feature extraction layer employs a structure of a Bert model.
9. The method of claim 7, wherein the attention layer adopts a structure of an attention layer of a Bert model.
10. The method of claim 7, wherein inputting the sample text passage, the sample question, and the sample answer into a feature extraction layer of the reading understanding model to obtain a first feature vector group, a second feature vector group, and a third feature vector group, respectively, comprises:
performing word segmentation processing on the sample text segment, the sample question and the sample answer to respectively obtain a first word unit group, a second word unit group and a third word unit group;
performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively;
and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
11. The method of claim 10, wherein performing word segmentation on the sample text segment, the sample question, and the sample answer to obtain a first word unit group, a second word unit group, and a third word unit group, respectively, comprises:
if the sample text segment is a Chinese text, each character, punctuation mark and number is divided into a separate word unit, and the word units obtained by dividing the sample text segment form the first word unit group; or, if the sample text segment is a foreign-language text, each word or phrase is divided into a word unit, and the word units obtained by dividing the sample text segment form the first word unit group;

if the sample question is a Chinese text, each character, punctuation mark and number is divided into a separate word unit, and the word units obtained by dividing the sample question form the second word unit group; or, if the sample question is a foreign-language text, each word or phrase is divided into a word unit, and the word units obtained by dividing the sample question form the second word unit group;

if the sample answer is a Chinese text, each character, punctuation mark and number is divided into a separate word unit, and the word units obtained by dividing the sample answer form the third word unit group; or, if the sample answer is a foreign-language text, each word or phrase is divided into a word unit, and the word units obtained by dividing the sample answer form the third word unit group.
12. The method of claim 10, wherein performing word embedding processing on the first word unit group, the second word unit group, and the third word unit group to obtain a first word vector group, a second word vector group, and a third word vector group, respectively, comprises:
performing word embedding processing on each first word unit in the first word unit group in a one-hot encoding or word2vec encoding mode to obtain the first word vector group;

performing word embedding processing on each second word unit in the second word unit group in a one-hot encoding or word2vec encoding mode to obtain the second word vector group;

and performing word embedding processing on each third word unit in the third word unit group in a one-hot encoding or word2vec encoding mode to obtain the third word vector group.
13. The method of claim 10, wherein encoding the first set of word vectors, the second set of word vectors, and the third set of word vectors to obtain the first set of feature vectors, the second set of feature vectors, and the third set of feature vectors, respectively, comprises:
and coding each first word vector, each second word vector and each third word vector to respectively obtain a first feature vector of each first word unit, a second feature vector of each second word unit and a third feature vector of each third word unit, wherein the first feature vector of each first word unit is a representation of the corresponding first word unit fused with the full-text semantic information of the sample text segment, the second feature vector of each second word unit is a representation of the corresponding second word unit fused with the full-text semantic information of the sample question, and the third feature vector of each third word unit is a representation of the corresponding third word unit fused with the full-text semantic information of the sample answer.
14. The method of claim 6, wherein constructing an initial first graph network of sample text snippets and sample answers and an initial second graph network of sample questions and sample answers by reading a graph construction network layer of an understanding model comprises:
constructing an initial third graph network based on the dependency relationship among the word units in the sample text segment, and constructing an initial fourth graph network based on the dependency relationship among the word units in the sample question;
and constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer.
15. The method of claim 14, wherein constructing an initial third graph network based on dependencies between word units in the sample text segments comprises:
taking word units in the sample text fragment as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample text fragment to obtain the initial third graph network.
16. The method of claim 14 or 15, wherein constructing the initial first graph network based on the initial third graph network and the correlations between the sample answers comprises:
and connecting the target node with a node in the initial third graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample text segment to obtain the initial first graph network.
17. The method of claim 14, wherein constructing an initial fourth graph network based on dependencies between word units in the sample problem comprises:
taking word units in the sample problem as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
18. The method according to claim 14, 15 or 17, wherein the dependency relationships are calculated by the Stanford CoreNLP algorithm.
19. The method of claim 14 or 17, wherein constructing the initial second graph network based on the correlations between the initial fourth graph network and the sample answers comprises:
and connecting the target node with a node in the initial fourth graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample question to obtain the initial second graph network.
20. The method of claim 7, wherein inputting the first set of feature vectors, the second set of feature vectors, and the third set of feature vectors into an attention layer of the reading understanding model, adding attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network, comprises:
adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors;
adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
21. The method of claim 20, wherein adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors comprises:
taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network;
taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network;
determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge;
based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
22. The method of claim 21, wherein the method of calculating the attention value comprises:
for two first nodes connected by an edge, performing attention calculation on the first feature vectors of the word units corresponding to the two first nodes to obtain the attention value of the edge; or,

for a first node and a second node connected by an edge, performing attention calculation on the first feature vector of the word unit corresponding to the first node and the third feature vector of the word unit corresponding to the second node to obtain the attention value of the edge.
23. The method of claim 20, wherein adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors comprises:
taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the sample problem in the initial second graph network;
taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network;
determining an attention value between two third nodes of the initial second graph network with edges as the attention value of the edges based on the second feature vector group;
based on the third feature vector group, determining an attention value between a third node and a fourth node of the initial second graph network where the edge exists and using the attention value as the attention value of the edge.
24. The method of claim 23, wherein the method of calculating the attention value comprises:
for two third nodes connected by an edge, performing attention calculation on the second feature vectors of the word units corresponding to the two third nodes to obtain the attention value of the edge; or,

for a third node and a fourth node connected by an edge, performing attention calculation on the second feature vector of the word unit corresponding to the third node and the third feature vector of the word unit corresponding to the fourth node to obtain the attention value of the edge.
25. The method of any one of claims 20-24, wherein the formula for calculating the attention value is:

$$\mathrm{Attention}(Q, K) = \mathrm{softmax}\Big(\frac{Q K^{T}}{\sqrt{d_k}}\Big)$$

wherein Attention(Q, K) represents the attention value, softmax(·) is a normalization function, Q and K respectively represent the two feature vectors on which attention is calculated, $d_k$ is a constant, and T denotes matrix transposition.
26. The method of claim 1, wherein the at least one label comprises an answer beginning word, an answer middle ending word, and a non-answer word; determining the predicted answer based on the predicted label of the word unit corresponding to each dimension, including:
and taking the word unit corresponding to the head word of the answer and the word unit corresponding to the middle ending word of the answer as the predicted answer.
27. The method of claim 6, wherein training the reading understanding model based on a difference between the predicted answer and the sample answer until a training stop condition is reached comprises:
if the difference value is smaller than a preset threshold value, stopping training the reading understanding model;
and if the difference is larger than or equal to the preset threshold, continuing to train the reading understanding model.
28. The method of claim 6, wherein reaching a training stop condition comprises:
recording one iteration of training each time the predicted answer is obtained;

and counting the number of iterations of the iterative training, and if the number of iterations is greater than a count threshold, determining that the training stop condition is reached.
29. A method of reading comprehension, the method comprising:
constructing an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer through a graph construction network layer of a reading understanding model, wherein the reading understanding model is trained by the method of any one of claims 6 to 28;
inputting the target text, the target question and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determining a target hidden layer feature vector;
converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label;
determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
and determining an answer of the target question based on the label of the word unit corresponding to each dimension.
30. An apparatus for determining a predicted answer, the apparatus comprising:
the first conversion module is configured to convert a value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that a prediction label of the word unit corresponding to each dimension is at least one label;
a first determining module configured to determine a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
a second determining module configured to determine a predicted answer based on the predicted label of the word unit corresponding to each dimension.
31. A reading and understanding apparatus, comprising:
a graph network construction module configured to construct an initial first graph network of the target text and the target answer and an initial second graph network of the target question and the target answer by reading a graph construction network layer of an understanding model, wherein the reading understanding model is trained by the method of any one of claims 6 to 28;
a text processing module configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a third determination module configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine a target hidden layer feature vector;
a second conversion module configured to convert a value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one probability corresponding to each dimension represents a probability that a label of the word unit corresponding to each dimension is at least one label;
a fourth determining module configured to determine a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
a fifth determining module configured to determine an answer to the target question based on the label of the word unit corresponding to each dimension.
32. A computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method for determining a predicted answer according to any one of claims 1 to 28, or the steps of the reading comprehension method according to claim 29, or the functions of the apparatus according to claim 30 or 31.
33. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for determining a predicted answer according to any one of claims 1 to 28, or the steps of the reading comprehension method according to claim 29, or the functions of the apparatus according to claim 30 or 31.
CN202111110989.8A 2021-04-08 2021-04-08 Method and device for determining predicted answer and method and device for reading and understanding Pending CN113792550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111110989.8A CN113792550A (en) 2021-04-08 2021-04-08 Method and device for determining predicted answer and method and device for reading and understanding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111110989.8A CN113792550A (en) 2021-04-08 2021-04-08 Method and device for determining predicted answer and method and device for reading and understanding
CN202110375810.5A CN112800186B (en) 2021-04-08 2021-04-08 Reading understanding model training method and device and reading understanding method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110375810.5A Division CN112800186B (en) 2021-04-08 2021-04-08 Reading understanding model training method and device and reading understanding method and device

Publications (1)

Publication Number Publication Date
CN113792550A true CN113792550A (en) 2021-12-14

Family

ID=75816480

Family Applications (4)

Application Number Title Priority Date Filing Date
CN202110375810.5A Active CN112800186B (en) 2021-04-08 2021-04-08 Reading understanding model training method and device and reading understanding method and device
CN202111111031.0A Active CN113792121B (en) 2021-04-08 2021-04-08 Training method and device of reading and understanding model, reading and understanding method and device
CN202111110989.8A Pending CN113792550A (en) 2021-04-08 2021-04-08 Method and device for determining predicted answer and method and device for reading and understanding
CN202111110988.3A Active CN113792120B (en) 2021-04-08 2021-04-08 Graph network construction method and device, reading and understanding method and device

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202110375810.5A Active CN112800186B (en) 2021-04-08 2021-04-08 Reading understanding model training method and device and reading understanding method and device
CN202111111031.0A Active CN113792121B (en) 2021-04-08 2021-04-08 Training method and device of reading and understanding model, reading and understanding method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202111110988.3A Active CN113792120B (en) 2021-04-08 2021-04-08 Graph network construction method and device, reading and understanding method and device

Country Status (1)

Country Link
CN (4) CN112800186B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688207B (en) * 2021-08-24 2023-11-17 思必驰科技股份有限公司 Modeling processing method and device based on structural reading understanding of network

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180307969A1 (en) * 2017-04-20 2018-10-25 Hitachi, Ltd. Data analysis apparatus, data analysis method, and recording medium
CN109002519A (en) * 2018-07-09 2018-12-14 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on convolution loop neural network
CN110309283A (en) * 2019-06-28 2019-10-08 阿里巴巴集团控股有限公司 A kind of answer of intelligent answer determines method and device
CN110309305A (en) * 2019-06-14 2019-10-08 中国电子科技集团公司第二十八研究所 Machine based on multitask joint training reads understanding method and computer storage medium
CN110647629A (en) * 2019-09-20 2020-01-03 北京理工大学 Multi-document machine reading understanding method for multi-granularity answer sorting
CN111046661A (en) * 2019-12-13 2020-04-21 浙江大学 Reading understanding method based on graph convolution network
US20200151613A1 (en) * 2018-11-09 2020-05-14 Lunit Inc. Method and apparatus for machine learning
CN111460092A (en) * 2020-03-11 2020-07-28 中国电子科技集团公司第二十八研究所 Multi-document-based automatic complex problem solving method
CN112434142A (en) * 2020-11-20 2021-03-02 海信电子科技(武汉)有限公司 Method for marking training sample, server, computing equipment and storage medium

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11853903B2 (en) * 2017-09-28 2023-12-26 Siemens Aktiengesellschaft SGCNN: structural graph convolutional neural network
US20190122111A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Adaptive Convolutional Neural Knowledge Graph Learning System Leveraging Entity Descriptions
CN108959396B (en) * 2018-06-04 2021-08-17 众安信息技术服务有限公司 Machine reading model training method and device and question and answer method and device
CN111445020B (en) * 2019-01-16 2023-05-23 阿里巴巴集团控股有限公司 Graph-based convolutional network training method, device and system
CN114254750A (en) * 2019-01-29 2022-03-29 北京金山数字娱乐科技有限公司 Accuracy loss determination method and apparatus
US11461619B2 (en) * 2019-02-18 2022-10-04 Nec Corporation Spatio temporal gated recurrent unit
US10861437B2 (en) * 2019-03-28 2020-12-08 Wipro Limited Method and device for extracting factoid associated words from natural language sentences
CN110210021B (en) * 2019-05-22 2021-05-28 北京百度网讯科技有限公司 Reading understanding method and device
CN110457450B (en) * 2019-07-05 2023-12-22 平安科技(深圳)有限公司 Answer generation method based on neural network model and related equipment
CN110598573B (en) * 2019-08-21 2022-11-25 中山大学 Visual problem common sense reasoning model and method based on multi-domain heterogeneous graph guidance
US11593672B2 (en) * 2019-08-22 2023-02-28 International Business Machines Corporation Conversation history within conversational machine reading comprehension
EP3783531A1 (en) * 2019-08-23 2021-02-24 Tata Consultancy Services Limited Automated conversion of text based privacy policy to video
CN110619123B (en) * 2019-09-19 2021-01-26 电子科技大学 Machine reading understanding method
CN110750630A (en) * 2019-09-25 2020-02-04 北京捷通华声科技股份有限公司 Generating type machine reading understanding method, device, equipment and storage medium
CN110781663B (en) * 2019-10-28 2023-08-29 北京金山数字娱乐科技有限公司 Training method and device of text analysis model, text analysis method and device
CN111274800B (en) * 2020-01-19 2022-03-18 浙江大学 Inferential reading understanding method based on relational graph convolutional network
CN111310848B (en) * 2020-02-28 2022-06-28 支付宝(杭州)信息技术有限公司 Training method and device for multi-task model
CN111626044B (en) * 2020-05-14 2023-06-30 北京字节跳动网络技术有限公司 Text generation method, text generation device, electronic equipment and computer readable storage medium
CN112380835B (en) * 2020-10-10 2024-02-20 中国科学院信息工程研究所 Question answer extraction method integrating entity and sentence reasoning information and electronic device

Also Published As

Publication number Publication date
CN112800186B (en) 2021-10-12
CN112800186A (en) 2021-05-14
CN113792121A (en) 2021-12-14
CN113792120B (en) 2023-09-15
CN113792121B (en) 2023-09-22
CN113792120A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN111581961B (en) Automatic description method for image content constructed by Chinese visual vocabulary
CN111783462A (en) Chinese named entity recognition model and method based on dual neural network fusion
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN112800768A (en) Training method and device for nested named entity recognition model
CN113220832A (en) Text processing method and device
CN113505601A (en) Positive and negative sample pair construction method and device, computer equipment and storage medium
CN114495129A (en) Character detection model pre-training method and device
CN114691864A (en) Text classification model training method and device and text classification method and device
CN112632258A (en) Text data processing method and device, computer equipment and storage medium
CN112800186B (en) Reading understanding model training method and device and reading understanding method and device
Kumari et al. Context-based question answering system with suggested questions
CN115221315A (en) Text processing method and device, and sentence vector model training method and device
CN114417863A (en) Word weight generation model training method and device and word weight generation method and device
Islam et al. Bengali Caption Generation for Images Using Deep Learning
CN114648005A (en) Multi-fragment machine reading understanding method and device for multitask joint learning
CN113961686A (en) Question-answer model training method and device, question-answer method and device
CN114077831A Training method and device for question text analysis model
CN114692610A (en) Keyword determination method and device
CN112015891A (en) Method and system for classifying messages of network inquiry platform based on deep neural network
CN114610819B (en) Entity relation extraction method
CN112364660B (en) Corpus text processing method, corpus text processing device, computer equipment and storage medium
CN113377965B (en) Method and related device for sensing text keywords
CN112395419B (en) Training method and device of text classification model and text classification method and device
CN115705356A (en) Question answering method and device
Wei et al. AxialRE: Axial Attention for Dialogue Relation Extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination