CN112632253A - Answer extraction method and device based on graph convolution network and related components - Google Patents

Answer extraction method and device based on graph convolution network and related components

Info

Publication number
CN112632253A
CN112632253A (application CN202011577396.8A)
Authority
CN
China
Prior art keywords
vector
document
answer
graph
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011577396.8A
Other languages
Chinese (zh)
Inventor
黄勇其
王伟
于翠翠
张兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Runlian Software System Shenzhen Co Ltd
Original Assignee
Runlian Software System Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Runlian Software System Shenzhen Co Ltd filed Critical Runlian Software System Shenzhen Co Ltd
Priority to CN202011577396.8A priority Critical patent/CN112632253A/en
Publication of CN112632253A publication Critical patent/CN112632253A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/328Management therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an answer extraction method and device based on a graph convolution network, together with related components, wherein the method comprises the following steps: acquiring a user question and a document containing the user question; constructing a graph network for the document based on syntactic dependency analysis; fusing the document with the graph network by means of a graph convolution network to obtain a document vector, and fusing the user question with the graph network to obtain a question vector; learning the document vector and the question vector based on the graph convolution network and an attention mechanism to obtain an answer start probability and an answer end probability in the document; and taking the answer start probability and the answer end probability as a start index and an end index into the document, respectively, and taking the text between the start index and the end index as the answer to the user question. The method constructs a graph network for the document containing the user question based on the graph convolution network and syntactic dependency analysis, and semantically fuses the graph network with both the user question and the document, thereby improving the accuracy of answer extraction.

Description

Answer extraction method and device based on graph convolution network and related components
Technical Field
The invention relates to the technical field of natural language processing, in particular to an answer extraction method and device based on a graph convolution network and a related component.
Background
The machine reading comprehension (MRC) task gives a machine an article and asks it to find the answer to a question from that article. Early MRC systems relied mainly on rules and manually constructed datasets; because these datasets were small, such systems were difficult to generalize to other fields. In the machine learning era, machine reading comprehension was defined as a supervised learning task: collected data were manually labeled as (paragraph, question, answer) triples, machine learning algorithms were introduced, and rich semantic feature sets were added to fit the training data.
With the development of deep learning and the emergence of larger labeled datasets, such as the SQuAD dataset (a question answering dataset) and the CNN/Daily Mail dataset (a supervised dataset), deep neural networks were introduced into the machine reading comprehension task. The Match-LSTM model, for example, was originally designed for the textual entailment task and was later combined with a Pointer-Net (a model mainly used for combinatorial optimization problems) to make it suitable for reading comprehension: the model treats the question as the premise and the article as the hypothesis, which is equivalent to searching the article for an answer to the question, and the span of the answer is produced by the Pointer-Net. The BiDAF model is another typical machine reading comprehension model; it introduces a bi-directional attention flow mechanism and obtains a question-aware context representation through bidirectional attention interaction between the question and the context, thereby improving answer extraction accuracy. Although both models can extract answers to a certain degree, they also have problems; for example, both use an LSTM network, which slows down training and prediction. To improve training speed and further improve answer extraction accuracy, QANet (a question answering framework) replaces the conventional RNN structure with convolution and adopts various techniques, greatly improving training and inference speed without affecting precision. Although QANet improves the training speed, its accuracy improves only slightly because it uses only the features of the text itself; if additional information were introduced to describe the text features in more depth, the accuracy of answer extraction could be further improved.
Disclosure of Invention
The embodiment of the invention provides an answer extraction method and device based on a graph convolution network, computer equipment and a storage medium, aiming at improving the accuracy of answer extraction.
In a first aspect, an embodiment of the present invention provides an answer extraction method based on a graph convolution network, including:
acquiring a user question and a document containing the user question;
constructing a graph network for the documents based on syntactic dependency analysis;
fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector;
learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain answer starting probability and answer ending probability in the document;
and respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
In a second aspect, an embodiment of the present invention provides an answer extraction device based on a graph-convolution network, including:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a user question and a document containing the user question;
a construction unit for constructing a graph network for the document based on syntactic dependency analysis;
the fusion unit is used for fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector and fusing the user question and the graph network to obtain a question vector;
the learning unit is used for learning the document vector and the question vector based on the graph convolution network and the attention mechanism so as to obtain answer starting probability and answer ending probability in the document;
and the answer extraction unit is used for respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the answer extraction method based on a graph convolution network according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the answer extraction method based on a graph convolution network according to the first aspect.
The embodiment of the invention provides an answer extraction method, an answer extraction device, computer equipment and a storage medium based on a graph convolution network, wherein the method comprises the following steps: acquiring a user question and a document containing the user question; constructing a graph network for the documents based on syntactic dependency analysis; fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector; learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain answer starting probability and answer ending probability in the document; and respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question. The embodiment of the invention constructs the graph network for the document containing the user question based on the graph convolution network and the syntactic dependency analysis, and makes the graph network perform semantic fusion with the user question and the document, thereby improving the accuracy of answer extraction.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 2 is a schematic sub-flowchart of step S102 in an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 3 is a schematic sub-flowchart of step S104 in an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 4a is a schematic diagram illustrating an example of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 4b is a schematic diagram illustrating another example of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating another example of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 6 is a schematic network structure diagram of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 7a is a schematic structural diagram of a graph convolution network fusion layer in an answer extraction method based on a graph convolution network according to an embodiment of the present invention;
fig. 7b is a schematic structural diagram of a feature fusion layer in an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 8 is a schematic block diagram of an answer extraction device based on a graph-convolution network according to an embodiment of the present invention;
fig. 9 is a sub-schematic block diagram of a construction unit 802 in an answer extraction apparatus based on a graph-convolution network according to an embodiment of the present invention;
fig. 10 is a sub-schematic block diagram of a learning unit 804 in an answer extraction apparatus based on a graph-convolution network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention, which specifically includes: steps S101 to S105.
S101, obtaining a user question and a document containing the user question;
S102, constructing a graph network for the document based on syntactic dependency analysis;
S103, fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector;
S104, learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain an answer starting probability and an answer ending probability in the document;
and S105, taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document respectively, and taking the text between the starting index and the ending index as the answer of the user question.
In this embodiment, when extracting the answer to the user question (Query) from the document (Context), a graph network is first constructed for the document according to the syntactic dependency relationships. The user question and the document are then each fused with the graph network, correspondingly yielding the question vector and the document vector. Next, the question vector and the document vector are learned and computed through the attention mechanism to obtain the corresponding answer start probability and answer end probability, which are taken as the start index and end index into the document respectively, and the text between the start index and the end index is extracted as the answer to the user question.
This embodiment introduces syntactic dependency analysis to construct the graph network, taking words that have a syntactic dependency relationship with a head word as the adjacent nodes of that head word. In the prior art, a graph is built in a sliding-window manner, taking the head word as a node and the words within its sliding window as adjacent nodes; this introduces redundant information and loses information from words outside the sliding window that still contribute to the node. In contrast, constructing the graph network through syntactic dependency analysis retains the other node information in the sentence that is meaningful for the node, is not limited by distance, and can describe node information from richer dimensions, so the node representations are more accurate and the finally extracted answers are more accurate and reliable. In addition, in this embodiment, adaptive semantic fusion is performed through the Feature Fusion Layer between the graph network and the first word vector representation corresponding to the user question, and between the graph network and the second word vector representation corresponding to the document, so that the semantic representations of the user question and the document are richer.
In one embodiment, the step S101 includes:
the method comprises the steps of obtaining a user question and a document containing the user question, and learning the user question and the document respectively by using a preset word vector model to obtain a first word vector representation corresponding to the user question and a second word vector representation corresponding to the document.
In a specific embodiment, before the user question and the document are respectively learned by the preset word vector model to obtain the first word vector representation corresponding to the user question and the second word vector representation corresponding to the document, the preset word vector model is trained using the gensim toolkit, and the feature dimension of the word vector model can be customized, for example the vector dimension is set to 256. Further, the first word vector representation and the second word vector representation obtained by the word vector model are [N, S1, H] and [N, S2, H] respectively, where N is the number of training samples, S1 is the sequence length of the user question, S2 is the sequence length of the document, and H is the feature dimension.
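By way of a non-limiting illustration, the word vector model of this step could be prepared roughly as sketched below; the corpus, tokenizer, and variable names are assumptions for illustration and are not part of the disclosed method:

```python
# Minimal sketch: training a 256-dimensional word vector model with gensim and
# looking up the first/second word vector representations for a question and a document.
from gensim.models import Word2Vec
import numpy as np

# Each training sample is a list of tokens (a question or a document sentence).
corpus = [
    ["deep", "learning", "is", "a", "new", "research", "direction"],
    ["direction", "is", "more", "important", "than", "effort"],
]

w2v = Word2Vec(sentences=corpus, vector_size=256, window=5, min_count=1, epochs=10)

def embed(tokens, model, dim=256):
    """Map a token sequence to an [S, H] matrix; unknown words get zero vectors."""
    return np.stack([model.wv[t] if t in model.wv else np.zeros(dim) for t in tokens])

question_vecs = embed(["what", "is", "deep", "learning"], w2v)   # [S1, H]
document_vecs = embed(corpus[0], w2v)                            # [S2, H]
```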
In one embodiment, as shown in fig. 2, the step S102 includes: steps S201 to S203.
S201, segmenting the document according to sentences, and segmenting each segmented sentence to obtain words contained in each sentence;
S202, regarding each sentence, taking each word as a graph node of the graph network, obtaining the dependency relationships among the words, and connecting the corresponding graph nodes according to the dependency relationships among the words;
S203, obtaining the same words across the sentences, and connecting different sentences by using the same words as connecting nodes, thereby constructing the graph network of the document.
In this embodiment, when constructing the graph network, the document containing the user question is first segmented into sentences and the words in each sentence are further segmented; then, for each sentence, the graph network of that sentence is constructed, that is, the words in the sentence are taken as graph nodes and words with dependency relationships are connected. After the graph network of each sentence is constructed, the sentences are connected through the words they share, thereby forming the graph network of the document. When words with a dependency relationship are connected, dependency arcs can be used, with each arc pointing from the dependent word to its head word; the dependency relationships between words include the subject-verb relationship, the verb-object relationship, the adverbial-head relationship, and so on.
For example, performing syntactic dependency analysis on the sentences "Deep learning is a new research direction in the field of machine learning" and "Direction is more important than effort" yields the syntactic dependency graphs shown in fig. 4a and fig. 4b, where the dependency labels used in fig. 4a and fig. 4b are listed in table 1:
type of relationship Marking Type of relationship Marking
Moving guest relationship VOB In a parallel relationship COO
Inter-guest relationships IOB Intermediary relation POB
Preposition object FOB Left additive relationship LAD
Concurrent language DBL Right additive relationship RAD
Centering relationships ATT Independent structure IS
Middle structure ADV Core relationships HED
Dynamic compensation structure CMP
TABLE 1
For example, the relationships between the word "field/6" and the words "machine/4", "learning/5" and "in/7" are all attribute relationships (where the numbers 6, 4, 5 and 7 denote the index of the word in the sentence, the same below), so in the graph network the node "field/6" is connected to the nodes "machine/4", "learning/5" and "in/7". In addition, with reference to fig. 6, "learning" is linked to "deep" and "is" besides "field/6", because "learning/5" also has dependency relationships with "deep/1" and "is/3"; occurrences of the same word with different indices share a node in the graph network. Different sentences are linked together by the same word, for example the above two sentences are linked together by the shared word in "direction/12" and "direction/1".
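As a non-limiting illustration of this construction step, the document graph could be assembled roughly as follows, assuming the per-sentence dependency triples (head index, dependent index, relation label) have already been produced by a syntactic parser; all helper names here are illustrative:

```python
# Minimal sketch: building the document graph from per-sentence dependency triples.
# Words become graph nodes, dependency arcs become edges, and identical words shared
# across sentences connect the per-sentence subgraphs.
import networkx as nx

def build_document_graph(sentences, dependencies):
    """
    sentences:    list of token lists, one per sentence
    dependencies: list of (head_idx, dep_idx, relation) lists, one per sentence,
                  with indices referring to positions inside that sentence
    """
    g = nx.Graph()
    word_to_nodes = {}  # surface form -> node ids, used to link sentences

    for s_id, (tokens, deps) in enumerate(zip(sentences, dependencies)):
        for i, tok in enumerate(tokens):
            node = (s_id, i, tok)              # node id keeps sentence id and position
            g.add_node(node, word=tok)
            word_to_nodes.setdefault(tok, []).append(node)
        for head, dep, rel in deps:
            g.add_edge((s_id, head, tokens[head]), (s_id, dep, tokens[dep]), rel=rel)

    # Connect different sentences through shared words.
    for nodes in word_to_nodes.values():
        for a, b in zip(nodes, nodes[1:]):
            if a[0] != b[0]:                   # only link occurrences in different sentences
                g.add_edge(a, b, rel="SAME_WORD")
    return g
```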
Further, in an embodiment, the step S102 further includes:
initializing the weight of the edge between every two nodes in the graph network according to the following formula:
r'i,j = 1

ri,j = r'i,j · cos(vi, vj) · (1 − |idxi − idxj| / len)

In the formulas, r'i,j and ri,j both represent the weight of the edge between node i and node j, where r'i,j is recorded as the weight before updating and ri,j as the weight after updating; r'i,j is initialized to 1, vi represents the vector of node i, vj represents the vector of node j, and cos(vi, vj) denotes the cosine similarity of vi and vj. idxi represents the position of node i in the sentence, idxj represents the position of node j in the sentence, len represents the length of the sentence, and |idxi − idxj| / len represents the relative distance between the two nodes;
updating the representations of the nodes in the graph network according to the following formula:

hi(l) = σ( Σj ( Aij / √(Dii · Djj) ) · hj(l−1) · W(l) + b(l) )

In the formula, Aij is the adjacency matrix, i.e. the connection weight of the edge between node i and node j, Dii is the degree matrix of the adjacency matrix, Aij / √(Dii · Djj) is the Laplacian matrix in renormalized form, W(l) is a weight matrix, b(l) is a bias term, σ is a nonlinear mapping function, hi(l) is the updated representation of node i, and hj(l−1) is the representation of node j before updating.
In this embodiment, after the graph network is constructed, in order to represent the importance of each node of the graph network, the weight of the edge between any two nodes and the representation of each node can be initialized and updated; for example, the labels v1–v15 in fig. 5 mark the edges, that is, the weights here refer to the weights of the edges between two nodes.
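A rough, non-limiting sketch of these two steps follows. The combination of cosine similarity and relative distance in the edge-weight update is an assumed reconstruction from the definitions above rather than a verbatim formula, and the node update is written as the standard normalized graph-convolution propagation:

```python
# Rough sketch: edge-weight initialization/update and one GCN propagation step.
# The edge-weight combination below (cosine similarity scaled by relative distance)
# is an assumption for illustration; sigma is taken as tanh for illustration.
import numpy as np

def init_edge_weights(node_vecs, positions, sent_len):
    n = len(node_vecs)
    r = np.ones((n, n))                      # r'_{i,j} initialized to 1
    for i in range(n):
        for j in range(n):
            vi, vj = node_vecs[i], node_vecs[j]
            cos = vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj) + 1e-8)
            rel_dist = abs(positions[i] - positions[j]) / sent_len
            r[i, j] = r[i, j] * cos * (1.0 - rel_dist)   # assumed combination
    return r

def gcn_layer(h, adj, w, b):
    """Normalized GCN update: h_i = sigma(sum_j A_ij / sqrt(D_ii * D_jj) * h_j W + b)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-8))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.tanh(norm_adj @ h @ w + b)
```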
In one embodiment, the step S103 includes:
obtaining the node vectors in the graph network that are the same as the second word vector representation, and fusing the second word vector representation and the node vectors according to the following formula:

new_x = x + node * σ(W1[x, node] + b1)

wherein σ is a nonlinear activation function, node is the node vector, x is the second word vector representation, [x, node] represents the concatenation of the second word vector representation and the node vector, W1 is a weight matrix to be trained, b1 is a bias term, and new_x is the output vector;

inputting the output vector new_x into a feed-forward neural network, and outputting the document vector by the feed-forward neural network:

h = f(w3 * f(w2 * new_x + b2) + b3)

wherein h is the document vector, w2, w3, b2 and b3 are all parameters to be trained, and f is an activation function.
In this embodiment, referring to fig. 7a, when the graph network and the second word vector representation are fused by the fusion layer (Fusion Layer) of the graph convolution network, a node vector identical to the second word vector representation is first selected from the node vectors of the graph network, and the second word vector representation is fused with the selected node vector. In other words, for each word in the document, the same word is found in the graph network, and the node vector of that word in the graph network is adaptively fused with the second word vector representation. Through a nonlinear mapping (such as a sigmoid nonlinear mapping), the importance of different words in a sentence can be obtained, the node vectors are fused according to this importance, and the fused vector is fed through a feed-forward neural network to obtain the document vector. For example, the document vector h is obtained with dimension [N, S2, H].
In another embodiment, when the graph network and the first word vector representation are fused by the fusion layer of the graph convolution network, a node vector identical to the first word vector representation is first selected from the node vectors of the graph network, and the first word vector representation is fused with the selected node vector. Then, through a nonlinear mapping (e.g., a sigmoid nonlinear mapping), the importance of different words in the sentences of the user question can be obtained, the node vectors are fused according to this importance, and the fused vector is fed through a feed-forward neural network to obtain the question vector. For example, the question vector u is obtained with dimension [N, S1, H].
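A minimal, non-limiting sketch of this fusion layer followed by the feed-forward step is given below; module and parameter names are illustrative, the gate follows new_x = x + node * σ(W1[x, node] + b1), and the activation f is taken as ReLU by assumption:

```python
# Minimal sketch of the graph-network fusion layer followed by the feed-forward network.
# x is the [N, S, H] word vector representation and node is the matching [N, S, H]
# node-vector tensor taken from the graph network.
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)   # W1, b1
        self.ff1 = nn.Linear(hidden, hidden)        # w2, b2
        self.ff2 = nn.Linear(hidden, hidden)        # w3, b3

    def forward(self, x, node):
        # new_x = x + node * sigmoid(W1 [x; node] + b1)
        g = torch.sigmoid(self.gate(torch.cat([x, node], dim=-1)))
        new_x = x + node * g
        # h = f(w3 * f(w2 * new_x + b2) + b3), with f taken as ReLU here
        return torch.relu(self.ff2(torch.relu(self.ff1(new_x))))

# Usage: the same layer shape fuses the document ([N, S2, H]) or the question ([N, S1, H]).
fusion = GraphFusionLayer(hidden=256)
doc_vec = fusion(torch.randn(4, 50, 256), torch.randn(4, 50, 256))   # [N, S2, H]
```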
In one embodiment, as shown in fig. 3, the step S104 includes: steps S301 to S304.
S301, inputting the document vector and the problem vector to a mutual attention layer, and obtaining a first target vector;
S302, calculating the first target vector by using a self-attention layer to obtain a second target vector;
S303, inputting the second target vector and the document vector to a feature fusion layer to obtain a third target vector;
S304, inputting the third target vector to an output layer, and outputting the corresponding answer starting probability and answer ending probability by the output layer.
In this embodiment, the question vector and the document vector are calculated by the mutual attention layer to obtain the first target vector, the first target vector is calculated by the self-attention layer to obtain the second target vector, the second target vector and the document vector are subjected to fusion calculation by the feature fusion layer to obtain the third target vector, an answer start probability and an answer termination probability corresponding to the third target vector are output by the output layer, and a corresponding answer is extracted from the document according to the answer start probability and the answer termination probability.
In one embodiment, the step S301 includes:
calculating the similarity between the document vector and the question vector according to the following formula to obtain a target matrix S:

Sij = hi · uj

In the formula, hi is the document vector, uj is the question vector, and · is the vector dot product; the dimension of the obtained S matrix is S2 * S1, where S2 is the sequence length of the document and S1 is the sequence length of the user question, and S is the matrix obtained by multiplying the document vector and the question vector;

normalizing the target matrix S according to the following formula:

b = softmax(maxrow(S))

where b is the normalized output vector, and maxrow(S) represents taking the row-wise maximum of S, which yields S2 real numbers;

calculating the normalized output vector b according to the following formula to obtain the first target vector g:

g = b · h

wherein h is the matrix formed by the document vectors hi.
In this embodiment, when the first target vector is output through the mutual attention layer (Inter-Attention Layer), the similarity between the document vector and the question vector is first calculated to obtain the target matrix, the target matrix is then normalized by using the softmax function, and the normalized output vector is then used to calculate the first target vector.
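A small, non-limiting sketch of this mutual-attention step follows. The dot-product similarity and the softmax over row-wise maxima follow the definitions above; reading b · h as a per-position weighting of the document vector (rather than a single pooled vector) is an assumption made here so that the later self-attention and feature-fusion steps receive per-position inputs, and all tensor names are illustrative:

```python
# Sketch of the mutual attention layer: S_ij = h_i . u_j, b = softmax(max_row(S)), g = b . h
import torch
import torch.nn.functional as F

def mutual_attention(h, u):
    """
    h: document vector  [N, S2, H]
    u: question vector  [N, S1, H]
    returns g: first target vector, kept at [N, S2, H] under the per-position reading
    """
    s = torch.einsum("nik,njk->nij", h, u)          # S_ij = h_i . u_j  -> [N, S2, S1]
    b = F.softmax(s.max(dim=-1).values, dim=-1)     # softmax over row-wise maxima -> [N, S2]
    g = b.unsqueeze(-1) * h                         # b . h read as per-position weighting
    return g
```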
In one embodiment, the step S302 includes:
calculating the first target vector input into the self-attention layer according to the following formula to obtain the second target vector m:

m = softmax(Q·KT / √dk) · V

where Q, K and V are the three matrices obtained by multiplying the input vector g (i.e., the first target vector) by a matrix each, √dk is the scaling factor, and dk represents the feature dimension of the Q, K and V matrices. In a specific application scenario, dk is set to 256.
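A short, non-limiting sketch of this scaled dot-product self-attention step is given below; the linear projection layers and names are illustrative assumptions:

```python
# Sketch of the self-attention layer: m = softmax(Q K^T / sqrt(d_k)) V
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_k=256):
        super().__init__()
        self.d_k = d_k
        self.q_proj = nn.Linear(d_k, d_k)
        self.k_proj = nn.Linear(d_k, d_k)
        self.v_proj = nn.Linear(d_k, d_k)

    def forward(self, g):                           # g: first target vector [N, S2, d_k]
        q, k, v = self.q_proj(g), self.k_proj(g), self.v_proj(g)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        return torch.softmax(scores, dim=-1) @ v    # second target vector m
```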
In one embodiment, the step S303 includes:
inputting the second target vector and the document vector into a feature fusion layer, and calculating according to the following formula to obtain a third target vector:
v = m + x * σ(W4[x, m] + b4)

where v is the third target vector, W4 and b4 are trainable parameters, m is the second target vector, x is the document vector, and σ is an activation function.
In this embodiment, with reference to fig. 7b, in order to retain the information in the document that is useful for answer extraction, the document vector and the second target vector calculated by the self-attention layer are input to the feature fusion layer (Feature Fusion Layer), so as to filter out interference information in the document and improve the accuracy of the final answer extraction.
In one embodiment, the step S304 includes:
and respectively calculating the answer starting probability and the answer terminating probability according to the following formulas:
start=softmax(W5v+b5)
end=softmax(W6v+b6)
In the formula, W5, b5, W6 and b6 are parameters to be trained, v is the third target vector, start represents the answer start probability, and end represents the answer end probability.
In this embodiment, the answer start probability and the answer termination probability of the user question in the document may be obtained through the above formulas, the answer start probability and the answer termination probability may be used as indexes in the document, that is, the start answer index and the termination answer index, and a text between the start answer index and the termination answer index may be used as an answer to the user question.
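A brief, non-limiting sketch covering the feature fusion layer and the output layer together follows; layer names are illustrative, the fusion follows v = m + x * σ(W4[x, m] + b4), and the start/end softmax is taken over document positions by assumption:

```python
# Sketch of the feature fusion layer and output layer producing start/end probabilities.
import torch
import torch.nn as nn

class FusionAndOutput(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)   # W4, b4
        self.start_head = nn.Linear(hidden, 1)      # W5, b5
        self.end_head = nn.Linear(hidden, 1)        # W6, b6

    def forward(self, m, x):
        # v = m + x * sigmoid(W4 [x; m] + b4): keep document information useful for the answer
        v = m + x * torch.sigmoid(self.gate(torch.cat([x, m], dim=-1)))
        start = torch.softmax(self.start_head(v).squeeze(-1), dim=-1)   # [N, S2]
        end = torch.softmax(self.end_head(v).squeeze(-1), dim=-1)       # [N, S2]
        return start, end

# The answer span is the text between the argmax of start and the argmax of end.
```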
In an embodiment, as shown in fig. 6, a graph network is constructed for a document (Context) containing a user question (Query) by using syntactic dependency analysis, then the graph network and the document, the graph network and the user question are fused by a feed-forward neural network respectively to obtain a document vector (h) and a question vector (u), the document vector and the question vector are input to a mutual attention layer to obtain a first target vector (g), the first target vector is calculated by using a self-attention layer to obtain a second target vector (m), the second target vector and the document vector are input to a feature fusion layer to obtain a third target vector (v), the third target vector is input to an output layer, and the output layer outputs a corresponding answer start probability and answer end probability.
Fig. 8 is a schematic block diagram of an answer extraction apparatus 800 based on a graph-convolution network according to an embodiment of the present invention, where the apparatus 800 includes:
an acquisition unit 801 configured to acquire a user question and a document containing the user question;
a construction unit 802 for constructing a graph network for the document based on syntactic dependency analysis;
a fusion unit 803, configured to fuse the document and the graph network by using a graph convolution network to obtain a document vector, and fuse the user question and the graph network to obtain a question vector;
a learning unit 804, configured to learn the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain an answer start probability and an answer end probability in the document;
an answer extracting unit 805, configured to use the answer start probability and the answer end probability as a start index and an end index of the document, respectively, and use a text between the start index and the end index as an answer to the user question.
In an embodiment, the obtaining unit 801 includes:
the vector representation unit is used for acquiring a user question and a document containing the user question, and learning the user question and the document respectively by using a preset word vector model to obtain a first word vector representation corresponding to the user question and a second word vector representation corresponding to the document.
In one embodiment, as shown in fig. 9, the building unit 802 includes:
a segmentation unit 901, configured to segment the document according to sentences, and segment each segmented sentence to obtain words contained in each sentence;
a first connecting unit 902, configured to, for each sentence, take each word as a graph node of the graph network, obtain a dependency relationship between the words, and connect corresponding graph nodes according to the dependency relationship between the words;
a second connecting unit 903, configured to obtain the same word in each sentence, and connect different sentences using the same word as a connection node, so as to construct a graph network of the obtained document.
In one embodiment, the fusion unit 803 includes:
a document fusion unit, configured to obtain the node vectors in the graph network that are the same as the second word vector representation, and fuse the second word vector representation and the node vectors according to the following formula:

new_x = x + node * σ(W1[x, node] + b1)

wherein σ is a nonlinear activation function, node is the node vector, x is the second word vector representation, [x, node] represents the concatenation of the second word vector representation and the node vector, W1 is a weight matrix to be trained, b1 is a bias term, and new_x is the output vector;

a feedforward output unit, configured to input the output vector new_x to a feed-forward neural network, and output the document vector by the feed-forward neural network:

h = f(w3 * f(w2 * new_x + b2) + b3)

wherein h is the document vector, w2, w3, b2 and b3 are all parameters to be trained, and f is an activation function.
In one embodiment, as shown in fig. 10, the learning unit 804 includes:
a first input unit 1001 configured to input the document vector and the question vector to a mutual attention layer, and obtain a first target vector;
a first calculating unit 1002, configured to calculate the first target vector by using a self-attention layer to obtain a second target vector;
a second input unit 1003, configured to input the second target vector and the document vector to a feature fusion layer, and obtain a third target vector;
a third input unit 1004, configured to input the third target vector to an output layer, and output, by the output layer, a corresponding answer start probability and answer end probability.
In one embodiment, the second input unit 1003 includes:
the third calculating unit is used for inputting the second target vector and the document vector into a feature fusion layer, and calculating according to the following formula to obtain a third target vector:
v = m + x * σ(W4[x, m] + b4)

where v is the third target vector, W4 and b4 are trainable parameters, m is the second target vector, x is the document vector, and σ is the activation function.
In one embodiment, the third input unit 1004 includes:
the probability calculation unit is used for calculating and obtaining the answer starting probability and the answer ending probability according to the following formulas:
start=softmax(W5v+b5)
end=softmax(W6v+b6)
In the formula, W5, b5, W6 and b6 are parameters to be trained, v is the third target vector, start represents the answer start probability, and end represents the answer end probability.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An answer extraction method based on a graph convolution network is characterized by comprising the following steps:
acquiring a user question and a document containing the user question;
constructing a graph network for the documents based on syntactic dependency analysis;
fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector;
learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain answer starting probability and answer ending probability in the document;
and respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
2. The method of claim 1, wherein the obtaining of the user question and the document containing the user question comprises:
the method comprises the steps of obtaining a user question and a document containing the user question, and learning the user question and the document respectively by using a preset word vector model to obtain a first word vector representation corresponding to the user question and a second word vector representation corresponding to the document.
3. The method of claim 1, wherein the constructing a graph network for the document based on syntactic dependency analysis comprises:
segmenting the document according to sentences, and segmenting each segmented sentence to obtain words contained in each sentence;
for each sentence, each word is used as a graph node of the graph network, the dependency relationship among the words is obtained, and the corresponding graph nodes are connected according to the dependency relationship among the words;
and acquiring the same words in each sentence, and connecting different sentences by using the same words as connecting nodes, thereby constructing a graph network for obtaining the document.
4. The answer extraction method based on the graph convolution network as claimed in claim 2, wherein the fusing the document and the graph network by using the graph convolution network to obtain a document vector comprises:
obtaining a node vector which is the same as the second word vector representation in the graph network, and fusing the second word vector representation and the node vector according to the following formula:
new_x = x + node * σ(W1[x, node] + b1)

wherein σ is a nonlinear activation function, node is the node vector, x is the second word vector representation, [x, node] represents the concatenation of the second word vector representation and the node vector, W1 is a weight matrix to be trained, b1 is a bias term, and new_x is the output vector;

inputting the output vector new_x into a feed-forward neural network, and outputting the document vector by the feed-forward neural network:

h = f(w3 * f(w2 * new_x + b2) + b3)

wherein h is the document vector, w2, w3, b2 and b3 are all parameters to be trained, and f is an activation function.
5. The method of claim 4, wherein learning the document vector and the question vector based on the graph convolution network and an attention mechanism to obtain an answer start probability and an answer end probability in the document comprises:
inputting the document vector and the problem vector to a mutual attention layer, and obtaining a first target vector;
calculating the first target vector by using a self-attention layer to obtain a second target vector;
inputting the second target vector and the document vector to a feature fusion layer, and obtaining a third target vector;
and inputting the third target vector to an output layer, and outputting the corresponding answer starting probability and answer ending probability by the output layer.
6. The method of claim 5, wherein the inputting the second target vector and the document vector into a feature fusion layer and obtaining a third target vector comprises:
inputting the second target vector and the document vector into a feature fusion layer, and calculating according to the following formula to obtain a third target vector:
v = m + x * σ(W4[x, m] + b4)

where v is the third target vector, W4 and b4 are trainable parameters, m is the second target vector, x is the document vector, and σ is the activation function.
7. The method for extracting answers based on the graph-convolution network as claimed in claim 5, wherein the inputting the third target vector to an output layer and outputting a corresponding answer start probability and answer end probability by the output layer comprises:
and respectively calculating the answer starting probability and the answer terminating probability according to the following formulas:
start=softmax(W5v+b5)
end=softmax(W6v+b6)
In the formula, W5, b5, W6 and b6 are parameters to be trained, v is the third target vector, start represents the answer start probability, and end represents the answer end probability.
8. An answer extraction device based on a graph convolution network, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a user question and a document containing the user question;
a construction unit for constructing a graph network for the document based on syntactic dependency analysis;
the fusion unit is used for fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector and fusing the user question and the graph network to obtain a question vector;
the learning unit is used for learning the document vector and the question vector based on the graph convolution network and the attention mechanism so as to obtain answer starting probability and answer ending probability in the document;
and the answer extraction unit is used for respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the answer extraction method based on a graph convolution network according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the answer extraction method based on a graph convolution network according to any one of claims 1 to 7.
CN202011577396.8A 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components Pending CN112632253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011577396.8A CN112632253A (en) 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011577396.8A CN112632253A (en) 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components

Publications (1)

Publication Number Publication Date
CN112632253A true CN112632253A (en) 2021-04-09

Family

ID=75326040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011577396.8A Pending CN112632253A (en) 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components

Country Status (1)

Country Link
CN (1) CN112632253A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326692A (en) * 2021-06-24 2021-08-31 四川启睿克科技有限公司 Machine reading understanding method and device considering syntax structure
CN113486174A (en) * 2021-06-15 2021-10-08 北京三快在线科技有限公司 Model training, reading understanding method and device, electronic equipment and storage medium
CN113535912A (en) * 2021-05-18 2021-10-22 北京邮电大学 Text association method based on graph convolution network and attention mechanism and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
CN109977199A (en) * 2019-01-14 2019-07-05 浙江大学 A kind of reading understanding method based on attention pond mechanism
US20190311064A1 (en) * 2018-04-07 2019-10-10 Microsoft Technology Licensing, Llc Intelligent question answering using machine reading comprehension
CN111046661A (en) * 2019-12-13 2020-04-21 浙江大学 Reading understanding method based on graph convolution network
CN111274800A (en) * 2020-01-19 2020-06-12 浙江大学 Inference type reading understanding method based on relational graph convolution network
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
US20190311064A1 (en) * 2018-04-07 2019-10-10 Microsoft Technology Licensing, Llc Intelligent question answering using machine reading comprehension
CN109977199A (en) * 2019-01-14 2019-07-05 浙江大学 A kind of reading understanding method based on attention pond mechanism
CN111046661A (en) * 2019-12-13 2020-04-21 浙江大学 Reading understanding method based on graph convolution network
CN111274800A (en) * 2020-01-19 2020-06-12 浙江大学 Inference type reading understanding method based on relational graph convolution network
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535912A (en) * 2021-05-18 2021-10-22 北京邮电大学 Text association method based on graph convolution network and attention mechanism and related equipment
CN113535912B (en) * 2021-05-18 2023-12-26 北京邮电大学 Text association method and related equipment based on graph rolling network and attention mechanism
CN113486174A (en) * 2021-06-15 2021-10-08 北京三快在线科技有限公司 Model training, reading understanding method and device, electronic equipment and storage medium
CN113326692A (en) * 2021-06-24 2021-08-31 四川启睿克科技有限公司 Machine reading understanding method and device considering syntax structure

Similar Documents

Publication Publication Date Title
CN108052588B (en) Method for constructing automatic document question-answering system based on convolutional neural network
CN111914067B (en) Chinese text matching method and system
CN112632253A (en) Answer extraction method and device based on graph convolution network and related components
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN110598207B (en) Word vector obtaining method and device and storage medium
CN111191002A (en) Neural code searching method and device based on hierarchical embedding
CN109145083B (en) Candidate answer selecting method based on deep learning
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
CN113821527A (en) Hash code generation method and device, computer equipment and storage medium
CN110737837B (en) Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform
CN116561251A (en) Natural language processing method
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
CN113010655B (en) Answer and interference item generation method and device for reading and understanding of machine
CN114036246A (en) Commodity map vectorization method and device, electronic equipment and storage medium
CN113761151A (en) Synonym mining method, synonym mining device, synonym question answering method, synonym question answering device, computer equipment and storage medium
Lyu et al. Deep learning for textual entailment recognition
Kajiwara et al. An iterative approach for the global estimation of sentence similarity
CN117034916A (en) Method, device and equipment for constructing word vector representation model and word vector representation
Reisi et al. Authorship attribution in historical and literary texts by a deep learning classifier
CN111291550A (en) Chinese entity extraction method and device
CN113128235A (en) Semantic understanding method
Karpagam et al. Deep learning approaches for answer selection in question answering system for conversation agents
Lee Natural Language Processing: A Textbook with Python Implementation
US11880664B2 (en) Identifying and transforming text difficult to understand by user
CN115809663A (en) Exercise analysis method, exercise analysis device, exercise analysis equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination