CN112632253A - Answer extraction method and device based on graph convolution network and related components - Google Patents

Answer extraction method and device based on graph convolution network and related components

Info

Publication number
CN112632253A
CN112632253A (application CN202011577396.8A)
Authority
CN
China
Prior art keywords
vector
document
answer
graph
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011577396.8A
Other languages
Chinese (zh)
Inventor
黄勇其
王伟
于翠翠
张兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Runlian Software System Shenzhen Co Ltd
Original Assignee
Runlian Software System Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Runlian Software System Shenzhen Co Ltd filed Critical Runlian Software System Shenzhen Co Ltd
Priority to CN202011577396.8A priority Critical patent/CN112632253A/en
Publication of CN112632253A publication Critical patent/CN112632253A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/328Management therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an answer extraction method and device based on a graph convolution network, together with related components, wherein the method comprises the following steps: acquiring a user question and a document containing the user question; constructing a graph network for the document based on syntactic dependency analysis; fusing the document with the graph network by means of a graph convolution network to obtain a document vector, and fusing the user question with the graph network to obtain a question vector; learning the document vector and the question vector based on the graph convolution network and an attention mechanism to obtain an answer start probability and an answer end probability in the document; and taking the answer start probability and the answer end probability as a start index and an end index into the document, respectively, and taking the text between the start index and the end index as the answer to the user question. The method constructs a graph network for the document containing the user question based on the graph convolution network and syntactic dependency analysis, and semantically fuses the graph network with both the user question and the document, thereby improving the accuracy of answer extraction.

Description

Answer extraction method and device based on graph convolution network and related components
Technical Field
The invention relates to the technical field of natural language processing, in particular to an answer extraction method and device based on a graph convolution network and a related component.
Background
The machine reading comprehension (MRC) task gives a machine an article and asks it to find the answer to a question from that article. Early MRC systems relied mainly on rules and manually constructed datasets; because these datasets were small, such systems were difficult to generalize to other fields. In the machine learning era, machine reading comprehension was defined as a supervised learning task: collected data were manually labeled as (paragraph, question, answer) triples, machine learning algorithms were introduced, and rich semantic feature sets were added to fit the training data.
With the development of deep learning and the emergence of larger labeled datasets, such as the SQuAD dataset (a question answering dataset) and the CNN/Daily Mail dataset (a supervised dataset), deep neural networks were introduced into the machine reading comprehension task. The Match-LSTM model, for example, was originally designed for the textual entailment task and was later combined with a Pointer-Net (a model mainly used for combinatorial optimization problems) to make it suitable for reading comprehension: the model treats the question as the premise and the article as the hypothesis, which is equivalent to searching the article for an answer to the question, and the span of the answer is produced by the Pointer-Net. The BiDAF model is another typical machine reading comprehension model; it introduces a bi-directional attention flow mechanism and obtains a question-aware context representation through bidirectional attention interaction between the question and the context, thereby improving answer extraction accuracy. Although both models can extract answers to a certain degree, they also have problems; for example, both use an LSTM network, which slows down training and prediction. To improve training speed and further improve answer extraction accuracy, QANet (a question answering framework) replaces the conventional RNN structure with convolution and adopts various techniques, greatly improving training and inference speed without affecting precision. Although QANet improves the training speed, its accuracy improves only slightly because it uses only the features of the text itself; if additional information were introduced to describe the text features in more depth, the accuracy of answer extraction could be further improved.
Disclosure of Invention
The embodiment of the invention provides an answer extraction method and device based on a graph convolution network, computer equipment and a storage medium, aiming at improving the accuracy of answer extraction.
In a first aspect, an embodiment of the present invention provides an answer extraction method based on a graph convolution network, including:
acquiring a user question and a document containing the user question;
constructing a graph network for the documents based on syntactic dependency analysis;
fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector;
learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain answer starting probability and answer ending probability in the document;
and respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
In a second aspect, an embodiment of the present invention provides an answer extraction device based on a graph-convolution network, including:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a user question and a document containing the user question;
a construction unit for constructing a graph network for the document based on syntactic dependency analysis;
the fusion unit is used for fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector and fusing the user question and the graph network to obtain a question vector;
the learning unit is used for learning the document vector and the question vector based on the graph convolution network and the attention mechanism so as to obtain answer starting probability and answer ending probability in the document;
and the answer extraction unit is used for respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the answer extraction method based on a graph convolution network according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the answer extraction method based on a graph convolution network according to the first aspect.
The embodiment of the invention provides an answer extraction method, an answer extraction device, computer equipment and a storage medium based on a graph convolution network, wherein the method comprises the following steps: acquiring a user question and a document containing the user question; constructing a graph network for the documents based on syntactic dependency analysis; fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector; learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain answer starting probability and answer ending probability in the document; and respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question. The embodiment of the invention constructs the graph network for the document containing the user question based on the graph convolution network and the syntactic dependency analysis, and makes the graph network perform semantic fusion with the user question and the document, thereby improving the accuracy of answer extraction.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 2 is a schematic sub-flowchart of step S102 in an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 3 is a schematic sub-flowchart of step S104 in an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 4a is a schematic diagram illustrating an example of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 4b is a schematic diagram illustrating another example of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating another example of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 6 is a schematic network structure diagram of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 7a is a schematic structural diagram of a graph convolution network fusion layer in an answer extraction method based on a graph convolution network according to an embodiment of the present invention;
fig. 7b is a schematic structural diagram of a feature fusion layer in an answer extraction method based on a graph-convolution network according to an embodiment of the present invention;
fig. 8 is a schematic block diagram of an answer extraction device based on a graph-convolution network according to an embodiment of the present invention;
fig. 9 is a sub-schematic block diagram of a construction unit 802 in an answer extraction apparatus based on a graph-convolution network according to an embodiment of the present invention;
fig. 10 is a sub-schematic block diagram of a learning unit 804 in an answer extraction apparatus based on a graph-convolution network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of an answer extraction method based on a graph-convolution network according to an embodiment of the present invention, which specifically includes: steps S101 to S105.
S101, obtaining a user question and a document containing the user question;
S102, constructing a graph network for the document based on syntactic dependency analysis;
S103, fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector;
S104, learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain an answer starting probability and an answer ending probability in the document;
and S105, taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document respectively, and taking the text between the starting index and the ending index as the answer of the user question.
In this embodiment, when extracting the answer to the user question (Query) from the document (Context), a graph network is first constructed for the document according to the syntactic dependency relationships. The user question and the document are then each fused with the graph network, correspondingly yielding the question vector and the document vector. Next, the question vector and the document vector are learned and computed through the attention mechanism to obtain the corresponding answer start probability and answer end probability, which are taken as the start index and end index into the document respectively, and the text between the start index and the end index is extracted as the answer to the user question.
This embodiment introduces syntactic dependency analysis to construct the graph network, taking words that have a syntactic dependency relationship with a head word as the adjacent nodes of that head word. In the prior art, a graph is built in a sliding-window manner, taking the head word as a node and the words within its sliding window as adjacent nodes; this introduces redundant information and loses information from words outside the sliding window that still contribute to the node. In contrast, constructing the graph network through syntactic dependency analysis retains the other node information in the sentence that is meaningful for the node, is not limited by distance, and can describe node information from richer dimensions, so the node representations are more accurate and the finally extracted answers are more accurate and reliable. In addition, in this embodiment, adaptive semantic fusion is performed through the Feature Fusion Layer between the graph network and the first word vector representation corresponding to the user question, and between the graph network and the second word vector representation corresponding to the document, so that the semantic representations of the user question and the document are richer.
In one embodiment, the step S101 includes:
the method comprises the steps of obtaining a user question and a document containing the user question, and learning the user question and the document respectively by using a preset word vector model to obtain a first word vector representation corresponding to the user question and a second word vector representation corresponding to the document.
In a specific embodiment, before the user question and the document are respectively learned by the preset word vector model to obtain the first word vector representation corresponding to the user question and the second word vector representation corresponding to the document, the preset word vector model is trained using the gensim toolkit, and the feature dimension of the word vector model can be customized, for example the vector dimension is set to 256. Further, the first word vector representation and the second word vector representation obtained by the word vector model are [N, S1, H] and [N, S2, H] respectively, where N is the number of training samples, S1 is the sequence length of the user question, S2 is the sequence length of the document, and H is the feature dimension.
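By way of a non-limiting illustration, the word vector model of this step could be prepared roughly as sketched below; the corpus, tokenizer, and variable names are assumptions for illustration and are not part of the disclosed method:

```python
# Minimal sketch: training a 256-dimensional word vector model with gensim and
# looking up the first/second word vector representations for a question and a document.
from gensim.models import Word2Vec
import numpy as np

# Each training sample is a list of tokens (a question or a document sentence).
corpus = [
    ["deep", "learning", "is", "a", "new", "research", "direction"],
    ["direction", "is", "more", "important", "than", "effort"],
]

w2v = Word2Vec(sentences=corpus, vector_size=256, window=5, min_count=1, epochs=10)

def embed(tokens, model, dim=256):
    """Map a token sequence to an [S, H] matrix; unknown words get zero vectors."""
    return np.stack([model.wv[t] if t in model.wv else np.zeros(dim) for t in tokens])

question_vecs = embed(["what", "is", "deep", "learning"], w2v)   # [S1, H]
document_vecs = embed(corpus[0], w2v)                            # [S2, H]
```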
In one embodiment, as shown in fig. 2, the step S102 includes: steps S201 to S203.
S201, segmenting the document according to sentences, and segmenting each segmented sentence to obtain words contained in each sentence;
S202, regarding each sentence, taking each word as a graph node of the graph network, obtaining the dependency relationships among the words, and connecting the corresponding graph nodes according to the dependency relationships among the words;
S203, obtaining the same words across the sentences, and connecting different sentences by using the same words as connecting nodes, thereby constructing the graph network of the document.
In this embodiment, when constructing the graph network, the document containing the user question is first segmented into sentences and the words in each sentence are further segmented; then, for each sentence, the graph network of that sentence is constructed, that is, the words in the sentence are taken as graph nodes and words with dependency relationships are connected. After the graph network of each sentence is constructed, the sentences are connected through the words they share, thereby forming the graph network of the document. When words with a dependency relationship are connected, dependency arcs can be used, with each arc pointing from the dependent word to its head word; the dependency relationships between words include the subject-verb relationship, the verb-object relationship, the adverbial-head relationship, and so on.
For example, performing syntactic dependency analysis on the sentences "Deep learning is a new research direction in the field of machine learning" and "Direction is more important than effort" yields the syntactic dependency graphs shown in fig. 4a and fig. 4b, where the dependency labels used in fig. 4a and fig. 4b are listed in table 1:
type of relationship Marking Type of relationship Marking
Moving guest relationship VOB In a parallel relationship COO
Inter-guest relationships IOB Intermediary relation POB
Preposition object FOB Left additive relationship LAD
Concurrent language DBL Right additive relationship RAD
Centering relationships ATT Independent structure IS
Middle structure ADV Core relationships HED
Dynamic compensation structure CMP
TABLE 1
For example, the relationships between the word "field/6" and the words "machine/4", "learning/5" and "in/7" are all attribute relationships (where the numbers 6, 4, 5 and 7 denote the index of the word in the sentence, the same below), so in the graph network the node "field/6" is connected to the nodes "machine/4", "learning/5" and "in/7". In addition, with reference to fig. 6, "learning" is linked to "deep" and "is" besides "field/6", because "learning/5" also has dependency relationships with "deep/1" and "is/3"; occurrences of the same word with different indices share a node in the graph network. Different sentences are linked together by the same word, for example the above two sentences are linked together by the shared word in "direction/12" and "direction/1".
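As a non-limiting illustration of this construction step, the document graph could be assembled roughly as follows, assuming the per-sentence dependency triples (head index, dependent index, relation label) have already been produced by a syntactic parser; all helper names here are illustrative:

```python
# Minimal sketch: building the document graph from per-sentence dependency triples.
# Words become graph nodes, dependency arcs become edges, and identical words shared
# across sentences connect the per-sentence subgraphs.
import networkx as nx

def build_document_graph(sentences, dependencies):
    """
    sentences:    list of token lists, one per sentence
    dependencies: list of (head_idx, dep_idx, relation) lists, one per sentence,
                  with indices referring to positions inside that sentence
    """
    g = nx.Graph()
    word_to_nodes = {}  # surface form -> node ids, used to link sentences

    for s_id, (tokens, deps) in enumerate(zip(sentences, dependencies)):
        for i, tok in enumerate(tokens):
            node = (s_id, i, tok)              # node id keeps sentence id and position
            g.add_node(node, word=tok)
            word_to_nodes.setdefault(tok, []).append(node)
        for head, dep, rel in deps:
            g.add_edge((s_id, head, tokens[head]), (s_id, dep, tokens[dep]), rel=rel)

    # Connect different sentences through shared words.
    for nodes in word_to_nodes.values():
        for a, b in zip(nodes, nodes[1:]):
            if a[0] != b[0]:                   # only link occurrences in different sentences
                g.add_edge(a, b, rel="SAME_WORD")
    return g
```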
Further, in an embodiment, the step S102 further includes:
initializing the weight of the edge between every two nodes in the graph network according to the following formula:
r'i,j = 1

ri,j = r'i,j · cos(vi, vj) · (1 − |idxi − idxj| / len)

In the formulas, r'i,j and ri,j both represent the weight of the edge between node i and node j, where r'i,j is recorded as the weight before updating and ri,j as the weight after updating; r'i,j is initialized to 1, vi represents the vector of node i, vj represents the vector of node j, and cos(vi, vj) denotes the cosine similarity of vi and vj. idxi represents the position of node i in the sentence, idxj represents the position of node j in the sentence, len represents the length of the sentence, and |idxi − idxj| / len represents the relative distance between the two nodes;
updating the representations of the nodes in the graph network according to the following formula:

hi(l) = σ( Σj ( Aij / √(Dii · Djj) ) · hj(l−1) · W(l) + b(l) )

In the formula, Aij is the adjacency matrix, i.e. the connection weight of the edge between node i and node j, Dii is the degree matrix of the adjacency matrix, Aij / √(Dii · Djj) is the Laplacian matrix in renormalized form, W(l) is a weight matrix, b(l) is a bias term, σ is a nonlinear mapping function, hi(l) is the updated representation of node i, and hj(l−1) is the representation of node j before updating.
In this embodiment, after the graph network is constructed, in order to represent the importance of each node of the graph network, the weight of the edge between any two nodes and the representation of each node can be initialized and updated; for example, the labels v1–v15 in fig. 5 mark the edges, that is, the weights here refer to the weights of the edges between two nodes.
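A rough, non-limiting sketch of these two steps follows. The combination of cosine similarity and relative distance in the edge-weight update is an assumed reconstruction from the definitions above rather than a verbatim formula, and the node update is written as the standard normalized graph-convolution propagation:

```python
# Rough sketch: edge-weight initialization/update and one GCN propagation step.
# The edge-weight combination below (cosine similarity scaled by relative distance)
# is an assumption for illustration; sigma is taken as tanh for illustration.
import numpy as np

def init_edge_weights(node_vecs, positions, sent_len):
    n = len(node_vecs)
    r = np.ones((n, n))                      # r'_{i,j} initialized to 1
    for i in range(n):
        for j in range(n):
            vi, vj = node_vecs[i], node_vecs[j]
            cos = vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj) + 1e-8)
            rel_dist = abs(positions[i] - positions[j]) / sent_len
            r[i, j] = r[i, j] * cos * (1.0 - rel_dist)   # assumed combination
    return r

def gcn_layer(h, adj, w, b):
    """Normalized GCN update: h_i = sigma(sum_j A_ij / sqrt(D_ii * D_jj) * h_j W + b)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-8))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.tanh(norm_adj @ h @ w + b)
```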
In one embodiment, the step S103 includes:
obtaining the node vectors in the graph network that are the same as the second word vector representation, and fusing the second word vector representation and the node vectors according to the following formula:

new_x = x + node * σ(W1[x, node] + b1)

wherein σ is a nonlinear activation function, node is the node vector, x is the second word vector representation, [x, node] represents the concatenation of the second word vector representation and the node vector, W1 is a weight matrix to be trained, b1 is a bias term, and new_x is the output vector;

inputting the output vector new_x into a feed-forward neural network, and outputting the document vector by the feed-forward neural network:

h = f(w3 * f(w2 * new_x + b2) + b3)

wherein h is the document vector, w2, w3, b2 and b3 are all parameters to be trained, and f is an activation function.
In this embodiment, referring to fig. 7a, when the graph network and the second word vector representation are fused by the fusion layer (Fusion Layer) of the graph convolution network, a node vector identical to the second word vector representation is first selected from the node vectors of the graph network, and the second word vector representation is fused with the selected node vector. In other words, for each word in the document, the same word is found in the graph network, and the node vector of that word in the graph network is adaptively fused with the second word vector representation. Through a nonlinear mapping (such as a sigmoid nonlinear mapping), the importance of different words in a sentence can be obtained, the node vectors are fused according to this importance, and the fused vector is fed through a feed-forward neural network to obtain the document vector. For example, the document vector h is obtained with dimension [N, S2, H].
In another embodiment, when the graph network and the first word vector representation are fused by the fusion layer of the graph convolution network, a node vector identical to the first word vector representation is first selected from the node vectors of the graph network, and the first word vector representation is fused with the selected node vector. Then, through a nonlinear mapping (e.g., a sigmoid nonlinear mapping), the importance of different words in the sentences of the user question can be obtained, the node vectors are fused according to this importance, and the fused vector is fed through a feed-forward neural network to obtain the question vector. For example, the question vector u is obtained with dimension [N, S1, H].
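A minimal, non-limiting sketch of this fusion layer followed by the feed-forward step is given below; module and parameter names are illustrative, the gate follows new_x = x + node * σ(W1[x, node] + b1), and the activation f is taken as ReLU by assumption:

```python
# Minimal sketch of the graph-network fusion layer followed by the feed-forward network.
# x is the [N, S, H] word vector representation and node is the matching [N, S, H]
# node-vector tensor taken from the graph network.
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)   # W1, b1
        self.ff1 = nn.Linear(hidden, hidden)        # w2, b2
        self.ff2 = nn.Linear(hidden, hidden)        # w3, b3

    def forward(self, x, node):
        # new_x = x + node * sigmoid(W1 [x; node] + b1)
        g = torch.sigmoid(self.gate(torch.cat([x, node], dim=-1)))
        new_x = x + node * g
        # h = f(w3 * f(w2 * new_x + b2) + b3), with f taken as ReLU here
        return torch.relu(self.ff2(torch.relu(self.ff1(new_x))))

# Usage: the same layer shape fuses the document ([N, S2, H]) or the question ([N, S1, H]).
fusion = GraphFusionLayer(hidden=256)
doc_vec = fusion(torch.randn(4, 50, 256), torch.randn(4, 50, 256))   # [N, S2, H]
```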
In one embodiment, as shown in fig. 3, the step S104 includes: steps S301 to S304.
S301, inputting the document vector and the problem vector to a mutual attention layer, and obtaining a first target vector;
S302, calculating the first target vector by using a self-attention layer to obtain a second target vector;
S303, inputting the second target vector and the document vector to a feature fusion layer to obtain a third target vector;
S304, inputting the third target vector to an output layer, and outputting the corresponding answer starting probability and answer ending probability by the output layer.
In this embodiment, the question vector and the document vector are calculated by the mutual attention layer to obtain the first target vector, the first target vector is calculated by the self-attention layer to obtain the second target vector, the second target vector and the document vector are subjected to fusion calculation by the feature fusion layer to obtain the third target vector, an answer start probability and an answer termination probability corresponding to the third target vector are output by the output layer, and a corresponding answer is extracted from the document according to the answer start probability and the answer termination probability.
In one embodiment, the step S301 includes:
calculating the similarity between the document vector and the question vector according to the following formula to obtain a target matrix S:

Sij = hi · uj

In the formula, hi is the document vector, uj is the question vector, and · is the vector dot product; the dimension of the obtained S matrix is S2 * S1, where S2 is the sequence length of the document and S1 is the sequence length of the user question, and S is the matrix obtained by multiplying the document vector and the question vector;

normalizing the target matrix S according to the following formula:

b = softmax(maxrow(S))

where b is the normalized output vector, and maxrow(S) represents taking the row-wise maximum of S, which yields S2 real numbers;

calculating the normalized output vector b according to the following formula to obtain the first target vector g:

g = b · h

wherein h is the matrix formed by the document vectors hi.
In this embodiment, when the first target vector is output through the mutual attention layer (Inter-Attention Layer), the similarity between the document vector and the question vector is first calculated to obtain the target matrix, the target matrix is then normalized by using the softmax function, and the normalized output vector is then used to calculate the first target vector.
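A small, non-limiting sketch of this mutual-attention step follows. The dot-product similarity and the softmax over row-wise maxima follow the definitions above; reading b · h as a per-position weighting of the document vector (rather than a single pooled vector) is an assumption made here so that the later self-attention and feature-fusion steps receive per-position inputs, and all tensor names are illustrative:

```python
# Sketch of the mutual attention layer: S_ij = h_i . u_j, b = softmax(max_row(S)), g = b . h
import torch
import torch.nn.functional as F

def mutual_attention(h, u):
    """
    h: document vector  [N, S2, H]
    u: question vector  [N, S1, H]
    returns g: first target vector, kept at [N, S2, H] under the per-position reading
    """
    s = torch.einsum("nik,njk->nij", h, u)          # S_ij = h_i . u_j  -> [N, S2, S1]
    b = F.softmax(s.max(dim=-1).values, dim=-1)     # softmax over row-wise maxima -> [N, S2]
    g = b.unsqueeze(-1) * h                         # b . h read as per-position weighting
    return g
```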
In one embodiment, the step S302 includes:
calculating the first target vector input into the self-attention layer according to the following formula to obtain the second target vector m:

m = softmax(Q·KT / √dk) · V

where Q, K and V are the three matrices obtained by multiplying the input vector g (i.e., the first target vector) by a matrix each, √dk is the scaling factor, and dk represents the feature dimension of the Q, K and V matrices. In a specific application scenario, dk is set to 256.
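A short, non-limiting sketch of this scaled dot-product self-attention step is given below; the linear projection layers and names are illustrative assumptions:

```python
# Sketch of the self-attention layer: m = softmax(Q K^T / sqrt(d_k)) V
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_k=256):
        super().__init__()
        self.d_k = d_k
        self.q_proj = nn.Linear(d_k, d_k)
        self.k_proj = nn.Linear(d_k, d_k)
        self.v_proj = nn.Linear(d_k, d_k)

    def forward(self, g):                           # g: first target vector [N, S2, d_k]
        q, k, v = self.q_proj(g), self.k_proj(g), self.v_proj(g)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        return torch.softmax(scores, dim=-1) @ v    # second target vector m
```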
In one embodiment, the step S303 includes:
inputting the second target vector and the document vector into a feature fusion layer, and calculating according to the following formula to obtain a third target vector:
v = m + x * σ(W4[x, m] + b4)

where v is the third target vector, W4 and b4 are trainable parameters, m is the second target vector, x is the document vector, and σ is an activation function.
In this embodiment, with reference to fig. 7b, in order to retain the information in the document that is useful for answer extraction, the document vector and the second target vector calculated by the self-attention layer are input to the feature fusion layer (Feature Fusion Layer), so as to filter out interference information in the document and improve the accuracy of the final answer extraction.
In one embodiment, the step S304 includes:
and respectively calculating the answer starting probability and the answer terminating probability according to the following formulas:
start=softmax(W5v+b5)
end=softmax(W6v+b6)
In the formula, W5, b5, W6 and b6 are parameters to be trained, v is the third target vector, start represents the answer start probability, and end represents the answer end probability.
In this embodiment, the answer start probability and the answer termination probability of the user question in the document may be obtained through the above formulas, the answer start probability and the answer termination probability may be used as indexes in the document, that is, the start answer index and the termination answer index, and a text between the start answer index and the termination answer index may be used as an answer to the user question.
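A brief, non-limiting sketch covering the feature fusion layer and the output layer together follows; layer names are illustrative, the fusion follows v = m + x * σ(W4[x, m] + b4), and the start/end softmax is taken over document positions by assumption:

```python
# Sketch of the feature fusion layer and output layer producing start/end probabilities.
import torch
import torch.nn as nn

class FusionAndOutput(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)   # W4, b4
        self.start_head = nn.Linear(hidden, 1)      # W5, b5
        self.end_head = nn.Linear(hidden, 1)        # W6, b6

    def forward(self, m, x):
        # v = m + x * sigmoid(W4 [x; m] + b4): keep document information useful for the answer
        v = m + x * torch.sigmoid(self.gate(torch.cat([x, m], dim=-1)))
        start = torch.softmax(self.start_head(v).squeeze(-1), dim=-1)   # [N, S2]
        end = torch.softmax(self.end_head(v).squeeze(-1), dim=-1)       # [N, S2]
        return start, end

# The answer span is the text between the argmax of start and the argmax of end.
```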
In an embodiment, as shown in fig. 6, a graph network is constructed for a document (Context) containing a user question (Query) by using syntactic dependency analysis, then the graph network and the document, the graph network and the user question are fused by a feed-forward neural network respectively to obtain a document vector (h) and a question vector (u), the document vector and the question vector are input to a mutual attention layer to obtain a first target vector (g), the first target vector is calculated by using a self-attention layer to obtain a second target vector (m), the second target vector and the document vector are input to a feature fusion layer to obtain a third target vector (v), the third target vector is input to an output layer, and the output layer outputs a corresponding answer start probability and answer end probability.
Fig. 8 is a schematic block diagram of an answer extraction apparatus 800 based on a graph-convolution network according to an embodiment of the present invention, where the apparatus 800 includes:
an acquisition unit 801 configured to acquire a user question and a document containing the user question;
a construction unit 802 for constructing a graph network for the document based on syntactic dependency analysis;
a fusion unit 803, configured to fuse the document and the graph network by using a graph convolution network to obtain a document vector, and fuse the user question and the graph network to obtain a question vector;
a learning unit 804, configured to learn the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain an answer start probability and an answer end probability in the document;
an answer extracting unit 805, configured to use the answer start probability and the answer end probability as a start index and an end index of the document, respectively, and use a text between the start index and the end index as an answer to the user question.
In an embodiment, the obtaining unit 801 includes:
the vector representation unit is used for acquiring a user question and a document containing the user question, and learning the user question and the document respectively by using a preset word vector model to obtain a first word vector representation corresponding to the user question and a second word vector representation corresponding to the document.
In one embodiment, as shown in fig. 9, the building unit 802 includes:
a segmentation unit 901, configured to segment the document according to sentences, and segment each segmented sentence to obtain words contained in each sentence;
a first connecting unit 902, configured to, for each sentence, take each word as a graph node of the graph network, obtain a dependency relationship between the words, and connect corresponding graph nodes according to the dependency relationship between the words;
a second connecting unit 903, configured to obtain the same word in each sentence, and connect different sentences using the same word as a connection node, so as to construct a graph network of the obtained document.
In one embodiment, the fusion unit 803 includes:
a document fusion unit, configured to obtain the node vectors in the graph network that are the same as the second word vector representation, and fuse the second word vector representation and the node vectors according to the following formula:

new_x = x + node * σ(W1[x, node] + b1)

wherein σ is a nonlinear activation function, node is the node vector, x is the second word vector representation, [x, node] represents the concatenation of the second word vector representation and the node vector, W1 is a weight matrix to be trained, b1 is a bias term, and new_x is the output vector;

a feedforward output unit, configured to input the output vector new_x to a feed-forward neural network, and output the document vector by the feed-forward neural network:

h = f(w3 * f(w2 * new_x + b2) + b3)

wherein h is the document vector, w2, w3, b2 and b3 are all parameters to be trained, and f is an activation function.
In one embodiment, as shown in fig. 10, the learning unit 804 includes:
a first input unit 1001 configured to input the document vector and the question vector to a mutual attention layer, and obtain a first target vector;
a first calculating unit 1002, configured to calculate the first target vector by using a self-attention layer to obtain a second target vector;
a second input unit 1003, configured to input the second target vector and the document vector to a feature fusion layer, and obtain a third target vector;
a third input unit 1004, configured to input the third target vector to an output layer, and output, by the output layer, a corresponding answer start probability and answer end probability.
In one embodiment, the second input unit 1003 includes:
the third calculating unit is used for inputting the second target vector and the document vector into a feature fusion layer, and calculating according to the following formula to obtain a third target vector:
v = m + x * σ(W4[x, m] + b4)

where v is the third target vector, W4 and b4 are trainable parameters, m is the second target vector, x is the document vector, and σ is the activation function.
In one embodiment, the third input unit 1004 includes:
the probability calculation unit is used for calculating and obtaining the answer starting probability and the answer ending probability according to the following formulas:
start=softmax(W5v+b5)
end=softmax(W6v+b6)
In the formula, W5, b5, W6 and b6 are parameters to be trained, v is the third target vector, start represents the answer start probability, and end represents the answer end probability.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An answer extraction method based on a graph convolution network is characterized by comprising the following steps:
acquiring a user question and a document containing the user question;
constructing a graph network for the documents based on syntactic dependency analysis;
fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector, and fusing the user question and the graph network to obtain a question vector;
learning the document vector and the question vector based on the graph convolution network and the attention mechanism to obtain answer starting probability and answer ending probability in the document;
and respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
2. The method of claim 1, wherein the obtaining of the user question and the document containing the user question comprises:
the method comprises the steps of obtaining a user question and a document containing the user question, and learning the user question and the document respectively by using a preset word vector model to obtain a first word vector representation corresponding to the user question and a second word vector representation corresponding to the document.
3. The method of claim 1, wherein the constructing a graph network for the document based on syntactic dependency analysis comprises:
segmenting the document according to sentences, and segmenting each segmented sentence to obtain words contained in each sentence;
for each sentence, each word is used as a graph node of the graph network, the dependency relationship among the words is obtained, and the corresponding graph nodes are connected according to the dependency relationship among the words;
and acquiring the same words in each sentence, and connecting different sentences by using the same words as connecting nodes, thereby constructing a graph network for obtaining the document.
4. The answer extraction method based on the graph convolution network as claimed in claim 2, wherein the fusing the document and the graph network by using the graph convolution network to obtain a document vector comprises:
obtaining a node vector which is the same as the second word vector representation in the graph network, and fusing the second word vector representation and the node vector according to the following formula:
new_x = x + node * σ(W1[x, node] + b1)

wherein σ is a nonlinear activation function, node is the node vector, x is the second word vector representation, [x, node] represents the concatenation of the second word vector representation and the node vector, W1 is a weight matrix to be trained, b1 is a bias term, and new_x is the output vector;

inputting the output vector new_x into a feed-forward neural network, and outputting the document vector by the feed-forward neural network:

h = f(w3 * f(w2 * new_x + b2) + b3)

wherein h is the document vector, w2, w3, b2 and b3 are all parameters to be trained, and f is an activation function.
5. The method of claim 4, wherein learning the document vector and the question vector based on the graph convolution network and an attention mechanism to obtain an answer start probability and an answer end probability in the document comprises:
inputting the document vector and the problem vector to a mutual attention layer, and obtaining a first target vector;
calculating the first target vector by using a self-attention layer to obtain a second target vector;
inputting the second target vector and the document vector to a feature fusion layer, and obtaining a third target vector;
and inputting the third target vector to an output layer, and outputting the corresponding answer starting probability and answer ending probability by the output layer.
6. The method of claim 5, wherein the inputting the second target vector and the document vector into a feature fusion layer and obtaining a third target vector comprises:
inputting the second target vector and the document vector into a feature fusion layer, and calculating according to the following formula to obtain a third target vector:
v = m + x * σ(W4[x, m] + b4)

where v is the third target vector, W4 and b4 are trainable parameters, m is the second target vector, x is the document vector, and σ is the activation function.
7. The method for extracting answers based on the graph-convolution network as claimed in claim 5, wherein the inputting the third target vector to an output layer and outputting a corresponding answer start probability and answer end probability by the output layer comprises:
and respectively calculating the answer starting probability and the answer terminating probability according to the following formulas:
start=softmax(W5v+b5)
end=softmax(W6v+b6)
In the formula, W5, b5, W6 and b6 are parameters to be trained, v is the third target vector, start represents the answer start probability, and end represents the answer end probability.
8. An answer extraction device based on a graph convolution network, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a user question and a document containing the user question;
a construction unit for constructing a graph network for the document based on syntactic dependency analysis;
the fusion unit is used for fusing the document and the graph network by utilizing a graph convolution network to obtain a document vector and fusing the user question and the graph network to obtain a question vector;
the learning unit is used for learning the document vector and the question vector based on the graph convolution network and the attention mechanism so as to obtain answer starting probability and answer ending probability in the document;
and the answer extraction unit is used for respectively taking the answer starting probability and the answer ending probability as a starting index and an ending index of the document, and taking the text between the starting index and the ending index as the answer of the user question.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the answer extraction method based on a graph convolution network according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the answer extraction method based on a graph convolution network according to any one of claims 1 to 7.
CN202011577396.8A 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components Pending CN112632253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011577396.8A CN112632253A (en) 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011577396.8A CN112632253A (en) 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components

Publications (1)

Publication Number Publication Date
CN112632253A true CN112632253A (en) 2021-04-09

Family

ID=75326040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011577396.8A Pending CN112632253A (en) 2020-12-28 2020-12-28 Answer extraction method and device based on graph convolution network and related components

Country Status (1)

Country Link
CN (1) CN112632253A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326692A (en) * 2021-06-24 2021-08-31 四川启睿克科技有限公司 Machine reading understanding method and device considering syntax structure
CN113486174A (en) * 2021-06-15 2021-10-08 北京三快在线科技有限公司 Model training, reading understanding method and device, electronic equipment and storage medium
CN113535912A (en) * 2021-05-18 2021-10-22 北京邮电大学 Text association method based on graph convolution network and attention mechanism and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
CN109977199A (en) * 2019-01-14 2019-07-05 浙江大学 A kind of reading understanding method based on attention pond mechanism
US20190311064A1 (en) * 2018-04-07 2019-10-10 Microsoft Technology Licensing, Llc Intelligent question answering using machine reading comprehension
CN111046661A (en) * 2019-12-13 2020-04-21 浙江大学 Reading understanding method based on graph convolution network
CN111274800A (en) * 2020-01-19 2020-06-12 浙江大学 Inference type reading understanding method based on relational graph convolution network
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
US20190311064A1 (en) * 2018-04-07 2019-10-10 Microsoft Technology Licensing, Llc Intelligent question answering using machine reading comprehension
CN109977199A (en) * 2019-01-14 2019-07-05 浙江大学 A kind of reading understanding method based on attention pond mechanism
CN111046661A (en) * 2019-12-13 2020-04-21 浙江大学 Reading understanding method based on graph convolution network
CN111274800A (en) * 2020-01-19 2020-06-12 浙江大学 Inference type reading understanding method based on relational graph convolution network
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535912A (en) * 2021-05-18 2021-10-22 北京邮电大学 Text association method based on graph convolution network and attention mechanism and related equipment
CN113535912B (en) * 2021-05-18 2023-12-26 北京邮电大学 Text association method and related equipment based on graph rolling network and attention mechanism
CN113486174A (en) * 2021-06-15 2021-10-08 北京三快在线科技有限公司 Model training, reading understanding method and device, electronic equipment and storage medium
CN113326692A (en) * 2021-06-24 2021-08-31 四川启睿克科技有限公司 Machine reading understanding method and device considering syntax structure

Similar Documents

Publication Publication Date Title
CN108052588B (en) Method for constructing automatic document question-answering system based on convolutional neural network
CN111914067B (en) Chinese text matching method and system
CN112632253A (en) Answer extraction method and device based on graph convolution network and related components
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN110598207B (en) Word vector obtaining method and device and storage medium
CN111191002A (en) Neural code searching method and device based on hierarchical embedding
CN109145083B (en) Candidate answer selecting method based on deep learning
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
CN113821527A (en) Hash code generation method and device, computer equipment and storage medium
CN110737837B (en) Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform
CN116561251A (en) Natural language processing method
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
CN113010655B (en) Answer and interference item generation method and device for reading and understanding of machine
CN114036246A (en) Commodity map vectorization method and device, electronic equipment and storage medium
CN113761151A (en) Synonym mining method, synonym mining device, synonym question answering method, synonym question answering device, computer equipment and storage medium
Lyu et al. Deep learning for textual entailment recognition
Kajiwara et al. An iterative approach for the global estimation of sentence similarity
CN117034916A (en) Method, device and equipment for constructing word vector representation model and word vector representation
Reisi et al. Authorship attribution in historical and literary texts by a deep learning classifier
CN111291550A (en) Chinese entity extraction method and device
CN113128235A (en) Semantic understanding method
Karpagam et al. Deep learning approaches for answer selection in question answering system for conversation agents
Lee Natural Language Processing: A Textbook with Python Implementation
US11880664B2 (en) Identifying and transforming text difficult to understand by user
CN115809663A (en) Exercise analysis method, exercise analysis device, exercise analysis equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination