CN115564013B

CN115564013B - Method for improving learning representation capability of network representation, model training method and system

Info

Publication number: CN115564013B
Application number: CN202110908974.XA
Authority: CN
Inventors: 沈颖; 董晨鹤
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2021-08-09
Filing date: 2021-08-09
Publication date: 2024-02-09
Anticipated expiration: 2041-08-09
Also published as: CN115564013A

Abstract

The method for improving the learning representation capacity of the network representation, the model training method and the system provided by the invention acquire the set of each sub-graph according to the graph network; inputting the adjacency matrix and the feature matrix representing the subgraph into a graph convolution neural network to obtain an initial vector for network representation learning of the subgraph; according to the relation between the center node and each non-center node in the subgraph, calculating the attention weight between the center node and each non-center node; according to the initial vector and the attention weight between each non-central node, obtaining a node knowledge representation vector with weight after the sub-graph nodes and the relations are combined; adopting an attention mechanism to calculate and obtain the attention weight of the relation among the non-central nodes; and obtaining a weighted aggregate vector of the subgraph according to the node knowledge representation vector and the attention weights of the relations. The nodes are represented by the weighted aggregate vectors, so that node knowledge is embedded into the heterogeneous network with richer semantic information, and knowledge calculation and reasoning capacity of the heterogeneous network are improved.

Description

Method for improving learning representation capability of network representation, model training method and system

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method for improving the learning representation capacity of network representation, a model training method and a model training system.

Background

In recent years, with the growth of heterogeneous networks in real life, heterogeneous network data mining has received a great deal of attention in both academia and industry fields, and is widely applied to research works such as question-answering systems, recommendation systems, information retrieval, social networks, computer vision and the like. The efficient representation of network nodes is a key issue in performing network analysis tasks. In recent years, researchers have proposed network representation learning aimed at learning low-dimensional dense vector representations for all nodes in a high-dimensional sparse network, preserving rich network information in order to perform subsequent network analysis tasks. Existing research efforts have not fully explored the interaction information between nodes and neighborhood nodes and therefore ignore much of the critical information. Say, the relationship graph rolling network R-GCN model does not utilize graph attention mechanisms, and only learns node representations directly from weight parameters. The heterograph attention network HAN considers only the heterogeneous subgraphs contained in the predefined meta-paths, which is computationally complex and not optimal in identifying heterogeneous network components. Further, in the real world network, the nodes often have the characteristic of high-dimensional sparseness, face the problem of dimension disasters, and have limited capacity of node representation.

Disclosure of Invention

The invention provides a method for improving the representation capability of network representation learning, a model training method and a system, and aims to improve the representation capability of network representation learning.

According to a first aspect, in one embodiment, there is provided a method for improving learning ability of a network representation, including:

obtaining a set of sub-graphs of a graph network according to the graph network; the subgraph includes: a central node, at least one non-central node, and relationships between nodes; the subgraph is represented by adopting an adjacency matrix and a feature matrix; the node comprises an identification of the node and characteristics of the node;

for each sub-graph, inputting an adjacent matrix and a feature matrix representing the sub-graph into a graph convolution neural network to obtain an initial vector for network representation learning of the sub-graph;

according to the relation between the center node and each non-center node in the subgraph, calculating the attention weight between the center node and each non-center node;

according to the attention weight between the initial vector and each non-central node, obtaining a node knowledge representation vector with weight after the sub-graph nodes and the relations are combined;

aiming at the central node in each sub-graph, attention weight of the relation among all non-central nodes is calculated by adopting an attention mechanism;

and obtaining a weighted aggregate vector of the subgraph according to the node knowledge representation vector of the subgraph and the attention weights of the relations.

In the method, attention weight of the relation between each non-central node is calculated by adopting an attention mechanism, and the method comprises the following steps:

obtaining a relation pair group of each neighborhood node of the central node according to the subgraph, wherein the relation pair group of the neighborhood nodes comprises at least one relation pair; the number of relation pairs in the relation pair group of the neighborhood nodes is related to the number of the neighborhood nodes; the relation pair comprises a relation e between a central node and the neighborhood node and a relation e' between the neighborhood node and other non-central nodes;

calculating the similarity between the relationship e' and the relationship e of each relationship pair in each relationship pair group;

obtaining the attention weight of the relation between the neighborhood node and other non-central nodes according to the similarity of each relation pair of the relation pair group of the neighborhood node;

and obtaining the attention weight of the relationship between each non-central node according to the attention weight of the relationship between each neighborhood node and other non-central nodes.

In the method, before the adjacency matrix and the feature matrix representing the subgraph are input into the graph convolution neural network, the method further comprises:

and symmetrically normalizing the adjacency matrix.

According to a second aspect, in one embodiment there is provided a method of improving learning capabilities of a network representation, comprising:

and obtaining a node knowledge representation vector with weight after the sub-graph nodes and the relations are combined according to the attention weight between the initial vector and each non-central node.

According to a third aspect, in one embodiment, there is provided a method for improving learning ability of a network representation, including:

and obtaining a weighted aggregate vector of the subgraph according to the initial vector and the attention weights of the relations.

According to a fourth aspect, in one embodiment, there is provided a training method of a question-answering model, including:

acquiring a question-answer training set, wherein the question-answer training set comprises a plurality of question answer pairs;

converting the questions of the question answer pair into corresponding subgraphs, and converting the answers of the question answer pair into corresponding subgraphs; the subgraph includes: a central node, at least one non-central node, and relationships between nodes; the subgraph is represented by adopting an adjacency matrix and a feature matrix; the node comprises an identification of the node and characteristics of the node;

obtaining a weighted aggregate vector of the subgraph according to the node knowledge representation vector of the subgraph and the attention weights of all the relations;

and (5) inputting the weighted aggregate vector of the question subgraph and the weighted aggregate vector of the corresponding answer subgraph into a question-answer model for training.

In the method, the question-answering model is a convolutional neural network model.

According to a fifth aspect, in one embodiment there is provided a system for improving learning capabilities of a network representation, comprising:

a memory for storing a program;

and a processor for implementing the method as described above by executing the program stored in the memory.

According to a sixth aspect, in one embodiment, there is provided a training system for a question-answering model, including:

a memory for storing a program;

According to a seventh aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement a method as described above.

According to the method for improving the learning representation capability of the network representation, the model training method and the system, the collection of each sub-graph is obtained according to the graph network; the subgraph includes: a central node, at least one non-central node, and relationships between nodes; the subgraph is represented by an adjacency matrix and a feature matrix; the nodes comprise identifiers and characteristics of the nodes; for each sub-graph, inputting an adjacent matrix and a feature matrix representing the sub-graph into a graph convolution neural network to obtain an initial vector for network representation learning of the sub-graph; according to the relation between the center node and each non-center node in the subgraph, calculating the attention weight between the center node and each non-center node; according to the initial vector and the attention weight between each non-central node, obtaining a node knowledge representation vector with weight after the sub-graph nodes and the relations are combined; aiming at the central node in each sub-graph, attention weight of the relation among all non-central nodes is calculated by adopting an attention mechanism; and obtaining a weighted aggregate vector of the subgraph according to the node knowledge representation vector of the subgraph and the attention weights of the relations. The nodes are represented by the weighted aggregate vectors, so that node knowledge is embedded into the heterogeneous network with richer semantic information, and knowledge calculation and reasoning capacity of the heterogeneous network are improved.

Drawings

FIG. 1 is a flow chart of an embodiment of a method for improving learning representation capabilities of a network representation according to the present invention;

FIG. 2 is a flow chart of an embodiment of step 5 in FIG. 1;

FIG. 3 is a flow chart of another embodiment of a method for improving the learning representation capability of a network representation provided by the present invention;

FIG. 4 is a flowchart of an embodiment of a model training method according to the present invention;

FIG. 5 is a block diagram illustrating an embodiment of a system for improving learning ability of a network representation according to the present invention;

fig. 6 is a block diagram of an embodiment of a training system for question-answering models provided by the present invention.

Detailed Description

The invention will be described in further detail below with reference to the drawings by means of specific embodiments. Wherein like elements in different embodiments are numbered alike in association. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods in different situations. In some instances, some operations associated with the present application have not been shown or described in the specification to avoid obscuring the core portions of the present application, and may not be necessary for a person skilled in the art to describe in detail the relevant operations based on the description herein and the general knowledge of one skilled in the art.

Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.

The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The terms "coupled" and "connected," as used herein, are intended to encompass both direct and indirect coupling (coupling), unless otherwise indicated.

According to the method, knowledge calculation in the heterogeneous network is realized by designing a combination mechanism and an aggregation mechanism and deploying a heterogeneous graph network combination aggregation cooperative mechanism. High quality node representations are learned from heterogeneous networks by knowledge embedding methods and useful neighborhood knowledge is extracted. Furthermore, the knowledge representation calculation is effectively integrated into the deep learning model, so that the graph network has practical significance after being processed. The aggregation and combination mechanism can adaptively generate neighborhood information aware embedding for the nodes, and better learn interaction in the nodes and among the nodes. The following will illustrate by way of specific examples.

Given a heterogeneous network g= (V, E, X), V _i E V, V is the set of all nodes; e is the set of edges in the network; x is an attribute (i.e. feature), node v _i N for neighborhood set _i Representation, N _i ＝{v _j |e _ij ∈E}。

The invention may use the following concepts:

adjacent nodes: if node v _i And v _j Directly connected, i.e. e _ij Not equal to 0, consider node v _i And v _j Are adjacent nodes.

Neighbor nodes: if node v _i And v _j There is a path between and from node v _i Starting from, it is possible to reach node v after k hops along the path _j Where k > 1, then node v is considered _i And v _j Are neighbor nodes to each other.

Network representation learning: for a given network g= (V, E, X), the network representation learning is intended for each node V in the network _i Learning a low-dimensional dense real-valued vectorWherein->Mapping function f: v _i →y _i The method can simultaneously reserve the structural information and the characteristic information of the nodes in the network.

As shown in fig. 1, the method for improving learning ability of a network representation provided by the invention comprises the following steps:

and step 1, obtaining a set of all sub-graphs of the graph network according to the graph network. For example, all nodes of the graph network are respectively used as central nodes to obtain adjacent nodes and neighbor nodes of the central nodes, and one central node, the adjacent nodes and the neighbor nodes form a subgraph. The graph network, i.e. the heterogeneous network G, is known. The subgraph includes: a central node, at least one non-central node, and relationships between nodes; the relationship is also called an edge. The node includes an identification of the node and a characteristic of the node, i.e., an attribute of the node. The subgraphs are represented by adjacency matrices and feature matrices.

The core idea of the combination mechanism provided by the invention is to explore given nodes and the relation thereof in a sparse heterogeneous network, and generate node embedding with rich semantics according to the weighted combination of the relation embedding. The combined attention mechanism is used for normalizing the neighborhood information of the current node and realizing the adjustment of the neighborhood information. Taking news recommendation based on a question and answer system as an example, general male users watch more military news, and female users watch more entertainment news, and a certain correlation between two-dimensional characteristics of gender and news channels can be found. In the field of recommended advertisements, a second order combination feature can be constructed based on two feature sets of gender= { male, female } and news= { military, entertainment } typically with one-hot coding features: gender_news= { male_military, male_entertainment, female_military, female_entertainment }. The feature combination improves the expressive power of the model by constructing intersecting features. The combining mechanism is specifically set forth below by steps 2, 3 and 4.

And 2, inputting an adjacent matrix and a feature matrix representing the subgraph into the graph rolling neural network for each subgraph to obtain an initial vector for network representation learning of the subgraph, namely a node representation vector of an hidden layer matrix coded by the graph rolling neural network.

The node-relation information of the central node v of the subgraph needs to be combined. The embodiment selects the graph convolutional neural network GCN, mainly considers that the GCN has good performance, has a small number of parameters, can be realized through matrix multiplication, and has high calculation efficiency.

For example, for a subgraph, the present embodiment employs a graph convolutional neural network to learn a mapping function f (X, a) that takes as input the adjacency matrix and feature matrix (also referred to as attribute matrix) of the subgraph. The propagation formula between layers of the graph roll-up neural network is as follows:

wherein the method comprises the steps ofI is an identity matrix; a is an adjacent matrix, which itself represents the link relation between each node and the adjacent node, and the matrix is obtained after I is added>The graph rolling operation object comprises information of own nodes and adjacent nodes; />Is the degree matrix of the network,/>I.e. the degree matrix of the network of ii is the i and j node matrix +.>The diagonal element value is the degree of each vertex and the remaining elements are 0. To prevent gradient extinction and gradient explosion, a symmetric normalization operation was introduced:so that the sum of each row of the matrix is 1; w (W) ^(l) Is a trainable layer-1 weight matrix; h ^(l) Is an implicit state of the first layer, for input layer H ⁽⁽⁾⁾ =x; σ (·) represents a nonlinear activation function, e.g., RELU (·) =max (0, ·). By stacking multiple graph convolutional neural network layers, the relationship between a node and neighboring nodes that are multi-hop away can be captured.

Specifically, this embodiment is described by taking a two-layer graph convolutional neural network as an example, and before inputting the adjacency matrix and the feature matrix representing the subgraph into the graph convolutional neural network, the method further includes: the adjacency matrix is symmetrically normalized, and the specific formula is as follows:

where D is the degree matrix of the networkIs a matrix of the original matrix of (a).

Then the processed adjacency matrixAnd the feature matrix X is input into a two-layer graph convolution neural network. The forward propagation model is shown in the following formula:

(3) The method comprises the steps of carrying out a first treatment on the surface of the Equation (3) and equation (1) are essentially the same, and function is to obtain a more accurate vector.

Wherein W is ⁽⁰⁾ The e F x d is a first layer weight matrix, which is used to map the features of the node into a hidden layer state, that is, to map the features into vectors, and because the vectors are in the process of the neural network, not in the final state, or in the state of the hidden layer process; f represents the length of the sentence and d represents the dimension of the vector. W (W) ⁽¹⁾ ∈d ₀ ×d ₁ Weight matrix for second hidden layer, d ₀ Representing the vector dimension, d, of a first layer of graph convolutional neural network ₁ Representing the vector dimension of the second layer graph convolution neural network. After two layers of graph convolution, obtaining an hidden layer matrix H= [ H ] coded by a graph convolution network ₁ ，h ₂ ，...h _|V| ]。h ₁ ，h ₂ ，...h _|V| Respectively representing a layer 1 hidden layer vector, a layer 2 hidden layer vector, a layer … … and a layer V hidden layer vector, wherein V is self-definition, and if only two layers exist, only h1 and h2 exist.

And 3, calculating the attention weight between the center node and each non-center node according to the relation between the center node and each non-center node in the subgraph. Center sectionPoint v _i And a non-central node v _j The relation (edge) e of (a) is known, the non-central node v _j Centering node v _i Importance of (v) _j ；v _i ) Attention weight of (a)The method comprises the following steps:

assuming that the entire sub-graph has k nodes, thenIs the vector of all k nodes at the hidden layer of layer (l-1). It can be seen that the attention weight between the center node and each non-center node can be calculated by equation (4). For example, a central node v _i Is male user A, non-central node v _j Is military news, non-central node v _k Is sports news. v _i Watch 10 times of military news and 8 times of sports news, so military news v _j Is weighted 1, sports news v _k The weight of (2) is 0.8. The weight is determined according to the viewing frequency. The frequency, i.e., the number of times links occur between nodes, is a known quantity.

The present invention utilizes activation functions such as ELU () and ReLU () to handle node v _i And v _j Is a cascade node feature of (a). Wherein the method comprises the steps ofAnd->Respectively node v _i And v _j The hidden layer representation of the first-1 encoded through the graph rolling network.

Is asymmetric because of node v _j Opposite node v _i May be of different importance than node v _i Opposite node v _j Is of importance. For a given relationship e, all node pairs share attention weights, so each node is affected by its neighborhood context (neighbor node).

And 4, obtaining a node knowledge representation vector with weight after the sub-graph nodes and the relations are combined according to the initial vector and the attention weight between each non-center node. For example, attention is weightedSubstitution uses the latter half of equation (8), i.e., substitution equation: />The vector output after the combination of the nodes and the relations of the heterogeneous network can be obtained>The vector is node knowledge embedding with weight and has practical significance. The node knowledge representation vector can be used for representing the sub-graph, namely the central node of the sub-graph, and the node knowledge representation vector of all the sub-graphs can be used for representing the graph network, so that the network representation learning is realized.

Of course, the method is not limited to this, and the aggregation mechanism can be combined on the basis of the combination mechanism, so that the representation capability of the network representation learning can be further improved.

The aggregation mechanism is an important mechanism of the Graph Neural Network (GNN) method in the network representation learning field, and aims to finally obtain the implicit representation of the node by continuously carrying out nonlinear transformation, propagation and aggregation on the node characteristics from the local neighborhood. Although the graph neural network has been successful in many tasks, there is essentially a drawback: the neighborhood aggregation mechanism uses node original characteristics to aggregate. In real world networks, the characteristics of nodes tend to be sparse in high dimensions. For example, in a social network, nodes represent users and edges represent friends among users, typically a set of id-type category features (categorical feature) are used to describe user portraits, such as gender, occupation, educational level, etc., which are then converted to binary features (binary features) by one-hot encoding. In the reference network, nodes represent papers, edges between nodes represent reference relationships, and node text information is usually encoded by using a bag-of-words model (bag-of-words) or a TF-IDF model as node characteristics. The above coding modes lead node characteristics to be high-dimensional and sparse, and as the existing graph neural network method is not designed specifically for the sparsity of network characteristics, the expression capacity of the model is limited to the problem of dimension disasters caused by the sparsity to a great extent.

The existing model often combines a combination mechanism and an aggregation mechanism together, and the two heterogeneous information, namely the characteristic information and the structural information, are not well fused, have poor stability and are not high in universality.

The invention expands the multiplication attention of the transducer model to the heterogeneous network field so as to calculate the similarity degree of the relation e between any two nodes of the neighborhood, such as the nodes i and j. The following is a detailed description.

And 5, aiming at the central node in each sub-graph, calculating the attention weight of the relation among the non-central nodes by adopting an attention mechanism.

The attention weight of the relationship between the non-central nodes is calculated by adopting an attention mechanism, and the method shown in fig. 2 can be adopted:

step 51, outputting the vector after combining the central node i and the correlation thereof of the heterograph networkMapping to a learnable weight matrix W _1，e ，W _2，e ，W _3，e The query vector q of the transducer model of the node i can be obtained _e，i Key vector k _e，i Sum (value) vector v _e，i ：

The other calculation modes of the relation between any two nodes are the same as the above.

Obtaining a relation pair group of each neighborhood node of the center node according to the subgraph, wherein the relation pair group of the neighborhood nodes comprises at least one relation pair; the number of relation pairs in the relation pair group of the neighborhood nodes is related to the number of the neighborhood nodes; typically a domain node corresponds to a relationship pair. The relationship pair includes a relationship e between the center node and the neighborhood node, and a relationship e' between the neighborhood node and other non-center nodes. The neighborhood nodes include neighboring nodes and neighbor nodes.

And step 52, calculating the similarity between the relationship e' and the relationship e of each relationship pair in each relationship pair group.

Assuming that the relationship between the center node and the neighborhood nodes is e, and the relationship between the neighborhood nodes and other non-center nodes is e'. The importance of the relationship e 'of node i can be calculated by capturing the similarity of the pair of relationships (e, e') by the following equation (6).

Wherein the method comprises the steps ofTransformer query vector q, which is node i _e，i Is a transposed matrix of (a); k (k) _e′，i Is the resulting key vector given the relationship e' between the center node and the neighborhood nodes and other non-center nodes. The similarity of (e, e') can be regarded as a relational embedding.

The softmax (·) activation function may also be employed to normalize the relationship embedded Rel _att ：

And 53, obtaining the attention weight of the relation between the neighborhood node and other non-central nodes according to the similarity of each relation pair of the relation pair group of the neighborhood node. The more similar e ' and e, the greater the attention weight of e ' and the greater the contribution of e ' embedding to the final embedding of node i.

And step 54, obtaining the attention weight of the relation between each non-central node according to the attention weight of the relation between each neighborhood node and other non-central nodes.

And 6, representing the vector according to the node knowledge of the subgraph and the attention weight of each relation to obtain the weighted aggregate vector of the subgraph. For example, a weighted aggregation of nodes is achieved by a relational embedded summation function:

that is, aggregation operation is realized by summing the Transformer query vector, the key vector and the value vector (obtaining heterogeneous network characteristics), taking the hidden layer vector (obtaining node structure characteristics) into consideration, and summing after softmax normalization operation.

The combining operation and the aggregation operation are performed in the heterogeneous network at the same time, and more knowledge can be learned than when the model only learns the combining operation or the aggregation operation.

The deployed combining and aggregation operations act on objects ("node-relationships" and "relationship-relationships") in the heterogeneous network, capturing more heterogeneous network component information.

The embedded universal heterograph network aggregation combination mechanism of the node i in the (l+1) th layer is calculated by adopting the following formula:

wherein,is a vector output function for exploring neighborhood information interaction between nodes-relationships, which gives nodes i, j e N by using a combination operation COMBINE () _i Is embedded in the relationship.

Is a weighted relation vector output function after calculating relation-relation interaction. The aggregation operation AGGRERATE (·) considers neighborhood relationship similarities and aggregates them together to form a final node embedding.

The above embodiment provides a method for improving the learning ability of the network representation based on a combination mechanism and based on a combination aggregation collaboration mechanism for the graph network, and of course, the method for improving the learning ability of the network representation based on only the aggregation mechanism may also be implemented, as shown in fig. 3, including the following steps:

step 1', obtaining a set of sub-graphs of a graph network according to the graph network; the subgraph includes: a central node, at least one non-central node, and relationships between nodes; the subgraph is represented by adopting an adjacency matrix and a feature matrix; the node includes an identification of the node and a characteristic of the node. The present step is the same as step 1 in the above embodiment, and will not be described here again.

And 2', inputting an adjacency matrix and a feature matrix representing the subgraph into the graph convolution neural network for each subgraph to obtain an initial vector for network representation learning of the subgraph. Similarly, the present step is the same as step 2 in the above embodiment, and will not be described here again.

And 3', aiming at the central nodes in each subgraph, calculating the attention weight of the relation among the non-central nodes by adopting an attention mechanism. This step is the same as step 5 in the above embodiment, and will not be described here again.

And 4', obtaining a weighted aggregate vector of the subgraph according to the initial vector and the attention weights of the relations. For example, a weighted aggregation of nodes is achieved by a relational embedded summation function:

The embedded universal heterograph network aggregation mechanism of node i in the (l+1) th layer is calculated by adopting the following formula:

the present invention also provides a system for improving learning ability of a network representation, as shown in fig. 5, comprising a first memory 10 and a first processor 20. The first memory 10 stores a program, and the first processor 20 implements the method shown in steps 1 to 4 in fig. 1, the method shown in fig. 1, or the method shown in fig. 3 by executing the program stored in the first memory 10. The specific process is described in detail in the above method embodiments, and is not described herein.

The invention can also utilize the method for improving the learning ability of the network representation to carry out model training, taking a question-answer model as an example, the training method of the question-answer model provided by the invention is shown in figure 4, and comprises the following steps:

step 1", acquiring a question and answer training set, wherein the question and answer training set comprises a plurality of question answer pairs. The question-answer training set is known and prepared in advance, so that training of the deep learning model is facilitated. In this embodiment, the question-answer model is described by taking a convolutional neural network model as an example.

Step 2", converting the questions of the question answer pair into corresponding subgraphs, and converting the answers of the question answer pair into corresponding subgraphs; the subgraph includes: a central node, at least one non-central node, and relationships between nodes; the subgraph is represented by adopting an adjacency matrix and a feature matrix; the node includes an identification of the node and a characteristic of the node. The specific process is the same as the step 1 in the above embodiment, and will not be described herein.

And 3', inputting an adjacency matrix and a feature matrix representing the subgraph into the graph convolution neural network for each subgraph to obtain an initial vector for network representation learning of the subgraph. The specific process is the same as the step 2 in the above embodiment, and will not be described herein.

And 4', calculating the attention weight between the center node and each non-center node according to the relationship between the center node and each non-center node in the subgraph. The specific process is the same as the step 3 in the above embodiment, and will not be described here again.

And step 5', obtaining a node knowledge representation vector with weight after the sub-graph nodes and the relations are combined according to the initial vector and the attention weight between each non-center node. The specific process is the same as step 4 of the above embodiment, and will not be described herein.

And 6', aiming at the central node in each sub-graph, calculating the attention weight of the relation among the non-central nodes by adopting an attention mechanism. The specific process is the same as step 5 in the above embodiment, and will not be described here again.

And 7', representing the vector according to the node knowledge of the subgraph and the attention weight of each relation to obtain a weighted aggregate vector of the subgraph. The specific process is the same as step 6 of the above embodiment, and will not be described herein.

And 8', inputting the weighted aggregate vector of the question subgraph and the weighted aggregate vector of the corresponding answer subgraph into the question-answer model for training. For example, after obtaining the weighted aggregate vector of the question sub-graph and the weighted aggregate vector of the corresponding answer sub-graph, performing two classifications by a softmax function, and finally obtaining the correlation degree score of the question and the answer, thereby performing selection ranking.

The weighted aggregate vector of questions and answers is input to the final softmax layer for two classifications after passing through a full connection layer:

y＝softmax(W _o o+b _o )

wherein W is _o ，b _o To learn parameters, they are convolution kernels of the full-connection layer output by the convolutional neural network, i.e. weight matrix W of the full-connection layer _o And an offset matrix b _o 。

Then, the whole end-to-end model needs to be trained to minimize the cross entropy loss function:

where p is the output of the softmax layer, θ contains all parameters in the network that need to be trained,is an L2 regularization factor.

And finally, determining the correlation degree of the questions and each candidate answer according to the results of the two classifications, and sequencing.

The present invention also provides a training system for a question-answering model, as shown in fig. 6, comprising a second memory 10 'and a second processor 20'. The second memory 10' stores a program, and the second processor 20' implements the method shown in fig. 4 by executing the program stored in the second memory 10 '. The specific process is described in detail in the above method embodiments, and is not described herein.

In summary, the invention provides a method for optimizing network representation learning, a model training method and a system, which can effectively process node-relationship data interaction of heterogeneous networks. The model deploys a new combination mechanism, explores node-relation interaction in a sparse heterogeneous network, and learns node embedding by using weighted combination of neighborhood characteristics; a new aggregation mechanism is deployed, interaction and similarity among the relations are explored, and final node embedding is obtained through weighted aggregation of node embedding. The embodiment enables the learned target node knowledge embedding to have richer semantic information so as to improve knowledge calculation and reasoning capability of the heterogeneous network and improve the performance of the question-answering system. The method has certain reference significance for the same problems existing in other natural language processing tasks.

Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by a computer program. When all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a computer readable storage medium, and the storage medium may include: read-only memory, random access memory, magnetic disk, optical disk, hard disk, etc., and the program is executed by a computer to realize the above-mentioned functions. For example, the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above can be realized. In addition, when all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and the program in the above embodiments may be implemented by downloading or copying the program into a memory of a local device or updating a version of a system of the local device, and when the program in the memory is executed by a processor.

The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims

1. A method for training a question-answering model, comprising:

aiming at the central node in each sub-graph, attention weight of the relation among all non-central nodes is calculated by adopting an attention mechanism; the attention weight of the relationship between the non-central nodes obtained by calculation through an attention mechanism comprises the following steps: obtaining a relation pair group of each neighborhood node of the central node according to the subgraph, wherein the relation pair group of the neighborhood nodes comprises at least one relation pair; the number of relation pairs in the relation pair group of the neighborhood nodes is related to the number of the neighborhood nodes; the relation pair comprises a relation e between a central node and the neighborhood node and a relation e' between the neighborhood node and other non-central nodes; calculating the similarity between the relationship e' and the relationship e of each relationship pair in each relationship pair group; obtaining the attention weight of the relation between the neighborhood node and other non-central nodes according to the similarity of each relation pair of the relation pair group of the neighborhood node; obtaining the attention weight of the relation between each non-central node according to the attention weight of the relation between each neighborhood node and other non-central nodes; the more similar e ' and e, the greater the attention weight of e ', the greater the contribution of e ' embedding to the final embedding of the node;

2. The method of claim 1, wherein before inputting the adjacency matrix and the feature matrix representing the subgraph into the graph convolutional neural network, further comprising:

and symmetrically normalizing the adjacency matrix.

3. A method for training a question-answering model, comprising:

obtaining a weighted aggregate vector of the subgraph according to the initial vector and the attention weights of the relations;

4. A method as claimed in claim 1 or claim 3, wherein the question-answer model is a convolutional neural network model.

5. A training system for a question-answering model, comprising:

a memory for storing a program;

a processor for implementing the method according to any one of claims 1-4 by executing a program stored in said memory.

6. A computer readable storage medium having stored thereon a program executable by a processor to implement the method of any one of claims 1 to 4.