CN114970816A - Method and device for training a graph neural network - Google Patents

Method and device for training a graph neural network

Info

Publication number
CN114970816A
CN114970816A (application number CN202210551012.8A)
Authority
CN
China
Prior art keywords
neural network
graph
data
graph neural
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210551012.8A
Other languages
Chinese (zh)
Inventor
孙宝林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ant Blockchain Technology Shanghai Co Ltd
Original Assignee
Ant Blockchain Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ant Blockchain Technology Shanghai Co Ltd filed Critical Ant Blockchain Technology Shanghai Co Ltd
Priority to CN202210551012.8A priority Critical patent/CN114970816A/en
Publication of CN114970816A publication Critical patent/CN114970816A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method and a device for training a graph neural network, where the graph neural network is used for representation learning on graph structure data and includes at least two coding layers and dropout layers. The method includes: inputting node data of the graph structure data into a first coding layer and a first dropout layer of the graph neural network to obtain a first characterization vector; inputting the node data of the graph structure data into a second coding layer and a second dropout layer of the graph neural network to obtain a second characterization vector; calculating a loss value between the first characterization vector and the second characterization vector; and optimizing parameters of the graph neural network according to the loss value to obtain a first graph neural network model.

Description

Method and device for training a graph neural network
Technical Field
The application relates to the technical field of machine learning, and in particular to a method and a device for training a graph neural network.
Background
A graph neural network can perform representation learning on graph structure data, extract and mine the information in the graph structure data, and apply that information to downstream services such as data classification, data prediction and data generation.
When a graph neural network in the prior art performs representation learning on graph structure data, node information is wasted and the resulting node characterization capability is poor.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for training a graph neural network, so as to solve the problems of node information waste and poor node characterization capability when the graph neural network performs characterization learning on graph structure data.
In a first aspect, a method for training a graph neural network is provided, where the graph neural network is used for representation learning on graph structure data and includes at least two coding layers and dropout layers. The method includes: inputting node data of the graph structure data into a first coding layer and a first dropout layer of the graph neural network to obtain a first characterization vector; inputting the node data of the graph structure data into a second coding layer and a second dropout layer of the graph neural network to obtain a second characterization vector; calculating a loss value between the first characterization vector and the second characterization vector; and optimizing parameters of the graph neural network according to the loss value to obtain a first graph neural network model.
Optionally, calculating the loss value between the first characterization vector and the second characterization vector includes: calculating the loss value using a cross-entropy loss function or a relative-entropy loss function.
Optionally, the method further includes: training the first graph neural network model according to the label data of an application scenario to obtain a second graph neural network model applied to the application scenario.
Optionally, the method further includes: training the second graph neural network model according to the label data of service data to obtain a third graph neural network model applied to the service.
Optionally, the graph neural network is one of a graph convolutional neural network, a graph attention neural network, and an adaptive-receptive-path graph neural network (GeniePath).
Optionally, the graph structure data is knowledge-graph data.
In a second aspect, an apparatus for training a graph neural network is provided, where the graph neural network is used for representation learning on graph structure data and includes at least two coding layers and dropout layers. The apparatus includes: a first training module configured to input node data of the graph structure data into a first coding layer and a first dropout layer of the graph neural network to obtain a first characterization vector; a second training module configured to input the node data of the graph structure data into a second coding layer and a second dropout layer of the graph neural network to obtain a second characterization vector; a calculation module configured to calculate a loss value between the first characterization vector and the second characterization vector; and an optimization module configured to optimize parameters of the graph neural network according to the loss value to obtain a first graph neural network model.
Optionally, calculating the loss value between the first characterization vector and the second characterization vector includes: calculating the loss value using a cross-entropy loss function or a relative-entropy loss function.
Optionally, the apparatus further includes: a third training module configured to train the first graph neural network model according to the label data of an application scenario to obtain a second graph neural network model applied to the application scenario.
Optionally, the apparatus further includes: a fourth training module configured to train the second graph neural network model according to the label data of service data to obtain a third graph neural network model applied to the service.
Optionally, the graph neural network is one of a graph convolutional neural network, a graph attention neural network, and an adaptive-receptive-path graph neural network (GeniePath).
Optionally, the graph structure data is knowledge-graph data.
A third aspect provides an apparatus for training a graph neural network, including a memory and a processor, where executable code is stored in the memory and the processor is configured to execute the executable code to implement the method of the first aspect.
According to the method for training a graph neural network provided by the application, at least two different dropout layers are adopted so that representation learning on the graph structure data is performed with fewer neural units each time, yielding different characterization vectors for the same node data. The model parameters are then optimized so that these different characterization vectors tend to be consistent, which forces the graph neural network to mine the information of the node data deeply, increases the information content of the nodes, and improves the node characterization capability of the graph neural network.
Drawings
Fig. 1 is a schematic diagram of an enterprise knowledge graph according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a graph neural network provided in an embodiment of the present application.
Fig. 3 is a schematic flowchart of a method for training a graph neural network according to an embodiment of the present disclosure.
Fig. 4 is a schematic flowchart of another method for training a graph neural network according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a method for training a graph neural network according to an embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of an apparatus for training a graph neural network according to an embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of another apparatus for training a graph neural network according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below; obviously, the described embodiments are only some of the embodiments of the present application, not all of them.
With the continuous emergence of new internet applications such as social networks, the mobile internet and the internet of things, the data generated among different entities grows exponentially, and the internal dependence and complexity of the data increase. Generally, the relationships between different entities can be described in the form of graph structure data. Graph structure data can be a collection of nodes and edges, where a node represents a data object and an edge represents a relationship between data objects.
A knowledge graph is one kind of graph structure data. It expresses knowledge as nodes and connects pieces of knowledge together, so that risk diffusion, propagation and inference can be carried out from a wider viewing angle and the mutual influence between the knowledge of different nodes can be better grasped. A knowledge graph may be composed of a large number of triples (head entity, relationship, tail entity), where an entity may be a node in the graph structure data and a relationship may be an edge in the graph structure data. The entity nodes in the knowledge graph may be enterprises, products, individuals, places, organizations, events and so on, and the relationships in the knowledge graph may represent relationships between the entity nodes. As an example, a triple in a knowledge graph may be (Yao Ming, nationality, China), where Yao Ming and China are two different entity nodes and nationality is the relationship between the two entities. Each entity node in the knowledge graph may also have attribute data; for example, the attribute data of the entity node Yao Ming is "basketball player".
Knowledge graphs are widely applied in the financial field. In the field of enterprise risk control, knowledge graphs are widely used for associated-risk identification, shell-company identification and financial-risk identification, so as to identify and avoid the uncertainty and risk brought to a subject by changes of externally related nodes (including people, enterprises, industries, markets and the like).
As an example, fig. 1 is a schematic diagram of an enterprise knowledge graph provided in an embodiment of the present application. As can be seen from fig. 1, the enterprise knowledge graph may include 7 triples: (hotel, located in, New York), (hotel, near, Times Square), (hotel, belongs to, Hilton Group), (hotel, near, Virgil's Barbecue), (hotel, near, Julie's Cake Shop), (hotel, near, Broadway show) and (Times Square, located in, New York). The attribute of the entity nodes hotel and Hilton Group may be "organization", the attribute of the entity nodes Times Square and New York may be "place", the attribute of the entity nodes Virgil's Barbecue and Julie's Cake Shop may be "restaurant", and the attribute of the entity node Broadway show may be "event".
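Purely as an illustration of the data structures involved (not part of the patent text), the triples and node attributes of the enterprise knowledge graph in fig. 1 could be held in plain Python containers before being handed to a graph neural network; the relation names and spellings below follow the cleaned-up entity names used above and are assumptions:

# Hedged sketch: the fig. 1 enterprise knowledge graph as (head, relation, tail) triples
# plus per-node attribute labels. Names and relations are illustrative only.
triples = [
    ("hotel", "located in", "New York"),
    ("hotel", "near", "Times Square"),
    ("hotel", "belongs to", "Hilton Group"),
    ("hotel", "near", "Virgil's Barbecue"),
    ("hotel", "near", "Julie's Cake Shop"),
    ("hotel", "near", "Broadway show"),
    ("Times Square", "located in", "New York"),
]
node_attributes = {
    "hotel": "organization",
    "Hilton Group": "organization",
    "Times Square": "place",
    "New York": "place",
    "Virgil's Barbecue": "restaurant",
    "Julie's Cake Shop": "restaurant",
    "Broadway show": "event",
}
# Nodes are the union of heads and tails; edges are (head, tail) pairs.
nodes = sorted({h for h, _, t in triples} | {t for h, _, t in triples})
edges = [(h, t) for h, _, t in triples]
print(len(nodes), "nodes,", len(edges), "edges")   # 7 nodes, 7 edges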
Representation learning on graph structure data may also be referred to as graph embedding. It transforms the node data in the graph structure data into low-dimensional dense embedding vectors, giving a characterization vector for each node, and the obtained characterization vectors (embeddings) can be used for downstream tasks such as node classification, link prediction and visualization.
When characterizing graph structure data, the prior art may adopt manual feature extraction, cross-combination or tree-based feature extraction, deep learning techniques, graph neural networks, and the like. Manually extracted features may be, for example, hand-crafted enterprise features such as the age of the enterprise or whether certain information is missing. Extracting features of an enterprise knowledge graph manually has drawbacks: the number of extractable features is limited, the features cannot be exhaustive, the dimensionality is too high and the amount of effective information is small. The cross-combination technique crosses and combines different manually extracted features, and the tree-based technique extracts features from the knowledge graph with decision trees, for example with XGBoost-derived trees in some embodiments. Deep learning techniques may include deep factorization machines (DeepFM), convolutional neural networks (CNN), recurrent neural networks (RNN) and the like; however, when extracting enterprise features with these techniques, the interrelations between enterprises are not utilized and a wider receptive field is lacking. A graph neural network can vectorize the knowledge graph and perform downstream tasks according to the characterization vectors of the knowledge graph.
A graph neural network (GNN) uses a neural network to learn graph structure data, extracting and mining features and patterns in the graph structure data, and meets the requirements of graph learning tasks such as clustering, classification, prediction, segmentation and generation. The graph neural network can be a concrete algorithm model for putting a knowledge graph into practice: the nodes and knowledge of the knowledge graph are digitalized and converted into a deep graph model for calculation and propagation. Graph neural networks may include the graph convolutional network (GCN), the graph attention network (GAT), the graph neural network with adaptive receptive paths (GeniePath), and the like.
As mentioned above, a graph neural network may perform representation learning on graph structure data, that is, it may vectorize the node data in the graph structure and use the resulting vectors in downstream tasks. When learning representations of graph structure data, algorithms such as DeepWalk, node2vec, struc2vec, GCN and GAT can be adopted. DeepWalk can learn hidden information in the graph structure and represent each node in the graph as a vector containing latent information. The algorithm includes two parts, random walks and generation of representation vectors: some vertex sequences are first sampled from the graph structure using random walks, the generated vertex sequences are then treated as sentences composed of words, borrowing ideas from natural language processing, and finally each vertex is represented as a low-dimensional representation vector using a natural language processing tool. node2vec and struc2vec sample node data by random walks and then model the node data with word-vector methods to obtain characterization vectors of the node data. The graph convolutional network GCN is a method of deep-learning encoding using graph convolution; it combines feature information with structure and can better characterize the original information. The graph attention network aggregates neighbor nodes through an attention model, realizing adaptive assignment of different neighbor weights and taking more interaction information between nodes into account, thereby greatly improving the expressive capacity of the graph neural network model. Attention models can be used for text classification tasks with effects exceeding traditional CNNs, long short-term memory (LSTM) networks, and so on.
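As a rough sketch of the random-walk part of DeepWalk described above (the function name, walk length and toy graph are illustrative assumptions, and the subsequent word-vector modelling of the walks is omitted):

import random

def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    # Sample fixed-length random walks from an adjacency dict {node: [neighbors]}.
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Tiny undirected toy graph (illustrative only); each walk is later treated as a "sentence".
adjacency = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
for walk in random_walks(adjacency):
    print(walk)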
In the related art, supervised learning is mostly adopted for representation learning on graph structure data: the parameters of the learning model are adjusted with labeled training data to obtain an optimized learning model, which is then used for downstream tasks. When deep learning techniques and graph neural networks are used for representation learning on graph structure data, a supervised learning mode is mostly adopted and the model parameters are optimized through label data.
As an example, when supervised learning is adopted for the graph neural network to learn an enterprise knowledge graph, node data in the enterprise knowledge graph may be input into the graph neural network, a characterization vector of the node data is obtained through initialization, the characterization vector is compared with the label data of the node data, a loss value between the node characterization vector and the label data may be calculated, and the loss value is minimized by adjusting the graph neural network parameters, so as to train the graph neural network to better perform representation learning on the knowledge graph. However, when supervised learning is adopted for representation learning on graph structure data, the underlying knowledge expression of the nodes is not captured; the model is trained only from the top down through label data, which wastes the underlying information of the nodes. That is, supervised learning only uses the difference between a node and its label data to adjust the parameters of the learning model, and does not deeply mine the underlying information of the node.
In the related art, unsupervised learning can also be adopted for representation learning on graph structure data; it does not rely on manually labeled information, and the data itself can be directly used as the supervision signal.
As an example, node data in an enterprise knowledge graph may be input into a graph neural network, positive samples and negative samples of the initialized characterization vectors are generated, and learning compares the positive and negative samples so that the distance between similar examples in the feature space is reduced while the distance between dissimilar examples is increased, enlarging their difference. The model representation obtained through such a learning process may then execute downstream tasks and be fine-tuned on a smaller labeled data set, realizing an unsupervised model learning process. However, when the characterization vectors are generated initially, the positive and negative samples are generated randomly, and there is no deeper mining of the information of the nodes of the graph structure data, so the node characterization capability is poor; this causes a poor optimization effect for the learning model and thereby affects the execution of downstream tasks.
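The contrastive comparison of positive and negative samples described above can be written, for example, as an InfoNCE-style loss. The following is a minimal sketch of that prior-art idea under assumed dimensions and temperature; it is not the training method proposed by this application:

import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    # Pull the positive sample toward the anchor and push the negatives away.
    anchor = F.normalize(anchor, dim=-1)        # (d,)
    positive = F.normalize(positive, dim=-1)    # (d,)
    negatives = F.normalize(negatives, dim=-1)  # (k, d)
    pos_sim = torch.dot(anchor, positive) / temperature   # similarity to the positive
    neg_sim = negatives @ anchor / temperature             # similarities to the negatives
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim])    # positive pair sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

torch.manual_seed(0)
anchor = torch.randn(16)
positive = anchor + 0.1 * torch.randn(16)   # a perturbed view used as the positive sample
negatives = torch.randn(8, 16)              # randomly generated negative samples
print(info_nce_loss(anchor, positive, negatives).item())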
In summary, in the related art, whether supervised or unsupervised learning is used, there are problems of wasted node information and low node characterization capability. This is especially true in on-chain enterprise risk control, where the data on the chain covers a wide range of magnitudes and varieties, but the node information is too sparse and fragmented while containing a large amount of knowledge; the prior art does not effectively learn the large amount of effective knowledge in the node information and apply it to downstream tasks.
In view of this, the present application provides a method and an apparatus for training a graph neural network, so as to solve the problems of node information waste and poor node characterization capability when the graph neural network performs characterization learning on graph structure data.
The present application provides a method for training a graph neural network, where the graph neural network may be any one of the aforementioned GCN, GAT and GeniePath networks. The graph neural network is used for representation learning on graph structure data, such as a knowledge graph, for example the enterprise knowledge graph shown in fig. 1.
The graph neural network provided by the embodiments of the application may include at least two coding layers and dropout layers; fig. 2 shows a schematic structural diagram of such a graph neural network. The graph neural network shown in fig. 2 may include a plurality of coding layers and dropout layers, such as a first coding layer and a first dropout layer, a second coding layer and a second dropout layer, ..., up to an nth coding layer and an nth dropout layer.
The coding layer in the graph neural network provided by the embodiments of the present application may be any of various coding models. The coding model may be, for example, Bidirectional Encoder Representations from Transformers (BERT); in some embodiments, BERT may include multiple Transformer layers, and these Transformer layers may form the coding layer in the graph neural network. The coding model may also be, for example, an attention model; in some embodiments the attention model sits between the encoder and the decoder of an encoder-decoder architecture: weights are first calculated from the features of the encoder and the decoder, and the encoder features are then summed with these weights. The output of the attention model may serve as the output of a coding layer of the graph neural network in this application; in other words, the coding layer in this application may be an attention model.
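The weighted summation performed by such an attention model can be sketched as follows; the scaled dot-product form, the dimensions and the reuse of the encoder features as values are assumptions made for illustration:

import torch
import torch.nn.functional as F

def attention(query, keys, values):
    # Score the encoder features against a decoder-side query, then return
    # the weighted sum of the encoder features.
    d = query.shape[-1]
    scores = keys @ query / d ** 0.5      # one score per encoder position
    weights = F.softmax(scores, dim=-1)   # attention weights sum to 1
    return weights @ values               # weighted sum of encoder features

torch.manual_seed(0)
query = torch.randn(32)      # decoder-side feature (assumed dimension)
keys = torch.randn(10, 32)   # encoder-side features
values = keys                # simplest case: values are the encoder features themselves
print(attention(query, keys, values).shape)   # torch.Size([32])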
The dropout layer in the graph neural network provided by the embodiments of the application can be located in a computation layer of the graph neural network. Dropout is a technique for preventing model overfitting: during the training of a deep learning network, neural network units are temporarily and randomly discarded from the network with a certain probability, which gives the learning model higher robustness. Neural network units in one or more algorithm layers of the graph neural network may be temporarily dropped from the network, so that the graph neural network learns with fewer neural units. In some embodiments, a dropout layer may be located at every computation layer of the graph neural network; in other embodiments, the dropout layer may be located only in some computation layers, for example in a fully connected layer.
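A minimal illustration of the dropout behaviour described above, using PyTorch's nn.Dropout (the tensor shape and drop probability are arbitrary choices):

import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = torch.ones(1, 8)        # a toy hidden representation
dropout = nn.Dropout(p=0.5)      # each unit is dropped with probability 0.5

dropout.train()                  # dropout is only active in training mode
print(dropout(hidden))           # roughly half the units are zeroed, the rest scaled by 1/(1-p)

dropout.eval()                   # at inference time dropout is a no-op
print(dropout(hidden))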
Fig. 3 is a schematic flowchart of a method for training a graph neural network according to an embodiment of the present disclosure.
In step S310, the node data of the graph structure data is input into the first coding layer and the first dropout layer of the graph neural network to obtain a first characterization vector.
The graph structure data may be the aforementioned knowledge graph, which may be the enterprise knowledge graph shown in fig. 1. The node data "hotel, New York, Times Square, Hilton Group, Virgil's Barbecue, Julie's Cake Shop, Broadway show" in the knowledge graph shown in fig. 1 may be input into the first coding layer and the first dropout layer of the graph neural network.
The node data of the graph structure data is input into the first coding layer and the first dropout layer of the graph neural network to obtain a first characterization vector, and the first characterization vector can preserve the structure of the graph structure data. As an example, inputting the node data of the graph structure data into the first coding layer of the graph neural network may yield a first coding vector; the first dropout layer then performs representation learning on the first coding vector and outputs the first characterization vector of the node data. The first characterization vector may include node information of the graph structure data and similarity information of neighboring nodes.
Performing representation learning on the first coding vector with the first dropout layer randomly hides part of the neural units in the graph neural network, so that fewer neural units perform representation learning on the first coding vector, yielding the first characterization vector of the node.
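A simplified sketch of such a coding-plus-dropout pass is shown below; the layer structure (GraphEncoderLayer), the normalized-adjacency aggregation, the feature dimensions and the dropout rate are illustrative assumptions rather than the patent's concrete architecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEncoderLayer(nn.Module):
    # A simplified coding layer: aggregate neighbour features with a normalized
    # adjacency matrix, apply a linear transform, then pass through a dropout layer.
    def __init__(self, in_dim, out_dim, dropout_p):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, x, adj_norm):
        h = F.relu(self.linear(adj_norm @ x))   # neighbourhood aggregation + encoding
        return self.dropout(h)                  # characterization vector for each node

# Toy data: 7 nodes (as in fig. 1) with 16-dimensional initial node features.
torch.manual_seed(0)
x = torch.randn(7, 16)
adj = torch.eye(7)                              # placeholder adjacency; real edges come from the graph
adj_norm = adj / adj.sum(dim=1, keepdim=True)

layer = GraphEncoderLayer(16, 32, dropout_p=0.1)
z1 = layer(x, adj_norm)                         # a "first characterization vector" per node
print(z1.shape)                                 # torch.Size([7, 32])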
In step S320, the node data of the graph structure data is input into the second coding layer and the second dropout layer of the graph neural network to obtain a second characterization vector.
In some embodiments, the first coding layer and the second coding layer may be the same coding layer. The second dropout layer and the first dropout layer may be different dropout layers; specifically, as one example, the first dropout layer and the second dropout layer may be different operator layers in the graph neural network, and as another example, they may be the same operator layer in the graph neural network but with different probabilities of randomly discarding neural units.
In step S330, a loss value between the first characterization vector and the second characterization vector is calculated.
The loss value between the first characterization vector and the second characterization vector can be calculated using a loss function, and the similarity of the two vectors can be measured by calculating this loss value. Cross entropy is a good measure of the difference between the distributions of the two characterization vectors, so the loss function may be, for example, a cross-entropy function, or, for example, a relative-entropy function based on cross entropy that calculates the KL divergence between the first characterization vector and the second characterization vector.
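One possible form of the relative-entropy (KL-divergence) loss between the two characterization vectors is sketched below; the symmetric formulation over softmax-normalized vectors is an assumption, since the text above only requires a cross-entropy or relative-entropy loss function:

import torch
import torch.nn.functional as F

def symmetric_kl(z1, z2):
    # Relative-entropy loss between two characterization vectors (one row per node),
    # computed over softmax-normalized distributions and symmetrized.
    p1, p2 = F.softmax(z1, dim=-1), F.softmax(z2, dim=-1)
    log_p1, log_p2 = F.log_softmax(z1, dim=-1), F.log_softmax(z2, dim=-1)
    kl_12 = F.kl_div(log_p2, p1, reduction="batchmean")   # KL(p1 || p2)
    kl_21 = F.kl_div(log_p1, p2, reduction="batchmean")   # KL(p2 || p1)
    return 0.5 * (kl_12 + kl_21)

torch.manual_seed(0)
z1 = torch.randn(7, 32)   # first characterization vectors
z2 = torch.randn(7, 32)   # second characterization vectors
print(symmetric_kl(z1, z2).item())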
In step S340, the parameters of the graph neural network are optimized according to the loss value between the first characterization vector and the second characterization vector, so as to obtain a first graph neural network model.
The loss value between the first characterization vector and the second characterization vector measures the similarity of the two vectors; when training the graph neural network, the smaller the loss value, the more similar the two characterization vectors.
The parameters of the graph neural network can be optimized according to the loss value; for example, the parameters of the coding layers and the dropout layers can be optimized according to the loss value. After this parameter tuning, a first graph neural network model meeting the requirements can be obtained; in some embodiments, an optimization iterator can stop the model optimization according to a convergence condition, so as to obtain the optimized first graph neural network model.
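A hedged sketch of such an optimization loop is given below; it reuses the GraphEncoderLayer-style model and the symmetric_kl loss from the earlier sketches (both assumptions) and uses a simple loss-change threshold as the convergence condition:

import torch

def pretrain(model, x, adj_norm, loss_fn, max_epochs=200, tol=1e-5):
    # Minimize the loss between the two characterization vectors obtained from
    # two forward passes with independent dropout masks (step S340).
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    previous = float("inf")
    for _ in range(max_epochs):
        optimizer.zero_grad()
        z1 = model(x, adj_norm)            # first pass, first dropout mask
        z2 = model(x, adj_norm)            # second pass, independent dropout mask
        loss = loss_fn(z1, z2)
        loss.backward()
        optimizer.step()                   # optimize coding-layer and dropout-layer parameters
        if abs(previous - loss.item()) < tol:   # simple convergence condition
            break
        previous = loss.item()
    return model                           # the "first graph neural network model"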
In some embodiments, the graph neural network may include more than two coding layers and dropout layers; the node data of the graph structure data may be input into the multiple coding layers and dropout layers respectively to obtain multiple characterization vectors, the loss values among the multiple characterization vectors may be calculated, and the learning model may be optimized based on these loss values.
Therefore, according to the method for training a graph neural network provided by the embodiments of the application, at least two different dropout layers are adopted so that representation learning on the graph structure data is performed with fewer neural units each time, producing different characterization vectors for the same node data. The model parameters of the graph neural network are then optimized so that these different characterization vectors tend to be consistent, which forces the graph neural network to mine the information of the node data deeply, obtains higher-dimensional characterization information for each node, increases the knowledge content of the nodes, and improves the node characterization capability of the graph neural network.
The graph neural network may be trained with existing graph structure data; for example, graph structure data may be obtained from a graph database for model training, or graph structure data may be constructed before training. In some embodiments, graph structure data may be constructed using data on a blockchain. In a blockchain scenario, because the blockchain is trusted, interconnected and secure, it can connect the huge data of different enterprises, and these data cover a wide range of magnitudes and varieties. For example, information-rich graph structure data can be constructed in combination with an enterprise knowledge graph in a risk-brain system, where each node in the graph structure data can be one of a product, an individual, a place, an organization and an event.
In some embodiments, the first graph neural network model may be trained according to the label data of the application scenario, resulting in the second graph neural network model applied to the scenario.
The tag data of the application scenario may be attribute data of the node data in the graph structure data. As an example, in an enterprise knowledge graph, the tag data of an application scenario may be, for example, one of a product, an individual, a place, an organization, an event. As another example, in a knowledge graph in the financial field, the tag data of an application scenario may be a legal person, an asset, a company, and the like. As yet another example, in a knowledge graph in the medical field, the tag data of the application scenario may be a disease, a department, a part, and the like.
In the first graph neural network model obtained through the above training, the node information in the graph structure data is encoded and dropped out in different ways to obtain different characterization vectors, the loss value between the different characterization vectors is calculated, and the parameters of the learning model are optimized according to the loss value, so that each node in the graph has stronger characterization capability and aggregates more underlying knowledge; a general-purpose first graph neural network model can thus be obtained. A method for training the second graph neural network model applied to a scenario is described below with reference to fig. 4 and the enterprise knowledge graph shown in fig. 1.
Fig. 4 is a schematic flowchart of another method for training a graph neural network according to an embodiment of the present disclosure. After the node data in the knowledge graph shown in fig. 1 has been input into the graph neural network to obtain the characterization vectors of the node data, the graph neural network can be further trained according to the label data of the application scenario.
In step S410, a first cross entropy between the first characterization vector of a node and the label data of the node is calculated.
As an example, a first cross entropy between the first characterization vector of the node data "hotel" shown in fig. 1 and the node label data "organization" may be calculated; as another example, a first cross entropy between the first characterization vector of the node data "Julie's Cake Shop" and the node label data "restaurant" may be calculated.
In step S420, a second cross entropy between the second characterization vector of the node and the label data of the node is calculated.
As an example, a second cross entropy between the second characterization vector of the node data "hotel" shown in fig. 1 and the node label data "organization" may be calculated; as another example, a second cross entropy between the second characterization vector of the node data "Julie's Cake Shop" and the node label data "restaurant" may be calculated.
In steps S410 and S420, a cross-entropy function may be used to calculate the first cross entropy and the second cross entropy. Denoting the node data by x_i, its label data by y_i, and the output distributions obtained through the first and second coding/dropout passes by P_1 and P_2, the cross-entropy functions may be written as follows:

L_i^(CE1) = -log P_1(y_i | x_i)

L_i^(CE2) = -log P_2(y_i | x_i)
in step S430, the relative entropy of the first cross entropy and the second cross entropy is calculated.
In some embodiments, the KL divergence between the two output distributions underlying the first cross entropy and the second cross entropy may be calculated, so that the graph neural network can be optimized to make the outputs of the different dropout layers as consistent as possible.
In some embodiments, the relative entropy may be calculated using a relative-entropy function, which may be written as follows:

L_i^(KL) = 1/2 · [ KL( P_1(y_i | x_i) || P_2(y_i | x_i) ) + KL( P_2(y_i | x_i) || P_1(y_i | x_i) ) ]
in step S440, parameters of the first graph neural network model are optimized according to the relative entropy, and a second graph neural network model of the application scenario is obtained.
The parameters of the graph neural network can be adjusted according to the relative entropy of the first cross entropy and the second cross entropy so that the relative entropy becomes as small as possible, yielding the second graph neural network model applied to the scenario. After this training, a characterization vector can be obtained for each node, for example a 128-dimensional vector for node A.
In some embodiments, the model parameters may be optimized according to the total loss value of the graph neural network, and the contributions to the loss may be dynamically weighted, resulting in stable model performance. As an example, the total loss value of the graph neural network may combine the first cross entropy, the second cross entropy and the relative entropy, and may be calculated by the following formulas:
L_i = L_i^(CE) + a · L_i^(KL)

L_i^(CE) = L_i^(CE1) + L_i^(CE2)
Fig. 5 shows another method for training a graph neural network according to an embodiment of the present disclosure. As shown in fig. 5, when training the graph neural network, the loss value loss3 between the first characterization vector and the second characterization vector may first be calculated, and the graph neural network model may be optimized according to loss3 to obtain the first graph neural network model. Then the loss value loss1 between the first characterization vector of the node data and the label data, the loss value loss2 between the second characterization vector of the node data and the label data, and the KL divergence between loss1 and loss2 are calculated to obtain the second graph neural network model adapted to the scenario. In this way, the separate training stages can use more data for training, and the model structure is more stable.
After the second graph neural network model adapted to the scenario is obtained, the idea of transfer learning can be adopted: the second graph neural network model is trained according to the label data of the service data, so as to fit the output required by the service scenario and obtain a third graph neural network model applied to the service. Transfer learning is a machine learning method that reuses a pre-trained model in another task, and can include a model pre-training phase (pre-train phase) and a model fine-tuning phase (fine-tune phase). The pre-train phase can be a self-supervised training process in which the underlying logical knowledge is learned and continuously optimized through proxy tasks; the fine-tune phase adjusts the model to adapt to the service according to the service labels.
In this application, the stage of training the first graph neural network model and the second graph neural network model may correspond to the model pre-training phase in transfer learning, and fine-tuning the second graph neural network model according to the service label data to adapt to different service scenarios, obtaining the third graph neural network model, may correspond to the model fine-tuning phase in transfer learning.
The service label data may be labels attached to the service data when the graph neural network is put to use. As an example, the service label may be "default"; in this service scenario, the graph neural network judges whether a node in the graph structure data is in default.
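The fine-tune phase for such a service can be sketched as follows; the class FineTunedModel, the reuse of a pre-trained encoder from the earlier sketches (pretrained_encoder), the two learning rates and the binary default / non-default labels are all illustrative assumptions rather than the patent's concrete implementation:

import torch
import torch.nn as nn

class FineTunedModel(nn.Module):
    # Reuse the pre-trained encoder and attach a small service-specific head.
    def __init__(self, pretrained_encoder, hidden_dim=32, num_classes=2):
        super().__init__()
        self.encoder = pretrained_encoder
        self.head = nn.Linear(hidden_dim, num_classes)   # e.g. default / non-default

    def forward(self, x, adj_norm):
        return self.head(self.encoder(x, adj_norm))

def fine_tune(model, x, adj_norm, service_labels, epochs=50):
    optimizer = torch.optim.Adam([
        {"params": model.encoder.parameters(), "lr": 1e-4},   # gently adjust pre-trained weights
        {"params": model.head.parameters(), "lr": 1e-3},      # train the new head faster
    ])
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(x, adj_norm)
        loss_fn(logits, service_labels).backward()
        optimizer.step()
    return model   # the "third graph neural network model" applied to the service

In this sketch the pre-trained encoder is updated with a smaller learning rate than the newly added service head, which is one common way to realize the fine-tune phase described above.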
Therefore, by adopting the idea of transfer learning, a two-stage training task is constructed: in the model pre-training stage the nodes are characterized and learned more fully and the underlying knowledge expression is optimized, while in the model fine-tuning stage the model is optimized for the specific task according to the service labels, improving the model effect. In the model pre-training stage, different dropout transformations are applied to the node information and the knowledge system of the nodes is continuously learned, so that deep characterization information can be effectively extracted and the effect of downstream tasks is greatly improved.
In some embodiments, the trained graph neural network model may be deployed on a blockchain; for example, the trained third graph neural network model may be deployed on the blockchain. Because the trained graph neural network model can mine the underlying information of the node data in the graph structure data, fewer data samples are needed for representation learning and for application to downstream services, which reduces the pressure on blockchain storage and communication bandwidth and improves the on-chain prediction speed. Meanwhile, the trained graph neural network model can effectively use the varied and information-dense data available in the blockchain scenario, obviously improving the effect of downstream tasks.
Method embodiments of the present application are described in detail above in conjunction with fig. 1-5, and apparatus embodiments of the present application are described in detail below in conjunction with fig. 6 and 7. It is to be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments, and therefore reference may be made to the preceding method embodiments for parts which are not described in detail.
Fig. 6 is a schematic structural diagram of an apparatus for training a graph neural network provided in an embodiment of the present application, where the graph neural network is used for representation learning on graph structure data and includes at least two coding layers and dropout layers. The apparatus 600 for training a graph neural network shown in fig. 6 may include:
a first training module 610, configured to input node data of the graph structure data into a first coding layer and a first dropout layer of the graph neural network, so as to obtain a first characterization vector;
a second training module 620, configured to input node data of the graph structure data into a second coding layer and a second dropout layer of the graph neural network, so as to obtain a second characterization vector;
a calculation module 630 configured to calculate a loss value between the first characterization vector and the second characterization vector;
and an optimization module 640 configured to optimize parameters of the graph neural network according to the loss value between the first characterization vector and the second characterization vector to obtain a first graph neural network model.
Optionally, calculating the loss value between the first characterization vector and the second characterization vector includes: calculating the loss value using a cross-entropy loss function or a relative-entropy loss function.
Optionally, the apparatus further includes: a third training module 650 configured to train the first graph neural network model according to the label data of an application scenario to obtain a second graph neural network model applied to the application scenario.
Optionally, the apparatus further includes: a fourth training module 660 configured to train the second graph neural network model according to the label data of service data to obtain a third graph neural network model applied to the service.
Optionally, the graph neural network is one of a graph convolutional neural network, a graph attention neural network, and an adaptive-receptive-path graph neural network (GeniePath).
Optionally, the graph structure data is knowledge-graph data.
Fig. 7 is a schematic structural diagram of another apparatus for training a graph neural network according to an embodiment of the present application. The apparatus 700 shown in fig. 7 may include a memory 710 and a processor 720, where the memory 710 may be used to store executable code. The processor 720 may be configured to execute the executable code stored in the memory 710 to implement the steps of the methods described above. In some embodiments, the apparatus 700 may further include a network interface 730, through which data exchange between the processor 720 and external devices may be implemented.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware or any other combination. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a Digital Video Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of training a graph neural network, wherein the graph neural network is used for representation learning on graph structure data and comprises at least two coding layers and dropout layers, the method comprising:
inputting node data of the graph structure data into a first coding layer and a first dropout layer of the graph neural network to obtain a first characterization vector;
inputting node data of the graph structure data into a second coding layer and a second dropout layer of the graph neural network to obtain a second characterization vector;
calculating a loss value between the first characterization vector and the second characterization vector;
and optimizing parameters of the graph neural network according to the loss value between the first characterization vector and the second characterization vector to obtain a first graph neural network model.
2. The method of claim 1, wherein calculating the loss value between the first characterization vector and the second characterization vector comprises:
calculating the loss value using a cross-entropy loss function or a relative-entropy loss function.
3. The method of claim 1, further comprising:
and training the first graph neural network model according to the label data of an application scenario to obtain a second graph neural network model applied to the application scenario.
4. The method of claim 3, further comprising:
and training the second graph neural network model according to the label data of the service data to obtain a third graph neural network model applied to the service.
5. The method of claim 1, wherein the graph neural network is one of a graph convolutional neural network, a graph attention neural network, and an adaptive-receptive-path graph neural network.
6. The method of claim 1, the graph structure data being knowledge-graph data.
7. An apparatus for training a graph neural network, wherein the graph neural network is used for representation learning on graph structure data and comprises at least two coding layers and dropout layers, the apparatus comprising:
the first training module is configured to input node data of the graph structure data into a first coding layer and a first dropout layer of the graph neural network to obtain a first characterization vector;
the second training module is configured to input node data of the graph structure data into a second coding layer and a second dropout layer of the graph neural network to obtain a second characterization vector;
a calculation module configured to calculate a loss value between the first characterization vector and the second characterization vector;
and an optimization module configured to optimize parameters of the graph neural network according to the loss value between the first characterization vector and the second characterization vector to obtain a first graph neural network model.
8. The apparatus of claim 7, wherein calculating the loss value between the first characterization vector and the second characterization vector comprises:
calculating the loss value using a cross-entropy loss function or a relative-entropy loss function.
9. The apparatus of claim 7, the apparatus further comprising:
and a third training module configured to train the first graph neural network model according to the label data of an application scenario to obtain a second graph neural network model applied to the application scenario.
10. The apparatus of claim 9, the apparatus further comprising:
and the fourth training module is configured to train the second graph neural network model according to the label data of the service data to obtain a third graph neural network model applied to the service.
11. The apparatus of claim 7, wherein the graph neural network is one of a graph convolutional neural network, a graph attention neural network, and an adaptive-receptive-path graph neural network.
12. The apparatus of claim 7, wherein the graph structure data is knowledge-graph data.
13. An apparatus to train a graph neural network, comprising a memory having executable code stored therein and a processor configured to execute the executable code to implement the method of any of claims 1-6.
CN202210551012.8A 2022-05-20 2022-05-20 Method and device for training neural network of graph Pending CN114970816A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551012.8A CN114970816A (en) 2022-05-20 2022-05-20 Method and device for training neural network of graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210551012.8A CN114970816A (en) 2022-05-20 2022-05-20 Method and device for training neural network of graph

Publications (1)

Publication Number Publication Date
CN114970816A true CN114970816A (en) 2022-08-30

Family

ID=82985370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551012.8A Pending CN114970816A (en) 2022-05-20 2022-05-20 Method and device for training neural network of graph

Country Status (1)

Country Link
CN (1) CN114970816A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091208A (en) * 2023-01-16 2023-05-09 张一超 Credit risk enterprise identification method and device based on graph neural network
CN116186824A (en) * 2022-11-29 2023-05-30 清华大学 Building structure arrangement method based on image embedded graph neural network model

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186824A (en) * 2022-11-29 2023-05-30 清华大学 Building structure arrangement method based on image embedded graph neural network model
CN116186824B (en) * 2022-11-29 2023-09-12 清华大学 Building structure arrangement method based on image embedded graph neural network model
CN116091208A (en) * 2023-01-16 2023-05-09 张一超 Credit risk enterprise identification method and device based on graph neural network
CN116091208B (en) * 2023-01-16 2023-10-27 张一超 Credit risk enterprise identification method and device based on graph neural network

Similar Documents

Publication Publication Date Title
CN113515770B (en) Method and device for determining target service model based on privacy protection
CN111695415B (en) Image recognition method and related equipment
CN112508085B (en) Social network link prediction method based on perceptual neural network
CN111753024B (en) Multi-source heterogeneous data entity alignment method oriented to public safety field
CN114970816A (en) Method and device for training neural network of graph
CN112215604B (en) Method and device for identifying transaction mutual-party relationship information
Hii et al. Multigap: Multi-pooled inception network with text augmentation for aesthetic prediction of photographs
CN112364976A (en) User preference prediction method based on session recommendation system
CN112580328A (en) Event information extraction method and device, storage medium and electronic equipment
CN115994226B (en) Clustering model training system and method based on federal learning
CN113378160A (en) Graph neural network model defense method and device based on generative confrontation network
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
US20230281247A1 (en) Video retrieval method and apparatus using vectorizing segmented videos
CN111026852B (en) Financial event-oriented hybrid causal relationship discovery method
Zhang et al. An intrusion detection method based on stacked sparse autoencoder and improved gaussian mixture model
CN112527959B (en) News classification method based on pooling convolution embedding and attention distribution neural network
He et al. Classification of metro facilities with deep neural networks
US20230306056A1 (en) Video retrieval method and apparatus using vectorized segmented videos based on key frame detection
CN116361643A (en) Model training method for realizing object recommendation, object recommendation method and related device
WO2023055614A1 (en) Embedding compression for efficient representation learning in graph
CN114596464A (en) Multi-feature interactive unsupervised target detection method and system, electronic device and readable storage medium
KR102670850B1 (en) Method for searching video based on video segmentation
CN110119465A (en) Merge the mobile phone application user preferences search method of LFM latent factor and SVD
CN114781471B (en) Entity record matching method and system
CN110674257B (en) Method for evaluating authenticity of text information in network space

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination