WO2023178793A1 - Training method, apparatus, device and medium for dual-view graph neural network model - Google Patents

Training method, apparatus, device and medium for dual-view graph neural network model Download PDF

Info

Publication number
WO2023178793A1
WO2023178793A1 · PCT/CN2022/090086 · CN2022090086W
Authority
WO
WIPO (PCT)
Prior art keywords
node
graph data
dual
graph
neural network
Prior art date
Application number
PCT/CN2022/090086
Other languages
English (en)
French (fr)
Inventor
王俊
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2023178793A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present application relates to the field of artificial intelligence technology, and in particular to a training method, apparatus, device and medium for a dual-view graph neural network model.
  • Graph data is a data structure, and graph neural networks are the branch of deep learning that operates on graph structures.
  • Common graph data includes nodes and edges, where nodes contain entity information and edges contain relationship information between entities.
  • Many learning tasks now require processing graph data, such as physical-system modeling, social-network analysis, traffic-network analysis, protein-structure prediction and molecular-property prediction; all of these require models that can learn relevant knowledge from graph-data input, which is why graph neural network models emerged.
  • when the model needs to learn highly transferable, reusable and common knowledge, an approach that focuses only on node information means that graph data, as used in the model, mainly exploits node features and ignores the information on edges, leaving the model's transferability and generalization capability insufficient.
  • the purpose of this application is to provide a training method, apparatus, device and medium for a dual-view graph neural network model, so as to solve the technical problem that when graph data is used in a model, node features are usually exploited while edge information is ignored, leaving the model's transferability and generalization capability insufficient.
  • the first aspect provides a training method for a dual-view graph neural network model, including:
  • obtaining multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
  • exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
  • inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
  • inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
  • weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
  • the second aspect provides a training apparatus for a dual-view graph neural network model, including:
  • a graph data acquisition module is used to obtain multiple graph data with labels, the graph data includes nodes, edges and attribute information, and construct a node feature matrix and a node adjacency matrix based on the attribute information;
  • the exchanged graph data acquisition module is used to exchange the positions of nodes and edges in the graph data, obtain the exchanged graph data, and construct an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
  • the first judgment result acquisition module is used to input the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
  • the second judgment result acquisition module is used to input the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
  • a model training module configured to weight the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and based on the judgment result of the graph data and the label of the graph data , jointly train the node perspective network and the edge perspective network, update the parameters of the dual-view graph neural network model to be trained, and obtain a trained dual-view graph neural network model.
  • the third aspect provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above training method for a dual-view graph neural network model are implemented.
  • the fourth aspect provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the above training method for a dual-view graph neural network model are implemented.
  • the training method, system, device and medium for the dual-view graph neural network model of this application use contrastive learning over complementary node and edge views to train the graph neural network.
  • by inputting the node feature matrix and the node adjacency matrix into the node perspective network and the edge feature matrix and the edge adjacency matrix into the edge perspective network, the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, and finally the trained dual-view graph neural network model is obtained.
  • following the idea of contrastive learning, positive and negative samples are constructed from the input data, and the model is made to discriminate between them in the implicit representation space, making full use of the feature information of nodes and edges and giving the model better transferability.
  • the obtained model has good generalization.
  • the trained dual-view graph neural network model can be directly used for fine-tuning, thereby avoiding the need to train a new model from scratch for each downstream task.
  • the proposed dual-view self-supervised training strategy for graph neural networks can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type.
  • the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment.
  • Figure 1 shows a schematic diagram of an application environment of a training method for a dual-view graph neural network model in an embodiment of the present application
  • Figure 2 shows a schematic flow chart of a training method for a dual-view graph neural network model in an embodiment of the present application
  • FIG. 3 shows a schematic flow chart of step S30 in an embodiment of the present application
  • Figure 4 shows a schematic flow chart of step S31 in an embodiment of the present application
  • Figure 5 shows a schematic flow chart of step S40 in an embodiment of the present application
  • Figure 6 shows a structural block diagram of a training apparatus for a dual-view graph neural network model in an embodiment of the present application
  • Figure 7 is a schematic structural diagram of a computer device in an embodiment of the present application.
  • Figure 8 is another schematic structural diagram of a computer device in an embodiment of the present application.
  • the training method of the dual-view graph neural network model provided by the embodiment of the present application can be applied in the application environment as shown in Figure 1, in which the client communicates with the server through the network.
  • the server trains the graph neural network through contrastive learning over the complementary node and edge views designed at the client: the node feature matrix and node adjacency matrix are input into the node perspective network, the edge feature matrix and edge adjacency matrix are input into the edge perspective network, and the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, finally yielding the trained dual-view graph neural network model.
  • the trained dual-view graph neural network model can be directly used for fine-tuning, thereby avoiding the need to train a new model from scratch for each downstream task.
  • the proposed dual-view self-supervised training strategy for graph networks can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type.
  • the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment.
  • the client can be, but is not limited to, various personal computers, laptops, smartphones, tablets and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers. The present application is described in detail below through specific embodiments.
  • Figure 2 is a schematic flowchart of a training method for a dual-view graph neural network model provided by an embodiment of the present application, which includes the following steps:
  • S10: obtain multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and construct a node feature matrix and a node adjacency matrix based on the attribute information.
  • the graph data is obtained from an open source graph database or a self-produced data set, and the specific acquisition method is not limited.
  • graph data uses the data structure of "graph" to store and query data, and its data model is mainly represented by attribute information, nodes and edges.
  • the attribute information of graph data refers to the information contained in nodes and edges that characterizes the nature of the graph data itself.
  • its attribute information can be the properties of each atom in the compound, the specific atomic structure, etc.
  • Graph data can quickly solve complex relationship problems through the combination of nodes and edges.
  • for example, for a compound structure, nodes represent the atoms in the compound and edges represent the information of the chemical bonds connecting the atoms.
  • Each node in the graph data is connected to its adjacent nodes through edges, thus forming a node adjacency matrix.
  • for graph data with N nodes, the node adjacency matrix is a matrix with N rows and N columns.
  • the element in the i-th row and j-th column of the node adjacency matrix indicates "whether there is an edge between nodes i and j."
  • One of the representation methods is: if there is an edge between nodes i and j, then this element is 1, otherwise it is 0. Of course, it can also be expressed as: if there is an edge between nodes i and j, then this element is 0, otherwise it is 1.
  • the node feature matrix represents the characteristics of each node contained in the graph data, and each row in the node feature matrix represents the characteristics of a node in the graph data.
  • the characteristics of nodes can be the size of each atom, the weight of the atom, etc.
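  • As an illustrative, non-limiting sketch (the edge list and attribute values below are hypothetical, not taken from the patent), the two matrices can be assembled as follows:

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical undirected edge list
num_nodes = 4

# Node adjacency matrix with the 1/0 convention described above:
# entry (i, j) is 1 if an edge exists between nodes i and j, else 0.
adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Node feature matrix: one row per node; the two columns are hypothetical
# attributes (e.g. atom size and atomic weight for a compound graph).
node_feats = np.array([[12.0, 1.2],
                       [14.0, 1.5],
                       [16.0, 1.1],
                       [12.0, 1.3]], dtype=np.float32)
```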
  • in self-supervised learning, contrastive learning is currently one of the most effective techniques.
  • self-supervised contrastive learning means that, for any two data points, the more similar they are (the more likely they belong to the same category), the closer their graph representations should be.
  • Self-supervised learning models based on contrastive learning have achieved great success on image and sequence-based language data structures. The idea is to construct different samples from the input data and learn the features in the input data by guiding the pre-trained model to distinguish positive and negative samples in the implicit representation space.
  • considering that edges carry a series of features, the nodes and edges are exchanged: the nodes of the original graph become the edges of the new graph, and the edges of the original graph become the nodes of the new graph, thereby constructing dual-view graph data whose two views carry complementary information and enrich the contrastive-learning signal.
  • the construction methods of the edge feature matrix and edge adjacency matrix are similar to the above-mentioned node feature matrix and node adjacency matrix, and will not be described again here.
  • a negative-sample edge feature matrix is obtained by randomly changing the order of any two or more rows of the edge feature matrix; for example, given the three rows numbered 1, 2 and 3 in the edge feature matrix, swapping the first and second rows yields a negative-sample edge feature matrix with row order 2, 1, 3.
  • the node perspective network is a GNN network.
  • in step S30, inputting the node feature matrix and the node adjacency matrix into the node perspective network to obtain the first judgment result of the graph data includes:
  • step S31 includes the following process:
  • the edge perspective network is a GNN network.
  • the node feature matrix is subjected to feature perturbation to obtain the node negative-sample feature matrix, where feature perturbation refers to randomly exchanging multiple rows of the node feature matrix.
  • for example, suppose the graph data contains 4 nodes, each with 32-dimensional features, and the rows of the node feature matrix are numbered 1, 2, 3, 4. After feature perturbation, the positions of the second and third nodes are exchanged; the topology of the network is unchanged, but the feature at each position has changed, yielding a node negative-sample feature matrix with row order 1, 3, 2, 4.
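  • A minimal sketch of this feature perturbation (the feature values are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 nodes with 32-dimensional features, as in the example above.
node_feats = rng.normal(size=(4, 32)).astype(np.float32)

def perturb_features(feats: np.ndarray) -> np.ndarray:
    """Feature perturbation: randomly permute the rows of the node feature
    matrix while the adjacency matrix (the topology) is left unchanged."""
    perm = rng.permutation(feats.shape[0])
    return feats[perm]

neg_feats = perturb_features(node_feats)  # e.g. row order 1, 3, 2, 4
```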
  • each convolutional layer learns the adjacency matrix and the node feature matrix in the convolutional layer to obtain the relevant features of the positive sample data in the next convolutional layer.
  • each convolutional layer learns the adjacency matrix and the negative sample feature matrix of the node in the convolutional layer to obtain the relevant features of the negative sample data in the next convolutional layer.
  • the positive sample local features of the node and the negative sample local features of the node jointly form the local features of the node. Comparing the similarity between the local features of the nodes and the average features of the nodes, the first judgment result of the graph data is obtained. Comparing the similarity between the local features of the edges and the average features of the edges, the second judgment result of the graph data is obtained.
  • the method of extracting the negative-sample local features of the nodes is e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ), where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix. It can be understood that the processing of the edge perspective network is similar to that of the node perspective network and is not described in detail here.
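  • A minimal sketch of one such propagation step (written in PyTorch with conventional names — adj for the adjacency matrix, feats for the feature matrix — rather than the symbol table above):

```python
import torch

def gcn_layer(adj: torch.Tensor, feats: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """One GCN propagation step: sigmoid(D^-1/2 (A+I) D^-1/2 @ feats @ theta),
    where A+I is the adjacency matrix with self-loops and D its degree matrix."""
    a_hat = adj + torch.eye(adj.size(0))
    d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
    return torch.sigmoid(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ theta)

# Positive-sample local features use the original feature matrix; the
# negative-sample local features apply the same step to the perturbed features:
# e_pos = gcn_layer(adj, feats, theta); e_neg = gcn_layer(adj, neg_feats, theta)
```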
  • S40: weight the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result of the graph data and the labels of the graph data, jointly train the node perspective network and the edge perspective network, update the parameters of the dual-view graph neural network model to be trained, and obtain the trained dual-view graph neural network model.
  • the GCN network is used as the node perspective network and the edge perspective network to extract the features of nodes and edges.
  • the main process of graph neural network model learning is to iteratively aggregate and update the neighbor information of nodes in the graph data. In an iteration, each node updates its own information by aggregating the characteristics of neighbor nodes and its own characteristics in the previous layer, and usually performs non-linear transformation on the aggregated information.
  • each node can obtain information about neighbor nodes within the corresponding hop number.
  • the node information is read through the GCN network and the local features of the node are extracted.
  • the same method extracts the local features of the edge, calculates the loss value through the loss function, and updates the weight of the node perspective network and the weight of the edge perspective network based on the loss value.
  • the loss value contains node and edge information
  • the edge information is taken into account when updating the weights of the node perspective network, and likewise the node information is taken into account when updating the parameters of the edge perspective network.
  • the edge-perspective network contains node information
  • the node-perspective network contains edge information.
  • the first judgment result of the graph data and the second judgment result of the graph data can be weighted according to their importance to obtain the judgment result of the graph data, computed as J = αJ1 + βJ2, where J1 is the first judgment result of the graph data, J2 is the second judgment result of the graph data, α is the weight of the first judgment result, and β is the weight of the second judgment result of the graph data.
  • Both ⁇ and ⁇ are any numbers in the range of [0,1].
  • for social-network-type graph data, nodes contain more information, so α is set greater than β;
  • for text-sequence-type graph data, edges contain more information, so α is set smaller than β. It can be understood that those skilled in the art can adaptively select the values of α and β according to the type of the actual graph data.
  • before inputting the node feature matrix and the node adjacency matrix into the preset node perspective network, the method further includes: averaging the node feature matrix to obtain the average features of the nodes.
  • the average feature s of the nodes is obtained as s = σ((1/M) Σ_{i=1}^{M} h_i), where σ is the sigmoid function, M is the number of nodes, and h_i is the feature of the i-th node.
  • in step S40, jointly training the node perspective network and the edge perspective network based on the judgment result of the graph data and the labels of the graph data, and updating the parameters of the dual-view graph neural network model to be trained, includes the following process:
  • the dual-view graph neural network model is built from two GCN networks and a Softmax layer. Specifically, taking link prediction as an example: after the graph data is fed into the two GCN networks, the graph convolution operation yields the feature vector of the queried information (the link); this feature vector is fed into the Softmax layer for mapping, producing the probability that the graph data contains the queried information. This probability is used to compute the loss through the loss function, and according to the result the parameters and weights of the dual-view graph neural network model are updated to obtain the updated model. Training is performed again until the loss function converges; when training ends, the trained dual-view graph neural network model is obtained and packaged for use.
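  • A rough sketch of such a Softmax mapping head for link prediction (the layer sizes and the concatenation of the two endpoint embeddings are assumptions, not details fixed by the patent):

```python
import torch
import torch.nn as nn

class LinkHead(nn.Module):
    """Maps the feature vector of a candidate link to a probability via a
    Softmax layer, as in the link-prediction example; sizes are illustrative."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(2 * dim, 2)  # logits for [no link, link]

    def forward(self, h_u: torch.Tensor, h_v: torch.Tensor) -> torch.Tensor:
        z = torch.cat([h_u, h_v], dim=-1)         # embeddings of the two endpoints
        return torch.softmax(self.fc(z), dim=-1)  # probability the link exists
```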
  • the weight-update rule is: differentiate according to the chain rule.
  • since the two networks have different emphases, different factor parameters are set, and the loss function of the dual-view graph neural network is constructed by adjusting the corresponding factor parameters as Loss = αL1 + βL2, where α is the factor parameter of the node perspective network's loss L1 and β is the factor parameter of the edge perspective network's loss L2. During training, gradient descent continuously adjusts the parameters of the two view networks to realize the construction of the network.
  • the negative-sample feature matrix of the nodes is input into the node perspective network, and the negative-sample local features of the nodes are extracted as e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ), where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix.
  • the loss function L1 of the node perspective network is L1 = (1/(N+M)) · (Σ_{k=1}^{N} log D(h_k, s) + Σ_{a=1}^{M} log(1 − D(h̃_a, s))), where N is the number of positive-sample nodes, M is the number of negative-sample nodes, h_k is the representation vector of the node after local-feature extraction for the k-th positive sample, h̃_a is the representation vector of the node after local-feature extraction for the a-th negative sample, s is the positive-sample global feature, and D is the discriminator: a value approaching 1 indicates that the positive-sample global feature is similar to the local feature, and a value approaching 0 indicates that the two are dissimilar. The weight values of the node perspective network are updated according to the loss value.
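  • A sketch of one way to realize this objective; the bilinear form of the discriminator D is an assumption here, since the text only constrains its output behavior:

```python
import torch

def readout(h_pos: torch.Tensor) -> torch.Tensor:
    """Positive-sample global feature s: sigmoid of the mean of the local features."""
    return torch.sigmoid(h_pos.mean(dim=0))

def node_view_loss(h_pos: torch.Tensor, h_neg: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """L1 as a loss to minimize: the discriminator D(h, s) = sigmoid(h @ W @ s)
    (a bilinear form, assumed) is pushed toward 1 on the N positive local
    features and toward 0 on the M negative ones."""
    s = readout(h_pos)
    d_pos = torch.sigmoid(h_pos @ w @ s)
    d_neg = torch.sigmoid(h_neg @ w @ s)
    n, m = h_pos.size(0), h_neg.size(0)
    return -(torch.log(d_pos + 1e-8).sum() + torch.log(1.0 - d_neg + 1e-8).sum()) / (n + m)
```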
  • the dual-view graph neural network model in this embodiment can be applied to a variety of different fields, such as speech recognition, medical diagnosis, application testing, etc.
  • the trained dual-view graph neural network model can be directly used for fine-tuning, thereby avoiding the need to train a new model from scratch for each downstream task.
  • the proposed dual-view self-supervised training strategy for graph networks can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type.
  • the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment.
  • the sequence numbers of the steps in the above embodiments do not imply an order of execution;
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • a training apparatus for a dual-view graph neural network model is provided, which corresponds one-to-one to the training method for the dual-view graph neural network model in the above embodiments.
  • the training apparatus of the dual-view graph neural network model includes a graph data acquisition module 111, an exchanged graph data acquisition module 112, a first judgment result acquisition module 113, a second judgment result acquisition module 114 and a model training module 115.
  • the detailed description of each functional module is as follows:
  • the graph data acquisition module 111 is used to obtain multiple graph data with labels, the graph data includes nodes, edges and attribute information, and construct a node feature matrix and a node adjacency matrix based on the attribute information.
  • the exchanged graph data acquisition module 112 is used to exchange the positions of nodes and edges in the graph data, obtain the exchanged graph data, and construct an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
  • the first judgment result acquisition module 113 is used to input the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
  • the second judgment result acquisition module 114 is used to input the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
  • the model training module 115 is used to weight the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and based on the judgment result of the graph data and the graph data label, jointly train the node perspective network and the edge perspective network, update the parameters of the dual-view graph neural network model to be trained, and obtain the trained dual-view graph neural network model.
  • in one embodiment, the first judgment result acquisition module 113 is specifically used to: input the node feature matrix and the node adjacency matrix into the node perspective network and extract the local features of the nodes; and compare the local features of the nodes with the preset average features of the nodes to obtain the first judgment result of the graph data.
  • in one embodiment, the first judgment result acquisition module 113 is specifically used to: perform feature perturbation on the node feature matrix to obtain the node negative-sample feature matrix; and input the node feature matrix and the node negative-sample feature matrix into the node perspective network, extract the positive-sample local features and the negative-sample local features of the nodes, and obtain the local features of the nodes.
  • in one embodiment, the first judgment result acquisition module 113 is specifically used to: average the node feature matrix to obtain the average features of the nodes.
  • in one embodiment, the first judgment result acquisition module 113 is further used to:
  • extract the negative-sample local features of the nodes as e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ), where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix.
  • in one embodiment, the model training module 115 is specifically used to: obtain the current parameter values of the node perspective network and the edge perspective network; obtain the loss value of the graph-data judgment result according to preset parameter-update rules and derive new parameter values from the loss value and the current parameter values; and use the new parameter values as the parameters of the dual-view graph neural network model to be trained.
  • in one embodiment, the model training module 115 is further used to compute the judgment result of the graph data as J = αJ1 + βJ2, where J1 is the first judgment result of the graph data, J2 is the second judgment result of the graph data, α is the weight of the first judgment result, and β is the weight of the second judgment result of the graph data.
  • This application provides a training apparatus for a dual-view graph neural network model that uses contrastive learning over complementary node and edge views to train the graph neural network.
  • by inputting the node feature matrix and node adjacency matrix into the node perspective network and the edge feature matrix and edge adjacency matrix into the edge perspective network, the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, finally yielding the trained dual-view graph neural network model.
  • following the idea of contrastive learning, positive and negative samples are constructed from the input data, and the model is made to discriminate between them in the implicit representation space, making full use of the feature information of nodes and edges and giving the model better transferability.
  • the obtained model has good generalization.
  • the trained dual-view graph neural network model can be directly used for fine-tuning, thereby avoiding the need to train a new model from scratch for each downstream task.
  • the proposed dual-view graph network training strategy can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type.
  • the method of this application transforms the original method of manually adjusting parameters and relying on machine learning engineers and experts into a method that is suitable for large-scale and replicable industrial implementation.
  • Each module in the training device of the above-mentioned dual-view graph neural network model can be implemented in whole or in part by software, hardware, and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be shown in Figure 7 .
  • the computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile and/or volatile storage media and internal memory.
  • the non-volatile storage medium stores operating systems, computer programs and databases. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media.
  • the network interface of the computer device is used to communicate with external clients through a network connection. When the computer program is executed by the processor, it implements functions or steps on the server side of a training method for a dual-view graph neural network model.
  • a computer device is provided.
  • the computer device may be a client, and its internal structure diagram may be shown in Figure 8 .
  • the computer device includes a processor, a memory, a network interface, a display screen and an input device connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores operating systems and computer programs. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media.
  • the network interface of the computer device is used to communicate with an external server through a network connection. When the computer program is executed by the processor, the functions or steps on the client side of a training method for a dual-view graph neural network model are implemented.
  • a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the following steps are implemented:
  • obtaining multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
  • exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
  • inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
  • inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
  • weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
  • a computer-readable storage medium is provided with a computer program stored thereon.
  • when the computer program is executed by a processor, the following steps are implemented:
  • obtaining multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
  • exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
  • inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
  • inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
  • weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
  • the computer-readable storage medium may be non-volatile or volatile.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules described above is merely illustrative; in practical applications, the above functions may be allocated to different functional units or modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to the field of artificial intelligence and provides a training method, apparatus, device and medium for a dual-view graph neural network model. The method includes: obtaining multiple labeled graph data, the graph data including nodes, edges and attribute information; exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data; inputting the node feature matrix and the node adjacency matrix into a node perspective network to obtain a first judgment result of the graph data; inputting the edge feature matrix and the edge adjacency matrix into an edge perspective network to obtain a second judgment result of the graph data; weighting the first judgment result and the second judgment result to obtain the judgment result of the graph data; and, based on the judgment result and the labels of the graph data, obtaining a trained dual-view graph neural network model. By exchanging the positions of nodes and edges in the graph data and acquiring node features and edge features simultaneously, this application makes the trained model well generalizable and transferable.

Description

Training method, apparatus, device and medium for a dual-view graph neural network model
Priority claim
This application claims priority to the Chinese patent application No. 202210290121.9, filed with the Chinese Patent Office on March 23, 2022 and entitled "Training method, apparatus, device and medium for a dual-view graph neural network model", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a training method, apparatus, device and medium for a dual-view graph neural network model.
Background
Graph data is a data structure, and graph neural networks are the branch of deep learning that operates on graph structures. Common graph data consists of nodes and edges, where nodes carry entity information and edges carry the relationship information between entities. Many learning tasks now require processing graph data, such as physical-system modeling, social-network analysis, traffic-network analysis, protein-structure prediction and molecular-property prediction; all of these require models that can learn relevant knowledge from graph-data input, which is why graph neural network models emerged.
Although graph neural networks have achieved great success in the past few years, most supervised or semi-supervised machine-learning models are built on the paradigm of "aggregate neighbor information, update the node's own state". In this paradigm node features are fully learned, yet in much real-world graph data the edges carry rich information that remains underused in most current models.
The inventor realized that, with the above contrastive-learning approaches, when the model needs to learn highly transferable, reusable and common knowledge, focusing only on node information means that graph data used in the model mainly exploits node features and ignores edge information, so the model's transferability and generalization capability are insufficient.
Summary
The purpose of this application is to provide a training method, apparatus, device and medium for a dual-view graph neural network model, so as to solve the technical problem that when graph data is used in a model, node features are usually exploited while edge information is ignored, leaving the model's transferability and generalization capability insufficient.
In a first aspect, a training method for a dual-view graph neural network model is provided, including:
obtaining multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
In a second aspect, a training apparatus for a dual-view graph neural network model is provided, including:
a graph data acquisition module, used to obtain multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and to construct a node feature matrix and a node adjacency matrix based on the attribute information;
an exchanged graph data acquisition module, used to exchange the positions of nodes and edges in the graph data to obtain exchanged graph data, and to construct an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
a first judgment result acquisition module, used to input the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
a second judgment result acquisition module, used to input the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
a model training module, used to weight the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly train the node perspective network and the edge perspective network and update the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above training method for a dual-view graph neural network model are implemented.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program; when the computer program is executed by a processor, the steps of the above training method for a dual-view graph neural network model are implemented.
The training method, system, device and medium for the dual-view graph neural network model of this application use contrastive learning over complementary node and edge views to train the graph neural network: the node feature matrix and node adjacency matrix are input into the node perspective network, the edge feature matrix and edge adjacency matrix are input into the edge perspective network, and the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, finally yielding the trained dual-view graph neural network model. Following the idea of contrastive learning, positive and negative samples are constructed from the input data and the model is made to discriminate between them in the implicit representation space, making full use of the feature information of nodes and edges and giving the model good transferability. The resulting model generalizes well: when a specific downstream task needs to be solved, the trained dual-view graph neural network model can be fine-tuned directly, avoiding training a brand-new model from scratch for every downstream task. In addition, the proposed dual-view self-supervised training strategy for graph neural networks can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type. Moreover, the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort:
Figure 1 is a schematic diagram of an application environment of a training method for a dual-view graph neural network model in an embodiment of this application;
Figure 2 is a schematic flowchart of a training method for a dual-view graph neural network model in an embodiment of this application;
Figure 3 is a schematic flowchart of step S30 in an embodiment of this application;
Figure 4 is a schematic flowchart of step S31 in an embodiment of this application;
Figure 5 is a schematic flowchart of step S40 in an embodiment of this application;
Figure 6 is a structural block diagram of a training apparatus for a dual-view graph neural network model in an embodiment of this application;
Figure 7 is a schematic structural diagram of a computer device in an embodiment of this application;
Figure 8 is another schematic structural diagram of a computer device in an embodiment of this application.
Detailed description
The training method for a dual-view graph neural network model provided by the embodiments of this application can be applied in the application environment shown in Figure 1, in which a client communicates with a server through a network. The server can train the graph neural network through contrastive learning over the complementary node and edge views designed at the client: the node feature matrix and node adjacency matrix are input into the node perspective network, the edge feature matrix and edge adjacency matrix are input into the edge perspective network, and the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, finally yielding the trained dual-view graph neural network model. Following the idea of contrastive learning, positive and negative samples are constructed from the input data and the model is made to discriminate between them in the implicit representation space, making full use of the feature information of nodes and edges and giving the model good transferability. The resulting model generalizes well: when a specific downstream task needs to be solved, the trained dual-view graph neural network model can be fine-tuned directly, avoiding training a brand-new model from scratch for every downstream task. In addition, the proposed dual-view self-supervised training strategy for graph networks can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type. Moreover, the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment. The client can be, but is not limited to, various personal computers, laptops, smartphones, tablets and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers. This application is described in detail below through specific embodiments.
Referring to Figure 2, which is a schematic flowchart of the training method for a dual-view graph neural network model provided by an embodiment of this application, the method includes the following steps:
S10: obtain multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and construct a node feature matrix and a node adjacency matrix based on the attribute information.
In this embodiment, the graph data is obtained from an open-source graph database or a self-produced dataset; the specific acquisition method is not limited. Graph data stores and queries data in the "graph" data structure, and its data model is mainly embodied by attribute information, nodes and edges. The attribute information of graph data refers to the information contained in nodes and edges that characterizes the nature of the graph data itself. For example, for the graph data of a compound, the attribute information can be the properties of each atom in the compound, the specific atomic structure, and so on. Through the combination of nodes and edges, graph data can quickly solve complex relationship problems. For example, for a compound structure, nodes represent the atoms in the compound and edges represent the information of the chemical bonds connecting the atoms. Each node in the graph data is connected to its adjacent nodes through edges, thus forming the node adjacency matrix. For graph data with N nodes, the node adjacency matrix is a matrix with N rows and N columns, in which the element in the i-th row and j-th column indicates "whether there is an edge between nodes i and j". One representation is: if there is an edge between nodes i and j, the element is 1, otherwise 0; it can equally be represented as: if there is an edge between nodes i and j, the element is 0, otherwise 1. The node feature matrix represents the features of each node contained in the graph data, and each row of the node feature matrix represents the features of one node. For a compound, for example, the features of a node can be the size of each atom, the weight of the atom, and so on.
S20: exchange the positions of nodes and edges in the graph data to obtain the exchanged graph data, and construct an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data.
In this embodiment, it is considered that in self-supervised learning, contrastive learning is currently one of the most effective techniques. Self-supervised contrastive learning means that, for any two data points, the more similar they are (the more likely they belong to the same category), the closer their graph representations should be. Self-supervised models based on contrastive learning have achieved great success on images and sequence-based language data structures; the idea is to construct different samples from the input data and learn the features of the input data by guiding the pre-trained model to discriminate positive and negative samples in the implicit representation space. Based on these considerations, and since edges carry a series of features, nodes and edges are exchanged: the nodes of the original graph become the edges of the new graph, and the edges of the original graph become the nodes of the new graph, thereby constructing dual-view graph data. The two views carry complementary information, which yields a fuller and richer contrastive-learning effect. The construction of the edge feature matrix and the edge adjacency matrix is similar to that of the node feature matrix and node adjacency matrix described above and is not repeated here. A negative-sample edge feature matrix is obtained by randomly changing the order of any two or more rows of the edge feature matrix; for example, given the three rows numbered 1, 2 and 3 in the edge feature matrix, swapping the first and second rows yields a negative-sample edge feature matrix with row order 2, 1, 3.
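As an illustrative, non-limiting sketch, the node/edge exchange can be realized as a line-graph construction (one common way to carry out the swap; the patent does not fix a specific procedure), with the negative-sample edge feature matrix obtained by a row swap. The edge list and feature values below are placeholders:

```python
import numpy as np

def line_graph_adj(edges: list) -> np.ndarray:
    """Edge adjacency matrix of the exchanged graph: every original edge becomes
    a node of the new graph, and two such nodes are adjacent when the original
    edges share an endpoint."""
    m = len(edges)
    adj = np.zeros((m, m), dtype=np.float32)
    for p in range(m):
        for q in range(p + 1, m):
            if set(edges[p]) & set(edges[q]):
                adj[p, q] = adj[q, p] = 1.0
    return adj

edges = [(0, 1), (1, 2), (2, 3)]          # hypothetical original edges
edge_adj = line_graph_adj(edges)

# Negative-sample edge feature matrix: swap rows, e.g. (1, 2, 3) -> (2, 1, 3).
edge_feats = np.eye(3, dtype=np.float32)  # placeholder edge features
neg_edge_feats = edge_feats[[1, 0, 2]]
```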
S30: input the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network.
In step S30, inputting the node feature matrix and the node adjacency matrix into the node perspective network to obtain the first judgment result of the graph data includes:
S31: input the node feature matrix and the node adjacency matrix into the node perspective network and extract the local features of the nodes;
S32: compare the local features of the nodes with the preset average features of the nodes to obtain the first judgment result of the graph data.
Specifically, step S31 includes the following process:
S311: perform feature perturbation on the node feature matrix to obtain the node negative-sample feature matrix;
S312: input the node adjacency matrix, the node feature matrix and the node negative-sample feature matrix into the node perspective network, extract the positive-sample local features and the negative-sample local features of the nodes, and obtain the local features of the nodes.
S40: input the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network.
In this embodiment, feature perturbation is applied to the node feature matrix to obtain the node negative-sample feature matrix, where feature perturbation refers to randomly exchanging multiple rows of the node feature matrix. For example, suppose the graph data contains 4 nodes, each with 32-dimensional features, and the rows of the node feature matrix are numbered 1, 2, 3, 4. After feature perturbation, the positions of the second and third nodes are exchanged; although the topology of the network is unchanged, the feature at each position has changed, yielding a node negative-sample feature matrix with row order 1, 3, 2, 4. The node feature matrix, the node negative-sample feature matrix and the node adjacency matrix are input into the preset node perspective network. When extracting the positive-sample local features of the nodes, each convolutional layer learns from the adjacency matrix and the node feature matrix of that layer to obtain the relevant features of the positive-sample data in the next convolutional layer; when extracting the negative-sample local features of the nodes, each convolutional layer learns from the adjacency matrix and the node negative-sample feature matrix of that layer to obtain the relevant features of the negative-sample data in the next layer. By stacking multiple convolutional layers in this way, the aggregation and updating of node information in the graph data is realized. The positive-sample local features and the negative-sample local features of the nodes together form the local features of the nodes. Comparing the similarity between the local features of the nodes and the average features of the nodes gives the first judgment result of the graph data; comparing the similarity between the local features of the edges and the average features of the edges gives the second judgment result of the graph data. The method for extracting the negative-sample local features of the nodes is:
e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ),
where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix. It can be understood that the processing of the edge perspective network is similar to that of the node perspective network and is not repeated here.
S40: weight the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result of the graph data and the labels of the graph data, jointly train the node perspective network and the edge perspective network, update the parameters of the dual-view graph neural network model to be trained, and obtain the trained dual-view graph neural network model.
In this embodiment, it is considered that an individual's own features can no longer fully represent all of its information: errors may arise or be lost in data collection, or some individuals may be disguised, causing deviations in some features, so the information of its neighbor nodes is needed to supplement the information of the current node, giving information more complete than a single individual's features. Therefore, a GCN network is used as the node perspective network and the edge perspective network to extract the features of nodes and edges. The main learning process of a graph neural network model is to iteratively aggregate and update the neighbor information of the nodes in the graph data: in one iteration, each node updates its own information by aggregating the features of its neighbor nodes and its own features from the previous layer, and usually applies a non-linear transformation to the aggregated information. By stacking multiple layers, each node can obtain the information of neighbor nodes within the corresponding number of hops. After the node feature matrix and the node adjacency matrix are input into the preset node perspective network, the GCN network reads the node information and extracts the local features of the nodes. The local features of the edges are extracted in the same way; the loss value is computed through the loss function, and the weights of the node perspective network and the edge perspective network are updated according to the loss value. Since the loss value contains both node and edge information, the edge information is taken into account when updating the weights of the node perspective network and, likewise, the node information is taken into account when updating the parameters of the edge perspective network, so that the edge perspective network contains node information and the node perspective network contains edge information; joint training of the two is realized, achieving dual-view network learning. Specifically, the first judgment result and the second judgment result of the graph data can be weighted according to their importance to obtain the judgment result of the graph data, computed as J = αJ1 + βJ2, where J1 is the first judgment result of the graph data, J2 is the second judgment result of the graph data, α is the weight of the first judgment result, and β is the weight of the second judgment result. Both α and β are any numbers in the range [0, 1]. For social-network-type graph data, the nodes contain more information, so α is set greater than β; for text-sequence-type graph data, the edges contain more information, so α is set smaller than β. It can be understood that those skilled in the art can adaptively choose the values of α and β according to the type of the actual graph data.
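As a small illustrative sketch of this weighting (the values of α and β below are placeholders chosen for a social-network-type graph, not values fixed by the patent):

```python
def combine_judgments(j1: float, j2: float, alpha: float = 0.7, beta: float = 0.3) -> float:
    """Judgment result J = alpha*J1 + beta*J2; both weights lie in [0, 1] and
    are chosen according to the type of graph data (0.7/0.3 is illustrative)."""
    return alpha * j1 + beta * j2
```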
In this embodiment, before the node feature matrix and the node adjacency matrix are input into the preset node perspective network, the method further includes: averaging the node feature matrix to obtain the average features of the nodes. The average feature s of the nodes is obtained as
s = σ((1/M) Σ_{i=1}^{M} h_i),
where σ is the sigmoid function, M is the number of nodes, and h_i is the feature of the i-th node.
In step S40, jointly training the node perspective network and the edge perspective network based on the judgment result of the graph data and the labels of the graph data and updating the parameters of the dual-view graph neural network model to be trained includes the following process:
S41: obtain the current weight values of the node perspective network and the edge perspective network;
S42: according to preset weight-update rules, obtain new weight values from the loss value and the current weight values;
S43: use the new weight values as the weights of the node perspective network and the edge perspective network.
In this embodiment, the dual-view graph neural network model is built from two GCN networks and a Softmax layer. Specifically, taking link prediction as an example: after the graph data is fed into the two GCN networks, the graph convolution operation yields the feature vector of the queried information (the link) in the graph data; this feature vector is fed into the Softmax layer for mapping, producing the probability that the graph data contains the queried information. This probability is used to compute the loss through the loss function, and according to the result the parameters and weights of the dual-view graph neural network model are updated to obtain the updated model. Training is performed again until the loss function converges; when training ends, the trained dual-view graph neural network model is obtained and packaged for use. In this embodiment, the weight-update rule is to differentiate according to the chain rule. Specifically, the error back-propagation gradient is computed as Grad_p = ((w_{p-1})^T Grad_{p-1}) δ, where δ is the current loss value, Grad_p is the error back-propagation gradient of the p-th layer, and w_{p-1} is the weight of the graph neuron at the (p-1)-th training iteration. From the back-propagated gradient, the weight update is computed as w_p = w_{p-1} − ε·Grad_p, where ε is the learning rate and w_p is the current weight of the graph neuron. It should be noted that the training process of the edge perspective network is similar to that of the node perspective network and is therefore not repeated here.
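A minimal sketch of the stated gradient and weight-update rules (matrix shapes are assumed compatible; this is an illustration of the formulas above, not the patent's implementation):

```python
import numpy as np

def backprop_grad(w_prev: np.ndarray, grad_prev: np.ndarray, delta: float) -> np.ndarray:
    """Stated back-propagation rule Grad_p = ((w_{p-1})^T Grad_{p-1}) * delta,
    with delta the current loss value."""
    return (w_prev.T @ grad_prev) * delta

def update_weight(w_prev: np.ndarray, grad_p: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """Stated update rule w_p = w_{p-1} - eps * Grad_p, eps being the learning rate."""
    return w_prev - eps * grad_p
```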
In this embodiment, the loss value is computed as Loss = αL1 + βL2, where α is the factor parameter of the loss function L1 of the node perspective network and β is the factor parameter of the loss function L2 of the edge perspective network. That is, the loss function is the sum of the loss functions of the node perspective network and the edge perspective network; considering that the two networks have different emphases, different factor parameters are set, and the loss function of the dual-view graph neural network is constructed by adjusting the corresponding factor parameters. During training, gradient descent continuously adjusts the parameters of the two view networks to realize the construction of the network. Further, the node negative-sample feature matrix is input into the node perspective network, and the negative-sample local features of the nodes are extracted as:
e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ),
where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix.
为节点特征矩阵与单位矩阵相加后的度矩阵,A为节点特征矩阵,I为单位矩阵,X为节点邻接矩阵,θ为预设的参数矩阵。本实施例中,节点视角网络的损失函数L 1的公式为:
Figure PCTCN2022090086-appb-000006
Figure PCTCN2022090086-appb-000007
其中,N为正样本节点数,M为负样本节点数,
Figure PCTCN2022090086-appb-000008
为第k个正样本提取局部特征后节点的表示向量,
Figure PCTCN2022090086-appb-000009
为第a个负样本提取局部特征后节点的表示向量,
Figure PCTCN2022090086-appb-000010
为正样本全局特征,D为判别器,若趋近1表示正样本全局特征与局部特征相似,若趋近0表示两者不相似。根据损失值更新节点视角网络的权重值。
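A minimal sketch of one joint training step on the combined loss (the optimizer choice, the α/β values, and the assumption that each view network returns its own loss are illustrative, not fixed by the patent):

```python
import torch

def train_step(node_net, edge_net, optimizer, node_inputs, edge_inputs,
               alpha: float = 0.5, beta: float = 0.5) -> float:
    """One gradient-descent step on Loss = alpha*L1 + beta*L2; because the two
    view losses are summed, a single update couples both networks, so node and
    edge information jointly drive every weight update."""
    loss = alpha * node_net(*node_inputs) + beta * edge_net(*edge_inputs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())
```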
It can be understood that the dual-view graph neural network model in this embodiment can be applied in many different fields, such as speech recognition, medical diagnosis and application testing.
It can be seen that, in the above scheme, contrastive learning over complementary node and edge views is designed to train the graph neural network: the node feature matrix and node adjacency matrix are input into the node perspective network, the edge feature matrix and edge adjacency matrix are input into the edge perspective network, and the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, finally yielding the trained dual-view graph neural network model. Following the idea of contrastive learning, positive and negative samples are constructed from the input data and the model is made to discriminate between them in the implicit representation space, making full use of the feature information of nodes and edges and giving the model good transferability. The resulting model generalizes well: when a specific downstream task needs to be solved, the trained dual-view graph neural network model can be fine-tuned directly, avoiding training a brand-new model from scratch for every downstream task. In addition, the proposed dual-view self-supervised training strategy for graph networks can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type. Moreover, the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a training apparatus for a dual-view graph neural network model is provided, which corresponds one-to-one to the training method for the dual-view graph neural network model in the above embodiments. As shown in Figure 6, the training apparatus includes a graph data acquisition module 111, an exchanged graph data acquisition module 112, a first judgment result acquisition module 113, a second judgment result acquisition module 114 and a model training module 115. The functional modules are described in detail as follows:
the graph data acquisition module 111 is used to obtain multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and to construct a node feature matrix and a node adjacency matrix based on the attribute information;
the exchanged graph data acquisition module 112 is used to exchange the positions of nodes and edges in the graph data to obtain exchanged graph data, and to construct an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
the first judgment result acquisition module 113 is used to input the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
the second judgment result acquisition module 114 is used to input the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
the model training module 115 is used to weight the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly train the node perspective network and the edge perspective network and update the parameters of the dual-view graph neural network model to be trained to obtain the trained dual-view graph neural network model.
In one embodiment, the first judgment result acquisition module 113 is specifically used to:
input the node feature matrix and the node adjacency matrix into the node perspective network and extract the local features of the nodes;
compare the local features of the nodes with the preset average features of the nodes to obtain the first judgment result of the graph data.
In one embodiment, the first judgment result acquisition module 113 is specifically used to:
perform feature perturbation on the node feature matrix to obtain the node negative-sample feature matrix;
input the node feature matrix and the node negative-sample feature matrix into the node perspective network, extract the positive-sample local features and the negative-sample local features of the nodes, and obtain the local features of the nodes.
In one embodiment, the first judgment result acquisition module 113 is specifically used to:
average the node feature matrix to obtain the average features of the nodes.
In one embodiment, the first judgment result acquisition module 113 is further used to:
extract the negative-sample local features of the nodes as e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ), where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix.
In one embodiment, the model training module 115 is specifically used to:
obtain the current parameter values of the node perspective network and the edge perspective network;
obtain the loss value of the graph-data judgment result according to preset parameter-update rules, and derive new parameter values from the loss value and the current parameter values;
use the new parameter values as the parameters of the dual-view graph neural network model to be trained.
In one embodiment, the model training module 115 is further used to:
compute the judgment result J of the graph data as J = αJ1 + βJ2, where J1 is the first judgment result of the graph data, J2 is the second judgment result of the graph data, α is the weight of the first judgment result, and β is the weight of the second judgment result of the graph data.
This application provides a training apparatus for a dual-view graph neural network model that uses contrastive learning over complementary node and edge views to train the graph neural network: the node feature matrix and node adjacency matrix are input into the node perspective network, the edge feature matrix and edge adjacency matrix are input into the edge perspective network, and the two networks are trained simultaneously, so that the weight updates in both networks depend on the features of both nodes and edges, finally yielding the trained dual-view graph neural network model. Following the idea of contrastive learning, positive and negative samples are constructed from the input data and the model is made to discriminate between them in the implicit representation space, making full use of the feature information of nodes and edges and giving the model good transferability. The resulting model generalizes well: when a specific downstream task needs to be solved, the trained dual-view graph neural network model can be fine-tuned directly, avoiding training a brand-new model from scratch for every downstream task. In addition, the proposed dual-view graph network training strategy can learn rich, key graph-representation information more effectively and capture the general structural rules in node and edge data, thereby endowing the model with fitting capability on downstream graph-mining tasks of unrestricted type. Moreover, the method of this application transforms the original practice of manual parameter tuning that relies on machine-learning engineers and experts into an approach suitable for large-scale, replicable industrial deployment.
For specific limitations of the training apparatus for the dual-view graph neural network model, reference may be made to the limitations of the training method above, which are not repeated here. Each module in the above training apparatus may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Figure 7. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile and/or volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with external clients through a network connection. When the computer program is executed by the processor, the functions or steps on the server side of a training method for a dual-view graph neural network model are implemented.
In one embodiment, a computer device is provided. The computer device may be a client, and its internal structure may be as shown in Figure 8. The computer device includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external server through a network connection. When the computer program is executed by the processor, the functions or steps on the client side of a training method for a dual-view graph neural network model are implemented.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the following steps are implemented:
obtaining multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
In one embodiment, a computer-readable storage medium is provided, storing a computer program; when the computer program is executed by a processor, the following steps are implemented:
obtaining multiple labeled graph data, where the graph data includes nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain the first judgment result of the graph data, the node perspective network being a GNN network;
inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain the second judgment result of the graph data, the edge perspective network being a GNN network;
weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain the judgment result of the graph data, and, based on the judgment result and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
It should be noted that, for the functions or steps that can be implemented by the above computer-readable storage medium or computer device, reference may be made to the related descriptions of the server side and the client side in the foregoing method embodiments; to avoid repetition, they are not described one by one here. The computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the computer program can include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is used as an example; in practical applications, the above functions may be allocated to different functional units or modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features with equivalents; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. A training method for a dual-view graph neural network model, comprising:
    obtaining multiple labeled graph data, the graph data comprising nodes, edges and attribute information, and constructing a node feature matrix and a node adjacency matrix based on the attribute information;
    exchanging the positions of nodes and edges in the graph data to obtain exchanged graph data, and constructing an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
    inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain a first judgment result of the graph data, the node perspective network being a GNN network;
    inputting the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain a second judgment result of the graph data, the edge perspective network being a GNN network;
    weighting the first judgment result of the graph data and the second judgment result of the graph data to obtain a judgment result of the graph data, and, based on the judgment result of the graph data and the labels of the graph data, jointly training the node perspective network and the edge perspective network and updating the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
  2. The training method for a dual-view graph neural network model according to claim 1, wherein inputting the node feature matrix and the node adjacency matrix into the node perspective network to obtain the first judgment result of the graph data comprises:
    inputting the node feature matrix and the node adjacency matrix into the node perspective network and extracting the local features of the nodes;
    comparing the local features of the nodes with preset average features of the nodes to obtain the first judgment result of the graph data.
  3. The training method for a dual-view graph neural network model according to claim 2, wherein the average feature s of the nodes is obtained as s = σ((1/M) Σ_{i=1}^{M} h_i), where σ is the sigmoid function, M is the number of nodes, and h_i is the feature of the i-th node.
  4. The training method for a dual-view graph neural network model according to claim 2, wherein inputting the node feature matrix and the node adjacency matrix into the node perspective network and extracting the local features of the nodes comprises the following process:
    performing feature perturbation on the node feature matrix to obtain a node negative-sample feature matrix;
    inputting the node feature matrix and the node negative-sample feature matrix into the node perspective network, extracting the positive-sample local features and the negative-sample local features of the nodes, and obtaining the local features of the nodes.
  5. The training method for a dual-view graph neural network model according to claim 4, wherein the feature perturbation refers to randomly exchanging multiple rows of the node feature matrix.
  6. The training method for a dual-view graph neural network model according to claim 4, wherein the positive-sample local features of the nodes and the negative-sample local features of the nodes together form the local features of the nodes.
  7. The training method for a dual-view graph neural network model according to claim 4, wherein the negative-sample local features of the nodes are extracted as e = σ(D̂^(-1/2)(A + I)D̂^(-1/2) X θ), where e is the negative-sample local feature of the node, σ is the sigmoid function, D̂ is the degree matrix obtained after adding the node feature matrix and the identity matrix, A is the node feature matrix, I is the identity matrix, X is the node adjacency matrix, and θ is a preset parameter matrix.
  8. The training method for a dual-view graph neural network model according to claim 1, wherein jointly training the node perspective network and the edge perspective network based on the judgment result of the graph data and the labels of the graph data and updating the parameters of the dual-view graph neural network model to be trained comprises the following process:
    obtaining the current parameter values of the node perspective network and the edge perspective network;
    obtaining the loss value of the graph-data judgment result according to preset parameter-update rules, and deriving new parameter values from the loss value and the current parameter values;
    using the new parameter values as the parameters of the dual-view graph neural network model to be trained.
  9. The training method for a dual-view graph neural network model according to claim 8, wherein the parameter-update rule is: differentiate according to the chain rule.
  10. The training method for a dual-view graph neural network model according to claim 8, wherein the new parameter values are computed as w_p = w_{p-1} − ε·Grad_p, where ε is the learning rate, w_p is the current weight of the graph neuron, Grad_p is the error back-propagation gradient of the p-th layer, and w_{p-1} is the weight of the graph neuron at the (p-1)-th training iteration.
  11. The training method for a dual-view graph neural network model according to claim 8, wherein the loss value is computed as Loss = αL1 + βL2, where α is the factor parameter of the loss function L1 of the node perspective network and β is the factor parameter of the loss function L2 of the edge perspective network.
  12. The training method for a dual-view graph neural network model according to claim 11, wherein the loss function L1 of the node perspective network is L1 = (1/(N+M)) · (Σ_{k=1}^{N} log D(h_k, s) + Σ_{a=1}^{M} log(1 − D(h̃_a, s))), where N is the number of positive-sample nodes, M is the number of negative-sample nodes, h_k is the representation vector of the node after local-feature extraction for the k-th positive sample, h̃_a is the representation vector of the node after local-feature extraction for the a-th negative sample, s is the positive-sample global feature, and D is the discriminator.
  13. The training method for a dual-view graph neural network model according to claim 1, wherein before inputting the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained, the method further comprises: averaging the node feature matrix to obtain the average features of the nodes.
  14. The training method for a dual-view graph neural network model according to claim 1, wherein the judgment result J of the graph data is computed as J = αJ1 + βJ2, where J1 is the first judgment result of the graph data, J2 is the second judgment result of the graph data, α is the weight of the first judgment result of the graph data, and β is the weight of the second judgment result of the graph data.
  15. The training method for a dual-view graph neural network model according to claim 14, wherein for social-network-type graph data, α is greater than β.
  16. The training method for a dual-view graph neural network model according to claim 14, wherein for text-sequence-type graph data, α is smaller than β.
  17. The training method for a dual-view graph neural network model according to claim 1, wherein a negative-sample edge feature matrix is obtained by randomly changing the order of any two or more rows of the edge feature matrix.
  18. A training apparatus for a dual-view graph neural network model, comprising:
    a graph data acquisition module, configured to obtain multiple labeled graph data, the graph data comprising nodes, edges and attribute information, and to construct a node feature matrix and a node adjacency matrix based on the attribute information;
    an exchanged graph data acquisition module, configured to exchange the positions of nodes and edges in the graph data to obtain exchanged graph data, and to construct an edge feature matrix and an edge adjacency matrix based on the attribute information of the exchanged graph data;
    a first judgment result acquisition module, configured to input the node feature matrix and the node adjacency matrix into the node perspective network of the dual-view graph neural network model to be trained to obtain a first judgment result of the graph data, the node perspective network being a GNN network;
    a second judgment result acquisition module, configured to input the edge feature matrix and the edge adjacency matrix into the edge perspective network of the dual-view graph neural network model to be trained to obtain a second judgment result of the graph data, the edge perspective network being a GNN network;
    a model training module, configured to weight the first judgment result of the graph data and the second judgment result of the graph data to obtain a judgment result of the graph data, and, based on the judgment result of the graph data and the labels of the graph data, jointly train the node perspective network and the edge perspective network and update the parameters of the dual-view graph neural network model to be trained to obtain a trained dual-view graph neural network model.
  19. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps of the method according to any one of claims 1 to 17 are implemented.
  20. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 17 are implemented.
PCT/CN2022/090086 2022-03-23 2022-04-28 Training method, apparatus, device and medium for dual-view graph neural network model WO2023178793A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210290121.9 2022-03-23
CN202210290121.9A CN114707641A (zh) 2022-03-23 2022-03-23 Training method, apparatus, device and medium for dual-view graph neural network model

Publications (1)

Publication Number Publication Date
WO2023178793A1 (zh)

Family

ID=82168069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090086 WO2023178793A1 (zh) 2022-03-23 2022-04-28 Training method, apparatus, device and medium for dual-view graph neural network model

Country Status (2)

Country Link
CN (1) CN114707641A (zh)
WO (1) WO2023178793A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187610B * 2022-09-08 2022-12-30 中国科学技术大学 Neuron morphology analysis method, device and storage medium based on graph neural network
WO2024065476A1 * 2022-09-29 2024-04-04 华为技术有限公司 Wireless policy optimization method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378372A (zh) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 Graph data recognition method and apparatus, computer device and storage medium
CN112734034A (zh) * 2020-12-31 2021-04-30 平安科技(深圳)有限公司 Model training method, invoking method, apparatus, computer device and storage medium
US20210232918A1 (en) * 2020-01-29 2021-07-29 Nec Laboratories America, Inc. Node aggregation with graph neural networks
CN113705772A (zh) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, apparatus, device and readable storage medium
US20210374499A1 (en) * 2020-05-26 2021-12-02 International Business Machines Corporation Iterative deep graph learning for graph neural networks


Also Published As

Publication number Publication date
CN114707641A (zh) 2022-07-05

Similar Documents

Publication Publication Date Title
WO2021114625A1 (zh) Network structure construction method and apparatus for multi-task scenarios
US10248664B1 (en) Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
WO2023178793A1 (zh) Training method, apparatus, device and medium for dual-view graph neural network model
WO2023000574A1 (zh) Model training method, apparatus, device and readable storage medium
WO2023087558A1 (zh) Few-shot remote sensing image scene classification method based on embedding-smoothing graph neural network
WO2019228317A1 (zh) Face recognition method and apparatus, and computer-readable medium
WO2020232877A1 (zh) Question answer selection method and apparatus, computer device, and storage medium
WO2021212749A1 (zh) Named entity labeling method and apparatus, computer device, and storage medium
US10510021B1 (en) Systems and methods for evaluating a loss function or a gradient of a loss function via dual decomposition
WO2022001805A1 (zh) Neural network distillation method and apparatus
CN110347932B Cross-network user alignment method based on deep learning
CN114048331A Knowledge graph recommendation method and system based on an improved KGAT model
WO2020049385A1 (en) Multi-view image clustering techniques using binary compression
CN113190688B Complex network link prediction method and system based on logical reasoning and graph convolution
WO2021103675A1 (zh) Neural network training and face detection method, apparatus, device, and storage medium
Beck et al. A distributed approximate nearest neighbors algorithm for efficient large scale mean shift clustering
Wu Image retrieval method based on deep learning semantic feature extraction and regularization softmax
CN116403730A Drug interaction prediction method and system based on graph neural network
CN116668327A Incremental learning method and system for few-shot malicious traffic classification based on dynamic retraining
Liu et al. EACP: An effective automatic channel pruning for neural networks
Gupta et al. Relevance feedback based online learning model for resource bottleneck prediction in cloud servers
CN110717116A Link prediction method and system for relational networks, device, and storage medium
Mishra et al. Unsupervised functional link artificial neural networks for cluster Analysis
WO2022267956A1 (zh) Multi-view clustering method and system based on matrix factorization and multi-partition alignment
Mi et al. Fedmdr: Federated model distillation with robust aggregation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932850

Country of ref document: EP

Kind code of ref document: A1