CN113609337A - Pre-training method, device, equipment and medium of graph neural network - Google Patents

Pre-training method, device, equipment and medium of graph neural network

Info

Publication number
CN113609337A
Authority
CN
China
Prior art keywords
node
neural network
nodes
data
characteristic data
Legal status
Pending
Application number
CN202110205745.1A
Other languages
Chinese (zh)
Inventor
荣钰
刘雪怡
徐挺洋
黄文炳
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110205745.1A
Publication of CN113609337A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The application discloses a pre-training method, device, equipment and medium for a graph neural network. The pre-training method selects a plurality of nodes from a graph, obtains first node feature data and second node feature data of those nodes, predicts the second node feature data from the first node feature data and the edge feature data between the nodes, and then updates the selected first nodes according to the prediction results, so that the graph neural network is trained on the node feature data of the updated first nodes. Because the prediction task is built on node-level features and the nodes participating in each round of graph neural network training are optimized according to the prediction results during training, the training task is better matched to the characteristics of graph data, the generalization performance of the pre-trained graph neural network is improved, subsequent fine-tuning for different application tasks becomes easier, and the consumption of computing resources is reduced. The method and device can be widely applied in the technical field of artificial intelligence.

Description

Pre-training method, device, equipment and medium of graph neural network
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a pre-training method, device, equipment and medium for a graph neural network.
Background
In recent years, artificial intelligence technology has developed rapidly, and neural networks have been applied with good results in fields such as image classification, face recognition and automatic driving. The graph, as a data structure, has been widely used in fields such as social networks, academic information and biochemistry, since information such as interpersonal relationships and molecular structures can be expressed conveniently with a graph. A graph neural network is a neural network model that applies machine learning methods to graph data: the data in the graph are input into the graph neural network, and at each node the network continuously aggregates the information of the surrounding nodes to obtain the information of that node, so as to output a characterization vector for each node or for the whole graph. Based on the characterization vectors output by the graph neural network, various machine learning tasks can be conveniently carried out; for example, nodes can be classified according to their characterization vectors by a machine learning algorithm in order to determine which classes the nodes belong to.
In the related art, for a specific task in a given application field, label information for a large number of nodes in that field usually has to be collected for training in order to obtain an applicable graph neural network. However, node label information is often expensive and difficult to obtain, and training a graph neural network separately for each application field consumes a great amount of computing resources, so the cost is high. These technical problems in the related art need to be solved.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
Therefore, an object of the embodiments of the present application is to provide a pre-training method for a graph neural network, which improves generalization performance of the graph neural network through pre-training, facilitates subsequent fine tuning for specific tasks in different application fields, and is beneficial to reducing consumption of computing resources.
In order to achieve this technical purpose, the technical solutions adopted by the embodiments of the present application include the following:
in one aspect, an embodiment of the present application provides a pre-training method for a graph neural network, including the following steps:
acquiring edge characteristic data among all nodes in a graph;
selecting a plurality of first nodes from the nodes, and acquiring node characteristic data of the first nodes; the node characteristic data comprises first node characteristic data and second node characteristic data; the first node characteristic data and the second node characteristic data have different data types;
inputting the first node feature data and the edge feature data into a graph neural network to obtain a first node characterization vector output by the graph neural network;
determining a first loss value predicted by the graph neural network for each first node according to the second node characteristic data and the first node characterization vector;
updating the first node according to the first loss value;
and training the graph neural network according to the updated node characteristic data of the first node.
On the other hand, the embodiment of the present application provides a training method for a molecular property prediction model, including the following steps:
acquiring attribute labels of molecules, atomic characteristic data of each atom in the molecules, and chemical bond data between the atoms;
inputting the atomic feature data and the chemical bond data into a graph neural network obtained by pre-training according to the above method, and obtaining a molecular characterization vector output by the graph neural network;
determining a third loss value predicted by the graph neural network according to the attribute labels and the molecular characterization vectors;
and updating the parameters of the graph neural network according to the third loss value to obtain a trained molecular property prediction model.
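Purely as an illustration of how this fine-tuning stage might look in code, the following PyTorch sketch stands a single linear layer in for the pre-trained graph neural network, ignores the chemical bond data, and invents the dimensions and variable names; it is a simplified assumption, not the implementation prescribed by the method.

```python
import torch
import torch.nn.functional as F

pretrained_gnn = torch.nn.Linear(4, 16)   # stand-in for the pre-trained graph neural network
property_head = torch.nn.Linear(16, 2)    # second neural network: molecule-level prediction
optimizer = torch.optim.Adam(
    list(pretrained_gnn.parameters()) + list(property_head.parameters()), lr=1e-4)

atom_features = torch.randn(9, 4)   # atomic feature data of one molecule (9 atoms, 4 features)
label = torch.tensor([1])           # attribute label of the molecule

atom_emb = torch.tanh(pretrained_gnn(atom_features))
molecule_vector = atom_emb.mean(dim=0, keepdim=True)   # molecular characterization vector

third_loss = F.cross_entropy(property_head(molecule_vector), label)   # third loss value
optimizer.zero_grad()
third_loss.backward()
optimizer.step()                    # fine-tune the pre-trained parameters on the new task
```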
In another aspect, an embodiment of the present application provides a pre-training apparatus for a graph neural network, including:
the first acquisition module is used for acquiring edge characteristic data among all nodes in the graph;
the selection module is used for selecting a plurality of first nodes from the nodes and acquiring node characteristic data of the first nodes; the node characteristic data comprises first node characteristic data and second node characteristic data; the first node characteristic data and the second node characteristic data have different data types;
the first prediction module is used for inputting the first node characteristic data and the edge characteristic data into a graph neural network to obtain a first node characterization vector output by the graph neural network;
a first processing module, configured to determine, according to the second node feature data and the first node characterization vector, a first loss value predicted by the graph neural network for each first node;
a first updating module, configured to update the first node according to the first loss value;
and the first training module is used for training the graph neural network according to the updated node characteristic data of the first node.
Optionally, in some embodiments, the selecting module includes:
and the first selection submodule is used for randomly selecting a plurality of first nodes from the nodes.
Optionally, in some embodiments, the selecting module includes:
the probability obtaining submodule is used for obtaining a preset first selection probability corresponding to each node;
and the second selection submodule is used for selecting the nodes according to the first selection probability to obtain a plurality of first nodes.
Optionally, in some embodiments, the first processing module comprises:
the first input submodule is used for inputting the first node characterization vector into a first neural network to obtain third node characteristic data output by the first neural network;
and the first loss value determining submodule is used for determining a first loss value corresponding to each first node according to the second node characteristic data and the third node characteristic data.
Optionally, in some embodiments, the first updating module comprises:
the sorting submodule is used for sorting the first loss values according to their size;
and the deleting submodule is used for deleting a preset number of first nodes with the largest first loss values to obtain the updated first nodes.
Optionally, in some embodiments, the first updating module comprises:
the normalization submodule is used for normalizing the first loss value corresponding to each first node to obtain a second selection probability corresponding to each first node;
and the third selection submodule is used for updating the first node according to the second selection probability to obtain an updated first node.
Optionally, in some embodiments, the first training module comprises:
the prediction submodule is used for inputting the edge characteristic data and the updated first node characteristic data of the first node into the graph neural network to obtain a second node characterization vector output by the graph neural network;
the processing submodule is used for determining a second loss value predicted by the graph neural network on each first node according to the updated second node characteristic data of the first node and the second node characterization vector;
and the updating submodule is used for updating the parameters of the graph neural network according to the second loss value.
Optionally, in some embodiments, the processing sub-module comprises:
the second input submodule is used for inputting the second node characterization vector into a first neural network to obtain fourth node characteristic data output by the first neural network;
and the second loss value determining submodule is used for determining a second loss value corresponding to each first node according to the fourth node characteristic data and the updated second node characteristic data of the first node.
Optionally, in some embodiments, the first training module comprises:
the node updating submodule is used for carrying out iterative updating on the first node according to the updated node characteristic data of the first node;
and the network training submodule is used for training the graph neural network according to the node characteristic data of the current first nodes when the number of iterative update rounds of the first nodes reaches a preset number.
On the other hand, an embodiment of the present application provides a training apparatus for a molecular property prediction model, including:
the second acquisition module is used for acquiring attribute labels of molecules, atom characteristic data of each atom in the molecules and chemical bond data among the atoms;
the second prediction module is used for inputting the atomic characteristic data and the chemical bond data into the graph neural network obtained by pre-training according to the method, and obtaining the molecular characterization vector output by the graph neural network;
a second processing module, configured to determine a third loss value predicted by the graph neural network according to the attribute labels and the molecular characterization vectors;
and the second training module is used for updating the parameters of the graph neural network according to the third loss value to obtain a trained molecular property prediction model.
Optionally, in some embodiments, the second processing module comprises:
the third input submodule is used for inputting the molecular characterization vector into a second neural network to obtain an attribute prediction result output by the second neural network;
and the third loss value determining submodule is used for determining the third loss value according to the attribute label and the attribute prediction result.
In another aspect, an embodiment of the present application provides a computer device, including:
at least one processor;
at least one memory for storing at least one program;
when at least one of the programs is executed by at least one of the processors, a pre-training method of a graph neural network or a training method of a molecular property prediction model as described above is implemented.
In another aspect, the present application further provides a computer-readable storage medium, in which processor-executable instructions are stored, and when executed by a processor, the processor-executable instructions are used to implement the aforementioned pre-training method for a graph neural network or the training method for a molecular property prediction model.
In another aspect, the present application further provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions stored in the computer-readable storage medium described above; a processor of the computer device described above can read the computer instructions from the computer-readable storage medium, and the computer instructions, when executed by the processor, cause the computer device to perform the pre-training method of a graph neural network or the training method of a molecular property prediction model described above.
Advantages and benefits of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application:
according to the pre-training method for the graph neural network, a plurality of first nodes are selected from a graph, node characteristic data of the first nodes, namely the first node characteristic data and the second node characteristic data with different data types are obtained, the second node characteristic data are predicted according to the first node characteristic data and edge characteristic data between the nodes, then the selected first nodes are updated according to the prediction result, and therefore the graph neural network is trained according to the updated node characteristic data of the first nodes. According to the pre-training method provided by the embodiment of the application, the nodes participating in the graph neural network training are optimized according to the prediction result in the training process based on the node-level characteristic prediction task, so that the training task can enhance the information aggregation capability of the graph neural network around the nodes, the generalization performance of the graph neural network is improved, the follow-up fine adjustment can be conveniently carried out on different application tasks, and the consumption of computing resources is favorably reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments of the present application or in the related technical solutions are described below. It should be understood that the drawings in the following description are only intended to describe some embodiments of the technical solutions of the present application conveniently and clearly, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a pre-training method of a graph neural network provided in an embodiment of the present application;
Fig. 2 is a schematic illustration of graph data provided in an embodiment of the present application;
Fig. 3 is a schematic diagram illustrating the principle of updating the first nodes in the pre-training method of a graph neural network provided in an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating one round of training of the graph neural network in the pre-training method of a graph neural network provided in an embodiment of the present application;
Fig. 5 is a schematic flowchart of a training method of a molecular property prediction model provided in an embodiment of the present application;
Fig. 6 is a schematic diagram illustrating the training principle of a machine learning prediction model in the related art;
Fig. 7 is a schematic diagram illustrating the training principle of a prediction model provided in an embodiment of the present application;
Fig. 8 is a schematic illustration of a graph constructed from chemical molecules provided in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a pre-training apparatus for a graph neural network provided in an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a training apparatus for a molecular property prediction model provided in an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described first, and the terms and expressions referred to in the embodiments of the present application will be applied to the following explanations.
Graph: refers to a data form composed of many interconnected nodes (also called vertices), where a node may represent a person, an organization, etc., and a connection between nodes (called an edge) represents some relationship (e.g. a friend relationship, an affiliation, etc.). A graph may have only one type of node and one type of edge (referred to as a homogeneous graph) or multiple types of nodes or edges (referred to as a heterogeneous graph), and the edges in a graph may be directed (a directed graph) or undirected (an undirected graph).
Graph Neural Network (GNN): a graph-based machine learning method whose input is graph structure data and node characteristic data and whose output is a characterization vector for each node or for the whole graph.
Generalization ability: refers to the ability of a machine learning algorithm to handle input samples it has not seen before.
Source domain: in transfer learning, the knowledge domain from which knowledge is transferred; it contains a large amount of general knowledge available for transfer learning.
Target domain: in transfer learning, the knowledge domain to which the transferred knowledge is migrated, i.e. the field of the task in the machine learning application.
The pre-training method of the graph neural network provided by the embodiments of the present application can be applied within artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics and the like; artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning and the like.
Specifically, the pre-training method of the graph neural network provided in the embodiments of the present application can be applied to various application scenarios in the field of artificial intelligence. For example, in a social-network scenario where other users that a user may know need to be recommended to that user, a prediction model for interpersonal relationships can be built from the graph neural network obtained in the embodiments of the present application; the model then predicts the relationships between the user and other users and returns information about other users the user may know. In a medical-research scenario where the properties of chemical molecules with different structures need to be analysed, a prediction model for molecular properties can be built from the graph neural network obtained in the embodiments of the present application; the model then predicts the chemical properties of molecules, which facilitates drug screening. It should be understood that the above application scenarios are only exemplary and do not limit the implementation of the pre-training method of the graph neural network in the embodiments of the present application. In these application scenarios, an artificial intelligence system can take the graph neural network trained by the pre-training method and fine-tune it with a small amount of training data for the specified task, thereby obtaining the required prediction model for executing that task. The graph neural network obtained by the pre-training method provided by the embodiments of the present application can achieve a satisfactory effect in these application scenarios after only a few parameter updates.
In the embodiment of the application, the artificial intelligence technology mainly involved is machine learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behaviour in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied in all fields of artificial intelligence, and its algorithms are of many kinds. According to the learning mode, machine learning can be divided into supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning; according to the function of the algorithm, it can be divided into regression algorithms, classification algorithms, clustering algorithms, dimensionality-reduction algorithms, ensemble algorithms and so on.
With the development of artificial intelligence technology, neural networks have gradually emerged and are used in various industries. The data used in traditional machine learning are data in Euclidean space, whose most significant characteristic is a regular spatial structure: for example, a picture is a regular square grid and speech data form a one-dimensional sequence, so such data can be represented by one-dimensional or two-dimensional matrices and are simple to process. Moreover, such data typically have another characteristic in applications: the samples are independent of each other. However, in some practical application scenarios the data may have an irregular spatial structure, i.e. they are data in a non-Euclidean space, such as the graphs abstracted from electronic transactions, recommendation systems and social networks; in such graphs the connections of each node to other nodes are neither regular nor fixed, and the nodes may carry information related to one another. A graph neural network is a model specially designed for processing graph-type data: it can model data in non-Euclidean space, capture the internal dependencies among the data, and better generate characterization vectors for nodes or graphs.
In the related art, although graph neural networks can achieve good performance when processing graph-type data, the training of a graph neural network is a major factor limiting its application. For example, to train a graph neural network model for predicting molecular properties, a large amount of molecular graph data has to be modelled and molecular property labels have to be produced to construct a training data set. This is difficult to generalize across fields: on the one hand, producing and acquiring labels is expensive, in some application fields labels are very difficult to obtain, and the amount of data obtained is very limited; on the other hand, for different types of prediction tasks, training a graph neural network from scratch takes a long time and may fail to meet actual application requirements, and even with parallel computing it requires many hardware devices and a large amount of computing resources, so the benefit is low.
In view of this, an embodiment of the present application provides a pre-training method for a graph neural network. The pre-training method selects a plurality of first nodes from a graph and obtains node feature data of the first nodes, where the node feature data include first node feature data and second node feature data; the second node feature data are predicted from the first node feature data and the edge feature data between the nodes, and the selected first nodes are then updated according to the prediction results, so that the graph neural network is trained on the node feature data of the updated first nodes. The pre-training method provided by the embodiments of the present application pre-trains the graph neural network based on the idea of transfer learning, so that the graph neural network can quickly reach a satisfactory effect through fine-tuning in a specific application, which increases training speed and reduces the training time and the computing resources needed to configure prediction models for different application fields. Meanwhile, the pre-training method is based on a node-level feature prediction task and optimizes the nodes participating in training according to the prediction results during training, so that the training task strengthens the graph neural network's ability to aggregate information around a node, improves the generalization performance of the graph neural network, and gives the fine-tuned model higher prediction accuracy.
The following describes a specific implementation of the embodiment of the present application with reference to the drawings, and first, a pre-training method of the graph neural network in the embodiment of the present application will be described.
The embodiments of the application provide a pre-training method of a graph neural network, which can be applied to a terminal, a server, or software in a terminal or server, and is used to realize part of the software's functions. In some embodiments, the terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer or the like; the server may be configured as an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, big data and artificial-intelligence platforms; the software may be a stand-alone application or an applet that depends on a host program, etc., but is not limited to the above. Fig. 1 is an alternative flowchart of a pre-training method of a graph neural network provided in an embodiment of the present application; the method in Fig. 1 includes steps 110 to 160.
Step 110, obtaining edge characteristic data among all nodes in the graph;
as described above, the structure of a graph includes nodes and edges between the nodes, and the data in the graph are accordingly divided into data at the nodes and data at the edges. The node feature data are used to represent the features of a node, and the edge feature data are used to represent the relationship features between the nodes connected at the two ends of an edge; the meanings represented by these features can differ between graphs. It should be noted that, in the embodiments of the present application, the data structures of the node feature data and the edge feature data may include numerical values, vectors or matrices, and the data forms may include numbers, characters, graphics, images, sounds and the like; the specific data structure and form can be set flexibly according to the requirements of the graph building process. In this embodiment, the node feature data may be denoted as V and the edge feature data as E, so that a graph may be denoted as G(V, E).
For example, for a graph constructed from an interpersonal relationship network, each node in the graph may represent a person, and the edges between nodes may represent interpersonal relationships. In this case, the node feature data V may include data representing information about the person at the corresponding node, such as the person's age, height and weight; the node feature data V may then be a numerical value, for example the value 12 representing the age of the person at the node. The data may also be obtained by encoding information related to the person: for example, features of the person such as gender, occupation, preferences and educational background may be encoded, and the encoded data used as the node feature data V; in this case the node feature data V may be a vector, for example the vector (0, 1) indicating that the person at the node is male and the vector (1, 0) indicating that the person at the node is female. Similarly, the edge feature data E may include data representing the relationship between the persons at two nodes; for example, the category of the interpersonal relationship may be encoded and the resulting data used as the edge feature data E.
For a graph constructed according to a molecular structure, each node in the graph may represent an atom, and edges between the nodes may represent chemical bonds between atoms. In this case, the node characteristic data V may be data representing information on an atom at the corresponding node, and may include, for example, data representing the kind of the atom, the number of charges, the number of protons, the number of neutrons, and the like, and the edge characteristic data E may be data representing information on a chemical bond, and may include, for example, data representing the kind of the chemical bond, a valence state, and the like.
It should be noted that the node feature data V and the edge feature data E in the present application may include data of multiple data types, where a data type refers to a category of information. For example, taking the graph constructed from the interpersonal relationship network as an example, the node feature data V of the graph may include a data set composed of the ages of the persons at the respective nodes, denoted a1, where the information contained in a1 is age information; it may also include another data set composed of the heights of the persons at the respective nodes, denoted a2, where the information contained in a2 is height information. The information in data set a1 and data set a2 belongs to two different information categories, i.e. a1 and a2 are data of two different data types, reflecting the features of each node from the age dimension and the height dimension respectively. Together, these two data types form the node feature data V in the embodiments of the present application, which can then be written as V = {a1, a2}. Similarly, the edge feature data E may also contain data of multiple data types. It should be understood that the number of data types in this embodiment may be any integer greater than or equal to 2, and that data of different data types may differ in data structure and data form.
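Purely for illustration, the following Python sketch shows one possible way of organizing such multi-type node and edge feature data; the dictionary layout and field names are assumptions of this example, not a structure required by the application.

```python
# A minimal, hypothetical representation of the interpersonal-relationship graph
# described above. Field names (age, gender, relation) are illustrative only.
graph = {
    "nodes": {
        "A": {"age": 12, "gender": (0, 1)},   # node feature data V of node A
        "B": {"age": 34, "gender": (1, 0)},
        "C": {"age": 27, "gender": (0, 1)},
    },
    "edges": {
        ("A", "B"): {"relation": "friend"},    # edge feature data E of edge A-B
        ("B", "C"): {"relation": "colleague"},
    },
}

# V = {a1, a2}: two data types describing the same nodes from different dimensions.
a1 = {name: attrs["age"] for name, attrs in graph["nodes"].items()}     # age information
a2 = {name: attrs["gender"] for name, attrs in graph["nodes"].items()}  # gender encoding
print(a1, a2)
```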
Step 120, selecting a plurality of first nodes from the nodes, and acquiring node characteristic data of the first nodes; the node characteristic data comprises first node characteristic data and second node characteristic data; the first node characteristic data and the second node characteristic data have different data types;
in the embodiments of the present application, when the graph neural network is pre-trained, the nodes that participate in each round of parameter updating are selected; that is, the node feature data of a subset of nodes, together with the edge feature data of the graph, are used to train the graph neural network's ability to generate node characterization vectors, and the pre-training task is a prediction task on node-level features. The selected nodes participating in training are denoted first nodes. As described above, in the embodiments of the present application the node feature data of a node may have several different data types, so the node feature data of a first node can be divided into two groups according to data type. One group is used to provide input information and is denoted the first node feature data; it is input into the Graph Neural Network (GNN), which predicts the node's features of the other data types. The other group serves as the label of the target data type to be predicted by the graph neural network and is denoted the second node feature data.
Specifically, still taking the aforementioned graph constructed according to the interpersonal relationship network as an example, for the node feature data of the graph, including the data set a1 for representing the age of the person and the data set a2 for representing the height of the person, the data set a1 may be used as the first node feature data, and the data set a2 may be used as the second node feature data. Of course, it is understood that, for node characteristic data having a plurality of data types, the number of data types of the first node characteristic data and the second node characteristic data obtained by dividing the node characteristic data may be any integer greater than or equal to 1.
Referring to Fig. 2, Fig. 2 shows a schematic structural diagram of a graph that includes 4 nodes, namely an A node 211, a B node 212, a C node 213 and a D node 214, where a first edge 221 exists between the A node 211 and the B node 212, a second edge 222 exists between the B node 212 and the C node 213, and a third edge 223 exists between the C node 213 and the D node 214. Taking the graph shown in Fig. 2 as an example, in some embodiments, when the first nodes participating in training are selected, a specified number of first nodes may be selected at random from all nodes in the graph; for example, three nodes may be randomly selected from the A node 211, the B node 212, the C node 213 and the D node 214 as the first nodes, with each node having the same probability of being selected. In some embodiments, a selection probability corresponding to each node may also be preset and recorded as a first selection probability, and a specified number of first nodes are selected from the nodes according to the first selection probabilities. In this case, the higher the first selection probability corresponding to a node, the more likely that node is to be selected as a first node; conversely, the lower the first selection probability, the less likely the node is to be selected. For example, for the four nodes A 211, B 212, C 213 and D 214, the first selection probabilities may be set to 0.3, 0.4, 0.2 and 0.1 in turn; if three nodes are to be selected as first nodes, the most likely selection is the A node 211, the B node 212 and the C node 213, with the D node 214 not being chosen as a first node.
It should be noted that, for convenience of description, the specific numerical values of the first selection probabilities in the embodiments of the present application are given for the case where a single first node is selected from all nodes, and do not represent the actual probability that a node ends up being selected as a first node. For example, even though the first selection probability corresponding to the D node 214 is 0.1, if four nodes are to be selected as first nodes, then all four of the A node 211, the B node 212, the C node 213 and the D node 214 will necessarily be selected, and the probability that the D node 214 is actually selected as a first node is 100%. In addition, in practical applications the first selection probability corresponding to each node can be set and adjusted flexibly as needed and need not be fixed to a specific value.
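As an illustrative sketch only (using the numpy library and made-up node names), the two selection variants described above, uniform random selection and selection driven by preset first selection probabilities, might be written as follows:

```python
import numpy as np

nodes = ["A", "B", "C", "D"]
rng = np.random.default_rng(0)

# Variant 1: randomly select three first nodes, every node equally likely.
uniform_pick = rng.choice(nodes, size=3, replace=False)

# Variant 2: selection driven by preset first selection probabilities per node,
# here 0.3 / 0.4 / 0.2 / 0.1 as in the example above.
first_selection_prob = np.array([0.3, 0.4, 0.2, 0.1])
weighted_pick = rng.choice(nodes, size=3, replace=False, p=first_selection_prob)

print(uniform_pick, weighted_pick)
```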
Step 130, inputting the first node feature data and the edge feature data into a graph neural network to obtain a first node characterization vector output by the graph neural network;
in the embodiments of the present application, after the first nodes have been selected, the first node feature data of the first nodes and the edge feature data of the edges are input into the graph neural network for prediction. The graph neural network spontaneously aggregates the information around each first node according to the edge feature data and the part of that node's feature data constituting the first node feature data, thereby obtaining a vector representation of the first node's features in a high-dimensional space; this vector is recorded as the first node characterization vector. The process can be expressed as:
$\hat{H} = \mathrm{GNN}(E, X)$

where $\hat{H}$ denotes the first node characterization vectors, $\mathrm{GNN}(\cdot,\cdot)$ denotes the graph neural network, $E$ denotes the edge feature data, and $X$ denotes the first node feature data.
In the actual training process, the target feature to be predicted from the first node characterization vector can be set, through feature engineering, to the data type of the second node feature data, so that the prediction accuracy of the first node characterization vector can conveniently be judged against the second node feature data. For example, suppose the first node feature data include two data types, the weight data and the height data of a person, and the second node feature data include one data type, data indicating the person's gender. In this case, the first node characterization vector may be set, through feature engineering, to represent the gender information of the person, and the prediction accuracy of the graph neural network is determined by checking whether the information contained in the first node characterization vector is consistent with the second node feature data.
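To make the aggregation step concrete, here is a deliberately simplified numerical sketch of one message-passing layer producing first node characterization vectors for the graph of Fig. 2; the averaging rule, the tanh nonlinearity and the feature values are assumptions of this example, not the specific GNN architecture claimed by the application.

```python
import numpy as np

# Adjacency of the 4-node graph in Fig. 2 (edges A-B, B-C, C-D), with self-loops.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# First node feature data X: e.g. weight (kg) and height (m) of each person.
X = np.array([[65.0, 1.70],
              [80.0, 1.82],
              [55.0, 1.60],
              [70.0, 1.75]])

W = np.random.default_rng(0).normal(scale=0.1, size=(2, 8))  # trainable layer weights

# Each node averages the features of itself and its neighbours, then projects them
# into a higher-dimensional space: a toy version of H = GNN(E, X).
deg = A.sum(axis=1, keepdims=True)
H = np.tanh((A / deg) @ X @ W)
print(H.shape)  # (4, 8): one 8-dimensional first node characterization vector per node
```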
Step 140, determining a first loss value predicted by the graph neural network for each first node according to the second node characteristic data and the first node characterization vector;
in the embodiments of the present application, after the graph neural network has predicted the first node characterization vector from the first node feature data, the prediction accuracy of the first node characterization vector can be judged against the second node feature data. Specifically, since the first node characterization vector is a vector representation of the first node's features in a high-dimensional space, in some embodiments the first node characterization vector needs to be further processed by a subsequent classification or regression model to obtain the numerical or category information it contains, so that it can be compared with the second node feature data. Therefore, in the embodiments of the present application, when determining the prediction accuracy of the graph neural network for each first node, the first node characterization vector may be input into a simple neural network, denoted the first neural network, whose function is to convert the first node characterization vector into an output corresponding to the data type of the second node feature data.
For example, if the first node characterization vector is set to represent the gender information of a person, then after the first node characterization vector is input into the first neural network, the first neural network outputs result data concerning the gender classification. In some embodiments, the output of the first neural network may be a numerical value; for example, when the first neural network outputs 0, the result obtained by processing the first node characterization vector is that the gender of the first node is male, and when the first neural network outputs 1, the result is that the gender of the first node is female. In the embodiments of the present application, the output data predicted by the first neural network are recorded as the third node feature data. The process of obtaining the third node feature data can thus be expressed as:
$\hat{X} = f(\hat{H})$

where $\hat{X}$ denotes the third node feature data, $f(\cdot)$ denotes the first neural network, and $\hat{H}$ denotes the first node characterization vector.
Based on the above description, it can be understood that, in the embodiment of the present application, the accuracy of the first node characterization vector generated by the graph neural network can be specifically determined by the consistency of the third node characteristic data and the second node characteristic data. The consistency of the third node characteristic data and the second node characteristic data can be measured by the size of the loss value.
In the field of machine learning, the Loss value can be determined by a Loss Function (Loss Function). The loss function is defined on a single training datum, and for the embodiment of the application, each first node corresponds to a first loss value, which is used for measuring the prediction error of the neural network of the graph on the first node characterization vector of the first node. There are many kinds of commonly used loss functions, for example, 0-1 loss function, square loss function, cross entropy loss function, absolute loss function, logarithmic loss function, etc. can be used as the loss function in the embodiment of the present application, and are not described herein. In the embodiment of the present application, a loss function can be optionally selected to determine the first loss value. Taking the example of determining the first loss value by adopting a 0-1 loss function, at this time, when the results of the third node characteristic data and the second node characteristic data are consistent, the first loss value is 0; when the results of the third node characteristic data and the second node characteristic data are not identical, the first loss value is 1. In this embodiment of the present application, a cross entropy loss function may also be used to determine the first loss value, where the first loss value may be represented as:
$\mathrm{loss} = \mathrm{CrossEntropy}(Y, \hat{X})$

where loss denotes the first loss value, $\mathrm{CrossEntropy}(\cdot,\cdot)$ denotes the cross entropy loss function, $Y$ denotes the second node feature data, and $\hat{X}$ denotes the third node feature data.
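Kept deliberately small and hedged, the following PyTorch sketch computes one first loss value per first node with a cross entropy loss; the dimensions, the linear layer standing in for the first neural network f(·), and the gender labels are assumptions of this example.

```python
import torch
import torch.nn.functional as F

h = torch.randn(3, 8)          # first node characterization vectors of three first nodes
f = torch.nn.Linear(8, 2)      # stand-in for the first neural network f(.)

third_node_features = f(h)                        # predicted gender logits per first node
second_node_features = torch.tensor([0, 1, 0])    # second node feature data (gender labels)

# First loss value per first node: cross entropy, kept un-reduced so every
# first node gets its own loss value.
first_loss = F.cross_entropy(third_node_features, second_node_features, reduction="none")
print(first_loss)  # shape (3,), one first loss value per first node
```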
Step 150, updating the first node according to the first loss value;
in the embodiments of the present application, the first nodes participating in the pre-training of the graph neural network are updated according to the first loss values; first nodes for which the graph neural network's prediction is poor can be effectively screened out according to the first loss values. The larger the loss value corresponding to a first node, the weaker the graph neural network's ability to aggregate the relevant information around that node; such a first node can therefore be deleted, so that the graph neural network no longer has access to part of that node's features, and the graph neural network is then trained to strengthen its ability to aggregate information around nodes, improving its generalization performance. In the embodiments of the present application, when screening the first nodes to be deleted, the first nodes may be sorted according to the size of their first loss values, and the first nodes with the largest first loss values are then deleted. Specifically, for example, the first loss values may be sorted from large to small and the several top-ranked first nodes deleted; equivalently, they may be sorted from small to large and the several bottom-ranked first nodes deleted.
For example, assume that for the graph of Fig. 2 the A node 211, the B node 212 and the C node 213 are selected as first nodes, and that the first loss value corresponding to the A node 211 is 5, the first loss value corresponding to the B node 212 is 1, and the first loss value corresponding to the C node 213 is 14. If one of the three first nodes needs to be deleted, then in some embodiments, since the first loss value corresponding to the C node 213 is the largest, the C node 213 may be selected directly for deletion.
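A minimal sketch of this deterministic variant, sorting by first loss value and deleting the top-ranked node, is given below; the numpy library and the array layout are illustrative assumptions.

```python
import numpy as np

# First loss values from the example above: A -> 5, B -> 1, C -> 14.
first_nodes = np.array(["A", "B", "C"])
first_loss = np.array([5.0, 1.0, 14.0])

# Sort from large to small and delete the n top-ranked first nodes (here n = 1).
n = 1
order = np.argsort(-first_loss)          # indices sorted by descending first loss value
deleted, kept = order[:n], order[n:]
print(first_nodes[deleted], first_nodes[kept])  # ['C'] ['A' 'B']
```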
In other embodiments, when screening the first nodes to be deleted, the probability of a first node being screened out may be determined from its first loss value; this deletion probability is recorded as the second selection probability, and it can be obtained by normalizing the first loss values of the first nodes. Take again the case where the first loss value corresponding to the A node 211 is 5, the first loss value corresponding to the B node 212 is 1, and the first loss value corresponding to the C node 213 is 14. After the first loss values of the three first nodes are normalized, the second selection probability corresponding to the A node 211 is 0.25, the second selection probability corresponding to the B node 212 is 0.05, and the second selection probability corresponding to the C node 213 is 0.7. The node to be deleted can then be chosen from the first nodes according to these second selection probabilities, thereby completing one update of the first nodes.
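For the probabilistic variant, an equally small sketch (again assuming the numpy library and the same example numbers) normalizes the first loss values into second selection probabilities and samples the node to delete:

```python
import numpy as np

first_loss = np.array([5.0, 1.0, 14.0])   # first loss values of nodes A, B, C

# Second selection probability: normalize the first loss values so they sum to 1.
second_selection_prob = first_loss / first_loss.sum()
print(second_selection_prob)              # [0.25 0.05 0.7 ]

# Sample the node to remove from the first-node set according to that probability.
rng = np.random.default_rng(0)
deleted = rng.choice(["A", "B", "C"], p=second_selection_prob)
print(deleted)
```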
It should be noted that deletion in the embodiments of the present application means that the node is no longer treated as a first node; that is, the node becomes an ordinary node, and its node feature data do not participate in the current round of graph neural network training. In actual implementation, when the first nodes are updated, the number of first nodes deleted each time may be any integer greater than or equal to 1.
Referring to Fig. 3, in some embodiments the process of screening the first nodes may be carried out iteratively. That is, the above steps 130, 140 and 150 may be executed in several cycles to select the first nodes iteratively: after the first nodes are selected initially, a batch of first nodes is screened out in each iteration, the screening terminates when the number of iterations reaches a preset number, and the node feature data of the first nodes remaining at termination are used as the training data for updating the parameters of the graph neural network. For example, for a graph containing 100 nodes, all nodes may be taken as first nodes at initialization; each first node is then predicted by the graph neural network, the predicted first loss values are determined, and the first nodes are screened according to the first loss values, for example by removing the two nodes with the largest loss values, so that 98 first nodes remain after the first round of iterative screening. The remaining 98 nodes are predicted and screened again by the graph neural network, and so on, until the number of iteration rounds reaches the preset number. If the preset number is 10 rounds, then after the 10th round of screening a total of 80 first nodes is obtained. It should be understood that, in the embodiments of the present application, the number of iteration rounds for screening the first nodes, i.e. the preset number, and the number of first nodes deleted in each screening round can be adjusted flexibly according to specific needs and are not fixed to the situations shown in the above specific examples.
Step 160, training the graph neural network according to the updated node characteristic data of the first node.
In the embodiments of the present application, after the updated first nodes are determined, the accuracy of the graph neural network's predictions can be evaluated according to the first node feature data and second node feature data of the updated first nodes, so that the parameters of the graph neural network are updated through a back-propagation algorithm.
In particular, when updating the parameters of the graph neural network, it is necessary to determine how accurately the graph neural network predicts the current features of the first nodes. This process is similar to steps 130 and 140 above: the loss value predicted by the graph neural network for the current first nodes needs to be determined, and this loss value is recorded as the second loss value.
When the second loss value is determined, the first node feature data of the updated first nodes and the edge feature data of the graph are first input into the graph neural network to obtain the vector representations, in a high-dimensional space, of the features of the updated first nodes output by the graph neural network; these vectors are recorded as the second node characterization vectors. The second node characterization vectors may then be input into the aforementioned first neural network, so that the first neural network converts the second node characterization vectors into data output corresponding to the data type of the second node feature data of the updated first nodes; the output data of the first neural network for the second node characterization vectors are recorded as the fourth node feature data. In the embodiments of the present application, based on the consistency between the fourth node feature data and the second node feature data of the updated first nodes, the accuracy of the second node characterization vectors generated by the graph neural network can be determined; that is, the second loss value can be determined from the fourth node feature data and the second node feature data of the updated first nodes.
For the graph neural network, the accuracy of the model in predicting the features of a single first node can be measured by computing the second loss value through a loss function, which is defined on the node feature data of a single node and measures the prediction error for one training sample. In actual training a graph has many nodes, so a cost function is generally adopted to measure the overall error of the training process; the cost function is defined on the whole training data set and computes the average of the prediction errors over all training data, which measures the prediction quality of the model better. For the graph neural network in the embodiments of the present application, the mean of the second loss values corresponding to the updated first nodes may be taken as the value of the cost function, and the parameters of the model are then updated once by a back-propagation algorithm according to this value.
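An illustrative PyTorch sketch of one such parameter update is given below; the linear layers standing in for the graph neural network and the first neural network, the 80 updated first nodes, and the use of the Adam optimizer are assumptions of this example rather than details specified above.

```python
import torch
import torch.nn.functional as F

gnn_layer = torch.nn.Linear(2, 8)   # toy stand-in for GNN(E, X)
head = torch.nn.Linear(8, 2)        # toy stand-in for the first neural network f(.)
optimizer = torch.optim.Adam(
    list(gnn_layer.parameters()) + list(head.parameters()), lr=1e-3)

x_first = torch.randn(80, 2)            # first node feature data of the updated first nodes
y_second = torch.randint(0, 2, (80,))   # second node feature data (labels)

h = torch.tanh(gnn_layer(x_first))                                   # second node characterization vectors
second_loss = F.cross_entropy(head(h), y_second, reduction="none")   # second loss value per node
cost = second_loss.mean()                                            # cost function: mean over all first nodes

# One parameter update of the graph neural network (and the first neural network) by back-propagation.
optimizer.zero_grad()
cost.backward()
optimizer.step()
```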
The pre-training process of the graph neural network in the embodiment of the present application is described in detail below with reference to a specific embodiment.
Referring to fig. 4, the pre-training of the graph neural network of the present embodiment includes two iterative loop processes: a first node selection process 41 and a parameter update process 42 of the graph neural network. The first node selection process 41 can also be regarded as a process of selecting the nodes that will no longer belong to the first nodes, that is, selecting several nodes to be deleted from the first nodes in each iteration. Assume that a map has N nodes, K rounds of iteration are performed in total, and n nodes are deleted in each round. A probability vector p = (p_1, p_2, ..., p_N) may be constructed, in which each element represents the probability that the corresponding node is selected for deletion in the current round. At initialization, the values of the elements of p may all be equal and sum to 1, i.e. p_i = 1/N, so that the probability of each node being deleted is equal. In the first round of selection, n nodes to be deleted are sampled according to the probability vector p, and after these nodes are deleted, the output of the first neural network, namely the third node characteristic data, is calculated for the remaining first nodes. The cross entropy between the third node characteristic data and the second node characteristic data is then calculated, and the cross-entropy value corresponding to each node is normalized; the obtained values are the probabilities of deleting the nodes in the next round, thereby completing one update round of the probability vector p (the probability of a node that has already been selected is set to 0). Similarly, in each subsequent iteration, the selection probabilities are updated according to the loss values predicted by the graph neural network at each node. When the number of iteration rounds reaches K, the first nodes are considered to have been updated, and the parameter updating process 42 of the graph neural network is entered. The parameter updating process of each round of the graph neural network has been described above and is not repeated here.
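A compact sketch of the probability-vector update in the selection process 41 is given below. It is illustrative only and relies on assumptions not stated in the original text: the per-node cross-entropy losses are assumed to be available as a tensor, and torch.multinomial is used to sample the nodes to delete.

```python
import torch

def sample_deletions(prob, n):
    """Sample n distinct nodes to delete according to the current probability vector."""
    return torch.multinomial(prob, n, replacement=False)

def update_probability(per_node_loss, already_deleted):
    """New selection probabilities: normalized per-node cross-entropy losses,
    with nodes that have already been selected fixed to probability 0."""
    prob = per_node_loss.clone()
    prob[already_deleted] = 0.0
    return prob / prob.sum()

# initialization (every node equally likely to be deleted):
# prob = torch.full((N,), 1.0 / N)
```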
It should be noted that training the graph neural network may require multiple rounds of parameter updating to achieve a desirable effect. In the embodiment of the present application, steps 110 to 160 describe in detail the parameter updating process of one round of pre-training of the graph neural network. During training, once a round of first-node screening is finished, that is, the updated first nodes are obtained, multiple rounds of iterative updating of the model parameters may be completed using this set of first nodes; alternatively, after each round of model parameter updating is completed, the first nodes may be selected again according to steps 110 to 160 to complete another round of training of the graph neural network (see the sketch after this paragraph). In some embodiments, whether the training of the graph neural network is completed may be determined according to the number of parameter update rounds, and when the preset number of training iterations is reached, the training may be considered complete; in some embodiments, completion may also be determined according to the prediction accuracy of the graph neural network, and when, after a certain round of training, the prediction accuracy verified on a test data set reaches a predetermined threshold, the training may be considered complete. In addition, in the practical training process, the parameters of the first neural network may be updated synchronously, so that the graph neural network and the first neural network are trained as a whole, which improves the accuracy of the prediction output of the graph neural network.
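The alternation between node screening and parameter updating described above can be summarized in the following illustrative outer loop. The helpers screen_first_nodes, compute_first_loss and update_gnn_parameters refer to the hypothetical sketches given earlier in this description, and every name below is a placeholder assumption rather than a function defined by the original text.

```python
# Illustrative outer loop alternating first-node screening (process 41) with
# parameter updating (process 42); all helper and attribute names are assumptions.
def pretrain(gnn, first_nn, optimizer, graph, outer_rounds=100, inner_updates=5):
    for _ in range(outer_rounds):
        # re-select the first nodes (e.g. with the screening sketched earlier)
        first_nodes = screen_first_nodes(
            graph.node_ids,
            lambda ids: compute_first_loss(gnn, first_nn, graph, ids))
        # several rounds of parameter updates on the selected first nodes
        for _ in range(inner_updates):
            update_gnn_parameters(gnn, first_nn, optimizer,
                                  graph.node_feats[first_nodes],
                                  graph.edge_feats,
                                  graph.node_labels[first_nodes])
```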
Referring to fig. 5, the embodiment of the present application further provides a method for training a molecular property prediction model; similarly, the method may be applied to a terminal, a server, or software running in the terminal or the server. Fig. 5 is an alternative flowchart of the training method of the molecular property prediction model provided in the embodiment of the present application, and the method in fig. 5 includes steps 510 to 540.
Step 510, acquiring attribute labels of molecules, atom characteristic data of atoms in the molecules and chemical bond data among the atoms;
step 520, inputting the atom characteristic data and the chemical bond data into a graph neural network obtained by pre-training in the method shown in fig. 1 to obtain a molecular characterization vector output by the graph neural network;
step 530, determining a third loss value predicted by the neural network of the graph according to the attribute labels and the molecular characterization vectors;
and step 540, updating the parameters of the graph neural network according to the third loss value to obtain a trained molecular attribute prediction model.
In the embodiment of the present application, the graph neural network obtained by the pre-training method shown in fig. 1 may be applied to a specific task. In the technical field of artificial intelligence, pre-training can alleviate the problem of insufficient training data labels, and is in essence a transfer learning strategy. Referring to fig. 6 and 7, fig. 6 illustrates the training process of a conventional machine learning model, which requires a large amount of labeled sample data and in which, for each target domain task, a corresponding prediction model is trained from initialization; fig. 7 shows the training process of the training method provided in the embodiment of the present application, which adopts a transfer learning strategy: a general graph neural network is obtained through pre-training on a data set of a source domain, and then, for a specific application scenario, that is, a target domain task, the pre-trained graph neural network is fine-tuned on a small data set of the target domain, so as to obtain a prediction model usable for the target domain task.
Specifically, transfer learning migrates relevant task-processing experience from other source domain tasks to generate a prediction model for the target task. Here, the source domain refers to a knowledge domain that has accumulated a large amount of empirical knowledge and can be used by other tasks for learning; the target domain refers to a knowledge domain that may contain only a small amount of empirical knowledge and requires transfer learning from the source domain. For example, assume that the existing image data includes cats and tigers, where the cat images carry a large number of category labels, so that a prediction model can effectively master the cat classification task through supervised learning on the labeled cat images, while the current target task is to classify tiger categories and the existing tiger images carry only a small number of category labels, making effective supervised learning difficult. Transfer learning can transfer part of the knowledge mastered by the prediction model in the cat classification task to the model training of the tiger classification task. In this case, the cat images correspond to the training data of the source domain, and the tiger images correspond to the training data of the target domain. Transfer learning exploits the relevance among different problems to transfer the knowledge of one or more source domains to a target domain and to share the learned information among different tasks, so as to improve the generalization performance of model training. Its essence is to convert the single-task learning of traditional machine learning into a multi-task learning paradigm, find the similarity between a new problem and an existing problem, and use this similarity to transfer the empirical knowledge accumulated on the existing problem to the new problem. For the training method in the embodiment of the application, the source domain can be chosen from a field with abundant graph data, so that a large amount of simple and easily obtained labeled data is available, which reduces the pre-training cost of the graph neural network; the pre-trained graph neural network can then be used to build prediction models for a number of different tasks, so that each prediction model does not need to be trained from initialization, which greatly reduces the training time and computing resource consumption of the models.
For the application scenario of molecular property prediction in the embodiment of the present application, the graph neural network on which the molecular property prediction depends may be pre-trained on any suitable source domain. For the task of predicting molecular properties, a map can be constructed according to the structure of the molecule to be predicted. Referring to fig. 8, fig. 8 shows a schematic diagram of a map constructed from the nicotine molecule, whose chemical formula is C10H14N2. Atoms in the chemical molecule can be used as nodes on the graph and chemical bonds between atoms as edges on the graph; the related attributes of an atom, such as the number of charges, the number of protons, the number of neutrons and the like, are modeled as the node characteristic data of the node where the atom is located, and the attributes of the chemical bonds between atoms, such as the chemical bond type, the chemical bond valence and the like, are modeled as the edge characteristic data of the edges. In this way the chemical molecule is modeled as a map. Based on the data in the map, a vector representation of the molecule in a high-dimensional space can be generated through graph neural network prediction, and this vector is recorded as the molecular characterization vector. When generating the molecular characterization vector, the characterization vector of each atom may be generated first, and then the characterization vectors of all atoms are aggregated through an aggregation layer to obtain the molecular characterization vector. Specifically, the process of obtaining the molecular characterization vector can be expressed as:
g = Σ_{v∈V} h_v^T;

wherein g represents the molecular characterization vector, h_v^T represents the transpose of the characterization vector of atom v, and the aggregation is taken over the set V of all atoms in the molecule. Here, the characterization vectors of the atoms may also be aggregated in other manners, such as maximum pooling, a self-attention mechanism, and the like.
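A minimal sketch of this sum aggregation (reading out a molecular characterization vector from per-atom characterization vectors) is shown below; the tensor names are assumptions, and the alternative poolings mentioned above are indicated in comments.

```python
import torch

def readout_molecule(atom_repr: torch.Tensor) -> torch.Tensor:
    """Aggregate per-atom characterization vectors (shape [num_atoms, dim]) into
    one molecular characterization vector g (shape [dim]). Illustrative sketch."""
    g = atom_repr.sum(dim=0)                 # sum aggregation over all atoms
    # alternatives mentioned in the description: max pooling
    # (atom_repr.max(dim=0).values) or a self-attention readout layer
    return g
```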
In the embodiment of the application, when the graph neural network is fine-tuned to obtain the molecular attribute prediction model, the atom characteristic data of each atom in a molecule and the chemical bond data among the atoms are first input into the graph neural network to obtain the molecular characterization vector output by the graph neural network. The molecular characterization vector can then be input into a simple neural network, denoted as the second neural network, whose function is to map the molecular characterization vector to specific attribute classes, thereby obtaining an attribute prediction result. For example, the second neural network may be a multi-layer perceptron (MLP) classifier, and the attribute prediction result is then obtained by the classifier. The process may be expressed as: oi = MLP(gi), where oi represents the attribute prediction result of the ith molecule, gi represents the molecular characterization vector of the ith molecule, and MLP(·) denotes the multi-layer perceptron classifier.
In the embodiment of the present application, based on the attribute prediction result and the attribute label of the molecule, the prediction loss value of the molecular attribute prediction model may be determined and recorded as the third loss value. The third loss value is calculated in a manner similar to the first loss value and may likewise be determined using a cross entropy loss function, which may specifically be expressed as:
loss(yi,oi)=CrossEntropy(yi,oi);
wherein loss(yi, oi) indicates the third loss value corresponding to the ith molecule, CrossEntropy(·,·) indicates the cross entropy loss function, yi represents the attribute label of the ith molecule, and oi represents the attribute prediction result of the ith molecule. A few rounds of iterative training are then performed on the molecular attribute prediction model according to the third loss value to obtain the trained molecular attribute prediction model. The specific training process is similar to the training process of the graph neural network described above and is not described in detail here.
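As an illustration of this fine-tuning step, the following sketch combines the pre-trained graph neural network, the readout described above, and an MLP head with a cross-entropy loss. It is a minimal sketch under assumptions: the module and tensor names (gnn, mlp_head, atom_feats, bond_feats, label) are placeholders rather than names from the original text, and label is assumed to hold the attribute label as a class index.

```python
import torch.nn.functional as F

def finetune_step(gnn, mlp_head, optimizer, atom_feats, bond_feats, label):
    """One fine-tuning step of the molecular attribute prediction model (sketch)."""
    atom_repr = gnn(atom_feats, bond_feats)    # per-atom characterization vectors
    g = atom_repr.sum(dim=0)                   # molecular characterization vector g_i
    o = mlp_head(g.unsqueeze(0))               # attribute prediction o_i = MLP(g_i)
    third_loss = F.cross_entropy(o, label)     # loss(y_i, o_i) = CrossEntropy(y_i, o_i)
    optimizer.zero_grad()
    third_loss.backward()                      # fine-tune the whole model end to end
    optimizer.step()
    return third_loss.item()
```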
Referring to fig. 9, an embodiment of the present application further discloses a pre-training apparatus for a graph neural network, including:
a first obtaining module 910, configured to obtain edge feature data between nodes in a graph;
a selecting module 920, configured to select a plurality of first nodes from the nodes and obtain node characteristic data of the first nodes; the node characteristic data comprises first node characteristic data and second node characteristic data; the first node characteristic data and the second node characteristic data have different data types;
a first prediction module 930, configured to input the first node feature data and the edge feature data into the graph neural network, so as to obtain a first node characterization vector output by the graph neural network;
a first processing module 940, configured to determine, according to the second node feature data and the first node characterization vector, a first loss value predicted by the graph neural network for each first node;
a first updating module 950, configured to update the first node according to the first loss value;
the first training module 960 is configured to train the graph neural network according to the updated node feature data of the first node.
Optionally, in some embodiments, the selecting module 920 includes:
and the first selection submodule is used for randomly selecting a plurality of first nodes from the nodes.
Optionally, in some embodiments, the selecting module includes:
the probability obtaining submodule is used for obtaining a first selection probability corresponding to each preset node;
and the second selection submodule is used for selecting the nodes according to the first selection probability to obtain a plurality of first nodes.
Optionally, in some embodiments, the first processing module 940 includes:
the first input submodule is used for inputting the first node characterization vector into the first neural network to obtain third node characteristic data output by the first neural network;
and the first loss value determining submodule is used for determining a first loss value corresponding to each first node according to the second node characteristic data and the third node characteristic data.
Optionally, in some embodiments, the first updating module 950 includes:
the sorting submodule is used for sorting the first loss value according to the size of the first loss value;
and the deleting submodule is used for deleting the first nodes with the maximum first loss value and the preset number to obtain the updated first nodes.
Optionally, in some embodiments, the first updating module 950 includes:
the normalization submodule is used for normalizing the first loss value corresponding to each first node to obtain a second selection probability corresponding to each first node;
and the third selection submodule is used for updating the first node according to the second selection probability to obtain the updated first node.
Optionally, in some embodiments, the first training module 960 includes:
the prediction submodule is used for inputting the edge characteristic data and the updated first node characteristic data of the first node into the graph neural network to obtain a second node characterization vector output by the graph neural network;
the processing submodule is used for determining a second loss value predicted by the graph neural network on each first node according to the updated second node characteristic data of the first node and the second node characterization vector;
and the updating submodule is used for updating the parameters of the graph neural network according to the second loss value.
Optionally, in some embodiments, the processing submodule includes:
the second input submodule is used for inputting the second node characterization vector into the first neural network to obtain fourth node characteristic data output by the first neural network;
and the second loss value determining submodule is used for determining a second loss value corresponding to each first node according to the fourth node characteristic data and the updated second node characteristic data of the first node.
Optionally, in some embodiments, the first training module 960 includes:
the node updating submodule is used for carrying out iterative updating on the first node according to the updated node characteristic data of the first node;
and the network training submodule is used for training the graph neural network according to the current node characteristic data of the first node when the iteration updating round of the first node reaches the preset number.
It can be understood that the contents of the embodiment of the pre-training method of the graph neural network shown in fig. 1 are all applicable to this apparatus embodiment; the functions implemented by this apparatus embodiment are the same as those of the pre-training method embodiment shown in fig. 1, and the beneficial effects achieved are also the same as those achieved by the pre-training method embodiment shown in fig. 1.
Referring to fig. 10, an embodiment of the present application further discloses a training apparatus for a molecular property prediction model, including:
a second obtaining module 1010, configured to obtain attribute labels of the molecules, atom feature data of each atom in the molecules, and chemical bond data between the atoms;
a second prediction module 1020, configured to input the atomic feature data and the chemical bond data into the graph neural network obtained by pre-training in the method shown in fig. 1, so as to obtain a molecular characterization vector output by the graph neural network;
a second processing module 1030, configured to determine a third loss value predicted by the neural network of the graph according to the attribute labels and the molecular characterization vectors;
and the second training module 1040 is configured to update parameters of the neural network according to the third loss value, so as to obtain a trained molecular attribute prediction model.
Optionally, in some embodiments, the second processing module 1030 comprises:
the third input submodule is used for inputting the molecular characterization vector into the second neural network to obtain an attribute prediction result output by the second neural network;
and the third loss value determining submodule is used for determining a third loss value according to the attribute label and the attribute prediction result.
It is understood that the contents in the embodiment of the training method of the molecular property prediction model shown in fig. 5 are all applicable to the embodiment of the present apparatus, the functions implemented in the embodiment of the present apparatus are the same as those in the embodiment of the training method of the molecular property prediction model shown in fig. 5, and the beneficial effects achieved are also the same as those achieved in the embodiment of the training method of the molecular property prediction model shown in fig. 5.
Referring to fig. 11, an embodiment of the present application further discloses a computer device, including:
at least one processor 1110;
at least one memory 1120 for storing at least one program;
when the at least one program is executed by the at least one processor 1110, the at least one processor 1110 may be caused to implement the method embodiment for pre-training a neural network as illustrated in FIG. 1 or the method embodiment for training a molecular property prediction model as illustrated in FIG. 5.
It is understood that the contents of the pre-training method embodiment of the neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5 are all applicable to the embodiment of the computer device, the functions implemented by the embodiment of the computer device are the same as those of the pre-training method embodiment of the neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5, and the beneficial effects achieved by the embodiment of the pre-training method of the neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5 are also the same.
Also disclosed in an embodiment of the present application is a computer-readable storage medium, in which a program executable by a processor is stored, and the program executable by the processor is used for implementing the embodiment of the pre-training method of the graph neural network shown in fig. 1 or the embodiment of the training method of the molecular property prediction model shown in fig. 5 when being executed by the processor.
It is understood that the contents of the pre-training method embodiment of the graph neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5 are all applicable to the computer-readable storage medium embodiment, the functions implemented by the computer-readable storage medium embodiment are the same as the pre-training method embodiment of the graph neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5, and the beneficial effects achieved are the same as the beneficial effects achieved by the pre-training method embodiment of the graph neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, wherein the computer instructions are stored in the computer readable storage medium; the computer instructions may be read by a processor of the computer apparatus shown in fig. 11 from the computer-readable storage medium described above, and the computer instructions, when executed by the processor, cause the computer apparatus to perform the pre-training method of the neural network shown in fig. 1 or the training method of the molecular property prediction model shown in fig. 5.
It is understood that the contents of the pre-training method embodiment of the neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5 are all applicable to the computer program product or the computer program embodiment, the functions implemented by the computer program product or the computer program embodiment are the same as the pre-training method embodiment of the neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5, and the beneficial effects achieved by the pre-training method embodiment of the neural network shown in fig. 1 or the training method embodiment of the molecular property prediction model shown in fig. 5 are also the same.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more of the functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (15)

1. A pre-training method of a graph neural network is characterized by comprising the following steps:
acquiring edge characteristic data among all nodes in a map;
selecting a plurality of first nodes from the nodes, and acquiring node characteristic data of the first nodes; the node characteristic data comprises first node characteristic data and second node characteristic data; the first node characteristic data and the second node characteristic data have different data types;
inputting the first node feature data and the edge feature data into a graph neural network to obtain a first node characterization vector output by the graph neural network;
determining a first loss value predicted by the graph neural network for each first node according to the second node characteristic data and the first node characterization vector;
updating the first node according to the first loss value;
and training the graph neural network according to the updated node characteristic data of the first node.
2. The method of claim 1, wherein said selecting a number of first nodes from said nodes comprises:
and randomly selecting a plurality of first nodes from the nodes.
3. The method of claim 1, wherein said selecting a number of first nodes from said nodes comprises:
acquiring a first preset selection probability corresponding to each node;
and selecting the nodes according to the first selection probability to obtain a plurality of first nodes.
4. The method of claim 1, wherein determining the first loss value predicted by the graph neural network for each of the first nodes based on the second node characterization data and the first node characterization vector comprises:
inputting the first node characterization vector into a first neural network to obtain third node characteristic data output by the first neural network;
and determining a first loss value corresponding to each first node according to the second node characteristic data and the third node characteristic data.
5. The method of any of claims 1-4, wherein the updating the first node based on the first penalty value comprises:
sorting the first loss values according to the size of the first loss values;
and deleting the first nodes with the maximum first loss value and the preset number to obtain the updated first nodes.
6. The method of any of claims 1-4, wherein the updating the first node based on the first penalty value comprises:
normalizing the first loss value corresponding to each first node to obtain a second selection probability corresponding to each first node;
and updating the first node according to the second selection probability to obtain an updated first node.
7. The method of claim 1, wherein training the graph neural network according to the updated node feature data of the first node comprises:
inputting the edge feature data and the updated first node feature data of the first node into the graph neural network to obtain a second node characterization vector output by the graph neural network;
determining a second loss value predicted by the graph neural network for each first node according to the updated second node characteristic data of the first node and the second node characterization vector;
and updating the parameters of the graph neural network according to the second loss value.
8. The method of claim 7, wherein determining the second loss value predicted by the neural network for each of the first nodes based on the updated second node feature data for the first node and the second node characterization vector comprises:
inputting the second node characterization vector into a first neural network to obtain fourth node characteristic data output by the first neural network;
and determining a second loss value corresponding to each first node according to the fourth node characteristic data and the updated second node characteristic data of the first node.
9. The method of claim 1, wherein training the graph neural network according to the updated node feature data of the first node comprises:
performing iterative updating on the first node according to the updated node characteristic data of the first node;
and when the iteration updating turn of the first node reaches a preset number, training the graph neural network according to the node characteristic data of the current first node.
10. A training method of a molecular property prediction model is characterized by comprising the following steps:
acquiring attribute labels of molecules, atomic characteristic data of each atom in the molecules and chemical bond data among the atoms;
inputting the atomic feature data and the chemical bond data into a graph neural network pre-trained by the method of any one of claims 1-9, to obtain molecular characterization vectors output by the graph neural network;
determining a third loss value predicted by the graph neural network according to the attribute labels and the molecular characterization vectors;
and updating the parameters of the graph neural network according to the third loss value to obtain a trained molecular attribute prediction model.
11. The method of claim 10, wherein determining a third loss value for the neural network prediction based on the attribute tags and the molecular characterization vectors comprises:
inputting the molecular characterization vector into a second neural network to obtain an attribute prediction result output by the second neural network;
and determining the third loss value according to the attribute label and the attribute prediction result.
12. An apparatus for pre-training a neural network, comprising:
the first acquisition module is used for acquiring edge characteristic data among all nodes in the map;
the selection module is used for selecting a plurality of first nodes from the nodes and acquiring node characteristic data of the first nodes; the node characteristic data comprises first node characteristic data and second node characteristic data; the first node characteristic data and the second node characteristic data have different data types;
the first prediction module is used for inputting the first node characteristic data and the edge characteristic data into a graph neural network to obtain a first node characterization vector output by the graph neural network;
a first processing module, configured to determine, according to the second node feature data and the first node characterization vector, a first loss value predicted by the graph neural network for each first node;
a first updating module, configured to update the first node according to the first loss value;
and the first training module is used for training the graph neural network according to the updated node characteristic data of the first node.
13. A training device for molecular property prediction model, comprising:
the second acquisition module is used for acquiring attribute labels of molecules, atom characteristic data of each atom in the molecules and chemical bond data among the atoms;
a second prediction module for inputting the atomic feature data and the chemical bond data into a graph neural network pre-trained by the method of any one of claims 1-9, resulting in a molecular characterization vector output by the graph neural network;
a second processing module, configured to determine a third loss value predicted by the neural network of the graph according to the attribute labels and the molecular characterization vectors;
and the second training module is used for updating the parameters of the graph neural network according to the third loss value to obtain a trained molecular attribute prediction model.
14. A computer device, comprising:
at least one processor;
at least one memory for storing at least one program;
the method according to any of claims 1-11, when at least one of said programs is executed by at least one of said processors.
15. A computer-readable storage medium in which a program executable by a processor is stored, characterized in that: the processor executable program is for implementing the method of any one of claims 1-11 when executed by a processor.
CN202110205745.1A 2021-02-24 2021-02-24 Pre-training method, device, equipment and medium of graph neural network Pending CN113609337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110205745.1A CN113609337A (en) 2021-02-24 2021-02-24 Pre-training method, device, equipment and medium of graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110205745.1A CN113609337A (en) 2021-02-24 2021-02-24 Pre-training method, device, equipment and medium of graph neural network

Publications (1)

Publication Number Publication Date
CN113609337A true CN113609337A (en) 2021-11-05

Family

ID=78303265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110205745.1A Pending CN113609337A (en) 2021-02-24 2021-02-24 Pre-training method, device, equipment and medium of graph neural network

Country Status (1)

Country Link
CN (1) CN113609337A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155546A1 (en) * 2022-02-17 2023-08-24 腾讯科技(深圳)有限公司 Structure data generation method and apparatus, device, medium, and program product
WO2023213233A1 (en) * 2022-05-06 2023-11-09 墨奇科技(北京)有限公司 Task processing method, neural network training method, apparatus, device, and medium

Similar Documents

Publication Publication Date Title
Zhang et al. A survey on neural network interpretability
CN110796190B (en) Exponential modeling with deep learning features
Swaminathan et al. Sparse low rank factorization for deep neural network compression
WO2021063171A1 (en) Decision tree model training method, system, storage medium, and prediction method
WO2021159776A1 (en) Artificial intelligence-based recommendation method and apparatus, electronic device, and storage medium
US20210390397A1 (en) Method, machine-readable medium and system to parameterize semantic concepts in a multi-dimensional vector space and to perform classification, predictive, and other machine learning and ai algorithms thereon
CN111667022A (en) User data processing method and device, computer equipment and storage medium
JP7403909B2 (en) Operating method of sequence mining model training device, operation method of sequence data processing device, sequence mining model training device, sequence data processing device, computer equipment, and computer program
CN111950596A (en) Training method for neural network and related equipment
CN110889450B (en) Super-parameter tuning and model construction method and device
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN114298122A (en) Data classification method, device, equipment, storage medium and computer program product
CN113609337A (en) Pre-training method, device, equipment and medium of graph neural network
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
Park et al. Bayesian weight decay on bounded approximation for deep convolutional neural networks
CN114611692A (en) Model training method, electronic device, and storage medium
WO2023174064A1 (en) Automatic search method, automatic-search performance prediction model training method and apparatus
CN116975743A (en) Industry information classification method, device, computer equipment and storage medium
Rahul et al. Deep auto encoder based on a transient search capsule network for student performance prediction
CN116957006A (en) Training method, device, equipment, medium and program product of prediction model
CN115858388A (en) Test case priority ordering method and device based on variation model mapping chart
CN114898184A (en) Model training method, data processing method and device and electronic equipment
CN114936890A (en) Counter-fact fairness recommendation method based on inverse tendency weighting method
Tomar et al. A multilabel approach using binary relevance and one-versus-rest least squares twin support vector machine for scene classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination