CN115035349B - Point representation learning method, representation method and device of graph data and storage medium - Google Patents

Point representation learning method, representation method and device of graph data and storage medium

Info

Publication number
CN115035349B
CN115035349B (application CN202210736863.XA)
Authority
CN
China
Prior art keywords
node
self
subgraph
graph data
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210736863.XA
Other languages
Chinese (zh)
Other versions
CN115035349A (en)
Inventor
朱文武
王鑫
李昊阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210736863.XA priority Critical patent/CN115035349B/en
Publication of CN115035349A publication Critical patent/CN115035349A/en
Application granted granted Critical
Publication of CN115035349B publication Critical patent/CN115035349B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G06V10/762: using clustering, e.g. of similar faces in social networks
    • G06V10/764: using classification, e.g. of video objects
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82: using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a point representation learning method, a representation method, a device, and a storage medium for graph data, and belongs to the technical field of data processing. The learning method comprises the following steps: inputting graph data serving as training samples into a preset node characterization model; processing the graph data through the preset node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data, wherein the stable self-subgraph is used for characterizing the stable characteristics of the node, and the unstable self-subgraph is used for characterizing the environmental information of the node; and the preset node characterization model learns the stable self-subgraphs of all nodes in the graph data to obtain a node characterization model after learning. The application aims to adaptively ensure the prediction effect of a model when a distribution difference exists between the test environment and the training environment.

Description

Point representation learning method, representation method and device of graph data and storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a point representation learning method, a representation method, a device and a storage medium of graph data.
Background
At present, graph data is already applied in various scenarios, such as social networks, traffic networks, Internet-of-Things networks and the like. However, graph data generally cannot be fed directly into a deep learning algorithm; only after the graph data has been vectorized into representations can the related tasks be solved on it.
Graph data representation learning is a very important research problem, and its core question is how to compute the representations of the nodes in graph data. Currently, common graph data representation learning mainly comprises three lines of work: graph neural network representation learning, node-number-generalized graph neural networks, and analysis of the expressive power of graph neural networks.
The vectorized node representations obtained by these methods fit the training environment, but graph data in the test environment often has a more complex data distribution. When the training data is insufficient to reflect the true distribution of the data, the test data and the training data differ in distribution; although a network for node characterization of graph data obtained by these methods achieves a good prediction effect on the training data set, its performance tends to drop significantly due to distribution migration when the network is actually applied in the test environment.
Therefore, how to adaptively ensure the prediction effect of the model when the test environment and the training environment have distribution differences is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a point representation learning method, a representation method, a device and a storage medium for graph data, which aim to adaptively ensure the prediction effect of a model when a distribution difference exists between the test environment and the training environment.
In a first aspect, an embodiment of the present application provides a method for learning a point representation of graph data under distribution migration, where the learning method includes:
inputting graph data serving as training samples into a preset node representation model;
Processing the graph data through the preset node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data, wherein the stable self-subgraph is used for characterizing the stable characteristics of the nodes, and the unstable self-subgraph is used for characterizing the environmental information of the nodes;
And the preset node characterization model learns the stable self-subgraphs of all nodes in the graph data to obtain a node characterization model after learning.
Optionally, the processing the graph data through the preset node characterization model, determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data, including:
the preset node characterization model determines a self-graph of each node in the graph data;
in the self-graph of any node, updating the node representation of the node according to the information of the neighbor node of the node;
According to the node characteristics updated by the node, respectively calculating the similarity between the node and a first-order neighbor node in the self-graph;
and determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and the first-order neighbor node in the self-graph.
Optionally, updating the node representation of the node according to the information of the neighboring node of the node, including:
carrying out neighbor aggregation on all node information in the self graph of the node;
And updating the information of the node by using the information of the neighbor nodes of the node to obtain updated node characterization.
Optionally, determining the stable self-subgraph and the unstable self-subgraph corresponding to the node according to the similarity between the node and the first-order neighbor nodes in its self-graph includes:
if the similarity between the node and any one of the first-order neighbor nodes is larger than a preset value, the edge between the two nodes is the edge in the stable self-subgraph;
and if the similarity between the node and any one of the first-order neighbor nodes is smaller than or equal to a preset value, the edge between the two nodes is the edge in the unstable self-subgraph.
Optionally, after determining the stable self-subgraph and the unstable self-subgraph of each node in the graph data, the learning method further includes:
Characterizing the stable self subgraph and the unstable self subgraph of each node to obtain the characteristics corresponding to the stable self subgraph and the unstable self subgraph, wherein the characteristics are used for describing the clustering characteristics of each subgraph;
clustering the unstable self-subgraphs of all nodes in the graph data, wherein the clustered unstable self-subgraphs represent the environment information of the corresponding nodes.
In a second aspect, an embodiment of the present application provides a method for characterizing points of graph data under distribution migration, where the characterizing method includes:
inputting graph data to be characterized into the node characterization model according to the first aspect of the embodiment;
Processing the graph data to be characterized through the node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data to be characterized, wherein the stable self-subgraph of one node is used for characterizing the stable characteristics of the node, and the unstable self-subgraph of one node is used for characterizing the environmental information of the node;
and the node characterization model predicts according to the stable self-subgraphs of all nodes in the graph data to be characterized, and outputs a node characterization result of the graph data to be characterized.
Optionally, the training samples of the node characterization model are graph data whose data distribution is the same as or different from that of the graph data to be characterized.
In a third aspect, an embodiment of the present application provides a point representation learning apparatus for graph data under distribution migration, the learning apparatus including:
the training input module is used for inputting graph data serving as a training sample into a preset node representation model;
The processing module is used for processing the graph data through the preset node characterization model and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data, wherein the stable self-subgraph is used for characterizing the stable characteristics of the nodes, and the unstable self-subgraph is used for characterizing the environmental information of the nodes;
And the learning module is used for enabling the preset node characterization model to learn the stable self-subgraphs of all nodes in the graph data, and obtaining the node characterization model after learning.
Optionally, the processing module includes:
the self-graph determining unit is used for determining a self-graph of each node in the graph data through the preset node characterization model;
The updating unit is used for updating the node representation of any node according to the information of the neighbor node of the node in the self-graph of the node;
the similarity calculation unit is used for calculating the similarity between the node and the first-order neighbor node in the self-graph according to the node representation updated by the node;
and the self-subgraph determining unit is used for determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and the first-order neighbor node in the self-graph.
Optionally, the updating unit includes:
An aggregation subunit, configured to perform neighbor aggregation on all node information in the self-graph of the node;
And the updating subunit is used for updating the information of the node by utilizing the information of the neighbor node of the node to obtain the updated node representation.
Optionally, the self subgraph determining unit includes:
A stable self-subgraph determination subunit, configured to, when the similarity between the node and any one of the first-order neighbor nodes is greater than a preset value, take an edge between two nodes as an edge in the stable self-subgraph;
and the unstable self-subgraph determination subunit is used for determining that the edge between the two nodes is the edge in the unstable self-subgraph when the similarity between the node and any one of the first-order neighbor nodes is smaller than or equal to a preset value.
Optionally, the learning device further includes:
the characterization module is used for performing characterization processing on the stable self-subgraph and the unstable self-subgraph of each node to obtain characteristics corresponding to the stable self-subgraph and the unstable self-subgraph, wherein the characteristics are used for describing clustering characteristics of each subgraph;
And the clustering module is used for clustering the unstable self-subgraphs of all the nodes in the graph data, and the clustered unstable self-subgraphs represent the environment information of the corresponding nodes.
In a fourth aspect, an embodiment of the present application provides a point characterization apparatus for graph data under distribution migration, the characterization apparatus including:
The input module is used for inputting graph data to be characterized into the node characterization model according to the first aspect of the embodiment;
The prediction module is used for processing the graph data to be characterized through the node characterization model and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data to be characterized, wherein the stable self-subgraph of one node is used for characterizing the stable characteristics of the node, and the unstable self-subgraph of one node is used for characterizing the environmental information of the node; and the node characterization model predicts according to the stable self-subgraphs of all nodes in the graph data to be characterized, and outputs a node characterization result of the graph data to be characterized.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the point representation learning method of graph data under distribution migration according to the first aspect of the embodiments, and/or implements the point representation method of graph data under distribution migration according to the second aspect of the embodiments.
The beneficial effects are that:
In the learning method, the graph data is processed at the granularity of individual nodes, so that the preset node characterization model can identify a stable self-subgraph and an unstable self-subgraph for each node: the stable self-subgraph represents the stable characteristics of the node, and the unstable self-subgraph represents the environmental information of the node. The preset node characterization model learns and trains only on the stable self-subgraphs that represent the stable characteristics of the nodes; that is, the preset node characterization model predicts according to the stable self-subgraphs of the nodes and outputs the characterization result of the graph data.
In other words, the learning method identifies, distinguishes and disentangles stable characteristic information and unstable environmental information in the process of characterizing the points of the graph data, and removes spurious correlations such as the unstable environmental information of the nodes, so that the model is guaranteed to predict according to the stable characteristics of the nodes in the graph data. When the test environment differs from the training environment, the node characterization model obtained through training can still remove the influence of unstable environmental factors in the data distribution on the node characterization and obtain a more accurate characterization and prediction result, so the prediction effect of the model can be adaptively ensured when the test environment and the training environment differ in distribution.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a point representation learning method of graph data under distribution migration according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a point characterization method of graph data under distribution migration according to an embodiment of the present application;
FIG. 3 is a functional block diagram of a point representation learning device of graph data under distribution migration according to an embodiment of the present application;
FIG. 4 is a functional block diagram of a point characterization device of graph data under distribution migration according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As graph data is applied more and more widely, graph data representation learning has become a very important research problem, and its core question is how to compute the representations of the nodes in graph data. Currently, common graph data representation learning mainly comprises three lines of work: graph neural network representation learning, node-number-generalized graph neural networks, and analysis of the expressive power of graph neural networks.
The graph neural network representation learning has strong graph representation capability, achieves ideal performance on a plurality of tasks, mainly adopts a neighbor aggregation mechanism, and carries out iterative updating on node representation according to neighbor information.
Node-number-generalized graph neural networks give prediction results that are independent of the number of nodes in the graph-structured data, so a model trained on a small graph can be transferred to a large graph and achieve good results on it. However, this line of work is only suitable for the case where the number of nodes changes between the training environment and the test environment; it cannot adapt to other types of distribution change, especially distribution changes at the node level, and therefore cannot be used for the distribution-generalized node representation learning problem on graph data in real scenarios.
Work on the expressive power of graph neural networks theoretically analyzes their ability to solve tasks and gives theoretical guarantees when the distributions of the training set and the test set are close, but it cannot be applied when the training environment is inconsistent with the test environment; the resulting representations therefore cannot accurately describe graph-structured data in an unknown test environment, which affects the performance of downstream tasks to a certain extent.
The graph neural networks adopted by existing methods are based on the identical-distribution assumption, i.e., all node data are assumed to come from the same data distribution, and these methods focus on fitting the training data distribution. However, the data distribution of the graph data used at test time often deviates from the data distribution at training time, so how to adaptively ensure the prediction effect of the model when the test environment and the training environment have distribution differences is a problem to be solved.
FIG. 1 is a flowchart showing the steps of a point representation learning method of graph data under distribution migration in an embodiment of the present application; the method may specifically include the following steps:
S101: And inputting the graph data serving as a training sample into a preset node representation model.
The graph data comprises nodes and edges between the nodes, and each node carries information; for example, if the node represents a person, the information carried by the node may be personal information such as gender, hobbies and the like of the person, and if the node represents a medicine, the information carried by the node may be information such as composition, category and the like; of course, in different application environments, even though the nodes represent the same person, the information carried by them may be different.
In the learning and training process of the preset node characterization model, the number of graph data serving as a training sample can be determined according to the actual training requirement, and the method is not limited in the embodiment; the graph data as training samples carries labels, which are related to the task being performed, and the present embodiment is not limited thereto.
S102: and processing the graph data through the preset node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data.
Specifically, the process of determining the stable self-subgraph and the unstable self-subgraph for each node is as follows:
A1: and the preset node characterization model determines the self-graph of each node in the graph data.
The self-graph of a node is centered on the node itself and comprises its first-order and second-order neighbor nodes.
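As an illustration of step A1, the following is a minimal Python sketch, not the patented implementation: it assumes the graph data is held in an undirected networkx graph with nodes numbered 0..n-1, and the helper name ego_graphs is hypothetical.

```python
# Minimal sketch of step A1 (assumption: graph data stored as an undirected
# networkx graph): the "self-graph" of a node is its 2-hop ego-graph,
# i.e. the node plus its first- and second-order neighbors.
import networkx as nx

def ego_graphs(graph: nx.Graph, radius: int = 2) -> dict:
    """Return {node: ego-graph of that node within `radius` hops}."""
    return {v: nx.ego_graph(graph, v, radius=radius) for v in graph.nodes}

# Usage on a small example graph.
g = nx.karate_club_graph()
self_graphs = ego_graphs(g, radius=2)
print(self_graphs[0].number_of_nodes(), self_graphs[0].number_of_edges())
```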
A2: in the self-graph of any node, the node characterization of the node is updated according to the information of the neighbor nodes of the node.
Specifically, to improve the accuracy of the node characterization, neighbor aggregation is performed on the information of all nodes in the self-graph of the node: the information of the node's neighbor nodes is integrated into the node by direct summation or weighted summation, and the node's own information is updated with the information of its neighbor nodes to obtain the updated node characterization. Each iteration aggregates only first-order neighbors, so aggregation over higher-order neighbors can be achieved through multiple iterations.
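The sketch below illustrates this aggregation step under the assumption of a simple mean aggregation with an equal-weight mixing of a node's own features and its neighbors' mean; an actual model may use any graph neural network layer (weighted summation, attention, and the like).

```python
# Sketch of step A2 (assumptions: nodes numbered 0..n-1, mean aggregation).
# Each iteration aggregates only first-order neighbors, so running several
# iterations propagates information from higher-order neighbors.
import numpy as np
import networkx as nx

def aggregate(graph: nx.Graph, feats: np.ndarray, hops: int = 2) -> np.ndarray:
    h = feats.copy()
    for _ in range(hops):
        new_h = np.empty_like(h)
        for v in graph.nodes:
            nbrs = list(graph.neighbors(v))
            nbr_mean = h[nbrs].mean(axis=0) if nbrs else np.zeros_like(h[v])
            new_h[v] = 0.5 * h[v] + 0.5 * nbr_mean  # update the node with neighbor info
        h = new_h
    return h
```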
A3: and respectively calculating the similarity between the node and the first-order neighbor node in the self-graph according to the node representation updated by the node.
A4: and determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and the first-order neighbor node in the self-graph.
Specifically, if the similarity between the node and any one of the first-order neighbor nodes is greater than a preset value, the edge between the two nodes is the edge in the stable self-subgraph; and if the similarity between the node and any one of the first-order neighbor nodes is smaller than or equal to a preset value, the edge between the two nodes is the edge in the unstable self-subgraph.
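A hedged sketch of steps A3-A4 is given below; the cosine similarity measure and the threshold value 0.5 are illustrative assumptions, since the description only requires some similarity compared against a preset value.

```python
# Sketch of steps A3-A4 (assumptions: cosine similarity, threshold 0.5,
# node features `h` as produced by the aggregation step above).
import numpy as np
import networkx as nx

def split_edges(graph: nx.Graph, h: np.ndarray, threshold: float = 0.5):
    """Return per-node lists of stable and unstable edges to 1-hop neighbors."""
    stable, unstable = {}, {}
    for v in graph.nodes:
        s_edges, u_edges = [], []
        for u in graph.neighbors(v):
            denom = np.linalg.norm(h[v]) * np.linalg.norm(h[u]) + 1e-12
            sim = float(h[v] @ h[u] / denom)
            (s_edges if sim > threshold else u_edges).append((v, u))
        stable[v], unstable[v] = s_edges, u_edges
    return stable, unstable
```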
In the actual implementation process, a first graph neural network can be set in the preset node characterization model, the steps A1-A4 are executed through the first graph neural network, and the output of the first graph neural network is the stable self-subgraph and the unstable self-subgraph of each node in the graph data.
In a possible implementation manner, after the stable self-subgraph and the unstable self-subgraph of each node are determined, the stable self-subgraph and the unstable self-subgraph of each node are subjected to characterization processing through a second graph neural network arranged in a preset node characterization model, so that the characteristics corresponding to the stable self-subgraph and the unstable self-subgraph are obtained.
The stable self-subgraph captures or characterizes the stable characteristics of a node, while the unstable self-subgraph focuses more on the changing information of the environment around the node; the environmental information of the nodes can be inferred by aggregating the unstable self-subgraphs of all nodes in the graph data.
The features corresponding to the stable and unstable self-subgraphs can be used to describe the clustering characteristic of each subgraph. The stable self-subgraphs generally do not exhibit a clustering characteristic, whereas the edges in the unstable self-subgraphs often do cluster: several classes are constructed during clustering, each class contains similar unstable self-subgraphs, and different classes represent dissimilar unstable self-subgraphs.
When the unstable self-subgraphs of all nodes are clustered, the clustering is encouraged to place as many edges as possible within the same cluster and as few edges as possible across clusters, i.e., the edges of the unstable self-subgraphs are generalized within the same cluster as much as possible; the clustered unstable self-subgraphs can then be used to determine the environmental information of a node.
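As an illustration of this clustering step, the sketch below summarizes each node's unstable self-subgraph by the mean representation of its unstable-edge endpoints and groups the summaries with k-means; the summary function, the use of scikit-learn, and the number of environment clusters are illustrative assumptions rather than the patented procedure.

```python
# Sketch of clustering the unstable self-subgraphs (assumptions: each
# unstable subgraph is summarized by the mean of its endpoint features,
# and k-means with a fixed number of environment clusters is used).
import numpy as np
from sklearn.cluster import KMeans

def cluster_unstable(unstable: dict, h: np.ndarray, n_env: int = 3) -> dict:
    """Return {node: inferred environment id} from its unstable edges."""
    nodes, embs = [], []
    for v, edges in unstable.items():
        if not edges:
            continue  # no unstable edges, no environment evidence for this node
        endpoints = [u for _, u in edges]
        nodes.append(v)
        embs.append(h[endpoints].mean(axis=0))
    labels = KMeans(n_clusters=n_env, n_init=10).fit_predict(np.stack(embs))
    return dict(zip(nodes, labels.tolist()))
```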
The prediction error of a node characterization model is mainly caused by the unstable self-subgraphs, while a general node characterization model predicts accurately on the stable self-subgraphs. Therefore, by identifying, distinguishing and disentangling the stable and unstable self-subgraphs of the nodes, the method removes spurious correlations such as the unstable environmental information, so that the node characterization model learns and predicts according to the stable characteristics of the nodes in the graph data.
S103: and the preset node characterization model learns the stable self-subgraphs of all nodes in the graph data to obtain a node characterization model after learning.
In the actual implementation process, a regularizer is adopted to encourage the preset node characterization model to learn to make stable predictions across multiple distribution environments according to the stable self-subgraphs, so that the node characterization model focuses more on the truly predictive information carried by the stable self-subgraphs, ignores the noise information contained in the unstable self-subgraphs, and can thus predict node-level tasks under distribution migration.
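One way to realize such a regularizer, sketched below under the assumption of a cross-entropy node-classification task in PyTorch, is to penalize the variance of the loss across the inferred environments; this variance penalty is a common invariance regularizer and is an illustrative stand-in, not necessarily the exact regularizer of this application.

```python
# Sketch of a training objective with an invariance regularizer
# (assumptions: PyTorch, cross-entropy node classification, environment
# ids obtained from the clustering of unstable self-subgraphs).
import torch

def regularized_loss(logits: torch.Tensor, labels: torch.Tensor,
                     env_ids: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    per_node = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
    env_risks = torch.stack([per_node[env_ids == e].mean() for e in env_ids.unique()])
    penalty = env_risks.var() if env_risks.numel() > 1 else env_risks.new_zeros(())
    return env_risks.mean() + lam * penalty  # task loss + cross-environment stability
```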
Specifically, parameters related to training in the training process, such as a loss function, a training batch, etc., may be adaptively set according to the actual task performed, which is not limited in this embodiment.
The node characterization model in this embodiment can be applied to different scenarios. Taking drug analysis as an example, training can be performed on a small number of labeled molecular graphs, and the trained model can then be applied to classify a large number of unlabeled drugs at a larger scale and with a data distribution different from that of training. Taking social network analysis as an example, node representation learning under distribution migration can also analyze dynamic, evolving data and give sufficiently stable and generalized results. The node characterization model can also provide important help for human-computer interaction, computer-aided systems, trustworthy artificial intelligence, and the like.
Referring to FIG. 2, which shows a flowchart of the steps of a point characterization method of graph data under distribution migration in an embodiment of the present application, the method may specifically include the following steps:
s201: and inputting the graph data to be characterized into the node characterization model in the embodiment.
After the node characterization model is obtained through training, it can be used directly to predict the characterization result of the graph data to be characterized. The training samples of the node characterization model are graph data whose data distribution is the same as or different from that of the graph data to be characterized; that is, the data distribution of the graph data used in training may deviate or differ from that of the graph data to be characterized.
S202: and processing the graph data to be characterized through the node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data to be characterized.
The node characterization model identifies, distinguishes and disentangles the stable self-subgraph and the unstable self-subgraph of each node in the graph data to be characterized; the stable self-subgraph of a node is used to characterize the node's stable characteristics, and the unstable self-subgraphs of the nodes in the graph data to be characterized are clustered so as to infer the environmental information of each node.
S203: and the node characterization model predicts according to the stable self-subgraphs of all nodes in the graph data to be characterized, and outputs a node characterization result of the graph data to be characterized.
Even when the test environment differs from the training environment, the node characterization model can still remove the influence of unstable environmental factors in the data distribution on the node characterization; by predicting only according to the stable self-subgraphs in the graph data to be characterized, the obtained node characterization result of the graph data to be characterized is more accurate.
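A minimal sketch of this inference step is given below; it assumes the stable edges produced by the earlier edge-splitting step, a simple mean aggregation restricted to the stable subgraph, and a linear prediction head, all of which are illustrative stand-ins for the trained model's components.

```python
# Sketch of S202-S203 at inference time (assumptions: stable edges per node
# as returned by the edge-splitting step, nodes numbered 0..n-1, a linear
# prediction head `weight` standing in for the trained predictor).
import numpy as np
import networkx as nx

def predict_on_stable(graph: nx.Graph, feats: np.ndarray,
                      stable: dict, weight: np.ndarray, hops: int = 2) -> np.ndarray:
    stable_g = nx.Graph()
    stable_g.add_nodes_from(graph.nodes)
    for edges in stable.values():
        stable_g.add_edges_from(edges)  # keep only stable-subgraph edges
    h = feats.copy()
    for _ in range(hops):  # mean aggregation restricted to stable edges
        new_h = np.empty_like(h)
        for v in stable_g.nodes:
            nbrs = list(stable_g.neighbors(v))
            nbr_mean = h[nbrs].mean(axis=0) if nbrs else np.zeros_like(h[v])
            new_h[v] = 0.5 * h[v] + 0.5 * nbr_mean
        h = new_h
    return h @ weight  # node characterization / prediction scores
```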
Unlike the prior art, the node characterization model provided in this embodiment identifies, distinguishes and disentangles stable characteristic information and unstable environmental information in the process of characterizing the points of the graph data, and removes the spurious correlation of the unstable environmental information, so that the model predicts according to the stable characteristics of the nodes in the graph data. Even when the graph data input during training and testing differ in data distribution, the model can adaptively ensure the prediction effect; it can adapt and generalize to a test environment different from the training environment, and gives sufficiently stable and generalized output results on out-of-distribution test data.
Referring to FIG. 3, there is shown a functional block diagram of a point representation learning apparatus of graph data under distribution migration in an embodiment of the present application, the learning apparatus comprising:
the training input module 101 is configured to input graph data serving as a training sample into a preset node characterization model;
The processing module 102 is configured to process the graph data through the preset node characterization model, and determine a stable self-subgraph and an unstable self-subgraph of each node in the graph data, where the stable self-subgraph is used to characterize a stable characteristic of the node, and the unstable self-subgraph is used to characterize environmental information of the node;
And the learning module 103 is configured to enable the preset node characterization model to learn the stable self-subgraphs of all nodes in the graph data, so as to obtain a node characterization model after learning.
Optionally, the processing module includes:
the self-graph determining unit is used for determining a self-graph of each node in the graph data through the preset node characterization model;
The updating unit is used for updating the node representation of any node according to the information of the neighbor node of the node in the self-graph of the node;
the similarity calculation unit is used for calculating the similarity between the node and the first-order neighbor node in the self-graph according to the node representation updated by the node;
and the self-subgraph determining unit is used for determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and the first-order neighbor node in the self-graph.
Optionally, the updating unit includes:
An aggregation subunit, configured to perform neighbor aggregation on all node information in the self-graph of the node;
And the updating subunit is used for updating the information of the node by utilizing the information of the neighbor node of the node to obtain the updated node representation.
Optionally, the self subgraph determining unit includes:
A stable self-subgraph determination subunit, configured to, when the similarity between the node and any one of the first-order neighbor nodes is greater than a preset value, take an edge between two nodes as an edge in the stable self-subgraph;
and the unstable self-subgraph determination subunit is used for determining that the edge between the two nodes is the edge in the unstable self-subgraph when the similarity between the node and any one of the first-order neighbor nodes is smaller than or equal to a preset value.
Optionally, the learning device further includes:
the characterization module is used for performing characterization processing on the stable self-subgraph and the unstable self-subgraph of each node to obtain characteristics corresponding to the stable self-subgraph and the unstable self-subgraph, wherein the characteristics are used for describing clustering characteristics of each subgraph;
And the clustering module is used for clustering the unstable self-subgraphs of all the nodes in the graph data, and the clustered unstable self-subgraphs represent the environment information of the corresponding nodes.
Referring to FIG. 4, there is shown a functional block diagram of a point characterization apparatus of graph data under distribution migration in an embodiment of the present application, the characterization apparatus comprising:
An input module 201, configured to input graph data to be characterized into the node characterization model described in the embodiment;
The prediction module 202 is configured to process the graph data to be characterized through the node characterization model, and determine a stable self-subgraph and an unstable self-subgraph of each node in the graph data to be characterized, where the stable self-subgraph of one node is used for characterizing the stable characteristics of the node, and the unstable self-subgraph of one node is used for characterizing the environmental information of the node; and the node characterization model predicts according to the stable self-subgraphs of all nodes in the graph data to be characterized, and outputs a node characterization result of the graph data to be characterized.
The embodiment of the application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the point representation learning method of graph data under distribution migration described in the embodiments and/or the point representation method of graph data under distribution migration described in the embodiments.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or terminal device that comprises the element.
The principles and embodiments of the present application have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present application and the core ideas thereof; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

Claims (9)

1. A method for learning point representation of graph data under distribution migration, the method comprising:
inputting graph data serving as training samples into a preset node representation model;
Processing the graph data through the preset node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data, wherein the method comprises the following steps: the preset node characterization model determines a self-graph of each node in the graph data; in the self-graph of any node, updating the node representation of the node according to the information of the neighbor node of the node; according to the node characteristics updated by the node, respectively calculating the similarity between the node and a first-order neighbor node in the self-graph; determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and a first-order neighbor node in the self-graph; the stable self subgraph is used for representing the stable characteristics of the nodes, and the unstable self subgraph is used for representing the environmental information of the nodes;
And the preset node characterization model learns the stable self-subgraphs of all nodes in the graph data to obtain a node characterization model after learning.
2. The learning method of claim 1 wherein updating the node representation of the node based on information of neighboring nodes of the node comprises:
carrying out neighbor aggregation on all node information in the self graph of the node;
And updating the information of the node by using the information of the neighbor nodes of the node to obtain updated node characterization.
3. The learning method according to claim 1, wherein determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and a first-order neighbor node in the self-graph thereof includes:
if the similarity between the node and any one of the first-order neighbor nodes is larger than a preset value, the edge between the two nodes is the edge in the stable self-subgraph;
and if the similarity between the node and any one of the first-order neighbor nodes is smaller than or equal to a preset value, the edge between the two nodes is the edge in the unstable self-subgraph.
4. A learning method according to any one of claims 1-3, wherein after determining a stable self-subgraph and an unstable self-subgraph for each node in the graph data, the learning method further comprises:
Characterizing the stable self subgraph and the unstable self subgraph of each node to obtain the characteristics corresponding to the stable self subgraph and the unstable self subgraph, wherein the characteristics are used for describing the clustering characteristics of each subgraph;
clustering the unstable self-subgraphs of all nodes in the graph data, wherein the clustered unstable self-subgraphs represent the environment information of the corresponding nodes.
5. A method for point characterization of graph data under distribution migration, the characterization method comprising:
Inputting graph data to be characterized into a node characterization model according to any one of claims 1-4;
Processing the graph data to be characterized through the node characterization model, and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data to be characterized, wherein the stable self-subgraph of one node is used for characterizing the stable characteristics of the node, and the unstable self-subgraph of one node is used for characterizing the environmental information of the node;
and the node characterization model predicts according to the stable self-subgraphs of all nodes in the graph data to be characterized, and outputs a node characterization result of the graph data to be characterized.
6. The method of claim 5, wherein the training samples of the node characterization model are: and the graph data has the same or different data distribution with the graph data to be characterized.
7. A point representation learning device of graph data under distribution migration, characterized in that the learning device comprises:
the training input module is used for inputting graph data serving as a training sample into a preset node representation model;
The processing module is configured to process the graph data through the preset node characterization model, determine a stable self-subgraph and an unstable self-subgraph of each node in the graph data, and include: the preset node characterization model determines a self-graph of each node in the graph data; in the self-graph of any node, updating the node representation of the node according to the information of the neighbor node of the node; according to the node characteristics updated by the node, respectively calculating the similarity between the node and a first-order neighbor node in the self-graph; determining a stable self-subgraph and an unstable self-subgraph corresponding to the node according to the similarity between the node and a first-order neighbor node in the self-graph; the stable self subgraph is used for representing the stable characteristics of the nodes, and the unstable self subgraph is used for representing the environmental information of the nodes;
And the learning module is used for enabling the preset node characterization model to learn the stable self-subgraphs of all nodes in the graph data, and obtaining the node characterization model after learning.
8. A point characterization device of graph data under distribution migration, the characterization device comprising:
an input module for inputting graph data to be characterized into the node characterization model according to any one of claims 1-4;
The prediction module is used for processing the graph data to be characterized through the node characterization model and determining a stable self-subgraph and an unstable self-subgraph of each node in the graph data to be characterized, wherein the stable self-subgraph of one node is used for characterizing the stable characteristics of the node, and the unstable self-subgraph of one node is used for characterizing the environmental information of the node; and the node characterization model predicts according to the stable self-subgraphs of all nodes in the graph data to be characterized, and outputs a node characterization result of the graph data to be characterized.
9. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the point representation learning method of graph data under distribution migration according to any one of claims 1 to 4 and/or implements the point representation method of graph data under distribution migration according to any one of claims 5 to 6.
CN202210736863.XA 2022-06-27 2022-06-27 Point representation learning method, representation method and device of graph data and storage medium Active CN115035349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210736863.XA CN115035349B (en) 2022-06-27 2022-06-27 Point representation learning method, representation method and device of graph data and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210736863.XA CN115035349B (en) 2022-06-27 2022-06-27 Point representation learning method, representation method and device of graph data and storage medium

Publications (2)

Publication Number Publication Date
CN115035349A CN115035349A (en) 2022-09-09
CN115035349B (en) 2024-06-18

Family

ID=83125989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210736863.XA Active CN115035349B (en) 2022-06-27 2022-06-27 Point representation learning method, representation method and device of graph data and storage medium

Country Status (1)

Country Link
CN (1) CN115035349B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673708A (en) * 2020-05-13 2021-11-19 希捷科技有限公司 Distributed decentralized machine learning model training

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021162481A1 (en) * 2020-02-11 2021-08-19 Samsung Electronics Co., Ltd. Electronic device and control method thereof
CN112380344B (en) * 2020-11-19 2023-08-22 平安科技(深圳)有限公司 Text classification method, topic generation method, device, equipment and medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673708A (en) * 2020-05-13 2021-11-19 希捷科技有限公司 Distributed decentralized machine learning model training

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Graph Convolutional Neural Network with Inter-layer Cascade Based on Attention Mechanism; WEI, Lu et al.; Proceedings of CCIS2021; 2021-12-31; full text *

Also Published As

Publication number Publication date
CN115035349A (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN112364880B (en) Omics data processing method, device, equipment and medium based on graph neural network
CN109816032A (en) Zero sample classification method and apparatus of unbiased mapping based on production confrontation network
CN110598869B (en) Classification method and device based on sequence model and electronic equipment
CN111080304A (en) Credible relationship identification method, device and equipment
CN111832312A (en) Text processing method, device, equipment and storage medium
Abdelbari et al. A computational intelligence‐based method to ‘learn’causal loop diagram‐like structures from observed data
Mehrotra et al. Multiclass classification of mobile applications as per energy consumption
JP2018151578A (en) Determination device, determination method, and determination program
Zhao et al. High-dimensional linear regression via implicit regularization
CN111209105A (en) Capacity expansion processing method, capacity expansion processing device, capacity expansion processing equipment and readable storage medium
CN117215728B (en) Agent model-based simulation method and device and electronic equipment
CN117194771B (en) Dynamic knowledge graph service recommendation method for graph model characterization learning
CN105357583A (en) Method and device for discovering interest and preferences of intelligent television user
CN115035349B (en) Point representation learning method, representation method and device of graph data and storage medium
CN112100509A (en) Information recommendation method, device, server and storage medium
CN111813941A (en) Text classification method, device, equipment and medium combining RPA and AI
Jia et al. Prediction of Web Services Reliability Based on Decision Tree Classification Method.
CN114092162B (en) Recommendation quality determination method, and training method and device of recommendation quality determination model
CN113239272B (en) Intention prediction method and intention prediction device of network management and control system
US20230152787A1 (en) Performance optimization of complex industrial systems and processes
CN115510318A (en) Training method of user characterization model, user characterization method and device
CN114861004A (en) Social event detection method, device and system
CN112801156B (en) Business big data acquisition method and server for artificial intelligence machine learning
CN109299321B (en) Method and device for recommending songs
CN114218487A (en) Video recommendation method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant