CN114936307A

CN114936307A - Method for constructing a paradigm-based graph model

Info

Publication number: CN114936307A
Application number: CN202210574033.1A
Authority: CN
Inventors: 冯亚维; 黄胜蓝; 周玺; 王改朝
Original assignee: Xi'an Shiluhuitu Information Technology Co Ltd
Current assignee: Xi'an Shiluhuitu Information Technology Co Ltd
Priority date: 2022-05-25
Filing date: 2022-05-25
Publication date: 2022-08-23

Abstract

The invention discloses a method for constructing a paradigm-based graph model, comprising the following steps: S1, data exploration: analyzing business data with query statements and a visualization tool to determine potential relationships among the business data; S2, node and edge relationship construction: determining the node and edge types of the graph based on the data exploration results, thereby constructing a graph schema; S3, graph data construction: loading the data of the corresponding nodes and edges based on the graph schema, and encoding and combining the attributes of the nodes and edges to form the graph required by the graph model; S4, model selection: selecting the corresponding paradigm for modeling according to the type of business problem; S5, model parameter setting: setting model parameters, including the number of hidden layers and the loss function; S6, model tuning: adjusting the model parameters to optimize the model. The invention lets users concentrate on business data without attending to specific modeling details.

Description

Paradigm-based graph model construction method

Technical Field

The invention relates to the technical field of graph data processing, and in particular to a method for constructing a paradigm-based graph model.

Background

A graph represents entities and the relationships between them, and can be denoted G = (V, E). It consists of two sets: a set of nodes V and a set of edges E. Nodes and edges can each carry their own attributes.
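As an illustration only, an attributed graph G = (V, E) can be held in two dictionaries, one for nodes and one for edges. The class name, method names and sample attributes below are assumptions made for this sketch and are not part of the invention.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Minimal attributed graph G = (V, E); all names here are illustrative."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # (source id, target id) -> attribute dict

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, dst, **attrs):
        # Both endpoints must already belong to V.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, dst)] = attrs

    def neighbors(self, node_id):
        """Return the targets of edges leaving node_id, in insertion order."""
        return [dst for (src, dst) in self.edges if src == node_id]


g = AttributedGraph()
g.add_node("u1", kind="user", age=30)
g.add_node("p1", kind="product", price=9.5)
g.add_edge("u1", "p1", kind="purchased", count=2)
```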

Owing to the strong expressive power of graphs, GNNs (graph neural networks) have attracted wide attention in the field of machine learning. The main purpose of a GNN architecture is to learn, for each node, an embedding that contains information about its neighborhood, and to solve node and edge prediction problems based on this embedding.

At the current stage of artificial intelligence development, continuous improvements in computing power and storage have preliminarily realized computational intelligence, that is, fast computation and memory storage capabilities. Building on computational intelligence and perceptual intelligence, artificial intelligence is extending toward cognitive intelligence capable of analyzing, thinking, understanding and judging, and genuinely intelligent solutions are emerging.

In addition, the data analysis industry is undergoing a transformation: the proportion of Chinese enterprises adopting graph databases is increasing, and a Gartner report predicts that by 2025 graph technology will be used in 80% of data analytics innovations. Graph databases are an optimization direction for traditional relational databases, and once business data has accumulated to a certain volume, GraphAI can provide deeper insight and better business results.

Current deep learning methods for graphs can be roughly divided into two types: semi-supervised and unsupervised. Specifically, the semi-supervised methods include the graph neural network (GNN) and the graph convolutional network (GCN); Fig. 1 is a schematic diagram of existing semi-supervised learning for node-level classification. The unsupervised methods mainly consist of the graph auto-encoder (GAE). From the perspective of a model user, it is necessary not only to design the graph structure and process node labels and features, but also to select a model and continually adjust each of its parameters. This places high demands on the user, makes it difficult to combine business data with model data, and makes GraphAI technology difficult to put into practice in many scenarios.

Disclosure of Invention

To address the problem that GraphAI is currently difficult to put into practice in many scenarios, the invention provides a method for constructing a paradigm-based graph model, which lets users concentrate on business data without attending to specific modeling details.

The technical scheme adopted by the invention is as follows:

A method for constructing a paradigm-based graph model comprises the following steps:

S1, data exploration: analyzing business data with query statements and a visualization tool to determine potential relationships among the business data;

S2, node and edge relationship construction: determining the node and edge types of the graph based on the data exploration results, thereby constructing a graph schema;

S3, graph data construction: loading the data of the corresponding nodes and edges from a distributed file system based on the graph schema, and encoding and combining the attributes of the nodes and edges to form the graph required by the graph model;

S4, model selection: selecting the corresponding paradigm for modeling according to the type of business problem;

S5, model parameter setting: setting model parameters, including the number of hidden layers and the loss function;

S6, model tuning: adjusting the model parameters to optimize the model.

Further, the paradigms comprise an attribute conduction paradigm, and the modeling method of the attribute conduction paradigm comprises the following steps:

S401, aggregation: after receiving the information of its neighbor nodes, a node aggregates that information with its own information according to different weights to form a more comprehensive representation;

S402, nonlinear transformation: applying a nonlinear transformation to the aggregated representation to obtain the node's representation at the current layer;

S403, propagation: transmitting the node representation obtained after the nonlinear transformation to the neighbor nodes, wherein each node executes steps S401 to S403 in a loop.
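Steps S401 to S403 can be sketched as a small message-passing loop. This is a minimal illustration under stated assumptions, not the implementation of the invention: the fixed `self_weight`, the mean over neighbors and the `tanh` nonlinearity are hypothetical choices, whereas a trained model would learn its weights.

```python
import math


def propagate(features, neighbors, self_weight=0.5, layers=2):
    """Run `layers` rounds of aggregate -> nonlinear transform -> propagate.

    features:  node id -> list of floats (initial node information)
    neighbors: node id -> list of neighbor node ids
    self_weight is a fixed, hypothetical weight; a trained model would learn it.
    """
    h = {n: list(v) for n, v in features.items()}
    for _ in range(layers):
        new_h = {}
        for node, own in h.items():
            nbrs = neighbors.get(node, [])
            dim = len(own)
            # S401: weighted aggregation of neighbor information with the node's own.
            if nbrs:
                mean_nbr = [sum(h[m][i] for m in nbrs) / len(nbrs) for i in range(dim)]
            else:
                mean_nbr = [0.0] * dim
            agg = [self_weight * own[i] + (1 - self_weight) * mean_nbr[i] for i in range(dim)]
            # S402: nonlinear transformation gives this layer's node representation.
            new_h[node] = [math.tanh(x) for x in agg]
        # S403: the new representations are what neighbors receive in the next round.
        h = new_h
    return h


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
out = propagate(features, neighbors, layers=1)
```

After one round, node "c", which started with an all-zero vector, already carries information from its neighbor "b"; further rounds spread information from nodes that are more hops away.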

Further, the paradigms comprise a vectorized matching paradigm, and the modeling method of the vectorized matching paradigm comprises an aggregation process in which each node aggregates neighbor information in a manner similar to the attribute conduction paradigm; however, the learning objective of the model is no longer to fit the labels of samples, but to minimize the distance between the representations of related things and maximize the distance between the representations of unrelated things. After training is completed, the model has learned how to compute the representation of a thing in the corresponding context, and the similarity between representations reflects the relevance of the things.

Further, if no negative (unrelated) relationships are predefined, some pairs of things are randomly generated to serve as unrelated pairs.
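The learning objective and the random generation of unrelated pairs might look like the following sketch. The margin-based contrastive loss is one common way to express such an objective and is an assumption here; the text does not name a specific loss, and the function names and sample embeddings are hypothetical.

```python
import math
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sample_negative_pairs(items, positive_pairs, k, rng):
    """Randomly draw k pairs that are not listed as related (treated as unrelated).

    Assumes at least one pair of items is not in positive_pairs.
    """
    related = {frozenset(p) for p in positive_pairs}
    negatives = []
    while len(negatives) < k:
        a, b = rng.sample(items, 2)
        if frozenset((a, b)) not in related:
            negatives.append((a, b))
    return negatives


def contrastive_loss(emb, positive_pairs, negative_pairs, margin=1.0):
    """Pull related pairs together; push unrelated pairs at least `margin` apart."""
    pos = sum(squared_distance(emb[a], emb[b]) for a, b in positive_pairs)
    neg = sum(max(0.0, margin - math.sqrt(squared_distance(emb[a], emb[b]))) ** 2
              for a, b in negative_pairs)
    return pos + neg


emb = {"x": [0.0, 0.0], "y": [0.1, 0.0], "z": [3.0, 4.0], "w": [-3.0, 0.0]}
positives = [("x", "y")]
negatives = sample_negative_pairs(list(emb), positives, k=3, rng=random.Random(0))
loss = contrastive_loss(emb, positives, negatives)
```

In this toy data every unrelated pair is already farther apart than the margin, so only the related pair contributes to the loss.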

Further, the K targets most relevant to a given thing are retrieved by means of a hashed inverted index.
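One possible reading of this retrieval step is sketched below: vectors are hashed into buckets, an inverted index maps each bucket to the things it contains, and only the query's bucket is scored. The random-hyperplane hashing scheme, the cosine ranking and the class name `HashedInvertedIndex` are assumptions for illustration; the text does not specify the hashing scheme.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class HashedInvertedIndex:
    """Random-hyperplane hashing into buckets, plus a bucket -> items inverted index."""

    def __init__(self, dim, n_bits=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)  # hash key -> list of item ids
        self.vectors = {}

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, item_id, vec):
        self.vectors[item_id] = vec
        self.buckets[self._key(vec)].append(item_id)

    def top_k(self, query_vec, k):
        # Only items sharing the query's bucket are scored, so results are approximate.
        candidates = self.buckets.get(self._key(query_vec), [])
        ranked = sorted(candidates, key=lambda i: cosine(query_vec, self.vectors[i]),
                        reverse=True)
        return ranked[:k]


index = HashedInvertedIndex(dim=2)
index.add("a", [1.0, 0.0])
index.add("a2", [2.0, 0.0])
index.add("b", [-1.0, 0.0])
result = index.top_k([3.0, 0.0], k=2)
```

Vectors pointing in the same direction always land in the same bucket, while the opposite vector "b" falls on the other side of every hyperplane and is never scored. The trade-off is speed against recall: a near neighbor that hashes to a different bucket is missed.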

Further, the paradigms comprise a causal reasoning paradigm, and the modeling method of the causal reasoning paradigm comprises: obtaining the conditional probability of each node by maximum likelihood estimation over the values of each event in the training samples, and using breadth-first search to find the graph structure that best fits the causal relationships among the events; once the graph structure and the conditional probability of each node have been obtained, they form a probabilistic graphical model.
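For discrete events, the maximum likelihood estimate of a node's conditional probability is the relative frequency of each value within each combination of parent values. The sketch below shows only this counting step for a structure that is assumed to be given; the breadth-first structure search is not shown, and the function name and sample data are hypothetical.

```python
from collections import Counter


def estimate_cpt(samples, node, parents):
    """Maximum likelihood estimate of P(node | parents) by relative frequency.

    samples: list of dicts mapping event name -> observed value
    Returns {parent values tuple: {node value: probability}}.
    """
    joint = Counter()
    parent_totals = Counter()
    for row in samples:
        key = tuple(row[p] for p in parents)
        joint[(key, row[node])] += 1
        parent_totals[key] += 1
    cpt = {}
    for (key, value), count in joint.items():
        cpt.setdefault(key, {})[value] = count / parent_totals[key]
    return cpt


samples = [
    {"rain": 1, "wet": 1},
    {"rain": 1, "wet": 1},
    {"rain": 1, "wet": 0},
    {"rain": 0, "wet": 0},
]
cpt = estimate_cpt(samples, "wet", ["rain"])
```

Here `cpt[(1,)][1]` is the estimated probability that "wet" is 1 given that "rain" is 1, which is two of the three matching samples.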

Further, when the probabilistic graphical model is used for prediction, the probability of each value of the node to be queried can be calculated after the observed values of some nodes are input.
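Prediction of this kind can be illustrated by exact inference through enumeration over a small network. The three-event network, its probabilities and the function names below are invented for the example, and enumeration is only one possible inference method; its cost grows exponentially with the number of nodes.

```python
from itertools import product

# Hypothetical three-event network: rain -> wet <- sprinkler, all binary.
# Each entry: node -> (parents, {parent values: P(node = 1 | parents)}).
NETWORK = {
    "rain": ((), {(): 0.2}),
    "sprinkler": ((), {(): 0.1}),
    "wet": (("rain", "sprinkler"),
            {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}),
}


def joint_probability(assignment, network):
    """Product of each node's conditional probability under a full assignment."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        p_one = cpt[tuple(assignment[q] for q in parents)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


def query(target, evidence, network):
    """P(target = 1 | evidence), summing over every assignment consistent with evidence.

    Assumes the evidence has non-zero probability under the network.
    """
    names = list(network)
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint_probability(assignment, network)
        denominator += p
        if assignment[target] == 1:
            numerator += p
    return numerator / denominator


p_rain_given_wet = query("rain", {"wet": 1}, NETWORK)
```

Observing that the ground is wet raises the probability of rain well above its prior of 0.2, which is the kind of update the prediction step performs.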

Further, when a hypothetical calculation is performed with the probabilistic graphical model, the confounding factors are blocked on the basis of the causal structure learned by the model, and the conditional probability of the query node's value given that the intervention node takes the intervention value is calculated, that is, the probability of the query node's value after the intervention.
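One standard way to block a confounding factor is backdoor adjustment, which averages the outcome probability over the confounder's own distribution rather than its distribution among the treated samples. The sketch below assumes a single observed binary-outcome setting with one confounder; the function name, variable names and data are hypothetical, and the text does not state which adjustment method is used.

```python
from collections import Counter


def interventional_probability(samples, outcome, treatment, treatment_value, confounder):
    """Estimate P(outcome = 1 | do(treatment = treatment_value)) by backdoor adjustment.

    Assumes `confounder` is the only common cause of treatment and outcome,
    and that every confounder value co-occurs with treatment_value in the data.
    """
    z_counts = Counter(row[confounder] for row in samples)
    total = len(samples)
    result = 0.0
    for z, n_z in z_counts.items():
        stratum = [r for r in samples
                   if r[confounder] == z and r[treatment] == treatment_value]
        p_y = sum(r[outcome] for r in stratum) / len(stratum)  # P(Y=1 | X=x, Z=z)
        result += p_y * n_z / total                            # weighted by P(Z=z)
    return result


rows = [
    {"z": 0, "x": 1, "y": 1}, {"z": 0, "x": 1, "y": 0},
    {"z": 0, "x": 0, "y": 0}, {"z": 0, "x": 0, "y": 0},
    {"z": 1, "x": 1, "y": 1}, {"z": 1, "x": 1, "y": 1},
    {"z": 1, "x": 1, "y": 1}, {"z": 1, "x": 0, "y": 1},
]
p_do = interventional_probability(rows, "y", "x", 1, "z")
treated = [r for r in rows if r["x"] == 1]
p_observed = sum(r["y"] for r in treated) / len(treated)
```

In this toy data the plain conditional probability among treated samples is 0.8, while the adjusted interventional probability is 0.75; the gap is the part of the association that comes from the confounder rather than from the intervention.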

The invention has the beneficial effects that:

(1) The invention streamlines the complex flow from business data to graph data, so that users only need to attend to their own business data, discover the correlations among the various kinds of business data, and construct the graph schema from those correlations, without attending to how the underlying graph data is implemented and stored. This greatly reduces the difficulty of use.

(2) The method encapsulates the details of the graph model and achieves automatic modeling. Because the model type and model parameters are encapsulated, users can start from the actual problem and select the corresponding paradigm for modeling according to the problem type, leaving aside the low-level details of the model and lowering the modeling threshold.

(3) The invention simplifies the modeling process, reduces the modeling threshold, and enables a user to quickly test and obtain feedback.

(4) By using the computation paradigms directly or in combination, users can enjoy the intelligent capabilities of Graph AI without having to work out complex model construction and data processing approaches.

Drawings

Fig. 1 is a schematic diagram of semi-supervised learning for existing node-level classification.

Fig. 2 is a first flowchart of the paradigm-based graph model construction method according to an embodiment of the present invention.

Fig. 3 is a second flowchart of the paradigm-based graph model construction method according to an embodiment of the present invention.

Fig. 4 is a diagram of an example of paradigm usage according to an embodiment of the present invention.

Detailed Description

In order to more clearly understand the technical features, objects, and effects of the present invention, specific embodiments of the present invention will now be described. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

As shown in Fig. 2, this embodiment provides a method for constructing a paradigm-based graph model, which comprises a graph construction process and a modeling process, wherein the graph construction process comprises the following steps:

S1, data exploration: analyzing business data with query statements and a visualization tool to determine potential relationships among the business data;

S2, node and edge relationship construction: determining the node and edge types of the graph based on the data exploration results, thereby constructing a graph schema;

S3, graph data construction: loading the data of the corresponding nodes and edges from a distributed file system based on the graph schema, and encoding and combining the attributes of the nodes and edges to form the graph required by the graph model.
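Steps S2 and S3 can be illustrated with a small schema and an encoder that turns each node's attributes into one feature vector. Everything in this sketch is an assumption made for illustration: the schema layout, the node and edge types, the one-hot encoding, and the use of in-memory rows in place of data loaded from a distributed file system.

```python
# Hypothetical graph schema: node types, edge types, and the attributes to encode.
SCHEMA = {
    "nodes": {"user": ["age", "city"], "product": ["price"]},
    "edges": {"purchased": ("user", "product")},
}


def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_node(node_type, record, city_vocab):
    """Encode and concatenate the attributes the schema lists for this node type."""
    vector = []
    for attr in SCHEMA["nodes"][node_type]:
        if attr == "city":  # categorical attribute -> one-hot vector
            vector += one_hot(record[attr], city_vocab)
        else:               # numeric attribute -> float
            vector.append(float(record[attr]))
    return vector


def build_graph(node_rows, edge_rows, city_vocab):
    """node_rows: (id, type, record); edge_rows: (type, source id, target id)."""
    features = {nid: encode_node(ntype, rec, city_vocab) for nid, ntype, rec in node_rows}
    types = {nid: ntype for nid, ntype, _ in node_rows}
    edges = []
    for etype, src, dst in edge_rows:
        # Keep only edges whose endpoint types match the schema.
        if (types[src], types[dst]) == SCHEMA["edges"][etype]:
            edges.append((src, etype, dst))
    return features, edges


node_rows = [
    ("u1", "user", {"age": 30, "city": "Xi'an"}),
    ("p1", "product", {"price": 9.5}),
]
edge_rows = [("purchased", "u1", "p1"), ("purchased", "p1", "u1")]
features, edges = build_graph(node_rows, edge_rows, city_vocab=["Beijing", "Xi'an"])
```

The second edge row has its endpoints reversed relative to the schema and is therefore dropped, which shows how the schema constrains the graph that is built.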

The modeling process comprises the following steps:

S4, model selection: selecting the corresponding paradigm for modeling according to the type of business problem (such as a classification problem or a recommendation problem). In a specific implementation, the nodes of the original graph are divided proportionally into a training set and a test set for training and validation.
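The proportional division of graph nodes might be done as in the following sketch. The 80/20 ratio, the fixed seed and the function name `split_nodes` are assumptions; the text does not specify the proportion.

```python
import random


def split_nodes(node_ids, train_ratio=0.8, seed=0):
    """Shuffle node ids and split them into training and test sets by ratio."""
    ids = list(node_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


train_nodes, test_nodes = split_nodes(range(10), train_ratio=0.8)
```

Fixing the seed keeps the split reproducible, so that different parameter settings are compared on the same training and test nodes.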

S5, setting model parameters: model parameters such as the number of hidden layers and the loss function are set.

S6, model tuning: the model parameters are adjusted by a neural network controller such as AutoML (automated machine learning) so that the model is optimal.
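As a much simpler stand-in for an AutoML controller, the tuning loop can be illustrated with random search over a parameter space. Random search is a substitute technique used here only to show the shape of the loop; it is not the neural network controller named in the text. The `toy_loss` function, the parameter names and the candidate values are hypothetical.

```python
import random


def random_search(evaluate, search_space, trials=20, seed=0):
    """Try `trials` random configurations and keep the one with the lowest score.

    evaluate:     function(config dict) -> validation loss (lower is better)
    search_space: parameter name -> list of candidate values
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Toy stand-in for "train the model and return its validation loss".
def toy_loss(config):
    return abs(config["hidden_layers"] - 2) + abs(config["learning_rate"] - 0.01)


space = {"hidden_layers": [1, 2, 3, 4], "learning_rate": [0.1, 0.01, 0.001]}
best, score = random_search(toy_loss, space, trials=50)
```

In practice `evaluate` would train the selected paradigm with the given configuration and return its loss on held-out nodes.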

Preferably, as shown in Fig. 3, the paradigm-based graph model construction method of this embodiment further includes an ETL process, which mainly performs data acquisition, data cleaning and data processing.

Preferably, this embodiment provides three paradigms: an attribute conduction paradigm, a vectorized matching paradigm and a causal reasoning paradigm. Fig. 4 shows an example of paradigm usage.

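Specifically, the modeling method of the attribute conduction paradigm comprises the following steps:

S401, aggregation: after receiving the information of its neighbor nodes, a node aggregates that information with its own information according to different weights to form a more comprehensive representation;

S402, nonlinear transformation: applying a nonlinear transformation to the aggregated representation to obtain the node's representation at the current layer;

S403, propagation: transmitting the node representation obtained after the nonlinear transformation to the neighbor nodes, wherein each node executes steps S401 to S403 in a loop.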

Specifically, the modeling method of the vectorized matching paradigm comprises an aggregation process in which each node aggregates neighbor information in a manner similar to the attribute conduction paradigm; however, the learning objective of the model is no longer to fit the labels of samples, but to minimize the distance between the representations of related things and maximize the distance between the representations of unrelated things. If no negative (unrelated) relationships are predefined, some pairs of things are randomly generated to serve as unrelated pairs. After training is completed, the model has learned how to compute the representation of a thing in the corresponding context, and the similarity between representations reflects the relevance of the things. On the basis of these representations, a hashed inverted index helps the user quickly retrieve the K targets most relevant to a given thing.

Specifically, the modeling method of the causal reasoning paradigm comprises: obtaining the conditional probability of each node by maximum likelihood estimation over the values of each event in the training samples, and using breadth-first search to find the graph structure that best fits the causal relationships among the events; once the graph structure and the conditional probability of each node have been obtained, they form a probabilistic graphical model. When prediction is performed, the exact probability of each value of the node to be queried can be calculated after the observed values of some nodes are input. When a hypothetical calculation is performed, the confounding factors are blocked on the basis of the causal structure learned by the model, and the conditional probability of the query node's value given that the intervention node takes the intervention value is calculated, that is, the probability of the query node's value after the intervention.

The foregoing describes the preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications and environments may be adopted within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims

1. A method for constructing a paradigm-based graph model, characterized by comprising the following steps:

S6, model tuning: adjusting the model parameters to optimize the model.

2. The method for constructing a paradigm-based graph model according to claim 1, wherein the paradigms comprise an attribute conduction paradigm, and the modeling method of the attribute conduction paradigm comprises the following steps:

3. The method for constructing a paradigm-based graph model according to claim 2, wherein the paradigms comprise a vectorized matching paradigm, and the modeling method of the vectorized matching paradigm comprises an aggregation process in which each node aggregates neighbor information in a manner similar to the attribute conduction paradigm; however, the learning objective of the model is no longer to fit the labels of samples, but to minimize the distance between the representations of related things and maximize the distance between the representations of unrelated things; after training is completed, the model has learned how to compute the representation of a thing in the corresponding context, and the similarity between representations reflects the relevance of the things.

4. The method for constructing a paradigm-based graph model according to claim 3, wherein if no negative (unrelated) relationships are predefined, some pairs of things are randomly generated to serve as unrelated pairs.

5. The method for constructing a paradigm-based graph model according to claim 3, wherein the K targets most relevant to a given thing are retrieved by means of a hashed inverted index.

6. The method for constructing a paradigm-based graph model according to claim 1, wherein the paradigms comprise a causal reasoning paradigm, and the modeling method of the causal reasoning paradigm comprises: obtaining the conditional probability of each node by maximum likelihood estimation over the values of each event in the training samples, and using breadth-first search to find the graph structure that best fits the causal relationships among the events; once the graph structure and the conditional probability of each node have been obtained, they form a probabilistic graphical model.

7. The method for constructing a paradigm-based graph model according to claim 6, wherein when the probabilistic graphical model is used for prediction, the probability of each value of the node to be queried can be calculated after the observed values of some nodes are input.

8. The method for constructing a paradigm-based graph model according to claim 6, wherein when a hypothetical calculation is performed with the probabilistic graphical model, the confounding factors are blocked on the basis of the causal structure learned by the model, and the conditional probability of the query node's value given that the intervention node takes the intervention value is calculated, that is, the probability of the query node's value after the intervention.