CN114936307A - Method for constructing normal graph model - Google Patents

Info

Publication number
CN114936307A
Authority
CN
China
Prior art keywords
model
graph
node
data
canonicalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210574033.1A
Other languages
Chinese (zh)
Inventor
冯亚维
黄胜蓝
周玺
王改朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Shiluhuitu Information Technology Co Ltd
Original Assignee
Xi'an Shiluhuitu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Shiluhuitu Information Technology Co Ltd
Priority to CN202210574033.1A
Publication of CN114936307A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing a paradigm-based graph model, comprising the following steps: S1, data exploration: analyzing the business data with query statements and a visualization tool to determine potential relationships among the business data; S2, node and edge relationship construction: determining the node and edge types of the graph from the data exploration results, thereby constructing a graph schema; S3, graph data construction: loading the data for the corresponding nodes and edges based on the graph schema, and encoding and combining the node and edge attributes to form the graph required by the graph model; S4, model selection: selecting the paradigm that matches the type of business problem for modeling; S5, model parameter setting: setting model parameters, including the number of hidden layers and the loss function; S6, model tuning: adjusting the model parameters to optimize the model. The invention lets the user concentrate on the business data without attending to specific modeling details.

Description

Paradigm-based graph model construction method
Technical Field
The invention relates to the technical field of graph data processing, and in particular to a method for constructing a paradigm-based graph model.
Background
A graph represents a set of entities and the relationships among them, and can be denoted G = (V, E). It consists of two sets: a set of nodes V and a set of edges E. Nodes and edges each carry their own attributes.
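As an illustration only, the definition G = (V, E) with attributed nodes and edges can be sketched as a small Python data structure. The class name `Graph`, its methods, and the customer/account example data are assumptions made for this sketch and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Attributed graph G = (V, E)."""
    nodes: dict = field(default_factory=dict)  # V: node id -> attribute dict
    edges: dict = field(default_factory=dict)  # E: (src, dst) -> attribute dict

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, dst, **attrs):
        # An edge may only connect nodes that are already in V.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, dst)] = attrs

    def neighbors(self, node_id):
        # Treat edges as undirected when listing neighbors.
        out = [d for (s, d) in self.edges if s == node_id]
        inc = [s for (s, d) in self.edges if d == node_id]
        return sorted(set(out + inc))


# Hypothetical business data: a customer who holds an account.
g = Graph()
g.add_node("customer:1", kind="customer", age=35)
g.add_node("account:9", kind="account", balance=120.0)
g.add_edge("customer:1", "account:9", kind="holds", since=2020)
```

Storing attributes in plain dictionaries keeps the sketch short; a production graph store would normally type and index them.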
Because graphs are highly expressive, graph neural networks (GNNs) have attracted wide attention in machine learning. The main purpose of a GNN architecture is to learn, for each node, an embedding that contains information about its neighborhood, and to use that embedding to solve node and edge prediction problems.
At the current stage of artificial intelligence, continuing improvements in computing power and storage have brought about a preliminary form of computational intelligence, characterized by fast computation and memory storage. Building on computational and perceptual intelligence, artificial intelligence is now extending toward cognitive intelligence, which can analyze, reason, understand, and judge, and which points toward truly intelligent solutions.
The data analysis industry is also changing: a growing proportion of Chinese enterprises are adopting graph databases, and a Gartner report predicts that by 2025 graph technologies will be used in 80% of data and analytics innovations. Graph databases are an optimization direction for traditional relational databases, and once business data has accumulated to a sufficient volume, GraphAI can provide deeper insight and better business results.
Current deep learning methods for graphs fall roughly into two types: semi-supervised and unsupervised. Semi-supervised methods include the graph neural network (GNN) and the graph convolutional network (GCN); Fig. 1 is a schematic diagram of existing semi-supervised learning for node-level classification. Unsupervised methods mainly consist of the graph auto-encoder (GAE). A model user must therefore not only design the graph structure and process node labels and features, but also select a model and repeatedly adjust its parameters. This places high demands on the user, makes it hard to combine business data with model data, and makes GraphAI difficult to deploy in many scenarios.
Disclosure of Invention
To address the difficulty of deploying GraphAI in many scenarios, the invention provides a method for constructing a paradigm-based graph model that lets the user concentrate on business data without attending to specific modeling details.
The technical scheme adopted by the invention is as follows:
A method for constructing a paradigm-based graph model comprises the following steps:
S1, data exploration: analyzing the business data with query statements and a visualization tool to determine potential relationships among the business data;
S2, node and edge relationship construction: determining the node and edge types of the graph from the data exploration results, thereby constructing a graph schema;
S3, graph data construction: loading the data for the corresponding nodes and edges from a distributed file system based on the graph schema, and encoding and combining the node and edge attributes to form the graph required by the graph model;
S4, model selection: selecting the paradigm that matches the type of business problem for modeling;
S5, model parameter setting: setting model parameters, including the number of hidden layers and the loss function;
S6, model tuning: adjusting the model parameters to optimize the model.
Further, the paradigms comprise an attribute conduction paradigm, whose modeling method comprises the following steps:
S401, aggregation: after receiving information from its neighbor nodes, each node aggregates that information with its own information according to different weights to form a more comprehensive representation;
S402, nonlinear transformation: a nonlinear transformation is applied to the aggregated representation to obtain the node's representation at the current layer;
S403, propagation: the node representation obtained after the nonlinear transformation is propagated to the neighbor nodes; each node executes steps S401-S403 in a loop.
Further, the paradigms comprise a vectorized matching paradigm, whose modeling method comprises an aggregation process in which each node aggregates neighbor information in a manner similar to the attribute conduction paradigm. The learning objective, however, is no longer to fit sample labels but to minimize the distance between the representations of related things and maximize the distance between the representations of unrelated things. After training, the model has learned how to compute the representation of a thing in the corresponding context, and the similarity between representations reflects how related the things are.
Further, if no negative (unrelated) relationships are predefined, pairs of things are randomly generated to serve as unrelated pairs.
Further, the K targets most relevant to a given thing are retrieved through a hash-based inverted index.
Further, the paradigms comprise a causal reasoning paradigm, whose modeling method comprises: obtaining the conditional probability of each node by maximum likelihood estimation over the values of each event in the training samples, and using breadth-first search to find the graph structure that best fits the causal relationships among the events; the graph structure together with the conditional probabilities of the nodes forms a probabilistic graphical model.
Further, when the probabilistic graphical model is used for prediction, the probability of each value of a query node can be calculated once the observed values of some nodes are input.
Further, when the probabilistic graphical model is used for hypothetical (what-if) calculation, the causal structure learned by the model is used to block confounding factors and calculate the conditional probability of the query node's values when the intervention node is set to the intervention value, that is, the probability of the query node's values after the intervention.
The invention has the beneficial effects that:
(1) The invention streamlines the complex flow from business data to graph data. The user only needs to attend to their own business data, discover the relationships among the different kinds of business data, and construct the graph schema from those relationships, without attending to how the underlying graph data is implemented and stored. This greatly reduces the difficulty of use.
(2) The method encapsulates the details of the graph model and achieves automatic modeling. Because model types and model parameters are encapsulated, the user can start from the actual problem and select the paradigm that matches the problem type, which hides the low-level details of the model and lowers the modeling threshold.
(3) The invention simplifies the modeling process and lowers the modeling threshold, so that the user can experiment quickly and obtain feedback.
(4) By using the computing paradigms directly or in combination, users can benefit from the intelligent capabilities of GraphAI without having to work out complex model construction and data processing approaches.
Drawings
Fig. 1 is a schematic diagram of existing semi-supervised learning for node-level classification.
Fig. 2 is a flowchart of the paradigm-based graph model construction method according to an embodiment of the present invention.
Fig. 3 is a second flowchart of the paradigm-based graph model construction method according to an embodiment of the present invention.
Fig. 4 is a diagram of an example use of the paradigms according to an embodiment of the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, specific embodiments of the present invention will now be described. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Fig. 2, this embodiment provides a method for constructing a paradigm-based graph model, comprising a graph construction process and a modeling process. The graph construction process comprises the following steps:
S1, data exploration: analyzing the business data with query statements and a visualization tool to determine potential relationships among the business data;
S2, node and edge relationship construction: determining the node and edge types of the graph from the data exploration results, thereby constructing a graph schema;
S3, graph data construction: loading the data for the corresponding nodes and edges from a distributed file system based on the graph schema, and encoding and combining the node and edge attributes to form the graph required by the graph model.
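Steps S2 and S3 can be illustrated with a minimal sketch: a graph schema that declares node and edge types, and a loader that encodes node attributes into feature vectors and checks edges against the schema. The schema layout, the one-hot encoding, the function names, and the sample rows are all assumptions made for this example; the patent does not specify an encoding, and in-memory rows stand in here for data read from a distributed file system.

```python
# Hypothetical graph schema: node types with their attributes, and edge
# types with the (source type, target type) they connect.
SCHEMA = {
    "nodes": {"customer": ["age", "city"], "product": ["price"]},
    "edges": {"bought": ("customer", "product")},
}


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_node(node_type, record, vocab):
    """Combine a record's attributes into one numeric feature vector."""
    features = []
    for attr in SCHEMA["nodes"][node_type]:
        value = record[attr]
        if isinstance(value, (int, float)):
            features.append(float(value))  # numeric: keep as-is
        else:
            features.extend(one_hot(value, vocab[attr]))  # categorical
    return features


def build_graph(node_rows, edge_rows, vocab):
    """Return (node features, node types, typed edge list) per SCHEMA."""
    features, node_types, edges = {}, {}, []
    for node_type, rows in node_rows.items():
        for row in rows:
            features[row["id"]] = encode_node(node_type, row, vocab)
            node_types[row["id"]] = node_type
    for edge_type, pairs in edge_rows.items():
        src_type, dst_type = SCHEMA["edges"][edge_type]
        for src, dst in pairs:
            # Reject edges whose endpoints do not match the schema.
            if node_types.get(src) != src_type or node_types.get(dst) != dst_type:
                raise ValueError(f"edge {src}->{dst} violates schema")
            edges.append((src, edge_type, dst))
    return features, node_types, edges


# In-memory rows stand in for files read from a distributed file system.
node_rows = {
    "customer": [{"id": "c1", "age": 30, "city": "Xi'an"}],
    "product": [{"id": "p1", "price": 9.5}],
}
edge_rows = {"bought": [("c1", "p1")]}
vocab = {"city": ["Xi'an", "Beijing"]}
features, node_types, edges = build_graph(node_rows, edge_rows, vocab)
```

Validating edges against the schema at load time is one way to catch mismatches between the explored relationships and the loaded data before any model is trained.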
The modeling process comprises the following steps:
S4, model selection: selecting the paradigm that matches the type of business problem (for example, a classification problem or a recommendation problem) for modeling. In the implementation, the nodes of the original graph are divided proportionally into a training set and a test set for training and validation.
S5, model parameter setting: setting model parameters such as the number of hidden layers and the loss function.
S6, model tuning: adjusting the model parameters through a neural network controller, such as an AutoML (automated machine learning) controller, to optimize the model.
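The proportional node split of S4 and the parameter search of S5-S6 can be sketched as follows. This is only an illustration: an exhaustive grid search is swapped in for the AutoML-style controller, which the patent does not detail, and `split_nodes`, `grid_search`, and the toy scoring function are hypothetical names and logic introduced for the example.

```python
import itertools
import random


def split_nodes(node_ids, train_ratio=0.8, seed=0):
    """Shuffle node ids and split them proportionally into train/test sets."""
    ids = sorted(node_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


def grid_search(param_grid, evaluate):
    """Try every parameter combination and keep the best-scoring one."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Stand-in for "train on the training set, score on the test set".
    # The scoring rule is invented so that the example has a known optimum.
    score = -abs(params["hidden_layers"] - 2)
    if params["loss"] == "cross_entropy":
        score += 0.1
    return score


train_ids, test_ids = split_nodes(range(10), train_ratio=0.8)
param_grid = {"hidden_layers": [1, 2, 3], "loss": ["cross_entropy", "mse"]}
best_params, best_score = grid_search(param_grid, toy_evaluate)
```

In a real pipeline `evaluate` would train the selected paradigm with the given parameters and return a validation metric; a learned controller would replace the exhaustive loop when the grid is large.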
Preferably, as shown in Fig. 3, the paradigm-based graph model construction method of this embodiment further comprises an ETL process, which mainly performs data acquisition, data cleaning, and data processing.
Preferably, this embodiment provides three paradigms: an attribute conduction paradigm, a vectorized matching paradigm, and a causal reasoning paradigm. Fig. 4 shows an example of paradigm usage.
Specifically, the modeling method of the attribute conduction paradigm comprises the following steps:
S401, aggregation: after receiving information from its neighbor nodes, each node aggregates that information with its own information according to different weights to form a more comprehensive representation;
S402, nonlinear transformation: a nonlinear transformation is applied to the aggregated representation to obtain the node's representation at the current layer;
S403, propagation: the node representation obtained after the nonlinear transformation is propagated to the neighbor nodes; each node executes steps S401-S403 in a loop.
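The loop of steps S401-S403 can be sketched in a few lines of plain Python. This is a simplified illustration, not the patented model: the aggregation weights are fixed (a `self_weight` share for the node itself, the remainder split evenly among neighbors) rather than learned, `tanh` is an arbitrary choice of nonlinearity, and the function name `propagate` is an assumption.

```python
import math


def propagate(features, neighbors, self_weight=0.5, num_layers=2):
    """Run num_layers rounds of aggregate -> transform -> propagate."""
    h = {node: list(vec) for node, vec in features.items()}
    for _ in range(num_layers):
        new_h = {}
        for node, vec in h.items():
            nbrs = neighbors.get(node, [])
            # S401: weighted aggregation of own and neighbor information.
            agg = [self_weight * x for x in vec]
            if nbrs:
                w = (1.0 - self_weight) / len(nbrs)
                for nb in nbrs:
                    for i, x in enumerate(h[nb]):
                        agg[i] += w * x
            # S402: nonlinear transformation gives this layer's representation.
            new_h[node] = [math.tanh(x) for x in agg]
        # S403: the new representations are what neighbors read next round.
        h = new_h
    return h


features = {"a": [1.0], "b": [0.0], "c": [2.0]}
neighbors = {"a": ["b"], "b": ["a"]}  # "c" is isolated
one_layer = propagate(features, neighbors, self_weight=0.5, num_layers=1)
two_layers = propagate(features, neighbors, self_weight=0.5, num_layers=2)
```

Each extra layer lets information travel one more hop, so after two layers a node's representation reflects its two-hop neighborhood.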
Specifically, the modeling method of the vectorized matching paradigm comprises an aggregation process in which each node aggregates neighbor information in a manner similar to the attribute conduction paradigm. The learning objective, however, is no longer to fit sample labels but to minimize the distance between the representations of related things and maximize the distance between the representations of unrelated things. If no negative (unrelated) relationships are predefined, pairs of things are randomly generated to serve as unrelated pairs. After training, the model has learned how to compute the representation of a thing in the corresponding context, and the similarity between representations reflects how related the things are. On the basis of these representations, a hash-based inverted index helps the user quickly retrieve the K targets most relevant to a given thing.
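The three ingredients of this paradigm (random negative pairs, a loss that pulls related things together and pushes unrelated things apart, and hash-based top-K retrieval) can be sketched as below. The patent does not name a specific loss or hashing scheme, so the cosine-based contrastive loss and the hyperplane-sign hashing used here are assumptions chosen for illustration, as are all function names and the toy embeddings.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sample_negatives(items, positives, count, rng):
    """Randomly draw pairs that are not known positives, as unrelated pairs."""
    positive_set = {frozenset(p) for p in positives}
    n = len(items)
    count = min(count, n * (n - 1) // 2 - len(positive_set))  # avoid looping forever
    negatives = set()
    while len(negatives) < count:
        pair = frozenset(rng.sample(items, 2))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


def contrastive_loss(emb, positives, negatives, margin=0.0):
    """Low when related pairs are similar and unrelated pairs are dissimilar."""
    loss = sum(1.0 - cosine(emb[a], emb[b]) for a, b in positives)
    loss += sum(max(0.0, cosine(emb[a], emb[b]) - margin) for a, b in negatives)
    return loss / max(1, len(positives) + len(negatives))


def signature(vec, planes):
    """Hash code: which side of each hyperplane the vector falls on."""
    return tuple(sum(a * b for a, b in zip(vec, p)) >= 0 for p in planes)


def build_hash_index(emb, planes):
    """Inverted index from hash code to the items that share it."""
    index = {}
    for item, vec in emb.items():
        index.setdefault(signature(vec, planes), []).append(item)
    return index


def top_k(query, emb, index, planes, k):
    """Rank items in the query's bucket; fall back to a full scan if too few."""
    bucket = index.get(signature(emb[query], planes), [])
    candidates = [i for i in bucket if i != query]
    if len(candidates) < k:
        candidates = [i for i in emb if i != query]
    ranked = sorted(candidates, key=lambda i: cosine(emb[query], emb[i]), reverse=True)
    return ranked[:k]


emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}  # toy embeddings
positives = [("a", "b")]
negatives = sample_negatives(list(emb), positives, count=5, rng=random.Random(0))
loss = contrastive_loss(emb, positives, negatives)
planes = [[1.0, 0.0]]  # fixed for reproducibility; normally drawn at random
index = build_hash_index(emb, planes)
nearest = top_k("a", emb, index, planes, k=1)
```

Bucketing by hash code means a query usually compares against a small candidate set instead of every item, at the cost of occasionally missing a neighbor that landed in another bucket.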
Specifically, the modeling method of the causal reasoning paradigm comprises: obtaining the conditional probability of each node by maximum likelihood estimation over the values of each event in the training samples, and using breadth-first search to find the graph structure that best fits the causal relationships among the events; the graph structure together with the conditional probabilities of the nodes forms a probabilistic graphical model. For prediction, once the observed values of some nodes are input, the exact probability of each value of the query node can be calculated. For hypothetical (what-if) calculation, the causal structure learned by the model is used to block confounding factors and calculate the conditional probability of the query node's values when the intervention node is set to the intervention value, that is, the probability of the query node's values after the intervention.
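Two parts of this paradigm lend themselves to a short sketch: maximum likelihood estimation of conditional probabilities (relative frequencies) and a post-intervention query. The sketch assumes the causal structure is already known (a single confounder `z` influencing both `x` and `y`, with `x` influencing `y`), so the breadth-first structure search is not shown; it also assumes that "blocking the confounding factor" can be realized by backdoor adjustment, which the patent does not state explicitly. The function names and sample data are hypothetical.

```python
from collections import Counter


def conditional_probability(samples, target, given):
    """MLE of P(target | given) by relative frequency.

    Returns a dict keyed by ((given values...), target value).
    """
    joint, marginal = Counter(), Counter()
    for s in samples:
        key = tuple(s[g] for g in given)
        joint[(key, s[target])] += 1
        marginal[key] += 1
    return {k: c / marginal[k[0]] for k, c in joint.items()}


def interventional_probability(samples, y, y_value, x, x_value, confounder):
    """P(y = y_value | do(x = x_value)) = sum_z P(y | x, z) * P(z)."""
    p_y = conditional_probability(samples, y, [x, confounder])
    z_counts = Counter(s[confounder] for s in samples)
    total = 0.0
    for z, count in z_counts.items():
        # Unseen (x, z) combinations contribute 0 here; a real system
        # would need smoothing or would report the gap.
        total += p_y.get(((x_value, z), y_value), 0.0) * count / len(samples)
    return total


# Hypothetical training samples over binary events z, x, y.
samples = [
    {"z": 0, "x": 0, "y": 0}, {"z": 0, "x": 0, "y": 0},
    {"z": 0, "x": 0, "y": 1}, {"z": 0, "x": 1, "y": 1},
    {"z": 1, "x": 1, "y": 1}, {"z": 1, "x": 1, "y": 1},
    {"z": 1, "x": 1, "y": 0}, {"z": 1, "x": 0, "y": 0},
]

# Prediction: probability of y = 1 after observing x = 1.
observed = conditional_probability(samples, "y", ["x"])[((1,), 1)]
# What-if: probability of y = 1 if x were set to 1 by intervention.
intervened = interventional_probability(samples, "y", 1, "x", 1, "z")
```

On this data the observed probability is 3/4 while the interventional one is 5/6; the gap comes from `z` being unevenly distributed among the samples with `x = 1`, which is exactly the confounding that the adjustment removes.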
The foregoing describes preferred embodiments of the invention. The invention is not limited to the precise form disclosed herein; various other combinations, modifications, and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A method for constructing a paradigm-based graph model, characterized by comprising the following steps:
S1, data exploration: analyzing the business data with query statements and a visualization tool to determine potential relationships among the business data;
S2, node and edge relationship construction: determining the node and edge types of the graph from the data exploration results, thereby constructing a graph schema;
S3, graph data construction: loading the data for the corresponding nodes and edges from a distributed file system based on the graph schema, and encoding and combining the node and edge attributes to form the graph required by the graph model;
S4, model selection: selecting the paradigm that matches the type of business problem for modeling;
S5, model parameter setting: setting model parameters, including the number of hidden layers and the loss function;
S6, model tuning: adjusting the model parameters to optimize the model.
2. The method for constructing a paradigm-based graph model of claim 1, wherein the paradigms comprise an attribute conduction paradigm, and the modeling method of the attribute conduction paradigm comprises the following steps:
S401, aggregation: after receiving information from its neighbor nodes, each node aggregates that information with its own information according to different weights to form a more comprehensive representation;
S402, nonlinear transformation: a nonlinear transformation is applied to the aggregated representation to obtain the node's representation at the current layer;
S403, propagation: the node representation obtained after the nonlinear transformation is propagated to the neighbor nodes; each node executes steps S401-S403 in a loop.
3. The method for constructing a paradigm-based graph model of claim 2, wherein the paradigms comprise a vectorized matching paradigm, and the modeling method of the vectorized matching paradigm comprises: an aggregation process in which each node aggregates neighbor information in a manner similar to the attribute conduction paradigm, except that the learning objective is no longer to fit sample labels but to minimize the distance between the representations of related things and maximize the distance between the representations of unrelated things; after training, the model has learned how to compute the representation of a thing in the corresponding context, and the similarity between representations reflects how related the things are.
4. The method for constructing a paradigm-based graph model of claim 3, wherein if no negative (unrelated) relationships are predefined, pairs of things are randomly generated to serve as unrelated pairs.
5. The method for constructing a paradigm-based graph model of claim 3, wherein the K targets most relevant to a given thing are retrieved through a hash-based inverted index.
6. The method for constructing a paradigm-based graph model of claim 1, wherein the paradigms comprise a causal reasoning paradigm, and the modeling method of the causal reasoning paradigm comprises: obtaining the conditional probability of each node by maximum likelihood estimation over the values of each event in the training samples, and using breadth-first search to find the graph structure that best fits the causal relationships among the events, the graph structure together with the conditional probabilities of the nodes forming a probabilistic graphical model.
7. The method for constructing a paradigm-based graph model of claim 6, wherein when the probabilistic graphical model is used for prediction, the probability of each value of a query node can be calculated once the observed values of some nodes are input.
8. The method for constructing a paradigm-based graph model of claim 6, wherein when the probabilistic graphical model is used for hypothetical calculation, the causal structure learned by the model is used to block confounding factors and calculate the conditional probability of the query node's values when the intervention node takes the intervention value, that is, the probability of the query node's values after the intervention.
CN202210574033.1A 2022-05-25 2022-05-25 Method for constructing normal graph model Pending CN114936307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210574033.1A CN114936307A (en) 2022-05-25 2022-05-25 Method for constructing normal graph model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210574033.1A CN114936307A (en) 2022-05-25 2022-05-25 Method for constructing normal graph model

Publications (1)

Publication Number Publication Date
CN114936307A true CN114936307A (en) 2022-08-23

Family

ID=82863593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210574033.1A Pending CN114936307A (en) 2022-05-25 2022-05-25 Method for constructing normal graph model

Country Status (1)

Country Link
CN (1) CN114936307A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117408584A (en) * 2023-12-07 2024-01-16 国网智能电网研究院有限公司 Carbon asset operation data model construction method, device, equipment and medium
WO2024046459A1 (en) * 2022-09-02 2024-03-07 深圳忆海原识科技有限公司 Model management apparatus and hierarchical system for neural network operation

Similar Documents

Publication Publication Date Title
CN104995870B (en) Multiple target server arrangement determines method and apparatus
Berlin et al. Database schema matching using machine learning with feature selection
US8825640B2 (en) Methods and apparatus for ranking uncertain data in a probabilistic database
CN107391512B (en) Method and device for predicting knowledge graph
CN114936307A (en) Method for constructing normal graph model
Liu et al. Defective alternatives detection-based multi-attribute intuitionistic fuzzy large-scale decision making model
CN112131261B (en) Community query method and device based on community network and computer equipment
Zhang et al. Hierarchical community detection based on partial matrix convergence using random walks
Zhou et al. Betweenness centrality-based community adaptive network representation for link prediction
CN116244513A (en) Random group POI recommendation method, system, equipment and storage medium
CN115018545A (en) Similar user analysis method and system based on user portrait and clustering algorithm
CN116757262B (en) Training method, classifying method, device, equipment and medium of graph neural network
CN113269310A (en) Graph neural network interpretable method based on counterfactual
Yang Research on integration method of AI teaching resources based on learning behaviour data analysis
CN116450938A (en) Work order recommendation realization method and system based on map
Weinstein et al. Agent communication with differentiated ontologies: eight new measures of description compatibility
Zheng et al. An efficient preference-based sensor selection method in Internet of Things
Apajalahti et al. Combining ontological modelling and probabilistic reasoning for network management
Jia et al. The overlapping community discovery algorithm based on the local interaction model
Li Multidimensional Information Network Big Data Mining Algorithm Relying on Finite Element Analysis
Thiagarasu et al. A MADM model with VIKOR method for decision making support systems
CN117271577B (en) Keyword retrieval method based on intelligent analysis
Yi et al. CDRKD: An improved density peak algorithm based on kernel fuzzy measure in the overlapping community detection
Petrov Understanding the Performance of Hyperbolic Graph Neural Networks
Diao et al. Heuristic search for fuzzy-rough bireducts and its use in classifier ensembles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination