CN110598055A - Parallel graph summarization method based on attribute graph - Google Patents

Parallel graph summarization method based on attribute graph

Info

Publication number
CN110598055A
Authority
CN
China
Prior art keywords
node
graph
nodes
error
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910783949.6A
Other languages
Chinese (zh)
Inventor
马应龙
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University
Priority to CN201910783949.6A
Publication of CN110598055A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of computer-based graph summarization, and particularly relates to a parallel graph summarization method based on an attribute graph, which comprises the following steps: step 1: preprocessing the acquired graph data so that each node in the graph is represented as a node structure holding its own information and the information of all its direct neighbors; step 2: randomly selecting a direct neighbor of the current node, and then selecting, from all direct neighbors of that neighbor, the node that has the same attribute as the current node and the greatest similarity to it, as the candidate node to be merged with the current node; step 3: judging whether the error introduced by merging the current node and the candidate node exceeds an error threshold; if so, returning to step 2 to search for another candidate node, and if not, merging the two nodes; step 4: performing the two-node merge by updating all affected node information in the node structure, repeating steps 3-4 until the number of remaining nodes is less than a set threshold, then saving the final node structure and exporting the summary graph.

Description

Parallel graph summarization method based on attribute graph
Technical Field
The invention belongs to the technical field of computer-based graph summarization, and particularly relates to a parallel graph summarization method based on an attribute graph.
Background
Graphs have strong inherent advantages and are widely used to model real-world objects and the relationships among them. Large-scale graph data is common in many application areas. In a graph, entities are modeled as vertices, and their relationships or connections are represented by edges. Modern applications generate large amounts of graph data, and because graph data implicitly stores a large amount of relationship information, latent knowledge can be mined from it to better serve users; many researchers have therefore studied the processing and computation of graph data intensively. However, as the number of application users continues to grow, the size and structure of graphs become increasingly complex, and analyzing and processing large graphs with millions or even billions of nodes and edges becomes a huge challenge. Because of the sheer volume and complexity of graph data, conventional graph analysis tools cannot complete mining and analysis in a limited time. Therefore, for tools and algorithms alike, it is essential to reduce the size and complexity of graph data by condensing large-scale graphs into compact, information-rich, highly abstract representations of the original graphs, so that large-scale graph data can be easily stored, managed, analyzed, and processed. Among the various graph computation techniques, graph summarization is one potential approach to these problems.
In graph summarization research, different research groups focus on different subjects and often extract graph features from different angles, which has produced many graph summarization algorithms. Most existing algorithms use statistical methods to study and extract the characteristics of a graph and mainly concern its topological structure, such as node degree distribution, frequent subgraph mining, and community detection. However, the summary generated by such algorithms is often a series of graphs that are merely frequently occurring or densely structured subgraphs of the original graph; the summary is obtained approximately, by letting the main structures stand in for the whole graph. Although these summaries contain most of the main information of the original graph and can be analyzed and processed in its place, they often ignore other information in the graph, so that the structural information of the whole graph loses its intuitiveness and the analysis results may be biased or even wrong. Most algorithms consider only the topology of the graph and ignore node attributes and relationship information. Yet most real-world network graphs are attribute graphs whose nodes and edges carry various attributes and relationships, so considering topology alone does not meet practical requirements. In addition, most existing methods perform graph summarization in a single-machine environment. With the rapid growth of internet users, the scale of a graph often exceeds the computing and storage capacity of a single computer; when nodes and edges reach the order of millions or billions, these algorithms cannot process the graph normally, and their scalability is poor.
Centralized graph summarization algorithms designed for a single machine are no longer suited to the current processing environment, and the research and implementation of parallel graph summarization algorithms for distributed environments will play a crucial role in analyzing and processing future large-scale graph data.
Previous research on node pairing has used two node selection strategies: the greedy method and the random method. The greedy method selects the best 2-hop neighbor node pair in the whole graph for each merge; although it merges the best pair and obtains the minimum summary error, it incurs a large amount of computation and network communication. The random method randomly selects a 2-hop neighbor node pair as the candidate pair each time; although this greatly reduces the computation in the pair selection stage, the selected pair is highly likely to fail the error threshold for merging, which causes unnecessary computation in the subsequent stage.
Disclosure of Invention
To address the above technical problems, the invention provides a parallel graph summarization method based on an attribute graph, which comprises the following steps:
step 1: preprocessing the acquired graph data so that each node in the graph is represented as a node structure holding its own information and the information of all its direct neighbors;
step 2: randomly selecting a direct neighbor of the current node, and then selecting, from all direct neighbors of that neighbor, the node that has the same attribute as the current node and the greatest similarity to it, as the candidate node to be merged with the current node;
step 3: judging whether the error introduced by merging the current node and the candidate node exceeds an error threshold; if so, returning to step 2 to search for another candidate node, and if not, merging the two nodes;
step 4: performing the two-node merge by updating all affected node information in the node structure, repeating steps 3-4 until the number of remaining nodes is less than a set threshold, then saving the final node structure and exporting the summary graph.
The error threshold is adjusted dynamically using the simulated annealing heuristic, and it increases continuously as the number of remaining nodes decreases.
In a two-node merge, the node with the larger ID is retired and the ID of the node with the smaller ID is retained.
Each node in the summary graph, called a supernode, corresponds to one block of a partition of the original graph's nodes, and each edge, called a superedge, corresponds to a connection between two related node blocks; a superedge between two supernodes exists if and only if there is at least one edge between some pair of nodes drawn from the two supernodes.
The introduced error is defined as:
[formula not reproduced in this text]
where the left-hand side denotes the error introduced by merging supernodes $v_i$ and $v_j$ into $v_m$, $\alpha$ is an adjustable parameter, and the two component terms denote the topology error and the relationship error, respectively:
[formulas not reproduced in this text]
$v_p$ is a supernode and $V_s$ is the supernode set; the four error terms are the pairwise merge errors of the supernode pairs ($v_m$, $v_p$), ($v_i$, $v_p$), ($v_j$, $v_p$), and ($v_i$, $v_p$); $r_{i,p}$ and $r_{j,p}$ denote the relationships of the supernode pairs ($v_i$, $v_p$) and ($v_j$, $v_p$), respectively.
The beneficial effects of the invention are as follows:
The invention targets attribute graphs and fully considers both the topological structure of the input graph and the attribute relationships of its nodes during graph summarization; while letting the user control the summary resolution, it uses bottom-up node aggregation to generate summary graphs at different levels of abstraction.
The invention defines the concept of summary error and provides a specific method for calculating the node aggregation error, so that the growth of the merging error during graph summarization can be evaluated quantitatively. A heuristic algorithm defines the threshold on the error introduced by each node-pair merge, and the error threshold increases dynamically as the summarization process advances. This not only yields a smaller overall summary error but also raises the probability that a node pair is merged successfully and avoids excessive selection of invalid candidate node pairs, thereby improving the efficiency of graph summarization.
The summary graph generated by the invention is a suboptimal solution close to the optimal one. Extensive experiments were carried out on the Spark platform on attribute graphs with various node attributes and relationships, and the effectiveness and efficiency of the algorithm were evaluated by analyzing the summary error and the running time of the program during graph summarization. The experimental results show that, compared with traditional graph summarization algorithms, the proposed parallel graph summarization algorithm is highly feasible and efficient.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a graph of the summary error in the embodiment.
FIG. 3 is a graph of the summarization time in the embodiment.
FIG. 4 is a scalability analysis graph for the embodiment.
Detailed Description
The invention provides a parallel graph summarization method based on an attribute graph, which comprises the following steps:
step 1: preprocessing the acquired graph data so that each node in the graph is represented as a node structure holding its own information and the information of all its direct neighbors;
step 2: randomly selecting a direct neighbor of the current node, and then selecting, from all direct neighbors of that neighbor, the node that has the same attribute as the current node and the greatest similarity to it, as the candidate node to be merged with the current node;
step 3: judging whether the error introduced by merging the current node and the candidate node exceeds an error threshold; if so, returning to step 2 to search for another candidate node, and if not, merging the two nodes;
step 4: performing the two-node merge by updating all affected node information in the node structure, repeating steps 3-4 until the number of remaining nodes is less than a set threshold, then saving the final node structure and exporting the summary graph.
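Step 1 can be illustrated with a minimal Python sketch. The field names (`node_id`, `attr`, `owner_id`, `size`, `neighbors`) and the helper `build_node_structures` are hypothetical; the text only states that each node structure holds the node's own information and that of all its direct neighbors. The sketch also assumes an undirected graph.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class NodeStruct:
    """Hypothetical per-node record: own information plus direct-neighbor information."""
    node_id: int
    attr: str                      # attribute type of the node
    owner_id: int                  # supernode this node currently belongs to
    size: int = 1                  # number of original nodes represented
    # neighbor ID -> relationship type of the connecting edge
    neighbors: Dict[int, str] = field(default_factory=dict)


def build_node_structures(
    attrs: Dict[int, str],
    edges: Iterable[Tuple[int, int, str]],
) -> Dict[int, NodeStruct]:
    """Turn an attributed edge list into one NodeStruct per node."""
    nodes = {v: NodeStruct(node_id=v, attr=a, owner_id=v) for v, a in attrs.items()}
    for u, v, rel in edges:
        if u == v:
            continue                   # self-loops are ignored in this sketch
        nodes[u].neighbors[v] = rel
        nodes[v].neighbors[u] = rel    # assumption: the graph is undirected
    return nodes


nodes = build_node_structures(
    attrs={1: "person", 2: "person", 3: "company"},
    edges=[(1, 3, "works_at"), (2, 3, "works_at")],
)
```

Initially every node owns itself (`owner_id == node_id`), which matches the later convention that a node whose ID equals its ownerID is present in the summary graph.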
The first stage performs the candidate node pair selection task. Selecting the node pair to be merged is the key to the graph summarization algorithm in each iteration. Because the summarization targets attribute graphs, the two nodes to be merged must have the same attribute type and share as much of their neighborhood relationships as possible, which depends mainly on their direct neighbors. Because two nodes that share a direct neighbor are similar with high probability, during candidate selection each current node selects, among its 2-hop neighbors with the same attribute, the node with the highest similarity as its merging partner. Previous studies have used two node selection strategies: the greedy method and the random method. The greedy method selects the best 2-hop neighbor node pair in the whole graph for each merge; although it merges the best pair and obtains the minimum summary error, it incurs a large amount of computation and network communication. The random method randomly selects a 2-hop neighbor node pair as the candidate pair each time; although this greatly reduces the computation in the pair selection stage, the selected pair is highly likely to fail the error threshold for merging, which causes unnecessary computation in the subsequent stage. In this algorithm, candidate selection combines the two strategies: the current node randomly selects a direct neighbor, and then selects, from all direct neighbors of that neighbor, the node with the same attribute as the current node and the greatest similarity to it as its partner. This stage only needs to find a pair of nodes with the same attribute and similar relationship neighborhoods; it does not need to verify whether the pair can actually be merged.
Because the pair found is only likely to be mergeable, no merge takes place until the next stage has made that decision. In addition, the program uses a variable to track the number of nodes remaining in the data graph during summarization. When the summary size, i.e., the number of nodes in the graph, is less than or equal to the user-defined number of nodes k, this stage stops searching for candidate pairs and sends a message telling the other stages that the summarization process is complete; the other stages then stop, and the whole process terminates.
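The combined random-then-greedy selection described above can be sketched as follows. The text does not say how similarity is measured, so the Jaccard overlap of neighbor sets used here is an assumption, and the names `jaccard` and `pick_candidate` are hypothetical.

```python
import random
from typing import Dict, Optional, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Neighborhood overlap, used here as an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_candidate(
    v: int,
    adj: Dict[int, Set[int]],
    attr: Dict[int, str],
    rng: random.Random,
) -> Optional[int]:
    """Pick a random direct neighbor, then the most similar same-attribute node next to it."""
    if not adj[v]:
        return None
    hop = rng.choice(sorted(adj[v]))                       # random step
    pool = [u for u in adj[hop] if u != v and attr[u] == attr[v]]
    if not pool:
        return None
    # greedy step: highest similarity wins, ties go to the smaller ID
    return max(pool, key=lambda u: (jaccard(adj[v], adj[u]), -u))


adj = {1: {3}, 2: {3, 5}, 4: {3}, 3: {1, 2, 4}, 5: {2}}
attr = {1: "person", 2: "person", 4: "person", 3: "company", 5: "company"}
partner = pick_candidate(1, adj, attr, random.Random(0))
```

In this toy graph node 1 has a single neighbor (node 3), so the random step is forced; node 4 shares exactly the same neighborhood as node 1 and is preferred over node 2. The returned pair is only a candidate: whether it is merged is decided in the next stage.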
The second stage performs the candidate node pair merging task. After a candidate pair is found, its information is sent to this stage, which decides whether the pair can be merged by comparing the error introduced by the merge with the error threshold. A metric Δ is defined to evaluate the similarity between the two nodes during merging; Δ accumulates the differences between the relationships that the two nodes have with their neighboring nodes, and a smaller value indicates that the two nodes are more similar, i.e., easier to merge. To obtain a lower total merging error, an error threshold ET limits the error of each merge and thereby optimizes the graph summarization algorithm. A pair is merged only when the error introduced by the merge is smaller than ET; otherwise the pair is not merged. Early in the iterative summarization process, a pair that meets the merging error condition is easy to find, which means a node is very likely to be merged with another node at the beginning. As summarization progresses, the overall level of error introduced by a single merge increases, and if the error threshold stayed fixed it would become difficult to find a pair that can actually be merged. Therefore, a heuristic algorithm, Simulated Annealing (SA), is used to adjust the error threshold dynamically. This not only yields a smaller overall summary error but also raises the probability of a successful merge and avoids excessive selection of invalid candidate pairs, thereby improving the efficiency of graph summarization.
In the algorithm, the error threshold is set to a small initial value at the start of the program and then increases continuously as summarization advances. In each merge, the node with the smaller ID is always retained as the merged node instead of creating a new node. Specifically, the node with the larger ID sets its ownerID to the ID of the node with the smaller ID, which marks it as dead, and subsequent operations are no longer applied to it.
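The merge decision can be sketched as below. The text does not give the form of the simulated-annealing adjustment, so a simple linear schedule stands in for it here: the threshold starts small and grows as the number of remaining nodes falls toward k. The function names and the bounds `et_start` and `et_end` are hypothetical.

```python
def error_threshold(remaining: int, total: int, k: int,
                    et_start: float = 1.0, et_end: float = 50.0) -> float:
    """Assumed schedule: grows linearly from et_start to et_end as nodes are merged."""
    if total <= k:
        return et_end
    progress = (total - remaining) / (total - k)   # 0.0 at the start, 1.0 at k nodes
    progress = min(max(progress, 0.0), 1.0)
    return et_start + (et_end - et_start) * progress


def should_merge(delta: float, remaining: int, total: int, k: int) -> bool:
    """Merge only when the introduced error is below the current threshold ET."""
    return delta < error_threshold(remaining, total, k)


def surviving_id(i: int, j: int) -> int:
    """The smaller ID is kept; the larger one is marked dead through its ownerID."""
    return min(i, j)


early = should_merge(10.0, remaining=100, total=100, k=50)   # threshold is 1.0
later = should_merge(10.0, remaining=75, total=100, k=50)    # threshold is 25.5
```

The same candidate error of 10.0 is rejected at the start and accepted halfway through, which is the behavior the paragraph describes: a strict threshold while good pairs are plentiful, a looser one later.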
The third stage performs the node structure update task. If a candidate pair satisfies all merging conditions, including having the same attribute type and meeting the error threshold, the actual merge is performed by updating all affected node information in the node structure: the node with the smaller ID (i.e., $v_i$) updates its size and its number of self-connections, and the node with the larger ID (i.e., $v_j$) changes its ownerID to the ID of the smaller node. In addition, the merge affects all neighbors of the pair, which must update their neighbor lists with the new neighbor ID, size, and connections. Specifically, the common neighbors of $v_i$ and $v_j$ remove $v_j$ from their neighbor lists and update the entry for $v_i$ with its new node information; the neighbors of $v_i$ alone update the information of $v_i$ in their neighbor lists; and the neighbors of $v_j$ alone replace $v_j$ in their neighbor lists with the new $v_i$ node information. After all affected information is updated, the program goes on to find another candidate pair and repeats the loop above.
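A minimal sketch of this update, assuming each neighbor list stores the number of original edges to each neighbor. The dictionaries `adj`, `owner`, `size`, and `self_edges` and the function `merge_nodes` are hypothetical names, not taken from the text.

```python
from typing import Dict


def merge_nodes(i: int, j: int,
                adj: Dict[int, Dict[int, int]],
                owner: Dict[int, int],
                size: Dict[int, int],
                self_edges: Dict[int, int]) -> int:
    """Merge a pair in place; the node with the smaller ID survives."""
    keep, dead = min(i, j), max(i, j)
    # edges between the pair become self-connections of the survivor
    self_edges[keep] += self_edges[dead] + adj[keep].pop(dead, 0)
    adj[dead].pop(keep, None)
    size[keep] += size[dead]
    owner[dead] = keep                       # the dead node now points at its owner
    for n, cnt in adj[dead].items():
        # survivor inherits the dead node's connections
        adj[keep][n] = adj[keep].get(n, 0) + cnt
        # neighbor drops the dead ID and updates its entry for the survivor
        adj[n][keep] = adj[n].get(keep, 0) + adj[n].pop(dead)
    adj[dead] = {}
    return keep


# edges: 1-2, 1-3, 2-3, 2-4 (each neighbor entry counts original edges)
adj = {1: {2: 1, 3: 1}, 2: {1: 1, 3: 1, 4: 1}, 3: {1: 1, 2: 1}, 4: {2: 1}}
owner = {v: v for v in adj}
size = {v: 1 for v in adj}
self_edges = {v: 0 for v in adj}
survivor = merge_nodes(1, 2, adj, owner, size, self_edges)
```

Node 3 is a common neighbor, so its two entries collapse into one entry for node 1 with an edge count of 2; node 4 was a neighbor of node 2 only, so its entry is rewritten to point at node 1.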
At the end of the program, the resulting node structure is saved and used to derive the summary graph. A node whose ID equals its ownerID is a node that exists in the resulting summary graph. In essence, each node's ID together with its ownerID forms a membership mapping from the nodes of the original graph to the supernodes of the summary graph into which they were compressed. The algorithm nominally terminates when the resulting summary graph has k nodes. However, because multiple merges occur in each iteration, the number of nodes remaining in the resulting summary graph is in most cases less than k. In most practical cases, the user can accept a summary graph whose size is approximately k. The program therefore terminates as soon as the number of supernodes no longer exceeds k.
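The ID-to-ownerID membership mapping can be turned into supernode member lists with a short sketch. It assumes that ownerID links may form chains (a node's owner can itself be merged later), which the text does not state explicitly; `resolve_owner` and `supernode_members` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def resolve_owner(v: int, owner: Dict[int, int]) -> int:
    """Follow ownerID links until reaching a node that owns itself."""
    while owner[v] != v:
        v = owner[v]
    return v


def supernode_members(owner: Dict[int, int]) -> Dict[int, List[int]]:
    """Group original nodes by the supernode that finally owns them."""
    members: Dict[int, List[int]] = defaultdict(list)
    for v in sorted(owner):
        members[resolve_owner(v, owner)].append(v)
    return dict(members)


owner = {1: 1, 2: 1, 3: 3, 4: 2, 5: 3}     # node 4 was merged into 2, then 2 into 1
members = supernode_members(owner)
```

Nodes 1 and 3 own themselves, so they are the supernodes of the summary graph; every other node is listed under the supernode it was compressed into.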
Consider an attribute graph whose nodes can have any number of attributes and are connected by edges of various relationship types. More precisely, the attribute graph is represented as $G = (V, E, A, R)$, where $V$ is the set of nodes and $E$ is the set of edges. The node attribute set is denoted $A = \{a_1, a_2, \ldots, a_m\}$, and every node $v \in V$ has one attribute type in $A$. The node relationship set is denoted $R = \{r_1, r_2, \ldots, r_n\}$, and every edge $(u, v) \in E$ has one relationship type in $R$.
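As an illustration of this definition, the tuple $G = (V, E, A, R)$ can be held in a small container that checks the two typing conditions. The class `AttributeGraph` and its fields `node_attr` and `edge_rel` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class AttributeGraph:
    """G = (V, E, A, R) with one attribute type per node and one relationship type per edge."""
    V: Set[int]
    E: Set[Tuple[int, int]]
    A: Set[str]
    R: Set[str]
    node_attr: Dict[int, str]               # v -> attribute type in A
    edge_rel: Dict[Tuple[int, int], str]    # (u, v) -> relationship type in R

    def is_well_formed(self) -> bool:
        """Every node is typed by A, every edge joins nodes of V and is typed by R."""
        nodes_ok = all(self.node_attr.get(v) in self.A for v in self.V)
        edges_ok = all(
            u in self.V and v in self.V and self.edge_rel.get((u, v)) in self.R
            for u, v in self.E
        )
        return nodes_ok and edges_ok


g = AttributeGraph(
    V={1, 2, 3},
    E={(1, 3), (2, 3)},
    A={"person", "company"},
    R={"works_at"},
    node_attr={1: "person", 2: "person", 3: "company"},
    edge_rel={(1, 3): "works_at", (2, 3): "works_at"},
)
ok = g.is_well_formed()
```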
The summary graph is defined as follows: given an input attribute graph $G = (V, E, A, R)$ and a node partition $P$ of $G$ [partition formula not reproduced in this text], the summary graph based on the partition $P$ is represented as $S(G) = (V_s, E_s, A, R)$, where $V_s = P$ [definition of $E_s$ not reproduced in this text]. More intuitively, each node in the summary graph, called a supernode, corresponds to one block of the partition of the original graph's nodes, and each edge, called a superedge, corresponds to a connection between two related node blocks. A superedge between two supernodes exists if and only if there is at least one edge between some pair of nodes drawn from the two supernodes. Any node $v_i \in V_s$ is a supernode representing a subset of the nodes of the original graph, and any edge $(v_i, v_j) \in E_s$ is a superedge, indicating that every node in $v_i$ is connected to every node in $v_j$. In other words, for all $u \in v_i$ and $v \in v_j$, $(u, v) \in (v_i, v_j)$, whether or not the edge $(u, v)$ exists in the original graph. The type of the edge relationship between partitions $v_i$ and $v_j$ is denoted $r(v_i, v_j)$, abbreviated as $r_{i,j}$.
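The "at least one edge" rule for superedges can be sketched directly from a partition. The mapping `block_of` (original node to supernode ID) and the function `summary_edges` are hypothetical names for this illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def summary_edges(
    edges: Iterable[Tuple[int, int]],
    block_of: Dict[int, int],
) -> Set[Tuple[int, int]]:
    """Return superedges: two blocks are linked iff some original edge joins them."""
    result: Set[Tuple[int, int]] = set()
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        if bu != bv:                          # edges inside one block are not superedges
            result.add((min(bu, bv), max(bu, bv)))
    return result


edges = [(1, 3), (2, 3), (4, 5)]
block_of = {1: 1, 2: 1, 3: 3, 4: 3, 5: 5}     # node -> supernode ID (smallest member)
super_edges = summary_edges(edges, block_of)
```

Here nodes 1 and 2 form supernode 1 and nodes 3 and 4 form supernode 3. The superedge (3, 5) exists because of the single original edge 4-5, even though node 3 itself is not connected to node 5; this gap is exactly what the summary error defined next measures.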
Because of node merging, a superedge between two supernodes may add edges that do not exist in the original graph or drop edges that do, which causes an error between the information contained in the summary graph and that in the original graph. Let $\Pi_{i,j}$ denote the set of edges that would exist if the nodes of supernodes $v_i$ and $v_j$ were fully connected to each other, and let $A_{i,j}$ denote the set of edges between $v_i$ and $v_j$ that actually exist in the original graph. If the summary graph contains the superedge $(v_i, v_j)$, merging adds $|\Pi_{i,j}| - |A_{i,j}|$ edges; otherwise $|A_{i,j}|$ edges are deleted from the summary graph. More precisely, the error $e_{i,j}$ associated with a pair of supernodes $v_i$ and $v_j$ in a summary graph generated from the topological graph is defined as follows:
$e_{i,j} = \min\{|\Pi_{i,j}| - |A_{i,j}|,\ |A_{i,j}|\}$
Therefore, the summary error of the summary graph $S(G)$ can be defined as
[formula not reproduced in this text]
Once the user has selected a summary resolution, i.e., the number of supernodes k, the graph summarization problem becomes an optimization problem: generate a summary graph $S(G)$ that minimizes the summary error $E(S(G))$. Prior literature has shown that the graph summarization problem is NP-hard. Its hardest part is determining the supernode set; once that set is fixed, the superedge set with the minimum summary error can be constructed in polynomial time.
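The pairwise error $e_{i,j}$ defined above, together with the error-minimizing choice of whether to keep the superedge, can be sketched as follows. The sketch assumes two distinct supernodes in a simple undirected graph, so that $|\Pi_{i,j}|$ is the product of the two supernode sizes; `pair_error` is a hypothetical name.

```python
from typing import Tuple


def pair_error(size_i: int, size_j: int, actual_edges: int) -> Tuple[int, bool]:
    """Return (e_ij, keep_superedge) for two distinct supernodes."""
    full = size_i * size_j                 # |Pi_ij|: edges if fully connected
    if not 0 <= actual_edges <= full:
        raise ValueError("actual_edges must lie between 0 and size_i * size_j")
    added = full - actual_edges            # spurious edges if the superedge is kept
    dropped = actual_edges                 # lost edges if the superedge is omitted
    return min(added, dropped), added < dropped


dense = pair_error(2, 3, 5)     # 5 of 6 possible edges exist
sparse = pair_error(2, 3, 2)    # 2 of 6 possible edges exist
```

For the dense pair, keeping the superedge costs one spurious edge; for the sparse pair, omitting it costs two lost edges. Once the supernode set is fixed, making this choice independently for every pair is what allows the superedge set to be built in polynomial time.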
The algorithm starts with the summary graph initialized to the original graph and iteratively merges a pair of nodes into a supernode, forming a lower-resolution summary graph each time, until k supernodes remain in the final summary graph. In each iteration, a pair of supernodes whose merge introduces a low error should be merged, which keeps the overall summary error low. Based on the node structure obtained by preprocessing, the error increment of merging two supernodes $v_i$ and $v_j$ into a supernode $v_m$ is defined as:
$\Delta_{i,j} = e_{m,\cdot} - (e_{i,\cdot} + e_{j,\cdot} - e_{i,j})$
Here, $e_{i,\cdot}$ denotes the total error associated with supernode $v_i$, and $e_{m,\cdot}$ and $e_{j,\cdot}$ are defined similarly. The expression $e_{i,\cdot} + e_{j,\cdot} - e_{i,j}$ is the sum of the errors associated with $v_i$ and $v_j$ before merging: because $e_{i,j}$ is counted twice in $e_{i,\cdot} + e_{j,\cdot}$, it must be subtracted once.
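A small worked sketch of the increment $\Delta_{i,j}$ follows. It assumes that the total error $e_{i,\cdot}$ is the sum of $e_{i,p}$ over all other supernodes $v_p$, that $|\Pi|$ is the product of supernode sizes, and that errors inside a single supernode are left out for brevity; these are simplifications for illustration, and the names `pair_err` and `merge_delta` are hypothetical.

```python
from typing import Dict, Tuple


def pair_err(size_a: int, size_b: int, actual: int) -> int:
    """e_{a,b} = min(|Pi| - |A|, |A|) with |Pi| = size_a * size_b."""
    return min(size_a * size_b - actual, actual)


def merge_delta(i: int, j: int,
                size: Dict[int, int],
                cnt: Dict[Tuple[int, int], int]) -> int:
    """Delta_ij = e_m. - (e_i. + e_j. - e_ij), ignoring errors inside a supernode."""
    def c(a: int, b: int) -> int:
        return cnt.get((min(a, b), max(a, b)), 0)   # original edges between a and b

    others = [p for p in size if p not in (i, j)]
    e_ij = pair_err(size[i], size[j], c(i, j))
    e_i = e_ij + sum(pair_err(size[i], size[p], c(i, p)) for p in others)
    e_j = e_ij + sum(pair_err(size[j], size[p], c(j, p)) for p in others)
    # the merged supernode m has size |v_i| + |v_j| and inherits both edge sets
    e_m = sum(pair_err(size[i] + size[j], size[p], c(i, p) + c(j, p)) for p in others)
    return e_m - (e_i + e_j - e_ij)


size = {1: 1, 2: 1, 3: 1, 4: 1}
same_neighbors = merge_delta(1, 2, size, {(1, 3): 1, (2, 3): 1})
extra_neighbor = merge_delta(1, 2, size, {(1, 3): 1, (2, 3): 1, (2, 4): 1})
```

When nodes 1 and 2 have identical neighborhoods, merging them introduces no error under these assumptions. When node 2 also connects to node 4, the merged supernode is only half-connected to node 4, so one unit of error is introduced.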
The summary error calculation above is topology-oriented and does not consider the error that node merging introduces in edge relationships. A metric named $\Delta_E$ is therefore defined to evaluate the similarity between the node pair in each merge; it accumulates the errors in topology and in edge relationship types between the pair of supernodes and their neighbors. The smaller the value of $\Delta_E$, the more similar the two supernodes are, meaning that the pair is easier to merge. Suppose supernodes $v_i$ and $v_j$ are merged into $v_m$; the node-pair merging error increment is then given by:
[formula not reproduced in this text]
where one term denotes the topology error and the other denotes the relationship error. In addition, a tunable parameter $\alpha$ ($\alpha \in [0, 1]$) is introduced to balance the importance of the topology error and the relationship error. If $\alpha = 1$, the formula reduces to the topology-only node-pair merging error; that is, the topology error term is defined as $\Delta_{i,j}$ above. The relationship error term is defined similarly, by the following formula:
[formula not reproduced in this text]
where $r_{i,p} = r(v_i, v_p)$ denotes the relationship between supernodes $v_i$ and $v_p$. In the calculation of the relationship error, because multiple relationship types exist, each superedge is treated as a single edge, and the relationship error is accumulated from the actual edges of the original graph. In addition, of the two merged supernodes, the relationship between $v_p$ and the one containing more actual nodes is taken as the relationship after merging, because that supernode's relationship with $v_p$ dominates the merged relationship.
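The combined formula itself is missing from this text. One form consistent with the description (a single parameter $\alpha \in [0, 1]$ balancing the two terms, with $\alpha = 1$ leaving only the topology error) is a convex combination. This is an assumption offered for orientation only, and the symbols $\Delta_T$ and $\Delta_R$ are placeholder names for the topology and relationship error terms, not the patent's notation:

```latex
\Delta_E(v_i, v_j) \;=\; \alpha \, \Delta_T(v_i, v_j) \;+\; (1 - \alpha) \, \Delta_R(v_i, v_j),
\qquad \alpha \in [0, 1],
\qquad \Delta_T(v_i, v_j) = \Delta_{i,j}
```

Under this reading, $\alpha = 1$ gives $\Delta_E = \Delta_{i,j}$, the topology-only increment defined earlier, and $\alpha = 0$ would rank candidate pairs by relationship error alone. The exact form of the relationship term cannot be recovered from the surviving text.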
In the present invention, the proposed parallel graph summarization algorithm is implemented and evaluated on the Enron dataset (|V| = 36692, |E| = 367662). First, the effectiveness and efficiency of the algorithm are evaluated using the summary error and the summarization execution time; then the scalability of the algorithm is evaluated by increasing the size of the input graph. All experiments were repeated at least three times, and the averages are plotted.
First, the summary error and the processing time for the topological graph and the attribute graph are compared (using 4 machines). As FIG. 2 shows, the summary error increases as the degree of abstraction of the summary increases. Across the different types of input graph, the summary error of the attribute graph is always larger than that of the topological graph, but the difference is generally less than 10%, which shows that attribute-graph-oriented summarization is effective. As FIG. 3 shows, the summarization time also increases with the degree of abstraction. The summarization time of the attribute graph is always greater than that of the topological graph, but the difference averages less than 20% overall, which indicates that attribute-graph-oriented summarization is efficient.
The scalability of the algorithm was then evaluated by increasing the size of the input graph (50% abstraction, 4 machines). As FIG. 4 shows, the summarization time increases approximately linearly with the scale of the input graph, which indicates that the algorithm scales well to some extent.
The embodiments are only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A parallel graph summarization method based on an attribute graph, characterized by comprising the following steps:
step 1: preprocessing the acquired graph data so that each node in the graph is represented as a node structure holding its own information and the information of all its direct neighbors;
step 2: randomly selecting a direct neighbor of the current node, and then selecting, from all direct neighbors of that neighbor, the node that has the same attribute as the current node and the greatest similarity to it, as the candidate node to be merged with the current node;
step 3: judging whether the error introduced by merging the current node and the candidate node exceeds an error threshold; if so, returning to step 2 to search for another candidate node, and if not, merging the two nodes;
step 4: performing the two-node merge by updating all affected node information in the node structure, repeating steps 3-4 until the number of remaining nodes is less than a set threshold, then saving the final node structure and exporting the summary graph.
2. The parallel graph summarization method according to claim 1, wherein the error threshold is adjusted dynamically using the simulated annealing heuristic, and the error threshold increases continuously as the number of remaining nodes decreases.
3. The parallel graph summarization method according to claim 1, wherein in the two-node merge the node with the larger ID is retired and the ID of the node with the smaller ID is retained.
4. The parallel graph summarization method according to claim 1, wherein each node in the summary graph, called a supernode, corresponds to one block of a partition of the original graph's nodes, and each edge, called a superedge, corresponds to a connection between two related node blocks; a superedge between two supernodes exists if and only if there is at least one edge between some pair of nodes drawn from the two supernodes.
5. The parallel graph summarization method according to claim 1, wherein the introduced error is defined as:
[formula not reproduced in this text]
where the left-hand side denotes the error introduced by merging supernodes $v_i$ and $v_j$ into $v_m$, $\alpha$ is an adjustable parameter, and the two component terms denote the topology error and the relationship error, respectively:
[formulas not reproduced in this text]
$v_p$ is a supernode and $V_s$ is the supernode set; the four error terms are the pairwise merge errors of the supernode pairs ($v_m$, $v_p$), ($v_i$, $v_p$), ($v_j$, $v_p$), and ($v_i$, $v_p$); $r_{i,p}$ and $r_{j,p}$ denote the relationships of the supernode pairs ($v_i$, $v_p$) and ($v_j$, $v_p$), respectively.
CN201910783949.6A 2019-08-23 2019-08-23 Parallel graph summarization method based on attribute graph Pending CN110598055A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910783949.6A CN110598055A (en) 2019-08-23 2019-08-23 Parallel graph summarization method based on attribute graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910783949.6A CN110598055A (en) 2019-08-23 2019-08-23 Parallel graph summarization method based on attribute graph

Publications (1)

Publication Number Publication Date
CN110598055A true CN110598055A (en) 2019-12-20

Family

ID=68855323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910783949.6A Pending CN110598055A (en) 2019-08-23 2019-08-23 Parallel graph summarization method based on attribute graph

Country Status (1)

Country Link
CN (1) CN110598055A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139098A (en) * 2021-03-23 2021-07-20 中国科学院计算技术研究所 Abstract extraction method and system for big homogeneous relation graph
CN113139098B (en) * 2021-03-23 2023-12-12 中国科学院计算技术研究所 Abstract extraction method and system for homogeneity relation large graph
CN113190720A (en) * 2021-05-17 2021-07-30 深圳计算科学研究院 Graph compression-based graph database construction method and device and related components
CN116562923A (en) * 2023-05-26 2023-08-08 深圳般若海科技有限公司 Big data analysis method, system and medium based on electronic commerce behaviors
CN116562923B (en) * 2023-05-26 2023-12-22 深圳般若海科技有限公司 Big data analysis method, system and medium based on electronic commerce behaviors

Similar Documents

Publication Publication Date Title
Yun et al. An efficient algorithm for mining high utility patterns from incremental databases with one database scan
Cheng et al. Efficient core decomposition in massive networks
Yun et al. Incremental mining of weighted maximal frequent itemsets from dynamic databases
Gupta et al. Top-k interesting subgraph discovery in information networks
Censor-Hillel et al. Optimal dynamic distributed MIS
CN110598055A (en) Parallel graph summarization method based on attribute graph
CN110719106B (en) Social network graph compression method and system based on node classification and sorting
Nam et al. Efficient approach for damped window-based high utility pattern mining with list structure
Italiano et al. Dynamic algorithms for the massively parallel computation model
Lin et al. Incrementally updating the discovered sequential patterns based on pre-large concept
Kanezashi et al. Adaptive pattern matching with reinforcement learning for dynamic graphs
Yoo et al. Sampling subgraphs with guaranteed treewidth for accurate and efficient graphical inference
US20200104425A1 (en) Techniques for lossless and lossy large-scale graph summarization
Khan et al. Depth first search in the semi-streaming model
Guclu et al. Synchronization landscapes in small-world-connected computer networks
Mami et al. View selection under multiple resource constraints in a distributed context
Fu et al. Complexity vs. optimality: Unraveling source-destination connection in uncertain graphs
Chen et al. An improved incomplete AP clustering algorithm based on K nearest neighbours
Koh et al. SPO-tree: efficient single pass ordered incremental pattern mining
Zhou et al. Incremental association rule mining based on matrix compression for edge computing
CN111026862B (en) Incremental entity abstract method based on formal concept analysis technology
Hong et al. Incremental fuzzy utility mining with tree structure
CN111680196A (en) Key node searching method based on bipartite graph butterfly structure
Liu et al. A new method of identifying core designers and teams based on the importance and similarity of networks
CN111309786A (en) Parallel frequent item set mining method based on MapReduce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2019-12-20