CN110765319B - Method for improving Janusgraph path exploration performance
- Publication number
- CN110765319B (application CN201910973922.3A)
- Authority
- CN
- China
- Prior art keywords
- vertex
- path
- node
- improving
- janusgraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for improving the path exploration performance of JanusGraph, and belongs to the technical field of graph computation using data mining technology. The method uses a bidirectional breadth-first traversal algorithm: instead of stepping gradually from a source node toward a target node, it traverses from both nodes simultaneously, which reduces both the total number of vertices that must be traversed and the number of iterations. The method reduces memory consumption and response time, meets the requirements of large data volumes, real-time computation, and low response time, and has good value for popularization and application.
Description
Technical Field
The invention relates to the technical field of graph computation using data mining technology, and in particular provides a method for improving the path exploration performance of JanusGraph.
Background
A graph is a mathematical object that represents relationships between entities. It is written as an ordered pair G = (V, E), consisting of N vertices (V = vertex) and M edges (E = edge). Each vertex is incident to some number of edges (at most M), and each edge connects two vertices. Edges may have a direction: if the edges of a graph have directions, the graph is called a directed graph; otherwise it is an undirected graph. Graph theory is the branch of mathematics whose basic object of study is the graph, and an algorithm that solves a problem on a graph using graph theory is called a graph algorithm.
In today's internet information industry, graphs and graph algorithms are widely used in the field of data mining. Applications include traffic routing, social network computing, commodity recommendation, network communications, and the like.
A graph database is a database tool that stores and accesses data using the graph as its logical structure. Common graph databases include Neo4j, JanusGraph, and Titan. To meet current industry demand for graphs, they generally provide a variety of graph algorithms, including path exploration, spanning tree, connected graph, social networking algorithm (SNA), and the like.
Path exploration is one of the commonly used graph algorithms. It finds the association (P => (V, E)) between two arbitrary vertices of a directed or undirected graph, that is, a path from one vertex to the other. It has two main applications: shortest path, such as computing the walking route with the fewest steps from one place to another on a city map; and path traversal, such as finding the six-degree relationship between two people, that is, how one person can reach another through his or her personal connections.
JanusGraph is an open-source graph database product that is widely used in the field of data analysis because of its generality, high performance, and open source code. The graph algorithm JanusGraph uses to provide its path exploration function is depth-first traversal (DFS), which has the following defects:
(1) Poor performance and long response time: depth-first traversal works like exploring a maze, and every possible vertex of the graph may need to be traversed. If the graph is very large, exploration cannot be completed within an acceptable time, so depth-first traversal is unsuited to real-time computing scenarios and better suited to offline computing;
(2) High resource consumption: while the algorithm executes, information about every traversed vertex must be recorded in memory. When the computation is data-intensive, memory consumption grows steadily and the demand for computing resources is high.
Because of these problems, in scenarios with large data volumes and real-time requirements the path exploration function of JanusGraph struggles to respond to external path exploration requests quickly and with low latency, and memory consumption during a query is very high.
Disclosure of Invention
The technical task of the invention is to address the above problems by providing a method for improving the path exploration performance of JanusGraph that reduces memory consumption, reduces response time, and meets the requirements of large data volumes, real-time computation, and low response time.
In order to achieve the above purpose, the present invention provides the following technical solution:
A method for improving the path exploration performance of JanusGraph uses a bidirectional breadth-first traversal algorithm to replace the process of stepping gradually from a source node to a target node with a traversal that starts from both nodes simultaneously, thereby reducing the total number of vertices that must be traversed and the number of iterations.
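The reduction in traversed vertices can be illustrated with a standard back-of-the-envelope estimate. This is a sketch under assumptions that are not part of the patent text: every vertex has roughly b neighbours, the two endpoints are d hops apart, and the comparison is with a one-directional breadth-first search rather than with any particular JanusGraph implementation.

```latex
% Assumed: uniform branching factor b, distance d between source and target.
\begin{align*}
N_{\text{one-way}} &\approx \sum_{i=0}^{d} b^{i} = O\!\left(b^{d}\right) \\
N_{\text{two-way}} &\approx 2 \sum_{i=0}^{\lceil d/2 \rceil} b^{i} = O\!\left(b^{d/2}\right)
\end{align*}
% Example with b = 10, d = 6:
%   N_one-way  = 1 + 10 + ... + 10^6 = 1{,}111{,}111
%   N_two-way  = 2 (1 + 10 + 100 + 1000) = 2{,}222
```

Real graphs are not uniform, so these figures indicate only the order of magnitude of the saving.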
Preferably, the method for improving the path exploration performance of the Janusgraph specifically comprises the following steps:
S1, define a set v1 and a set v2;
S2, look up the source vertex and the target vertex from the source node and the target node of the query, load the source vertex into set v1, and load the target vertex into set v2;
S3, traverse set v1 and take out each vertex v;
S4, expand vertex v to find its node paths, load the matching vertices into v1a, and determine whether v1a and v2 intersect; if so, jump to S9, otherwise execute step S5;
S5, replace the content of v1 with the content of v1a;
S6, traverse set v2 and take out each vertex v;
S7, expand vertex v to find its node paths, load the matching vertices into v2a, and determine whether v2a and v1 intersect; if so, jump to S9, otherwise execute step S8;
S8, replace the content of v2 with the content of v2a;
S9, jump to step S4;
S10, join the node path of v2 to that of v1 and output the path result.
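The steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it runs on a hypothetical in-memory adjacency map instead of a JanusGraph instance, all function and variable names other than v1, v2, v1a and v2a are invented for the example, and it reads the control flow as "when the two frontiers intersect, proceed to the output step; otherwise loop back and expand v1 again".

```python
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # hypothetical in-memory adjacency map


def bidirectional_bfs(graph: Graph, source: Hashable,
                      target: Hashable) -> Optional[List[Hashable]]:
    """Return one path from source to target in an undirected graph, or None."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return [source]

    # S1/S2: v1 and v2 are the two frontiers. The parent maps record how each
    # vertex was reached so that the node path can be rebuilt in S10.
    v1, v2 = {source}, {target}
    parent1 = {source: None}
    parent2 = {target: None}

    def expand(frontier, parents, other_parents):
        """S3-S4 / S6-S7: expand every vertex of one frontier by one hop."""
        next_frontier = set()
        for v in frontier:
            for w in graph.get(v, ()):
                if w in parents:
                    continue  # already reached from this side
                parents[w] = v
                if w in other_parents:
                    return next_frontier, w  # the two searches meet at w
                next_frontier.add(w)
        return next_frontier, None

    def trace(parents, v):
        """Follow parent links from v back to the side's starting vertex."""
        path = []
        while v is not None:
            path.append(v)
            v = parents[v]
        return path

    while v1 and v2:
        v1a, meet = expand(v1, parent1, parent2)
        if meet is None:
            v1 = v1a                      # S5
            v2a, meet = expand(v2, parent2, parent1)
            if meet is None:
                v2 = v2a                  # S8
                continue                  # S9: start the next round
        # S10: join the source-side path and the target-side path at `meet`.
        return trace(parent1, meet)[::-1] + trace(parent2, meet)[1:]
    return None  # one frontier ran dry: the vertices are not connected
```

Because each side keeps its own parent map, the search stops as soon as one side generates a vertex the other side has already reached, and the two half-paths are joined at that vertex.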
Preferably, the method uses parallel node expansion: the bidirectional breadth-first traversal runs the vertex expansion of step S4 and the vertex expansion of step S7 in parallel, and the expansions of multiple nodes are executed simultaneously as multiple tasks.
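One way to picture parallel node expansion is the sketch below, which expands all vertices of a single frontier as concurrent tasks. It is an assumption-laden illustration: the thread pool, the function name `expand_frontier_parallel`, the `max_workers` parameter, and the in-memory `graph` dictionary (standing in for graph-database lookups) are all hypothetical and not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # hypothetical in-memory adjacency map


def expand_frontier_parallel(graph: Graph, frontier: Iterable[Hashable],
                             visited: Set[Hashable],
                             max_workers: int = 4) -> Dict[Hashable, Hashable]:
    """Expand every frontier vertex as its own task.

    Returns a map {newly reached vertex: vertex it was reached from}.
    """
    def neighbours(v):
        # Stand-in for one adjacency lookup against the graph database.
        return v, graph.get(v, set())

    next_frontier: Dict[Hashable, Hashable] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Lookups run concurrently; results are merged in this single thread,
        # so the shared dictionaries need no locking.
        for v, adjacent in pool.map(neighbours, frontier):
            for w in adjacent:
                if w not in visited and w not in next_frontier:
                    next_frontier[w] = v
    return next_frontier
```

The same pattern could be applied one level up by submitting the source-side and target-side expansions as two tasks; whether that pays off depends on how expensive each lookup is.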
Preferably, the path length is limited: a maximum step count is set for the query path, and if no path between the nodes has been found once the maximum step count is exceeded, the path is treated as nonexistent.
Limiting the maximum step count of the graph traversal prevents the query from falling into an infinite loop.
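The step limit can be sketched as a bound on the total number of frontier expansions. In this hypothetical example, `hops_within`, `max_steps`, and the adjacency dictionary are illustrative names; the function alternates between the two sides and reports "no path" once the budget is spent, so a path is found only if it has at most `max_steps` edges.

```python
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # hypothetical in-memory adjacency map


def hops_within(graph: Graph, source: Hashable, target: Hashable,
                max_steps: int) -> Optional[int]:
    """Return the hop count of a path with at most max_steps edges, else None."""
    if source == target:
        return 0
    dist1, dist2 = {source: 0}, {target: 0}   # hops from each endpoint
    v1, v2 = {source}, {target}               # current frontiers
    steps = 0                                  # expansions performed so far
    while v1 and v2 and steps < max_steps:
        forward = steps % 2 == 0               # alternate between the sides
        frontier, dist, other = (v1, dist1, dist2) if forward else (v2, dist2, dist1)
        next_frontier = set()
        for v in frontier:
            for w in graph.get(v, ()):
                if w in dist:
                    continue                   # already seen from this side
                dist[w] = dist[v] + 1
                if w in other:
                    return dist[w] + other[w]  # the two searches met at w
                next_frontier.add(w)
        if forward:
            v1 = next_frontier
        else:
            v2 = next_frontier
        steps += 1
    return None  # budget exhausted or a frontier ran dry: treat as "no path"
```

The visited maps already prevent revisiting vertices on cyclic graphs; the step budget additionally caps how far the search may spread on very large graphs.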
Preferably, super nodes and hot data are optimized: the edges of nodes that have a large number of edges are additionally cached in memory.
Caching super nodes and hot data reduces the number of accesses to the graph database and the time they take.
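A possible shape for such a cache is sketched below. Everything here is an assumption made for illustration: the class name `AdjacencyCache`, the `fetch_edges` callback standing in for a graph-database read, the degree threshold that decides what counts as a super node, and the least-recently-used eviction policy.

```python
from collections import OrderedDict
from typing import Callable, Collection, Hashable


class AdjacencyCache:
    """Keep the edge lists of high-degree vertices in memory (LRU eviction)."""

    def __init__(self, fetch_edges: Callable[[Hashable], Collection],
                 degree_threshold: int = 1000, capacity: int = 128) -> None:
        self._fetch = fetch_edges          # hypothetical database lookup
        self._threshold = degree_threshold
        self._capacity = capacity
        self._cache: "OrderedDict[Hashable, Collection]" = OrderedDict()

    def neighbours(self, v: Hashable) -> Collection:
        if v in self._cache:
            self._cache.move_to_end(v)     # mark as recently used
            return self._cache[v]
        edges = self._fetch(v)
        if len(edges) >= self._threshold:  # only super nodes are cached
            self._cache[v] = edges
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # drop least recently used
        return edges
```

A traversal would call `cache.neighbours(v)` wherever it would otherwise query the database, so repeated visits to a hub vertex cost one lookup instead of many.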
Preferably, in step S4, if the graph is directed, the edges of vertex v are filtered by their direction when finding node paths, the matching vertices are loaded into v1a, and it is determined whether v1a and v2 intersect.
Preferably, in step S7, if the graph is directed, the edges of vertex v are filtered by the opposite direction when finding node paths, the matching vertices are loaded into v2a, and it is determined whether v2a and v1 intersect.
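For a directed graph, the two direction filters can be sketched as follows: the source side follows outgoing edges while the target side follows the same edges backwards. This is an illustrative reading of the two steps, assuming a hypothetical `out_edges` adjacency map; the helper names are not from the patent.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # hypothetical map: vertex -> successors


def reverse_edges(out_edges: Graph) -> Dict[Hashable, Set[Hashable]]:
    """Build the predecessor map used by the target-side search."""
    in_edges: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for v, successors in out_edges.items():
        for w in successors:
            in_edges[w].add(v)
    return in_edges


def directed_hops(out_edges: Graph, source: Hashable,
                  target: Hashable) -> Optional[int]:
    """Hop count of a directed path source -> target, or None if there is none."""
    if source == target:
        return 0
    in_edges = reverse_edges(out_edges)
    dist1, dist2 = {source: 0}, {target: 0}
    v1, v2 = {source}, {target}
    forward = True
    while v1 and v2:
        if forward:   # source side: follow edges in their own direction
            frontier, adjacency, dist, other = v1, out_edges, dist1, dist2
        else:         # target side: follow edges in the opposite direction
            frontier, adjacency, dist, other = v2, in_edges, dist2, dist1
        next_frontier = set()
        for v in frontier:
            for w in adjacency.get(v, ()):
                if w in dist:
                    continue
                dist[w] = dist[v] + 1
                if w in other:
                    return dist[w] + other[w]
                next_frontier.add(w)
        if forward:
            v1 = next_frontier
        else:
            v2 = next_frontier
        forward = not forward
    return None
```

Without the reversed filter on the target side, the search could join two half-paths that point in incompatible directions and report a path that does not exist.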
The method is integrated in a manner compatible with JanusGraph, as follows: the algorithm implementing the bidirectional breadth-first traversal is packaged as a tracker model supported by JanusGraph, recompiled, and placed under the lib directory of the service as a replacement.
Compared with the prior art, the method for improving the path exploration performance of JanusGraph has the following notable advantages: it optimizes bidirectional breadth-first traversal by adding a maximum step limit to the algorithm, caching super nodes to reduce accesses to the graph database and the time they take, and executing node expansion in parallel. This addresses the long response times and excessive resource consumption common to the depth-first-traversal path exploration algorithm used by the JanusGraph database, and the method has good value for popularization and application.
Drawings
FIG. 1 is a flow chart of a method of improving Janusgraph path exploration performance in accordance with the present invention.
Detailed Description
The method for improving the path exploration performance of the Janusgraph of the present invention will be described in further detail below with reference to the accompanying drawings and examples.
Examples
As shown in FIG. 1, the method for improving the path exploration performance of JanusGraph uses a bidirectional breadth-first traversal algorithm to replace the process of stepping gradually from a source node to a target node with a traversal that starts from both nodes simultaneously, thereby reducing the total number of vertices that must be traversed and the number of iterations.
The method for improving JanusGraph path exploration performance specifically comprises the following steps:
S1, define a set v1 and a set v2.
S2, look up the source vertex and the target vertex from the source node and the target node of the query, load the source vertex into set v1, and load the target vertex into set v2.
S3, traverse set v1 and take out each vertex v.
S4, expand vertex v to find its node paths, load the matching vertices into v1a, and determine whether v1a and v2 intersect; if so, jump to S9, otherwise execute step S5.
When expanding vertex v, if the graph is directed, the edges are filtered by their direction when finding node paths; the matching vertices are loaded into v1a, and it is determined whether v1a and v2 intersect.
S5, replace the content of v1 with the content of v1a.
S6, traverse set v2 and take out each vertex v.
S7, expand vertex v to find its node paths, load the matching vertices into v2a, and determine whether v2a and v1 intersect; if so, jump to S9, otherwise execute step S8.
If the graph is directed, the edges are filtered by the opposite direction when finding node paths; the matching vertices are loaded into v2a, and it is determined whether v2a and v1 intersect.
S8, replace the content of v2 with the content of v2a.
S9, jump to step S4.
S10, join the node path of v2 to that of v1 and output the path result.
The method can be further optimized as follows:
(1) Parallel node expansion: the bidirectional breadth-first traversal runs the vertex expansion of step S4 and the vertex expansion of step S7 in parallel, and the expansions of multiple nodes are executed simultaneously as multiple tasks.
(2) Path length limit: a maximum step count is set for the query path, and if no path between the nodes has been found once the maximum step count is exceeded, the path is treated as nonexistent.
Limiting the maximum step count of the graph traversal prevents the query from falling into an infinite loop.
(3) Super node and hot data optimization: the edges of nodes that have a large number of edges are additionally cached in memory.
Caching super nodes and hot data reduces the number of accesses to the graph database and the time they take.
The method is integrated in a manner compatible with JanusGraph, as follows: the algorithm implementing the bidirectional breadth-first traversal is packaged as a tracker model supported by JanusGraph, recompiled, and placed under the lib directory of the service as a replacement.
The JanusGraph service is divided into a websocket interface, graph operations, algorithms, and the graph database. The algorithm implementing the bidirectional breadth-first traversal is packaged as a tracker model supported by JanusGraph, recompiled, and placed under the lib directory of the service to replace the server model.
The above embodiments are only preferred embodiments of the present invention; common variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention are intended to fall within the scope of protection of the present invention.
Claims (5)
1. A method for improving the path exploration performance of JanusGraph, characterized in that: the method uses a bidirectional breadth-first traversal algorithm to replace the process of stepping gradually from a source node to a target node with a traversal that starts from both nodes simultaneously, thereby reducing the total number of vertices that must be traversed and the number of iterations; the method specifically comprises the following steps:
S1, define a set v1 and a set v2;
S2, look up the source vertex and the target vertex from the source node and the target node of the query, load the source vertex into set v1, and load the target vertex into set v2;
S3, traverse set v1 and take out each vertex v;
S4, expand vertex v to find its node paths, load the matching vertices into v1a, and determine whether v1a and v2 intersect; if so, jump to S9, otherwise execute step S5;
S5, replace the content of v1 with the content of v1a;
S6, traverse set v2 and take out each vertex v;
S7, expand vertex v to find its node paths, load the matching vertices into v2a, and determine whether v2a and v1 intersect; if so, jump to S9, otherwise execute step S8;
S8, replace the content of v2 with the content of v2a;
S9, jump to step S4;
S10, join the node path of v2 to that of v1 and output the path result.
2. The method for improving the path exploration performance of JanusGraph according to claim 1, wherein: the method uses parallel node expansion, the bidirectional breadth-first traversal runs the vertex expansion of step S4 and the vertex expansion of step S7 in parallel, and the expansions of multiple nodes are executed simultaneously as multiple tasks.
3. The method for improving the path exploration performance of JanusGraph according to claim 2, wherein: the path length is limited, a maximum step count is set for the query path, and if no path between the nodes has been found once the maximum step count is exceeded, the path is treated as nonexistent.
4. The method for improving the path exploration performance of JanusGraph according to claim 3, wherein: in step S4, when vertex v is expanded, if the graph is directed, the edges are filtered by their direction when finding node paths, the matching vertices are loaded into v1a, and it is determined whether v1a and v2 intersect.
5. The method for improving the path exploration performance of JanusGraph according to claim 4, wherein: in step S7, when vertex v is expanded, if the graph is directed, the edges are filtered by the opposite direction when finding node paths, the matching vertices are loaded into v2a, and it is determined whether v2a and v1 intersect.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910973922.3A CN110765319B (en) | 2019-10-14 | 2019-10-14 | Method for improving Janusgraph path exploration performance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910973922.3A CN110765319B (en) | 2019-10-14 | 2019-10-14 | Method for improving Janusgraph path exploration performance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110765319A CN110765319A (en) | 2020-02-07 |
CN110765319B true CN110765319B (en) | 2024-03-26 |
Family
ID=69332099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910973922.3A Active CN110765319B (en) | 2019-10-14 | 2019-10-14 | Method for improving Janusgraph path exploration performance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110765319B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112333101B (en) * | 2020-10-16 | 2022-08-12 | 烽火通信科技股份有限公司 | Network topology path finding method, device, equipment and storage medium |
CN112348935B (en) * | 2020-11-06 | 2022-09-23 | 芯勍(上海)智能化科技股份有限公司 | Wire frame rendering method, terminal device and computer-readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105225187A (en) * | 2015-10-09 | 2016-01-06 | 苏州盛景信息科技股份有限公司 | Based on the pipe network spacial analytical method of breadth-first search |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101344538B1 (en) * | 2011-09-29 | 2013-12-26 | 인텔 코포레이션 | Cache and/or socket sensitive multi-processor cores breadth-first traversal |
US10810257B2 (en) * | 2015-08-27 | 2020-10-20 | Oracle International Corporation | Fast processing of path-finding queries in large graph databases |
Non-Patent Citations (1)
Title |
---|
宋磊 ; 贾进章 ; .广度优先路径搜索法在流体网络中的应用研究.煤炭技术.2009,(12),全文. * |
Also Published As
Publication number | Publication date |
---|---|
CN110765319A (en) | 2020-02-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Country or region after: China; Address after: 271000 Langchao science and Technology Park, 527 Dongyue street, Tai'an City, Shandong Province; Applicant after: INSPUR SOFTWARE Co.,Ltd.; Address before: No. 1036, Shandong high tech Zone wave road, Ji'nan, Shandong; Applicant before: INSPUR SOFTWARE Co.,Ltd.; Country or region before: China |
GR01 | Patent grant | ||