CN110765319B - Method for improving Janusgraph path exploration performance - Google Patents

Info

Publication number
CN110765319B
CN110765319B CN201910973922.3A CN201910973922A CN110765319B CN 110765319 B CN110765319 B CN 110765319B CN 201910973922 A CN201910973922 A CN 201910973922A CN 110765319 B CN110765319 B CN 110765319B
Authority
CN
China
Prior art keywords
vertex
path
node
improving
janusgraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910973922.3A
Other languages
Chinese (zh)
Other versions
CN110765319A (en)
Inventor
解一豪
周庆勇
赵振修
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-14
Filing date
2019-10-14
Publication date
2024-03-26
Application filed by Inspur Software Co Ltd
Priority to CN201910973922.3A
Publication of CN110765319A
Application granted
Publication of CN110765319B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for improving the path exploration performance of JanusGraph and belongs to the technical field of graph computation applied to data mining. The method uses a bidirectional breadth-first traversal algorithm: instead of advancing step by step from the source node toward the target node, it traverses from both nodes simultaneously, which reduces both the total number of vertices that must be traversed and the number of iterations. The method reduces memory consumption and response time, meets the requirements of large data volumes, real-time computation and low response time, and has good value for popularization and application.

Description

Method for improving Janusgraph path exploration performance
Technical Field
The invention relates to the technical field of graph computation applied to data mining, and in particular provides a method for improving the path exploration performance of JanusGraph.
Background
A graph is a mathematical object representing relationships between entities. It is written as the ordered pair G = (V, E) and consists of N vertices (V = vertex) and M edges (E = edge). Each vertex is incident to a number of edges (≤ M), and each edge connects two vertices. Edges may have directions: if the edges of a graph have directions, the graph is called a directed graph; otherwise it is an undirected graph. Graph theory is the branch of mathematics that takes graphs as its basic object of study, and an algorithm that uses graph theory to solve problems on graphs is called a graph algorithm.
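As a small illustration of the definition above, a graph G = (V, E) can be held in memory as an adjacency mapping. The following is a minimal Python sketch; the helper name `build_adjacency` and the vertex labels are hypothetical and are not part of the patent.

```python
from collections import defaultdict


def build_adjacency(edges, directed=False):
    """Return {vertex: set of neighbours} for a list of (u, v) edges."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v]  # touch v so that every endpoint appears as a key
        if not directed:
            adjacency[v].add(u)  # an undirected edge is usable both ways
    return dict(adjacency)


# Hypothetical example data: two edges over three vertices.
edges = [("a", "b"), ("b", "c")]
undirected = build_adjacency(edges)
directed = build_adjacency(edges, directed=True)
```

In the undirected case each edge is stored under both endpoints, whereas in the directed case it is stored only under its start vertex.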
In today's internet information industry, graphs and graph algorithms are widely used in data mining; applications include traffic routing, social network computing, commodity recommendation, network communications, and the like.
A graph database is a database tool that stores and accesses data using a graph as its logical structure. Common graph databases include Neo4j, JanusGraph, Titan, etc. To meet current industry demand for graphs, a graph database generally provides a variety of graph algorithms, including path exploration, spanning tree, connected graph, social networking algorithm (SNA), and the like.
Path exploration is one of the commonly used graph algorithms. It finds the association (P => (V, E)) between any two vertices of a directed or undirected graph, i.e. the path from one vertex to the other. It has two main applications: shortest path, such as computing the walking route with the fewest steps from one place to another on a city map; and path traversal, such as finding a six-degrees relationship between two people, i.e. how one person can reach another through his or her network of contacts.
JanusGraph is an open-source graph database product that is widely used in data analysis because of its good generality, high performance, and open source code. The graph algorithm that JanusGraph uses to provide path exploration is depth-first traversal (DFS), which has the following defects:
(1) Poor performance and long response time: depth-first traversal works like exploring a maze and must traverse every possible vertex of the graph. If the graph is very large, exploration cannot be completed within an acceptable time, so the algorithm is unsuitable for real-time computing scenarios and better suited to offline computing;
(2) High resource consumption: while the algorithm runs, the information of every traversed vertex must be recorded in memory. When the computation is data-intensive, memory consumption grows steadily and the demand for computing resources is high.
Because of these problems, in scenarios with large data volumes and real-time requirements the path exploration function of JanusGraph can hardly respond to external path exploration requests quickly and with low latency, and memory consumption during the query is very high.
Disclosure of Invention
The technical task of the invention is to address the above problems by providing a method for improving the path exploration performance of JanusGraph that reduces memory consumption, reduces response time, and meets the requirements of large data volumes, real-time computation and low response time.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A method for improving the path exploration performance of JanusGraph uses a bidirectional breadth-first traversal algorithm to change the step-by-step progression from the source node to the target node into a simultaneous traversal from both nodes, so that the total number of vertices to be traversed is reduced and the number of iterations is reduced.
Preferably, the method for improving the path exploration performance of the Janusgraph specifically comprises the following steps:
S1, defining a set v1 and a set v2;
S2, locating the source vertex and the target vertex, loading the source node into the set v1, and loading the target node into the set v2;
S3, traversing the set v1, and taking out each vertex v;
S4, expanding the vertex v to find node paths, loading the matching vertices into v1a, and judging whether v1a and v2 intersect; if yes, jumping to S9; otherwise, executing step S5;
S5, replacing v1 with the content of v1a;
S6, traversing the set v2, and taking out each vertex v;
S7, expanding the vertex v to find node paths, loading the matching vertices into v2a, and judging whether v2a and v1 intersect; if yes, jumping to S9; otherwise, executing step S8;
S8, replacing v2 with the content of v2a;
S9, jumping to step S4;
S10, connecting the node path of v2 with that of v1, and outputting the path result.
Preferably, the method adopts parallel node expansion: in the bidirectional breadth-first traversal, the vertex expansion steps S4 and S7 run in parallel, and the expansion of multiple nodes is executed simultaneously as multiple tasks.
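The idea of expanding several vertices as separate tasks can be sketched with Python's standard thread pool. This is only an illustration under assumptions: `expand_frontier_parallel`, `neighbours_of`, and the in-memory `GRAPH` are hypothetical stand-ins for the graph database lookup, and the patent does not specify a threading model.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_frontier_parallel(frontier, neighbours_of, max_workers=4):
    """Expand every vertex of `frontier` as its own task and merge the results."""
    next_frontier = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per vertex; each task fetches that vertex's neighbours.
        for neighbours in pool.map(neighbours_of, frontier):
            next_frontier.update(neighbours)
    return next_frontier


# Hypothetical in-memory graph standing in for the database.
GRAPH = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
next_level = expand_frontier_parallel({"a", "b", "c"}, lambda v: GRAPH[v])
```

The two expansion steps S4 and S7 could be submitted to the same pool in a similar way; the sketch shows only the per-vertex part.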
Preferably, the path length is limited: a maximum step length is set for the query path, and if no path between the nodes has been found once the maximum step length is exceeded, the path is treated as nonexistent.
Limiting the maximum step length of the graph traversal prevents the query from falling into an endless loop.
Preferably, super nodes and hot data are optimized: the edges of nodes with a large number of edges are additionally cached in memory.
Preferably, caching super nodes and hot data reduces the number of accesses to the graph database and the time they take.
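A minimal sketch of such a cache follows. The patent does not say how super nodes are identified or how the cache is organized, so the class `EdgeCache`, the `degree_threshold` parameter, and the `fetch_edges` callback are all assumptions made for illustration.

```python
class EdgeCache:
    """Keep the edge lists of high-degree vertices in memory."""

    def __init__(self, fetch_edges, degree_threshold=1000):
        self._fetch_edges = fetch_edges            # hypothetical database lookup
        self._degree_threshold = degree_threshold  # assumed super-node criterion
        self._cache = {}
        self.db_reads = 0                          # counts real database accesses

    def edges(self, vertex):
        if vertex in self._cache:
            return self._cache[vertex]
        self.db_reads += 1
        result = self._fetch_edges(vertex)
        # Only vertices with many edges are worth keeping in memory.
        if len(result) >= self._degree_threshold:
            self._cache[vertex] = result
        return result


# Hypothetical store: "hub" has many edges, "leaf" has one.
STORE = {"hub": list(range(5)), "leaf": [1]}
cache = EdgeCache(STORE.__getitem__, degree_threshold=3)
for _ in range(2):
    cache.edges("hub")
    cache.edges("leaf")
```

In this usage the hub is read from the store once and served from memory afterwards, while the low-degree vertex is fetched each time.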
Preferably, in step S4, when the vertex v is expanded and the graph is directed, the edges are filtered by direction to find the node path, the matching vertices are loaded into v1a, and it is judged whether v1a and v2 intersect.
Preferably, in step S7, when the vertex v is expanded and the graph is directed, the edges are filtered by the reverse direction to find the node path, the matching vertices are loaded into v2a, and it is judged whether v2a and v1 intersect.
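The direction filtering in steps S4 and S7 can be sketched as follows, assuming that the source side follows each edge in its own direction and the target side follows it in reverse. The function `neighbours` and the `(src, dst)` edge tuples are hypothetical.

```python
def neighbours(edges, vertex, reverse=False):
    """Vertices adjacent to `vertex` over directed (src, dst) edges.

    reverse=False follows edges src -> dst (source-side expansion, S4);
    reverse=True follows them dst -> src (target-side expansion, S7).
    """
    if reverse:
        return {src for src, dst in edges if dst == vertex}
    return {dst for src, dst in edges if src == vertex}


# Hypothetical directed edges: x -> s -> m -> t.
EDGES = [("s", "m"), ("m", "t"), ("x", "s")]
forward = neighbours(EDGES, "s")                  # expansion from the source
backward = neighbours(EDGES, "t", reverse=True)   # expansion from the target
```

Both expansions reach the vertex `m`, which is where the two searches would meet; the edge `x -> s` is ignored by the forward expansion because it points into the source.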
The method for improving the path exploration performance of JanusGraph is integrated in a JanusGraph-compatible manner as follows: the algorithm implementing the bidirectional breadth-first traversal is packaged into a tracker model supported by JanusGraph, recompiled, and placed under the lib directory of the service as a replacement.
Compared with the prior art, the method for improving the path exploration performance of JanusGraph has the following outstanding beneficial effects: it optimizes bidirectional breadth-first traversal by adding a maximum step limit to the algorithm, adding a cache for super nodes to reduce the accesses to the graph database and the time they take, and executing node expansion in parallel. It thereby solves the problems of long response time and excessive resource consumption that are common in the depth-first traversal path exploration algorithm used by the JanusGraph database, and has good value for popularization and application.
Drawings
FIG. 1 is a flow chart of a method of improving Janusgraph path exploration performance in accordance with the present invention.
Detailed Description
The method for improving the path exploration performance of the Janusgraph of the present invention will be described in further detail below with reference to the accompanying drawings and examples.
Examples
As shown in FIG. 1, the method for improving the path exploration performance of JanusGraph uses a bidirectional breadth-first traversal algorithm to change the step-by-step progression from the source node to the target node into a simultaneous traversal from both nodes, so that the total number of vertices to be traversed is reduced and the number of iterations is reduced.
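A rough, standard estimate (not taken from the patent) shows why searching from both ends reduces the number of vertices visited. It assumes that every vertex has about b neighbours and that the source and target are d steps apart:

```latex
N_{\text{one-way}} \approx \sum_{i=0}^{d} b^{i} = O\!\left(b^{d}\right),
\qquad
N_{\text{two-way}} \approx 2 \sum_{i=0}^{d/2} b^{i} = O\!\left(b^{d/2}\right).
```

For example, with b = 10 and d = 6, a one-way search visits on the order of 10^6 vertices, while two searches of depth 3 visit on the order of 10^3 vertices each.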
The method for improving the Janusgraph path exploration performance specifically comprises the following steps:
S1, defining a set v1 and a set v2.
S2, locating the source vertex and the target vertex, loading the source node into the set v1, and loading the target node into the set v2.
S3, traversing the set v1, and taking out each vertex v.
S4, expanding the vertex v to find node paths, loading the matching vertices into v1a, and judging whether v1a and v2 intersect; if yes, jumping to S9; otherwise, executing step S5.
When the vertex v is expanded and the graph is directed, the edges are filtered by direction to find the node path, the matching vertices are loaded into v1a, and it is judged whether v1a and v2 intersect.
S5, replacing v1 with the content of v1a.
S6, traversing the set v2, and taking out each vertex v.
S7, expanding the vertex v to find node paths, loading the matching vertices into v2a, and judging whether v2a and v1 intersect; if yes, jumping to S9; otherwise, executing step S8.
When the vertex v is expanded and the graph is directed, the edges are filtered by the reverse direction to find the node path, the matching vertices are loaded into v2a, and it is judged whether v2a and v1 intersect.
S8, replacing v2 with the content of v2a.
S9, jumping to step S4.
S10, connecting the node path of v2 with that of v1, and outputting the path result.
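The steps above can be sketched in Python as follows. This is an illustrative reading, not the patented implementation: it assumes an undirected graph, assumes that a detected intersection leads directly to the path-joining step S10 while the loop otherwise continues with the next expansion round, and interprets the maximum step length described in the text as a number of rounds. The names `bidirectional_bfs`, `neighbours_of`, and `max_steps` are hypothetical.

```python
def bidirectional_bfs(neighbours_of, source, target, max_steps=10):
    """Return a list of vertices from source to target, or None if none is found."""
    if source == target:
        return [source]
    parents_fwd = {source: None}   # visited vertices on the source side
    parents_bwd = {target: None}   # visited vertices on the target side
    frontier_fwd = {source}        # set v1
    frontier_bwd = {target}        # set v2

    def expand(frontier, parents, other_parents):
        """Expand one level (S3-S4 or S6-S7); return (new frontier, meeting vertex)."""
        next_frontier = set()      # v1a or v2a
        for v in frontier:
            for n in neighbours_of(v):
                if n in parents:
                    continue
                parents[n] = v
                if n in other_parents:   # the two searches intersect
                    return next_frontier, n
                next_frontier.add(n)
        return next_frontier, None

    def join(meet):
        """S10: connect the source-side path with the target-side path."""
        path, v = [], meet
        while v is not None:
            path.append(v)
            v = parents_fwd[v]
        path.reverse()
        v = parents_bwd[meet]
        while v is not None:
            path.append(v)
            v = parents_bwd[v]
        return path

    for _ in range(max_steps):
        if not frontier_fwd or not frontier_bwd:
            return None            # one side is exhausted: no path exists
        frontier_fwd, meet = expand(frontier_fwd, parents_fwd, parents_bwd)  # S4, S5
        if meet is not None:
            return join(meet)
        frontier_bwd, meet = expand(frontier_bwd, parents_bwd, parents_fwd)  # S7, S8
        if meet is not None:
            return join(meet)
    return None                    # step limit reached: treat as no path


# Hypothetical undirected chain a - b - c - d - e plus an isolated vertex z.
GRAPH = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}, "z": set()}
path = bidirectional_bfs(GRAPH.__getitem__, "a", "e")
```

For a directed graph, the two `expand` calls would need different neighbour functions, one following edges forward and one following them in reverse.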
The method can also be optimized as follows:
(1) Parallel node expansion: in the bidirectional breadth-first traversal, the vertex expansion steps S4 and S7 run in parallel, and the expansion of multiple nodes is executed simultaneously as multiple tasks.
(2) Path length limit: a maximum step length is set for the query path, and if no path between the nodes has been found once the maximum step length is exceeded, the path is treated as nonexistent.
Limiting the maximum step length of the graph traversal prevents the query from falling into an endless loop.
(3) Super node and hot data optimization: the edges of nodes with a large number of edges are additionally cached in memory.
Caching super nodes and hot data reduces the number of accesses to the graph database and the time they take.
The method for improving the path exploration performance of JanusGraph is integrated in a JanusGraph-compatible manner as follows: the algorithm implementing the bidirectional breadth-first traversal is packaged into a tracker model supported by JanusGraph, recompiled, and placed under the lib directory of the service as a replacement.
The JanusGraph service is divided into a websocket interface, graph operations, algorithms, and the graph database. The algorithm implementing the bidirectional breadth-first traversal is packaged into a tracker model supported by JanusGraph, recompiled, and placed under the lib directory of the service to replace the server model.
The above embodiments are only preferred embodiments of the present invention, and it is intended that the common variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention are included in the scope of the present invention.

Claims (5)

1. A method for improving the path exploration performance of JanusGraph, characterized in that: the method uses a bidirectional breadth-first traversal algorithm to change the step-by-step progression from the source node to the target node into a simultaneous traversal from both nodes, so that the total number of vertices to be traversed is reduced and the number of iterations is reduced, and the method specifically comprises the following steps:
S1, defining a set v1 and a set v2;
S2, locating the source vertex and the target vertex, loading the source node into the set v1, and loading the target node into the set v2;
S3, traversing the set v1, and taking out each vertex v;
S4, expanding the vertex v to find node paths, loading the matching vertices into v1a, and judging whether v1a and v2 intersect; if yes, jumping to S9; otherwise, executing step S5;
S5, replacing v1 with the content of v1a;
S6, traversing the set v2, and taking out each vertex v;
S7, expanding the vertex v to find node paths, loading the matching vertices into v2a, and judging whether v2a and v1 intersect; if yes, jumping to S9; otherwise, executing step S8;
S8, replacing v2 with the content of v2a;
S9, jumping to step S4;
S10, connecting the node path of v2 with that of v1, and outputting the path result.
2. The method for improving the path exploration performance of JanusGraph according to claim 1, characterized in that: the method adopts parallel node expansion; in the bidirectional breadth-first traversal, the vertex expansion steps S4 and S7 run in parallel, and the expansion of multiple nodes is executed simultaneously as multiple tasks.
3. The method for improving the path exploration performance of JanusGraph according to claim 2, characterized in that: the path length is limited; a maximum step length is set for the query path, and if no path between the nodes has been found once the maximum step length is exceeded, the path is treated as nonexistent.
4. The method for improving the path exploration performance of JanusGraph according to claim 3, characterized in that: in step S4, when the vertex v is expanded and the graph is directed, the edges are filtered by direction to find the node path, the matching vertices are loaded into v1a, and it is judged whether v1a and v2 intersect.
5. The method for improving the path exploration performance of JanusGraph according to claim 4, characterized in that: in step S7, when the vertex v is expanded and the graph is directed, the edges are filtered by the reverse direction to find the node path, the matching vertices are loaded into v2a, and it is judged whether v2a and v1 intersect.
CN201910973922.3A 2019-10-14 2019-10-14 Method for improving Janusgraph path exploration performance Active CN110765319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910973922.3A CN110765319B (en) 2019-10-14 2019-10-14 Method for improving Janusgraph path exploration performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910973922.3A CN110765319B (en) 2019-10-14 2019-10-14 Method for improving Janusgraph path exploration performance

Publications (2)

Publication Number Publication Date
CN110765319A (en) 2020-02-07
CN110765319B (en) 2024-03-26

Family

ID=69332099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910973922.3A Active CN110765319B (en) 2019-10-14 2019-10-14 Method for improving Janusgraph path exploration performance

Country Status (1)

Country Link
CN (1) CN110765319B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112333101B (en) * 2020-10-16 2022-08-12 烽火通信科技股份有限公司 Network topology path finding method, device, equipment and storage medium
CN112348935B (en) * 2020-11-06 2022-09-23 芯勍(上海)智能化科技股份有限公司 Wire frame rendering method, terminal device and computer-readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105225187A (en) * 2015-10-09 2016-01-06 苏州盛景信息科技股份有限公司 Based on the pipe network spacial analytical method of breadth-first search

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101344538B1 (en) * 2011-09-29 2013-12-26 인텔 코포레이션 Cache and/or socket sensitive multi-processor cores breadth-first traversal
US10810257B2 (en) * 2015-08-27 2020-10-20 Oracle International Corporation Fast processing of path-finding queries in large graph databases

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋磊 ; 贾进章 ; .广度优先路径搜索法在流体网络中的应用研究.煤炭技术.2009,(12),全文. *

Also Published As

Publication number Publication date
CN110765319A (en) 2020-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 271000 Langchao science and Technology Park, 527 Dongyue street, Tai'an City, Shandong Province

Applicant after: INSPUR SOFTWARE Co.,Ltd.

Address before: No. 1036, Shandong high tech Zone wave road, Ji'nan, Shandong

Applicant before: INSPUR SOFTWARE Co.,Ltd.

Country or region before: China

GR01 Patent grant