CN110765319B

CN110765319B - Method for improving Janusgraph path exploration performance

Info

Publication number: CN110765319B
Application number: CN201910973922.3A
Authority: CN
Inventors: 解一豪; 周庆勇; 赵振修
Original assignee: Inspur Software Co Ltd
Current assignee: Inspur Software Co Ltd
Priority date: 2019-10-14
Filing date: 2019-10-14
Publication date: 2024-03-26
Anticipated expiration: 2039-10-14
Also published as: CN110765319A

Abstract

The invention discloses a method for improving the path exploration performance of JanusGraph, and belongs to the technical field of graph computation using data mining techniques. The method uses a bidirectional breadth-first traversal algorithm. Instead of stepping gradually from the source node toward the target node, the traversal proceeds from both nodes simultaneously, which reduces the total number of vertices that must be traversed and the number of iterations. The method reduces memory consumption and response time, meets the requirements of large data volumes, real-time computation and low latency, and has good value for wide adoption and application.

Description

Method for improving Janusgraph path exploration performance

Technical Field

The invention relates to the technical field of graph computation using data mining techniques, and in particular provides a method for improving the path exploration performance of JanusGraph.

Background

A graph is a mathematical object that represents relationships between entities. Mathematically it is written as an ordered pair G = (V, E) consisting of N vertices (V, vertex) and M edges (E, edge). Each vertex is incident to some number of edges (at most M), each edge connects two vertices, and edges may have directions. If the edges of a graph are directed, the graph is called a directed graph; otherwise it is called an undirected graph. Graph theory is the branch of mathematics that takes graphs as its basic object of study, and an algorithm that solves a problem on a graph using graph theory is called a graph algorithm.
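As an illustration of the G = (V, E) notation, the following minimal Python sketch stores a graph as an adjacency list, that is, a map from each vertex to the set of its neighbours. The class name Graph and its methods are assumptions made for illustration and are not part of the patent. A directed graph records each edge only at its starting vertex, while an undirected graph records it at both ends.

```python
from collections import defaultdict


class Graph:
    """Hypothetical adjacency-list representation of G = (V, E)."""

    def __init__(self, directed: bool = False) -> None:
        self.directed = directed
        # vertex -> set of neighbouring vertices
        self.adj: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, u: str, v: str) -> None:
        self.adj[u].add(v)
        if self.directed:
            # make sure v is in V even if it has no outgoing edges
            self.adj.setdefault(v, set())
        else:
            # undirected: the same edge is usable from both ends
            self.adj[v].add(u)

    def vertices(self) -> set[str]:
        return set(self.adj)

    def edge_count(self) -> int:
        total = sum(len(nbrs) for nbrs in self.adj.values())
        # an undirected edge is stored twice (assumes no self-loops)
        return total if self.directed else total // 2
```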

In today's internet information industry, graphs and graph algorithms are widely used in data mining, with applications including traffic routing, social network computing, product recommendation and network communications.

A graph database is a database tool whose data are accessed through the logical structure of a graph; common graph databases include Neo4j, JanusGraph and Titan. To meet the industry's current demand for graph use, graph databases generally provide a variety of graph algorithms, including path exploration, spanning trees, connected graphs and social networking algorithms (SNA).

Path exploration is a commonly used graph algorithm for finding the association between two arbitrary vertices of a directed or undirected graph, that is, a path P made up of vertices and edges of the graph that leads from one vertex to the other. Its applications mainly fall into two kinds: shortest path, such as finding the route with the fewest steps from one place to another on a city map; and path traversal, such as finding the "six degrees" relationship between two people, that is, how one person can reach another through his network of acquaintances.
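The shortest-path case (fewest steps on an unweighted graph) is classically solved with a breadth-first search that starts from the source alone. The sketch below is that single-direction baseline, which the bidirectional method described later improves on. The function name bfs_shortest_path and the dict-of-lists adjacency format are assumptions made for illustration.

```python
from collections import deque


def bfs_shortest_path(adj, source, target):
    """Return a fewest-edges path from source to target, or None if unreachable."""
    if source == target:
        return [source]
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w in parent:
                continue
            parent[w] = u
            if w == target:
                # BFS reaches every vertex first along a shortest path
                path = [w]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(w)
    return None
```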

JanusGraph is an open-source graph database that is widely used in data analysis because of its good generality, high performance and open source code. The graph algorithm JanusGraph uses to provide its path exploration function is depth-first traversal (DFS), which has the following defects:

(1) Poor performance and long response time: depth-first traversal works much like exploring a maze, and potentially every vertex of the graph has to be traversed. If the graph is very large, the exploration cannot be completed within an acceptable time, so depth-first traversal is unsuitable for real-time computing scenarios and better suited to offline computing;

(2) High resource consumption: while the algorithm runs, information on every traversed vertex must be kept in memory. When the computation is dense, memory consumption grows steadily and the demand for computing resources is high.
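To illustrate these two defects, the following generic depth-first sketch enumerates simple paths between two vertices. It is not JanusGraph's actual implementation; the function name dfs_all_paths, the max_depth parameter and the adjacency format are assumptions. Because DFS follows each branch to its end before backtracking, the number of expansions grows with the number of paths in the graph rather than with the distance between the two vertices, and every pending branch keeps its own copy of the partial path in memory.

```python
def dfs_all_paths(adj, source, target, max_depth):
    """Enumerate every simple path source -> target with at most max_depth edges.

    Returns (paths, calls), where calls counts vertex expansions, a rough
    measure of the work done by depth-first exploration.
    """
    paths, calls = [], 0
    stack = [(source, [source])]  # each entry carries its own partial path
    while stack:
        u, path = stack.pop()
        calls += 1
        if u == target:
            paths.append(path)
            continue
        if len(path) - 1 == max_depth:
            continue  # depth budget used up on this branch
        for w in adj.get(u, ()):
            if w not in path:  # keep paths simple (no repeated vertex)
                stack.append((w, path + [w]))
    return paths, calls
```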

These problems make it difficult for JanusGraph's path exploration function to respond quickly and with low latency to external path exploration requests in scenarios with large data volumes and real-time requirements, and the memory consumption during a query is very high.

Disclosure of Invention

The technical task of the invention is to address the above problems by providing a method for improving the path exploration performance of JanusGraph that reduces memory consumption and response time and meets the requirements of large data volumes, real-time computation and low latency.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A method for improving the path exploration performance of JanusGraph uses a bidirectional breadth-first traversal algorithm. Instead of stepping gradually from the source node to the target node, the traversal starts from both nodes simultaneously, which reduces the total number of vertices that must be traversed and the number of iterations.
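The size of this saving can be estimated with a standard back-of-the-envelope model, which is an assumption and not a figure from the patent. On a graph where every vertex has b neighbours, a single breadth-first search that must go d levels deep touches about b + b^2 + ... + b^d vertices, whereas two searches that meet in the middle each go only about d/2 levels. The function names below are hypothetical.

```python
def unidirectional_cost(b: int, d: int) -> int:
    """Vertices reached by one BFS of depth d with uniform branching factor b."""
    return sum(b ** i for i in range(1, d + 1))


def bidirectional_cost(b: int, d: int) -> int:
    """Vertices reached by two BFS frontiers that meet after d levels in total."""
    first = (d + 1) // 2  # the side that moves first goes one level deeper when d is odd
    second = d - first
    return unidirectional_cost(b, first) + unidirectional_cost(b, second)
```

For b = 10 and d = 6 the model gives 1,111,110 vertices for one-directional search versus 2,220 for bidirectional search. Real graphs do not have a uniform degree, so actual savings depend on the degree distribution.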

Preferably, the method for improving the path exploration performance of JanusGraph specifically comprises the following steps:

S1, define a set v1 and a set v2;

S2, take the source vertex and the target vertex, load the source vertex into set v1 and the target vertex into set v2;

S3, traverse set v1 and take out each vertex v;

S4, expand vertex v to find the node path, load the matching vertices into set v1a, and determine whether v1a and v2 intersect; if they do, jump to S10, otherwise execute step S5;

S5, replace the contents of v1 with those of v1a;

S6, traverse set v2 and take out each vertex v;

S7, expand vertex v to find the node path, load the matching vertices into set v2a, and determine whether v2a and v1 intersect; if they do, jump to S10, otherwise execute step S8;

S8, replace the contents of v2 with those of v2a;

S9, jump to step S4;

S10, connect the node path of v2 with that of v1 and output the path result.
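The steps S1 to S10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the graph is an in-memory undirected adjacency dict, the name bidirectional_bfs is hypothetical, parent maps are added so that S10 can join the two half-paths, and the intersection test of S4/S7 is made vertex by vertex against everything the other side has already reached (a superset of v2 or v1). When the two sides meet, the sketch goes straight to joining the paths and returning the result.

```python
def bidirectional_bfs(adj, source, target):
    """Bidirectional BFS after steps S1-S10; returns a list of vertices or None.

    adj maps each vertex to an iterable of neighbours (undirected graph assumed).
    """
    if source == target:
        return [source]
    # S1/S2: frontier v1 starts at the source, frontier v2 at the target.
    v1, v2 = {source}, {target}
    # Parent maps record how each vertex was reached, so S10 can rebuild the path.
    parent1, parent2 = {source: None}, {target: None}

    def expand(frontier, parents, other_parents):
        """S3/S4 (or S6/S7): grow one frontier by one level, stopping at a meeting vertex."""
        nxt = set()
        for v in frontier:
            for w in adj.get(v, ()):
                if w in parents:
                    continue  # already reached from this side
                parents[w] = v
                if w in other_parents:  # the two searches intersect at w
                    return nxt, w
                nxt.add(w)
        return nxt, None

    meet = None
    while v1 and v2 and meet is None:
        v1, meet = expand(v1, parent1, parent2)  # S3-S5: v1 <- v1a
        if meet is None:
            v2, meet = expand(v2, parent2, parent1)  # S6-S8: v2 <- v2a, then S9
    if meet is None:
        return None  # a frontier ran empty: no path exists

    def walk(node, parents):
        """Follow parent links from node back to the root of one search."""
        out = []
        while node is not None:
            out.append(node)
            node = parents[node]
        return out

    # S10: source..meet (reversed walk) followed by the vertices after meet up to target.
    return walk(meet, parent1)[::-1] + walk(parent2[meet], parent2)
```

Because each round grows one side by a single level and stops at the first meeting vertex, the returned path has the fewest edges on an unweighted graph.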

Preferably, parallel node expansion is adopted: the bidirectional breadth-first traversal runs the vertex expansion steps S4 and S7 in parallel, and the expansion of multiple nodes is executed concurrently as multiple tasks.
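A minimal sketch of this idea using Python's concurrent.futures is shown below. The names expand_frontier_parallel and expand_both_sides and the neighbours callable are assumptions; neighbours(v) stands in for a possibly slow storage-backend lookup. In Python, threads pay off only when that lookup waits on I/O; in a JVM-based deployment the same structure would map onto a Java ExecutorService.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_frontier_parallel(frontier, neighbours, visited, max_workers=4):
    """Expand each frontier vertex as its own task; return {new_vertex: parent}."""
    frontier = list(frontier)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per frontier vertex; map() yields results in input order.
        results = list(pool.map(neighbours, frontier))
    discovered = {}
    for v, nbrs in zip(frontier, results):
        for w in nbrs:
            # Merging happens on the calling thread, so tasks share no mutable state.
            if w not in visited and w not in discovered:
                discovered[w] = v
    return discovered


def expand_both_sides(v1, v2, neighbours, visited1, visited2):
    """Run the S4-style and S7-style expansions concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(expand_frontier_parallel, v1, neighbours, visited1)
        f2 = pool.submit(expand_frontier_parallel, v2, neighbours, visited2)
        return f1.result(), f2.result()
```

When both sides grow at once, the meeting test has to look at v1a ∩ v2a as well as the S4/S7 tests, since the two new frontiers may meet each other directly.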

Preferably, the path length is limited: a maximum step length is set for the query path, and if no path between the nodes has been found once the maximum step length is exceeded, the path is treated as non-existent.

Limiting the maximum step length of the graph traversal prevents the query from falling into an endless loop.
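One way to enforce the maximum step length is to count frontier expansions. The sketch below is a hypothetical variant, bidirectional_distance with a max_steps parameter, that returns only the path length. Each step grows one side by one level, so after k steps every path of at most k edges has been covered, and max_steps acts as a cap on path length.

```python
def bidirectional_distance(adj, source, target, max_steps):
    """Shortest source-target distance in edges, or None past max_steps.

    Steps alternate between the source side (even steps) and the target
    side (odd steps); each step adds one level to one side.
    """
    if source == target:
        return 0
    dist1, dist2 = {source: 0}, {target: 0}
    v1, v2 = [source], [target]
    for step in range(max_steps):
        if not v1 or not v2:
            return None  # one side is exhausted: no path at all
        grow_source = step % 2 == 0
        frontier, dist, other = (v1, dist1, dist2) if grow_source else (v2, dist2, dist1)
        nxt = []
        for v in frontier:
            for w in adj.get(v, ()):
                if w in dist:
                    continue
                dist[w] = dist[v] + 1
                if w in other:
                    return dist[w] + other[w]  # the two searches meet at w
                nxt.append(w)
        if grow_source:
            v1 = nxt
        else:
            v2 = nxt
    return None  # maximum step length exceeded: treat the path as absent
```

In this sketch the visited maps already stop cycles from being revisited, so the cap mainly bounds the work done on very large graphs.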

Preferably, super nodes and hot data are optimized: the edges of nodes that have a large number of edges are additionally cached in memory.

Preferably, super nodes and hot data are cached, which reduces accesses to the graph database and the time they take.
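A minimal sketch of such a cache follows. SuperNodeCache, fetch_edges and degree_threshold are hypothetical names and not JanusGraph configuration options; fetch_edges(v) stands in for a graph-database lookup. Only vertices whose edge list reaches the threshold are kept in memory, so ordinary vertices still go to the database while super nodes are fetched once.

```python
class SuperNodeCache:
    """Keeps the edge lists of high-degree ("super") vertices in memory."""

    def __init__(self, fetch_edges, degree_threshold=1000):
        self.fetch_edges = fetch_edges  # stand-in for a graph-database read
        self.degree_threshold = degree_threshold
        self.cache = {}
        self.backend_calls = 0  # counts how often the database is hit

    def neighbours(self, v):
        if v in self.cache:
            return self.cache[v]  # served from memory, no database access
        self.backend_calls += 1
        edges = self.fetch_edges(v)
        if len(edges) >= self.degree_threshold:
            self.cache[v] = edges  # only super nodes are retained
        return edges
```

A production cache would also need a size bound and invalidation when a cached vertex's edges change.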

Preferably, in step S4, if the graph is directed, the edges of vertex v are filtered by direction to find the node path, the matching vertices are loaded into v1a, and whether v1a and v2 intersect is determined.

Preferably, in step S7, if the graph is directed, the edges of vertex v are filtered by the opposite direction to find the node path, the matching vertices are loaded into v2a, and whether v2a and v1 intersect is determined.
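One natural reading of "filter by direction" and "filter by the opposite direction", assumed here, is that the source-side search of S4 follows outgoing edges while the target-side search of S7 follows incoming edges, so the two searches meet only on a path that respects every edge direction. The function names below are hypothetical, and the sketch answers reachability only; joining the actual path would use parent maps as in S10.

```python
from collections import defaultdict


def build_directed_index(edges):
    """Split directed (u, v) edges into out- and in-adjacency maps."""
    out_adj, in_adj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out_adj[u].append(v)  # S4 (source side) follows edges forward
        in_adj[v].append(u)   # S7 (target side) follows edges backward
    return out_adj, in_adj


def directed_meet(edges, source, target):
    """Bidirectional search on a directed graph; True if target is reachable."""
    if source == target:
        return True
    out_adj, in_adj = build_directed_index(edges)
    seen1, seen2 = {source}, {target}
    v1, v2 = {source}, {target}
    while v1 and v2:
        v1a = {w for v in v1 for w in out_adj[v]} - seen1
        if v1a & seen2:
            return True
        seen1 |= v1a
        v1 = v1a
        v2a = {w for v in v2 for w in in_adj[v]} - seen2
        if v2a & seen1:
            return True
        seen2 |= v2a
        v2 = v2a
    return False
```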

The method is integrated in a manner compatible with JanusGraph as follows: the algorithm implementing bidirectional breadth-first traversal is packaged into a tracker model supported by JanusGraph, recompiled, and placed in the lib directory of the service, replacing the original implementation.

Compared with the prior art, the method for improving the path exploration performance of JanusGraph has the following outstanding beneficial effects: it performs path exploration with an optimized bidirectional breadth-first traversal, adds a maximum step limit to the algorithm, adds caching of super nodes to reduce accesses to the graph database and the resources they occupy, and executes node expansion in parallel. It thereby solves the problems of long response time and excessive resource consumption that are common to the depth-first traversal path exploration algorithm used by the JanusGraph database, and has good value for wide adoption and application.

Drawings

FIG. 1 is a flow chart of a method of improving Janusgraph path exploration performance in accordance with the present invention.

Detailed Description

The method of the present invention for improving JanusGraph path exploration performance is described in further detail below with reference to the accompanying drawing and an example.

Examples

As shown in FIG. 1, the method for improving the path exploration performance of JanusGraph uses a bidirectional breadth-first traversal algorithm. Instead of stepping gradually from the source node to the target node, the traversal starts from both nodes simultaneously, which reduces the total number of vertices that must be traversed and the number of iterations.

The method for improving JanusGraph path exploration performance specifically comprises the following steps:

S1, define a set v1 and a set v2.

S2, take the source vertex and the target vertex, load the source vertex into set v1 and the target vertex into set v2.

S3, traverse set v1 and take out each vertex v.

S4, expand vertex v to find the node path, load the matching vertices into set v1a, and determine whether v1a and v2 intersect; if they do, jump to S10, otherwise execute step S5.

When vertex v is expanded in a directed graph, the edges are filtered according to their direction to find the node path, the matching vertices are loaded into v1a, and whether v1a and v2 intersect is determined.

S5, replace the contents of v1 with those of v1a.

S6, traverse set v2 and take out each vertex v.

S7, expand vertex v to find the node path, load the matching vertices into set v2a, and determine whether v2a and v1 intersect; if they do, jump to S10, otherwise execute step S8.

When vertex v is expanded in a directed graph, the edges are filtered according to the opposite direction to find the node path, the matching vertices are loaded into v2a, and whether v2a and v1 intersect is determined.

S8, replace the contents of v2 with those of v2a.

S9, jump to step S4.

S10, connect the node path of v2 with that of v1 and output the path result.

The method can further be optimized as follows:

(1) Parallel node expansion: the bidirectional breadth-first traversal runs the vertex expansion steps S4 and S7 in parallel, and the expansion of multiple nodes is executed concurrently as multiple tasks.

(2) Path length limit: a maximum step length is set for the query path, and if no path between the nodes has been found once the maximum step length is exceeded, the path is treated as non-existent.

(3) Super node and hot data optimization: the edges of nodes that have a large number of edges are additionally cached in memory.

Caching super nodes and hot data reduces accesses to the graph database and the time they take.

The JanusGraph service is divided into a websocket interface, graph operations, algorithms and the graph database. The algorithm implementing bidirectional breadth-first traversal is packaged into a tracker model supported by JanusGraph, recompiled, and placed in the lib directory of the service, replacing the original server model.

The above embodiments are only preferred embodiments of the present invention; common variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for improving the path exploration performance of JanusGraph, characterized in that the method uses a bidirectional breadth-first traversal algorithm: instead of stepping gradually from the source node to the target node, the traversal starts from both nodes simultaneously, which reduces the total number of vertices that must be traversed and the number of iterations; the method specifically comprises the following steps:

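S1, define a set v1 and a set v2;

S2, take the source vertex and the target vertex, load the source vertex into set v1 and the target vertex into set v2;

S3, traverse set v1 and take out each vertex v;

S4, expand vertex v to find the node path, load the matching vertices into set v1a, and determine whether v1a and v2 intersect; if they do, jump to S10, otherwise execute step S5;

S5, replace the contents of v1 with those of v1a;

S6, traverse set v2 and take out each vertex v;

S7, expand vertex v to find the node path, load the matching vertices into set v2a, and determine whether v2a and v1 intersect; if they do, jump to S10, otherwise execute step S8;

S8, replace the contents of v2 with those of v2a;

S9, jump to step S4;

S10, connect the node path of v2 with that of v1 and output the path result.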

2. The method for improving the path exploration performance of JanusGraph according to claim 1, characterized in that parallel node expansion is adopted: the bidirectional breadth-first traversal runs the vertex expansion steps S4 and S7 in parallel, and the expansion of multiple nodes is executed concurrently as multiple tasks.

3. The method for improving the path exploration performance of JanusGraph according to claim 2, characterized in that the path length is limited: a maximum step length is set for the query path, and if no path between the nodes has been found once the maximum step length is exceeded, the path is treated as non-existent.

4. The method for improving the path exploration performance of JanusGraph according to claim 3, characterized in that in step S4, when vertex v is expanded in a directed graph, the edges are filtered according to their direction to find the node path, the matching vertices are loaded into v1a, and whether v1a and v2 intersect is determined.

5. The method for improving the path exploration performance of JanusGraph according to claim 4, characterized in that in step S7, when vertex v is expanded in a directed graph, the edges are filtered according to the opposite direction to find the node path, the matching vertices are loaded into v2a, and whether v2a and v1 intersect is determined.