Detailed Description
Generally, in a graph database, the entity represented by a vertex in a graph may be a person, a thing, an objectively existing object, an abstract event, a table, a task, etc., and the relationship represented by an edge in the graph may be a dependency, a social relationship, etc. Each vertex in the graph has a vertex property, which refers to a property of the corresponding entity (such as the age or name of a person); each edge in the graph likewise has an edge property, which refers to a property of the relationship between the corresponding entities. A graph database can be applied to constructing social networks, public transportation networks, maps, network topologies and the like.
In the related art, since a subgraph obtained by a user query often contains a very large number of vertices and edges, it is not easy for the user to quickly obtain summary information from the queried subgraph. For this reason, the following solutions of the present application are proposed. In one embodiment, the graph data query engine tools or data storage tools used to implement the following solutions include, but are not limited to: geobase, Neo4J, titan, GraphDB, MYSQL, ADS, HBASE, ODPS, HIVE, ORACLE, GREENPLUM, etc.
FIG. 1 illustrates a flow chart of a method for generating graph database query results provided by an exemplary embodiment. As shown in FIG. 1, in one embodiment, the method includes the following steps 101-103, wherein:
In step 101, a graph database is queried according to an input vertex ID, and a subgraph corresponding to the vertex ID is obtained. The graph elements in the subgraph comprise vertices and edges, each edge connecting two vertices.
The graphs in a graph database can generally be divided into directed graphs and undirected graphs, and the query process differs slightly between the two. In one embodiment, if the graph stored in the graph database to be queried is an undirected graph, the query can be performed simply by inputting the vertex ID to be queried. In another embodiment, if the graph stored in the graph database to be queried is a directed graph, the user needs to specify a query direction in addition to the vertex ID to be queried, where the query direction may be: the direction in which the vertex corresponding to the input vertex ID is the starting point, or the direction in which the vertex corresponding to the input vertex ID is the end point.
Referring to FIG. 2 and FIG. 3 together, and taking a directed graph as an example, each vertex in the directed graph has a vertex ID that uniquely identifies it, such as a, b, c, and each vertex has a vertex type, such as type1, type2. Based on the directed graph shown in FIG. 2, if the user inputs the vertex ID a and specifies the query direction in which the vertex corresponding to the input vertex ID is the starting point, the query returns the subgraph shown in FIG. 3, which consists of vertices a, b, c, d, e, f, g and h, with vertex a as the starting point of the subgraph. Conversely, if the user inputs the vertex ID a and specifies the query direction in which the vertex corresponding to the input vertex ID is the end point, the queried subgraph consists of vertices a, j, i, q, k and p, with vertex a as the end point of the subgraph.
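The directional query of step 101 might be sketched as a breadth-first traversal over an in-memory edge list. This is only an illustration: the function name `query_subgraph` and the `"out"`/`"in"` direction labels are assumptions, not part of the described method, and the edge list contains only the edges of the FIG. 3 example (those listed in Table 1), not the full graph of FIG. 2.

```python
from collections import deque

# Directed edges (start ID, end ID) of the FIG. 3 example, as listed in Table 1.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
         ("c", "d"), ("d", "f"), ("d", "g"), ("e", "h")]


def query_subgraph(edges, vertex_id, direction):
    """Return (vertices, edges) of the directed subgraph for vertex_id.

    direction="out": vertex_id is the starting point (edges followed forward).
    direction="in":  vertex_id is the end point (edges followed backward).
    """
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")

    # Build an adjacency map oriented along the requested direction.
    neighbours = {}
    for start, end in edges:
        src, dst = (start, end) if direction == "out" else (end, start)
        neighbours.setdefault(src, []).append(dst)

    # Breadth-first traversal from the queried vertex.
    seen = {vertex_id}
    queue = deque([vertex_id])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Keep the original edges whose two ends were both reached.
    sub_edges = [(s, e) for s, e in edges if s in seen and e in seen]
    return seen, sub_edges


downstream_vertices, downstream_edges = query_subgraph(EDGES, "a", "out")
upstream_vertices, upstream_edges = query_subgraph(EDGES, "d", "in")
```

With vertex a as the starting point, the traversal reaches vertices a through h, in line with the subgraph described for FIG. 3. The upstream call uses vertex d only because the example edge list contains no edges that end at a.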
In step 102, at least one display index value related to the type of each graph element in the subgraph is determined according to that type.
The display index value generally reflects an index that the user of the graph database actually needs. After querying the graph database, the user typically wants to learn at least one such index quickly from the query result. In actual use, the user can adjust or set one or more indexes as required. In one embodiment, the display index value may include, but is not limited to, at least one of the following:
① number of types of vertices in the subgraph;
② number of vertices in the subgraph belonging to each type;
③ number of edges in the subgraph that share the same start point type and the same end point type.
The following description takes the above three display index values as examples.
In one embodiment, step 102 can be implemented by the following process:
Step 1021: Obtain quadruple information according to the type of each vertex in the queried subgraph, where the quadruple information includes a start point ID, a start point type, an end point ID and an end point type.
In the example shown in FIG. 3, the obtained quadruple information is shown in Table 1.
Table 1:
Start point ID | Start point type | End point ID | End point type
a | type1 | b | type2
a | type1 | c | type1
b | type2 | d | type2
b | type2 | e | type2
c | type1 | d | type2
d | type2 | f | type3
d | type2 | g | type3
e | type2 | h | type3
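As a minimal sketch of step 1021, one quadruple can be produced per edge by looking up the type of each endpoint. The names `VERTEX_TYPES`, `SUBGRAPH_EDGES` and `build_quadruples` are illustrative assumptions; the data is read off Table 1.

```python
# Vertex types of the FIG. 3 subgraph, as they appear in Table 1.
VERTEX_TYPES = {
    "a": "type1", "b": "type2", "c": "type1", "d": "type2",
    "e": "type2", "f": "type3", "g": "type3", "h": "type3",
}

# Directed edges (start ID, end ID) of the subgraph, in Table 1 order.
SUBGRAPH_EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
                  ("c", "d"), ("d", "f"), ("d", "g"), ("e", "h")]


def build_quadruples(edges, vertex_types):
    """Return one (start ID, start type, end ID, end type) tuple per edge."""
    return [(start, vertex_types[start], end, vertex_types[end])
            for start, end in edges]


QUADRUPLES = build_quadruples(SUBGRAPH_EDGES, VERTEX_TYPES)
```

Each resulting tuple corresponds to one row of Table 1.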
Step 1022: Based on the obtained quadruple information, count the number of edges for each combination of start point type and end point type.
Based on Table 1, the counted number of edges sharing the same start point type and the same end point type is shown in Table 2.
Table 2:
Start point type | End point type | Number of edges
type1 | type1 | 1
type2 | type2 | 2
type1 | type2 | 2
type2 | type3 | 3
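The grouping in step 1022 amounts to counting quadruples by their (start point type, end point type) pair. A possible sketch, using the Table 1 quadruples and an assumed helper name `count_edges_by_type_pair`:

```python
from collections import Counter

# Quadruples (start ID, start type, end ID, end type) copied from Table 1.
QUADRUPLES = [
    ("a", "type1", "b", "type2"), ("a", "type1", "c", "type1"),
    ("b", "type2", "d", "type2"), ("b", "type2", "e", "type2"),
    ("c", "type1", "d", "type2"), ("d", "type2", "f", "type3"),
    ("d", "type2", "g", "type3"), ("e", "type2", "h", "type3"),
]


def count_edges_by_type_pair(quadruples):
    """Count edges for every (start point type, end point type) pair."""
    return Counter((start_type, end_type)
                   for _start_id, start_type, _end_id, end_type in quadruples)


edge_counts = count_edges_by_type_pair(QUADRUPLES)
```

The four keys of `edge_counts` and their values correspond to the four rows of Table 2.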
Step 1023: Based on the obtained quadruple information, count the number of vertices belonging to each type.
Based on Table 1, the counted number of vertices belonging to each type is shown in Table 3. Within a given type, the same vertex is counted only once.
Table 3:
End point type | Number of vertices (i.e. end points) belonging to the type
type1 | 1
type2 | 3
type3 | 3
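Because the same vertex may be the end point of several edges (vertex d in Table 1, for instance), step 1023 needs to deduplicate end points before counting. One way to sketch this, with the assumed helper name `count_vertices_by_type`, is to collect end point IDs into one set per type:

```python
from collections import defaultdict

# Quadruples (start ID, start type, end ID, end type) copied from Table 1.
QUADRUPLES = [
    ("a", "type1", "b", "type2"), ("a", "type1", "c", "type1"),
    ("b", "type2", "d", "type2"), ("b", "type2", "e", "type2"),
    ("c", "type1", "d", "type2"), ("d", "type2", "f", "type3"),
    ("d", "type2", "g", "type3"), ("e", "type2", "h", "type3"),
]


def count_vertices_by_type(quadruples):
    """Count distinct end point IDs under each end point type."""
    members = defaultdict(set)
    for _start_id, _start_type, end_id, end_type in quadruples:
        members[end_type].add(end_id)  # a set counts each vertex only once
    return {vertex_type: len(ids) for vertex_type, ids in members.items()}


vertex_counts = count_vertices_by_type(QUADRUPLES)
```

Vertex d is the end point of two edges but contributes only once to the count for type2, which is why the result agrees with Table 3.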
In step 103, a graph to be displayed is generated and displayed according to the display index value, where the graph to be displayed includes graph elements that describe the display index value.
In one embodiment, step 103 can be implemented by the following process:
Step 1031: Draw a target point for each vertex type in the queried subgraph.
Step 1032: Label each drawn target point with the number of vertices belonging to the corresponding type.
Step 1033: According to the counted start point types and end point types, draw directed edges, each either connected to a single target point (when the start point type and the end point type are the same) or connecting two target points.
Step 1034: Label each drawn directed edge with the number of edges that have the corresponding start point type and end point type.
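Steps 1031 to 1034 could be sketched by emitting a textual graph description instead of drawing directly. The example below writes Graphviz DOT text; choosing DOT as the output format and the function name `build_overview_dot` are assumptions made for illustration, and the description does not prescribe any particular rendering tool. The sketch draws one directed edge for every entry of the edge counts it is given.

```python
def build_overview_dot(vertex_counts, edge_counts):
    """Build DOT text for the graph to be displayed."""
    lines = ["digraph overview {"]
    # Steps 1031-1032: one target point per vertex type, labeled with its count.
    for vertex_type, count in sorted(vertex_counts.items()):
        lines.append(f'  "{vertex_type}" [label="{vertex_type}: {count}"];')
    # Steps 1033-1034: one directed edge per type pair, labeled with its count.
    # A pair with identical start and end types becomes a self-loop.
    for (start_type, end_type), count in sorted(edge_counts.items()):
        lines.append(f'  "{start_type}" -> "{end_type}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


# Input values copied from Table 3 and Table 2 respectively.
VERTEX_COUNTS = {"type1": 1, "type2": 3, "type3": 3}
EDGE_COUNTS = {
    ("type1", "type1"): 1, ("type2", "type2"): 2,
    ("type1", "type2"): 2, ("type2", "type3"): 3,
}

dot_text = build_overview_dot(VERTEX_COUNTS, EDGE_COUNTS)
print(dot_text)
```

The printed text can be passed to any DOT-compatible renderer. Keeping generation separate from rendering also makes the aggregated result easy to inspect or test.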
FIG. 4 shows a graph to be displayed generated according to Tables 2 and 3 above. In the process of generating the graph, three points are first drawn, with the IDs type1, type2 and type3. Then, according to the counts in Table 3, the points with IDs "type1", "type2" and "type3" are labeled with the corresponding values 1, 3 and 3 respectively (the ID and the value may be labeled inside the point in a certain format). Next, according to Table 2, a directed edge connecting the point with ID "type1" to the point with ID "type2", a directed edge connecting the point with ID "type2" to the point with ID "type3", and a directed edge whose start point and end point are both the point with ID "type2" are drawn. Finally, according to Table 2, the corresponding values are labeled on the three directed edges. By viewing the graph to be displayed shown in FIG. 4, the user can quickly see how many vertex types exist downstream (when the query takes the vertex as the starting point) or upstream (when the query takes the vertex as the end point) of the input vertex with ID "a", how many vertices belong to each type, how many edges run from one type to another, and so on. The benefit is greatest when the queried subgraph is complex, since the user can then obtain this information far more efficiently than by inspecting the subgraph itself.
In another embodiment, step 102 can be implemented by the following process:
Step 1021: Obtain quadruple information according to the type of each vertex in the subgraph, where the quadruple information includes a start point ID, a start point type, an end point ID and an end point type.
Similarly, in the example shown in FIG. 3, the obtained quadruple information is as shown in Table 1 above.
Step 1025: In the obtained quadruple information, replace the start point type corresponding to the input vertex ID with the vertex ID itself.
In the example shown in FIG. 3, the quadruple information obtained after the replacement is shown in Table 4.
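Table 4:
Start point ID | Start point type | End point ID | End point type
a | a | b | type2
a | a | c | type1
b | type2 | d | type2
b | type2 | e | type2
c | type1 | d | type2
d | type2 | f | type3
d | type2 | g | type3
e | type2 | h | type3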
Step 1026: Based on the quadruple information obtained after the replacement, count the number of edges for each combination of start point type and end point type.
In the example shown in FIG. 3, the number of edges counted on the basis of Table 4 is shown in Table 5.
Table 5:
Start point type | End point type | Number of edges
a | type2 | 1
a | type1 | 1
type2 | type2 | 2
type1 | type2 | 1
type2 | type3 | 3
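Steps 1025 and 1026 can be sketched together: the start point type is overwritten with the vertex ID wherever the start point is the queried vertex, and the edges are then counted per type pair as before. The helper name `replace_query_vertex_type` is an assumption for illustration.

```python
from collections import Counter

# Quadruples (start ID, start type, end ID, end type) copied from Table 1.
QUADRUPLES = [
    ("a", "type1", "b", "type2"), ("a", "type1", "c", "type1"),
    ("b", "type2", "d", "type2"), ("b", "type2", "e", "type2"),
    ("c", "type1", "d", "type2"), ("d", "type2", "f", "type3"),
    ("d", "type2", "g", "type3"), ("e", "type2", "h", "type3"),
]


def replace_query_vertex_type(quadruples, vertex_id):
    """Step 1025: use the vertex ID as start type for the queried vertex."""
    return [
        (start_id, vertex_id if start_id == vertex_id else start_type,
         end_id, end_type)
        for start_id, start_type, end_id, end_type in quadruples
    ]


replaced = replace_query_vertex_type(QUADRUPLES, "a")

# Step 1026: count edges per (start point type, end point type) pair.
edge_counts = Counter((start_type, end_type)
                      for _sid, start_type, _eid, end_type in replaced)
```

After the replacement, the two edges leaving vertex a are counted under the start point type "a" rather than "type1", so the type1-to-type2 count drops to 1, matching Table 5.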
Step 1027: Based on the quadruple information before the replacement, count the number of vertices belonging to each type.
The counted number of vertices belonging to each type is shown in Table 3 above.
Finally, the vertex corresponding to the input vertex ID is added to Table 3 to obtain Table 6:
Vertex type | Number of vertices under that type
a | 1
type1 | 1
type2 | 3
type3 | 3
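Step 1027 and the final supplement might be sketched as follows, assuming a helper named `vertex_counts_with_query_vertex` (not part of the description). The counts are taken from the quadruples before replacement, and the queried vertex is then added as its own entry with a count of 1.

```python
# Quadruples (start ID, start type, end ID, end type) before replacement (Table 1).
QUADRUPLES = [
    ("a", "type1", "b", "type2"), ("a", "type1", "c", "type1"),
    ("b", "type2", "d", "type2"), ("b", "type2", "e", "type2"),
    ("c", "type1", "d", "type2"), ("d", "type2", "f", "type3"),
    ("d", "type2", "g", "type3"), ("e", "type2", "h", "type3"),
]


def vertex_counts_with_query_vertex(quadruples, vertex_id):
    """Count distinct end points per type, then add the queried vertex."""
    members = {}
    for _start_id, _start_type, end_id, end_type in quadruples:
        members.setdefault(end_type, set()).add(end_id)
    counts = {vertex_id: 1}  # the queried vertex is listed on its own
    for vertex_type, ids in sorted(members.items()):
        counts[vertex_type] = len(ids)
    return counts


table6 = vertex_counts_with_query_vertex(QUADRUPLES, "a")
```

The resulting mapping has the same four rows as Table 6.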
According to Tables 5 and 6 above, the graph to be displayed shown in FIG. 5 can be generated. To further improve the display effect, this embodiment adds the queried vertex with ID "a" to the graph of FIG. 4, so that the user can see from the graph which types of vertices the vertex with ID "a" is directly connected to.
In other possible embodiments, the form of the generated graph to be displayed, the types of graph elements it includes, and the manner in which the display index values are labeled are not limited to the examples shown in FIG. 4 or FIG. 5, and are not enumerated here.
In the above embodiments provided by the present application, after the subgraph corresponding to a certain vertex ID is obtained through a query, one or more display index values related to the types of the graph elements in the subgraph can be determined according to those types, and finally a graph to be displayed is generated and displayed using the determined display index values, where the graph to be displayed includes graph elements that describe the display index values. On the one hand, compared with the original subgraph, the finally generated graph to be displayed occupies less storage space, which improves the query efficiency of the database; on the other hand, the user can quickly view the relevant display index values through the generated graph to be displayed, which improves the efficiency with which the user obtains useful information.
FIG. 6 shows the structure of an electronic device according to an exemplary embodiment. As shown in FIG. 6, the electronic device may be a graph database server, and may include a processor, an internal bus, a network interface, a memory and a non-volatile memory, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming, at the logical level, an apparatus for generating graph database query results. Of course, besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or a logic device.
As shown in FIG. 7, in one embodiment, an apparatus 200 for generating graph database query results includes a query unit 201, a determining unit 202 and a generating unit 203, where:
The query unit 201 is configured to query a graph database according to an input vertex ID and obtain a subgraph corresponding to the vertex ID, where the graph elements in the subgraph include vertices and edges, each edge connecting two vertices.
The determining unit 202 is configured to determine, according to the type of each graph element in the subgraph, at least one display index value related to that type.
The generating unit 203 is configured to generate and display, according to the display index value, a graph to be displayed that includes graph elements describing the display index value.
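The cooperation of the three units can be illustrated with a small composition sketch. The class name `QueryResultApparatus`, its `run` method, and the stand-in lambdas are all hypothetical and serve only to show how data flows from unit 201 through unit 202 to unit 203; they are not the apparatus of FIG. 7 itself.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class QueryResultApparatus:
    """Hypothetical wiring of units 201, 202 and 203 as plain callables."""
    query_unit: Callable[[str], Any]        # unit 201: vertex ID -> subgraph
    determining_unit: Callable[[Any], Any]  # unit 202: subgraph -> index values
    generating_unit: Callable[[Any], Any]   # unit 203: index values -> overview

    def run(self, vertex_id: str) -> Any:
        subgraph = self.query_unit(vertex_id)
        index_values = self.determining_unit(subgraph)
        return self.generating_unit(index_values)


# Stand-in units, used here only to show the data flow.
apparatus = QueryResultApparatus(
    query_unit=lambda vertex_id: [(vertex_id, "b"), (vertex_id, "c")],
    determining_unit=lambda edges: {"edge_count": len(edges)},
    generating_unit=lambda values: f"overview: {values['edge_count']} edges",
)
result = apparatus.run("a")
```

Treating each unit as an interchangeable callable mirrors the statement that the units may be realized in software, hardware, or a combination of both.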
The display index is generally an index that the user of the graph database needs to view; in actual use, the user can adjust or set one or more indexes as required. In one embodiment, the display indexes may include, but are not limited to, one or more of the following:
① number of types of vertices in the subgraph;
② number of vertices in the subgraph belonging to each type;
③ number of edges in the subgraph that share the same start point type and the same end point type.
Other possible indexes include, for example, the number of edges of each edge type, obtained by clustering the edges according to edge type.
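Clustering by edge type could be sketched in the same way as the other counts. The description does not assign edge types to its example, so the labels `depends_on` and `follows` below are purely hypothetical, as is the helper name `count_edges_by_edge_type`.

```python
from collections import Counter

# (start ID, end ID, edge type); the edge-type labels are hypothetical and
# are not taken from the figures or tables of this description.
TYPED_EDGES = [
    ("a", "b", "depends_on"),
    ("a", "c", "depends_on"),
    ("b", "d", "follows"),
]


def count_edges_by_edge_type(typed_edges):
    """Group edges by edge type and count the edges in each group."""
    return Counter(edge_type for _start, _end, edge_type in typed_edges)


edge_type_counts = count_edges_by_edge_type(TYPED_EDGES)
```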
In one embodiment, if the graph stored in the graph database to be queried is a directed graph, the query unit 201 is configured to query the graph database according to the input vertex ID and a query direction to obtain a directed subgraph corresponding to the vertex ID, where the vertex corresponding to the vertex ID is the starting point or the end point of the directed subgraph.
As shown in FIG. 8, in another embodiment, an apparatus 200' for generating graph database query results includes a query unit 201, a determining unit 202 and a generating unit 203, where, on the basis of the apparatus 200 shown in FIG. 7, the determining unit 202 may specifically include:
A quadruple information obtaining unit 2021 configured to obtain quadruple information according to the type of each vertex in the subgraph, where the quadruple information includes a start point ID, a start point type, an end point ID and an end point type.
A counting unit 2022 configured to count, based on the quadruple information, the number of edges that share the same start point type and the same end point type; and/or to count, based on the quadruple information, the number of vertices belonging to each type.
In one embodiment, the counting unit 2022 may further include:
A replacement unit configured to replace, in the obtained quadruple information, the start point type corresponding to the input vertex ID with the vertex ID.
An edge counting unit configured to count, based on the quadruple information obtained after the replacement, the number of edges that share the same start point type and the same end point type.
In one embodiment, the generating unit 203 may specifically include:
A point drawing unit 2031 configured to draw a target point for each vertex type in the subgraph.
A point labeling unit 2032 configured to label each drawn target point with the number of vertices belonging to the corresponding type.
An edge drawing unit 2033 configured to draw, according to the counted start point types and end point types, directed edges each connected to a single target point or connecting two target points.
An edge labeling unit 2034 configured to label each drawn directed edge with the number of edges that have the corresponding start point type and end point type.
It can be seen that, with the technical solutions provided by the above embodiments, a complex subgraph obtained while querying the graph database can be aggregated with respect to one or more display indexes and converted into a simpler overview graph (i.e. the graph to be displayed), so that the user of the graph database can view the required information conveniently and clearly, which can improve the efficiency of developing and managing certain projects.
It should be noted that the above apparatus embodiments and method embodiments may complement each other where they do not conflict.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function, and the units are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner; for the parts that are the same or similar among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and for the relevant points reference may be made to the corresponding description of the method embodiments.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.