WO2022241813A1 - Graph compression-based graph database construction method, apparatus and related components - Google Patents


Info

Publication number
WO2022241813A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
nodes
compressed
target node
node
Application number
PCT/CN2021/096278
Other languages
English (en)
French (fr)
Inventor
樊文飞 (Wenfei Fan)
李源昊 (Yuanhao Li)
刘沐阳 (Muyang Liu)
卢璨 (Can Lu)
Original Assignee
深圳计算科学研究院 (Shenzhen Institute of Computing Sciences)
Application filed by 深圳计算科学研究院 (Shenzhen Institute of Computing Sciences)

Classifications

    • G06F16/9024 Graphs; Linked lists
    • G06F16/144 Query formulation
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G06F16/903 Querying

Definitions

  • The present application relates to the technical field of computer software, and in particular to a graph compression-based graph database construction method, apparatus and related components.
  • A query on a graph is a computable function that takes a graph G as input and produces an output, which can be a Boolean value, a graph or even a tuple. For example, a graph pattern matching query is given a small graph Q and outputs all subgraphs of G that match Q.
  • The existing techniques mainly include:
  • Graph contraction: a traditional programming technique that merges nodes, edges, etc. in a large graph and is used in code to speed up certain specific computations.
  • Graph contraction has been applied to problems such as single-source shortest paths, connectivity and spanning trees.
  • Graph compression: for many queries, such as graph pattern matching, social network analysis, reachability and shortest distance, graph compression algorithms merge nodes that are equivalent with respect to a given class of queries, so that query results can be computed directly without decompression. However, a different compressed graph must be computed for each class of queries.
  • Graph summarization condenses the information in a large graph G for specific needs, such as the total number of edges, the average degree and centrality.
  • Graph summarization can be regarded as a form of lossy compression and only supports queries that do not require exact results, such as fuzzy, approximate and aggregate queries.
  • A common summarization approach merges a group of nodes into one node and records the number of edges among them, and merges a group of edges into a hyperedge; some isolated edges may be added or deleted during merging, but the result is guaranteed to differ from the original graph by at most k edges.
  • Indexing: for queries such as graph pattern matching, existing techniques usually build an index for the pattern to be queried at computation time, store the candidate matching nodes, and enumerate these nodes to compute the matches. Index methods generally have to be computed at run time, separately for each individual query.
  • Compression algorithms are lossy: information in the graph is lost during compression (even if the compressed graph retains enough information to answer some queries, the graph itself cannot be recovered), so a compressed graph usually cannot be used for other queries;
  • Index algorithms must be built on the fly for each individual query, which consumes a great deal of time and space, and the information cannot be effectively reused.
  • The embodiments of the present application provide a graph compression-based graph database construction method, apparatus, computer device and storage medium. Their aim is to use graph compression to meet the need to run multiple kinds of queries quickly and simultaneously on large graphs.
  • In a first aspect, an embodiment of the present application provides a graph compression-based graph database construction method, comprising:
  • The summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
  • In a second aspect, an embodiment of the present application provides a graph compression-based graph database construction apparatus, comprising:
  • a first compression unit configured to obtain the original graph and compress it based on the topological structures of the graph to obtain a compressed graph;
  • a summary information calculation unit configured to compute summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
  • a query unit configured to act when the compressed graph is queried. It determines the target node to be queried and the type of information queried for it, looks up information of that type in the summary information according to a preset lookup method, and returns it as the query result for the target node.
  • In a third aspect, an embodiment of the present application provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the graph compression-based graph database construction method of the first aspect is implemented.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the graph compression-based graph database construction method of the first aspect is implemented.
  • The embodiments of the present application provide a graph compression-based graph database construction method, apparatus, computer device and storage medium.
  • The method comprises three steps. First, obtain an original graph and compress it based on the topological structures of the graph to obtain a compressed graph. Second, compute summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs. Third, when the compressed graph is queried, determine the target node to be queried and the type of information queried for it, look up information of that type in the summary information according to a preset lookup method, and return it as the query result for the target node.
  • Through graph compression, the embodiments meet users' need to run multiple kinds of exact queries quickly and simultaneously on a large graph.
  • By storing only one copy of the compressed graph structure together with summary information for different queries, the embodiments can support multiple kinds of exact queries simultaneously on a single compressed graph.
  • The compressed graph in the embodiments is lossless, that is, no information in the graph is lost.
  • Fig. 1 is a schematic flowchart of a graph compression-based graph database construction method provided by an embodiment of the present application;
  • Fig. 2 is a schematic sub-flowchart of a graph compression-based graph database construction method provided by an embodiment of the present application;
  • Fig. 3 is a schematic diagram of an example compressed graph in a graph compression-based graph database construction method provided by an embodiment of the present application;
  • Fig. 4 is a schematic block diagram of a graph compression-based graph database construction apparatus provided by an embodiment of the present application;
  • Fig. 5 is a schematic sub-block diagram of a graph compression-based graph database construction apparatus provided by an embodiment of the present application.
  • Fig. 1 is a schematic flowchart of a graph compression-based graph database construction method provided by an embodiment of the present application, which specifically includes steps S101 to S103.
  • S102: compute summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
  • An original graph is first input and then compressed according to the topological structures of the graph.
  • Here, "graph" in "topological structure of the graph" is used generically and does not refer specifically to the original graph of this embodiment.
  • The summary information of the compressed graph is computed by enumeration. For example, the distances between nodes in the compressed graph are enumerated, the triangles that its nodes can form are enumerated and counted, and the type of topological structure to which each node belongs is recorded. The computed summary information is then saved.
  • When a node is queried, the summary information corresponding to the target node can be obtained from its position in the compressed graph, which completes the query task.
  • Through graph compression, this embodiment lets users run multiple kinds of exact queries quickly and simultaneously on large graphs. By storing only one copy of the compressed graph structure together with summary information for different queries, this embodiment can support multiple kinds of exact queries simultaneously on a single compressed graph. Moreover, the compressed graph in this embodiment is losslessly compressed, that is, no information in the graph is lost. In the prior art, each class of queries requires computing different information on the graph. This embodiment instead records a certain amount of summary information in advance, so that queries can be answered while decompressing the graph as little as possible. Furthermore, this embodiment only needs to store one copy of the graph structure, instead of storing a separate copy of each kind of compressed graph as in the prior art, which greatly reduces space overhead.
  • Step S101 includes steps S201 to S206.
  • S201: determine the expired data in the original graph using a preset timestamp threshold. Take connected nodes in the expired data, as well as nodes not connected to any other node, as first subgraphs, and form the first subgraphs into a first-type subgraph. Then compress each first subgraph in the first-type subgraph into a superpoint;
  • The input original graph G is obtained. A timestamp threshold t0 for identifying expired data is set, together with integers k_l and k_u such that each compressed subgraph has at least k_l and at most k_u nodes. The output consists of two functions: the first function f_C maps each node of the original graph G to the corresponding superpoint in the compressed graph, and the second function f_D maps each superpoint in the compressed graph to nodes of the original graph.
  • The expired data is first identified according to the timestamp threshold. Connected nodes in the expired data are then taken as a first subgraph, and each single node not connected to any other node is also taken as a first subgraph. This forms a first-type subgraph that contains at least one first subgraph, and the size of each first subgraph is kept within the range k_l to k_u. Next, the topological structures in the original graph, including clique structures, star structures and simple path structures, are identified. Each such structure is compressed into a superpoint, again keeping the size of each compressed subgraph within the range k_l to k_u.
  • This yields the compressed graph of the entire original graph. Compressed nodes are mapped to their superpoints and uncompressed nodes are mapped to themselves, which gives the first function f_C and the second function f_D. During compression, each compressed node is marked so that it will not be compressed again, and a hash table is used to map compressed nodes to their superpoints.
  • Finally, this embodiment constructs the compressed graph G_C. It contains every node in f_C(G), that is, all superpoints corresponding to compressed subgraphs together with the nodes that could not be compressed. Two nodes in G_C are connected by an edge if and only if the corresponding nodes are connected by an edge in the original graph G.
  • In Fig. 3, the expired data includes f1, n1, l1 and i1;
  • the clique structure includes k1, k2, k3, k4 and k5;
  • the star structure includes u1, u2, u3, u4, u5, u6, u7, u8, u9 and u10;
  • the simple path structure includes k6, k7, k8 and k9, and t2 is a node that cannot be compressed.
  • Step S102 includes:
  • For the shortest path problem, the possible path lengths between nodes inside the compressed graph are computed in advance. The shortest path between any two nodes can then be computed without decompressing superpoints.
  • Each structure type is handled differently:
      • Expired data: the distance between any two nodes is enumerated directly.
      • Clique structure: the distance between any two nodes is enumerated, and the shortest distance is simply the length of the edge between them.
      • Star structure: only the distances between the center and the leaf nodes are recorded, since the distance between any two leaf nodes can be obtained by combining them.
      • Simple path structure: the distance between every two adjacent nodes is recorded, and the distance between any two non-adjacent nodes is the sum of the lengths of all edges between them.
  • This embodiment also records the structure type and topological features of each node. These can be used to check the structural information of a compressed superpoint and to query the connectivity of the subgraph inside a superpoint.
  • For example, (i1, f1, 1) means that the distance between i1 and f1 is 1.
  • Step S102 further includes:
  • The number of triangles in a clique structure is calculated according to the following formula:
  • A = k(k-1)(k-2)/6
  • where A is the number of triangles in the clique structure and k is the number of nodes in the clique structure;
  • Uncompressed nodes in the original graph are set as external nodes, and nodes in the expired data, clique structures, star structures and simple path structures that are connected to external nodes are set as edge nodes;
  • For the triangle counting problem, the number of triangles in each compressed subgraph is counted in advance. For expired data, every triangle is enumerated; for a clique structure, the count is calculated with the formula above; star and simple path structures contain no triangles.
  • Nodes in the compressed graph can also form triangles with uncompressed nodes (i.e. external nodes). For example, two nodes from compressed subgraphs can form a triangle with one external node, or one node inside a subgraph can form a triangle with two external nodes.
  • Any edge node in the expired data may form such a triangle, so these edge nodes are enumerated in turn and counted;
  • in a clique structure every node can be an edge node, so the edge nodes of the clique structure are enumerated in turn;
  • in a star structure all nodes except the center are edge nodes, and in a simple path structure the two endpoints are edge nodes; the edge nodes of star and simple path structures are likewise enumerated.
  • Step S103 includes:
  • Suppose the type of information queried for the target node is its shortest distance. The distances between the target node and other nodes are obtained from the summary information, and the smallest distance is returned as the shortest path of the target node.
  • The target node may also be a pair of nodes, in which case the shortest path between the two target nodes is queried.
  • Step S103 further includes:
  • When the type of information queried for the target node is its number of triangles, it is determined whether the target node is a compressed node;
  • if the target node is a compressed node, the number of triangles formed by the target node and other nodes is obtained from the summary information and returned as the number of triangles for the target node;
  • if the target node is an uncompressed node, the triangles involving it are enumerated and the enumeration result is returned as the query result for the target node.
  • The summary information of each node already records the number of triangles it forms with other nodes. Therefore, when the queried information is the number of triangles and the target node is a compressed node, the count only needs to be read from the precomputed summary information and returned as the query result.
  • If the target node is an uncompressed node, the triangles formed between the target node and other nodes still have to be enumerated, and the count is used as the query result.
  • The graph compression-based graph database construction method further includes:
  • counting the updated nodes and edges in the original graph to obtain an update set; and
  • updating the compressed graph and the summary information according to the update set.
  • The graph database constructed by the above method also supports incremental computation.
  • Incremental computation means that the graph database must be maintained and updated when nodes, edges, attributes or other information on the graph change. In this embodiment, therefore, both the structure of the compressed graph and the precomputed summary information need to be updated.
  • Fig. 4 is a schematic block diagram of a graph compression-based graph database construction apparatus 400 provided by an embodiment of the present application.
  • The apparatus 400 includes:
  • a first compression unit 401 configured to obtain an original graph and compress it based on the topological structures of the graph to obtain a compressed graph;
  • a summary information calculation unit 402 configured to compute summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
  • a query unit 403 configured to act when the compressed graph is queried. It determines the target node to be queried and the type of information queried for it, looks up information of that type in the summary information according to a preset lookup method, and returns it as the query result for the target node.
  • The first compression unit 401 includes:
  • an expired data determination unit 501 configured to determine expired data in the original graph using a preset timestamp threshold. It takes connected nodes in the expired data, as well as nodes not connected to any other node, as first subgraphs, forms the first subgraphs into a first-type subgraph, and compresses each first subgraph in the first-type subgraph into a superpoint;
  • a topology type determination unit 502 configured to determine, based on the original graph, the topological structure types to be compressed, wherein the types include clique structures, star structures and simple path structures;
  • a subgraph acquisition unit 503 configured to acquire second subgraphs in the original graph that conform to the topological structure types, so that the second-type subgraph of each topological structure type contains at least one second subgraph composed of nodes. It also determines whether the number of nodes in each second subgraph meets a preset subgraph node threshold;
  • a second compression unit 504 configured to compress a second subgraph if its number of nodes meets the preset subgraph node threshold;
  • a node mapping unit 505 configured to map compressed nodes in the original graph to their superpoints and uncompressed nodes to themselves, so as to obtain the first function and the second function;
  • a compressed graph construction unit 506 configured to construct the compressed graph from the first subgraphs, the second subgraphs, the first function and the second function.
  • The summary information calculation unit 402 includes:
  • a first expired data enumeration unit configured to enumerate the distance between any two nodes in the expired data;
  • a clique structure enumeration unit configured to enumerate the distance between any two nodes in the clique structure;
  • a star structure enumeration unit configured to record the distance between the center of the star structure and each leaf node, and to enumerate the distance between any two leaf nodes;
  • a simple path structure recording unit configured to record the distance between every two adjacent nodes in the simple path structure, and to sum the lengths of all edges between any two non-adjacent nodes;
  • a topological feature recording unit configured to record the structure type and topological features of each node.
  • The summary information calculation unit 402 further includes:
  • a second expired data enumeration unit configured to enumerate every triangle in the expired data;
  • a clique structure calculation unit configured to calculate the number of triangles in the clique structure according to the following formula:
  • A = k(k-1)(k-2)/6
  • where A is the number of triangles in the clique structure and k is the number of nodes in the clique structure;
  • a node setting unit configured to set uncompressed nodes in the original graph as external nodes, and to set nodes in the expired data, clique structures, star structures and simple path structures that are connected to external nodes as edge nodes;
  • a node enumeration unit configured to enumerate the triangles formed between edge nodes and external nodes.
  • The query unit 403 includes:
  • a shortest path query unit used when the type of information queried for the target node is its shortest distance. It obtains the distances between the target node and other nodes from the summary information and returns the smallest distance as the shortest path of the target node.
  • The query unit 403 further includes:
  • a node judging unit configured to determine whether the target node is a compressed node when the type of information queried is the number of triangles for the target node;
  • a first query return unit used if the target node is a compressed node. It obtains from the summary information the number of triangles formed by the target node and other nodes, and returns it as the number of triangles for the target node;
  • a second query return unit used if the target node is an uncompressed node. It enumerates the triangles involving that node and returns the enumeration result as the query result for the target node.
  • The graph compression-based graph database construction apparatus further includes:
  • an update statistics unit configured to count the updated nodes and edges in the original graph to obtain an update set;
  • an updating unit configured to update the compressed graph and the summary information according to the update set.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed, implements the steps provided in the above embodiments.
  • The storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc or any other medium that can store program code.
  • An embodiment of the present application further provides a computer device, which may include a memory and a processor.
  • A computer program is stored in the memory.
  • When the processor invokes the computer program in the memory, the steps provided in the above embodiments can be implemented.
  • The computer device may also include components such as various network interfaces and power supplies.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

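A graph compression-based graph database construction method, apparatus and related components. The method comprises three steps. First, an original graph is obtained and compressed based on the topological structures of the graph to obtain a compressed graph (S101). Second, summary information is computed on the compressed graph by enumeration; the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs (S102). Third, when the compressed graph is queried, the target node to be queried and the type of information queried for it are determined. Information of that type is then looked up in the summary information according to a preset lookup method and returned as the query result for the target node (S103). The method meets the need to run multiple kinds of queries quickly and simultaneously on large graphs. Only one copy of the compressed graph structure is stored, together with summary information for different queries, so multiple kinds of exact queries can be supported simultaneously on a single compressed graph.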

Description

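Graph compression-based graph database construction method, apparatus and related components
This application is based on, and claims priority to, Chinese patent application No. 202110536533.1 filed on May 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of computer software, and in particular to a graph compression-based graph database construction method, apparatus and related components.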
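Background
A graph can be represented as a quadruple G = (V, E, L, T), where V is the set of nodes, E is the set of edges, L is a function that maps each node to a label (each node has exactly one label), and T is a function that attaches a timestamp T(v) to each node v. A query on a graph is a computable function that takes the graph G as input and produces an output, which can be a Boolean value, a graph or even a tuple. For example, a graph pattern matching query is given a small graph Q and outputs all subgraphs of G that match Q.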
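For queries on large graphs, the existing techniques mainly include:
Graph contraction: graph contraction is a traditional programming technique that merges nodes, edges, etc. of a large graph and is used in code to speed up certain specific computations. Graph contraction has been applied to problems such as single-source shortest paths, connectivity and spanning trees.
Graph compression: this technique targets many queries, such as graph pattern matching, social network analysis, reachability and shortest distance. Graph compression algorithms merge nodes that are equivalent with respect to a given class of queries, so that query results can be computed directly without decompression. However, a different compressed graph must be computed for each class of queries.
Graph summarization: graph summarization condenses the information in a large graph G for specific needs, such as the total number of edges, the average degree and centrality. It can be regarded as a form of lossy compression and only supports queries that do not require exact results, such as fuzzy queries, approximate queries and aggregate queries. A common summarization approach merges a group of nodes into a single node and records the number of edges among them, and merges a group of edges into a hyperedge. Some isolated edges may be added or deleted during merging, but the result is guaranteed to differ from the original graph by at most k edges.
Indexing: for queries such as graph pattern matching, existing techniques usually build an index for the pattern to be queried at computation time. The index stores the candidate matching nodes, which are then enumerated to compute the matches. Index methods generally have to be computed at run time, separately for each individual query.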
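The prior art has the following drawbacks:
Traditional graph contraction and compression can only compress for a single class of queries; handling multiple classes of queries at the same time requires storing a separate compressed graph for each class;
Compression algorithms are lossy: information in the graph is lost during compression (even if the compressed graph retains enough information to answer certain queries, the graph itself cannot be recovered), so a compressed graph usually cannot be used for other queries;
Index algorithms must be built on the fly when each individual query is computed, which consumes a great deal of time and space, and the information cannot be effectively reused.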
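Summary
The embodiments of the present application provide a graph compression-based graph database construction method, apparatus, computer device and storage medium. Their aim is to use graph compression to meet the need to run multiple kinds of queries quickly and simultaneously on large graphs.
In a first aspect, an embodiment of the present application provides a graph compression-based graph database construction method, comprising:
obtaining an original graph, and compressing the original graph based on the topological structures of the graph to obtain a compressed graph;
computing summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
when the compressed graph is queried, determining the target node to be queried and the type of information queried for the target node, then looking up information of that type in the summary information according to a preset lookup method and returning it as the query result for the target node.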
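In a second aspect, an embodiment of the present application provides a graph compression-based graph database construction apparatus, comprising:
a first compression unit configured to obtain an original graph and compress the original graph based on the topological structures of the graph to obtain a compressed graph;
a summary information calculation unit configured to compute summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
a query unit configured to act when the compressed graph is queried. It determines the target node to be queried and the type of information queried for the target node, then looks up information of that type in the summary information according to a preset lookup method and returns it as the query result for the target node.
In a third aspect, an embodiment of the present application provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the graph compression-based graph database construction method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the graph compression-based graph database construction method according to the first aspect.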
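The embodiments of the present application provide a graph compression-based graph database construction method, apparatus, computer device and storage medium. The method comprises three steps. First, an original graph is obtained and compressed based on the topological structures of the graph to obtain a compressed graph. Second, summary information is computed on the compressed graph by enumeration; it includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs. Third, when the compressed graph is queried, the target node to be queried and the type of information queried for it are determined, and information of that type is looked up in the summary information according to a preset lookup method and returned as the query result for the target node. Through graph compression, the embodiments meet users' need to run multiple kinds of exact queries quickly and simultaneously on large graphs. By storing only one copy of the compressed graph structure together with summary information for different queries, the embodiments can support multiple kinds of exact queries simultaneously on a single compressed graph. Moreover, the compressed graph in the embodiments is lossless, that is, no information in the graph is lost.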
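Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a graph compression-based graph database construction method provided by an embodiment of the present application;
Fig. 2 is a schematic sub-flowchart of a graph compression-based graph database construction method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of an example compressed graph in a graph compression-based graph database construction method provided by an embodiment of the present application;
Fig. 4 is a schematic block diagram of a graph compression-based graph database construction apparatus provided by an embodiment of the present application;
Fig. 5 is a schematic sub-block diagram of a graph compression-based graph database construction apparatus provided by an embodiment of the present application.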
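Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
It should be understood that the terms "include" and "comprise", when used in this specification and the appended claims, indicate the presence of the described features, integers, steps, operations, elements and/or components. They do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing particular embodiments and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.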
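Referring to Fig. 1, Fig. 1 is a schematic flowchart of a graph compression-based graph database construction method provided by an embodiment of the present application, which specifically includes steps S101 to S103.
S101: obtain an original graph, and compress the original graph based on the topological structures of the graph to obtain a compressed graph;
S102: compute summary information on the compressed graph by enumeration, wherein the summary information includes the distances between nodes, the number of triangles formed between nodes, and the type of topological structure to which each node belongs;
S103: when the compressed graph is queried, determine the target node to be queried and the type of information queried for the target node. Then look up information of that type in the summary information according to a preset lookup method and return it as the query result for the target node.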
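In this embodiment, when the graph database is constructed, an original graph is first input and then compressed according to the topological structures of the graph. Here, "graph" in "topological structure of the graph" is used generically and does not refer specifically to the original graph of this embodiment. After the compressed graph is obtained, its summary information is computed by enumeration. For example, the distances between nodes in the compressed graph are enumerated, the triangles that its nodes can form are enumerated and counted, and the type of topological structure to which each node belongs is recorded. The computed summary information is then saved. When a node is queried, the summary information for the queried node (i.e. the target node) can be retrieved according to its position in the compressed graph, which completes the query task.
Through graph compression, this embodiment meets users' need to run multiple kinds of exact queries quickly and simultaneously on large graphs. By storing only one copy of the compressed graph structure together with summary information for different queries, this embodiment can support multiple kinds of exact queries simultaneously on a single compressed graph. Moreover, the compressed graph in this embodiment is losslessly compressed, that is, no information in the graph is lost. In the prior art, each class of queries requires computing different information on the graph. This embodiment instead records a certain amount of summary information in advance, so that queries can be answered while decompressing the graph as little as possible. Furthermore, this embodiment only needs to store one copy of the graph structure, instead of storing a separate copy of each kind of compressed graph as in the prior art, which greatly reduces space overhead.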
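In one embodiment, as shown in Fig. 2, step S101 includes steps S201 to S206.
S201: determine the expired data in the original graph using a preset timestamp threshold. Take the connected nodes in the expired data, as well as the nodes not connected to any other node, as first subgraphs, and form the first subgraphs into a first-type subgraph. Then compress each first subgraph in the first-type subgraph into a superpoint;
S202: determine, based on the original graph, the topological structure types to be compressed, wherein the topological structure types include clique structures, star structures and simple path structures;
S203: acquire second subgraphs in the original graph that conform to the topological structure types, so that the second-type subgraph of each topological structure type contains at least one second subgraph composed of nodes. Then determine whether the number of nodes in each second subgraph meets a preset subgraph node threshold;
S204: if the number of nodes in a second subgraph meets the preset subgraph node threshold, compress the second subgraph;
S205: map the compressed nodes in the original graph to their superpoints and map the uncompressed nodes to themselves, thereby obtaining the first function and the second function;
S206: construct the compressed graph from the first subgraphs, the second subgraphs, the first function and the second function.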
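Steps S202 to S204 require recognizing whether a candidate node set forms a clique, a star or a simple path, and whether its size lies within the node threshold. The text does not say how candidate sets are found, and finding large cliques in general is NP-hard, so a real system would need some search heuristic that is not shown here. The sketch below only checks a given candidate against the three shapes. The function names `classify`, `is_connected` and `within_bounds` are hypothetical.

Two further simplifications apply. A three-node path is also a star, and this sketch reports it as "star". Any restrictions on how a structure may connect to the rest of the graph are not enforced.

```python
from collections import deque
from typing import Optional


def is_connected(nodes: set, inner: set) -> bool:
    """BFS over the induced edges; True if every node in `nodes` is reached."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for e in inner:
            if u in e:
                for w in e - {u}:
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
    return seen == nodes


def classify(nodes: set, edges: set) -> Optional[str]:
    """Hypothetical helper: label a candidate subgraph as 'clique', 'star' or 'path'."""
    inner = {e for e in edges if e <= nodes}        # edges of the induced subgraph
    n, m = len(nodes), len(inner)
    if n < 3:
        return None                                 # too small to be a meaningful structure
    if m == n * (n - 1) // 2:
        return "clique"                             # every pair is adjacent
    if m == n - 1 and is_connected(nodes, inner):   # connected with n-1 edges: a tree
        max_deg = max(sum(1 for e in inner if v in e) for v in nodes)
        if max_deg == n - 1:
            return "star"                           # one center adjacent to all leaves
        if max_deg <= 2:
            return "path"                           # a tree with max degree 2 is a simple path
    return None


def within_bounds(nodes: set, k_l: int, k_u: int) -> bool:
    """Size check against the k_l..k_u subgraph node threshold."""
    return k_l <= len(nodes) <= k_u
```

Edges are unordered pairs stored as `frozenset`s, so the subset test `e <= nodes` selects exactly the edges induced by the candidate set.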
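In this embodiment, the input original graph G is obtained. A timestamp threshold t0 for identifying expired data is set, together with integers k_l and k_u such that each compressed subgraph has at least k_l and at most k_u nodes. The output consists of two functions, the first function f_C and the second function f_D. The first function f_C maps each node of the original graph G to the corresponding superpoint in the compressed graph, and the second function f_D maps each superpoint in the compressed graph to nodes of the original graph.
Specifically, the expired data is first identified according to the timestamp threshold. Connected nodes in the expired data are then taken as a first subgraph, and each single node not connected to any other node is also taken as a first subgraph. This forms a first-type subgraph that contains at least one first subgraph, and the size of each first subgraph is kept within the range k_l to k_u. Next, the topological structures in the original graph, including clique structures, star structures and simple path structures, are identified. Each such structure is compressed into a superpoint, again keeping the size of each compressed subgraph within the range k_l to k_u. This yields the compressed graph of the entire original graph. Compressed nodes are mapped to their superpoints and nodes that could not be compressed are mapped to themselves, which gives the first function f_C and the second function f_D. During compression, each compressed node is marked so that it will not be compressed again. In addition, a hash table is used to map compressed nodes to their superpoints.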
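A minimal sketch of the expired-data grouping in S201, with the following assumptions:

- "Expired" means T(v) < t0.
- The graph is given as an adjacency dict.
- Connected components of the expired nodes, including isolated expired nodes, become first subgraphs.
- The text requires each first subgraph to have between k_l and k_u nodes but does not say how out-of-range components are handled. This sketch simply leaves them uncompressed, which is an assumption.

`expired_subgraphs` is a hypothetical name.

```python
from collections import deque


def expired_subgraphs(T, adj, t0, k_l, k_u):
    """Group expired nodes (assumed: T(v) < t0) into connected first subgraphs."""
    expired = {v for v, ts in T.items() if ts < t0}
    groups, seen = [], set()
    for s in sorted(expired):          # sorted only to make the output order deterministic
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:                   # BFS that never leaves the expired nodes
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w in expired and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        if k_l <= len(comp) <= k_u:    # assumption: out-of-range groups stay uncompressed
            groups.append(comp)
    return groups
```

An isolated expired node forms a one-node component, so it becomes a first subgraph only when k_l is 1.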
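Finally, this embodiment constructs the compressed graph G_C. It contains every node in f_C(G), that is, all superpoints corresponding to compressed subgraphs together with the nodes that could not be compressed. Two nodes in G_C are connected by an edge if and only if the corresponding nodes are connected by an edge in the original graph G.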
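Steps S205 and S206 can be sketched as follows. The sketch assumes that each compressed group of nodes (from S201 or S204) is given as a set, and that superpoint identifiers such as `S0` are generated ad hoc; both the function name and the identifiers are hypothetical. A plain dict plays the role of the hash table that maps compressed nodes to superpoints, and raising on a repeated node mirrors the rule that a node is compressed at most once.

To keep the result lossless, the sketch stores two extra pieces of bookkeeping alongside G_C, which the text does not spell out:

- the edges hidden inside each superpoint;
- the original edges behind each compressed edge.

```python
def build_compressed_graph(V, E, groups):
    """V: nodes; E: iterable of (u, v) edges; groups: node sets chosen for compression."""
    f_C, f_D = {}, {}                     # node -> superpoint, superpoint -> its nodes
    for i, group in enumerate(groups):
        sp = f"S{i}"                      # hypothetical superpoint id
        f_D[sp] = frozenset(group)
        for v in group:
            if v in f_C:                  # already marked: never compress a node twice
                raise ValueError(f"{v} is already compressed")
            f_C[v] = sp
    for v in V:
        f_C.setdefault(v, v)              # uncompressed nodes map to themselves
    inner = {sp: set() for sp in f_D}     # edges hidden inside each superpoint
    E_C = {}                              # compressed edge -> original edges behind it
    for u, v in E:
        a, b = f_C[u], f_C[v]
        if a == b and a in inner:
            inner[a].add(frozenset((u, v)))
        else:
            E_C.setdefault(frozenset((a, b)), set()).add(frozenset((u, v)))
    return f_C, f_D, E_C, inner
```

The node set of G_C is `set(f_C.values())`. Every original edge lands either in exactly one superpoint's `inner` set or behind exactly one compressed edge, so the original graph can be reassembled from the result.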
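As shown in Fig. 3:
- the expired data includes f1, n1, l1 and i1;
- the clique structure includes k1, k2, k3, k4 and k5;
- the star structure includes u1, u2, u3, u4, u5, u6, u7, u8, u9 and u10;
- the simple path structure includes k6, k7, k8 and k9;
- t2 is a node that cannot be compressed.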
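In an embodiment, step S102 includes:
enumerating the distance between any two nodes in the obsolete data;
enumerating the distance between any two nodes in the clique structure;
recording the distance between the center node and each leaf node of the star structure, and enumerating the distance between any two leaf nodes;
recording the distance between every two adjacent nodes in the simple path structure, and, for any two non-adjacent nodes, summing the lengths of all edges between them;
recording the structure type and topological features of each node.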
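In this embodiment, for the shortest path problem, the possible path lengths between nodes inside the compressed graph are computed in advance. The shortest path between any two nodes can then be computed without decompressing supernodes. The precomputation depends on the structure:

* For obsolete data, the distance between any two nodes is enumerated directly.
* For a clique structure, the distance between any two nodes is enumerated; the shortest distance is the length of the edge connecting the two nodes.
* For a star structure, it suffices to record the distance between the center node and each leaf node. The distance between any two leaf nodes is obtained by combining these, that is, by adding the two leaves' distances to the center.
* For a simple path structure, the distance between every two adjacent nodes is recorded. The distance between two non-adjacent nodes is the sum of the lengths of all edges between them.

In addition, this embodiment records the structure type and topological features of each node. These can be used to inspect the structural information of a supernode and to query the connectivity of the subgraph inside a supernode.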
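The star and simple path rules above store far fewer values than full enumeration. A minimal sketch follows, assuming the distances are those measured inside the structure itself. `star_distance` combines two recorded center-to-leaf distances. The hypothetical `PathSummary` class stores the adjacent edge lengths as prefix sums. That is a choice made here, not stated in the source, so that summing all edges between two nodes becomes a single subtraction.

```python
from itertools import accumulate


def star_distance(center, leaf_dist, a, b):
    """Distance inside a star from the recorded center-to-leaf distances only.

    leaf_dist: {leaf: distance from center}. Two distinct leaves are joined
    only through the center, so their distance is the sum of both entries.
    """
    if a == b:
        return 0
    if a == center:
        return leaf_dist[b]
    if b == center:
        return leaf_dist[a]
    return leaf_dist[a] + leaf_dist[b]


class PathSummary:
    """Simple path v0 - v1 - ... - vn stored as its adjacent edge lengths.

    prefix[i] is the distance from v0 to vi, so the sum of every edge
    between two nodes is one subtraction.
    """

    def __init__(self, nodes, edge_lengths):
        if len(edge_lengths) != len(nodes) - 1:
            raise ValueError("a path of n nodes has n - 1 edges")
        self.pos = {v: i for i, v in enumerate(nodes)}
        self.prefix = [0, *accumulate(edge_lengths)]

    def distance(self, a, b):
        i, j = sorted((self.pos[a], self.pos[b]))
        return self.prefix[j] - self.prefix[i]
```

For example, `PathSummary(["k6", "k7", "k8", "k9"], [1, 2, 3]).distance("k6", "k9")` sums all three edges and returns 6.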
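With reference to FIG. 3, assume every edge has length 1. The distances for node v_{H1} are then dis = {(i_1, f_1, 1), (i_1, n_1, 1), (i_1, l_1, 1), (f_1, n_1, 2), (f_1, l_1, 2), (n_1, l_1, 2)}. The distances for node v_{H2} are dis = {(k_i, k_j, 1)}, where 1 ≤ i < j ≤ 5. Here, (i_1, f_1, 1) means that the distance between i_1 and f_1 is 1.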
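The dis set for v_{H1} can be reproduced by breadth-first search inside the obsolete component. The adjacency used here is an assumption read off the listed distance-1 pairs: i_1 is adjacent to each of f_1, n_1, and l_1, with no other edges among them. As in the example, every edge has length 1. Pairs are keyed by `frozenset` so that (u, v) and (v, u) are one entry. `pair_distances` is a hypothetical name.

```python
from collections import deque


def pair_distances(nodes, adj):
    """Enumerate hop distances between every pair of nodes inside one
    compressed subgraph (unit edge lengths, paths restricted to `nodes`)."""
    nodes = set(nodes)
    dis = {}
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u] & nodes:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for t, d in dist.items():
            if t != s:
                dis[frozenset((s, t))] = d  # (s, t) and (t, s) share one key
    return dis
```

Running it on the 5-node clique k_1 to k_5 gives 10 pairs, all at distance 1, which matches dis for v_{H2}.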
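In an embodiment, step S102 further includes:
enumerating each triangle in the obsolete data;
computing the number of triangles in the clique structure according to the following formula:
A = k(k-1)(k-2)/6
where A is the number of triangles in the clique structure and k is the number of nodes in the clique structure;
designating the nodes of the original graph that are not compressed as external nodes, and designating the nodes in the obsolete data, clique structures, star structures, and simple path structures that are connected to external nodes as boundary nodes;
enumerating the triangles formed between the boundary nodes and the external nodes.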
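In this embodiment, for the triangle counting problem, the number of triangles in each compressed subgraph is counted in advance. For obsolete data, every triangle in the obsolete data is enumerated. For a clique structure, the count is computed with the formula above. Star structures and simple path structures contain no triangles, because both are trees.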
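The clique formula is the number of ways to choose 3 of the k nodes, since every 3 nodes of a clique are pairwise adjacent. The sketch below checks A = k(k-1)(k-2)/6 against `math.comb(k, 3)` and against brute-force enumeration. It also confirms that a star and a simple path yield zero triangles. The edge lists are illustrative.

```python
from itertools import combinations
from math import comb


def clique_triangles(k):
    """A = k(k-1)(k-2)/6; the product of three consecutive integers is divisible by 6."""
    return k * (k - 1) * (k - 2) // 6


def brute_force_triangles(nodes, edges):
    """Count node triples whose three connecting edges are all present."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
    )


for k in range(8):
    clique_edges = list(combinations(range(k), 2))
    assert clique_triangles(k) == comb(k, 3) == brute_force_triangles(range(k), clique_edges)

star_edges = [(0, leaf) for leaf in range(1, 6)]  # center 0 with five leaves
path_edges = [(i, i + 1) for i in range(5)]       # 0-1-2-3-4-5
assert brute_force_triangles(range(6), star_edges) == 0
assert brute_force_triangles(range(6), path_edges) == 0
print(clique_triangles(5))  # prints 10, e.g. for the clique k_1..k_5
```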
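Besides the triangles inside compressed subgraphs, nodes in the compressed graph can also form triangles with nodes that were not compressed (the external nodes). Examples are a triangle of two nodes from compressed subgraphs and one external node, or a triangle of one node from a compressed subgraph and two external nodes. The boundary nodes of each structure are enumerated one by one to count such triangles:

* In the obsolete data, any boundary node may be part of such a triangle.
* In a clique structure, every node can be a boundary node.
* In a star structure, every node other than the center node is a boundary node.
* In a simple path structure, the two endpoints are the boundary nodes.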
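Every triangle counted in this step contains at least one external node, so it can be found by scanning pairs of neighbours of each external node. The sketch below is minimal and assumes two hypothetical inputs: an undirected adjacency map and a set `external` of uncompressed nodes. It keeps triangles with exactly one or two external vertices, that is, triangles that also involve a boundary node. Triangles whose three vertices are all compressed but lie in different subgraphs are outside the scope of this sketch.

```python
from itertools import combinations


def boundary_triangles(adj, external):
    """Triangles with one or two external (uncompressed) vertices.

    adj: undirected {node: set of neighbours}; external: set of uncompressed nodes.
    Every such triangle contains an external node x, so scanning pairs of
    x's neighbours finds it; frozensets remove duplicates found from another x.
    """
    found = set()
    for x in external:
        for u, v in combinations(sorted(adj[x]), 2):
            if v in adj[u]:
                tri = frozenset((x, u, v))
                if len(tri & external) <= 2:  # at least one compressed (boundary) vertex
                    found.add(tri)
    return found
```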
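In an embodiment, step S103 includes:
when the type of information queried for the target node is the shortest distance of the target node, obtaining from the summary information the distances between the target node and other nodes, and returning the smallest distance as the shortest path of the target node.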
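In this embodiment, the summary information of each node already records its distances to other nodes. When the shortest path is queried for a target node, it only needs to be read from the precomputed summary information and returned as the query result. The target may also consist of two nodes, that is, the query asks for the shortest path between two target nodes. In that case, the first function is first used to locate the two target nodes in the compressed graph. If the two target nodes are adjacent, the distance recorded in the summary information is returned directly as the shortest path. If they are not adjacent, the distances recorded between nodes in the summary information are combined, and the shortest one is selected as the final query result.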
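The following is a hedged sketch of the two lookups described above. It assumes the summary information is a dict keyed by node pairs, a hypothetical layout. `nearest_from_summary` follows the single-target rule and returns the smallest recorded distance involving the target. `pair_distance` covers only the case in which both targets fall into the same supernode. The embodiment's combination of recorded distances across supernodes is left out, and the function returns `None` in that case.

```python
def nearest_from_summary(summary, target):
    """Single-target rule: the smallest recorded distance involving `target`.

    summary: {(u, v): distance} (hypothetical layout of the summary information).
    Returns (other_node, distance), or None if nothing is recorded.
    """
    best = None
    for (u, v), d in summary.items():
        if target in (u, v):
            other = v if u == target else u
            if best is None or d < best[1]:
                best = (other, d)
    return best


def pair_distance(summary, f_c, a, b):
    """Two-target rule, same-supernode case only.

    Both targets are located with f_C; if they share a supernode the
    precomputed distance is returned, otherwise None (combining recorded
    distances across supernodes is outside this sketch).
    """
    if a == b:
        return 0
    if f_c.get(a, a) != f_c.get(b, b):
        return None
    return summary.get((a, b), summary.get((b, a)))
```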
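In an embodiment, step S103 further includes:
when the type of information queried for the target node is the number of triangles involving the target node, determining whether the target node is a compressed node;
if the target node is a compressed node, obtaining from the summary information the number of triangles formed by the target node with other nodes, and returning it as the number of triangles of the target node;
if the target node is an uncompressed node, enumerating the triangles involving the uncompressed node, and returning the enumeration result as the query result for the target node.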
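In this embodiment, the summary information of each node already records the number of triangles it forms with other nodes. When the triangle count is queried for a target node that is a compressed node, the count only needs to be read from the precomputed summary information and returned as the query result. If the target node is an uncompressed node, the triangles formed by the target node with other nodes still have to be enumerated, and the number enumerated is returned as the query result.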
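The branch between precomputed and enumerated counts can be sketched as follows. `precomputed` is a hypothetical dict from compressed nodes to the triangle counts stored in the summary information. For any other node, triangles are enumerated by checking which pairs of its neighbours are themselves adjacent. `triangle_count` is likewise a hypothetical name.

```python
def triangle_count(node, adj, precomputed):
    """Triangle query: precomputed count for compressed nodes, enumeration otherwise.

    precomputed: {compressed node: count stored in the summary information}.
    adj: undirected {node: set of neighbours} of the original graph.
    """
    if node in precomputed:
        return precomputed[node]
    nbrs = sorted(adj[node])
    # Each adjacent pair of neighbours closes exactly one triangle through `node`.
    return sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in adj[u])
```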
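With reference to FIG. 3, the compressible nodes form 14 triangles in total, such as (k_1, k_2, k_3), (k_3, k_4, k_5), and (k_1, k_5, u_6). For the uncompressible node t_2, 3 triangles can be enumerated: (u_6, t_2, k_1), (t_1, t_2, k_1), and (k_1, t_2, k_5).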
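In an embodiment, the graph-compression-based graph database construction method further includes:
collecting the updated nodes and edges in the original graph to obtain an update set;
updating the compressed graph and the summary information according to the update set.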
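In this embodiment, the graph database constructed by the above method also supports incremental computation. Incremental computation means that when information on the graph, such as nodes, edges, or attributes, changes, the graph database must be maintained and updated accordingly. In this embodiment, both the structure of the compressed graph and the precomputed summary information therefore need to be updated.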
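Specifically, the updated nodes, edges, and attributes of the original graph are first collected into an update set. If the update set contains attribute changes, the corresponding summary information, such as the distances between nodes, can be updated directly. Changes to nodes or edges may cause a compressed subgraph to no longer be a subgraph of the specified structure (a clique, a star, or a simple path). In that case, the subgraph must be decompressed and the changed nodes compressed again. This may produce a different compression, or a changed node may not be compressible at all and become an independent node. In addition, if a changed node lies in the obsolete data, it must be separated from the obsolete data and compressed according to the topological structure it belongs to.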
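One kind of update-set entry, the deletion of an edge, can be sketched as below. The sketch relies on a property of the three structures rather than on anything stated in the source. Deleting an edge inside a clique leaves it incomplete, and deleting an edge inside a star or a simple path (both trees) disconnects it, so any such deletion forces decompression. The function name and the dict-based f_C and f_D are hypothetical. Obsolete-data supernodes, which the embodiment handles differently, are assumed to be excluded from `f_d`.

```python
def delete_edge(u, v, adj, f_c, f_d):
    """Apply one edge deletion from the update set.

    adj: {node: set of neighbours}; f_c: {node: supernode} (uncompressed nodes may
    be absent); f_d: {supernode: set of member nodes}, assumed to hold only
    clique, star and simple path supernodes. Returns the members of a
    decompressed supernode (candidates for re-compression), or [].
    """
    adj[u].discard(v)
    adj[v].discard(u)
    s = f_c.get(u, u)
    if s == u or s != f_c.get(v, v):
        return []  # not an intra-supernode edge: only summaries need refreshing
    # A clique minus an edge is incomplete; a star or path minus an edge is a
    # disconnected tree. Either way the structure is gone, so decompress.
    members = f_d.pop(s)
    for w in members:
        f_c[w] = w  # every former member now maps to itself
    return sorted(members)
```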
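FIG. 4 is a schematic block diagram of a graph-compression-based graph database construction apparatus 400 provided by an embodiment of the present application. The apparatus 400 includes:
a first compression unit 401, configured to obtain an original graph and compress the original graph based on graph topology to obtain a compressed graph;
a summary information computation unit 402, configured to compute summary information for the compressed graph by enumeration, where the summary information includes the distances between nodes, the number of triangles formed by nodes, and the type of topological structure to which each node belongs;
a query unit 403, configured to, when the compressed graph is queried, determine the target node to be queried and the type of information queried for the target node, look up information of the corresponding type in the summary information according to a preset lookup method, and return it as the query result for the target node.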
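In an embodiment, as shown in FIG. 5, the first compression unit 401 includes:
an obsolete data determination unit 501, configured to use a preset timestamp threshold to identify obsolete data in the original graph, take each group of connected nodes in the obsolete data and each node not connected to any other node as a first subgraph, group the first subgraphs into a first class of subgraphs, and compress each first subgraph in the first class of subgraphs into a supernode;
a topological structure type determination unit 502, configured to determine, based on the original graph, the topological structure types to be compressed, where the topological structure types include clique structures, star structures, and simple path structures;
a subgraph acquisition unit 503, configured to obtain, from the original graph, second classes of subgraphs matching the topological structure types, such that the second class of subgraphs of each topological structure type contains at least one second subgraph composed of nodes, and to determine whether the number of nodes in each second subgraph satisfies a preset subgraph node threshold;
a second compression unit 504, configured to compress the second subgraph if the number of nodes in the second subgraph satisfies the preset subgraph node threshold;
a node mapping unit 505, configured to map each compressed node in the original graph to the supernode it was compressed into and to map each uncompressed node to itself, thereby obtaining a first function and a second function;
a compressed graph construction unit 506, configured to construct the compressed graph from the first subgraphs, the second subgraphs, the first function, and the second function.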
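In an embodiment, the summary information computation unit 402 includes:
a first obsolete data enumeration unit, configured to enumerate the distance between any two nodes in the obsolete data;
a clique structure enumeration unit, configured to enumerate the distance between any two nodes in the clique structure;
a star structure enumeration unit, configured to record the distance between the center node and each leaf node of the star structure and to enumerate the distance between any two leaf nodes;
a simple path structure recording unit, configured to record the distance between every two adjacent nodes in the simple path structure and, for any two non-adjacent nodes, to sum the lengths of all edges between them;
a topological feature recording unit, configured to record the structure type and topological features of each node.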
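In an embodiment, the summary information computation unit 402 further includes:
a second obsolete data enumeration unit, configured to enumerate each triangle in the obsolete data;
a clique structure computation unit, configured to compute the number of triangles in the clique structure according to the following formula:
A = k(k-1)(k-2)/6
where A is the number of triangles in the clique structure and k is the number of nodes in the clique structure;
a node designation unit, configured to designate the nodes of the original graph that are not compressed as external nodes, and to designate the nodes in the obsolete data, clique structures, star structures, and simple path structures that are connected to external nodes as boundary nodes;
a node enumeration unit, configured to enumerate the triangles formed between the boundary nodes and the external nodes.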
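In an embodiment, the query unit 403 includes:
a shortest path query unit, configured to, when the type of information queried for the target node is the shortest distance of the target node, obtain from the summary information the distances between the target node and other nodes, and return the smallest distance as the shortest path of the target node.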
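In an embodiment, the query unit 403 further includes:
a node judgment unit, configured to determine, when the type of information queried for the target node is the number of triangles involving the target node, whether the target node is a compressed node;
a first query return unit, configured to, if the target node is a compressed node, obtain from the summary information the number of triangles formed by the target node with other nodes and return it as the number of triangles of the target node;
a second query return unit, configured to, if the target node is an uncompressed node, enumerate the triangles involving the uncompressed node and return the enumeration result as the query result for the target node.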
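In an embodiment, the graph-compression-based graph database construction apparatus further includes:
an update statistics unit, configured to collect the updated nodes and edges in the original graph to obtain an update set;
an update unit, configured to update the compressed graph and the summary information according to the update set.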
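Since the apparatus embodiments correspond to the method embodiments, refer to the description of the method embodiments for the apparatus embodiments; details are not repeated here.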
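An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed, implements the steps provided in the above embodiments. The storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.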
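An embodiment of the present application further provides a computer device, which may include a memory and a processor. A computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided in the above embodiments are implemented. The computer device may of course also include components such as various network interfaces and a power supply.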
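The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts can be found in the description of the method. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present application.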
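It should also be noted that in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.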

Claims (10)

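  1. A graph-compression-based graph database construction method, characterized by comprising:
    obtaining an original graph, and compressing the original graph based on graph topology to obtain a compressed graph;
    computing summary information for the compressed graph by enumeration, wherein the summary information comprises the distances between nodes, the number of triangles formed by nodes, and the type of topological structure to which each node belongs;
    when the compressed graph is queried, determining a target node to be queried and the type of information queried for the target node, looking up information of the corresponding type in the summary information according to a preset lookup method, and returning it as the query result for the target node.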
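  2. The graph-compression-based graph database construction method according to claim 1, characterized in that obtaining the original graph and compressing the original graph based on graph topology to obtain the compressed graph comprises:
    using a preset timestamp threshold to identify obsolete data in the original graph, taking each group of connected nodes in the obsolete data and each node not connected to any other node as a first subgraph, grouping the first subgraphs into a first class of subgraphs, and compressing each first subgraph in the first class of subgraphs into a supernode;
    determining, based on the original graph, the topological structure types to be compressed, wherein the topological structure types comprise clique structures, star structures, and simple path structures;
    obtaining, from the original graph, second classes of subgraphs matching the topological structure types, such that the second class of subgraphs of each topological structure type contains at least one second subgraph composed of nodes, and determining whether the number of nodes in each second subgraph satisfies a preset subgraph node threshold;
    if the number of nodes in the second subgraph satisfies the preset subgraph node threshold, compressing the second subgraph;
    mapping each compressed node in the original graph to the supernode it was compressed into, and mapping each uncompressed node to itself, thereby obtaining a first function and a second function;
    constructing the compressed graph from the first subgraphs, the second subgraphs, the first function, and the second function.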
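  3. The graph-compression-based graph database construction method according to claim 2, characterized in that computing summary information for the compressed graph by enumeration comprises:
    enumerating the distance between any two nodes in the obsolete data;
    enumerating the distance between any two nodes in the clique structure;
    recording the distance between the center node and each leaf node of the star structure, and enumerating the distance between any two leaf nodes;
    recording the distance between every two adjacent nodes in the simple path structure, and, for any two non-adjacent nodes, summing the lengths of all edges between them;
    recording the structure type and topological features of each node.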
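  4. The graph-compression-based graph database construction method according to claim 2, characterized in that computing summary information for the compressed graph by enumeration further comprises:
    enumerating each triangle in the obsolete data;
    computing the number of triangles in the clique structure according to the following formula:
    A = k(k-1)(k-2)/6
    where A is the number of triangles in the clique structure and k is the number of nodes in the clique structure;
    designating the nodes of the original graph that are not compressed as external nodes, and designating the nodes in the obsolete data, clique structures, star structures, and simple path structures that are connected to external nodes as boundary nodes;
    enumerating the triangles formed between the boundary nodes and the external nodes.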
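  5. The graph-compression-based graph database construction method according to claim 3, characterized in that the step of, when the compressed graph is queried, determining the target node to be queried and the type of information queried for the target node, looking up information of the corresponding type in the summary information according to the preset lookup method, and returning it as the query result for the target node comprises:
    when the type of information queried for the target node is the shortest distance of the target node, obtaining from the summary information the distances between the target node and other nodes, and returning the smallest distance as the shortest path of the target node.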
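  6. The graph-compression-based graph database construction method according to claim 4, characterized in that the step of, when the compressed graph is queried, determining the target node to be queried and the type of information queried for the target node, looking up information of the corresponding type in the summary information according to the preset lookup method, and returning it as the query result for the target node further comprises:
    when the type of information queried for the target node is the number of triangles involving the target node, determining whether the target node is a compressed node;
    if the target node is a compressed node, obtaining from the summary information the number of triangles formed by the target node with other nodes, and returning it as the number of triangles of the target node;
    if the target node is an uncompressed node, enumerating the triangles involving the uncompressed node, and returning the enumeration result as the query result for the target node.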
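  7. The graph-compression-based graph database construction method according to claim 1, characterized by further comprising:
    collecting the updated nodes and edges in the original graph to obtain an update set;
    updating the compressed graph and the summary information according to the update set.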
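  8. A graph-compression-based graph database construction apparatus, characterized by comprising:
    a first compression unit, configured to obtain an original graph and compress the original graph based on graph topology to obtain a compressed graph;
    a summary information computation unit, configured to compute summary information for the compressed graph by enumeration, wherein the summary information comprises the distances between nodes, the number of triangles formed by nodes, and the type of topological structure to which each node belongs;
    a query unit, configured to, when the compressed graph is queried, determine a target node to be queried and the type of information queried for the target node, look up information of the corresponding type in the summary information according to a preset lookup method, and return it as the query result for the target node.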
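  9. A computer device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the graph-compression-based graph database construction method according to any one of claims 1 to 7.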
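  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the graph-compression-based graph database construction method according to any one of claims 1 to 7.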
PCT/CN2021/096278 2021-05-17 2021-05-27 Graph database construction method and apparatus based on graph compression, and related components WO2022241813A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110536533.1 2021-05-17
CN202110536533.1A CN113190720B (zh) 2021-05-17 2021-05-17 Graph database construction method and apparatus based on graph compression, and related components

Publications (1)

Publication Number Publication Date
WO2022241813A1 true WO2022241813A1 (zh) 2022-11-24

Family

ID=76982203

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096278 WO2022241813A1 (zh) 2021-05-17 2021-05-27 Graph database construction method and apparatus based on graph compression, and related components

Country Status (2)

Country Link
CN (1) CN113190720B (zh)
WO (1) WO2022241813A1 (zh)

Also Published As

Publication number Publication date
CN113190720A (zh) 2021-07-30
CN113190720B (zh) 2023-01-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21940266

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE