CN114138776A - Method, system, apparatus and medium for graph structure and graph attribute separation design - Google Patents


Info

Publication number
CN114138776A
Authority
CN
China
Prior art keywords
graph
graph structure
engine
caching
storage
Prior art date
Legal status
Pending
Application number
CN202111291714.9A
Other languages
Chinese (zh)
Inventor
吴敏
叶小萌
周瑶
Current Assignee
Hangzhou Ouruozhi Technology Co ltd
Original Assignee
Hangzhou Ouruozhi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Ouruozhi Technology Co ltd
Priority to CN202111291714.9A
Publication of CN114138776A
Priority to US17/977,226 (published as US20230140423A1)
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24539 Query rewriting; Transformation using cached or materialised query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method, a system, a device and a medium for separating and designing a graph structure and graph attributes. The method comprises the following steps: performing Key-Value separation in a storage engine so that the graph structure and the graph attributes of graph data are stored separately; and reading the graph structure by a compute engine, which caches the graph structure in a short-term caching mode in a single query request scenario and in a long-term caching mode in a read-only analysis scenario. The method and device address the high engineering complexity and poor graph depth-traversal performance encountered when processing graph data, reducing engineering complexity and improving graph depth-traversal performance.

Description

Method, system, apparatus and medium for graph structure and graph attribute separation design
Technical Field
The present application relates to the field of graph databases, and more particularly, to methods, systems, apparatus, and media for graph structure and graph attribute separation design.
Background
A graph database is a database for processing graph data structures and can complete graph traversal operations within milliseconds. A graph data structure consists of points (vertices) and edges; the attributes on points and the attributes on edges together form the graph attributes. For example, in the equity relationship (company)<-[invest]-(stockholder), company and stockholder are points, [invest] is an edge, and the edge [invest] has the attribute "investment amount". Querying all stockholders of a company requires only the graph structure, not the graph attributes; querying the largest stockholder of a company requires both the graph structure and the graph attributes. Because every query that uses graph attributes must also traverse the graph structure to reach them, while many queries use the structure alone, the graph structure is necessarily accessed more often than the graph attributes, and accelerating the graph structure is therefore more important than accelerating the graph attributes.
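The access-count argument above can be illustrated with a toy model. The following Python sketch uses hypothetical data (the company and stockholder names and the amounts are made up): it keeps the graph structure and the graph attributes in separate maps and records which of the two each query touches.

```python
from collections import Counter

# Hypothetical toy data: graph structure and graph attributes kept apart.
# Structure: company -> stockholders that have an [invest] edge into it.
STRUCTURE = {"CompanyA": ["Alice", "Bob", "FundX"]}
# Attributes: (stockholder, company) edge -> "investment amount".
ATTRIBUTES = {
    ("Alice", "CompanyA"): 300,
    ("Bob", "CompanyA"): 500,
    ("FundX", "CompanyA"): 200,
}

touched = Counter()  # how many queries touched the structure / the attributes


def all_stockholders(company):
    """Structure-only query: no attribute is read."""
    touched["structure"] += 1
    return list(STRUCTURE.get(company, []))


def largest_stockholder(company):
    """Needs both: walk the edges first, then read each edge's amount."""
    holders = all_stockholders(company)  # counts one structure access
    touched["attribute"] += 1
    return max(holders, key=lambda h: ATTRIBUTES[(h, company)])
```

Because the attribute query has to call the structure query first, the structure counter can never fall below the attribute counter; after running each query once, the structure has been touched twice and the attributes once.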
In the related art, there are generally two design approaches for the graph structure and graph attributes of graph data. In the first, exemplified by the native graph of Neo4j, both the graph structure and the graph attributes are fully customized, and the compute engine and the storage engine run in the same process; the graph structure uses a doubly linked list and the graph attributes use a singly linked list. The advantage is that fetching the first adjacent edge of a point is extremely fast; the disadvantage is high engineering complexity.
In the second, exemplified by JanusGraph and Nebula Graph, the compute engine and the storage engine run in separate processes, and the storage engine stores the graph structure and the graph attributes in a Key-Value structure. The advantage is lower engineering complexity, which suits the construction of large-scale distributed systems; the disadvantage is poor performance on deep traversals. This is partly because such a split architecture does not cache or accelerate graph data hierarchically, so data must be exchanged frequently between the two engines: for example, the storage engine accesses unoptimized graph topology data and the compute engine reads uncached graph topology data. As a result, existing split architectures have poor depth-traversal performance.
At present, the related art offers no effective solution to the high engineering complexity and poor graph depth-traversal performance of graph data processing.
Disclosure of Invention
The embodiment of the application provides a method, a system, a device and a medium for separating and designing a graph structure and graph attributes, so as to solve the problems of complex engineering and poor graph depth traversal performance when graph data is processed in the related art.
In a first aspect, an embodiment of the present application provides a method for separating and designing a graph structure and graph attributes. The method is applied to a system in a graph database that implements the separate design of the graph structure and the graph attributes, the system comprising a storage engine and a compute engine, and the method comprises:
performing Key-Value separation in the storage engine, and storing the graph structure and the graph attributes of the graph data separately;
and reading, by the compute engine, the graph structure, caching it in a short-term caching mode in a single query request scenario and in a long-term caching mode in a read-only analysis scenario.
In some of these embodiments, performing Key-Value separation in the storage engine includes:
adding four dual read interfaces to the storage engine to acquire only the Key part of the data stored in the Key-Value structure.
In some embodiments, caching the graph structure in the short-term caching mode includes:
starting from the first clause of a query statement, determining the required part of the graph structure from the execution plan, and checking whether that part is cached in the adjacency list of the current process;
if it is not cached, requesting and obtaining edges from the storage engine and adding the corresponding linked list to the adjacency list; if it is cached, obtaining the required part of the graph structure directly from the adjacency list;
and, after the query statement has been executed, releasing the cached data of the adjacency list.
In some of these embodiments, requesting and obtaining edges from the storage engine comprises:
the compute engine actively sending an RPC request to the storage engine, or the storage engine returning the edges through read-ahead loading.
In some embodiments, caching the graph structure in the long-term caching mode includes:
caching the graph structure as a CSR (compressed sparse row) structure.
In some embodiments, caching the graph structure as a CSR includes:
in the storage engine, counting and aggregating the data of each storage partition of the graph structure, performing point encoding and edge encoding on the data, and storing the results in the storage engine;
and scanning all mapping relations and all edges in the Key-Value store of each storage partition, running the corresponding concurrent tasks and subtasks to generate a CSR, and caching the CSR in the compute engine.
In some embodiments, after the CSR has been generated and cached in the compute engine, the method includes:
deleting, via a delete instruction, the graph topology cache in the compute engine and the persisted encoding mapping in the storage engine.
In a second aspect, an embodiment of the present application provides a system for separating and designing a graph structure and a graph attribute, where the system includes:
a storage engine, which performs Key-Value separation and stores the graph structure and the graph attributes of the graph data separately;
and a compute engine, which reads the graph structure, caching it in a short-term caching mode in a single query request scenario and in a long-term caching mode in a read-only analysis scenario.
In a third aspect, an embodiment of the present application provides an electronic apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for graph structure and graph attribute separation design according to the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for graph structure and graph attribute separation design as described in the first aspect above.
Compared with the related art, the method provided by the embodiments of the application is applied to a system in a graph database that implements the separate design of the graph structure and the graph attributes, and the system comprises a storage engine and a compute engine. Specifically, Key-Value separation is performed in the storage engine so that the graph structure and the graph attributes of graph data are stored separately; the compute engine reads the graph structure, caching it in a short-term caching mode in a single query request scenario and in a long-term caching mode in a read-only analysis scenario.
In the application, the storage engine uses a Key-Value separation scheme to physically separate the graph structure from the graph attributes, which speeds up reading the graph structure from disk and memory. In addition, the compute engine designs the graph structure cache with long-term and short-term caching modes. The advantages are: (1) lower latency for graph queries and graph computation; (2) more reusable engineering code, less development effort, and lower development risk; (3) better compatibility: both historical query statements and historical data files remain compatible, and no changes are needed on the user side.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method of graph structure and graph property separation design according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a Key-Value separation structure according to an embodiment of the present application;
FIG. 3 is a task flow diagram for implementing CSR graph structure caching according to an embodiment of the present application;
FIG. 4 is a block diagram of a system for graph structure and graph property separation design according to an embodiment of the present application;
FIG. 5 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. Words such as "a," "an," and "the" used throughout this application do not limit number and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
An embodiment of the present application provides a method for separating and designing a graph structure and graph attributes. FIG. 1 is a flowchart of this method according to an embodiment of the present application; as shown in FIG. 1, the flow includes the following steps:
Step S101: perform Key-Value separation in the storage engine and store the graph structure and the graph attributes of graph data separately. The separation design of the storage engine is as follows:
FIG. 2 is a schematic diagram of a Key-Value separation structure according to an embodiment of the present application. As shown in FIG. 2, this embodiment adopts a Key-Value separation scheme: all keys are aggregated into a smaller LSM-tree, and all values are aggregated into a Value Log file in an append-only manner. The addr part of each <key, addr> in the LSM-tree points to an offset and length in the corresponding Value Log file. Both the read interface and the write interface are partially adjusted. The form of the write interface is unchanged, but storage is split into two parts, an LSM-tree part and a Value Log part. The form of the read interface is also unchanged, but reading proceeds in steps: first, the LSM-tree on the left is read to obtain <key, addr>, i.e., the addr of the data to be read; then the corresponding offset and length in the Value Log are obtained from addr and the real value is fetched; finally, the fetched value is spliced with the key to form <key, value>, which is returned;
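The two-step read and the split write described above can be sketched in memory as follows. This is a minimal illustration, not Nebula Graph's storage code: the class name KVSeparatedStore and its methods are hypothetical, a sorted Python list stands in for the LSM-tree, and a bytearray stands in for the append-only Value Log.

```python
import bisect


class KVSeparatedStore:
    """Hypothetical sketch of Key-Value separation (not a real storage engine)."""

    def __init__(self):
        self.keys = []            # sorted keys: stand-in for the LSM-tree
        self.addr = {}            # key -> (offset, length) into the Value Log
        self.vlog = bytearray()   # append-only Value Log

    def put(self, key: bytes, value: bytes) -> None:
        # Write path: append the value to the log, then record <key, addr>.
        offset = len(self.vlog)
        self.vlog += value
        if key not in self.addr:
            bisect.insort(self.keys, key)
        self.addr[key] = (offset, len(value))

    def get(self, key: bytes):
        # Read step 1: look up <key, addr> on the key side.
        if key not in self.addr:
            return None
        offset, length = self.addr[key]
        # Read step 2: fetch the real value from the Value Log by offset/length.
        value = bytes(self.vlog[offset:offset + length])
        # Read step 3: splice key and value back into <key, value>.
        return key, value
```

Overwriting a key only appends; the old bytes stay in the log as garbage, which is what the compaction policy in the next paragraph reclaims.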
Furthermore, the separate design of this embodiment customizes the compact() interface: when garbage data reaches a set proportion, for example 30%, a full compact() is triggered; otherwise, by default only the LSM-tree part is compacted. This greatly reduces the disk read/write amplification caused by compact();
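The compaction policy can be sketched as follows. This Python fragment is an illustration under assumptions: the function names are hypothetical, "live bytes" means Value Log bytes still referenced by some <key, addr>, the 30% figure is the example threshold from the text, and the Value Log is modelled as a plain bytes object.

```python
# Example threshold from the text: full compaction once 30% of the log is garbage.
GARBAGE_PERCENT_THRESHOLD = 30


def choose_compaction(live_bytes: int, total_bytes: int,
                      threshold_pct: int = GARBAGE_PERCENT_THRESHOLD) -> str:
    """Return "full" (LSM-tree + Value Log) or "lsm_only" (the default)."""
    if total_bytes == 0:
        return "lsm_only"
    garbage_bytes = total_bytes - live_bytes
    # Integer comparison avoids floating-point edge cases at exactly 30%.
    if garbage_bytes * 100 >= threshold_pct * total_bytes:
        return "full"
    return "lsm_only"


def compact_value_log(addr: dict, vlog: bytes):
    """Full-compaction sketch: copy only live values into a new log, rewrite addrs."""
    new_log = bytearray()
    new_addr = {}
    for key in sorted(addr):
        offset, length = addr[key]
        new_addr[key] = (len(new_log), length)
        new_log += vlog[offset:offset + length]
    return new_addr, bytes(new_log)
```

Keeping the default path to the small key side is what limits amplification: the large Value Log is rewritten only when enough of it is dead.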
After Key-Value separation, the block cache is changed to cache only the data of the Key part. To migrate existing historical data files, the existing <key, value> data can be read one record at a time by a batch task or a customized compact task and rewritten in the LSM-tree + Value Log format;
In addition, in this embodiment four dual read interfaces are added to the storage engine to obtain only the Key part of the data stored in the Key-Value structure. The four interfaces are GetKey(), MultiGetKey(), RangeKey(), and PrefixKey(). They acquire only the Key part of the stored data, not the Value part. This means that the four new interfaces only need to access the LSM-tree part on the left of FIG. 2, not the Value Log part on the right; in terms of disk operations, only the LSM-tree files are read and the Value Log files are not. Because keys occupy much less space than values, the amount of data loaded from disk into memory is greatly reduced, and latency drops accordingly.
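A sketch of what the four key-only interfaces could look like over the key side. The Python method names mirror GetKey(), MultiGetKey(), RangeKey(), and PrefixKey(), but their signatures are assumptions, and the key layout b"e:<src>:<dst>" / b"v:<id>" is hypothetical. It assumes, as the separation implies, that an edge key encodes its source and destination, so adjacency can be answered from keys alone.

```python
import bisect


class KeyOnlyIndex:
    """Hypothetical key-only read interfaces; values are never touched."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))  # stand-in for the LSM-tree key side

    def get_key(self, key):
        """GetKey(): return the key if it exists, else None."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return key
        return None

    def multi_get_key(self, keys):
        """MultiGetKey(): batched GetKey()."""
        return [self.get_key(k) for k in keys]

    def range_key(self, start, end):
        """RangeKey(): all keys in [start, end)."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return self.keys[lo:hi]

    def prefix_key(self, prefix):
        """PrefixKey(): all keys starting with prefix, e.g. all out-edges of a point."""
        out = []
        for k in self.keys[bisect.bisect_left(self.keys, prefix):]:
            if not k.startswith(prefix):
                break
            out.append(k)
        return out
```

For example, with keys for the edges 0->1, 0->4, and 1->0, prefix_key(b"e:0:") returns the two out-edges of point 0 without reading any Value Log bytes.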
By using the Key-Value separation scheme, the storage engine part of this embodiment physically separates the graph structure from the graph attributes, which effectively increases the speed of reading the graph structure from disk and memory;
Step S102: the compute engine reads the graph structure, caching it in a short-term caching mode in a single query request scenario and in a long-term caching mode in a read-only analysis scenario. The single-query-level graph structure cache means that, in a compound statement or a single query statement composed of multiple clauses, the graph structure acquired by an earlier clause can be reused by later clauses until the statement completes, at which point the cache is released; the cached graph structure is not shared between different query statements. The read-only analysis scenario means that an administrator generates a graph cache by issuing a command, and the cache can be used by any query statement until the administrator deletes it. In practice, the short-term and long-term caching modes can be combined: for example, the cache lifetime of the short-term mode can be made controllable or configurable by an administrator, and the cache lifecycle of the long-term mode can also be initiated by certain specific query statements.
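The two cache lifecycles can be contrasted in a few lines. This is a hypothetical sketch, not the engine's API: query_scoped_cache and LongTermGraphCaches are illustrative names, and a plain dict stands in for the cached graph structure.

```python
from contextlib import contextmanager


@contextmanager
def query_scoped_cache():
    """Short-term mode: one cache per statement, shared by its clauses."""
    cache = {}
    try:
        yield cache
    finally:
        cache.clear()  # released when the statement finishes, even on error


class LongTermGraphCaches:
    """Long-term mode: caches created and dropped explicitly by an administrator."""

    def __init__(self):
        self._caches = {}

    def create(self, name, graph):
        self._caches[name] = graph

    def get(self, name):
        # Any query statement may read an existing long-term cache.
        return self._caches.get(name)

    def drop(self, name):
        self._caches.pop(name, None)
```

The context manager ties the short-term cache to the statement's scope, which is why it cannot leak into another statement; the long-term registry has no such scope and lives until drop() is called.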
Preferably, in this embodiment, in a single query request scenario, an adjacency list is used for graph structure caching. The specific process is as follows:
S1: starting from the first clause of the query statement, determine the required part of the graph structure from the execution plan;
S2: check whether the required part of the graph structure is cached in the adjacency list of the current process.
If it is not cached, request and obtain edges from the storage engine, adding a corresponding linked list to the adjacency list each time one or more edges are obtained. Preferably, the edges are obtained either by the compute engine actively sending an RPC request to the storage engine, or by the storage engine returning them through read-ahead loading. For example, the compute engine requests only the edge 0->1, but because of its storage locality the storage engine additionally returns other edges adjacent to point 0, such as 0->4 and 1->0;
If it is cached, the compute engine obtains the required part of the graph structure directly from the adjacency list;
S3: after the whole statement has finished, release the cached data of the adjacency list.
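Steps S1 to S3 can be sketched as a per-statement adjacency-list cache. In this Python sketch the class QueryAdjacencyCache is hypothetical, and fetch_edges(v) is an assumed stand-in for the storage-engine RPC that returns all out-edges of v plus, possibly, extra read-ahead edges.

```python
class QueryAdjacencyCache:
    """Hypothetical short-term adjacency-list cache for one statement."""

    def __init__(self, fetch_edges):
        self.fetch_edges = fetch_edges  # assumed RPC stand-in: v -> [(src, dst), ...]
        self.adj = {}                   # src -> list of dst (one list per point)
        self.complete = set()           # points whose full out-edge list is cached
        self.rpc_count = 0

    def neighbors(self, v):
        if v not in self.complete:      # S2: not cached, ask the storage engine
            self.rpc_count += 1
            for src, dst in self.fetch_edges(v):
                lst = self.adj.setdefault(src, [])
                if dst not in lst:      # read-ahead may repeat known edges
                    lst.append(dst)
            self.complete.add(v)
        return self.adj.get(v, [])      # S2: served from the adjacency list

    def release(self):                  # S3: statement finished
        self.adj.clear()
        self.complete.clear()
```

The separate complete set is a design choice made for the sketch: a read-ahead edge such as 1->0, returned while fetching point 0, may be only part of point 1's adjacency, so point 1 is still fetched when it is first needed and the extra edge is merely merged.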
When implementing this graph structure cache, a concurrent adjacency list or a similar structure that supports concurrency can be used to improve multithreaded concurrency. The compute engine does not need to load the whole graph, which effectively saves memory.
For example, consider a statement with 5 subqueries executed in the order subquery 1, 2, 3, 4, 5. Subquery 1 performs a 6-layer breadth-first expansion starting from the point "fludamreyhkyxgvqdssf": layer 1 yields the adjacent points of this point, layer 2 yields the adjacent points of those adjacent points, and so on, giving all points within 6 layers. Subquery 3 then takes these points and finds all path chains from the start point to them. Finally, subquery 5 counts how many of these paths connect the point "fludamreyhkyxgvqdssf" with the expanded points. A real-world analogue is asking how many different friend chains connect two distant acquaintances. The code is as follows:
GO 6 STEPS FROM "fludamreyhkyxgvqdssf" OVER * YIELD RM10011889._dst AS dst | FIND ALL PATH FROM "fludamreyhkyxgvqdssf" TO $-.dst OVER * UPTO 6 STEPS | YIELD COUNT(*)
// Subquery 1: GO 6 STEPS ... obtains the points within 6 steps of "fludamreyhkyxgvqdssf".
// Subquery 2: the first pipe (|) passes the result of subquery 1 to subquery 3.
// Subquery 3: FIND ALL PATH ... finds all path chains from "fludamreyhkyxgvqdssf" to those points.
// Subquery 4: the second pipe (|) passes the result of subquery 3 to subquery 5.
// Subquery 5: YIELD COUNT(*) counts the paths found.
In some typical large-scale scenarios, before the graph structure is cached, the 6-layer BFS may involve about 370,000 concurrent RPC requests and produce about 3,000,000 paths in total, a very large amount of data. Caching only the graph structure in the compute engine can cut RPC communication latency roughly in half, improving efficiency.
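Why caching helps here can be shown with a toy simulation of the two phases (expansion, then path search) sharing one statement-scoped cache. This is not a measurement: each uncached neighbour fetch is counted as one simulated RPC, the function names are hypothetical, and counting only loop-free paths is a simplifying assumption that may differ from FIND ALL PATH semantics.

```python
def bfs_reach(neighbors, start, steps):
    """Points reachable within `steps` hops (like GO N STEPS, subquery 1)."""
    seen, frontier = {start}, [start]
    for _ in range(steps):
        nxt = []
        for v in frontier:
            for w in neighbors(v):
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return seen


def count_paths(neighbors, start, targets, max_len):
    """Count loop-free paths of at most max_len edges from start to any target."""
    count = 0

    def dfs(v, depth, on_path):
        nonlocal count
        if depth == max_len:
            return
        for w in neighbors(v):
            if w in on_path:
                continue
            if w in targets:
                count += 1
            on_path.add(w)
            dfs(w, depth + 1, on_path)
            on_path.remove(w)

    dfs(start, 0, {start})
    return count


def run_statement(graph, start, steps, use_cache):
    """Run both phases; return (paths, simulated RPCs to the storage engine)."""
    rpcs = 0
    cache = {}

    def neighbors(v):
        nonlocal rpcs
        if use_cache and v in cache:
            return cache[v]
        rpcs += 1  # one simulated RPC per uncached fetch
        out = graph.get(v, [])
        if use_cache:
            cache[v] = out
        return out

    targets = bfs_reach(neighbors, start, steps) - {start}
    return count_paths(neighbors, start, targets, steps), rpcs
```

On the toy graph {0: [1, 2], 1: [2], 2: [0]} with 2 steps from point 0, both runs find 3 paths, but the cached run needs 3 fetches instead of 6, because the path phase revisits exactly the points the expansion phase already loaded. In general the saving depends on how much the two phases overlap.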
Preferably, in this embodiment, a CSR (compressed sparse row) structure is used for graph structure caching in the read-only analysis scenario. The specific process is as follows:
S1: in the storage engine, count and aggregate the data of each storage partition of the graph structure, perform point encoding and edge encoding on the data, and store the results in the storage engine;
S2: scan all mapping relations and all edges in each storage partition, run the corresponding concurrent tasks and subtasks to generate a CSR, and cache the CSR in the compute engine.
In some embodiments, after the CSR has been generated and cached in the compute engine, a delete instruction is sent to delete the graph topology cache in the compute engine and the persisted encoding mapping in the storage engine.
It should be noted that, because step S101 separates keys from values and thereby separates the graph structure from the graph attributes, the CSR encoding process of this embodiment can be greatly accelerated.
Specifically, this embodiment takes encoding student vertex IDs (VIDs) into int32 IDs as an example. FIG. 3 is a task flow diagram for implementing the CSR graph structure cache according to an embodiment of the present application. As shown in FIG. 3, the process is as follows:
S1: run statistics task 1 in Nebula Graph. Statistics task 1 comprises concurrent subtask 1.1 and aggregation subtask 1.2:
Concurrent subtask 1.1 counts the number of VIDs in each storage partition of the storage engine; the counts are denoted P1, P2, ..., PN. The number of concurrent subtasks equals the number of partitions. This corresponds to one scan of the LSM-tree in each partition.
Aggregation subtask 1.2 sums the VID counts across the whole storage engine. It only needs to run in a single process.
In a concrete implementation, concurrent subtask 1.1 can reuse much of the existing code of Nebula Graph's submit job stats function;
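Statistics task 1 can be sketched as a parallel map followed by a single sum. In this Python sketch, partitions are modelled as in-memory lists of VIDs and threads stand in for the distributed concurrent subtasks; the function names are hypothetical and this is not Nebula Graph's job framework.

```python
from concurrent.futures import ThreadPoolExecutor


def count_vids(partition_vids):
    """Concurrent subtask 1.1: one scan of a partition, counting its VIDs (P_i)."""
    return sum(1 for _ in partition_vids)


def stats_task(partitions):
    """Statistics task 1: per-partition counts in parallel, then a single-process sum."""
    workers = max(1, len(partitions))  # one subtask per partition
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(count_vids, partitions))  # P1, P2, ..., PN
    total = sum(counts)  # aggregation subtask 1.2
    return counts, total
```

pool.map preserves partition order, so counts[i] is the P value of partition i + 1, which the encoding task relies on.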
S2: run encoding task 2 in Nebula Graph to perform point encoding. Encoding task 2 comprises aggregation subtask 2.1 and concurrent subtask 2.2:
Aggregation subtask 2.1 assigns a contiguous ID range [begin, end] to each partition: partition 1 is assigned [1, P1], partition 2 is assigned [P1+1, P1+P2], partition 3 is assigned [P1+P2+1, P1+P2+P3], and so on; in general, partition k is assigned [P1+...+P(k-1)+1, P1+...+Pk].
Concurrent subtask 2.2 scans each partition. For the first VID scanned, VID1, it records two <key, value> pairs, <begin, VID1> and <VID1, begin>, and writes both into the Nebula Graph storage engine, where begin is the value assigned by aggregation subtask 2.1. It then scans the next VID, VID2, generating <begin+1, VID2> and <VID2, begin+1>, and so on in sequence;
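Subtasks 2.1 and 2.2 amount to a prefix sum followed by a numbered scan. In this Python sketch the function names are hypothetical, and the pairs are returned as a list rather than written into the storage engine.

```python
from itertools import accumulate


def assign_ranges(counts):
    """Aggregation subtask 2.1: contiguous, 1-based [begin, end] per partition."""
    ends = list(accumulate(counts))  # P1, P1+P2, P1+P2+P3, ...
    return [(end - c + 1, end) for c, end in zip(counts, ends)]


def encode_partition(vids, begin):
    """Concurrent subtask 2.2: in scan order, emit <int_id, VID> and <VID, int_id>."""
    pairs = []
    for offset, vid in enumerate(vids):
        int_id = begin + offset
        pairs.append((int_id, vid))  # reverse mapping: int32 ID -> VID
        pairs.append((vid, int_id))  # forward mapping: VID -> int32 ID
    return pairs
```

With counts [2, 1, 3], the ranges are [1, 2], [3, 3], and [4, 6], matching the [1, P1], [P1+1, P1+P2], ... pattern above. Because each range is fixed before scanning, partitions can be encoded in parallel without coordination.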
S3: run encoding task 3 in Nebula Graph to perform edge encoding. Concurrent task 3 changes, in each partition, every edge represented as (VID1)->(VID2) into (int32 ID1)->(int32 ID2); for example, the original edge ("abc001")->("bcd002") can be expressed as (1)->(12). In this embodiment, the Nebula Graph storage engine needs 58 bytes to store one edge before encoding and only 8 bytes after encoding, which greatly reduces memory usage in subsequent computation;
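The 8-byte figure corresponds to two 32-bit integers per edge. This Python sketch packs each encoded edge with the standard struct module; the little-endian layout and the function names are assumptions, and the 58-byte pre-encoding layout is not modelled.

```python
import struct

EDGE_FORMAT = "<ii"  # two little-endian signed 32-bit ints: 8 bytes per edge


def encode_edges(edges, vid_to_id):
    """Encoding task 3: rewrite (VID1)->(VID2) as packed (int32 ID1)->(int32 ID2)."""
    return [struct.pack(EDGE_FORMAT, vid_to_id[src], vid_to_id[dst])
            for src, dst in edges]


def decode_edge(packed):
    """Inverse, for checking: 8 bytes back to (ID1, ID2)."""
    return struct.unpack(EDGE_FORMAT, packed)
```

Using the example above, the edge ("abc001")->("bcd002") with the mapping {"abc001": 1, "bcd002": 12} becomes one 8-byte record that decodes back to (1, 12).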
S4: run generation task 4 to generate the CSR:
Concurrent subtask 4.1 scans all mapping relations in each partition (i.e., the VID-to-int32 ID mappings generated in concurrent subtask 2.2) and all edges, obtains the number of edges in each partition, and shares that count with the other concurrent tasks;
Each concurrent subtask 4.2 computes, in memory, its corresponding rows in compressed sparse row format; the computation can be implemented with the open-source C++ Boost Graph Library class compressed_sparse_row_graph.
Subtask 4.3 gathers the results computed by the concurrent subtasks 4.2 into one process to form the complete CSR;
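Subtasks 4.2 and 4.3 can be sketched in plain Python as a stand-in for Boost's compressed_sparse_row_graph. The sketch assumes, as a simplification, that each partition holds the out-edges of the points in its own contiguous ID range; because the ranges are ordered and start at 1, the per-partition row slices can simply be concatenated, with each slice's offsets shifted by the number of edges before it (the per-partition edge counts from subtask 4.1). The function names are hypothetical.

```python
from itertools import accumulate


def partition_rows(begin, end, edges):
    """Subtask 4.2: CSR rows for IDs begin..end from (src, dst) int ID edges."""
    degree = [0] * (end - begin + 1)
    for src, _ in edges:
        degree[src - begin] += 1
    offsets = list(accumulate(degree, initial=0))  # local row offsets
    cols = [dst for _, dst in sorted(edges)]       # grouped by source, ascending
    return offsets, cols


def merge_rows(parts):
    """Subtask 4.3: concatenate ordered partition slices into one CSR."""
    row_offsets, col_indices = [0], []
    for offsets, cols in parts:
        base = len(col_indices)  # edges contributed by earlier partitions
        row_offsets.extend(base + o for o in offsets[1:])
        col_indices.extend(cols)
    return row_offsets, col_indices
```

The neighbours of ID i are then col_indices[row_offsets[i - 1]:row_offsets[i]]; for partitions [1, 2] with edges 1->3, 1->2, 2->3 and [3, 3] with edge 3->1, the merged CSR is row_offsets [0, 2, 3, 4] and col_indices [2, 3, 3, 1].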
S5: cache the CSR corresponding to the encoded student IDs in the compute engine;
S6: by sending an instruction, the compute engine deletes the graph topology cache and the storage engine deletes the persisted encoding mapping.
In this embodiment, the compute engine designs the graph structure cache with long-term and short-term caching modes, which reduces the latency of graph queries and graph computation; in addition, more engineering code can be reused, so development effort and development risk are lower; moreover, compatibility is good: historical query statements and data files remain compatible, and no changes are needed on the user side.
Through steps S101 to S102, in this embodiment the storage engine uses a Key-Value separation scheme to physically separate the graph structure from the graph attributes, which increases the speed of reading the graph structure from disk and memory, accelerates access to the graph structure, and greatly helps subsequent computation; it also reduces development effort and risk while keeping historical data files compatible. The compute engine designs the graph structure cache with long-term and short-term caching modes, further reducing the latency of graph queries and graph computation while keeping historical queries compatible. Combining the two addresses the high engineering complexity and poor graph depth-traversal performance of graph data processing, reducing engineering complexity and improving graph depth-traversal performance.
It should be noted that the steps shown in the above flowcharts or in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
This embodiment further provides a system for separately designing the graph structure and graph attributes. The system is used to implement the foregoing embodiments and preferred implementations, and what has already been described is not repeated. As used below, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a system for separating the graph structure and graph attributes according to an embodiment of the present application. As shown in Fig. 4, the system includes a storage engine 41 and a calculation engine 42:
the storage engine 41 performs Key-Value structure separation and stores the graph structure and the graph attributes of the graph data separately; the calculation engine 42 reads the graph structure, caches it in a short-term cache mode in a single-query-request scenario, and caches it in a long-term cache mode in a read-only analysis scenario. In this embodiment, the storage engine 41 uses a Key-Value separation scheme to physically separate the graph structure from the graph attributes, which speeds up reading the graph structure from disk and memory, accelerates access to the graph structure, and greatly benefits subsequent computation; it also reduces development effort and risk while keeping historical data files compatible. The calculation engine 42 designs the graph structure cache with long-term and short-term cache modes, further reducing the latency of graph queries and graph computation while keeping historical queries compatible. Combining the two solves the problems of engineering complexity and poor deep-traversal performance in graph data processing, reducing engineering complexity and improving deep graph traversal performance.
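The two cache lifetimes of the calculation engine can be sketched as follows, assuming a fetch_out_edges callable that stands in for the RPC to the storage engine. In short-term mode each query gets a fresh adjacency list that is checked before fetching and released when the query ends; in read-only analysis mode the adjacency list is kept and reused across queries. For brevity the long-term cache here is a lazily filled dictionary rather than the CSR described for the long-term mode, and ComputeEngine, k_hop, and drop_long_term are hypothetical names.

```python
class ComputeEngine:
    """Hypothetical compute engine with short-term and long-term structure caches."""

    def __init__(self, fetch_out_edges):
        # fetch_out_edges(v) -> iterable of neighbors; stands in for an RPC to storage.
        self._fetch = fetch_out_edges
        self._long_term = {}  # kept across queries in read-only analysis mode
        self.fetches = 0      # number of storage round trips, for illustration

    def _neighbors(self, v, adjacency):
        # Check the adjacency list first; only on a miss go to the storage engine.
        if v not in adjacency:
            self.fetches += 1
            adjacency[v] = list(self._fetch(v))
        return adjacency[v]

    def k_hop(self, start, k, read_only_analysis=False):
        """Return the vertices reachable from start within k hops (excluding start)."""
        adjacency = self._long_term if read_only_analysis else {}
        try:
            frontier, seen = [start], {start}
            for _ in range(k):
                next_frontier = []
                for v in frontier:
                    for u in self._neighbors(v, adjacency):
                        if u not in seen:
                            seen.add(u)
                            next_frontier.append(u)
                frontier = next_frontier
            return sorted(seen - {start})
        finally:
            if not read_only_analysis:
                adjacency.clear()  # short-term: release the cache when the query ends

    def drop_long_term(self):
        """Explicitly discard the long-term cache (e.g. when the analysis job ends)."""
        self._long_term.clear()
```

With graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, each short-term k_hop("a", 2) call returns ["b", "c", "d"] and triggers three fetches, whereas repeating the call with read_only_analysis=True fetches only on the first run.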
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations; they are not repeated here.
Note that each of the above modules may be a functional module or a program module and may be implemented in software or hardware. Modules implemented in hardware may all be located in the same processor, or be distributed across different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
In addition, in combination with the method for separately designing the graph structure and graph attributes in the above embodiments, an embodiment of the present application may provide a storage medium for its implementation. A computer program is stored on the storage medium; when executed by a processor, the computer program implements the method for separately designing the graph structure and graph attributes in any of the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a method for separately designing the graph structure and graph attributes. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen; a key, trackball, or touchpad provided on the housing of the computer device; or an external keyboard, touchpad, mouse, or the like.
In one embodiment, an electronic device is provided, which may be a server; Fig. 5 is a schematic diagram of its internal structure according to an embodiment of the present application. The electronic device comprises a processor, a network interface, an internal memory, and a non-volatile memory connected by an internal bus, where the non-volatile memory stores an operating system, a computer program, and a database. The processor provides computing and control capabilities; the network interface communicates with an external terminal over a network connection; the internal memory provides an environment for running the operating system and the computer program; the computer program, when executed by the processor, implements a method for separately designing the graph structure and graph attributes; and the database is used to store data.
Those skilled in the art will appreciate that the structure shown in Fig. 5 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the electronic device to which the solution is applied; a particular electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art should understand that the technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these features is described; however, any combination of features that does not involve a contradiction should be considered within the scope of this disclosure.
The above embodiments express only several implementations of the present application and are described in relatively specific detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be defined by the appended claims.

Claims (10)

1. A method for separately designing graph structure and graph attributes, applied to a system for implementing separation of the graph structure and the graph attributes in a graph database, characterized in that the system comprises: a storage engine and a compute engine;
performing Key-Value structure separation in the storage engine, and storing the graph structure and the graph attributes of the graph data separately;
the compute engine reading the graph structure, caching the graph structure in a short-term cache mode in a single-query-request scenario, and caching the graph structure in a long-term cache mode in a read-only analysis scenario.
2. The method of claim 1, wherein performing Key-Value structure separation on the storage engine comprises:
adding four dual-read interfaces to the storage engine to obtain the data stored under the Key in the Key-Value structure.
3. The method of claim 1, wherein the graph structure caching by short-term caching comprises:
starting from the first clause of a query statement, determining the required part of the graph structure through the execution plan, and checking whether that part is cached in the adjacency list of the current process;
if it is not cached, requesting and obtaining edges from the storage engine and adding the corresponding linked list to the adjacency list; if it is cached, obtaining the required part of the graph structure directly from the adjacency list;
after the query statement has been executed, releasing the cached data of the adjacency list.
4. The method of claim 3, wherein requesting and retrieving edges from the storage engine comprises:
the compute engine actively sending an RPC request to the storage engine, or obtaining the edges returned through pre-read loading by the storage engine.
5. The method of claim 1, wherein the graph structure caching by long-term caching comprises:
performing graph structure caching using a CSR (Compressed Sparse Row) structure.
6. The method of claim 5, wherein the graph structure caching using the CSR comprises:
in the storage engine, counting and aggregating the data of each storage partition of the graph structure, performing vertex encoding and edge encoding on the data respectively, and storing the encoded data in the storage engine;
scanning the full mapping relations and all edges in the Key-Value store of each storage partition, processing the corresponding concurrent tasks and subtasks to generate a CSR, and caching the CSR in the compute engine.
7. The method of claim 6, wherein after generating the CSR and caching the CSR in the compute engine, the method further comprises:
deleting, through a deletion instruction, the graph topology cache in the compute engine and the persisted encoding mapping in the storage engine.
8. A system for separately designing graph structure and graph attributes, the system comprising: a storage engine and a compute engine;
the storage engine performs Key-Value structure separation and stores the graph structure and the graph attributes of the graph data separately;
the compute engine reads the graph structure, caches the graph structure in a short-term cache mode in a single-query-request scenario, and caches the graph structure in a long-term cache mode in a read-only analysis scenario.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for separately designing graph structure and graph attributes according to any one of claims 1 to 7.
10. A storage medium storing a computer program, wherein the computer program is configured to, when run, execute the method for separately designing graph structure and graph attributes according to any one of claims 1 to 7.
CN202111291714.9A 2021-11-01 2021-11-01 Method, system, apparatus and medium for graph structure and graph attribute separation design Pending CN114138776A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111291714.9A CN114138776A (en) 2021-11-01 2021-11-01 Method, system, apparatus and medium for graph structure and graph attribute separation design
US17/977,226 US20230140423A1 (en) 2021-11-01 2022-10-31 Method and system for storing data in graph database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111291714.9A CN114138776A (en) 2021-11-01 2021-11-01 Method, system, apparatus and medium for graph structure and graph attribute separation design

Publications (1)

Publication Number Publication Date
CN114138776A true CN114138776A (en) 2022-03-04

Family

ID=80392306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111291714.9A Pending CN114138776A (en) 2021-11-01 2021-11-01 Method, system, apparatus and medium for graph structure and graph attribute separation design

Country Status (1)

Country Link
CN (1) CN114138776A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899156A (en) * 2015-05-07 2015-09-09 中国科学院信息工程研究所 Large-scale social network service-oriented graph data storage and query method
CN109255055A (en) * 2018-08-06 2019-01-22 四川蜀天梦图数据科技有限公司 A kind of diagram data access method and device based on packet associated table
CN110168533A (en) * 2016-12-15 2019-08-23 微软技术许可有限责任公司 Caching to subgraph and the subgraph of caching is integrated into figure query result
CN110399089A (en) * 2018-04-19 2019-11-01 阿里巴巴集团控股有限公司 Date storage method, device, equipment and medium
CN110929186A (en) * 2018-08-29 2020-03-27 武汉斗鱼网络科技有限公司 Client, information display method, electronic device and medium
CN112166425A (en) * 2018-04-18 2021-01-01 甲骨文国际公司 Efficient in-memory relationship representation for metamorphic graphs
CN113204564A (en) * 2021-05-20 2021-08-03 山东英信计算机技术有限公司 Database high-frequency SQL query method, system and storage medium
CN113254527A (en) * 2021-04-22 2021-08-13 杭州欧若数网科技有限公司 Optimization method of distributed storage map data, electronic device and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925123A (en) * 2022-04-24 2022-08-19 杭州悦数科技有限公司 Data transmission method between distributed graph database and graph computing system
CN115168505A (en) * 2022-06-21 2022-10-11 中国人民解放军国防科技大学 Management system and method for ocean space-time data
CN115033722A (en) * 2022-08-10 2022-09-09 杭州悦数科技有限公司 Method, system, device and medium for accelerating data query of database
CN115374301A (en) * 2022-10-24 2022-11-22 杭州欧若数网科技有限公司 Cache structure, and method and system for realizing graph query based on cache structure
CN115374301B (en) * 2022-10-24 2023-02-07 杭州欧若数网科技有限公司 Cache device, method and system for realizing graph query based on cache device
CN115658329A (en) * 2022-12-22 2023-01-31 杭州欧若数网科技有限公司 Method, system and medium for optimizing memory of graph data structure
CN115658329B (en) * 2022-12-22 2023-03-17 杭州欧若数网科技有限公司 Method, system and medium for optimizing memory of graph data structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220304