CN114741569A - Method and device for supporting composite data types in graph database - Google Patents


Publication number
CN114741569A
Authority
CN
China
Prior art keywords
edge
value
graph
graph database
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210647635.5A
Other languages
Chinese (zh)
Other versions
CN114741569B (en
Inventor
吴敏
古思为
岳通
杨怡璇
黄凤仙
梁振亚
周瑶
叶小萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ouruozhi Technology Co ltd
Original Assignee
Hangzhou Ouruozhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ouruozhi Technology Co ltd filed Critical Hangzhou Ouruozhi Technology Co ltd
Priority to CN202210647635.5A priority Critical patent/CN114741569B/en
Publication of CN114741569A publication Critical patent/CN114741569A/en
Application granted granted Critical
Publication of CN114741569B publication Critical patent/CN114741569B/en
Priority to US17/977,226 priority patent/US20230140423A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a device for supporting composite data types in a graph database. The method comprises: when a first attribute of a first graph element of the graph database is composite data, creating N+1 self-loop auxiliary edges on the first graph element, wherein N is greater than the number of elements of the composite data; and placing the auxiliary edges on the first graph element in one-to-one correspondence with the elements of the composite data, wherein the rank value of each auxiliary edge is the ordering value of the corresponding element in the composite data, and the attribute of the auxiliary edge comprises the actual value of the corresponding element. The invention enables attributes in a graph database to support composite data types without sacrificing the performance of the graph database.

Description

Method and device for supporting composite data types in graph database
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method and a device for supporting composite data types in a graph database.
Background
With the rapid development of big data and artificial intelligence technology, very-large-scale relational networks are increasingly applied in fields such as social recommendation, risk control, the Internet of Things, blockchain, and security. Such networks are generally based on the graph of graph theory, and the core elements constituting a graph (relational network) include: points (also called vertices or nodes) and the attributes on points, and edges (also called relationships) and the attributes on edges. A graph made up of these elements is called a property graph (attribute graph).
For example, in a social network, a point may correspond to an individual, and its attributes may be a mailbox, an account, etc.; the edges can correspond to friend relationships or transfer relationships, the attributes of the edges can be transfer amount, transfer time and the like, and the edges have directionality.
Graph databases are a class of database systems designed specifically to store and query property graphs in real time. Taking Nebula Graph as an example, the core elements constituting a property graph in a Nebula Graph database are points (vertex) and edges (edge). A point is uniquely identified and may or may not have attributes. An edge is a relationship between two points and is directional; an edge consists of a type and attributes, and is uniquely identified by the quadruple <start, edge type, rank, end>.
Rank deserves a specific description: when the start point, the end point, and the edge type are all the same, rank is what distinguishes the edges. For example, in a transfer network where the start point is A, the end point is B, and the edge type is transfer, the transfer serial number may be placed in the rank field, thereby uniquely identifying each transfer. In the graph query language, the symbol @123 is used to operate on an edge with a rank value of 123. Rank ranges from 0 to INT_MAX (the maximum INT value). TigerGraph has no rank concept, but multiple edge types can be used instead.
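The role of rank in the quadruple can be illustrated with a minimal Python sketch. The dictionary `edges` and the helper `insert_edge` are hypothetical names introduced only for illustration; they model the identification scheme and are not part of any graph database API.

```python
# Hypothetical in-memory model: each edge is stored under its identifying
# quadruple (start, edge_type, rank, end).
edges = {}

def insert_edge(start, edge_type, rank, end, **props):
    """Store an edge and its attributes under the quadruple key."""
    edges[(start, edge_type, rank, end)] = props

# Two transfers from A to B share start, end, and type; only rank (used here
# as the transfer serial number) tells them apart.
insert_edge("A", "transfer", 123, "B", amount=50.0)
insert_edge("A", "transfer", 124, "B", amount=75.0)

# Selecting rank 123 corresponds to the "@123" notation in the text.
transfer_123 = edges[("A", "transfer", 123, "B")]
```

Without the rank component the second insertion would overwrite the first, which is why rank is needed when several edges of the same type connect the same pair of points.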
A specific example is illustrated below:
A bank account is a point; its identifier is the card number, and its attributes are the account opening time (data type timestamp), the account name (data type string), and the balance (data type double). A transfer between two bank accounts is an edge; its attributes are the amount (data type double) and the transfer remark (data type string). All of these attributes have basic data types. If, however, the names of the client's successive account managers are to be stored, such as ['alice', 'bob', 'kit'], the attribute is of a linked list or array type, that is, a composite data type.
Most existing graph database systems support the basic data types (int, string, float, double, etc.) well as attributes. Support for composite data types, however, differs considerably between products. Schema-less products such as Neo4j, which primarily target flexibility, can support multiple composite data types at a corresponding cost in performance; strong-schema products such as TigerGraph and Nebula Graph do not support composite data types by default, for performance reasons.
However, complex data structures such as arrays, lists, sets, and maps are common in mainstream programming languages such as Java. "Composite" here refers to a more complex structure, such as List<int>, that is composed of basic types (e.g., the 8-byte integer int) organized by a data structure such as a linked list, queue, set, or map. For example, an array is characterized by fast addressing by subscript, with a time complexity of O(1), but inserting an element is costly, with a time complexity of O(n). A linked list can insert or delete an element quickly, but addressing takes O(n). A set cannot contain duplicate elements, and so on.
That is, because composite data types are so common in programming, supporting them in graph databases without sacrificing performance is a problem the industry urgently needs to solve.
Disclosure of Invention
The invention aims to provide a method and a device for supporting composite data types in a graph database, so that the attributes in the graph database can support the composite data types on the premise of not sacrificing the performance of the graph database.
In order to solve the technical problem, the invention discloses a method for supporting composite data types in a graph database, which comprises the following steps:
when a first attribute of a first graph element of a graph database is composite data, creating N+1 self-loop auxiliary edges on the first graph element, wherein N is greater than the number of elements of the composite data;
and placing the auxiliary edges on the first graph element in one-to-one correspondence with the elements of the composite data, wherein the rank value of each auxiliary edge is the ordering value of the corresponding element in the composite data, and the attribute of the auxiliary edge includes the actual value of the corresponding element.
Further, the auxiliary edge is provided with a visibility permission.
Further, the auxiliary edges include a first edge, the attribute of the first edge includes a concurrency control value, and the method further includes:
when a client needs to operate on an auxiliary edge on the first graph element, judging whether the concurrency control value is the controllable value; if so, allowing the client to operate on the auxiliary edge and modifying the concurrency control value from the controllable value to the unique identifier of the client; otherwise, refusing the operation of the client.
Further, when the client needs to operate on an auxiliary edge on the first graph element, the auxiliary edge to be operated on and the first edge are read concurrently.
Further, the method also includes:
judging in real time whether the unique identifier of the client exists and whether the client is online, and if either condition is not met, modifying the concurrency control value from the unique identifier of the client back to the controllable value.
Further, the attribute of the first edge also includes a composite length, the composite length being the number of elements of the composite data, and the method further includes:
when the capacity on the first graph element needs to be expanded to a length of M and the concurrency control value has been modified to the unique identifier of the current client, sequentially creating the (N+2)-th to M-th auxiliary edges and modifying the composite length to M, wherein M is greater than N.
Further, the first creation and each subsequent expansion are each taken as a creation event, and the method further includes:
before each creation event is completed, displaying and operating on only the data that existed before the current creation event;
if the creation of some auxiliary edges fails in a creation event and the retry also fails, performing garbage collection on the auxiliary edges that were successfully created in that creation event.
Further, when the composite data is of the linked list type, the attribute of each auxiliary edge also includes the rank value of the auxiliary edge corresponding to the next element.
Further, when the composite data is of the set type, the rank value of each auxiliary edge is a hash value of the actual value of the corresponding element, and the attribute of the auxiliary edge also includes the rank value of the auxiliary edge corresponding to the next element.
In order to solve the above technical problem, the present invention further discloses an apparatus for supporting compound data types in a graph database, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the above method for supporting compound data types in a graph database.
The invention provides a method and a device for supporting composite data types in a graph database. When a first attribute of a first graph element of the graph database is composite data, a corresponding number of self-loop auxiliary edges are created on the first graph element, wherein the rank value of each auxiliary edge is the ordering value of the corresponding element in the composite data, and the attribute of the auxiliary edge includes the actual value of the corresponding element. A user-level operation on an attribute defined as a composite data type is thereby converted into a graph-database-level operation on self-loops, and since self-loops are a basic capability of all graph databases, support for composite data types can be realized in all graph databases. Meanwhile, an operation on an element involves only a small amount of auxiliary graph data, so performance is greatly improved, and attributes in the graph database can support composite data types without sacrificing the performance of the graph database.
Drawings
FIG. 1 is a flowchart illustrating a method for supporting compound data types in a graph database according to an embodiment.
Fig. 2 is a schematic structural diagram of an array according to the present embodiment.
Fig. 3 is a schematic diagram of an implementation in which a point attribute supports an array according to this embodiment.
Fig. 4 is a schematic diagram of an implementation in which an edge attribute supports an array according to this embodiment.
Fig. 5 is a schematic structural diagram of a singly linked list according to this embodiment.
Fig. 6 is a schematic diagram of an implementation in which the attribute is a singly linked list according to this embodiment.
Fig. 7 is a schematic diagram of an implementation in which the attribute is a set according to this embodiment.
FIG. 8 is a block diagram illustrating an apparatus for supporting compound data types in a graph database according to an embodiment.
Detailed Description
The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.
It will be understood that terms such as "having," "including," and "comprising," when used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
Example one
In this embodiment, a point attribute in the graph database is made to support the composite data type of an array. Before the embodiment is described, the array is described as follows:
The structure of an array is shown in Fig. 2, where the array has a length of 5, the array indices run from 0 to 4, and the index numbers are used to access the elements of the array. For example, the first element of the array is accessed with index [0], the second element with index [1], and so on.
In this embodiment, the operations on the array are defined as follows:
1. Initialization: each item of data stored in the array is called an element; the type of the elements is determined when the array is created; the function create_array<int>(N) is defined to initialize an array of length N and element type int.
2. Length operation: once determined, the array length N cannot be changed except by the capacity expansion operation below; the function length() is defined to get the length of an array.
3. Capacity expansion operation: the array can be expanded; the function resize(M) is defined to expand the array to length M, where M > N. Shrinking the array is not allowed in the usual case.
4. Subscript access to elements: each element has an exact position in the array and can be accessed by its index; the functions get(i) and set(i) are defined to read or write the value at index i; delete(i) of a particular element is usually not supported in an array because of its very high cost.
5. Destruction: the function store_array() is defined to delete the array.
6. Garbage collection: partial failure, logical deletion, and the like may leave space that is not reclaimed immediately; this space can be reclaimed by the garbage collection operation trim() to avoid waste. In Java, garbage collection is also known as GC.
These operations define the interface for array operations in the OGM, so that to a user of the OGM the point attribute appears to support the composite data type of an array.
OGM is the abbreviation of Object Graph Mapping, a concept first proposed by Neo4j, whose function is close to that of the ORM (Object Relational Mapping) frameworks of relational databases, such as Hibernate and MyBatis. It aims to let a user add, delete, and modify points, edges, paths, and subgraphs in a graph database directly by operating on (class) objects in a programming language. The specific database operation statements (such as Cypher for Neo4j) are produced by the OGM framework; the user therefore only needs to use the familiar Python or Java language and does not need to learn a database operation language. Moreover, the programming language (Python or Java) is Turing-complete and more flexible than Cypher and similar languages.
CAS is described as follows: compare-and-swap is an atomic operation that can be used to implement uninterrupted data exchange in concurrent programming, thereby avoiding the data inconsistency caused by uncertain execution order and unpredictable interruption when several concurrent operations rewrite the same data. The operation compares the current value with a specified value and, when the two are the same, replaces the current value with a new value.
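The compare-and-swap behavior described above can be sketched in Python as follows. The class `Cell` is a hypothetical illustration; its internal `threading.Lock` merely stands in for the atomicity that hardware or a database would provide for a single statement.

```python
import threading

class Cell:
    """A single value that supports an atomic compare-and-swap."""

    def __init__(self, value):
        self._value = value
        # Stand-in for the atomicity a database gives a single statement.
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Replace the value with `new` only if it currently equals `expected`."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True

    def get(self):
        with self._guard:
            return self._value

cell = Cell(0)
first = cell.compare_and_swap(0, 42)    # succeeds: the value was 0
second = cell.compare_and_swap(0, 99)   # fails: the value is now 42
```

The second call leaves the value untouched, which is the property the locking scheme of this embodiment relies on: of several clients racing to change the same field, only one can succeed.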
As shown in FIG. 1, the present embodiment discloses a method for supporting compound data types in a graph database, comprising:
Step S1: when the first attribute of the first graph element of the graph database is composite data, N+1 self-loop auxiliary edges are created on the first graph element, wherein N is greater than the number of elements of the composite data.
As shown in Fig. 3, when the first graph element is a point and the composite data is an array with N elements, N+1 self-loop auxiliary edges are added to the point.
Step S2: the auxiliary edges on the first graph element are placed in one-to-one correspondence with the elements of the composite data; the rank value of each auxiliary edge is the ordering value of the corresponding element in the composite data, and the attribute of the auxiliary edge includes the actual value of the corresponding element.
In this embodiment, the auxiliary edges include a first edge and other edges. The first edge is the edge of rank 0 in Fig. 3; its attributes include the composite length and the concurrency control value, where property (meaning attribute) holds the composite length N and lock = 0 is the controllable value. The other edges are the edges of rank 1, rank 2, …, rank N, which store the actual values of the corresponding elements.
Thus, the operations regarding the dot attributes in the present embodiment are implemented as follows:
1. Initialization: create_array<int>(N).
a. Create N+1 self-loop auxiliary edges, aux-edge, on the point, with rank values 0 to N and a property field of type int. The specific name of aux-edge is specified by the user, to avoid conflicts caused by identical names.
CREATE EDGE aux-edge (property INT, lock INT);
b. For the edges of rank 1, rank 2, …, rank N, the property field value is NULL, meaning a null pointer; it is filled in later with the actual value.
UPSERT EDGE aux-edge @1,2,…,N:(NULL);
c. For the edge of rank 0, the property field value is N and the lock field value is 0.
UPSERT EDGE aux-edge @0:(N,0);
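The initialization steps above can be mirrored by a minimal in-memory Python sketch. The dictionary `AUX` (mapping rank to the fields of one self-loop edge) and the functions `create_array` and `length` are hypothetical stand-ins for what an OGM would send to the graph database; they are not real database calls.

```python
AUX = {}  # hypothetical store for one point: rank -> fields of one aux-edge

def create_array(store, n):
    """Create self-loop edges of rank 0..n for an array of length n."""
    # Step b: element edges first; the value is NULL (None) until written.
    for rank in range(1, n + 1):
        store[rank] = {"property": None}
    # Step c: the rank-0 edge last; it holds the length and lock = 0.
    # The array only counts as created once this single write succeeds.
    store[0] = {"property": n, "lock": 0}

def length(store):
    """Return the array length, or None if it is missing or locked."""
    head = store.get(0)
    if head is None or head["lock"] != 0:
        return None
    return head["property"]

create_array(AUX, 3)
array_length = length(AUX)
```

Writing the rank-0 edge last is a design choice taken from the text: until it exists, `length` reports nothing, so a half-created array is never visible.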
Because one composite data value involves multiple elements, and concurrent operations could corrupt it, the present application also implements ACID operations on the composite data type. This function does not depend on ACID capability in the graph database itself (Nebula Graph, JanusGraph, and similar databases do not have ACID capability by themselves); it depends only on the CAS capability that the graph database provides for a single operation statement. If the graph database itself has ACID capability, the implementation of the present application can be simplified. The ACID operation is as follows:
When a client needs to operate on an auxiliary edge on the first graph element, it is judged whether the concurrency control value is the controllable value; if so, the client is allowed to operate on the auxiliary edge and the concurrency control value is modified from the controllable value to the unique identifier of the client; otherwise, the operation of the client is refused.
Corresponding to Fig. 3, the lock field serves as a lock for concurrency control among multiple clients: a value of 0 indicates that no client is performing a write operation; a non-zero value is the session id, i.e. the unique identifier, of the client holding the lock.
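The lock semantics of the rank-0 edge can be sketched as follows. The functions `try_lock` and `unlock` and the dictionary `head` are hypothetical; in a real deployment the check and the write would be a single CAS statement issued to the graph database rather than two Python steps.

```python
def try_lock(head, session_id):
    """Take the write lock on the rank-0 edge; return True on success.

    Follows the condition "lock = 0 OR lock = {sessionid}" used by the
    statements of this embodiment.
    """
    if head["lock"] in (0, session_id):
        head["lock"] = session_id
        return True
    return False

def unlock(head, session_id):
    """Release the lock only if this session holds it."""
    if head["lock"] == session_id:
        head["lock"] = 0
        return True
    return False

head = {"property": 3, "lock": 0}   # rank-0 edge: length 3, unlocked
a = try_lock(head, 101)   # client 101 takes the lock
b = try_lock(head, 202)   # client 202 is refused
c = unlock(head, 101)     # client 101 releases it
d = try_lock(head, 202)   # now client 202 succeeds
```

Allowing the holder to re-acquire its own lock makes the operation idempotent, so a client that retries a statement after a timeout does not lock itself out.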
Introducing the auxiliary edge aux-edge changes the graph structure itself, for example by increasing the number of adjacent edges of a point. To keep this change from being visible to other users, a visibility permission may be set on aux-edge so that it is visible only to the user concerned or to a special internal user, and the externally observable behavior of the graph is not affected.
In this embodiment, the first creation and each subsequent expansion are taken as a creation event, and the method further includes:
before each creation event is completed, only the data before the current creation event is displayed and manipulated;
if the creation of the partial auxiliary edge fails in one creation event and the retry also fails, the auxiliary edge which is successfully created in the creation event is subjected to garbage collection.
Specifically, since step b involves the creation of multiple self-loop edges, some of the creations may fail. In that case, the failed self-loop edges may be retried, or the creation may be abandoned, for example when space is insufficient. If it is abandoned, the self-loop edges that were created successfully become garbage and can be collected by a subsequent trim() operation; all of this is invisible to the outside. Only when step c succeeds is the whole creation process complete and the array visible to the outside. The statement of step c is a single CAS command and cannot partially fail.
In addition, to prevent a session from holding the lock for a long time because the client exited unexpectedly, the embodiment further includes:
judging in real time whether the unique identifier of the client exists and whether the client is online, and if either condition is not met, modifying the concurrency control value from the unique identifier of the client back to the controllable value.
That is, the database background can check, by means of a periodic task, whether the session id recorded in lock still exists, or it can check the operation log and replay it. Operation logging is a common technique in databases.
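The periodic check for abandoned locks might look like the following sketch. The function `release_stale_lock` and the set `online_sessions` are hypothetical; how a real database enumerates its live sessions is not specified in the text and is assumed here.

```python
def release_stale_lock(head, online_sessions):
    """Reset the lock to 0 if its holder is no longer a live session."""
    holder = head["lock"]
    if holder != 0 and holder not in online_sessions:
        head["lock"] = 0
        return True
    return False

head = {"property": 3, "lock": 101}   # rank-0 edge locked by session 101

# Session 101 is still online: the lock is kept.
kept = release_stale_lock(head, online_sessions={101, 202})

# Session 101 has disappeared: the lock is reset to the controllable value 0.
freed = release_stale_lock(head, online_sessions={202})
```

A background task would run such a check repeatedly over the rank-0 edges, so that a crashed client cannot block other writers indefinitely.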
2. Length acquisition: length().
If lock is not 0, the operation fails.
FETCH PROP ON aux-edge @0 YIELD lock, property;
3. Capacity expansion to length M (M > N): resize(M).
Thus, the present embodiment further includes:
when the capacity on the first graph element needs to be expanded to a length of M and the concurrency control value has been modified to the unique identifier of the current client, the (N+2)-th to M-th auxiliary edges are created sequentially and the composite length is modified to M, where M is greater than N. For example, if N is 10, there are N+1, that is, 11 auxiliary edges; when the capacity needs to be expanded to 20, the 12th to 20th auxiliary edges are created in sequence.
Specifically:
3.a. Check rank 0 and lock it with the client's session id. If the WHEN condition is not satisfied, the statement is not executed.
UPDATE EDGE ON aux-edge @0
SET lock={sessionid} WHEN (lock=0 OR lock={sessionid}) AND (M>N);
3.b. Create the edges of rank N+1, …, rank M, with a property field value of NULL.
UPSERT EDGE aux-edge @N+1,…,M:(NULL);
3.c. Unlock and modify the length to M.
UPDATE EDGE ON aux-edge @0
SET lock=0, property=M WHEN lock={sessionid};
Therefore, in order to guarantee ACID, the method further comprises the following:
each operation of adding, deleting, or modifying an auxiliary edge is a single CAS operation command.
That is, the above UPDATE and UPSERT statements are all CAS operations.
In step 3.a, the CAS ensures that multiple clients execute serially, and the WHEN condition must be satisfied.
In step 3.b, if a partial failure occurs and the retry also fails, the edges of rank N+1 to M that were created successfully are reclaimed by the trim() function; the externally visible length() is still N, and the part [1, 2, …, N] can still be operated on normally. This ensures that the user's normal foreground operations return quickly.
In step 3.c, the operation is likewise atomic.
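Steps 3.a to 3.c, including the partial-failure case, can be sketched in Python on an in-memory stand-in for the auxiliary edges. The function `resize`, the `store` dictionary, and the `fail_at` parameter (which simulates a failed edge write) are hypothetical. Releasing the lock when the expansion is abandoned is an assumption of this sketch; the text does not state it.

```python
def resize(store, m, session_id, fail_at=None):
    """Expand the array to length m; `fail_at` simulates a failed edge write."""
    head = store[0]
    n = head["property"]
    # 3.a: lock rank 0, only if it is free (or already ours) and m > n.
    if head["lock"] not in (0, session_id) or m <= n:
        return False
    head["lock"] = session_id
    # 3.b: create the edges of rank n+1..m with NULL values.
    for rank in range(n + 1, m + 1):
        if rank == fail_at:
            # Give up: the length stays n, leftovers are left for trim().
            head["lock"] = 0   # assumption: the lock is released on failure
            return False
        store[rank] = {"property": None}
    # 3.c: publish the new length and unlock together.
    head["property"], head["lock"] = m, 0
    return True

store = {0: {"property": 2, "lock": 0}, 1: {"property": 7}, 2: {"property": 8}}
ok = resize(store, 4, session_id=101)               # length becomes 4
bad = resize(store, 6, session_id=101, fail_at=6)   # rank 5 written, rank 6 fails
```

After the failed call the visible length is still 4 and the edge of rank 5 is garbage awaiting collection, which matches the behavior described for step 3.b.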
4. Subscript access to elements: read and write.
A. get(i): read the value at array index i.
The lock and property of rank 0 and of rank i are acquired at the same time;
if lock is neither the client's own session id nor 0, the operation fails.
FETCH PROP ON aux-edge @0,{i} YIELD lock,property;
In this embodiment, when the client needs to operate on an auxiliary edge on the first graph element, the auxiliary edge to be operated on and the first edge are read concurrently.
Specifically, to ensure higher read performance, the operation does not lock in advance. Instead, it uses the concurrent read capability of the graph database to read rank 0 and rank i directly, because the marginal cost of reading one edge and of reading two edges is the same, and then checks whether rank 0 is locked.
A FETCH PROP batch read, which here reads edges with the same starting point, is atomic, so it is not affected by other concurrent write operations.
Reading a range [i, j] of the array works similarly.
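The lock-free read can be sketched as follows. The function `get` and the `store` dictionary are hypothetical; the sketch assumes, following the FETCH PROP statement in the text, that the argument i is used directly as the rank of the element edge (ranks 1 to N).

```python
def get(store, i, session_id):
    """Read element i by fetching rank 0 and rank i together, then checking the lock."""
    # One batched read, in the spirit of "FETCH PROP ON aux-edge @0,{i}".
    head, elem = store.get(0), store.get(i)
    if head is None or head["lock"] not in (0, session_id):
        raise RuntimeError("array is missing or locked by another client")
    if not 1 <= i <= head["property"]:
        raise IndexError(i)
    return elem["property"]

store = {
    0: {"property": 2, "lock": 0},
    1: {"property": "alice"},
    2: {"property": "bob"},
}
value = get(store, 2, session_id=101)
```

No lock is written during the read; the value is simply discarded (here, by raising) if rank 0 turns out to be locked by another session.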
B. set(i): modify the value at array index i to the new value {newvalue}.
// lock
UPDATE EDGE ON aux-edge @0
SET lock={sessionid} WHEN (lock=0 OR lock={sessionid});
// modify the edge whose rank is i; {newvalue} is the input parameter
UPDATE EDGE ON aux-edge @{i}
SET property={newvalue};
// unlock
UPDATE EDGE ON aux-edge @0
SET lock=0 WHEN lock={sessionid};
The principle is similar to that of get(i).
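The lock, modify, unlock sequence of set(i) can be sketched in Python as below. The name `set_value` (chosen to avoid shadowing Python's built-in `set`) and the `store` dictionary are hypothetical, and i is again assumed to be the rank of the element edge.

```python
def set_value(store, i, new_value, session_id):
    """Write element i under the rank-0 lock; return False if the lock is taken."""
    head = store[0]
    # lock
    if head["lock"] not in (0, session_id):
        return False
    head["lock"] = session_id
    try:
        if not 1 <= i <= head["property"]:
            raise IndexError(i)
        # modify the edge whose rank is i
        store[i]["property"] = new_value
    finally:
        # unlock, only if this session still holds the lock
        if head["lock"] == session_id:
            head["lock"] = 0
    return True

store = {0: {"property": 2, "lock": 0}, 1: {"property": None}, 2: {"property": None}}
written = set_value(store, 1, "alice", session_id=101)
```

The `finally` clause releases the lock even when the index is out of range, so a rejected write does not leave the array locked.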
5. Destruction: store_array<int>().
// lock
UPDATE EDGE ON aux-edge @0
SET lock={sessionid} WHEN (lock=0 OR lock={sessionid});
// delete all edges; only the deletion of rank 0 needs to be guaranteed to succeed, and the remaining ranks can be left to the trim() process
DELETE EDGE aux-edge @0,1,…,N-1;
6. Full garbage collection: trim().
This applies when none of the self-loops is needed any longer.
a. For performance, first check whether rank 0 exists, to determine whether the edges can be collected.
FETCH PROP ON aux-edge @0 YIELD lock,property;
b. Lock rank 0 to keep other sessions out.
UPSERT EDGE aux-edge @0:(0,{sessionid});
c. Delete, as a range and in reverse order, the edges corresponding to all possible rank values. The purpose of the reverse order is to ensure that rank 0 is deleted last; before that, rank 0 cannot be operated on concurrently.
DELETE EDGE aux-edge @INT_MAX,…,1,0;
7. Incremental garbage collection: trim().
This applies when only some of the self-loops are no longer needed. Only the array length N in rank 0 needs to be checked, and rank N+1, …, rank INT_MAX are deleted.
8. When the point is deleted, these self-loops are deleted first; the operation is the same as the destruction store_array<int>().
Each of the above operation flows is the lock, operate, unlock sequence internal to a single function. If several functions in the OGM are combined and submitted as a batch, they can be merged into lock, operation 1, operation 2, unlock to improve execution efficiency.
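The incremental garbage collection of item 7 can be illustrated with a short sketch. The function `trim_incremental` and the `store` dictionary are hypothetical; instead of issuing a range delete up to INT_MAX as a database would, the sketch scans only the ranks that actually exist in memory.

```python
def trim_incremental(store):
    """Delete every edge whose rank exceeds the length recorded in rank 0."""
    n = store[0]["property"]
    # Collect first, then delete, so the dictionary is not modified
    # while it is being iterated.
    garbage = [rank for rank in store if rank > n]
    for rank in garbage:
        del store[rank]
    return sorted(garbage)

# Ranks 3 and 5 are leftovers of a failed expansion: the length is still 2.
store = {
    0: {"property": 2, "lock": 0},
    1: {"property": 7},
    2: {"property": 8},
    3: {"property": None},
    5: {"property": None},
}
removed = trim_incremental(store)
```

Because the visible length never included the leftover ranks, removing them has no effect on what clients can read.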
Thus, in the present embodiment, the following effects are provided:
1. At the OGM level, the complete operation behaviors of the array data structure are realized, such as construction, capacity expansion, subscript access, and destruction; user-level array operations on a point attribute are converted into graph-database-level operations on self-loops, and this conversion is done by the OGM and is invisible to the user.
2. The ACID semantics of each array operation are guaranteed, which ensures the integrity of data and operations when multiple clients operate concurrently.
3. Fewer capabilities are required of the graph database: only the CAS capability for a single command is needed, not ACID capability across multiple combined statements. Self-loops and permission control are basic capabilities of all graph databases.
4. An operation on an element involves only a small amount of auxiliary graph data, so performance is greatly improved and the performance of the graph database is not sacrificed.
Example two
In this embodiment, an edge attribute in the graph database is made to support the composite data type of an array. The specific implementation is described as follows on the basis of the first embodiment:
As shown in Fig. 4, there is an edge between two points, called edgeA. Id, Name, and Account_ampout are its attributes, of string and double types respectively.
Now an attribute support_manager, which is the array ['alice', 'bob', 'kit'], also needs to be added.
To do so:
1. Add an auxiliary edge structure, aux-edge, for edgeA.
For a point, aux-edge is a self-loop;
for an edge, aux-edge is an auxiliary edge with the same start point and end point as edgeA.
Its visibility permission is likewise restricted to a specific user.
2. The behaviors of construction, capacity expansion, subscript access, destruction, garbage collection, and so on of the array are the same as the operation processes in the first embodiment; an operation on the array attribute of edgeA is translated into operations on the individual ranks of aux-edge. No further description is needed.
3. When edgeA is deleted, these auxiliary edges are deleted first.
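The only difference between the point case and the edge case is which endpoints the auxiliary edges use, as the following sketch shows. The function `aux_edge_keys` and the point names "P", "A", and "B" are hypothetical; the tuples follow the <start, edge type, rank, end> quadruple from the background section.

```python
def aux_edge_keys(start, end, n):
    """Quadruples of the n+1 aux-edges that carry an array of length n."""
    return [(start, "aux-edge", rank, end) for rank in range(n + 1)]

# Point attribute: the aux-edges are self-loops on the point P.
point_keys = aux_edge_keys("P", "P", 3)

# Edge attribute: the aux-edges share the start and end points of edgeA (A to B).
edge_keys = aux_edge_keys("A", "B", 3)
```

Everything else (the rank-0 edge holding the length and lock, and the ranks 1 to N holding the elements) is unchanged between the two cases.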
Example three
In this embodiment, in order that an attribute in the graph database can support the composite data type of a linked list, the attribute of each auxiliary edge also includes the rank value of the auxiliary edge corresponding to the next element.
1. Fig. 5 shows the structure of a linked list (List), with a head pointer (head in Fig. 5) pointing to the first element;
2. the last element has no successor pointer (or points to NULL);
3. each element has one pointer to the next element, which is how the list is traversed.
Following the idea of the previous embodiments, when the attribute of a point is a linked list, a number of auxiliary self-loops aux-edge are constructed to obtain the structure shown in Fig. 6, where:
rank 0 is used for lock control; the first column of its property is used as the lock, and the second column (whose value in Fig. 6 is 1) is the rank value of the first element in the linked list. The edge of rank 0 is thus equivalent to a pointer to the edge of rank 1, since reading this 1 shows that the rank value of the next element is 1.
rank 1 is the first element in the linked list; the first column, 'A', is the actual value stored in the linked list, and the second column is the rank of the next element, i.e. it points on to the next element.
rank 2: the first column, 'B', is the actual value stored in the linked list; the second column is NULL, which marks the tail element of the linked list.
The following briefly describes how operations on a linked list are translated into operations on aux-edge:
1. Inserting an element, e.g. changing the linked list 1->2 into 1->100->2:
a. lock rank 0 (NULL is changed to the session id);
b. UPSERT the edge of rank 100 with its second column set to 2 (now 1->2, 100->2);
c. UPSERT the second column of rank 1 to 100 (now 1->100, 100->2);
d. unlock rank 0.
2. Deleting an element, e.g. changing 1->100->2 into 100->2:
a. lock rank 0;
b. UPSERT rank 0 with its second column changed to 100 (1->100->2, 0->100->2);
c. unlock rank 0.
In step b, although rank 1 still points to 100, the head pointer at rank 0 now also points to 100; since no rank points to 1 any more, rank 1 becomes an orphan, which can be left to trim() to reclaim.
In addition, only rank 0 needs to be deleted when the whole list is deleted.
3. Traversal. A list is usually not addressed by subscript but traversed element by element in pointer order:
a. lock rank 0;
b. obtain the corresponding rank value from the second column value;
c. continue until a rank whose second column value is NULL is reached;
d. unlock rank 0.
Compared with get(i) of the array, the linked list needs to be locked because traversal usually involves multiple RPCs, and concurrency isolation between two RPCs cannot be guaranteed without locking.
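The insert, delete, and traversal steps above can be sketched in Python as follows. This is a minimal in-memory model under stated assumptions: the `AuxList` class and its method names are hypothetical, each dictionary access stands in for an UPSERT or read request to the graph database, and the lock is a plain check-and-set rather than an atomic operation.

```python
class AuxList:
    """Linked list stored as self-loop aux-edges: rank -> [column 1, column 2].

    Hypothetical in-memory model; in a real deployment every read or write
    below would be a request to the graph database.
    """

    def __init__(self):
        # rank 0: column 1 is the lock (None = free), column 2 is the head rank
        self.edges = {0: [None, None]}

    def _lock(self, session_id):
        if self.edges[0][0] is not None:
            raise RuntimeError("linked list is locked by another session")
        self.edges[0][0] = session_id

    def _unlock(self):
        self.edges[0][0] = None

    def insert_after(self, prev_rank, new_rank, value, session_id):
        """Insert a new element after prev_rank (prev_rank 0 inserts at the head)."""
        self._lock(session_id)                         # a. lock rank 0
        try:
            successor = self.edges[prev_rank][1]
            self.edges[new_rank] = [value, successor]  # b. new edge points to successor
            self.edges[prev_rank][1] = new_rank        # c. predecessor points to new edge
        finally:
            self._unlock()                             # d. unlock rank 0

    def delete_head(self, session_id):
        """Drop the first element by moving the head pointer; the old edge is orphaned."""
        self._lock(session_id)
        try:
            head = self.edges[0][1]
            if head is not None:
                self.edges[0][1] = self.edges[head][1]
        finally:
            self._unlock()

    def traverse(self, session_id):
        """Follow column 2 from rank 0 until it is None (NULL)."""
        self._lock(session_id)
        try:
            values = []
            rank = self.edges[0][1]
            while rank is not None:
                value, rank = self.edges[rank]
                values.append(value)
            return values
        finally:
            self._unlock()


lst = AuxList()
lst.insert_after(0, 1, "A", "session-1")
lst.insert_after(1, 2, "B", "session-1")
lst.insert_after(1, 100, "X", "session-1")  # 1->2 becomes 1->100->2
print(lst.traverse("session-1"))
lst.delete_head("session-1")                # rank 1 is now an orphan
print(lst.traverse("session-1"))
```

Writing the new edge before redirecting its predecessor means a reader never follows a pointer to an edge that does not exist yet, which matches the order of steps b and c in the text.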
The full trim works the same way as in the previous section; the differential trim is briefly described here.
a. Lock; at the same time, build a bitmap in the memory of the OGM, with all bits from 0 to INT_MAX set to 0.
b. Obtain all rank values of the linked list as in the traversal step, and set the bitmap bit of each rank value that exists to 1;
c. if the bit at subscript i of the bitmap is still 0, delete the edge whose rank value is i;
d. Unlock.
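A minimal Python sketch of this mark-and-delete trim is shown below. It deviates from the description in two labeled ways: it uses an arbitrary-precision integer as the bitmap instead of a fixed array covering 0 to INT_MAX, and it only checks the ranks that actually exist rather than scanning every subscript. The `trim` function name and the `rank -> [column 1, column 2]` dictionary layout are assumptions made for the example.

```python
def trim(edges, session_id):
    """Delete aux-edges that are no longer reachable from the head pointer.

    edges maps rank -> [column 1, column 2]; rank 0 is [lock, head rank].
    Returns the list of ranks that were deleted.
    """
    if edges[0][0] is not None:
        raise RuntimeError("linked list is locked by another session")
    edges[0][0] = session_id              # a. lock
    try:
        bitmap = 1                        # bit 0 set: always keep the control edge
        rank = edges[0][1]
        while rank is not None:           # b. mark every rank reached by traversal
            bitmap |= 1 << rank
            rank = edges[rank][1]
        orphans = [r for r in edges if not (bitmap >> r) & 1]
        for r in orphans:                 # c. delete edges whose bit is still 0
            del edges[r]
        return orphans
    finally:
        edges[0][0] = None                # d. unlock


# State after deleting the head of 1->100->2: rank 1 is an orphan.
edges = {0: [None, 100], 1: ["A", 100], 100: ["X", 2], 2: ["B", None]}
print(trim(edges, "session-1"))
print(sorted(edges))
```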
An extra field can be added to rank 0 to cache the length of the whole linked list; its value is updated each time an element is added or deleted. This makes fetching the list length fast.
EXAMPLE IV
In this embodiment, building on the first embodiment, when the composite data is of a set type, the rank value of each attached edge is the hash value of the actual value of the corresponding element, and the attribute of the attached edge further includes the rank value of the attached edge corresponding to the next element.
In this embodiment, HashSet is taken as an example for explanation.
A HashSet has the following characteristics:
1. The set contains no duplicate elements; the same element can be inserted multiple times, and the elements in the set remain unchanged.
2. Reading an element returns whether the element exists.
3. All elements can be traversed one by one, but no particular traversal order is required.
Thus, as shown in FIG. 7, a HashSet is structured as a special type of linked list:
rank 0 is used for lock control: the first column of the attribute is used as the lock, and the second column is the rank value of the first element in the linked list, so the edge of rank 0 refers to the first element to be inserted in the HashSet.
hash(A) is the hash value of "A". In the example it is the first element in the linked list: the first column "A" is the actual value stored in the linked list, and the second column is the rank value of the next element, i.e. hash(B).
hash(B) is the hash value of "B". In the example it is the second element in the linked list: the first column "B" is the actual value stored in the linked list, and the second column is the rank value of the next element, i.e. hash(C).
hash(C) is the hash value of "C": the first column "C" is the actual value stored in the linked list; the second column is NULL, marking it as the tail element of the linked list.
The main steps of reading an element X are:
1. calculate the value of hash(X);
2. check whether an edge with rank hash(X) exists.
The main steps of inserting an element X are:
1. calculate hash(X);
2. check whether an edge with rank hash(X) exists; if so, return directly;
3. insert hash(X) after rank 0: the element that rank 0 originally pointed to becomes the next element of hash(X), which means that each new element is actually inserted at the head of the linked list.
The traversal step:
Since the HashSet is implemented as a linked list, the traversal process is the same as for the linked list in the third embodiment.
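The read, insert, and traversal steps of the HashSet can be sketched in Python as below. The patent text does not name a hash function, so CRC32 (shifted by one so that rank 0 stays reserved for the control edge) is an assumption, as are the `AuxHashSet` class and its method names. Locking is omitted for brevity, and hash collisions are not handled, as in the steps above.

```python
import zlib


def rank_of(value):
    # Assumption: CRC32 as the hash; +1 keeps rank 0 free for the control edge.
    return 1 + zlib.crc32(value.encode("utf-8"))


class AuxHashSet:
    """HashSet stored as a linked list of aux-edges: rank -> [value, next rank]."""

    def __init__(self):
        self.edges = {0: [None, None]}  # rank 0: [lock, rank of first element]

    def contains(self, value):
        # Read: compute hash(X) and check whether an edge with that rank exists.
        return rank_of(value) in self.edges

    def add(self, value):
        rank = rank_of(value)
        if rank in self.edges:          # already present: return directly
            return False
        # Insert at the head: the new edge takes over the old head as its next.
        self.edges[rank] = [value, self.edges[0][1]]
        self.edges[0][1] = rank
        return True

    def __iter__(self):
        # Traverse exactly like the linked list: follow column 2 until None.
        rank = self.edges[0][1]
        while rank is not None:
            value, rank = self.edges[rank]
            yield value


s = AuxHashSet()
for item in ["A", "B", "C", "A"]:
    s.add(item)
print(list(s))           # most recently inserted element comes first
print(s.contains("B"), s.contains("Z"))
```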
EXAMPLE V
In this embodiment, the attribute in the graph database is made to support HashMap as a composite data type. Since HashMap<key, value> is very close to HashSet in its implementation principle, it is only necessary to hash the key, i.e. use hash(key), and add an additional field to record the value.
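A possible reading of this embodiment is sketched in Python below: the rank of each attached edge is hash(key), and the edge carries an extra `value` field next to the key and the next-rank pointer. The `AuxHashMap` class, the field names, the use of CRC32 as the hash, and the choice to overwrite the value when the key already exists are all assumptions; the text only states that the key is hashed and a value field is added.

```python
import zlib


def rank_of(key):
    # Assumption: CRC32 as the hash; +1 keeps rank 0 free for the control edge.
    return 1 + zlib.crc32(key.encode("utf-8"))


class AuxHashMap:
    """HashMap stored as aux-edges keyed by rank = hash(key)."""

    def __init__(self):
        self.edges = {0: {"lock": None, "next": None}}

    def put(self, key, value):
        rank = rank_of(key)
        if rank in self.edges:
            # Assumption: an existing key keeps its position; only the value changes.
            self.edges[rank]["value"] = value
            return
        self.edges[rank] = {"key": key, "value": value, "next": self.edges[0]["next"]}
        self.edges[0]["next"] = rank

    def get(self, key, default=None):
        edge = self.edges.get(rank_of(key))
        return default if edge is None else edge["value"]

    def items(self):
        rank = self.edges[0]["next"]
        while rank is not None:
            edge = self.edges[rank]
            yield edge["key"], edge["value"]
            rank = edge["next"]


m = AuxHashMap()
m.put("alice", 1)
m.put("bob", 2)
m.put("alice", 3)
print(m.get("alice"), m.get("carol"))
print(list(m.items()))
```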
EXAMPLE VI
As shown in FIG. 8, an apparatus 1 for supporting composite data types in a graph database includes a memory 2, a processor 3, and a computer program stored on the memory 2 and executable on the processor 3, wherein the processor 3, when executing the computer program, implements the steps of the method for supporting composite data types in a graph database according to any one of embodiments one to five.
While embodiments of the invention have been disclosed above, the invention is not limited to the applications set forth in the description and the embodiments. It is fully applicable to the various fields suited to it, and further modifications can readily be made by those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the embodiments shown and described herein.

Claims (10)

1. A method for supporting composite data types in a graph database, comprising:
when a first attribute of a first graph element of the graph database is composite data, creating N+1 self-loop attached edges on the first graph element, wherein N is larger than the number of elements of the composite data;
and placing the attached edges on the first graph element in one-to-one correspondence with the elements of the composite data, wherein the rank value of each attached edge is the ranking value of the corresponding element in the composite data, and the attribute of the attached edge comprises the actual value of the corresponding element.
2. The method for supporting composite data types in a graph database according to claim 1, wherein the attached edge is provided with a visibility permission.
3. The method for supporting composite data types in a graph database according to claim 1, wherein the attached edges comprise a first edge, and an attribute of the first edge comprises a concurrency control value, the method further comprising:
when a client needs to operate on an attached edge on the first graph element, determining whether the concurrency control value is a controllable value; if so, allowing the client to operate on the attached edge and modifying the controllable value to a unique identifier of the client; otherwise, refusing the operation of the client.
4. The method according to claim 3, wherein when a client needs to operate on an attached edge on the first graph element, the attached edge to be operated on and the first edge are read at the same time.
5. The method for supporting composite data types in a graph database according to claim 3, further comprising:
determining in real time whether the unique identifier of the client exists and whether the client is online, and if either is not the case, modifying the concurrency control value from the unique identifier of the client to the controllable value.
6. The method for supporting composite data types in a graph database according to claim 3, wherein the attribute of the first edge further comprises a composite length, the composite length being the number of elements of the composite data, the method further comprising:
when the first graph element needs to be expanded to a length of M and the concurrency control value has been modified to the unique identifier of the current client, sequentially creating the (N+2)-th to M-th attached edges and modifying the composite length to M, wherein M is larger than N.
7. The method for supporting composite data types in a graph database according to claim 6, wherein the first creation and each subsequent capacity expansion are each treated as a creation event, the method further comprising:
before each creation event is completed, only the data existing before the current creation event can be displayed and operated on;
if the creation of some attached edges fails in a creation event and the retry also fails, performing garbage collection on the attached edges that were successfully created in that creation event.
8. The method for supporting composite data types in a graph database according to claim 6, wherein when the composite data is of a linked list type, the attribute of the attached edge further comprises the rank value of the attached edge corresponding to the next element.
9. The method according to claim 6, wherein when the composite data is of a set type, the rank value of the attached edge is a hash value of the actual value of the corresponding element, and the attribute of the attached edge further comprises the rank value of the attached edge corresponding to the next element.
10. An apparatus for supporting composite data types in a graph database, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for supporting composite data types in a graph database according to any one of claims 1 to 9.
CN202210647635.5A 2021-11-01 2022-06-09 Method and device for supporting composite data types in graph database Active CN114741569B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210647635.5A CN114741569B (en) 2022-06-09 2022-06-09 Method and device for supporting composite data types in graph database
US17/977,226 US20230140423A1 (en) 2021-11-01 2022-10-31 Method and system for storing data in graph database

Publications (2)

Publication Number Publication Date
CN114741569A true CN114741569A (en) 2022-07-12
CN114741569B CN114741569B (en) 2022-09-13

Family

ID=82287265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210647635.5A Active CN114741569B (en) 2021-11-01 2022-06-09 Method and device for supporting composite data types in graph database

Country Status (1)

Country Link
CN (1) CN114741569B (en)

Also Published As

Publication number Publication date
CN114741569B (en) 2022-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant