WO2023083237A1 - Management of graph data - Google Patents

Management of graph data

Info

Publication number
WO2023083237A1
WO2023083237A1 PCT/CN2022/131020 CN2022131020W WO2023083237A1 WO 2023083237 A1 WO2023083237 A1 WO 2023083237A1 CN 2022131020 W CN2022131020 W CN 2022131020W WO 2023083237 A1 WO2023083237 A1 WO 2023083237A1
Authority
WO
WIPO (PCT)
Prior art keywords
edge
index
key
graph
auxiliary
Prior art date
Application number
PCT/CN2022/131020
Other languages
English (en)
French (fr)
Inventor
朱晓伟
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2023083237A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying

Definitions

  • the present disclosure relates to the field of data processing, in particular to a method and device for managing graph data.
  • the key of the primary index of the edge data (i.e. the primary key) does not contain the timestamp of the edge, resulting in low efficiency of the edge scan operation.
  • an auxiliary index can be added on the basis of the primary index.
  • the auxiliary index retains all the properties of the edge.
  • the difference between the auxiliary index and the main index is that the timestamp of the edge is added to the key of the auxiliary index. This allows users to complete edge scan operations in chronological order by accessing the secondary index.
  • edge attributes need to be stored twice in the primary index and the secondary index, so the overhead of storage space is large.
  • the present disclosure provides a method and device for managing graph data, so as to reduce storage space overhead.
  • a method for managing graph data is provided.
  • the method is applied to a graph database, edge data of a time-series graph is stored in the graph database, and various attributes of edges in the time-series graph are recorded in the edge data.
  • the various attributes include timestamps, the index of the edge data includes a main index and an auxiliary index, the value of the main index is the timestamp, the auxiliary index stores the various attributes, and the key of the auxiliary index includes the timestamp.
  • the method includes: receiving a first request, where the first request is used to request to perform an edge scan operation on the edge data according to a time range; and using the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the time-series graph, to obtain edges in the time-series graph whose timestamps fall within the time range.
  • the key and value of the primary index together form the key of the secondary index.
  • the method further includes: when a target edge in the time-series graph needs to be updated or deleted, searching for the target edge in the main index to obtain the position of the target edge in the main index; constructing the key of the auxiliary index according to the key and value recorded at the position; searching for the target edge in the auxiliary index according to the key of the auxiliary index to obtain the position of the target edge in the auxiliary index; and updating or deleting the target edge according to the position of the target edge in the main index and the auxiliary index.
  • the key of the main index sequentially includes the source vertex, edge type and target vertex of the edge of the sequence graph.
  • the key of the auxiliary index sequentially includes the source vertex, edge type, timestamp and target vertex of the edge of the sequence graph.
  • a device for managing graph data is provided, the graph data is stored in a graph database, edge data of a time-series graph is stored in the graph database, and various attributes of edges in the time-series graph are recorded in the edge data
  • the various attributes include timestamps
  • the index of the edge data includes a main index and an auxiliary index
  • the value of the main index is the timestamp
  • the auxiliary index stores the various attributes
  • the key of the auxiliary index includes the timestamp
  • the device includes: a receiving module, configured to receive a first request, where the first request is used to request to perform an edge scan operation on the edge data according to a time range;
  • a scanning module configured to use the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the time-series graph, and obtain edges in the time-series graph whose timestamps fall within the time range.
  • the key and value of the primary index together form the key of the secondary index.
  • the device further includes: a query module, configured to search for the target edge in the main index when a target edge in the time-series graph needs to be updated or deleted, and obtain the position of the target edge in the main index; a construction module, configured to construct the key of the auxiliary index according to the key and value recorded at the position; a search module, configured to search for the target edge in the auxiliary index according to the key of the auxiliary index, and obtain the position of the target edge in the auxiliary index; and an update module, configured to update or delete the target edge according to the position of the target edge in the main index and the auxiliary index.
  • the key of the main index includes the source vertex, edge type and target vertex of the edge of the sequence graph in sequence.
  • the key of the auxiliary index sequentially includes the source vertex, edge type, timestamp and target vertex of the edge of the sequence graph.
  • a device including a memory and a processor, where executable codes are stored in the memory, and the processor is configured to execute the executable codes to implement the method as described in the first aspect.
  • a computer-readable storage medium on which executable code is stored, and when the executable code is executed, the method as described in the first aspect can be implemented.
  • a computer program product including executable code, and when the executable code is executed, the method as described in the first aspect can be implemented.
  • the main index does not store all attributes of edges, but only stores timestamps of edges, thereby reducing data redundancy and storage space overhead.
  • FIG. 1 is a schematic diagram of a graph provided by an embodiment of the present disclosure.
  • Fig. 2 is a schematic diagram of edge data provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of a time-series graph retrieval method provided by an embodiment of the present disclosure.
  • Fig. 4 is a schematic flowchart of a method for updating or deleting edges of a time-series graph provided by an embodiment of the present disclosure.
  • Fig. 5 is a schematic diagram of a storage structure of a primary index of an edge provided by an embodiment of the present disclosure.
  • Fig. 6 is a schematic diagram of a storage structure of an auxiliary index of an edge provided by an embodiment of the present disclosure.
  • Fig. 7 is a schematic structural diagram of a device for managing graph data provided by an embodiment of the present disclosure.
  • Fig. 8 is a schematic structural diagram of a device provided by an embodiment of the present disclosure.
  • a graph database is a data model used to describe the relationship between objects.
  • Graph databases use graph structures for semantic queries, and use vertices, edges, and attributes to represent and store data. Compared with the traditional relational model, for queries involving complex multi-hop relationships, using a graph database is not only more natural in expression, but also more efficient in processing.
  • FIG. 1 is a schematic diagram of a graph provided by an embodiment of the present disclosure. The following takes FIG. 1 as an example to briefly introduce graphs.
  • a graph usually consists of vertices and edges. Vertices can also be called nodes or junctions, and edges can also be called relationships. Vertex data can include vertex ID, vertex type, etc. Vertex ID is used to uniquely identify a vertex. Edge data can include source vertices, destination vertices, and edge attributes. The source vertex of an edge refers to the departure vertex of the edge, and the destination vertex of an edge refers to the vertex to which the edge points. For example, the source vertex of edge 1 in Figure 1 is vertex 0, and the target vertex is vertex 1.
  • Vertex and edge data can be business-related.
  • the vertex identifier may be a person's ID number or personnel number.
  • the properties of vertices and edges can be set according to user needs.
  • the attributes of a vertex can include age, education, address, occupation, etc.
  • the edge attributes may include the relationship between vertices, for example, classmate relationship, colleague relationship, friend relationship and so on.
  • in some cases, the vertices and/or edges of the graph contain timestamp attributes, and such graphs containing timestamp attributes may also be called time-series graphs.
  • Time-series graphs can be stored and managed using a key-value database.
  • a key-value database refers to a database that is organized, indexed, and stored in the form of key-value pairs.
  • a key-value pair consists of a value part and a key part.
  • some attributes in the data for query and/or scanning are specified as keys, and the remaining attribute information of the data is stored as values.
  • the key of the primary index may be referred to as a primary key, and is used to uniquely identify a vertex or an edge in a sequence graph stored and managed using a key-value database. Due to common rules (for example, in some databases, the primary key cannot be updated after the graph database is created), the primary key should generally not contain dynamically changing data (such as timestamp, creation time column, etc.).
  • Fig. 2 is a schematic diagram of edge data provided by an embodiment of the present disclosure.
  • the edge data shown in Figure 2 may be stored in the main index.
  • Primary indexes can be constructed as key-value pairs.
  • the key of the primary index can be, for example, a triple (source vertex, edge type, target vertex), and the value of the primary index can be the remaining attributes of the various attributes of the edge data except the primary key.
  • the attributes of the edge of the graph can have many kinds, and the specific number and type of attributes can be set by the user according to the needs.
  • the value of the main index can include timestamp, quantity, comment, etc. It can be understood that the greater the amount of attribute data, the larger the space occupied by each attribute.
  • the storage space occupied by the data of each edge is larger, that is, the storage space occupied by the main index is larger.
  • edge scan is the core of graph data processing (such as graph database query, iterative graph computing).
  • An edge scan can also be called a range lookup of an edge.
  • the edge scan operation on a time-series graph refers to an operation that, centered on a vertex, scans the edges adjacent to that vertex according to a certain rule, so as to obtain the required edge data.
  • for example, the scan starts from the current vertex and retrieves the data of adjacent edges related to the current vertex whose timestamps are within a certain time range (the time range can be, for example, from the latest edge back to a certain moment, or all edges from a certain moment forward).
  • the edge scan can be implemented through various methods; for example, methods such as binary search and quadrature search can be used to perform the edge scan on the stored edge data.
  • when the key of an index sequentially contains the source vertex, edge type, and destination vertex, the index can be used to efficiently scan all edges originating from vertex X, where vertex X refers to any vertex in the graph.
  • This index can also be used to efficiently scan all edges of type M originating from vertex X.
  • this index can also be used to efficiently scan all edges starting from vertex X, whose edge type is type M, and whose target vertex is vertex Y.
  • however, for an edge scan by time range, the above index setting method is equivalent to knowing only the first attribute of the key of the index. In this case, using this index for the edge scan cannot take full advantage of the key-value index, and the scanning efficiency is low.
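  • The prefix scans described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: it assumes the index is an in-memory list sorted by key, that all key components are integers, and that the sample edges and timestamps are hypothetical. The names `primary_index`, `prefix_scan` and `scan_by_time` are invented for the example.

```python
import bisect

# Hypothetical primary index kept sorted by key.
# Key = (source vertex, edge type, target vertex); value = timestamp.
primary_index = sorted([
    ((0, 0, 1), 105),
    ((0, 0, 2), 101),
    ((0, 0, 3), 109),
    ((1, 0, 2), 103),
    ((3, 0, 2), 107),
])
keys = [key for key, _ in primary_index]


def prefix_scan(prefix):
    """Return every entry whose key starts with `prefix`, using two binary searches."""
    lo = bisect.bisect_left(keys, prefix)
    # Smallest tuple that sorts after every key sharing this prefix
    # (assumes integer key components).
    upper = prefix[:-1] + (prefix[-1] + 1,)
    hi = bisect.bisect_left(keys, upper)
    return primary_index[lo:hi]


def scan_by_time(src, t_start, t_end):
    """Time-range scan on this index: only the source-vertex prefix helps,
    so every edge of the vertex must be read and filtered."""
    return [e for e in prefix_scan((src,)) if t_start <= e[1] <= t_end]
```

  • In this sketch, `prefix_scan((0,))`, `prefix_scan((0, 0))` and `prefix_scan((0, 0, 2))` each touch only a contiguous slice of the index, whereas `scan_by_time` has to inspect all edges of the source vertex because the timestamp is not part of the key.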
  • a sequence graph can contain multiple vertices and multiple edges.
  • operations such as searching or scanning a graph may be decomposed into multiple searching or scanning operations on edges starting from a certain vertex. For example, continuing to refer to FIG. 1, to obtain all edge data starting from vertex 0, it is necessary to scan the edges between vertex 0 and each of vertex 1 to vertex 3, the edge between vertex 1 and vertex 2, and the edge between vertex 3 and vertex 2.
  • the edge scanning process can be decomposed into scanning all edges starting from vertex 0, all edges starting from vertex 1, and all edges starting from vertex 3 respectively, and summarizing the scanning results. It can be seen that the main performance of the sequence graph system usually depends on the scanning efficiency of the edges corresponding to the nodes of the graph.
  • an auxiliary index can be added so that the key of the auxiliary index includes timestamp information.
  • the auxiliary index is continuous data stored in time order.
  • a pointer to the primary index can be stored in the value of the secondary index.
  • when the auxiliary index does not contain the attribute information of the edge that the current scan needs to obtain, it is necessary to first locate the edge in the auxiliary index, then search the main index according to the pointer to the main index stored in the auxiliary index, and return the attribute information stored in the main index.
  • for example, in a social network, user A wants to obtain the comments made about user A by user A's fans within a certain time range. Since only the pointer to the main index is stored in the auxiliary index, after locating the corresponding edge in the auxiliary index, it is necessary to obtain the pointer to the main index from the value of that edge in the auxiliary index, and then locate the position of the edge in the main index according to the pointer to obtain the required comment information.
  • although the auxiliary index is stored in chronological order, the storage order of the main index is random with respect to the timestamp attribute. Therefore, access to the main index is still random access in this case.
  • the main index and auxiliary index set in this way still have the problem of low efficiency.
  • the value of the primary index may also be copied to the value of the secondary index. That is, various attributes of edges are also included in the auxiliary index. In this case, only the secondary index needs to be retrieved when performing an edge scan by time range. Since there is no need to perform edge scan operations on the main index, the scanning efficiency is effectively improved.
  • the present disclosure provides a method and device for managing graph data, so as to solve the problem of large storage space overhead in the prior art.
  • FIG. 3 is a schematic flowchart of a time-series graph retrieval method provided by an embodiment of the present disclosure.
  • the method of the present disclosure is applied to a graph database, and edge data of a time series graph is stored in the graph database.
  • Edge data records various attributes of edges in the time series graph.
  • Various attributes of edges can be set according to user needs. Multiple properties may refer to all properties possessed by the edges of the graph.
  • timestamps may be included among the various attributes.
  • Indexes for edge data include primary and secondary indexes.
  • the value of the primary index is the aforementioned timestamp.
  • the key of the primary index can be set arbitrarily as needed, as long as it meets the setting rules of the primary key in the graph database.
  • the information of an edge can be uniquely determined by the triplet of source vertex, edge type and target vertex. By using the source vertex as the beginning of the key, the edges starting from the same vertex can be gathered together and stored continuously, thus efficiently supporting the scan operation of the vertex-centered edge.
  • a secondary index stores the various attributes, and the key of the secondary index includes the timestamp. That is, the setting method of the index provided by the embodiment of the present disclosure is to store all the various attributes of the edge data of the graph in the auxiliary index, and only store the primary key and the timestamp in the main index.
  • compared with storing the attributes of the edges in both indexes, the embodiments of the present disclosure store the various attributes of the edges only in the auxiliary index, which reduces data redundancy and storage space overhead.
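  • As a rough illustration of this layout, the following Python sketch shows what each index might hold for one edge. It is an assumption-laden sketch rather than the disclosed storage format: plain dictionaries stand in for the key-value store, the timestamp is assumed to live only in the auxiliary key (not repeated in its value), and the attribute names `quantity` and `comment` as well as the function `insert_edge` are hypothetical.

```python
from typing import Any, Dict, Tuple

PrimaryKey = Tuple[int, int, int]          # (source vertex, edge type, target vertex)
SecondaryKey = Tuple[int, int, int, int]   # (source vertex, edge type, timestamp, target vertex)

# Main index: the value is only the timestamp of the edge.
primary_index: Dict[PrimaryKey, int] = {}
# Auxiliary index: the key carries the timestamp, the value carries the remaining attributes.
secondary_index: Dict[SecondaryKey, Dict[str, Any]] = {}


def insert_edge(src: int, etype: int, dst: int, timestamp: int, attrs: Dict[str, Any]) -> None:
    """Write one edge into both indexes; attributes are stored once, in the auxiliary index."""
    primary_index[(src, etype, dst)] = timestamp
    secondary_index[(src, etype, timestamp, dst)] = dict(attrs)


insert_edge(0, 0, 1, 105, {"quantity": 3, "comment": "hello"})
```

  • Only the primary key and the timestamp appear in both indexes in this sketch; the potentially large attribute payload appears once.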
  • the specific composition form of the key of the auxiliary index can be set as required.
  • the keys of a secondary index can be formed from the keys and values of the primary index.
  • the keys of the secondary index may be formed by inserting the value of the primary index into a position of the key of the primary index.
  • for example, when the key of the primary index is <source vertex, edge type, target vertex> in turn and the value of the primary index is a timestamp, the value of the primary index (that is, the timestamp) can be inserted into the key of the primary index to form the key of the secondary index.
  • the key of the auxiliary index can be <source vertex, timestamp, edge type, target vertex>, or <source vertex, edge type, timestamp, target vertex> in sequence.
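  • A minimal Python sketch of this key construction, assuming keys are represented as tuples. The function name `make_secondary_key` and the `position` parameter are hypothetical and chosen only for illustration.

```python
def make_secondary_key(primary_key, primary_value, position=2):
    """Form an auxiliary-index key by inserting the primary-index value
    (the timestamp) into the primary-index key at `position`."""
    timestamp = primary_value
    return primary_key[:position] + (timestamp,) + primary_key[position:]


# Primary key (source vertex, edge type, target vertex) with timestamp 105.
key_a = make_secondary_key((0, 0, 1), 105)              # timestamp in third place
key_b = make_secondary_key((0, 0, 1), 105, position=1)  # timestamp in second place
```

  • With `position=2` the result follows the <source vertex, edge type, timestamp, target vertex> order, and with `position=1` the <source vertex, timestamp, edge type, target vertex> order.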
  • the secondary index may be an ordered index sorted by key. Edge data can be made to be stored sequentially in time order in the secondary index by setting the timestamp in the key of the secondary index.
  • the continuous storage in time order may mean that in the storage space corresponding to the auxiliary index, the edge data of the graph is continuously stored in the storage space in time order.
  • because the edge data is stored in order in the auxiliary index, only a limited number of lookup operations are required (for example, locating the edge data corresponding to the start timestamp of the time range and the edge data corresponding to the end timestamp) to obtain all the edge data to be scanned. Therefore, the scanning efficiency when scanning edges by time is effectively improved.
  • in step S310, a first request is received, and the first request is used to request to perform an edge scan operation on edge data according to a time range.
  • the first request can be automatically generated according to user requirements.
  • the first request may be a search formula, or a search word, etc.
  • the first request can also be manually input by the user. This application does not limit the specific implementation and specific form of the first request.
  • the time range can be set arbitrarily as required. For example, you can set the time range to all times from a timestamp onwards. You can also set the time range as a period of time from a certain moment to another moment.
  • the scan operation is performed on the edge data according to the time range.
  • a certain vertex may be used as a source vertex, and edges adjacent to the source vertex and whose timestamps are within a certain time range are scanned.
  • in step S320, using the timestamp in the key of the auxiliary index, an edge scan operation is performed on the edges in the time-series graph to obtain edges in the time-series graph whose timestamps fall within the time range.
  • the key of the auxiliary index may sequentially include the source vertex, edge type, timestamp and target vertex of the edge of the sequence graph.
  • in some scenarios, it is necessary to obtain edge data that starts from a source vertex, belongs to a certain edge type, and falls within a certain time range.
  • for example, in a social network, obtain the fans (edge type) gained by a certain user (source vertex) within a time range (timestamp). Setting the timestamp in the third position of the key of the auxiliary index can improve the efficiency of the edge scan for the above scenario.
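  • A time-range edge scan over such an auxiliary index can be sketched in Python as below. This is only a sketch under stated assumptions: the index is an in-memory list sorted by key, timestamps are integers, the range is inclusive at both ends, and the sample data and the names `secondary_index` and `scan_time_range` are hypothetical.

```python
import bisect

# Hypothetical auxiliary index, sorted by key.
# Key = (source vertex, edge type, timestamp, target vertex); value = edge attributes.
secondary_index = sorted(
    [
        ((0, 0, 105, 1), {"comment": "a"}),
        ((0, 0, 101, 2), {"comment": "b"}),
        ((0, 0, 109, 3), {"comment": "c"}),
        ((1, 0, 103, 2), {"comment": "d"}),
        ((3, 0, 107, 2), {"comment": "e"}),
    ],
    key=lambda entry: entry[0],
)
keys = [key for key, _ in secondary_index]


def scan_time_range(src, etype, t_start, t_end):
    """Return edges of `src` with type `etype` whose timestamp is in [t_start, t_end].

    Two binary searches locate the start and end of the range; everything
    in between is contiguous because the index is ordered by key.
    """
    lo = bisect.bisect_left(keys, (src, etype, t_start))
    hi = bisect.bisect_left(keys, (src, etype, t_end + 1))
    return secondary_index[lo:hi]
```

  • For instance, `scan_time_range(0, 0, 100, 106)` reads only the two entries with timestamps 101 and 105, and their attributes are available directly from the auxiliary index without touching the main index.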
  • the auxiliary index stores various attributes of the edge data. That is, all attribute information of edges is stored in the secondary index. Since the main index does not store all the attributes of the edge, but only stores the timestamp of the edge, data redundancy and storage space overhead are reduced.
  • for example, user C initiates a transaction with user D, such as lending money to user D.
  • after the loan transaction is completed, it is necessary to update the data of the target edge with user C as the source vertex, user D as the target vertex, and loan as the edge type in the graph database.
  • the present disclosure also provides a method for updating or deleting edges.
  • Fig. 4 is a schematic flowchart of a method for updating or deleting edges of a time-series graph provided by an embodiment of the present disclosure.
  • the method for managing graph data provided by the present disclosure may further include: in step S410, when a target edge in the time-series graph needs to be updated or deleted, searching for the target edge in the main index to obtain the position of the target edge in the main index.
  • the target edge refers to an edge that needs to be updated or deleted in the graph database. Since an edge in graph data represents a relationship between two vertices, when updating or deleting a target edge, the timestamp of the target edge is usually not known. At this time, according to the previous introduction to the use of indexes, if the target edge is directly located in the auxiliary index, the advantages of the auxiliary index cannot be fully utilized, and the positioning efficiency is low.
  • the key of the primary index does not include time stamp information. Therefore, when an edge is updated or deleted, the target edge can be quickly located through the primary index.
  • in step S420, the key of the auxiliary index is constructed according to the key and value recorded at the position.
  • the key and value of the primary index may jointly form the key of the secondary index.
  • for the specific formation method, please refer to the description of FIG. 3.
  • for example, the key of the primary index is <source vertex, edge type, target vertex> and the value of the primary index is the timestamp.
  • in this case, the key of the secondary index is <source vertex, edge type, timestamp, target vertex>.
  • in steps S430 to S440, according to the key of the auxiliary index, the target edge is searched in the auxiliary index to obtain the position of the target edge in the auxiliary index.
  • the target edge is updated or deleted according to its position in the primary index and the secondary index.
  • the position of the target edge in the auxiliary index can be quickly located. After determining the position of the target edge in the primary index and the auxiliary index, corresponding update or delete operations can be performed on the records of the target edge in the primary index and the auxiliary index.
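  • Steps S410 to S440 can be sketched in Python as follows. This is a simplified illustration rather than the disclosed implementation: dictionaries stand in for the two ordered indexes, the class `EdgeStore` and its methods are hypothetical names, and the handling of an update that changes the timestamp (re-keying the auxiliary entry) is an assumption of this sketch.

```python
class EdgeStore:
    """Toy store with a main index (key -> timestamp) and an auxiliary index
    (key including timestamp -> attributes)."""

    def __init__(self):
        self.primary = {}    # (src, etype, dst) -> timestamp
        self.secondary = {}  # (src, etype, timestamp, dst) -> attributes

    def insert(self, src, etype, dst, timestamp, attrs):
        self.primary[(src, etype, dst)] = timestamp
        self.secondary[(src, etype, timestamp, dst)] = dict(attrs)

    def _locate(self, src, etype, dst):
        pkey = (src, etype, dst)
        timestamp = self.primary[pkey]        # S410: find the edge in the main index
        skey = (src, etype, timestamp, dst)   # S420: build the auxiliary key from key + value
        self.secondary[skey]                  # S430: find the edge in the auxiliary index
        return pkey, skey

    def delete(self, src, etype, dst):
        pkey, skey = self._locate(src, etype, dst)
        del self.primary[pkey]                # S440: remove the edge from both indexes
        del self.secondary[skey]

    def update(self, src, etype, dst, new_timestamp, new_attrs):
        pkey, skey = self._locate(src, etype, dst)
        # Assumption: a new timestamp changes the auxiliary key, so the old
        # auxiliary entry is replaced by one under the new key.
        del self.secondary[skey]
        self.primary[pkey] = new_timestamp
        self.secondary[(src, etype, new_timestamp, dst)] = dict(new_attrs)


store = EdgeStore()
store.insert(0, 0, 1, 105, {"quantity": 1})
store.update(0, 0, 1, 120, {"quantity": 2})
```

  • The caller never needs to know the timestamp of the target edge: it is read from the main index and then used to address the auxiliary index directly.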
  • the method for updating or deleting edges of the graph database provided in this embodiment can realize fast location and deletion of edges, thereby further improving the efficiency of managing edges of the graph database.
  • the storage method of the index can be greatly optimized. That is, the primary and secondary indexes only duplicate the key and timestamp of the primary index. For a graph database with a lot of attribute information of edge data, using this storage method can reduce data redundancy and effectively improve the utilization rate of storage space.
  • the key of the auxiliary index can be jointly formed by the key and value of the primary index, so when updating and deleting edges, it can realize fast positioning and processing.
  • the graph data management method of the present disclosure will be introduced in detail below by taking FIG. 2 , FIG. 5 , and FIG. 6 as examples.
  • the graph shown in FIG. 2 includes timestamp information, so it can also be called a time-series graph.
  • the graph shown in FIG. 2 includes 4 vertices (vertex 0, vertex 1, vertex 2, and vertex 3), and 5 edges (edge 1, edge 2, edge 3, edge 4, and edge 5).
  • there may be multiple edge types corresponding to edges in a graph, and each edge may correspond to a different edge type.
  • for example, the type of edge 1 may be friend and the type of edge 2 may be colleague.
  • all edges correspond to the same edge type, which is recorded as type 0.
  • Type 0 can refer to any edge type, for example, it can refer to a friend type or a loan type.
  • FIG. 5 is a schematic diagram of a storage structure of a primary index of an edge provided by an embodiment of the present disclosure.
  • the key of the primary index is <source vertex, edge type, target vertex>, and the key and value of the primary index are stored in a continuous storage space in the order corresponding to the key of the primary index.
  • the edge data of the time-series graph shown in FIG. 2 may be continuously stored in the order of edge 1, edge 4, edge 3, edge 2, and edge 5.
  • stored in this way, the primary index of the edges cannot be kept in chronological order.
  • therefore, the efficiency of retrieval by time range using the main index is very low.
  • Fig. 6 is a schematic diagram of a storage structure of an auxiliary index provided by an embodiment of the present disclosure.
  • an auxiliary index containing timestamp information in the key may be added.
  • the attributes (attribute 1, . . . , attribute n) in the values of the auxiliary indexes shown in FIG. 6 are only exemplary to indicate that the edges of the graph may contain multiple attributes. Attribute 1, attribute n can refer to any edge attribute (for example, attribute 1 can refer to the number of loans, and attribute n can refer to the number of reviews).
  • the auxiliary index provided by this embodiment is stored sequentially in the order of keys. That is, in the auxiliary index provided in this embodiment, the edge data of the sequence diagram shown in FIG. 2 are stored in the order of edge 1, edge 4, edge 3, edge 2, and edge 5. It can be seen that in this embodiment, the storage order of the edges of the main index and the auxiliary index is different.
  • for a target edge of which only the source vertex, edge type and target vertex are known, since the target vertex is located in the fourth position of the key of the auxiliary index, only the first two components of the key of the auxiliary index (source vertex and edge type) can be used for the lookup, so it is not possible to directly locate the position of the target edge.
  • the timestamp information of the target edge is usually not known. At this time, the efficiency of locating the target edge through the auxiliary index is low.
  • the embodiment of the present disclosure sets that the key of the secondary index is jointly formed by the key and value of the primary index.
  • an edge scan operation is first performed in the main index to quickly locate the position of the target edge in the main index. Since the timestamp of the target edge is stored in the value of the primary index, the key and value of the primary index of the target edge can be used to form the key of the secondary index.
  • the key of the main index can be set as (source vertex, edge type, target vertex), and the value of the main index is timestamp.
  • the keys of the secondary index are (source vertex, edge type, timestamp, target vertex).
  • after the edge scan locates the target edge in the primary index, the key and value of the primary index can be spliced into the key of the secondary index.
  • using this key, the edge scan operation in the auxiliary index can quickly locate the position of the target edge in the auxiliary index. After determining the position of the target edge in the primary index and the secondary index, the data of the target edge is updated or deleted.
  • Fig. 7 is a schematic structural diagram of a device for managing graph data provided by an embodiment of the present disclosure.
  • the apparatus shown in FIG. 7 can be used in the above-mentioned method of managing graph data.
  • the device 700 shown in FIG. 7 may include a receiving module 710 and a scanning module 720.
  • the graph data is stored in a graph database, edge data of a time-series graph is stored in the graph database, various attributes of edges in the time-series graph are recorded in the edge data, the various attributes include timestamps, the index of the edge data includes a main index and an auxiliary index, the value of the main index is the timestamp, the auxiliary index stores the various attributes, and the key of the auxiliary index includes the timestamp.
  • the receiving module 710 may be configured to receive a first request, where the first request is used to request to perform an edge scan operation on the edge data according to a time range.
  • the scanning module 720 may be configured to use the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the time-series graph, and obtain edges in the time-series graph whose timestamps fall within the time range.
  • Using the device shown in FIG. 7 to manage graph data can reduce data redundancy and reduce storage space overhead.
  • the key and value of the primary index together form the key of the secondary index.
  • the apparatus 700 may further include a query module 730, a construction module 740, a search module 750 and an update module 760.
  • the query module 730 may be configured to search the target edge in the main index to obtain the position of the target edge in the main index when the target edge in the sequence graph needs to be updated or deleted.
  • the construction module 740 may be configured to construct the key of the auxiliary index according to the key and value recorded at the position.
  • the search module 750 may be configured to search the target edge in the auxiliary index according to the key of the auxiliary index, and obtain the position of the target edge in the auxiliary index.
  • the update module 760 may be configured to update or delete the target edge according to the position of the target edge in the main index and the auxiliary index.
  • the key of the main index includes the source vertex, edge type and target vertex of the edge of the sequence graph in sequence.
  • the key of the auxiliary index sequentially includes the source vertex, edge type, timestamp and target vertex of the edge of the sequence graph.
  • Fig. 8 is a schematic structural diagram of a device provided by an embodiment of the present disclosure.
  • the apparatus 800 may be, for example, a user terminal or a portable device.
  • the apparatus 800 may include a memory 810 and a processor 820 .
  • Memory 810 may be used to store executable code.
  • the processor 820 can be used to execute the executable code stored in the memory 810, so as to realize the steps in the various methods described above.
  • the apparatus 800 may further include a network interface 830, through which the processor 820 can exchange data with external devices.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present disclosure in any way.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • When implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions according to the embodiments of the present disclosure will be generated.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (such as infrared, radio, microwave, etc.).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, a magnetic tape), an optical medium (such as a Digital Video Disc (DVD)), or a semiconductor medium (such as a Solid State Disk (SSD)), etc.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

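The present disclosure provides a method and apparatus for managing graph data. The method is applied to a graph database that stores edge data of a temporal graph. The edge data records multiple attributes of the edges in the temporal graph, and the multiple attributes include a timestamp. The index of the edge data includes a primary index and an auxiliary index; the value of the primary index is the timestamp, the auxiliary index stores the multiple attributes, and the key of the auxiliary index includes the timestamp. The method includes: receiving a first request, which requests an edge scan operation on the edge data according to a time range; and using the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.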

Description

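Management of graph data
Technical Field
The present disclosure relates to the field of data processing, and in particular to a method and apparatus for managing graph data.
Background
When processing a temporal graph, it is often necessary to perform an edge scan operation on the graph's edge data according to a time range, in order to obtain the edges whose timestamps fall within that range. However, the key of the primary index of the edge data (that is, the primary key) does not contain the edge's timestamp, which makes the edge scan operation inefficient.
To make edge scans more efficient, some prior art introduces an auxiliary index (secondary index) on top of the primary index. Like the primary index, this auxiliary index retains all attributes of an edge; it differs from the primary index in that the edge's timestamp is added to its key. A user can then complete an edge scan in chronological order by accessing the auxiliary index.
However, this prior art has to store the edge attributes twice, once in the primary index and once in the auxiliary index, so the storage overhead is large.
Summary
In view of this, the present disclosure provides a method and apparatus for managing graph data that reduce the storage overhead.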
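In a first aspect, a method of managing graph data is provided. The method is applied to a graph database that stores edge data of a temporal graph; the edge data records multiple attributes of the edges in the temporal graph, and the multiple attributes include a timestamp; the index of the edge data includes a primary index and an auxiliary index; the value of the primary index is the timestamp; the auxiliary index stores the multiple attributes; and the key of the auxiliary index includes the timestamp. The method includes: receiving a first request, the first request requesting an edge scan operation on the edge data according to a time range; and using the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.
Optionally, in some embodiments, the key and value of the primary index jointly form the key of the auxiliary index.
Optionally, in some embodiments, the method further includes: when a target edge in the temporal graph needs to be updated or deleted, looking up the target edge in the primary index to obtain the position of the target edge in the primary index; constructing the key of the auxiliary index from the key and value recorded at that position; looking up the target edge in the auxiliary index according to the key of the auxiliary index to obtain the position of the target edge in the auxiliary index; and updating or deleting the target edge according to its positions in the primary index and the auxiliary index.
Optionally, in some embodiments, the key of the primary index contains, in order, the source vertex, edge type, and target vertex of an edge of the temporal graph.
Optionally, in some embodiments, the key of the auxiliary index contains, in order, the source vertex, edge type, timestamp, and target vertex of an edge of the temporal graph.
In a second aspect, an apparatus for managing graph data is provided. The graph data is stored in a graph database that stores edge data of a temporal graph; the edge data records multiple attributes of the edges in the temporal graph, and the multiple attributes include a timestamp; the index of the edge data includes a primary index and an auxiliary index; the value of the primary index is the timestamp; the auxiliary index stores the multiple attributes; and the key of the auxiliary index includes the timestamp. The apparatus includes: a receiving module, configured to receive a first request, the first request requesting an edge scan operation on the edge data according to a time range; and a scanning module, configured to use the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.
Optionally, in some embodiments, the key and value of the primary index jointly form the key of the auxiliary index.
Optionally, in some embodiments, the apparatus further includes: a query module, configured to look up a target edge in the primary index when the target edge in the temporal graph needs to be updated or deleted, obtaining the position of the target edge in the primary index; a construction module, configured to construct the key of the auxiliary index from the key and value recorded at that position; a search module, configured to look up the target edge in the auxiliary index according to the key of the auxiliary index, obtaining the position of the target edge in the auxiliary index; and an update module, configured to update or delete the target edge according to its positions in the primary index and the auxiliary index.
Optionally, in some embodiments, the key of the primary index contains, in order, the source vertex, edge type, and target vertex of an edge of the temporal graph.
Optionally, in some embodiments, the key of the auxiliary index contains, in order, the source vertex, edge type, timestamp, and target vertex of an edge of the temporal graph.
In a third aspect, a device is provided, including a memory and a processor; the memory stores executable code, and the processor is configured to execute the executable code to implement the method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which executable code is stored; when executed, the executable code can implement the method of the first aspect.
In a fifth aspect, a computer program product is provided, including executable code; when executed, the executable code can implement the method of the first aspect.
In embodiments of the present disclosure, the primary index does not store all attributes of an edge but only the edge's timestamp, which reduces data redundancy and storage overhead.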
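Brief Description of the Drawings
FIG. 1 is a schematic diagram of a graph provided by an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of edge data provided by an embodiment of the present disclosure.
FIG. 3 is a schematic flowchart of a temporal graph retrieval method provided by an embodiment of the present disclosure.
FIG. 4 is a schematic flowchart of a method for updating or deleting an edge of a temporal graph provided by an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of the storage structure of the primary index of edges provided by an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of the storage structure of the auxiliary index of edges provided by an embodiment of the present disclosure.
FIG. 7 is a schematic structural diagram of an apparatus for managing graph data provided by an embodiment of the present disclosure.
FIG. 8 is a schematic structural diagram of an apparatus provided by an embodiment of the present disclosure.
Detailed Description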
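The technical solutions of the present disclosure are described below with reference to the accompanying drawings.
A graph database is a data model for describing the relationships between objects. A graph database uses graph structures for semantic queries and uses vertices, edges, and attributes to represent and store data. Compared with the traditional relational model, queries involving complex multi-hop relationships are both more natural to express and more efficient to process in a graph database.
FIG. 1 is a schematic diagram of a graph provided by an embodiment of the present disclosure. Graphs are briefly introduced below, taking FIG. 1 as an example.
As shown in FIG. 1, a graph generally includes vertices and edges. A vertex may also be called a node, and an edge may also be called a relationship. The data of a vertex may include a vertex identifier, a vertex type, and so on. The vertex identifier uniquely identifies a vertex. The data of an edge may include a source vertex, a target vertex, and edge attributes. The source vertex of an edge is the vertex the edge starts from, and the target vertex is the vertex the edge points to. For example, edge 1 in FIG. 1 has vertex 0 as its source vertex and vertex 1 as its target vertex.
Vertex and edge data may be business-related. For example, in a social network scenario, the vertex identifier may be a person's ID card number or personnel number. The attributes of vertices and edges can be set according to user needs. For example, vertex attributes may include age, education, address, occupation, and so on. Edge attributes may include the relationship between vertices, such as classmate, colleague, or friend.
In some embodiments, the vertices and/or edges of a graph contain a timestamp attribute; such a graph may also be called a temporal graph. A temporal graph can be stored and managed with a key-value database, that is, a database that is organized, indexed, and stored in the form of key-value pairs. A key-value pair consists of a key part and a value part. Typically, some attributes of the data that are used for queries and/or scans are designated as the key, and the remaining attributes are stored as the value.
The key of the primary index may be called the primary key; it uniquely identifies a vertex or an edge in a temporal graph that is stored and managed with a key-value database. Because of common rules (for example, in some databases the primary key cannot be updated after the graph database is created), the primary key should generally not contain dynamically changing data (such as timestamps or creation-time columns).
FIG. 2 is a schematic diagram of edge data provided by an embodiment of the present disclosure. The edge data shown in FIG. 2 may be stored in the primary index, which may be built in the form of key-value pairs. The key of the primary index may be, for example, the triple (source vertex, edge type, target vertex), and the value of the primary index may be the remaining attributes of the edge data other than the primary key.
An edge of a graph can have many attributes; the specific number and kinds of attributes can be set by the user as needed. For example, for the edge data shown in FIG. 2, the value of the primary index may include a timestamp, a quantity, a comment, and so on. Understandably, the more attributes there are and the more space each attribute takes, the more storage space each edge's data occupies, and hence the more storage space the primary index occupies.
For a graph database, edge scanning is at the core of graph data processing (for example, graph database queries and iterative graph computation). An edge scan may also be called a range lookup of edges. In some embodiments, an edge scan operation on a temporal graph is an operation that, centered on one vertex, scans the edges adjacent to that vertex according to some rule in order to obtain the required edge data. For example, it scans the data of all edges that start from the current vertex and whose timestamps fall within a certain time range (for example, from the newest edge back to some moment, or all edges going back from some moment). Edge scans can be implemented in various ways; for example, binary search or quaternary search may be used to scan the stored edge data.
For edge scans, the way the index key is laid out determines what the index can be used for. For example, when the index key contains, in order, the source vertex, edge type, and target vertex, the index can efficiently scan all edges starting from vertex X, where X is any vertex in the graph. The index can also efficiently scan all edges that start from vertex X and have edge type M. And of course it can efficiently scan all edges that start from vertex X, have edge type M, and have target vertex Y.
However, if only the source vertex X and the target vertex Y are known and the edge type is not, then with the key layout above this amounts to knowing only the first attribute of the index key. An edge scan with this index then cannot take full advantage of the key-value index, and the scan is inefficient.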
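The effect of key order on scans can be illustrated with a minimal Python sketch. It assumes the index is a sorted list of (source vertex, edge type, target vertex) tuples; the sample edges and the `prefix_scan` helper are hypothetical and are not taken from the disclosure.

```python
import bisect

# Hypothetical index keys, kept sorted: (source_vertex, edge_type, target_vertex).
primary_keys = sorted([
    (0, "friend", 1),
    (0, "friend", 3),
    (0, "loan", 2),
    (1, "friend", 2),
    (3, "loan", 2),
])

def prefix_scan(keys, prefix):
    """Return all keys that start with `prefix`: one binary search, then a sequential read."""
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if key[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching run has ended
        matches.append(key)
    return matches

# Known prefix (source vertex, edge type): the matching run is located directly.
loans_from_0 = prefix_scan(primary_keys, (0, "loan"))

# Source and target known, edge type unknown: only the first key field helps,
# so every edge leaving vertex 0 has to be read and then filtered.
from_0_to_2 = [k for k in prefix_scan(primary_keys, (0,)) if k[2] == 2]
```

The second query still returns the right edges, but it reads every edge leaving vertex 0 first, which is the inefficiency described above.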
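A temporal graph can contain multiple vertices and multiple edges. In some embodiments, operations such as graph lookups or scans can be decomposed into multiple lookups or scans of the edges starting from individual vertices. For example, referring again to FIG. 1, to obtain all the edge data starting from vertex 0, one needs to scan the edges from vertex 0 to vertices 1 through 3, the edge between vertex 1 and vertex 2, and the edge from vertex 3 to vertex 2. This scan can be decomposed into scanning all edges starting from vertex 0, all edges starting from vertex 1, and all edges starting from vertex 3, and then merging the results. The main performance of a temporal graph system therefore usually depends on how efficiently the edges of each vertex can be scanned.
To make edge scans more efficient, an auxiliary index can be added whose key contains timestamp information. In some embodiments, the auxiliary index is then contiguous data stored in chronological order. The value of the auxiliary index may store a pointer to the primary index.
However, when the auxiliary index does not contain the edge attribute that the current scan needs, then after the entry is located in the auxiliary index, the primary index must be looked up via the pointer stored in the auxiliary index, and the attribute stored in the primary index is returned.
For example, in a social network, user A wants to obtain the comments that user A's followers made about user A within a certain time range. Because the auxiliary index stores only pointers to the primary index, after the corresponding edge is located in the auxiliary index, the pointer to the primary index must still be obtained from the value of that auxiliary-index entry. The pointer is then used to locate the edge in the primary index and obtain the required comments.
Understandably, although the auxiliary index is stored in chronological order, the primary index is stored in an order that is random with respect to the timestamp attribute. Accesses to the primary index are therefore still random accesses, and a primary index and auxiliary index set up this way remain inefficient.
In some embodiments, the value of the primary index may instead be copied into the value of the auxiliary index, so that the auxiliary index also contains the edge's multiple attributes. An edge scan by time range then only needs to search the auxiliary index. Because no further edge scan of the primary index is needed, scan efficiency improves.
However, because all attributes are stored in both the primary index and the auxiliary index, this approach has a large space overhead. Especially when edges have many attributes or the attribute data is large, storing the data twice produces a great deal of redundancy.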
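To solve the above problem, the present disclosure provides a method and apparatus for managing graph data that address the large storage overhead of the prior art.
The temporal graph retrieval method provided by embodiments of the present disclosure is described in detail below with reference to FIG. 3 to FIG. 6. FIG. 3 is a schematic flowchart of a temporal graph retrieval method provided by an embodiment of the present disclosure.
The method of the present disclosure is applied to a graph database that stores edge data of a temporal graph. The edge data records multiple attributes of the edges in the temporal graph. These attributes can be set according to user needs and may refer to all the attributes an edge has. In some embodiments, the multiple attributes may include a timestamp.
The index of the edge data includes a primary index and an auxiliary index. In some embodiments, the value of the primary index is the above timestamp. The key of the primary index can be set freely as needed, as long as it satisfies the graph database's rules for primary keys. In some embodiments, the key of the primary index may be set to contain, in order, the source vertex, edge type, and target vertex of an edge of the temporal graph. The triple of source vertex, edge type, and target vertex uniquely determines an edge. Placing the source vertex at the start of the key clusters the edges that start from the same vertex so that they are stored contiguously, which efficiently supports vertex-centric edge scans.
The auxiliary index stores the multiple attributes, and the key of the auxiliary index includes the timestamp. In other words, in the index layout provided by embodiments of the present disclosure, all of the attributes of the graph's edge data are stored in the auxiliary index, while the primary index stores only the primary key and the timestamp.
Compared with a traditional graph database, which stores all of an edge's attributes in the primary index, embodiments of the present disclosure store only the edge's timestamp in the value of the primary index and store all of the edge's attributes in the auxiliary index, which reduces data redundancy and storage overhead.
The specific composition of the auxiliary index key can be set as needed. For example, the key of the auxiliary index may be formed jointly from the key and value of the primary index. In some embodiments, the auxiliary index key may be formed by inserting the value of the primary index at some position in the primary index key. For example, when the primary index key is, in order, <source vertex, edge type, target vertex> and the primary index value is the timestamp, the value (that is, the timestamp) can be inserted into the primary index key to form the auxiliary index key. Specifically, the auxiliary index key may be, in order, <source vertex, timestamp, edge type, target vertex> or <source vertex, edge type, timestamp, target vertex>.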
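The key construction described above can be sketched in a few lines of Python. The function name `make_aux_key`, the tuple representation of keys, and the sample values are assumptions made for illustration only.

```python
def make_aux_key(primary_key, timestamp, timestamp_position=2):
    """Insert the primary-index value (the timestamp) into the primary-index key.

    primary_key is (source_vertex, edge_type, target_vertex).
    timestamp_position=2 gives (source, edge_type, timestamp, target);
    timestamp_position=1 gives (source, timestamp, edge_type, target).
    """
    if timestamp_position not in (1, 2):
        raise ValueError("this sketch only places the timestamp at position 1 or 2")
    fields = list(primary_key)
    fields.insert(timestamp_position, timestamp)
    return tuple(fields)

primary_key = (0, "type0", 1)   # hypothetical edge from vertex 0 to vertex 1
timestamp = 1636588800          # hypothetical timestamp value

aux_key_type_first = make_aux_key(primary_key, timestamp)
aux_key_time_first = make_aux_key(primary_key, timestamp, timestamp_position=1)
```

Which position to choose depends on the queries to be served: placing the timestamp after the edge type favors scans restricted to one edge type, while placing it directly after the source vertex favors time-range scans across all edge types of a vertex.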
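In some embodiments, the auxiliary index may be an ordered index sorted by key. Placing the timestamp in the auxiliary index key lets the edge data be stored contiguously in chronological order in the auxiliary index. In some embodiments, being stored contiguously in chronological order means that, within the storage space of the auxiliary index, the graph's edge data is laid out contiguously in time order. During an edge scan, it suffices to locate the position in the auxiliary index of the edge corresponding to the start timestamp and/or the end timestamp of the time range to be scanned. The remaining edges can be read sequentially starting from that located position.
Because the auxiliary index is stored in order, only a limited number of edge scan operations (for example, locating the positions of the edge data corresponding to the start timestamp and the end timestamp of the time range) are needed to obtain all the edge data to be scanned. This effectively improves the efficiency of scanning edges by time.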
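A minimal sketch of such a time-range scan follows. It assumes the auxiliary index is a sorted Python list of (source vertex, edge type, timestamp, target vertex) tuples with integer timestamps; the sample data and the `scan_time_range` helper are hypothetical stand-ins for a real ordered key-value store.

```python
import bisect

# Hypothetical auxiliary-index keys, kept sorted:
# (source_vertex, edge_type, timestamp, target_vertex).
aux_keys = sorted([
    (0, "type0", 10, 1),
    (0, "type0", 20, 3),
    (0, "type0", 30, 2),
    (1, "type0", 15, 2),
    (3, "type0", 25, 2),
])

def scan_time_range(keys, src, edge_type, t_start, t_end):
    """Return keys of edges from `src` with `edge_type` and t_start <= timestamp <= t_end.

    Assumes integer timestamps, so t_end + 1 is the first timestamp outside the range.
    """
    # Two binary searches locate both ends of the range ...
    lo = bisect.bisect_left(keys, (src, edge_type, t_start))
    hi = bisect.bisect_left(keys, (src, edge_type, t_end + 1))
    # ... and everything in between is one contiguous, sequential read.
    return keys[lo:hi]

edges_20_to_30 = scan_time_range(aux_keys, 0, "type0", 20, 30)
```

Only the two boundary positions are searched for; no edge outside the requested range is touched.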
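Referring to FIG. 3, in step S310, a first request is received; the first request requests an edge scan operation on the edge data according to a time range.
In some embodiments, the first request may be generated automatically according to user needs. The first request may be a search expression, a search term, or the like. Of course, the first request may also be entered manually by the user. The present application does not limit how the first request is implemented or what form it takes.
The time range can be set freely as needed. For example, the time range may cover all time from a certain moment onward, or it may be a period from one moment to another.
In some embodiments, performing a scan on the edge data according to a time range may mean taking a vertex as the source vertex and scanning the edges that are adjacent to that source vertex and whose timestamps lie within a certain time range.
In step S320, the timestamp in the key of the auxiliary index is used to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.
In some embodiments, the key of the auxiliary index may contain, in order, the source vertex, edge type, timestamp, and target vertex of an edge of the temporal graph.
Many application scenarios need the edge data that starts from a source vertex, belongs to a certain edge type, and lies within a certain time range. For example, in a social network, one may want the number of followers (edge type) that a given user (source vertex) gained within a time range (timestamp). Placing the timestamp in the third position of the auxiliary index key makes edge scans for such scenarios more efficient.
As described above, in the graph database to which the method of embodiments of the present disclosure is applied, the auxiliary index stores the multiple attributes of the edge data; that is, all attribute information of an edge is stored in the auxiliary index. Because the primary index stores only the edge's timestamp rather than all of the edge's attributes, data redundancy and storage overhead are reduced.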
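Graph processing frequently needs to update or delete edges.
For example, in a social network, after user A unfollows user B, the target edge that has user A as its source vertex, user B as its target vertex, and "follow" as its edge type must be deleted from the graph database.
As another example, in a lending network, user C initiates a transaction with user D, such as user C taking a loan from user D. After the loan transaction completes, the data of the target edge that has user C as its source vertex, user D as its target vertex, and "loan" as its edge type must be updated in the graph database.
To make updating or deleting edges of the graph database more efficient, the present disclosure also provides a method for updating or deleting edges.
FIG. 4 is a schematic flowchart of a method for updating or deleting an edge of a temporal graph provided by an embodiment of the present disclosure. As shown in FIG. 4, the method of managing graph data provided by the present disclosure may further include: in step S410, when a target edge in the temporal graph needs to be updated or deleted, looking up the target edge in the primary index to obtain the position of the target edge in the primary index.
The target edge is an edge in the graph database that needs to be updated or deleted. Because an edge in graph data represents the relationship between two vertices, the timestamp of the target edge is usually not known when the edge is updated or deleted. In that case, as the earlier discussion of index usage shows, locating the target edge directly in the auxiliary index cannot take full advantage of the auxiliary index, and locating is inefficient.
In embodiments of the present disclosure, the key of the primary index does not include timestamp information, so when an edge is updated or deleted, the target edge can be located quickly through the primary index.
In step S420, the key of the auxiliary index is constructed from the key and value recorded at that position.
In this embodiment, the key and value of the primary index can jointly form the key of the auxiliary index. For the specific way of forming it, see the description accompanying FIG. 3.
Once the target edge has been located through the primary index, the auxiliary index key can be constructed easily from the primary index key and value. For example, suppose the primary index key is <source vertex, edge type, target vertex>, the primary index value is the timestamp, and the auxiliary index key is <source vertex, edge type, timestamp, target vertex>. After the primary index key has been used to locate the target edge's position in the primary index, the value recorded there is read to obtain the target edge's timestamp. Inserting that timestamp into the primary index key forms the auxiliary index key.
In steps S430 and S440, the target edge is looked up in the auxiliary index according to the auxiliary index key to obtain the position of the target edge in the auxiliary index, and the target edge is updated or deleted according to its positions in the primary index and the auxiliary index.
Once the auxiliary index key has been determined, it can be used to quickly locate the target edge's position in the auxiliary index. With the target edge's positions in both the primary index and the auxiliary index determined, the corresponding update or delete can be applied to the target edge's records in both indexes.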
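Steps S410 to S440 can be sketched as a toy Python class. Plain dictionaries stand in for the two ordered key-value indexes, and the class name `TemporalEdgeStore` and its methods are hypothetical; a real graph database would use its own storage engine and would also have to keep the two writes consistent.

```python
class TemporalEdgeStore:
    """Toy two-index edge store; plain dicts stand in for ordered key-value indexes."""

    def __init__(self):
        self.primary = {}    # (src, edge_type, dst) -> timestamp
        self.auxiliary = {}  # (src, edge_type, timestamp, dst) -> other attributes

    def _aux_key(self, primary_key):
        # S410 + S420: read the timestamp from the primary index,
        # then splice it into the primary key to form the auxiliary key.
        src, edge_type, dst = primary_key
        timestamp = self.primary[primary_key]  # KeyError if the edge is absent
        return (src, edge_type, timestamp, dst)

    def upsert(self, src, edge_type, dst, timestamp, attrs):
        primary_key = (src, edge_type, dst)
        if primary_key in self.primary:
            # The timestamp is part of the auxiliary key, so the old entry goes first.
            del self.auxiliary[self._aux_key(primary_key)]
        self.primary[primary_key] = timestamp
        self.auxiliary[(src, edge_type, timestamp, dst)] = dict(attrs)

    def delete(self, src, edge_type, dst):
        primary_key = (src, edge_type, dst)
        aux_key = self._aux_key(primary_key)
        del self.auxiliary[aux_key]     # S430 + S440 on the auxiliary index
        del self.primary[primary_key]   # S440 on the primary index


store = TemporalEdgeStore()
store.upsert("A", "follow", "B", 100, {"comment": "hi"})
store.upsert("A", "follow", "B", 200, {"comment": "again"})  # update moves the auxiliary entry
```

Note that the caller never supplies the old timestamp: it is recovered from the primary index, which is what makes the auxiliary lookup an exact-key lookup rather than a scan.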
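The method for updating or deleting edges of a graph database provided by this embodiment enables fast locating and deletion of edges, which further improves the efficiency of managing the edges of the graph database.
Setting the value of the primary index to the timestamp of the temporal graph's edge data, and forming the key of the auxiliary index from the key and value of the primary index, greatly optimizes how the indexes are stored: the only data stored in both the primary index and the auxiliary index is the primary index key and the timestamp. For a graph database whose edge data has a great many attributes, this storage method reduces data redundancy and effectively improves storage utilization. In addition, because the auxiliary index key can be formed from the primary index key and value, edges can be located and processed quickly when they are updated or deleted.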
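The storage saving can be made concrete with a rough back-of-the-envelope estimate. The sketch below ignores any per-entry overhead of a real storage engine, and the byte sizes passed in are invented for illustration; they do not come from the disclosure.

```python
def bytes_per_edge(key_size, timestamp_size, other_attrs_size, duplicate_attrs):
    """Rough per-edge footprint of both indexes, ignoring storage-engine overhead."""
    # Auxiliary index in both designs: key (primary key + timestamp) plus other attributes.
    auxiliary = key_size + timestamp_size + other_attrs_size
    if duplicate_attrs:
        # Design that copies the attributes: the primary index also keeps every attribute.
        primary = key_size + timestamp_size + other_attrs_size
    else:
        # Design with a timestamp-only primary value: key plus timestamp.
        primary = key_size + timestamp_size
    return primary + auxiliary

# Hypothetical sizes: 24-byte primary key, 8-byte timestamp, 200 bytes of other attributes.
duplicated = bytes_per_edge(24, 8, 200, duplicate_attrs=True)
timestamp_only = bytes_per_edge(24, 8, 200, duplicate_attrs=False)
```

Under these assumed sizes the per-edge footprint drops from 464 to 264 bytes, and the gap grows with the size of the non-timestamp attributes, since those are the only part no longer stored twice.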
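The graph data management method of the present disclosure is described concretely below, taking FIG. 2, FIG. 5, and FIG. 6 as examples. The graph shown in FIG. 2 includes timestamp information, so it may also be called a temporal graph. The graph shown in FIG. 2 includes 4 vertices (vertex 0, vertex 1, vertex 2, and vertex 3) and 5 edges (edge 1, edge 2, edge 3, edge 4, and edge 5).
Understandably, the edges in a graph can have multiple edge types, and each edge can have a different edge type. For example, the type of edge 1 may be "friend" and the type of edge 2 may be "colleague". To simplify the description, all edges in the temporal graph shown in FIG. 2 have the same edge type, denoted type 0. Type 0 can be any edge type, for example a friend type or a lending type.
The temporal graph shown in FIG. 2 can be stored using the primary index and auxiliary index layouts shown in FIG. 5 and FIG. 6. FIG. 5 is a schematic diagram of the storage structure of the primary index of edges provided by an embodiment of the present disclosure. In FIG. 5, the key of the primary index is <source vertex, edge type, target vertex>, and the keys and values of the primary index are stored in a contiguous storage space in the order of the primary index keys. For example, the edge data of the temporal graph shown in FIG. 2 may be stored contiguously in the order edge 1, edge 4, edge 3, edge 2, edge 5.
Because the key of the primary index does not contain the timestamp, the primary index of edges cannot be stored in chronological order. When the edge data of a vertex within some time period is needed, retrieval through the primary index is very inefficient. Taking FIG. 5 as an example, to obtain the edge data that starts from vertex 0 and has a timestamp in the range [timestamp 2, timestamp 3], an edge scan must be performed over all the edge data starting from vertex 0.
FIG. 6 is a schematic diagram of the storage structure of the auxiliary index provided by an embodiment of the present disclosure. To make edge scans more efficient, an auxiliary index whose key contains timestamp information can be added, as shown in FIG. 6. The attributes (attribute 1, …, attribute n) in the value of the auxiliary index shown in FIG. 6 merely illustrate that an edge of the graph may have multiple attributes. Attribute 1 and attribute n can be any edge attributes (for example, attribute 1 may be the loan amount and attribute n may be the number of comments).
The auxiliary index provided by this embodiment is stored in key order. That is, in the auxiliary index provided by this embodiment, the edge data of the temporal graph shown in FIG. 2 is stored in the order edge 1, edge 4, edge 3, edge 2, edge 5. As can be seen, in this embodiment the edges are stored in different orders in the primary index and the auxiliary index.
In the auxiliary index shown in FIG. 6, to obtain the edge data that starts from vertex 0 and has a timestamp greater than or equal to timestamp 2, it suffices to first locate the index position of edge 4 through an edge scan and then read the remaining edges sequentially.
However, for a target edge of which only the source vertex, edge type, and target vertex are known, using the above auxiliary index amounts to indexing with only the first two positions of the key (source vertex and edge type), because the target vertex is in the fourth position of the auxiliary index key; the position of the target edge therefore cannot be located directly. When a target edge needs to be updated or deleted, its timestamp is usually unknown, so locating the target edge through the auxiliary index is inefficient.
To keep updates and deletions of edges efficient, embodiments of the present disclosure form the key of the auxiliary index jointly from the key and value of the primary index. In this embodiment, when a target edge is updated or deleted, an edge scan is first performed in the primary index to quickly locate the target edge's position in the primary index. Because the value of the primary index stores the target edge's timestamp, the auxiliary index key can be assembled from the target edge's primary index key and value. Specifically, the primary index key may be set to (source vertex, edge type, target vertex), the primary index value to the timestamp, and the auxiliary index key to (source vertex, edge type, timestamp, target vertex).
After the edge scan has located the target edge in the primary index, the auxiliary index key can be spliced together from the primary index key and value. An edge scan in the auxiliary index with the spliced key then quickly locates the target edge's position in the auxiliary index. Once the target edge's positions in the primary index and the auxiliary index have been determined, the target edge's data is updated or deleted.
The above analysis shows that the primary index and auxiliary index layout provided by the present disclosure, in addition to reducing data redundancy and improving space utilization, also keeps updates and deletions of edges efficient.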
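The method embodiments of the present disclosure have been described in detail above with reference to FIG. 1 to FIG. 6; the apparatus embodiments are described in detail below with reference to FIG. 7 and FIG. 8. It should be understood that the descriptions of the method embodiments and the apparatus embodiments correspond to each other, so for parts not described in detail, refer to the preceding method embodiments.
FIG. 7 is a schematic structural diagram of an apparatus for managing graph data provided by an embodiment of the present disclosure. The apparatus shown in FIG. 7 can be used to carry out the above method of managing graph data. The apparatus 700 shown in FIG. 7 may include a receiving module 710 and a scanning module 720.
In the apparatus shown in FIG. 7, the graph data is stored in a graph database that stores edge data of a temporal graph; the edge data records multiple attributes of the edges in the temporal graph, and the multiple attributes include a timestamp; the index of the edge data includes a primary index and an auxiliary index; the value of the primary index is the timestamp; the auxiliary index stores the multiple attributes; and the key of the auxiliary index includes the timestamp.
As shown in FIG. 7, the receiving module 710 may be configured to receive a first request, the first request requesting an edge scan operation on the edge data according to a time range.
The scanning module 720 may be configured to use the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.
Managing graph data with the apparatus shown in FIG. 7 can reduce data redundancy and storage overhead.
Optionally, in some embodiments, the key and value of the primary index jointly form the key of the auxiliary index.
Optionally, in some embodiments, the apparatus 700 may further include a query module 730, a construction module 740, a search module 750, and an update module 760.
The query module 730 may be configured to look up a target edge in the primary index when the target edge in the temporal graph needs to be updated or deleted, obtaining the position of the target edge in the primary index.
The construction module 740 may be configured to construct the key of the auxiliary index from the key and value recorded at that position.
The search module 750 may be configured to look up the target edge in the auxiliary index according to the key of the auxiliary index, obtaining the position of the target edge in the auxiliary index.
The update module 760 may be configured to update or delete the target edge according to its positions in the primary index and the auxiliary index.
Optionally, in some embodiments, the key of the primary index contains, in order, the source vertex, edge type, and target vertex of an edge of the temporal graph.
Optionally, in some embodiments, the key of the auxiliary index contains, in order, the source vertex, edge type, timestamp, and target vertex of an edge of the temporal graph.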
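FIG. 8 is a schematic structural diagram of an apparatus provided by an embodiment of the present disclosure. The apparatus 800 may be, for example, a user terminal or a portable device. The apparatus 800 may include a memory 810 and a processor 820. The memory 810 may be used to store executable code. The processor 820 may be used to execute the executable code stored in the memory 810 to implement the steps of the methods described above. In some embodiments, the apparatus 800 may further include a network interface 830, through which the processor 820 can exchange data with external devices.
It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present disclosure in any way.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (for example infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a Digital Video Disc (DVD)), or a semiconductor medium (for example a Solid State Disk (SSD)).
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments of the present disclosure can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The above is merely a specific implementation of the present disclosure, and the scope of protection of the present disclosure is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be determined by the scope of protection of the claims.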

Claims (11)

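  1. A method of managing graph data, the method being applied to a graph database, wherein the graph database stores edge data of a temporal graph, the edge data records multiple attributes of the edges in the temporal graph, the multiple attributes include a timestamp, the index of the edge data includes a primary index and an auxiliary index, the value of the primary index is the timestamp, the auxiliary index stores the multiple attributes, and the key of the auxiliary index includes the timestamp,
    the method comprising:
    receiving a first request, the first request requesting an edge scan operation on the edge data according to a time range;
    using the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.
  2. The method according to claim 1, wherein the key and value of the primary index jointly form the key of the auxiliary index.
  3. The method according to claim 2, the method further comprising:
    when a target edge in the temporal graph needs to be updated or deleted, looking up the target edge in the primary index to obtain the position of the target edge in the primary index;
    constructing the key of the auxiliary index from the key and value recorded at that position;
    looking up the target edge in the auxiliary index according to the key of the auxiliary index to obtain the position of the target edge in the auxiliary index;
    updating or deleting the target edge according to the positions of the target edge in the primary index and the auxiliary index.
  4. The method according to claim 1, wherein the key of the primary index contains, in order, the source vertex, edge type, and target vertex of an edge of the temporal graph.
  5. The method according to claim 1, wherein the key of the auxiliary index contains, in order, the source vertex, edge type, timestamp, and target vertex of an edge of the temporal graph.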
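  6. An apparatus for managing graph data, wherein the graph data is stored in a graph database, the graph database stores edge data of a temporal graph, the edge data records multiple attributes of the edges in the temporal graph, the multiple attributes include a timestamp, the index of the edge data includes a primary index and an auxiliary index, the value of the primary index is the timestamp, the auxiliary index stores the multiple attributes, and the key of the auxiliary index includes the timestamp,
    the apparatus comprising:
    a receiving module, configured to receive a first request, the first request requesting an edge scan operation on the edge data according to a time range;
    a scanning module, configured to use the timestamp in the key of the auxiliary index to perform the edge scan operation on the edges in the temporal graph, obtaining the edges in the temporal graph whose timestamps fall within the time range.
  7. The apparatus according to claim 6, wherein the key and value of the primary index jointly form the key of the auxiliary index.
  8. The apparatus according to claim 7, the apparatus further comprising:
    a query module, configured to look up a target edge in the primary index when the target edge in the temporal graph needs to be updated or deleted, obtaining the position of the target edge in the primary index;
    a construction module, configured to construct the key of the auxiliary index from the key and value recorded at that position;
    a search module, configured to look up the target edge in the auxiliary index according to the key of the auxiliary index, obtaining the position of the target edge in the auxiliary index;
    an update module, configured to update or delete the target edge according to the positions of the target edge in the primary index and the auxiliary index.
  9. The apparatus according to claim 6, wherein the key of the primary index contains, in order, the source vertex, edge type, and target vertex of an edge of the temporal graph.
  10. The apparatus according to claim 6, wherein the key of the auxiliary index contains, in order, the source vertex, edge type, timestamp, and target vertex of an edge of the temporal graph.
  11. An apparatus for managing graph data, comprising a memory and a processor, wherein the memory stores executable code and the processor is configured to execute the executable code to implement the method according to any one of claims 1-5.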
PCT/CN2022/131020 2021-11-11 2022-11-10 Management of graph data WO2023083237A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111329919.1A CN113779286B (zh) 2021-11-11 2021-11-11 Method and apparatus for managing graph data
CN202111329919.1 2021-11-11

Publications (1)

Publication Number Publication Date
WO2023083237A1 true WO2023083237A1 (zh) 2023-05-19

Family

ID=78873692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131020 WO2023083237A1 (zh) 2021-11-11 2022-11-10 Management of graph data

Country Status (2)

Country Link
CN (1) CN113779286B (zh)
WO (1) WO2023083237A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779286B (zh) * 2021-11-11 2022-02-08 支付宝(杭州)信息技术有限公司 Method and apparatus for managing graph data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
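CN103593433A (zh) * 2013-11-12 2014-02-19 中国科学院信息工程研究所 Graph data processing method and system for massive time-series data
CN105138528A (zh) * 2014-06-09 2015-12-09 腾讯科技(深圳)有限公司 Method and apparatus for storing and reading multi-value data, and system for accessing it
CN108256088A (zh) * 2018-01-23 2018-07-06 清华大学 Storage method and system for time-series data based on a key-value database
CN108572958A (zh) * 2017-03-07 2018-09-25 腾讯科技(深圳)有限公司 Data processing method and apparatus
CN113190718A (zh) * 2021-04-28 2021-07-30 百度在线网络技术(北京)有限公司 Data processing method and apparatus for a graph database, electronic device, and storage medium
CN113779286A (zh) * 2021-11-11 2021-12-10 支付宝(杭州)信息技术有限公司 Method and apparatus for managing graph data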

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210216516A1 (en) * 2014-05-28 2021-07-15 GraphSQL Inc. Management of a secondary vertex index for a graph
CN105488050B (zh) * 2014-09-17 2019-03-08 阿里巴巴集团控股有限公司 Database multi-index method, apparatus, and system
CN104615677B (zh) * 2015-01-20 2018-02-09 同济大学 Graph data access method and system
CN104899156B (zh) * 2015-05-07 2017-11-14 中国科学院信息工程研究所 Graph data storage and query method for large-scale social networks
CN105095371B (zh) * 2015-06-29 2018-08-10 清华大学 Graph data management method and apparatus for temporal graphs
US11256746B2 (en) * 2016-04-25 2022-02-22 Oracle International Corporation Hash-based efficient secondary indexing for graph data stored in non-relational data stores
CN111694834A (zh) * 2019-03-15 2020-09-22 杭州海康威视数字技术股份有限公司 Method, apparatus, and device for loading graph data into a database, and readable storage medium
CN110941619B (zh) * 2019-12-02 2023-05-16 浪潮软件股份有限公司 Method for defining graph data storage models and structures for multiple usage scenarios
CN112363979B (zh) * 2020-09-18 2023-08-04 杭州欧若数网科技有限公司 Distributed indexing method and system based on a graph database

Also Published As

Publication number Publication date
CN113779286A (zh) 2021-12-10
CN113779286B (zh) 2022-02-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892031

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18572132

Country of ref document: US