WO2023056928A1 - Data Storage and Query - Google Patents

Data Storage and Query

Info

Publication number
WO2023056928A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
outbound
node
query
neighbor
Prior art date
Application number
PCT/CN2022/123782
Inventor
张松清
江进
付治钧
朱炳鹏
袁琳
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2023056928A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Definitions

  • The embodiments of this specification relate generally to the field of data processing, and in particular to a data storage method and device, a data query method and device, and a database system for graph data.
  • Graph data is used in an ever wider range of application scenarios, and the volume of graph data keeps growing.
  • In-memory storage is constrained by memory capacity and price, which limits the scale of data that can be stored and makes it unsuitable for massive graph data, so graph data needs to be stored on a data storage medium such as a disk.
  • Existing graph data storage schemes cannot store graph data on a data storage medium in a way that supports efficient data queries.
  • The embodiments of this specification provide a data storage and query solution.
  • With this solution, graph data can be stored on the data storage medium in a hybrid vertex-edge (point-edge) layout, enabling efficient data queries.
  • A data storage method is provided, including: determining the number of neighbor graph nodes of each start graph node in the directed graph data to be stored; determining a data storage mode according to the number of neighbor graph nodes of each start graph node; and, for each start graph node, when the data storage mode is non-super-large-point data storage, storing the node data, neighbor information, outgoing-edge index feature information and outgoing-edge data of the start graph node in a first start graph node data block of a first data storage medium, where the outgoing-edge index feature information includes the outgoing-edge index features of all outgoing edges of the start graph node, and each outgoing-edge index feature is mapped to an outgoing-edge data index used to index the corresponding outgoing-edge data stored in the first start graph node data block.
  • For each start graph node, when the data storage mode is super-large-point data storage, the node data, neighbor information, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node are stored in a second start graph node data block of a second data storage medium, where the outgoing-edge index feature range information includes a plurality of outgoing-edge index feature ranges, each mapped to an outgoing-edge data block index; and the outgoing-edge data and outgoing-edge data storage address information of the start graph node are stored in at least two outgoing-edge data blocks of a third data storage medium.
  • The outgoing-edge data storage address information includes two-tuples <outgoing-edge index feature of the outgoing-edge data, relative storage address of the outgoing-edge data within the outgoing-edge data block>.
  • The data storage mode may be determined once for all start graph nodes in the directed graph data, or separately for each start graph node.
  • The node data includes the node ID and node attributes of the start graph node.
  • The neighbor information includes the node ID and the neighbor attributes of the start graph node.
  • The neighbor attributes include the basic information of each outgoing edge of the start graph node.
  • The outgoing-edge data includes outgoing-edge identifiers and outgoing-edge attributes.
  • The basic information of each outgoing edge includes the node ID of the end graph node of the outgoing edge and the outgoing-edge index feature of the outgoing edge, and the outgoing-edge identifier includes the node ID of the end graph node and the outgoing-edge index feature.
  • The basic information of each outgoing edge may further include the node type of the end graph node and/or the edge type of the outgoing edge.
  • The outgoing-edge identifier may further include the outgoing-edge type.
  • The node data may further include node metadata.
  • The node metadata includes the node index feature and/or the node type of the start graph node.
  • The index features include timestamps.
  • The outgoing-edge index feature information includes the outgoing-edge timestamps of all outgoing edges, sorted in descending order.
  • The outgoing-edge index feature range information includes multiple outgoing-edge timestamp ranges obtained after sorting the outgoing-edge timestamps in descending order.
  • Each outgoing-edge timestamp range stores the maximum and minimum outgoing-edge timestamps of the corresponding outgoing-edge data block.
  • The first start graph node data block and the second start graph node data block may also store reverse neighbor information, and/or each outgoing-edge data block may also store the number of outgoing edges it contains.
  • Storing the node data, neighbor information, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node in the second start graph node data block may include: storing the node data, neighbor index feature range, neighbor data block indexes, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node in the second start graph node data block of the second data storage medium, and storing the neighbor information in at least two neighbor data blocks of a fourth data storage medium.
  • The first, second and third data storage media may each include one or more data storage media, and some of them may be implemented with the same data storage medium.
  • Both non-super-large-point data storage and super-large-point data storage may be implemented as key-value storage.
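As a rough illustration of the two storage modes above, the following Python sketch lays out one start graph node in a key-value store. It is a minimal sketch under stated assumptions, not the patented implementation: a plain dict stands in for the key-value store, and the key scheme `node/<id>` and `edge/<id>/<k>`, the `Edge` type, the `store_node` helper and its `threshold` and `block_size` parameters are all hypothetical. List positions stand in for relative storage addresses, and the mode is decided per start graph node.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    dst: str  # node ID of the end graph node
    ts: int  # outgoing-edge index feature (a timestamp)
    attrs: dict = field(default_factory=dict)  # outgoing-edge attributes


def store_node(kv, node_id, node_attrs, edges, threshold, block_size):
    """Store one start graph node in `kv`, a dict standing in for a KV store.

    Hypothetical key scheme: "node/<id>" for the start graph node data block,
    "edge/<id>/<k>" for the k-th outgoing-edge data block.
    """
    # Sort outgoing edges by timestamp in descending order.
    edges = sorted(edges, key=lambda e: e.ts, reverse=True)
    # Neighbor information: basic info (end node ID, index feature) per edge.
    neighbors = [(e.dst, e.ts) for e in edges]
    # Outgoing-edge data: identifier (end node ID, index feature) + attributes.
    edge_data = [((e.dst, e.ts), e.attrs) for e in edges]
    num_neighbors = len({e.dst for e in edges})

    if num_neighbors <= threshold:
        # Non-super-large-point: one start graph node data block holds it all.
        # Position i of "edge_ts" is the outgoing-edge data index of edge_data[i].
        kv[f"node/{node_id}"] = {
            "mode": "non_super_large",
            "node": node_attrs,
            "neighbors": neighbors,
            "edge_ts": [e.ts for e in edges],
            "edge_data": edge_data,
        }
        return

    # Super-large-point: split the sorted edges into outgoing-edge data blocks.
    ranges, block_keys = [], []
    for start in range(0, len(edges), block_size):
        chunk = edges[start:start + block_size]
        key = f"edge/{node_id}/{start // block_size}"
        kv[key] = {
            "count": len(chunk),  # number of outgoing edges in this block
            # Address info: (index feature, relative position within the block).
            "addr": [(e.ts, i) for i, e in enumerate(chunk)],
            "data": edge_data[start:start + block_size],
        }
        # Range info: (max, min) timestamp; chunk is sorted descending.
        ranges.append((chunk[0].ts, chunk[-1].ts))
        block_keys.append(key)

    kv[f"node/{node_id}"] = {
        "mode": "super_large",
        "node": node_attrs,
        "neighbors": neighbors,
        "edge_ranges": ranges,
        "edge_blocks": block_keys,
    }
```

Keeping the timestamps sorted in descending order is what makes each block's (max, min) pair a contiguous range, so later lookups can binary-search both the per-node timestamps and the block ranges.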
  • A data query method is also provided, including: in response to receiving a data query request initiated by a user, determining the data block index of the graph node to be queried based on its node ID, where the directed graph data has been stored in the first, second and/or third data storage medium according to the method above; reading the start graph node data block indexed by the data block index from the first or second data storage medium into the memory of the data query device and parsing it; according to the parsed start graph node data block, obtaining the query data for the data query request from the locally parsed data of the data query device or from the outgoing-edge data blocks of the third data storage medium; and providing the obtained query data to the user.
  • The node data includes the node ID and node attributes of the start graph node.
  • The neighbor information includes the node ID and the neighbor attributes of the start graph node.
  • The neighbor attributes include the basic information of each outgoing edge of the start graph node.
  • The outgoing-edge data includes outgoing-edge identifiers and outgoing-edge attributes.
  • Obtaining the query data from the locally parsed data of the data query device or from the outgoing-edge data blocks of the third data storage medium may include: when the data query request asks for the node attributes of the queried graph node, obtaining the node attributes from the node data of the parsed start graph node data block as the query data; when the request asks for the neighbor attributes of the queried graph node, obtaining the neighbor attributes from the parsed neighbor information as the query data; or, when the request asks for the outgoing-edge attributes of the queried graph node, determining the outgoing-edge index feature of the target outgoing edge from the neighbor information of the parsed start graph node data block, determining the outgoing-edge data index of the target outgoing edge based on that index feature and the outgoing-edge index feature information, and obtaining the outgoing-edge attributes of the target outgoing edge from the outgoing-edge data indexed by that outgoing-edge data index in the parsed start graph node data block, as the query data.
  • the data query request includes a filter condition.
  • Obtaining the node attributes from the node data of the parsed start graph node data block may include: when the data query request asks for the node attributes of the queried graph node, querying and filtering the parsed node data based on the filter condition in the request, and obtaining the node attributes from the filtered node data.
  • Obtaining the neighbor attributes from the parsed neighbor information may include: when the data query request asks for the neighbor attributes of the queried graph node, querying and filtering the parsed neighbor information based on the filter condition in the request, and obtaining the neighbor attributes from the filtered neighbor information.
  • Determining the outgoing-edge index feature of the target outgoing edge from the parsed neighbor information of the start graph node data block may include: when the data query request asks for the outgoing-edge attributes of the queried graph node, determining, from the parsed neighbor information, the outgoing-edge index feature of the target outgoing edge that satisfies the filter condition.
  • Obtaining the outgoing-edge attributes of the target outgoing edge, as the query data, from the outgoing-edge data block of the third data storage medium indexed by the outgoing-edge data block index may include: reading that outgoing-edge data block from the third data storage medium into the memory of the data query device; parsing the outgoing-edge data storage address information in the read block; based on the outgoing-edge index feature of the target outgoing edge, determining from the parsed address information the relative storage address of the target outgoing edge's data within the block; obtaining and parsing the target outgoing edge's data from the read block according to that relative storage address; and taking the outgoing-edge attributes in the parsed data as the query data.
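The read path above (parse the address information, locate the target edge by its index feature, then parse only that edge's data) might look like the following Python sketch. The byte layout (a 4-byte edge count, then one `(timestamp, relative address)` entry per edge, then JSON-encoded edge attributes) and the helpers `encode_block` and `read_edge_attrs` are hypothetical assumptions for illustration; the specification does not define an on-disk format.

```python
import json
import struct

ENTRY = struct.Struct("<qI")  # address entry: (timestamp, relative address)
COUNT = struct.Struct("<I")  # number of outgoing edges in the block


def encode_block(edges):
    """Serialize [(ts, attrs), ...] as: count | address entries | payloads."""
    payloads = [json.dumps(attrs).encode() for _, attrs in edges]
    parts, offset = [COUNT.pack(len(edges))], 0
    for (ts, _), payload in zip(edges, payloads):
        parts.append(ENTRY.pack(ts, offset))  # offset relative to data region
        offset += len(payload)
    return b"".join(parts + payloads)


def read_edge_attrs(block, target_ts):
    """Parse the address info, locate the target edge, parse only its data."""
    (n,) = COUNT.unpack_from(block, 0)
    data_start = COUNT.size + n * ENTRY.size
    addrs = [ENTRY.unpack_from(block, COUNT.size + i * ENTRY.size) for i in range(n)]
    for i, (ts, offset) in enumerate(addrs):
        if ts == target_ts:
            # A payload ends where the next one begins (or at the block end).
            end = addrs[i + 1][1] if i + 1 < n else len(block) - data_start
            return json.loads(block[data_start + offset:data_start + end])
    return None
```

The sketch scans the address entries linearly for clarity; because the entries are written in timestamp order, a binary search could be used instead. If several edges share a timestamp, it returns the first match.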
  • The outgoing-edge index feature includes an outgoing-edge timestamp.
  • Determining the outgoing-edge data index of the target outgoing edge based on the outgoing-edge index feature and the outgoing-edge index feature information may include: using the outgoing-edge timestamp, performing a binary search in the outgoing-edge timestamp information to determine the outgoing-edge data index of the target outgoing edge.
  • Determining the outgoing-edge data block index of the target outgoing edge based on the outgoing-edge index feature and the outgoing-edge index feature range information may include: using the outgoing-edge timestamp, performing a binary search in the outgoing-edge timestamp range information to determine the outgoing-edge data block index of the target outgoing edge.
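Assuming, as described earlier, that outgoing-edge timestamps are stored in descending order and each block's range is recorded as its (maximum, minimum) timestamp, the two binary searches could be sketched in Python as follows. The function names are hypothetical; each returns -1 when no position or block matches.

```python
def find_edge_index(ts_desc, target):
    """Binary search in descending-sorted timestamps; return position or -1."""
    lo, hi = 0, len(ts_desc) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if ts_desc[mid] == target:
            return mid
        if ts_desc[mid] > target:
            lo = mid + 1  # descending order: smaller timestamps lie to the right
        else:
            hi = mid - 1
    return -1


def find_block_index(ranges, target):
    """ranges: [(max_ts, min_ts), ...] for blocks in descending order."""
    lo, hi = 0, len(ranges) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        max_ts, min_ts = ranges[mid]
        if target > max_ts:
            hi = mid - 1  # target is newer than this block: search left
        elif target < min_ts:
            lo = mid + 1  # target is older than this block: search right
        else:
            return mid
    return -1
```

With duplicate timestamps, `find_edge_index` returns one matching position, not necessarily the first.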
  • The second start graph node data block may store the node data, neighbor index feature range, neighbor data block indexes, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node, while the neighbor information of the start graph node is stored in at least two neighbor data blocks of the fourth data storage medium.
  • In this case, the data query method may further include: when the data query request asks for the neighbor attributes of the queried graph node, determining the neighbor data block index based on the neighbor index feature and the neighbor index feature range information; and reading the neighbor data block indexed by that neighbor data block index from the fourth data storage medium into the memory of the data query device and parsing it.
  • A data storage device is also provided, including: a node number determination unit, which determines the number of neighbor graph nodes of each start graph node in the directed graph data to be stored; a data storage mode determination unit, which determines the data storage mode according to the number of neighbor graph nodes of each start graph node; and a data storage unit, which, for each start graph node, when the data storage mode is non-super-large-point data storage, stores the node data, neighbor information, outgoing-edge index feature information and outgoing-edge data of the start graph node in the first start graph node data block of the first data storage medium, where the outgoing-edge index feature information includes the outgoing-edge index features of all outgoing edges of the start graph node.
  • Each outgoing-edge index feature is mapped to the outgoing-edge data index used to index the corresponding outgoing-edge data stored in the first start graph node data block. When the data storage mode is super-large-point data storage, the data storage unit stores the node data, neighbor information, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node in the second start graph node data block of the second data storage medium.
  • The outgoing-edge index feature range information includes a plurality of outgoing-edge index feature ranges, each mapped to an outgoing-edge data block index.
  • The outgoing-edge data and outgoing-edge data storage address information of the start graph node are stored in at least two outgoing-edge data blocks of the third data storage medium, and the outgoing-edge data storage address information includes two-tuples <outgoing-edge index feature of the outgoing-edge data, relative storage address of the outgoing-edge data within the outgoing-edge data block>.
  • The data storage mode determination unit determines the data storage mode once for all start graph nodes in the directed graph data, or separately for each start graph node.
  • The data storage unit may store the node data, neighbor index feature range, neighbor data block indexes, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node in the second start graph node data block of the second data storage medium, store the neighbor information in at least two neighbor data blocks of the fourth data storage medium, and store the outgoing-edge data and outgoing-edge data storage address information of the start graph node in at least two outgoing-edge data blocks of the third data storage medium.
  • A data query device is also provided, including: a data block index determination unit, which, in response to receiving a data query request initiated by a user, determines the data block index of the graph node to be queried based on its node ID, where the directed graph data has been stored in the first, second and/or third data storage medium according to the method above; a data reading unit, which reads the start graph node data block indexed by the data block index from the first or second data storage medium into the memory of the data query device; and a data parsing unit, which parses the read start graph node data block.
  • The device further includes a query data acquisition unit, which, according to the parsed start graph node data block, obtains the query data for the data query request from the locally parsed data of the data query device or from the outgoing-edge data blocks of the third data storage medium; and a query data providing unit, which provides the obtained query data to the user.
  • The node data includes the node ID and node attributes of the start graph node.
  • The neighbor information includes the node ID and the neighbor attributes of the start graph node.
  • The neighbor attributes include the basic information of each outgoing edge of the start graph node.
  • The outgoing-edge data includes outgoing-edge identifiers and outgoing-edge attributes.
  • The query data acquisition unit obtains the node attributes from the node data of the parsed start graph node data block as the query data.
  • The query data acquisition unit obtains the neighbor attributes from the parsed neighbor information as the query data.
  • The query data acquisition unit determines the outgoing-edge index feature of the target outgoing edge from the neighbor information of the parsed start graph node data block. It then either determines the outgoing-edge data index of the target outgoing edge based on that index feature and the outgoing-edge index feature information, and obtains the outgoing-edge attributes of the target outgoing edge from the outgoing-edge data indexed by that outgoing-edge data index in the parsed start graph node data block; or determines the outgoing-edge data block index of the target outgoing edge based on that index feature and the outgoing-edge index feature range information, and obtains the outgoing-edge attributes of the target outgoing edge from the outgoing-edge data block of the third data storage medium indexed by that outgoing-edge data block index. In both cases the obtained attributes are the query data.
  • the data query request includes a filter condition.
  • the query data acquisition unit further performs query filtering on the parsed node data of the initial graph node data block based on the filter condition in the data query request.
  • the query data obtaining unit performs query filtering on the parsed neighbor information based on the filter condition in the data query request.
  • The query data acquisition unit further determines, from the parsed neighbor information of the start graph node data block, the outgoing-edge index feature of the target outgoing edge that satisfies the filter condition.
  • the data reading unit reads the outbound data block indexed by the outbound data block index from a third data storage medium into the memory of the data query device.
  • The data parsing unit parses the outgoing-edge data storage address information in the read outgoing-edge data block.
  • Based on the outgoing-edge index feature of the target outgoing edge, the query data acquisition unit determines, from the parsed address information, the relative storage address of the target outgoing edge's data within the outgoing-edge data block, and obtains and parses the target outgoing edge's data from the read outgoing-edge data block according to that relative storage address.
  • The second start graph node data block may store the node data, neighbor index feature range, neighbor data block indexes, outgoing-edge index feature range information and outgoing-edge data block indexes of the start graph node, and the neighbor information is stored in at least two neighbor data blocks of the fourth data storage medium.
  • The query data acquisition unit determines the neighbor data block index based on the neighbor index feature and the neighbor index feature range information, and the data reading unit reads the neighbor data block indexed by that neighbor data block index from the fourth data storage medium into the memory of the data query device.
  • the data parsing unit further parses the read neighbor data blocks.
  • A database system is also provided, including: the data storage device described above; the data query device described above; and at least one data storage medium, including the first data storage medium, the second data storage medium and/or the third data storage medium.
  • A data storage device is also provided, including: at least one processor, a memory coupled to the at least one processor, and a computer program stored in the memory, where the at least one processor executes the computer program to implement the data storage method described above.
  • A data query device is also provided, including: at least one processor, a memory coupled to the at least one processor, and a computer program stored in the memory, where the at least one processor executes the computer program to implement the data query method described above.
  • A computer-readable storage medium is also provided, storing executable instructions that, when executed, cause a processor to perform the data storage method or the data query method described above.
  • A computer program product is also provided, including a computer program that, when executed by a processor, implements the data storage method or the data query method described above.
  • Fig. 1 shows an example schematic diagram of a database system according to an embodiment of the present specification.
  • Fig. 2 shows an example flow chart of a data storage method according to an embodiment of the specification.
  • Figure 3 shows an example schematic diagram of directed graph data.
  • Fig. 4 shows an example schematic diagram of a non-super-large-point data storage process according to an embodiment of the present specification.
  • Fig. 5 shows an example schematic diagram of a first start graph node data block according to an embodiment of the present specification.
  • Fig. 6 shows an example flow chart of a super-large-point data storage process according to an embodiment of the present specification.
  • Fig. 7 shows another example flow chart of the super-large-point data storage process according to an embodiment of the present specification.
  • Fig. 8 shows an example flow chart of a data query method according to an embodiment of this specification.
  • FIG. 9 shows an example flow chart of a query data acquisition process according to an embodiment of the present specification.
  • Fig. 10 shows an example flowchart of a method for acquiring neighbor attributes from neighbor data blocks of a fourth data storage medium according to an embodiment of the present specification.
  • Fig. 11 shows an example flow chart of a method for acquiring outgoing-edge attributes from outgoing-edge data blocks of a third data storage medium according to an embodiment of the present specification.
  • FIG. 12 shows an example block diagram of a data storage device according to an embodiment of the specification.
  • Fig. 13 shows an example block diagram of a data query device according to an embodiment of the present specification.
  • Fig. 14 shows an exemplary schematic diagram of a data storage device implemented based on a computer system according to an embodiment of the present specification.
  • Fig. 15 shows a schematic diagram of an example of a data query device implemented based on a computer system according to an embodiment of the present specification.
  • the term “comprising” and its variants represent open terms meaning “including but not limited to”.
  • the term “based on” means “based at least in part on”.
  • the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
  • the term “another embodiment” means “at least one other embodiment.”
  • the terms “first”, “second”, etc. may refer to different or the same object. The following may include other definitions, either express or implied. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
  • Graph data includes graph node data and edge data.
  • Graph node data may include, for example, node identifications (node IDs) and node attributes of graph nodes, and edge data may include edge attribute data.
  • the node ID of a graph node is used to uniquely identify the node.
  • Node identification, node attribute data, and edge attribute data may be business-related.
  • For example, when graph nodes represent people, the node identifier may be a person's ID number or employee number.
  • Node attribute data can include age, education, address, occupation, etc.
  • the edge attribute data may include a relationship between nodes, that is, a relationship between people, such as a classmate/colleague relationship, and the like.
  • Massive graph data contains enormous numbers of graph nodes and edges, with complex relationships between them.
  • How graph data is laid out on the data storage medium therefore strongly affects graph data query efficiency.
  • In some storage schemes, edge data is stored separately.
  • As a result, the multiple outgoing edges of one start node may be stored in different data blocks (edge tables).
  • Querying edge information (for example, edge attributes) may then require multiple IO queries (IO reads).
  • An IO query is a data exchange between the data query device and the data storage medium (for example, a disk) through the IO interface of the device's operating system, which reads the IO query result (IO query data) from the data storage medium into the memory of the data query device.
  • IO queries may become the system bottleneck of the data query device, increasing query latency and making graph data queries inefficient.
  • In another scheme, the node data, edge data and neighbor information of graph nodes are stored as a vertex (point) table, an edge table and a neighbor table with different data structures, deployed on separate vertex-table, edge-table and neighbor-table servers. Because the data stored on the different servers is heterogeneous, updating the neighbor table becomes the system bottleneck of the database system when the graph topology is updated frequently.
  • FIG. 1 shows an example schematic diagram of a database system 100 according to an embodiment of the present specification.
  • the database system 100 can also be referred to as a database application server, which is used to provide data storage services and data query services.
  • The database system 100 includes a data storage device 110, a data query device 120 and at least one data storage medium 130.
  • at least one data storage medium 130 includes a first data storage medium 130-1, a second data storage medium 130-2, and/or a third data storage medium 130-3.
  • Although the first data storage medium 130-1, the second data storage medium 130-2 and the third data storage medium 130-3 are each illustrated as a single data storage medium, in other embodiments each of them may include multiple data storage media.
  • The data storage device 110 is configured to store the data to be stored in the data storage medium 130 of the database system 100.
  • The data query device 120 is configured to obtain target data (query data) from the data storage medium 130 in response to a data query request.
  • the data storage medium 130 may also be referred to as an external memory.
  • When the data query device 120 performs a data query, it initiates data read operations such as IO queries to the data storage medium 130 to read the data into the memory of the data query device 120, and then performs the query processing in memory to obtain the query data.
  • the data storage medium 130 may be, for example, various non-volatile storage media, such as a magnetic disk device, a memory card, and the like.
  • A magnetic disk is a memory that stores data using magnetic recording technology. Examples include various forms of floppy disks (soft disks) and hard disks.
  • FIG. 2 shows an example flow chart of a data storage method 200 according to an embodiment of the specification.
  • In the method 200 of FIG. 2, the number of neighbor graph nodes of each start graph node in the directed graph data to be stored is first determined.
  • The term "directed graph" refers to graph data in which the edge relationships between graph nodes have a direction.
  • the term “neighbor graph node” refers to a graph node that can be reached by one hop along the direction of an edge.
  • Figure 3 shows an example schematic diagram of directed graph data. In the example of FIG. 3 , six graph nodes A, B, C, D, E and F are shown. Graph nodes B, C, and D are neighbor nodes of graph node A, and graph node F is a neighbor node of graph node C.
  • Graph node E is not a neighbor node of graph node A, but graph node E is a reverse neighbor node of graph node A.
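The neighbor relationships described for Fig. 3 can be checked with a short Python sketch. It uses only the edges the text states (A→B, A→C, A→D, C→F and, since E is a reverse neighbor of A, E→A); the actual figure may contain further edges, and the helper `neighbor_sets` is hypothetical.

```python
from collections import defaultdict

# Edges stated in the text for Fig. 3; the figure itself may contain more.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "F"), ("E", "A")]


def neighbor_sets(edges):
    """Return (neighbors, reverse neighbors) for each graph node."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)  # reachable in one hop along the edge direction
        in_nbrs[dst].add(src)  # src is a reverse neighbor of dst
    return out_nbrs, in_nbrs


out_nbrs, in_nbrs = neighbor_sets(EDGES)
print({node: len(nbrs) for node, nbrs in sorted(out_nbrs.items())})
# {'A': 3, 'C': 1, 'E': 1}
```

These per-node neighbor counts are the input to the storage-mode decision described next.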
  • a data storage mode is determined according to the number of neighbor graph nodes of each start graph node.
  • In one example, the data storage mode may be determined once for all start graph nodes in the directed graph data.
  • In this case, all start graph nodes in the directed graph data use the same data storage mode.
  • Specifically, the maximum number of neighbor graph nodes over all start graph nodes is compared with a first threshold. If the maximum is greater than the first threshold, the data storage mode is super-large-point data storage; otherwise, it is non-super-large-point data storage.
  • The first threshold may be set based on the storage capacity of the data storage medium used, or according to the application scenario or experience.
  • In another example, the data storage mode may be determined separately for each start graph node in the directed graph data.
  • In this case, the data storage mode of each start graph node is determined from its own number of neighbor nodes, so different start graph nodes may use the same or different data storage modes.
  • Once the data storage mode of a start graph node has been determined from its number of neighbor nodes, the subsequent storage operations for that node can proceed immediately, without waiting for the neighbor counts and storage modes of other start graph nodes to be determined.
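Both ways of choosing the data storage mode can be sketched in Python as follows. The function names and mode labels are hypothetical, and applying the same first threshold in the per-node case is an assumption; the text only states that each node's mode is based on its own neighbor count.

```python
def global_mode(neighbor_counts, first_threshold):
    """One mode for every start graph node, decided by the largest count."""
    if max(neighbor_counts.values()) > first_threshold:
        return "super_large"
    return "non_super_large"


def per_node_modes(neighbor_counts, first_threshold):
    """An independent mode for each start graph node, from its own count."""
    return {
        node: "super_large" if count > first_threshold else "non_super_large"
        for node, count in neighbor_counts.items()
    }
```

For example, with neighbor counts {'A': 3, 'C': 1, 'E': 1} and a first threshold of 2, the global rule selects super-large-point storage for every node, while the per-node rule selects it only for A. The per-node rule is also what lets each node be stored without waiting for the others.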
  • the graph data storage corresponding to the initial graph node is completed according to the determined data storage mode.
  • For non-super-large-node data storage, the node data, neighbor information, outgoing-edge index feature information, and outgoing-edge data of the start graph node are stored in a first start graph node data block on the first data storage medium 130-1.
  • An index feature is a graph-data feature that helps generate or determine index information when graph data are stored and/or queried. For example, during data storage, the index feature of each outgoing edge may be mapped to the outgoing-edge data index of the corresponding outgoing-edge data stored in the data block.
  • Specifically, the index feature of each outgoing edge can be stored in the outgoing-edge index feature information, and the corresponding outgoing-edge data are stored in the data block in the same order as the index features appear there. Each storage location in the outgoing-edge index feature information then serves as the outgoing-edge data index of the corresponding outgoing-edge data, so that each index feature is mapped to its storage location (that is, to the outgoing-edge data index).
  • During a query, the data index can then be obtained from the index feature of the data, which makes the data retrievable.
  • Examples of index features include timestamps and specific attributes of the data.
  • Such attributes are ones that help determine a data storage index. For example, if the data have the attribute "payee's age", they can be stored in order of payee's age, so that storage locations, and hence an index, can be derived from that attribute; "payee's age" then acts as an index feature.
  • The outgoing-edge index feature information contains the index features of all outgoing edges of the start graph node, and each index feature is mapped to the outgoing-edge data index of the corresponding outgoing-edge data stored in the first start graph node data block.
  • The mapping can be implicit: the storage order of each index feature in the outgoing-edge index feature information indexes the corresponding outgoing-edge data in the first start graph node data block. For example, suppose there are four pieces of outgoing-edge data A1, A2, A3, and A4, whose index feature values are F1, F2, F3, and F4, respectively.
  • The values F1, F2, F3, and F4 are stored in the outgoing-edge index feature information. If they are stored in the order F1, F3, F2, F4, then A1, A2, A3, and A4 are stored in the data block in the order A1, A3, A2, A4.
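The A1 to A4 example can be reproduced with a short Python sketch. It assumes numeric feature values F1=40, F2=20, F3=30, F4=10, chosen so that a descending sort yields the order F1, F3, F2, F4; the function names are hypothetical. Position k in the feature list acts as the outgoing-edge data index of the k-th stored edge.

```python
def build_edge_layout(edges):
    """edges: list of (edge_data, index_feature) pairs.
    Returns (feature_info, edge_data) stored in the same order, sorted by
    index feature in descending order (e.g. newest timestamp first)."""
    ordered = sorted(edges, key=lambda e: e[1], reverse=True)
    feature_info = [feature for _, feature in ordered]
    edge_data = [data for data, _ in ordered]
    return feature_info, edge_data

def edge_data_for(feature, feature_info, edge_data):
    # The position of the feature is the outgoing-edge data index.
    k = feature_info.index(feature)
    return edge_data[k]

features, data = build_edge_layout([("A1", 40), ("A2", 20), ("A3", 30), ("A4", 10)])
print(features)  # [40, 30, 20, 10]
print(data)      # ['A1', 'A3', 'A2', 'A4']
```

`list.index` is a linear scan kept for brevity; because the features are sorted, a binary search over them works just as well.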
  • In one example, each first data storage medium stores one first start graph node data block.
  • In another example, a first data storage medium may store more than one first start graph node data block.
  • Fig. 4 is a schematic diagram of an example non-super-large-node data storage process according to an embodiment of this specification.
  • As shown in Fig. 4, the directed graph data contain n start graph nodes, and the node data, neighbor information, outgoing-edge index feature information, and outgoing-edge data of each start graph node are stored independently in its own first start graph node data block.
  • Fig. 5 is a schematic diagram of an example first start graph node data block according to an embodiment of this specification.
  • As shown in Fig. 5, the first start graph node data block stores the node data, neighbor information, outgoing-edge index feature information 1 to n, and outgoing-edge data 1 to n of the start graph node.
  • The first start graph node data block can be formed as a first data structure with multiple fields, which store, respectively, the node data, neighbor information, outgoing-edge index feature information 1 to n, and outgoing-edge data 1 to n of the start graph node.
  • Reverse neighbor information may also be stored in the first start graph node data block.
  • The reverse neighbor information may have the same content as the neighbor information.
  • The node data of the start graph node may include its node identifier (node ID), node attributes, and node metadata.
  • The node attributes may include one or more attributes, each consisting of an attribute name and an attribute value.
  • Examples of node attribute names include "age", "height", and "occupation".
  • The attribute value is the value corresponding to the attribute name. Attribute names can be indexed, which supports conditional filtering during data queries.
  • The node metadata may include an index feature of the start graph node, for example, a node timestamp.
  • The node metadata may also include a node type.
  • The node type is feature information used for node classification, for example, "person", "company", or "device".
  • In one example, the node metadata include both the timestamp and the node type.
  • In another example, the node data may not include node metadata.
  • The neighbor information may include the node identifier of the start graph node and neighbor attributes.
  • The neighbor attributes include basic information for all outgoing edges of the start graph node.
  • The basic information of each outgoing edge may include the node identifier of its end graph node (end node ID) and its outgoing-edge index feature.
  • The basic information of the outgoing edges may be stored in the neighbor information in the same order as their index features are stored in the outgoing-edge index feature information.
  • In one example, the outgoing-edge index feature is the outgoing-edge timestamp.
  • The neighbor information may also include the end node type of each end graph node and the outgoing-edge type.
  • The outgoing-edge index feature information may contain the sorted index features of all outgoing edges of the start graph node.
  • The outgoing-edge data are stored in the start graph node data block in the same order as the index features in the outgoing-edge index feature information.
  • The storage location (for example, a field) holding the outgoing-edge index feature information may contain multiple slots, each storing one outgoing-edge index feature. Each slot indexes the corresponding outgoing-edge data stored later in the block; that is, the slot serves as the data index of that outgoing-edge data.
  • For example, if the outgoing-edge index feature information has n slots, the k-th slot corresponds to the k-th outgoing-edge data stored after it, where k is a positive integer and 1 ≤ k ≤ n.
  • When the index feature is the outgoing-edge timestamp, the timestamps of all outgoing edges can be sorted in descending order and stored in the outgoing-edge timestamp information.
  • Each piece of outgoing-edge data is then stored in the order in which its timestamp appears in the outgoing-edge timestamp information.
  • The outgoing-edge data may include an outgoing-edge identifier and outgoing-edge attributes.
  • The outgoing-edge identifier may include the node identifier of the end graph node (end node ID) and the outgoing-edge index feature.
  • The outgoing-edge identifier may also include the outgoing-edge type.
  • The outgoing-edge type is feature information used for edge classification. For example, an outgoing edge representing an account transfer may have the type "transfer", and one representing a payment may have the type "payment".
  • The outgoing-edge attributes may include one or more attributes, each consisting of an attribute name and an attribute value.
  • Examples of outgoing-edge attribute names include "amount", "currency", and "operating device".
  • The attribute value is the value corresponding to the attribute name.
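To make the field layout concrete, here is a hypothetical Python model of a first start graph node data block. The class and field names are assumptions, the index feature is taken to be a timestamp, and the neighbor basic information, the timestamp list, and the outgoing-edge data are kept in the same descending-timestamp order, as described above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NodeData:
    node_id: str
    attributes: dict                      # attribute name -> value, e.g. {"age": 30}
    timestamp: Optional[int] = None       # node metadata: index feature
    node_type: Optional[str] = None       # node metadata: e.g. "person"

@dataclass
class OutEdge:
    end_id: str                           # node ID of the end graph node
    timestamp: int                        # outgoing-edge index feature
    edge_type: str                        # e.g. "transfer" or "payment"
    attributes: dict = field(default_factory=dict)  # e.g. {"amount": 100}

@dataclass
class FirstStartNodeBlock:
    node: NodeData
    neighbor_info: list     # (end_id, timestamp, edge_type) per outgoing edge
    edge_timestamps: list   # outgoing-edge index feature information
    edge_data: list         # OutEdge records, same order as edge_timestamps

def build_first_block(node: NodeData, edges: list) -> FirstStartNodeBlock:
    # Sort once by timestamp (descending) and reuse that order for all three
    # per-edge fields, so position k refers to the same edge everywhere.
    ordered = sorted(edges, key=lambda e: e.timestamp, reverse=True)
    return FirstStartNodeBlock(
        node=node,
        neighbor_info=[(e.end_id, e.timestamp, e.edge_type) for e in ordered],
        edge_timestamps=[e.timestamp for e in ordered],
        edge_data=ordered,
    )
```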
  • For super-large-node data storage, the node data, neighbor information, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node are stored in a second start graph node data block on the second data storage medium.
  • The second start graph node data block can be formed as a second data structure with multiple fields, which store, respectively, the node data, neighbor information, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node.
  • The outgoing-edge index feature range information may contain multiple outgoing-edge index feature ranges mapped to outgoing-edge data block indexes, so that each feature range can be used to index one outgoing-edge data block index.
  • For example, the storage position of each feature range in the range information may correspond to one outgoing-edge data block index.
  • Each outgoing-edge index feature range stores the maximum and minimum index feature values of the outgoing-edge data in the corresponding outgoing-edge data block.
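Building the outgoing-edge index feature range information can be sketched as follows. The sketch assumes a fixed, hypothetical number of edges per outgoing-edge data block and uses the list position of each range as its outgoing-edge data block index; a real store would map that position to the block's first storage address.

```python
def build_range_info(timestamps_desc, block_size):
    """Split descending outgoing-edge timestamps into consecutive blocks and
    return (blocks, ranges), where ranges[i] = (max, min) of block i and the
    position i serves as the outgoing-edge data block index."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [timestamps_desc[i:i + block_size]
              for i in range(0, len(timestamps_desc), block_size)]
    ranges = [(max(b), min(b)) for b in blocks]
    return blocks, ranges
```

With the timestamps [90, 80, 70, 60, 50] and two edges per block, this yields the ranges (90, 80), (70, 60), and (50, 50).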
  • The outgoing-edge data of the start graph node and the outgoing-edge data storage address information are stored in at least two outgoing-edge data blocks on the third data storage medium. The storage address information contains pairs <index feature of the outgoing-edge data, relative storage address of the outgoing-edge data within the outgoing-edge data block>.
  • Each outgoing-edge data block can be formed as a third data structure with multiple fields, which store at least two pieces of outgoing-edge data and the corresponding storage address information, as shown in Fig. 6.
  • The relative storage address of outgoing-edge data may be its offset from the first address of the outgoing-edge data block.
  • The outgoing-edge data are stored before the storage address information.
  • The outgoing-edge data block may also store the number of outgoing edges, in which case the count is stored at the head of the block, before all outgoing-edge data.
  • Fig. 6 is an example flowchart of a super-large-node data storage process according to an embodiment of this specification.
  • The node data, neighbor information, and outgoing-edge data of the start graph node in Fig. 6 are defined and stored exactly as in Fig. 5 and are not described again here.
  • The outgoing-edge index feature range information contains multiple sorted outgoing-edge timestamp ranges, each storing the maximum and minimum outgoing-edge timestamps of the corresponding outgoing-edge data block.
  • The outgoing-edge data block index stores index information of the outgoing-edge data blocks.
  • For example, it may store the starting storage address (that is, the first address) of each outgoing-edge data block on the third data storage medium.
  • In another example of super-large-node data storage, the node data, neighbor index feature ranges, neighbor data block index, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node are stored in the second start graph node data block on the second data storage medium; the neighbor information is stored in at least two neighbor data blocks on the fourth data storage medium; and the outgoing-edge data and outgoing-edge data storage address information are stored in at least two outgoing-edge data blocks on the third data storage medium.
  • The neighbor index feature range information contains multiple neighbor index feature ranges mapped to neighbor data block indexes, so that each range can be used to index one neighbor data block index.
  • For example, the storage position of each neighbor index feature range may correspond to one neighbor data block index, so that the range stored at a given position maps to the neighbor data block index at the same position.
  • Each neighbor index feature range stores the maximum and minimum index feature values of the corresponding neighbor data block.
  • The neighbor data block index stores index information of the neighbor data blocks.
  • For example, it may store the starting storage address (that is, the first address) of each neighbor data block on the fourth data storage medium.
  • Fig. 7 is another example flowchart of the super-large-node data storage process according to an embodiment of this specification.
  • The data block index of the first or second start graph node data block can be generated from the node identifier of the start graph node. For example, a perfect hash value of the node identifier can be computed and taken modulo the number of start graph nodes, and the result used as the data block index of the first or second start graph node data block.
  • Some of the first, second, third, and fourth data storage media may be implemented by the same data storage medium.
  • Both non-super-large-node and super-large-node data storage can be implemented as key-value storage.
  • Examples of key-value storage include, but are not limited to, key-value storage based on perfect hashing, LevelDB, RocksDB, or Redis.
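Python's standard library has no perfect hash function, so the sketch below uses a brute-force stand-in: it searches for a seed under which CRC-32 of the seeded node ID, taken modulo the number of start graph nodes, is collision-free for a small, known set of IDs. The seed search and all function names are assumptions for illustration; the approach is only practical for a handful of keys, and large graphs would need a dedicated minimal perfect hash construction.

```python
import zlib

def seeded_hash(seed: int, node_id: str) -> int:
    # CRC-32 of "seed:node_id"; stands in for the perfect hash function.
    return zlib.crc32(f"{seed}:{node_id}".encode("utf-8"))

def find_perfect_seed(node_ids, max_tries=100_000):
    """Brute-force a seed for which hash % n has no collisions over node_ids,
    i.e. a minimal perfect hash for this small, fixed key set."""
    n = len(node_ids)
    for seed in range(max_tries):
        if len({seeded_hash(seed, nid) % n for nid in node_ids}) == n:
            return seed
    raise RuntimeError("no collision-free seed found; use a real MPHF construction")

def data_block_index(node_id: str, seed: int, num_start_nodes: int) -> int:
    # Perfect hash value of the node identifier, modulo the number of start nodes.
    return seeded_hash(seed, node_id) % num_start_nodes
```

The same `data_block_index` computation would be reused at query time to locate a node's block from its node identifier.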
  • the data storage process according to the embodiment of the present specification is described above with reference to the accompanying drawings.
  • the data query can be performed in response to the data query request initiated by the user.
  • FIG. 8 shows an example flowchart of a data query process 800 according to an embodiment of the specification.
  • First, the data block index of the graph node to be queried is determined from its node identifier.
  • The data block index indexes the start graph node data block stored on the data storage medium.
  • For example, the data block index can be determined by computing the perfect hash value of the node identifier of the graph node to be queried and taking it modulo the number of start graph nodes.
  • The start graph node data block indexed by the data block index is then read from the first or second data storage medium into the memory of the data query device and parsed.
  • Next, according to the parsed data block, the query data of the data query request are obtained either from the data parsed locally on the data query device or from the outgoing-edge data blocks on the third data storage medium.
  • the query data acquisition process will be described in detail below with reference to the accompanying drawings.
  • Finally, the acquired query data are provided to the user.
  • FIG. 9 shows an example flowchart of a query data acquisition process 900 according to an embodiment of the specification.
  • In this example, the node data include the node identifier, node attributes, and node metadata of the start graph node.
  • The node metadata include a node index feature of the start graph node.
  • The neighbor information includes the node identifier of the start graph node and neighbor attributes.
  • The neighbor attributes include basic information for all outgoing edges; the basic information of each outgoing edge includes the node identifier of its end graph node and its outgoing-edge index feature.
  • The outgoing-edge data include an outgoing-edge identifier and outgoing-edge attributes.
  • The outgoing-edge identifier includes the node identifier of the end graph node and the outgoing-edge index feature.
  • The node metadata may also include a node type.
  • First, a data query request is received.
  • If the data query request asks for node attributes of the graph node to be queried, the parsed node data are filtered according to the filter condition in the request.
  • For example, if the filter condition in the data query request is an index feature (such as a timestamp),
  • filtering may be performed on the node index feature in the node metadata of the parsed node data.
  • If the data query request also specifies a node type,
  • filtering can be performed on both the index feature and the node type in the node metadata of the parsed node data.
  • The data query request may also include other filter conditions.
  • The node attributes of the node data that pass the filter are then obtained as query data.
  • Alternatively, the request may contain no filter condition, in which case no filtering is performed and all node attributes in the parsed node data are obtained as query data.
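Node-attribute filtering can be sketched as below. The parsed node data are modeled as a Python dict, and the filter condition is assumed to be an optional closed timestamp interval plus an optional node type; the dict keys and parameter names are hypothetical.

```python
from typing import Optional

def query_node_attributes(node_data: dict, ts_range: Optional[tuple] = None,
                          node_type: Optional[str] = None) -> Optional[dict]:
    """node_data: parsed node data, e.g.
    {"node_id": "u1", "attributes": {...}, "metadata": {"timestamp": 5, "type": "person"}}.
    Returns the node attributes if the node passes every given filter, else None."""
    meta = node_data.get("metadata", {})
    if ts_range is not None:
        lo, hi = ts_range
        ts = meta.get("timestamp")
        if ts is None or not (lo <= ts <= hi):
            return None
    if node_type is not None and meta.get("type") != node_type:
        return None
    # Passed all filters (or no filter was given): return all node attributes.
    return node_data["attributes"]
```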
  • If the data query request asks for neighbor attributes of the graph node to be queried, the parsed neighbor information is filtered according to the filter condition in the request.
  • Neighbor information can be filtered in a similar way to node data.
  • For example, when the neighbor attributes include the end node ID, end node type, outgoing-edge type, and outgoing-edge index feature,
  • any of these (end node ID, end node type, outgoing-edge type, and outgoing-edge index feature) can be used for filtering.
  • The data query request may also include other filter conditions.
  • Alternatively, the request may contain no filter condition, in which case no filtering is performed and all neighbor attributes in the parsed neighbor information are obtained as query data.
  • When the second start graph node data block stores the node data, neighbor index feature ranges, neighbor data block index, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node, and the neighbor information is stored across at least two neighbor data blocks on the fourth data storage medium, the neighbor attributes are obtained from those neighbor data blocks.
  • Fig. 10 shows an example flowchart of a method 1000 for acquiring neighbor attributes from neighbor data blocks of a fourth data storage medium according to an embodiment of the present specification.
  • First, a neighbor data block index is determined from the neighbor index feature range information based on the neighbor index feature.
  • The neighbor data block indexed by that neighbor data block index is then read from the fourth data storage medium into the memory of the data query device and parsed.
  • If the data query request contains no filter condition, all neighbor attributes in the parsed neighbor data block may be used as query data.
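The on-demand reading of neighbor data blocks in method 1000 can be sketched as follows. Each neighbor index feature range is assumed to be a (max, min) timestamp pair, the filter is assumed to be a closed timestamp interval, and `read_block` is a hypothetical callback standing in for one IO read from the fourth data storage medium. Only blocks whose range overlaps the interval are read.

```python
def blocks_overlapping(ranges, lo, hi):
    """ranges[i] = (max_feature, min_feature) of neighbor data block i.
    Returns indexes of the blocks that may contain entries with lo <= feature <= hi."""
    return [i for i, (mx, mn) in enumerate(ranges) if mn <= hi and lo <= mx]

def query_neighbors(ranges, read_block, lo, hi):
    """Read only the overlapping neighbor data blocks and keep matching entries.
    Entries are assumed to be (end_id, timestamp) pairs."""
    result = []
    for i in blocks_overlapping(ranges, lo, hi):
        result.extend(e for e in read_block(i) if lo <= e[1] <= hi)
    return result

store = [[("a", 90), ("b", 80)], [("c", 70), ("d", 60)], [("e", 50), ("f", 40)]]
ranges = [(90, 80), (70, 60), (50, 40)]
print(query_neighbors(ranges, lambda i: store[i], 65, 85))  # [('b', 80), ('c', 70)]
```

Here the third block is never read, which is the saving the on-demand scheme aims for.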
  • When outgoing-edge attributes are requested, the outgoing-edge index feature of each target outgoing edge that satisfies the filter condition in the data query request is first determined from the parsed start graph node data block. For example, the parsed neighbor information in the start graph node data block can be filtered to find the end node IDs that satisfy the filter condition.
  • This filtering may follow the same process as the neighbor-attribute query described above. The outgoing-edge index features corresponding to the found end node IDs are then extracted.
  • If the request contains no filter condition, no query filtering is performed.
  • Based on the outgoing-edge timestamp, binary search can then be used either to determine the outgoing-edge data index of the target outgoing edge in the outgoing-edge timestamp information, or to determine the target outgoing-edge data block index in the outgoing-edge timestamp range information.
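A sketch of the two binary searches, assuming integer timestamps stored in descending order and range entries of the form (max, min) for consecutive, non-overlapping blocks. The helper names are hypothetical; the search is written by hand because the stored arrays are descending, while Python's `bisect` module assumes ascending order.

```python
def first_true(n, pred):
    """Binary search for the smallest i in [0, n) with pred(i) True, assuming
    pred is False...False True...True over the range; returns n if none."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def find_edge_data_index(ts_desc, target):
    """Outgoing-edge data index of `target` in the descending timestamp info, or -1."""
    k = first_true(len(ts_desc), lambda i: ts_desc[i] <= target)
    return k if k < len(ts_desc) and ts_desc[k] == target else -1

def find_edge_block_index(ranges_desc, target):
    """Outgoing-edge data block index whose (max, min) range contains target, or -1."""
    i = first_true(len(ranges_desc), lambda j: ranges_desc[j][1] <= target)
    return i if i < len(ranges_desc) and ranges_desc[i][0] >= target else -1

print(find_edge_data_index([90, 80, 70, 60, 50], 70))              # 2
print(find_edge_block_index([(90, 80), (70, 60), (50, 50)], 65))   # 1
```

A timestamp that falls between two blocks (for example 75 above) correctly returns -1, since no block's range contains it.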
  • Fig. 11 is an example flowchart of a method 1100 for acquiring outgoing-edge attributes from the outgoing-edge data blocks on the third data storage medium according to an embodiment of this specification.
  • First, the outgoing-edge data block indexed by the outgoing-edge data block index is read from the third data storage medium into the memory of the data query device.
  • The outgoing-edge data storage address information is usually stored at the end of the outgoing-edge data block, that is, after the outgoing-edge data.
  • Parsing can therefore proceed from the end of the block toward its start, so that the outgoing-edge data storage address information in the block is parsed first.
  • The outgoing-edge data block may also store the number of outgoing edges, from which the storage address information can easily be located and extracted from the block.
  • Finally, the outgoing-edge attributes in the parsed outgoing-edge data of the target outgoing edge are obtained as query data.
  • Because only the outgoing-edge data storage address information in the block is parsed first (partial parsing), and the relative storage address of the target outgoing edge's data within the block is determined from its index feature, only the target outgoing-edge data need to be extracted from the block and parsed further; the remaining outgoing-edge data are neither extracted nor parsed. This greatly reduces the amount of data processed during a query and thus improves query efficiency.
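The partial parsing in method 1100 can be illustrated with Python's `struct` module. The byte layout is an assumption consistent with the description above: the outgoing-edge count at the head, length-prefixed records (JSON-encoded here purely for illustration), then the storage address information as <index feature, relative offset> pairs at the tail. Only the address table and the one matching record are decoded.

```python
import json
import struct

COUNT = struct.Struct("<I")     # number of outgoing edges, at the block head
ADDR = struct.Struct("<qI")     # (index feature, relative offset), at the block tail
REC_LEN = struct.Struct("<I")   # length prefix of one encoded edge record

def pack_edge_block(edges):
    """edges: list of (index_feature, record_dict), already in storage order.
    Assumed layout: [count][len + record ...][(feature, offset) ...]."""
    body, addrs = bytearray(COUNT.pack(len(edges))), []
    for feature, record in edges:
        payload = json.dumps(record).encode("utf-8")
        addrs.append(ADDR.pack(feature, len(body)))  # offset from the block start
        body += REC_LEN.pack(len(payload)) + payload
    return bytes(body) + b"".join(addrs)

def read_target_edge(block: bytes, feature: int):
    """Partial parse: decode the address table at the tail, then decode only
    the record whose index feature matches. Returns None if absent."""
    (count,) = COUNT.unpack_from(block, 0)
    table_start = len(block) - count * ADDR.size
    for i in range(count):
        f, offset = ADDR.unpack_from(block, table_start + i * ADDR.size)
        if f == feature:
            (n,) = REC_LEN.unpack_from(block, offset)
            start = offset + REC_LEN.size
            return json.loads(block[start:start + n])
    return None
```

The address table is scanned linearly for brevity; since its entries follow the sorted edge order, a binary search over it would also work.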
  • When the neighbor attributes in the neighbor information include the outgoing-edge index feature and the outgoing-edge type,
  • and the outgoing-edge identifier includes the outgoing-edge index feature, the end node ID, and the outgoing-edge type,
  • the outgoing-edge data index or outgoing-edge data block index is determined from the outgoing-edge index feature, and the outgoing-edge data located through it are matched against the index feature, end node ID, and outgoing-edge type.
  • The outgoing-edge attributes of the matched outgoing-edge data are then obtained as query data.
  • For outgoing-edge data that have already been parsed locally, the attributes of the matching data can be obtained directly as query data.
  • Outgoing-edge attributes are not obtained for outgoing-edge data that do not match.
  • Matching the outgoing-edge data once more against the outgoing-edge index feature, end node ID, and outgoing-edge type, and parsing only the matched data, makes the retrieved outgoing-edge data more accurate and further reduces the amount of data to be parsed, which further improves graph data query efficiency.
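Why the second match helps can be shown with a small Python sketch: if two outgoing edges share a timestamp (an assumed scenario), the index feature alone locates both, and matching on end node ID and outgoing-edge type keeps only the intended one. The record keys and the function name are hypothetical.

```python
def match_edges(candidates, target_feature, target_end_id, target_edge_type):
    """candidates: outgoing-edge records located via the index feature, e.g.
    {"timestamp": 30, "end_id": "v2", "edge_type": "payment", "attributes": {...}}.
    Returns the attributes of records whose full identifier matches."""
    return [
        c["attributes"] for c in candidates
        if c["timestamp"] == target_feature
        and c["end_id"] == target_end_id
        and c["edge_type"] == target_edge_type
    ]

cands = [
    {"timestamp": 30, "end_id": "v2", "edge_type": "payment", "attributes": {"amount": 5}},
    {"timestamp": 30, "end_id": "v9", "edge_type": "transfer", "attributes": {"amount": 7}},
]
print(match_edges(cands, 30, "v2", "payment"))  # [{'amount': 5}]
```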
  • When a graph node does not have a large number of neighbor nodes, storing its node data, neighbor information, outgoing-edge index feature information, and outgoing-edge data in the same data block allows a data query to be served with a single IO read of the data storage medium.
  • When a graph node has a large number of neighbor nodes, the node data, neighbor information, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node are stored in the second start graph node data block on the second data storage medium, and its outgoing-edge data and outgoing-edge data storage address information are stored across at least two outgoing-edge data blocks on the third data storage medium, so that a data query can be served with two IO reads of the data storage media. This scheme greatly reduces the number of IO reads during graph data queries, shortening query time and improving query efficiency.
  • In addition, storing the neighbor information across multiple neighbor data blocks lets a query read and parse only the few target neighbor data blocks it needs, reducing the amount of data processed and improving query efficiency.
  • FIG. 12 shows an example block diagram of a data storage device 1200 according to an embodiment of the specification.
  • As shown in Fig. 12, the data storage device 1200 may include a node number determination unit 1210, a data storage mode determination unit 1220, and a data storage unit 1230.
  • The node number determination unit 1210 is configured to determine the number of neighbor graph nodes of each start graph node in the directed graph data to be stored.
  • The data storage mode determination unit 1220 is configured to determine the data storage mode from the number of neighbor graph nodes of each start graph node. In one example, the data storage mode determination unit 1220 determines one data storage mode for all start graph nodes in the directed graph data. In another example, it determines a data storage mode separately for each start graph node.
  • When the determined mode is non-super-large-node data storage, the data storage unit 1230 is configured to store the node data, neighbor information, outgoing-edge index feature information, and outgoing-edge data of the start graph node in a first start graph node data block on the first data storage medium. The outgoing-edge index feature information contains the index features of all outgoing edges, and each index feature is mapped to the outgoing-edge data index of the corresponding outgoing-edge data stored in the first start graph node data block.
  • When the determined mode is super-large-node data storage, the data storage unit 1230 is configured to store the node data, neighbor information, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node in a second start graph node data block on the second data storage medium, where the range information contains multiple outgoing-edge index feature ranges mapped to outgoing-edge data block indexes, each range indexing one outgoing-edge data block index. The data storage unit 1230 also stores the outgoing-edge data and outgoing-edge data storage address information of the start graph node in at least two outgoing-edge data blocks on the third data storage medium, where the storage address information contains pairs <index feature of the outgoing-edge data, relative storage address of the outgoing-edge data within the outgoing-edge data block>.
  • The data storage unit 1230 may also store reverse neighbor information in the first and second start graph node data blocks.
  • In another example of super-large-node data storage, the data storage unit 1230 may store the node data, neighbor index feature ranges, neighbor data block index, outgoing-edge index feature range information, and outgoing-edge data block index of the start graph node in the second start graph node data block on the second data storage medium; store the neighbor information in at least two neighbor data blocks on the fourth data storage medium; and store the outgoing-edge data and outgoing-edge data storage address information in at least two outgoing-edge data blocks on the third data storage medium.
  • Fig. 13 shows an example block diagram of a data query device 1300 according to an embodiment of the present specification.
  • As shown in Fig. 13, the data query device 1300 includes a data block index determination unit 1310, a data reading unit 1320, a data parsing unit 1330, a query data acquisition unit 1340, and a query data provision unit 1350.
  • The data block index determination unit 1310 is configured to, in response to receiving a data query request initiated by a user, determine the data block index of the graph node to be queried based on its node identifier.
  • The data reading unit 1320 is configured to read the start graph node data block corresponding to the data block index from the first or second data storage medium into the memory of the data query device.
  • the data parsing unit 1330 is configured to parse the read start graph node data blocks.
  • The query data acquisition unit 1340 is configured to obtain, according to the parsed start graph node data block, the query data of the data query request from the data parsed locally on the data query device or from the outgoing-edge data blocks on the third data storage medium.
  • The query data provision unit 1350 is configured to provide the acquired query data to the user.
  • In one example, the node data include the node identifier and node attributes of the start graph node.
  • The neighbor information includes the node identifier of the start graph node and neighbor attributes.
  • The outgoing-edge data include an outgoing-edge identifier and outgoing-edge attributes.
  • When node attributes are requested, the query data acquisition unit 1340 obtains the node attributes in the parsed node data of the start graph node data block as query data.
  • When neighbor attributes are requested, the query data acquisition unit 1340 obtains the neighbor attributes in the parsed neighbor information of the start graph node data block as query data.
  • When outgoing-edge attributes are requested, the query data acquisition unit 1340 determines the outgoing-edge index feature of the target outgoing edge from the parsed start graph node data block, and then determines either the outgoing-edge data index of the target outgoing edge from the index feature and the outgoing-edge index feature information,
  • or its outgoing-edge data block index from the index feature and the outgoing-edge index feature range information.
  • If the outgoing-edge data index is determined, the query data acquisition unit 1340 obtains the outgoing-edge attributes of the target outgoing edge, as query data, from the outgoing-edge data in the parsed start graph node data block indexed by that data index.
  • If the outgoing-edge data block index is determined, the query data acquisition unit 1340 obtains the outgoing-edge attributes of the target outgoing edge, as query data, from the outgoing-edge data block on the third data storage medium indexed by that block index.
  • The data query request may include filter conditions.
  • In this case, the query data acquisition unit 1340 filters the parsed node data of the start graph node data block according to the filter condition in the data query request
  • and obtains the node attributes in the filtered node data as query data.
  • Likewise, the query data acquisition unit 1340 filters the parsed neighbor information according to the filter condition in the data query request and obtains the neighbor attributes in the filtered neighbor information as query data.
  • For outgoing-edge attributes, the query data acquisition unit 1340 determines, from the parsed neighbor information of the start graph node data block, the index feature of each target outgoing edge that satisfies the filter condition. It then determines either the outgoing-edge data index of the target outgoing edge from the index feature and the outgoing-edge index feature information, or its outgoing-edge data block index from the index feature and the outgoing-edge index feature range information.
  • If the outgoing-edge data index is determined, the query data acquisition unit 1340 obtains the outgoing-edge attributes of the target outgoing edge, as query data, from the outgoing-edge data in the parsed start graph node data block indexed by that data index. If the outgoing-edge data block index is determined, it obtains the outgoing-edge attributes of the target outgoing edge, as query data, from the outgoing-edge data block on the third data storage medium indexed by that block index.
  • To do so, the data reading unit 1320 reads the outgoing-edge data block indicated by the outgoing-edge data block index from the third data storage medium into the memory of the data query device.
  • The data parsing unit 1330 parses the outgoing-edge data storage address information in the outgoing-edge data block.
  • Based on the index feature of the target outgoing edge, the query data acquisition unit 1340 determines, from the parsed storage address information, the relative storage address of the target outgoing edge's data within the outgoing-edge data block.
  • The query data acquisition unit 1340 then extracts the target outgoing edge's data from the read outgoing-edge data block according to that relative storage address, parses it, and obtains the outgoing-edge attributes in the parsed data as query data.
  • The query data acquisition unit 1340 can use binary search, based on the outgoing-edge timestamp, to determine the outgoing-edge data index of the target outgoing edge in the outgoing-edge timestamp information, or to determine its outgoing-edge data block index in the outgoing-edge timestamp range information.
  • the second start graph node data blocks store the node data of the start graph node,
  • the neighbor index feature range, neighbor data block index, outbound index feature range information, outbound data block index, and neighbor information are respectively stored in at least two neighbor data blocks in the fourth data storage medium.
  • the query data obtaining unit 1340 determines the neighbor data block index based on the neighbor index feature and the neighbor index feature range information. After the neighbor data block index is determined, the data reading unit 1320 reads the neighbor data block indicated by the neighbor data block index from the fourth data storage medium to the memory of the data query device. Subsequently, the data parsing unit 1330 parses the read neighbor data blocks.
  • the query data acquisition unit 1340 performs query filtering on the neighbor information in the parsed neighbor data block based on the filter condition in the data query request, and acquires the neighbor attributes in the filtered neighbor information as the query data.
  • the data storage method, data storage device, data query method, and data query device according to the embodiments of this specification have been described above with reference to FIGS. 1 to 13.
  • the above data storage device and data query device can be implemented by hardware, or by software or a combination of hardware and software.
  • FIG. 14 shows a schematic diagram of a data storage device 1400 implemented based on a computer system according to an embodiment of the present specification.
  • the data storage device 1400 may include at least one processor 1410, a storage (for example, a non-volatile storage) 1420, an internal memory 1430, and a communication interface 1440, which are connected together via a bus 1460.
  • the at least one processor 1410 executes at least one computer-readable instruction (i.e., the elements implemented in software described above) stored or encoded in the storage.
  • computer-executable instructions are stored in the storage which, when executed, cause the at least one processor 1410 to: determine the number of neighbor graph nodes of each start graph node in the directed graph data to be stored; determine the data storage mode according to the number of neighbor graph nodes of each start graph node; for each start graph node, when the data storage mode is non-super-large-node data storage, store the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node in the first start graph node data block of the first data storage medium, where the out-edge index feature information includes the out-edge index features of all out-edges of the start graph node, and each out-edge index feature is mapped to an out-edge data index that indexes the corresponding out-edge data stored in the first start graph node data block; and for each start graph node, when the data storage mode is super-large-node data storage, store the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node in the second
  • FIG. 15 shows a schematic diagram of a data query device 1500 implemented based on a computer system according to an embodiment of the present specification.
  • the data query device 1500 may include at least one processor 1510, a storage (for example, a non-volatile storage) 1520, an internal memory 1530, and a communication interface 1540, which are connected together via a bus 1560.
  • the at least one processor 1510 executes at least one computer-readable instruction (i.e., the elements implemented in software described above) stored or encoded in the storage.
  • computer-executable instructions are stored in the storage which, when executed, cause the at least one processor 1510 to: in response to receiving a user-initiated data query request, determine the data block index of the graph node to be queried based on its node identifier, the directed graph data having been stored in the data storage medium according to the method described above; read the start graph node data block indexed by the data block index from the first data storage medium or the second data storage medium into the memory of the data query device and parse it; according to the parsed start graph node data block, obtain the query data requested by the data query request from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium; and provide the acquired query data to the user.
  • a program product such as a machine-readable medium (eg, a non-transitory machine-readable medium) is provided.
  • the machine-readable medium may have instructions (i.e., the elements implemented in software described above) which, when executed by a machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-13 in the various embodiments of this specification.
  • Specifically, a system or device equipped with a readable storage medium can be provided, on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or device reads and executes the instructions stored in the readable storage medium.
  • the program code itself read from the readable medium can implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it constitute a part of the present invention.
  • Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, non-volatile memory cards, and ROM.
  • the program code can be downloaded from a server computer or cloud via a communication network.
  • a computer program product includes a computer program which, when executed by a processor, causes the processor to perform the various operations and functions described above in conjunction with FIGS. 1-13 in the various embodiments of this specification.
  • the execution order of each step is not fixed, and can be determined as required.
  • the device structures described in the above embodiments may be physical or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented by multiple physical entities, or some units may be implemented jointly by certain components in multiple independent devices.
  • the hardware units or modules may be implemented mechanically or electrically.
  • a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to perform the corresponding operations.
  • the hardware unit or processor may also include programmable logic or circuits (such as a general-purpose processor or other programmable processors), which can be temporarily set by software to complete corresponding operations.
  • the specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry)

Abstract

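A data storage method and device, a data query method and device, and a database system are provided. During data storage, the number of neighbor graph nodes of each start graph node in the directed graph data to be stored is determined, and the data storage mode is determined according to these numbers. For each start graph node, when the data storage mode is non-super-large-node data storage, the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node are stored in the same data block. When the data storage mode is super-large-node data storage, the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node are stored in a start graph node data block, and the out-edge data and out-edge data storage address information of the start graph node are stored in at least two out-edge data blocks.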

Description

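Data Storage and Query
Technical Field
The embodiments of this specification generally relate to the field of data processing, and in particular to a data storage method and device, a data query method and device, and a database system applicable to graph data.
Background
Graph data is used in an ever wider range of application scenarios, and its volume keeps growing. In-memory storage is limited by memory capacity and cost, so its storage scale is limited and it is unsuitable for massive graph data; graph data therefore has to be stored on a data storage medium such as a disk. Existing graph data storage schemes cannot store graph data on a data storage medium in a way that yields high query efficiency.
Summary
In view of the above, the embodiments of this specification provide a data storage and query scheme. With this scheme, graph data can be stored on a data storage medium in a hybrid node-edge layout, and efficient data queries can be achieved.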
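According to one aspect of the embodiments of this specification, a data storage method is provided, including: determining the number of neighbor graph nodes of each start graph node in the directed graph data to be stored; determining a data storage mode according to the number of neighbor graph nodes of each start graph node; for each start graph node, when the data storage mode is non-super-large-node data storage, storing the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node in a first start graph node data block of a first data storage medium, where the out-edge index feature information includes the out-edge index features of all out-edges of the start graph node, and each out-edge index feature is mapped to an out-edge data index used to index the corresponding out-edge data stored in the first start graph node data block; and for each start graph node, when the data storage mode is super-large-node data storage, storing the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node in a second start graph node data block of a second data storage medium, where the out-edge index feature range information includes multiple out-edge index feature ranges mapped to out-edge data block indexes, and storing the out-edge data and out-edge data storage address information of the start graph node in at least two out-edge data blocks of a third data storage medium, where the out-edge data storage address information includes pairs <out-edge index feature of the out-edge data, relative storage address of the out-edge data in the out-edge data block>.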
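In an example of the above aspect, the data storage mode is determined once for all start graph nodes in the directed graph data, or separately for each start graph node in the directed graph data.
In an example of the above aspect, the node data includes the node identifier and node attributes of the start graph node, the neighbor information includes the node identifier of the start graph node and neighbor attributes, the neighbor attributes include the basic information of all out-edges, and the out-edge data includes an out-edge identifier and out-edge attributes.
In an example of the above aspect, the basic information of each out-edge includes the node identifier of the end graph node of the out-edge and the out-edge index feature of the out-edge, and the out-edge identifier includes the node identifier of the end graph node and the out-edge index feature.
In an example of the above aspect, the basic information of each out-edge further includes the node type of the end graph node of the out-edge and/or the out-edge type of the out-edge, and the out-edge identifier further includes the out-edge type.
In an example of the above aspect, the node data further includes node metadata, and the node metadata includes the node index feature and/or node type of the start graph node.
In an example of the above aspect, the index feature includes a timestamp, the out-edge index feature information includes the out-edge timestamps of all out-edges sorted in descending order, and the out-edge index feature range information includes multiple out-edge timestamp ranges sorted in descending order.
In an example of the above aspect, each out-edge timestamp range stores the maximum and minimum out-edge timestamps of the corresponding out-edge data block.
In an example of the above aspect, the first start graph node data block and the second start graph node data block further store reverse neighbor information, and/or the out-edge data block further stores the number of out-edges.
In an example of the above aspect, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the determined data storage mode is super-large-node data storage, storing the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node in the second start graph node data block of the second data storage medium may include: storing the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node in the second start graph node data block of the second data storage medium, and storing the neighbor information separately in at least two neighbor data blocks of a fourth data storage medium, where the neighbor index feature ranges include multiple neighbor index feature ranges mapped to the neighbor data block indexes.
In an example of the above aspect, the first, second, and third data storage media each include one or more data storage media, and some of the first, second, and third data storage media are implemented by the same data storage medium.
In an example of the above aspect, the non-super-large-node data storage and the super-large-node data storage are implemented with key-value storage.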
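According to another aspect of the embodiments of this specification, a data query method is provided, including: in response to receiving a data query request initiated by a user, determining the data block index of a graph node to be queried based on its node identifier, the directed graph data having been stored in the first data storage medium, the second data storage medium, and/or the third data storage medium according to the method described above; reading the start graph node data block indexed by the data block index from the first data storage medium or the second data storage medium into the memory of a data query device and parsing it; according to the parsed start graph node data block, obtaining the query data of the data query request from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium; and providing the obtained query data to the user.
In an example of the above aspect, the node data includes the node identifier and node attributes of the start graph node, the neighbor information includes the node identifier of the start graph node and neighbor attributes, the neighbor attributes include the basic information of all out-edges, and the out-edge data includes an out-edge identifier and out-edge attributes. Accordingly, obtaining the query data of the data query request from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium according to the parsed start graph node data block may include: in response to the data query request indicating a query of the node attributes of a graph node, obtaining the node attributes in the node data of the parsed start graph node data block as the query data; in response to the data query request indicating a query of the neighbor attributes of a graph node, obtaining the neighbor attributes in the parsed neighbor information as the query data; or, in response to the data query request indicating a query of the out-edge attributes of a graph node, determining the out-edge index feature of a target out-edge from the neighbor information of the parsed start graph node data block, and then either determining the out-edge data index of the target out-edge based on the out-edge index feature and the out-edge index feature information and obtaining the out-edge attributes of the target out-edge, as the query data, from the out-edge data of the parsed start graph node data block indexed by that out-edge data index, or determining the out-edge data block index of the target out-edge based on the out-edge index feature and the out-edge index feature range information and obtaining the out-edge attributes of the target out-edge, as the query data, from the out-edge data block of the third data storage medium indexed by that out-edge data block index.
In an example of the above aspect, the data query request includes a filter condition. Accordingly, obtaining the node attributes in the node data of the parsed start graph node data block in response to the data query request indicating a query of node attributes may include: filtering the node data of the parsed start graph node data block based on the filter condition in the data query request, and obtaining the node attributes of the filtered node data. Obtaining the neighbor attributes in the parsed neighbor information in response to the data query request indicating a query of neighbor attributes may include: filtering the parsed neighbor information based on the filter condition in the data query request, and obtaining the neighbor attributes in the filtered neighbor information. Determining the out-edge index feature of the target out-edge from the neighbor information of the parsed start graph node data block in response to the data query request indicating a query of out-edge attributes may include: determining, from the neighbor information of the parsed start graph node data block, the out-edge index feature of the target out-edge that meets the filter condition.
In an example of the above aspect, obtaining the out-edge attributes of the target out-edge, as the query data, from the out-edge data block of the third data storage medium indexed by the out-edge data block index may include: reading the out-edge data block indexed by the out-edge data block index from the third data storage medium into the memory of the data query device; parsing the out-edge data storage address information in the read out-edge data block; determining, based on the out-edge index feature of the target out-edge, the relative storage address of the out-edge data of the target out-edge in the out-edge data block from the parsed out-edge data storage address information; obtaining the out-edge data of the target out-edge from the read out-edge data block according to the relative storage address and parsing it; and obtaining the out-edge attributes in the parsed out-edge data of the target out-edge as the query data.
In an example of the above aspect, the out-edge index feature includes an out-edge timestamp. Accordingly, determining the out-edge data index of the target out-edge based on the out-edge index feature and the out-edge index feature information may include: searching the out-edge timestamp information by binary search based on the out-edge timestamp to determine the out-edge data index of the target out-edge. Alternatively, determining the out-edge data block index of the target out-edge based on the out-edge index feature and the out-edge index feature range information may include: searching the out-edge timestamp range information by binary search based on the out-edge timestamp to determine the out-edge data block index of the target out-edge.
In an example of the above aspect, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the data storage mode is super-large-node data storage, the second start graph node data block stores the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node, and the neighbor information of the start graph node is stored separately in at least two neighbor data blocks of the fourth data storage medium. Accordingly, before obtaining the neighbor attributes in the parsed neighbor information, the data query method may further include: in response to the data query request indicating a query of the neighbor attributes of a graph node, determining a neighbor data block index based on a neighbor index feature and the neighbor index feature range information; and reading the neighbor data block indexed by the neighbor data block index from the fourth data storage medium into the memory of the data query device and parsing it.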
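According to another aspect of the embodiments of this specification, a data storage device is provided, including: a node number determination unit that determines the number of neighbor graph nodes of each start graph node in the directed graph data to be stored; a data storage mode determination unit that determines the data storage mode according to the number of neighbor graph nodes of each start graph node; and a data storage unit that, for each start graph node, when the data storage mode is non-super-large-node data storage, stores the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node in a first start graph node data block of a first data storage medium, where the out-edge index feature information includes the out-edge index features of all out-edges of the start graph node and each out-edge index feature is mapped to an out-edge data index used to index the corresponding out-edge data stored in the first start graph node data block; and, when the data storage mode is super-large-node data storage, stores the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node in a second start graph node data block of a second data storage medium, where the out-edge index feature range information includes multiple out-edge index feature ranges mapped to out-edge data block indexes, and stores the out-edge data and out-edge data storage address information of the start graph node in at least two out-edge data blocks of a third data storage medium, where the out-edge data storage address information includes pairs <out-edge index feature of the out-edge data, relative storage address of the out-edge data in the out-edge data block>.
In an example of the above aspect, the data storage mode determination unit determines the data storage mode once for all start graph nodes in the directed graph data, or separately for each start graph node in the directed graph data.
In an example of the above aspect, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the data storage mode is super-large-node data storage, the data storage unit stores the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node in the second start graph node data block of the second data storage medium, stores the neighbor information separately in at least two neighbor data blocks of the fourth data storage medium, and stores the out-edge data and out-edge data storage address information of the start graph node in at least two out-edge data blocks of the third data storage medium.
According to another aspect of the embodiments of this specification, a data query device is provided, including: a data block index determination unit that, in response to receiving a data query request initiated by a user, determines the data block index of a graph node to be queried based on its node identifier, the directed graph data having been stored in the first data storage medium, the second data storage medium, and/or the third data storage medium according to the method described above; a data reading unit that reads the start graph node data block indexed by the data block index from the first data storage medium or the second data storage medium into the memory of the data query device; a data parsing unit that parses the read start graph node data block; a query data acquisition unit that, according to the parsed start graph node data block, obtains the query data of the data query request from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium; and a query data providing unit that provides the obtained query data to the user.
In an example of the above aspect, the node data includes the node identifier and node attributes of the start graph node, the neighbor information includes the node identifier of the start graph node and neighbor attributes, the neighbor attributes include the basic information of all out-edges, and the out-edge data includes an out-edge identifier and out-edge attributes. In response to the data query request indicating a query of the node attributes of a graph node, the query data acquisition unit obtains the node attributes in the node data of the parsed start graph node data block as the query data. In response to the data query request indicating a query of the neighbor attributes of a graph node, the query data acquisition unit obtains the neighbor attributes in the parsed neighbor information as the query data. In response to the data query request indicating a query of the out-edge attributes of a graph node, the query data acquisition unit determines the out-edge index feature of a target out-edge from the neighbor information of the parsed start graph node data block, and then either determines the out-edge data index of the target out-edge based on the out-edge index feature and the out-edge index feature information and obtains the out-edge attributes of the target out-edge, as the query data, from the out-edge data of the parsed start graph node data block indexed by that index, or determines the out-edge data block index of the target out-edge based on the out-edge index feature and the out-edge index feature range information and obtains the out-edge attributes of the target out-edge, as the query data, from the out-edge data block of the third data storage medium indexed by that block index.
In an example of the above aspect, the data query request includes a filter condition. In response to the data query request indicating a query of the node attributes of a graph node, the query data acquisition unit further filters the node data of the parsed start graph node data block based on the filter condition in the data query request. In response to the data query request indicating a query of the neighbor attributes of a graph node, the query data acquisition unit filters the parsed neighbor information based on the filter condition in the data query request. In response to the data query request indicating a query of the out-edge attributes of a graph node, the query data acquisition unit further determines, from the neighbor information of the parsed start graph node data block, the out-edge index feature of the target out-edge that meets the filter condition.
In an example of the above aspect, the data reading unit reads the out-edge data block indexed by the out-edge data block index from the third data storage medium into the memory of the data query device, and the data parsing unit parses the out-edge data storage address information in the out-edge data block. The query data acquisition unit is configured to: determine, based on the out-edge index feature of the target out-edge, the relative storage address of the target out-edge in the out-edge data block from the parsed out-edge data storage address information; obtain the out-edge data of the target out-edge from the read out-edge data block according to the relative storage address and parse it; and obtain the out-edge attributes in the parsed out-edge data of the target out-edge as the query data.
In an example of the above aspect, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the data storage mode is super-large-node data storage, the second start graph node data block stores the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node, and the neighbor information is stored separately in at least two neighbor data blocks of the fourth data storage medium. In response to the data query request indicating a query of the neighbor attributes of a graph node, the query data acquisition unit determines a neighbor data block index based on a neighbor index feature and the neighbor index feature range information, and the data reading unit reads the neighbor data block indexed by the neighbor data block index from the fourth data storage medium into the memory of the data query device. The data parsing unit further parses the read neighbor data block.
According to another aspect of the embodiments of this specification, a database system is provided, including: the data storage device described above; the data query device described above; and at least one data storage medium, including the first data storage medium, the second data storage medium, and/or the third data storage medium.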
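According to another aspect of the embodiments of this specification, a data storage device is provided, including: at least one processor, a storage coupled to the at least one processor, and a computer program stored in the storage, where the at least one processor executes the computer program to implement the data storage method described above.
According to another aspect of the embodiments of this specification, a data query device is provided, including: at least one processor, a storage coupled to the at least one processor, and a computer program stored in the storage, where the at least one processor executes the computer program to implement the data query method described above.
According to another aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores executable instructions that, when executed, cause a processor to perform the data storage method or the data query method described above.
According to another aspect of the embodiments of this specification, a computer program product is provided, including a computer program that is executed by a processor to implement the data storage method or the data query method described above.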
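Brief Description of the Drawings
The nature and advantages of the content of this specification can be further understood by referring to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 shows an example schematic diagram of a database system according to an embodiment of this specification.
FIG. 2 shows an example flowchart of a data storage method according to an embodiment of this specification.
FIG. 3 shows an example schematic diagram of directed graph data.
FIG. 4 shows an example schematic diagram of a non-super-large-node data storage process according to an embodiment of this specification.
FIG. 5 shows an example schematic diagram of a first start graph node data block according to an embodiment of this specification.
FIG. 6 shows an example flowchart of a super-large-node data storage process according to an embodiment of this specification.
FIG. 7 shows another example flowchart of a super-large-node data storage process according to an embodiment of this specification.
FIG. 8 shows an example flowchart of a data query method according to an embodiment of this specification.
FIG. 9 shows an example flowchart of a query data acquisition process according to an embodiment of this specification.
FIG. 10 shows an example flowchart of a method for obtaining neighbor attributes from the neighbor data blocks of a fourth data storage medium according to an embodiment of this specification.
FIG. 11 shows an example flowchart of a method for obtaining out-edge attributes from the out-edge data blocks of a third data storage medium according to an embodiment of this specification.
FIG. 12 shows an example block diagram of a data storage device according to an embodiment of this specification.
FIG. 13 shows an example block diagram of a data query device according to an embodiment of this specification.
FIG. 14 shows an example schematic diagram of a data storage device implemented on a computer system according to an embodiment of this specification.
FIG. 15 shows an example schematic diagram of a data query device implemented on a computer system according to an embodiment of this specification.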
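Detailed Description
The subject matter described herein will now be discussed with reference to example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not limitations on the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the elements discussed may be changed without departing from the scope of protection of this specification. Various examples may omit, substitute, or add various processes or components as needed. For example, the described methods may be performed in an order different from that described, and individual steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.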
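Graph data includes graph node data and edge data. Graph node data may include, for example, the node identifier (node ID) and node attributes of a graph node, and edge data may include edge attribute data. The node identifier uniquely identifies a node. Node identifiers, node attribute data, and edge attribute data may be business-related. For example, in a social network scenario, a node identifier may be a person's ID card number or employee number; node attribute data may include age, education, address, occupation, and so on; and edge attribute data may include relationships between nodes, that is, between people, such as classmate or colleague relationships. With massive graph data there are huge amounts of graph node data and edge data with complex associations between them, so the way graph data is laid out on the data storage medium greatly affects graph query efficiency.
In some graph data storage schemes, edge data is stored separately. Under such a scheme, the out-edge data of a given start node may be stored in different data blocks (edge tables), so querying edge information (for example, out-edge attributes) may require multiple IO queries (IO reads). Here, an IO query means interacting with the data storage medium (for example, a disk) through the IO interface of the operating system of the data query device, thereby reading the IO query result (IO query data) from the data storage medium into the memory of the data query device. IO queries can become the system bottleneck of the data query device, increasing query latency and thus reducing graph query efficiency. In addition, in such a scheme, when a graph node has a very large number of neighbors, querying the out-edge information of that node for a specified timestamp requires traversing all of its out-edges, which entails a large amount of unnecessary data querying and parsing and therefore low query efficiency.
In some graph data storage schemes, the node data, edge data, and neighbor information of graph nodes are stored as a node table, an edge table, and a neighbor table with different data structures, deployed on separate node table, edge table, and neighbor table servers. Because the data stored on the different servers is heterogeneous, updating the neighbor table becomes the system bottleneck of the database system when the graph topology is updated frequently.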
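The database system, data storage method, data storage device, data query method, and data query device according to the embodiments of this specification are described below with reference to the drawings.
FIG. 1 shows an example schematic diagram of a database system 100 according to an embodiment of this specification. The database system 100 may also be called the server side of a database application and provides data storage and data query services.
As shown in FIG. 1, the database system 100 includes a data storage device 110, a data query device 120, and at least one data storage medium 130. In the example of FIG. 1, the at least one data storage medium 130 includes a first data storage medium 130-1, a second data storage medium 130-2, and/or a third data storage medium 130-3. Note that although the first data storage medium 130-1, the second data storage medium 130-2, and the third data storage medium 130-3 are each illustrated as a single data storage medium, in other embodiments each of them may comprise multiple data storage media.
The data storage device 110 is configured to store data to be stored into the data storage medium 130 of the database system 100. The data query device 120 is configured to obtain target data (query data) from the data storage medium 130 in response to a data query request.
The data storage medium 130 may also be called external storage. When the data query device 120 performs a data query, it needs to read data from the data storage medium 130 into the memory of the data query device 120 by issuing a data read operation, such as an IO query, to the data storage medium 130, and then process the query in memory to obtain the query data. The data storage medium 130 may be any of various non-volatile storage media, such as a disk device or a memory card. A disk is a storage device that stores data using magnetic recording technology; examples include various forms of floppy disks (soft disks) and hard disks.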
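FIG. 2 shows an example flowchart of a data storage method 200 according to an embodiment of this specification.
As shown in FIG. 2, at 210, the number of neighbor graph nodes of each start graph node in the directed graph data to be stored is determined. The term "directed graph" refers to graph data in which the edge relationships between graph nodes are directional. In this specification, the term "neighbor graph node" refers to a graph node that can be reached in one hop along the direction of an edge. FIG. 3 shows an example schematic diagram of directed graph data. In the example of FIG. 3, six graph nodes A, B, C, D, E, and F are shown. Graph nodes B, C, and D are neighbor nodes of graph node A, and graph node F is a neighbor node of graph node C. Graph node E is not a neighbor node of graph node A, but it is a reverse neighbor node of graph node A.
After the number of neighbor graph nodes of each start graph node has been determined as above, at 220, the data storage mode is determined according to the number of neighbor graph nodes of each start graph node.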
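In one example, the data storage mode of the directed graph data may be determined once for all start graph nodes, so that all graph nodes in the directed graph data use the same data storage mode. In this approach, after the numbers of neighbor graph nodes of all start graph nodes have been determined, the maximum of these numbers is compared with a first threshold. If the maximum is greater than the first threshold, the data storage mode is determined to be super-large-node data storage; otherwise, it is determined to be non-super-large-node data storage. Here, the first threshold may be set based on the storage capacity of the data storage medium used, or according to the application scenario or experience.
In another example, the data storage mode may be determined separately for each start graph node in the directed graph data. In this example, the data storage mode of each graph node is determined from its own number of neighbor nodes, so different start graph nodes may use the same or different data storage modes. In this approach, once the number of neighbor nodes of a start graph node has been determined, the data storage mode of that start graph node can be determined from it and the subsequent storage operations performed, without waiting for the neighbor counts and storage modes of other start graph nodes to be determined.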
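The two ways of choosing a storage mode can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the specification's implementation: the adjacency-dict input, the `StorageMode` enum, and the function names are hypothetical, and the per-node variant assumes the same first threshold is reused for every node, which the text does not state. The example graph reproduces FIG. 3 (A to B, C, D; C to F; E to A).

```python
# Illustrative sketch; all names here are hypothetical, not from the specification.
from enum import Enum
from typing import Dict, List


class StorageMode(Enum):
    NON_SUPER_LARGE = "non-super-large-node"  # node, neighbors, and out-edges in one block
    SUPER_LARGE = "super-large-node"          # out-edges split into separate out-edge blocks


def neighbor_counts(adjacency: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of neighbor graph nodes (reachable in one hop along an edge) per start node."""
    return {node: len(set(dsts)) for node, dsts in adjacency.items()}


def mode_for_whole_graph(counts: Dict[str, int], first_threshold: int) -> StorageMode:
    """One mode for every start node: compare the largest neighbor count with the threshold."""
    if max(counts.values()) > first_threshold:
        return StorageMode.SUPER_LARGE
    return StorageMode.NON_SUPER_LARGE


def mode_per_node(counts: Dict[str, int], first_threshold: int) -> Dict[str, StorageMode]:
    """Independent decision per start node; no node waits for the others (assumed threshold)."""
    return {
        node: StorageMode.SUPER_LARGE if count > first_threshold else StorageMode.NON_SUPER_LARGE
        for node, count in counts.items()
    }


# FIG. 3: B, C, D are neighbors of A; F is a neighbor of C; E points to A.
graph = {"A": ["B", "C", "D"], "C": ["F"], "E": ["A"]}
counts = neighbor_counts(graph)
print(counts)                           # {'A': 3, 'C': 1, 'E': 1}
print(mode_for_whole_graph(counts, 2))  # StorageMode.SUPER_LARGE
```

With a threshold of 2, the whole-graph variant moves every node to super-large-node storage because of node A alone, whereas the per-node variant only moves A.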
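After the data storage mode of a start graph node has been determined as above, the graph data of that start graph node is stored according to the determined data storage mode.
Specifically, at 230, for each start graph node, when the determined data storage mode is non-super-large-node data storage, the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node are stored in a first start graph node data block of the first data storage medium 130-1. In this specification, the term "index feature" may refer to a specific graph data feature that helps generate or determine index information during graph data storage and/or graph data query. For example, during data storage, a mapping may be formed between the out-edge index feature of each out-edge and the out-edge data index used to index the corresponding out-edge data stored in the data block. In one example, the out-edge index feature of each out-edge may be stored in the out-edge index feature information, and the corresponding out-edge data may be stored in the data block in the same order as (or at positions matching) the out-edge index features in the out-edge index feature information. Each storage position in the out-edge index feature information can then serve as the out-edge data index of the corresponding out-edge data, forming a mapping between each out-edge index feature and its storage position in the out-edge index feature information (that is, the out-edge data index). During a data query, the corresponding data index can be obtained from the data index feature, thereby enabling the query. Examples of index features of data include timestamps and specific attributes of the data. Here, a specific attribute may be any attribute that helps determine a data storage index. For example, when the data attributes include "payee age", the data may be stored in order of payee age, so that an index of the data storage position can be generated based on the attribute "payee age", which then serves as an index feature. The out-edge index feature information includes the out-edge index features of all out-edges of the start graph node, and each out-edge index feature is mapped to an out-edge data index used to index the corresponding out-edge data stored in the first start graph node data block. For example, the out-edge index feature of each piece of out-edge data may be mapped to the storage order of that feature in the out-edge index feature information, and that storage order may be used to index the corresponding out-edge data in the first start graph node data block. Suppose there are four pieces of out-edge data A1, A2, A3, and A4 whose out-edge index features have values F1, F2, F3, and F4, respectively. During storage, F1, F2, F3, and F4 may be saved in the out-edge index feature information. If the feature values are stored in the order F1, F3, F2, F4, then the out-edge data A1, A2, A3, and A4 are stored in the data block in the order A1, A3, A2, A4.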
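The order-based mapping in the A1 to A4 example can be illustrated with a short Python sketch. The function names and the (edge data, feature) pair representation are hypothetical; the sketch only shows that position k in the out-edge index feature information doubles as the out-edge data index of the k-th stored out-edge. Feature values are assumed to be unique.

```python
# Illustrative sketch; function names and input shapes are hypothetical.
from typing import List, Sequence, Tuple


def lay_out_by_feature_order(
    edges: Sequence[Tuple[str, str]], feature_order: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """edges: (out-edge data, index feature) pairs.
    feature_order: order in which the features are saved in the out-edge index feature info.
    Returns (feature_info, stored_edges); position k of feature_info indexes stored_edges[k]."""
    data_by_feature = {feature: data for data, feature in edges}
    feature_info = list(feature_order)
    stored_edges = [data_by_feature[f] for f in feature_info]
    return feature_info, stored_edges


def out_edge_data_index(feature_info: List[str], feature: str) -> int:
    """The storage position of a feature is the data index of its out-edge (linear scan here)."""
    return feature_info.index(feature)


edges = [("A1", "F1"), ("A2", "F2"), ("A3", "F3"), ("A4", "F4")]
feature_info, stored = lay_out_by_feature_order(edges, ["F1", "F3", "F2", "F4"])
print(stored)                                           # ['A1', 'A3', 'A2', 'A4']
print(stored[out_edge_data_index(feature_info, "F2")])  # A2
```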
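In one example, each first data storage medium may store one first start graph node data block. In another example, a first data storage medium may store more than one first start graph node data block.
FIG. 4 shows an example schematic diagram of a non-super-large-node data storage process according to an embodiment of this specification. As shown in FIG. 4, there are n start graph nodes in the directed graph data, and the node data, neighbor information, out-edge index feature information, and out-edge data of each start graph node are stored independently in one first start graph node data block.
FIG. 5 shows an example schematic diagram of a first start graph node data block according to an embodiment of this specification.
As shown in FIG. 5, the first start graph node data block may store the node data, neighbor information, out-edge index feature information 1 to n, and out-edge data 1 to n of the start graph node. For example, the first start graph node data block may be formed as a first data structure with multiple fields that respectively store the node data, neighbor information, out-edge index feature information 1 to n, and out-edge data 1 to n of the start graph node. The first start graph node data block may also store reverse neighbor information, which may contain the same kinds of content as the neighbor information.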
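The node data of a start graph node may include its node identifier (node ID), node attributes, and node metadata. The node attributes may include one or more attributes, each with an attribute name and an attribute value. Attribute names may include, for example, "age", "height", and "occupation", and an attribute value is the value corresponding to an attribute name. Attribute names can be used to build indexes and thus support conditional filtering during queries. The node metadata may include the index feature of the start graph node, for example, a node timestamp. In one example, the node metadata may also include a node type, which is feature information used to classify nodes, for example, "person", "company", or "device". In the example of FIG. 5, the node metadata includes a timestamp and a node type. In other embodiments, the node data may omit node metadata.
The neighbor information may include the node identifier of the start graph node and neighbor attributes. The neighbor attributes include the basic information of all out-edges of the start graph node. The basic information of each out-edge may include the node identifier of the end graph node of the out-edge (end node ID) and the out-edge index feature of the out-edge. In one example, the basic information of the out-edges may be stored in the neighbor information in the same order as their out-edge index features in the out-edge index feature information. In the example of FIG. 5, the out-edge index feature is the out-edge timestamp. The neighbor information of the start graph node may further include the end node type of the end graph node and the out-edge type.
In some embodiments, the out-edge index feature information may include the sorted out-edge index features of all out-edges of the start graph node, and the out-edge data is stored in the start graph node data block in the same order as its out-edge index features in the out-edge index feature information. In one example, the storage location (for example, a field) for the out-edge index feature information may include multiple out-edge index feature slots, each storing one out-edge index feature, and the slot of each out-edge index feature indexes one corresponding piece of subsequent out-edge data; that is, the slot serves as the data index of the out-edge data. As shown in FIG. 5, the out-edge index feature information includes n slots for storing out-edge index features, where the k-th slot corresponds to the k-th subsequent piece of out-edge data, 1 ≤ k ≤ n, and k is a positive integer. For example, when the data index feature is a timestamp, the out-edge timestamps of all out-edges may be sorted in descending order and saved in the out-edge timestamp information. When the out-edge data is stored, each piece of out-edge data is then stored in the order in which its timestamp is saved in the out-edge timestamp information.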
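A minimal in-memory model of the first start graph node data block, assuming timestamps as the out-edge index feature, is sketched below. The `OutEdge` and `FirstStartNodeBlock` classes are hypothetical stand-ins for the fields in FIG. 5 (node metadata and reverse neighbor information are omitted); sorting once in descending timestamp order keeps the neighbor information, the out-edge timestamp information, and the out-edge data aligned slot by slot.

```python
# Illustrative sketch; class and field names are hypothetical, not the specification's format.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class OutEdge:
    dst_id: str       # end node ID
    timestamp: int    # out-edge index feature
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FirstStartNodeBlock:
    node_id: str
    node_attrs: Dict[str, Any]
    neighbor_info: List[Tuple[str, int]]  # (end node ID, out-edge timestamp) per out-edge
    ts_info: List[int]                    # out-edge timestamps, descending
    out_edges: List[OutEdge]              # out_edges[k] is indexed by slot k of ts_info


def build_first_block(node_id: str, node_attrs: Dict[str, Any],
                      edges: List[OutEdge]) -> FirstStartNodeBlock:
    ordered = sorted(edges, key=lambda e: e.timestamp, reverse=True)
    return FirstStartNodeBlock(
        node_id=node_id,
        node_attrs=dict(node_attrs),
        neighbor_info=[(e.dst_id, e.timestamp) for e in ordered],
        ts_info=[e.timestamp for e in ordered],
        out_edges=ordered,
    )


block = build_first_block(
    "A", {"age": 30},
    [OutEdge("B", 10, {"amount": 5}), OutEdge("C", 30), OutEdge("D", 20)],
)
print(block.ts_info)        # [30, 20, 10]
print(block.neighbor_info)  # [('C', 30), ('D', 20), ('B', 10)]
```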
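The out-edge data may include an out-edge identifier and out-edge attributes. The out-edge identifier may include the node identifier of the end graph node (end node ID) and the out-edge index feature, and may further include an out-edge type. The out-edge type is feature information used to classify edges. For example, when the out-edge represents an account transfer, the out-edge type may be "transfer"; when the out-edge represents a payment, the out-edge type may be "payment". The out-edge attributes may include one or more attributes, each with an attribute name and an attribute value. Attribute names of out-edge attributes may include, for example, "amount", "currency", and "operating device", and an attribute value is the value corresponding to an attribute name.
Returning to FIG. 2, at 240, for each start graph node, when the determined data storage mode is super-large-node data storage, the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node are stored in a second start graph node data block of the second data storage medium. For example, the second start graph node data block may be formed as a second data structure with multiple fields that respectively store the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node. The out-edge index feature range information may include multiple out-edge index feature ranges mapped to out-edge data block indexes, so that each out-edge index feature range can be used to index one out-edge data block index; for example, the storage order of each out-edge index feature range may correspond to one out-edge data block index. Each out-edge index feature range may store the maximum and minimum out-edge index feature values in the corresponding out-edge data block. In addition, the out-edge data and out-edge data storage address information of the start graph node are stored in at least two out-edge data blocks of the third data storage medium, where the out-edge data storage address information includes pairs <out-edge index feature of the out-edge data, relative storage address of the out-edge data in the out-edge data block>. In other words, each out-edge data block may be formed as a third data structure with multiple fields that respectively store at least two pieces of out-edge data and the corresponding out-edge data storage address information, as shown in FIG. 6. Here, the relative storage address of a piece of out-edge data may be its offset from the start address of the out-edge data block. Note that, within an out-edge data block, the out-edge data is stored before the out-edge data storage address information. The out-edge data block may also store the number of out-edges; in that case, the count is stored at the head of the out-edge data block, that is, before all the out-edge data.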
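One possible byte layout for an out-edge data block, following the order described above (out-edge count at the head, then the out-edge data, then the <index feature, relative offset> pairs), is sketched below. The field widths, the little-endian `struct` formats, the length prefix on each record, and the JSON encoding of each out-edge are all illustrative assumptions; the specification does not fix a serialization format.

```python
# Illustrative sketch; the binary layout and JSON records are assumptions, not the spec's format.
import json
import struct
from typing import Any, Dict, List

HEADER = struct.Struct("<I")   # number of out-edges stored in this block
LENGTH = struct.Struct("<I")   # length prefix of one serialized out-edge (assumed)
ADDR = struct.Struct("<qI")    # address entry: (out-edge timestamp, offset from block start)


def encode_out_edge_block(edges: List[Dict[str, Any]]) -> bytes:
    """edges: dicts with at least a 'ts' key, already in descending timestamp order."""
    buf = bytearray(HEADER.pack(len(edges)))
    addresses = []
    for edge in edges:
        payload = json.dumps(edge, sort_keys=True).encode("utf-8")
        addresses.append((edge["ts"], len(buf)))  # relative address = current offset
        buf += LENGTH.pack(len(payload)) + payload
    for ts, offset in addresses:                  # address info goes after all out-edge data
        buf += ADDR.pack(ts, offset)
    return bytes(buf)


block = encode_out_edge_block([{"dst": "B", "ts": 30}, {"dst": "C", "ts": 20}])
print(HEADER.unpack_from(block, 0)[0])                      # 2
print(ADDR.unpack_from(block, len(block) - 2 * ADDR.size))  # (30, 4)
```

Placing fixed-size address entries at the end means a reader that knows the count can locate them without scanning the variable-length records.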
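FIG. 6 shows an example flowchart of a super-large-node data storage process according to an embodiment of this specification. The definitions and stored contents of the node data, neighbor information, and out-edge data of the start graph node in FIG. 6 are identical to those in FIG. 5 and are not described again here.
When the out-edge index feature is a timestamp, the out-edge index feature range information includes multiple sorted out-edge timestamp ranges, and each out-edge timestamp range stores the maximum and minimum out-edge timestamps of the corresponding out-edge data block. The out-edge data block index stores the index information of an out-edge data block; for example, it may store the start storage address (that is, the first address) of the out-edge data block in the third data storage medium.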
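The out-edge timestamp range information can be derived while the out-edges are split into blocks. In this hypothetical sketch, `block_capacity` is an assumed fixed number of out-edges per block, and the list position of each block stands in for its out-edge data block index (in the text, that index would be the block's start address in the third data storage medium).

```python
# Illustrative sketch; block_capacity and the list-position block index are assumptions.
from typing import List, Tuple


def split_into_out_edge_blocks(
    timestamps: List[int], block_capacity: int
) -> Tuple[List[Tuple[int, int]], List[List[int]]]:
    """Returns (range_info, blocks); range_info[i] = (max_ts, min_ts) of blocks[i]."""
    if block_capacity <= 0:
        raise ValueError("block_capacity must be positive")
    ordered = sorted(timestamps, reverse=True)  # descending, as in the text
    blocks = [ordered[i:i + block_capacity] for i in range(0, len(ordered), block_capacity)]
    range_info = [(blk[0], blk[-1]) for blk in blocks]  # first is max, last is min
    return range_info, blocks


ranges, blocks = split_into_out_edge_blocks([5, 1, 9, 7, 3], block_capacity=2)
print(blocks)  # [[9, 7], [5, 3], [1]]
print(ranges)  # [(9, 7), (5, 3), (1, 1)]
```

A real implementation would also need a rule for equal timestamps that straddle a block boundary, since adjacent ranges could then share an endpoint; this sketch does not handle that case.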
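For each start graph node whose number of neighbors exceeds a predetermined threshold, if the determined data storage mode is super-large-node data storage, then when the graph data of such a start graph node is stored, its node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes are stored in the second start graph node data block of the second data storage medium, its neighbor information is stored separately in at least two neighbor data blocks of the fourth data storage medium, and its out-edge data and out-edge data storage address information are stored in at least two out-edge data blocks of the third data storage medium. Likewise, the neighbor index feature ranges include multiple neighbor index feature ranges mapped to neighbor data block indexes, so that each neighbor index feature range can index one neighbor data block index. For example, the storage order of each neighbor index feature range may correspond to one neighbor data block index, so that the neighbor index feature range stored at a given position maps to the neighbor data block index corresponding to that position. In one example, each neighbor index feature range stores the maximum and minimum index feature values of the corresponding neighbor data block. The neighbor data block index stores the index information of a neighbor data block; for example, it may store the start storage address (that is, the first address) of the neighbor data block in the fourth data storage medium. FIG. 7 shows another example flowchart of a super-large-node data storage process according to an embodiment of this specification.
Note that, in one example, after the graph data of a start graph node is stored in the first or second start graph node data block, the data block index of that block may be generated based on the node identifier of the start graph node. For example, a perfect hash value of the node identifier of the start graph node is computed, the perfect hash value is taken modulo the number of start graph nodes, and the result is used as the data block index of the first or second start graph node data block.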
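The block-index computation can be illustrated with a toy minimal perfect hash. The specification does not say how its perfect hash is built; this hypothetical sketch simply searches for a seed under which a CRC32-based hash (`zlib.crc32`, used only as a convenient stand-in) maps the known start node IDs to distinct values modulo the number of start nodes. A brute-force seed search is practical only for small key sets; a production system would use a dedicated perfect-hash construction.

```python
# Illustrative sketch; the seeded CRC32 hash and seed search are stand-ins, not the spec's hash.
import zlib
from typing import List


def _seeded_hash(seed: int, node_id: str) -> int:
    return zlib.crc32(f"{seed}:{node_id}".encode("utf-8"))


def find_perfect_seed(node_ids: List[str], max_tries: int = 100_000) -> int:
    """Find a seed for which every node ID gets a distinct slot in [0, len(node_ids))."""
    n = len(node_ids)
    for seed in range(max_tries):
        if len({_seeded_hash(seed, nid) % n for nid in node_ids}) == n:
            return seed
    raise RuntimeError("no collision-free seed found; use a real perfect-hash builder")


def block_index(node_id: str, seed: int, num_start_nodes: int) -> int:
    """Data block index = hash(node ID) modulo the number of start graph nodes."""
    return _seeded_hash(seed, node_id) % num_start_nodes


ids = ["A", "B", "C", "D", "E", "F"]
seed = find_perfect_seed(ids)
print(sorted(block_index(i, seed, len(ids)) for i in ids))  # [0, 1, 2, 3, 4, 5]
```

The same function would be reused at query time (step 810), so storage and query agree on the block index as long as the chosen seed is kept with the data; persisting the seed is an assumption of this sketch.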
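Note that, in some embodiments, some of the first, second, third, and/or fourth data storage media may be implemented by the same data storage medium. In addition, in some embodiments, non-super-large-node data storage and super-large-node data storage may be implemented with key-value storage. Examples of key-value storage include, but are not limited to, key-value storage based on perfect hashing, LevelDB, RocksDB, and Redis.
The data storage process according to the embodiments of this specification has been described above with reference to the drawings. After data has been stored in the database system according to the above data storage method, data queries can be performed in response to data query requests initiated by users.
FIG. 8 shows an example flowchart of a data query process 800 according to an embodiment of this specification.
As shown in FIG. 8, at 810, in response to receiving a data query request initiated by a user, the data block index of the graph node to be queried is determined based on its node identifier. The data block index indexes the corresponding start graph node data block stored in the data storage medium. For example, the data block index of the graph node to be queried may be determined by computing the perfect hash value of its node identifier and taking that value modulo the number of start graph nodes.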
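At 820, the start graph node data block indexed by the data block index is read from the first or second data storage medium into the memory of the data query device and parsed.
At 830, according to the parsed start graph node data block, the query data of the data query request is obtained from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium. The query data acquisition process is described in detail below with reference to the drawings.
At 840, the obtained query data is provided to the user.
FIG. 9 shows an example flowchart of a query data acquisition process 900 according to an embodiment of this specification. In the example of FIG. 9, the node data includes the node identifier, node attributes, and node metadata of the start graph node, and the node metadata includes the node index feature of the start graph node. The neighbor information includes the node identifier of the start graph node and neighbor attributes. The neighbor attributes include the basic information of all out-edges, and the basic information of each out-edge includes the node identifier of the end graph node of the out-edge and the out-edge index feature of the out-edge. The out-edge data includes an out-edge identifier and out-edge attributes, and the out-edge identifier includes the node identifier of the end graph node and the out-edge index feature. The node metadata may also include a node type.
As shown in FIG. 9, at 910, a data query request is received. At 920, in response to the data query request indicating a query of the node attributes of a graph node, the parsed node data is filtered based on the filter condition in the data query request. For example, if the filter condition is an index feature (for example, a timestamp), filtering may be performed based on the node index feature in the node metadata of the parsed node data. If the data query request also specifies a node type, filtering may be performed based on both the index feature and the node type in the node metadata. In other examples, the data query request may include other filter conditions. At 930, the node attributes of the filtered node data are obtained as the query data. In another example, the data query request may include no filter condition, in which case no filtering is performed and all node attributes in the parsed node data are obtained as the query data.
At 940, in response to the data query request indicating a query of the neighbor attributes of a graph node, the parsed neighbor information is filtered based on the filter condition in the data query request, in a manner similar to the filtering of node data. When the neighbor attributes include the end node identifier, end node type, out-edge type, and out-edge index feature, all of these can be used for filtering. In other examples, the data query request may include other filter conditions. At 950, the filtered neighbor attributes are obtained as the query data. In another example, the data query request may include no filter condition, in which case no filtering is performed and all neighbor attributes in the parsed neighbor information are obtained as the query data.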
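Filtering the parsed neighbor information at 940 might look like the following sketch. The `NeighborEntry` fields mirror the basic out-edge information named above (end node ID, end node type, out-edge type, out-edge timestamp), while `NeighborFilter` is a hypothetical representation of a filter condition; passing `None` reproduces the no-filter case in which all neighbor attributes are returned.

```python
# Illustrative sketch; NeighborFilter is a hypothetical filter-condition shape.
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class NeighborEntry:
    dst_id: str
    dst_type: str
    edge_type: str
    timestamp: int


@dataclass(frozen=True)
class NeighborFilter:  # None in a field means "no constraint on this field"
    dst_type: Optional[str] = None
    edge_type: Optional[str] = None
    ts_min: Optional[int] = None
    ts_max: Optional[int] = None

    def matches(self, e: NeighborEntry) -> bool:
        return ((self.dst_type is None or e.dst_type == self.dst_type)
                and (self.edge_type is None or e.edge_type == self.edge_type)
                and (self.ts_min is None or e.timestamp >= self.ts_min)
                and (self.ts_max is None or e.timestamp <= self.ts_max))


def filter_neighbors(entries: Iterable[NeighborEntry],
                     flt: Optional[NeighborFilter]) -> List[NeighborEntry]:
    if flt is None:  # no filter condition: return every neighbor entry
        return list(entries)
    return [e for e in entries if flt.matches(e)]


neighbors = [
    NeighborEntry("B", "person", "transfer", 30),
    NeighborEntry("C", "company", "payment", 20),
    NeighborEntry("D", "person", "transfer", 10),
]
hits = filter_neighbors(neighbors, NeighborFilter(edge_type="transfer", ts_min=15))
print([e.dst_id for e in hits])  # ['B']
```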
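In one example, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the determined data storage mode is super-large-node data storage, the second start graph node data block stores the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node, and the neighbor information of the start graph node is stored separately in at least two neighbor data blocks of the fourth data storage medium.
FIG. 10 shows an example flowchart of a method 1000 for obtaining neighbor attributes from the neighbor data blocks of the fourth data storage medium according to an embodiment of this specification.
As shown in FIG. 10, at 1010, in response to the data query request indicating a query of the neighbor attributes of a graph node, a neighbor data block index is determined in the neighbor index feature range information based on a neighbor index feature.
At 1020, the neighbor data block indexed by the neighbor data block index is read from the fourth data storage medium into the memory of the data query device and parsed.
At 1030, the neighbor information in the parsed neighbor data block is filtered based on the filter condition in the data query request.
At 1040, the filtered neighbor attributes are obtained as the query data.
Likewise, in the example of FIG. 10, the data query request may include no filter condition, in which case all neighbor attributes in the parsed neighbor data block can be used as the query data.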
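Returning to FIG. 9, at 960, in response to the data query request indicating a query of the out-edge attributes of a graph node, the out-edge index feature of the target out-edge that meets the filter condition in the data query request is determined from the parsed start graph node data block. For example, the neighbor information in the parsed start graph node data block can be filtered to find the end node identifiers (IDs) that meet the filter condition, using the same filtering process as described above for neighbor attribute queries, and the out-edge index features corresponding to the found end node IDs are then extracted. In another example, when the data query request includes no filter condition, no filtering is performed.
At 970, the out-edge data index of the target out-edge is determined based on the out-edge index feature and the out-edge index feature information, or the out-edge data block index of the target out-edge is determined based on the out-edge index feature and the out-edge index feature range information. When the index feature is a timestamp, binary search can be used, based on the out-edge timestamp, to determine the out-edge data index of the target out-edge in the out-edge timestamp information, or the out-edge data block index of the target out-edge in the out-edge timestamp range information.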
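Because both the out-edge timestamp information and the out-edge timestamp range information are kept in descending order, step 970 can be done with a binary search over a reversed ordering. The sketch below is a hypothetical illustration: it writes the searches by hand rather than using `bisect`, so the descending order is explicit, and it assumes the blocks' timestamp ranges do not overlap.

```python
# Illustrative sketch; function names are hypothetical and ranges are assumed non-overlapping.
from typing import List, Optional, Tuple


def find_out_edge_data_index(ts_info_desc: List[int], ts: int) -> Optional[int]:
    """First position k with ts_info_desc[k] == ts, or None (non-super-large-node case)."""
    lo, hi = 0, len(ts_info_desc)
    while lo < hi:                 # invariant: every slot before lo is newer than ts
        mid = (lo + hi) // 2
        if ts_info_desc[mid] > ts:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(ts_info_desc) and ts_info_desc[lo] == ts else None


def find_out_edge_block_index(ranges_desc: List[Tuple[int, int]], ts: int) -> Optional[int]:
    """ranges_desc[i] = (max_ts, min_ts) of block i; returns the block covering ts, or None."""
    lo, hi = 0, len(ranges_desc)
    while lo < hi:                 # skip blocks whose smallest timestamp is still newer than ts
        mid = (lo + hi) // 2
        if ranges_desc[mid][1] > ts:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(ranges_desc) and ranges_desc[lo][0] >= ts else None


print(find_out_edge_data_index([30, 20, 20, 10], 20))          # 1
print(find_out_edge_block_index([(9, 7), (5, 3), (1, 1)], 4))  # 1
print(find_out_edge_block_index([(9, 7), (5, 3), (1, 1)], 6))  # None
```

Each lookup costs O(log n) comparisons instead of the full traversal of out-edges criticized in the background section.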
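At 980, in response to the out-edge data index of the target out-edge having been determined, the out-edge attributes of the target out-edge are obtained, as the query data, from the out-edge data of the parsed start graph node data block indexed by the out-edge data index.
At 990, in response to the out-edge data block index of the target out-edge having been determined, the out-edge attributes of the target out-edge are obtained, as the query data, from the out-edge data block of the third data storage medium indexed by the out-edge data block index.
FIG. 11 shows an example flowchart of a method 1100 for obtaining out-edge attributes from the out-edge data blocks of the third data storage medium according to an embodiment of this specification.
As shown in FIG. 11, at 1110, the out-edge data block indexed by the out-edge data block index is read from the third data storage medium into the memory of the data query device.
At 1120, the out-edge data storage address information in the read out-edge data block is parsed. In the embodiments of this specification, the out-edge data storage address information is usually stored at the end of the out-edge data block, that is, after the out-edge data. When the out-edge data block is parsed, parsing can proceed from the end toward the beginning, so that the out-edge data storage address information is obtained first. In addition, the out-edge data block may also store the number of out-edges, from which the out-edge data storage address information can easily be located and extracted.
At 1130, based on the out-edge index feature of the target out-edge, the relative storage address of the target out-edge in the out-edge data block is determined from the parsed out-edge data storage address information.
At 1140, according to the determined relative storage address, the out-edge data of the target out-edge is obtained from the read out-edge data block and parsed.
At 1150, the out-edge attributes in the parsed out-edge data of the target out-edge are obtained as the query data.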
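Steps 1110 to 1150 amount to a partial parse: only the trailing address entries, and then the single target record, are decoded. The sketch below assumes an illustrative layout (a `uint32` out-edge count at the head, length-prefixed JSON out-edges, then `(int64 timestamp, uint32 offset)` pairs at the end) and includes a small encoder so it runs on its own. All names and formats are assumptions, not the specification's format. The address entries are scanned linearly for brevity; since they follow the descending out-edge order, a binary search would also work.

```python
# Illustrative sketch; layout, names, and JSON records are assumptions, not the spec's format.
import json
import struct
from typing import Any, Dict, List, Optional, Tuple

HEADER = struct.Struct("<I")   # out-edge count at the head of the block
LENGTH = struct.Struct("<I")   # length prefix of each serialized out-edge
ADDR = struct.Struct("<qI")    # (out-edge timestamp, relative offset) at the end of the block


def encode_block(edges: List[Dict[str, Any]]) -> bytes:
    buf = bytearray(HEADER.pack(len(edges)))
    addresses = []
    for edge in edges:
        payload = json.dumps(edge, sort_keys=True).encode("utf-8")
        addresses.append((edge["ts"], len(buf)))
        buf += LENGTH.pack(len(payload)) + payload
    for ts, offset in addresses:
        buf += ADDR.pack(ts, offset)
    return bytes(buf)


def parse_address_info(block: bytes) -> List[Tuple[int, int]]:
    """Step 1120: use the edge count to locate and decode only the trailing address info."""
    (count,) = HEADER.unpack_from(block, 0)
    start = len(block) - count * ADDR.size
    return [ADDR.unpack_from(block, start + i * ADDR.size) for i in range(count)]


def read_target_out_edge(block: bytes, target_ts: int) -> Optional[Dict[str, Any]]:
    """Steps 1130-1150: find the relative address, then decode just that one out-edge."""
    for ts, offset in parse_address_info(block):
        if ts == target_ts:
            (length,) = LENGTH.unpack_from(block, offset)
            start = offset + LENGTH.size
            return json.loads(block[start:start + length])
    return None


blk = encode_block([
    {"dst": "B", "ts": 30, "attrs": {"amount": 7}},
    {"dst": "C", "ts": 20, "attrs": {"amount": 5}},
    {"dst": "D", "ts": 10, "attrs": {"amount": 2}},
])
edge = read_target_out_edge(blk, 20)
print(edge["dst"], edge["attrs"])  # C {'amount': 5}
```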
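With this query data acquisition approach, the out-edge data storage address information in the out-edge data block is parsed first (that is, a partial parse), and the relative storage address of the target out-edge in the block is determined from its out-edge index feature. Only the out-edge data of the target out-edge then needs to be fetched from the block and parsed further, without fetching and parsing the remaining out-edge data, which greatly reduces the amount of data processed during the query and improves query efficiency.
Note that, in one example, the neighbor attributes in the neighbor information include the out-edge index feature and the out-edge type, and the out-edge identifier includes the out-edge index feature, the end node identifier, and the out-edge type. In this case, after the end node identifier is found and the corresponding out-edge index feature extracted, the out-edge data index or out-edge data block index is determined from the out-edge index feature. After the out-edge data of the target out-edge indexed by the out-edge data index or out-edge data block index is located, each candidate record of the target out-edge must be matched against the out-edge index feature, end node identifier, and out-edge type obtained during the neighbor attribute processing above (that is, the end node ID lookup). If matching out-edge data exists, the out-edge attributes of the matching out-edge data are obtained as the query data. With non-super-large-node data storage, because the start graph node data block has been parsed in one pass, the out-edge attributes of the matching out-edge data can be taken directly from the locally parsed data. With super-large-node data storage, if matching out-edge data exists, it is read from the out-edge data block and parsed, and the out-edge attributes of the parsed out-edge data are then obtained as the query data. No out-edge attributes are obtained for non-matching out-edge data. By re-matching the fetched out-edge data against the out-edge index feature, end node identifier, and out-edge type, and parsing only the matching data, this approach makes the retrieved out-edge data more accurate and further reduces the amount of data parsed, further improving graph query efficiency.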
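The re-matching step can be sketched as a simple predicate over candidate out-edges that share the looked-up index feature. The `EdgeKey` type and the dict keys `ts`, `dst`, `type`, and `attrs` are hypothetical; the point is that attributes are returned only for records whose out-edge index feature, end node identifier, and out-edge type all agree with the values found during the end node ID lookup, which matters when several out-edges carry the same timestamp.

```python
# Illustrative sketch; EdgeKey and the record keys are hypothetical names.
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class EdgeKey:
    timestamp: int    # out-edge index feature
    dst_id: str       # end node identifier
    edge_type: str    # out-edge type


def matched_out_edge_attrs(wanted: EdgeKey,
                           candidates: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return attrs only for candidates whose full out-edge identifier matches."""
    return [
        c["attrs"] for c in candidates
        if (c["ts"], c["dst"], c["type"]) == (wanted.timestamp, wanted.dst_id, wanted.edge_type)
    ]


candidates = [
    {"ts": 20, "dst": "C", "type": "payment", "attrs": {"amount": 5}},
    {"ts": 20, "dst": "D", "type": "transfer", "attrs": {"amount": 9}},
]
print(matched_out_edge_attrs(EdgeKey(20, "D", "transfer"), candidates))  # [{'amount': 9}]
```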
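With the above data storage and query scheme, when a graph node has a small number of neighbor nodes, its node data, neighbor information, out-edge index feature information, and out-edge data are stored in the same data block, so a data query can be completed with a single IO read of the data storage medium. When a graph node has a large number of neighbor nodes, its node data, neighbor information, out-edge index feature range information, and out-edge data block index are stored in the second start graph node data block of the second data storage medium, and its out-edge data and out-edge data storage address information are stored in at least two out-edge data blocks of the third data storage medium, so a data query can be completed with two IO reads of the data storage medium. This scheme greatly reduces the number of IO reads during graph queries, which shortens query time and improves graph query efficiency.
In addition, because the scheme stores nodes and edges together in a hybrid layout, the whole graph storage process needs to support only one data storage structure, so all storage servers share the same storage structure and the data update load is balanced across them.
In addition, when the number of neighbors is large, the scheme stores the neighbor information in multiple neighbor data blocks and, during a query, reads and parses only the target neighbor data blocks as needed, which reduces the amount of data processed during the query and improves query efficiency.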
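FIG. 12 shows an example block diagram of a data storage device 1200 according to an embodiment of this specification. As shown in FIG. 12, the data storage device 1200 may include a node number determination unit 1210, a data storage mode determination unit 1220, and a data storage unit 1230.
The node number determination unit 1210 is configured to determine the number of neighbor graph nodes of each start graph node in the directed graph data to be stored.
The data storage mode determination unit 1220 is configured to determine the data storage mode according to the number of neighbor graph nodes of each start graph node. In one example, the data storage mode determination unit 1220 determines the data storage mode once for all start graph nodes in the directed graph data. In another example, it determines the data storage mode separately for each start graph node in the directed graph data.
For each start graph node, when the determined data storage mode is non-super-large-node data storage, the data storage unit 1230 is configured to store the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node in a first start graph node data block of the first data storage medium, where the out-edge index feature information includes the out-edge index features of all out-edges, and each out-edge index feature is mapped to an out-edge data index used to index the out-edge data stored in the first start graph node data block.
For each start graph node, when the determined data storage mode is super-large-node data storage, the data storage unit 1230 is configured to store the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node in a second start graph node data block of the second data storage medium, where the out-edge index feature range information includes multiple out-edge index feature ranges mapped to out-edge data block indexes, each used to index one out-edge data block index; and to store the out-edge data and out-edge data storage address information of the start graph node in at least two out-edge data blocks of the third data storage medium, where the out-edge data storage address information includes pairs <out-edge index feature of the out-edge data, relative storage address of the out-edge data in the out-edge data block>.
In one example, the data storage unit 1230 may further store reverse neighbor information in the first start graph node data block and the second start graph node data block.
In one example, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the determined data storage mode is super-large-node data storage, the data storage unit 1230 may store the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node in the second start graph node data block of the second data storage medium, store the neighbor information separately in at least two neighbor data blocks of the fourth data storage medium, and store the out-edge data and out-edge data storage address information of the start graph node in at least two out-edge data blocks of the third data storage medium.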
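FIG. 13 shows an example block diagram of a data query device 1300 according to an embodiment of this specification. As shown in FIG. 13, the data query device 1300 includes a data block index determination unit 1310, a data reading unit 1320, a data parsing unit 1330, a query data acquisition unit 1340, and a query data providing unit 1350.
The data block index determination unit 1310 is configured to determine, in response to receiving a data query request initiated by a user, the data block index of the graph node to be queried based on its node identifier.
The data reading unit 1320 is configured to read the start graph node data block corresponding to the data block index from the first data storage medium or the second data storage medium into the memory of the data query device.
The data parsing unit 1330 is configured to parse the read start graph node data block.
The query data acquisition unit 1340 is configured to obtain, according to the parsed start graph node data block, the query data of the data query request from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium.
The query data providing unit 1350 is configured to provide the obtained query data to the user.
In one example, the node data includes the node identifier and node attributes of the start graph node, the neighbor information includes the node identifier of the start graph node and neighbor attributes, and the out-edge data includes an out-edge identifier and out-edge attributes.
In response to the data query request indicating a query of the node attributes of a graph node, the query data acquisition unit 1340 obtains the node attributes in the node data of the parsed start graph node data block as the query data.
In response to the data query request indicating a query of the neighbor attributes of a graph node, the query data acquisition unit 1340 obtains the neighbor attributes in the neighbor information of the parsed start graph node data block as the query data.
In response to the data query request indicating a query of the out-edge attributes of a graph node, the query data acquisition unit 1340 determines the out-edge index feature of the target out-edge from the parsed start graph node data block, and determines either the out-edge data index of the target out-edge based on the out-edge index feature and the out-edge index feature information, or the out-edge data block index of the target out-edge based on the out-edge index feature and the out-edge index feature range information. In response to the out-edge data index having been determined, the query data acquisition unit 1340 obtains the out-edge attributes of the target out-edge, as the query data, from the out-edge data of the parsed start graph node data block indexed by the out-edge data index. In response to the out-edge data block index having been determined, the query data acquisition unit 1340 obtains the out-edge attributes of the target out-edge, as the query data, from the out-edge data block of the third data storage medium indexed by the out-edge data block index.
In one example, the data query request may include a filter condition. In response to the data query request indicating a query of the node attributes of a graph node, the query data acquisition unit 1340 filters the node data of the parsed start graph node data block based on the filter condition in the data query request and obtains the node attributes in the filtered node data as the query data. In response to the data query request indicating a query of the neighbor attributes of a graph node, the query data acquisition unit 1340 filters the parsed neighbor information based on the filter condition in the data query request and obtains the neighbor attributes in the filtered neighbor information as the query data. In response to the data query request indicating a query of the out-edge attributes of a graph node, the query data acquisition unit 1340 determines, from the neighbor information of the parsed start graph node data block, the out-edge index feature of the target out-edge that meets the filter condition. The query data acquisition unit 1340 then determines the out-edge data index of the target out-edge based on the out-edge index feature and the out-edge index feature information, or determines the out-edge data block index of the target out-edge based on the out-edge index feature and the out-edge index feature range information. In response to the out-edge data index having been determined, the query data acquisition unit 1340 obtains the out-edge attributes of the target out-edge, as the query data, from the out-edge data of the parsed start graph node data block indexed by the out-edge data index. In response to the out-edge data block index having been determined, the query data acquisition unit 1340 obtains the out-edge attributes of the target out-edge, as the query data, from the out-edge data block of the third data storage medium indexed by the out-edge data block index.
In one example, in response to the out-edge data block index of the target out-edge having been determined, the data reading unit 1320 reads the out-edge data block indicated by the out-edge data block index from the third data storage medium into the memory of the data query device. After the out-edge data block has been read into memory, the data parsing unit 1330 parses the out-edge data storage address information in the out-edge data block. Based on the out-edge index feature of the target out-edge, the query data acquisition unit 1340 determines the relative storage address of the target out-edge in the out-edge data block from the parsed out-edge data storage address information. The query data acquisition unit 1340 then obtains the out-edge data of the target out-edge from the read out-edge data block according to the relative storage address, parses it, and obtains the out-edge attributes in the parsed out-edge data of the target out-edge as the query data.
When the index feature is a timestamp, the query data acquisition unit 1340 may use binary search, based on the out-edge timestamp, to determine the out-edge data index of the target out-edge in the out-edge timestamp information, or the out-edge data block index of the target out-edge in the out-edge timestamp range information.
In one example, for each start graph node whose number of neighbors exceeds a predetermined threshold, when the determined data storage mode is super-large-node data storage, the second start graph node data block stores the node data, neighbor index feature ranges, neighbor data block indexes, out-edge index feature range information, and out-edge data block indexes of the start graph node, and the neighbor information is stored separately in at least two neighbor data blocks of the fourth data storage medium.
In this case, in response to the data query request indicating a query of the neighbor attributes of a graph node, the query data acquisition unit 1340 determines a neighbor data block index based on a neighbor index feature and the neighbor index feature range information. After the neighbor data block index is determined, the data reading unit 1320 reads the neighbor data block it indicates from the fourth data storage medium into the memory of the data query device. The data parsing unit 1330 then parses the read neighbor data block. The query data acquisition unit 1340 filters the neighbor information in the parsed neighbor data block based on the filter condition in the data query request, and obtains the neighbor attributes in the filtered neighbor information as the query data.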
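The data storage method, data storage device, data query method, and data query device according to the embodiments of this specification have been described above with reference to FIGS. 1 to 13. The data storage device and data query device above may be implemented in hardware, in software, or in a combination of hardware and software.
FIG. 14 shows a schematic diagram of a data storage device 1400 implemented on a computer system according to an embodiment of this specification. As shown in FIG. 14, the data storage device 1400 may include at least one processor 1410, a storage (for example, a non-volatile storage) 1420, an internal memory 1430, and a communication interface 1440, which are connected together via a bus 1460. The at least one processor 1410 executes at least one computer-readable instruction (that is, the elements implemented in software described above) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in the storage which, when executed, cause the at least one processor 1410 to: determine the number of neighbor graph nodes of each start graph node in the directed graph data to be stored; determine the data storage mode according to the number of neighbor graph nodes of each start graph node; for each start graph node, when the data storage mode is non-super-large-node data storage, store the node data, neighbor information, out-edge index feature information, and out-edge data of the start graph node in a first start graph node data block of the first data storage medium, where the out-edge index feature information includes the out-edge index features of all out-edges of the start graph node and each out-edge index feature is mapped to an out-edge data index used to index the corresponding out-edge data stored in the first start graph node data block; and, for each start graph node, when the data storage mode is super-large-node data storage, store the node data, neighbor information, out-edge index feature range information, and out-edge data block index of the start graph node in a second start graph node data block of the second data storage medium, where the out-edge index feature range information includes multiple out-edge index feature ranges mapped to out-edge data block indexes, and store the out-edge data and out-edge data storage address information of the start graph node in at least two out-edge data blocks of the third data storage medium, where the out-edge data storage address information includes pairs <out-edge index feature of the out-edge data, relative storage address of the out-edge data in the out-edge data block>.
It should be understood that the computer-executable instructions stored in the storage, when executed, cause the at least one processor 1410 to perform the various operations and functions described above in conjunction with FIGS. 1-7 and 12 in the various embodiments of this specification.
FIG. 15 shows a schematic diagram of a data query device 1500 implemented on a computer system according to an embodiment of this specification. As shown in FIG. 15, the data query device 1500 may include at least one processor 1510, a storage (for example, a non-volatile storage) 1520, an internal memory 1530, and a communication interface 1540, which are connected together via a bus 1560. The at least one processor 1510 executes at least one computer-readable instruction (that is, the elements implemented in software described above) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in the storage which, when executed, cause the at least one processor 1510 to: in response to receiving a data query request initiated by a user, determine the data block index of the graph node to be queried based on its node identifier, the directed graph data having been stored in the data storage media according to the method described above; read the start graph node data block indexed by the data block index from the first data storage medium or the second data storage medium into the memory of the data query device and parse it; according to the parsed start graph node data block, obtain the query data of the data query request from the data parsed locally on the data query device or from an out-edge data block of the third data storage medium; and provide the obtained query data to the user.
It should be understood that the computer-executable instructions stored in the storage, when executed, cause the at least one processor 1510 to perform the various operations and functions described above in conjunction with FIGS. 8-11 and 13 in the various embodiments of this specification.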
根据一个实施例,提供了一种比如机器可读介质(例如,非暂时性机器可读介质)的程序产品。机器可读介质可以具有指令(即,上述以软件形式实现的元素),该指令当被机器执行时,使得机器执行本说明书的各个实施例中以上结合图1-图13描述的各种操作和功能。具体地,可以提供配有可读存储介质的系统或者装置,在该可读存储介质上存储着实现上述实施例中任一实施例的功能的软件程序代码,且使该系统或者装置的计算机或处理器读出并执行存储在该可读存储介质中的指令。
在这种情况下,从可读介质读取的程序代码本身可实现上述实施例中任何一项实施例的功能,因此机器可读代码和存储机器可读代码的可读存储介质构成了本发明的一部分。
可读存储介质的实施例包括软盘、硬盘、磁光盘、光盘(如CD-ROM、CD-R、CD-RW、DVD-ROM、DVD-RAM、DVD-RW、DVD-RW)、磁带、非易失性存储卡和ROM。可选择地,可以由通信网络从服务器计算机上或云上下载程序代码。
根据一个实施例,提供一种计算机程序产品,该计算机程序产品包括计算机程序,该计算机程序当被处理器执行时,使得处理器执行本说明书的各个实施例中以上结合图1-图13描述的各种操作和功能。
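Those skilled in the art should understand that various variations and modifications can be made to the embodiments disclosed above without departing from the essence of the invention. Therefore, the scope of protection of the present invention shall be defined by the appended claims.
It should be noted that not every step and unit in the above flows and system structure diagrams is required, and some may be omitted as needed. The steps need not run in a fixed order; their order may be chosen as needed. The apparatus structures described in the above embodiments may be physical or logical. That is, some units may be implemented by the same physical entity, some may be implemented by several physical entities, and some may be implemented jointly by components of several independent devices.
In the above embodiments, hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to perform the corresponding operations. A hardware unit or processor may also include programmable logic or circuitry (such as a general-purpose processor or another programmable processor) that software configures temporarily to perform the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be chosen based on cost and time considerations.
The detailed description set forth above with reference to the accompanying drawings describes exemplary embodiments. It does not represent all embodiments that may be implemented or that fall within the scope of the claims. The term "exemplary" as used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details to aid understanding of the described techniques. However, these techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form to avoid obscuring the concepts of the described embodiments.
The above description of the present disclosure is provided to enable any person of ordinary skill in the art to make or use the present disclosure. Various modifications to the present disclosure will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other variations without departing from the scope of protection of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.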

Claims (31)

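  1. A data storage method, comprising:
    determining the number of neighbor graph nodes of each starting graph node in directed graph data to be stored;
    determining a data storage mode according to the number of neighbor graph nodes of each starting graph node;
    for each starting graph node, when the data storage mode is non-super-large-node data storage, storing the node data, neighbor information, outbound edge index feature information, and outbound edge data of the starting graph node into a first starting graph node data block of a first data storage medium, wherein the outbound edge index feature information comprises the outbound edge index features of all outbound edges of the starting graph node, and each outbound edge index feature is mapped to an outbound edge data index used to index the corresponding outbound edge data stored in the first starting graph node data block; and
    for each starting graph node, when the data storage mode is super-large-node data storage, storing the node data, neighbor information, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node into a second starting graph node data block of a second data storage medium, wherein the outbound edge index feature range information comprises a plurality of outbound edge index feature ranges mapped to the outbound edge data block indexes, and storing the outbound edge data and outbound edge data storage address information of the starting graph node into at least two outbound edge data blocks of a third data storage medium, wherein the outbound edge data storage address information comprises two-element arrays <outbound edge index feature of the outbound edge data, relative storage address of the outbound edge data in the outbound edge data block>.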
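  2. The data storage method according to claim 1, wherein the data storage mode is determined for all starting graph nodes in the directed graph data as a whole, or is determined separately for each starting graph node in the directed graph data.
  3. The data storage method according to claim 1, wherein the node data comprises the node identifier and node attributes of the starting graph node, the neighbor information comprises the node identifier of the starting graph node and neighbor attributes, the neighbor attributes comprise basic information of all outbound edges, and the outbound edge data comprises outbound edge identifiers and outbound edge attributes.
  4. The data storage method according to claim 3, wherein the basic information of each outbound edge comprises the node identifier of the terminating graph node of that outbound edge and the outbound edge index feature of that outbound edge, and the outbound edge identifier comprises the node identifier of the terminating graph node and the outbound edge index feature.
  5. The data storage method according to claim 4, wherein the basic information of each outbound edge further comprises the node type of the terminating graph node of that outbound edge and/or the outbound edge type of that outbound edge, and the outbound edge identifier further comprises the outbound edge type.
  6. The data storage method according to claim 3, wherein the node data further comprises node metadata, and the node metadata comprises the node index feature and/or the node type of the starting graph node.
  7. The data storage method according to claim 1, wherein the index feature comprises a timestamp, the outbound edge index feature information comprises the outbound edge timestamps of all outbound edges sorted in descending order, and the outbound edge index feature range information comprises a plurality of outbound edge timestamp ranges sorted in descending order.
  8. The data storage method according to claim 7, wherein each outbound edge timestamp range stores the maximum outbound edge timestamp and the minimum outbound edge timestamp of the corresponding outbound edge data block.
  9. The data storage method according to claim 1, wherein the first starting graph node data block and the second starting graph node data block further store reverse neighbor information, and/or the outbound edge data blocks further store the number of outbound edges.
  10. The data storage method according to claim 1, wherein, for each starting graph node whose number of neighbors exceeds a predetermined threshold, when the determined data storage mode is super-large-node data storage, storing the node data, neighbor information, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node into the second starting graph node data block of the second data storage medium comprises:
    storing the node data, neighbor index feature ranges, neighbor data block indexes, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node into the second starting graph node data block of the second data storage medium, and storing the neighbor information separately into at least two neighbor data blocks of a fourth data storage medium,
    wherein the neighbor index feature ranges comprise a plurality of neighbor index feature ranges mapped to the neighbor data block indexes.
  11. The data storage method according to claim 1, wherein the first data storage medium, the second data storage medium, and the third data storage medium each comprise one or more data storage media, and some of the data storage media among the first, second, and third data storage media are implemented by the same data storage medium.
  12. The data storage method according to claim 1, wherein the non-super-large-node data storage and the super-large-node data storage are implemented using key-value pair storage.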
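  13. A data query method, comprising:
    in response to receiving a data query request initiated by a user, determining a data block index of a graph node to be queried based on the node identifier of the graph node to be queried, wherein directed graph data is stored in a first data storage medium, a second data storage medium, and/or a third data storage medium according to the method of claim 1;
    reading the starting graph node data block indexed by the data block index from the first data storage medium or the second data storage medium into the memory of a data query apparatus and parsing it;
    obtaining, according to the parsed starting graph node data block, the query data of the data query request from the locally parsed data of the data query apparatus or from an outbound edge data block of the third data storage medium; and
    providing the obtained query data to the user.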
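  14. The data query method according to claim 13, wherein the node data comprises the node identifier and node attributes of the starting graph node, the neighbor information comprises the node identifier information of the starting graph node and neighbor attributes, the neighbor attributes comprise basic information of all outbound edges, and the outbound edge data comprises outbound edge identifiers and outbound edge attributes,
    obtaining, according to the parsed starting graph node data block, the query data of the data query request from the locally parsed data of the data query apparatus or from an outbound edge data block of the third data storage medium comprises:
    in response to the data query request indicating a query for node attributes of a graph node, obtaining the node attributes in the node data of the parsed starting graph node data block as the query data,
    in response to the data query request indicating a query for neighbor attributes of a graph node, obtaining the neighbor attributes in the parsed neighbor information as the query data, or
    in response to the data query request indicating a query for outbound edge attributes of a graph node, determining the outbound edge index feature of a target outbound edge from the neighbor information of the parsed starting graph node data block; then either determining the outbound edge data index of the target outbound edge based on the outbound edge index feature and the outbound edge index feature information, and obtaining the outbound edge attributes of the target outbound edge as the query data from the outbound edge data of the parsed starting graph node data block indexed by the outbound edge data index, or determining the outbound edge data block index of the target outbound edge based on the outbound edge index feature and the outbound edge index feature range information, and obtaining the outbound edge attributes of the target outbound edge as the query data from the outbound edge data block of the third data storage medium indexed by the outbound edge data block index.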
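  15. The data query method according to claim 14, wherein the data query request comprises a filter condition,
    in response to the data query request indicating a query for node attributes of a graph node, obtaining the node attributes in the node data of the parsed starting graph node data block comprises:
    in response to the data query request indicating a query for node attributes of a graph node, filtering the node data of the parsed starting graph node data block based on the filter condition in the data query request, and obtaining the node attributes of the filtered node data,
    in response to the data query request indicating a query for neighbor attributes of a graph node, obtaining the neighbor attributes in the parsed neighbor information comprises:
    in response to the data query request indicating a query for neighbor attributes of a graph node, filtering the parsed neighbor information based on the filter condition in the data query request, and obtaining the neighbor attributes in the filtered neighbor information, or
    in response to the data query request indicating a query for outbound edge attributes of a graph node, determining the outbound edge index feature of a target outbound edge from the neighbor information of the parsed starting graph node data block comprises:
    in response to the data query request indicating a query for outbound edge attributes of a graph node, determining, from the neighbor information of the parsed starting graph node data block, the outbound edge index feature of a target outbound edge that satisfies the filter condition.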
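  16. The data query method according to claim 14, wherein obtaining the outbound edge attributes of the target outbound edge as the query data from the outbound edge data block of the third data storage medium indexed by the outbound edge data block index comprises:
    reading the outbound edge data block indexed by the outbound edge data block index from the third data storage medium into the memory of the data query apparatus;
    parsing the outbound edge data storage address information in the read outbound edge data block;
    determining, based on the outbound edge index feature of the target outbound edge, the relative storage address of the outbound edge data of the target outbound edge in the outbound edge data block from the parsed outbound edge data storage address information;
    obtaining the outbound edge data of the target outbound edge from the read outbound edge data block according to the relative storage address and parsing it; and
    obtaining the outbound edge attributes in the parsed outbound edge data of the target outbound edge as the query data.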
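  17. The data query method according to claim 14, wherein the outbound edge index feature comprises an outbound edge timestamp,
    determining the outbound edge data index of the target outbound edge based on the outbound edge index feature and the outbound edge index feature information comprises:
    based on the outbound edge timestamp, performing a binary search in the outbound edge timestamp information to determine the outbound edge data index of the target outbound edge, or determining the outbound edge data block index of the target outbound edge in the outbound edge timestamp range information, or
    determining the outbound edge data block index of the target outbound edge based on the outbound edge index feature and the outbound edge index feature range information comprises:
    based on the outbound edge timestamp, performing a binary search in the outbound edge timestamp range information to determine the outbound edge data block index of the target outbound edge.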
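  18. The data query method according to claim 14, wherein, for each starting graph node whose number of neighbors exceeds a predetermined threshold, when the data storage mode is super-large-node data storage, the second starting graph node data block stores the node data, neighbor index feature ranges, neighbor data block indexes, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node, and the neighbor information of the starting graph node is stored separately in at least two neighbor data blocks of a fourth data storage medium,
    before obtaining the neighbor attributes in the parsed neighbor information, the data query method further comprises:
    in response to the data query request indicating a query for neighbor attributes of a graph node, determining a neighbor data block index based on a neighbor index feature and the neighbor index feature range information; and
    reading the neighbor data block indexed by the neighbor data block index from the fourth data storage medium into the memory of the data query apparatus and parsing it.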
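  19. A data storage apparatus, comprising:
    a node quantity determination unit, which determines the number of neighbor graph nodes of each starting graph node in directed graph data to be stored;
    a data storage mode determination unit, which determines a data storage mode according to the number of neighbor graph nodes of each starting graph node; and
    a data storage unit, which, for each starting graph node: when the data storage mode is non-super-large-node data storage, stores the node data, neighbor information, outbound edge index feature information, and outbound edge data of the starting graph node into a first starting graph node data block of a first data storage medium, wherein the outbound edge index feature information comprises the outbound edge index features of all outbound edges of the starting graph node, and each outbound edge index feature is mapped to an outbound edge data index used to index the corresponding outbound edge data stored in the first starting graph node data block; and, when the data storage mode is super-large-node data storage, stores the node data, neighbor information, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node into a second starting graph node data block of a second data storage medium, wherein the outbound edge index feature range information comprises a plurality of outbound edge index feature ranges mapped to the outbound edge data block indexes, and stores the outbound edge data and outbound edge data storage address information of the starting graph node into at least two outbound edge data blocks of a third data storage medium, wherein the outbound edge data storage address information comprises two-element arrays <outbound edge index feature of the outbound edge data, relative storage address of the outbound edge data in the outbound edge data block>.
  20. The data storage apparatus according to claim 19, wherein the data storage mode determination unit determines the data storage mode for all starting graph nodes in the directed graph data as a whole, or for each starting graph node in the directed graph data separately.
  21. The data storage apparatus according to claim 19, wherein, for each starting graph node whose number of neighbors exceeds a predetermined threshold, when the data storage mode is super-large-node data storage, the data storage unit stores the node data, neighbor index feature ranges, neighbor data block indexes, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node into the second starting graph node data block of the second data storage medium, stores the neighbor information separately into at least two neighbor data blocks of a fourth data storage medium, and stores the outbound edge data and outbound edge data storage address information of the starting graph node into at least two outbound edge data blocks of the third data storage medium.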
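  22. A data query apparatus, comprising:
    a data block index determination unit, which, in response to receiving a data query request initiated by a user, determines a data block index of a graph node to be queried based on the node identifier of the graph node to be queried, wherein directed graph data is stored in a first data storage medium, a second data storage medium, and/or a third data storage medium according to the method of claim 1;
    a data reading unit, which reads the starting graph node data block indexed by the data block index from the first data storage medium or the second data storage medium into the memory of the data query apparatus;
    a data parsing unit, which parses the read starting graph node data block;
    a query data acquisition unit, which obtains, according to the parsed starting graph node data block, the query data of the data query request from the locally parsed data of the data query apparatus or from an outbound edge data block of the third data storage medium; and
    a query data providing unit, which provides the obtained query data to the user.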
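  23. The data query apparatus according to claim 22, wherein the node data comprises the node identifier and node attributes of the starting graph node, the neighbor information comprises the node identifier of the starting graph node and neighbor attributes, the neighbor attributes comprise basic information of all outbound edges, and the outbound edge data comprises outbound edge identifiers and outbound edge attributes,
    in response to the data query request indicating a query for node attributes of a graph node, the query data acquisition unit obtains the node attributes in the node data of the parsed starting graph node data block as the query data,
    in response to the data query request indicating a query for neighbor attributes of a graph node, the query data acquisition unit obtains the neighbor attributes in the parsed neighbor information as the query data, or
    in response to the data query request indicating a query for outbound edge attributes of a graph node, the query data acquisition unit determines the outbound edge index feature of a target outbound edge from the neighbor information of the parsed starting graph node data block, and then either determines the outbound edge data index of the target outbound edge based on the outbound edge index feature and the outbound edge index feature information and obtains the outbound edge attributes of the target outbound edge as the query data from the outbound edge data of the parsed starting graph node data block indexed by the outbound edge data index, or determines the outbound edge data block index of the target outbound edge based on the outbound edge index feature and the outbound edge index feature range information and obtains the outbound edge attributes of the target outbound edge as the query data from the outbound edge data block of the third data storage medium indexed by the outbound edge data block index.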
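  24. The data query apparatus according to claim 23, wherein the data query request comprises a filter condition,
    in response to the data query request indicating a query for node attributes of a graph node, the query data acquisition unit further filters the node data of the parsed starting graph node data block based on the filter condition in the data query request,
    in response to the data query request indicating a query for neighbor attributes of a graph node, the query data acquisition unit filters the parsed neighbor information based on the filter condition in the data query request, or
    in response to the data query request indicating a query for outbound edge attributes of a graph node, the query data acquisition unit further determines, from the neighbor information of the parsed starting graph node data block, the outbound edge index feature of a target outbound edge that satisfies the filter condition.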
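  25. The data query apparatus according to claim 23, wherein the data reading unit reads the outbound edge data block indexed by the outbound edge data block index from the third data storage medium into the memory of the data query apparatus, the data parsing unit parses the outbound edge data storage address information in the outbound edge data block, and
    the query data acquisition unit is configured to:
    determine, based on the outbound edge index feature of the target outbound edge, the relative storage address of the target outbound edge in the outbound edge data block from the parsed outbound edge data storage address information;
    obtain the outbound edge data of the target outbound edge from the read outbound edge data block according to the relative storage address and parse it; and
    obtain the outbound edge attributes in the parsed outbound edge data of the target outbound edge as the query data.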
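  26. The data query apparatus according to claim 23, wherein, for each starting graph node whose number of neighbors exceeds a predetermined threshold, when the data storage mode is super-large-node data storage, the second starting graph node data block stores the node data, neighbor index feature ranges, neighbor data block indexes, outbound edge index feature range information, and outbound edge data block indexes of the starting graph node, and the neighbor information is stored separately in at least two neighbor data blocks of a fourth data storage medium,
    in response to the data query request indicating a query for neighbor attributes of a graph node, the query data acquisition unit determines a neighbor data block index in the neighbor index feature range information based on a neighbor index feature, and the data reading unit reads the neighbor data block indexed by the neighbor data block index from the fourth data storage medium into the memory of the data query apparatus,
    the data parsing unit further parses the read neighbor data block.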
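  27. A database system, comprising:
    the data storage apparatus according to any one of claims 19 to 21;
    the data query apparatus according to any one of claims 22 to 26; and
    at least one data storage medium, comprising a first data storage medium, a second data storage medium, and/or a third data storage medium.
  28. A data storage apparatus, comprising:
    at least one processor,
    a storage coupled to the at least one processor, and
    a computer program stored in the storage, wherein the at least one processor executes the computer program to implement the data storage method according to any one of claims 1 to 12.
  29. A data query apparatus, comprising:
    at least one processor,
    a storage coupled to the at least one processor, and
    a computer program stored in the storage, wherein the at least one processor executes the computer program to implement the data query method according to any one of claims 13 to 18.
  30. A computer-readable storage medium storing executable instructions that, when executed, cause a processor to perform the data storage method according to any one of claims 1 to 12 or the data query method according to any one of claims 13 to 18.
  31. A computer program product, comprising a computer program that is executed by a processor to implement the data storage method according to any one of claims 1 to 12 or the data query method according to any one of claims 13 to 18.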
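Claims 7, 8, and 17 keep outbound edge timestamps, and per-block timestamp ranges, in descending order so that a binary search can locate a target edge or its block. A minimal Python sketch follows. It assumes that ranges are stored as non-overlapping `(max_ts, min_ts, block_index)` tuples. The function names `find_edge_data_index` and `find_edge_block` are hypothetical.

```python
from typing import List, Optional, Tuple


def find_edge_data_index(desc_ts: List[int], ts: int) -> Optional[int]:
    """Binary-search a descending timestamp list; return the position of ts."""
    lo, hi = 0, len(desc_ts)
    while lo < hi:
        mid = (lo + hi) // 2
        if desc_ts[mid] > ts:  # ts lies further right in a descending list
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first position whose timestamp is <= ts.
    if lo < len(desc_ts) and desc_ts[lo] == ts:
        return lo
    return None


def find_edge_block(ranges: List[Tuple[int, int, str]], ts: int) -> Optional[str]:
    """Binary-search (max_ts, min_ts, block_index) ranges sorted in descending order."""
    lo, hi = 0, len(ranges)
    while lo < hi:
        mid = (lo + hi) // 2
        if ranges[mid][1] > ts:  # every edge in this block is newer than ts
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first range whose minimum timestamp is <= ts.
    if lo < len(ranges) and ranges[lo][1] <= ts <= ranges[lo][0]:
        return ranges[lo][2]
    return None
```

Python's `bisect` module assumes ascending order, so the loops are written out by hand. Negating the timestamps and using `bisect` would be an equivalent alternative.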
PCT/CN2022/123782 2021-10-08 2022-10-08 Data storage and querying WO2023056928A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111169465.6 2021-10-08
CN202111169465.6A CN113609347B (zh) 2021-10-08 2021-10-08 Data storage and query method and apparatus, and database system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/400,366 Continuation US20240134911A1 (en) 2021-10-07 2023-12-29 Data storage and querying

Publications (1)

Publication Number Publication Date
WO2023056928A1 true WO2023056928A1 (zh) 2023-04-13

Family

ID=78310803

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123782 WO2023056928A1 (zh) 2021-10-08 2022-10-08 Data storage and querying

Country Status (2)

Country Link
CN (2) CN113609347B (zh)
WO (1) WO2023056928A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
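CN113609347B (zh) * 2021-10-08 2021-12-28 支付宝(杭州)信息技术有限公司 Data storage and query method and apparatus, and database system
CN113901279B (zh) * 2021-12-03 2022-03-22 支付宝(杭州)信息技术有限公司 Graph database retrieval method and apparatus
CN114077680B (zh) * 2022-01-07 2022-05-17 支付宝(杭州)信息技术有限公司 Graph data storage method, system, and apparatus
CN116204683A (zh) * 2022-09-15 2023-06-02 阿里巴巴(中国)有限公司 Dynamic graph data storage system, reading system, and corresponding methods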

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US9942053B2 (en) * 2013-09-17 2018-04-10 Cisco Technology, Inc. Bit indexed explicit replication using internet protocol version 6
CN104123369B (zh) * 2014-07-24 2017-06-13 中国移动通信集团广东有限公司 Method for implementing a configuration management database system based on a graph database
US9916187B2 (en) * 2014-10-27 2018-03-13 Oracle International Corporation Graph database system that dynamically compiles and executes custom graph analytic programs written in high-level, imperative programming language
US9330138B1 (en) * 2015-09-18 2016-05-03 Linkedin Corporation Translating queries into graph queries using primitives
CN106919628A (zh) * 2015-12-28 2017-07-04 阿里巴巴集团控股有限公司 Graph data processing method and apparatus
CN106227794B (zh) * 2016-07-20 2019-09-17 北京航空航天大学 Method and apparatus for storing dynamic attribute data in temporal graph data
US10374877B2 (en) * 2017-05-08 2019-08-06 NetApp., Inc. Address extraction of a cluster configuration inception point
CN110990638B (zh) * 2019-10-28 2023-04-28 北京大学 Large-scale data query acceleration apparatus and method based on an FPGA-CPU heterogeneous environment
CN111190904B (zh) * 2019-12-30 2023-12-08 四川蜀天梦图数据科技有限公司 Method and apparatus for hybrid graph-relational database storage
CN112363979B (zh) * 2020-09-18 2023-08-04 杭州欧若数网科技有限公司 Distributed indexing method and system based on a graph database
CN112287182B (zh) * 2020-10-30 2023-09-19 杭州海康威视数字技术股份有限公司 Graph data storage and processing method and apparatus, and computer storage medium

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN101645038A (zh) * 2009-05-20 2010-02-10 中国科学院声学研究所 Data storage method based on a Petersen network storage structure
CN108920105A (zh) * 2018-07-03 2018-11-30 清华大学 Community-structure-based distributed storage method and apparatus for graph data
US20200226156A1 (en) * 2019-01-14 2020-07-16 Salesforce.Com, Inc. Systems, methods, and apparatuses for executing a graph query against a graph representing a plurality of data stores
CN110737659A (zh) * 2019-09-06 2020-01-31 平安科技(深圳)有限公司 Graph data storage and query method and apparatus, and computer-readable storage medium
CN113609347A (zh) * 2021-10-08 2021-11-05 支付宝(杭州)信息技术有限公司 Data storage and query method and apparatus, and database system
CN114186100A (zh) * 2021-10-08 2022-03-15 支付宝(杭州)信息技术有限公司 Data storage and query method and apparatus, and database system

Cited By (2)

Publication number Priority date Publication date Assignee Title
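CN117290560A (zh) * 2023-11-23 2023-12-26 支付宝(杭州)信息技术有限公司 Method and apparatus for obtaining graph data in a graph computing task
CN117290560B (zh) * 2023-11-23 2024-02-23 支付宝(杭州)信息技术有限公司 Method and apparatus for obtaining graph data in a graph computing task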

Also Published As

Publication number Publication date
CN113609347B (zh) 2021-12-28
CN114186100A (zh) 2022-03-15
CN113609347A (zh) 2021-11-05

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22877944

Country of ref document: EP

Kind code of ref document: A1