WO2023131218A1 - Storage of Graph Data - Google Patents


Info

Publication number
WO2023131218A1
Authority
WO
WIPO (PCT)
Prior art keywords
edge
node
information
attribute
data
Application number
PCT/CN2023/070606
Inventor
张达 (Zhang Da)
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of WO2023131218A1 (Chinese-language text)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G06F16/53 Querying

Definitions

  • One or more embodiments of this specification relate to the field of computers, and in particular to a method, system, and device for storing graph data.
  • One aspect of this specification provides a method for storing graph data, where the graph data includes nodes and edges. The storage method includes: storing node information of several nodes in the graph data in a point table of a data block, the node information including a node identifier; storing edge information of the edges of the several nodes in an edge table of the data block, the edge information including the node identifier of the target node connected by the edge; storing attribute information of the several nodes in a point attribute table of the data block; and storing attribute information of the edges of the several nodes in an edge attribute table of the data block.
  • Another aspect of this specification provides a system for storing graph data, where the graph data includes nodes and edges.
  • The storage system includes a node information storage module, used to store node information of several nodes in the graph data in the point table of a data block;
  • the node information includes a node identifier;
  • an edge information storage module, used to store edge information of the edges of the several nodes in the edge table of the data block;
  • the edge information includes the node identifier of the target node connected by the edge;
  • a node attribute information storage module, used to store attribute information of the several nodes in the point attribute table of the data block;
  • and an edge attribute information storage module, used to store attribute information of the edges of the several nodes in the edge attribute table of the data block.
  • A further aspect of this specification provides a graph data file, where the graph data includes nodes and edges. The file includes several data blocks, each of which includes: a point table, used to store node information of at least some nodes in the graph data, the node information including a node identifier; an edge table, used to store edge information of the edges of those nodes, the edge information including the node identifier of the target node connected by the edge; a point attribute table, used to store attribute information of those nodes; and an edge attribute table, used to store attribute information of the edges of those nodes.
  • Fig. 1 is a schematic diagram of an application scenario of an exemplary graph data storage system according to some embodiments of the present specification.
  • Fig. 2 is a schematic diagram of a point table according to some embodiments of the present specification.
  • Fig. 3 is a schematic diagram of an edge table according to some embodiments of the present specification.
  • Fig. 4 is a schematic diagram of a point/edge attribute table according to some embodiments of the present specification.
  • Fig. 5 is a system block diagram of graph data storage according to some embodiments of the present specification.
  • Fig. 6 is a schematic diagram of a data block structure according to some embodiments of this specification.
  • Fig. 7 is an exemplary flow chart of graph data storage according to some embodiments of the present specification.
  • Fig. 8 is an exemplary flow chart of querying graph data according to some embodiments of the present specification.
  • Terms such as "system" used herein are a means of distinguishing different components, elements, parts, or assemblies at different levels.
  • However, these words may be replaced by other expressions if those expressions achieve the same purpose.
  • Fig. 1 is a schematic diagram of an application scenario of an exemplary graph database storage system according to some embodiments of the present specification.
  • The data generated between different entities is growing exponentially, and the internal dependencies and complexity of that data are increasing.
  • Graph data is used to describe and characterize the relationships between different entities.
  • Graph data consists of multiple nodes and the edges connecting them.
  • The nodes in graph data represent entities, and the edges between nodes represent relationships between entities.
  • Entities can be real objects or institutions in the physical world, or abstract concepts, such as companies, devices, people, goods, storage locations, means of transportation, images, computer programs, and accounts. Entities can have attribute information.
  • For example, the attribute information of a person may include age, gender, occupation, employer, or home address.
  • The attribute information of a company may include its registered address, legal representative, business scope, registered capital, and other information.
  • Edges between entities reflect the relationships between them. For example, there may be an employment relationship between a person entity and a company entity, and a friendship between Zhang San and Li Si. Edges can also have attribute information.
  • For example, the attribute information of an employment relationship can include when it was established and the type of employment (formal or temporary).
  • Graph data can be stored in a relational database, which stores the nodes and the edges of the graph data separately.
  • However, relational databases are poorly suited to storing graph data. For example, because graph data is huge, it must be sharded across multiple databases and tables, which splits nodes apart from their edges.
  • When the graph data is queried, it is then necessary to interact with different databases (e.g., on different storage devices) to find the target query node and its edges, or to perform multiple reads and writes to obtain them.
  • In some embodiments, a graph data storage method based on a graph database is proposed.
  • In a graph database, the relationships between data play a central role, and it can store massive, complex data together with the relationships among that data.
  • For example, one graph database stores the nodes and edges of the graph data in different KV (key-value) storage engines and builds a proxy layer on top of them to provide graph query services.
  • A one-hop subgraph consists of a node, the edges connected to that node, and the nodes at the other end of those edges.
  • In such a graph database, obtaining the result of a one-hop subgraph query requires many read and write operations, so retrieval efficiency is very low.
  • In addition, the graph database needs an independent server cluster for deployment, operation, and maintenance, to ensure enough memory for the many read and write operations during graph queries. This brings large equipment operation and maintenance costs.
  • Some embodiments of this specification provide a storage method for graph data that stores the node information, edge information, node attribute information, and edge attribute information of several nodes in the graph data in, respectively, the point table, edge table, point attribute table, and edge attribute table of the same data block.
  • In this way, the node information and edge information of the relevant nodes can be obtained by reading the data block once, which effectively reduces the number of reads and writes during graph processing.
  • When querying, the data block needs to be read only once, which significantly improves query efficiency.
  • In addition, the storage order of the edges in the edge table can be kept consistent with the storage order of the several nodes in the point table; the storage order of the nodes' attribute information in the point attribute table can be kept consistent with the order of the nodes in the point table; and the storage order of the edge attribute information in the edge attribute table can be kept consistent with the order of the edges in the edge table. This aligns the point table, the edge table, and the attribute tables. Once node A has been found, the positions of all of node A's edges in the edge table can be determined quickly, and the attribute information of those edges can then be located quickly in the edge attribute table. This arrangement avoids excessive data reads, writes, and caching during graph queries, so the whole process does not need a resident service cluster to support it.
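The point table, edge table, and attribute table alignment described above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the patented on-disk format: the names `DataBlock`, `build_block`, and `one_hop` are hypothetical, list positions stand in for storage addresses, and each node records only the position of its first edge (the next node's first edge marks where its own edges end).

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    """Hypothetical in-memory model of one data block; list positions stand in for addresses."""
    node_ids: list = field(default_factory=list)      # point table: node identifiers
    edge_start: list = field(default_factory=list)    # point table: position of each node's first edge
    edge_targets: list = field(default_factory=list)  # edge table: target node identifiers
    node_attrs: list = field(default_factory=list)    # point attribute table, aligned with node_ids
    edge_attrs: list = field(default_factory=list)    # edge attribute table, aligned with edge_targets


def build_block(adjacency, node_attrs, edge_attrs):
    """adjacency: {node_id: [target_id, ...]}; node_attrs: {node_id: dict}; edge_attrs: {(src, dst): dict}."""
    block = DataBlock()
    for node_id in sorted(adjacency):  # point table ordered by node identifier
        block.node_ids.append(node_id)
        block.edge_start.append(len(block.edge_targets))
        block.node_attrs.append(node_attrs.get(node_id, {}))
        for target in adjacency[node_id]:  # a node's edges are stored contiguously
            block.edge_targets.append(target)
            block.edge_attrs.append(edge_attrs.get((node_id, target), {}))
    return block


def one_hop(block, node_id):
    """Return (node attributes, [(target, edge attributes), ...]) using only this block."""
    i = block.node_ids.index(node_id)  # linear scan for brevity; raises ValueError if absent
    start = block.edge_start[i]
    # A node's edges end where the next node's edges begin (or at the end of the edge table).
    end = block.edge_start[i + 1] if i + 1 < len(block.node_ids) else len(block.edge_targets)
    edges = list(zip(block.edge_targets[start:end], block.edge_attrs[start:end]))
    return block.node_attrs[i], edges
```

For example, with `block = build_block({1: [2, 3], 2: [3], 3: []}, {1: {"kind": "person"}}, {(1, 2): {"relation": "employment"}})`, the call `one_hop(block, 1)` returns node 1's attributes plus its two edges, with the employment attribute attached to the edge to node 2, all without touching any other block.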
  • As shown in FIG. 1, the application scenario 100 of the graph data storage system may include storage device 110-1, storage device 110-2, and so on.
  • Storage device 110-1, storage device 110-2, storage device 110-3, ... may include a processor together with mass storage, removable memory, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof, for storing data, managing resources, and processing data and/or information from at least one component of the system or from external data sources (e.g., cloud data centers).
  • Each of storage device 110-1, storage device 110-2, storage device 110-3, ... may be a single server or a group of servers.
  • A server group may be centralized or distributed (for example, storage device 110-1 may be a distributed system), and may be dedicated or provided concurrently by other devices or systems.
  • Storage device 110-1, storage device 110-2, storage device 110-3, ... may be local or remote.
  • Storage device 110-1, storage device 110-2, storage device 110-3, ... may be implemented on a cloud platform or provided virtually.
  • The cloud platform may include a private cloud, public cloud, hybrid cloud, community cloud, distributed cloud, internal cloud, multi-layer cloud, etc., or any combination thereof.
  • Any one or more of storage device 110-1, storage device 110-2, ..., storage device 110-n can store one or more graph files and support parallel queries of graph data.
  • A graph file may include multiple data blocks, each of which stores the node information, edge information, and node and edge attribute information for all or some of the nodes in the graph data.
  • Each data block includes a point table 210, an edge table 220, a point attribute table 230, an edge attribute table 240, and a table element 250.
  • The processing device 120 can generate or acquire graph data, write it into multiple data blocks or graph files, and distribute those data blocks or graph files to storage device 110-1, storage device 110-2, ..., storage device 110-n for storage.
  • The processing device 120 can also receive a query request and distribute it to each storage device, so that each storage device queries its locally stored graph data or data blocks and returns the result to the processing device 120.
  • A storage device alone may also be used to store the graph files, in which case the processing device 120 may be omitted.
  • The scenario 100 may also include a network (not shown in the figure).
  • The network can connect the components of the system and/or connect the system with external parts.
  • The network enables communication among the components of the system and between the system and external parts, facilitating the exchange of data and/or information.
  • The network 130 may include one or more wired networks and/or wireless networks.
  • For example, the network may include a cable network, fiber-optic network, telecommunications network, the Internet, local area network (LAN), wide area network (WAN), wireless local area network (WLAN), metropolitan area network (MAN), public switched telephone network (PSTN), Bluetooth network, ZigBee network, near field communication (NFC), internal bus, internal line, cable connection, etc., or any combination thereof.
  • The network connections between the parts of the system may use one of the above methods or several of them.
  • The network may have any of various topologies, such as point-to-point, shared, or centralized, or a combination of several topologies.
  • Fig. 5 is a system block diagram for storing graph data according to some embodiments of the present specification.
  • The system 500 can be deployed on any processing device that can execute programs (such as any one of storage device 110-1, storage device 110-2, ..., storage device 110-n in FIG. 1) and specifically includes: a node information storage module 510, used to store the node information of several nodes in the graph data in the point table of the data block, the node information including a node identifier; an edge information storage module 520, used to store the edge information of the edges of the several nodes in the edge table of the data block, the edge information including the node identifier of the target node connected by the edge;
  • a node attribute information storage module 530, used to store the attribute information of the several nodes in the point attribute table of the data block;
  • and an edge attribute information storage module 540, used to store the attribute information of the edges of the several nodes in the edge attribute table of the data block.
  • The storage order of the edges of the several nodes in the edge table is consistent with the storage order of the several nodes in the point table; the storage order of the nodes' attribute information in the point attribute table is consistent with the storage order of the nodes in the point table; and the storage order of the edge attribute information in the edge attribute table is consistent with the storage order of the edges in the edge table.
  • The edge table includes an edge table index area and an edge table data area. The edge information of the edges of the several nodes is stored in the edge table data area, and the edge table index area stores edge index information for the several nodes; each node's edge index information includes the storage address information of that node's edge information in the edge table data area. The edge index information of the several nodes is stored in the same order as the several nodes in the point table.
  • The node information further includes storage address information of the node's edges; in the point table, this is the storage address information of the corresponding edge index information in the edge table.
  • Edge information of different edges of the same node is stored contiguously in the edge table data area, and the edge information of the several nodes is stored in the same order as the several nodes in the point table.
  • The edge index information also includes the edge type, and the edge information also includes the node type of the target node; the edge information of the same node is stored in the edge table data area in order of edge type.
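The index-area/data-area split of the edge table, with each node's edges grouped by edge type, can be sketched with Python's `struct` module. The byte layout here is an assumption made for illustration only: each edge record is the target node identifier (unsigned 64-bit) plus the target node type code (unsigned 32-bit), 12 bytes in total; each index entry is an (edge type, byte offset) pair, one per edge type per node; and `first_index_pos` plays the role of the edge storage address kept in the point table. The names `EDGE_RECORD`, `build_edge_table`, and `edges_of_type` are hypothetical.

```python
import struct

# Hypothetical edge record: target node id (u64) + target node type code (u32) = 12 bytes.
EDGE_RECORD = struct.Struct("<QI")


def build_edge_table(nodes):
    """nodes: list of (node_id, [(edge_type, target_id, target_type), ...]) in point-table order.

    Returns (index_area, data_area, first_index_pos): index_area is a list of
    (edge_type, byte_offset_into_data_area) entries, and first_index_pos[i] is the
    position of node i's first index entry (what the point table would store).
    """
    index_area, data = [], bytearray()
    first_index_pos = []
    for _node_id, edges in nodes:
        first_index_pos.append(len(index_area))
        current_type = None
        # Group this node's edges by edge type so each type occupies one contiguous run.
        for edge_type, target_id, target_type in sorted(edges, key=lambda e: e[0]):
            if edge_type != current_type:
                index_area.append((edge_type, len(data)))
                current_type = edge_type
            data += EDGE_RECORD.pack(target_id, target_type)
    return index_area, bytes(data), first_index_pos


def edges_of_type(index_area, data, first_index_pos, node_pos, wanted_type):
    """Decode (target_id, target_type) pairs of the node at point-table position node_pos."""
    lo = first_index_pos[node_pos]
    hi = first_index_pos[node_pos + 1] if node_pos + 1 < len(first_index_pos) else len(index_area)
    for k in range(lo, hi):
        edge_type, start = index_area[k]
        if edge_type == wanted_type:
            # The run ends where the next index entry's run starts (or at the end of the data area).
            end = index_area[k + 1][1] if k + 1 < len(index_area) else len(data)
            return [EDGE_RECORD.unpack_from(data, off) for off in range(start, end, EDGE_RECORD.size)]
    return []
```

Because runs are written in index order, each run's end offset is simply the next index entry's start offset, so no explicit length field is needed in this sketch.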
  • The edge attribute table includes an edge attribute table index area and an edge attribute table data area. The attribute information of the edges of the several nodes is stored in the edge attribute table data area, and the edge attribute table index area stores edge attribute index information for those edges; the edge attribute index information includes the storage address information of the edge attribute information in the edge attribute table data area. The edge attribute index information is stored in the same order as the corresponding edge information in the edge table data area.
  • The node information further includes the node type, and the node information of the several nodes is stored in the point table in order of node identifier.
  • The point attribute table includes a point attribute table index area and a point attribute table data area. The attribute information of the several nodes is stored in the point attribute table data area, and the point attribute table index area stores node attribute index information for the several nodes; the node attribute index information includes the storage address information of the node's attribute information in the point attribute table data area. The node attribute index information is stored in the same order as the several nodes in the point table.
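The point attribute table's index area and data area can be illustrated in the same spirit. In this hypothetical sketch, attribute records are variable-length JSON (a stand-in; the specification does not name an encoding), the index area holds one start offset per node in point-table order, and a record ends where the next one begins. An edge attribute table aligned with the edge table would work the same way, indexed by edge position instead of node position. The function names are illustrative.

```python
import json


def build_attribute_table(attrs_in_point_order):
    """Hypothetical point attribute table: index area of byte offsets + data area of JSON records."""
    index_area, data = [], bytearray()
    for attrs in attrs_in_point_order:
        index_area.append(len(data))  # start offset of this node's record
        data += json.dumps(attrs, sort_keys=True).encode("utf-8")
    return index_area, bytes(data)


def read_attributes(index_area, data, node_pos):
    """The node at position node_pos in the point table has its offset at the same index position."""
    start = index_area[node_pos]
    end = index_area[node_pos + 1] if node_pos + 1 < len(index_area) else len(data)
    return json.loads(data[start:end].decode("utf-8"))
```

Since the index area follows the point table's order, a node's position in the point table is all that is needed to reach its attributes; no identifier lookup is repeated.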
  • The system 500 further includes a table element generation module 550, used to generate the table element of the data block; the table element includes the storage address information of each table in the data block and the node identifier of the first node in the data block's point table.
  • The data block includes encoded information.
  • The system 500 also includes a vocabulary generation module 560, used to generate the vocabulary of the graph file; the vocabulary includes the mapping between the encoded information in each data block of the graph file and the original information.
  • The system 500 also includes a data block index generation module 570, used to generate the data block index of the graph file; the data block index includes the storage address information of each data block in the graph file and the node identifier of the first node in each data block.
  • The system 500 further includes a graph file element generation module 580, used to generate the graph file element; the graph file element includes, for each data block, the graph file it belongs to and its serial number within that graph file, as well as the node identifiers of the first node and of the last node in each graph file.
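The graph file element and the per-file data block index support a two-level lookup by node identifier. Below is a minimal sketch using the standard `bisect` module, assuming graph files and data blocks each cover ascending, non-overlapping node-identifier ranges; the function names and list-based inputs are hypothetical.

```python
from bisect import bisect_right


def locate_file(file_ranges, node_id):
    """file_ranges: (first_node_id, last_node_id) per graph file, sorted by first id.

    Returns the index of the graph file whose [first, last] range contains node_id, or None.
    """
    firsts = [first for first, _last in file_ranges]
    i = bisect_right(firsts, node_id) - 1  # last file whose first id <= node_id
    if i >= 0 and node_id <= file_ranges[i][1]:
        return i
    return None


def locate_block(block_first_ids, node_id):
    """block_first_ids: first node id of each data block in one graph file, ascending.

    Returns the position of the data block that would hold node_id, or None if node_id
    is smaller than the first node in the file.
    """
    i = bisect_right(block_first_ids, node_id) - 1
    return i if i >= 0 else None
```

The last node identifier per file is what lets `locate_file` reject identifiers that fall in a gap between files; blocks within a file need only their first identifier. Rebuilding `firsts` on each call keeps the sketch short; a reader would more likely load it once with the graph file element.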
  • The data block is the smallest unit of reading and writing.
  • The edges of the graph data include outgoing edges and incoming edges;
  • the edge table includes an outgoing edge table and an incoming edge table;
  • the edge attribute table includes an outgoing edge attribute table and an incoming edge attribute table;
  • and the node information also includes the storage address information of the node's outgoing edges and of its incoming edges.
  • The device and modules shown in FIG. 5 can be implemented in various ways.
  • For example, they may be implemented by hardware, software, or a combination of software and hardware.
  • The hardware part can be implemented using dedicated logic;
  • the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware.
  • The software may be provided as processor control code, for example on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier.
  • The device and its modules in this specification can be implemented not only by hardware circuits, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of such hardware circuits and software (e.g., firmware).
  • Fig. 6 is a schematic diagram of a data block structure according to some embodiments of the present specification.
  • The stored file 600 includes a graph file element and one or more graph files.
  • The graph file element records, for each data block in each graph file, the graph file it is in and its serial number within that graph file, as well as the node identifiers of the first node and of the last node in each graph file.
  • A node identifier is the number of a node in the graph data and is used to trace the node's position in the graph data. For example, node identifiers can be node 1, node 2, ..., node m.
  • Nodes in the graph data can be stored in multiple data blocks or graph files according to their node identifiers, so that the graph file containing a target query node can be determined quickly.
  • The graph file element can be understood as index information over the multiple graph files, which a host computer or server can call and access (e.g., through an SDK).
  • A graph file may include multiple data blocks.
  • A graph file may include a fixed number of data blocks, for example 1024.
  • The data block is the smallest read-write unit and is used to store and write data.
  • Because the data block is the minimum write unit, the processing device can write graph data sequentially into one or more data blocks according to the data block format.
  • A data block can have a fixed size, such as 64 bytes or 128 bytes. When a data block is full, a new data block is created and writing continues until the complete graph data has been written.
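Sequential writing into fixed-size data blocks can be sketched as below. The assumptions are for illustration only: records are already serialized to bytes, a record is never split across blocks, and the unused tail of a full block is zero-padded. The function name `pack_into_blocks` is hypothetical.

```python
def pack_into_blocks(records, block_size):
    """Sequentially write serialized records into fixed-size blocks.

    A new block is started whenever the next record would not fit in the current one.
    Assumes every record fits in an empty block; unused space is zero-padded.
    """
    blocks, current = [], bytearray()
    for record in records:
        if len(record) > block_size:
            raise ValueError("record larger than block size")
        if len(current) + len(record) > block_size:
            blocks.append(bytes(current.ljust(block_size, b"\0")))  # current block is full
            current = bytearray()
        current += record
    if current:
        blocks.append(bytes(current.ljust(block_size, b"\0")))
    return blocks
```

With the 64-byte size from the text, records of 40, 20, and 30 bytes fill the first block to 60 bytes; the 30-byte record does not fit in the remaining 4 bytes, so it starts a second block.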
  • The data in a data block may come from the same graph data or from different graph data.
  • The data block includes a point table, a point attribute table, an edge table, and an edge attribute table.
  • The data block can also include a table element, which contains the storage address information of each table in the data block and the node identifier of the first node in the data block's point table. The table element can be regarded as index information inside the data block that makes it quick to locate each table.
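A table element could be serialized as a small fixed-size header. The layout below is hypothetical (the specification does not fix one): the first node identifier as an unsigned 64-bit integer followed by four unsigned 32-bit byte offsets for the point, edge, point attribute, and edge attribute tables, packed little-endian with `struct`. `TABLE_ELEMENT`, `TABLE_NAMES`, and the two functions are illustrative names.

```python
import struct

# Hypothetical table-element layout: first node id (u64) + byte offsets of the
# point, edge, point attribute and edge attribute tables (u32 each) = 24 bytes.
TABLE_ELEMENT = struct.Struct("<Q4I")
TABLE_NAMES = ("point", "edge", "point_attr", "edge_attr")


def encode_table_element(first_node_id, offsets):
    """offsets: dict mapping each name in TABLE_NAMES to its byte offset inside the block."""
    return TABLE_ELEMENT.pack(first_node_id, *(offsets[name] for name in TABLE_NAMES))


def decode_table_element(raw):
    """Return (first_node_id, {table name: byte offset})."""
    first_node_id, *offs = TABLE_ELEMENT.unpack_from(raw, 0)
    return first_node_id, dict(zip(TABLE_NAMES, offs))
```

A fixed-size header keeps the cost of locating any table constant: decode 24 bytes, then jump to the recorded offset.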
  • The graph file may also include file footer information, a data block index, and a vocabulary.
  • The vocabulary of the graph file records the mapping between encoded information and original information and can be used to encode or decode at least part of the information in the graph file. For example, edge types and node types can be represented by numbers, such as 1 for user-type nodes and 2 for company-type nodes, so that the point table stores the numbers 1 and 2 instead of the type names. Representing text with shorter numbers or letters effectively reduces the actual storage space of the graph data. Correspondingly, mappings such as "1" - user node and "2" - company node are recorded in the vocabulary.
  • The data block index of the graph file includes the storage address information of each data block in the graph file and the node identifier of the first node in each data block.
  • With the data block index, the data block containing the target query node can be determined quickly.
  • The file footer information includes the total number of nodes in the data block, the total number of edges, and a file extension area (e.g., file protocol, compression algorithm, and verification information).
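A vocabulary along these lines can be sketched as a two-way mapping that hands out codes starting from 1 on first use, matching the "1" - user node, "2" - company node example. The `Vocabulary` class and its methods are hypothetical illustrations, not the specification's data format.

```python
class Vocabulary:
    """Hypothetical two-way mapping between short numeric codes and original strings."""

    def __init__(self):
        self._code_of = {}   # original text -> code
        self._text_of = []   # code - 1 -> original text

    def encode(self, text):
        """Return the code for text, assigning the next code (starting from 1) on first use."""
        if text not in self._code_of:
            self._text_of.append(text)
            self._code_of[text] = len(self._text_of)
        return self._code_of[text]

    def decode(self, code):
        """Map a stored code back to its original text."""
        if not 1 <= code <= len(self._text_of):
            raise KeyError(code)
        return self._text_of[code - 1]
```

For instance, encoding "user" and then "company" yields 1 and 2, encoding "user" again returns 1, and `decode(2)` returns "company".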
  • Fig. 8 is an exemplary flow chart of querying graph data according to some embodiments of the present specification.
  • The use of the stored file is described below taking as an example the case where the target query node is known and its N-hop subgraph is to be found.
  • The N-hop subgraph includes the edges within N hops of the target query node and the nodes on those edges.
  • The storage device receives a query request from a server or a processing device.
  • The query request includes the node identifier of the target query node.
  • The storage device accesses the graph file element and, as in step 820, determines which graph file stores the target query node (e.g., graph file V) from the node identifiers of the first and last nodes of each graph file recorded in the graph file element. Then, as in step 830, it determines the target data block containing the target query node from the node identifier of the first node in each data block recorded in the data block index of that graph file (the data block index of graph file V).
  • As in step 840, the target data block is then obtained based on the storage address information of each data block recorded in the data block index.
  • The point table can be located through the table element of the data block, and the node information of the target query node can be found in the point table by its node identifier.
  • For example, as in step 850, the node information of the target query node can be located quickly by binary search or a similar method.
  • Based on the node information of the target query node, its one-hop subgraph can then be obtained with a single read operation (e.g., loading the data block into memory).
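Step 850's binary search over a point table sorted by node identifier can be sketched with `bisect_left`. The function name `find_node` is hypothetical, and the point table is modeled as a plain sorted list of node identifiers.

```python
from bisect import bisect_left


def find_node(point_table_ids, node_id):
    """Binary-search the ascending node identifiers of one point table.

    Returns the node's position in the point table, or None if the node is not stored there.
    """
    i = bisect_left(point_table_ids, node_id)
    if i < len(point_table_ids) and point_table_ids[i] == node_id:
        return i
    return None
```

Because the point attribute index area and the edge index information are stored in point-table order, the returned position is also what locates the node's entries in those areas.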
  • From the one-hop subgraph, obtain the node identifier of each first-hop neighbor node of the target query node (i.e., each node on the target query node's first-hop edges), and repeat the above steps to find the one-hop subgraph of each first-hop neighbor node. This yields the two-hop subgraph of the target query node, and so on until the N-hop subgraph of the target query node is obtained.
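The N-hop expansion above is a breadth-first search that stops at depth N. A minimal sketch, assuming a hypothetical `one_hop(node_id)` callback that returns a node's first-hop neighbor identifiers (in this storage scheme, one data block read per call); `n_hop_subgraph` is likewise an illustrative name.

```python
from collections import deque


def n_hop_subgraph(one_hop, start, n):
    """Expand the N-hop subgraph of `start` by repeatedly querying one-hop subgraphs.

    one_hop(node_id) returns the identifiers of the node's first-hop neighbors.
    Returns (nodes, edges) with edges as (source, target) pairs.
    """
    nodes, edges = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue  # nodes at depth N are included, but not expanded further
        for neighbor in one_hop(node):
            edges.add((node, neighbor))
            if neighbor not in nodes:
                nodes.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return nodes, edges
```

Tracking visited nodes ensures that each node's one-hop subgraph is fetched at most once even when several paths reach it.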
  • The edges of graph data may include outgoing edges and incoming edges.
  • Accordingly, the edge table described in this specification can be further divided into an outgoing edge table and an incoming edge table, the edge attribute table into an outgoing edge attribute table and an incoming edge attribute table, and the node information additionally includes the storage address information of the node's outgoing edges and of its incoming edges.
  • Fig. 7 is an exemplary flowchart of graph data storage according to some embodiments of the present specification.
  • Process 700 may include steps 710, 720, ..., 780, described in detail below.
  • Step 710: store the node information of several nodes in the graph data in the point table of the data block.
  • Step 710 may be performed by the node information storage module 510.
  • The node information storage module 510 fills the node information into the point table in order, according to a preset point table format.
  • Graph data includes nodes and edges.
  • The node information storage module 510 may select several nodes from the graph data for storage. These may be all of the nodes in the graph data or only some of them.
  • FIG. 2 is a schematic diagram of an exemplary point table 210.
  • The point table stores the node information of several nodes, and the node information includes node identifiers.
  • A node identifier is the number of a node in the graph data and is used to trace the node's position in the graph data.
  • For example, node identifiers can be node 1, node 2, ..., node m.
  • The node information in the point table is stored in order of node identifier.
  • For example, the node information storage module 510 may select several nodes with consecutive node identifiers from the graph data and store their node information sequentially in ascending or descending order of node identifier.
  • The node information also includes storage address information of the node's edges, which indicates where those edges are stored in the edge table; for example, it may be the storage address of the edge index information in the edge table.
  • Storage address information may be an absolute address or an offset relative to some starting position.
  • For example, the storage address of the edge index information in the edge table may be an absolute address or an offset relative to the start of the edge table.
  • A node can have multiple edges.
  • The node information storage module 510 can record the storage address information of every edge of the node in the point table; that is, one node's information can record the storage address information of all edges connected to that node.
  • In some embodiments, the edge information of the same node is stored contiguously in the edge table. For example, suppose node A has 5 edges and node B has 3 edges. Node A's edge information is stored contiguously in an area (e.g., 12 × 5 bytes) starting from the first storage location, and node B's edge information is stored contiguously in another area (e.g., 12 × 3 bytes) starting from the second storage location (e.g., the 76th byte of the edge table).
  • In that case, the edge storage address information of each node in the point table need only include the starting storage location of its edges in the edge table (e.g., the edge storage address of node A is the first storage location, and that of node B is the second storage location). That is, the region from one node's edge storage address up to the next node's edge storage address is treated as the storage area of the earlier node's edges.
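The rule that a node's edges end where the next node's edges begin can be checked against the numbers in the example. The 12-byte edge record size comes from the example; the assumption that node A's edges start at byte 16 is hypothetical, chosen so that 5 edges of 12 bytes end exactly at byte 76, where node B's edges begin. `EDGE_SIZE`, `edge_ranges`, and `edge_count` are illustrative names.

```python
EDGE_SIZE = 12  # bytes per edge record, as in the example


def edge_ranges(edge_starts, edge_table_end):
    """edge_starts: start byte offset of each node's edges, in point-table order.

    A node's edges run from its own start offset up to the next node's start offset;
    the last node's edges run up to the end of the edge table.
    """
    bounds = list(edge_starts) + [edge_table_end]
    return [(bounds[i], bounds[i + 1]) for i in range(len(edge_starts))]


def edge_count(start, end):
    """Number of fixed-size edge records in the byte range [start, end)."""
    return (end - start) // EDGE_SIZE
```

With node A starting at byte 16 and node B at byte 76, and the edge table ending at byte 112, `edge_ranges([16, 76], 112)` gives `[(16, 76), (76, 112)]`, which corresponds to 5 edges for A and 3 for B. Only one start offset per node is stored, yet every node's edge range is recoverable.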
  • An edge can have a direction.
  • A node may have outgoing edges and/or incoming edges: an incoming edge points to the node, and an outgoing edge starts from the node and points to another node.
  • Accordingly, the edge storage address information in the node information can be divided into storage address information of incoming edges and storage address information of outgoing edges.
  • The edge table can also be of two types: an incoming edge table, which stores only the edge information of incoming edges, and an outgoing edge table, which stores only the edge information of outgoing edges.
  • The outgoing/incoming edge storage address information in the node information, and the way outgoing/incoming edge information is stored in the outgoing/incoming edge tables, are similar to those described above and are not repeated here.
  • the node information may also include node type information. Since a node can describe any entity or object in the physical world, it can be of different types. For example, a user-type node, a company-type node, a location-type node, and so on.
  • the node type (not shown in the figure) may be stored between the node identifier of each node and the storage address information of the edge as shown in FIG. 2 .
  • the set of node types is generally finite and can be enumerated.
  • node types can also be encoded within the graph file through the vocabulary, and the point table stores only the encoded node type.
  • when the node type of a node needs to be read from the point table, its code can be decoded back into a semantically clear node type, such as "user-type node", using the vocabulary.
  • encoding and decoding within the file through the vocabulary makes the representation of node types more compact, further reducing storage space.
  • for more descriptions of the vocabulary, refer to the description of FIG. 6, which will not be repeated here.
  • the node information may also be stored first in order of node type and then in order of node identifier.
  • for example, user-type nodes can be stored together and ordered by node identifier among themselves.
  • when sorting by node type, the types can be ordered by the pinyin initial of the first character of the node type's description text, or by the first letter of its first word.
  • the point table 210 shown in FIG. 2 also includes a header identification bit for indicating whether the table has an index area. In some embodiments, the point table does not include an index area, and its header identification bit stores "0".
  • Step 720 storing the edge information of the edges of the several nodes in the edge table of the data block.
  • step 720 may be performed by the edge information storage module 520.
  • the edge information storage module 520 fills the data into the edge table in sequence according to the predefined edge table format.
  • the edge table may include an edge table index area and an edge table data area. It can be understood that since an edge can be described by two target nodes connected by the edge, the edge information can include a node identifier of the target node connected to the edge.
  • the edge information is stored in the edge table data area.
  • the edge table data area stores pairs of target-node identifiers, where each pair corresponds to one edge.
  • the edge table index area stores the index information of the edge information of each edge in the edge table, for example, includes the storage address information of the node identifier of the target node corresponding to each edge in the edge table data area.
  • the header flag indicates whether the table has an index area. Exemplarily, setting the header identification bit to "1" indicates that there is an index area; setting the header identification bit to "0" indicates that there is no index area. Since all edge tables contain index areas, the table header flag is 1.
  • the index area length indicates the total length of the edge table index area, such as the number of bytes it occupies, and therefore where the edge table data area begins.
  • the edge table index area is used to store the index information of each edge, for example, the index information of edge A points to the position of the data of edge A in the edge table data area.
  • the edge table data area is used to store the edge information of each edge.
  • the edge information may also include the node types of the target nodes.
  • the storage length of each piece of edge information is the same. For example, for each edge, 4 bytes are used to store the node types of the two target nodes, and 8 bytes are used to store the node identifiers of the two target nodes.
  • the storage order of the edge index information is consistent with the storage order of the nodes in the point table (also referred to as alignment of the edge table with the point table). For example, starting from the beginning of the edge table index area, the edge index information of the first node in the point table is stored contiguously, then that of the second node, and so on.
  • in the edge table data area, the edge information of each edge can be stored sequentially in the order of the edge index information in the edge table index area.
  • the index information of the corresponding edge can be found according to the position of the node in the point table.
  • the storage order of the edge information in the edge table is consistent with the storage order of the nodes in the point table, and the edge information of the same node is stored together contiguously.
  • for example, node A is connected to three nodes K, M, and L;
  • node B is connected to two nodes Q and G;
  • node A is stored first in the point table;
  • node B is stored second in the point table.
  • the edge information of the three edges A-K, A-M, and A-L, followed by that of the two edges B-Q and B-G, is stored sequentially from the start of the edge table data area.
  • the edge index information stored in the edge table index area need only include the starting location in the edge table of the edge information of the corresponding node's edges (e.g., node A's edge index information includes the storage address of edge A-K, and node B's includes the storage address of edge B-Q). That is, in the edge table, the storage region between the index information of one node's edges and that of the next node's edges is treated as the edge information of the earlier node's edges.
  • the edge table index area also includes the edge type of each edge.
  • the edge index information of edge A not only stores the address information, but also includes the edge type.
  • the edge type can reflect the interactive relationship between two entities, such as the litigation relationship between two enterprises or the economic transaction relationship between two enterprises.
  • the edge index information corresponding to the node in the edge table index area may include multiple edge types and multiple storage address information, wherein the multiple edge types are continuously stored, and the multiple storage address information are also continuously stored.
  • if node B has multiple edges belonging to two edge types, two edge types and two storage addresses can be stored contiguously in node B's edge index information, where the first storage address is the address in the edge data area of the edge information belonging to the first edge type among node B's edges (e.g., the starting location of that edge information in the edge data area), and the second storage address is the address in the edge data area of the edge information belonging to the second edge type (e.g., the starting location of that edge information in the edge data area).
  • like the node type, the edge type can be encoded inside the graph file using the vocabulary, and the edge table stores only the internal code of the edge type.
  • for more descriptions of the vocabulary, refer to the corresponding description of FIG. 6, which will not be repeated here.
  • edges have directions and nodes may have outgoing and/or incoming edges.
  • the edge table can include two types: an in-edge table and an out-edge table.
  • the in-edge table stores only data related to incoming edges;
  • the out-edge table stores data related to outgoing edges.
  • the storage method of the relevant data of the outgoing/incoming edge in the outgoing/incoming edge table is similar to the foregoing content, and will not be repeated here.
  • Step 730 storing the attribute information of the several nodes in the point attribute table of the data block.
  • step 730 may be performed by the node attribute information storage module 530 .
  • the node attribute information storage module 530 fills the data into the point attribute table in sequence based on the format of the set point attribute table.
  • FIG. 4 is a schematic diagram of an exemplary attribute table 240 .
  • the point attribute table and the edge attribute table may have the same format. Therefore, the attribute table 240 can also be regarded as a point attribute table.
  • the point attribute table includes a point attribute table index area and a point attribute table data area. The attribute information of nodes is stored in the point attribute table data area; the point attribute table index area stores each node's point attribute index information, which includes the storage address of that node's attribute information in the point attribute table data area.
  • each attribute index information can point to an attribute data.
  • the point attribute table may also be aligned with the point table.
  • the storage order of the point attribute index information in the point attribute table is consistent with the storage order of the node information in the point table. With such a setting, it is possible to locate the point attribute index information according to the storage order of the nodes in the point table, and further obtain the attribute information of the node from the point attribute table data area based on the point attribute index information.
  • the attribute table 240 may also include a header flag "1" and the length of the index area.
  • Step 740 storing the edge attribute information of the several nodes in the edge attribute table of the data block.
  • step 740 may be performed by the edge attribute information storage module 540 .
  • the edge attribute information storage module 540 fills data into the edge attribute table in sequence based on the format of the set edge attribute table.
  • the attribute table 240 can also be regarded as an edge attribute table.
  • the attribute information of the edges of the several nodes is stored in the edge attribute table data area; the edge attribute table index area stores the attribute index information of each edge, which includes the storage address of that edge's attribute information in the edge attribute table data area.
  • the storage order of the edge attribute index information in the edge attribute table index area is consistent with the storage order of the edge information of each edge in the edge table data area.
  • edges have directions and nodes may have outgoing and/or incoming edges.
  • the edge attribute table may include two types: an incoming edge attribute table and an outgoing edge attribute table, wherein only the attribute information of the incoming edge is stored in the incoming edge attribute table, and the attribute information of the outgoing edge is stored in the outgoing edge attribute table.
  • the storage method of the attribute information of the outgoing/incoming edge in the outgoing/incoming edge attribute table is similar to the foregoing content, and will not be repeated here.
  • the process 700 further includes step 750: generating the table element of the data block.
  • step 750 may be performed by the table element generation module 550.
  • the table element includes the storage address information of each table in the data block and the node identifier of the first node in the data block's point table. For more descriptions of the table element, refer to the corresponding description of FIG. 6, which will not be repeated here.
  • multiple data blocks may be generated according to steps 710-740, and multiple data blocks constitute a graph file.
  • the graph file can also include information such as a vocabulary and a data block index.
  • the process 700 further includes step 760: generating a vocabulary of the graph file.
  • step 760 may be performed by the vocabulary generation module 560 .
  • the data block includes encoded information;
  • the vocabulary of the graph file can also be generated.
  • the vocabulary includes the mapping between the encoded information in each data block of the graph file and the original information. For more descriptions of the vocabulary, refer to the corresponding description of FIG. 6, which will not be repeated here.
  • the process 700 further includes step 770: generating a data block index of the graph file.
  • step 770 may be performed by the data block index generation module 570 .
  • the data block index of the graph file includes the storage address information of each data block in the graph file and the node identifier of the first node in each data block, and is used to determine which data block the target query node is in. For more descriptions of the data block index, refer to the corresponding description of FIG. 6, which will not be repeated here.
  • a graph file is generated from the graph data, and in some embodiments, multiple graph files can be generated to form a storage file.
  • the storage file may also include a graph file element.
  • the process 700 further includes step 780: generating a graph file element.
  • the graph file element includes, for each data block in each graph file, the graph file in which the block resides and its sequence number within that graph file, as well as the node identifiers of the first and last nodes in each graph file; it is used to determine which graph file the target query node is in.
  • for more descriptions of the graph file element, refer to the corresponding description of FIG. 6, which will not be repeated here.
  • the possible beneficial effects of the embodiments of this specification include but are not limited to: 1) several nodes of the graph data, their edges, and their attribute information are stored in one data block, so the edges and attribute information related to a node can be found within that block without multiple read and write operations; 2) the graph data is stored in order across multiple data blocks, so large-scale graph data can be distributed across multiple devices, and during a graph query multiple devices can query in parallel (e.g., different devices query different data blocks), saving retrieval time and improving the response speed of graph queries; 3) aligning the point table, edge table, and attribute tables saves storage space in the edge tables and attribute tables.
  • different embodiments may have different beneficial effects.
  • the possible beneficial effects may be any one or a combination of the above, or any other possible beneficial effects.
  • aspects of this specification can be illustrated and described through several patentable categories or contexts, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof.
  • various aspects of this specification may be entirely executed by hardware, may be entirely executed by software (including firmware, resident software, microcode, etc.), or may be executed by a combination of hardware and software.
  • the above hardware or software may be referred to as “block”, “module”, “engine”, “unit”, “component” or “system”.
  • aspects of this specification may be embodied as a computer product comprising computer readable program code on one or more computer readable media.
  • a computer storage medium may contain a propagated data signal embodying a computer program code, for example, in baseband or as part of a carrier wave.
  • the propagated signal may have various manifestations, including electromagnetic form, optical form, etc., or a suitable combination.
  • a computer storage medium may be any computer-readable medium, other than a computer-readable storage medium, that can be used to communicate, propagate, or transfer a program for use by being coupled to an instruction execution system, apparatus, or device.
  • Program code residing on a computer storage medium may be transmitted over any suitable medium, including radio, electrical cable, fiber optic cable, RF, or the like, or combinations of any of the foregoing.
  • the computer program code required for the operation of each part of this specification can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages.
  • the program code may run entirely on the user's computer, or as a stand-alone software package, or run partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device.
  • the remote computer can be connected to the user's computer through any form of network, such as a local area network (LAN) or wide area network (WAN), or connected to an external computer (e.g., through the Internet), or used in a cloud computing environment, or offered as a service such as software as a service (SaaS).
  • numbers describing quantities of components and attributes are used; it should be understood that such numbers used in describing the embodiments are, in some examples, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated figure allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that can vary depending on the characteristics required by individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and use a general rounding method. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments of this specification are approximations, in specific embodiments such values are set as precisely as practicable.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

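This specification relates to a method, system, and device for storing graph data, where the graph data includes nodes and edges. The storage method includes: storing node information of several nodes in the graph data in a point table of a data block, the node information including a node identifier; storing edge information of the edges of the several nodes in an edge table of the data block, the edge information including the node identifier of a target node connected to the edge; storing attribute information of the several nodes in a point attribute table of the data block; and storing attribute information of the edges of the several nodes in an edge attribute table of the data block.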

Description

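Graph Data Storage
Technical Field
One or more embodiments of this specification relate to the field of computers, and in particular to a method, system, and device for storing graph data.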
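Background
At present, graph data can be stored and managed using various databases. With the continuous emergence of new Internet applications such as social networks, the mobile Internet, and the IoT (Internet of Things), the interaction data generated by various entities (such as users, systems, and sensors) is growing exponentially, and the scale and complexity of graph data have increased significantly. Storing and managing massive, complex graph data requires a database with high read/write efficiency to support graph processing operations such as data traversal, association queries, and one-hop subgraph expansion (a one-hop subgraph is the subgraph formed by a node and the edges connected to it).
Therefore, there is an urgent need for a method, system, and device for storing graph data that enable efficient storage of graph data and queries over complex relationships in graph data.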
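Summary
One aspect of this specification provides a method for storing graph data, where the graph data includes nodes and edges. The storage method includes: storing node information of several nodes in the graph data in a point table of a data block, the node information including a node identifier; storing edge information of the edges of the several nodes in an edge table of the data block, the edge information including the node identifier of a target node connected to the edge; storing attribute information of the several nodes in a point attribute table of the data block; and storing attribute information of the edges of the several nodes in an edge attribute table of the data block.
Another aspect of this specification provides a system for storing graph data, where the graph data includes nodes and edges. The storage system includes: a node information storage module for storing node information of several nodes in the graph data in a point table of a data block, the node information including a node identifier; an edge information storage module for storing edge information of the edges of the several nodes in an edge table of the data block, the edge information including the node identifier of a target node connected to the edge; a node attribute information storage module for storing attribute information of the several nodes in a point attribute table of the data block; and an edge attribute information storage module for storing attribute information of the edges of the several nodes in an edge attribute table of the data block.
Another aspect of this specification provides a graph data storage device including a processor and a memory; the memory is used to store instructions, and the processor is used to execute the instructions to implement the graph data storage method.
Another aspect of this specification provides a graph data file, where the graph data includes nodes and edges. The file includes several data blocks, each of which includes: a point table for storing node information of at least some nodes in the graph data, the node information including a node identifier; an edge table for storing edge information of the edges of those nodes, the edge information including the node identifier of a target node connected to the edge; a point attribute table for storing attribute information of those nodes; and an edge attribute table for storing attribute information of the edges of those nodes.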
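Brief Description of the Drawings
This specification will be further described by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, the same reference numerals denote the same structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of an exemplary graph data storage system according to some embodiments of this specification;
FIG. 2 is a schematic diagram of a point table according to some embodiments of this specification;
FIG. 3 is a schematic diagram of an edge table according to some embodiments of this specification;
FIG. 4 is a schematic diagram of a point/edge attribute table according to some embodiments of this specification;
FIG. 5 is a block diagram of a system for graph data storage according to some embodiments of this specification;
FIG. 6 is a schematic diagram of a data block structure according to some embodiments of this specification;
FIG. 7 is an exemplary flowchart of graph data storage according to some embodiments of this specification;
FIG. 8 is an exemplary flowchart of a graph data query according to some embodiments of this specification.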
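Detailed Description
To illustrate the technical solutions of the embodiments of this specification more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some examples or embodiments of this specification; a person of ordinary skill in the art can, without creative effort, apply this specification to other similar scenarios based on these drawings. Unless obvious from the context or otherwise stated, the same reference numerals in the figures represent the same structure or operation.
It should be understood that the terms "system", "device", "unit", and/or "module" used in this specification are a way of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, these terms may be replaced by other expressions if those expressions achieve the same purpose.
As used in this specification and the claims, unless the context clearly indicates otherwise, words such as "a", "an", "one", and/or "the" do not specifically refer to the singular and may also include the plural. In general, the terms "include" and "comprise" only indicate the inclusion of explicitly identified steps and elements; these steps and elements do not constitute an exclusive list, and a method or device may also include other steps or elements.
Flowcharts are used in this specification to illustrate operations performed by a system according to embodiments of this specification. It should be understood that the preceding or following operations are not necessarily performed precisely in order. Instead, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.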
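FIG. 1 is a schematic diagram of an application scenario of an exemplary graph data storage system according to some embodiments of this specification.
With the continuous emergence of new Internet applications such as social networks, the mobile Internet, and the Internet of Things (IoT), the data generated between different entities (such as users, systems, and sensors) is growing exponentially, along with its internal dependencies and complexity. Graph data is commonly used to describe and represent the relationships between different entities. Graph data consists of multiple nodes and the edges connecting them, where nodes represent entities and the edges between nodes represent relationships between entities. An entity can be an object or organization that actually exists in the physical world, or an abstract concept, for example, a company, a device, a person, goods, a storage location, a vehicle, an image, a computer program, or an account. Entities can have attribute information. Taking a "person" entity as an example, the attribute information includes age, gender, occupation, employer, or home address; for a company, the attribute information includes its registered address, legal representative, business scope, registered capital, and so on. The edges between entities (i.e., edge information) can reflect the relationships between entities. For example, a person entity and a company entity may have an employment relationship, and Zhang San and Li Si may be friends. Edges can also have attribute information; for example, the attribute information of an employment relationship may include when it was established and the type of employment (formal or temporary).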
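With the development of Internet technology, graph data keeps growing in scale, and how to store graph data so that the stored data can be accessed efficiently has become a problem to be solved.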
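In some embodiments, graph data can be stored in a relational database, which stores the nodes and edges of the graph data separately. However, relational databases are poorly suited to storing graph data in several respects. For example, because graph data is large, it has to be sharded across multiple databases and tables, which splits nodes from their edges. A graph query then requires interaction between different databases (e.g., storage devices) to find the target query node and its edges, or requires multiple reads to obtain them.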
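To make up for these shortcomings of relational databases, some embodiments propose storing graph data in a graph database. In a graph database, the relationships between data play a central role, and massive amounts of data with complex relationships, as well as the relationships among complex data, can be stored. Specifically, such a graph database stores the nodes and the edges of the graph data in different KV (key-value) storage engines and builds a proxy layer on top of them to provide graph query services. However, on the one hand, because of the added proxy layer, data must be cached multiple times in different data areas during a query, which increases the complexity of the whole query process. On the other hand, because nodes and edges are stored separately, retrieving a one-hop subgraph (i.e., the subgraph formed by a node, the edges connected to it, and the nodes at the other ends of those edges) requires separately querying the node and all edges connected to it. In other words, a one-hop subgraph query needs many read and write operations to produce its result, so retrieval efficiency is low. Moreover, to keep such queries efficient, the graph database must be deployed and maintained on dedicated cluster servers (computers) with enough memory to support the many read and write operations during graph queries, which also brings considerable equipment operation and maintenance costs.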
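To address these shortcomings, some embodiments of this specification provide a graph data storage method, including: storing the node information, edge information, node attribute information, and edge attribute information of several nodes in the graph data in the point table, edge table, point attribute table, and edge attribute table of the same data block, respectively. In this way, the node information and edge information of the relevant nodes can be obtained by reading the data block once, which effectively reduces the number of reads and writes during graph processing. For example, a one-hop subgraph query can be completed by reading one data block once, which significantly improves query efficiency.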
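In some embodiments of this specification, the storage order of the edges in the edge table can also be made consistent with the storage order of the several nodes in the point table, the storage order of the nodes' attribute information in the point attribute table consistent with the storage order of the nodes in the point table, and the storage order of the edges' attribute information in the edge attribute table consistent with the storage order of those edges in the edge table. This aligns the point table, the edge table, and the attribute tables. After node A is found, the positions of all of node A's edges in the edge table can be determined quickly, and the attribute information of those edges in the edge attribute table can then be located quickly. With this arrangement, graph queries need little data reading, writing, or caching, so the whole process does not require a resident service cluster.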
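It should be noted that in the embodiments of this specification, because the graph data is stored in order across multiple data blocks, and node information and its edge information are stored in the same data block, large-scale graph data can be stored in multiple data blocks or in multiple graph files (each containing multiple data blocks). This allows one or more embodiments of this specification to store graph data in a distributed manner on multiple devices and to support parallel queries (e.g., different devices query different data blocks), further improving query efficiency.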
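In some embodiments, the application scenario of the graph data storage system is shown in FIG. 1. Scenario 100 may include storage device 110-1, storage device 110-2, …, storage device 110-n, and processing device 120.
Storage devices 110-1, 110-2, 110-3, … may include a processor and mass storage, removable storage, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof, for storing data, managing resources, and processing data and/or information from at least one component of this system or from an external data source (e.g., a cloud data center). In some embodiments, each of storage devices 110-1, 110-2, 110-3, … may be a single server or a server group. The server group may be centralized or distributed (e.g., storage device 110-1 may be a distributed system), and may be dedicated or may simultaneously provide services for other devices or systems. In some embodiments, storage devices 110-1, 110-2, 110-3, … may be local or remote. In some embodiments, storage devices 110-1, 110-2, 110-3, … may be implemented on a cloud platform or provided virtually. Merely as an example, the cloud platform may include a private cloud, public cloud, hybrid cloud, community cloud, distributed cloud, internal cloud, multi-tier cloud, etc., or any combination thereof.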
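In some embodiments, any one or more of storage devices 110-1, 110-2, …, 110-n may store one or more graph files and support parallel queries of the graph data. A graph file may include multiple data blocks, each of which stores the node information and edge information of all or some of the nodes in the graph data, as well as the attribute information of those nodes and edges. Specifically, 200 in FIG. 1 shows a typical data block structure: each data block includes a point table 210, an edge table 220, a point attribute table 230, an edge attribute table 240, and a table element 250.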
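Processing device 120 can generate or obtain graph data, write it into multiple data blocks or graph files, and distribute these data blocks or graph files to storage devices 110-1, 110-2, …, 110-n for storage. In some embodiments, processing device 120 can obtain a query request and distribute it to the storage devices, so that each storage device queries the graph data or data blocks stored locally and returns the query results to processing device 120. In some embodiments, when the graph data is small, a single storage device can store its graph file, in which case processing device 120 can be omitted.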
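In some embodiments, scenario 100 may also include a network (not shown). The network can connect the components of the system and/or connect the system to external parts, enabling communication among the components and between the system and external parts and facilitating the exchange of data and/or information. In some embodiments, the network may be any one or more of a wired network or a wireless network. For example, the network may include a cable network, fiber-optic network, telecommunications network, the Internet, local area network (LAN), wide area network (WAN), wireless local area network (WLAN), metropolitan area network (MAN), public switched telephone network (PSTN), Bluetooth network, ZigBee network, near-field communication (NFC), intra-device bus, intra-device circuit, cable connection, etc., or any combination thereof. In some embodiments, the network connections between the parts of the system may use one of the above methods or several of them. In some embodiments, the network may have a point-to-point, shared, centralized, or other topology, or a combination of topologies.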
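FIG. 5 is a block diagram of a system for graph data storage according to some embodiments of this specification.
As shown in FIG. 5, system 500 is deployed on any processing device capable of executing programs (such as any one of storage devices 110-1, 110-2, …, 110-n in FIG. 1) and specifically includes: a node information storage module 510 for storing the node information of several nodes in the graph data in the point table of a data block, the node information including a node identifier; an edge information storage module 520 for storing the edge information of the edges of the several nodes in the edge table of the data block, the edge information including the node identifier of a target node connected to the edge; a node attribute information storage module 530 for storing the attribute information of the several nodes in the point attribute table of the data block; and an edge attribute information storage module 540 for storing the attribute information of the edges of the several nodes in the edge attribute table of the data block.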
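In some embodiments, the storage order of the edges of the several nodes in the edge table is consistent with the storage order of the several nodes in the point table; the storage order of the attribute information of the several nodes in the point attribute table is consistent with the storage order of the several nodes in the point table; and the storage order of the attribute information of the edges of the several nodes in the edge attribute table is consistent with the storage order of those edges in the edge table.
In some embodiments, the edge table includes an edge table index area and an edge table data area; the edge information of the edges of the several nodes is stored in the edge table data area; the edge table index area stores the index information of the edges of the several nodes, which includes the storage address, in the edge table data area, of the edge information of the corresponding node's edges; and the storage order of this index information is consistent with the storage order of the several nodes in the point table.
In some embodiments, the node information further includes the storage address information of the node's edges, and the edge storage address information in the point table is the storage address, in the edge table, of the corresponding edges' index information.
In some embodiments, the edge information of different edges of the same node is stored contiguously in the edge table data area, and the storage order of the edge information of the several nodes' edges is consistent with the storage order of the several nodes in the point table.
In some embodiments, the index information of an edge further includes the edge type; the edge information further includes the node type of the target node; and the edge information of the same node's edges is stored in the edge table data area in order of edge type.
In some embodiments, the edge attribute table includes an edge attribute table index area and an edge attribute table data area; the attribute information of the edges of the several nodes is stored in the edge attribute table data area; the edge attribute table index area stores the edge attribute index information of those edges, which includes the storage address of each edge's attribute information in the edge attribute table data area; and the storage order of the edge attribute index information is consistent with the storage order of the edges' edge information in the edge table data area.
In some embodiments, the node information further includes a node type, and the node information of the several nodes is stored in the point table in order of node identifier.
In some embodiments, the point attribute table includes a point attribute table index area and a point attribute table data area; the attribute information of the several nodes is stored in the point attribute table data area; the point attribute table index area stores the node attribute index information of the several nodes, which includes the storage address of each node's attribute information in the point attribute table data area; and the storage order of the node attribute index information is consistent with the storage order of the several nodes in the point table.
In some embodiments, system 500 further includes a table element generation module 550 for generating the table element of the data block; the table element includes the storage address information of each table in the data block and the node identifier of the first node in the data block's point table.
In some embodiments, the data block includes encoded information; system 500 further includes a vocabulary generation module 560 for generating the vocabulary of the graph file; the vocabulary includes the mapping between the encoded information in each data block of the graph file and the original information.
In some embodiments, system 500 further includes a data block index generation module 570 for generating the data block index of the graph file; the data block index includes the storage address information of each data block in the graph file and the node identifier of the first node in each data block.
In some embodiments, system 500 further includes a graph file element generation module 580 for generating the graph file element; the graph file element includes, for each data block in each graph file, the graph file in which it resides and its sequence number within that graph file, as well as the node identifiers of the first and last nodes in each graph file.
In some embodiments, the data block is the smallest unit of reading and writing.
In some embodiments, the edges of the graph data include outgoing edges and incoming edges; the edge table includes an out-edge table and an in-edge table; the edge attribute table includes an out-edge attribute table and an in-edge attribute table; and the node information further includes the storage address information of the node's outgoing edges and of its incoming edges.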
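It should be understood that the system and its modules shown in FIG. 5 can be implemented in various ways. For example, in some embodiments, the device and its modules can be implemented by hardware, software, or a combination of software and hardware. The hardware part can be implemented with dedicated logic; the software part can be stored in memory and executed by an appropriate instruction execution device, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and devices can be implemented using computer-executable instructions and/or processor control code, for example provided on a carrier medium such as a disk, CD, or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device and its modules in this specification can be implemented not only by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).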
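FIG. 6 is a schematic diagram of a data block structure according to some embodiments of this specification.
The form of the storage file involved in one or more embodiments of this specification is further described below with reference to FIG. 6.
Storage file 600 includes a graph file element and one or more graph files. The graph file element records, for each data block in each graph file, the graph file in which it resides and its sequence number within that graph file, as well as the node identifier of the first node and the node identifier of the last node in each graph file. A node identifier is the number of a node in the graph data and is used to trace the node's position in the graph data. For example, node identifiers can be set as node 1, node 2, …, node m. In some embodiments, the nodes in the graph data can be stored in multiple data blocks or graph files according to their node identifiers, so that it can be quickly determined which graph file the target query node is in. The graph file element can be regarded as index information over multiple graph files, and it can be retrieved and accessed by a host computer or server (e.g., invoked through an SDK).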
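The graph file element acts as a range index over graph files. The following is a minimal Python sketch of the lookup in step 820 of FIG. 8. It assumes integer node identifiers and graph files that hold ascending, non-overlapping identifier ranges. The function `locate_graph_file` and the in-memory list of `(first_id, last_id)` pairs are illustrative only and are not defined by the specification:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical in-memory view of the graph file element: one (first_id, last_id)
# pair per graph file, assumed sorted and non-overlapping.
FileRange = Tuple[int, int]

def locate_graph_file(ranges: List[FileRange], node_id: int) -> Optional[int]:
    """Return the index of the graph file whose [first, last] range holds node_id."""
    firsts = [first for first, _ in ranges]
    i = bisect_right(firsts, node_id) - 1  # last file whose first id <= node_id
    if i >= 0 and node_id <= ranges[i][1]:
        return i
    return None  # node_id lies before the first file, in a gap, or past the last file

ranges = [(1, 1000), (1001, 2500), (2501, 4000)]
print(locate_graph_file(ranges, 1500))  # 1
print(locate_graph_file(ranges, 5000))  # None
```

Storing both the first and the last identifier lets the lookup reject identifiers that fall outside every file without opening any graph file.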
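A graph file can include multiple data blocks. In some embodiments, a graph file can contain a fixed number of data blocks; for example, a graph file may include 1024 data blocks. The data block is the smallest unit of reading and writing and is used to store and write data. When storing graph data, the data block is the smallest unit of writing: the processing device can write the graph data into one or more data blocks in sequence according to the data block format. A data block can have a fixed size, such as 64 bytes or 128 bytes. When a data block is full, a new data block is created and writing continues until the complete graph data has been written. In some embodiments, the data in a data block comes from the same graph data, but it can also come from different graph data. A data block specifically includes a point table, a point attribute table, an edge table, and an edge attribute table. In some embodiments, a data block may also include a table element, which includes the storage address information of each table in the data block and the node identifier of the first node in the data block's point table. The table element can be regarded as index information inside the data block that makes it easy to quickly locate where each table is stored. For more descriptions of the point table, point attribute table, edge table, and edge attribute table, refer to the detailed description of FIG. 7, which will not be repeated here.
In some embodiments, in addition to multiple data blocks, a graph file may also include file footer information, a data block index, and a vocabulary.
The vocabulary of a graph file records the mapping between encoded information and original information; further, the vocabulary can be used to encode or decode at least part of the information in the graph file. For example, information such as edge types and node types can be represented by numbers, such as 1 for user-type nodes and 2 for company-type nodes, so when node types are stored in the point table, the corresponding types can be represented by numbers such as 1 and 2. Representing text with shorter numbers or letters can effectively reduce the actual storage space of the graph data. Accordingly, the vocabulary may record mappings such as "1" for user-type nodes and "2" for company-type nodes.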
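The following is a minimal sketch of vocabulary-based encoding. It assumes that codes are assigned sequentially from 1 in first-seen order, because the specification does not say how codes are chosen. The `Vocabulary` class is hypothetical:

```python
from typing import Dict

class Vocabulary:
    """Hypothetical file-level vocabulary mapping type names to small integer codes."""

    def __init__(self) -> None:
        self.code_of: Dict[str, int] = {}
        self.name_of: Dict[int, str] = {}

    def encode(self, name: str) -> int:
        # Assign the next code (starting at 1) the first time a name is seen.
        if name not in self.code_of:
            code = len(self.code_of) + 1
            self.code_of[name] = code
            self.name_of[code] = name
        return self.code_of[name]

    def decode(self, code: int) -> str:
        # Map a stored code back to its readable type name.
        return self.name_of[code]

vocab = Vocabulary()
stored = [vocab.encode(t) for t in ["user", "company", "user"]]
print(stored)           # [1, 2, 1]
print(vocab.decode(2))  # company
```

Only the small integers go into the point and edge tables. The name-to-code mapping is kept once per graph file.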
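The data block index of a graph file includes the storage address information of each data block in the graph file and the node identifier of the first node in each data block. It can be used to quickly determine which data block the target query node is in.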
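Data blocks hold nodes in identifier order, so the first-node identifiers in the data block index form a sorted list. A binary search over that list can pick the candidate block (step 830 in FIG. 8). The sketch below works under that assumption. `BlockIndexEntry`, `find_block`, and the byte offsets are hypothetical:

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional

class BlockIndexEntry(NamedTuple):
    first_node_id: int  # identifier of the first node stored in the block
    offset: int         # storage address of the block within the graph file

def find_block(index: List[BlockIndexEntry], node_id: int) -> Optional[BlockIndexEntry]:
    """Pick the last block whose first node id is <= node_id (index sorted by id)."""
    firsts = [e.first_node_id for e in index]
    i = bisect_right(firsts, node_id) - 1
    return index[i] if i >= 0 else None

index = [BlockIndexEntry(1, 0), BlockIndexEntry(120, 4096), BlockIndexEntry(250, 8192)]
print(find_block(index, 130))  # BlockIndexEntry(first_node_id=120, offset=4096)
```

The returned block is only a candidate. Identifiers need not be contiguous, so the node must still be looked up in that block's point table.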
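The file footer information includes the total number of nodes and the total number of edges in the data blocks, as well as a file extension area (e.g., file protocol, compression algorithm, and verification information).
FIG. 8 is an exemplary flowchart of a graph data query according to some embodiments of this specification. Process 800 in FIG. 8 illustrates how the storage file is used, taking as an example a known target query node whose N-hop subgraph is to be found. The N-hop subgraph includes the edges within N hops of the target query node and the nodes on those edges. In step 810, the storage device receives a query request from the business side or the processing device; the query request includes the node identifier of the target query node. First, in step 820, the storage device accesses the graph file element and determines which graph file stores the target node from the node identifiers of the first and last nodes of each graph file recorded in the graph file element (e.g., narrowing it down to graph file V). Next, in step 830, it determines the target data block containing the target query node from the node identifiers of the first node in each data block, as stored in that graph file's data block index (the data block index of graph file V). Then, in step 840, it locates the target data block using the storage address information of each data block stored in the data block index, and can obtain the target data block. Within the target data block, the point table can be located through the table element, and the node information of the target query node can be found in the point table by node identifier; when the node information in the point table is stored in node-identifier order, it can be found quickly by binary search, as in step 850. Because the point table, edge table, point attribute table, and edge attribute table are in the same data block and aligned with one another, a single read operation (e.g., loading the data block into memory) suffices: in step 860, based on the storage order of the target query node's information in the point table or the storage address information of its edges, one or more of the target query node's edge information, point attribute information, and edge attribute information can be obtained from one or more of the edge table, point attribute table, and edge attribute table of the target data block, yielding the one-hop subgraph of the target query node. Further, by obtaining the node identifiers of the first-hop neighbors in the one-hop subgraph (the nodes on the target query node's first-hop edges) and repeating the above steps, the one-hop subgraph of each first-hop neighbor can be found, giving the two-hop subgraph of the target query node; and so on, until the N-hop subgraph of the target query node is obtained.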
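The hop-by-hop expansion at the end of process 800 is a breadth-first traversal. The sketch below assumes a hypothetical `one_hop(node_id)` callback that stands in for steps 820 to 860 and returns the identifiers of a node's neighbors. In the usage example, a plain dictionary plays that role:

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]

def n_hop_subgraph(start: int, n: int,
                   one_hop: Callable[[int], List[int]]) -> Tuple[Set[int], Set[Edge]]:
    """Expand the subgraph one hop at a time, starting from the target query node."""
    nodes: Set[int] = {start}
    edges: Set[Edge] = set()
    frontier = [start]
    for _ in range(n):
        next_frontier: List[int] = []
        for node in frontier:
            # Each call stands in for locating and reading one data block.
            for nbr in one_hop(node):
                edges.add((node, nbr))
                if nbr not in nodes:
                    nodes.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return nodes, edges

adjacency: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [], 4: [5], 5: []}
nodes, edges = n_hop_subgraph(1, 2, lambda v: adjacency.get(v, []))
print(sorted(nodes))  # [1, 2, 3, 4]
print(sorted(edges))  # [(1, 2), (1, 3), (2, 4)]
```

Only newly discovered nodes join the next frontier, so no node's one-hop subgraph is fetched twice.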
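It should be noted that in one or more embodiments of this specification, the edges of the graph data may include outgoing edges and incoming edges. In such embodiments, the edge table can be further divided into an out-edge table and an in-edge table; the corresponding edge attribute table likewise includes an out-edge attribute table and an in-edge attribute table; and the node information further includes the storage address information of the node's outgoing edges and of its incoming edges.
FIG. 7 is an exemplary flowchart of graph data storage according to some embodiments of this specification. In some embodiments, the exemplary graph data storage flow is shown as process 700, which may include steps 710, 720, …, 780. Process 700 is described in detail below.
Step 710: store the node information of several nodes in the graph data in the point table of the data block.
In some embodiments, step 710 may be performed by the node information storage module 510, which fills the node information into the point table in sequence according to the predefined point table format. Graph data includes nodes and edges; in some embodiments, module 510 can select several nodes from the graph data for storage. These may be all of the nodes in the graph data or only some of them.
FIG. 2 shows a schematic diagram of an exemplary point table 210. The point table stores the node information of several nodes, and the node information includes a node identifier. A node identifier is the number of a node in the graph data and is used to trace the node's position in the graph data. For example, node identifiers can be set as node 1, node 2, …, node m. In some embodiments, the node information in the point table is stored in node-identifier order. For example, module 510 can select several nodes with consecutive node identifiers from the graph data and store their node information in ascending or descending order of node identifier.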
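When node information is sorted by node identifier, finding a node in the point table is a binary search. The result is the node's position k, which the aligned tables can reuse. The sketch below assumes ascending integer identifiers held in a Python list and does not model the on-disk layout:

```python
from bisect import bisect_left
from typing import List, Optional

def find_in_point_table(node_ids: List[int], target: int) -> Optional[int]:
    """Return the position k of target in an ascending point table, or None."""
    k = bisect_left(node_ids, target)
    if k < len(node_ids) and node_ids[k] == target:
        return k
    return None

point_table_ids = [101, 102, 103, 107, 110]
print(find_in_point_table(point_table_ids, 107))  # 3
print(find_in_point_table(point_table_ids, 105))  # None
```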
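In some embodiments, the node information also includes the storage address information of the node's edges, which indicates where the edges are stored in the edge table; for example, it may be the storage address of the edges' index information in the edge table. The storage address information can be an absolute address or an offset relative to some starting position. For example, the storage address of the edge index information in the edge table can be an absolute address or an offset relative to the start of the edge table. With this arrangement, once a target node has been located during a graph query, the data of the edges connected to it can be determined directly from the edge storage address information recorded for it in the point table.
In general, a node can have multiple edges. In some embodiments, module 510 can record the storage address information of every edge of a node in the point table; that is, one node's information can record the storage addresses of all edges connected to that node. However, in some scenarios a node has a very large number of edges (e.g., a merchant node may be connected to thousands of user nodes), and storing the addresses of all of a node's edges in this way would consume a large amount of storage and be very inefficient. Therefore, in some embodiments of this specification, the edge information of the same node can be stored contiguously in the edge table. Suppose node A has 5 edges and node B has 3 edges. In the edge table, the edge information of node A's 5 edges is stored contiguously in one region (e.g., a region of 12×5 = 60 bytes) starting from a first storage location (e.g., the 16th byte of the edge table), and node B's edge information is stored contiguously in another region (e.g., a region of 12×3 = 36 bytes) starting from a second storage location (e.g., the 76th byte of the edge table). Thus, as shown in FIG. 2, the edge storage address information of each node in the point table need only include the starting location of its edges in the edge table (e.g., node A's edge storage address is the first storage location, and node B's is the second). In other words, the storage region from one node's edge storage address up to the next node's edge storage address is treated as belonging to the earlier node's edges.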
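A short sketch can check the arithmetic in the node A / node B example. It assumes 12-byte edge records. It also keeps one extra end offset after the per-node start offsets so that the last node's region is bounded. The specification does not say how the end of the last region is recorded, so that sentinel is an assumption. The function names are hypothetical:

```python
from typing import List, Tuple

EDGE_RECORD_SIZE = 12  # bytes per edge record, as in the node A / node B example

def edge_start_offsets(edge_counts: List[int], first_offset: int) -> List[int]:
    """Start offset of each node's edge region, plus one trailing end offset."""
    offsets = [first_offset]
    for count in edge_counts:
        offsets.append(offsets[-1] + count * EDGE_RECORD_SIZE)
    return offsets

def edge_region(offsets: List[int], k: int) -> Tuple[int, int]:
    """Byte range [start, end) of the k-th node's edges: it ends where the next region starts."""
    return offsets[k], offsets[k + 1]

offsets = edge_start_offsets([5, 3], first_offset=16)  # node A: 5 edges, node B: 3 edges
print(offsets)                  # [16, 76, 112]
print(edge_region(offsets, 1))  # (76, 112)
n_edges_a = (offsets[1] - offsets[0]) // EDGE_RECORD_SIZE
print(n_edges_a)                # 5
```

Each node stores one offset instead of one address per edge. For a merchant node with thousands of edges, that is the storage saving the paragraph describes.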
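In some embodiments, edges have directions, and a node may have outgoing edges and/or incoming edges, where an incoming edge points to the node and an outgoing edge starts from the node and points to another node. Therefore, in some embodiments, the edge storage address information in the node information of the point table can be further divided into the storage address information of incoming edges and that of outgoing edges. Correspondingly, the edge table can be of two kinds, an in-edge table and an out-edge table, where the in-edge table stores only the edge information of incoming edges and the out-edge table stores the edge information of outgoing edges. The storage address information of outgoing/incoming edges in the node information, and the way outgoing/incoming edge information is stored in the out-/in-edge table, are similar to the foregoing and will not be repeated here. For more descriptions of edge storage address information, refer to the corresponding description of step 720.
In some embodiments, the node information may also include node type information. Since a node can describe any entity or object in the physical world, nodes can be of different types, for example, user-type nodes, company-type nodes, location-type nodes, and so on. The node type (not shown in the figure) can be stored between each node's node identifier and its edge storage address information, as shown in FIG. 2. In general, the set of node types is finite and can be enumerated. To make node types easy to represent and store, in some embodiments node types can be encoded within the graph file through the vocabulary, and the point table stores only the encoded node type. When the node type of a node needs to be read from the point table, its code can be decoded back into a semantically clear node type, such as "user-type node", using the vocabulary. Encoding and decoding within the file through the vocabulary makes node types more compact to represent, further reducing storage space. For more descriptions of the vocabulary, refer to the description of FIG. 6, which will not be repeated here.
In some embodiments, the node information may also be stored first in order of node type and then in order of node identifier. For example, user-type nodes can be stored together and ordered by node identifier among themselves. When sorting by node type, the types can be ordered by the pinyin initial of the first character of the node type's description text or by the first letter of its first word. The point table 210 shown in FIG. 2 also includes a header flag bit indicating whether the table has an index area; in some embodiments, the point table has no index area, and its header flag bit stores "0".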
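Ordering by node type and then by node identifier amounts to a sort on a composite key. The sketch below uses English type names sorted alphabetically as a stand-in for the pinyin or first-letter ordering described above. `NodeInfo` is hypothetical:

```python
from typing import List, NamedTuple

class NodeInfo(NamedTuple):
    node_type: str
    node_id: int

def point_table_order(nodes: List[NodeInfo]) -> List[NodeInfo]:
    # Group by type name (alphabetically), then by node identifier within each type.
    return sorted(nodes, key=lambda n: (n.node_type, n.node_id))

nodes = [NodeInfo("user", 7), NodeInfo("company", 9), NodeInfo("user", 3), NodeInfo("company", 2)]
print([(n.node_type, n.node_id) for n in point_table_order(nodes)])
# [('company', 2), ('company', 9), ('user', 3), ('user', 7)]
```

With this ordering, all nodes of one type form a contiguous run. Within that run, a binary search on the identifier still works.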
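Step 720: store the edge information of the edges of the several nodes in the edge table of the data block.
In some embodiments, step 720 may be performed by the edge information storage module 520, which fills data into the edge table in sequence according to the predefined edge table format.
In some embodiments, the edge table may include an edge table index area and an edge table data area. Because an edge can be described by the two target nodes it connects, the edge information can include the node identifiers of the target nodes connected by the edge. In some embodiments, the edge information is stored in the edge table data area; for example, the data area stores pairs of target-node identifiers, each pair corresponding to one edge. The edge table index area stores the index information of each edge's edge information in the edge table, for example, the storage address, in the edge table data area, of the target-node identifiers for each edge.
FIG. 3 shows a schematic diagram of an exemplary edge table 220. In the figure, the header flag bit indicates whether the table has an index area. For example, a header flag of "1" indicates that there is an index area, and "0" indicates that there is none. Since every edge table contains an index area, its header flag is 1. The index area length indicates the total length of the edge table index area, such as the number of bytes it occupies, and therefore where the edge table data area begins. The edge table index area stores the index information of each edge; for example, the index information of edge A points to the location of edge A's data in the edge table data area. The edge table data area stores the edge information of each edge. In some embodiments, the edge information may also include the node types of the target nodes. In some embodiments, every edge information record has the same storage length. For example, for each edge, 4 bytes are used to store the node types of the two target nodes and 8 bytes are used to store their node identifiers.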
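Fixed-length edge records let the i-th record be read at offset 12 × i with no per-record length field. The sketch below uses Python's `struct` module. The split into two 2-byte type codes followed by two 4-byte identifiers, and the little-endian byte order, are assumptions, since the text fixes only the 4-byte and 8-byte totals:

```python
import struct
from typing import Tuple

# Assumed layout (not fixed by the text): 2-byte type code per endpoint, then a
# 4-byte identifier per endpoint, little-endian, no padding: 4 + 8 = 12 bytes.
EDGE_RECORD = struct.Struct("<HHII")
assert EDGE_RECORD.size == 12

def pack_edge(src_type: int, dst_type: int, src_id: int, dst_id: int) -> bytes:
    return EDGE_RECORD.pack(src_type, dst_type, src_id, dst_id)

def unpack_edge(data_area: bytes, i: int) -> Tuple[int, int, int, int]:
    """Read the i-th fixed-length record directly at byte offset i * 12."""
    return EDGE_RECORD.unpack_from(data_area, i * EDGE_RECORD.size)

data_area = pack_edge(1, 2, 101, 205) + pack_edge(1, 1, 101, 102)
print(len(data_area))             # 24
print(unpack_edge(data_area, 1))  # (1, 1, 101, 102)
```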
在一些实施例中,边的索引信息的存储顺序与节点在点表中的存储顺序一致(也可称之为边表与点表的对齐)。例如,从边表索引区开始,连续存储点表中第一个节点的边的索引信息,之后存储第二个节点的边的索引信息,以此类推。在边表数据区中,边信息可以按照边表索引区中边的索引信息的存储顺序,依次存储各边的边信息。由此,可以按照节点的在点表中的位置找到对应的边的索引信息。例如,确定某个节点在点表中的存储顺序第k个,可以直接读取第k个边的索引信息,进而基于第k个边的索引信息找到第k个节点对应边在边表数据区的存储位置。
在一些实施例中,边表中的边信息的存储顺序与节点在点表中的存储顺序一致,同一节点的边信息连续存储在一起。例如,节点A与K、M、L三个节点相连,节点B与Q、G两个节点相连,节点A在点表中的存储顺序为第一个,节点B在点表中的存储顺序是第2个,此时,从边表数据区的起始位置依次存储的是A-K、A-M、A-L这三条边的边信息,B-Q、B-G这两条边的边信息。如此,如图3所示,边表索引区中存储的边的索引信息可以只包括对应节点的边的边信息在边表中的起始存储位置(如节点A对应的边索引信息包括边A-K的存储地址信息,节点B对应的边索引信息包括边B-Q的存储地址信息)。即,边表中,前一节点对应的边的索引信息到下一个节点的边的索引信息中间的存储区域均视为前一节点对应的边的边信息。
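The edge table layout of FIG. 3 can be sketched with Python's `struct` module. The sketch includes the point-table alignment and the start-position-only index described above. The field widths are assumptions:

- a 1-byte header flag;
- a 4-byte index-area length;
- one 4-byte start offset per node in the index area, relative to the start of the data area;
- 12-byte edge records made of two 2-byte node types and two 4-byte node identifiers, consistent with the 4 + 8 byte split described above.

Edge types in the index area are omitted, and the node identifiers used for K, M, L, Q, and G are hypothetical.

```python
import struct
from typing import List, Tuple

HEADER_FMT = "<BI"  # header flag, index-area length in bytes (assumed widths)
OFFSET_FMT = "<I"   # one start offset per node in the index area (assumed)
EDGE_FMT = "<HHII"  # src type, dst type, src id, dst id: 2 + 2 + 4 + 4 = 12 bytes
EdgeRecord = Tuple[int, int, int, int]


def encode_edge_table(groups: List[List[EdgeRecord]]) -> bytes:
    """groups[k] holds the edge records of the k-th node in point-table order."""
    index = bytearray()
    data = bytearray()
    for edges in groups:
        index += struct.pack(OFFSET_FMT, len(data))  # start of this node's edges
        for record in edges:
            data += struct.pack(EDGE_FMT, *record)
    header = struct.pack(HEADER_FMT, 1, len(index))  # flag 1: the table has an index area
    return header + bytes(index) + bytes(data)


def edges_of_node(table: bytes, k: int) -> List[EdgeRecord]:
    """Return the edges of the k-th node of the point table."""
    flag, index_len = struct.unpack_from(HEADER_FMT, table, 0)
    if flag != 1:
        raise ValueError("edge table is expected to have an index area")
    index_start = struct.calcsize(HEADER_FMT)
    data_start = index_start + index_len
    entry = struct.calcsize(OFFSET_FMT)
    node_count = index_len // entry
    start = struct.unpack_from(OFFSET_FMT, table, index_start + k * entry)[0]
    if k + 1 < node_count:  # edges run up to the next node's start offset
        end = struct.unpack_from(OFFSET_FMT, table, index_start + (k + 1) * entry)[0]
    else:                   # the last node's edges run to the end of the table
        end = len(table) - data_start
    size = struct.calcsize(EDGE_FMT)
    return [struct.unpack_from(EDGE_FMT, table, data_start + off) for off in range(start, end, size)]


# Node A (id 1) -> K, M, L; node B (id 2) -> Q, G; all node types encoded as 0.
table = encode_edge_table([
    [(0, 0, 1, 11), (0, 0, 1, 12), (0, 0, 1, 13)],
    [(0, 0, 2, 21), (0, 0, 2, 22)],
])
print(edges_of_node(table, 1))  # [(0, 0, 2, 21), (0, 0, 2, 22)]
```

A node's edges are found from its position k in the point table alone; no search over edge records is needed.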
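Optionally, in some embodiments, the edge table index area also includes each edge's type. As shown in FIG. 3, edge A's index information includes an edge type in addition to its storage address. An edge type can reflect the kind of interaction between two entities, such as a litigation relationship or an economic transaction relationship between two companies. In some embodiments, a node may have multiple edges of different types. Its edge information in the edge table data area may then be stored ordered by edge type. In this case, the node's edge index information in the edge table index area may include multiple edge types and multiple storage addresses. The edge types are stored contiguously, and so are the storage addresses.

As shown in FIG. 3, suppose node B has multiple edges belonging to two edge types. Node B's edge index information may then store two edge types contiguously, followed by two storage addresses. The first address locates B's edges of the first type in the edge data area (e.g., their starting position). The second address locates B's edges of the second type (e.g., their starting position). This arrangement lets a graph query quickly locate all edges of a given type for a given node.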
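A minimal in-memory sketch of the per-type index described above follows. Each node's index entry keeps its edge types and their start positions as two contiguous lists. A lookup then slices out only the edges of one type. The names are assumptions, and plain Python lists stand in for the binary layout. Positions are record indices rather than byte offsets.

```python
from typing import List, Tuple

IndexEntry = Tuple[List[int], List[int]]  # (edge types, start positions), stored contiguously


def build_typed_index(per_node_edges: List[List[Tuple[int, str]]]) -> Tuple[List[IndexEntry], List[str]]:
    """per_node_edges[k] lists (edge_type, target_id) for the k-th node in point-table order."""
    index: List[IndexEntry] = []
    data: List[str] = []
    for edges in per_node_edges:
        types: List[int] = []
        starts: List[int] = []
        for edge_type, target in sorted(edges, key=lambda e: e[0]):  # group this node's edges by type
            if not types or types[-1] != edge_type:
                types.append(edge_type)
                starts.append(len(data))  # position of the first edge of this type
            data.append(target)
        index.append((types, starts))
    return index, data


def edges_of_type(index: List[IndexEntry], data: List[str], k: int, edge_type: int) -> List[str]:
    """Targets of the k-th node's edges that have the given edge type."""
    types, starts = index[k]
    if edge_type not in types:
        return []
    j = types.index(edge_type)
    if j + 1 < len(starts):
        end = starts[j + 1]
    else:  # last type of this node: run to the next node that has edges, or to the end
        end = next((s[0] for _, s in index[k + 1:] if s), len(data))
    return data[starts[j]:end]


index, data = build_typed_index([
    [(2, "K"), (1, "M"), (2, "L")],  # node A: edge types 1 and 2
    [(1, "Q"), (1, "G")],            # node B: edge type 1 only
])
print(index)                             # [([1, 2], [0, 1]), ([1], [3])]
print(edges_of_type(index, data, 0, 2))  # ['K', 'L']
```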
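In some embodiments, edge types, like node types, may be encoded internally within the graph file using the vocabulary, so the edge table stores only the internal codes of edge types. For more on the vocabulary, see the corresponding description of FIG. 6; it is not repeated here.
In some embodiments, edges are directed and a node may have outgoing and/or incoming edges. Correspondingly, the edge table may comprise an incoming-edge table and an outgoing-edge table. The incoming-edge table stores only data related to incoming edges, and the outgoing-edge table stores only data related to outgoing edges. The way outgoing/incoming edge data is stored in these tables is similar to that described above and is not repeated here.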
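Step 730: store the attribute information of the several nodes in the point attribute table of the data block.
In some embodiments, step 730 may be performed by the node attribute information storage module 530, which fills data into the point attribute table in order according to the predefined point attribute table format.
FIG. 4 is a schematic diagram of an exemplary attribute table 240. In some embodiments, the point attribute table and the edge attribute table share the same format, so attribute table 240 can also be regarded as a point attribute table. The point attribute table includes a point attribute table index area and a point attribute table data area. Node attribute information is stored in the data area. The index area stores each node's attribute index information, which includes the storage address of that node's attribute information in the data area. As shown in FIG. 4, each attribute index entry points to one piece of attribute data.
In some embodiments, the point attribute table may be aligned with the point table, just as the edge table is. Specifically, point attribute index entries are stored in the same order as node information in the point table. A node's attribute index entry can therefore be located from the node's position in the point table. The node's attribute information is then read from the point attribute table data area using that entry.
In some embodiments, attribute table 240 may also include a header flag bit set to "1" and an index area length.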
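The attribute-table format of FIG. 4 can be sketched as follows: a header flag, an index-area length, an index area, and a data area, aligned with the point table. Two assumptions are made here, and the specification fixes neither. Attributes are dictionaries serialized as UTF-8 JSON, and each index entry is a 4-byte offset into the data area. Because the format is shared, the same functions would serve an edge attribute table indexed by edge position.

```python
import json
import struct
from typing import Any, Dict, List

HEADER_FMT = "<BI"  # header flag "1" + index-area length (assumed widths)
OFFSET_FMT = "<I"   # one data-area offset per node, in point-table order


def encode_attr_table(attrs: List[Dict[str, Any]]) -> bytes:
    index = bytearray()
    data = bytearray()
    for attr in attrs:
        index += struct.pack(OFFSET_FMT, len(data))  # where this node's attributes start
        data += json.dumps(attr, sort_keys=True).encode("utf-8")
    return struct.pack(HEADER_FMT, 1, len(index)) + bytes(index) + bytes(data)


def read_attrs(table: bytes, k: int) -> Dict[str, Any]:
    """Attributes of the k-th node: the k-th index entry gives where they start."""
    _, index_len = struct.unpack_from(HEADER_FMT, table, 0)
    index_start = struct.calcsize(HEADER_FMT)
    data_start = index_start + index_len
    entry = struct.calcsize(OFFSET_FMT)
    start = struct.unpack_from(OFFSET_FMT, table, index_start + k * entry)[0]
    if (k + 1) * entry < index_len:  # not the last node: stop at the next node's offset
        end = struct.unpack_from(OFFSET_FMT, table, index_start + (k + 1) * entry)[0]
    else:
        end = len(table) - data_start
    return json.loads(table[data_start + start:data_start + end].decode("utf-8"))


table = encode_attr_table([{"name": "A"}, {"age": 3}, {}])
print(read_attrs(table, 1))  # {'age': 3}
```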
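Step 740: store the attribute information of the edges of the several nodes in the edge attribute table of the data block.
In some embodiments, step 740 may be performed by the edge attribute information storage module 540, which fills data into the edge attribute table in order according to the predefined edge attribute table format.
Likewise, attribute table 240 can also be regarded as an edge attribute table. The attribute information of the edges of the several nodes is stored in the edge attribute table data area. The edge attribute table index area stores each edge's attribute index information, which includes the storage address of that edge's attribute information in the edge attribute table data area.
In some embodiments, edge attribute index entries are stored in the edge attribute table index area in the same order as the edges' edge information in the edge table data area.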
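Edge attribute entries follow the order of edge records. The i-th edge record and the i-th edge attribute entry therefore describe the same edge, so no edge identifier needs to be stored alongside the attributes. A tiny sketch follows, under two assumptions: per-node start positions are record indices, as described for the edge table, and plain Python lists stand in for the two tables. The function name is hypothetical.

```python
from typing import Any, List, Tuple


def edges_with_attributes(edge_starts: List[int], edge_records: List[Any],
                          edge_attrs: List[Any], k: int) -> List[Tuple[Any, Any]]:
    """Pair the k-th node's edge records with their attributes by position alone."""
    if len(edge_records) != len(edge_attrs):
        raise ValueError("edge table and edge attribute table must be aligned")
    start = edge_starts[k]
    end = edge_starts[k + 1] if k + 1 < len(edge_starts) else len(edge_records)
    return list(zip(edge_records[start:end], edge_attrs[start:end]))


records = ["A-K", "A-M", "A-L", "B-Q", "B-G"]
attrs = [{"w": 1}, {"w": 2}, {"w": 3}, {"w": 4}, {"w": 5}]
print(edges_with_attributes([0, 3], records, attrs, 1))  # [('B-Q', {'w': 4}), ('B-G', {'w': 5})]
```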
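In some embodiments, edges are directed and a node may have outgoing and/or incoming edges. Correspondingly, the edge attribute table may comprise an incoming-edge attribute table and an outgoing-edge attribute table. The incoming-edge attribute table stores only the attribute information of incoming edges, and the outgoing-edge attribute table stores only that of outgoing edges. The way this attribute information is stored in these tables is similar to that described above and is not repeated here.
In some embodiments, process 700 further includes step 750: generating the table meta of the data block. In some embodiments, step 750 may be performed by the table meta generation module 550.
The table meta includes the storage address information of each table in the data block and the node identifier of the first node in the point table of the data block. For more on the table meta, see the corresponding description of FIG. 6; it is not repeated here.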
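One possible realization of the table meta is sketched below. A fixed-size header at the start of the block records where each of the four tables begins, followed by the first node's identifier. The header position, the field widths (4-byte offsets, 8-byte node identifier), and the table names are all assumptions for illustration.

```python
import struct
from typing import Dict

TABLE_NAMES = ("point", "edge", "point_attr", "edge_attr")
META_FMT = "<4IQ"  # four table start offsets + first node id (assumed layout)


def assemble_block(tables: Dict[str, bytes], first_node_id: int) -> bytes:
    meta_size = struct.calcsize(META_FMT)
    offsets = []
    body = bytearray()
    for name in TABLE_NAMES:
        offsets.append(meta_size + len(body))  # absolute offset of this table within the block
        body += tables[name]
    return struct.pack(META_FMT, *offsets, first_node_id) + bytes(body)


def read_table(block: bytes, name: str) -> bytes:
    """Use the table meta to slice one table out of a data block."""
    *offsets, _first_node_id = struct.unpack_from(META_FMT, block, 0)
    i = TABLE_NAMES.index(name)
    end = offsets[i + 1] if i + 1 < len(offsets) else len(block)
    return block[offsets[i]:end]


block = assemble_block({"point": b"P", "edge": b"EE", "point_attr": b"", "edge_attr": b"XYZ"}, 42)
print(read_table(block, "edge"))  # b'EE'
```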
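This completes the generation of one data block. In some embodiments, multiple data blocks may be generated according to steps 710 to 740, and together they make up one graph file. A graph file may also include information such as a vocabulary and a data block index.
In some embodiments, process 700 further includes step 760: generating the vocabulary of the graph file. In some embodiments, step 760 may be performed by the vocabulary generation module 560.
In some embodiments, a data block contains encoded information, and a vocabulary for the graph file may then be generated as well. The vocabulary holds the mapping between the encoded information in each data block of the graph file and the original information. For more on the vocabulary, see the corresponding description of FIG. 6; it is not repeated here.
In some embodiments, process 700 further includes step 770: generating the data block index of the graph file. In some embodiments, step 770 may be performed by the data block index generation module 570.
The data block index of a graph file includes the storage address information of each data block in the graph file and the node identifier of the first node in each data block. It is used to determine which data block contains the target query node. For more on the data block index, see the corresponding description of FIG. 6; it is not repeated here.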
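Given the first node identifier of every data block, finding the candidate block is a binary search. It is sketched below with Python's standard `bisect` module. This assumes data blocks are ordered by node identifier across the graph file, so the first identifiers are ascending. The returned block is only a candidate: the node must still be looked up in that block's point table. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional


def find_block(first_ids: List[int], node_id: int) -> Optional[int]:
    """Index of the last block whose first node id is <= node_id, or None if
    node_id precedes every block."""
    i = bisect_right(first_ids, node_id) - 1
    return i if i >= 0 else None


first_ids = [0, 100, 250]  # first node id of each data block
print(find_block(first_ids, 150))  # 1
```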
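At this point, a graph file has been generated from the graph data. In some embodiments, multiple graph files may be generated to form a storage file, which may also include a graph file meta.
In some embodiments, process 700 further includes step 780: generating the graph file meta.
For each data block, the graph file meta records the graph file containing it and the block's sequence number within that file. It also records the node identifiers of the first and last nodes in each graph file. It is used to determine which graph file contains the target query node. For more on the graph file meta, see the corresponding description of FIG. 6; it is not repeated here.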
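Combining the graph file meta with each file's data block index gives a two-level lookup for an incoming query. First, pick the graph file whose [first, last] identifier range contains the node. Then binary-search that file's block start identifiers. The sketch assumes node identifiers are ordered across files and blocks. The dataclass and function names are hypothetical. A node that falls in a gap between two files' ranges yields `None`.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GraphFileMeta:
    first_ids: List[int]  # first node id of each graph file, ascending
    last_ids: List[int]   # last node id of each graph file


def find_file(meta: GraphFileMeta, node_id: int) -> Optional[int]:
    i = bisect_right(meta.first_ids, node_id) - 1
    if i >= 0 and node_id <= meta.last_ids[i]:
        return i
    return None  # before the first file, or between two files' ranges


def locate(meta: GraphFileMeta, block_first_ids: List[List[int]],
           node_id: int) -> Optional[Tuple[int, int]]:
    """Return (graph file number, data block number) that may hold node_id."""
    f = find_file(meta, node_id)
    if f is None:
        return None
    # The file's first block starts at the file's first id, so this index is >= 0.
    b = bisect_right(block_first_ids[f], node_id) - 1
    return f, b


meta = GraphFileMeta(first_ids=[0, 1000], last_ids=[999, 1999])
blocks = [[0, 400, 800], [1000, 1500]]
print(locate(meta, blocks, 1600))  # (1, 1)
```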
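Possible beneficial effects of the embodiments of this specification include, but are not limited to, the following. 1) Several nodes of the graph data, together with their edges and attribute information, are stored in one data block. A graph query can therefore find a node's edges and attributes within a single data block, without multiple read/write operations. 2) The graph data is stored in order across multiple data blocks. Large-scale graph data can be distributed across multiple devices and queried in parallel (e.g., different devices query different data blocks), which reduces retrieval time and speeds up graph query responses. 3) The point table, edge table, and attribute tables are aligned, which saves storage space in the edge and attribute tables. Different embodiments may produce different beneficial effects: in a given embodiment, the effect may be any one or a combination of the above, or any other obtainable beneficial effect.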
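The basic concepts have been described above. For those skilled in the art, the above detailed disclosure is clearly only an example and does not limit this specification. Although not explicitly stated here, those skilled in the art may make various modifications, improvements, and corrections to this specification. Such modifications, improvements, and corrections are suggested by this specification and therefore remain within the spirit and scope of its exemplary embodiments.
Meanwhile, this specification uses specific terms to describe its embodiments. Terms such as "one embodiment", "an embodiment", and/or "some embodiments" mean a feature, structure, or characteristic related to at least one embodiment of this specification. It should therefore be emphasized that two or more references to "an embodiment", "one embodiment", or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics of one or more embodiments of this specification may be combined as appropriate.
In addition, those skilled in the art will understand that aspects of this specification may be illustrated and described in terms of several patentable classes or contexts. These include any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of this specification may be performed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. Any of this hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". Furthermore, aspects of this specification may take the form of a computer product embodied in one or more computer-readable media, the product including computer-readable program code.
A computer storage medium may contain a propagated data signal with computer program code embodied in it, for example in baseband or as part of a carrier wave. The propagated signal may take various forms, including electromagnetic, optical, or a suitable combination thereof. A computer storage medium may be any computer-readable medium, other than a computer-readable storage medium, that can communicate, propagate, or transport a program for use by connecting to an instruction execution system, apparatus, or device. Program code on a computer storage medium may be transmitted over any suitable medium, including radio, cable, fiber-optic cable, RF, or similar media, or any combination of the above.
The computer program code for the operations of each part of this specification may be written in any one or more programming languages. These include object-oriented languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python. They also include conventional procedural languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic languages such as Python, Ruby, and Groovy; and other programming languages. The program code may run entirely on the user's computer, or as a standalone software package on the user's computer. It may also run partly on the user's computer and partly on a remote computer, or entirely on a remote computer or processing device. In the latter case, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or wide area network (WAN). It may also be connected to an external computer (e.g., via the Internet), run in a cloud computing environment, or be used as a service such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of processing elements and sequences described in this specification does not limit the order of its processes and methods. The same holds for the use of numbers and letters, or of other names. The above disclosure uses various examples to discuss some embodiments of the invention currently considered useful. It should be understood that such details serve only for illustration, and the appended claims are not limited to the disclosed embodiments. Rather, the claims are intended to cover all modifications and equivalent combinations consistent with the substance and scope of the embodiments of this specification. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, to simplify the presentation of this disclosure and aid understanding of one or more embodiments, the foregoing description sometimes combines multiple features into one embodiment, figure, or description thereof. This method of disclosure does not imply that the subject matter of this specification requires more features than the claims recite. In fact, claimed subject matter may lie in fewer than all features of a single embodiment disclosed above.
Some embodiments use numbers to describe quantities of components and attributes. It should be understood that such numbers are in some examples qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, these modifiers indicate that the stated number may vary by ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending on the characteristics required by individual embodiments. In some embodiments, numerical parameters should respect the specified significant digits and use ordinary rounding. Although the numerical ranges and parameters that define the breadth of some embodiments are approximations, in specific embodiments such values are set as precisely as practicable.
Each patent, patent application, patent application publication, and other material cited in this specification is hereby incorporated by reference in its entirety. Such material includes articles, books, specifications, publications, and documents. Two exceptions apply: application history documents that are inconsistent or conflict with this specification, and documents (currently or later appended to this specification) that limit the broadest scope of its claims. If the descriptions, definitions, and/or use of terms in materials accompanying this specification are inconsistent or conflict with this specification, the descriptions, definitions, and/or use of terms in this specification prevail.
Finally, it should be understood that the embodiments described in this specification only illustrate the principles of its embodiments. Other variations may also fall within the scope of this specification. Therefore, by way of example and not limitation, alternative configurations of the embodiments may be regarded as consistent with the teachings of this specification. Accordingly, the embodiments of this specification are not limited to those explicitly introduced and described here.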

Claims (19)

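  1. A method for storing graph data, the graph data including nodes and edges, the storage method comprising:
    storing node information of several nodes in the graph data in a point table of a data block, the node information including a node identifier;
    storing edge information of the edges of the several nodes in an edge table of the data block, the edge information including the node identifier of a target node connected by the edge;
    storing attribute information of the several nodes in a point attribute table of the data block;
    storing attribute information of the edges of the several nodes in an edge attribute table of the data block.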
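  2. The method according to claim 1, wherein the edges of the several nodes are stored in the edge table in the same order as the several nodes are stored in the point table;
    the attribute information of the several nodes is stored in the point attribute table in the same order as the several nodes are stored in the point table;
    the attribute information of the edges of the several nodes is stored in the edge attribute table in the same order as the edges of the several nodes are stored in the edge table.
  3. The method according to claim 1 or 2, wherein the edge table includes an edge table index area and an edge table data area;
    the edge information of the edges of the several nodes is stored in the edge table data area;
    the edge table index area stores index information of the edges of the several nodes, the index information of an edge including the storage address information, in the edge table data area, of the edge information of the corresponding node's edges;
    the index information of the edges of the several nodes is stored in the edge table index area in the same order as the several nodes are stored in the point table.
  4. The method according to claim 3, wherein the node information further includes storage address information of the node's edges, the storage address information of an edge in the point table being the storage address, in the edge table, of the corresponding edge's index information.
  5. The method according to claim 3, wherein the edge information of different edges of the same node is stored contiguously in the edge table data area, and the edge information of the edges of the several nodes is stored in the edge table data area in the same order as the several nodes are stored in the point table.
  6. The method according to claim 5, wherein the index information of an edge further includes an edge type; the edge information further includes the node type of the target node; the edge information of the edges of the same node is stored in the edge table data area in order of edge type; and the index information, in the edge table index area, of the edges of the same node includes one or more edge types and one or more corresponding storage addresses, the one or more edge types being stored contiguously and the one or more storage addresses also being stored contiguously.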
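  7. The method according to claim 3, wherein the edge attribute table includes an edge attribute table index area and an edge attribute table data area;
    the attribute information of the edges of the several nodes is stored in the edge attribute table data area;
    the edge attribute table index area stores edge attribute index information of the edges of the several nodes, the edge attribute index information including the storage address information, in the edge attribute table data area, of the attribute information of the corresponding node's edges;
    the edge attribute index information of the edges of the several nodes is stored in the edge attribute table index area in the same order as the edge information of the edges of the several nodes is stored in the edge table data area.
  8. The method according to claim 1, wherein the node information further includes a node type, and the node information of the several nodes is stored in the point table in order of node type.
  9. The method according to claim 1, wherein the point attribute table includes a point attribute table index area and a point attribute table data area;
    the attribute information of the several nodes is stored in the point attribute table data area;
    the point attribute table index area stores node attribute index information of the several nodes, the node attribute index information including the storage address information of the node's attribute information in the point attribute table data area;
    the node attribute index information of the several nodes is stored in the point attribute table index area in the same order as the several nodes are stored in the point table.
  10. The method according to claim 1, further comprising: generating table meta of the data block, the table meta including the storage address information of each table in the data block and the node identifier of the first node in the point table of the data block.
  11. The method according to claim 10, wherein the data block includes encoded information, the method further comprising: generating a vocabulary for a graph file that includes a plurality of the data blocks, the vocabulary including the mapping between the encoded information in each data block of the graph file and the original information.
  12. The method according to claim 10, further comprising: generating a data block index for a graph file that includes a plurality of the data blocks, the data block index of the graph file including the storage address information of each data block in the graph file and the node identifier of the first node in each data block.
  13. The method according to claim 12, further comprising: generating a graph file meta, the graph file meta including, for each data block in each graph file, the graph file in which the data block resides and its sequence number within that graph file, as well as the node identifier of the first node and the node identifier of the last node in each graph file.
  14. The method according to claim 1, wherein the data block is the smallest unit of reading and writing.
  15. The method according to claim 1, wherein the edges of the graph data include outgoing edges and incoming edges; the edge table includes an outgoing-edge table and an incoming-edge table; the edge attribute table includes an outgoing-edge attribute table and an incoming-edge attribute table; and the node information further includes storage address information of the node's outgoing edges and storage address information of its incoming edges.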
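  16. A system for storing graph data, the graph data including nodes and edges, the storage system comprising:
    a node information storage module, configured to store node information of several nodes in the graph data in a point table of a data block, the node information including a node identifier;
    an edge information storage module, configured to store edge information of the edges of the several nodes in an edge table of the data block, the edge information including the node identifier of a target node connected by the edge;
    a node attribute information storage module, configured to store attribute information of the several nodes in a point attribute table of the data block;
    an edge attribute information storage module, configured to store attribute information of the edges of the several nodes in an edge attribute table of the data block.
  17. A graph data storage apparatus, comprising a storage medium and a processor, the storage medium being configured to store computer instructions and the processor being configured to execute the computer instructions to implement the storage method according to any one of claims 1 to 15.
  18. A storage device for graph data, the graph data including nodes and edges, the storage device storing several data blocks, each data block comprising:
    a point table, configured to store node information of at least some nodes of the graph data, the node information including a node identifier;
    an edge table, configured to store edge information of the edges of the nodes, the edge information including the node identifier of a target node connected by the edge;
    a point attribute table, configured to store attribute information of the nodes;
    an edge attribute table, configured to store attribute information of the edges of the nodes.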
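  19. A graph data query method, comprising:
    receiving a query request, the query request including the node identifier of a target query node;
    accessing a graph file meta, and determining the target graph file containing the target query node from the node identifiers of the first node and the last node of each graph file stored in the graph file meta;
    accessing the data block index of the target graph file, and determining the target data block containing the target query node from the node identifiers of the first node of each data block in the target graph file stored in the data block index;
    reading the target data block based on the storage address information of each data block in the target graph file stored in the data block index;
    in the target data block, obtaining the storage address information of the point table from its table meta, and finding the node information of the target query node in the point table based on the target query node's node identifier;
    based on the storage order of the target query node's node information in the point table, or on the storage address information of its edges, obtaining one or more of the target query node's edge information, point attribute information, and edge attribute information from one or more of the edge table, point attribute table, and edge attribute table of the target data block.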
PCT/CN2023/070606 2022-01-07 2023-01-05 Storage of graph data WO2023131218A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210014665.2 2022-01-07
CN202210014665.2A CN114077680B (zh) 2022-01-07 2022-01-07 Graph data storage method, system and device

Publications (1)

Publication Number Publication Date
WO2023131218A1 true WO2023131218A1 (zh) 2023-07-13

Family

ID=80284470

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070606 WO2023131218A1 (zh) 2022-01-07 2023-01-05 Storage of graph data

Country Status (2)

Country Link
CN (1) CN114077680B (zh)
WO (1) WO2023131218A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235120A (zh) * 2023-11-09 2023-12-15 Alipay (Hangzhou) Information Technology Co., Ltd. Hypergraph data storage and query method and device with temporal characteristics

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
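CN114077680B (zh) * 2022-01-07 2022-05-17 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data storage method, system and device
CN114282073B (zh) * 2022-03-02 2022-07-15 Alipay (Hangzhou) Information Technology Co., Ltd. Data storage method and device, and data reading method and device
CN116204683A (zh) * 2022-09-15 2023-06-02 Alibaba (China) Co., Ltd. Dynamic graph data storage system, reading system and corresponding method
CN115481298B (zh) * 2022-11-14 2023-03-14 Alibaba (China) Co., Ltd. Graph data processing method and electronic device
CN117932120A (zh) * 2024-03-18 2024-04-26 Alipay (Hangzhou) Information Technology Co., Ltd. Data storage method and device for a graph database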

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
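CN104572740A (zh) * 2013-10-23 2015-04-29 Huawei Technologies Co., Ltd. Method and device for storing data
US20180101559A1 (en) * 2016-10-06 2018-04-12 Microsoft Technology Licensing, Llc Diverse addressing of graph database entities by database applications
CN109189994A (zh) * 2018-06-27 2019-01-11 Beijing Zhongke Ruixin Technology Co., Ltd. CAM-structured storage system for graph computing applications
CN111512303A (zh) * 2017-12-29 2020-08-07 Electronic Technology Company Hierarchical graph data structure
CN112287182A (zh) * 2020-10-30 2021-01-29 Hangzhou Hikvision Digital Technology Co., Ltd. Graph data storage and processing method, device, and computer storage medium
CN112559631A (zh) * 2020-12-15 2021-03-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Data processing method and device for a distributed graph database, and electronic device
CN113609347A (zh) * 2021-10-08 2021-11-05 Alipay (Hangzhou) Information Technology Co., Ltd. Data storage and query method, device, and database system
CN113722520A (zh) * 2021-11-02 2021-11-30 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data query method and device
CN114077680A (zh) * 2022-01-07 2022-02-22 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data storage method, system and device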

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8055864B2 (en) * 2007-08-06 2011-11-08 International Business Machines Corporation Efficient hierarchical storage management of a file system with snapshots
US9047189B1 (en) * 2013-05-28 2015-06-02 Amazon Technologies, Inc. Self-describing data blocks of a minimum atomic write size for a data store
CN104133970A (zh) * 2014-08-06 2014-11-05 Inspur (Beijing) Electronic Information Industry Co., Ltd. Data space management method and device
US20180173755A1 (en) * 2016-12-16 2018-06-21 Futurewei Technologies, Inc. Predicting reference frequency/urgency for table pre-loads in large scale data management system using graph community detection
CN107657027B (zh) * 2017-09-27 2021-09-21 Beijing Xiaomi Mobile Software Co., Ltd. Data storage method and device
US10810075B2 (en) * 2018-04-23 2020-10-20 EMC IP Holding Company Generating a social graph from file metadata

Also Published As

Publication number Publication date
CN114077680B (zh) 2022-05-17
CN114077680A (zh) 2022-02-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23737081

Country of ref document: EP

Kind code of ref document: A1