WO2023083234A1 - Graph state data management - Google Patents

Graph state data management

Info

Publication number
WO2023083234A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
graph state
graph
key
Prior art date
Application number
PCT/CN2022/131007
Other languages
English (en)
French (fr)
Inventor
唐浩栋
潘臻轩
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2023083234A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • The embodiments of this specification generally relate to the field of graph computing, and in particular to a graph state data management method and a graph state data management device.
  • Graph computing refers to complex computation over graph data structures.
  • The graph computing engine abstracts real-world business data into a graph data structure and performs complex calculations on it.
  • The graph data structure is a complex structure composed of vertices and edges, each carrying various data attributes.
  • The graph computation performed by the graph computing engine is iterative. In each round of iteration, the graph computing engine generates intermediate results, which may be called graph state data. In some real-time graph computing application scenarios, the real-time graph computing engine integrates streaming computing and graph computing. To guarantee data fault tolerance in streaming computing, the graph state data must be stored in, for example, the memory, cache, or local disk of the graph computing engine, and the stored graph state data must be managed.
  • the embodiments of this specification provide a graph state data management method and device.
  • With these embodiments, graph state management can be decoupled from graph computation, computation and storage can be separated, and larger-scale graph state data can be managed.
  • A graph state data management method applied to a graph state management device is provided, including: acquiring, in batches, graph state data obtained by a graph computing engine during graph computation, the graph state data including vertex data and/or edge data; encoding each piece of graph state data as kv data, wherein the vertex ID in the vertex data and/or the starting point ID in the edge data is encoded as the key, and the non-vertex ID data in the vertex data and/or the non-starting point ID data in the edge data is encoded as the value; sorting the kv data based on the keys to form kv list data, in which each key corresponds to one or more values; and writing the values of the kv list data into a data file in the file storage system in sequence and recording the logical address of each key in the data file, the logical address including the file ID of the data file into which the value corresponding to the key is written and the first file offset of the corresponding value in that data file.
  • a variable data table and an immutable data table are maintained in the memory of the graph state management device.
  • The graph state data management method may further include: writing the kv data into the variable data table; and judging whether the data size of the variable data table into which the kv data is written has reached a threshold.
  • Sorting the kv data based on the keys may include: in response to the data size of the variable data table into which the kv data is written reaching the threshold, sorting the kv data in the variable data table based on their keys to form the kv list data.
  • Writing the values of the kv list data into the data file in the file storage system in sequence may include: converting the sorted variable data table into an immutable data table; and writing the values of the immutable data table in sequence into data files in the file storage system, each immutable data table corresponding to one data file.
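The variable-table flow above resembles a memtable flush in LSM-style stores. A minimal sketch, assuming a byte-size threshold and in-memory sorting; the class and method names are illustrative, not from the specification:

```python
# Sketch of the threshold-triggered flush described above: kv data
# accumulates in a mutable (variable) data table; once the size threshold
# is reached, the table is sorted by key and frozen into an immutable
# table whose contents are then written out as one data file.

class MutableTable:
    def __init__(self, threshold_bytes: int):
        self.threshold = threshold_bytes
        self.entries = []          # list of (key, value) pairs
        self.size = 0

    def put(self, key, value) -> bool:
        self.entries.append((key, value))
        self.size += len(repr(key)) + len(repr(value))  # crude size estimate
        return self.size >= self.threshold   # True means: time to flush

    def freeze(self):
        # Sort by key and aggregate values sharing a key (the kv list
        # data), producing the immutable table handed to the file writer.
        table = {}
        for k, v in sorted(self.entries, key=lambda kv: kv[0]):
            table.setdefault(k, []).append(v)
        return table
```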
  • In some embodiments, writing the values of the kv list data into a data file in the file storage system in sequence may include: constructing the values of the kv list data into multiple ordered data blocks of the same size; performing data compression on the constructed ordered data blocks; and writing the compressed ordered data blocks into a data file in the file storage system in sequence. The data file includes each compressed ordered data block and a metadata block; the metadata block records the mapping between the first file offset corresponding to each key and the second file offset of the corresponding compressed ordered data block in the data file.
  • The graph state data management method may further include: in response to receiving a graph state data read request from the graph computing engine, encoding the data ID in the read request as a target key, the data ID including a vertex ID and/or an edge ID; querying the memory index for the corresponding logical address based on the target key; obtaining the value corresponding to the target key according to the logical address; decoding the obtained value to obtain the target graph state data; and providing the target graph state data to the graph computing engine.
  • Obtaining the value corresponding to the target key according to the logical address may include: in response to finding the corresponding logical address, initiating a data acquisition request to the file storage system, the data acquisition request including the corresponding logical address; and receiving from the file storage system the value returned in response to the data acquisition request, the returned value being obtained by the file storage system from its data files according to the corresponding logical address.
  • a data LRU cache is maintained in the memory of the graph state management device, and the data LRU cache is used to cache a previously obtained value in association with a corresponding logical address of the key.
  • Obtaining the value corresponding to the target key according to the logical address may further include: judging, according to the logical address, whether the data LRU cache has cached the value corresponding to the target key; and when it has, obtaining the corresponding value from the data LRU cache.
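The data LRU cache can be sketched with an `OrderedDict`; this is an illustrative stand-in, since the specification does not prescribe a cache implementation:

```python
from collections import OrderedDict

# Sketch of the data LRU cache described above: previously fetched values
# are cached keyed by their logical address, so a repeated read for the
# same key can skip the round trip to the file storage system.

class ValueLRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()        # logical address -> value

    def get(self, logical_address):
        if logical_address not in self.cache:
            return None                   # miss: caller goes to file storage
        self.cache.move_to_end(logical_address)   # mark as recently used
        return self.cache[logical_address]

    def put(self, logical_address, value):
        self.cache[logical_address] = value
        self.cache.move_to_end(logical_address)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)        # evict least recently used
```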
  • In some embodiments, the values of the graph state data are constructed into multiple ordered data blocks with a first data size and, after data compression, written into a data file of the file storage system. The data file includes each compressed ordered data block and a metadata block; the metadata block records the mapping between the first file offset corresponding to each key and the second file offset of the corresponding compressed ordered data block in the data file.
  • Obtaining the value corresponding to the target key according to the logical address may include: in response to finding the corresponding logical address, initiating a data block acquisition request to the file storage system, the request including the corresponding logical address; receiving from the file storage system the compressed data block returned in response to the request, the compressed data block being obtained by the file storage system from its data file according to the first file offset; decompressing the obtained compressed data block; determining, based on the first file offset in the logical address and the first data size, a third offset of the value corresponding to the target key within the decompressed data block; and obtaining the value corresponding to the target key from the decompressed data block according to the third offset.
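A sketch of this block-level read path, under simplifying assumptions: each block holds `block_size` bytes of the uncompressed value stream, the metadata is a list of `(logical offset, compressed offset, compressed length)` triples, and zlib stands in for the compressor. None of these layout details are mandated by the specification:

```python
import zlib

def read_value(data_file: bytes, metadata: list, first_offset: int,
               value_size: int, block_size: int) -> bytes:
    # Find the compressed block whose logical range covers the value's
    # first file offset, decompress it, then use the third offset (the
    # position inside the decompressed block) to slice out the value.
    for logical, comp_off, comp_len in metadata:
        if logical <= first_offset < logical + block_size:
            block = zlib.decompress(data_file[comp_off:comp_off + comp_len])
            third_offset = first_offset - logical
            return block[third_offset:third_offset + value_size]
    raise KeyError("no block covers this offset")
```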
  • a data block LRU cache is maintained in the memory of the graph state management device, and the data block LRU cache is used to cache previously acquired data blocks in association with the corresponding logical address of the key .
  • Obtaining the value corresponding to the target key according to the logical address may further include: judging, according to the logical address, whether the data block LRU cache has cached the compressed data block corresponding to the target key; and when it has, obtaining the corresponding compressed data block from the data block LRU cache.
  • The graph state data management method may further include: performing data filtering on the obtained graph state data using a given data filtering strategy before the obtained graph state data is provided to the graph computing engine.
  • The graph state data management method may further include: judging whether the memory index needs to be updated; and in response to judging that it does, performing an incremental logical address update on the corresponding logical addresses in the memory index using the recorded logical address of each key.
  • The graph state data management method may further include: in response to a data aggregation condition being satisfied, aggregating the graph state data stored in the data files of the file storage system using a given data aggregation policy.
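One plausible aggregation (compaction) policy, sketched under the assumption that each data file is a key-sorted list of kv pairs; the actual policy is left open by the specification:

```python
import heapq

def compact(files):
    # Merge several key-sorted files into one, aggregating values that
    # share a key, as in the kv list data described above.
    merged = {}
    for key, value in heapq.merge(*files, key=lambda kv: kv[0]):
        merged.setdefault(key, []).append(value)
    return sorted(merged.items())
```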
  • A graph state data management device applied to a graph state management device is provided, including: a graph state data acquisition unit that acquires, in batches, graph state data obtained by a graph computing engine during graph computation, the graph state data including vertex data and/or edge data;
  • a first data encoding unit that encodes each piece of graph state data into kv data, wherein the vertex ID in the vertex data and/or the starting point ID in the edge data is encoded as the key, and the non-vertex ID data in the vertex data and/or the non-starting point ID data in the edge data is encoded as the value;
  • a data sorting unit that sorts the kv data based on the keys to form kv list data, in which each key corresponds to one or more values;
  • a first data writing unit that writes the values of the kv list data into data files in the file storage system in sequence; and a logical address recording unit that records the corresponding logical address of each key in the data file.
  • a variable data table and an immutable data table are maintained in the memory of the graph state management device.
  • The graph state data management device may further include: a second data writing unit that writes the kv data into the variable data table before the kv data are sorted based on their keys; and a first judging unit that judges whether the data size of the variable data table into which the kv data is written has reached a threshold. The data sorting unit, in response to the data size reaching the threshold, sorts the kv data in the variable data table based on their keys to form the kv list data.
  • The graph state data management device may further include: a data table conversion unit that converts the sorted variable data table into an immutable data table, wherein the first data writing unit writes the values of the immutable data table into data files in the file storage system in sequence, each immutable data table corresponding to one data file.
  • The first data writing unit may include: a data block construction module that constructs the values of the kv list data into multiple ordered data blocks with a first data size; a data block compression module that compresses the constructed ordered data blocks; and a data block writing module that writes the compressed ordered data blocks into a data file in the file storage system in sequence. The data file includes each compressed ordered data block and a metadata block; the metadata block records the mapping between the first file offset corresponding to each key and the second file offset of the corresponding compressed ordered data block in the data file.
  • The graph state data management device may further include: a second data encoding unit that, in response to receiving a graph state data read request from the graph computing engine, encodes the data ID in the read request as a target key, the data ID including a vertex ID and/or a starting point ID of an edge;
  • a logical address query unit that queries the memory index for the corresponding logical address based on the target key;
  • a data acquisition unit that obtains the value corresponding to the target key according to the logical address;
  • a data decoding unit that decodes the acquired value to obtain the target graph state data; and
  • a data providing unit that provides the obtained target graph state data to the graph computing engine.
  • The data acquisition unit may include: a data acquisition request initiating module that initiates a data acquisition request to the file storage system in response to finding the corresponding logical address, the data acquisition request including that logical address; and a data acquisition module that receives from the file storage system the value returned in response to the data acquisition request, the returned value being obtained by the file storage system from its data files according to the corresponding logical address.
  • a data LRU cache is maintained in the memory of the graph state management device, and the data LRU cache is used to cache a previously obtained value in association with a corresponding logical address of a key.
  • The data acquisition unit may further include: a data cache judging module that, before a data acquisition request is initiated to the file storage system, judges according to the logical address whether the data LRU cache has cached the value corresponding to the target key, wherein, when it has, the data acquisition module obtains the corresponding value from the data LRU cache.
  • In some embodiments, the values of the graph state data are constructed into multiple ordered data blocks with a first data size and, after data compression, written into a data file of the file storage system. The data file includes each compressed ordered data block and a metadata block; the metadata block records the mapping between the first file offset corresponding to each key and the second file offset of the corresponding compressed ordered data block in the data file.
  • The data acquisition unit may include: a data block acquisition request initiating module that, in response to finding the corresponding logical address, initiates a data block acquisition request to the file storage system, the request including the corresponding logical address; a data block acquisition module that receives from the file storage system the compressed data block returned in response to the request, the compressed data block being obtained by the file storage system from its data file according to the first file offset; a data block decompression module that decompresses the obtained compressed data block; an offset address determination module that determines, based on the first file offset in the logical address and the first data size, a third offset of the value corresponding to the target key within the decompressed data block; and a data acquisition module that obtains the value corresponding to the target key from the decompressed data block according to the third offset.
  • a data block LRU cache is maintained in the memory of the graph state management device, and the data block LRU cache is used to cache previously acquired data blocks in association with the corresponding logical address of the key .
  • The data acquisition unit may further include: a data block cache judging module that, before a data block acquisition request is initiated to the file storage system, judges according to the logical address whether the data block LRU cache has cached the compressed data block corresponding to the target key, wherein, when it has, the data block acquisition module obtains the corresponding compressed data block from the data block LRU cache.
  • The graph state data management device may further include: a data filtering unit that performs data filtering on the obtained graph state data using a given data filtering policy before the obtained graph state data is provided to the graph computing engine.
  • The graph state data management device may further include: a memory index update judging unit that, after the values of the kv list data are written into the data files in the file storage system in sequence and the corresponding logical address of each key is recorded, judges whether the memory index needs to be updated; in response to judging that it does, the corresponding logical addresses in the memory index undergo an incremental logical address update using the recorded logical address of each key.
  • The graph state data management device may further include: a data aggregation unit that, in response to a data aggregation condition being satisfied, aggregates the graph state data stored in the data files of the file storage system using a given data aggregation policy.
  • A graph state data management device is provided, including: at least one processor, a memory coupled to the at least one processor, and a computer program stored in the memory, wherein the at least one processor executes the computer program to implement the graph state data management method described above.
  • a computer-readable storage medium storing executable instructions, which when executed cause a processor to execute the graph state data management method as described above.
  • A computer program product is provided, including a computer program which, when executed by a processor, implements the graph state data management method described above.
  • Fig. 1 shows an example diagram of a graph state management architecture according to an embodiment of the present specification.
  • Fig. 2 shows an example flow chart of a method for writing graph state data according to an embodiment of the present specification.
  • Fig. 3 shows an example diagram of kv list data according to an embodiment of the present specification.
  • Fig. 4 shows an example schematic diagram of a memory index structure according to an embodiment of the present specification.
  • Fig. 5 shows an example schematic diagram of a data file writing process according to an embodiment of the present specification.
  • Fig. 6 shows an example schematic diagram of a data file into which graph state data is written according to an embodiment of the present specification.
  • Fig. 7 shows another example flow chart of a method for writing graph state data according to an embodiment of the present specification.
  • Fig. 8 shows an example flow chart of a memory index update process according to an embodiment of the present specification.
  • Fig. 9 shows an example schematic diagram of an updated memory index structure according to an embodiment of the present specification.
  • Fig. 10 shows an example flow chart of a graph state data compaction process according to an embodiment of the present specification.
  • Fig. 11 shows an example flow chart of a method for reading graph state data according to an embodiment of the present specification.
  • Fig. 12 shows an example flow chart of a value acquisition process according to an embodiment of the present specification.
  • Fig. 13 shows another example flow chart of a value acquisition process according to an embodiment of the present specification.
  • Fig. 14 shows an example block diagram of a graph state data management device according to an embodiment of the present specification.
  • Fig. 15 shows an example block diagram of a first data writing unit according to an embodiment of the present specification.
  • Fig. 16 shows another example block diagram of a graph state data writing component according to an embodiment of the present specification.
  • Fig. 17 shows an example block diagram of a data acquisition unit according to an embodiment of the present specification.
  • Fig. 18 shows another example block diagram of a data acquisition unit according to an embodiment of the present specification.
  • Fig. 19 shows a schematic diagram of an example of a graph state data management device implemented based on a computer system according to an embodiment of the present specification.
  • the term “comprising” and its variants represent open terms meaning “including but not limited to”.
  • the term “based on” means “based at least in part on”.
  • the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
  • the term “another embodiment” means “at least one other embodiment.”
  • The terms “first”, “second”, etc. may refer to different objects or to the same object. Other definitions, whether express or implied, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
  • Fig. 1 shows an example diagram of a graph state management architecture 1 according to an embodiment of the present specification.
  • The graph state management architecture 1 includes a graph computing engine 10, a graph state management device 20, and a file storage system 30.
  • the graph computation engine 10 is configured to perform graph computations using graph data.
  • the graph computing engine 10 abstracts real business data into a graph data structure.
  • Graph data can include vertex data and edge data.
  • Vertex data may include, for example, vertex identification and vertex attributes.
  • vertex identification may include a vertex ID and a vertex type.
  • the vertex identification may only include the vertex ID.
  • Vertex IDs are used to uniquely identify vertices in graph data.
  • Edge data may include edge identifiers and edge attributes. An edge identifier may include a starting point ID, an edge type, an edge timestamp, and an end point ID. Alternatively, the edge identifier may include only a starting point ID and an end point ID.
  • Vertex IDs, edge IDs, vertex attributes, and edge attributes can be business-related.
  • the vertex ID may be a person's ID number or personnel number.
  • the vertex type may be the classification to which the vertex belongs, for example, the vertex is classified as a user class vertex.
  • Vertex attributes can include age, education, address, occupation, etc.
  • The edge type is used to indicate the type of the edge. For example, if a transfer edge is created between vertices A and B, the edge type of the transfer edge can be "transfer".
  • The edge attributes may include attributes of edges formed between vertices. For example, for the above transfer edge, the edge attributes may include "amount", "currency", "operating equipment", and so on.
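The vertex and edge shapes described above can be summarized as simple records. This is an illustrative sketch: the concrete fields beyond IDs, types, timestamps, and attribute maps are examples drawn from the text, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vertex_id: str                 # uniquely identifies the vertex, e.g. an ID number
    vertex_type: str = "user"      # classification the vertex belongs to
    attributes: dict = field(default_factory=dict)   # age, address, occupation, ...

@dataclass
class Edge:
    src_id: str                    # starting point ID
    dst_id: str                    # end point ID
    edge_type: str = "transfer"    # e.g. a transfer edge between two vertices
    timestamp: int = 0             # edge timestamp
    attributes: dict = field(default_factory=dict)   # amount, currency, ...
```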
  • Graph computations performed by the graph computing engine 10 are iterative. During each round of iteration, the graph computing engine 10 generates intermediate results, which may be called graph state data.
  • the graph computing engine 10 may include any graph computing engine applicable in the art. In some real-time graph computing application scenarios, the graph computing engine 10 may have graph computing capabilities that integrate streaming computing and graph computing.
  • the graph state data generated by the graph computing engine 10 is provided to the graph state management device 20 .
  • The graph state management device 20 includes a graph state data management apparatus 21 and a memory 22.
  • The graph state data management apparatus 21 is configured to manage the graph state data, for example, to write (store) the graph state data into the file storage system 30, and to perform data update, data reading, data filtering, expired data removal, and/or data aggregation on the graph state data written into the file storage system 30.
  • the graph computation engine 10 and the graph state management device 20 can be deployed independently.
  • The graph state management device 20 can be integrated with the graph computing engine 10. In this case, the graph state management device 20 can share the same memory with the graph computing engine 10.
  • the file storage system 30 may also be called an external storage system, for example, a cloud file storage system and the like.
  • the file storage system 30 may support multiple data backups or other data disaster recovery mechanisms to ensure data reliability.
  • file storage system 30 may be a distributed file storage system.
  • Fig. 2 shows an example flow chart of a method 200 for writing graph state data according to an embodiment of the present specification.
  • the graph state data management device acquires batches of graph state data from the graph computing engine.
  • the obtained graph state data may include vertex data and/or edge data.
  • Vertex data can include vertex ID, vertex metadata and vertex attributes, etc.
  • Edge data may include edge start ID and end point ID, edge metadata, edge attributes, and so on.
  • The vertex metadata and edge metadata can be fixed at 8 bytes, and can include timestamp information, whether the record is a vertex, whether the edge is an outgoing or incoming edge, a user-defined label, and so on.
  • Vertex attributes and/or edge attributes can be custom attributes.
  • the graph state data management device encodes each graph state data in the acquired graph state data into kv data.
  • When the graph state data is vertex data, the vertex ID in the vertex data is encoded as the key of the kv data, and the non-vertex ID data in the vertex data is encoded as the value. Non-vertex ID data may include, for example, vertex metadata, vertex attributes, and the like.
  • When the graph state data is edge data, the starting point ID in the edge data is encoded as the key of the kv data, and the non-starting point ID data in the edge data is encoded as the value. Non-starting point ID data may include, for example, the end point ID, edge metadata, edge attributes, and the like.
  • the graph state data management device sorts the encoded kv data based on the key of the kv data to form kv list data.
  • The kv data can be sorted by the value of the vertex ID or the starting point ID of the edge, and values with the same key can be aggregated to form the kv list data. For example, if two or more pieces of graph state data have the same key, their values are aggregated together.
  • each key can correspond to one or more values.
  • Fig. 3 shows an example diagram of kv list data according to an embodiment of the present specification.
  • Suppose 5 pieces of graph state data are obtained from the graph computing engine, and encoding them yields 5 pieces of kv data: (K1,V1), (K2,V2), (K2,V3), (K2,V4), and (K3,V5). After sorting and aggregation, the kv list data shown on the right is obtained: K1 corresponds to V1; K2 corresponds to V2, V3, and V4; and K3 corresponds to V5.
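The Fig. 3 example reduces to a sort-and-group step:

```python
# The five kv pairs from the Fig. 3 example: sort by key, then aggregate
# the values that share a key to form the kv list data.
pairs = [("K1", "V1"), ("K2", "V2"), ("K2", "V3"), ("K2", "V4"), ("K3", "V5")]

kv_list = {}
for key, value in sorted(pairs):
    kv_list.setdefault(key, []).append(value)

# kv_list == {"K1": ["V1"], "K2": ["V2", "V3", "V4"], "K3": ["V5"]}
```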
  • After the kv list data is obtained as above, at 240 the graph state data management device writes the values of the kv list data into a data file in the file storage system in sequence, and at 250 records the logical address of each key in the data file. The recorded logical address includes the file ID of the data file into which the value corresponding to the key is written and the first file offset of the corresponding value in that data file.
  • sequential writing refers to sequentially writing the values into the data file according to the sorting of the keys corresponding to the values.
  • the graph state data management device maintains the memory index of the written batch graph state data in the memory of the graph state management device, and the memory index is used to reflect the index relationship between the key and the corresponding logical address.
  • Various memory index structures can be supported, such as FST, skip lists, and CSR.
  • The index of the memory index corresponds to the keys encoded from the vertex IDs or edge starting point IDs; that is, the index is built from the sorted keys, and each value of the memory index is the logical address, in the data files of the file storage system, of the value corresponding to that key.
  • Fig. 4 shows an example schematic diagram of a memory index structure according to an embodiment of the present specification.
  • In this example, the memory index is stored as a Java array structure.
  • The index of the array structure corresponds to the keys encoded from the vertex IDs or edge starting point IDs; that is, the index is built from the sorted keys, and each value in the array structure is the logical address, in the data files of the file storage system, of the value corresponding to that key.
  • The storage location corresponding to index1 records the file ID (fid) of the data file into which the value corresponding to key 12 is written and the first file offset (offset) of that value in the data file.
  • The storage location corresponding to index2 records the file ID (fid) of the data file into which the value corresponding to key 15 is written and the first file offset (offset) of that value in the data file.
  • The storage location corresponding to index3 records the file ID (fid) of the data file into which the value corresponding to key 23 is written and the first file offset (offset) of that value in the data file.
  • the file ID (fid) of the data file and the first file offset address (offset) of the corresponding value in the data file form a posting structure in the memory index structure.
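The array-plus-posting layout described above can be sketched as follows. This is an illustrative sketch in Python; the class and method names are invented, not taken from the patent.

```python
import bisect

# Illustrative sketch of the memory index: because the keys are sorted, the
# array index stands in for the key, and each slot holds a posting
# (fid, offset) locating the key's value in a data file.
class MemoryIndex:
    def __init__(self, sorted_keys):
        self.keys = sorted_keys                    # keys in sorted order
        self.postings = [None] * len(sorted_keys)  # one (fid, offset) per key

    def record(self, key, fid, offset):
        i = bisect.bisect_left(self.keys, key)     # index derived from key order
        self.postings[i] = (fid, offset)

    def lookup(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.postings[i]
        return None

idx = MemoryIndex([12, 15, 23])
idx.record(12, fid=1, offset=0)
idx.record(15, fid=1, offset=128)
idx.record(23, fid=2, offset=0)
print(idx.lookup(15))  # → (1, 128)
```

The posting `(fid, offset)` corresponds to the fid+offset pairs shown in Fig. 4.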
  • data compression may be performed on the value of the sorted kv list data, and the compressed value may be written into the file storage system.
  • FIG. 5 shows an exemplary schematic diagram of a data file writing process 500 according to an embodiment of the present specification.
  • the value written into the data file has undergone data compression.
  • the graph state data management device constructs the values of the kv list data into a plurality of ordered data blocks with a first data size.
  • the ordered data blocks are constructed in sequence according to the sorting of the keys corresponding to the values, and the ordered data blocks are also sorted.
  • the keys corresponding to the values in an earlier-sorted ordered data block are all sorted before the keys corresponding to the values in any later ordered data block.
  • the constructed ordered data blocks may have the same data size, for example, 64k.
  • the graph state data management device performs data compression on the constructed ordered data blocks.
  • the same data compression algorithm may be used to perform data compression on the constructed ordered data blocks, so that the data sizes of the ordered data blocks after data compression are the same.
  • the graph state data management device sequentially writes the compressed ordered data blocks into a data file in the file storage system, and the data file includes each compressed ordered data block and metadata block.
  • the mapping relationship recorded in the metadata can be many-to-one or one-to-many, that is, multiple first file offset addresses correspond to one second file offset address, or one first file offset address corresponds to multiple second file offset addresses.
  • the values corresponding to 01 and 12 are constructed as data block 1 (block 1), the value corresponding to 15 is constructed as data block 2 (block 2), and the value corresponding to 23 is constructed as data blocks 3 and 4 (block 3 and block 4).
  • the first file offset address offset1 corresponding to 01 and the first file offset address offset2 corresponding to 12 form a mapping relationship with the second file offset address of block 1 in the data file
  • the first file offset address offset3 corresponding to 15 forms a mapping relationship with the second file offset address of block 2 in the data file
  • the first file offset address offset4 corresponding to 23 forms a mapping relationship with the second file offset addresses of block 3 and block 4 in the data file.
  • FIG. 6 shows an example schematic diagram of a data file in which graph state data is written according to an embodiment of the present specification.
  • the data file includes several data-compressed ordered data blocks (for example, data block 1, data block 2, ..., data block n) and a metadata block, and the values corresponding to the keys are stored in the ordered data blocks in sequence.
  • Each ordered data block can store values corresponding to one or more keys.
  • the value corresponding to a key may also be stored in two or more adjacent ordered data blocks.
  • the metadata block is stored at the end of the data file.
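Under stated assumptions (a fixed pre-compression block size, a JSON metadata block, and an 8-byte footer pointing at it — all invented details, as the patent does not fix an on-disk encoding), the layout above might be written roughly like this:

```python
import io
import json
import struct
import zlib

BLOCK_SIZE = 64  # the "first data size"; the text uses e.g. 64k, shrunk here for illustration

def write_data_file(sorted_kv):
    # 1. Lay the sorted values out contiguously, recording each key's
    #    first file offset address.
    buf = b""
    first_offsets = {}
    for key, value in sorted_kv:
        first_offsets[key] = len(buf)
        buf += value
    # 2. Cut the layout into ordered data blocks of BLOCK_SIZE and compress
    #    each block before writing it.
    f = io.BytesIO()
    block_offsets = []  # second file offsets of the compressed blocks
    for start in range(0, len(buf), BLOCK_SIZE):
        block_offsets.append(f.tell())
        f.write(zlib.compress(buf[start:start + BLOCK_SIZE]))
    # 3. Trailing metadata block: first file offset -> second file offset of
    #    the block the value starts in. (A value may span several blocks,
    #    giving the one-to-many case; this sketch records only the start.)
    meta = {str(off): block_offsets[off // BLOCK_SIZE]
            for off in first_offsets.values()}
    meta_pos = f.tell()
    f.write(json.dumps(meta).encode())
    f.write(struct.pack("<Q", meta_pos))  # footer locating the metadata block
    return f.getvalue(), first_offsets
```

A reader would first read the 8-byte footer, then the metadata block, and only then the compressed blocks it needs.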
  • FIG. 7 shows another exemplary flow chart of a method 700 for writing graph state data according to an embodiment of this specification.
  • a mutable data table (mutable table) and an immutable data table (immutable table) are maintained in the memory of the graph state management device.
  • the kv data can be continuously written in the mutable table.
  • the stored data and data ordering (data storage order) in the immutable table are locked and will not change; that is, writing kv data to an immutable table is not allowed.
  • the graph state data management device can use the above data encoding method to encode each graph state data in the obtained graph state data into kv data .
  • the graph state data management device writes the encoded kv data into a mutable table. Specifically, it determines whether an idle mutable table exists in the memory of the graph state management device; if one exists, the encoded kv data is written into the idle mutable table, and if not, a new mutable table is created in memory and the encoded kv data is written into the new mutable table.
  • mutable table is equivalent to a section of memory space in memory.
  • mutable tables can be created one by one.
  • multiple mutable tables can be created at one time, and the encoded kv data can be written into the multiple mutable tables in parallel.
  • the graph state data management device judges whether the mutable table written with kv data reaches a threshold, for example, 64M. If the threshold is not reached, the process returns to 702 and the graph state data management device continues to write kv data. If the threshold is reached, then at 704, the graph state data management device sorts the kv data in the mutable table based on the key to form kv list data, thereby ensuring that vertex data with the same vertex ID or edge data with the same edge start ID is aggregated in the mutable table.
  • the graph state data management device encapsulates the sorted mutable table into an immutable table.
  • An immutable table can have a specified data size, for example, 64M.
  • the graph state data management device writes the values in the immutable table in order (according to the sorting result of the corresponding keys) into the data file of the file storage system. For example, the graph state data management device writes the values in the immutable table into the data file of the file storage system in sequence through an asynchronous thread. At 707, the graph state data management device records the corresponding logical address of each key in the data file.
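A minimal sketch of this mutable-to-immutable write path; the names and the entry-count threshold are invented (the text uses a byte-size threshold such as 64M):

```python
THRESHOLD = 4  # entry count for illustration; the text uses a byte size, e.g. 64M

class WritePath:
    def __init__(self):
        self.mutable = []      # open table still accepting kv writes
        self.immutables = []   # frozen, sorted tables awaiting flush to file storage

    def put(self, key, value):
        self.mutable.append((key, value))
        if len(self.mutable) >= THRESHOLD:
            # Sort by key so data with the same vertex ID / edge start ID
            # aggregates, then freeze the table (no further writes allowed).
            self.mutable.sort(key=lambda kv: kv[0])
            self.immutables.append(tuple(self.mutable))
            self.mutable = []

w = WritePath()
for k, v in [(23, "c"), (12, "a"), (15, "b"), (12, "a2")]:
    w.put(k, v)
print(w.immutables[0])  # → ((12, 'a'), (12, 'a2'), (15, 'b'), (23, 'c'))
```

Flushing an immutable table to a data file and recording logical addresses would follow as separate, asynchronous steps.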
  • the graph state data management device may also determine whether to update the memory index. For example, the graph state data management device may check whether the currently written graph state data is the first batch of graph state data, or check whether the currently written graph state data is the first data written after data compaction is performed on the data files of the file storage system, to determine whether to update the memory index. If the currently written graph state data is the first batch of graph state data, or is the first data written after the data files of the file storage system are compacted, it is judged that no memory index update is required. If the currently written graph state data is not the first batch of graph state data and is not the first data written after the data files of the file storage system are compacted, it is judged that a memory index update is required.
  • the graph state data management device performs an incremental index update on the memory index maintained in the memory of the graph state management device. If it is determined that no memory index update is needed, then at 710, the graph state data management device maintains the memory index from the key to the logical address in the memory of the graph state management device.
  • FIG. 8 shows an example flowchart of an in-memory index update process 800 according to an embodiment of the specification.
  • the graph state data management device obtains the initial logical address from the memory index based on the key corresponding to the value written in the data file, that is, the logical address stored when the graph state data was last written.
  • the graph state data management device merges the initial logical address and the incremental logical address when the graph state data is written this time.
  • the merging process refers to adding the incremental logical address after the initial logical address.
  • the graph state data management device records the merged logical address into the memory index.
  • Fig. 9 shows an exemplary schematic diagram of an updated memory index structure according to an embodiment of the present specification.
  • fd1+offset1 and fd2+offset2 are the initial logical addresses, and fd3+offset3 is the logical address corresponding to 12 when the graph state data is written this time.
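The merge described in Figs. 8 and 9 amounts to appending the incremental posting after the existing ones; a hedged sketch (names invented):

```python
# Each key's index entry is a list of (fid, offset) postings; a new batch
# write appends its posting after the initial ones rather than overwriting.
def update_index(index, key, new_posting):
    index.setdefault(key, []).append(new_posting)

index = {12: [("fd1", "offset1"), ("fd2", "offset2")]}  # initial logical addresses
update_index(index, 12, ("fd3", "offset3"))             # this batch's incremental address
print(index[12])  # → [('fd1', 'offset1'), ('fd2', 'offset2'), ('fd3', 'offset3')]
```

Because writes are append-only, a read later walks the whole posting list to gather every value written for the key.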
  • the graph state data management device may determine whether the data compaction condition is satisfied.
  • data compaction can refer to aggregating the values corresponding to the same key that are distributed across multiple different data files into the same data file, or deleting expired graph state data from the graph state data stored in the data files.
  • Data compaction conditions may include, but are not limited to: the number of fids included in the logical address corresponding to the same key exceeds a predetermined value; the data size of the value corresponding to the same key exceeds a predetermined threshold; and so on.
  • the graph state management device executes the data compaction process, and at 713, writes the compacted graph state data into the data files of the file storage system again.
  • For details, reference may be made to the graph state data writing process described above with reference to FIG. 2.
  • FIG. 10 shows an example flow diagram of a graph state data compaction process 1000 according to an embodiment of the specification.
  • the graph state data management device obtains the logical addresses corresponding to each key from the memory index in sequence, and at 1020, obtains the value corresponding to each key according to the logical address corresponding to that key.
  • After obtaining the value corresponding to each key, at 1030, the graph state data management device performs data compaction on the obtained values. For example, the obtained values are reordered based on the key, or expired values are deleted based on the time to live (TTL) of each value.
  • the value corresponding to the same key can be written into the same data file as much as possible, thereby reducing the data reading time when reading graph state data.
  • the amount of data written into the data file can be reduced, thereby reducing the storage space occupied by the file storage system.
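A sketch of such a compaction pass, under the assumption that each stored value carries its write time and that `read_value` resolves a posting to a `(value, write_time)` pair (both invented for illustration):

```python
def compact(index, read_value, now, ttl):
    """Gather every value a key has across data files via its posting list,
    drop values whose age exceeds the TTL, and rewrite the survivors into
    one new file so each key afterwards lives in a single data file."""
    new_file = []   # stands in for a freshly written data file
    new_index = {}
    for key in sorted(index):
        values = [read_value(p) for p in index[key]]            # from all files
        live = [(v, ts) for v, ts in values if now - ts < ttl]  # TTL filter
        if live:
            new_index[key] = [("new", len(new_file))]  # one posting per key now
            new_file.extend(live)
    return new_file, new_index

# Invented miniature store: posting -> (value, write_time)
store = {("f1", 0): ("a", 100), ("f2", 0): ("a2", 190), ("f1", 1): ("b", 10)}
idx = {12: [("f1", 0), ("f2", 0)], 15: [("f1", 1)]}
new_file, new_idx = compact(idx, store.get, now=200, ttl=150)
```

Here key 15's only value (written at time 10, age 190) is expired and dropped, while key 12's two values end up adjacent in the single new file.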
  • the graph state data writing process according to the embodiment of the present specification is described above. After the graph state data is written into the data file of the file storage system, when the graph computing engine executes the graph calculation again, it needs to read the graph state data of the previous iterative computing process from the file storage system.
  • FIG. 11 shows an example flowchart of a method 1100 for reading map state data according to an embodiment of the present specification.
  • in response to receiving a graph state data read request from the graph computing engine, the graph state data management device encodes the data ID in the graph state data read request as a target key.
  • if the graph state data requested to be read is vertex data, the data ID is the vertex ID.
  • if the graph state data requested to be read is edge data, the data ID is the edge start ID.
  • if the graph state data requested to be read includes both vertex data and edge data, the data ID includes the vertex ID and the edge start ID.
  • the graph state data management apparatus queries the corresponding logical address in the memory index maintained in the memory of the graph state management device based on the obtained target key.
  • After querying the corresponding logical address, at 1130, the graph state data management device acquires the value corresponding to the target key according to the queried logical address.
  • FIG. 12 shows an example flowchart of a value acquisition process 1200 according to an embodiment of the present specification.
  • a data LRU cache is maintained in the memory of the graph state management device, and the data LRU cache is used to cache the previously obtained value in association with the corresponding logical address of the key.
  • the graph state data management device uses the logical address to perform a data cache query in the data LRU cache, and at 1220, judges whether the value corresponding to the target key is cached in the data LRU cache.
  • if the value is cached, the graph state data management device obtains the corresponding value from the data LRU cache.
  • if the value is not cached, the graph state data management device initiates a data acquisition request to the file storage system, and the initiated data acquisition request includes the corresponding logical address.
  • the file storage system acquires the corresponding value in the data file of the file storage system according to the logical address in the data acquisition request. For example, the file storage system can use the file ID in the logical address to find the corresponding data file, and obtain the corresponding value from the data file according to the offset address of the first file.
  • the graph state data management device receives the queried value from the file storage system.
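The cache-then-storage lookup can be sketched with an ordered dict as the LRU structure; `fetch_from_file_storage` stands in for the data acquisition request and is an invented name:

```python
from collections import OrderedDict

class DataLRUCache:
    """Caches previously obtained values keyed by their logical address."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.cache = OrderedDict()  # logical address -> value

    def get(self, addr, fetch_from_file_storage):
        if addr in self.cache:
            self.cache.move_to_end(addr)         # cache hit: refresh recency
            return self.cache[addr]
        value = fetch_from_file_storage(addr)    # cache miss: ask file storage
        self.cache[addr] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used
        return value

calls = []
def fetch_from_file_storage(addr):
    calls.append(addr)                           # stands in for a storage request
    return "value@" + str(addr)

cache = DataLRUCache(capacity=2)
v1 = cache.get(("f1", 0), fetch_from_file_storage)  # miss: fetched from storage
v2 = cache.get(("f1", 0), fetch_from_file_storage)  # hit: served from cache
```

Only the first read reaches the file storage system; the second is satisfied from memory.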
  • FIG. 13 shows another example flow chart of a value acquisition process 1300 according to an embodiment of this specification.
  • the values of the graph state data are constructed as a plurality of ordered data blocks with the first data size and written into the data file of the file storage system after data compression.
  • the data file includes each compressed ordered data block and a metadata block, and the metadata in the metadata block records the mapping relationship between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block in the data file.
  • a data block LRU cache is maintained in the memory of the graph state management device, and the data block LRU cache is used to cache previously acquired data blocks in association with the corresponding logical address of the key.
  • the graph state data management device uses the logical address to perform a data block cache query in the data block LRU cache, and at 1320, judges whether the compressed data block corresponding to the target key is cached in the data block LRU cache.
  • if it is cached, the graph state data management device obtains the corresponding compressed data block from the data block LRU cache, and proceeds to 1360.
  • if it is not cached, the graph state data management device initiates a data block acquisition request to the file storage system, and the data block acquisition request includes the corresponding logical address.
  • the file storage system acquires the corresponding compressed data block in the data file of the file storage system according to the logical address in the data block acquisition request. Specifically, the file storage system can use the file ID in the logical address to find the corresponding data file.
  • the file storage system obtains the corresponding metadata from the metadata block of the data file, and uses the mapping relationship, recorded in the corresponding metadata, from the first file offset address to the second file offset address of the compressed data block in the data file to determine the second file offset address of the compressed data block in the data file. Then, the file storage system acquires the corresponding compressed data block from the data file based on the second file offset address.
  • the graph state data management device receives the compressed data block returned in response to the data block acquisition request from the file storage system, and then proceeds to 1360 .
  • the graph state data management device decompresses the obtained compressed data block.
  • the graph state data management device determines, based on the first file offset address in the logical address and the first data size (that is, the data size of an ordered data block before compression), the third offset address of the value corresponding to the target key in the decompressed ordered data block. For example, a modulo operation with the first data size as the modulus can be performed on the first file offset address, and the obtained remainder is the third offset address of the value corresponding to the target key in the decompressed ordered data block.
  • the graph state data management device obtains the value corresponding to the target key from the decompressed ordered data block according to the third offset address.
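The modulo computation above can be sketched as follows; `value_len` is an invented parameter, since the text does not specify how the value's length within the block is known:

```python
import zlib

BLOCK_SIZE = 64  # the first data size: uncompressed ordered-block size

def read_value(first_offset, compressed_block, value_len):
    # Decompress the fetched block, then locate the value inside it: the
    # "third offset" is the first file offset modulo the first data size.
    block = zlib.decompress(compressed_block)
    third_offset = first_offset % BLOCK_SIZE
    return block[third_offset:third_offset + value_len]

block = bytes(range(64))  # an uncompressed ordered data block
# A value at first file offset 70 lands at offset 70 % 64 = 6 in its block.
print(read_value(70, zlib.compress(block), 4))  # → b'\x06\x07\x08\t'
```

A value that spans two blocks (the one-to-many mapping case) would need the next block fetched and decompressed as well; this sketch handles only the single-block case.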
  • the graph state data management device decodes the obtained value to obtain the target graph state data.
  • the graph state data management device may decode the obtained value to obtain the data of the non-data ID part in the target graph state data.
  • the decoded data that is not part of the data ID can be used as the target graph state data.
  • the decoded non-data ID part data may be combined with the data ID to obtain the target graph state data.
  • the graph state data management device provides the obtained target graph state data to the graph computing engine for the graph computing engine to perform graph computing.
  • the graph state data management method may further include: performing data filtering on the obtained graph state data using a given data filtering strategy. For example, expired graph state data is deleted from the obtained graph state data using a TTL-based expired data removal mechanism.
  • data filtering can also be performed on the obtained graph state data based on other data filtering conditions.
  • FIG. 14 shows an example block diagram of a graph state data management apparatus 1400 according to an embodiment of the present specification.
  • the graph state data management apparatus 1400 may include a graph state data writing component.
  • the graph state data writing component is configured to write the graph state data obtained in batches from the graph computing engine into the file storage system in batches.
  • the graph state data writing component may include a graph state data acquisition unit 1401, a first data encoding unit 1402, a data sorting unit 1403, a first data writing unit 1404, a logical address recording unit 1405, and a memory index maintenance unit 1406.
  • the graph state data obtaining unit 1401 is configured to obtain in batches graph state data obtained by the graph computation engine during graph computation, the graph state data including vertex data and/or edge data.
  • For operations of the graph state data acquiring unit 1401, reference may be made to the operations described above with reference to 210 of FIG. 2.
  • the first data encoding unit 1402 is configured to encode each graph state data in the acquired graph state data into kv data.
  • if the graph state data is vertex data, the vertex ID in the vertex data is encoded as the key in the kv data, and the non-vertex ID data in the vertex data is encoded as the value in the kv data.
  • Non-vertex ID data may include, for example, vertex metadata, vertex attributes, and the like.
  • if the graph state data is edge data, the start ID in the edge data is encoded as the key in the kv data, and the non-start ID data in the edge data is encoded as the value in the kv data.
  • Non-origin ID data may include, for example, end ID, edge metadata, edge attributes, and the like.
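A hedged sketch of this encoding rule; the dict field names and the tuple layout are invented for illustration and are not specified by the patent:

```python
def encode_vertex(vertex):
    # key = vertex ID; value = the non-vertex-ID data (metadata, attributes)
    return vertex["id"], (vertex["meta"], vertex["attrs"])

def encode_edge(edge):
    # key = edge start ID; value = end ID plus edge metadata and attributes
    return edge["src"], (edge["dst"], edge["meta"], edge["attrs"])

k, v = encode_edge({"src": 12, "dst": 23, "meta": "m", "attrs": {"w": 1.5}})
print(k, v)  # → 12 (23, 'm', {'w': 1.5})
```

Keying edges by their start ID is what lets all out-edges of a vertex aggregate under one key after sorting.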
  • the data sorting unit 1403 is configured to sort the encoded kv data based on the key of the kv data to form kv list data.
  • each key corresponds to one or more values.
  • the first data writing unit 1404 is configured to sequentially write the values of the kv list data into data files in the file storage system.
  • the operation of the first data writing unit 1404 may refer to the operation described above with reference to 240 of FIG. 2 .
  • the logical address recording unit 1405 is configured to record the corresponding logical address of each key in the data file, and the logical address includes the file ID of the data file into which the value corresponding to the key is written and the first file offset address of the corresponding value in that data file.
  • the operation of the logical address recording unit 1405 may refer to the operation described above with reference to 250 of FIG. 2 .
  • the memory index maintenance unit 1406 is configured to maintain the obtained memory index of the graph state data in the memory of the graph state management device, and the memory index is used to reflect the index relationship between the key and the corresponding logical address.
  • For operations of the memory index maintenance unit 1406, reference may be made to the operations described above with reference to 260 in FIG. 2.
  • data compression may be performed on the value of the sorted kv list data, and the compressed value may be written into the file storage system.
  • FIG. 15 shows an example block diagram of a first data writing unit 1500 according to an embodiment of the present specification.
  • the first data writing unit 1500 can compress the value in the kv data and then write it into a data file in the file storage system.
  • the first data writing unit 1500 includes a data block construction module 1510 , a data block compression module 1520 and a data block writing module 1530 .
  • the data block construction module 1510 is configured to construct the value of the kv list data into a plurality of ordered data blocks with a first data size.
  • For operations of the data block construction module 1510, reference may be made to the operation described above with reference to 510 in FIG. 5.
  • the data block compression module 1520 is configured to perform data compression on the constructed ordered data blocks.
  • the operation of the data block compression module 1520 may refer to the operation described above with reference to 520 of FIG. 5 .
  • the data block writing module 1530 is configured to sequentially write the data-compressed ordered data blocks into data files in the file storage system; the data files include each data-compressed ordered data block and a metadata block, and the metadata in the metadata block records the mapping relationship between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block in the data file.
  • the operation of the data block writing module 1530 may refer to the operation described above with reference to 530 of FIG. 5 .
  • FIG. 16 shows another example block diagram of a graph state data writing component 1600 according to an embodiment of the specification.
  • a mutable data table and an immutable data table are maintained in the memory of the graph state management device.
  • the graph state data writing component includes a graph state data acquisition unit 1610, a first data encoding unit 1620, a data sorting unit 1630, a first data writing unit 1640, a logical address recording unit 1650, and a memory index maintenance unit 1660 , a second data writing unit 1670 , a first judging unit 1680 and a data table converting unit 1690 .
  • the graph state data acquisition unit 1610 acquires graph state data obtained by the graph computing engine during graph computation in batches, and the graph state data includes vertex data and/or edge data.
  • the first data encoding unit 1620 encodes each graph state data in the acquired graph state data into kv data.
  • if the graph state data is vertex data, the vertex ID is encoded as the key and the non-vertex ID data (which may include, for example, vertex metadata, vertex attributes, and the like) as the value.
  • if the graph state data is edge data, the start ID is encoded as the key and the non-start ID data (which may include, for example, the end ID, edge metadata, edge attributes, and the like) as the value.
  • the second data writing unit 1670 writes the encoded kv data into the variable data table.
  • the first judging unit 1680 judges whether the data size of the variable data table in which the kv data is written reaches a threshold.
  • the data sorting unit 1630 sorts the kv data written in the variable data table based on the key of the kv data to form kv list data. After sorting the kv data in the variable data table, the data table conversion unit 1690 converts the sorted variable data table into an immutable data table.
  • the first data writing unit 1640 writes the value of the immutable data table into data files in the file storage system in sequence, and each immutable data table corresponds to a data file.
  • the logical address recording unit 1650 records the corresponding logical address of each key in the data file, and the logical address includes the file ID of the data file in which the value corresponding to the key is written and the first position of the corresponding value in the written data file. A file offset address.
  • After the immutable data table is written into the data file in the file storage system and the corresponding logical address of each key is recorded, the memory index maintenance unit 1660 maintains the memory index of the obtained graph state data in the memory of the graph state management device, and the memory index is used to reflect the index relationship between the key and the corresponding logical address.
  • the first data writing unit 1640 in FIG. 16 can also be implemented by using the first data writing unit 1500 shown in FIG. 15 .
  • the data block construction module is configured to construct the value in the immutable data table into a plurality of ordered data blocks with the first data size.
  • the first data writing unit 1640 and the second data writing unit 1670 can be realized by using the same unit.
  • the graph state data management apparatus 1400 may include a graph state data reading component.
  • the graph state data reading component is configured to read corresponding graph state data and return it to the graph computing engine in response to receiving a graph state data reading request from the graph computing engine.
  • the graph state data reading component may include a second data encoding unit 1407 , a logical address query unit 1408 , a data obtaining unit 1409 , a data decoding unit 1410 and a data providing unit 1411 .
  • the second data encoding unit 1407 is configured to encode the data ID in the graph state data read request as a target key in response to receiving the graph state data read request from the graph computing engine.
  • the operation of the second data encoding unit 1407 may refer to the operation described above with reference to 1110 of FIG. 11 .
  • the logical address query unit 1408 is configured to query the corresponding logical address in the memory index maintained in the memory of the graph state management device based on the target key. For the operation of the logical address query unit 1408, reference may be made to the operation described above with reference to 1120 in FIG. 11 .
  • the data acquisition unit 1409 is configured to acquire the value corresponding to the target key according to the logical address.
  • For operations of the data acquisition unit 1409, reference may be made to the operations described above with reference to 1130 in FIG. 11.
  • the data decoding unit 1410 is configured to decode the acquired value to obtain target graph state data. For example, the data decoding unit 1410 may decode the acquired value to obtain the non-data-ID part of the target graph state data. In one example, the data decoding unit 1410 may use the decoded non-data-ID part as the target graph state data. In another example, the data decoding unit 1410 may combine the decoded non-data-ID part with the data ID to obtain the target graph state data.
  • the operation of the data decoding unit 1410 may refer to the operation described above with reference to 1140 of FIG. 11.
  • the data providing unit 1411 is configured to provide the obtained target graph state data to the graph computing engine.
  • the operation of the data providing unit 1411 may refer to the operation described above with reference to 1150 of FIG. 11 .
  • the graph state data management apparatus 1400 may further include a data filtering unit 1412 .
  • the data filtering unit 1412 is configured to perform data filtering on the obtained target graph state data using a given data filtering policy before providing the obtained target graph state data to the graph computation engine.
  • the graph state data management apparatus 1400 may further include a memory index update judging unit 1413 and a memory index update unit 1414 .
  • the memory index update judging unit 1413 is configured to judge whether to update the memory index after writing the value of the kv list data into the data file in the file storage system in sequence and recording the corresponding logical address of each key in the data file.
  • the memory index update unit 1414 is configured to use the recorded logical address of each key to perform incremental logical address update on the corresponding logical address in the memory index in response to the need to update the memory index.
  • the memory index update judging unit 1413 and the memory index update unit 1414 can form a graph state data update component together with the graph state data writing component.
  • the graph state data update component can update the graph state data stored in the data file of the file storage system.
  • the update operation of the graph state data update component can be implemented using a batch update strategy.
  • the incremental graph state data is written to the data files of the file storage system in an append-only manner, and for each key, the initial logical address and the subsequent incremental logical addresses are maintained in the memory index.
  • the graph state data management apparatus 1400 may further include a data aggregation unit 1415 .
  • the data aggregation unit 1415 uses a given data compaction policy to perform data compaction on the value stored in the data file of the file storage system.
  • FIG. 17 shows an example block diagram of a data acquisition unit 1700 according to an embodiment of the present specification.
  • a data LRU cache is maintained in the memory of the graph state management device, and the data LRU cache is used to cache previously acquired values in association with the corresponding logical address of the key.
  • the data acquisition unit 1700 includes a data cache judgment module 1710 , a data acquisition request initiation module 1720 and a data acquisition module 1730 .
  • the data cache judging module 1710 is configured to judge whether the value corresponding to the target key is cached in the data LRU cache according to the logical address after querying the corresponding logical address.
  • if the value is cached, the data acquisition module 1730 acquires the corresponding value from the data LRU cache.
  • if the value is not cached, the data acquisition request initiating module 1720 initiates a data acquisition request to the file storage system, and the data acquisition request includes the corresponding logical address.
  • the data obtaining module 1730 receives the value returned in response to the data obtaining request from the file storage system, and the returned value is obtained by the file storage system from the data file of the file storage system according to the corresponding logical address.
  • the data LRU cache may not be maintained in the memory of the graph state management device. In this case, the data cache judgment module 1710 needs to be removed from the data acquisition unit shown in FIG. 17 .
  • FIG. 18 shows another example block diagram of a data acquisition unit 1800 according to an embodiment of the present specification.
  • the values of the graph state data are built into multiple ordered data blocks of the first data size, compressed, and written into the data file of the file storage system.
  • the data file includes the compressed ordered data blocks and a metadata block; the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file.
  • a data block LRU cache is maintained in the memory of the graph state management device, and the data block LRU cache is used to cache previously acquired data blocks in association with the corresponding logical address of the key.
  • the data acquisition unit 1800 includes a data block cache judgment module 1810 , a data block acquisition request initiation module 1820 , a data block acquisition module 1830 , a data block decompression module 1840 , an offset address determination module 1850 and a data acquisition module 1860 .
  • the data block cache judgment module 1810 is configured to judge, according to the logical address after the corresponding logical address has been found, whether a compressed data block corresponding to the target key is cached in the data block LRU cache.
  • the data block obtaining module 1830 obtains the corresponding compressed data block from the data block LRU cache.
  • the data block obtaining request initiating module 1820 initiates a data block obtaining request to the file storage system, and the data block obtaining request includes a corresponding logical address.
  • the data block acquisition module 1830 receives, from the file storage system, the compressed data block returned in response to the data block acquisition request, the returned compressed data block being obtained by the file storage system from its data file according to the first file offset address.
  • the data block decompression module 1840 decompresses the obtained compressed data block.
  • the offset address determination module 1850 determines, based on the first file offset address in the logical address and the prescribed size of a data block (that is, the first data size), the third offset address of the value corresponding to the target key within the decompressed data block.
  • the data acquisition module 1860 acquires the value corresponding to the target key from the decompressed data block according to the third offset address.
  • the memory of the graph state management device may not maintain a data block LRU cache.
  • the data block cache judgment module 1810 needs to be removed from the data acquisition unit shown in FIG. 18 .
  • the above graph state data management device can be realized by hardware, software or a combination of hardware and software.
  • Fig. 19 shows a schematic diagram of a graph state data management device 1900 implemented based on a computer system according to an embodiment of the present specification.
  • the graph state data management apparatus 1900 may include at least one processor 1910, a storage (for example, a non-volatile storage) 1920, a memory 1930, and a communication interface 1940, which are connected together via a bus 1960.
  • the at least one processor 1910 executes at least one computer-readable instruction (i.e., the elements implemented in software described above) stored or encoded in the storage.
  • computer-executable instructions are stored in the storage and, when executed, cause the at least one processor 1910 to: acquire, in batches, graph state data obtained by the graph computation engine during graph computation, the graph state data including vertex data and/or edge data; encode each piece of the acquired graph state data into kv data, where the vertex ID in vertex data and/or the source vertex ID in edge data is encoded as the key, and the non-vertex-ID data in vertex data and/or the non-source-ID data in edge data is encoded as the value; sort the kv data by key to form kv list data, in which each key corresponds to one or more values; and write the values of the kv list data in order into a data file of the file storage system while recording each key's corresponding logical address in the data file.
  • the recorded logical address includes the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file; and a memory index of the acquired graph state data is maintained in the memory of the graph state management device, the memory index reflecting the index relationship between keys and their corresponding logical addresses.
  • a program product such as a machine-readable medium (e.g., a non-transitory machine-readable medium) is provided.
  • the machine-readable medium may have instructions (that is, the above-mentioned elements implemented in software) which, when executed by the machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-18 in various embodiments of this specification.
  • specifically, a system or device equipped with a readable storage medium can be provided, on which software program code implementing the functions of any one of the above embodiments is stored, and the computer or processor of the system or device reads out and executes the instructions stored in the readable storage medium.
  • the program code itself read from the readable medium can realize the functions of any one of the above-mentioned embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
  • Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, non-volatile memory cards, and ROM.
  • the program code can be downloaded from a server computer or cloud via a communication network.
  • a computer program product includes a computer program which, when executed by a processor, causes the processor to perform the various operations and functions described above in conjunction with FIGS. 1-18 in various embodiments of this specification.
  • the execution order of each step is not fixed, and can be determined as required.
  • the device structures described in the above embodiments may be physical structures or logical structures; that is, some units may be realized by the same physical entity, some units may be realized by multiple physical entities respectively, or some units may be realized jointly by certain components in multiple independent devices.
  • the hardware units or modules may be implemented mechanically or electrically.
  • a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to perform the corresponding operations.
  • the hardware unit or processor may also include programmable logic or circuits (such as a general-purpose processor or other programmable processors), which can be temporarily set by software to complete corresponding operations.
  • the specific implementation (mechanical, a dedicated permanent circuit, or a temporarily configured circuit) can be determined based on cost and time considerations.

Abstract

A graph state data management method and apparatus. The method includes: after acquiring batch graph state data from a graph computation engine (210), encoding each piece of graph state data in the batch into kv data (220); sorting the kv data by key to form kv list data (230), in which each key corresponds to one or more values; then writing the values of the kv list data in order into a data file of a file storage system (240) while recording each key's corresponding logical address in the data file (250), the recorded logical address including the file ID of the data file written to and the first file offset address of the corresponding value within that data file; and then maintaining, in the memory of the graph state management device, a memory index of the batch graph state data (260), the maintained memory index reflecting the index relationship between keys and their corresponding logical addresses.

Description

Graph State Data Management
Technical Field
The embodiments of this specification generally relate to the field of graph computing, and in particular to a graph state data management method and a graph state data management apparatus.
Background
Graph computing refers to complex computation oriented to graph data structures. When performing graph computation, a graph computation engine abstracts real-world business data into a graph data structure and performs complex computation on it. A graph data structure is a complex data structure composed of vertices and edges, and includes multiple kinds of data attributes.
The graph computation executed by a graph computation engine is iterative. In each round of iteration, the graph computation engine produces intermediate results, which may be called graph state data. In some real-time graph computing scenarios, the real-time graph computation engine combines stream computing and graph computing. To ensure data fault tolerance in stream computing, the graph state data needs to be stored, for example, in the graph computation engine's memory, cache, or local disk, and the stored graph state data needs to be managed.
Summary
In view of the above, the embodiments of this specification provide a graph state data management method and apparatus. With this method and apparatus, graph state management can be decoupled from graph computation, achieving separation of computation and storage, and enabling management of graph state data at a larger scale.
According to one aspect of the embodiments of this specification, a graph state data management method applied to a graph state management device is provided, including: acquiring, in batches, graph state data obtained by a graph computation engine during graph computation, the graph state data including vertex data and/or edge data; encoding each piece of the graph state data into kv data, where the vertex ID in vertex data and/or the source vertex ID in edge data is encoded as the key, and the non-vertex-ID data in vertex data and/or the non-source-ID data in edge data is encoded as the value; sorting the kv data by key to form kv list data, in which each key corresponds to one or more values; writing the values of the kv list data in order into a data file of a file storage system and recording each key's corresponding logical address in the data file, the logical address including the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file; and maintaining, in the memory of the graph state management device, a memory index of the graph state data, the memory index reflecting the index relationship between keys and their corresponding logical addresses.
Optionally, in an example of the above aspect, a mutable table and an immutable table are maintained in the memory of the graph state management device. Before sorting the kv data by key, the graph state data management method may further include: writing the kv data into a mutable table; and judging whether the data size of the mutable table into which kv data has been written reaches a threshold. Accordingly, sorting the kv data by key may include: in response to the data size of the mutable table reaching the threshold, sorting the kv data written into that mutable table by key to form kv list data. Writing the values of the kv list data in order into a data file of the file storage system may include: converting the sorted mutable table into an immutable table; and writing the values of the immutable table in order into a data file of the file storage system, each immutable table corresponding to one data file.
Optionally, in an example of the above aspect, writing the values of the kv list data in order into a data file of the file storage system may include: building the values of the kv list data into multiple ordered data blocks of a first data size; compressing the built ordered data blocks; and writing the compressed ordered data blocks in order into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file.
Optionally, in an example of the above aspect, the graph state data management method may further include: in response to receiving a graph state data read request from the graph computation engine, encoding the data ID in the read request as a target key, the data ID including a vertex ID and/or an edge's source vertex ID; querying the memory index for the logical address corresponding to the target key; acquiring the value corresponding to the target key according to the logical address; decoding the acquired value to obtain the target graph state data; and providing the obtained target graph state data to the graph computation engine.
Optionally, in an example of the above aspect, acquiring the value corresponding to the target key according to the logical address may include: in response to the corresponding logical address being found, initiating a data acquisition request to the file storage system, the data acquisition request including the corresponding logical address; and receiving, from the file storage system, the value returned in response to the request, the returned value being obtained by the file storage system from its data file according to the corresponding logical address.
Optionally, in an example of the above aspect, a data LRU cache is maintained in the memory of the graph state management device, the data LRU cache caching previously acquired values in association with the corresponding logical addresses of their keys. Before initiating the data acquisition request to the file storage system, acquiring the value corresponding to the target key according to the logical address may further include: judging, according to the logical address, whether the value corresponding to the target key is cached in the data LRU cache; and when it is, acquiring the corresponding value from the data LRU cache.
Optionally, in an example of the above aspect, the values of the graph state data are built into multiple ordered data blocks of the first data size, compressed, and written into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file. Accordingly, acquiring the value corresponding to the target key according to the logical address may include: in response to the corresponding logical address being found, initiating a data block acquisition request to the file storage system, the request including the corresponding logical address; receiving, from the file storage system, the compressed data block returned in response to the request, the compressed data block being obtained by the file storage system from its data file according to the first file offset address; decompressing the obtained compressed data block; determining, based on the first file offset address in the logical address and the first data size, the third offset address of the value corresponding to the target key within the decompressed data block; and acquiring, according to the third offset address, the value corresponding to the target key from the decompressed data block.
Optionally, in an example of the above aspect, a data block LRU cache is maintained in the memory of the graph state management device, caching previously acquired data blocks in association with the corresponding logical addresses of their keys. Before initiating the data block acquisition request to the file storage system, acquiring the value corresponding to the target key according to the logical address may further include: judging, according to the logical address, whether a compressed data block corresponding to the target key is cached in the data block LRU cache; and when it is, acquiring the corresponding compressed data block from the data block LRU cache.
Optionally, in an example of the above aspect, before providing the obtained graph state data to the graph computation engine, the graph state data management method may further include: filtering the obtained graph state data using a given data filtering policy.
Optionally, in an example of the above aspect, after writing the values of the kv list data in order into a data file of the file storage system and recording each key's corresponding logical address, the graph state data management method may further include: judging whether a memory index update is needed; and in response to judging that one is needed, performing an incremental logical-address update on the corresponding logical addresses in the memory index using the recorded logical addresses of the keys.
Optionally, in an example of the above aspect, the graph state data management method may further include: in response to a data compaction condition being met, performing data compaction on the graph state data stored in the data files of the file storage system using a given data compaction policy.
According to another aspect of the embodiments of this specification, a graph state data management apparatus applied to a graph state management device is provided, including: a graph state data acquisition unit that acquires, in batches, graph state data obtained by a graph computation engine during graph computation, the graph state data including vertex data and/or edge data; a first data encoding unit that encodes each piece of the graph state data into kv data, where the vertex ID in vertex data and/or the source vertex ID in edge data is encoded as the key, and the non-vertex-ID data in vertex data and/or the non-source-ID data in edge data is encoded as the value; a data sorting unit that sorts the kv data by key to form kv list data, in which each key corresponds to one or more values; a first data writing unit that writes the values of the kv list data in order into a data file of a file storage system; a logical address recording unit that records each key's corresponding logical address in the data file, the logical address including the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file; and a memory index maintenance unit that maintains, in the memory of the graph state management device, a memory index of the graph state data, the memory index reflecting the index relationship between keys and their corresponding logical addresses.
Optionally, in an example of the above aspect, a mutable table and an immutable table are maintained in the memory of the graph state management device. The graph state data management apparatus may further include: a second data writing unit that writes the kv data into a mutable table before the kv data is sorted by key; and a first judgment unit that judges whether the data size of the mutable table into which kv data has been written reaches a threshold, where the data sorting unit, in response to the threshold being reached, sorts the kv data written into that mutable table by key to form kv list data. The apparatus may further include a data table conversion unit that converts the sorted mutable table into an immutable table, where the first data writing unit writes the values of the immutable table in order into a data file of the file storage system, each immutable table corresponding to one data file.
Optionally, in an example of the above aspect, the first data writing unit may include: a data block building module that builds the values of the kv list data into multiple ordered data blocks of a first data size; a data block compression module that compresses the built ordered data blocks; and a data block writing module that writes the compressed ordered data blocks in order into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file.
Optionally, in an example of the above aspect, the graph state data management apparatus may further include: a second data encoding unit that, in response to receiving a graph state data read request from the graph computation engine, encodes the data ID in the read request as a target key, the data ID including a vertex ID and/or an edge's source vertex ID; a logical address query unit that queries the memory index for the logical address corresponding to the target key; a data acquisition unit that acquires the value corresponding to the target key according to the logical address; a data decoding unit that decodes the acquired value to obtain the target graph state data; and a data providing unit that provides the obtained target graph state data to the graph computation engine.
Optionally, in an example of the above aspect, the data acquisition unit may include: a data acquisition request initiating module that, in response to the corresponding logical address being found, initiates a data acquisition request to the file storage system, the request including the corresponding logical address; and a data acquisition module that receives, from the file storage system, the value returned in response to the request, the returned value being obtained by the file storage system from its data file according to the corresponding logical address.
Optionally, in an example of the above aspect, a data LRU cache is maintained in the memory of the graph state management device, caching previously acquired values in association with the corresponding logical addresses of their keys. The data acquisition unit may further include: a data cache judgment module that, before the data acquisition request is initiated to the file storage system, judges according to the logical address whether the value corresponding to the target key is cached in the data LRU cache, where, when it is, the data acquisition module acquires the corresponding value from the data LRU cache.
Optionally, in an example of the above aspect, the values of the graph state data are built into multiple ordered data blocks of the first data size, compressed, and written into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file. Accordingly, the data acquisition unit may include: a data block acquisition request initiating module that, in response to the corresponding logical address being found, initiates a data block acquisition request to the file storage system, the request including the corresponding logical address; a data block acquisition module that receives, from the file storage system, the compressed data block returned in response to the request, the compressed data block being obtained by the file storage system from its data file according to the first file offset address; a data block decompression module that decompresses the obtained compressed data block; an offset address determination module that determines, based on the first file offset address in the logical address and the first data size, the third offset address of the value corresponding to the target key within the decompressed data block; and a data acquisition module that acquires, according to the third offset address, the value corresponding to the target key from the decompressed data block.
Optionally, in an example of the above aspect, a data block LRU cache is maintained in the memory of the graph state management device, caching previously acquired data blocks in association with the corresponding logical addresses of their keys. The data acquisition unit may further include: a data block cache judgment module that, before the data block acquisition request is initiated to the file storage system, judges according to the logical address whether a compressed data block corresponding to the target key is cached in the data block LRU cache, where, when it is, the data block acquisition module acquires the corresponding compressed data block from the data block LRU cache.
Optionally, in an example of the above aspect, the graph state data management apparatus may further include: a data filtering unit that, before the obtained graph state data is provided to the graph computation engine, filters the obtained graph state data using a given data filtering policy.
Optionally, in an example of the above aspect, the graph state data management apparatus may further include: a memory index update judgment unit that, after the values of the kv list data are written in order into a data file of the file storage system and each key's corresponding logical address is recorded, judges whether a memory index update is needed; and a memory index update unit that, in response to a memory index update being needed, performs an incremental logical-address update on the corresponding logical addresses in the memory index using the recorded logical addresses of the keys.
Optionally, in an example of the above aspect, the graph state data management apparatus may further include: a data aggregation unit that, in response to a data compaction condition being met, performs data compaction on the graph state data stored in the data files of the file storage system using a given data compaction policy.
According to another aspect of the embodiments of this specification, a graph state data management apparatus is provided, including: at least one processor, a memory coupled to the at least one processor, and a computer program stored in the memory, the at least one processor executing the computer program to implement the graph state data management method described above.
According to another aspect of the embodiments of this specification, a computer-readable storage medium is provided that stores executable instructions which, when executed, cause a processor to perform the graph state data management method described above.
According to another aspect of the embodiments of this specification, a computer program product is provided, including a computer program which, when executed by a processor, implements the graph state data management method described above.
Brief Description of the Drawings
A further understanding of the nature and advantages of the contents of this specification can be achieved by referring to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 shows an example schematic diagram of a graph state management architecture according to an embodiment of this specification.
FIG. 2 shows an example flowchart of a graph state data writing method according to an embodiment of this specification.
FIG. 3 shows an example schematic diagram of kv list data according to an embodiment of this specification.
FIG. 4 shows an example schematic diagram of a memory index structure according to an embodiment of this specification.
FIG. 5 shows an example schematic diagram of a data file writing process according to an embodiment of this specification.
FIG. 6 shows an example schematic diagram of a data file written with graph state data according to an embodiment of this specification.
FIG. 7 shows another example flowchart of a graph state data writing method according to an embodiment of this specification.
FIG. 8 shows an example flowchart of a memory index update process according to an embodiment of this specification.
FIG. 9 shows an example schematic diagram of an updated memory index structure according to an embodiment of this specification.
FIG. 10 shows an example flowchart of a graph state data compaction process according to an embodiment of this specification.
FIG. 11 shows an example flowchart of a graph state data reading method according to an embodiment of this specification.
FIG. 12 shows an example flowchart of a value acquisition process according to an embodiment of this specification.
FIG. 13 shows another example flowchart of a value acquisition process according to an embodiment of this specification.
FIG. 14 shows an example block diagram of a graph state data management apparatus according to an embodiment of this specification.
FIG. 15 shows an example block diagram of a first data writing unit according to an embodiment of this specification.
FIG. 16 shows another example block diagram of a graph state data writing component according to an embodiment of this specification.
FIG. 17 shows an example block diagram of a data acquisition unit according to an embodiment of this specification.
FIG. 18 shows another example block diagram of a data acquisition unit according to an embodiment of this specification.
FIG. 19 shows an example schematic diagram of a graph state data management apparatus implemented based on a computer system according to an embodiment of this specification.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and are not limitations on the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the elements discussed can be changed without departing from the scope of protection of this specification. Various examples may omit, substitute, or add various processes or components as needed. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
FIG. 1 shows an example schematic diagram of a graph state management architecture 1 according to an embodiment of this specification. As shown in FIG. 1, the graph state management architecture 1 includes a graph computation engine 10, a graph state management device 20, and a file storage system 30.
The graph computation engine 10 is configured to execute graph computation using graph data. When performing graph computation, the graph computation engine 10 abstracts real-world business data into a graph data structure. Graph data may include vertex data and edge data. Vertex data may include, for example, a vertex identifier and vertex attributes. In one example, the vertex identifier may include a vertex ID and a vertex type. In another example, the vertex identifier may include only a vertex ID. The vertex identifier uniquely identifies a vertex in the graph data. Edge data may include an edge identifier and edge attributes. The edge identifier may include a source vertex ID, an edge type, an edge timestamp, and a target vertex ID. Alternatively, the edge identifier may include a source vertex ID and a target vertex ID. Vertex identifiers, edge identifiers, vertex attributes, and edge attributes may be business-related. For example, in a social network scenario, the vertex ID may be a person's identity card number or personnel number; the vertex type may be the category the vertex belongs to, for example, a user-type vertex; and the vertex attributes may include age, education, address, occupation, and so on. The edge type indicates the type an edge belongs to; for example, if a transfer edge is created between vertices A and B, the edge type of that transfer edge may be "transfer". Edge attributes may include attributes of the edge formed between two vertices; for example, for the above transfer edge, the edge attributes may include "amount", "currency", "operating device", and so on.
The graph computation executed by the graph computation engine 10 is iterative. In each round of iteration, the graph computation engine 10 produces intermediate results, which may be called graph state data. The graph computation engine 10 may include any graph computation engine applicable in the art. In some real-time graph computing scenarios, the graph computation engine 10 may combine stream computing and graph computing.
The graph state data produced by the graph computation engine 10 is provided to the graph state management device 20. The graph state management device 20 includes a graph state data management apparatus 21 and a memory 22. The graph state data management apparatus 21 is configured to manage the graph state data, for example, writing (storing) the graph state data into the file storage system 30, and performing data updates, data reads, data filtering, expired-data deletion, and/or data compaction on the graph state data written into the file storage system 30. In some embodiments, the graph computation engine 10 and the graph state management device 20 may be deployed independently. In some embodiments, the graph state management device 20 may be integrated with the graph computation engine 10, in which case they may share the same memory.
The file storage system 30 may also be called an external storage system, for example, a cloud file storage system. The file storage system 30 may support multiple data backups or other data disaster-recovery mechanisms to ensure data reliability. In some embodiments, the file storage system 30 may be a distributed file storage system.
FIG. 2 shows an example flowchart of a graph state data writing method 200 according to an embodiment of this specification.
As shown in FIG. 2, at 210, after the graph computation engine obtains graph state data through graph computation, the graph state data management apparatus acquires the graph state data from the graph computation engine in batches. The acquired graph state data may include vertex data and/or edge data. Vertex data may include a vertex ID, vertex metadata, vertex attributes, and so on. Edge data may include the edge's source and target vertex IDs, edge metadata, edge attributes, and so on. In one example, the vertex metadata and edge metadata may be fixed at 8 bytes and may contain timestamp information, whether the element is a vertex, whether the edge is outgoing or incoming, a user-defined label, and so on. Vertex attributes and/or edge attributes may be custom attributes.
At 220, the graph state data management apparatus encodes each piece of the acquired graph state data into kv data. When the graph state data is vertex data, the vertex ID in the vertex data is encoded as the key of the kv data, and the non-vertex-ID data is encoded as the value; the non-vertex-ID data may include, for example, vertex metadata and vertex attributes. When the graph state data is edge data, the source vertex ID in the edge data is encoded as the key, and the non-source-ID data is encoded as the value; the non-source-ID data may include, for example, the target vertex ID, edge metadata, and edge attributes.
At 230, the graph state data management apparatus sorts the encoded kv data by key to form kv list data. For example, the kv data may be sorted by the magnitude of the vertex ID or the edge's source vertex ID, and values sharing the same key may be aggregated to form the kv list data. For example, if two or more pieces of graph state data have the same key, their corresponding values are aggregated together. In the resulting kv list data, each key may correspond to one or more values.
FIG. 3 shows an example schematic diagram of kv list data according to an embodiment of this specification. In the example of FIG. 3, five pieces of graph state data are obtained from the graph computation engine and encoded into five pieces of kv data: (K1, V1), (K2, V2), (K2, V3), (K2, V4), and (K3, V5). After sorting by key and aggregating values, the kv list data shown on the right is obtained, in which K1 corresponds to V1; K2 corresponds to V2, V3, and V4; and K3 corresponds to V5.
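The encode-and-sort step above can be sketched roughly as follows. The dict-based record layout and the helper names (`encode_kv`, `build_kv_list`) are illustrative assumptions for this sketch, not an API defined by this specification.

```python
def encode_kv(record):
    """Encode one graph-state record as a (key, value) pair.

    Vertex data: the vertex ID becomes the key, the remaining fields the value.
    Edge data:   the source vertex ID becomes the key, the remaining fields
                 (including the target vertex ID) the value.
    """
    if record["type"] == "vertex":
        key = record["vertex_id"]
        value = {k: v for k, v in record.items() if k not in ("type", "vertex_id")}
    else:  # edge
        key = record["src_id"]
        value = {k: v for k, v in record.items() if k not in ("type", "src_id")}
    return key, value


def build_kv_list(records):
    """Sort kv pairs by key and aggregate values that share the same key."""
    pairs = [encode_kv(r) for r in records]
    pairs.sort(key=lambda kv: kv[0])      # sort only by key
    kv_list = {}
    for key, value in pairs:              # aggregation: one key -> many values
        kv_list.setdefault(key, []).append(value)
    return kv_list
```

With the five records of FIG. 3, `build_kv_list` would group V2, V3, and V4 under K2 while K1 and K3 keep a single value each.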
After the kv list data is obtained as above, at 240, the graph state data management apparatus writes the values of the kv list data in order into a data file of the file storage system, and at 250, the apparatus records each key's corresponding logical address in the data file, the recorded logical address including the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file. Here, writing in order means writing the values into the data file sequentially in the sorted order of their keys.
At 260, the graph state data management apparatus maintains, in the memory of the graph state management device, a memory index of the written batch of graph state data, the memory index reflecting the index relationship between keys and their corresponding logical addresses. In this specification, multiple memory index structures can be supported, such as FST, skip list, and CSR. In the maintained memory index, the index positions correspond to the keys encoded from vertex IDs or edges' source vertex IDs, that is, the index is derived from the sorted order of the keys, and the index values correspond to the logical addresses, in the data files of the file storage system, of the values associated with each key.
FIG. 4 shows an example schematic diagram of a memory index structure according to an embodiment of this specification. In the example of FIG. 4, the memory index is stored as a Java array structure. The index of the array corresponds to the keys encoded from vertex IDs or edges' source vertex IDs (the index is derived from the sorted order of the keys), and the value at each array position corresponds to the logical address, in the data files of the file storage system, of the values associated with the key.
As shown in FIG. 4, suppose there are four pieces of graph state data A, B, C, and D whose keys after encoding are 01, 12, 23, and 15, respectively. After sorting, the order of A, B, C, and D becomes A, B, D, C. When the memory index is maintained, keys 01, 12, 23, and 15 correspond to index0, index1, index3, and index2 of the memory index, respectively. The storage position for index0 records the file ID (fid) of the data file into which the value for 01 was written and the first file offset address (offset) of that value within the data file; likewise, index1 records the fid and offset for key 12, index2 for key 15, and index3 for key 23. The data file's file ID (fid) together with the corresponding value's first file offset address (offset) forms the posting structure in the memory index.
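The array-style memory index of FIG. 4 can be sketched as follows, assuming a sorted key list whose rank serves as the array index and one `(fid, offset)` posting per slot; the class and method names are hypothetical, chosen only for the sketch.

```python
from bisect import bisect_left


class MemoryIndex:
    """Rank-indexed memory index: slot i holds the posting for the i-th
    smallest key, where a posting is a (fid, offset) pair."""

    def __init__(self, sorted_keys, postings):
        self._keys = sorted_keys      # keys in sorted order; rank == array index
        self._postings = postings     # postings[i] == (fid, offset) for _keys[i]

    def lookup(self, key):
        """Binary-search the sorted keys and return the posting, or None."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._postings[i]
        return None
```

For the FIG. 4 example, keys 01, 12, 15, and 23 occupy index0 through index3 in sorted order, so `lookup("15")` lands on index2.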
In some embodiments, to further reduce the amount of data written to the file storage system, the values of the sorted kv list data may be compressed, and the compressed values written into the file storage system.
FIG. 5 shows an example schematic diagram of a data file writing process 500 according to an embodiment of this specification. In the writing process shown in FIG. 5, the values written to the data file are compressed.
As shown in FIG. 5, when the values of the kv list data are written in order into a data file of the file storage system, at 510, the graph state data management apparatus builds the values of the kv list data into multiple ordered data blocks of a first data size. Here, the ordered data blocks are built in the sorted order of the keys corresponding to the values, and the built blocks are themselves ordered: the keys of the values in an earlier block all sort before the keys of all the values in the blocks after it. In addition, the built ordered data blocks may have the same data size, for example, 64 KB.
At 520, the graph state data management apparatus compresses the built ordered data blocks. For example, the same data compression algorithm may be used for all of the built ordered data blocks, so that the compressed ordered data blocks have the same data size.
At 530, the graph state data management apparatus writes the compressed ordered data blocks in order into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block. The metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file.
Note that the mapping recorded in the metadata may be many-to-one or one-to-many, that is, multiple first file offset addresses may correspond to one second file offset address, or one first file offset address may correspond to multiple second file offset addresses. For example, in the data writing process shown in FIG. 3, suppose the values for 01 and 12 are built into data block 1, the value for 15 into data block 2, and the value for 23 into data blocks 3 and 4. In this case, in the metadata, the first file offset address offset1 for 01 and the first file offset address offset2 for 12 map to the second file offset address of block 1 in the data file; the first file offset address offset3 for 15 maps to the second file offset address of block 2; and the first file offset address offset4 for 23 maps to the second file offset addresses of block 3 and block 4 in the data file.
FIG. 6 shows an example schematic diagram of a data file written with graph state data according to an embodiment of this specification. As shown in FIG. 6, the data file includes several compressed ordered data blocks (for example, data block 1, data block 2, ..., data block n) and a metadata block. Each ordered data block stores the values corresponding to keys, and the blocks are stored sequentially in the data file. Each ordered data block may store the values of one or more keys, and the value of a single key may also span two or more adjacent ordered data blocks. The metadata block is stored at the end of the data file.
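The block-and-metadata layout described above can be sketched as follows. `zlib` and the tiny block size stand in for whichever codec and block size ("first data size") a real deployment would use; the function name is an assumption for illustration.

```python
import zlib

BLOCK_SIZE = 64  # "first data size"; 64 KB in the text, tiny here for the demo


def write_data_file(sorted_values):
    """sorted_values: list of (key, value_bytes), already sorted by key.

    Packs the values into fixed-size ordered blocks, compresses each block,
    and builds the metadata mapping: each key's first file offset (position
    in the uncompressed value stream) -> the second file offset (position of
    its compressed block inside the data file).
    """
    raw, first_offsets, offset = b"", {}, 0
    for key, value in sorted_values:
        first_offsets[key] = offset       # first file offset of this value
        raw += value
        offset += len(value)

    blocks, metadata, file_pos = [], {}, 0
    for start in range(0, len(raw), BLOCK_SIZE):
        comp = zlib.compress(raw[start:start + BLOCK_SIZE])
        # every first offset falling inside this block maps to the block's
        # second file offset within the data file
        for k, off in first_offsets.items():
            if start <= off < start + BLOCK_SIZE:
                metadata[off] = file_pos
        blocks.append(comp)
        file_pos += len(comp)
    return b"".join(blocks), metadata     # metadata block would be appended last
```

As in the FIG. 6 layout, several first file offsets may map to one block (many-to-one), and a value longer than one block would map one first offset to several blocks.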
FIG. 7 shows another example flowchart of a graph state data writing method 700 according to an embodiment of this specification. In the example of FIG. 7, a mutable table and an immutable table are maintained in the memory of the graph state management device. kv data can be continuously written into a mutable table. The stored data and the data ordering (storage order) of an immutable table are locked and do not change; that is, writing kv data into an immutable table is not allowed.
As shown in FIG. 7, at 701, after acquiring graph state data in batches from the graph computation engine, the graph state data management apparatus may use the data encoding method described above to encode each piece of the acquired graph state data into kv data.
At 702, the graph state data management apparatus writes the encoded kv data into a mutable table. Specifically, it judges whether a free mutable table exists in the memory of the graph state management device. If one exists, the encoded kv data is written into that free mutable table; if not, a new mutable table is created in memory and the encoded kv data is written into the new mutable table. Here, a mutable table corresponds to a segment of memory space. In one example, mutable tables may be created one at a time. In another example, multiple mutable tables may be created at once, and the encoded kv data written into them in parallel.
At 703, the graph state data management apparatus judges whether the mutable table into which kv data has been written reaches a threshold, for example, 64 MB. If the threshold is not reached, the process returns to 702 and the apparatus continues writing kv data. If the threshold is reached, then at 704 the apparatus sorts the kv data in the mutable table by key to form kv list data, thereby ensuring that vertex data with the same vertex ID, or edge data with the same source vertex ID, is aggregated within the mutable table.
At 705, the graph state data management apparatus seals the sorted mutable table as an immutable table. The immutable table may have a prescribed data size, for example, 64 MB.
At 706, the graph state data management apparatus writes the values in the immutable table in order (in the sorted order of their keys) into a data file of the file storage system, for example, via an asynchronous thread, and at 707, the apparatus records each key's corresponding logical address in the data file.
Optionally, at 708, the graph state data management apparatus may further judge whether a memory index update is needed, for example, by checking whether the currently written graph state data is the first batch of graph state data, or whether it is the first data written after data compaction has been performed on the data files of the file storage system. If the currently written graph state data is the first batch, or is the first data written after compaction, it is judged that no memory index update is needed. If the currently written graph state data is neither the first batch nor the first data written after compaction, it is judged that a memory index update is needed.
If it is judged that a memory index update is needed, then at 709 the graph state data management apparatus performs an incremental index update on the memory index maintained in the memory of the graph state management device. If it is judged that no memory index update is needed, then at 710 the apparatus maintains a key-to-logical-address memory index in the memory of the graph state management device.
FIG. 8 shows an example flowchart of a memory index update process 800 according to an embodiment of this specification.
As shown in FIG. 8, at 810, the graph state data management apparatus obtains, based on the keys of the values written into the data file, the initial logical addresses in the memory index, that is, the logical addresses stored when graph state data was last written.
At 820, the graph state data management apparatus merges the initial logical address with the incremental logical address of the current graph state data write. Here, merging means appending the incremental logical address after the initial logical address. At 830, the apparatus records the merged logical addresses into the memory index.
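The merge step above amounts to appending the incremental posting after the key's existing posting list, so one key may point at several `(fid, offset)` addresses across data files. A minimal sketch, with an assumed dict-based index:

```python
def merge_postings(index, key, incremental):
    """Append incremental (fid, offset) postings after the key's initial ones.

    index: dict mapping key -> list of (fid, offset) postings.
    """
    index.setdefault(key, []).extend(incremental)
    return index
```

For the FIG. 9 example, key 12 initially holds fd1+offset1 and fd2+offset2, and fd3+offset3 is appended by the current write.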
FIG. 9 shows an example schematic diagram of an updated memory index structure according to an embodiment of this specification. In the example of FIG. 9, fd1+offset1 and fd2+offset2 are initial logical addresses, and fd3+offset3 is the logical address corresponding to key 12 in the current graph state data write.
Returning to FIG. 7, optionally, after the incremental memory index update is completed, at 711 the graph state data management apparatus may further judge whether a data compaction condition is met. Here, data compaction may mean aggregating the values of the same key that are spread across multiple different data files into one data file, or deleting expired graph state data from the graph state data stored in the data files. The data compaction condition may include, but is not limited to: the number of fids contained in the logical addresses of the same key exceeding a predetermined value; the data size of the values of the same key exceeding a predetermined threshold; and so on.
When it is judged that the data compaction condition is met, at 712, the graph state data management apparatus performs the data compaction process, and at 713, writes the compacted graph state data into the data files of the file storage system again. For this rewriting process, refer to the graph state data writing process described above with reference to FIG. 2.
FIG. 10 shows an example flowchart of a graph state data compaction process 1000 according to an embodiment of this specification.
As shown in FIG. 10, at 1010, the graph state data management apparatus obtains, in order, the logical addresses corresponding to each key from the memory index, and at 1020, the apparatus obtains the values corresponding to each key according to those logical addresses.
After obtaining the values corresponding to each key, at 1030, the graph state data management apparatus performs data compaction on the obtained values, for example, re-sorting the values by key, or deleting expired values based on each value's time to live (TTL).
With the above compaction processing, the values of the same key can be written into the same data file as far as possible, reducing the data read time when reading graph state data. Alternatively, by deleting expired values, the amount of data written to the data files can be reduced, thereby lowering the storage space occupied in the file storage system.
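The TTL-based part of the compaction described above can be sketched as follows; the `(value, write_ts, ttl_seconds)` tuple layout is an assumption made only for this illustration.

```python
import time


def compact_values(values, now=None):
    """Drop expired values before the survivors are rewritten.

    values: list of (value, write_ts, ttl_seconds) tuples; a value expires
    once write_ts + ttl_seconds has passed.
    """
    now = time.time() if now is None else now
    return [v for v in values if v[1] + v[2] > now]
```

A full compaction would additionally re-sort the surviving values by key and rewrite them into a single data file, as described at 1030.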
Note that various modifications may also be made to the embodiment shown in FIG. 7. In some embodiments, the data file writing process described in FIG. 5 may be added to the embodiment of FIG. 7. In some embodiments, some or all of steps 708-709 and 711-713 may be removed from the embodiment of FIG. 7.
The graph state data writing process according to the embodiments of this specification has been described above. After the graph state data is written into the data files of the file storage system, when the graph computation engine executes graph computation again, the graph state data of the previous iteration needs to be read from the file storage system.
FIG. 11 shows an example flowchart of a graph state data reading method 1100 according to an embodiment of this specification.
As shown in FIG. 11, at 1110, in response to receiving a graph state data read request from the graph computation engine, the graph state data management apparatus encodes the data ID in the read request as a target key. When the requested graph state data is vertex data, the data ID is a vertex ID. When the requested graph state data is edge data, the data ID is the edge's source vertex ID. When the requested graph state data includes both vertex data and edge data, the data ID includes a vertex ID and an edge's source vertex ID.
At 1120, the graph state data management apparatus queries, based on the obtained target key, the corresponding logical address in the memory index maintained in the memory of the graph state management device.
After the corresponding logical address is found, at 1130, the graph state data management apparatus acquires the value corresponding to the target key according to the queried logical address.
FIG. 12 shows an example flowchart of a value acquisition process 1200 according to an embodiment of this specification. In the example of FIG. 12, a data LRU cache is maintained in the memory of the graph state management device, caching previously acquired values in association with the corresponding logical addresses of their keys.
As shown in FIG. 12, when acquiring the value corresponding to the target key according to the queried logical address, at 1210, the graph state data management apparatus performs a cache lookup in the data LRU cache using the logical address, and at 1220, judges whether the value corresponding to the target key is cached in the data LRU cache.
If it is judged that the value corresponding to the target key is cached in the data LRU cache, then at 1250, the graph state data management apparatus acquires the corresponding value from the data LRU cache.
If it is judged that no value corresponding to the target key is cached in the data LRU cache, then at 1230, the graph state data management apparatus initiates a data acquisition request to the file storage system, the initiated request including the corresponding logical address. After receiving the data acquisition request, the file storage system obtains the corresponding value from its data file according to the logical address in the request. For example, the file storage system may use the file ID in the logical address to locate the corresponding data file, and obtain the corresponding value from the data file according to the first file offset address.
At 1240, the graph state data management apparatus receives the queried value from the file storage system.
Note that in other embodiments, the memory of the graph state management device may not maintain a data LRU cache. In that case, the data cache judgment step and the step of acquiring the value from the data LRU cache need to be removed from FIG. 12.
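The cache-then-storage read path of FIG. 12 can be sketched roughly as follows, assuming an `OrderedDict`-based LRU keyed by the logical address; `fetch_from_storage` stands in for the file-storage request and is not an API from this specification.

```python
from collections import OrderedDict


class DataLRUCache:
    """Values cached in association with the key's logical address."""

    def __init__(self, capacity=128):
        self.capacity, self._d = capacity, OrderedDict()

    def get(self, addr):
        if addr not in self._d:
            return None
        self._d.move_to_end(addr)            # mark as most recently used
        return self._d[addr]

    def put(self, addr, value):
        self._d[addr] = value
        self._d.move_to_end(addr)
        if len(self._d) > self.capacity:
            self._d.popitem(last=False)      # evict the least recently used


def get_value(addr, cache, fetch_from_storage):
    """1210-1250: try the data LRU cache first, fall back to file storage."""
    value = cache.get(addr)
    if value is None:                        # cache miss (1230/1240)
        value = fetch_from_storage(addr)
        cache.put(addr, value)
    return value
```

A repeated read of the same logical address is then served from the cache without a second round trip to the file storage system.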
FIG. 13 shows another example flowchart of a value acquisition process 1300 according to an embodiment of this specification. In the example of FIG. 13, the values of the graph state data are built into multiple ordered data blocks of the first data size, compressed, and written into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file. In addition, a data block LRU cache is maintained in the memory of the graph state management device, caching previously acquired data blocks in association with the corresponding logical addresses of their keys.
As shown in FIG. 13, when acquiring the value corresponding to the target key according to the queried logical address, at 1310, the graph state data management apparatus performs a block cache lookup in the data block LRU cache using the logical address, and at 1320, the apparatus judges whether a compressed data block corresponding to the target key is cached in the data block LRU cache.
If it is judged that a compressed data block corresponding to the target key is cached in the data block LRU cache, then at 1350, the graph state data management apparatus acquires the corresponding compressed data block from the data block LRU cache and proceeds to 1360.
If it is judged that no compressed data block corresponding to the target key is cached in the data block LRU cache, then at 1330, the graph state data management apparatus initiates a data block acquisition request to the file storage system, the request including the corresponding logical address. After receiving the data block acquisition request, the file storage system obtains the corresponding compressed data block from its data file according to the logical address in the request. Specifically, the file storage system may use the file ID in the logical address to locate the corresponding data file; then, it obtains the corresponding metadata from the data file's metadata block and uses the mapping from the first file offset address to the compressed data block's second file offset address in that metadata to determine the second file offset address of the compressed data block within the data file; finally, based on the second file offset address, it obtains the corresponding compressed data block from the data file.
At 1340, the graph state data management apparatus receives, from the file storage system, the compressed data block returned in response to the data block acquisition request, and then proceeds to 1360. At 1360, the apparatus decompresses the obtained compressed data block.
At 1370, the graph state data management apparatus determines, based on the first file offset address in the logical address and the first data size (that is, the data size of an ordered data block before compression), the third offset address of the value corresponding to the target key within the decompressed ordered data block. For example, a modulo operation with the first data size as the modulus may be performed on the first file offset address; the resulting remainder is the third offset address of the value corresponding to the target key within the decompressed ordered data block.
At 1380, the graph state data management apparatus acquires, according to the third offset address, the value corresponding to the target key from the decompressed ordered data block.
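Steps 1360-1380 can be illustrated as follows: the compressed block is inflated, and the value's offset inside it is the first file offset address modulo the uncompressed block size. `zlib`, the tiny block size, and the fixed value length passed in are assumptions for this sketch.

```python
import zlib

BLOCK_SIZE = 64   # the "first data size" of an uncompressed ordered data block


def value_from_block(compressed_block, first_file_offset, value_len):
    """Decompress the block (1360), compute the third offset address by a
    modulo operation (1370), and slice the value out of the block (1380)."""
    block = zlib.decompress(compressed_block)
    third_offset = first_file_offset % BLOCK_SIZE   # remainder = in-block offset
    return block[third_offset:third_offset + value_len]
```

For example, a value stored at first file offset 88 with 64-byte blocks lies at offset 88 mod 64 = 24 inside the second decompressed block.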
Returning to FIG. 11, after the value corresponding to the target key is acquired as above, at 1140, the graph state data management apparatus decodes the acquired value to obtain the target graph state data. For example, the apparatus may decode the acquired value to obtain the non-data-ID portion of the target graph state data. In one example, the decoded non-data-ID portion may be taken as the target graph state data. In another example, the decoded non-data-ID portion may be combined with the data ID to obtain the target graph state data. Then, at 1150, the graph state data management apparatus provides the obtained target graph state data to the graph computation engine for use in graph computation.
Optionally, in one embodiment, before providing the obtained graph state data to the graph computation engine, the graph state data management method may further include: filtering the obtained graph state data using a given data filtering policy, for example, using a TTL-based expired-data deletion mechanism to delete expired graph state data from the obtained graph state data. In addition, the obtained graph state data may also be filtered based on other data filtering conditions.
FIG. 14 shows an example block diagram of a graph state data management apparatus 1400 according to an embodiment of this specification.
As shown in FIG. 14, the graph state data management apparatus 1400 may include a graph state data writing component configured to write, in batches, the graph state data acquired in batches from the graph computation engine into the file storage system. In one example, the graph state data writing component may include a graph state data acquisition unit 1401, a first data encoding unit 1402, a data sorting unit 1403, a first data writing unit 1404, a logical address recording unit 1405, and a memory index maintenance unit 1406.
The graph state data acquisition unit 1401 is configured to acquire, in batches, graph state data obtained by the graph computation engine during graph computation, the graph state data including vertex data and/or edge data. For the operation of the graph state data acquisition unit 1401, refer to the operation described above with reference to 210 of FIG. 2.
The first data encoding unit 1402 is configured to encode each piece of the acquired graph state data into kv data. When the graph state data is vertex data, the vertex ID in the vertex data is encoded as the key, and the non-vertex-ID data (for example, vertex metadata and vertex attributes) is encoded as the value. When the graph state data is edge data, the source vertex ID in the edge data is encoded as the key, and the non-source-ID data (for example, the target vertex ID, edge metadata, and edge attributes) is encoded as the value. For the operation of the first data encoding unit 1402, refer to the operation described above with reference to 220 of FIG. 2.
The data sorting unit 1403 is configured to sort the encoded kv data by key to form kv list data, in which each key corresponds to one or more values. For the operation of the data sorting unit 1403, refer to the operation described above with reference to 230 of FIG. 2.
The first data writing unit 1404 is configured to write the values of the kv list data in order into a data file of the file storage system. For the operation of the first data writing unit 1404, refer to the operation described above with reference to 240 of FIG. 2.
The logical address recording unit 1405 is configured to record each key's corresponding logical address in the data file, the logical address including the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file. For the operation of the logical address recording unit 1405, refer to the operation described above with reference to 250 of FIG. 2.
The memory index maintenance unit 1406 is configured to maintain, in the memory of the graph state management device, a memory index of the acquired graph state data, the memory index reflecting the index relationship between keys and their corresponding logical addresses. For the operation of the memory index maintenance unit 1406, refer to the operation described above with reference to 260 of FIG. 2.
In some embodiments, to further reduce the amount of data written to the file storage system, the values of the sorted kv list data may be compressed, and the compressed values written into the file storage system.
FIG. 15 shows an example block diagram of a first data writing unit 1500 according to an embodiment of this specification. The first data writing unit 1500 can compress the values of the kv data and then write them into a data file of the file storage system. As shown in FIG. 15, the first data writing unit 1500 includes a data block building module 1510, a data block compression module 1520, and a data block writing module 1530.
The data block building module 1510 is configured to build the values of the kv list data into multiple ordered data blocks of a first data size. For the operation of the data block building module 1510, refer to the operation described above with reference to 510 of FIG. 5.
The data block compression module 1520 is configured to compress the built ordered data blocks. For the operation of the data block compression module 1520, refer to the operation described above with reference to 520 of FIG. 5.
The data block writing module 1530 is configured to write the compressed ordered data blocks in order into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file. For the operation of the data block writing module 1530, refer to the operation described above with reference to 530 of FIG. 5.
FIG. 16 shows another example block diagram of a graph state data writing component 1600 according to an embodiment of this specification. In the example of FIG. 16, a mutable table and an immutable table are maintained in the memory of the graph state management device. As shown in FIG. 16, the graph state data writing component includes a graph state data acquisition unit 1610, a first data encoding unit 1620, a data sorting unit 1630, a first data writing unit 1640, a logical address recording unit 1650, a memory index maintenance unit 1660, a second data writing unit 1670, a first judgment unit 1680, and a data table conversion unit 1690.
The graph state data acquisition unit 1610 acquires, in batches, graph state data obtained by the graph computation engine during graph computation, the graph state data including vertex data and/or edge data. The first data encoding unit 1620 encodes each piece of the acquired graph state data into kv data. When the graph state data is vertex data, the vertex ID in the vertex data is encoded as the key, and the non-vertex-ID data (for example, vertex metadata and vertex attributes) is encoded as the value. When the graph state data is edge data, the source vertex ID in the edge data is encoded as the key, and the non-source-ID data (for example, the target vertex ID, edge metadata, and edge attributes) is encoded as the value.
The second data writing unit 1670 writes the encoded kv data into a mutable table. After the encoded kv data is written into the mutable table, the first judgment unit 1680 judges whether the data size of the mutable table into which kv data has been written reaches a threshold.
In response to the data size of the mutable table reaching the threshold, the data sorting unit 1630 sorts the kv data written into that mutable table by key to form kv list data. After the sorting of the kv data in the mutable table is completed, the data table conversion unit 1690 converts the sorted mutable table into an immutable table.
The first data writing unit 1640 writes the values of the immutable table in order into a data file of the file storage system, each immutable table corresponding to one data file. The logical address recording unit 1650 records each key's corresponding logical address in the data file, the logical address including the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file.
After the immutable table is written into a data file of the file storage system and each key's corresponding logical address is recorded, the memory index maintenance unit 1660 maintains, in the memory of the graph state management device, a memory index of the acquired graph state data, the memory index reflecting the index relationship between keys and their corresponding logical addresses.
Note that the first data writing unit 1640 in FIG. 16 may likewise be implemented using the first data writing unit 1500 shown in FIG. 15. In that case, the data block building module is configured to build the values in the immutable table into multiple ordered data blocks of the first data size. In addition, in some embodiments, the first data writing unit 1640 and the second data writing unit 1670 may be implemented as the same unit.
The graph state data management apparatus 1400 may include a graph state data reading component configured to, in response to receiving a graph state data read request from the graph computation engine, read the corresponding graph state data and return it to the graph computation engine. As shown in FIG. 14, the graph state data reading component may include a second data encoding unit 1407, a logical address query unit 1408, a data acquisition unit 1409, a data decoding unit 1410, and a data providing unit 1411.
The second data encoding unit 1407 is configured to, in response to receiving a graph state data read request from the graph computation engine, encode the data ID in the read request as a target key. For the operation of the second data encoding unit 1407, refer to the operation described above with reference to 1110 of FIG. 11.
The logical address query unit 1408 is configured to query, based on the target key, the corresponding logical address in the memory index maintained in the memory of the graph state management device. For the operation of the logical address query unit 1408, refer to the operation described above with reference to 1120 of FIG. 11.
The data acquisition unit 1409 is configured to acquire the value corresponding to the target key according to the logical address. For the operation of the data acquisition unit, refer to the operation described above with reference to 1130 of FIG. 11.
The data decoding unit 1410 is configured to decode the acquired value to obtain the target graph state data. For example, the data decoding unit 1410 may decode the acquired value to obtain the non-data-ID portion of the target graph state data. In one example, the data decoding unit 1410 may take the decoded non-data-ID portion as the target graph state data. In another example, the data decoding unit 1410 may combine the decoded non-data-ID portion with the data ID to obtain the target graph state data. For the operation of the data decoding unit 1410, refer to the operation described above with reference to 1140 of FIG. 11.
The data providing unit 1411 is configured to provide the obtained target graph state data to the graph computation engine. For the operation of the data providing unit 1411, refer to the operation described above with reference to 1150 of FIG. 11.
Optionally, in some embodiments, the graph state data management apparatus 1400 may further include a data filtering unit 1412. The data filtering unit 1412 is configured to, before the obtained target graph state data is provided to the graph computation engine, filter the obtained target graph state data using a given data filtering policy.
Optionally, in some embodiments, the graph state data management apparatus 1400 may further include a memory index update judgment unit 1413 and a memory index update unit 1414. The memory index update judgment unit 1413 is configured to judge, after the values of the kv list data are written in order into a data file of the file storage system and each key's corresponding logical address is recorded, whether a memory index update is needed. The memory index update unit 1414 is configured to, in response to a memory index update being needed, perform an incremental logical-address update on the corresponding logical addresses in the memory index using the recorded logical addresses of the keys.
In some embodiments, the memory index update judgment unit 1413 and the memory index update unit 1414 may, together with the graph state data writing component, constitute a graph state data update component. The graph state data update component can update the graph state data stored in the data files of the file storage system. Its update operation may adopt a batch update policy: the incremental graph state data is written to the data files of the file storage system in an append-only manner, and for each key, the initial logical address and the subsequent incremental logical addresses are maintained in the memory index.
In some embodiments, the graph state data management apparatus 1400 may further include a data aggregation unit 1415. In response to a data compaction condition being met, the data aggregation unit 1415 performs data compaction on the values stored in the data files of the file storage system using a given data compaction policy.
FIG. 17 shows an example block diagram of a data acquisition unit 1700 according to an embodiment of this specification. In the example of FIG. 17, a data LRU cache is maintained in the memory of the graph state management device, caching previously acquired values in association with the corresponding logical addresses of their keys. As shown in FIG. 17, the data acquisition unit 1700 includes a data cache judgment module 1710, a data acquisition request initiating module 1720, and a data acquisition module 1730.
The data cache judgment module 1710 is configured to judge, according to the logical address after the corresponding logical address has been found, whether the value corresponding to the target key is cached in the data LRU cache.
If the value corresponding to the target key is cached in the data LRU cache, the data acquisition module 1730 acquires the corresponding value from the data LRU cache.
If no value corresponding to the target key is cached in the data LRU cache, the data acquisition request initiating module 1720 initiates a data acquisition request to the file storage system, the request including the corresponding logical address. The data acquisition module 1730 receives, from the file storage system, the value returned in response to the data acquisition request, the returned value being obtained by the file storage system from its data file according to the corresponding logical address.
In some embodiments, the memory of the graph state management device may not maintain a data LRU cache. In that case, the data cache judgment module 1710 needs to be removed from the data acquisition unit shown in FIG. 17.
FIG. 18 shows another example block diagram of a data acquisition unit 1800 according to an embodiment of this specification. In the example of FIG. 18, the values of the graph state data are built into multiple ordered data blocks of the first data size, compressed, and written into a data file of the file storage system, the data file including the compressed ordered data blocks and a metadata block, where the metadata in the metadata block records the mapping between the first file offset address corresponding to each key and the second file offset address of the compressed ordered data block within the data file. In addition, a data block LRU cache is maintained in the memory of the graph state management device, caching previously acquired data blocks in association with the corresponding logical addresses of their keys.
As shown in FIG. 18, the data acquisition unit 1800 includes a data block cache judgment module 1810, a data block acquisition request initiating module 1820, a data block acquisition module 1830, a data block decompression module 1840, an offset address determination module 1850, and a data acquisition module 1860.
The data block cache judgment module 1810 is configured to judge, according to the logical address after the corresponding logical address has been found, whether a compressed data block corresponding to the target key is cached in the data block LRU cache.
When a compressed data block corresponding to the target key is cached in the data block LRU cache, the data block acquisition module 1830 acquires the corresponding compressed data block from the data block LRU cache.
When no compressed data block corresponding to the target key is cached in the data block LRU cache, the data block acquisition request initiating module 1820 initiates a data block acquisition request to the file storage system, the request including the corresponding logical address. In that case, the data block acquisition module 1830 receives, from the file storage system, the compressed data block returned in response to the data block acquisition request, the returned compressed data block being obtained by the file storage system from its data file according to the first file offset address.
After the compressed data block is obtained as above, the data block decompression module 1840 decompresses the obtained compressed data block. The offset address determination module 1850 determines, based on the first file offset address in the logical address and the prescribed size of a data block (that is, the first data size), the third offset address of the value corresponding to the target key within the decompressed data block.
After the third offset address is obtained as above, the data acquisition module 1860 acquires, according to the third offset address, the value corresponding to the target key from the decompressed data block.
In some embodiments, the memory of the graph state management device may not maintain a data block LRU cache. In that case, the data block cache judgment module 1810 needs to be removed from the data acquisition unit shown in FIG. 18.
The graph state data management method and the graph state data management apparatus according to the embodiments of this specification have been described above with reference to FIGS. 1 to 18. The above graph state data management apparatus may be implemented in hardware, in software, or in a combination of hardware and software.
FIG. 19 shows a schematic diagram of a graph state data management apparatus 1900 implemented based on a computer system according to an embodiment of this specification. As shown in FIG. 19, the graph state data management apparatus 1900 may include at least one processor 1910, a storage (for example, a non-volatile storage) 1920, a memory 1930, and a communication interface 1940, which are connected together via a bus 1960. The at least one processor 1910 executes at least one computer-readable instruction (that is, the above elements implemented in software) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in the storage and, when executed, cause the at least one processor 1910 to: acquire, in batches, graph state data obtained by a graph computation engine during graph computation, the graph state data including vertex data and/or edge data; encode each piece of the acquired graph state data into kv data, where the vertex ID in vertex data and/or the source vertex ID in edge data is encoded as the key, and the non-vertex-ID data in vertex data and/or the non-source-ID data in edge data is encoded as the value; sort the kv data by key to form kv list data, in which each key corresponds to one or more values; write the values of the kv list data in order into a data file of a file storage system and record each key's corresponding logical address in the data file, the recorded logical address including the file ID of the data file into which the key's value is written and the first file offset address of that value within the data file; and maintain, in the memory of the graph state management device, a memory index of the acquired graph state data, the memory index reflecting the index relationship between keys and their corresponding logical addresses.
It should be understood that the computer-executable instructions stored in the storage, when executed, cause the at least one processor 1910 to perform the various operations and functions described above with reference to FIGS. 1 to 18 in the embodiments of this specification.
According to one embodiment, a program product such as a machine-readable medium (for example, a non-transitory machine-readable medium) is provided. The machine-readable medium may have instructions (that is, the above elements implemented in software) which, when executed by a machine, cause the machine to perform the various operations and functions described above with reference to FIGS. 1 to 18 in the embodiments of this specification. Specifically, a system or apparatus equipped with a readable storage medium may be provided, on which software program code implementing the functions of any one of the above embodiments is stored, and the computer or processor of the system or apparatus reads out and executes the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can implement the functions of any one of the above embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute part of the present invention.
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code can be downloaded from a server computer or the cloud via a communication network.
According to one embodiment, a computer program product is provided that includes a computer program which, when executed by a processor, causes the processor to perform the various operations and functions described above with reference to FIGS. 1 to 18 in the embodiments of this specification.
Those skilled in the art should understand that various variations and modifications may be made to the embodiments disclosed above without departing from the essence of the invention. Therefore, the scope of protection of the present invention should be defined by the appended claims.
It should be noted that not all of the steps and units in the above flows and system structure diagrams are necessary; some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed and may be determined as needed. The apparatus structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented by multiple physical entities respectively, or some units may be implemented jointly by certain components in multiple independent devices.
In the above embodiments, hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuitry or logic (such as a dedicated processor, an FPGA, or an ASIC) to perform the corresponding operations. A hardware unit or processor may also include programmable logic or circuitry (such as a general-purpose processor or another programmable processor) that can be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) can be determined based on cost and time considerations.
The detailed description set forth above in connection with the drawings describes exemplary embodiments, but does not represent all embodiments that may be implemented or that fall within the scope of protection of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the technology may be practiced without these specific details. In some instances, well-known structures and apparatuses are shown in block diagram form to avoid obscuring the concepts of the described embodiments.
The above description of the present disclosure is provided to enable any person of ordinary skill in the art to implement or use the present disclosure. Various modifications to the present disclosure will be apparent to those of ordinary skill in the art, and the general principles defined herein may also be applied to other variations without departing from the scope of protection of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (25)

  1. A graph state data management method applied to a graph state management device, comprising:
    acquiring, in batches, graph state data obtained by a graph computing engine during graph computation, the graph state data comprising vertex data and/or edge data;
    encoding each piece of the graph state data into kv data, wherein a vertex ID in the vertex data and/or a source-vertex ID in the edge data is encoded as a key, and non-vertex-ID data in the vertex data and/or non-source-vertex-ID data in the edge data is encoded as a value;
    sorting the kv data by key to form kv list data, in which each key corresponds to one or more values;
    writing the values of the kv list data in order into a data file in a file storage system and recording, for each key, a corresponding logical address in the data file, the logical address comprising a file ID of the data file into which the value corresponding to the key is written and a first file offset of the corresponding value within that data file; and
    maintaining, in a memory of the graph state management device, an in-memory index of the graph state data, the in-memory index reflecting an index relationship between keys and corresponding logical addresses.
  2. The graph state data management method according to claim 1, wherein a mutable data table and an immutable data table are maintained in the memory of the graph state management device,
    before the kv data are sorted by key, the graph state data management method further comprises:
    writing the kv data into a mutable data table; and
    determining whether the data size of the mutable data table into which the kv data are written has reached a threshold,
    sorting the kv data by key comprises:
    in response to the data size of the mutable data table into which the kv data are written reaching the threshold, sorting the kv data written into that mutable data table by key,
    and writing the values of the kv list data in order into a data file in the file storage system comprises:
    converting the sorted mutable data table into an immutable data table; and
    writing the values of the immutable data table in order into a data file in the file storage system, each immutable data table corresponding to one data file.
  3. The graph state data management method according to claim 1, wherein writing the values of the kv list data in order into a data file in the file storage system comprises:
    building the values of the kv list data into multiple ordered data blocks having a first data size;
    compressing the built ordered data blocks; and
    writing the compressed ordered data blocks in order into a data file in the file storage system, the data file comprising the compressed ordered data blocks and a metadata block, the metadata in the metadata block recording a mapping between the first file offset corresponding to a key and a second file offset of the compressed ordered data block within the data file.
  4. The graph state data management method according to claim 1, further comprising:
    in response to receiving a graph state data read request from the graph computing engine, encoding a data ID in the graph state data read request into a target key, the data ID comprising a vertex ID and/or a source-vertex ID of an edge;
    looking up, in the in-memory index, the logical address corresponding to the target key;
    obtaining the value corresponding to the target key according to the logical address;
    decoding the obtained value to obtain target graph state data; and
    providing the obtained target graph state data to the graph computing engine.
  5. The graph state data management method according to claim 4, wherein obtaining the value corresponding to the target key according to the logical address comprises:
    in response to the corresponding logical address being found, initiating a data acquisition request to the file storage system, the data acquisition request comprising the corresponding logical address; and
    receiving, from the file storage system, a value returned in response to the data acquisition request, the returned value being retrieved by the file storage system from a data file of the file storage system according to the corresponding logical address.
  6. The graph state data management method according to claim 5, wherein a data LRU cache is maintained in the memory of the graph state management device, the data LRU cache caching previously obtained values in association with the corresponding logical addresses of their keys,
    and, before the data acquisition request is initiated to the file storage system, obtaining the value corresponding to the target key according to the logical address further comprises:
    determining, according to the logical address, whether a value corresponding to the target key is cached in the data LRU cache; and
    when a value corresponding to the target key is cached in the data LRU cache, obtaining the corresponding value from the data LRU cache.
  7. The graph state data management method according to claim 4, wherein the values of the graph state data are built into multiple ordered data blocks having a first data size and are written, after data compression, into a data file of the file storage system, the data file comprising the compressed ordered data blocks and a metadata block, the metadata in the metadata block recording a mapping between the first file offset corresponding to a key and a second file offset of the compressed ordered data block within the data file,
    and obtaining the value corresponding to the target key according to the logical address comprises:
    in response to the corresponding logical address being found, initiating a data block acquisition request to the file storage system, the data block acquisition request comprising the corresponding logical address;
    receiving, from the file storage system, a compressed data block returned in response to the data block acquisition request, the compressed data block being retrieved by the file storage system from a data file of the file storage system according to the first file offset;
    decompressing the obtained compressed data block;
    determining, based on the first file offset in the logical address and the first data size, a third offset of the value corresponding to the target key within the decompressed data block; and
    obtaining, according to the third offset, the value corresponding to the target key from the decompressed data block.
  8. The graph state data management method according to claim 7, wherein a data block LRU cache is maintained in the memory of the graph state management device, the data block LRU cache caching previously obtained data blocks in association with the corresponding logical addresses of their keys,
    and, before the data block acquisition request is initiated to the file storage system, obtaining the value corresponding to the target key according to the logical address further comprises:
    determining, according to the logical address, whether a compressed data block corresponding to the target key is cached in the data block LRU cache; and
    when a compressed data block corresponding to the target key is cached in the data block LRU cache, obtaining the corresponding compressed data block from the data block LRU cache.
  9. The graph state data management method according to claim 4, wherein before the obtained graph state data are provided to the graph computing engine, the graph state data management method further comprises:
    filtering the obtained graph state data using a given data filtering policy.
  10. The graph state data management method according to claim 1, wherein after the values of the kv list data are written in order into a data file in the file storage system and the corresponding logical address of each key in the data file is recorded, the graph state data management method further comprises:
    determining whether an in-memory index update is required; and
    in response to determining that an in-memory index update is required, performing an incremental logical-address update on the corresponding logical addresses in the in-memory index using the recorded logical addresses of the keys.
  11. The graph state data management method according to claim 1, further comprising:
    in response to a data aggregation condition being satisfied, aggregating the values stored in the data files of the file storage system using a given data aggregation policy.
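The block-oriented layout recited in claims 3 and 7 can be illustrated with a small sketch: values are packed into fixed-size ordered blocks, each block is compressed, and a metadata table maps each block's start position in the uncompressed stream (a first file offset) to the compressed block's position in the data file (a second file offset); on a read, the value's position inside the decompressed block (the third offset) follows from the first file offset and the fixed block size. All names here are illustrative, `zlib` stands in for whichever compression codec an implementation might use, and for simplicity the sketch assumes a value never straddles a block boundary.

```python
import zlib

BLOCK_SIZE = 16  # the "first data size": uncompressed bytes per ordered block

def build_data_file(values):
    """Pack values into fixed-size ordered blocks, compress each block,
    and build metadata mapping first offsets to second offsets."""
    raw = b"".join(values)
    blocks = [raw[i:i + BLOCK_SIZE] for i in range(0, len(raw), BLOCK_SIZE)]
    data_file = bytearray()
    metadata = {}  # first offset of block start -> (second offset, compressed size)
    for n, block in enumerate(blocks):
        compressed = zlib.compress(block)
        metadata[n * BLOCK_SIZE] = (len(data_file), len(compressed))
        data_file += compressed
    return bytes(data_file), metadata

def read_value(data_file, metadata, first_offset, length):
    """Locate the compressed block via the metadata, decompress it, then
    read the value at its third offset inside the decompressed block."""
    block_start = (first_offset // BLOCK_SIZE) * BLOCK_SIZE
    second_offset, size = metadata[block_start]
    block = zlib.decompress(data_file[second_offset:second_offset + size])
    third_offset = first_offset - block_start  # offset within the block
    return block[third_offset:third_offset + length]
```

The design choice behind this layout is that only the containing block, not the whole data file, has to be fetched and decompressed to serve a point read, which is what makes the per-block LRU cache of claim 8 effective.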
  12. A graph state data management apparatus applied to a graph state management device, comprising:
    a graph state data acquisition unit that acquires, in batches, graph state data obtained by a graph computing engine during graph computation, the graph state data comprising vertex data and/or edge data;
    a first data encoding unit that encodes each piece of the graph state data into kv data, wherein a vertex ID in the vertex data and/or a source-vertex ID in the edge data is encoded as a key, and non-vertex-ID data in the vertex data and/or non-source-vertex-ID data in the edge data is encoded as a value;
    a data sorting unit that sorts the kv data by key to form kv list data, in which each key corresponds to one or more values;
    a first data writing unit that writes the values of the kv list data in order into a data file in a file storage system;
    a logical address recording unit that records, for each key, a corresponding logical address in the data file, the logical address comprising a file ID of the data file into which the value corresponding to the key is written and a first file offset of the corresponding value within that data file; and
    an in-memory index maintenance unit that maintains, in a memory of the graph state management device, an in-memory index of the graph state data, the in-memory index reflecting an index relationship between keys and corresponding logical addresses.
  13. The graph state data management apparatus according to claim 12, wherein a mutable data table and an immutable data table are maintained in the memory of the graph state management device,
    the graph state data management apparatus further comprises:
    a second data writing unit that writes the kv data into a mutable data table before the kv data are sorted by key; and
    a first determination unit that determines whether the data size of the mutable data table into which the kv data are written has reached a threshold,
    the data sorting unit, in response to the data size of the mutable data table into which the kv data are written reaching the threshold, sorts the kv data written into that mutable data table by key to form the kv list data,
    the graph state data management apparatus further comprises:
    a data table conversion unit that converts the sorted mutable data table into an immutable data table,
    wherein the first data writing unit writes the values of the immutable data table in order into a data file in the file storage system, each immutable data table corresponding to one data file.
  14. The graph state data management apparatus according to claim 12, wherein the first data writing unit comprises:
    a data block building module that builds the values of the kv list data into multiple ordered data blocks having a first data size;
    a data block compression module that compresses the built ordered data blocks; and
    a data block writing module that writes the compressed ordered data blocks in order into a data file in the file storage system, the data file comprising the compressed ordered data blocks and a metadata block, the metadata in the metadata block recording a mapping between the first file offset corresponding to a key and a second file offset of the compressed ordered data block within the data file.
  15. The graph state data management apparatus according to claim 12, further comprising:
    a second data encoding unit that, in response to receiving a graph state data read request from the graph computing engine, encodes a data ID in the graph state data read request into a target key, the data ID comprising a vertex ID and/or a source-vertex ID of an edge;
    a logical address lookup unit that looks up, in the in-memory index, the logical address corresponding to the target key;
    a data acquisition unit that obtains the value corresponding to the target key according to the logical address;
    a data decoding unit that decodes the obtained value to obtain target graph state data; and
    a data providing unit that provides the obtained target graph state data to the graph computing engine.
  16. The graph state data management apparatus according to claim 15, wherein the data acquisition unit comprises:
    a data acquisition request initiating module that, in response to the corresponding logical address being found, initiates a data acquisition request to the file storage system, the data acquisition request comprising the corresponding logical address; and
    a data acquisition module that receives, from the file storage system, a value returned in response to the data acquisition request, the returned value being retrieved by the file storage system from a data file of the file storage system according to the corresponding logical address.
  17. The graph state data management apparatus according to claim 16, wherein a data LRU cache is maintained in the memory of the graph state management device, the data LRU cache caching previously obtained values in association with the corresponding logical addresses of their keys,
    the data acquisition unit further comprises:
    a data cache determination module that, before the data acquisition request is initiated to the file storage system, determines according to the logical address whether a value corresponding to the target key is cached in the data LRU cache,
    wherein, when a value corresponding to the target key is cached in the data LRU cache, the data acquisition module obtains the corresponding value from the data LRU cache.
  18. The graph state data management apparatus according to claim 15, wherein the values of the graph state data are built into multiple ordered data blocks having a first data size and are written, after data compression, into a data file of the file storage system, the data file comprising the compressed ordered data blocks and a metadata block, the metadata in the metadata block recording a mapping between the first file offset corresponding to a key and a second file offset of the compressed ordered data block within the data file,
    the data acquisition unit comprises:
    a data block acquisition request initiating module that, in response to the corresponding logical address being found, initiates a data block acquisition request to the file storage system, the data block acquisition request comprising the corresponding logical address;
    a data block acquisition module that receives, from the file storage system, a compressed data block returned in response to the data block acquisition request, the compressed data block being retrieved by the file storage system from a data file of the file storage system according to the first file offset;
    a data block decompression module that decompresses the obtained compressed data block;
    an offset determination module that determines, based on the first file offset in the logical address and the first data size, a third offset of the value corresponding to the target key within the decompressed data block; and
    a data acquisition module that obtains, according to the third offset, the value corresponding to the target key from the decompressed data block.
  19. The graph state data management apparatus according to claim 18, wherein a data block LRU cache is maintained in the memory of the graph state management device, the data block LRU cache caching previously obtained data blocks in association with the corresponding logical addresses of their keys,
    the data acquisition unit further comprises:
    a data block cache determination module that, before the data block acquisition request is initiated to the file storage system, determines according to the logical address whether a compressed data block corresponding to the target key is cached in the data block LRU cache,
    wherein, when a compressed data block corresponding to the target key is cached in the data block LRU cache, the data block acquisition module obtains the corresponding compressed data block from the data block LRU cache.
  20. The graph state data management apparatus according to claim 15, further comprising:
    a data filtering unit that, before the obtained graph state data are provided to the graph computing engine, filters the obtained graph state data using a given data filtering policy.
  21. The graph state data management apparatus according to claim 12, further comprising:
    an in-memory index update determination unit that, after the values of the kv list data are written in order into a data file in the file storage system and the corresponding logical address of each key in the data file is recorded, determines whether an in-memory index update is required; and
    an in-memory index update unit that, in response to an in-memory index update being required, performs an incremental logical-address update on the corresponding logical addresses in the in-memory index using the recorded logical addresses of the keys.
  22. The graph state data management apparatus according to claim 12, further comprising:
    a data aggregation unit that, in response to a data aggregation condition being satisfied, aggregates the graph state data stored in the data files of the file storage system using a given data aggregation policy.
  23. A graph state data management apparatus, comprising:
    at least one processor,
    a memory coupled to the at least one processor, and
    a computer program stored in the memory, the at least one processor executing the computer program to implement the graph state data management method according to any one of claims 1 to 11.
  24. A computer-readable storage medium storing executable instructions that, when executed, cause a processor to perform the graph state data management method according to any one of claims 1 to 11.
  25. A computer program product comprising a computer program that is executed by a processor to implement the graph state data management method according to any one of claims 1 to 11.
PCT/CN2022/131007 2021-11-11 2022-11-10 Graph state data management WO2023083234A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111332991.XA CN113806302B (zh) 2021-11-11 2021-11-11 Graph state data management method and apparatus
CN202111332991.X 2021-11-11

Publications (1)

Publication Number Publication Date
WO2023083234A1 (zh)

Family

ID=78898569

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131007 WO2023083234A1 (zh) 2021-11-11 2022-11-10 Graph state data management

Country Status (2)

Country Link
CN (1) CN113806302B (zh)
WO (1) WO2023083234A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806302B (zh) 2021-11-11 2022-02-22 Graph state data management method and apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899156A (zh) * 2015-05-07 2015-09-09 中国科学院信息工程研究所 Graph data storage and query method for large-scale social networks
CN106611037A (zh) * 2016-09-12 2017-05-03 星环信息科技(上海)有限公司 Method and device for distributed graph computation
US20190130004A1 (en) * 2017-10-27 2019-05-02 Streamsimple, Inc. Streaming Microservices for Stream Processing Applications
CN110677461A (zh) * 2019-09-06 2020-01-10 上海交通大学 Graph computation method based on key-value pair storage
CN112507026A (zh) * 2020-12-11 2021-03-16 北京计算机技术及应用研究所 Distributed high-speed storage method based on key-value model, document model, and graph model data
CN113448964A (zh) * 2021-06-29 2021-09-28 四川蜀天梦图数据科技有限公司 Hybrid storage method and device based on graph-KV
CN113806302A (zh) * 2021-11-11 2021-12-17 支付宝(杭州)信息技术有限公司 Graph state data management method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10002154B1 (en) * 2017-08-24 2018-06-19 Illumon Llc Computer data system data source having an update propagation graph with feedback cyclicality
CN109033234B (zh) * 2018-07-04 2021-09-14 中国科学院软件研究所 Streaming graph computation method and system based on state update propagation
CN110427359A (zh) * 2019-06-27 2019-11-08 苏州浪潮智能科技有限公司 Graph data processing method and device
CN113609257B (zh) * 2021-08-09 2024-03-22 神州数码融信软件有限公司 Method for constructing an elastic framework for a financial knowledge graph


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955363A (zh) * 2023-09-21 2023-10-27 北京四维纵横数据技术有限公司 Method, apparatus, computer device and medium for creating indexes for schema-free data
CN116955363B (zh) * 2023-09-21 2023-12-26 北京四维纵横数据技术有限公司 Method, apparatus, computer device and medium for creating indexes for schema-free data

Also Published As

Publication number Publication date
CN113806302A (zh) 2021-12-17
CN113806302B (zh) 2022-02-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892028

Country of ref document: EP

Kind code of ref document: A1