WO2023165196A1 - Log storage acceleration method and apparatus, electronic device, and non-volatile readable storage medium - Google Patents

Log storage acceleration method and apparatus, electronic device, and non-volatile readable storage medium Download PDF

Info

Publication number
WO2023165196A1
WO2023165196A1 (PCT/CN2022/135984)
Authority
WO
WIPO (PCT)
Prior art keywords
write, written, block, write operation, file system
Prior art date
Application number
PCT/CN2022/135984
Other languages
English (en)
French (fr)
Inventor
臧林劼
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023165196A1

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/13: File access structures, e.g. distributed indices
    • G06F 16/17: Details of further file system functions
    • G06F 16/172: Caching, prefetching or hoarding of files
    • G06F 16/18: File system types
    • G06F 16/182: Distributed file systems
    • G06F 16/901: Indexing; Data structures therefor; Storage structures
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present application relate to the technical field of distributed storage, and in particular to a log storage acceleration method, device, electronic device, and non-volatile readable storage medium.
  • NVMe SSD: NVMe solid-state drive
  • IO: Input/Output
  • the purpose of the embodiments of the present application is to provide a log storage acceleration method, apparatus, device, and medium, which can accelerate log storage and improve storage IO performance. The detailed solution is as follows:
  • the embodiment of the present application discloses a log storage acceleration method applied to a distributed storage system, including:
  • the aforementioned small block write operation is constructed based on the aforementioned object to be written and the aforementioned object placement group, including:
  • a small-block write operation that sequentially includes the above-mentioned object placement group identifier, the above-mentioned object identifier, the above-mentioned target operation sequence number, and the above-mentioned data to be written is constructed in the form of a quadruple.
  • the above-mentioned small block write operation is written into a hash-based multi-linked list data structure through the above-mentioned log file system, so that the above-mentioned small block write operations are merged to obtain a large block sequential write operation, and the above-mentioned large block sequential write operation is flushed to the write-back queue, including:
  • if the above-mentioned target slot is not found, the above-mentioned small-block write operation is directly flushed to the above-mentioned write-back queue; if the above-mentioned target slot is found, the above-mentioned small-block write operation is mapped to the above-mentioned target slot, and the above-mentioned object placement group identifier in the above-mentioned small block write operation is used to search for the target block in the target linked list corresponding to the above-mentioned target slot;
  • if the above-mentioned target block is not found, the above-mentioned small-block write operation is directly flushed to the above-mentioned write-back queue; if the above-mentioned target block is found, the above-mentioned small-block write operation is merged into the above-mentioned target block by appending the write data so as to obtain a large block sequential write operation, which is then flushed to the above-mentioned write-back queue.
  • the above-mentioned sequential write operation of large blocks in the above-mentioned write-back queue is written back to the back-end file system for storage, including:
  • before checking the sequence numbers of the operations to be checked stored in the preset linked list against the sequence number of the operation to be written back stored in the preset check record unit, the method further includes:
  • the above-mentioned small block write operation is written into a hash-based multi-linked list data structure through the above-mentioned log file system, including:
  • the above-mentioned small-block write operation is written into the hash-based multi-link list data structure through the above-mentioned log file system.
  • the embodiment of the present application discloses a log storage acceleration device, which is applied to a distributed storage system, including:
  • the small block write operation construction module is set to divide the file to be written into multiple objects to be written, store the above-mentioned objects to be written into object placement groups respectively, and then construct corresponding small block write operations based on the above-mentioned objects to be written and the above-mentioned object placement groups;
  • the small-block write operation merging module is configured to submit the above-mentioned small-block write operations to the log file system through the log queue, and write the above-mentioned small-block write operations into the hash-based multi-linked list data structure through the above-mentioned log file system, so that the above-mentioned small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue;
  • the large-block sequential write operation storage module is configured to write back the above-mentioned large-block sequential write operations in the above-mentioned write-back queue to the back-end file system for storage.
  • the embodiment of the present application discloses an electronic device, including a processor and a memory; wherein, when the processor executes the computer program stored in the memory, the log storage acceleration method disclosed above is realized.
  • the embodiment of the present application discloses a computer non-volatile readable storage medium configured to store a computer program; wherein, when the computer program is executed by a processor, the aforementioned disclosed log storage acceleration method is implemented.
  • the file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and then corresponding small block write operations are constructed based on the objects to be written and the object placement groups; the above small block write operations are submitted to the log file system through the log queue and written into the hash-based multi-linked list data structure through the above log file system, so that the above small block write operations are merged to obtain large block sequential write operations, which are flushed to the write-back queue; the above large block sequential write operations in the above write-back queue are written back to the back-end file system for storage.
  • the small block write operations are merged into large block sequential write operations, and flushing small block write operations is replaced by flushing large block sequential write operations, so as to accelerate log storage and improve storage IO performance.
  • FIG. 1 is a flow chart of a log storage acceleration method provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of an existing distributed storage file system access architecture
  • FIG. 3 is a schematic diagram of data stored in an existing distributed storage cluster
  • FIG. 4 is a schematic diagram of a log storage acceleration method provided by an embodiment of the present application.
  • FIG. 5 is a flow chart of an optional log storage acceleration method provided by the embodiment of the present application.
  • Figure 6 is a hash-based multi-linked list data structure provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of a log storage acceleration device provided by an embodiment of the present application.
  • FIG. 8 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application provides a log storage acceleration solution, which can increase the log storage speed and improve the storage IO performance.
  • the embodiment of the present application discloses a method for accelerating log storage, which is applied to a distributed storage system, including:
  • Step S11: Divide the file to be written into multiple objects to be written, store the objects to be written into object placement groups respectively, and then construct corresponding small block write operations based on the objects to be written and the object placement groups.
  • a distributed storage system is used, and the data storage back-end OSD (Object Storage Device) process adopts a log file system mechanism. As shown in Figure 2, it provides unified, self-managing, and scalable distributed storage, with three protocol access interfaces: object storage (Object Storage), block storage (Block Storage), and file system storage (File System Storage).
  • OSD: Object Storage Device
  • the distributed cluster corresponds to the object gateway (RadosGW S3/Swift, where Rados is Reliable, Autonomic Distributed Object Store and GW is gateway) service, the block (RBD, Rados block storage) service, and the file system (LibFS) service; Rados provides unified, self-managing, scalable distributed storage.
  • DRAM Cache: dynamic memory cache; DRAM: dynamic random access memory; cache: high-speed cache
  • the file system also requires an MDS metadata cluster (also called a metadata service cluster) and a MON (monitor, monitoring service) cluster monitoring process to maintain the cluster state; data is stored in storage pools and mapped to back-end storage through PGs (Placement Groups), in order to better allocate and locate data, including object storage units used for storing data.
  • MDS: metadata cluster (also called metadata service cluster)
  • MON: monitor, monitoring service
  • PG: Placement Groups
  • HDD OSD refers to the OSD back-end file system located on an HDD
  • SSD refers to a solid-state drive
  • when a write operation is performed, it is first written to an interface (a Rados file system interface) that converts file writes into object writes; therefore, the file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and then corresponding small block write operations are constructed based on the objects to be written and the object placement groups.
  • interface: a Rados file system interface
  • FileStore represents file-system- and journal-backed storage.
  • FileStore is often used as the back-end storage engine of the distributed storage system.
  • FileStore implements the Object Store API (Application Programming Interface) on top of the file system's POSIX (Portable Operating System Interface) interface; each Object is regarded as a file at the FileStore layer, and the attributes (xattr) of an Object are accessed through the file's xattr attributes. Because some file systems (such as Ext4) limit the length of xattrs, Metadata (metadata) exceeding the length limit is stored in DBObjectMap (a database object mapping table structure), where DBObjectMap, a part of FileStore, encapsulates a series of APIs for KeyValue (keys and values stored in the database) database operations, and the KV (Key value, key-value pair) relations of Objects are implemented directly with DBObjectMap.
  • the journal mechanism turns one write request on the OSD side of the distributed storage system (the process that responds to client requests and returns data) into two write operations (a synchronous write to the Journal and an asynchronous write to the Object);
  • the SSD is used as a Journal log to decouple the interaction between the Journal log and object write operations; each Object written corresponds to a physical file in the OSD local file system.
  • the OSD side cannot cache the metadata of all local files.
  • read and write operations may therefore require multiple local IOs, resulting in a decrease in storage system performance.
  • Step S12: Submit the small block write operation to the log file system through the log queue, and write the small block write operation into the hash-based multi-linked list data structure through the log file system, so that the small block write operations are merged to obtain large block sequential write operations, which are flushed to the write-back queue.
  • a new memory-accelerated merged journal log architecture is designed, and a hash-based multi-linked list data structure introduced in memory realizes journal log merging.
  • Nvme SSD is used as the storage medium of the journal log file system.
  • each write transaction is first submitted (committed) to the journal log file system through the log queue, and then the write operations are flushed in batches to the write-back queue.
  • the fsync function is used for flushing.
  • the fsync function is used to synchronize all modified file data in the memory to the storage device.
  • the writing process of the combined journal log mechanism is different from the writing process in the traditional technology.
  • the small block write operation is submitted to the log file system through the log queue, and the small block write operation is written into the hash-based multi-linked list data structure through the log file system, so as to The small-block write operations are merged to obtain large-block sequential write operations, and the large-block sequential write operations are flushed to the write-back queue.
  • the log file system is located in Nvme SSD.
  • the operation of flushing data from the journal log to the HDD disk is mainly divided into two stages.
  • the first stage writes each random small block write operation into the hash-based multi-linked list data structure; the second stage flushes the merged random small block write operations to the write-back queue, that is, large block sequential write operations are flushed to the write-back queue.
  • the embodiment of the present application makes full use of the high-speed storage medium NVMe SSD, accelerates the IO performance of the log file system through the journal memory consolidation mechanism, and then improves the IO performance of distributed storage data.
  • the embodiment of the present application not only optimizes the first (commit) stage of the journal log mechanism, but also optimizes the second-stage write back (Write Back) to back-end persistent storage, effectively mitigating the technical problems of jitter and instability in the data persistence performance of the distributed storage back end.
  • Step S13: Write the large block sequential write operations in the write-back queue back to the back-end file system for storage.
  • the embodiment of the present application has designed a record module, which is used to record the write operations that have been successfully written into the HDD back-end file system.
  • the merged data can be managed by flashing to the HDD, so as to improve data durability and stability.
  • the embodiment of the present application improves the metadata index performance of write requests, improves the fsync flushing performance when objects are opened and closed, and reduces the number of write addressing operations and object open/close operations, thereby improving write-back (WriteBack) efficiency; a new data flushing scheme is designed to make full use of the performance advantages of the merged journal while preventing the journal log queue from growing too long.
  • the embodiment of the present application designs a safety check mechanism to guarantee the durability of journal log data.
  • the file to be written is divided into multiple objects to be written, and the objects to be written are respectively stored in the object placement group, and then based on the object to be written and the object placement group Construct the corresponding small block write operation; submit the small block write operation to the log file system through the log queue, and write the small block write operation into the hash-based multi-link list data structure through the log file system, In order to merge the small-block write operations to obtain large-block sequential write operations, and write the large-block sequential write operations to the write-back queue; write back the large-block sequential write operations in the write-back queue to the backend file system for saving.
  • the small block write operations are merged into large block sequential write operations, and flushing small block write operations is replaced by flushing large block sequential write operations, so as to accelerate log storage and improve storage IO performance.
  • an optional log storage acceleration method applied to a distributed storage system including:
  • Step S21: Divide the file to be written into multiple objects to be written, store the objects to be written into object placement groups respectively, acquire the to-be-written data identifiers corresponding to the objects to be written, set an object placement group identifier for the object placement group and an object identifier for the object to be written, and then set the target operation sequence number of the current small block write operation according to the preset operation order; construct, in the form of a quadruple, a small block write operation containing, in order, the object placement group identifier, the object identifier, the target operation sequence number, and the data to be written.
  • the data to be written corresponding to the objects to be written is acquired, an object placement group identifier is set for the object placement group and an object identifier is set for the object to be written, and then the target operation sequence number of the current small block write operation is set according to the preset operation order; the object placement group identifier can be denoted cid, the identifier of the data to be written can be denoted oid, the target operation sequence number can be denoted sn, and the data to be written can be denoted data;
  • Step S22: Submit the small block write operation to the log file system through the log queue, and write the small block write operation into a hash-based multi-linked list data structure through the log file system, so that the small block write operations are merged to obtain large block sequential write operations, which are flushed to the write-back queue.
  • the hash-based multi-linked list data structure is initialized in memory, and includes a combination of N slots and N linked lists, wherein each slot serves as a starting pointer of the linked list.
  • the embodiment of the present application writes the small-block write operations into the hash-based multi-linked list data structure through the log file system based on a multi-threaded writing mode, which increases speed.
  • the hash table uses the identifier of the data to be written as the Key (keyword), and uses open addressing to resolve Hash (hash) conflicts, where a hash conflict means that different keywords may obtain the same hash address, i.e., key1 ≠ key2 but f(key1) = f(key2).
  • in open addressing, all elements are stored in the hash table itself.
  • when a conflict occurs, the next candidate position is calculated by a probe function, and probing continues until an empty slot is found to store the element to be inserted.
  • open addressing means that besides the address obtained by the hash function, other addresses are also usable when a conflict occurs.
  • with this method, each oid value is mapped into a different slot as long as the hash table has an empty slot.
  • in the hash-based multi-linked list data structure, each linked list contains M blocks, the size of each block equals the size of an object specified by the file system, blocks at the same position in the linked lists are associated with the same cid, and the cid values corresponding to the blocks are assigned to the most frequently used blocks and updated after the entire flush operation is triggered.
  • the memory consumption of the hash-based multi-linked list data structure is determined by the parameters M and N and the object size; therefore, the memory usage is controllable by choosing appropriate values of M and N.
  • the small block write operation is written into a hash-based multi-linked list data structure through the log file system, so that the small block write operations are merged to obtain a large block sequential write operation.
  • the process of flushing the large block sequential write operation to the write-back queue is: based on the open addressing method, the log file system uses the object identifier in the small block write operation to search for a target slot in the hash-based multi-linked list data structure; if the target slot is not found, the small block write operation is flushed directly to the write-back queue; if the target slot is found, the small block write operation is mapped to the target slot, and the object placement group identifier in the small block write operation is used to search for a target block in the target linked list corresponding to the target slot; if the target block is not found, the small block write operation is flushed directly to the write-back queue; if the target block is found, the small block write operation is merged into the target block by appending the write data, and the resulting large block sequential write operation is then flushed to the write-back queue.
  • a write operation [cid, oid, sn, data] reaches the hash-based multi-linked list data structure in the first stage of flushing.
  • based on oid, the writer thread will try to map it to some slot of the hash table. If unsuccessful (i.e., there is no empty slot in the hash table and its oid differs from every existing slot), the operation is immediately flushed to the write-back queue. If successful, the writer thread checks whether a block associated with cid exists in the corresponding linked list. If there is no such block, the write operation is flushed directly to the write-back queue. Otherwise, it is merged into one of the M blocks by appending the write data.
  • the embodiment of the present application improves the metadata index performance of write-back requests.
  • the number of file objects is reduced.
  • the data sync flushing performance when objects are opened and closed is improved, and the number of write addressing operations and object open/close operations is reduced, thereby improving write back (Write Back) efficiency.
  • the write operations flushed to the write-back queue include the large-block sequential write operations and the small-block write operations; afterwards, the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue need to be written back to the back-end file system and stored according to the write-back order.
  • a check record unit exists in the log file, i.e., the record module in Figure 4, hereinafter called the checkpoint; it is updated periodically and records the first write operation that had not yet been written back to the file system at the last checkpoint.
  • in a traditional journaling file system, write operations are written back to the file system in the same order in which they are appended to the journal file; therefore, a checkpoint only needs to record the sn of the last write operation successfully written back to the file system.
  • the write operations in the log file may be out of sequence.
  • this embodiment of the present application records the sns of all write operations successfully written back since the last checkpoint.
  • a linked list is used to record the sns: for each new write operation that is successfully written back, its sn is inserted into the preset linked list, so that all sns in the preset linked list are sorted according to the order of these write operations in the journal.
  • the small-block write operations corresponding to the large-block sequential write operations stored in the back-end file system and the small-block write operations directly flushed to the write-back queue are determined as target write operations; the target operation sequence numbers corresponding to the target write operations are determined as to-be-checked operation sequence numbers and stored in the preset linked list according to the write-back order; the to-be-checked operation sequence numbers stored in the preset linked list are checked against the to-be-written-back operation sequence number stored in the preset check record unit, so that the to-be-checked operation sequence numbers in the preset linked list are sorted according to the preset operation order.
  • the checkpoint process is performed as follows: compare the sn value of the write operation at the checkpoint with the sn value of the first node in the preset linked list; if they are equal, move the checkpoint back by one write operation, delete the first node of the preset linked list, and repeat this step; otherwise, terminate the process.
  • the checkpoint only needs to record the sn of the last write operation successfully written back to the file system, i.e., the target operation sequence number. Therefore, according to the preset operation order, the target operation sequence number corresponding to the first small block write operation that has not been written back to the back-end file system is determined as the to-be-written-back operation sequence number, which is stored in the preset check record unit.
  • Step S23: Write the large block sequential write operations in the write-back queue back to the back-end file system for storage.
  • Nvme SSD is used as the storage medium of the Journal log file system, which solves the problem of performance jitter caused by randomly writing a large number of small files in distributed storage.
  • the embodiment of the present application proposes a memory-merging journal mechanism, a memory acceleration architecture with controllable memory usage.
  • the memory-merging journal mechanism introduces a data structure in memory to merge random writes of small files, while preventing the journal log and the record-unit log from growing and consuming resources, and adopts a new checkpoint logging process to maintain data persistence.
  • IOPS: Input/Output Operations Per Second, the number of read and write operations per second
  • compared with the prior art, the embodiment of the present application has stable performance and data reliability in terms of both IOPS and write latency when small files are randomly written in large quantities.
  • the embodiment of the present application has the following advantages. Performance: compared with the traditional log file system, the overall IO performance (IOPS) for massive small files in a distributed storage system is significantly improved. Stability: IO performance remains relatively stable as stored data accumulates over time. High durability: once a write transaction is successfully committed to the log, it is permanently saved. Low cost: the additional resource consumption generated by the embodiment of the present application remains at a relatively low level. Good compatibility: the technology of the embodiment of the present application can be integrated into existing log file systems.
  • the file to be written is divided into multiple objects to be written, and the objects to be written are respectively stored in the object placement group, and then based on the object to be written and the object placement group Construct the corresponding small block write operation; submit the small block write operation to the log file system through the log queue, and write the small block write operation into the hash-based multi-link list data structure through the log file system, In order to merge the small-block write operations to obtain large-block sequential write operations, and write the large-block sequential write operations to the write-back queue; write back the large-block sequential write operations in the write-back queue to the backend file system for saving.
  • the small block write operations are merged into large block sequential write operations, and flushing small block write operations is replaced by flushing large block sequential write operations, so as to accelerate log storage and improve storage performance.
  • a log storage acceleration device including:
  • the small block write operation construction module 11 is configured to divide the file to be written into a plurality of objects to be written, and store the objects to be written into object placement groups respectively, and then based on the objects to be written and The object placement group constructs corresponding small block write operations;
  • the small block write operation merging module 12 is configured to send and submit the small block write operation to the log file system through the log queue, and write the small block write operation into the hash-based multi-link list through the log file system In the data structure, in order to merge described small block write operation and obtain large block sequential write operation, and described large block sequential write operation is brushed down to the write-back queue;
  • the large-block sequential write operation storage module 13 is configured to write back the large-block sequential write operation in the write-back queue to the back-end file system for storage.
  • the file to be written is divided into multiple objects to be written, and the objects to be written are respectively stored in the object placement group, and then based on the object to be written and the object placement group Construct the corresponding small block write operation; submit the small block write operation to the log file system through the log queue, and write the small block write operation into the hash-based multi-link list data structure through the log file system, In order to merge the small-block write operations to obtain large-block sequential write operations, and write the large-block sequential write operations to the write-back queue; write back the large-block sequential write operations in the write-back queue to the backend file system for saving.
  • the small block write operations are merged into large block sequential write operations, and flushing small block write operations is replaced by flushing large block sequential write operations, so as to accelerate log storage and improve storage performance.
  • FIG. 8 is a structural diagram of an electronic device 20 shown according to an exemplary embodiment; the content in the figure should not be regarded as any limitation on the scope of use of the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device 20 provided in an embodiment of the present application.
  • the electronic device 20 includes: at least one processor 21 , at least one memory 22 , a power supply 23 , an input/output interface 24 , a communication interface 25 and a communication bus 26 .
  • the memory 22 is set to store a computer program, and the computer program is loaded and executed by the processor 21, so as to implement the relevant steps of the log storage acceleration method disclosed in any of the foregoing embodiments.
  • the power supply 23 is set to provide working voltage for each hardware device on the electronic device 20;
  • the communication interface 25 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of the embodiment of the present application, which is not limited here;
  • the input/output interface 24 is configured to acquire external input data or output data to the outside, and its detailed interface type can be selected according to actual application needs, which is not limited here.
  • the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the memory 22 can include a random access memory serving as running memory and a non-volatile storage device serving as external storage.
  • the storage resources on it include an operating system 221, a computer program 222, etc., and the storage manner may be transient or permanent.
  • the operating system 221 is set to manage and control each hardware device and computer program 222 on the electronic device 20 on the source host, and the operating system 221 can be Windows (Microsoft Windows operating system), Unix, Linux (GNU/Linux) and the like.
  • the computer program 222 may not only include a computer program configured to complete the method for accelerating log storage performed by the electronic device 20 disclosed in any of the foregoing embodiments, but may also include a computer program configured to complete other specific tasks.
  • the input/output interface 24 may include, but not limited to, a USB interface, a hard disk reading interface, a serial interface, a voice input interface, a fingerprint input interface, and the like.
  • the embodiment of the present application also discloses a computer non-volatile readable storage medium configured to store a computer program; wherein, when the computer program is executed by a processor, the aforementioned disclosed log storage acceleration method is implemented.
  • the computer non-volatile readable storage medium mentioned here includes random access memory (Random Access Memory, RAM), internal memory, read-only memory (Read-Only Memory, ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a magnetic disk, an optical disk, or any other form of storage medium known in the technical field.
  • RAM: Random Access Memory
  • ROM: Read-Only Memory
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for relevant details, please refer to the description of the method part.
  • a software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM (compact disc read-only memory), or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A log storage acceleration method, applied to a distributed storage system, the method including: dividing a file to be written into multiple objects to be written, storing the objects to be written into object placement groups respectively, and then constructing corresponding small-block write operations based on the objects to be written and the object placement groups (S11); submitting the small-block write operations to a log file system through a log queue, and writing the small-block write operations into a hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to a write-back queue (S12); and writing the large-block sequential write operations in the write-back queue back to a back-end file system for storage (S13). Thus, by using the hash-based multi-linked-list data structure to merge small-block write operations into large-block sequential write operations, and by flushing large-block sequential write operations instead of small-block write operations, log storage is accelerated and storage performance is improved.

Description

Log storage acceleration method and apparatus, electronic device, and non-volatile readable storage medium
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application filed with the China Patent Office on March 2, 2022, with application No. 202210195258.6 and entitled "Log storage acceleration method, apparatus, device and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of distributed storage, and in particular to a log storage acceleration method and apparatus, an electronic device, and a non-volatile readable storage medium.
Background
At present, many file systems, whether local file systems such as EXT (Extended file system) 3/4 or distributed object storage systems, adopt a mechanism of writing a journal first in order to guarantee data consistency and durability in the event of a system crash or power failure: each write transaction is first committed to an append-only log and then written back to the back-end file system. When the system crashes or loses power, a recovery process scans the journal and replays the write transactions that have not yet completed successfully. Earlier, journaling file systems mainly used hard disk drives (HDD, Hard Disk Drive) as the underlying storage device for logs and data. With continuous technical innovation and the development of the non-volatile memory express protocol interface Nvme (non-volatile memory-express), NVMe SSDs (NVMe solid-state drives) have attracted wide attention from researchers in academia and industry. NVMe SSDs are several orders of magnitude faster than HDDs in storage performance. However, the IO (Input/Output) storage performance requirements of current journaling file systems still call for continuous performance optimization.
In the prior art, many journaling file systems use non-volatile memory devices, i.e., Nvme SSDs, as the log storage device to improve storage IO performance. However, in massive small-file IO scenarios, severe storage IO jitter occurs, because writing massive small-file data blocks back to the back-end file system (XFS, extended file system) on the persistent disk drive is much slower than writing the journal, and NVMe SSD utilization is extremely low. Meanwhile, when small files are written back to the HDD for persistent storage, i.e., when the write-back queue fills up and blocks, the log queue sits idle and the performance advantage of the SSD (solid-state drive) cannot be exploited.
In summary, how to accelerate log storage and improve storage IO performance is a problem to be urgently solved at present.
Summary
In view of this, the purpose of the embodiments of the present application is to provide a log storage acceleration method, apparatus, device, and medium that can accelerate log storage and improve storage IO performance. The detailed solution is as follows:
In a first aspect, an embodiment of the present application discloses a log storage acceleration method, applied to a distributed storage system, including:
dividing a file to be written into multiple objects to be written, storing the objects to be written into object placement groups respectively, and then constructing corresponding small-block write operations based on the objects to be written and the object placement groups;
submitting the small-block write operations to a log file system through a log queue, and writing the small-block write operations into a hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to a write-back queue;
writing the large-block sequential write operations in the write-back queue back to a back-end file system for storage.
Optionally, constructing the corresponding small-block write operations based on the objects to be written and the object placement groups includes:
acquiring the data to be written corresponding to the object to be written, setting an object placement group identifier for the object placement group and an object identifier for the object to be written, and then setting a target operation sequence number of the current small-block write operation according to a preset operation order;
constructing, in the form of a quadruple, a small-block write operation containing, in order, the object placement group identifier, the object identifier, the target operation sequence number, and the data to be written.
Optionally, writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue, includes:
based on an open addressing method, searching for a target slot in the hash-based multi-linked-list data structure through the log file system using the object identifier in the small-block write operation;
if the target slot is not found, flushing the small-block write operation directly to the write-back queue; if the target slot is found, mapping the small-block write operation to the target slot, and searching for a target block in the target linked list corresponding to the target slot using the object placement group identifier in the small-block write operation;
if the target block is not found, flushing the small-block write operation directly to the write-back queue; if the target block is found, merging the small-block write operation into the target block by appending the write data, so as to obtain a large-block sequential write operation, and then flushing the large-block sequential write operation to the write-back queue.
Optionally, writing the large-block sequential write operations in the write-back queue back to the back-end file system for storage includes:
writing the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue back to the back-end file system, and storing them according to the write-back order.
Optionally, after writing the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue back to the back-end file system and storing them according to the write-back order, the method further includes:
determining, as target write operations, the small-block write operations corresponding to the large-block sequential write operations stored in the back-end file system and the small-block write operations flushed directly to the write-back queue;
determining the target operation sequence numbers corresponding to the target write operations as to-be-checked operation sequence numbers, and storing the to-be-checked operation sequence numbers into a preset linked list according to the write-back order;
checking the to-be-checked operation sequence numbers stored in the preset linked list against the to-be-written-back operation sequence number stored in a preset check record unit, so as to sort the to-be-checked operation sequence numbers in the preset linked list according to the preset operation order.
Optionally, before checking the to-be-checked operation sequence numbers stored in the preset linked list against the to-be-written-back operation sequence number stored in the preset check record unit, the method further includes:
determining, according to the preset operation order, the target operation sequence number corresponding to the first small-block write operation that has not been written back to the back-end file system as the to-be-written-back operation sequence number, and storing the to-be-written-back operation sequence number into the preset check record unit.
Optionally, writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system includes:
writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system based on a multi-threaded writing mode.
In a second aspect, an embodiment of the present application discloses a log storage acceleration apparatus, applied to a distributed storage system, including:
a small-block write operation construction module, configured to divide a file to be written into multiple objects to be written, store the objects to be written into object placement groups respectively, and then construct corresponding small-block write operations based on the objects to be written and the object placement groups;
a small-block write operation merging module, configured to submit the small-block write operations to a log file system through a log queue, and write the small-block write operations into a hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to a write-back queue;
a large-block sequential write operation storage module, configured to write the large-block sequential write operations in the write-back queue back to a back-end file system for storage.
In a third aspect, an embodiment of the present application discloses an electronic device, including a processor and a memory, wherein the processor implements the log storage acceleration method disclosed above when executing a computer program stored in the memory.
In a fourth aspect, an embodiment of the present application discloses a computer non-volatile readable storage medium, configured to store a computer program, wherein the computer program implements the log storage acceleration method disclosed above when executed by a processor.
It can be seen that in the embodiments of the present application, a file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and corresponding small-block write operations are constructed based on the objects to be written and the object placement groups; the small-block write operations are submitted to the log file system through the log queue and written into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue; and the large-block sequential write operations in the write-back queue are written back to the back-end file system for storage. Thus, by using the hash-based multi-linked-list data structure to merge small-block write operations into large-block sequential write operations, and by flushing large-block sequential write operations instead of small-block write operations, log storage is accelerated and storage IO performance is improved.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flowchart of a log storage acceleration method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an existing distributed storage file system access architecture;
FIG. 3 is a schematic diagram of data storage in an existing distributed storage cluster;
FIG. 4 is a schematic diagram of a log storage acceleration method provided by an embodiment of the present application;
FIG. 5 is a flowchart of an optional log storage acceleration method provided by an embodiment of the present application;
FIG. 6 shows a hash-based multi-linked-list data structure provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a log storage acceleration apparatus provided by an embodiment of the present application;
FIG. 8 is a structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only part of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the embodiments of the present application.
At present, in massive small-file IO scenarios, severe storage IO jitter occurs, because writing massive small-file data blocks back to the back-end file system (XFS) on the persistent disk drive is much slower than writing the journal, and NVMe SSD utilization is extremely low; meanwhile, when small files are written back to the HDD for persistent storage, i.e., when the write-back queue fills up and blocks, the log queue sits idle and the performance advantage of the SSD (solid-state drive) cannot be exploited. To overcome the above problems, the embodiments of the present application provide a log storage acceleration solution that can accelerate log storage and improve storage IO performance.
Referring to FIG. 1, an embodiment of the present application discloses a log storage acceleration method, applied to a distributed storage system, including:
Step S11: dividing a file to be written into multiple objects to be written, storing the objects to be written into object placement groups respectively, and then constructing corresponding small-block write operations based on the objects to be written and the object placement groups.
In the embodiments of the present application, a distributed storage system is used, and the data storage back-end OSD (Object Storage Device) process adopts a log file system mechanism. As shown in FIG. 2, a unified, self-managing, and scalable distributed storage is provided, with three protocol access interfaces: object storage (Object Storage), block storage (Block Storage), and file system storage (File System Storage), which can interact with the back end through an underlying dynamic library. The distributed cluster corresponds to the object gateway (RadosGW S3/Swift, where Rados is Reliable, Autonomic Distributed Object Store and GW is gateway) service, the block (RBD, Rados block storage) service, and the file system (LibFS) service; Rados provides unified, self-managing, scalable distributed storage. DRAM Cache is a dynamic memory cache, where DRAM (dynamic random access memory) is dynamic random access memory and cache is a high-speed cache. The file system also requires an MDS metadata cluster (also called a metadata service cluster) and a MON (monitor, monitoring service) cluster monitoring process to maintain the cluster state; data is stored in storage pools and mapped to back-end storage through PGs (Placement Groups), in order to better allocate and locate data, including object storage units used for storing data. In addition, HDD OSD denotes the OSD back-end file system located on an HDD, and SSD denotes a solid-state drive. The embodiments of the present application particularly point out that in the distributed file system, each file is divided into objects in several directories, where a directory also identifies an object placement group. When a write operation is performed, it is first written to an interface (a Rados file system interface) that converts the file write into object writes; therefore, the file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and then corresponding small-block write operations are constructed based on the objects to be written and the object placement groups.
It should be noted that FileStore represents file-system- and journal-backed storage. In a distributed storage system, FileStore is often used as the back-end storage engine; FileStore implements the Object Store API (Application Programming Interface) on top of the file system's POSIX (Portable Operating System Interface) interface. Each Object is regarded as a file at the FileStore layer, and the attributes (xattr) of an Object are accessed through the file's xattr attributes; because some file systems (such as Ext4) limit the length of xattrs, Metadata (metadata) exceeding the length limit is stored in DBObjectMap (a database object mapping table structure), where DBObjectMap, a part of FileStore, encapsulates a series of APIs for KeyValue (keys and values stored in the database) database operations, and the KV (Key value, key-value pair) relations of Objects are implemented directly with DBObjectMap. However, FileStore has some problems. For example, the Journal mechanism turns one write request on the OSD side of the distributed storage system (the process that responds to client requests and returns data) into two write operations (a synchronous write to the Journal and an asynchronous write to the Object); an SSD is used for the Journal to decouple the mutual influence of Journal writes and object writes; and each written Object corresponds one-to-one to a physical file in the OSD's local file system, so in scenarios with a large number of small Objects, the OSD side cannot cache the metadata of all local files, and read/write operations may require multiple local IOs, degrading storage system performance.
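The xattr-spill arrangement described above can be pictured with a minimal Python sketch; the 255-byte limit, the class name ObjectStoreSketch, and the dict-backed map standing in for DBObjectMap are illustrative assumptions rather than the actual FileStore implementation:

    XATTR_LIMIT = 255  # hypothetical per-attribute length limit (Ext4-like)

    class ObjectStoreSketch:
        """Attributes that fit the xattr limit stay as file xattrs; oversized
        metadata spills to a key-value map playing the role of DBObjectMap."""

        def __init__(self):
            self.file_xattrs = {}  # (object_id, key) -> short attribute value
            self.kv_map = {}       # (object_id, key) -> oversized metadata

        def set_attr(self, object_id, key, value):
            if len(value) <= XATTR_LIMIT:
                self.file_xattrs[(object_id, key)] = value
            else:
                self.kv_map[(object_id, key)] = value  # spill to the KV map

        def get_attr(self, object_id, key):
            if (object_id, key) in self.file_xattrs:
                return self.file_xattrs[(object_id, key)]
            return self.kv_map[(object_id, key)]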
Step S12: submitting the small-block write operations to the log file system through the log queue, and writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue.
In the embodiments of the present application, based on the condition that an HDD performs better for large-block sequential write operations than for random small-block write operations, a new memory-accelerated merged journal architecture is designed, which introduces a hash-based multi-linked-list data structure in memory to realize journal merging.
It should be noted that in the prior art, as shown in FIG. 3, when a write request is initiated, an Nvme SSD is used as the storage medium of the journal log file system; each write transaction is first submitted (committed) to the journal log file system through the log queue, and then the write operations are flushed in batches to the write-back queue. The fsync function is used for the flushing; the fsync function synchronizes all modified file data in memory to the storage device.
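As a rough sketch of this conventional commit-then-flush flow (the file-descriptor arguments, batch size, and deque-based write-back queue are illustrative assumptions, not the system's actual interfaces):

    import os
    from collections import deque

    writeback_queue = deque()  # operations staged for write-back to the back-end file system

    def commit_to_journal(journal_fd, record):
        """Commit one write transaction: append it to the journal and make it durable."""
        os.write(journal_fd, record)
        os.fsync(journal_fd)   # fsync: synchronize modified data in memory to the device

    def flush_batch(backend_fd, batch_size=64):
        """Flush a batch of staged operations to the back-end file system."""
        for _ in range(min(batch_size, len(writeback_queue))):
            os.write(backend_fd, writeback_queue.popleft())
        os.fsync(backend_fd)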
In the embodiments of the present application, the write process of the merged journal mechanism differs from the write process in the conventional technique. As shown in FIG. 4, the small-block write operations are submitted to the log file system through the log queue and written into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue. It should be noted that the log file system resides on the Nvme SSD. The operation of flushing data from the journal to the HDD disk is mainly divided into two stages: the first stage writes each random small-block write operation into the hash-based multi-linked-list data structure; the second stage flushes the merged random small-block write operations to the write-back queue, that is, large-block sequential write operations are flushed to the write-back queue. It can be understood that the embodiments of the present application make full use of the high-speed storage medium NVMe SSD and accelerate the IO performance of the log file system through the journal memory-merging mechanism, thereby improving the data IO performance of distributed storage. Compared with the prior art, the embodiments of the present application not only optimize the first (commit) stage of the journal mechanism, but also optimize the second-stage write back (Write Back) to back-end persistent storage, effectively mitigating the technical problems of jitter and instability in the data persistence performance of the distributed storage back end.
Step S13: writing the large-block sequential write operations in the write-back queue back to the back-end file system for storage.
In the embodiments of the present application, as shown in FIG. 3, after the write operations are flushed in batches to the write-back queue, they are written back to the OSD back-end file system on the HDD. If the write-back succeeds, the data becomes permanent; then, after the flush to disk succeeds, the related journal entries are discarded from the journal based on check marks. If the system crashes or loses power, the redo log and the journal check-mark mechanism can be used to restore the disk data to the latest consistent state. To reduce the burden of journaling all the data, most file systems journal only metadata; since they cannot guarantee the durability of all data, they are suitable only for specific applications. In addition, random small-block file writes to the journal based on the NVMe SSD are fast, but on the HDD-based back-end persistent storage disk, random small-block writes are slow when the journal is flushed. As a result, the write-back queue may fill up and block, which puts the log file system queue into a blocked, dormant state and causes severe performance fluctuations. For random bulk writes of larger block files, however, HDD performance is relatively good and write-back is then fast; since replacing all HDDs with SSDs is costly and currently impractical, the embodiments of the present application propose a method of applying journaling to the entire data, and realize a journal memory-merging acceleration mechanism through the hash-based multi-linked-list data structure.
It should be noted that the embodiments of the present application design a record module whose function is to record the write operations that have been successfully written into the HDD back-end file system; through these records, the flushing of the merged data to the HDD can be managed, improving data durability and stability.
It should be noted that the hash-based multi-linked-list data structure groups and merges small written files according to the characteristics of multi-threaded writing, realizing the journal memory-merging acceleration mechanism; this structure can effectively aggregate small-block files and can also improve data flushing performance. In addition, the embodiments of the present application improve the metadata index performance of write requests, improve the fsync flushing performance when objects are opened and closed, and reduce the number of write addressing operations and object open/close operations, thereby improving write-back (WriteBack) efficiency; a new data flushing scheme is designed to make full use of the performance advantages of the merged journal while preventing the journal log queue from growing too long; in addition, the embodiments of the present application design a safety check mechanism to guarantee the durability of journal data.
It can be seen that in the embodiments of the present application, a file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and corresponding small-block write operations are constructed based on the objects to be written and the object placement groups; the small-block write operations are submitted to the log file system through the log queue and written into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue; and the large-block sequential write operations in the write-back queue are written back to the back-end file system for storage. Thus, by using the hash-based multi-linked-list data structure to merge small-block write operations into large-block sequential write operations, and by flushing large-block sequential write operations instead of small-block write operations, log storage is accelerated and storage IO performance is improved.
Referring to FIG. 5, an embodiment of the present application discloses an optional log storage acceleration method, applied to a distributed storage system, including:
Step S21: dividing a file to be written into multiple objects to be written, storing the objects to be written into object placement groups respectively, acquiring the to-be-written data identifiers corresponding to the objects to be written, setting an object placement group identifier for the object placement group and an object identifier for the object to be written, and then setting the target operation sequence number of the current small-block write operation according to the preset operation order; and constructing, in the form of a quadruple, a small-block write operation containing, in order, the object placement group identifier, the object identifier, the target operation sequence number, and the data to be written.
In the embodiments of the present application, after the file to be written is divided into multiple objects to be written and the objects to be written are respectively stored into object placement groups, the data to be written corresponding to the object to be written is acquired, an object placement group identifier is set for the object placement group and an object identifier is set for the object to be written, and then the target operation sequence number of the current small-block write operation is set according to the preset operation order; the object placement group identifier can be denoted cid, the identifier of the data to be written can be denoted oid, the target operation sequence number can be denoted sn, and the data to be written can be denoted data. Then, a small-block write operation containing, in order, the object placement group identifier, the object identifier, the target operation sequence number, and the data to be written is constructed in the form of a quadruple; therefore, the small-block write operation can be expressed as a quadruple [cid, oid, sn, data]. It should be noted that while the number of objects within an object placement group is usually small, the number of object groups can be very large; therefore, the time required to locate an object is short. In other words, cid can vary within a very large range, while the number of oids is limited.
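A minimal sketch of constructing such a quadruple; the NamedTuple layout and the global sequence counter are illustrative assumptions:

    import itertools
    from typing import NamedTuple

    _sn_counter = itertools.count(1)  # preset operation order: monotonically increasing sn

    class SmallWrite(NamedTuple):
        cid: str     # object placement group identifier
        oid: str     # object identifier
        sn: int      # target operation sequence number
        data: bytes  # data to be written

    def build_small_write(cid, oid, data):
        """Construct the quadruple [cid, oid, sn, data] for one small-block write."""
        return SmallWrite(cid, oid, next(_sn_counter), data)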
Step S22: submitting the small-block write operations to the log file system through the log queue, and writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue.
In the embodiments of the present application, the hash-based multi-linked-list data structure is initialized in memory and contains a combination of N slots and N linked lists, where each slot serves as the head pointer of a linked list.
It should be noted that the embodiments of the present application write the small-block write operations into the hash-based multi-linked-list data structure through the log file system based on a multi-threaded writing mode, which increases speed.
It should be noted that the hash table uses the to-be-written data identifier as the Key (keyword) and uses open addressing to resolve Hash (hash) conflicts, where a hash conflict means that different keywords may obtain the same hash address, i.e., key1 ≠ key2 but f(key1) = f(key2). In open addressing, all elements are stored in the hash table itself; when a hash conflict occurs, a probe function calculates the next candidate position, and if that position also conflicts, probing continues through the probe function until an empty slot is found to store the element to be inserted. Open addressing means that besides the address obtained by the hash function, other addresses are also usable when a conflict occurs; common open-addressing schemes include linear probing with rehashing and quadratic probing with rehashing, all of which handle the case where the first-choice position is occupied. With this method, when the hash table has empty slots, each oid value is mapped to a different slot. In the hash-based multi-linked-list data structure shown in FIG. 6, each linked list contains M blocks, the size of each block equals the size of an object specified by the file system, blocks at the same position in the linked lists are associated with the same cid, and the cid values corresponding to the blocks are assigned to the most frequently used blocks and updated after the entire flush operation is triggered. Obviously, the memory consumption of the hash-based multi-linked-list data structure is determined by the parameters M and N and the object size; therefore, by choosing appropriate values of the parameters M and N, the memory footprint is controllable.
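The structure can be sketched as follows; the values of N_SLOTS, M_BLOCKS, and OBJ_SIZE are illustrative, and linear probing stands in for the unspecified probe function:

    N_SLOTS, M_BLOCKS, OBJ_SIZE = 8, 4, 4096  # illustrative; memory use is roughly N_SLOTS * M_BLOCKS * OBJ_SIZE

    class Block:
        def __init__(self):
            self.cid = None          # placement group id this block is associated with
            self.buf = bytearray()   # merged (appended) small writes, up to OBJ_SIZE bytes

    class Slot:
        def __init__(self):
            self.oid = None                                   # hash key currently mapped to this slot
            self.blocks = [Block() for _ in range(M_BLOCKS)]  # the slot's linked list of M blocks

    table = [Slot() for _ in range(N_SLOTS)]                  # N slots, each heading one linked list

    def probe(oid):
        """Open addressing: return the slot owning oid, or an empty slot, else None."""
        start = hash(oid) % N_SLOTS
        for i in range(N_SLOTS):                  # linear probing as the probe function
            slot = table[(start + i) % N_SLOTS]
            if slot.oid in (None, oid):
                return slot
        return None                               # table full and oid not resident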
In the embodiments of the present application, the process of writing the small-block write operation into the hash-based multi-linked-list data structure through the log file system, merging the small-block write operations to obtain a large-block sequential write operation, and flushing the large-block sequential write operation to the write-back queue is as follows: based on the open addressing method, the log file system uses the object identifier in the small-block write operation to search for a target slot in the hash-based multi-linked-list data structure; if the target slot is not found, the small-block write operation is flushed directly to the write-back queue; if the target slot is found, the small-block write operation is mapped to the target slot, and the object placement group identifier in the small-block write operation is used to search for a target block in the target linked list corresponding to the target slot; if the target block is not found, the small-block write operation is flushed directly to the write-back queue; if the target block is found, the small-block write operation is merged into the target block by appending the write data, so as to obtain a large-block sequential write operation, which is then flushed to the write-back queue. Suppose a write operation [cid, oid, sn, data] reaches the hash-based multi-linked-list data structure in the first stage of flushing. Based on oid, the writer thread tries to map it to some slot of the hash table. If this fails (i.e., there is no empty slot in the hash table and its oid differs from every existing slot), the operation is immediately flushed to the write-back queue. If it succeeds, the writer thread checks whether a block associated with cid exists in the corresponding linked list. If there is no such block, the write operation is flushed directly to the write-back queue; otherwise, it is merged into one of the M blocks by appending the write data. By merging random small-block writes of small files into sequential writes of large-block files in this way, the embodiments of the present application improve the metadata index performance of write-back requests; at the same time, because the data is merged into large files and the number of file objects is reduced, the data sync flushing performance when objects are opened and closed is improved, and the number of write addressing operations and object open/close operations is reduced, thereby improving write-back (Write Back) efficiency. As shown in FIG. 6, there are four small-block write operations, [cid1, oid1, sn8, 8KB], [cid1, oid1, sn7, 8KB], [cid2, oid7, sn4, 4KB], and [cid1, oid1, sn1, 4KB]; these four write operations find their target slots in the hash table via oid and are mapped to the target slots, then find their target blocks via cid, and the small-block write operations are merged into the target blocks by appending the write data.
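A minimal sketch of this two-branch merge path, reusing SmallWrite, probe(), table, and writeback_queue from the sketches above (claiming the first unused block for a new cid is a simplification; as stated above, the cid-to-block associations are assigned to the most frequently used cids):

    def stage_one_write(op):
        """First flush stage: merge a small write into the structure or flush it directly."""
        slot = probe(op.oid)
        if slot is None:                          # no empty slot and oid not resident
            writeback_queue.append(op.data)       # flush the small write directly
            return
        slot.oid = op.oid                         # map the operation into the slot
        block = next((b for b in slot.blocks if b.cid == op.cid), None)
        if block is None:
            # Simplification: claim the first unused block for this cid.
            block = next((b for b in slot.blocks if b.cid is None), None)
        if block is None or len(block.buf) + len(op.data) > OBJ_SIZE:
            writeback_queue.append(op.data)       # cannot merge: flush directly
            return
        block.cid = op.cid
        block.buf += op.data                      # merge by appending the write data
        if len(block.buf) >= OBJ_SIZE:            # block full: one large sequential write
            writeback_queue.append(bytes(block.buf))
            block.buf = bytearray()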
It should be noted that the write operations flushed to the write-back queue include the large-block sequential write operations and the small-block write operations; afterwards, the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue need to be written back to the back-end file system and stored according to the write-back order.
It can be understood that in the journal log file system of the embodiments of the present application, write operations are appended to the journal file. The journal file has a check record unit, i.e., the record module in FIG. 4, hereinafter called the checkpoint, which is updated periodically and records the first write operation that had not yet been written back to the file system at the time of the last checkpoint. In a traditional journaling file system, write operations are written back to the file system in the same order in which they are appended to the journal file; therefore, the checkpoint only needs to record the sn of the write operation most recently successfully written back to the file system. In the memory-merging journal mechanism of the embodiments of the present application, however, the write operations in the journal file may be out of order due to the merge operations; therefore, the sequence number of the write operation most recently successfully written back to the file system is insufficient for checking. The embodiments of the present application therefore record the sns of all write operations successfully written back since the last checkpoint. Optionally, a linked list is used to record the sns: for each new write operation that is successfully written back, its sn is inserted into the preset linked list, so that all sns in the preset linked list are sorted according to the order of these write operations in the journal. Optionally, the small-block write operations corresponding to the large-block sequential write operations stored in the back-end file system and the small-block write operations flushed directly to the write-back queue are determined as target write operations; the target operation sequence numbers corresponding to the target write operations are determined as to-be-checked operation sequence numbers, which are stored in the preset linked list according to the write-back order; and the to-be-checked operation sequence numbers stored in the preset linked list are checked against the to-be-written-back operation sequence number stored in the preset check record unit, so that the to-be-checked operation sequence numbers in the preset linked list are sorted according to the preset operation order. During sorting, the checkpoint process is performed as follows: the sn value of the write operation at the checkpoint is compared with the sn value of the first node in the preset linked list; if they are equal, the checkpoint is moved back by one write operation, the first node of the preset linked list is deleted, and this step is repeated; otherwise, the process terminates. Based on this new checkpoint mechanism, data durability is guaranteed during recovery in failure scenarios. It should be noted that the preset linked list resides in the memory shown in FIG. 4.
It should be noted that the checkpoint only needs to record the sn of the write operation most recently successfully written back to the file system, i.e., the target operation sequence number. Therefore, according to the preset operation order, the target operation sequence number corresponding to the first small-block write operation that has not been written back to the back-end file system is determined as the to-be-written-back operation sequence number, which is stored in the preset check record unit.
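A minimal sketch of this checkpoint bookkeeping; a sorted Python list stands in for the preset linked list, an integer for the check record unit, and sns are assumed to start at 1:

    import bisect

    completed_sns = []  # the "preset linked list": sns of operations already written back
    checkpoint_sn = 1   # the "check record unit": sn of the first operation not yet written back

    def record_writeback(sn):
        """Record a successfully written-back operation, kept in journal (sn) order."""
        bisect.insort(completed_sns, sn)  # completion may be out of order due to merging

    def advance_checkpoint():
        """Compare the checkpoint with the list head; advance while they match."""
        global checkpoint_sn
        while completed_sns and completed_sns[0] == checkpoint_sn:
            completed_sns.pop(0)          # delete the matched first node
            checkpoint_sn += 1            # move the checkpoint back by one write operation
        return checkpoint_sn              # operations with a smaller sn are durable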
Step S23: writing the large-block sequential write operations in the write-back queue back to the back-end file system for storage.
In the embodiments of the present application, an Nvme SSD is used as the storage medium of the journal log file system, which solves the performance jitter problem of random bulk writes of small files in distributed storage. The embodiments of the present application propose a memory-merging journal mechanism, a memory acceleration architecture with a controllable memory footprint. The memory-merging journal mechanism introduces a data structure in memory to merge random writes of small files while preventing the journal log and the record-unit log from growing and consuming resources; the embodiments of the present application adopt a new logging process, namely the checkpoint process, to maintain data durability. Compared with the prior art, the embodiments of the present application have stable performance and data reliability in terms of both IOPS (Input/Output Operations Per Second, the number of read/write operations per second) and write latency when small files are randomly written in large quantities.
It should be noted that the embodiments of the present application have the following advantages. Performance: compared with traditional journaling file systems, the overall IO performance (IOPS) of a distributed storage system for massive small files is significantly improved. Stability: IO performance remains relatively stable as stored data accumulates over time. High durability: once a write transaction is successfully committed to the journal, it is permanently preserved. Low cost: the additional resource consumption generated by the embodiments of the present application remains at a relatively low level. Good compatibility: the technique of the embodiments of the present application can be integrated into existing journaling file systems.
It can be seen that in the embodiments of the present application, a file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and corresponding small-block write operations are constructed based on the objects to be written and the object placement groups; the small-block write operations are submitted to the log file system through the log queue and written into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue; and the large-block sequential write operations in the write-back queue are written back to the back-end file system for storage. Thus, by using the hash-based multi-linked-list data structure to merge small-block write operations into large-block sequential write operations, and by flushing large-block sequential write operations instead of small-block write operations, log storage is accelerated and storage performance is improved.
Referring to FIG. 7, an embodiment of the present application discloses a log storage acceleration apparatus, including:
a small-block write operation construction module 11, configured to divide a file to be written into multiple objects to be written, store the objects to be written into object placement groups respectively, and then construct corresponding small-block write operations based on the objects to be written and the object placement groups;
a small-block write operation merging module 12, configured to submit the small-block write operations to a log file system through a log queue, and write the small-block write operations into a hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to a write-back queue;
a large-block sequential write operation storage module 13, configured to write the large-block sequential write operations in the write-back queue back to a back-end file system for storage.
For a more detailed working process of each of the above modules, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
It can be seen that in the embodiments of the present application, a file to be written is divided into multiple objects to be written, the objects to be written are respectively stored into object placement groups, and corresponding small-block write operations are constructed based on the objects to be written and the object placement groups; the small-block write operations are submitted to the log file system through the log queue and written into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue; and the large-block sequential write operations in the write-back queue are written back to the back-end file system for storage. Thus, by using the hash-based multi-linked-list data structure to merge small-block write operations into large-block sequential write operations, and by flushing large-block sequential write operations instead of small-block write operations, log storage is accelerated and storage performance is improved.
Optionally, an embodiment of the present application further provides an electronic device. FIG. 8 is a structural diagram of an electronic device 20 shown according to an exemplary embodiment, and the content in the figure should not be regarded as any limitation on the scope of use of the embodiments of the present application.
FIG. 8 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application. The electronic device 20 includes: at least one processor 21, at least one memory 22, a power supply 23, an input/output interface 24, a communication interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the log storage acceleration method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide the working voltage for each hardware device on the electronic device 20; the communication interface 25 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows is any communication protocol applicable to the technical solutions of the embodiments of the present application, which is not limited here; the input/output interface 24 is configured to acquire external input data or output data to the outside, and its detailed interface type can be selected according to actual application needs, which is not limited here.
In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the memory 22 can include a random access memory serving as running memory and a non-volatile memory serving as external storage, on which the storage resources include an operating system 221, a computer program 222, etc., and the storage manner can be transient or permanent.
The operating system 221 is configured to manage and control each hardware device on the electronic device 20 of the source host and the computer program 222; the operating system 221 can be Windows (Microsoft Windows operating system), Unix, Linux (GNU/Linux), or the like. In addition to a computer program configured to complete the log storage acceleration method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 can further include computer programs configured to complete other specific tasks.
In this embodiment, the input/output interface 24 may include, but is not limited to, a USB interface, a hard disk reading interface, a serial interface, a voice input interface, a fingerprint input interface, and the like.
Optionally, an embodiment of the present application further discloses a computer non-volatile readable storage medium, configured to store a computer program, wherein the computer program implements the log storage acceleration method disclosed above when executed by a processor.
For the steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
The computer non-volatile readable storage medium mentioned here includes random access memory (Random Access Memory, RAM), internal memory, read-only memory (Read-Only Memory, ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a magnetic disk, an optical disk, or any other form of storage medium known in the technical field. The computer program implements the aforementioned log storage acceleration method when executed by a processor; for the steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the log storage acceleration method disclosed in the embodiments, its description is relatively brief, and for related details, reference may be made to the description of the method part.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present application.
The steps of the algorithms described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM (compact disc read-only memory), or any other form of storage medium known in the technical field.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The log storage acceleration method, apparatus, device, and medium provided by the embodiments of the present application have been introduced in detail above. Optional examples are used herein to explain the principles and implementations of the embodiments of the present application, and the description of the above embodiments is intended only to help understand the method and core idea of the embodiments of the present application. Meanwhile, for those of ordinary skill in the art, changes can be made to the detailed implementations and the application scope based on the idea of the embodiments of the present application. In summary, the contents of this specification should not be construed as limiting the embodiments of the present application.

Claims (20)

  1. A log storage acceleration method, applied to a distributed storage system, comprising:
    dividing a file to be written into multiple objects to be written, storing the objects to be written into object placement groups respectively, and then constructing corresponding small-block write operations based on the objects to be written and the object placement groups;
    submitting the small-block write operations to a log file system through a log queue, and writing the small-block write operations into a hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to a write-back queue;
    writing the large-block sequential write operations in the write-back queue back to a back-end file system for storage.
  2. The log storage acceleration method according to claim 1, wherein constructing the corresponding small-block write operations based on the objects to be written and the object placement groups comprises:
    acquiring the data to be written corresponding to the object to be written, setting an object placement group identifier for the object placement group and an object identifier for the object to be written, and then setting a target operation sequence number of the current small-block write operation according to a preset operation order;
    constructing, in the form of a quadruple, a small-block write operation containing, in order, the object placement group identifier, the object identifier, the target operation sequence number, and the data to be written.
  3. The log storage acceleration method according to claim 2, wherein writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to the write-back queue, comprises:
    based on an open addressing method, searching for a target slot in the hash-based multi-linked-list data structure through the log file system using the object identifier in the small-block write operation;
    if the target slot is not found, flushing the small-block write operation directly to the write-back queue; if the target slot is found, mapping the small-block write operation to the target slot, and searching for a target block in the target linked list corresponding to the target slot using the object placement group identifier in the small-block write operation;
    if the target block is not found, flushing the small-block write operation directly to the write-back queue; if the target block is found, merging the small-block write operation into the target block by appending the write data, so as to obtain a large-block sequential write operation, and then flushing the large-block sequential write operation to the write-back queue.
  4. The log storage acceleration method according to claim 3, wherein writing the large-block sequential write operations in the write-back queue back to the back-end file system for storage comprises:
    writing the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue back to the back-end file system, and storing them according to the write-back order.
  5. The log storage acceleration method according to claim 4, wherein after writing the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue back to the back-end file system and storing them according to the write-back order, the method further comprises:
    determining, as target write operations, the small-block write operations corresponding to the large-block sequential write operations stored in the back-end file system and the small-block write operations flushed directly to the write-back queue;
    determining the target operation sequence numbers corresponding to the target write operations as to-be-checked operation sequence numbers, and storing the to-be-checked operation sequence numbers into a preset linked list according to the write-back order;
    checking the to-be-checked operation sequence numbers stored in the preset linked list against the to-be-written-back operation sequence number stored in a preset check record unit, so as to sort the to-be-checked operation sequence numbers in the preset linked list according to the preset operation order.
  6. The log storage acceleration method according to claim 5, wherein before checking the to-be-checked operation sequence numbers stored in the preset linked list against the to-be-written-back operation sequence number stored in the preset check record unit, the method further comprises:
    determining, according to the preset operation order, the target operation sequence number corresponding to the first small-block write operation that has not been written back to the back-end file system as the to-be-written-back operation sequence number, and storing the to-be-written-back operation sequence number into the preset check record unit.
  7. The log storage acceleration method according to any one of claims 1 to 6, wherein writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system comprises:
    writing the small-block write operations into the hash-based multi-linked-list data structure through the log file system based on a multi-threaded writing mode.
  8. The log storage acceleration method according to any one of claims 1 to 6, wherein before dividing the file to be written into multiple objects to be written, the method further comprises:
    upon detecting a file write operation, writing the file to be written to a file system interface, and converting the file write operation into object write operations through the file system interface, wherein the file write operation is used to request writing of the file to be written.
  9. The log storage acceleration method according to any one of claims 1 to 6, wherein the log file system is located in a non-volatile storage protocol interface solid-state drive, namely an Nvme SSD.
  10. The log storage acceleration method according to claim 2, wherein
    each linked list in the hash-based multi-linked-list data structure contains M blocks, the size of each block equals the size of an object specified by the log file system, and blocks at the same position in the linked lists are associated with the same object placement group identifier, where M is a positive integer greater than or equal to 2.
  11. The log storage acceleration method according to claim 10, wherein the values of the object placement group identifiers corresponding to the blocks are assigned to the most frequently used blocks and are updated after the entire flush operation is triggered.
  12. The log storage acceleration method according to claim 10, wherein the hash-based multi-linked-list data structure is initialized in memory and contains a combination of N slots and N linked lists, where each slot serves as the head pointer of a linked list, N is a positive integer greater than or equal to 2, and N and M are values determined according to a preset memory consumption of the hash-based multi-linked-list data structure.
  13. The log storage acceleration method according to claim 5, wherein the method further comprises:
    discarding the file to be written when both the large-block sequential write operations in the write-back queue and the small-block write operations flushed directly to the write-back queue have been successfully written back to the back-end file system; or
    discarding the file to be written when, after the to-be-checked operation sequence numbers in the preset linked list have been sorted according to the preset operation order, the preset linked list has recorded the operation sequence numbers of the small-block write operations corresponding to the large-block sequential write operations in the write-back queue and the operation sequence numbers of the small-block write operations flushed directly to the write-back queue.
  14. The log storage acceleration method according to claim 5, wherein sorting the to-be-checked operation sequence numbers in the preset linked list according to the preset operation order comprises:
    comparing the to-be-written-back operation sequence number of the target write operation in the preset check record unit with the to-be-checked operation sequence number of the first node in the preset linked list;
    when the to-be-written-back operation sequence number of the target write operation in the preset check record unit is equal to the to-be-checked operation sequence number of the first node in the preset linked list, moving the preset check record unit backward by one write operation, and deleting the first node from the preset linked list;
    when the to-be-written-back operation sequence number of the target write operation in the preset check record unit is not equal to the to-be-checked operation sequence number of the first node in the preset linked list, terminating the sorting of the to-be-checked operation sequence numbers in the preset linked list.
  15. The log storage acceleration method according to claim 5, wherein storing the to-be-checked operation sequence numbers into the preset linked list according to the write-back order comprises:
    inserting the target operation sequence number corresponding to each new target write operation that is successfully written back into the preset linked list as a to-be-checked operation sequence number, wherein all the to-be-checked operation sequence numbers in the preset linked list are sorted according to the write-back order of the target write operations in the log file system.
  16. The log storage acceleration method according to claim 5, wherein
    the preset check record unit records the target operation sequence number corresponding to the target write operation most recently successfully written back in the log file system.
  17. A log storage acceleration apparatus, applied to a distributed storage system, comprising:
    a small-block write operation construction module, configured to divide a file to be written into multiple objects to be written, store the objects to be written into object placement groups respectively, and then construct corresponding small-block write operations based on the objects to be written and the object placement groups;
    a small-block write operation merging module, configured to submit the small-block write operations to a log file system through a log queue, and write the small-block write operations into a hash-based multi-linked-list data structure through the log file system, so that the small-block write operations are merged to obtain large-block sequential write operations, which are flushed to a write-back queue;
    a large-block sequential write operation storage module, configured to write the large-block sequential write operations in the write-back queue back to a back-end file system for storage.
  18. The log storage acceleration apparatus according to claim 17, wherein the small-block write operation construction module comprises:
    an operation sequence number setting unit, configured to acquire the data to be written corresponding to the object to be written, set an object placement group identifier for the object placement group and an object identifier for the object to be written, and then set the target operation sequence number of the current small-block write operation according to the preset operation order;
    a construction unit, configured to construct, in the form of a quadruple, a small-block write operation containing, in order, the object placement group identifier, the object identifier, the target operation sequence number, and the data to be written.
  19. An electronic device, comprising a processor and a memory, wherein the processor implements the log storage acceleration method according to any one of claims 1 to 16 when executing a computer program stored in the memory.
  20. A computer non-volatile readable storage medium, configured to store a computer program, wherein the computer program implements the log storage acceleration method according to any one of claims 1 to 16 when executed by a processor.
PCT/CN2022/135984 2022-03-02 2022-12-01 Log storage acceleration method and apparatus, electronic device, and non-volatile readable storage medium WO2023165196A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210195258.6 2022-03-02
CN202210195258.6A CN114281762B (zh) 2022-03-02 2022-03-02 Log storage acceleration method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2023165196A1 (zh)

Family

ID=80882182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135984 WO2023165196A1 (zh) 2022-03-02 2022-12-01 Log storage acceleration method and apparatus, electronic device, and non-volatile readable storage medium

Country Status (2)

Country Link
CN (1) CN114281762B (zh)
WO (1) WO2023165196A1 (zh)

Also Published As

Publication number Publication date
CN114281762B (zh) 2022-06-03
CN114281762A (zh) 2022-04-05

Legal Events

Code 121: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22929623; Country of ref document: EP; Kind code of ref document: A1