CN117851287A - LBA full caching method and system based on additional writing distributed storage - Google Patents

LBA full caching method and system based on additional writing distributed storage

Info

Publication number
CN117851287A
Authority
CN
China
Prior art keywords
chunk
lba
addressing
storage
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311727000.7A
Other languages
Chinese (zh)
Inventor
林媛
张宗全
刘啸滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202311727000.7A priority Critical patent/CN117851287A/en
Publication of CN117851287A publication Critical patent/CN117851287A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10: Address translation
    • G06F 12/1009: Address translation using page tables, e.g. page table structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656: Data buffering arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an LBA full caching method and system based on additional writing (append-write) distributed storage, wherein the method comprises the following steps: decomposing the shared block storage addressing space into a plurality of blocks and hashing them into the storage cluster; defining the data structure of the full cache as a linear table; each item stores an index of the chunk in a chunk table, and the physical address of the LBA is addressed through the content identified by the item; when an LBA is overwritten, the position the chunk originally pointed to becomes a free position: when the data of a chunk has been fully overwritten by LBAs or compacted away, the chunk is marked as a free position in the chunk_table linear table, and a new chunk_id is stored there when a new chunk is written; chunks are compacted so that the total size of the chunk_table linear table does not exceed a threshold. The invention enables distributed shared storage to cache the LBA addressing information of all shared storage devices while consuming less storage space.

Description

LBA full caching method and system based on additional writing distributed storage
Technical Field
The invention relates to the technical field of computer storage, and in particular to distributed storage; more particularly, it relates to an LBA full cache method and system based on additional write distributed storage.
Background
Distributed shared storage refers to a large-scale shared storage system formed by connecting heterogeneous devices together over a network on the basis of distributed algorithms. In distributed shared storage, the mass storage system is shared by multiple storage devices, each of which maintains its own LBA (Logical Block Address) space within the mass storage system. A logical block address (LBA) is a common mechanism for describing the blocks of data on a computer storage device and is typically used in secondary storage devices such as hard disks. An LBA may refer either to the address of a certain data block or to the data block a certain address points to.
In a distributed shared storage system, each user's shared storage device has a separate addressing space for addressing the physical location of the shared device's logical block data in the distributed storage. The shared device address space is typically divided into fixed-length units, each unit resembling a page in a kernel page table and called a virtual block. Correspondingly, a physical hard disk can be divided into fixed-length units, each unit being a physical chunk. To record where virtual blocks map to physical chunks, the shared storage system needs to maintain an addressing data structure. The main purpose of the addressing data structure is to preserve the virtual-block-to-physical-block address translation so that shared storage can find the actual physical storage location of the data. In the distributed shared storage system, the addressing data structure is described as lba: value, where lba is a number that describes logically contiguous, fixed-length blocks of data using monotonically increasing sequence numbers, and value is the physical storage address of such a block, typically {chunk_id, chunk_offset}. In a distributed shared storage system, the LBA logical addressing mappings are typically stored as key-value (KV) pairs in a database.
To improve the access performance of the LBA addressing mapping, LBAs are generally organized in one of two ways: variable-length storage or fixed-length storage. In both storage modes, LBA information is typically cached in memory to improve access performance, and the cache usually adopts a linked-list or tree data structure together with an eviction algorithm such as LRU or LFU. A typical LBA caching algorithm is as follows:
the cache content is as follows:
Key=LBA,
Value={physical_chunk,physical_chunk_offset}
Left=left_point
Right=right_point
where Value contains the physical location information of the LBA logical block, and Left/Right are the left and right pointers of the data structure.
The query and write algorithm is described as:
1. Use the LBA as the key and look up the corresponding value in memory; the time complexity of the query depends on the chosen data structure.
2. Copy the value, then update it or query further.
3. Trim the cached data according to the overall memory occupation (see the sketch below).
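As an illustration of the memory overhead of this conventional scheme, the following C sketch shows one cached entry with the fields listed above (field names and widths are assumptions for illustration, not taken from the patent). On a 64-bit machine such an entry occupies 40 bytes, of which only 16 bytes are the physical location itself; the rest is key and pointer overhead.
#include <stdint.h>

/* One cached LBA mapping in the conventional scheme: besides the payload
 * (chunk id + offset), each entry carries the key and two list pointers
 * needed by the LRU/LFU eviction structure. */
struct lba_cache_entry {
    uint64_t lba;                     /* Key  = LBA                         */
    uint64_t physical_chunk;          /* Value: chunk id                    */
    uint64_t physical_chunk_offset;   /* Value: offset inside the chunk     */
    struct lba_cache_entry *left;     /* Left : predecessor in the LRU list */
    struct lba_cache_entry *right;    /* Right: successor in the LRU list   */
};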
However, because of the chosen data structure, the LBA memory cache consumes additional memory to maintain the data structure itself beyond the valid data. Moreover, the memory consumed by the physical location information of the LBA logical blocks grows with the cluster addressing space, which is usually very large, so the traditional LBA caching algorithm cannot cache all LBA information and has to rely on a cache eviction algorithm, which is unfavorable to the completeness and reliability of the data cache.
Therefore, there is a need to design an LBA full cache method, system and electronic device based on additional writing distributed storage, so as to solve the above-mentioned problems and pain points of the prior art.
Disclosure of Invention
In view of this, the present invention aims to provide an LBA full cache method, system and device based on additional writing distributed storage, which improve the LBA addressing performance of shared devices in distributed shared storage and reduce memory usage while caching all LBAs in memory. The shared-storage LBAs are cached with a kernel-style multi-level page table, which improves LBA addressing performance and reduces memory occupation; by exploiting the append-write characteristic, a chunk_table layer is abstracted to shrink the addressing space of a logical LBA's physical address, saving the memory used by the full in-memory LBA cache, so that distributed shared storage can cache the LBA addressing information of all shared storage devices while consuming less memory, improving the completeness of the data cache.
The invention provides an LBA full caching method based on additional writing distributed storage, which comprises the following steps:
S1. Decompose the shared block storage addressing space applied for by a user into a plurality of blocks of a fixed size, hash the blocks into the storage cluster, and define each block as a region; split the addressing space of each region into a plurality of logical blocks of a fixed size;
S2. Define the data structure of the region's addressing space in the full cache as a linear table with a kernel-page-table-like structure, where each sub-item of the linear table corresponds to the addressing structure of one logical block;
S3. In the item of the addressing structure, store the index of the physical chunk in a chunk table; after the LBA is addressed to the item, the content identified by the item is used to address the LBA's physical address, which narrows the range of the LBA addressing space;
According to the invention, the item of a logical block's addressing structure does not store the chunk directly but stores the index of the chunk in the chunk table; after the LBA is addressed to the item, the content identified by the item is used to address the LBA's physical address, saving storage space and narrowing the LBA addressing space;
S4. When an LBA is overwritten, the position the chunk originally pointed to becomes a free position; when the data of a physical chunk has been fully overwritten by LBAs or compacted away, the chunk is marked as a free position in the chunk_table linear table; the free position stores the new chunk address (chunk_id) when a new chunk is written;
The LBA full cache chunk_table is a reusable linear table; when all the data at a chunk address has been moved away, that chunk address is marked as a free position in the chunk_table linear table, and the free position is used to store a new chunk_id when a new chunk is written;
S5. Based on the cluster's append-write characteristic, when a chunk is full, a new chunk is switched in for writing; an available free position is found in the chunk_table linear table and the new chunk_id is stored there;
When part of the data in a chunk has been overwritten and free space can be released, the chunk is compacted, so that multiple LBA items continue to point to a single chunk_id in the chunk_table and the total size of the chunk_table linear table does not exceed a threshold.
In a region's linear table, unwritten portions are not allocated memory.
Further, the region's linear table in step S2 is created as a multi-level storage structure in which the sub-table of each level is a linear table and an upper level stores the address and state information of the lower level.
The multi-level storage structure of the region's linear table may have 2 levels or more.
Further, in step S2 the block number (LBA) of a logical block is the subscript of the linear table; when an addressing query is issued to a region, the subscript of the linear table is used, so the block number of the logical block is not stored in the region.
Further, the structure of the region's linear table is described as:
region_x=[lg1,lg2,lg3,…,lgn]
where region_x represents any one region after hashing, and lg represents the addressing structure of a logical block;
since region_x is a linear table structure that contains a plurality of logical blocks, the logical block addressing structure in a region need only be queried and accessed using the block number of a logical block as the index of the linear table, without storing the logical block number as the addressing key.
Further, the multi-level storage structure of the region's linear table has 2 levels: the lba_table comprises a directory_table and lower-level leaf linear tables (item_table);
each element in the directory_table points to a leaf linear table, and each element uses redundant bit fields to identify the state information of the leaf linear table it points to; when no element has been written into a leaf linear table, that leaf linear table is not allocated memory.
Further, a leaf linear table contains a plurality of item elements, and each leaf linear table has a fixed length; the item_table contains the information for addressing the physical chunk; the chunk_table contains all chunk_ids referenced by this region; in the item_table, the index idx of an item points to the chunk_id in the chunk_table to which the LBA was written.
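A minimal C sketch of the two-level structure described above (directory_table pointing to item_table leaves, plus the per-region chunk_table) is given below. The type names follow the patent's terminology, but the field widths, table lengths and the extra bookkeeping fields (dir_len, active_chunk_id, active_ck_idx) are assumptions added for illustration; the sketches in the application example below reuse these definitions.
#include <stdint.h>
#include <stddef.h>

#define ITEMS_PER_LEAF  512            /* fixed leaf (item_table) length; illustrative */
#define CHUNK_TABLE_MAX 1024           /* threshold on chunk_table size; illustrative  */

struct item {                          /* addressing entry of one logical block        */
    uint32_t ck_idx;                   /* index into chunk_table, not a full chunk_id  */
    uint32_t chunk_lba;                /* block number inside the physical chunk       */
};

struct item_table {                    /* leaf linear table; allocated on first write  */
    struct item items[ITEMS_PER_LEAF];
};

struct dir_item {                      /* directory entry of the upper level           */
    struct item_table *item_table;     /* NULL while unwritten; spare bits may hold the leaf state */
};

struct region_cache {                  /* full in-memory cache of one region           */
    struct dir_item *directory_table;  /* upper-level linear table                     */
    size_t           dir_len;          /* number of directory entries                  */
    uint64_t         chunk_table[CHUNK_TABLE_MAX]; /* chunk_ids referenced by this region */
    uint64_t         active_chunk_id;  /* chunk currently receiving append writes      */
    uint32_t         active_ck_idx;    /* slot of the active chunk in chunk_table      */
};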
The invention also provides an LBA full cache system based on the additional write distributed storage, which executes the LBA full cache method based on the additional write distributed storage, and is characterized by being deployed in a distributed shared storage cluster.
The invention also provides LBA full-cache equipment based on the additional write distributed storage, which carries the LBA full-cache system based on the additional write distributed storage.
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the LBA full cache method based on append write distributed storage as described above.
The invention also provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the LBA full cache method based on the additional write distributed storage when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
the LBA full-caching method, system and equipment based on the additional writing distributed storage improve the LBA addressing performance of the shared equipment in the distributed shared storage, and reduce the use of memory space while caching the LBA in the full memory; the method has the advantages that the thought of the kernel multi-level page table is utilized to carry out full memory caching on the shared memory LBA, the addressing performance of the LBA is improved, the occupation of memory storage space is reduced, the characteristic of additional writing is utilized, the chunk_table layer is abstracted, the addressing space of a block of logical LBA physical address is reduced, the memory space of the full memory caching of the LBA is saved, the distributed shared memory can cache the LBA addressing information with shared memory equipment while consuming smaller memory space, and the integrity and the reliability of data caching are improved.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
In the drawings:
FIG. 1 is a schematic diagram of a two-level region full memory cache structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a structure for finding an idle position in a linear table according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed shared storage cluster according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a configuration of an LBA full cache device based on additional write distributed storage according to an embodiment of the present invention;
FIG. 5 is a flow chart of reading and writing a region LBA full cache according to an embodiment of the present invention;
FIG. 6 is a flow chart of a LBA full caching method based on additional write distributed storage according to the present invention;
fig. 7 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of systems and products consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
Embodiments of the present invention are described in further detail below.
The embodiment of the invention provides an LBA full caching method based on additional writing distributed storage, which is shown in FIG. 6 and comprises the following steps:
S1. Decompose the shared block storage addressing space applied for by a user into a plurality of blocks of a fixed size, hash the blocks into the storage cluster, and define each block as a region; split the addressing space of each region into a plurality of logical blocks of a fixed size;
S2. Define the data structure of the region's addressing space in the full cache as a linear table with a kernel-page-table-like structure, where each sub-item of the linear table corresponds to the addressing structure of one logical block;
The region's linear table is created as a multi-level storage structure in which an upper level stores the addresses and state information of the lower levels.
The block number (offset) of a logical block is the subscript of the linear table, and the subscript of the linear table is used when an addressing query is issued to the region, so the block number (offset) of the logical block is not stored in the region.
The structure of the region's linear table is described as:
region_x=[lg1,lg2,lg3,…,lgn]
where region_x represents any one region after hashing, and lg represents the addressing structure of a logical block;
since region_x is a linear table structure that contains a plurality of logical blocks, the logical block addressing structure in a region need only be queried and accessed using the block number of a logical block as the index of the linear table, without storing the logical block number as the addressing key.
In this embodiment, the multi-level storage structure of the region's linear table has 2 levels: the lba_table comprises a directory_table and lower-level leaf linear tables (item_table);
each element in the directory_table points to a leaf linear table, and each element uses redundant bit fields to identify the state information of the leaf linear table it points to; when no element has been written into a leaf linear table, that leaf linear table is not allocated memory.
A leaf linear table contains a plurality of item elements, and each leaf linear table has a fixed length; the item_table contains the information for addressing the physical chunk; the chunk_table contains all chunk_ids referenced by this region; in the item_table, the index idx of an item points to the chunk_id in the chunk_table to which the LBA was written.
S3. In the item of the addressing structure, store the index of the physical chunk in a chunk table; after the LBA is addressed to the item, the content identified by the item is used to address the LBA's physical address, which narrows the range of the LBA addressing space;
in this embodiment, the item of a logical block's addressing structure does not store the chunk directly but stores the index of the chunk in the chunk table; after the LBA is addressed to the item, the content identified by the item is used to address the LBA's physical address, saving storage space and narrowing the LBA addressing space;
S4. When an LBA is overwritten, the position the chunk originally pointed to becomes a free position; when the data of a physical chunk has been fully overwritten by LBAs or compacted away, the chunk is marked as a free position in the chunk_table linear table; the free position stores the new chunk address (chunk_id) when a new chunk is written;
the LBA full cache chunk_table is a reusable linear table; when all the data at a chunk address has been moved away, that chunk address is marked as a free position in the chunk_table linear table, and the free position is used to store a new chunk_id when a new chunk is written;
S5. Based on the cluster's append-write characteristic, when a chunk is full, a new chunk is switched in for writing; an available free position is found in the chunk_table linear table and the new chunk_id is stored there;
when part of the data in a chunk has been overwritten and free space can be released, the chunk is compacted, so that multiple LBA items continue to point to a single chunk_id in the chunk_table and the total size of the chunk_table linear table does not exceed a threshold.
In a region's linear table, unwritten portions are not allocated memory.
The embodiment of the invention also provides an LBA full cache system based on the additional write distributed storage, which executes the LBA full cache method based on the additional write distributed storage, and is characterized by being deployed in a distributed shared storage cluster.
As shown in fig. 3, an embodiment of the present invention further provides an LBA full cache device based on the append write distributed storage, which carries the LBA full cache system based on the append write distributed storage as described above.
In another preferred embodiment of the present invention, as shown in FIG. 4, the system device with the LBA full cache may also be a standalone device. This device is responsible for caching the already-allocated LBA addressing information of all system devices in the distributed shared storage cluster. When a shared device starts to use the distributed shared storage cluster, its LBA addressing information is cached in the device cluster with the LBA full cache. When the shared device reads or writes, it first calculates, from the LBA range being accessed, which device of the LBA full-cache device cluster holds that LBA's cached addressing information, reads the LBA's physical location information from that device, and then interacts with the cluster's system devices using that physical location information. In a distributed shared storage cluster, all chunks in the cluster are allocated by a centralized chunk allocation manager. A sketch of the routing step is given below.
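The patent does not specify how the client maps an LBA to a cache node; the following helper is one hypothetical realisation, assuming the LBA is first mapped to its region and the region id is then hashed over the cache nodes with a simple modulo.
#include <stdint.h>

/* Hypothetical routing helper: map an LBA to the node of the LBA full-cache
 * cluster that holds its addressing information. The region size and the
 * modulo hash are assumptions for illustration only. */
static uint32_t lba_cache_node(uint64_t lba,
                               uint64_t region_size_blocks,
                               uint32_t cache_node_count)
{
    uint64_t region_id = lba / region_size_blocks;   /* which region the LBA falls in */
    return (uint32_t)(region_id % cache_node_count); /* which cache node owns that region */
}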
Application example
First, the shared block storage addressing space applied for by the user is decomposed into blocks of a fixed size and hashed into the storage cluster, and each hashed block is defined as a region. The addressing space of each region is divided into logical blocks of a fixed size, the data structure of the region's addressing space in the full cache is defined as a linear table, and each sub-item of the linear table is the addressing structure of a logical block. When addressing a region, the block number (offset) of the logical block is the index of the linear table, so within a region the block number (offset) of the logical block is not stored.
The structure of the region linear table is described as:
region_x=[lg1,lg2,lg3,…,lgn]
where region_x represents any one region after hashing and lg represents the addressing structure of a logical block. Since region_x is a linear table structure that contains a plurality of logical blocks, the logical block addressing structure in a region need only be queried and accessed using the block number of a logical block as the index of the linear table, without storing the logical block number as the addressing key.
Second, like the kernel page table structure, the region's linear table is a multi-level storage structure whose sub-tables at each level are linear tables, so the unwritten portions of the region's linear table are not allocated memory.
The multi-level storage structure of the region's linear table may have 2 levels or more. In the multi-level structure, an upper level stores information such as the addresses and states of the lower levels.
For example, in a 2-level storage structure, the uppermost storage structure is defined as:
directory_table=[dir_item_1,dir_item_2,……,dir_item_n]
where each item of the directory_table is a directory entry; each item is a structure dir_item, which may be defined as:
struct dir_item {
    struct item_table *item_table;  /* null until the corresponding leaf is first written */
};
where item_table is a pointer to a segment of the region consisting of logical blocks; before the item_table has been written to, it is a null pointer, i.e. no memory is allocated for it.
The lba is the offset addressed within a region, aligned down to the region's logical block size, i.e. the block number within the region to which the offset maps. When a region is addressed through the multi-level linear table, the lba addresses the entry of the corresponding logical block via a recursive addressing process from the root node to the leaf nodes of the multi-level linear table.
The addressing algorithm of an lba at each level of the region's multi-level linear table is defined as:
idx=(lba&mask)>>shift
item=item_table[idx]
where mask is the mask of the current level in the region's multi-level linear table and shift is the total number of bits of the masks of all lower levels. Once the index idx of an lba in a level's item_table has been calculated, idx can be used to fetch the next level of table information directly from the item_table (when the item_table is the last level, the fetched item is a leaf node, that is, the item addressed by the lba).
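A C sketch of this lookup for the two-level case, reusing the structures from the earlier sketch; the 9-bit leaf index (512-entry leaves) is an illustrative assumption.
#define LEAF_BITS 9                                  /* log2(ITEMS_PER_LEAF); illustrative */
#define LEAF_MASK ((1ull << LEAF_BITS) - 1)

static struct item *region_lookup(struct region_cache *rc, uint64_t lba)
{
    uint64_t dir_idx  = lba >> LEAF_BITS;            /* idx = (lba & mask) >> shift at the directory level */
    uint64_t leaf_idx = lba & LEAF_MASK;             /* at the leaf level the shift is 0 */

    if (dir_idx >= rc->dir_len)
        return NULL;                                 /* lba outside the region */
    struct item_table *leaf = rc->directory_table[dir_idx].item_table;
    if (leaf == NULL)
        return NULL;                                 /* leaf never written: no memory allocated */
    return &leaf->items[leaf_idx];                   /* item addressed by the lba */
}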
Again, after the lba addresses the item, the content identified by the item is used to address the physical address of the lba. To save storage space, the item does not directly store the physical chunk but stores the index of the physical chunk in the chunk table, thereby reducing the scope of the lba addressing space.
A physical chunk is storage space distributed over some physical machine and some physical disk in the distributed shared storage; one chunk can be a contiguous physical storage space on the physical disk or a logically contiguous space. In a distributed shared storage system, the number of chunks grows linearly as the cluster size increases. The addressing space for the chunk count is therefore large, and a chunk_id is generally identified by a uint64.
An lba to physical chunk address is defined as:
key:lba
value:{chunk_id,chunk_offset}
In a region's multi-level linear table, the number of chunks that can be referenced is limited, so a separate chunk_table is abstracted to reduce the addressing space of the lba entries in the region's multi-level linear table.
The chunk_table of a region multi-level linear table is defined as:
chunk_table=[chunk_id_1,chunk_id_70,chunk_id_500,……,chunk_id_N]
In the chunk_table, each item is a chunk_id number in the distributed shared storage, defined as a uint64.
A region multi-level linear table contains one chunk_table, and an lba leaf item of the region multi-level linear table stores the following contents:
struct item {            /* field widths are illustrative */
    uint32_t ck_idx;
    uint32_t chunk_lba;
};
where ck_idx is the index of the addressed chunk_id in the chunk_table and chunk_lba is the block number within the physical chunk; chunk_offset is not stored directly, which reduces memory usage.
The addressing algorithm to address the chunk is defined as:
chunk_id=chunk_table[item.ck_idx]
chunk_offset=item.chunk_lba*logical_block_size
where logical_block_size is the size of one logical block in the region.
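A small C helper implementing the two formulas above, reusing the earlier sketch's structures; the 64 KiB logical block size is an assumed example value.
#define LOGICAL_BLOCK_SIZE (64u * 1024u)            /* assumed 64 KiB logical block */

struct chunk_addr {
    uint64_t chunk_id;
    uint64_t chunk_offset;
};

static struct chunk_addr resolve_chunk(const struct region_cache *rc,
                                       const struct item *it)
{
    struct chunk_addr a;
    a.chunk_id     = rc->chunk_table[it->ck_idx];                  /* chunk_id = chunk_table[item.ck_idx] */
    a.chunk_offset = (uint64_t)it->chunk_lba * LOGICAL_BLOCK_SIZE; /* item.chunk_lba * logical_block_size */
    return a;
}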
Referring to FIG. 1, a two-level region full-memory cache structure is shown, where the lba_table comprises a directory_table and lower leaf linear tables, and each element in the directory_table points to a leaf linear table. Redundant bit fields may be used to identify the state information of the leaf linear table an element points to, and when no element has been written into a leaf linear table, that leaf linear table is not allocated memory. A leaf linear table contains a plurality of item elements, and each leaf linear table has a fixed length. The item_table contains the information for addressing the physical chunk. The chunk_table contains all chunk_ids referenced by this region. In the item_table, idx points to the chunk_id in the chunk_table to which the lba was written.
The capacity of a chunk should be an integer multiple of the region's minimum logical block (the block size of an lba); when it is, multiple item entries in the item_table point to the same chunk in the chunk_table at the same time, so the capacity of the chunk_table can be significantly reduced. Based on the cluster's append-write characteristic, when a region fills up a chunk, a new chunk is switched in.
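As a worked example of this chunk_table size reduction, with assumed sizes (these values are for illustration and are not specified in the patent): with a 64 KiB logical block and a 4 MiB chunk, one chunk covers 4 MiB / 64 KiB = 64 logical blocks, so up to 64 consecutive item entries share a single chunk_table slot; a fully written 1 GiB region then references at most 1 GiB / 4 MiB = 256 distinct chunk_ids, i.e. its chunk_table needs only 256 uint64 entries, about 2 KB.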
Finally, the lba full-cache chunk_table is a reusable linear table. When an lba is overwritten, the position the chunk originally pointed to becomes a free position. When all data of a chunk has been moved away, the chunk can be marked as a free position in the chunk_table linear table, and that free position is used to store a new chunk_id when a new chunk is written.
FIG. 5 is a flow chart of the present application for reading and writing a region LBA full cache.
Based on the cluster's append-write characteristic, when a chunk is full, a new chunk is switched in for writing; on the switch, an available free position is searched for in the chunk_table linear table and the new chunk_id is stored there. Alternatively, when only part of the data in a chunk remains because overwrites have released free space, the system compacts the chunk, so that a considerable number of lba items continue to point to a single chunk_id in the chunk_table. In this way the total size of the chunk_table linear table is guaranteed not to exceed a threshold.
As shown in FIG. 2, a timer can find out in advance which slots of the chunk_table linear table are free, so that a new chunk_id can be stored there. The timer also performs garbage collection: it clears chunk_ids in the chunk_table that no longer have any reference relationship and marks them as available slots. A sketch of this reclamation pass is given below.
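The following C sketch is one hypothetical implementation of the timer's reclamation pass, reusing the earlier sketch's structures; the sentinel CK_IDX_INVALID and the free-slot array handed back to the writer are assumptions, not details from the patent.
#define CK_IDX_INVALID UINT32_MAX                    /* assumed sentinel for a never-written item */

static void chunk_table_gc(struct region_cache *rc,
                           uint32_t *free_slots, size_t *free_count)
{
    uint32_t refs[CHUNK_TABLE_MAX] = {0};

    /* Count how many leaf items still reference each chunk_table slot. */
    for (size_t d = 0; d < rc->dir_len; d++) {
        struct item_table *leaf = rc->directory_table[d].item_table;
        if (leaf == NULL)
            continue;                                /* unwritten leaf: nothing to scan */
        for (size_t i = 0; i < ITEMS_PER_LEAF; i++)
            if (leaf->items[i].ck_idx != CK_IDX_INVALID)
                refs[leaf->items[i].ck_idx]++;
    }

    /* Slots that hold a chunk_id but are no longer referenced become free. */
    *free_count = 0;
    for (uint32_t s = 0; s < CHUNK_TABLE_MAX; s++) {
        if (rc->chunk_table[s] != 0 && refs[s] == 0) {
            rc->chunk_table[s] = 0;                  /* clear the unreferenced chunk_id */
            free_slots[(*free_count)++] = s;         /* slot is now available for a new chunk */
        }
    }
}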
When updating a key, the procedure is:
1. chunk_table update: find a writable slot (idx) of the chunk_table for the chunk_id carried by the update; the slot is found as follows:
a) Check whether the chunk_id equals the region's active chunk; if it does, this is an append write to the active chunk, and the chunk_table is returned directly without updating.
b) If the chunk_id does not equal the region's active chunk, the region has switched to a new chunk for writing: a slot (chunk_id_idx) is fetched from the timer's reclamation queue and updated, and the region's active chunk and the chunk's slot (idx) are set to the current chunk.
2. Address the lba table with the lba, find the location of the item corresponding to the lba, and update its chunk_lba and the chunk_id's slot (idx).
For example: when writing lba 3, { chunk_id:1, chunk_off:20 }:
(1) Determine whether the active chunk_id is 1; if it is, the chunk has not been switched and this is an append write to chunk 1, so the chunk_table does not need to be updated.
(2) Address the lba table using the lba to find the item corresponding to the lba, and update its chunk_lba and chunk_idx. A sketch of this update path follows.
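The write path above can be sketched in C as follows, reusing the earlier structures and the region_lookup helper; error handling, leaf allocation and the exact interface to the timer's reclamation queue are simplified assumptions.
static int region_update(struct region_cache *rc, uint64_t lba,
                         uint64_t chunk_id, uint32_t chunk_lba,
                         uint32_t *free_slots, size_t *free_count)
{
    uint32_t ck_idx;

    /* Step 1: chunk_table update. An append to the active chunk needs no new
     * slot; a chunk switch takes a reclaimed slot and records the new id. */
    if (chunk_id == rc->active_chunk_id) {
        ck_idx = rc->active_ck_idx;
    } else {
        if (*free_count == 0)
            return -1;                     /* no free slot yet; GC/compaction must run first */
        ck_idx = free_slots[--(*free_count)];
        rc->chunk_table[ck_idx] = chunk_id;
        rc->active_chunk_id = chunk_id;
        rc->active_ck_idx   = ck_idx;
    }

    /* Step 2: address the lba table, find the item, update chunk_lba and idx. */
    struct item *it = region_lookup(rc, lba);
    if (it == NULL)
        return -1;                         /* leaf not yet allocated; allocation omitted here */
    it->ck_idx    = ck_idx;
    it->chunk_lba = chunk_lba;
    return 0;
}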
The embodiment of the invention also provides a computer device, and fig. 7 is a schematic structural diagram of the computer device provided by the embodiment of the invention; referring to fig. 7 of the drawings, the computer apparatus includes: an input system 23, an output system 24, a memory 22, and a processor 21; the memory 22 is configured to store one or more programs; when the one or more programs are executed by the one or more processors 21, the one or more processors 21 are caused to implement the LBA full cache method based on the append write distributed storage as provided by the above embodiments; wherein the input system 23, the output system 24, the memory 22 and the processor 21 may be connected by a bus or otherwise, for example in fig. 7.
The memory 22 is used as a readable storage medium of a computing device, and can be used for storing a software program and a computer executable program, and is used for storing program instructions corresponding to the LBA full cache method based on the additional write distributed storage according to the embodiment of the invention; the memory 22 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the device, etc.; in addition, memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device; in some examples, memory 22 may further comprise memory located remotely from processor 21, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system 23 is operable to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the device; output system 24 may include a display device such as a display screen.
The processor 21 executes various functional applications of the device and data processing by running software programs, instructions and modules stored in the memory 22, i.e., implements the LBA full cache method based on the append write distributed storage described above.
The computer device provided by the above embodiment can be used for executing the LBA full cache method based on the additional write distributed storage, and has corresponding functions and beneficial effects.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the LBA full cache method based on additional write distributed storage as provided by the above embodiments. The storage medium may be any of various types of memory devices or storage devices, including: installation media such as CD-ROM, floppy disk or tape systems; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; nonvolatile memory such as flash memory or magnetic media (e.g., a hard disk or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a second, different computer system connected to the first computer system through a network (such as the Internet); the second computer system may provide program instructions to the first computer for execution. The term storage medium also covers two or more storage media that may reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the LBA full cache method based on the additional write distributed storage described in the above embodiments, and may also perform the related operations in the LBA full cache method based on the additional write distributed storage provided in any embodiment of the present invention.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The LBA full caching method based on the additional writing distributed storage is characterized by comprising the following steps of:
s1, decomposing a shared block storage addressing space applied by a user into a plurality of blocks according to a fixed size, hashing the blocks into a storage cluster, and defining each block as a region; splitting the addressing space of each region into a plurality of logic blocks according to a fixed size;
s2, defining a data structure of the addressing space of the region in the full cache as a linear table of a kernel page table structure, wherein each sub-item of the linear table corresponds to the addressing structure of each logic block;
S3. In the item of the addressing structure, store the index of the physical chunk in a chunk table; after the LBA is addressed to the item, the content identified by the item is used to address the LBA's physical address, which narrows the range of the LBA addressing space;
S4. When an LBA is overwritten, the position the chunk originally pointed to becomes a free position; when the data of a physical chunk has been fully overwritten by LBAs or compacted away, the chunk is marked as a free position in the chunk_table linear table; the free position stores the new chunk address (chunk_id) when a new chunk is written;
S5. Based on the characteristic of cluster additional writing, when a chunk is full, a new chunk is switched in for writing; an available free position is found in the chunk_table linear table and the new chunk_id is stored there;
when part of the data in a chunk has been overwritten and free space can be released, the chunk is compacted, so that a plurality of LBA items continue to point to a single chunk_id in the chunk_table and the total size of the chunk_table linear table does not exceed a threshold.
2. The LBA full cache method based on the append write distributed storage according to claim 1, wherein the linear table of regions of the S2 step is created as a multi-level storage structure in which sub-tables of each level are linear tables, and an upper level stores addresses and state information of a lower level.
3. The LBA full cache method based on the append write distributed storage according to claim 1, wherein the block number LBA of the logical block in the step S2 is a subscript of the linear table, and when an addressing query is initiated to a region, the subscript of the linear table is used, and the block number LBA of the logical block is not stored in the region.
4. The LBA full cache method based on append write distributed storage of claim 3, wherein the structure of the region's linear table is described as:
region_x=[lg1,lg2,lg3,…,lgn]
where region_x represents any one region after hashing and lg represents the addressing structure of the logical block.
5. The LBA full cache method based on additional write distributed storage according to claim 2, wherein the multi-level storage structure of the region linear table has 2 levels, and the lba_table comprises a directory_table and a lower leaf linear table item_table;
each element in the directory_table points to a leaf linear table, and each element contains state information of the leaf linear table pointed by redundant bit field identifiers; when no element is written in the leaf linear table, the leaf linear table does not allocate memory.
6. The LBA full cache method based on append write distributed storage of claim 5, wherein the leaf linear table contains a plurality of item elements, each leaf linear table is fixed in length; the item_table contains information for addressing physical chunk; the chunk_table contains all chunk_ids referenced by this region; in the item_table, the index idx points to the chunk_id belonging to the LBA write in the chunk_table.
7. The LBA full cache system based on the additional write distributed storage, which performs the LBA full cache method based on the additional write distributed storage according to any one of claims 1 to 6, wherein the system is deployed in a distributed shared storage cluster.
8. The LBA full cache device based on the additional write distributed storage, wherein the LBA full cache system based on the additional write distributed storage as claimed in claim 7 is carried.
9. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps of the LBA full cache method based on append write distributed storage of any of claims 1-6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the LBA full cache method based on append write distributed storage as claimed in any one of claims 1 to 6 when the program is executed by the processor.
CN202311727000.7A 2023-12-15 2023-12-15 LBA full caching method and system based on additional writing distributed storage Pending CN117851287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311727000.7A CN117851287A (en) 2023-12-15 2023-12-15 LBA full caching method and system based on additional writing distributed storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311727000.7A CN117851287A (en) 2023-12-15 2023-12-15 LBA full caching method and system based on additional writing distributed storage

Publications (1)

Publication Number Publication Date
CN117851287A true CN117851287A (en) 2024-04-09

Family

ID=90528086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311727000.7A Pending CN117851287A (en) 2023-12-15 2023-12-15 LBA full caching method and system based on additional writing distributed storage

Country Status (1)

Country Link
CN (1) CN117851287A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105940396A (en) * 2013-12-27 2016-09-14 谷歌公司 Hierarchical chunking of objects in a distributed storage system
US20200183836A1 (en) * 2018-12-10 2020-06-11 International Business Machines Corporation Metadata for state information of distributed memory
CN111309261A (en) * 2020-02-16 2020-06-19 西安奥卡云数据科技有限公司 Physical data position mapping method on single node in distributed storage system
CN112486403A (en) * 2019-09-12 2021-03-12 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for managing metadata of storage objects
CN114647388A (en) * 2022-05-24 2022-06-21 杭州优云科技有限公司 High-performance distributed block storage system and management method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination