CN114238226A - NVM (non-volatile memory) local file management system and method based on SIMD (single instruction, multiple data) instructions
- Publication number
- CN114238226A (application number CN202111583926.4A)
- Authority
- CN
- China
- Prior art keywords
- hash
- data block
- file
- block
- inode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
- G06F16/152—File search processing using file content signatures, e.g. hash values
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1847—File system types specifically adapted to static storage, e.g. adapted to flash memory or SSD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a system and a method for managing NVM local files based on SIMD instructions. The system comprises: a linear-hash-based directory data block indexing module, used for acquiring the logical block number within the directory where a file is located; a static-hash-based global data block management module, used for converting the logical block number into a physical block number to obtain the data block; and a SIMD-instruction-optimized directory in-block index module, used for accelerated search within the data block. The method comprises file creation, file linking and file reading steps applied to the system. The invention designs a new data block and directory management method for the read-write characteristics of NVM devices and improves file system performance through SIMD instruction optimization. The system and method can be widely applied in the field of file management.
Description
Technical Field
The invention relates to the field of file management, in particular to a system and a method for managing NVM (non volatile memory) local files based on SIMD (single instruction multiple data) instructions.
Background
A file system is a common way of organizing data, and file systems currently serve as the underlying storage system in many fields. For a new kind of storage device such as NVM, how to exploit the hardware effectively, according to the read-write characteristics of the device and in combination with a high-performance SIMD instruction set, is a problem that current file system designs have not yet considered. Conventional local file systems such as ext4 and xfs only apply the DAX mechanism to NVM devices and do not optimize for NVM in their overall design.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a system and a method for managing NVM local files based on SIMD instructions, which design a new data block and directory management method for the read-write characteristics of NVM devices, and improve the performance of the file system by combining SIMD instruction optimization.
The first technical scheme adopted by the invention is as follows: a SIMD instruction based NVM local file management system comprising:
a linear-hash-based directory data block indexing module, used for acquiring the logical block number within the directory where a file is located;
a static-hash-based global data block management module, used for converting the logical block number into a physical block number to obtain the data block;
and a SIMD-instruction-optimized directory in-block index module, used for accelerated search within the data block.
Further, the specific working steps of the linear-hash-based directory data block indexing module include:
initializing the linear hash;
performing the splitting process round by round in round-robin order;
in the process of inserting a directory entry, calculating the directory block index from the file name;
if the directory block has a free slot, inserting the directory entry directly;
if the directory block is full, using an overflow directory block and inserting the directory entry into it;
in the directory indexing process, calculating the hash value from the file name to be indexed to obtain the logical block number.
Further, the specific working steps of the static-hash-based global data block management module include:
initializing the device and initializing a global data block hash table according to the capacity of the device;
in the process of inserting a data block, combining the inode number and the logical block number into a unique block id, calculating the hash value of the unique block id with a hash function, and using the hash value as the index into the hash table to obtain the corresponding hash item;
and locating the data block according to the hash item.
Further, the specific working steps of the SIMD-instruction-optimized directory in-block index module include:
in the inserting process, calculating a hash value for the file name of each directory entry, and storing all the hash values of the same block together;
storing each hash value in 16 bits, where the highest bit is an exist bit and the remaining lower 15 bits are the lower 15 bits of the hash of the file name;
in the searching process within the directory data block, using SIMD instructions to compare multiple hash values at the same time, obtaining the relative position of the match, and locating the directory entry from that position.
The second technical scheme adopted by the invention is as follows: a method for managing NVM local files based on SIMD instructions, comprising the following file creation steps:
receiving a creation request from an upper-layer application for a file name;
performing hash calculation and judgment on the file name based on the linear hash variables in the parent directory to obtain the logical data block number where the file name is located;
performing hash calculation on the combination of the inode number of the parent directory and the logical block number to obtain a bucket index of the data block hash table;
acquiring the corresponding bucket from the data block hash table according to the bucket index, and searching within the bucket using SIMD instructions;
once the corresponding data block item is found, acquiring the physical block number from the data block item and reading the corresponding data block;
reading the corresponding hash field from within the data block, calculating the hash of the file name, taking its lower 15 bits and setting the exist bit to 1, and using SIMD instructions to search the block for a duplicate name;
if no duplicate name exists, setting the lower 15 bits of the hash value to 0 and the exist bit to 1, and using SIMD instructions to search the block for hash items smaller than this value, which indicate free positions;
if no free item exists, triggering and performing a linear hash data block split;
and if a free item exists, inserting the combined information of the new file into the corresponding free position in the block and persisting it, then updating and persisting the corresponding hash value, which completes the file creation process.
Further, the method also comprises the following file linking steps:
receiving a link request from an upper-layer application for a file name;
checking whether the link flag bit of the inode of the target file to be linked marks it as a link file;
if it is not a link file, allocating a free inode position in the link inode area, migrating the inode data of the target file into the link inode area and persisting it, setting the link flag bit to true in the original inode, and writing into the original inode the physical address of the migrated inode data;
if it is a link file, performing hash calculation and judgment on the file name based on the linear hash to obtain the logical data block number where the file name is located;
performing hash calculation on the combination of the inode number of the parent directory and the logical block number obtained in the previous step to obtain a bucket index of the data block hash table;
acquiring the corresponding bucket from the data block hash table according to the bucket index, and searching within the bucket using SIMD instructions;
if the corresponding data block item is not found, allocating a free data block from the data block bitmap and inserting the obtained physical address into the corresponding data block item in the bucket;
once the corresponding data block item is found, acquiring the physical block number from the data block item and reading the corresponding data block;
reading the corresponding hash field from within the data block, calculating the hash of the file name, taking its lower 15 bits and setting the exist bit to 1, and using SIMD instructions to search the block for a duplicate name;
if no duplicate name exists, setting the lower 15 bits of the hash value to 0 and the exist bit to 1, and using SIMD instructions to search the block for hash items smaller than this value, which indicate free positions;
if no free item exists, triggering and performing a linear hash data block split;
if a free item exists, assembling the inode of the new file, setting the link flag bit of the new file's inode to true, and writing into it the physical address, within the link inode area, of the target inode to be linked;
and inserting the combined inode and directory entry information of the new file into the corresponding free position in the block and persisting it, then updating and persisting the corresponding hash value, which completes the file linking process.
Further, the method also comprises the following file reading steps:
receiving a read request from an upper-layer application for a file name;
performing hash calculation and judgment on the file name based on the linear hash to obtain the logical data block number where the file name is located;
performing hash calculation on the combination of the inode number of the parent directory and the logical block number to obtain a bucket index of the data block hash table;
acquiring the corresponding bucket from the data block hash table according to the bucket index, and searching within the bucket using SIMD instructions;
once the corresponding data block item is found, acquiring the physical block number from the data block item and reading the corresponding data block;
reading the corresponding hash field from within the data block, calculating the hash of the file name, taking its lower 15 bits and setting the exist bit to 1, and using SIMD instructions to search whether a matching hash item exists in the block;
if a match exists, reading the corresponding inode and directory entry information from the data block according to the matched position, and comparing the file names;
if the file names are the same, checking whether the link flag bit of the corresponding inode marks it as a link file;
if it is a link file, reading from it the physical address of the actual inode within the link inode area;
and if it is not a link file, reading the corresponding inode data and returning the inode data to the user.
The system and the method have the following advantages: the invention takes full account of the fact that the performance of NVM devices is close to that of DRAM and redesigns data block management and directory management accordingly. This avoids the large number of memory accesses during indexing that complex in-memory data structures cause in conventional file systems, and the data structures are designed specifically so that the SIMD instruction set can be used to accelerate computation.
Drawings
FIG. 1 is a flowchart of the steps of file creation according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the steps of file linking according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the steps of reading a file according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
(1) The directory data block indexing module based on linear hash is used for acquiring the logical block number of a directory where the file is located;
In the directory management of a file system, a file operation must first obtain, from the file name, the logical block number within the parent directory. Because the size of a directory changes dynamically, the invention indexes directory data blocks with linear hashing, a dynamic hashing method.
Linear hashing is a dynamic hashing method that handles hash table growth with less overhead than static hashing. At initialization, linear hashing uses one bucket (one bucket corresponds to one directory data block in the file system), with level = 0 and next = 0. $2^{level}$ is the number of buckets at the start of the current round, and next is the index of the next bucket to be split; splitting proceeds round by round in round-robin order. Let $H_{level}(filename) = h(filename) \bmod 2^{level}$. When a directory entry is inserted, $H_{level}(filename)$ is computed from the filename to obtain the bucket index; if the bucket has a free slot the entry is inserted directly, and if the bucket is full the entry is inserted into an overflow bucket. During continued insertion, once a data block overflows and the split threshold is reached, the bucket pointed to by next is split: the filename of each directory entry in that bucket is traversed, the bucket index is recalculated with $H_{level+1}(filename)$, and the entry is migrated to the corresponding bucket. Each split therefore migrates the data of at most one bucket (plus any overflow buckets) and never touches the whole hash table, which is the advantage of dynamic hashing. After the split completes, next is incremented by 1; if next equals $2^{level}$, one round of splitting is complete, so level is incremented by 1, next is reset to 0, and a new round of splitting begins. During directory indexing, the hash value $idx = H_{level}(filename)$ is calculated from the filename to be indexed, and there are two cases: 1. if $idx \ge next$, bucket[idx] has not yet been split, so the directory entry for the filename must be in bucket[idx] and the search can proceed directly within that block; 2. if $idx < next$, bucket[idx] has already been split, and the specific bucket index must be obtained by calculating $H_{level+1}(filename)$.
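The linear hashing procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the class name `LinearHashDir`, the bucket capacity, the use of CRC32 as the hash function h, and the choice to split as soon as any bucket exceeds its capacity are all assumptions. The sketch also applies the two-case index rule on insertion as well as on lookup, so that entries land in buckets that have already been split in the current round.

```python
import zlib

class LinearHashDir:
    """Toy linear-hash directory index; each bucket stands in for one directory data block."""

    def __init__(self, bucket_capacity=4):
        self.level = 0                    # current round; the round starts with 2**level buckets
        self.next = 0                     # index of the next bucket to split
        self.capacity = bucket_capacity   # hypothetical number of slots per directory block
        self.buckets = [[]]               # entries beyond `capacity` model the overflow block

    @staticmethod
    def _h(filename):
        # Assumption: CRC32 stands in for the unspecified hash function h.
        return zlib.crc32(filename.encode())

    def bucket_index(self, filename):
        idx = self._h(filename) % (2 ** self.level)
        if idx < self.next:               # this bucket was already split in the current round
            idx = self._h(filename) % (2 ** (self.level + 1))
        return idx

    def insert(self, filename):
        bucket = self.buckets[self.bucket_index(filename)]
        bucket.append(filename)
        if len(bucket) > self.capacity:   # overflow reached the (assumed) split threshold
            self._split()

    def _split(self):
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])           # the new bucket gets index next + 2**level
        for name in old:                  # only one bucket's entries are migrated
            idx = self._h(name) % (2 ** (self.level + 1))
            self.buckets[idx].append(name)
        self.next += 1
        if self.next == 2 ** self.level:  # the round is complete
            self.level += 1
            self.next = 0

    def contains(self, filename):
        return filename in self.buckets[self.bucket_index(filename)]
```

Note that the only state needed beyond the buckets themselves is the pair (level, next), which is what lets the index live in a fixed-size structure such as an inode.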
Using linear hashing for directory data block indexing has several advantages: 1. linear hashing is a dynamic hashing strategy in which each growth step of the hash table migrates the data of a single bucket only, rather than changing the whole table as a rehash of a static hash does, which suits a dynamically changing structure such as a directory; 2. linear hashing does not need to store a large amount of index data as other dynamic hashes do (for example the directory structure of extendible hashing); it only needs to store the two values level and next, which fit conveniently in a fixed-length structure such as the inode, so no additional structure is required; 3. linear hashing is still a hash structure, so in the normal case the directory data block containing a filename can be located directly in O(1) time, without the layer-by-layer search of a tree structure and without the repeated device and memory accesses caused by binary search inside each layer.
However, linear hashing splits buckets sequentially in round-robin order and cannot split an overflowing bucket immediately, so a temporary overflow bucket must be added to hold the directory entries that currently overflow; the temporary overflow bucket is reclaimed only after the bucket has been split.
(2) The static-hash-based global data block management module is used for converting the logical block number into a physical block number to obtain the data block;
After the logical block number within the directory where the file is located has been obtained, it must be converted into a physical block number before the block can be read from the device; this is the data management problem of the file system. In conventional local file systems, data block management organizes an index structure per file or directory, such as an extent tree or a radix tree. Tree index structures are generally used because the size of a single file or directory grows and shrinks dynamically, and a tree handles this flexibly. However, as analyzed above, a tree index structure causes multiple memory accesses, so the invention provides a hash-based global data block management method. A hash is a flat structure: during indexing, the position of a data block can be located in one step by calculating a hash value, which avoids the repeated memory accesses caused by layer-by-layer search of a tree and binary search within each layer. To handle the dynamic change in size of a single file or directory, the invention manages data blocks globally; because the number of data blocks on the device is fixed, this combines naturally with static hashing to achieve fast indexing.
Specifically, when the device is initialized, the file system initializes a global data block hash table according to the device capacity. The hash table consists of N hash items, each of which maps a globally unique data block id to a physical block address. The globally unique data block id is composed of an inode number and a logical block number and is hereinafter called the ubid (unique block id). Using a hash inevitably introduces hash collisions, which are resolved with linear probing. When a data block is inserted, the hash value $h(ubid) \bmod N$ of its ubid is calculated with a hash function and used as the index into the hash table to obtain the corresponding hash item. There are two cases: 1. the hash item is empty or holds an equal ubid, and the insertion is made directly; 2. the hash item is not empty and its ubid differs, i.e. a hash collision has occurred, which is resolved by linear probing: the table is searched linearly downward from the current position until the first empty hash item is found, and the insertion is made there. Resolving collisions by linear probing effectively avoids data migration; because the hash table must be persisted in the NVM, data migration would cause many NVM read and write operations and seriously degrade performance. Linear probing mainly adds read operations, and the hash table can be cached in memory to improve performance, but a high collision rate still causes multiple accesses during a lookup. To further reduce hash collisions, the invention provides two optimizations: 1. setting a suitable load factor when sizing the hash table: if the device has M data blocks and the hash table is initialized with N hash items, M/N is the load factor; in general, the smaller the load factor, the lower the collision rate, but the more extra space the hash table wastes; 2. organizing the hash table into buckets that each hold m hash items, which effectively lowers the collision rate but introduces traversal search overhead inside each bucket; to avoid this overhead, the invention provides an optimization that uses SIMD instructions to search the m items in a bucket with a single instruction, so that performance is preserved while the collision rate is reduced.
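The following Python sketch illustrates one way such a table could behave. It is a hedged illustration under several assumptions that are not in the source: the class name `BlockTable`, the packing of the ubid as the inode number shifted above a 32-bit logical block number, CRC32 as the hash function, and the combination of bucket grouping with linear probing across buckets. The inner loop over a bucket's slots is a scalar stand-in for the single SIMD compare that the text describes, and deletions are not modelled.

```python
import zlib

class BlockTable:
    """Toy global data-block table: (inode number, logical block) -> physical block."""

    def __init__(self, num_buckets=8, slots_per_bucket=4):
        self.m = slots_per_bucket
        # Each slot is None (empty) or a (ubid, physical_block) pair.
        self.buckets = [[None] * self.m for _ in range(num_buckets)]

    @staticmethod
    def make_ubid(ino, lblk):
        # Assumption: the ubid packs the inode number above a 32-bit logical block number.
        return (ino << 32) | lblk

    def _probe(self, ubid):
        # Start at the home bucket, then probe the following buckets linearly.
        n = len(self.buckets)
        home = zlib.crc32(ubid.to_bytes(16, "little")) % n
        for step in range(n):
            yield self.buckets[(home + step) % n]

    def insert(self, ino, lblk, pblk):
        ubid = self.make_ubid(ino, lblk)
        for bucket in self._probe(ubid):
            for i, slot in enumerate(bucket):   # scalar stand-in for one SIMD compare
                if slot is None or slot[0] == ubid:
                    bucket[i] = (ubid, pblk)    # empty or equal ubid: insert in place
                    return
        raise RuntimeError("block table is full")

    def lookup(self, ino, lblk):
        ubid = self.make_ubid(ino, lblk)
        for bucket in self._probe(ubid):
            for slot in bucket:
                if slot is None:                # an empty slot ends the probe chain
                    return None
                if slot[0] == ubid:
                    return slot[1]
        return None
```

Because entries are never moved after insertion, a persistent version of this layout would write each slot once, which matches the stated motivation for preferring probing over migration.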
(3) The SIMD-instruction-optimized directory in-block index module is used for accelerated search within the data block.
To find a specific directory entry and inode in a directory block read from the device, a conventional local file system such as ext4 traverses the directory block directly, which causes many memory accesses, and each lookup reads a whole data block into memory, which causes read amplification. To address this, the invention provides a directory data block index method based on SIMD instruction optimization, in which the structure inside the directory data block is as shown in FIG. 2. In the inserting process, a hash value is calculated for the filename of each directory entry, and all the hash values of the same block are stored together. Each hash value occupies 16 bits: the highest bit is an exist bit indicating whether the slot is occupied, and the remaining lower 15 bits are the lower 15 bits of the hash of the filename. When searching within the directory data block, a single SIMD instruction compares multiple hash values at once, and the directory entry can be located directly from the relative position of the match. Only two read operations are needed, one for the hash values and one for the directory entry. Because NVM devices are byte-addressable, the whole data block need not be read into memory for each search; only the data segments actually required are read, which avoids read amplification.
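As a rough illustration of the 16-bit tag layout and the in-block search, consider the Python sketch below. The names `make_tag`, `match_positions` and `lookup`, and the use of CRC32 as the filename hash, are assumptions made for the example. The list comprehension is a scalar stand-in for the SIMD equality compare; on x86, for instance, a 256-bit register holds sixteen 16-bit tags, so one compare covers sixteen slots. Because 15-bit tags can collide, the sketch confirms a tag match against the stored name.

```python
import zlib

EXIST_BIT = 0x8000   # top bit of the 16-bit tag: the slot is occupied
HASH_MASK = 0x7FFF   # low 15 bits: low 15 bits of the filename hash

def make_tag(filename):
    # Assumption: CRC32 stands in for the unspecified filename hash.
    return EXIST_BIT | (zlib.crc32(filename.encode()) & HASH_MASK)

def match_positions(tags, tag):
    # Scalar stand-in for one SIMD equality compare across the tags of a block.
    return [i for i, t in enumerate(tags) if t == tag]

def lookup(tags, entries, filename):
    """Return the slot index of `filename` in the block, or None if absent."""
    for pos in match_positions(tags, make_tag(filename)):
        # First read: the tag array. Second read: only the matching entry.
        if entries[pos] == filename:
            return pos
    return None
```

A free slot has tag 0 and therefore can never equal a tag produced by `make_tag`, since every such tag has the exist bit set.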
To further improve system performance, the invention proposes merging the directory entry and the file inode. In conventional local file systems, directory entries and inodes are stored separately, so during file path lookup the directory entry must first be found through the directory index by file name, the inode number must then be read from the entry, and finally the inode must be read from the inode data block by that number. Merging the entry and the inode therefore saves one NVM read or write and speeds up creation and lookup. In addition, the entry and the inode are each very small, so writing them separately causes NVM write amplification, whereas together they reach the minimum write unit of the NVM and avoid it. Merging the entry with the inode, however, makes the link operation considerably harder, so the invention introduces a link inode area dedicated to storing and managing the inode data of files that have links. When a file is linked, the link inode area takes over the inode and stores it in its own data blocks; every directory entry linked to that inode then holds the inode's physical address in its content, so the inode can be found directly. Although files with links fall back to storing the entry separately from the inode, the most common files in a file system still enjoy the performance gain of the merged entry and inode and the avoidance of NVM write amplification.
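The merged entry/inode layout and its fallback for linked files can be pictured with the following Python sketch. Everything here is hypothetical naming for illustration (`Inode`, `DirSlot`, `resolve`, `add_link`, a dictionary as the link inode area, and a next-free-index allocator); the source does not specify field layouts.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Inode:
    size: int = 0

@dataclass
class DirSlot:
    name: str
    is_link: bool = False
    inode: Optional[Inode] = None     # inline inode for ordinary files
    link_addr: Optional[int] = None   # address in the link inode area for linked files

def resolve(slot, link_area):
    """Return the inode for a directory slot, following the link if needed."""
    if slot.is_link:
        return link_area[slot.link_addr]   # extra read, paid by linked files only
    return slot.inode                      # common case: inode arrives with the entry

def add_link(target, new_name, link_area):
    """Create a second name for `target`, migrating its inode on the first link."""
    if not target.is_link:
        addr = len(link_area)              # hypothetical allocator: next free index
        link_area[addr] = target.inode     # move the inode into the link inode area
        target.inode, target.link_addr, target.is_link = None, addr, True
    return DirSlot(new_name, is_link=True, link_addr=target.link_addr)
```

In this sketch an ordinary file costs one read to reach its inode, while a linked file costs two, which mirrors the trade-off described in the paragraph above.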
A method for managing NVM local files based on SIMD instructions includes the layout of the device address space and the steps of file creation, file linking and file reading.
The invention divides the contiguous address space of the NVM into the following parts: super block, root inode, link inode, data block table, data block bitmap, data block. The super block stores the file system metadata, the root inode holds the inode data of the root directory, the link inode area is dedicated to storing the inode data of link files, the data block table is the data block hash table, the data block bitmap records which data blocks are allocated, and the data block part is the data block area.
File creation step:
FIG. 1 shows the specific process of file creation.
Step 1: Receive a creation request from an upper-layer application for the file filename, and go to step 2;
Step 2: Based on the next and level variables of the linear hash in the parent directory, perform hash calculation and judgment on filename: ln = (H_level(filename) ≥ next) ? H_level(filename) : H_{level+1}(filename), obtaining the logical data block number ln where the file name resides, and go to step 3;
Step 3: Hash the combination (ino + ln) of the parent directory's inode number and the logical block number obtained in step 2 to obtain a bucket index of the data block hash table, and go to step 4;
Step 4: Use the index calculated in step 3 to fetch the corresponding bucket from the data block hash table, and search within the bucket using SIMD instructions; if the corresponding data block item is found, go directly to step 6, otherwise go to step 5;
Step 5: Allocate a free data block from the data block bitmap, insert the resulting physical block number into the corresponding data block item of the bucket located via step 3, and go to step 6;
Step 6: Obtain the physical block number from the data block item, read the corresponding data block, and go to step 7;
Step 7: Read the corresponding hash field from the data block, calculate the hash of filename and take its lower 15 bits, set the exist bit to 1, and use SIMD instructions to search the block for a duplicate name (if overflow data blocks exist, they must be searched as well); if a duplicate exists, return a duplicate-name error directly, otherwise go to step 8;
Step 8: Set the lower 15 bits of the hash to 0 and the exist bit to 1, and use SIMD instructions to search the block for hash items smaller than this value, that is, items whose positions are free (if overflow data blocks exist, they must be searched as well); if no free item exists, go to step 9, otherwise go to step 10;
Step 9: Trigger a linear hash data block split, perform the split operation, and return to step 2 once the split is finished;
Step 10: Using the free position in the block obtained in step 8, combine the inode, entry and other information of the new file, insert it into that free position and persist it, then update and persist the corresponding hash value, which completes the file creation process.
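The addressing rule of step 2 and the split of step 9 can be sketched as follows. This is a minimal in-memory model under stated assumptions: `N0`, the block capacity, the use of CRC32 as the hash, and the class name `LinearHashDir` are all hypothetical, and overflow blocks and persistence are not modeled.

```python
import zlib

N0 = 4  # hypothetical initial number of logical directory blocks


def h(level, name):
    """H_level(name): a hash reduced modulo N0 * 2**level (CRC32 is an assumption)."""
    return zlib.crc32(name.encode()) % (N0 << level)


def logical_block(name, level, nxt):
    """ln = H_level(name) if H_level(name) >= next, else H_{level+1}(name)."""
    ln = h(level, name)
    return ln if ln >= nxt else h(level + 1, name)


class LinearHashDir:
    def __init__(self, capacity=4):
        self.level, self.next, self.capacity = 0, 0, capacity
        self.blocks = [[] for _ in range(N0)]

    def split(self):
        """Split block `next` and redistribute its names with H_{level+1}."""
        old, self.blocks[self.next] = self.blocks[self.next], []
        self.blocks.append([])  # the new block has index next + N0 * 2**level
        for name in old:
            self.blocks[h(self.level + 1, name)].append(name)
        self.next += 1
        if self.next == N0 << self.level:  # round finished: start the next level
            self.level, self.next = self.level + 1, 0

    def insert(self, name):
        while True:
            ln = logical_block(name, self.level, self.next)
            if len(self.blocks[ln]) < self.capacity:
                self.blocks[ln].append(name)
                return ln
            self.split()  # no free item: split, then recompute ln (step 9 -> step 2)
```

Blocks are split in round-robin order, one per split, so the block that is split is not necessarily the one that was full; the retry loop simply recomputes ln after each split, mirroring the jump from step 9 back to step 2.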
File linking step:
FIG. 2 shows the specific process of file linking.
Step 1: Receive a link request from an upper-layer application for the file filename, and go to step 2;
Step 2: Determine whether the is_link flag of the inode of the target file to be linked marks it as a link file; if not, go to step 3, otherwise go to step 4;
Step 3: Allocate a free inode position in the link inode area, migrate the inode data of the target file into the link inode area and persist it, set the is_link flag to true in the original inode, write into it the physical address of the migrated inode data, and go to step 4;
Step 4: Based on the next and level variables of the linear hash in the parent directory of filename, perform hash calculation and judgment on filename: ln = (H_level(filename) ≥ next) ? H_level(filename) : H_{level+1}(filename), obtaining the logical data block number ln where the file name resides, and go to step 5;
Step 5: Hash the combination (ino + ln) of the parent directory's inode number and the logical block number obtained in the previous step to obtain a bucket index of the data block hash table, and go to step 6;
Step 6: Use the index calculated in the previous step to fetch the corresponding bucket from the data block hash table, and search within the bucket using SIMD instructions; if the corresponding data block item is found, go directly to step 8, otherwise go to step 7;
Step 7: Allocate a free data block from the data block bitmap, insert the resulting physical address into the corresponding data block item of the bucket located via step 5, and go to step 8;
Step 8: Obtain the physical block number from the data block item, read the corresponding data block, and go to step 9;
Step 9: Read the corresponding hash field from the data block, calculate the hash of filename and take its lower 15 bits, set the exist bit to 1, and use SIMD instructions to search the block for a duplicate name (if overflow data blocks exist, they must be searched as well); if a duplicate exists, return a duplicate-name error directly, otherwise go to step 10;
Step 10: Set the lower 15 bits of the hash to 0 and the exist bit to 1, and use SIMD instructions to search the block for hash items smaller than this value, that is, items whose positions are free (if overflow data blocks exist, they must be searched as well); if no free item exists, go to step 11, otherwise go to step 12;
Step 11: Trigger a linear hash data block split, perform the split operation, and return to step 4 once the split is finished;
Step 12: Assemble the inode of the new file, set the is_link flag of the new file's inode to true, write into it the physical address of the target inode in the link inode area, and go to step 13;
Step 13: Using the free position obtained in the block, combine the inode, entry and other information of the new file, insert it into that free position and persist it, then update and persist the corresponding hash value, which completes the file linking process.
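The inode migration in steps 2, 3 and 12 can be illustrated with a small in-memory model. This is a sketch under assumptions: the link inode area is modeled as a Python list, a list index stands in for the physical address, and the names `Inode`, `make_link`, and `resolve` are hypothetical. Persistence and the directory insertion of steps 4 to 13 are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inode:
    is_link: bool = False
    data: Optional[dict] = None   # inode data held in place (regular file)
    target: Optional[int] = None  # slot in the link inode area (link file)


def make_link(target: Inode, link_area: list) -> Inode:
    """Create a new inode that shares the target's inode data."""
    if not target.is_link:
        # First link: migrate the inode data into the link inode area and
        # leave a reference to it in the original inode.
        link_area.append(target.data)
        target.target = len(link_area) - 1
        target.data = None
        target.is_link = True
    # The new file's inode only records where the shared inode data lives.
    return Inode(is_link=True, target=target.target)


def resolve(inode: Inode, link_area: list) -> dict:
    """Return the actual inode data, following the link if there is one."""
    return link_area[inode.target] if inode.is_link else inode.data
```

After the first link, the original name and every later link all reach the same inode data through one extra indirection, so the migration happens only once per target file.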
File reading step:
FIG. 3 shows the specific process of reading a file.
Step 1: Receive a read request from an upper-layer application for the file filename, and go to step 2;
Step 2: Based on the next and level variables of the linear hash in the parent directory, perform hash calculation and judgment on filename: ln = (H_level(filename) ≥ next) ? H_level(filename) : H_{level+1}(filename), obtaining the logical data block number ln where the file name resides, and go to step 3;
Step 3: Hash the combination (ino + ln) of the parent directory's inode number and the logical block number obtained in step 2 to obtain a bucket index of the data block hash table, and go to step 4;
Step 4: Use the index calculated in step 3 to fetch the corresponding bucket from the data block hash table, and search within the bucket using SIMD instructions; if the corresponding data block item is found, go directly to step 5, otherwise return that the file does not exist;
Step 5: Obtain the physical block number from the data block item, read the corresponding data block, and go to step 6;
Step 6: Read the corresponding hash field from the data block, calculate the hash of filename and take its lower 15 bits, set the exist bit to 1, and use SIMD instructions to search the block for a matching hash item (if overflow data blocks exist, they must be searched as well); if no match exists, return directly that the file does not exist, otherwise go to step 7;
Step 7: Read the corresponding inode and entry information from the data block according to the position obtained in the previous step, and compare the file names; if the file names differ, return that the file does not exist, otherwise go to step 8;
Step 8: Determine whether the is_link flag of the inode marks a link file; if so, go to step 9, otherwise go to step 10;
Step 9: Read from the link file the physical address of the actual inode in the link inode area, and go to step 10;
Step 10: Read the corresponding inode data and return it to the user.
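Steps 3 to 5 translate a (parent inode number, logical block number) pair into a physical block number through the static data block hash table. The sketch below is an assumption-laden model: the table size, the number of items per bucket, CRC32 as the hash, treating "ino + ln" as a concatenation, and the class name `DataBlockTable` are all hypothetical. The linear scan over a bucket stands in for the SIMD search of the source.

```python
import zlib

N_BUCKETS = 8  # hypothetical; the source sizes the table from the device capacity
SLOTS = 4      # hypothetical number of data block items per bucket


def bucket_index(ino, ln):
    """Hash the combined (inode number, logical block number) key to a bucket."""
    key = ino.to_bytes(8, "little") + ln.to_bytes(8, "little")
    return zlib.crc32(key) % N_BUCKETS


class DataBlockTable:
    def __init__(self):
        # Each slot holds ((ino, ln), physical_block), or None when empty.
        self.buckets = [[None] * SLOTS for _ in range(N_BUCKETS)]

    def lookup(self, ino, ln):
        """Return the physical block number, or None if no item matches."""
        for item in self.buckets[bucket_index(ino, ln)]:
            if item is not None and item[0] == (ino, ln):
                return item[1]
        return None

    def insert(self, ino, ln, physical_block):
        bucket = self.buckets[bucket_index(ino, ln)]
        for i, item in enumerate(bucket):
            if item is None:
                bucket[i] = ((ino, ln), physical_block)
                return
        # The source does not say how a full bucket is handled.
        raise RuntimeError("bucket full")
```

A `None` result from `lookup` corresponds to the "file does not exist" exit of step 4 on the read path, and to the allocation branch (step 5) on the create path.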
The contents of the system embodiments all apply to the method embodiments; the functions specifically realized by the method embodiments are the same as those of the system embodiments, and the beneficial effects achieved by the method embodiments are likewise the same as those achieved by the system embodiments.
The invention redesigns the data block index and the directory index of the file system using hash-based methods. The flattened hash structure allows an index lookup to complete in O(1) time, avoiding the multiple accesses that a traditional tree structure incurs by searching layer by layer with a binary search at each layer. The structure also makes full use of the SIMD instruction set: in the data block index, SIMD instructions perform a multi-way search inside a bucket, and inside a directory data block, SIMD instructions match multiple hashes at once, which improves system performance.
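The in-block search can be illustrated with 16-bit hash items whose top bit is the exist bit and whose lower 15 bits come from the file name hash. The sketch below is an emulation, not SIMD code: Python has no SIMD intrinsics, so each loop stands in for one packed 16-bit comparison followed by a mask extraction. The function names and the choice of the top bit as the exist bit are assumptions.

```python
EXIST = 0x8000   # assumed position of the exist bit in a 16-bit hash item
MASK15 = 0x7FFF  # lower 15 bits taken from the file name hash


def tag(name_hash):
    """Build the 16-bit probe: exist bit set, lower 15 bits of the hash."""
    return EXIST | (name_hash & MASK15)


def match_mask(items, probe):
    """Bit i is set where items[i] == probe (emulates a packed equality compare)."""
    mask = 0
    for i, value in enumerate(items):
        if value == probe:
            mask |= 1 << i
    return mask


def free_mask(items):
    """Bit i is set where items[i] < 0x8000, i.e. the exist bit is clear."""
    mask = 0
    for i, value in enumerate(items):
        if value < EXIST:
            mask |= 1 << i
    return mask


def first_set(mask):
    """Index of the lowest set bit, or -1 if the mask is empty."""
    return (mask & -mask).bit_length() - 1 if mask else -1
```

Because only 15 bits of the hash are kept, two different names can produce the same item (for example, hashes 0x1234 and 0x9234 yield the same tag), which is why the read path still compares the full file name after a hash match.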
In terms of data structure design, the index structure and the actual data are stored separately to suit the characteristics of NVM devices, so that a read can use the byte-addressability of the NVM to fetch only what it needs, which avoids read amplification. The entry and the inode data are stored together, which avoids write amplification on writes, effectively reduces the number of device reads and writes, and lowers the latency of file creation and reading.
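One way to picture this design is a directory data block whose hash field sits at the front, followed by fixed-size records that merge the inode and the entry. The sketch below uses hypothetical sizes and field choices (4 slots, a record of inode number, size and a 32-byte name); the source does not specify the record format, and NVM persistence (cache-line flushes and fences) is not modeled.

```python
import struct

SLOTS = 4                             # hypothetical number of items per block
HASHES = struct.Struct(f"<{SLOTS}H")  # hash field: one 16-bit item per slot
RECORD = struct.Struct("<II32s")      # merged record: ino, size, file name


def new_block():
    return bytearray(HASHES.size + SLOTS * RECORD.size)


def write_record(block, slot, tag, ino, size, name):
    """Write the merged inode+entry record first, then publish its hash item."""
    offset = HASHES.size + slot * RECORD.size
    RECORD.pack_into(block, offset, ino, size, name.encode())
    struct.pack_into("<H", block, 2 * slot, tag)


def lookup(block, tag, name):
    """Read only the hash field, then only the records whose hash item matches."""
    for slot, item in enumerate(HASHES.unpack_from(block, 0)):
        if item == tag:
            offset = HASHES.size + slot * RECORD.size
            ino, size, raw = RECORD.unpack_from(block, offset)
            if raw.rstrip(b"\0") == name.encode():  # full name comparison
                return ino, size
    return None
```

In this model a lookup touches the 8-byte hash field plus one 40-byte record per hash match rather than the whole block, and a create writes the inode and the entry in a single record instead of updating two separate structures.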
An apparatus for SIMD instruction-based NVM local file management, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the SIMD instruction-based NVM local file management method described above.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
A storage medium storing processor-executable instructions, wherein the processor-executable instructions, when executed by a processor, implement the SIMD instruction-based NVM local file management method described above.
The contents in the above method embodiments are all applicable to the present storage medium embodiment, the functions specifically implemented by the present storage medium embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present storage medium embodiment are also the same as those achieved by the above method embodiments.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A SIMD instruction-based NVM local file management system, comprising:
a linear hash-based directory data block index module, configured to obtain the logical block number of the directory in which a file is located;
a static hash-based global data block management module, configured to convert the logical block number into a physical block number to obtain a data block;
and a SIMD instruction-optimized in-block directory index module, configured to perform accelerated in-block search within the data block using SIMD optimization.
2. The system according to claim 1, wherein the specific working steps of the linear hash-based directory data block index module comprise:
initializing the linear hash;
performing the splitting process round by round, in round-robin order;
in the process of inserting a directory entry, calculating the directory block index from the file name;
if the directory block has a free slot, inserting the directory entry directly;
if the directory block is determined to be full, using an overflow directory block and inserting the directory entry into the overflow directory block;
in the directory indexing process, calculating the hash value from the file name to be indexed to obtain the logical block number.
3. The SIMD instruction-based NVM local file management system according to claim 2, wherein the specific working steps of the static hash-based global data block management module comprise:
initializing the device, and initializing the global data block hash table according to the capacity of the device;
in the process of inserting a data block, combining the inode number and the logical block number into a unique block number, calculating the hash value of the unique block number with a hash function, and using the hash value as the index into the hash table to obtain the corresponding hash item;
and looking up the data block according to the hash item.
4. The SIMD instruction-based NVM local file management system according to claim 3, wherein the specific working steps of the SIMD instruction-optimized in-block directory index module comprise:
in the insertion process, calculating a hash value for the file name of each directory entry, and organizing all the hash values of the same block together;
taking the lower 15 bits of the hash value, with the retained lower 15 bits representing the file name;
in the search process within the directory data block, comparing multiple hash values simultaneously using SIMD instructions to obtain the relative position of the match, and thereby locating the position of the directory entry.
5. A SIMD instruction-based NVM local file management method, characterized by comprising a file creation step:
receiving a creation request from an upper-layer application for a file name;
based on the variables of the linear hash in the parent directory, performing hash calculation and judgment to obtain the logical data block number where the file name resides;
hashing the combination of the parent directory's inode number and the logical block number to obtain a bucket index of the data block hash table;
obtaining the corresponding bucket from the data block hash table according to the bucket index, and searching within the bucket using SIMD instructions;
upon finding the corresponding data block item, obtaining the physical block number from the data block item and reading the corresponding data block;
reading the corresponding hash field from the data block, calculating the hash of the file name and taking its lower 15 bits, setting the exist bit to 1, and using SIMD instructions to search the block for a duplicate name;
if it is determined that no duplicate name exists, setting the lower 15 bits of the hash to 0 and the exist bit to 1, and using SIMD instructions to search the block for hash items smaller than this value, that is, items whose positions are free;
if it is determined that no free item exists, triggering a linear hash data block split and performing the linear hash data block split operation;
and if it is determined that a free item exists, combining the information of the new file and inserting it into the free position in the block, persisting it, then updating and persisting the corresponding hash value, thereby completing the file creation process.
6. The method of claim 5, further comprising a file linking step:
receiving a link request from an upper-layer application for a file name;
determining whether the link flag bit of the inode of the target file to be linked marks it as a link file;
if it is determined that the target file is not a link file, allocating a free inode position in the link inode area, migrating the inode data of the target file into the link inode area and persisting it, setting the link flag bit to true in the original inode, and writing into it the physical address of the migrated inode data;
if it is determined that the target file is a link file, performing hash calculation and judgment on the file name based on the linear hash to obtain the logical data block number where the file name resides;
hashing the combination of the parent directory's inode number and the logical block number obtained in the previous step to obtain a bucket index of the data block hash table;
obtaining the corresponding bucket from the data block hash table according to the bucket index, and searching within the bucket using SIMD instructions;
if the corresponding data block item is not found, allocating a free data block from the data block bitmap and inserting the resulting physical address into the corresponding data block item in the bucket;
upon finding the corresponding data block item, obtaining the physical block number from the data block item and reading the corresponding data block;
reading the corresponding hash field from the data block, calculating the hash of the file name and taking its lower 15 bits, setting the exist bit to 1, and using SIMD instructions to search the block for a duplicate name;
if it is determined that no duplicate name exists, setting the lower 15 bits of the hash to 0 and the exist bit to 1, and using SIMD instructions to search the block for hash items smaller than this value, that is, items whose positions are free;
if it is determined that no free item exists, triggering a linear hash data block split and performing the linear hash data block split operation;
if it is determined that a free item exists, assembling the inode of the new file, setting the link flag bit of the new file's inode to true, and writing into it the physical address of the target inode in the link inode area;
and combining the inode and directory entry information of the new file and inserting it into the free position in the block, persisting it, then updating and persisting the corresponding hash value, thereby completing the file linking process.
7. The method of claim 6, further comprising a file reading step:
receiving a read request from an upper-layer application for a file name;
performing hash calculation and judgment on the file name based on the linear hash to obtain the logical data block number where the file name resides;
combining the inode number of the parent directory with the logical block number and hashing the combination to obtain a bucket index of the data block hash table;
obtaining the corresponding bucket from the data block hash table according to the bucket index, and searching within the bucket using SIMD instructions;
upon finding the corresponding data block item, obtaining the physical block number from the data block item and reading the corresponding data block;
reading the corresponding hash field from the data block, calculating the hash of the file name and taking its lower 15 bits, setting the exist bit to 1, and using SIMD instructions to search the block for a matching hash item;
if it is determined that a match exists, reading the corresponding inode and directory entry information from the data block according to the inode position, and comparing the file names;
if it is determined that the file names are the same, determining whether the link flag bit of the corresponding inode marks a link file;
if it is determined that the file is a link file, reading from the link file the physical address of the actual inode in the link inode area;
and if it is determined that the file is not a link file, reading the corresponding inode data and returning the inode data to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111583926.4A CN114238226A (en) | 2021-12-22 | 2021-12-22 | NVM (non volatile memory) local file management system and method based on SIMD (single instruction multiple data) instruction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114238226A true CN114238226A (en) | 2022-03-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||