CN118567577A - Data access method and device based on distributed block storage and electronic equipment - Google Patents
- Publication number: CN118567577A
- Application number: CN202411036978.3A
- Authority: CN (China)
- Prior art keywords: data block, file, physical disk, data, stored
- Legal status: Granted (status as listed by Google Patents; not a legal conclusion)
Classifications (all under G: Physics; G06: Computing; G06F: Electric digital data processing)
- G06F3/0611: Improving I/O performance in relation to response time
- G06F16/2246: Trees, e.g. B+trees
- G06F16/2471: Distributed queries
- G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F3/064: Management of blocks
- G06F3/0643: Management of files
- G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- G06F3/0689: Disk arrays, e.g. RAID, JBOD
Abstract
This specification provides a data access method, a data access device, an electronic device and a storage medium based on distributed block storage. The method comprises: receiving a data block read request that contains the file identifier of the file to which a first data block (the block to be read from a physical disk) belongs, together with the data block identifier of the first data block; looking up, in a primary index tree and according to that file identifier, the in-memory storage address of the secondary index tree corresponding to the file; looking up, in that secondary index tree and according to the data block identifier, the storage address of the first data block on the physical disk; and reading the first data block from the physical disk at that address. Fast data block access is thereby achieved by reducing the number of disk I/O operations, and the write amplification problem is alleviated.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of block storage technologies, and in particular, to a data access method, apparatus, electronic device, and storage medium based on distributed block storage.
Background
In a distributed block storage system, a file system such as EXT4 (Fourth Extended File System) or XFS (eXtended File System) is generally used as the underlying storage engine. Because file systems are designed for file storage, and some files are small (for example, 1 KB), they generally adopt a flexible indexing scheme such as a multi-level linear table, which accommodates files of very different sizes and improves storage space utilization.
For example, to access a file on a physical disk that uses triple indirect addressing, a first disk I/O reads the file's inode to obtain the address of the first-level indirect block; a second disk I/O reads the first-level indirect block, which contains the address of the second-level indirect block; a third disk I/O reads the second-level indirect block, which contains the address of the third-level indirect block; a fourth disk I/O reads the third-level indirect block, which contains the storage address of the file data on the physical disk; and finally a fifth disk I/O reads the file data itself from that address.
Disclosure of Invention
This specification provides a data access method based on distributed block storage, in which a primary index tree and secondary index trees are stored in memory. The primary index tree is used to look up the in-memory storage address of a secondary index tree, and each secondary index tree is used to look up the on-disk storage addresses of the data blocks of one file stored on the physical disk. Specifically, each node of the primary index tree stores the file identifier of a file stored on the physical disk together with the in-memory storage address of the secondary index tree corresponding to that file, and each node of a file's secondary index tree stores the data block identifier of one data block of that file together with the storage address of that data block on the physical disk. The method comprises the following steps:
Receiving a data block read request, where the request contains the file identifier of the file to which a first data block to be read from the physical disk belongs, and the data block identifier of the first data block;
Looking up, in the primary index tree and according to that file identifier, the in-memory storage address of the secondary index tree corresponding to the file to which the first data block belongs;
Looking up, in that secondary index tree and according to the data block identifier of the first data block, the storage address of the first data block on the physical disk;
Reading the first data block from the physical disk according to that storage address.
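The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: a plain unbalanced binary search tree stands in for both index trees (the specification leaves the tree type open), a Python object reference stands in for the secondary tree's "storage address in memory", and the physical disk is simulated by a dictionary that counts reads. All class and function names (`BSTIndex`, `SimulatedDisk`, `read_block`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Node:
    key: Any
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class BSTIndex:
    """Unbalanced binary search tree, used here as a stand-in for an index tree."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: Any, value: Any) -> None:
        if self.root is None:
            self.root = Node(key, value)
            return
        node = self.root
        while True:
            if key == node.key:
                node.value = value  # replace an existing mapping
                return
            if key < node.key:
                if node.left is None:
                    node.left = Node(key, value)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key, value)
                    return
                node = node.right

    def search(self, key: Any) -> Optional[Any]:
        node = self.root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return None


class SimulatedDisk:
    """Hypothetical physical disk: address -> block bytes, with a read counter."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.reads = 0

    def read(self, address: int) -> bytes:
        self.reads += 1  # each call models one disk I/O
        return self.blocks[address]


def read_block(primary: BSTIndex, disk: SimulatedDisk, file_id: int, block_id: int) -> bytes:
    # Look up the file's secondary tree in the in-memory primary tree (no disk I/O).
    secondary = primary.search(file_id)
    if secondary is None:
        raise KeyError(f"unknown file {file_id}")
    # Look up the block's physical address in the in-memory secondary tree (no disk I/O).
    address = secondary.search(block_id)
    if address is None:
        raise KeyError(f"file {file_id} has no block {block_id}")
    # The only disk I/O on the read path.
    return disk.read(address)


if __name__ == "__main__":
    disk = SimulatedDisk()
    disk.blocks[4096] = b"block 8 of file 9"
    secondary = BSTIndex()
    secondary.insert(8, 4096)
    primary = BSTIndex()
    primary.insert(9, secondary)
    print(read_block(primary, disk, 9, 8), disk.reads)  # b'block 8 of file 9' 1
```

Both lookups touch only memory, so the read counter advances once per request. A production engine would more likely use a balanced tree (for example a red-black tree) so that lookups stay logarithmic regardless of insertion order.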
This specification also provides a data access device based on distributed block storage. A primary index tree and secondary index trees are stored in the memory of the device: the primary index tree is used to look up the in-memory storage address of a secondary index tree, and each secondary index tree is used to look up the on-disk storage addresses of the data blocks of one file stored on the physical disk. Each node of the primary index tree stores the file identifier of a file stored on the physical disk and the in-memory storage address of that file's secondary index tree; each node of a file's secondary index tree stores the data block identifier of one of the file's data blocks and that block's storage address on the physical disk. The device comprises:
a receiving unit, configured to receive a data block read request containing the file identifier of the file to which a first data block to be read from the physical disk belongs and the data block identifier of the first data block;
a first search unit, configured to look up, in the primary index tree and according to that file identifier, the in-memory storage address of the secondary index tree corresponding to the file to which the first data block belongs;
a second search unit, configured to look up, in that secondary index tree and according to the data block identifier of the first data block, the storage address of the first data block on the physical disk;
and a reading unit, configured to read the first data block from the physical disk according to that storage address.
This specification also provides an electronic device comprising a communication interface, a processor, a memory and a bus, where the communication interface, the processor and the memory are interconnected via the bus;
the memory stores machine-readable instructions, and the processor performs the above method by invoking those instructions.
This specification also provides a machine-readable storage medium storing machine-readable instructions that, when invoked and executed by a processor, implement the above method.
In the embodiments of this specification, a two-layer index consisting of a primary index tree and secondary index trees is kept in memory. The primary index tree quickly locates, for a given file stored on the physical disk, the in-memory storage address of that file's secondary index tree: it maps each file identifier directly to that address, and every file stored on the physical disk has exactly one node in the primary index tree. The secondary index tree quickly locates a specific data block of that file on the physical disk: for every data block of the file, it maps the data block identifier directly to the block's storage address on the physical disk.
In this way, retrieving a data block no longer requires multi-level indirect addressing on the disk. Fewer disk I/O operations are needed and the on-disk address of a data block is found faster, which enables fast data block access, removes the performance bottleneck that multi-level indirect addressing and frequent disk I/O cause in conventional schemes, and suits distributed block storage environments that require high-speed data access.
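The I/O saving can be stated as a simple count. Under the multi-level scheme described in the Background, reaching a data block through $k$ levels of indirection costs one read for the inode, one read per indirect block and one read for the data block; with both index trees resident in memory, only the data read remains. This sketch assumes the index trees are already loaded into memory and ignores caching in either scheme.

```latex
N_{\text{multi-level}}(k) = \underbrace{1}_{\text{inode}} + \underbrace{k}_{\text{indirect blocks}} + \underbrace{1}_{\text{data}} = k + 2,
\qquad N_{\text{multi-level}}(3) = 5,
\qquad N_{\text{two-tree}} = 1
```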
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present disclosure, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a multi-level linear table indexing scheme, shown in an exemplary embodiment;
FIG. 2 is a schematic diagram of an index tree stored in memory according to an exemplary embodiment;
FIG. 3 is a flow chart illustrating a method of data access based on block storage in accordance with an exemplary embodiment;
FIG. 4 is a diagram illustrating a memory index tree generation scheme in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of a memory storage data block bitmap according to an exemplary embodiment;
FIG. 6 is a hardware configuration diagram of an electronic device shown in an exemplary embodiment;
FIG. 7 is a block diagram illustrating a data access apparatus based on distributed block storage in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a multi-level linear table indexing scheme according to an exemplary embodiment. It shows the structure of an EXT4 inode and its relationship to data blocks (this is the classic block-mapped layout; EXT4 uses extents by default but still supports block-mapped files). The inode, shown on the far left, is a core component of the file system: it stores the file's basic attributes and metadata, such as permissions, owner and modification time, together with pointers to the file's data blocks. The direct block pointers in the inode point straight at the first data blocks of the file. The inode also contains single, double and triple indirect pointers: the single indirect pointer points to a first-level indirect block whose entries point to data blocks; the double indirect pointer points to a first-level indirect block whose entries point to second-level indirect blocks, which in turn point to data blocks; and the triple indirect pointer points to a first-level indirect block, whose entries point to second-level indirect blocks, whose entries point to third-level indirect blocks, which finally point to the data blocks holding the actual file content. Through this multi-level indirect addressing, the file system can extend the addressable space as needed, supporting files from a few KB to a few TB or larger. The mechanism lets the file system use storage space efficiently while maintaining fast access to file content, especially for large or fragmented files.
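To make the capacity claim concrete, the sketch below computes how many data blocks each addressing level reaches and which level serves a given logical block number. It assumes classic block-map parameters (12 direct pointers, 4 KiB blocks and 4-byte block pointers, hence 1024 pointers per indirect block); these values are illustrative assumptions, and real file systems may impose further limits through other on-disk fields.

```python
BLOCK_SIZE = 4096                         # assumed bytes per block
PTR_SIZE = 4                              # assumed bytes per block pointer
DIRECT = 12                               # assumed number of direct pointers in the inode
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE   # pointers that fit in one indirect block


def blocks_per_level() -> list:
    """Data blocks reachable via direct, single, double and triple indirection."""
    return [DIRECT, PTRS_PER_BLOCK, PTRS_PER_BLOCK ** 2, PTRS_PER_BLOCK ** 3]


def indirection_level(logical_block: int) -> int:
    """Return 0 for a direct block, or 1, 2, 3 for single/double/triple indirect."""
    remaining = logical_block
    for level, count in enumerate(blocks_per_level()):
        if remaining < count:
            return level
        remaining -= count
    raise ValueError("logical block is beyond the triple-indirect range")


if __name__ == "__main__":
    total_bytes = sum(blocks_per_level()) * BLOCK_SIZE
    # With these assumed parameters this is slightly over 4 TiB.
    print(f"{total_bytes / 2**40:.3f} TiB addressable")
    print("level of logical block 5000:", indirection_level(5000))
```

The triple-indirect level dominates the total, which is why files in the terabyte range end up behind three indirect blocks.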
For example, when a file is stored on a physical disk using the multi-level linear table scheme and its data is reached through triple indirect addressing, accessing the file requires: a first disk I/O to read the file's inode and obtain the address of the first-level indirect block; a second disk I/O to read the first-level indirect block, which yields the address of the second-level indirect block; a third disk I/O to read the second-level indirect block, which yields the address of the third-level indirect block; a fourth disk I/O to read the third-level indirect block, which yields the storage address of the file data on the physical disk; and finally a fifth disk I/O to read the file data from that address.
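The five-I/O walk can be simulated with a toy disk that counts every block read. The disk layout and addresses below are hypothetical, and the inode is simplified to a pointer list whose only entry is its triple-indirect pointer; the point is only that each indirection level costs one extra read before the data block itself is reached.

```python
from typing import Dict, List, Union

Block = Union[List[int], bytes]  # indirect blocks hold pointers; data blocks hold bytes


class CountingDisk:
    """Hypothetical disk that counts every block read as one I/O."""

    def __init__(self, blocks: Dict[int, Block]) -> None:
        self.blocks = blocks
        self.reads = 0

    def read(self, address: int) -> Block:
        self.reads += 1
        return self.blocks[address]


def read_via_triple_indirect(disk: CountingDisk, inode_addr: int, i1: int, i2: int, i3: int) -> Block:
    """Follow inode -> level-1 -> level-2 -> level-3 indirect blocks -> data block.

    i1, i2 and i3 are the pointer indexes used inside each indirect block.
    """
    inode = disk.read(inode_addr)      # I/O 1: inode (slot 0 = triple-indirect pointer here)
    level1 = disk.read(inode[0])       # I/O 2: first-level indirect block
    level2 = disk.read(level1[i1])     # I/O 3: second-level indirect block
    level3 = disk.read(level2[i2])     # I/O 4: third-level indirect block
    return disk.read(level3[i3])       # I/O 5: the data block itself


if __name__ == "__main__":
    disk = CountingDisk({1: [10], 10: [20], 20: [30], 30: [40], 40: b"payload"})
    data = read_via_triple_indirect(disk, 1, 0, 0, 0)
    print(data, disk.reads)  # b'payload' 5
```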
However, multi-level linear table indexing has limitations, especially in block storage systems. Block storage typically serves applications that operate directly on data blocks, such as virtual machine disks and database storage. These applications need to locate and read or write specific data blocks quickly, and they usually demand high throughput and low latency, which multi-level linear table indexing does not deliver efficiently for fine-grained block-level requests. As analyzed above, multi-level linear indexing involves multiple disk I/O operations (especially when large files are accessed through multi-level indirect addressing); each operation takes time, so data access latency can rise significantly and overall system performance degrades. The bottleneck is most pronounced in random read/write-intensive workloads.
In view of this, the present disclosure proposes a data block indexing scheme that speeds up lookups and reduces disk I/O.
In some embodiments, a primary index tree and secondary index trees are stored in memory. The primary index tree is used to look up the in-memory storage address of a secondary index tree, and each secondary index tree is used to look up the on-disk storage addresses of the data blocks of one file stored on the physical disk. Each node of the primary index tree stores the file identifier of a file stored on the physical disk and the in-memory storage address of that file's secondary index tree; each node of a file's secondary index tree stores the data block identifier of one of the file's data blocks and that block's storage address on the physical disk. First, a data block read request is received that contains the file identifier of the file to which a first data block to be read from the physical disk belongs and the data block identifier of the first data block. Next, the in-memory storage address of that file's secondary index tree is looked up in the primary index tree using the file identifier. Then, the storage address of the first data block on the physical disk is looked up in that secondary index tree using the data block identifier. Finally, the first data block is read from the physical disk at that address.
For example, referring to FIG. 2, FIG. 2 is a schematic diagram of index trees stored in memory according to an exemplary embodiment. As shown in FIG. 2, the primary index tree has 4 nodes, corresponding one-to-one to the files file4, file5, file7 and file9 stored on the physical disk. The node for file9 contains the file identifier of file9, for example file_id9, and the in-memory storage address of file9's secondary index tree, for example 0x00401000. File9 currently has 4 data blocks stored on the physical disk, corresponding one-to-one to the 4 nodes of its secondary index tree. Each node contains the data block identifier of one data block, such as the block's offset within the file (for example, the block is the 8th block of file9), and the block's storage address on the physical disk (for example, the block is stored in the 5th sector of the 10th track).
When a data block read request is received, the file identifier file_id9 of the file to which the first data block belongs and the data block identifier block_id8 of the first data block are obtained from the request. Using file_id9, the primary index tree yields 0x00401000, the in-memory storage address of the secondary index tree for file9. Using block_id8, that secondary index tree yields the storage address of the first data block on the physical disk (for example, the 5th sector of the 10th track). Finally, the first data block is read from the physical disk at that address.
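The FIG. 2 walk-through can be replayed with plain dictionaries standing in for the two trees (a hash map gives the same key-to-value lookup, though not the ordering a tree provides), plus a dictionary that models memory as "address -> secondary tree". The identifiers file_id9 and block_id8, the address 0x00401000 and the location (track 10, sector 5) come from the example above; the block_id3 entry and its location are hypothetical, and the entries for file4, file5 and file7 are omitted.

```python
from typing import Dict, Tuple

DiskAddress = Tuple[int, int]  # (track, sector)

# Memory modeled as: in-memory address -> secondary index of one file.
memory: Dict[int, Dict[str, DiskAddress]] = {
    0x00401000: {                 # secondary index of file9 (address from FIG. 2)
        "block_id8": (10, 5),     # 8th block of file9 -> track 10, sector 5
        "block_id3": (2, 7),      # hypothetical entry; other blocks omitted
    },
}

# Primary index: file identifier -> in-memory address of that file's secondary index.
primary: Dict[str, int] = {
    "file_id9": 0x00401000,
    # entries for file4, file5 and file7 would point at their own secondary indexes
}


def locate(file_id: str, block_id: str) -> DiskAddress:
    secondary_addr = primary[file_id]    # primary lookup (memory only)
    secondary = memory[secondary_addr]   # follow the in-memory address
    return secondary[block_id]           # secondary lookup (memory only)


track, sector = locate("file_id9", "block_id8")
print(f"read track {track}, sector {sector}")  # read track 10, sector 5
```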
In some embodiments, one primary index tree and at least one secondary index tree may be stored in memory, and the number of nodes in the primary index tree may be equal to the number of nodes in a secondary index tree.
Thus, in the technical solution of this specification, a two-layer index consisting of a primary index tree and secondary index trees is designed in memory. The primary index tree quickly locates the in-memory storage address of the secondary index tree for a specific file stored on the physical disk: it maps each file identifier directly to that address, and each file stored on the physical disk has exactly one node in it. The secondary index tree quickly locates the storage address of a specific data block of that file on the physical disk: for each data block of the file, it maps the data block identifier directly to the block's on-disk storage address. When a data block needs to be read, the storage engine of the block storage system first uses the file identifier in the read request to find, in the primary index tree, the in-memory address of the file's secondary index tree; it then uses the data block identifier in the request to find, in that secondary index tree, the on-disk address of the block; and finally it reads the block from the physical disk at that address.
In this way, the storage engine designed in this specification retrieves data blocks without multi-level indirect addressing on disk, reduces the number of disk I/O operations, and speeds up locating data blocks on the physical disk. This enables fast data block access, removes the performance bottleneck caused by multi-level indirect addressing and frequent disk I/O in conventional schemes, and makes the storage engine suitable for distributed block storage environments that require high-speed data access.
The following describes the present specification with reference to specific application scenarios by means of specific embodiments.
Referring to FIG. 3, FIG. 3 is a flowchart of a data access method based on block storage according to an exemplary embodiment. The method may include the following steps:
Step 302: Receive a data block read request, where the request contains the file identifier of the file to which a first data block to be read from the physical disk belongs and the data block identifier of the first data block.
For example, as shown in FIG. 2, when a data block read request is received, the file identifier file_id9 of the file to which the first data block belongs and the data block identifier block_id8 of the first data block can be obtained from the request.
Here, data access refers to the interaction between a program or user and data in a computer system, including operations such as reading and writing data. A file identifier uniquely identifies a file and is usually a file ID or file name; this specification does not limit its specific content. When the storage engine of the block storage system receives a data block read request, it quickly locates the target file through the file identifier. Within each file stored on a physical disk, data is typically divided into fixed-size data blocks for management and storage, and each data block has a unique identifier, namely its data block identifier. In the above embodiment, the data block identifier in the read request lets the block storage engine pinpoint specific data within the file (for example, block_id8 denotes the 8th data block of the file), which is essential for accessing and reading specific data blocks directly and efficiently.
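Because files are cut into fixed-size blocks, a block identifier such as block_id8 can be derived from a byte offset within the file. A minimal sketch, assuming a hypothetical 4 KiB block size and 1-based block numbering to match "block_id8 is the 8th block" (the specification fixes neither); the `ReadRequest` type and `request_for_offset` helper are likewise illustrative.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # hypothetical fixed block size in bytes


@dataclass(frozen=True)
class ReadRequest:
    file_id: str   # e.g. "file_id9"
    block_id: int  # 1-based block number within the file, e.g. 8


def request_for_offset(file_id: str, byte_offset: int) -> ReadRequest:
    """Build a read request for the block that contains byte_offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return ReadRequest(file_id, byte_offset // BLOCK_SIZE + 1)


req = request_for_offset("file_id9", 7 * BLOCK_SIZE + 100)
print(req)  # ReadRequest(file_id='file_id9', block_id=8)
```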
It should be noted that, fig. 2 is only an exemplary diagram illustrating a data block identified as block_id8 included in the access file9, and no particular limitation is imposed on other data blocks included in the access file9 or data blocks included in the access file. For example, the data block included in the file9 may be identified as the data block of block_id3, or the data block included in the file4 may be identified as the data block of block_id5, which will not be described in detail herein.
Step 304: According to the file identifier of the file to which the first data block belongs, search the primary index tree for the memory address of the secondary index tree corresponding to that file.
For example, as shown in fig. 2, the primary index tree may be searched using the file identifier file_id9 to obtain 0x00401000, the memory address of the secondary index tree corresponding to file9, the file to which the first data block belongs.
Both the primary index tree and the secondary index tree are index trees, i.e., data structures that support fast search, insertion and deletion. Each may be, for example, a binary tree or a red-black tree; this specification does not limit the specific type of index tree.
Step 306: According to the data block identifier of the first data block, search the secondary index tree corresponding to the file to which the first data block belongs for the storage address of the first data block on the physical disk.
For example, as shown in fig. 2, the secondary index tree corresponding to file9 is searched using the data block identifier block_id8 to obtain the storage address of the first data block on the physical disk (for example, sector 5 of track 10).
Taking the case in which both the primary and the secondary index trees are red-black trees as an example, a red-black tree search generally proceeds as follows. First, start at the root node: if the root is empty, the tree contains no elements and the search fails. Otherwise, compare the search key with the key of the current node. If they are equal, the target has been found and the search succeeds. If the search key is smaller, move to the left subtree, making the left child of the current node the new current node, and compare again. If the search key is larger, move to the right subtree, making the right child the new current node, and compare again. This compare-and-move process is repeated (recursively or iteratively) until a matching key is found or the current node is empty, which indicates that the key is not in the tree.
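The search procedure above is the ordinary binary-search-tree lookup; node colours only matter when a red-black tree is rebalanced after insertion or deletion. A minimal Python sketch, assuming a hypothetical `Node` class and using an iterative loop in place of recursion, might look like this. The example keys and (track, sector) values are taken from the data index table example given for file9 in fig. 4.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical red-black tree node: key, payload and child links."""
    key: int
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    red: bool = False  # colour is ignored by search; used only for rebalancing


def rb_search(root: Optional[Node], key: int) -> Optional[Any]:
    """Return the value stored under `key`, or None if the key is absent."""
    node = root
    while node is not None:        # reaching an empty subtree means failure
        if key == node.key:
            return node.value      # match: search succeeds
        # smaller keys live in the left subtree, larger keys in the right
        node = node.left if key < node.key else node.right
    return None


# Example: file9's secondary index keyed by block number (4, 5, 6, 8),
# valued by (track, sector); a valid red-black tree with black root 6.
file9_index = Node(6, (7, 3),
                   left=Node(4, (5, 2), right=Node(5, (6, 3), red=True)),
                   right=Node(8, (8, 4)))
print(rb_search(file9_index, 8))   # (8, 4)
print(rb_search(file9_index, 3))   # None
```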
Step 308: Read the first data block from the physical disk according to its storage address on the physical disk.
For example, the first data block is read from the physical disk at its storage address (for example, sector 5 of track 10).
Once the storage address of the data block on the physical disk has been determined, the storage engine of the block storage system issues a disk I/O operation that reads the contents of the data block directly into memory; this can be done efficiently by means such as DMA (Direct Memory Access). The data block that has been read is then passed to the process or service that issued the data block read request, completing the delivery of the data.
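Steps 302 to 308 can be summarized as a two-level lookup followed by a single read. The sketch below is illustrative only: Python dicts stand in for the index trees (they give the same lookup semantics), an object reference stands in for the secondary index tree's memory address, and a dict keyed by (track, sector) stands in for the physical disk. The function name `read_block` is hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]                 # (track, sector) on the physical disk
SecondaryIndex = Dict[str, Address]       # block identifier -> on-disk address
PrimaryIndex = Dict[str, SecondaryIndex]  # file identifier -> secondary index


def read_block(primary: PrimaryIndex, disk: Dict[Address, bytes],
               file_id: str, block_id: str) -> bytes:
    """Serve a data block read request carrying file_id and block_id (Step 302)."""
    # Step 304: look up the file's secondary index in the primary index.
    secondary = primary.get(file_id)
    if secondary is None:
        raise KeyError(f"no index for file {file_id}")
    # Step 306: look up the block's on-disk address in the secondary index.
    addr = secondary.get(block_id)
    if addr is None:
        raise KeyError(f"file {file_id} has no block {block_id}")
    # Step 308: a single disk read at the resolved address.
    return disk[addr]


# Example using the identifiers of fig. 2 and the address from the Step 306 example.
primary = {"file_id9": {"block_id8": (10, 5)}}
disk = {(10, 5): b"contents of block 8"}
print(read_block(primary, disk, "file_id9", "block_id8"))
```

Because both lookups happen entirely in memory, the only disk I/O on the read path is the final data block read.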
It should be noted that the multi-level linear table index method can aggravate write amplification in scenarios oriented directly toward block storage, particularly because of double logging. A conventional block storage system uses the operating system's file system as its storage engine, so each disk I/O operation triggers two log writes: one by the storage engine, which records the operation in its own transaction log to guarantee atomicity, consistency, isolation and durability; and another by the file system, which journals the operation to guarantee data integrity (the file system is not designed specifically for block storage systems, and its journal generally cannot be turned off). This two-layer logging mechanism adds load to every write operation and causes write amplification.
In this embodiment, because the block storage system can access the physical disk directly, each write triggers only one log record, namely the block storage system's own. This avoids the double logging, and the resulting write amplification, that arises when a conventional block storage system uses the operating system's file system as its storage engine, and thus optimizes the way logs are written.
In one embodiment, a file index table is stored on the physical disk. The file index table records the file identifiers of the files stored on the physical disk and the on-disk storage addresses of their data index tables, with one data index table per stored file. The method further includes: loading the file index table into memory; loading the data index table of each stored file into memory according to the table's storage address on the physical disk; generating in memory, from each data index table, the secondary index tree of the corresponding stored file; and generating the primary index tree in memory from the file identifiers recorded in the file index table and the memory addresses of the secondary index trees generated for the stored files.
For example, referring to fig. 4, fig. 4 is a schematic diagram illustrating a method of generating the in-memory index trees according to an exemplary embodiment. As shown in fig. 4, a file index table is stored on the physical disk. It records the file identifiers of the files stored on the physical disk (for example, file_id4, file_id5, file_id7 and file_id9) and the on-disk storage addresses of their data index tables (for example, track 6, sector 2 for file4; track 7, sector 3 for file5; track 9, sector 3 for file7; and track 10, sector 5 for file9); these specific contents are not shown in the figure. The file index table is loaded into memory. From its contents, the on-disk storage addresses of the data index tables of file4, file5, file7 and file9 are obtained, and those data index tables are loaded into memory to generate one secondary index tree per file (four in total; only the one for file9 is shown in the figure). Finally, the primary index tree is generated from the memory addresses of the generated secondary index trees and the file identifiers file_id4, file_id5, file_id7 and file_id9 recorded in the file index table.
It should be noted that fig. 4 only illustrates how the secondary index tree of file9 and the primary index tree are generated in memory; the generation of the secondary index trees of other files is not particularly limited. For example, the secondary index tree of file4 is generated after the data index table of file4 is loaded into memory; details are not repeated here.
The file index table is an index structure that stores file metadata. It records, for each file, the file identifier and the on-disk storage address of the file's data index table. Using the file index table, the storage engine of the block storage system can quickly find, from a file identifier, where on the disk the corresponding data index table is actually stored.
The data index table focuses on locating individual data blocks: it typically records the physical location of each data block of the file (i.e., its storage address on the physical disk), so that data blocks can be loaded and processed on demand. When a data index table is loaded into memory, its memory address is allocated dynamically, so that address must be obtained after loading.
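The start-up procedure described above (load the file index table, load each file's data index table, build one secondary index per file, then build the primary index) can be sketched as follows. This is a hedged illustration, not the specification's implementation: dicts stand in for the index trees, the object reference of each secondary index stands in for its memory address, and the loader callable `load_data_index_table` is a hypothetical stand-in for a disk read.

```python
from typing import Callable, Dict, List, Tuple

Address = Tuple[int, int]                   # (track, sector) on the physical disk
FileIndexTable = List[Tuple[str, Address]]  # (file id, data index table address)
DataIndexTable = List[Tuple[str, Address]]  # (block id, data block address)


def build_indexes(file_index_table: FileIndexTable,
                  load_data_index_table: Callable[[Address], DataIndexTable]
                  ) -> Dict[str, Dict[str, Address]]:
    """Rebuild the primary and secondary indexes in memory at start-up."""
    primary: Dict[str, Dict[str, Address]] = {}
    for file_id, table_addr in file_index_table:
        # Load this file's data index table from its on-disk address ...
        entries = load_data_index_table(table_addr)
        # ... turn it into the file's secondary index ...
        secondary = dict(entries)
        # ... and register that secondary index under the file identifier.
        primary[file_id] = secondary
    return primary


# Example: file9's tables from fig. 4; the toy "disk" maps addresses to tables.
on_disk_tables = {(10, 5): [("block_id4", (5, 2)), ("block_id5", (6, 3)),
                            ("block_id6", (7, 3)), ("block_id8", (8, 4))]}
primary = build_indexes([("file_id9", (10, 5))], on_disk_tables.__getitem__)
print(primary["file_id9"]["block_id8"])   # (8, 4)
```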
In one embodiment, the file index table consists of array elements. Each array element contains the file identifier of a file stored on the physical disk and the on-disk storage address of that file's data index table, and the array elements are stored in the order in which their corresponding files were written to the physical disk.
For example, as shown in fig. 4, four files are stored on the physical disk: file4, file5, file7 and file9. The file index table therefore contains four array elements, each holding a file identifier and the on-disk storage address of the corresponding data index table: the 1st element contains file_id4 and track 6, sector 2; the 2nd contains file_id5 and track 7, sector 3; the 3rd contains file_id7 and track 9, sector 3; and the 4th contains file_id9 and track 10, sector 5. Because the four files were written to the physical disk in the order file4, file5, file7, file9, the array elements of the file index table are stored in that same order.
It should be noted that a conventional file system usually stores its file index table on the physical disk as an ordered structure (such as a B-tree or B+ tree). Adding a node to a B-tree or B+ tree may trigger node splits, so a single insertion may cause multiple disk I/Os. In this embodiment, new elements are appended to the array-form file index table, so each addition triggers only one disk I/O. When files are written many times in succession, updating the file index table by appending reduces disk I/O, and the later write-back of the file index table to the physical disk is a sequential write, which performs better.
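The append-only update can be illustrated with a small simulation. The sketch assumes the element layout given for this scheme in the description of the disk configuration table (an 8-byte file identifier followed by an 8-byte data index table address, 16 bytes per element, stored in the 1-2 MB region at the head of the disk); the little-endian encoding, the use of plain integer byte offsets as addresses, the `bytearray` "disk" and the class name `AppendOnlyFileIndexTable` are all assumptions made for illustration.

```python
import struct

# Assumed layout: 8-byte file identifier + 8-byte on-disk address of the
# file's data index table, little-endian, 16 bytes per array element.
ENTRY = struct.Struct("<QQ")
MIB = 1 << 20


class AppendOnlyFileIndexTable:
    """Array-form file index table kept in a fixed region of a simulated disk."""

    def __init__(self, disk: bytearray, base: int, region_bytes: int) -> None:
        self.disk = disk
        self.base = base
        self.capacity = region_bytes // ENTRY.size
        self.count = 0
        self.disk_writes = 0  # counts simulated disk I/Os

    def append(self, file_id: int, table_addr: int) -> None:
        """Add one element at the end of the array: exactly one write, no node splits."""
        if self.count == self.capacity:
            raise OverflowError("file index table region is full")
        offset = self.base + self.count * ENTRY.size
        self.disk[offset:offset + ENTRY.size] = ENTRY.pack(file_id, table_addr)
        self.count += 1
        self.disk_writes += 1

    def entries(self):
        """Yield (file_id, table_addr) pairs in the order the files were written."""
        for i in range(self.count):
            yield ENTRY.unpack_from(self.disk, self.base + i * ENTRY.size)


# Example: a 2 MiB toy disk with the table in the 1-2 MiB region.
disk = bytearray(2 * MIB)
table = AppendOnlyFileIndexTable(disk, base=MIB, region_bytes=MIB)
for fid in (4, 5, 7, 9):
    table.append(fid, fid * MIB)  # hypothetical data index table addresses
print(list(table.entries()), table.disk_writes)
```

Each new element lands immediately after the previous one, so successive appends form a sequential write pattern, and no existing element ever moves. The same pattern applies to the array-form data index table.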
In one embodiment, the data index table of each file stored on the physical disk consists of array elements. Each array element contains the data block identifier of a data block stored on the physical disk and the storage address of that data block on the physical disk, and the array elements are stored in the order in which their corresponding data blocks were written to the physical disk.
For example, as shown in fig. 4, the data index tables of the stored files file4, file5, file7 and file9 each consist of such array elements. The data index table of file9 contains four array elements, i.e., file9 has four data blocks stored on the physical disk, with data block identifiers block_id4, block_id5, block_id6 and block_id8 and storage addresses track 5, sector 2; track 6, sector 3; track 7, sector 3; and track 8, sector 4, respectively (these specific contents are not shown in the figure). Because these four data blocks were written to the physical disk in the order block_id4, block_id5, block_id6, block_id8, the array elements of file9's data index table are stored in that same order.
The data index table stores the data index as an array in which each element contains the data block identifier of a data block stored on the physical disk and that block's on-disk storage address; the table thus covers all of the file's data blocks on the physical disk. The array elements are stored in the order in which the corresponding data blocks were written to the physical disk: for example, earlier-written data blocks may occupy the front of the array and later-written ones the back, or vice versa; this specification does not limit this.
It should be noted that, like the file index table, the data index table of a conventional file system is typically stored on the disk as an ordered structure (such as a B-tree or B+ tree), and adding a node may trigger node splits, so a single insertion may cause multiple disk I/Os. In this embodiment, new elements are appended to the array-form data index table, so each addition triggers only one disk I/O. When a file's data blocks are written many times in succession, updating the data index table by appending reduces disk I/O, and the later write-back of the data index table to the physical disk is a sequential write, which performs better.
In one embodiment, the method further includes: determining the storage addresses on the physical disk of the stored data blocks from the file index table and the data index tables of the stored files; and generating and storing a data block bitmap in memory from those storage addresses. Each bit in the data block bitmap corresponds to one of the block storage units into which the physical disk is divided; all block storage units have the same storage capacity, which is not less than 1 MB. A bit set to a first value indicates that the corresponding block storage unit already stores a data block, and a bit set to a second value indicates that it does not.
For example, referring to fig. 5, fig. 5 is a schematic diagram illustrating a data block bitmap stored in memory according to an exemplary embodiment. As shown in fig. 5, the physical disk is divided into block storage units of equal capacity (each small square in the physical disk allocation table in the figure), each not less than 1 MB. If a block storage unit holds the disk configuration table, the file index table or any other data object, that unit is considered allocated and cannot store new data. The file index table yields the on-disk storage addresses of the data index tables of the stored files, and each data index table yields the on-disk storage addresses of that file's data blocks. From these, the allocation status of every block storage unit on the physical disk is determined, and the corresponding data block bitmap is generated in memory, with one bit per block storage unit. A bit set to 1 indicates that the corresponding block storage unit stores a data block, and a bit set to 0 indicates that it does not.
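Generating the bitmap amounts to mapping every occupied address to its block storage unit and setting that unit's bit. The sketch below assumes addresses are expressed as byte offsets, a unit size of exactly 1 MiB and least-significant-bit-first ordering within each byte; none of these details is fixed by the specification. It uses 1 and 0 as the first and second values, as in the fig. 5 example.

```python
from typing import Iterable

UNIT = 1 << 20  # assumed block storage unit size: 1 MiB


def build_bitmap(disk_bytes: int, used_offsets: Iterable[int]) -> bytearray:
    """One bit per block storage unit; 1 = unit stores data, 0 = unit is free."""
    n_units = disk_bytes // UNIT
    bitmap = bytearray((n_units + 7) // 8)     # all bits start at 0 (free)
    for offset in used_offsets:
        unit = offset // UNIT
        bitmap[unit // 8] |= 1 << (unit % 8)   # mark the unit as allocated
    return bitmap


def is_allocated(bitmap: bytearray, unit: int) -> bool:
    return bool((bitmap[unit // 8] >> (unit % 8)) & 1)


# Example: a 16 MiB toy disk. Units 0 and 1 hold the disk configuration table
# and the file index table; units 5 and 9 hold a data index table and a block.
bm = build_bitmap(16 * UNIT, [0, UNIT, 5 * UNIT + 123, 9 * UNIT])
print([u for u in range(16) if is_allocated(bm, u)])   # [0, 1, 5, 9]
```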
The disk configuration table records format data of the physical disk, such as the size of the file index table and its storage address on the physical disk, and the storage addresses and size of the block storage units into which the physical disk is divided. When the storage engine is initialized, the disk configuration table is generally loaded into memory and scanned to obtain the storage address of the file index table on the physical disk; the file index table is then loaded into memory, after which the technical solution disclosed in this specification is carried out. Because the disk configuration table and the file index table hold limited content, they are usually stored at fixed positions at the head of the physical disk: for example, the disk configuration table in the first 0-1 MB and the file index table in 1-2 MB. In this scheme, the file index table is an array whose elements each contain a file identifier (8 bytes, abbreviated 8B) and the on-disk storage address of the file's data index table (also 8B), so each element occupies 16 bytes. A 1 MB region therefore holds 1 MB / 16 B = 64K (65,536) elements, i.e., up to 65,536 files can be recorded, so a 1 MB file index table is sufficient to record every file stored on the physical disk. The physical disk also stores the data index table of each file and the data blocks of each file, which are stored in the block storage units into which the physical disk is divided. This specification does not limit the values used for the bits of the data block bitmap or their meanings.
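Taking 1 MB as $2^{20}$ bytes, the capacity of the 1 MB file index table region can be checked as follows:

```latex
\frac{1\,\text{MB}}{8\,\text{B} + 8\,\text{B}}
  = \frac{2^{20}\,\text{B}}{2^{4}\,\text{B}}
  = 2^{16} = 65\,536 \text{ elements (files)}
```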
In this embodiment, considering that a single file in distributed block storage is very large, typically more than 10 GB, the size of the data blocks into which each file's data is split can be increased, and larger block storage units (for example, at least 1 MB) can be used. This matches the characteristics of block storage (a small number of very large objects, the opposite of file storage) and has three advantages. First, it reduces fragmentation of the physical disk's storage space and the randomness of data addresses when reading and writing data blocks. Second, dividing the physical disk space into equal-sized block storage units is simple and makes it easy for the storage engine of the block storage system to manage the disk space uniformly. Finally, because each block storage unit is large, the number of data blocks a physical disk can hold is limited, so the indexes of all data blocks can be kept entirely in memory. This makes index lookups convenient and avoids the behavior of a general-purpose file system, which loads indexes into memory only when they are needed and releases them when memory runs short, increasing disk I/O.
A general-purpose file system stores its data block bitmap on disk, so whenever a data block is newly allocated the bitmap must also be updated on disk, which causes additional disk I/O. In the present disclosure, the physical disk does not store the data block bitmap, because it can be regenerated from the file index table and the data index tables as described above; only the in-memory data block bitmap needs to be updated, which eliminates this unnecessary disk I/O.
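To give a sense of why the bitmap can live entirely in memory, the sketch below computes its size for a hypothetical 4 TiB disk. The disk size and the 4 KiB comparison unit are illustrative assumptions, not figures from this specification; the 1 MiB unit follows the minimum unit size stated above.

```python
def bitmap_bytes(disk_bytes: int, unit_bytes: int = 1 << 20) -> int:
    """Bytes needed for a bitmap with one bit per block storage unit."""
    units = disk_bytes // unit_bytes
    return (units + 7) // 8


TIB, MIB, KIB = 1 << 40, 1 << 20, 1 << 10
print(bitmap_bytes(4 * TIB) // KIB, "KiB with 1 MiB units")           # 512
print(bitmap_bytes(4 * TIB, 4 * KIB) // MIB, "MiB with 4 KiB units")  # 128
```

With 1 MiB units the whole bitmap for such a disk fits in 512 KiB, whereas small file-system-style units would need a bitmap 256 times larger.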
In one embodiment, the method further includes: receiving a data block write request, where the request includes the file identifier of the file to which a second data block to be written to the physical disk belongs and the data block identifier of the second data block; allocating, according to the data block bitmap, a block storage unit for the second data block from the area of the physical disk that does not yet store data blocks; writing the second data block to the physical disk in the allocated block storage unit; and, if the file identifier of the file to which the second data block belongs exists in the primary index tree, adding a node for the second data block to that file's secondary index tree according to the data block identifier of the second data block and the block storage unit allocated for it.
For example, a data block write request is received, from which the file identifier file_id9 of file9 and the data block identifier block_id3 of the second data block are obtained; that is, the second data block to be written is the 3rd data block of file9. According to the data block bitmap, a block storage unit is allocated for the second data block from the area that does not yet store data blocks, and the second data block is written to the physical disk. Because file_id9 exists in the primary index tree shown in fig. 4, a node for the second data block is added to the secondary index tree of file9 according to the data block identifier block_id3 and the block storage unit allocated for the second data block.
The data block bitmap is an efficient data structure used in file management to track the usage of data blocks on a disk. Each bit in the bitmap corresponds to a data block on the disk, and its value (typically 0 or 1) indicates whether that data block is occupied or free. The area of a physical disk that stores no data blocks is the portion of disk space not yet occupied by any data block; to use the physical disk space efficiently and keep data contiguous, this free space must be managed and new data blocks allocated from it when needed. For a newly written data block, an index node is created in the secondary index tree, storing the data block identifier and the actual storage address of the data block on the physical disk (i.e., the allocated block storage unit). This allows the block's on-disk storage address to be found quickly from its data block identifier later, enabling efficient access to the data block.
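The specification does not say which free unit is chosen when allocating from the bitmap. The sketch below assumes a simple first-fit policy (claim the lowest-numbered free unit) and the same least-significant-bit-first bit ordering as a plain `bytearray`; the function name `allocate_unit` is hypothetical. Consistent with the bitmap being kept only in memory, the sketch performs no disk write when marking the unit as used.

```python
from typing import Optional


def allocate_unit(bitmap: bytearray, n_units: int) -> Optional[int]:
    """First-fit: claim the lowest-numbered free block storage unit.

    Only the in-memory bitmap is updated; nothing is written to disk here.
    Returns the unit index, or None when every unit is already allocated.
    """
    for unit in range(n_units):
        byte, bit = divmod(unit, 8)
        if not (bitmap[byte] >> bit) & 1:   # bit is 0: unit is free
            bitmap[byte] |= 1 << bit        # set to 1: unit now allocated
            return unit
    return None


bitmap = bytearray([0b00000111])   # units 0-2 already in use
print(allocate_unit(bitmap, 8))    # 3
print(bin(bitmap[0]))              # 0b1111
```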
In one embodiment, the method further includes: if the file identifier of the file to which the second data block belongs does not exist in the primary index tree, allocating a memory address for that file's secondary index tree; creating the file's secondary index tree in memory based on the data block identifier of the second data block and its storage address on the physical disk, and adding a node for the second data block to that secondary index tree; and adding a node for the file to the primary index tree based on the file identifier of the file and the memory address of its secondary index tree.
For example, a data block write request is received, from which the file identifier file_id10 of file10 and the data block identifier block_id3 of the second data block are obtained; that is, the second data block to be written is the 3rd data block of file10. According to the data block bitmap, a block storage unit is allocated for the second data block from the area that does not yet store data blocks (for example, sector 5 of track 8), and the second data block is written to the physical disk. Because file_id10 does not exist in the primary index tree shown in fig. 4, a memory address is dynamically allocated for the secondary index tree of file10 (for example, 0x00401234). The secondary index tree of file10 is created in memory based on the data block identifier block_id3 and the block's on-disk storage address (track 8, sector 5), and a node for the second data block is added to it. Finally, a node for file10 is added to the primary index tree based on the file identifier file_id10 and the memory address 0x00401234 of file10's secondary index tree.
When the storage engine of the block storage system writes a data block of a new file (such as a data block of file10), it first searches the primary index tree for the file identifier (file_id10). If the identifier is not found (i.e., the file has not been indexed before), this is the first time storage space is allocated and an index is created for the file. Because the primary index tree contains no entry for file10, the storage engine dynamically allocates a region of memory (for example, at address 0x00401234) for the secondary index tree of file10; the index structure is thus extended dynamically and grows as new files are added. A new secondary index tree dedicated to managing the data block index of file10 is then created at that address. It is built from the data block identifier (e.g., block_id3) and the physical address of the data block on disk (track 8, sector 5), so the data block can later be located quickly through the index. Finally, a new node indexing the new file is added to the primary index tree based on its file identifier file_id10 and the memory address (0x00401234) of its secondary index tree. This ensures that the corresponding secondary index tree, and through it all data blocks of the file, can later be found efficiently from the file identifier.
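Once the block has been written to its allocated unit, the index update on the write path covers both cases described above: an existing file gains a node in its secondary index, while a new file first gets a secondary index and then a primary-index node. In the sketch below, which is illustrative only, dicts stand in for both index trees, creating a new dict plays the role of dynamically allocating the secondary index tree in memory, and the function name `index_written_block` is hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]            # (track, sector) of the allocated unit
SecondaryIndex = Dict[str, Address]  # stand-in for a file's secondary index tree
PrimaryIndex = Dict[str, SecondaryIndex]


def index_written_block(primary: PrimaryIndex, file_id: str,
                        block_id: str, addr: Address) -> bool:
    """Record a block that has just been written; True if the file was new."""
    secondary = primary.get(file_id)
    if secondary is not None:
        # File already indexed: add a node to its existing secondary index.
        secondary[block_id] = addr
        return False
    # New file: create its secondary index holding the first block's node ...
    secondary = {block_id: addr}
    # ... then add a primary-index node pointing at that secondary index.
    primary[file_id] = secondary
    return True


primary = {"file_id9": {"block_id8": (8, 4)}}
print(index_written_block(primary, "file_id10", "block_id3", (8, 5)))  # True
print(primary["file_id10"])  # {'block_id3': (8, 5)}
```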
Corresponding to the above embodiments of the data access method based on block storage, the present specification also provides embodiments of a data access device based on block storage.
Referring to fig. 6, fig. 6 is a hardware structure diagram of an electronic device according to an exemplary embodiment. At the hardware level, the device includes a processor 602, an internal bus 604, a network interface 606, a memory 608 and a non-volatile memory 610, and may also include other hardware required by the service. One or more embodiments of this specification may be implemented in software, for example by the processor 602 reading the corresponding computer program from the non-volatile memory 610 into the memory 608 and running it. Besides software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the entity executing the following processing flow is not limited to logic units and may also be hardware or a logic device.
Referring to fig. 7, fig. 7 is a block diagram illustrating a data access apparatus based on distributed block storage according to an exemplary embodiment. The data access apparatus 700 based on distributed block storage may be applied to the electronic device shown in fig. 6 to implement the technical solution of this specification. The memory of the device stores a primary index tree and secondary index trees. The primary index tree is used to look up the memory addresses of the secondary index trees, and each secondary index tree is used to look up the on-disk storage addresses of the data blocks of one file stored on the physical disk. A node of the primary index tree stores the file identifier of a file stored on the physical disk and the memory address of that file's secondary index tree; a node of a file's secondary index tree stores the data block identifier of one of the file's data blocks and that block's storage address on the physical disk. The apparatus includes:
a receiving unit 702, configured to receive a data block read request, where the request includes the file identifier of the file to which a first data block to be read from the physical disk belongs and the data block identifier of the first data block;
a first searching unit 704, configured to look up, in the primary index tree and according to the file identifier of the file to which the first data block belongs, the storage address in memory of the secondary index tree corresponding to that file;
a second searching unit 706, configured to look up, in the secondary index tree corresponding to the file to which the first data block belongs and according to the data block identifier of the first data block, the storage address of the first data block on the physical disk;
and a reading unit 708, configured to read the first data block from the physical disk according to its storage address on the physical disk.
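The read path performed by units 702 to 708 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain `dict`s stand in for the primary and secondary index trees (a real system would more likely use a balanced search tree or B+ tree), an `io.BytesIO` buffer stands in for the physical disk, the "storage address in memory" of a secondary index is simply a Python object reference, and the names `PrimaryIndex`, `SecondaryIndex`, `read_block`, and `BLOCK_SIZE` are hypothetical.

```python
import io
from dataclasses import dataclass, field

# Hypothetical block size kept tiny for the demo; the specification uses
# block storage units of at least 1 MB.
BLOCK_SIZE = 4


@dataclass
class SecondaryIndex:
    """Per-file index: data block identifier -> byte offset on the disk."""
    blocks: dict = field(default_factory=dict)


@dataclass
class PrimaryIndex:
    """File identifier -> that file's in-memory SecondaryIndex object."""
    files: dict = field(default_factory=dict)


def read_block(primary: PrimaryIndex, disk: io.BytesIO, file_id: str, block_id: int) -> bytes:
    # Step 1: the primary index maps the file identifier to the file's
    # secondary index (the object reference plays the "memory address").
    secondary = primary.files.get(file_id)
    if secondary is None:
        raise KeyError(f"file {file_id!r} is not indexed")
    # Step 2: the secondary index maps the block identifier to a disk offset.
    offset = secondary.blocks.get(block_id)
    if offset is None:
        raise KeyError(f"block {block_id} not found in file {file_id!r}")
    # Step 3: one positioned read fetches the block from the disk.
    disk.seek(offset)
    return disk.read(BLOCK_SIZE)


# Usage: file "vol1" has block 0 at offset 8 and block 1 at offset 0.
disk = io.BytesIO(b"AAAABBBBCCCC")
primary = PrimaryIndex({"vol1": SecondaryIndex({0: 8, 1: 0})})
print(read_block(primary, disk, "vol1", 0))  # b'CCCC'
```

Because both lookups happen in memory, the sketch touches the disk exactly once per read request.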
In some embodiments, a file index table is stored on the physical disk of the device. The file index table records the file identifier of each file stored on the physical disk and the storage address on the physical disk of the data index table corresponding to that file, with one data index table per stored file. The apparatus further includes:
a first loading unit, configured to load the file index table into memory;
a second loading unit, configured to load the data index table of each stored file into memory according to that data index table's storage address on the physical disk;
a first generation unit, configured to generate in memory, from the data index table of each stored file, the secondary index tree corresponding to that file;
and a second generation unit, configured to generate the primary index tree in memory from the file identifiers of the stored files recorded in the file index table and the storage addresses in memory of the secondary index trees generated for those files.
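The loading and generation steps above can be sketched as rebuilding the in-memory indexes from on-disk tables. The byte layout used here (a little-endian `u32` element count followed by fixed 16-byte `(identifier, address)` elements, with integer identifiers) is a hypothetical assumption, since the specification does not define an encoding; `encode_table`, `decode_table`, and `rebuild_indexes` are illustrative names, and `dict`s again stand in for the index trees.

```python
import struct

# Hypothetical on-disk table layout (the specification does not define one):
#   count:u32, followed by `count` entries of (identifier:u64, address:u64).
# File index table entry: (file id, disk address of that file's data index table).
# Data index table entry: (data block id, disk address of that data block).
HEADER = struct.Struct("<I")
ENTRY = struct.Struct("<QQ")


def encode_table(entries):
    """Serialize (identifier, address) pairs in the layout above."""
    out = bytearray(HEADER.pack(len(entries)))
    for ident, addr in entries:
        out += ENTRY.pack(ident, addr)
    return bytes(out)


def decode_table(disk, offset):
    """Load one table from `disk` (a bytes-like image) starting at `offset`."""
    (count,) = HEADER.unpack_from(disk, offset)
    pos = offset + HEADER.size
    entries = []
    for _ in range(count):
        entries.append(ENTRY.unpack_from(disk, pos))
        pos += ENTRY.size
    return entries


def rebuild_indexes(disk, file_index_offset=0):
    """Rebuild the primary index (file id -> secondary index) from disk tables."""
    primary = {}
    # Load the file index table.
    for file_id, data_table_addr in decode_table(disk, file_index_offset):
        # Load this file's data index table and build its secondary index.
        secondary = dict(decode_table(disk, data_table_addr))
        # Record the new secondary index in the primary index.
        primary[file_id] = secondary
    return primary


# Usage: lay out a file index table followed by two data index tables.
t1 = encode_table([(10, 4096), (11, 8192)])
t2 = encode_table([(20, 12288)])
fit_size = HEADER.size + 2 * ENTRY.size
fit = encode_table([(1, fit_size), (2, fit_size + len(t1))])
disk_image = fit + t1 + t2
print(rebuild_indexes(disk_image))  # {1: {10: 4096, 11: 8192}, 2: {20: 12288}}
```

The loop builds each secondary index and registers it in the primary index file by file; the specification lists these as separate steps (load the tables, build the secondary trees, then build the primary tree), which yields the same in-memory result.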
In some embodiments, the file index table includes array elements, each including the file identifier of a file stored on the physical disk and the storage address on the physical disk of the data index table corresponding to that file; the array elements are stored in the order in which their corresponding files were written to the physical disk.
In some embodiments, the data index table corresponding to a file stored on the physical disk includes array elements, each including the data block identifier of a data block stored on the physical disk and the storage address of that data block on the physical disk; the array elements are stored in the order in which their corresponding data blocks were written to the physical disk.
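Storing both tables as arrays of fixed-size elements in write order means that recording a new file or data block only requires appending one element at the end, without moving any existing element; lookups by identifier are then served by the in-memory trees rebuilt at load time. The sketch below illustrates this append-only behavior. The 16-byte element format and the class name `AppendOnlyIndexTable` are hypothetical assumptions, and a `bytearray` stands in for the on-disk table.

```python
import struct

# Hypothetical fixed-size array element: (identifier:u64, disk address:u64).
ELEMENT = struct.Struct("<QQ")


class AppendOnlyIndexTable:
    """Array of fixed-size elements stored in the order they were written."""

    def __init__(self):
        self.buf = bytearray()

    def append(self, ident, addr):
        # New elements always go at the end, so earlier elements are never
        # moved or rewritten; the return value is the new element's offset.
        offset = len(self.buf)
        self.buf += ELEMENT.pack(ident, addr)
        return offset

    def elements(self):
        # Elements come back in write order, not sorted by identifier.
        return [ELEMENT.unpack_from(self.buf, pos)
                for pos in range(0, len(self.buf), ELEMENT.size)]


table = AppendOnlyIndexTable()
table.append(7, 100)
table.append(3, 200)
print(table.elements())  # [(7, 100), (3, 200)]
```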
In some embodiments, the apparatus further comprises:
a determining unit, configured to determine the storage addresses of the stored data blocks on the physical disk according to the file index table and the data index table of each stored file;
a third generation unit, configured to generate a data block bitmap in memory according to those storage addresses and to store it in memory. The physical disk is divided into block storage units of equal capacity, each no smaller than 1 MB, and each bit in the data block bitmap corresponds to exactly one block storage unit. A bit set to a first value indicates that its block storage unit already stores a data block; a bit set to a second value indicates that its block storage unit does not store a data block.
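A minimal sketch of building the data block bitmap from the storage addresses found by the determining unit. It assumes the addresses are byte offsets, uses 1 MiB (2^20 bytes) units, and maps the "first value" to 1 and the "second value" to 0; the specification does not fix these encodings, and `BlockBitmap` and `build_bitmap` are hypothetical names.

```python
# One bit per block storage unit: 1 = unit holds a data block (the "first
# value"), 0 = free (the "second value"). Mapping these to 1/0 is an assumption.
UNIT_SIZE = 1 << 20  # 1 MiB, the minimum unit size the text allows


class BlockBitmap:
    def __init__(self, disk_capacity):
        self.num_units = disk_capacity // UNIT_SIZE
        self.bits = bytearray((self.num_units + 7) // 8)  # all units start free

    def mark_used(self, unit):
        self.bits[unit // 8] |= 1 << (unit % 8)

    def is_used(self, unit):
        return bool(self.bits[unit // 8] & (1 << (unit % 8)))


def build_bitmap(disk_capacity, block_addresses):
    """Set the bit of every unit that holds one of the stored data blocks."""
    bitmap = BlockBitmap(disk_capacity)
    for addr in block_addresses:
        bitmap.mark_used(addr // UNIT_SIZE)
    return bitmap


bm = build_bitmap(10 * UNIT_SIZE, [0, 3 * UNIT_SIZE])
print([int(bm.is_used(u)) for u in range(bm.num_units)])  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

For scale: with 1 MiB units, a 4 TiB disk has 2^42 / 2^20 = 2^22 units, so the bitmap occupies 2^22 bits = 512 KiB of memory.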
In some embodiments, the apparatus further comprises:
a receiving unit, configured to receive a data block write request, where the request includes the file identifier of the file to which a second data block to be written to the physical disk belongs and the data block identifier of the second data block;
a first allocation unit, configured to allocate, according to the data block bitmap, a block storage unit for the second data block from the area of the physical disk that does not store data blocks;
a writing unit, configured to write the second data block to the physical disk in the block storage unit allocated to it;
and a first adding unit, configured to, if the file identifier of the file to which the second data block belongs exists in the primary index tree, add a node for the second data block to the secondary index tree of that file according to the data block identifier of the second data block and the block storage unit allocated to it.
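The write path for a file that is already indexed can be sketched as follows. Assumptions: a list of booleans stands in for the data block bitmap for readability, `io.BytesIO` stands in for the disk, `dict`s stand in for the index trees, the unit size is shrunk to 4 bytes for the demo, and a first-fit scan is used because the specification does not say which free unit is chosen. `allocate_unit` and `write_block` are hypothetical names.

```python
import io

UNIT_SIZE = 4  # hypothetical tiny unit for the demo; the text requires at least 1 MB


def allocate_unit(bitmap):
    """First-fit scan (an assumed policy): claim the first free unit."""
    for unit, used in enumerate(bitmap):
        if not used:
            bitmap[unit] = True
            return unit
    raise RuntimeError("no free block storage unit on the physical disk")


def write_block(disk, bitmap, primary, file_id, block_id, data):
    if len(data) > UNIT_SIZE:
        raise ValueError("data block larger than one block storage unit")
    # Covers only the branch where the file is already in the primary index
    # (raises KeyError otherwise); checking first avoids claiming a unit
    # for a request that would fail.
    secondary = primary[file_id]
    # Allocate a free block storage unit according to the bitmap.
    addr = allocate_unit(bitmap) * UNIT_SIZE
    # Write the block, zero-padded to a full unit, into that unit.
    disk.seek(addr)
    disk.write(data.ljust(UNIT_SIZE, b"\0"))
    # Add a node for the new block to the file's secondary index.
    secondary[block_id] = addr
    return addr


disk = io.BytesIO(bytes(4 * UNIT_SIZE))
bitmap = [True, False, False, False]  # unit 0 already holds a block
primary = {"vol1": {}}
print(write_block(disk, bitmap, primary, "vol1", 42, b"DATA"))  # 4
```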
In some embodiments, the apparatus further comprises:
a second allocation unit, configured to, if the file identifier of the file to which the second data block belongs does not exist in the primary index tree, allocate a storage address in memory for the secondary index tree corresponding to that file;
a second adding unit, configured to create that secondary index tree in memory based on the data block identifier of the second data block and the storage address of the second data block on the physical disk, and to add a node for the second data block to it;
and a third adding unit, configured to add a node for that file to the primary index tree based on the file identifier of the file to which the second data block belongs and the storage address in memory of its secondary index tree.
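Both index-update branches (the file is already indexed, or it is new) can be sketched together. As before, plain `dict`s stand in for the index trees, so "allocating a storage address in memory" for a new secondary index is implicit in creating the `dict`; the function name `index_written_block` is hypothetical.

```python
def index_written_block(primary, file_id, block_id, disk_addr):
    """Record a freshly written data block in the two-level index.

    `primary` maps file id -> secondary index; each secondary index maps
    data block id -> disk address. Plain dicts stand in for the trees.
    """
    secondary = primary.get(file_id)
    if secondary is not None:
        # File already indexed: just add a node for the new block.
        secondary[block_id] = disk_addr
        return secondary
    # File not indexed yet: create its secondary index in memory with the
    # new block's node already in it ...
    secondary = {block_id: disk_addr}
    # ... then add a node for the file to the primary index.
    primary[file_id] = secondary
    return secondary


primary = {}
index_written_block(primary, "f1", 0, 4096)  # creates f1's secondary index
index_written_block(primary, "f1", 1, 8192)  # reuses it
index_written_block(primary, "f2", 0, 0)     # creates f2's secondary index
print(primary)  # {'f1': {0: 4096, 1: 8192}, 'f2': {0: 0}}
```

Creating the secondary index before linking it into the primary index follows the order of the units above, so a lookup never finds a primary-index node that points to a missing secondary index.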
For how the functions and roles of each unit in the above apparatus are implemented, see the implementation of the corresponding steps in the above method; details are not repeated here.
Because the apparatus embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for related details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of this specification. Those of ordinary skill in the art can understand and implement them without creative effort.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
The user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this specification are all authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entry points are provided for users to grant or refuse authorization.
It should also be noted that the terms "comprise," "include," or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terms used in one or more embodiments of this specification are only for the purpose of describing particular embodiments and are not intended to limit the one or more embodiments of this specification. The singular forms "a," "an," and "the" used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
The foregoing describes merely preferred embodiments of one or more embodiments of this specification and is not intended to limit the one or more embodiments of this specification.
Claims (10)
1. A data access method based on distributed block storage, characterized in that a primary index tree and secondary index trees are stored in memory, the primary index tree being used to look up the storage address in memory of a secondary index tree, and each secondary index tree being used to look up the storage address on a physical disk of a data block included in a file stored on the physical disk; wherein a node of the primary index tree stores the file identifier of a file stored on the physical disk and the storage address in memory of the secondary index tree corresponding to that file, and a node of the secondary index tree corresponding to a file stores the data block identifier of a data block included in that file and the storage address of that data block on the physical disk; the method comprising:
receiving a data block read request, wherein the request includes the file identifier of the file to which a first data block to be read from the physical disk belongs and the data block identifier of the first data block;
looking up, in the primary index tree and according to the file identifier of the file to which the first data block belongs, the storage address in memory of the secondary index tree corresponding to that file;
looking up, in the secondary index tree corresponding to the file to which the first data block belongs and according to the data block identifier of the first data block, the storage address of the first data block on the physical disk;
and reading the first data block from the physical disk according to the storage address of the first data block on the physical disk.
2. The method of claim 1, wherein a file index table is stored on the physical disk, the file index table recording the file identifiers of the files stored on the physical disk and the storage addresses on the physical disk of data index tables, the data index tables corresponding one-to-one to the stored files; the method further comprising:
loading the file index table into memory;
loading the data index tables corresponding one-to-one to the stored files into memory according to their storage addresses on the physical disk;
generating in memory, according to those data index tables, the secondary index trees corresponding one-to-one to the stored files;
and generating the primary index tree in memory according to the file identifiers of the stored files recorded in the file index table and the storage addresses in memory of the secondary index trees generated for those files.
3. The method of claim 2, wherein the file index table includes array elements, each array element including the file identifier of a file stored on the physical disk and the storage address on the physical disk of the data index table corresponding to that file, and wherein the array elements of the file index table are stored in the order in which their corresponding files were written to the physical disk.
4. The method of claim 2, wherein the data index table corresponding to a file stored on the physical disk includes array elements, each array element including the data block identifier of a data block stored on the physical disk and the storage address of that data block on the physical disk, and wherein the array elements of the data index table are stored in the order in which their corresponding data blocks were written to the physical disk.
5. The method according to claim 2, wherein the method further comprises:
determining the storage addresses of the stored data blocks on the physical disk according to the file index table and the data index tables corresponding one-to-one to the stored files;
generating a data block bitmap in memory according to those storage addresses and storing it in memory; wherein the bits of the data block bitmap correspond one-to-one to the block storage units into which the physical disk is divided, all block storage units have the same capacity, and the capacity of each block storage unit is not less than 1 MB; a bit set to a first value indicates that the corresponding block storage unit stores a data block, and a bit set to a second value indicates that the corresponding block storage unit does not store a data block.
6. The method of claim 5, wherein the method further comprises:
receiving a data block write request, wherein the request includes the file identifier of the file to which a second data block to be written to the physical disk belongs and the data block identifier of the second data block;
allocating, according to the data block bitmap, a block storage unit for storing the second data block from the area of the physical disk that does not store data blocks;
writing the second data block to the physical disk in the block storage unit allocated to it;
and, if the file identifier of the file to which the second data block belongs exists in the primary index tree, adding a node corresponding to the second data block to the secondary index tree of that file according to the data block identifier of the second data block and the block storage unit allocated to it.
7. The method of claim 6, wherein the method further comprises:
if the file identifier of the file to which the second data block belongs does not exist in the primary index tree, allocating a storage address in memory for the secondary index tree corresponding to that file;
creating in memory the secondary index tree corresponding to that file based on the data block identifier of the second data block and the storage address of the second data block on the physical disk, and adding a node corresponding to the second data block to that secondary index tree;
and adding a node corresponding to that file to the primary index tree based on the file identifier of the file to which the second data block belongs and the storage address in memory of its secondary index tree.
8. A data access device based on distributed block storage, characterized in that a primary index tree and secondary index trees are stored in a memory of the device, the primary index tree being used to look up the storage address in memory of a secondary index tree, and each secondary index tree being used to look up the storage address on a physical disk of a data block included in a file stored on the physical disk; wherein a node of the primary index tree stores the file identifier of a file stored on the physical disk and the storage address in memory of the secondary index tree corresponding to that file, and a node of the secondary index tree corresponding to a file stores the data block identifier of a data block included in that file and the storage address of that data block on the physical disk; the device comprising:
a receiving unit, configured to receive a data block read request, wherein the request includes the file identifier of the file to which a first data block to be read from the physical disk belongs and the data block identifier of the first data block;
a first searching unit, configured to look up, in the primary index tree and according to the file identifier of the file to which the first data block belongs, the storage address in memory of the secondary index tree corresponding to that file;
a second searching unit, configured to look up, in the secondary index tree corresponding to the file to which the first data block belongs and according to the data block identifier of the first data block, the storage address of the first data block on the physical disk;
and a reading unit, configured to read the first data block from the physical disk according to the storage address of the first data block on the physical disk.
9. An electronic device, comprising a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are connected with each other through the bus;
The memory stores machine-readable instructions, and the processor performs the method of any one of claims 1 to 7 by invoking the machine-readable instructions.
10. A machine-readable storage medium storing machine-readable instructions which, when invoked and executed by a processor, implement the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411036978.3A CN118567577B (en) | 2024-07-30 | 2024-07-30 | Data access method and device based on distributed block storage and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118567577A (en) | 2024-08-30 |
CN118567577B CN118567577B (en) | 2024-10-22 |
Family ID: 92478588
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202411036978.3A | Active | CN118567577B (en) | 2024-07-30 | 2024-07-30 | Data access method and device based on distributed block storage and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118567577B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104021223A * | 2014-06-25 | 2014-09-03 | State Grid Corporation of China | Method and device for accessing survey point of cluster database |
CN106233259A * | 2014-04-30 | 2016-12-14 | International Business Machines Corporation | Retrieving multiple generations of stored data in a dispersed storage network |
CN106326421A * | 2016-08-24 | 2017-01-11 | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences | FPGA (Field Programmable Gate Array) parallel sorting method and system based on index tree and data linked list |
CN107168657A * | 2017-06-15 | 2017-09-15 | Shenzhen Yunshu Network Technology Co., Ltd. | Hierarchical cache design method for virtual disks based on distributed block storage |
CN110245136A * | 2019-05-06 | 2019-09-17 | Alibaba Group Holding Limited | Data retrieval method and device, equipment and storage equipment |
US20200053122A1 * | 2018-08-10 | 2020-02-13 | International Business Machines Corporation | Intrusion detection system for automated determination of IP addresses |
CN111782590A * | 2020-06-19 | 2020-10-16 | New H3C Technologies Co., Ltd. Chengdu Branch | File reading method and device |
WO2023274197A1 * | 2021-06-29 | 2023-01-05 | Huawei Technologies Co., Ltd. | Operation request processing method and related device |
CN116541360A * | 2023-05-09 | 2023-08-04 | Chengdu Zhimingda Electronics Co., Ltd. | High-speed, power-failure-resistant, high-capacity file system and power-failure protection method |
2024-07-30, CN: CN202411036978.3A (CN118567577B, active)
Non-Patent Citations (1)
Title |
---|
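Ye Changchun (叶常春), Luo Jinping (罗金平), Zhou Xingming (周兴铭): "A spatial index for speeding up WebGIS server response" (一种加快WebGIS服务器响应速度的空间索引), Journal of Software (软件学报), no. 05, 30 May 2005 (2005-05-30) * |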
Also Published As
Publication number | Publication date |
---|---|
CN118567577B (en) | 2024-10-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9298384B2 (en) | Method and device for storing data in a flash memory using address mapping for supporting various block sizes | |
CN106294190B (en) | Storage space management method and device | |
CN110555001B (en) | Data processing method, device, terminal and medium | |
US8621134B2 (en) | Storage tiering with minimal use of DRAM memory for header overhead | |
CN110858162B (en) | Memory management method and device and server | |
CN105320775A (en) | Data access method and apparatus | |
CN114817341B (en) | Method and device for accessing database | |
CN111143285A (en) | Small file storage file system and small file processing method | |
KR20210027625A (en) | Method for managing of memory address mapping table for data storage device | |
CN113835639B (en) | I/O request processing method, device, equipment and readable storage medium | |
CN109407985B (en) | Data management method and related device | |
US8239427B2 (en) | Disk layout method for object-based storage devices | |
CN108804571B (en) | Data storage method, device and equipment | |
CN118567577B (en) | Data access method and device based on distributed block storage and electronic equipment | |
CN115079957B (en) | Request processing method, device, controller, equipment and storage medium | |
CN114647388B (en) | Distributed block storage system and management method | |
CN111338569A (en) | Object storage back-end optimization method based on direct mapping | |
CN111274259A (en) | Data updating method for storage nodes in distributed storage system | |
CN116466885A (en) | Data access method and data processing system | |
CN115964350A (en) | File system management model and system | |
CN115509437A (en) | Storage system, network card, processor, data access method, device and system | |
CN115904211A (en) | Storage system, data processing method and related equipment | |
CN111309261A (en) | Physical data position mapping method on single node in distributed storage system | |
CN117785889B (en) | Index management method for graph database and related equipment | |
CN117931811B (en) | Database processing method, device, computer equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |