WO2022257685A1 - Storage system, network interface card, processor, and data access method, apparatus, and system - Google Patents

Storage system, network interface card, processor, and data access method, apparatus, and system

Info

Publication number
WO2022257685A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target data
storage
block
layer
Prior art date
Application number
PCT/CN2022/092015
Other languages
French (fr)
Chinese (zh)
Inventor
任仁
王晨
叶利杰
崔文林
张鹏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110944933.6A (CN115509437A)
Application filed by 华为技术有限公司
Publication of WO2022257685A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • The present application relates to the technical field of storage, and in particular to a storage system, a network card, a processor, and a data access method, apparatus, and system.
  • When a storage system receives a data read request from a client device, it needs to traverse the storage media in the storage system in a specific order to determine whether the target data to be read is stored in each of them. For example, the storage system first checks whether the target data is in the write cache; if not, it checks the read cache; if the data is not there either, it continues searching the lower-level storage media.
  • The storage system therefore has to check each higher-ranked storage medium in turn until it finds the one that holds the target data. The lower that medium ranks in the search order, the longer the lookup takes, which reduces the efficiency with which the storage system handles data read requests.
  • The present application provides a storage system, a network card, a processor, and a data access method, apparatus, and system to improve data reading efficiency.
  • In a first aspect, an embodiment of the present application provides a storage system.
  • The storage system includes an I/O stack and a processing unit.
  • The I/O stack includes multiple storage layers (a storage layer may be referred to simply as a layer in the embodiments of this application).
  • The I/O stack is formed by organizing the storage media in the storage system into layers.
  • Each storage layer may include one or more storage media, and the data read latency may differ from layer to layer.
  • The processing unit may be located in a device in the storage system (such a device may be referred to as a storage device); the embodiments of the present application do not limit the specific form of the processing unit.
  • The processing unit can process a data read request generated by a client device or inside the storage system, and return the requested target data.
  • The processing unit receives a data read request, which requests the target data stored in the storage system. The processing unit can query a global index based on the data read request; the global index indicates the storage layer of the I/O stack in which the target data is located, so the processing unit can determine that storage layer from the global index. Once the storage layer is determined, the target data can be read from it.
  • In this way, the processing unit can determine the storage layer where the target data is located directly by querying the global index, without traversing the storage media in order. This eliminates the traversal delay and makes the handling of data read requests more efficient, which effectively improves data reading efficiency.
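The difference between the two read paths can be illustrated with a minimal Python sketch. The layer names, the dictionary-based layers, and the use of a plain dictionary as the global index are all simplifying assumptions made for illustration; the index structure described in this application is built from character sub-blocks, not a dictionary.

```python
# Hypothetical layer names, ordered from highest (searched first) to lowest.
LAYERS = ["write_cache", "read_cache", "scm_pool", "ssd_pool"]


def read_by_traversal(stack, addr):
    """Baseline: probe every layer in order until the data is found."""
    probes = 0
    for layer in LAYERS:
        probes += 1
        if addr in stack[layer]:
            return stack[layer][addr], probes
    return None, probes


def read_by_index(stack, global_index, addr):
    """Index-based read: one index query, then a single layer access."""
    layer = global_index.get(addr)
    if layer is None:
        return None, 0
    return stack[layer][addr], 1


# Toy I/O stack: the target data sits in the lowest layer.
stack = {layer: {} for layer in LAYERS}
stack["ssd_pool"][0x1000] = b"payload"
global_index = {0x1000: "ssd_pool"}  # assumption: address -> layer name

data_a, probes_a = read_by_traversal(stack, 0x1000)
data_b, probes_b = read_by_index(stack, global_index, 0x1000)
```

In this toy setup both paths return the same data, but the traversal probes every layer above the one holding the data, whereas the index-based path accesses only that layer.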
  • The processing unit may also process data write requests.
  • Specifically, the processing unit can receive a data write request, which requests that target data be written to the storage system. The processing unit can write the target data into the storage system according to the data write request and can also update the global index according to the target data; the updated global index indicates the storage layer of the I/O stack in which the target data is located.
  • Updating the global index means that, when the target data is stored in a certain storage layer, this information is recorded in a first index item, so that when the target data needs to be read later, its storage location can be obtained by querying the first index item in the global index.
  • Index items are called subgroups in the embodiments, and the first index item is called the first subgroup.
  • In this way, the processing unit updates the global index when the target data is written, and the updated global index indicates the storage layer where the target data is located. When the processing unit later needs to read the target data, it can accurately determine that storage layer from the updated global index and read the target data.
  • Each character sub-block in the global index corresponds to a storage layer in the I/O stack and points to a space in the logical storage space onto which the storage media in the storage system are mapped.
  • The value of a character sub-block indicates whether the data in the space it points to is located in the corresponding storage layer.
  • The global index thus establishes a correspondence between the logical storage space and the storage media.
  • The logical storage space mapped by the storage media in the storage system is subdivided into large logical blocks.
  • A logical block can be further subdivided into multiple logical sub-blocks.
  • The global index may include multiple character blocks, each of which points to at least one logical block.
  • A character block includes multiple subgroups, and each subgroup points to a logical sub-block in the at least one logical block.
  • Each subgroup includes a plurality of character sub-blocks, and each character sub-block corresponds to a storage layer in the I/O stack.
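The nesting of character blocks, subgroups, and character sub-blocks can be pictured with the following Python sketch. The class names, the number of layers, and the number of sub-blocks per block are hypothetical, and the sketch assumes exactly one character sub-block per storage layer in each subgroup, which the text suggests but does not fix.

```python
from dataclasses import dataclass, field
from typing import List

NUM_LAYERS = 4           # assumed number of storage layers in the I/O stack
SUBBLOCKS_PER_BLOCK = 8  # assumed number of logical sub-blocks per logical block


@dataclass
class Subgroup:
    """Points to one logical sub-block; holds one character sub-block per layer."""
    sub_blocks: List[int] = field(default_factory=lambda: [0] * NUM_LAYERS)


@dataclass
class CharacterBlock:
    """Points to at least one logical block; holds one subgroup per logical sub-block."""
    subgroups: List[Subgroup] = field(
        default_factory=lambda: [Subgroup() for _ in range(SUBBLOCKS_PER_BLOCK)]
    )


@dataclass
class GlobalIndex:
    blocks: List[CharacterBlock]


index = GlobalIndex(blocks=[CharacterBlock() for _ in range(2)])
# Mark: data of logical sub-block 3 in character block 1 resides in layer 2.
index.blocks[1].subgroups[3].sub_blocks[2] = 1
```

Each subgroup is effectively a small per-layer vector, so locating the data for a logical sub-block reduces to inspecting one short list.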
  • The data read request includes the logical address of the target data, and the logical address indicates a space in the logical storage space.
  • When the processing unit queries the global index based on the data read request, it can determine, according to the logical address of the target data, a plurality of character sub-blocks in the global index that point to that logical address, and then determine the storage layer of the I/O stack in which the target data is located according to the values of those character sub-blocks.
  • In this way, the global index associates the logical storage space (which can also be understood as logical addresses) with the storage layers (which can also be understood as storage media) in the I/O stack.
  • The processing unit can use the logical address of the target data to look up the character sub-blocks pointing to that address in the global index. From the values of those character sub-blocks, the storage layer where the target data is located can be determined conveniently and quickly, which helps ensure accurate and efficient data reading.
  • The character sub-blocks in the global index may be represented in various forms. For example, a character sub-block may be a bit whose value is 0 or 1.
  • When the value of the bit is 1, the target data is located in the storage layer corresponding to the character sub-block.
  • When the value of the bit is 0, the target data is not located in the storage layer corresponding to the character sub-block. If multiple bits are non-zero, the target data is located in the highest of the storage layers corresponding to those bits.
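The bit-based rule can be sketched in a few lines of Python. The layer names are hypothetical, and the sketch assumes the bits of a subgroup are ordered from the highest storage layer to the lowest.

```python
# Hypothetical layers; position 0 is the highest layer in the I/O stack.
LAYERS = ["write_cache", "read_cache", "scm_pool", "ssd_pool"]


def locate_layer(bits):
    """Return the highest layer whose bit is set, or None if no bit is set."""
    for layer, bit in zip(LAYERS, bits):
        if bit:
            return layer
    return None


single = locate_layer([0, 0, 1, 0])    # only one bit set
multiple = locate_layer([0, 1, 0, 1])  # two bits set: the higher layer wins
missing = locate_layer([0, 0, 0, 0])   # no bit set
```

Scanning from the highest layer downward and stopping at the first set bit implements the "highest layer among the non-zero bits" rule directly.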
  • As another example, a character sub-block can be a counter whose value is 0 or a non-zero integer.
  • When a counter has a value of 0, no data has been written into the storage layer corresponding to the counter.
  • A non-zero integer represents the number of times data has been written to the storage layer corresponding to the counter.
  • Among the counters in the global index that point to the logical address of the target data, a non-zero counter indicates that the target data is located in the storage layer corresponding to that counter. If multiple counters are non-zero, the target data is located in the highest of the storage layers corresponding to those counters.
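The counter-based variant can be sketched in the same way. The function names and the number of layers are assumptions; the sketch only models the behavior stated above, namely that a counter is incremented on each write to its layer and that the highest layer with a non-zero counter holds the target data.

```python
NUM_LAYERS = 4  # assumed number of storage layers; layer 0 is the highest


def record_write(counters, layer):
    """Count one more write into the given layer."""
    counters[layer] += 1


def locate_layer(counters):
    """Return the highest layer with a non-zero counter, or None."""
    for layer, count in enumerate(counters):
        if count != 0:
            return layer
    return None


counters = [0] * NUM_LAYERS
record_write(counters, 3)  # one write landed in layer 3
record_write(counters, 1)  # two writes landed in layer 1
record_write(counters, 1)
found = locate_layer(counters)
```

Compared with single bits, counters preserve how many writes have touched a layer, at the cost of more space per character sub-block.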
  • When the processing unit determines, according to the logical address of the target data, the character sub-blocks in the global index that point to that address, it may perform a hash operation on the logical address, such as querying a hash table or applying a hash function, and determine the character sub-blocks according to the result of the hash operation.
  • A hash operation is relatively simple and fast, so the character sub-blocks pointing to the logical address of the target data can be determined quickly, which ensures the efficiency of data reading.
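One possible way to realize the hash step is sketched below in Python. The block size, the number of character blocks, the choice of SHA-256, and the decision to hash the logical block number (rather than the raw address) are all assumptions made for illustration; the text does not specify a particular hash function.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed logical block size (4 MiB)
NUM_CHAR_BLOCKS = 1024        # assumed number of character blocks in the index


def character_block_for(logical_addr: int) -> int:
    """Map a logical address to the position of a character block in the index."""
    block_no = logical_addr // BLOCK_SIZE
    digest = hashlib.sha256(block_no.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little") % NUM_CHAR_BLOCKS


first = character_block_for(0)
same_block = character_block_for(BLOCK_SIZE - 1)  # same logical block as address 0
other = character_block_for(123456789)
```

Because several logical blocks can hash to the same position, this sketch is consistent with a character block pointing to at least one logical block.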
  • When the processing unit determines the character sub-blocks pointing to the logical address of the target data according to the result of the hash operation, it may first determine, in the global index, the character block that points to the logical block containing the logical address of the target data, and then determine the character sub-blocks pointing to that logical address from within the character block.
  • That is, the processing unit first locates a character block pointing to a larger logical block and then, within that character block, locates the character sub-blocks pointing to a smaller logical sub-block.
  • Narrowing the search from a coarse range (character blocks) to a fine range (character sub-blocks) improves the efficiency of locating the character sub-blocks.
  • When the processing unit determines the character sub-blocks pointing to the logical address of the target data from within the character block, it may determine a target subgroup among the multiple subgroups in the character block according to the offset between the logical address of the target data and the logical block; the character sub-blocks in the target subgroup are the character sub-blocks pointing to the logical address of the target data. That is, from the offset between the start address of the logical block pointed to by the character block and the logical address of the target data, together with the data length of the target data, the processing unit can determine each subgroup in the global index that points to a logical sub-block indicated by the logical address (that is, the target subgroup).
  • The character sub-blocks in each such subgroup are the character sub-blocks in the global index that point to the logical address.
  • Using the offset of the logical address within the logical block and the data length, the processing unit can determine the character sub-blocks pointing to the logical address in the global index more accurately.
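The offset-and-length calculation can be sketched as follows in Python. The block and sub-block sizes are assumed values, the function name is hypothetical, and the sketch clamps the result to the current logical block; a request that crosses a block boundary would need a second character-block lookup, which is not shown.

```python
BLOCK_SIZE = 64 * 1024     # assumed logical block size
SUB_BLOCK_SIZE = 8 * 1024  # assumed logical sub-block size


def target_subgroups(logical_addr, length):
    """Return the subgroup positions (within one character block) covered by
    a request of `length` bytes (length >= 1) starting at `logical_addr`."""
    block_start = (logical_addr // BLOCK_SIZE) * BLOCK_SIZE
    offset = logical_addr - block_start            # offset inside the logical block
    first = offset // SUB_BLOCK_SIZE
    last = (offset + length - 1) // SUB_BLOCK_SIZE
    last = min(last, BLOCK_SIZE // SUB_BLOCK_SIZE - 1)  # stay inside this block
    return list(range(first, last + 1))


one = target_subgroups(0, 1)
aligned = target_subgroups(BLOCK_SIZE + 8 * 1024, 8 * 1024)
straddling = target_subgroups(4096, 8192)
clamped = target_subgroups(60 * 1024, 16 * 1024)
```

A request that straddles a sub-block boundary yields more than one target subgroup, which is why the data length is needed in addition to the offset.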
  • The processing unit may be located in a network card of a device in the storage system or in a processor of a device in the storage system; the processor may also be a data processor or a separate hardware component.
  • In this way, the network card or processor of a device in the storage system can handle data read requests, which effectively broadens the applicable scenarios.
  • The client device can obtain the global index and the metadata of the target data from the storage system through one-sided (unilateral) RDMA.
  • The processing unit may return the global index and the metadata of the target data to the client device in response to a first instruction from the client device, where the first instruction is transmitted over RDMA.
  • The global index returned by the processing unit may be the entire global index or only part of it; for example, only all or some of the character sub-blocks in the global index that point to the logical address of the target data are returned.
  • The storage system may notify the client device in advance of the address of the global index in the memory of the storage system (that is, its memory address).
  • In this way, the client device can read the global index and the metadata of the target data through one-sided RDMA without involving the processor in the storage system, which improves the efficiency of data interaction.
  • When the processing unit is a network card, the global index is located in the memory of a device in the storage system, and the metadata of the target data is located in the persistent memory of a device in the storage system, the client device can obtain the global index from the storage system through one-sided RDMA and obtain the metadata of the target data from the storage system through direct access.
  • The processing unit may return the global index to the client device in response to a second instruction from the client device, where the second instruction is transmitted over RDMA; the processing unit may also, in response to a third instruction from the client device, obtain the metadata of the target data from the persistent storage and return it to the client device.
  • The storage system may notify the client device in advance of the address of the global index in the memory of the storage system (that is, its memory address).
  • In this way, the client device can read the global index through one-sided RDMA and obtain the metadata of the target data through direct access, without involving the processor in the storage system, which improves the efficiency of data interaction.
  • The client device can use the obtained global index to determine whether the metadata of the target data is valid, that is, whether the storage layer indicated by the global index is consistent with the location indicated by the metadata. If they are consistent, the metadata of the target data is valid, and the client device can obtain the target data from the storage system through one-sided RDMA.
  • The processing unit may return the target data to the client device in response to a fourth instruction from the client device, where the fourth instruction is initiated according to the metadata of the target data and transmitted over RDMA.
  • In this way, the storage system allows the client device to obtain both the global index and the target data through one-sided RDMA, which effectively simplifies the interaction between the storage system and the client device.
  • The processor in the storage system does not need to participate, which also reduces the load on that processor.
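The client-side validity check can be sketched in Python without modelling the RDMA transport itself. The `Metadata` fields and function names are hypothetical, and the sketch assumes the client has already fetched the relevant character sub-blocks and the metadata; it only shows the comparison between the layer indicated by the index and the layer recorded in the metadata.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metadata:
    layer: int        # storage layer recorded in the metadata (0 = highest)
    remote_addr: int  # hypothetical location from which the data can be fetched


def layer_from_index(sub_blocks: Sequence[int]) -> Optional[int]:
    """Highest layer with a non-zero character sub-block, or None."""
    for layer, value in enumerate(sub_blocks):
        if value:
            return layer
    return None


def metadata_is_valid(sub_blocks: Sequence[int], meta: Metadata) -> bool:
    """Metadata is valid when it agrees with the layer indicated by the index."""
    return layer_from_index(sub_blocks) == meta.layer


meta = Metadata(layer=1, remote_addr=0x2000)
consistent = metadata_is_valid([0, 1, 0, 0], meta)
stale = metadata_is_valid([1, 1, 0, 0], meta)   # index says layer 0, metadata says 1
absent = metadata_is_valid([0, 0, 0, 0], meta)  # index holds no entry at all
```

Only when the check succeeds would the client go on to fetch the target data from the location given in the metadata; otherwise the metadata is treated as out of date.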
  • When the target data is located in the persistent memory of a device in the storage system (the persistent memory may belong to one or more layers of the I/O stack of the storage system), the client device can use the obtained global index to determine whether the metadata of the target data is valid, that is, whether the storage layer indicated by the global index is consistent with the location indicated by the metadata. If they are consistent, the metadata of the target data is valid, and the client device can obtain the target data from the storage system through direct access.
  • The processing unit may, in response to a fifth instruction from the client device, obtain the target data from the persistent storage and return it to the client device, where the fifth instruction is initiated according to the metadata of the target data.
  • In this way, the storage system allows the client device to obtain the global index through one-sided RDMA and the target data through direct access, so that the client device can obtain the target data directly without going through the processor in the storage system.
  • The processing unit can also control data flow in the I/O stack (data moves out of one storage layer and into another) and data elimination (data in a storage layer is deleted), and can update the global index accordingly.
  • The processing unit may be a network card or a processor.
  • By updating the global index on data flow and data elimination in the I/O stack, the processing unit keeps the global index accurate about the storage layer in which each piece of data is located, which ensures the accuracy and effectiveness of the data reading process.
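How the index might be kept in step with data flow and elimination is sketched below in Python, using the bit representation. The update rules (set the bit on write, clear the source bit and set the destination bit on flow, clear the bit on elimination) and the function names are assumptions; this section does not spell out the exact update procedure.

```python
def on_write(sub_blocks, layer):
    """Data written into `layer`: mark it in the subgroup."""
    sub_blocks[layer] = 1


def on_flow(sub_blocks, src, dst):
    """Data flows out of layer `src` and into layer `dst`."""
    sub_blocks[src] = 0
    sub_blocks[dst] = 1


def on_eliminate(sub_blocks, layer):
    """Data in `layer` is deleted: clear its mark."""
    sub_blocks[layer] = 0


sub_blocks = [0, 0, 0, 0]  # one subgroup with four layers; layer 0 is the highest
on_write(sub_blocks, 0)
after_write = list(sub_blocks)
on_flow(sub_blocks, 0, 2)
after_flow = list(sub_blocks)
on_eliminate(sub_blocks, 2)
after_eliminate = list(sub_blocks)
```

Keeping these updates on the same path as the data movement is what lets a later read trust the index instead of probing the layers.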
  • In a second aspect, an embodiment of the present application provides a data access method, which can be executed by a processing unit in a storage system.
  • The storage system also includes an I/O stack.
  • The processing unit may receive a data read request, which requests target data stored in the storage system.
  • The processing unit can query the global index based on the data read request, where the global index indicates the storage layer of the I/O stack in which the target data is located; the processing unit can then read the target data from the storage layer indicated by the global index.
  • The processing unit may also process data write requests.
  • The processing unit may receive a data write request, which requests that target data be written to the storage system. The processing unit can then write the target data into the storage system according to the data write request and can also update the global index according to the target data; the updated global index indicates the storage layer of the I/O stack in which the target data is located.
  • The data read request includes the logical address of the target data. When the processing unit queries the global index based on the data read request, it can determine, according to the logical address of the target data, a plurality of character sub-blocks in the global index that point to that logical address, and then determine the storage layer of the I/O stack in which the target data is located according to the values of those character sub-blocks.
  • The character sub-blocks in the global index can take multiple forms.
  • A character sub-block may be a bit whose value is 0 or 1. When the value of the bit is 1, the target data is located in the storage layer corresponding to the character sub-block. When the value of the bit is 0, the target data is not located in the storage layer corresponding to the character sub-block.
  • A character sub-block can also be a counter whose value is 0 or a non-zero integer.
  • When a counter has a value of 0, no data has been written into the storage layer corresponding to the counter.
  • A non-zero integer indicates that the target data is located in the corresponding storage layer, and may also indicate the number of times data (including the target data) has been written into the storage layer corresponding to the counter.
  • When the processing unit determines, according to the logical address of the target data, the character sub-blocks in the global index that point to that address, it may perform a hash operation on the logical address, such as querying a hash table or applying a hash function, and determine the character sub-blocks according to the result of the hash operation.
  • When the processing unit determines the character sub-blocks pointing to the logical address of the target data according to the result of the hash operation, it may first determine, in the global index, the character block that points to the logical block containing the logical address of the target data, and then determine the character sub-blocks pointing to that logical address from within the character block.
  • When the processing unit determines the character sub-blocks pointing to the logical address of the target data from within the character block, it may determine a target subgroup among the multiple subgroups in the character block according to the offset between the logical address of the target data and the logical block; the character sub-blocks in the target subgroup are the character sub-blocks pointing to the logical address of the target data. That is, from the offset between the start address of the logical block pointed to by the character block and the logical address of the target data, together with the data length of the target data, the processing unit can determine each subgroup in the global index that points to a logical sub-block indicated by the logical address (that is, the target subgroup).
  • The character sub-blocks in each such subgroup are the character sub-blocks in the global index that point to the logical address.
  • The client device can obtain the global index and the metadata of the target data from the storage system through one-sided RDMA.
  • The processing unit may return the global index and the metadata of the target data to the client device in response to a first instruction from the client device, where the first instruction is transmitted over RDMA.
  • The global index returned by the processing unit may be the entire global index or only part of it; for example, only all or some of the character sub-blocks in the global index that point to the logical address of the target data are returned.
  • When the global index is located in the memory of a device in the storage system and the metadata of the target data is located in the persistent storage of a device in the storage system, the processing unit may return the global index to the client device in response to a second instruction from the client device, where the second instruction is transmitted over RDMA.
  • The processing unit may also, in response to a third instruction from the client device, acquire the metadata of the target data from the persistent storage and return it to the client device.
  • When the processing unit is a network card, the global index is located in the memory of a device in the storage system, and the metadata of the target data is located in the persistent memory of a device in the storage system, the client device can obtain the global index from the storage system through one-sided RDMA and obtain the metadata of the target data from the storage system through direct access.
  • The processing unit may return the global index to the client device in response to the second instruction from the client device, where the second instruction is transmitted over RDMA; it may also, in response to the third instruction from the client device, obtain the metadata of the target data from the persistent storage and return it to the client device.
  • The client device can use the obtained global index to determine whether the metadata of the target data is valid, that is, whether the storage layer indicated by the global index is consistent with the location indicated by the metadata. If they are consistent, the metadata of the target data is valid, and the client device can obtain the target data from the storage system through one-sided RDMA.
  • The processing unit may return the target data to the client device in response to a fourth instruction from the client device, where the fourth instruction is initiated according to the metadata of the target data and transmitted over RDMA.
  • The processing unit may also control data flow and data elimination in the I/O stack, and update the global index accordingly.
  • An embodiment of the present application also provides a network card.
  • The network card may be a network card on a device in a storage system, and it has the function of implementing the behavior in the method examples of the second aspect and each possible implementation of the second aspect. For the beneficial effects, refer to the description of the first aspect; details are not repeated here.
  • An embodiment of the present application also provides a processor, which may be a processor on a device in a storage system. The processor has the function of implementing the behavior in the method examples of the second aspect and each possible implementation of the second aspect. For the beneficial effects, refer to the description of the first aspect; details are not repeated here.
  • An embodiment of the present application also provides a data access apparatus, which has the function of implementing the behavior in the method example of the second aspect. For the beneficial effects, refer to the description of the first aspect; details are not repeated here.
  • The functions may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions.
  • The structure of the apparatus includes a transmission module and a reading module, and optionally a writing module and a control module. These modules can perform the corresponding functions in the method example of the second aspect; for details, refer to the description of the method example, which is not repeated here.
  • An embodiment of the present application provides a data access system.
  • The data access system includes a storage system and a client device.
  • The storage system includes an I/O stack and a processing unit.
  • For the I/O stack, refer to the foregoing description; details are not repeated here.
  • The client device may send a data read request to the storage system, where the data read request requests target data stored in the storage system.
  • The processing unit in the storage system can query the global index based on the data read request, where the global index indicates the storage layer of the I/O stack in which the target data is located. It then reads the target data from the storage layer indicated by the global index and returns the target data to the client device.
  • The client device may also send a data write request to the storage system, where the data write request requests that target data be written to the storage system.
  • The processing unit can write the target data according to the data write request and can update the global index according to the target data; the updated global index indicates the storage layer of the I/O stack in which the target data is located.
  • the data read request includes the logical address of the target data
  • the processing unit queries the global index based on the data read request, it can determine the logical address pointing to the target data in the global index according to the logical address of the target data.
  • a plurality of character sub-blocks of the address after that, determine the storage layer where the target data in the I/O stack is located according to the values of the plurality of character sub-blocks.
  • the character sub-block is a bit, and the value of the bit includes 0 or 1. 1 indicates that the target data is located in the storage layer corresponding to the character sub-block, and 0 indicates that the target data is located in the storage layer corresponding to the character sub-block. layer.
  • the character sub-block is a counter, and the counter is 0 or a non-zero integer, and the non-zero integer indicates the number of times data is written to the storage layer corresponding to the character sub-block.
  • the processing unit is a processing chip with computing power, such as a data processor, which may be located in the network card of the storage system, may also be located in the central processing unit, or may be a independent hardware components.
  • the processing unit when it determines multiple character sub-blocks pointing to the logical address of the target data in the global index according to the logical address of the target data, it may perform a hash operation on the logical address of the target data, such as Query the hash table or act on the hash function, and determine multiple character sub-blocks pointing to the logical address of the target data according to the result of the hash operation.
  • a hash operation on the logical address of the target data, such as Query the hash table or act on the hash function, and determine multiple character sub-blocks pointing to the logical address of the target data according to the result of the hash operation.
  • the processing unit when it determines multiple character sub-blocks pointing to the logical address of the target data according to the result of the hash operation, it may first determine the logical block of the logical address pointing to the target data in the global index. The character block, and then determine a plurality of character sub-blocks pointing to the logical address of the target data from the character block according to the logical address of the target data.
  • when the processing unit determines, from the character block and according to the logical address of the target data, a plurality of character sub-blocks pointing to the logical address of the target data, it may determine the target subgroup among the multiple subgroups in the character block according to the offset of the logical address of the target data within the logical block; the character sub-blocks in the target subgroup are the multiple character sub-blocks pointing to the logical address of the target data.
  • the client device may initiate a first indication to the storage system based on RDMA, where the first indication is used to request the global index and metadata of the target data.
  • the processing unit may feed back the global index and the metadata of the target data to the client device under the first instruction of the client device.
  • the client device can request to obtain the entire global index, or only a part of the global index. For example, the client device can determine the memory address of part or all of the character sub-blocks in the global index that point to the logical address and, according to that memory address, initiate a first instruction to the storage system to request that part or all of the character sub-blocks.
  • the processing unit may notify the client device of the memory address of the global index in the storage system.
  • the metadata of the target data is located in the persistent storage of the device in the system.
  • the client device may initiate a second indication to the storage system based on RDMA, where the second indication is used to request a global index; and may also initiate a third indication to the storage system, where the third indication is used to request metadata of the target data.
  • the processing unit may feed back the global index to the client device under the second instruction of the client device; it may obtain the metadata of the target data from the persistent storage under the third instruction of the client device, and feed back the metadata of the target data to the client device.
  • the client device can request to obtain the entire global index, or only a part of the global index. For example, the client device can determine the memory address of part or all of the character sub-blocks in the global index that point to the logical address and, according to that memory address, initiate a second instruction to the storage system to request that part or all of the character sub-blocks.
  • when the target data is located in the memory of the device in the system, the client device can verify the validity of the metadata of the target data according to the global index; when it is determined that the metadata of the target data is valid, the client device initiates a fourth indication to the storage system according to the metadata of the target data, where the fourth indication is transmitted based on RDMA. Afterwards, the processing unit may feed back the target data to the client device under the fourth instruction of the client device.
  • the client device may check the validity of the metadata of the target data according to the global index; and initiate a fifth indication to the storage system according to the metadata of the target data when it is determined that the metadata of the target data is valid.
  • the processing unit may obtain the target data from the persistent storage under the fifth instruction of the client device, and feed back the target data to the client device.
  • the processing unit may also control data flow and data elimination in the I/O stack; and update the global index according to the data flow and data elimination in the I/O stack.
  • the present application also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer executes the method in the above-mentioned second aspect and in each possible implementation of the second aspect.
  • the present application further provides a computer program product including instructions, which, when run on a computer, cause the computer to execute the above second aspect and the method in each possible implementation manner of the second aspect.
  • the present application also provides a computer chip, the chip is connected to the memory, and the chip is used to read and execute the software program stored in the memory, and execute the method in the above-mentioned second aspect and each possible implementation manner of the second aspect .
  • FIG. 1 is a schematic structural diagram of a system provided by the present application.
  • Fig. 2 is a schematic structural diagram of an I/O stack provided by the present application.
  • FIGS. 3A to 3B are schematic diagrams of a global index provided by the present application.
  • FIGS. 4 to 5 are schematic diagrams of a data access method provided by the present application.
  • FIGS. 6 to 7 are schematic diagrams of another data access method provided by this application.
  • FIG. 8 is a schematic structural diagram of a data processing device provided by the present application.
  • Metadata is data describing data (data about data). Metadata can indicate the attributes of data. For example, metadata can record the physical address of data, modification information of data, etc.
  • Remote direct memory access (RDMA)
  • RDMA is a technology that bypasses the operating system kernel of a remote device (such as a storage device) to access data in its memory. Because it does not go through the operating system, it not only saves a lot of processor resources, but also improves system throughput and reduces network communication latency, which makes it especially suitable for large-scale parallel computer clusters.
  • RDMA has several major characteristics: (1) data is transmitted over the network to and from the remote device; (2) the operating system kernel does not participate, and all content related to sending and transmitting is offloaded to the smart network card; (3) data is transmitted directly between user-space virtual memory and the smart network card, without involving the operating system kernel and without additional data movement or copying.
  • RDMA involves a client device (client for short) and a server device (server for short); in this embodiment of the application, the server can be understood as a storage device.
  • the client is deployed on the user side, and the user can initiate a request to the server through the client.
  • the server can be deployed at the remote end.
  • the server generally refers to the storage system, and can be specifically understood as a device in the storage system.
  • Unilateral RDMA can be divided into RDMA read (READ) and RDMA write (WRITE).
  • in the process of reading data through unilateral RDMA, the client can directly determine the location of the target data in the memory of the server, so the message sent by the client to the server to request the data carries the location information of the target data.
  • the network card on the server side reads the data at the location indicated by the location information.
  • the processor on the server side is not aware of this series of operations on the client side. In other words, the processor on the server side does not know that the client has performed a read operation. This reduces the processor's involvement in the data transmission process and improves the performance of the system in processing services, giving unilateral RDMA the features of high bandwidth, low latency, and low CPU usage.
  • the client device can read the subgroups in the global index from the server through unilateral RDMA, and can also read the metadata of the target data at a certain layer of the I/O stack from the server through unilateral RDMA.
  • Bilateral RDMA can be divided into RDMA transmission (SEND) and RDMA reception (RECEIVE).
  • the client does not know where the metadata of the target data is stored in the memory of the server, so the message initiated by the client to request to read data does not carry metadata location information.
  • the processor on the server side queries the location information of the metadata and returns it to the client, and the client sends a message to the server again to request to read the data; this message contains the location information of the metadata (that is, the address of the metadata).
  • the network card of the server obtains the metadata according to the location information of the metadata, further obtains the target data, and sends the target data to the client.
  • bilateral RDMA requires the processor on the server side to process messages from the client, so compared with bilateral RDMA, unilateral RDMA takes less time to read data, uses less processor capacity, and gives a better user experience. Therefore, unilateral RDMA is being applied more and more widely.
  • Pass-through access is a way to read and write data from the server-side persistent storage (such as hard disk) without going through the server-side processor.
  • the client can determine the location of the target data in the server's persistent storage, and the client can communicate with the controller in the server's hard disk through the server's network card, and then read data from or write data to the server's hard disk.
  • when the metadata (or index) of the data is stored on the hard disk of the server, the client can read the metadata of the data at a certain layer of the I/O stack from the server through pass-through access.
  • the client can also read the target data in this layer of the I/O stack from the server through pass-through access.
  • FIG. 1 is a schematic structural diagram of a data access system provided by an embodiment of the present application.
  • the system includes a client device 200 and a storage system 100.
  • the storage system 100 includes multiple storage devices. In FIG. 1, only one storage device 110 of the storage system is shown as an example.
  • Users access data through applications running on client devices 200.
  • the computers running these applications are referred to as "client devices 200".
  • the client device 200 may be a physical machine or a virtual machine.
  • Physical client devices 200 include, but are not limited to, desktop computers, servers, laptop computers, and mobile devices.
  • a user may initiate a data access request, such as a data write request or a data read request, to the storage device 110 in the storage system 100 through the client device 200 .
  • the storage device 110 receives the data access request and processes the data access request.
  • in the following description, the storage device 110 processing the data access request and executing the data access method provided in the embodiment of the present application is taken as an example.
  • the data access request is a data write request, which is used to request to write target data in the storage system 100 .
  • the data writing request includes the target data and the logical address of the target data.
  • the storage device 110 may first write the target data, according to the data write request, to the position indicated by the logical address of the target data, and the position may be located in a layer of the I/O stack; it then updates the global index, and the updated global index can indicate the layer where the target data is located in the I/O stack.
  • the I/O stack is a layered structure formed by dividing the storage media in the storage system into layers.
  • Each layer in the I/O stack includes one or more storage media that can be used to store data.
  • the global index can associate the logical address of the data with the layer where the data resides, so as to indicate the layer where the data resides in the I/O stack.
  • the data access request is a data read request, which is used to request to read target data from the storage system 100, and the data read request carries a logical address of the target data.
  • the storage device 110 can determine the layer where the target data is located in the I/O stack according to the global index and the logical address of the target data, and then read the target data from the layer where the target data is located.
  • the storage device 110 includes a bus 111 , a processor 112 , a memory 113 , a network card 114 and a hard disk 115 .
  • Memory 113 may be located in processor 112 .
  • a hard disk is used as an example of a persistent memory of the storage device for illustration, but the embodiment of the present application is also applicable to mechanical hard disks or other types of hard disks.
  • Processor 112 can be a central processing unit (CPU); the processor 112 can also be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, an artificial intelligence chip, a chip-on-chip, etc.
  • the memory 113 may include a volatile memory (volatile memory), such as a random access memory (random access memory, RAM), a dynamic random access memory (dynamic random access memory, DRAM), and the like. It can also be a non-volatile memory (non-volatile memory), such as a storage-class memory (storage-class memory, SCM), etc., or a combination of a volatile memory and a non-volatile memory.
  • the memory 113 can be divided from a functional point of view, and the memory 113 can be divided into a write cache, a read cache, and the like.
  • the write cache refers to a cache that can provide high-efficiency write capabilities
  • the read cache refers to a cache that can store data with a high read frequency.
  • the storage device 110 may also include one or more hard disks 115 .
  • the hard disk 115 can be used to permanently store data.
  • the hard disk 115 may also include a hard disk cache and a persistent storage medium.
  • the data access method provided by the embodiment of the present application may be executed by the processor 112, that is, the processor 112 may execute the data access method provided by the embodiment of the present application by invoking computer-executable instructions.
  • the data access method provided in the embodiment of the present application can also be executed by the network card 114; for example, the network card 114 can execute the data access method provided in the embodiment of the present application by calling the computer-executable instructions stored in the memory 113.
  • the network card 114 may also invoke computer-executable instructions stored inside the network card 114 to execute the data access method provided in the embodiment of the present application.
  • computer-executable instructions may also be programmed onto the network card 114, and the network card 114 may then execute the data access method provided in the embodiment of the present application.
  • the storage system in FIG. 1 can be either a centralized storage system or a distributed storage system.
  • the concept of storage medium layering is introduced: the storage media are divided into multiple layers from top to bottom (from high to low).
  • the embodiment of the present application does not limit the division standard used when the storage media are layered.
  • the storage media can be divided only according to the type of the storage medium, with storage media of the same type placed in one layer, to form multiple layers from top to bottom (from high to low).
  • the storage medium may only be divided according to the function of the storage medium to form multiple layers from top to bottom (from high to low).
  • the storage medium may be divided into multiple layers from top to bottom (from high to low) by comprehensively considering the function and type of the storage medium.
  • an input/output (I/O) stack of the storage system can be formed.
  • at a coarse granularity, the I/O stack is divided from top to bottom into a performance layer and a capacity layer.
  • the read and write performance of the performance tier is better than that of the capacity tier, but the capacity of the performance tier is relatively small.
  • the performance layer can be further divided, from top to bottom, into a write cache (Wcache), a read cache (Rcache), and a hard disk cache.
  • the capacity layer can be divided into high-performance layer and ordinary performance layer.
  • the high-performance layer can include high-performance hard disks, such as solid-state drives (SSD).
  • the general performance tier may include a hard disk with general performance, such as a mechanical hard disk (HDD).
  • each layer in the capacity layer is only an example.
  • the types of hard disks included in different storage systems 100 may also be different.
  • some storage systems 100 may only include one type of hard disk, and in this case, the capacity layer may not be further refined.
  • the arrangement of the layers in the I/O stack describes the flow of data in the storage system 100 , and in the storage system 100 data usually flows from top to bottom in the layers of the I/O stack.
  • data A when data A is written into the storage system 100, it will be written into the performance layer first. Inside the performance layer, it will be written to the write cache first.
  • When the data in the write cache exceeds the threshold W, the data in the write cache will be migrated to the read cache. In this way, free storage space is formed in the write cache to store the most recently written data.
  • When the data in the read cache exceeds a certain threshold R, the data in the read cache will continue to migrate downwards to the hard disk cache.
  • the data in the hard disk cache will flow to the capacity layer.
  • the data in the hard disk cache will be migrated to the high-performance tier in the capacity tier first.
  • When the data in the high-performance tier reaches the threshold H, the data in the high-performance tier will be migrated to the normal performance tier.
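  • The top-to-bottom data flow described above can be sketched as a small simulation. This is only an illustrative sketch: the layer names, the per-layer limits (stand-ins for the thresholds W, R and H, counted in entries rather than bytes), and the oldest-first migration order are assumptions made for the example and are not specified by the application.

```python
from collections import OrderedDict

# Layer names follow the description; the limits are hypothetical stand-ins
# for the thresholds (W, R, ..., H), counted in entries to keep the demo small.
LAYERS = ["write_cache", "read_cache", "disk_cache", "high_perf", "normal_perf"]
LIMITS = {"write_cache": 2, "read_cache": 2, "disk_cache": 2, "high_perf": 2}


def write(stack, logical_address, value):
    """Write into the top layer, then migrate overflow downwards layer by layer."""
    stack["write_cache"][logical_address] = value
    for upper, lower in zip(LAYERS, LAYERS[1:]):
        while len(stack[upper]) > LIMITS[upper]:
            # Assumption: the oldest entry of a layer is migrated first.
            old_address, old_value = stack[upper].popitem(last=False)
            stack[lower][old_address] = old_value


stack = {name: OrderedDict() for name in LAYERS}
for address in range(7):
    write(stack, address, f"data-{address}")

for name in LAYERS:
    print(name, list(stack[name]))
```

  • In this sketch, newly written data always enters the write cache, and older data sinks one layer at a time as each limit is exceeded, which is the behaviour the arrangement of the layers is meant to describe.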
  • each layer of the I/O stack may include one or more storage media, which may be used to store data.
  • the data stored in the layer of the I/O stack may be indexed. That is, when data is written into this layer, an index can be created for the data.
  • the index can indicate the correspondence between the logical address of the data and the metadata of the data. According to the logical address of the data, the metadata of the data can be determined, and then the physical address of the data can be determined.
  • the embodiment of the present application does not limit the method of indexing data in each layer.
  • the data can be indexed through a hash table, that is, the index of the data is stored in the layer in the form of a hash table, and the correspondence between the logical address of the data and the metadata is recorded in the hash table.
  • data can also be indexed in the form of a B+ tree or a linked list.
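  • A per-layer index of the hash-table kind might look like the following minimal sketch. The class name `LayerIndex`, the `Metadata` fields, and the sample addresses are hypothetical; the application only states that the index maps a logical address to metadata, from which the physical address can be determined.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    physical_address: int  # where the data sits inside this layer (assumed field)
    length: int            # size of the data in bytes (assumed field)


class LayerIndex:
    """Hash-table index of one layer: logical address -> metadata."""

    def __init__(self):
        self._table = {}

    def insert(self, logical_address, metadata):
        # Called when data is written into this layer.
        self._table[logical_address] = metadata

    def lookup(self, logical_address):
        # Returns None when this layer holds no data for the address.
        return self._table.get(logical_address)


index = LayerIndex()
index.insert(0x2000, Metadata(physical_address=0x9F000, length=8192))
meta = index.lookup(0x2000)
print(hex(meta.physical_address), meta.length)
```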
  • in a conventional approach, when a data read request is received, the storage media in each layer are searched sequentially, in the top-to-bottom order of the I/O stack, for the data requested by the data read request until the requested data is found. After the requested data is found, the data is fed back to the client device 200.
  • This method of searching for data in each layer in the top-to-bottom order of the I/O stack is a relatively common method.
  • however, with this method the time taken to find the data can be long, which reduces the processing efficiency of data read requests.
  • a global index (global mask) is set in the storage system 100, and the global index can indicate the layer of the I/O stack in which the data at a logical address is located.
  • After the storage device 110 receives the data read request, it can determine the layer of the I/O stack where the target data is located according to the global index and the logical address of the target data carried in the data read request. Afterwards, the target data is read from that layer according to the logical address. In this way, in the process of handling the entire data read request, it is not necessary to search for data in each layer in the top-down order of the I/O stack; the global index can be used to directly determine the layer where the requested data is located, which can greatly reduce the delay of data search and improve the processing efficiency of data read requests.
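  • The difference between the two read paths can be illustrated with a simplified sketch. Here the global index is modelled as a plain dictionary from logical address to layer name, which is an assumption for illustration only; the function names and the sample data are likewise hypothetical.

```python
LAYERS = ["write_cache", "read_cache", "disk_cache", "capacity"]


def read_by_traversal(stack, logical_address):
    """Probe each layer from top to bottom until the data is found."""
    probes = 0
    for layer in LAYERS:
        probes += 1
        if logical_address in stack[layer]:
            return stack[layer][logical_address], probes
    return None, probes


def read_by_global_index(stack, global_index, logical_address):
    """Use the global index to go straight to the layer holding the data."""
    layer = global_index.get(logical_address)
    if layer is None:
        return None, 0
    return stack[layer][logical_address], 1


# Sample state: the target data has already sunk to the capacity layer.
stack = {"write_cache": {}, "read_cache": {}, "disk_cache": {},
         "capacity": {0x2000: b"target"}}
global_index = {0x2000: "capacity"}

data_a, probes_a = read_by_traversal(stack, 0x2000)
data_b, probes_b = read_by_global_index(stack, global_index, 0x2000)
print(probes_a, probes_b)
```

  • Both paths return the same data, but the traversal probes every layer above the one holding the data, while the indexed read probes only that layer.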
  • the global index is the key to simplifying the data reading process.
  • the composition of the global index will be described below.
  • the global index can exist in any one or both of the following forms.
  • the physical storage space composed of various storage media in the storage system 100 can be mapped to a logical storage space, the logical storage space is divided according to a set size, and the logical storage space is divided into multiple logical blocks.
  • Each logical block can be the same size.
  • for example, the size of each logical block may be equal to 256 kilobytes (KB); for another example, the size of each logical block may be equal to 1 megabyte (MB).
  • Each logical block can be divided into multiple logical sub-blocks according to the minimum reading and writing unit of the I/O stack. The size of each logical sub-block may be equal to the minimum read/write unit of the I/O stack.
  • the minimum reading and writing unit of the I/O stack refers to the minimum amount of data written to the I/O stack at one time during the data writing process, or the minimum amount of data read from the I/O stack at one time during the data reading process .
  • the minimum amount of data written at one time is the same as the minimum amount of data read at one time. For example, when writing 256KB of data to the storage system 100, if the minimum read/write unit of the I/O stack is 8KB, the 256KB of data is written in multiple passes (32 passes), 8KB at a time, until all 256KB of data has been written.
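  • The arithmetic of the example above (256KB written in 8KB units) can be checked with a short sketch; the helper name `split_write` is hypothetical.

```python
KB = 1024


def split_write(offset, length, unit=8 * KB):
    """Split one write into (offset, size) pieces of at most `unit` bytes."""
    return [(offset + done, min(unit, length - done))
            for done in range(0, length, unit)]


chunks = split_write(0, 256 * KB)
print(len(chunks), chunks[0], chunks[-1])
```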
  • a character block for pointing to a logical block is set in the global index, and the character block includes a character sub-block pointing to a logical sub-block.
  • the specific value of the character block pointing to the logical block can indicate whether data is stored in the logical block, and which layer of the I/O stack the stored data is stored in.
  • the character sub-block pointing to the logical sub-block can indicate whether data is stored in the logical sub-block, and which layer of the I/O stack the stored data is stored in.
  • the global index includes multiple character blocks, each character block is used to point to at least one logical block, each character block includes multiple subgroups, and each subgroup points to a logical subblock in the at least one logical block.
  • Each subgroup includes multiple character sub-blocks, and one character sub-block corresponds to one layer in the I/O stack.
  • the specific value of the character sub-block can indicate whether data is stored in the logical sub-block, and whether the stored data is stored in the layer corresponding to the character sub-block in the I/O stack.
  • the smallest character sub-block is one bit, each sub-group includes multiple bits corresponding to different layers in the I/O stack, and the character block includes multiple sub-groups pointing to different logical sub-blocks.
  • a character block is a group of bits.
  • in the counter group, the smallest character sub-block is a counter, and a counter usually needs to be represented by multiple bits.
  • Each subgroup includes multiple counters corresponding to different layers in the I/O stack, and the character block includes multiple subgroups pointing to different logical subblocks.
  • a character block is a set of counters.
  • as shown in the schematic diagram of the bitmap, the bitmap includes multiple groups of bits, and each group of bits can point to at least one logical block. Each group of bits includes multiple subgroups (grains), each subgroup includes multiple bits, and each subgroup points to a logical sub-block in the logical block. Different subgroups in a group of bits point to different logical sub-blocks.
  • the number of bits in each subgroup can be equal to the total number of layers in the I/O stack, or can be equal to the total number of layers in the I/O stack minus one.
  • a bit in a subgroup corresponds to a layer in the I/O stack, and different bits correspond to different layers in the I/O stack. For example, for a bit in the subgroup, when the value of the bit is 1, it may indicate that the data in the logical sub-block is located at this layer, and when the value of the bit is 0, it may indicate that the logical sub-block The data in is not in this layer.
  • the bits in a subgroup correspond to the layers of the I/O stack in the top-down order of the I/O stack, that is, the first bit corresponds to the first layer in the I/O stack (for example, the write cache), the second bit corresponds to the second layer (for example, the read cache), the third bit corresponds to the third layer (for example, the hard disk cache), and the fourth bit corresponds to the fourth layer (for example, the capacity layer).
  • When the value of the first bit is 1, it can indicate that the data on the logical sub-block pointed to by the subgroup is in the write cache; when the value of the first bit is 0, it can indicate that the data on the logical sub-block pointed to by the subgroup is not in the write cache.
  • When the value of the second bit is 1, it can indicate that the data on the logical sub-block pointed to by the subgroup is in the read cache; when the value of the second bit is 0, it can indicate that the data on the logical sub-block pointed to by the subgroup is not in the read cache.
  • When the value of the third bit is 1, it can indicate that the data on the logical sub-block pointed to by the subgroup is located in the hard disk cache; when the value of the third bit is 0, it can indicate that the data on the logical sub-block pointed to by the subgroup is not in the hard disk cache.
  • When the value of the fourth bit is 1 (the fourth bit is not shown in Figure 2), it may indicate that the data on the logical sub-block pointed to by the subgroup is located at the capacity layer; when the value of the fourth bit is 0, it may indicate that the data on the logical sub-block pointed to by the subgroup is not in the capacity layer.
  • the bit corresponding to the last layer may not be set, for example, the fourth bit may be omitted.
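  • Decoding a subgroup into a layer can be sketched as follows. The four layer names follow the example above; the function name `locate`, the tuple representation of a subgroup, and the choice to return the topmost layer when more than one bit is set are assumptions made for illustration.

```python
LAYERS = ["write_cache", "read_cache", "disk_cache", "capacity"]


def locate(bits):
    """Map a subgroup (tuple of 0/1, first bit = top layer) to a layer name.

    The subgroup may carry one bit per layer, or omit the bit of the last
    layer, in which case all zeros is read as "only the last layer can
    hold the data".
    """
    for layer, bit in zip(LAYERS, bits):
        if bit == 1:
            return layer  # assumption: the topmost set bit wins
    if len(bits) == len(LAYERS) - 1:
        return LAYERS[-1]
    return None  # full-width subgroup with no bit set


print(locate((0, 1, 0, 0)))  # second bit set
print(locate((0, 0, 0)))     # last-layer bit omitted, all zeros
print(locate((0, 0, 0, 0)))  # no layer indicated
```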
  • Data flow refers to the inflow or outflow of data in each layer of the I/O stack, such as migrating data from one layer of the I/O stack to the next layer. Another example is that data flows out of one layer in the I/O stack, and the outflowing data flows into another layer in the I/O stack. Data flow mainly occurs in the disk flushing process for the write cache, in garbage collection at the capacity layer, in data loading when executing a data read request, in the data prefetch process (such as writing data with a high read/write frequency to the read cache), or during data migration in the dynamic storage tiering feature.
  • Data writing refers to writing data to a certain layer of the I/O stack.
  • Data elimination refers to the deletion of data at a certain layer in the I/O stack, for example, when data in the read cache is eliminated, or when garbage collection is performed at the capacity layer and the overwritten data in the capacity layer needs to be deleted.
  • for example, the first bit in a subgroup corresponds to the first layer in the I/O stack, and the value of the first bit is 1, indicating that the data in the logical sub-block pointed to by the subgroup is located at the first layer in the I/O stack.
  • When the amount of data in the first layer in the I/O stack reaches the threshold W, the data in the first layer can be migrated to the second layer.
  • the value of the first bit will then be reduced by one and become 0; the value 0 indicates that the data in the logical sub-block pointed to by this subgroup is not in the first layer.
  • When data is written, if the written data is written into the logical sub-block pointed to by the subgroup, the value of the first bit will become 1.
  • If data is later written into the logical sub-block pointed to by the subgroup again, overwriting the data previously written in the logical sub-block, the first bit will still hold a value of 1 after the data is written. Taking the value of the first bit at this time as 1 as an example, when data is eliminated later, if the data written in the logical sub-block pointed to by this subgroup is found to be inactive data, the data in the logical sub-block pointed to by the subgroup will be deleted. After the data is deleted, the value of the first bit becomes 0.
  • the value of a bit in each subgroup of the bitmap can only indicate whether the data on the logical sub-block pointed to by the subgroup is located in this layer, but cannot specifically indicate whether the data on the logical sub-block pointed to by the subgroup is data written for the first time.
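  • How the bits of one subgroup might follow data writing, data flow and data elimination can be sketched as below. The class and method names are hypothetical, and the sketch deliberately models only the bitmap form, in which an overwrite leaves the bit at 1.

```python
class Subgroup:
    """Bits of one subgroup; index 0 corresponds to the top layer."""

    def __init__(self, num_layers=4):
        self.bits = [0] * num_layers

    def on_write(self, layer):
        # A first write and an overwrite both leave the bit at 1, so the
        # bitmap cannot tell them apart.
        self.bits[layer] = 1

    def on_migrate(self, source, target):
        # Data flow: the data leaves `source` and enters `target`.
        self.bits[source] = 0
        self.bits[target] = 1

    def on_eliminate(self, layer):
        # Data elimination: the data is deleted from `layer`.
        self.bits[layer] = 0


group = Subgroup()
history = []
group.on_write(0)
history.append(list(group.bits))   # first write into the write cache
group.on_write(0)
history.append(list(group.bits))   # overwrite, bit unchanged
group.on_migrate(0, 1)
history.append(list(group.bits))   # migrated to the second layer
group.on_eliminate(1)
history.append(list(group.bits))   # eliminated from the second layer
print(history)
```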
  • a subgroup may omit the bit corresponding to the last layer in the I/O stack.
  • in this case, the bits in the subgroup correspond to the layers in the I/O stack other than the last layer.
  • When all the bits in the subgroup are 0, it means that the data in the logical sub-block pointed to by the subgroup is not located in any layer of the I/O stack other than the last layer. It follows that the data in the logical sub-block pointed to by the subgroup can only be located in the last layer of the I/O stack.
  • the space indicated by the logical address of the data may be a part of a logical block, and may also be a part of a logical sub-block (for convenience of expression, this part of the logical sub-block is referred to as the logical sub-block indicated by the logical address).
  • to determine the subgroup pointing to the logical sub-block indicated by the logical address, it is necessary to first determine the logical block to which the space indicated by the logical address belongs (for convenience of expression, this logical block can also be referred to as the logical block to which the logical address belongs), and to determine the one or more groups of bits in the bitmap that point to the logical block. Afterwards, the logical sub-block indicated by the logical address is determined from the logical block, and the one or more subgroups pointing to the logical sub-block are determined from the one or more groups of bits in the bitmap.
  • a hash table can also be set in the storage system 100, and the correspondence between logical addresses and the groups of bits in the bitmap is recorded in the hash table.
  • One or more groups of bits pointing to the logical block to which the logical address belongs can be determined through the logical address of the data and the hash table.
  • a hash function can also be set in the storage system 100; the logical address is used as the input of the hash function, and the obtained hash value can indicate one or more groups of bits in the bitmap, where the one or more groups of bits point to the logical block to which this logical address belongs.
  • the logical address When determining one or more groups of bits in the bitmap, the logical address may be rounded according to the size of the logical block.
  • the number obtained after rounding is used to query the hash table to determine one or more groups of bits pointing to the logical block to which the logical address belongs. Or apply a hash function to the number obtained after rounding to determine one or more groups of bits pointing to the logical block to which the logical address belongs.
  • one or more subgroups can be further determined from the one or more groups of bits, and the logical subblocks pointed to by the one or more subgroups are the logical The logical subblock indicated by the address.
  • the logical subblock indicated by the logical address may be determined according to the offset of the logical address in the logical block and the data length of the target data.
  • the offset of the logical address in the logical block can be determined by the difference between the start address of the logical block and the logical address. Take as an example a logical block size of 256KB, where each logical block includes 32 logical sub-blocks of 8KB each and each group of bits includes 32 subgroups. Suppose the location indicated by the logical block address (LBA) of the data is 1MB+520KB and the data length is 256KB. 1MB+520KB can first be rounded down according to the size of the logical block (256KB) to obtain 1MB+512KB. By applying a hash function to 1MB+512KB, two groups of bits pointing to the logical block to which the logical address belongs can be determined.
  • 1MB+512KB is the starting address of the logical block
  • the offset of the logical address in the logical block is the difference between 1MB+512KB and 1MB+520KB, that is, the offset is 8KB.
  • the location indicated by the logical address is the location offset by 8KB from the start of the logical block.
  • the logical sub-block indicated by the logical address starts one logical sub-block after the starting position of the logical block, that is, at the second logical sub-block.
  • the logical sub-blocks indicated by this logical address are the second through 32nd logical sub-blocks in this logical block and the first logical sub-block in the next logical block, a total of 32 logical sub-blocks.
  • the subgroups pointing to these 32 logical sub-blocks in the bitmap are the second through 32nd subgroups in the group of bits pointing to this logical block and the first subgroup in the next group of bits.
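  • The arithmetic of this example can be checked with a minimal sketch. The function name `locate_sub_blocks` is hypothetical, the sizes are the ones used in the example, and the hash lookup that maps a logical block to its group of bits is left out.

```python
KB = 1024
MB = 1024 * KB
BLOCK_SIZE = 256 * KB                            # logical block size in the example
SUB_BLOCK_SIZE = 8 * KB                          # logical sub-block size in the example
SUBS_PER_BLOCK = BLOCK_SIZE // SUB_BLOCK_SIZE    # 32 sub-blocks per logical block


def locate_sub_blocks(lba, length):
    """Return (logical block start, 0-based sub-block index) for every
    logical sub-block touched by the range [lba, lba + length)."""
    first = lba // SUB_BLOCK_SIZE
    last = (lba + length - 1) // SUB_BLOCK_SIZE
    covered = []
    for n in range(first, last + 1):
        block_start = (n // SUBS_PER_BLOCK) * BLOCK_SIZE   # rounded-down address
        covered.append((block_start, n % SUBS_PER_BLOCK))
    return covered


covered = locate_sub_blocks(1 * MB + 520 * KB, 256 * KB)
print(len(covered), covered[0], covered[-1])
```

  • With a 0-based index, index 1 is the second logical sub-block of the block starting at 1MB+512KB, and the last entry is index 0 of the following logical block, which matches the 31 + 1 split described above.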
  • the data on the logical subblock pointed to by the subgroup can be searched from the layer.
  • the data index in this layer can be searched according to the logical address of the data, the metadata of the data can be determined, and then the data can be read from the position indicated by the metadata.
  • Taking a subgroup of three bits as an example, the three bits correspond respectively to the first layer (write cache), the second layer (read cache), and the third layer (hard disk cache) in the I/O stack. If the value of the three bits in the subgroup is 100, it means that the data on the logical sub-block pointed to by the subgroup is in the write cache, and the data can be read from the write cache. If the value of the three bits in the subgroup is 010, it means that the data on the logical sub-block pointed to by the subgroup is in the read cache, and the data can be read from the read cache.
  • If the value of the three bits in the subgroup is 000, it means that the data on the logical sub-block pointed to by the subgroup is not located in the first three layers of the I/O stack but in the fourth layer (the capacity layer), and the data can be read from the capacity layer. If the value of the three bits in the subgroup is 110, it means that the data on the logical sub-block pointed to by the subgroup is stored in the first two layers of the I/O stack. When data is written into the storage system 100, it is preferentially written to the first layer of the I/O stack, so the latest data written on the logical sub-block is on the first layer of the I/O stack.
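  • The interpretation of a three-bit subgroup can be sketched as follows. This is only an illustration under the four-layer assumption used above; the function names are hypothetical, and reading from the highest layer that has its bit set is an assumption drawn from the statement that the latest data is on the first layer.

```python
LAYERS = ["write cache", "read cache", "hard disk cache", "capacity layer"]


def layers_from_bits(bits):
    """bits is a string such as "100", with one bit per layer except the last."""
    hits = [LAYERS[i] for i, b in enumerate(bits) if b == "1"]
    # All zeros: the data can only be in the last layer of the I/O stack.
    return hits or [LAYERS[-1]]


def layer_to_read(bits):
    # Assumption: the newest copy is on the highest layer whose bit is set.
    return layers_from_bits(bits)[0]


for value in ("100", "010", "000", "110"):
    print(value, layers_from_bits(value), layer_to_read(value))
```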
  • In the schematic diagram of the counter group, the counter group includes multiple groups of counters, and each group of counters can point to at least one logical block. Each group of counters includes a plurality of subgroups (grain), and each subgroup includes multiple counters. Each subgroup points to a logical sub-block in the logical block, and different subgroups in a group of counters point to different logical sub-blocks. The number of counters in each subgroup can be equal to the total number of layers in the I/O stack, or equal to the total number of layers in the I/O stack minus one.
  • a counter in a subgroup corresponds to a layer in the I/O stack, and different counters correspond to different layers in the I/O stack.
  • multiple counters in a subgroup correspond to a layer in the I/O stack in the top-down order of the I/O stack, that is, the first counter corresponds to the first layer in the I/O stack (such as corresponding to the write cache), the second counter corresponds to the second layer in the I/O stack (such as corresponding to the read cache), and the third counter corresponds to the third layer in the I/O stack (such as corresponding to the hard disk cache).
  • the fourth counter corresponds to the fourth layer in the I/O stack (eg, corresponds to the capacity layer).
  • For a counter in the subgroup, when the value of the counter is null or 0, it can indicate that the data in the logical sub-block is not in the corresponding layer; when the value of the counter is non-null or a non-zero integer, it can indicate that the data in the logical sub-block is located at that layer, and the specific value of the counter can represent the number of times the data in the logical sub-block has been updated. For example, when the value of the first counter is not 0, it may indicate that the data on the logical sub-block pointed to by the subgroup is in the write cache; when the value of the first counter is 0 or null, it may indicate that the data on the logical sub-block pointed to by the subgroup is not in the write cache.
  • when the value of the second counter is non-zero, it can indicate that the data on the logical sub-block pointed to by the subgroup is in the read cache; when the value of the second counter is 0 or null, it can indicate that the data on the logical sub-block pointed to by the subgroup is not in the read cache.
  • when the value of the third counter is non-zero, it can indicate that the data on the logical sub-block pointed to by the subgroup is in the hard disk cache; when the value of the third counter is 0 or null, it can indicate that the data on the logical sub-block pointed to by the subgroup is not in the hard disk cache.
  • when the value of the fourth counter is not 0, it can indicate that the data on the logical sub-block pointed to by the subgroup is in the capacity layer; when the value of the fourth counter is 0 or null, it can indicate that the data on the logical sub-block pointed to by the subgroup is not in the capacity layer.
  • the outflowing data will flow into another layer in the I/O stack.
  • data may be written to the same logical address multiple times. For example, in the I/O stack, data is allowed to be written to the same logical address multiple times, and the most recently written data overwrites the previously written data.
  • there is also data elimination in the I/O stack, such as deleting inactive data in a certain layer in the I/O stack, or deleting overwritten data in a certain layer in the I/O stack.
  • the value of the counter can be used to record the data flow, data writing and data elimination of the I/O stack on the layer.
  • the first counter can correspond to the first layer in the I/O stack, and a value of 1 on the first counter indicates that the data in the logical sub-block pointed to by the subgroup is at the first layer in the I/O stack.
  • When the amount of data in the first layer in the I/O stack reaches the threshold W, the data in the first layer can be migrated to the second layer.
  • the value of the first counter is then reduced by one, indicating that the data in the logical sub-block pointed to by the subgroup is no longer in the first layer.
  • Suppose instead that the current value of the first counter is 1 and two data writes are then performed, both to the logical sub-block pointed to by the subgroup. On the first write, the value of the first counter increases by one and becomes 2. On the second write, the value of the first counter increases by one more and becomes 3. A value of 2 or 3 on the first counter may indicate that data has been written 2 or 3 times in succession to the logical sub-block pointed to by the subgroup.
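  • The counter behaviour described above can be sketched as a small helper. The class `LayerCounters` and its method names are hypothetical and only model one subgroup with one counter per layer; the threshold W and the actual data movement are not modelled.

```python
class LayerCounters:
    """Counters of one subgroup, one counter per I/O stack layer (illustrative)."""

    def __init__(self, num_layers=4):
        self.counts = [0] * num_layers

    def on_write(self, layer=0):
        # A write lands on the given layer (the first layer by default).
        self.counts[layer] += 1

    def on_migrate(self, src, dst):
        # Data flows out of layer src and into layer dst.
        self.counts[src] -= 1
        self.counts[dst] += 1

    def has_data(self, layer):
        # A non-zero counter means the sub-block has data on that layer.
        return self.counts[layer] != 0


c = LayerCounters()
c.on_write()          # first counter becomes 1
c.on_migrate(0, 1)    # data moves from the first layer to the second layer
print(c.counts)

d = LayerCounters()
d.on_write()          # 1
d.on_write()          # 2
d.on_write()          # 3
print(d.counts)
```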
  • a subgroup may not set the counter corresponding to the last layer in the I/O stack.
  • each counter in the subgroup corresponds to other layers in the I/O stack except the last layer.
  • If all the counters in the subgroup are 0 or null, the data in the logical sub-block pointed to by the subgroup is not located in any layer of the I/O stack other than the last layer. In other words, the data in the logical sub-block pointed to by the subgroup can only be located in the last layer of the I/O stack.
  • The method of using a logical address to determine one or more groups of counters, and finding one or more subgroups from those groups of counters, is similar to the method of using a logical address to determine one or more groups of bits and finding one or more subgroups from those groups of bits. For details, refer to the foregoing description, which is not repeated here.
  • Example 1 When determining, in the global index, the one or more character blocks (such as one or more groups of bits, or one or more groups of counters) for the logical block to which a logical address belongs, hash calculation is used to map the logical address to the character block. Different logical addresses may have the same hash value after hash calculation, resulting in a hash collision that maps two different logical addresses to the same character block. In this case, the value of the character block mapped to (that is, the value of the character sub-blocks in each subgroup) actually needs to represent the layer of the I/O stack in which the data in the logical blocks to which the two logical addresses belong is located.
  • Suppose the two logical addresses are LBA1 and LBA2.
  • the character block in the global index to which the two logical addresses are mapped is character block A.
  • The following describes the value of the first character sub-block a in each subgroup of character block A; the values of the other character sub-blocks are similar to the value of the first character sub-block.
  • When data is written to the logical address LBA1, the data is preferentially written to the first layer of the I/O stack. If the character sub-block a is a bit, the value of this bit becomes 1. If the character sub-block a is a counter, the value of the counter also becomes 1. Afterwards, if data is written to the logical address LBA2, the data is also preferentially written to the first layer of the I/O stack. If the character sub-block a is a bit, the value of this bit remains 1. If the character sub-block a is a counter, the value of the counter increases from 1 to 2. It can be seen that the value of the counter clearly records the number of times data is written to the first layer of the I/O stack. In this case, whether the global index exists in the form of a bitmap or a counter group, it accurately indicates that the data in the logical block to which the logical address LBA1 or LBA2 belongs is located at the first layer of the I/O stack.
  • Afterwards, suppose the data in the logical block to which the logical address LBA1 belongs flows from the first layer of the I/O stack to the second layer of the I/O stack.
  • If the character sub-block a is a bit, the value of the bit changes from 1 to 0, and the next bit (that is, the bit corresponding to the second layer of the I/O stack) changes from 0 to 1.
  • If the character sub-block a is a counter, the value of the counter changes from 2 to 1, and the next counter (that is, the counter corresponding to the second layer of the I/O stack) changes from 0 to 1.
  • by querying the global index, the data at the logical address LBA2 is determined to be located at the first layer of the I/O stack. This is consistent with the layer where the data at the logical address LBA2 is actually located in the I/O stack, so the data at the logical address LBA2 can subsequently be read accurately.
  • If the global index exists in the form of a bitmap, since the value of the bit corresponding to the first layer of the I/O stack is 0 and the value of the next bit is 1, it is determined by querying the global index that the data at the logical address LBA1 is located in the second layer of the I/O stack. This is consistent with the layer where the data at the logical address LBA1 is actually located in the I/O stack, and the data can be read accurately.
  • If the global index exists in the form of a counter group, since the value of the counter corresponding to the first layer of the I/O stack is 1, it is determined by querying the global index that the data at the logical address LBA1 is located at the first layer of the I/O stack. This is inconsistent with the layer where the data at the logical address LBA1 is actually located. During subsequent data reading, the data at the logical address LBA1 cannot be found on the first layer of the I/O stack, but the layers of the I/O stack can then be traversed in order, the data at the logical address LBA1 is found at the second layer, and the data can still be read accurately.
  • the value of the counter can not only clearly record the flow status of data in the first layer of the I/O stack, but also solve the problem of hash conflicts to a certain extent and ensure the accuracy of data reading.
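  • Example 1 can be replayed with a minimal sketch in which LBA1 and LBA2 collide onto the same character sub-blocks. All names here are hypothetical, only the first two layers are modelled, and the lookup simply tries, in top-down order, every layer whose bit or counter is non-zero.

```python
# Actual placement of data: index 0 is the first layer, index 1 the second layer.
stores = [set(), set()]
bits = [0, 0]        # bitmap form of the shared character sub-blocks
counters = [0, 0]    # counter form of the same character sub-blocks


def write(lba):
    stores[0].add(lba)
    bits[0] = 1
    counters[0] += 1


def migrate_down(lba):
    stores[0].remove(lba)
    stores[1].add(lba)
    bits[0], bits[1] = 0, 1
    counters[0] -= 1
    counters[1] += 1


def read_layer(lba, sub_blocks):
    """Try each layer flagged by the index, top-down; None means a miss."""
    for layer, value in enumerate(sub_blocks):
        if value != 0 and lba in stores[layer]:
            return layer
    return None


write("LBA1")
write("LBA2")
migrate_down("LBA1")
print(bits, counters)
print(read_layer("LBA1", bits), read_layer("LBA1", counters))
print(read_layer("LBA2", bits), read_layer("LBA2", counters))
```

  • In this sketch the counter form first points at the first layer for LBA1 and finds the data only at the second layer, while LBA2 is still found at the first layer. With the bitmap form, the lookup for LBA2 misses in this model because the first-layer bit was cleared by the migration of LBA1.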
  • Example 2 Still taking the character sub-block a and LBA1 as an example, in the scenario of additional writing, when data is written to the logical address LBA1 for the first time, the data is preferentially written to the first layer of the I/O stack. If the character sub-block a is a bit, the value of this bit is 1. If the character sub-block a is a counter, the value of the counter is also 1. Afterwards, if data is written to the logical address LBA1 again to overwrite the previously written data, the data is also preferentially written to the first layer of the I/O stack. If the character sub-block a is a bit, the value of the bit remains 1. If the character sub-block a is a counter, the value of the counter increases from 1 to 2.
  • During data elimination, the data written the first time is deleted from the storage medium, and the global index can be updated. If the character sub-block a is a bit, the value of this bit changes from 1 to 0. If the character sub-block a is a counter, the value of the counter changes from 2 to 1. In fact, the layer where the data at the logical address LBA1 is located is still the first layer, so when the character sub-block a is a bit, its value may be wrong. It can be seen that the global index in the form of a counter group can accurately record the layer where the data at the logical address LBA1 is located in the I/O stack.
  • one counter in the counter group will occupy multiple bits, and the multiple bits are used to represent different values, which also makes the space occupied by the counter group larger than that of the bitmap.
  • the global index may be realized by means of a counter group, or may be realized by a bitmap, or both methods may be applicable.
  • the following describes the data access method provided by the embodiment of the present application with reference to the accompanying drawings.
  • the execution subject of the data access method provided by the embodiment of the present application will be different.
  • the data access method may be executed by the processor 112 of the storage device 110 in the storage system 100 shown in FIG. 1 , or may be executed by the network card 114 of the storage device 110 in the storage system 100 .
  • the two possible cases are described below:
  • Scenario 1 The processor 112 of the storage device 110 in the storage system 100 executes the data access method provided in the embodiment of the present application.
  • a data access method provided by this application, the method includes:
  • Step 401 The processor 112 receives a data write request, the data write request is used to request to write target data, and the data write request carries target data and a logical address of the target data.
  • the logical address may include a start logical address and a data length (length).
  • the starting logical address of the data can be represented by a logical block address (logic block address, LBA) and a logical unit number (logical unit number, LUN).
  • the data writing request may be directly sent by the client device 200 to the storage device 110 .
  • the data writing request may also be sent to the storage device 110 by other storage devices 110 in the storage system 100.
  • there is a device for managing the storage device 110 in the storage system 100. The device can allocate a storage device 110 for data, and can also instruct the storage device 110 to write the data.
  • the device may send a data writing request to the storage device 110 .
  • Step 402 The processor 112 of the storage device 110 writes the target data to the location indicated by the logical address according to the data writing request.
  • the processor 112 When the processor 112 writes the target data to the location indicated by the logical address, it may preferentially write the target data to the first layer in the I/O stack, for example, it may preferentially write the target data into the cache.
  • the processor 112 creates an index for the target data when writing to the first layer of the I/O stack, and the index of the target data can indicate the correspondence between the logical address of the target data and the metadata of the target data.
  • the metadata of the target data can indicate the physical address of the target data in the layer.
  • Step 403 the processor 112 updates the global index, and after the update, the global index can indicate the layer where the target data is located in the I/O stack.
  • the processor 112 may determine, according to the logical address of the target data, a character block in the global index pointing to the logical block of the logical address and a subgroup indicating a logical sub-block in the logical block. For example, the processor 112 may query the character block corresponding to the logical address of the target data and the subgroup in the character block in the hash table.
  • the processor 112 can set the character sub-block corresponding to the first layer in the I/O stack, in the subgroup pointing to the logical sub-block in the logical block, to a specific value, so that the set value of the character sub-block indicates that data is stored in the first layer of the I/O stack.
  • If the global index is a bitmap, the processor 112 can determine, according to the logical address of the target data, one or more groups of bits indicating the logical block to which the logical address belongs, then determine the logical sub-block indicated by the logical address according to the offset of the logical address in the logical block and the data length of the target data, and then determine each subgroup in the bitmap pointing to the logical sub-block.
  • the processor 112 can set the first bit in each subgroup pointing to the logical sub-block (that is, the bit corresponding to the first layer of the I/O stack) to 1 to indicate that the target data is stored in the first layer of the I/O stack.
  • If the global index is a counter group, the processor 112 can determine, according to the logical address of the target data, one or more groups of counters indicating the logical block to which the logical address belongs, then determine the logical sub-block indicated by the logical address according to the offset of the logical address in the logical block and the data length of the target data, and then determine each subgroup in the counter group pointing to the logical sub-block.
  • the processor 112 can increase by 1 the value of the first counter in each subgroup pointing to the logical sub-block (that is, the counter corresponding to the first layer of the I/O stack) to indicate that the target data is written in the first layer of the I/O stack.
  • the embodiment of the present application does not limit the order in which step 402 and step 403 are performed. The processor 112 may first write the target data and then update the global index, or may first update the global index and then write the target data. Both the writing of the target data and the updating of the global index rely on the logical address of the target data.
  • the writing of the target data and the updating of the global index are two relatively independent processes with no fixed order between them. Therefore, in the embodiment of the present application, the order of execution of step 402 and step 403 is not particularly emphasized.
  • the processor 112 may update the global index first (step 403 is executed first) and then write the target data into the storage location indicated by the logical address (step 402 is executed afterwards).
  • By updating the global index first, the processor 112 records the layer of the target data in the I/O stack in the global index in advance. In this way, if the processor 112 receives a data read request for the target data shortly after receiving the data write request, the global index has already been updated, so the processor 112 can accurately determine the layer where the target data is located in the I/O stack according to the updated global index.
  • Step 404 The processor 112 feeds back a data writing response, indicating that the target data has been successfully written.
  • Steps 401 to 404 are the data writing process.
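  • Steps 401 to 404 can be sketched as follows, using the order in which the global index is updated before the data is written. Everything here is a simplified assumption: the names are hypothetical, the global index is a dictionary of per-layer counters, Python's built-in `hash` stands in for the hash function, and the request is assumed to fit in a single logical sub-block.

```python
NUM_LAYERS = 4
GROUPS = 8                  # hypothetical number of character blocks
BLOCK_SIZE = 256 * 1024
SUB_BLOCK_SIZE = 8 * 1024

global_index = {}           # (group id, sub-block index) -> per-layer counters
first_layer = {}            # logical address -> data, stands in for the first layer


def index_key(lba):
    block_start = lba - lba % BLOCK_SIZE            # round down to the logical block
    group = hash(block_start) % GROUPS              # stand-in hash function
    return group, (lba - block_start) // SUB_BLOCK_SIZE


def handle_write(lba, data):
    # Step 403 (done first here): record a write on the first layer.
    counters = global_index.setdefault(index_key(lba), [0] * NUM_LAYERS)
    counters[0] += 1
    # Step 402: write the target data to the first layer of the I/O stack.
    first_layer[lba] = data
    # Step 404: feed back a data writing response.
    return "ok"


address = 1544 * 1024       # 1MB+520KB from the earlier example
response = handle_write(address, b"target data")
print(response, index_key(address), global_index[index_key(address)])
```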
  • the processor 112 may also execute other data processing processes, such as data flow, data elimination, and the like.
  • the processor 112 may migrate the data in the storage medium of the layer to the storage medium of the next layer, and after the data migration is completed, the processor 112 may update the global index.
  • when the processor 112 updates the global index, the processor 112 can set the bit in the global index corresponding to the layer in the I/O stack to 0 to indicate that no data is stored in the storage medium of the layer, and set the bit in the global index corresponding to the next layer in the I/O stack to 1 to indicate that data is stored in the storage medium of the next layer.
  • the processor 112 may decrement by 1 the value of the counter in the global index corresponding to the layer in the I/O stack to indicate that the data in the storage medium of the layer has been migrated out once, and increase by 1 the value of the counter in the global index corresponding to the next layer in the I/O stack to indicate that new data has been migrated into the storage medium of the next layer.
  • the processor 112 may also migrate data with a higher reading frequency in a certain layer of the I/O stack to a higher layer in the I/O stack, such as the second layer where the read cache is located.
  • the processor 112 may control the outflow of data with a higher reading frequency in the storage medium in this layer, and control the inflow of the data into the storage medium in the second layer.
  • Processor 112 may also update the global index. The operation of the processor 112 to update the global index may be performed after the data flows out and before the data flows into the second layer.
  • the processor 112 can set the bit corresponding to the layer in the I/O stack, in the subgroup pointed to by the logical address of the data in the global index, to 0 to indicate that the data does not exist in the storage medium of this layer, and set the bit corresponding to the second layer in the I/O stack in the same subgroup to 1 to indicate that the data is stored in the storage medium of the second layer in the I/O stack.
  • the processor 112 can decrement by 1 the value of the counter corresponding to the layer in the I/O stack, in the subgroup pointed to by the logical address of the data in the global index, to indicate that data has flowed out of the storage medium of this layer once, and increase by 1 the value of the counter corresponding to the second layer in the I/O stack in the same subgroup to indicate that new data has flowed into the storage medium of the second layer in the I/O stack.
  • the processor 112 may also delete invalid data in a certain layer in the I/O stack, where the invalid data may be overwritten data in an additional write scenario, or some data with a low reading frequency .
  • the processor 112 can move out and delete the data in the storage medium in the layer, and the processor 112 can also update the global index.
  • when the processor 112 updates the global index, the processor 112 can set the bit corresponding to the layer in the I/O stack, in the subgroup pointed to by the logical address of the data in the global index, to 0 to indicate that the data does not exist in the storage medium of this layer.
  • the processor 112 can decrement by 1 the value of the counter corresponding to the layer in the I/O stack, in the subgroup pointed to by the logical address of the data in the global index, to indicate that data in the storage medium of this layer has been eliminated once.
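  • For the bitmap form, the three index updates described above (flow to the next layer, flow up to the read cache, and elimination) can be sketched as pure functions on one subgroup. The function names are hypothetical, and the subgroup is assumed to hold three bits, one for each layer except the last.

```python
def demote(bits, layer):
    """Data flows from `layer` down to the next layer."""
    out = list(bits)
    out[layer] = 0
    if layer + 1 < len(out):      # the last layer has no bit of its own
        out[layer + 1] = 1
    return out


def promote(bits, layer, target=1):
    """Frequently read data flows from `layer` up to `target` (the read cache)."""
    out = list(bits)
    if layer < len(out):          # the source may be the last layer, with no bit
        out[layer] = 0
    out[target] = 1
    return out


def evict(bits, layer):
    """Invalid data is deleted from `layer`."""
    out = list(bits)
    out[layer] = 0
    return out


print(demote([1, 0, 0], 0))       # write cache -> read cache
print(promote([0, 0, 0], 3))      # capacity layer -> read cache
print(evict([1, 0, 0], 0))        # data eliminated from the write cache
```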
  • the processor 112 may also execute the data reading process, see steps 405 to 408 for details.
  • Step 405 The processor 112 receives a data read request, the data read request is used to request to read the target data, and the data read request carries the logical address of the target data.
  • Step 406 The processor 112 queries the global index according to the logical address of the target data, and determines the layer of the target data in the I/O stack.
  • the processor 112 can determine, according to the logical address of the target data, the character block in the global index that points to the logical block to which the logical address belongs, and then determine, according to the offset of the logical address in the logical block and the data length of the target data, the subgroup in the character block that points to the logical sub-block indicated by the logical address. The processor 112 determines the layer where the target data is located according to the value of each character sub-block in the subgroup.
  • If the global index is a bitmap, the processor 112 determines, according to the logical address of the target data, one or more groups of bits indicating the logical block to which the logical address belongs, then determines the logical sub-block indicated by the logical address according to the offset of the logical address in the logical block and the data length of the target data, and then determines each subgroup in the bitmap pointing to the logical sub-block.
  • the processor 112 may determine the value of each bit in each subgroup pointing to the logical sub-block, and determine that the layer corresponding to a bit whose value is 1 is the layer where the target data is located.
  • If the value of every bit in the subgroup is 0, the processor 112 may determine that the layer where the target data is located is the last layer in the I/O stack.
  • If the global index is a counter group, the processor 112 determines, according to the logical address of the target data, one or more groups of counters indicating the logical block to which the logical address belongs, then determines the logical sub-block indicated by the logical address according to the offset of the logical address in the logical block and the data length of the target data, and then determines each subgroup in the counter group pointing to the logical sub-block.
  • the processor 112 can determine the value of each counter in each subgroup pointing to the logical sub-block, and determine that the layer corresponding to a counter whose value is not 0 is the layer where the target data is located.
  • If the value of every counter in the subgroup is 0 or null, the processor 112 may determine that the layer where the target data resides is the last layer in the I/O stack.
  • Step 407 After the processor 112 determines the layer of the target data in the I/O stack, it can directly read the target data from the layer according to the logical address of the target data.
  • the processor 112 may query the index of the target data in the layer according to the logical address of the target data, determine the metadata of the target data, and then read the target data from the location indicated by the metadata.
  • Step 408 The processor 112 feeds back a data read response, where the data read response includes the target data.
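  • Steps 406 and 407 can be sketched as follows for the bitmap form. The `Layer` class, the `read` function and the use of a plain integer as the metadata (physical address) are assumptions made for illustration; the lookup of the subgroup from the logical address is taken as already done.

```python
class Layer:
    """One layer of the I/O stack with its own data index (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # logical address -> metadata (here: a physical address)
        self.media = {}    # physical address -> data

    def put(self, lba, phys, data):
        self.index[lba] = phys
        self.media[phys] = data

    def get(self, lba):
        phys = self.index.get(lba)
        return None if phys is None else self.media[phys]


def read(stack, subgroup_bits, lba):
    # Step 406: the first set bit names the layer; all zeros means the last layer.
    hits = [i for i, bit in enumerate(subgroup_bits) if bit]
    layer = stack[hits[0]] if hits else stack[-1]
    # Step 407: query that layer's index, then read from the indicated location.
    return layer.name, layer.get(lba)


stack = [Layer(n) for n in
         ("write cache", "read cache", "hard disk cache", "capacity layer")]
stack[1].put(100, 7, b"hot")
stack[3].put(200, 9, b"cold")
print(read(stack, [0, 1, 0], 100))
print(read(stack, [0, 0, 0], 200))
```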
  • Scenario 2 The network card 114 of the node in the storage system 100 executes the data access method provided in the embodiment of the present application.
  • FIG. 5 shows a data access method provided by the present application, in which the network card 114 of the storage device 110 shown in FIG. 1 executes the data writing process and the data reading process. The way the network card 114 performs the data writing process and the data reading process is similar to the way the processor 112 executes them, and details are not repeated here.
  • Step 501 The network card 114 of the storage device 110 receives a data writing request.
  • Step 502 According to the data writing request, the network card 114 writes the target data into the storage location indicated by the logical address.
  • the network card 114 When the network card 114 writes the target data into the storage location indicated by the logical address, it may preferentially write the target data into the first layer in the I/O stack, for example, it may preferentially write the target data into the cache.
  • Step 503 The network card 114 updates the global index, and the updated global index can indicate the layer where the target data is located in the I/O stack.
  • Step 504 The network card 114 feeds back a data writing response, indicating that the target data has been successfully written.
  • Steps 501 to 504 are the data writing process.
  • the network card 114 can also execute the data reading process. Refer to steps 505 to 508 for details.
  • Step 505 The network card 114 receives a data read request, the data read request is used to request to read the target data, and the data read request carries the logical address of the target data.
  • Step 506 The network card 114 queries the global index according to the logical address of the target data, and determines the layer of the target data in the I/O stack.
  • Step 507 After determining the layer of the target data in the I/O stack, the network card 114 can directly read the target data from the layer according to the logical address of the target data.
  • the network card 114 may query the index of the target data in the layer according to the logical address of the target data, determine the metadata of the target data, and then read the target data from the location indicated by the metadata.
  • Step 508 The network card 114 feeds back a data read response, and the data read response includes the target data.
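  • The write flow (steps 501 to 504) and the read flow (steps 505 to 508) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the class name `IoStack`, the layer names, and the use of Python dictionaries for each layer and for the global index are all hypothetical.

```python
# Hypothetical model: each layer is a dict keyed by logical address, and the
# global index maps a logical address to the layer holding the latest data.
LAYERS = ["write_cache", "read_cache", "disk_cache", "capacity"]


class IoStack:
    def __init__(self):
        self.layers = {name: {} for name in LAYERS}
        self.global_index = {}  # logical address -> layer name

    def write(self, logical_address, data):
        # Step 502: preferentially write into the first layer of the stack.
        first = LAYERS[0]
        self.layers[first][logical_address] = data
        # Step 503: record the layer where the target data now resides.
        self.global_index[logical_address] = first
        # Step 504: acknowledge the write.
        return True

    def read(self, logical_address):
        # Step 506: one global-index lookup replaces a layer-by-layer scan.
        layer = self.global_index.get(logical_address)
        if layer is None:
            return None
        # Step 507: read directly from the layer the index points to.
        return self.layers[layer][logical_address]
```

  • In this sketch a read never probes a layer that does not hold the target data, which is the effect the global index is intended to achieve.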
  • the data writing request and the data reading request are processed by the network card 114 of the storage device 110, which can effectively reduce the occupation of the processor 112 of the storage device 110, and can also effectively improve the efficiency of the data writing process and the data reading process.
  • the network card 114 may also be provided with a cache, and the network card 114 may preferentially write the target data into the cache in the network card 114 when processing the data write request.
  • the cache in the network card 114 may be added to the I/O stack as a layer of the I/O stack.
  • the cache of the network card 114 is used as the first layer of the I/O stack, and the write cache, read cache, hard disk cache, and capacity layer in the storage device 110 are used in sequence as the second, third, fourth, and fifth layers of the I/O stack. Data is still written and flows in the I/O stack in a top-down direction. The difference from the I/O stack described previously is that a new layer is added to the I/O stack.
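  • The layer ordering described above can be written down as a short sketch. The layer names are illustrative labels only, and the helper `next_layer_down` is a hypothetical function introduced for this example.

```python
# Layer names are illustrative labels for the layers named in the text.
BASE_STACK = ["write_cache", "read_cache", "disk_cache", "capacity"]
# The network card cache is placed on top, so every other layer shifts down.
STACK_WITH_NIC_CACHE = ["nic_cache"] + BASE_STACK


def next_layer_down(stack, layer):
    """Return the layer directly below `layer`, or None at the bottom."""
    position = stack.index(layer)
    if position + 1 < len(stack):
        return stack[position + 1]
    return None
```

  • With this ordering, data written into the network card cache flows next into the write cache, preserving the top-down direction.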
  • the network card 114 may execute the data writing process and the data reading process, and the processor 112 may execute other data processing processes, such as data flow, data elimination, and the like.
  • For the manner in which the processor 112 executes other data processing procedures, reference may be made to the description in the embodiment shown in FIG. 4, which will not be repeated here.
  • the index of data in one or more layers of the I/O stack of the storage system 100 supports unilateral RDMA access or pass-through access, that is, the client device 200 can read the metadata of the data in the layer or layers by means of unilateral RDMA access or pass-through access.
  • Supporting unilateral RDMA access means that the index of the data in the layer or layers is stored in the memory of the storage device 110, and the client device 200 records the start address of the index of the data in the layer or layers.
  • the client device 200 can calculate the memory address of the metadata of the data according to the starting address of the index of the data in the layer and the logical address of the data. The client device 200 may then obtain the metadata of the data through unilateral RDMA based on the memory address of the metadata.
  • Supporting pass-through access means that the storage medium in one or more layers is a persistent storage.
  • Taking a hard disk as an example of the persistent storage, the indexes of data stored in the layer or layers are stored in the hard disk of the storage device 110.
  • the client device 200 side records the starting address of the data index in the layer or layers.
  • the client device 200 can calculate the storage address of the metadata of the data in the hard disk according to the starting address of the index of the data in the layer and the logical address of the data. Based on the storage address of the metadata, the client device 200 can obtain the metadata of the data from the hard disk through direct access.
  • the so-called direct access indicates that the client device 200 directly reads the data stored in the hard disk through the network card 114 of the storage device 110 and the controller in the hard disk, and the processor 112 of the storage device 110 does not need to participate in the direct access process.
  • the embodiment of the present application provides a data access method, in which the processor 112 of the storage device 110 does not need to participate.
  • the client device 200 can read the metadata of the target data from the one or more layers through unilateral RDMA access or direct access, and the client device 200 can also access the target data through unilateral RDMA access or direct access.
  • the global index points to some or all character sub-blocks in the target sub-group of the target logical sub-block, wherein the target logical sub-block is the logical sub-block indicated by the logical address of the target data.
  • the client device 200 can determine the layer of the target data in the I/O stack according to the specific values of some or all of the character sub-blocks accessed, and then determine whether the metadata of the target data that was read is valid, that is, whether the target data is stored at the storage address indicated by the metadata of the target data. When the metadata of the target data is determined to be valid, if the metadata indicates that the target data is located in memory, the target data can be read through unilateral RDMA access; if the metadata indicates that the target data is located in the hard disk, the target data can be read through direct access. The data access method in this scenario is described below.
  • Scenario 3 The client device 200 accesses target data through unilateral RDMA or direct access.
  • the following description takes the client device 200 accessing the target data through unilateral RDMA as an example.
  • unilateral RDMA and direct access differ only in the storage medium where the target data or the data index is located. Whether the target data is read from the storage system 100 through unilateral RDMA or through direct access, the basic process is the same; the difference lies only in the access method used when the metadata of the target data, or the target data itself, is read.
  • the layer in the I/O stack of the storage system 100 that supports unilateral RDMA access or pass-through access is called a pass-through layer, that is, there may be one or more pass-through layers in the I/O stack of the storage system 100 .
  • a data access method provided by this application, the method includes:
  • Step 601 the storage device 110 notifies the client device 200 of the memory address of the global index.
  • the global index may be stored in the memory of the storage device 110, and the storage device 110 may notify the client device 200 of the starting address of the global index in the memory and the length of the global index as the memory address of the global index .
  • Step 602 the client device 200 may initiate a first one-sided RDMA to the storage device 110, and the first one-sided RDMA is used to read the metadata of the target data in the target pass-through layer in the I/O stack of the storage system 100.
  • the target pass-through layer is one or more layers of one or more pass-through layers.
  • Step 603 the client device 200 may initiate a second unilateral RDMA to the storage device 110, and the second unilateral RDMA is used to obtain from the storage device 110 all or some of the character sub-blocks in each subgroup in the global index that points to the target logical sub-block.
  • the partial set of character sub-blocks may exclude the character sub-block corresponding to the target pass-through layer.
  • the client device 200 may obtain the character sub-blocks corresponding to the layers above the target pass-through layer in each subgroup.
  • the embodiment of the present application does not limit the order in which the client device 200 initiates the first unilateral RDMA and the second unilateral RDMA to the storage device 110 .
  • the client device 200 can initiate the first unilateral RDMA and the second unilateral RDMA to the storage device 110 within a relatively short period of time, that is, can initiate the first unilateral RDMA and the second unilateral RDMA synchronously.
  • the first unilateral RDMA and the second unilateral RDMA are described below:
  • the client device 200 side records the starting address of the data index in the target pass-through layer.
  • the client device 200 can calculate the memory address of the metadata of the target data according to the starting address of the index of the data in the target pass-through layer and the logical address of the target data. The manner in which the client device 200 performs this calculation is not limited here.
  • for example, the memory address of the metadata of the target data can be calculated from the logical address of the target data by querying a hash table, applying a hash function, or using a learned index.
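  • One of the options named above, applying a hash function, might look like the sketch below. It rests on assumptions that the text does not state: the index is treated as a flat array of fixed-size metadata entries, `ENTRY_SIZE` and `NUM_SLOTS` are made-up values, and CRC32 is only a stand-in for whatever hash the storage device and the client device agree on. Collision handling is omitted.

```python
import zlib

ENTRY_SIZE = 16      # assumed size of one metadata entry, in bytes
NUM_SLOTS = 1 << 20  # assumed number of entries in the layer's index


def metadata_address(index_start: int, logical_address: int) -> int:
    """Return the assumed memory address of the metadata entry for an LBA."""
    # Hash the logical address to a slot; CRC32 is only a stand-in hash.
    key = logical_address.to_bytes(8, "little")
    slot = zlib.crc32(key) % NUM_SLOTS
    # The slot number, scaled by the entry size, is an offset from the start
    # address of the index that the client device has recorded.
    return index_start + slot * ENTRY_SIZE
```

  • Because the calculation uses only the recorded start address and the logical address, the client device can perform it locally and place the resulting address in the first request.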
  • the client device 200 may initiate a first request to the network card 114 of the storage device 110 based on RDMA, and the first request is used to request to read the metadata of the target data.
  • the first request carries the memory address of the metadata of the target data.
  • the network card 114 of the storage device 110 can process the first request, obtain the metadata of the target data according to the memory address of the metadata of the target data, carry the metadata in the first response, and feed back the first response to the client device 200.
  • the embodiment of the present application does not limit the number of target pass-through layers, and allows the client device 200 to acquire metadata of the target data in multiple target pass-through layers through the first unilateral RDMA.
  • the client device 200 may initiate a first unilateral RDMA once to acquire metadata of target data in the target pass-through layer.
  • the client device 200 may initiate the first one-sided RDMA multiple times, and each time the first one-sided RDMA acquires the metadata of the target data in one of the target pass-through layers.
  • the metadata of the target data in the target pass-through layer may be invalid, that is, the data stored at the physical address indicated by the metadata of the target data is not the latest written data.
  • the client device 200 may perform a second one-sided RDMA.
  • through the second unilateral RDMA, the client device 200 acquires from the storage device 110 each subgroup in the global index that points to the logical sub-block indicated by the logical address of the target data.
  • the client device 200 may determine, according to the logical address of the target data, the position in the global index of each such subgroup.
  • after the position of each subgroup in the global index is determined, since the storage device 110 has notified the client device 200 of the memory address of the global index in the storage device 110, the client device 200 can determine the memory address of each subgroup according to the memory address of the global index.
  • take as an example the case in which each logical block includes 32 logical sub-blocks, the size of each logical sub-block is 8KB, and each group of bits includes 32 subgroups.
  • if the location indicated by the LBA of the target data is 1MB+520KB, the two character blocks pointing to the logical blocks to which the logical address belongs can be determined.
  • the two character blocks are the third and fourth character blocks in the global index. The 32 logical sub-blocks indicated by the logical address are then determined according to the offset of the logical address in the logical block and the data length of the target data, and the 32 subgroups in the third and fourth character blocks that point to these 32 logical sub-blocks are determined.
  • the subgroups pointing to the 32 logical sub-blocks are the second to the 32nd subgroups in the third character block and the first subgroup in the fourth character block.
  • the starting position of the 32 subgroups is the position in the global index offset by two character blocks plus the length of one subgroup, and the total length of the 32 subgroups is the length of one character block.
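  • The way a logical address range selects subgroups can be sketched as below. The sketch assumes a simple linear mapping from logical addresses to character blocks (the text also allows a hash-based mapping), and the function name `locate_subgroups` is hypothetical. The input used here, an offset of two logical blocks plus one logical sub-block with a length of 256KB, is an illustrative value chosen to reproduce the subgroup range described above.

```python
SUB_BLOCK_SIZE = 8 * 1024   # 8KB per logical sub-block
SUB_BLOCKS_PER_BLOCK = 32   # 32 logical sub-blocks per logical block


def locate_subgroups(offset: int, length: int):
    """Return (character block number, subgroup number) pairs, both 1-based,
    for every logical sub-block touched by [offset, offset + length)."""
    first = offset // SUB_BLOCK_SIZE
    last = (offset + length - 1) // SUB_BLOCK_SIZE
    return [
        (i // SUB_BLOCKS_PER_BLOCK + 1, i % SUB_BLOCKS_PER_BLOCK + 1)
        for i in range(first, last + 1)
    ]


# Illustrative input: start two logical blocks plus one sub-block in, 256KB long.
pairs = locate_subgroups(2 * 256 * 1024 + 8 * 1024, 256 * 1024)
```

  • For this input, `pairs` covers the second to the 32nd subgroups of the third character block followed by the first subgroup of the fourth character block, 32 subgroups in total.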
  • let N denote the total number of character sub-blocks in each subgroup of the global index, one for each layer of the I/O stack. If the global index exists in the form of a bitmap, the size of a subgroup is equal to N bits.
  • a character block is a group of bits, and the size of a group of bits is 32*N bits.
  • the 32 subgroups are located in the global index at a position offset by 32*N+N bits from the start address, with a length of 32*N bits.
  • if the global index exists in the form of counter groups, a character block is a group of counters, and the size of a group of counters is 32*N*M bits.
  • the 32 subgroups are located in the global index at a position offset by 32*N*M+N*M bits from the start address, with a length of 32*N*M bits.
  • the client device 200 can determine the memory address of each subgroup according to the memory address of the global index. For example, the client device 200 may offset the start address of the global index by two character blocks and the length of one subgroup as the start address of each subgroup, and the length of each subgroup is the length of one character block. The start address of each subgroup and the length of each subgroup can be used as the memory address of each subgroup.
  • the client device 200 may also use the starting address of the global index offset by two character blocks plus the length of one subgroup as the start address of the subgroups, and use the starting address of the global index offset by three character blocks plus the length of one subgroup as the end address of the subgroups. The start address and end address can then be used as the memory address of the subgroups.
  • the client device 200 can further process the memory address of each subgroup, removing the memory address of the character sub-block corresponding to the target pass-through layer from the memory address of each subgroup to obtain the memory addresses of the partial character sub-blocks in each subgroup.
  • for example, if each subgroup includes P character sub-blocks and the character sub-block corresponding to the target pass-through layer is the last character sub-block in each subgroup, the memory addresses of the partial character sub-blocks in the 32 subgroups can be 32 address segments. The starting address of the 32 address segments is the starting address of the global index offset by two character blocks plus the length of one subgroup, the length of each address segment is the length of P-1 character sub-blocks, and adjacent address segments are separated by the length of one character sub-block.
  • if the global index exists in the form of a bitmap, the memory addresses of the partial character sub-blocks in the 32 subgroups start at an offset of 32*N+N bits from the start address of the global index and form 32 address segments in total; the length of each address segment is P-1 bits, and adjacent address segments are separated by 1 bit.
  • if the global index exists in the form of counter groups, the size of a subgroup is equal to N*M bits, a character block is a group of counters, and the size of a group of counters is 32*N*M bits.
  • the memory addresses of the partial character sub-blocks in the 32 subgroups start at an offset of 32*N*M+N*M bits from the start address of the global index; the length of each address segment is (P-1)*M bits, and adjacent address segments are separated by M bits.
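  • The address segments described above can be generated with a short helper. This is a sketch under assumptions: the starting bit offset is passed in rather than derived, the sub-block of the target pass-through layer is taken to be the last one in each subgroup as in the example, and the function name `partial_segments` is hypothetical. Setting `m` to 1 models the bitmap form; a larger `m` models counters of `m` bits each.

```python
def partial_segments(start_bit: int, subgroups: int, p: int, m: int = 1):
    """Return (start bit, length in bits) for each address segment.

    Each subgroup holds p character sub-blocks of m bits each; the last
    sub-block of every subgroup is skipped.
    """
    subgroup_bits = p * m          # full size of one subgroup
    segment_bits = (p - 1) * m     # size without the skipped sub-block
    return [
        (start_bit + i * subgroup_bits, segment_bits)
        for i in range(subgroups)
    ]
```

  • Consecutive segments produced this way are separated by exactly `m` bits, the width of the one character sub-block that is left out of each subgroup.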
  • the client device 200 may initiate a second unilateral RDMA to acquire the subgroups from the storage device 110 .
  • the client device 200 may initiate a second request to the network card 114 of the storage device 110 based on RDMA, the second request is used to request to obtain the subgroups in the global index, and the second request carries the memory address of each subgroup.
  • the network card 114 of the storage device 110 can read the subgroups according to the memory addresses of the subgroups, carry the subgroups in the second response, and send the second response to the client device 200.
  • the client device 200 may initiate a second unilateral RDMA to acquire some of the character sub-blocks in each of the sub-groups from the storage device 110 .
  • the client device 200 may initiate a third request to the network card 114 of the storage device 110 based on RDMA. The third request is used to request the partial character sub-blocks in each subgroup in the global index, and the third request carries the memory addresses of the partial character sub-blocks in each subgroup.
  • the network card 114 of the storage device 110 can read the partial character sub-blocks in each subgroup according to their memory addresses, carry them in the third response, and send the third response to the client device 200.
  • Step 604 The client device 200 checks the validity of the metadata of the target data in the target pass-through layer according to the specific values of each subgroup or some character subblocks in each subgroup.
  • the client device 200 may determine whether the layer where the target data resides is the target pass-through layer according to the respective subgroups.
  • when the global index is a bitmap, the client device 200 can first determine whether any bit in each subgroup other than the bit corresponding to the target pass-through layer is 1, and whether the layer corresponding to a bit with a value of 1 is located above the target pass-through layer. If a bit other than the bit corresponding to the target pass-through layer is 1 in a subgroup, and the layer corresponding to that bit is located above the target pass-through layer, the target data most recently written to the logical address is located in the layer corresponding to that bit, and the metadata of the target data in the target pass-through layer is invalid.
  • if the layer corresponding to the bit with a value of 1 is the target pass-through layer (optionally, other bits with a value of 1 exist, and the layers corresponding to those bits are located below the target pass-through layer), the target data most recently written to the logical address is located in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
  • when the global index is a counter group, the client device 200 can first determine whether any counter in each subgroup other than the counter corresponding to the target pass-through layer is non-zero, and whether the layer corresponding to a non-zero counter is located above the target pass-through layer. If a counter other than the counter corresponding to the target pass-through layer is non-zero in a subgroup, and the layer corresponding to that counter is located above the target pass-through layer, the target data most recently written to the logical address is located in the layer corresponding to the non-zero counter, and the metadata of the target data in the target pass-through layer is invalid.
  • if the layer corresponding to the non-zero counter is the target pass-through layer (optionally, other non-zero counters exist, and the layers corresponding to those counters are located below the target pass-through layer), the target data most recently written to the logical address is located in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
  • the client device 200 may determine whether the layer where the target data resides is the target pass-through layer according to the specific values of the partial character sub-blocks in each subgroup.
  • in the bitmap form, the client device 200 can first determine whether any of the partial bits (that is, the partial character sub-blocks) in each subgroup is 1, and whether the layer corresponding to a bit with a value of 1 is located above the target pass-through layer. If such a bit is 1 and the layer corresponding to it is located above the target pass-through layer, the target data most recently written to the logical address is located in the layer corresponding to that bit, and the metadata of the target data in the target pass-through layer is invalid.
  • in the counter form, the client device 200 can first determine whether any of the partial counters (that is, the partial character sub-blocks) in each subgroup is non-zero, and whether the layer corresponding to a non-zero counter is located above the target pass-through layer. If such a counter is non-zero and the layer corresponding to it is located above the target pass-through layer, the target data most recently written to the logical address is located in the layer corresponding to the non-zero counter, and the metadata of the target data in the target pass-through layer is invalid.
  • if the layer corresponding to the non-zero counter is located below the target pass-through layer, the target data most recently written to the logical address is located in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
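  • The validity check of step 604 can be sketched as a single function that works for both bits and counters, since both forms only need a test for non-zero values. The function name, the list-of-lists representation, and the convention that index 0 is the top layer are assumptions for this example. Treating a zero value at the target pass-through layer as invalid is also an assumption, since the text only describes the case in which that sub-block is set.

```python
def metadata_is_valid(subgroups, target_layer: int) -> bool:
    """subgroups: list of subgroups, each a list of bits or counters ordered
    from the top layer (index 0) downward. target_layer: index of the target
    pass-through layer within a full subgroup."""
    for subgroup in subgroups:
        # A non-zero value above the target layer means newer data sits in
        # a higher layer, so the fetched metadata is stale.
        if any(value != 0 for value in subgroup[:target_layer]):
            return False
        # Assumed rule: when the target layer's own sub-block was fetched,
        # it must be non-zero for the metadata to be usable.
        if len(subgroup) > target_layer and subgroup[target_layer] == 0:
            return False
    return True
```

  • When only the partial character sub-blocks above the target pass-through layer are fetched, each subgroup list is shorter than `target_layer + 1` and only the first rule applies.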
  • Step 605 If the metadata of the target data in the target pass-through layer is valid, the client device 200 obtains the target data from the storage device 110 by using the metadata of the target data.
  • the client device 200 may send a fourth request to the network card 114 of the storage device 110 based on RDMA, where the fourth request carries metadata of the target data in the target pass-through layer.
  • the network card 114 of the storage device 110 can determine the physical address of the target data according to the metadata of the target data in the target pass-through layer, read the target data, and feed back the target data through the fourth response to the client device 200.
  • if the metadata of the target data in the target pass-through layer is invalid, the client device 200 may acquire the target data from the storage device 110 using the embodiment shown in FIG. 4 or 5.
  • the client device 200 may also acquire the target data from the storage device 110 by using bilateral RDMA.
  • FIG. 4 and FIG. 5 respectively show the data access method provided by this embodiment being executed by the processor 112 and by the network card 114. In this embodiment, the method can also be executed by a chip other than the processor 112 or the network card 114.
  • the chip may be a data processing unit (data processing unit, DPU).
  • the I/O stack in the storage system 100 includes at least two layers, which are respectively a write cache and a read cache, and the global index in the storage device 110 can be stored in memory.
  • Each subgroup of the global index includes two character subblocks, the first character subblock corresponds to the write cache, and the second character subblock corresponds to the read cache.
  • the global index is represented in two forms in memory, one is a bitmap, and the other is a counter group.
  • Step 701 the storage device 110 notifies the client device 200 of the memory address of the bitmap.
  • because the bitmap occupies a relatively small space, when the client device 200 subsequently needs to read some subgroups or bits of the bitmap from the storage device 110, only a small amount of data needs to be read, which can effectively improve the efficiency of reading the global index.
  • Step 702 The client device 200 may initiate a first unilateral RDMA to the storage device 110, and the first unilateral RDMA is used to read metadata of the target data in the read cache in the I/O stack of the storage system 100.
  • Step 703 The client device 200 may initiate a second unilateral RDMA to the storage device 110, and the second unilateral RDMA is used to obtain from the storage device 110 the first one of the subgroups pointing to the target logical subblock in the bitmap bit.
  • Step 704 The client device 200 checks the validity of the metadata of the target data in the read cache according to the specific value of the first bit in each subgroup.
  • the client device 200 first determines whether the first bit in each subgroup is 1. If the first bit is 1, it means that the target data is in the write cache, and the metadata of the target data in the read cache is invalid.
  • if the first bit in each subgroup is 0, it means that the target data most recently written to the logical address is in the read cache, and the metadata of the target data in the target pass-through layer (here, the read cache) is valid.
  • Step 705 If the metadata of the target data in the read cache is valid, the client device 200 uses the metadata of the target data to obtain the target data from the storage device 110 through unilateral RDMA. If the metadata of the target data in the read cache is invalid, the client device 200 may acquire the target data from the storage device 110 by using the embodiment shown in FIG. 4 or 5 . The client device 200 may also acquire target data from the storage device 110 by using bilateral RDMA.
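  • The client-side flow of steps 702 to 705 can be sketched as follows. The four callables are hypothetical stand-ins for the network operations and do not correspond to any real RDMA API; the sketch only shows the order of operations and the branch on the write-cache bits.

```python
def client_read(read_metadata, read_first_bits, read_by_metadata, fallback_read):
    """All four arguments are hypothetical callables standing in for the
    network operations described in steps 702 to 705."""
    metadata = read_metadata()      # step 702: first unilateral RDMA
    first_bits = read_first_bits()  # step 703: second unilateral RDMA
    # Step 704: a set write-cache bit means the newest data is in the write
    # cache, so the read-cache metadata cannot be trusted.
    if any(bit == 1 for bit in first_bits):
        return fallback_read()      # step 705, invalid case
    return read_by_metadata(metadata)  # step 705, valid case
```

  • In this sketch the two reads are issued one after the other for simplicity; as noted for the first and second unilateral RDMA, their order is not limited and they may be initiated at nearly the same time.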
  • the embodiment of the present application also provides a data access device, which is used to execute the method executed by the processor or the network card in the method embodiments shown in FIGS. 4, 5, 6, and 7 above. For related features, refer to the above method embodiments; details are not repeated here.
  • the data access device 800 includes a transmission module 801 and a reading module 802;
  • the transmission module 801 is configured to receive a data read request, and the data read request is used to request to read the target data stored in the storage device.
  • the transmission module 801 may execute step 405 shown in FIG. 4 or step 505 shown in FIG. 5 .
  • the reading module 802 is configured to query the global index based on the data reading request, and the global index is used to indicate the storage layer where the target data in the I/O stack is located; read the target data according to the storage layer indicated by the global index.
  • the reading module 802 may execute steps 406 to 408 shown in FIG. 4 or steps 506 to 508 shown in FIG. 5 .
  • the data access device 800 further includes a writing module 803 .
  • the transmission module 801 may receive a data write request, where the data write request is used to request to write target data in the storage system.
  • the transmission module 801 may execute step 401 shown in FIG. 4 or step 501 shown in FIG. 5 .
  • the writing module 803 may update the global index according to the target data written in the data writing request, and the updated global index is used to indicate the storage layer of the target data in the I/O stack.
  • the writing module 803 may execute steps 402 to 404 shown in FIG. 4 or steps 502 to 504 shown in FIG. 5 .
  • the data read request includes the logical address of the target data. When the reading module 802 queries the global index based on the data read request, it can determine, in the global index, multiple character sub-blocks pointing to the logical address of the target data according to the logical address of the target data, and determine the storage layer where the target data is located in the I/O stack according to the non-zero character sub-blocks among the multiple character sub-blocks.
  • the character sub-block is a bit, and the value of the bit is 0 or 1. A value of 1 indicates that the target data is located in the storage layer corresponding to the character sub-block, and a value of 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block.
  • the character sub-block is a counter, and the counter is 0 or a non-zero integer. The non-zero integer is used to indicate that the target data is located in the storage layer corresponding to the character sub-block, and is also used to indicate the number of times data, including the target data, has been written into the storage layer corresponding to the character sub-block.
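  • The relationship between the counter form and the bit form can be illustrated with a small sketch. The class name `CounterSubgroup` and its methods are hypothetical, and decrementing a counter when data leaves a layer is an assumption made for the example; the text only states that a non-zero counter reflects the number of writes into the layer.

```python
class CounterSubgroup:
    """One subgroup of the global index, kept as per-layer counters."""

    def __init__(self, num_layers: int):
        self.counters = [0] * num_layers

    def on_write(self, layer: int) -> None:
        # Each write of data into the layer increments that layer's counter.
        self.counters[layer] += 1

    def on_remove(self, layer: int) -> None:
        # Assumed behaviour: data leaving the layer decrements the counter.
        if self.counters[layer] > 0:
            self.counters[layer] -= 1

    def as_bits(self):
        # The bit form only records whether each counter is non-zero.
        return [1 if count else 0 for count in self.counters]
```

  • Under this assumption a counter can track several pieces of data that map to the same subgroup, while the derived bit view stays compact for remote reads.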
  • when the reading module 802 determines, in the global index, a plurality of character sub-blocks pointing to the logical address of the target data according to the logical address of the target data, it may determine the plurality of character sub-blocks according to the result of a hash operation on the logical address of the target data, where the hash operation is querying a hash table or applying a hash function.
  • the reading module 802 determines a plurality of character sub-blocks pointing to the logical address of the target data according to the result of the hash operation on the logical address of the target data, it can be determined according to the result that the global index points to the target A character block of the logical block to which the logical address of the data belongs; and then determine a plurality of character sub-blocks pointing to the logical address of the target data from the character block according to the logical address of the target data.
  • the character block includes multiple subgroups, each subgroup corresponds to a logical sub-block in the logical block, and each subgroup includes multiple character sub-blocks. The reading module 802 can determine the target subgroup among the multiple subgroups in the character block according to the offset of the logical address of the target data within the logical block to which it belongs; the character sub-blocks in the target subgroup are the multiple character sub-blocks pointing to the logical address of the target data.
  • the global index and the metadata of the target data are located in the memory of the storage device, and the transmission module 801 may feed back the global index and the metadata of the target data to the client device under a first indication of the client device, the first indication being transmitted based on RDMA.
  • the first indication may be the first request and the second request in the embodiment shown in FIG. 6 , or the first request and the third request in the embodiment shown in FIG. 6 .
  • the global index is located in the memory of the storage device, and the metadata of the target data is located in the persistent storage of the storage device. The transmission module 801 may feed back the global index to the client device under a second indication of the client device, the second indication being transmitted based on RDMA; and may obtain the metadata of the target data from the persistent storage under a third indication of the client device and feed back the metadata of the target data to the client device.
  • the target data is located in the memory of the storage device, and the transmission module 801 may feed back the target data to the client device under a fourth indication of the client device. The fourth indication is initiated according to the metadata of the target data and is transmitted based on RDMA.
  • the fourth indication may be the fourth request in the embodiment shown in FIG. 6 .
  • the target data is located in the persistent storage of the storage device, and the transmission module 801 can obtain the target data from the persistent storage under a fifth indication of the client device and feed back the target data to the client device. The fifth indication is initiated according to the metadata of the target data.
  • the data access device 800 further includes a control module 804, the control module 804 can control the data flow and data elimination in the I/O stack, and update the data according to the data flow and data elimination in the I/O stack global index.
  • each functional module in the embodiment of the present application may be integrated into one processing module, each module may exist separately physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on the computer, the processes or functions according to the embodiments of the present invention will be generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A storage system, a network interface card, a processor, and a data access method, apparatus, and system. In the present application, the storage system comprises an I/O stack and a processing unit, and the I/O stack comprises a plurality of storage layers. The processing unit can receive a data read request, the data read request being used for reading target data stored in the storage system; the processing unit queries a global index on the basis of the data read request, wherein the global index can indicate the storage layer in the I/O stack where the target data is located; and the processing unit can read the target data from the storage layer after determining, according to the global index, the storage layer where the target data is located. When processing the data read request, the processing unit in the storage system directly determines, by querying the global index, the storage layer where the target data is located, and there is no need to sequentially traverse a storage medium in the storage system, such that delay caused due to traversing the storage medium is omitted, the processing process of the data read request is more efficient, and the data read efficiency can be effectively improved.

Description

Storage system, network card, processor, data access method, apparatus, and system
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110634011.5, entitled "", filed with the China Patent Office on June 7, 2021, the entire contents of which are incorporated herein by reference; this application also claims priority to Chinese patent application No. 202110944933.6, entitled "Storage system, network card, processor, data access method, apparatus, and system", filed with the China Patent Office on August 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of storage technologies, and in particular to a storage system, a network card, a processor, and a data access method, apparatus, and system.
Background
In the storage field, when a storage system receives a data read request from a client device, it has to traverse the storage media in the storage system one by one in a specific order and determine whether each storage medium holds the target data to be read. For example, the storage system first checks whether the target data is held in the write cache; if not, it checks whether the target data is held in the read cache; if not, it goes on to search the lower-level storage media.
Consequently, if the target data is stored in a storage medium that comes late in this order, the storage system has to traverse every storage medium that comes earlier, and it finds the target data only once the traversal reaches the storage medium where the target data is located. The later that storage medium comes in the order, the longer the lookup takes, which lowers the efficiency with which the storage system processes data read requests.
Summary
The present application provides a storage system, a network card, a processor, and a data access method, apparatus, and system, so as to improve data read efficiency.
In a first aspect, an embodiment of the present application provides a storage system. The storage system includes an I/O stack and a processing unit. The I/O stack includes multiple storage layers (in the embodiments of the present application, a storage layer may be referred to simply as a layer) and is formed by layering the storage media in the storage system. Each storage layer may include one or more types of storage media, and the data read latency may differ from one storage layer to another. The processing unit may be located in a device in the storage system (such a device may be referred to as a storage device); the embodiments of the present application do not limit the specific form of the processing unit. The processing unit can process data read requests that come from a client device or are generated inside the storage system, and feed back the requested target data.
The specific processing procedure is as follows:
The processing unit receives a data read request, which requests reading of target data stored in the storage system. The processing unit may query a global index based on the data read request; the global index can indicate the storage layer in the I/O stack where the target data is located, so the processing unit can determine that storage layer according to the global index. After determining the storage layer where the target data is located, the processing unit can read the target data from that storage layer.
With the above system, when processing a data read request, the processing unit in the storage system can determine the storage layer where the target data is located directly by querying the global index. It does not need to traverse the storage media in the storage system in order, which removes the latency of that traversal, makes the processing of the data read request more efficient, and can effectively improve data read efficiency.
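The contrast between ordered traversal and an index lookup can be illustrated with a minimal sketch. It is not taken from the application: the layer names, the dictionary-based I/O stack, and the dictionary-based global index are all assumptions chosen to keep the example small.

```python
# Hypothetical layer names, ordered from the highest layer down.
LAYERS = ["write_cache", "read_cache", "ssd_pool", "hdd_pool"]

# Each layer is modeled as a dict mapping logical address -> data.
io_stack = {name: {} for name in LAYERS}

# The global index is modeled as a dict mapping logical address -> layer name.
global_index = {}


def read_by_traversal(lba):
    """Background approach: probe every layer in order until a hit."""
    probes = 0
    for name in LAYERS:
        probes += 1
        if lba in io_stack[name]:
            return io_stack[name][lba], probes
    return None, probes


def read_by_global_index(lba):
    """Indexed approach: one index lookup, then a single layer read."""
    layer = global_index.get(lba)
    if layer is None:
        return None, 0
    return io_stack[layer][lba], 1


# Place one item in the lowest layer and record it in the index.
io_stack["hdd_pool"][0x1000] = b"target"
global_index[0x1000] = "hdd_pool"
```

In this toy model the traversal probes four layers before it finds the data, while the indexed read touches only the layer named by the index.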
In a possible implementation, in addition to data read requests, the processing unit can also process data write requests. The specific procedure is as follows: the processing unit may receive a data write request, which requests writing of target data into the storage system; the processing unit may write the target data into the storage system according to the data write request, and may also update the global index according to the target data, the updated global index indicating the storage layer in the I/O stack where the target data is located. Specifically, updating the global index means recording, in a first index entry, the information that the target data is stored in a particular storage layer, so that when the target data later needs to be read, its storage location can be learned by querying the first index entry in the global index. In the embodiments, an index entry is called a subgroup, and the first index entry is called the first subgroup.
With the above system, the processing unit can update the global index when the target data is written, and the updated global index can indicate the storage layer where the target data is located. When the processing unit later needs to read the target data, it can accurately determine that storage layer according to the updated global index and read the target data.
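The write path can be sketched in the same simplified style. This is only an illustration: the function names `handle_write` and `handle_read`, the layer names, and the use of a plain dictionary entry in place of a subgroup are hypothetical.

```python
# Hypothetical layers, highest first; each maps logical address -> data.
LAYERS = ["write_cache", "read_cache", "persistent_pool"]
io_stack = {name: {} for name in LAYERS}

# Simplified global index: logical address -> index entry naming the layer.
global_index = {}


def handle_write(lba, data, layer="write_cache"):
    """Store the data, then record its layer in the index entry for lba."""
    io_stack[layer][lba] = data
    global_index[lba] = layer


def handle_read(lba):
    """Query the index entry first, then read from the layer it names."""
    layer = global_index[lba]
    return layer, io_stack[layer][lba]


handle_write(0x2000, b"payload")
```

The point of the sketch is the ordering: the index entry is updated as part of the write, so a later read can go straight to the recorded layer.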
In a possible implementation, the global index contains multiple character sub-blocks. Each character sub-block corresponds to one storage layer in the I/O stack and points to a segment of the logical storage space onto which the storage media in the storage system are mapped. In other words, the value of a character sub-block indicates whether the data in the segment it points to is located in the corresponding storage layer. In essence, the global index establishes the correspondence between the logical storage space and the storage media.
The logical storage space onto which the storage media in the storage system are mapped is subdivided into large logical blocks. A logical block can be subdivided further into multiple logical sub-blocks.
The global index may include multiple character blocks, and one character block points to at least one logical block. A character block includes multiple subgroups, each of which points to a logical sub-block in the at least one logical block. Each subgroup includes multiple character sub-blocks, each of which corresponds to one storage layer in the I/O stack.
The data read request includes the logical address of the target data, which indicates a segment of the logical storage space. When querying the global index based on the data read request, the processing unit may determine, according to the logical address of the target data, the multiple character sub-blocks in the global index that point to that logical address, and then determine the storage layer in the I/O stack where the target data is located according to the values of those character sub-blocks.
With the above system, the global index associates the logical storage space (which can also be understood as logical addresses) with the storage layers in the I/O stack (which can also be understood as storage media). The processing unit can use the logical address of the target data to find the character sub-blocks in the global index that point to that address, and can determine the storage layer where the target data is located conveniently and quickly from the values of those character sub-blocks, which keeps data reads efficient.
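The nesting of character blocks, subgroups, and character sub-blocks can be pictured with a small data-structure sketch. The class names, the layer count, and the number of sub-blocks per logical block are assumptions made for illustration; the application does not fix these values here.

```python
from dataclasses import dataclass, field

NUM_LAYERS = 4            # assumed number of storage layers in the I/O stack
SUBBLOCKS_PER_BLOCK = 8   # assumed number of logical sub-blocks per logical block


@dataclass
class Subgroup:
    """Points to one logical sub-block; one character sub-block per layer."""
    sub_blocks: list = field(default_factory=lambda: [0] * NUM_LAYERS)


@dataclass
class CharacterBlock:
    """Points to one logical block; one subgroup per logical sub-block."""
    subgroups: list = field(
        default_factory=lambda: [Subgroup() for _ in range(SUBBLOCKS_PER_BLOCK)]
    )


@dataclass
class GlobalIndex:
    """Maps a logical block number to its character block."""
    blocks: dict = field(default_factory=dict)

    def subgroup(self, block_no, sub_no):
        # Create the character block lazily on first access.
        block = self.blocks.setdefault(block_no, CharacterBlock())
        return block.subgroups[sub_no]


index = GlobalIndex()
# Mark that the data of sub-block 5 in logical block 3 is present in layer 2.
index.subgroup(3, 5).sub_blocks[2] = 1
```

Here one character block covers exactly one logical block, which is the simplest case of "at least one logical block".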
In a possible implementation, a character sub-block in the global index can take various forms. For example, a character sub-block may be one bit whose value is 0 or 1. A value of 1 indicates that the target data is located in the storage layer corresponding to the character sub-block; a value of 0 indicates that the target data is not located in that storage layer. If multiple bits are non-zero, the target data is located in the highest of the storage layers corresponding to those non-zero bits.
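A minimal sketch of the bit form follows, under the reading that a bit of 1 marks presence in a layer and a bit of 0 marks absence. The function name `locate_layer` and the convention that position 0 is the highest layer are assumptions.

```python
def locate_layer(bits):
    """Return the index of the highest layer whose bit is 1, or None.

    Assumption: bits[0] corresponds to the highest layer of the I/O stack,
    so the first set bit found is the highest layer holding the data.
    """
    for layer, bit in enumerate(bits):
        if bit == 1:
            return layer
    return None
```

For example, with bits `[0, 1, 0, 1]` two layers hold a copy, and the lookup resolves to layer 1, the higher of the two.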
As another example, a character sub-block may be a counter whose value is 0 or a non-zero integer. A value of 0 indicates that no data has been written to the storage layer corresponding to the counter; a non-zero value gives the number of times data has been written to the storage layer corresponding to the counter. A non-zero counter among the counters in the global index that point to the logical address of the target data indicates that the target data is located in the storage layer corresponding to that counter. If multiple counters are non-zero, the target data is located in the highest of the storage layers corresponding to those non-zero counters.
With the above system, the character sub-blocks in the global index can take many forms; this flexibility allows the global index to be applied in different scenarios.
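The counter form can be sketched in a similar way. The layer count, the function names, and the convention that position 0 is the highest layer are again assumptions made for the example.

```python
NUM_LAYERS = 4  # assumed; position 0 is taken to be the highest layer


def record_write(counters, layer):
    """Count one more write of data into the given layer."""
    counters[layer] += 1


def locate_layer(counters):
    """Return the highest layer with a non-zero counter, or None."""
    for layer, count in enumerate(counters):
        if count != 0:
            return layer
    return None


counters = [0] * NUM_LAYERS
record_write(counters, 2)
record_write(counters, 2)
record_write(counters, 1)
```

After these three writes the counters read `[0, 1, 2, 0]`, and the lookup resolves to layer 1 because it is the highest layer with a non-zero counter.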
In a possible implementation, when determining, according to the logical address of the target data, the multiple character sub-blocks in the global index that point to that logical address, the processing unit may perform a hash operation on the logical address of the target data, such as looking it up in a hash table or applying a hash function, and determine the multiple character sub-blocks pointing to the logical address of the target data according to the result of the hash operation.
With the above system, the hash operation is simple and fast, so the multiple character sub-blocks pointing to the logical address of the target data can be determined quickly, which keeps data reads efficient.
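One way to picture the hash step is to hash the logical block number into a slot of a fixed-size table of character blocks. This is a sketch only: the block size, the table size, the choice of BLAKE2b, and the name `char_block_slot` are assumptions, and the application does not specify a particular hash function.

```python
import hashlib

BLOCK_SIZE = 1 << 20      # assumed logical block size: 1 MiB
NUM_CHAR_BLOCKS = 1024    # assumed number of character-block slots in the index


def char_block_slot(lba):
    """Hash the logical block number of lba into a character-block slot."""
    block_no = lba // BLOCK_SIZE
    digest = hashlib.blake2b(
        block_no.to_bytes(8, "little"), digest_size=8
    ).digest()
    return int.from_bytes(digest, "little") % NUM_CHAR_BLOCKS
```

Because only the logical block number is hashed, every address inside the same logical block maps to the same character block, which matches the two-step lookup of character block first and character sub-blocks second.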
In a possible implementation, when determining the multiple character sub-blocks pointing to the logical address of the target data according to the result of the hash operation, the processing unit may first determine the character block in the global index that points to the logical block to which the logical address of the target data belongs, and then determine, from that character block and according to the logical address of the target data, the multiple character sub-blocks pointing to that logical address.
With the above system, the processing unit can first locate the character block that points to the larger logical block, and then locate within it the character sub-blocks that point to the smaller logical sub-block. Locating the coarse-grained character block first and the fine-grained character sub-blocks second makes locating the character sub-blocks more efficient.
In a possible implementation, when determining, from the character block and according to the logical address of the target data, the multiple character sub-blocks pointing to that logical address, the processing unit may determine a target subgroup among the multiple subgroups in the character block according to the offset between the logical address of the target data and the logical block to which that logical address belongs; the character sub-blocks in the target subgroup are the multiple character sub-blocks pointing to the logical address of the target data. That is, the processing unit may determine, according to the offset between the start address of the logical block pointed to by the character block and the logical address of the target data, together with the data length of the target data, the subgroups in the global index that point to the logical sub-blocks indicated by the logical address (that is, the target subgroups). The character sub-blocks in these subgroups are the character sub-blocks in the global index that point to the logical address.
With the above system, the processing unit can use the offset of the logical address within the logical block, together with the data length, to determine fairly precisely the character sub-blocks in the global index that point to the logical address.
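The offset-and-length calculation can be worked through in a short sketch. The block and sub-block sizes and the name `target_subgroups` are assumptions; the sketch also assumes a length of at least one byte and clips a request at the end of its logical block.

```python
BLOCK_SIZE = 1 << 20         # assumed logical block size: 1 MiB
SUB_BLOCK_SIZE = 128 << 10   # assumed logical sub-block size: 128 KiB


def target_subgroups(lba, length):
    """Return the subgroup indices covering [lba, lba + length) in its block."""
    block_start = (lba // BLOCK_SIZE) * BLOCK_SIZE
    offset = lba - block_start                # offset inside the logical block
    first = offset // SUB_BLOCK_SIZE
    # Clip at the end of the logical block: this sketch handles one block only.
    last_byte = min(offset + length - 1, BLOCK_SIZE - 1)
    last = last_byte // SUB_BLOCK_SIZE
    return list(range(first, last + 1))
```

For instance, a 200 KiB read that starts 130 KiB into a logical block spans bytes 130 KiB to 330 KiB of that block, so it touches sub-blocks 1 and 2 and the target subgroups are `[1, 2]`.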
In a possible implementation, the processing unit may be located in a network card of a device in the storage system or in a processor of a device in the storage system; the processor may also be a data processor, or may be an independent hardware component in the storage system.
With the above system, the network card or the processor of a device in the storage system can be given the function of processing data read requests, which effectively broadens the application scenarios.
In a possible implementation, when the processing unit is a network card and the global index and the metadata of the target data are located in the memory of a device in the storage system, the client device can obtain the global index and the metadata of the target data from the storage system by one-sided RDMA.
The processing unit may feed back the global index and the metadata of the target data to the client device under a first indication of the client device, the first indication being transmitted based on RDMA.
The global index fed back by the processing unit to the client device may be the entire global index or only part of it, for example only all or some of the character sub-blocks in the global index that point to the logical address of the target data.
The storage system may notify the client device in advance of the address of the global index in the memory of the storage system (that is, the memory address).
With the above system, the client device can read the global index and the metadata of the target data by one-sided RDMA without involving the processor in the storage system, which improves the efficiency of data interaction.
In a possible implementation, when the processing unit is a network card, the global index is located in the memory of a device in the storage system, and the metadata of the target data is located in the persistent storage of a device in the storage system, the client device can obtain the global index from the storage system by one-sided RDMA and obtain the metadata of the target data from the storage system by direct access.
The processing unit may feed back the global index to the client device under a second indication of the client device, the second indication being transmitted based on RDMA; the processing unit may also obtain the metadata of the target data from the persistent storage under a third indication of the client device and feed back the metadata of the target data to the client device.
The storage system may notify the client device in advance of the address of the global index in the memory of the storage system (that is, the memory address).
With the above system, the client device can read the global index by one-sided RDMA and obtain the metadata of the target data by direct access without involving the processor in the storage system, which improves the efficiency of data interaction.
In a possible implementation, the metadata of the target data indicates that the target data is located in the memory of a device in the storage system (the memory of a device in the storage system may belong to one or more layers of the I/O stack of the storage system). The client device can use the obtained global index to determine whether the metadata of the target data is valid, that is, whether the storage layer indicated by the global index is consistent with the location indicated by the metadata of the target data. If they are consistent, the metadata is valid, and the client device can obtain the target data from the storage system by one-sided RDMA. The processing unit may feed back the target data to the client device under a fourth indication of the client device; the fourth indication is initiated according to the metadata of the target data and is transmitted based on RDMA.
With the above system, the storage system allows the client device to obtain the global index by one-sided RDMA and to obtain the target data by one-sided RDMA, which effectively simplifies the interaction flow between the storage system and the client device; the processor in the storage system does not need to take part, which also reduces the load on that processor.
In a possible implementation, the target data is located in the persistent storage of a device in the storage system (the persistent storage of a device in the storage system may belong to one or more layers of the I/O stack of the storage system). The client device can use the obtained global index to determine whether the metadata of the target data is valid, that is, whether the storage layer indicated by the global index is consistent with the location indicated by the metadata of the target data. If they are consistent, the metadata is valid, and the client device can obtain the target data from the storage system by direct access. The processing unit may obtain the target data from the persistent storage under a fifth indication of the client device and feed back the target data to the client device; the fifth indication is initiated according to the metadata of the target data.
With the above system, the storage system allows the client device to obtain the global index by one-sided RDMA and to obtain the target data by direct access, so that the client device can obtain the target data directly without going through the processor in the storage system.
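The client-side validity check described in the two implementations above can be sketched as a small decision function. The `Metadata` fields, the function name `choose_read_path`, and the returned labels are hypothetical, and no real RDMA call is made; the sketch only shows how the layer from the global index is compared with the location recorded in the metadata.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    layer: int     # storage layer recorded in the metadata of the target data
    address: int   # hypothetical location of the data inside that layer


def choose_read_path(index_layer, meta, memory_layers):
    """Decide how a client would fetch the data, given index and metadata.

    index_layer: layer indicated by the global index (None if no layer is set).
    memory_layers: set of layers assumed to reside in device memory.
    """
    if index_layer is None or meta.layer != index_layer:
        # Index and metadata disagree: the metadata is treated as invalid.
        return "metadata_invalid"
    if index_layer in memory_layers:
        return "one_sided_rdma"
    return "direct_access"
```

What the client does when the metadata turns out to be invalid is not covered by this passage, so the sketch only reports that outcome.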
In a possible implementation, the processing unit can also control data flow in the I/O stack (data flows out of one storage layer and into another) and data eviction (data in a storage layer is deleted), and can update the global index according to the data flow and data eviction in the I/O stack. The processing unit may be a network card or a processor.
With the above system, the processing unit can update the global index according to data flow and data eviction in the I/O stack, so that the global index accurately indicates the storage layer where each piece of data is located, which keeps the data read process accurate and effective.
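How data flow and eviction might be mirrored in the index can be sketched with per-layer counters of one subgroup. Decrementing a counter when data leaves a layer is an assumption of this sketch, as are the function names; the passage above states only that the index is updated, not how.

```python
def on_flow(counters, src, dst):
    """Data moved from layer src to layer dst: adjust both counters."""
    if counters[src] == 0:
        raise ValueError("no data recorded in the source layer")
    counters[src] -= 1   # assumed: leaving a layer decrements its counter
    counters[dst] += 1


def on_evict(counters, layer):
    """Data deleted from a layer: adjust that layer's counter."""
    if counters[layer] == 0:
        raise ValueError("no data recorded in this layer")
    counters[layer] -= 1


counters = [1, 0, 0]        # data currently recorded in layer 0 only
on_flow(counters, 0, 2)     # the data is demoted to layer 2
after_flow = list(counters)
on_evict(counters, 2)       # the data is then evicted from layer 2
```

After the flow the counters read `[0, 0, 1]`, and after the eviction they read `[0, 0, 0]`, so a later lookup would no longer resolve to any layer.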
In a second aspect, an embodiment of the present application provides a data access method. The method may be performed by a processing unit in a storage system; for its beneficial effects, refer to the description of the first aspect and any possible implementation of the first aspect, which is not repeated here. The storage system further includes an I/O stack; for a description of the I/O stack, refer to the foregoing content, which is not repeated here. In the method, the processing unit may receive a data read request, which requests reading of target data stored in the storage system. After receiving the data read request, the processing unit may query a global index based on the data read request, the global index indicating the storage layer in the I/O stack where the target data is located; the processing unit may then read the target data according to the storage layer indicated by the global index.
In a possible implementation, the processing unit can also process data write requests. The processing unit may receive a data write request, which requests writing of target data into the storage system. The processing unit may then write the target data into the storage system according to the data write request, and may also update the global index according to the target data, the updated global index indicating the storage layer in the I/O stack where the target data is located.
In a possible implementation, the data read request includes the logical address of the target data. When querying the global index based on the data read request, the processing unit may determine, according to the logical address of the target data, the multiple character sub-blocks in the global index that point to that logical address, and then determine the storage layer in the I/O stack where the target data is located according to the values of those character sub-blocks.
在一种可能的实现方式中,全局索引中字符子块的表现形式有多种。例如,一个字符子块可以为一个比特,该比特的取值可以为0,也可以为1。当该比特的取值为1时,表示目标数据位于字符子块对应的存储层中。当该比特的取值为0时,表示目标数据位于字符子块对应的存储层中。In a possible implementation manner, the character sub-blocks in the global index have multiple representation forms. For example, a character sub-block may be a bit, and the value of the bit may be 0 or 1. When the value of this bit is 1, it means that the target data is located in the storage layer corresponding to the character sub-block. When the value of this bit is 0, it means that the target data is located in the storage layer corresponding to the character sub-block.
As another example, a character sub-block may be a counter whose value is 0 or a non-zero integer. A value of 0 indicates that no data has been written to the storage layer corresponding to the counter. A non-zero value indicates that the target data is located in the corresponding storage layer, and may also indicate the number of times data (including the target data) has been written to that storage layer.
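The two sub-block forms can be compared in a short sketch. It assumes one sub-block per storage layer for a given logical address, with list position standing for the layer number. The function names are hypothetical, and decrementing a counter on eviction is an assumption borrowed from counting filters rather than something the text states.

```python
from typing import List


def layers_from_bits(bits: List[int]) -> List[int]:
    """Return the layers whose bit is 1, i.e. layers that hold the data."""
    return [layer for layer, bit in enumerate(bits) if bit == 1]


def layers_from_counters(counters: List[int]) -> List[int]:
    """Return the layers whose counter is non-zero."""
    return [layer for layer, count in enumerate(counters) if count != 0]


def record_write(counters: List[int], layer: int) -> None:
    """Count one more write into the given layer."""
    counters[layer] += 1


def record_eviction(counters: List[int], layer: int) -> None:
    """Assumed behaviour: undo one write when data leaves the layer."""
    if counters[layer] > 0:
        counters[layer] -= 1
```

A counter can record several writes that map to the same sub-block and still be decremented safely, which a single bit cannot do.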
In a possible implementation, when determining, according to the logical address of the target data, the multiple character sub-blocks in the global index that point to that logical address, the processing unit may perform a hash operation on the logical address, such as looking up a hash table or applying a hash function, and determine the character sub-blocks from the result of the hash operation.
In a possible implementation, when determining the character sub-blocks from the result of the hash operation, the processing unit may first determine the character block in the global index that points to the logical block to which the logical address of the target data belongs, and then, according to the logical address, determine within that character block the multiple character sub-blocks that point to the logical address.
In a possible implementation, when determining, from the character block, the character sub-blocks that point to the logical address of the target data, the processing unit may select a target subgroup among the multiple subgroups in the character block according to the offset between the logical address of the target data and the logical block to which that address belongs; the character sub-blocks in the target subgroup are the ones that point to the logical address. In other words, the processing unit may use the offset between the start address of the logical block pointed to by the character block and the logical address of the target data, together with the data length of the target data, to determine the subgroups in the global index (the target subgroups) that point to the logical sub-blocks indicated by the logical address. The character sub-blocks in these subgroups are the character sub-blocks in the global index that point to the logical address.
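The two-step lookup above (hash to a character block, then offset and length to subgroups) can be sketched as follows. The block sizes, the number of character blocks, the use of CRC32 as the hash function, and the name `locate_subgroups` are assumptions chosen only to make the arithmetic concrete.

```python
import zlib
from typing import List, Tuple

LOGICAL_BLOCK_SIZE = 64 * 1024    # assumed size of one logical block
LOGICAL_SUBBLOCK_SIZE = 4 * 1024  # assumed size of one logical sub-block
NUM_CHARACTER_BLOCKS = 1024       # assumed number of character blocks


def locate_subgroups(logical_address: int, length: int) -> Tuple[int, List[int]]:
    """Return (character block id, subgroup ids) covering the request.

    Assumes length >= 1 and that the request stays inside one logical block.
    """
    block_start = (logical_address // LOGICAL_BLOCK_SIZE) * LOGICAL_BLOCK_SIZE
    # Hash operation on the logical block selects the character block.
    key = block_start.to_bytes(8, "little")
    character_block = zlib.crc32(key) % NUM_CHARACTER_BLOCKS
    # Offset inside the logical block plus the data length selects subgroups.
    offset = logical_address - block_start
    first = offset // LOGICAL_SUBBLOCK_SIZE
    last = (offset + length - 1) // LOGICAL_SUBBLOCK_SIZE
    return character_block, list(range(first, last + 1))
```

With these assumed sizes, an 8 KiB read starting 8 KiB into a logical block maps to subgroups 2 and 3 of that block's character block.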
In a possible implementation, when the processing unit is a network card and both the global index and the metadata of the target data are located in the memory of a device in the storage system, the client device can obtain the global index and the metadata of the target data from the storage system through one-sided (unilateral) RDMA.
The processing unit may return the global index and the metadata of the target data to the client device in response to a first instruction from the client device, where the first instruction is transmitted over RDMA.
The global index that the processing unit returns to the client device may be the entire global index or only part of it, for example all or some of the character sub-blocks in the global index that point to the logical address of the target data.
In a possible implementation, the global index is located in the memory of a device in the storage system, and the metadata of the target data is located in the persistent storage of a device in the storage system. The processing unit may return the global index to the client device in response to a second instruction from the client device, where the second instruction is transmitted over RDMA.
The processing unit may also, in response to a third instruction from the client device, obtain the metadata of the target data from the persistent storage and return it to the client device.
In a possible implementation, when the processing unit is a network card, the global index is located in the memory of a device in the storage system, and the metadata of the target data is located in the persistent storage of a device in the storage system, the client device can obtain the global index from the storage system through one-sided RDMA and obtain the metadata of the target data from the storage system through pass-through access.
The processing unit may return the global index to the client device in response to a second instruction from the client device, where the second instruction is transmitted over RDMA. It may also, in response to a third instruction from the client device, obtain the metadata of the target data from the persistent storage and return it to the client device.
In a possible implementation, the metadata of the target data indicates that the target data is located in the memory of a device in the storage system (that memory may belong to one or more layers of the storage system's I/O stack). The client device may use the obtained global index to determine whether the metadata of the target data is valid, that is, whether the storage layer indicated by the global index is consistent with the location indicated by the metadata. If they are consistent, the metadata is valid, and the client device can obtain the target data from the storage system through one-sided RDMA. The processing unit may return the target data to the client device in response to a fourth instruction from the client device, where the fourth instruction is initiated according to the metadata of the target data and transmitted over RDMA.
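The consistency check performed by the client device can be illustrated with a small sketch. The `Metadata` fields and the rule "the index bit for the metadata's layer must be 1" are assumptions; the text only requires that the layer indicated by the global index agree with the location recorded in the metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metadata:
    layer: int           # I/O stack layer recorded in the metadata (assumed field)
    memory_address: int  # location of the data in device memory (assumed field)


def metadata_is_valid(index_bits: List[int], meta: Metadata) -> bool:
    """Treat metadata as valid if the index marks its layer as holding the data."""
    in_range = 0 <= meta.layer < len(index_bits)
    return in_range and index_bits[meta.layer] == 1
```

Only when this check passes would the client go on to issue the RDMA read that uses `memory_address`.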
In a possible implementation, the processing unit may also control data flow and data eviction in the I/O stack, and update the global index according to that data flow and data eviction.
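Keeping the global index in step with data movement can be sketched as below. The sketch assumes per-address lists of bits (one bit per layer) and models each layer as a dict; the helper names `flush` and `evict` are hypothetical.

```python
from typing import Dict, List


def flush(layers: List[Dict[int, bytes]], index: Dict[int, List[int]],
          address: int, src: int, dst: int) -> None:
    """Move data from layer src to layer dst and update the index bits."""
    layers[dst][address] = layers[src].pop(address)
    index[address][src] = 0
    index[address][dst] = 1


def evict(layers: List[Dict[int, bytes]], index: Dict[int, List[int]],
          address: int, layer: int) -> None:
    """Drop data from a layer and clear the matching index bit."""
    layers[layer].pop(address, None)
    index[address][layer] = 0
```

Because the index is changed in the same step as the data, a later lookup never points at a layer the data has already left.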
In a third aspect, an embodiment of this application further provides a network card, which may be a network card on a device in a storage system. The network card has the function of implementing the behavior in the method examples of the second aspect and of each possible implementation of the second aspect. For the beneficial effects, see the description of the first aspect; they are not repeated here.
In a fourth aspect, an embodiment of this application further provides a processor, which may be a processor on a device in a storage system. The processor has the function of implementing the behavior in the method examples of the second aspect and of each possible implementation of the second aspect. For the beneficial effects, see the description of the first aspect; they are not repeated here.
In a fifth aspect, an embodiment of this application further provides a data access apparatus that has the function of implementing the behavior in the method examples of the second aspect. For the beneficial effects, see the description of the first aspect; they are not repeated here. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. In a possible design, the apparatus includes a transmission module and a reading module and, optionally, a writing module and a control module. These modules can perform the corresponding functions in the method examples of the second aspect; see the detailed description in the method examples, which is not repeated here.
In a sixth aspect, an embodiment of this application provides a data access system that includes a storage system and a client device. The storage system includes an I/O stack and a processing unit. The I/O stack is described above and not repeated here.
The client device may send a data read request to the storage system, where the data read request requests target data stored in the storage system. After receiving the data read request, the processing unit in the storage system may query the global index based on the request; the global index indicates the storage layer of the I/O stack in which the target data is located. The processing unit then reads the target data from the storage layer indicated by the global index and returns the target data to the client device.
In a possible implementation, the client device may also send a data write request to the storage system, where the data write request requests that target data be written into the storage system. After receiving the data write request, the processing unit may write the target data according to the request and may also update the global index according to the target data, so that the updated global index indicates the storage layer of the I/O stack in which the target data is located.
In a possible implementation, the data read request includes the logical address of the target data. When querying the global index based on the data read request, the processing unit may use the logical address to determine the multiple character sub-blocks in the global index that point to that logical address, and then determine, from the values of those character sub-blocks, the storage layer of the I/O stack in which the target data is located.
In a possible implementation, a character sub-block is a single bit whose value is 0 or 1. A value of 1 indicates that the target data is located in the storage layer corresponding to the character sub-block; a value of 0 indicates that the target data is not located in that storage layer.
In a possible implementation, a character sub-block is a counter whose value is 0 or a non-zero integer; a non-zero integer indicates the number of times data has been written to the storage layer corresponding to the character sub-block.
In a possible implementation, the processing unit is a processing chip with computing capability, such as a data processor. It may be located in the network card of the storage system or in the central processing unit, or it may be an independent hardware component inside the storage system.
In a possible implementation, when determining, according to the logical address of the target data, the multiple character sub-blocks in the global index that point to that logical address, the processing unit may perform a hash operation on the logical address, such as looking up a hash table or applying a hash function, and determine the character sub-blocks from the result of the hash operation.
In a possible implementation, when determining the character sub-blocks from the result of the hash operation, the processing unit may first determine the character block in the global index that points to the logical block to which the logical address of the target data belongs, and then, according to the logical address, determine within that character block the multiple character sub-blocks that point to the logical address.
In a possible implementation, when determining, from the character block, the character sub-blocks that point to the logical address of the target data, the processing unit may select a target subgroup among the multiple subgroups in the character block according to the offset between the logical address of the target data and the logical block to which that address belongs; the character sub-blocks in the target subgroup are the ones that point to the logical address.
In a possible implementation, the global index and the metadata of the target data are both located in the memory of a device in the system. In this case, the client device may send a first instruction to the storage system over RDMA, where the first instruction requests the global index and the metadata of the target data. The processing unit may return the global index and the metadata of the target data to the client device in response to the first instruction.
It should be noted that the client device may request the entire global index or only part of it. For example, the client device may use the logical address of the target data and the previously obtained memory address of the global index to determine the memory addresses of some or all of the character sub-blocks in the global index that point to the logical address, and then send the first instruction to the storage system according to those memory addresses to request those character sub-blocks.
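The address computation that lets a client fetch only part of the global index can be sketched as follows. The memory layout is an assumption: character blocks are stored contiguously from a base address, each holding a fixed number of equally sized subgroups. `SUBGROUPS_PER_BLOCK`, `SUBGROUP_BYTES`, and `subgroup_read_range` are hypothetical names.

```python
from typing import Tuple

SUBGROUPS_PER_BLOCK = 16  # assumed number of subgroups in one character block
SUBGROUP_BYTES = 1        # assumed size of one subgroup (up to 8 layer bits)


def subgroup_read_range(index_base_address: int, character_block: int,
                        first_subgroup: int, subgroup_count: int) -> Tuple[int, int]:
    """Return (remote address, length in bytes) for a one-sided RDMA read."""
    slot = character_block * SUBGROUPS_PER_BLOCK + first_subgroup
    address = index_base_address + slot * SUBGROUP_BYTES
    return address, subgroup_count * SUBGROUP_BYTES
```

For example, under this layout, subgroups 3 and 4 of character block 2 start 35 bytes past the index base and span 2 bytes.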
In a possible implementation, the processing unit may notify the client device of the memory address of the global index in the storage system.
In a possible implementation, the global index is located in the memory of a device in the system, and the metadata of the target data is located in the persistent storage of a device in the system. In this case, the client device may send a second instruction to the storage system over RDMA, where the second instruction requests the global index; it may also send a third instruction to the storage system, where the third instruction requests the metadata of the target data. The processing unit may return the global index to the client device in response to the second instruction and, in response to the third instruction, obtain the metadata of the target data from the persistent storage and return it to the client device.
It should be noted that the client device may request the entire global index or only part of it. For example, the client device may use the logical address of the target data and the previously obtained memory address of the global index to determine the memory addresses of some or all of the character sub-blocks in the global index that point to the logical address, and then send the second instruction to the storage system according to those memory addresses to request those character sub-blocks.
In a possible implementation, the target data is located in the memory of a device in the system. In this case, the client device may check the validity of the metadata of the target data against the global index and, if the metadata is valid, send a fourth instruction to the storage system according to the metadata, where the fourth instruction is transmitted over RDMA. The processing unit may then return the target data to the client device in response to the fourth instruction.
In a possible implementation, the target data is located in the persistent storage of a device in the system. In this case, the client device may check the validity of the metadata of the target data against the global index and, if the metadata is valid, send a fifth instruction to the storage system according to the metadata. The processing unit may, in response to the fifth instruction, obtain the target data from the persistent storage and return it to the client device.
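The client-side choice between the fourth and fifth instructions can be summarised in a small sketch. The `Medium` enum, the returned labels, and in particular the fallback to an ordinary data read request when the metadata is invalid are assumptions; the text here does not say what the client does in that case.

```python
from enum import Enum


class Medium(Enum):
    MEMORY = "memory"
    PERSISTENT = "persistent"


def choose_instruction(metadata_valid: bool, medium: Medium) -> str:
    """Pick how the client fetches the target data."""
    if not metadata_valid:
        # Assumed fallback: send a normal data read request instead.
        return "data_read_request"
    if medium is Medium.MEMORY:
        return "fourth_instruction_rdma"
    return "fifth_instruction_pass_through"
```

In both valid cases the decision uses only information the client already holds, namely the global index and the metadata.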
In a possible implementation, the processing unit may also control data flow and data eviction in the I/O stack, and update the global index according to that data flow and data eviction.
In a seventh aspect, this application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method of the second aspect or of any possible implementation of the second aspect.
In an eighth aspect, this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the second aspect or of any possible implementation of the second aspect.
In a ninth aspect, this application further provides a computer chip. The chip is connected to a memory and is configured to read and execute a software program stored in the memory, so as to perform the method of the second aspect or of any possible implementation of the second aspect.
Description of Drawings
FIG. 1 is a schematic architectural diagram of a system provided by this application;
FIG. 2 is a schematic structural diagram of an I/O stack provided by this application;
FIG. 3A and FIG. 3B are schematic diagrams of a global index provided by this application;
FIG. 4 and FIG. 5 are schematic diagrams of a data access method provided by this application;
FIG. 6 and FIG. 7 are schematic diagrams of another data access method provided by this application;
FIG. 8 is a schematic structural diagram of a data processing apparatus provided by this application.
Detailed Description
Before the data processing method provided by this application is described, the concepts involved in this application are explained:
1. Metadata.
Metadata, also called intermediary data or relay data, is data that describes data (data about data). Metadata can indicate attributes of data; for example, it can record the physical address of the data and information about modifications to the data.
2. Remote direct memory access (RDMA).
RDMA is a technology for accessing data in the memory of a remote device (such as a storage device) while bypassing the operating system kernel of that device. Because the operating system is not involved, RDMA saves a large amount of processor resources, increases system throughput, and reduces network communication latency. It is particularly well suited to large-scale parallel computer clusters.
RDMA has several main characteristics: (1) data is transferred to and from the remote device over the network; (2) the operating system kernel is not involved, and everything related to sending and transmission is offloaded to the smart network card; (3) data is transferred directly between user-space virtual memory and the smart network card without involving the operating system kernel, so there is no additional data movement or copying.
3. One-sided (unilateral) RDMA and two-sided (bilateral) RDMA.
Here, the two ends that need to exchange information are called the client device (client for short) and the server (in the embodiments of this application, the server can be understood as a storage device). The client is deployed on the user side, and a user can send requests to the server through the client. The server may be deployed remotely; it refers generally to the storage system and, more specifically, to a device in the storage system.
One-sided RDMA comprises RDMA read (READ) and RDMA write (WRITE).
Take RDMA READ in one-sided RDMA as an example. The client can directly determine the location of the target data in the server's memory, so the message that the client sends to the server to request the data carries the location information of the target data. On the server side, the network card reads the data at that location. Throughout this process, the server-side processor is unaware of the client's operations; in other words, it does not know that the client has performed a read. This reduces the processor's involvement in data transfer and improves the system's service-processing performance, giving high bandwidth, low latency, and low CPU usage.
In the embodiments of this application, the client device may read subgroups of the global index from the server through one-sided RDMA, and may also read from the server, through one-sided RDMA, the metadata for the target data held in a given layer of the I/O stack.
Two-sided RDMA comprises RDMA send (SEND) and RDMA receive (RECEIVE).
Take RDMA RECEIVE in two-sided RDMA as an example. The client does not know where the metadata of the target data is stored in the server's memory, so the message that the client sends to request the data does not carry the location information of the metadata. After the server receives the message, the server-side processor looks up the location information of the metadata and returns it to the client. The client then sends the server another message requesting the data, this time containing the location information of the metadata (that is, the address of the metadata). The server's network card obtains the metadata according to that location information, then obtains the target data and sends it to the client. This process requires the server-side processor to take part: two-sided RDMA needs the server-side processor to handle messages from the client. Compared with two-sided RDMA, one-sided RDMA therefore reads data in less time, uses less of the processor, and gives a better user experience, which is why one-sided RDMA is increasingly widely used.
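The difference in server-side processor involvement can be made concrete with a toy simulation. It is not an RDMA implementation and uses no real RDMA library; the `Server` class, its fields, and the simplification that the second step returns the looked-up item directly are assumptions for illustration.

```python
from typing import Dict


class Server:
    """Toy server: memory served by the NIC, lookups served by the CPU."""

    def __init__(self, memory: Dict[int, bytes], locations: Dict[str, int]) -> None:
        self.memory = dict(memory)        # address -> stored bytes
        self.locations = dict(locations)  # key -> address, known only to the CPU
        self.cpu_invocations = 0

    def nic_read(self, address: int) -> bytes:
        # Served by the network card alone; the CPU counter is untouched.
        return self.memory[address]

    def cpu_lookup(self, key: str) -> int:
        # The server-side processor must handle this message.
        self.cpu_invocations += 1
        return self.locations[key]


def one_sided_read(server: Server, address: int) -> bytes:
    """Client already knows the address: one NIC-only operation."""
    return server.nic_read(address)


def two_sided_read(server: Server, key: str) -> bytes:
    """Client must first ask the server CPU where the item lives."""
    address = server.cpu_lookup(key)
    return server.nic_read(address)
```

Running both paths against the same server shows the CPU counter staying at zero for the one-sided read and rising by one for the two-sided read.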
4. Pass-through access.
Pass-through access is a way of reading data from or writing data to the server's persistent storage (such as a hard disk) without going through the server's processor. With pass-through access, the client can determine where the target data is located in the server's persistent storage, and can communicate with the controller in the server's hard disk through the server's network card in order to read data from or write data to that hard disk.
In the embodiments of this application, when the metadata (or index) of the data is stored on the server's hard disk, the client may read from the server, through pass-through access, the metadata for that data held in a given layer of the I/O stack. When the data of a given layer of the I/O stack is stored in the server's persistent storage, the client may also read from the server, through pass-through access, the target data held in that layer of the I/O stack.
FIG. 1 is a schematic structural diagram of a data access system provided by an embodiment of this application. The system includes a client device 200 and a storage system 100. The storage system 100 includes multiple storage devices; FIG. 1 shows only one storage device 110 of the storage system as an example.
Users access data through applications. A computer that runs these applications is called a "client device 200". The client device 200 may be a physical machine or a virtual machine. Physical client devices 200 include, but are not limited to, desktop computers, servers, laptop computers, and mobile devices.
A user may use the client device 200 to send a data access request, such as a data write request or a data read request, to the storage device 110 in the storage system 100. The storage device 110 receives and processes the data access request.
The following description takes as an example the storage device 110 processing a data access request and performing the data access method provided by the embodiments of this application. For example, the data access request is a data write request that requests target data be written into the storage system 100. The data write request includes the target data and the logical address of the target data. After receiving the data write request, the storage device 110 may first write the target data to the location indicated by its logical address, which may be in one layer of the I/O stack, and then update the global index so that the updated global index indicates the layer of the I/O stack in which the target data is located. The I/O stack is the layered structure formed by dividing the storage media in the storage system into layers; each layer of the I/O stack includes one or more storage media and can be used to store data. The global index associates the logical address of data with the layer in which the data is located, and thereby indicates the layer of the I/O stack that holds the data. The I/O stack and the global index are described further below.
又例如,数据访问请求为数据读取请求,用于请求从存储系统100中读取目标数据,该数据读取请求中携带有该目标数据的逻辑地址。存储设备110接收到该数据读取请求后,可以根据全局索引以及该目标数据的逻辑地址确定I/O栈中该目标数据的所在的层,进而从该目标数据的所在的层读取该目标数据。For another example, the data access request is a data read request, which is used to request to read target data from the storage system 100, and the data read request carries a logical address of the target data. After receiving the data read request, the storage device 110 can determine the layer where the target data is located in the I/O stack according to the global index and the logical address of the target data, and then read the target data from the layer where the target data is located. data.
As for the structure of the storage device 110, referring to FIG. 1, the storage device 110 includes a bus 111, a processor 112, a memory 113, a network card 114, and a hard disk 115. The memory 113 may be located in the processor 112.
It should be noted that the embodiments of the present application are described using a hard disk as an example of the persistent storage of the storage device; the embodiments apply equally to mechanical hard disks and to other types of hard disks.
The processor 112 may be a central processing unit (CPU). The processor 112 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, an artificial intelligence chip, a system on chip, or the like.
The memory 113 may include volatile memory, such as random access memory (RAM) or dynamic random access memory (DRAM). It may also be non-volatile memory, such as storage-class memory (SCM), or a combination of volatile and non-volatile memory. The memory 113 can be divided from a functional point of view, for example into a write cache and a read cache. The write cache is a cache that provides efficient write capability, and the read cache is a cache that stores frequently read data.
The storage device 110 may also include one or more hard disks 115. The hard disk 115 can be used to store data persistently. Internally, the hard disk 115 may include a hard disk cache and a persistent storage medium.
Inside the storage device 110, the data access method provided in the embodiments of the present application may be executed by the processor 112; that is, the processor 112 may execute the method by invoking computer-executable instructions. The data access method may also be executed by the network card 114. For example, the network card 114 may execute the data access method by invoking computer-executable instructions stored in the memory 113. For another example, the network card 114 may invoke computer-executable instructions stored inside the network card 114 itself. For another example, in some possible scenarios, the instructions may be burned into the network card 114, and the network card 114 may execute the data access method on that basis.
The embodiments of the present application do not limit the type of the storage system. In practice, the storage system in FIG. 1 may be either a centralized storage system or a distributed storage system.
For the storage media in a storage device of the storage system (such as the read cache, the write cache, and the hard disk), the concept of storage medium layering is introduced. Storage medium layering means dividing the storage media into multiple layers from top to bottom (from high to low) according to criteria such as the function or type of each storage medium in the storage device. The embodiments of the present application do not limit the criterion used for the division. For example, the storage media may be divided by type only, with storage media of the same type forming one layer. For another example, the storage media may be divided by function only. For another example, both the function and the type of the storage media may be considered together. In each case the result is multiple layers from top to bottom (from high to low).
As shown in FIG. 2, in the embodiments of the present application, layering the storage media in the storage system forms the input/output (I/O) stack of the storage system. At a coarse granularity, the I/O stack is divided from top to bottom into a performance layer and a capacity layer. The read and write performance of the performance layer is better than that of the capacity layer, but the capacity of the performance layer is relatively small.
Within the performance layer, a finer division from top to bottom yields the write cache (Wcache), the read cache (Rcache), and the hard disk cache (smart cache). The write cache, the read cache, and the hard disk cache are each one layer of the I/O stack.
It should be noted that the caches listed for the performance layer are only an example. A different layering criterion may lead to a different fine-grained division within the performance layer.
Within the capacity layer, a finer division from top to bottom yields a high-performance layer and a normal-performance layer. The high-performance layer may include higher-performance disks, such as solid-state drives (SSD). The normal-performance layer may include disks with ordinary performance, such as mechanical hard disk drives (HDD).
It should be noted that the layers listed for the capacity layer are also only an example. Different storage systems 100 may include different types of hard disks. For example, some storage systems 100 may include only one type of hard disk, in which case the capacity layer need not be divided further.
The arrangement of the layers in the I/O stack describes how data flows in the storage system 100: data usually flows through the layers of the I/O stack from top to bottom. Taking the I/O stack shown in FIG. 2 as an example, when data A is written into the storage system 100, it is written to the performance layer first. Within the performance layer, it is written to the write cache first. When the data in the write cache exceeds a threshold W, data in the write cache is migrated to the read cache, which frees storage space in the write cache for newly written data. When the data in the read cache exceeds a threshold R, data in the read cache is migrated further down, to the hard disk cache. If the data in the hard disk cache exceeds a threshold S, data in the hard disk cache flows to the capacity layer. Data from the hard disk cache is migrated first to the high-performance layer of the capacity layer; when the data in the high-performance layer reaches a threshold H, data in the high-performance layer is migrated to the normal-performance layer.
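The threshold-driven downward flow can be sketched as follows. The text does not say how the thresholds W, R, S, and H are measured or which data is chosen for migration, so this sketch assumes item counts as thresholds and oldest-first migration; the function name `demote` is hypothetical.

```python
def demote(layers, limits):
    """Move data down the stack until every layer is within its limit.

    layers: list of lists, top layer first; each inner list is oldest-first.
    limits: maximum item count per layer, for every layer except the last
            (assumed stand-ins for the thresholds W, R, S, H).
    """
    for i, limit in enumerate(limits):
        while len(layers[i]) > limit:
            # Assumption: the oldest item is the one migrated downward.
            layers[i + 1].append(layers[i].pop(0))
    return layers


# Write cache holds three items but may keep only one; overflow cascades down.
stack = demote([["a", "b", "c"], [], []], limits=[1, 1])
```

Because the loop processes layers from the top, an overflow in the write cache can cascade through the read cache in a single pass.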
Each layer of the I/O stack may include one or more storage media that can be used to store data. So that data can be found quickly within a layer, the data stored in that layer can be indexed; that is, an index can be created for the data when it is written into the layer. The index indicates the correspondence between the logical address of the data and the metadata of the data. The metadata of the data can be determined from the logical address of the data, and the physical address of the data can then be determined from the metadata.
The embodiments of the present application do not limit how data is indexed within each layer. For example, data can be indexed with a hash table: the index is stored in the layer in the form of a hash table that records the correspondence between the logical address of the data and its metadata. For another example, data can be indexed with a B+ tree or a linked list.
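A hash-table index for a single layer might look like the sketch below. The `Metadata` fields and the `LayerIndex` class are assumptions chosen for illustration; the text only states that the index maps a logical address to metadata from which the physical address can be determined.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    # Hypothetical metadata fields; the source does not enumerate them.
    physical_address: int
    length: int


class LayerIndex:
    """Per-layer index backed by a hash table (a Python dict)."""

    def __init__(self):
        self._table = {}

    def put(self, logical_address, meta):
        # Called when data is written into this layer.
        self._table[logical_address] = meta

    def physical_address(self, logical_address):
        # Logical address -> metadata -> physical address.
        meta = self._table.get(logical_address)
        return None if meta is None else meta.physical_address


index = LayerIndex()
index.put(0x2000, Metadata(physical_address=0x9F000, length=8192))
```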
Based on the I/O stack shown in FIG. 2, when the storage system 100 processes a data read request from the client device 200, it searches the storage media of each layer in turn, in the top-down order of the I/O stack, for the data requested by the data read request, until that data is found. After the requested data is found, it is returned to the client device 200.
Searching the layers one by one in the top-down order of the I/O stack is a common approach, but the lower the layer in which the data resides, the longer the search takes. This reduces the efficiency with which data read requests are processed.
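The cost of the conventional top-down search can be made concrete with a small sketch that counts how many layers are probed. The function name and the dict-per-layer model are assumptions for illustration only.

```python
def top_down_lookup(layers, logical_address):
    """Search each layer in order; return (data, number of layers probed)."""
    probes = 0
    for layer in layers:
        probes += 1
        if logical_address in layer:
            return layer[logical_address], probes
    return None, probes


# Four layers, top first; the requested data sits in the bottom layer.
layers = [{}, {}, {}, {8: b"cold data"}]
data, probes = top_down_lookup(layers, 8)
```

Here the lookup probes all four layers before it finds the data, whereas data held in the top layer would be found after a single probe.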
To improve the efficiency of processing data read requests, in the embodiments of the present application a global index (global mask) is set in the storage system 100. The global index indicates the layer of the I/O stack in which the data at a logical address is located. After the storage device 110 receives a data read request, it can determine the layer of the I/O stack in which the target data is located from the logical address of the target data carried in the request and the global index. The target data is then read from that layer according to the logical address. In this way, processing a data read request does not require searching the layers one by one in the top-down order of the I/O stack. The global index directly identifies the layer that holds the requested data, which substantially reduces lookup latency and improves the efficiency of processing data read requests.
The global index is therefore the key to simplifying the data read process. Its structure is described below. In the embodiments of the present application, the global index may exist in either or both of the following forms.
Before the global index is introduced, the division of the logical storage space is briefly explained, to make the forms of the global index easier to understand. In the embodiments of the present application, the physical storage space formed by the storage media in the storage system 100 can be mapped to a logical storage space, and the logical storage space is divided into multiple logical blocks of a set size. The logical blocks may all be the same size. For example, each logical block may be 256 kilobytes (KB); for another example, each logical block may be 1 megabyte (MB). Each logical block can be divided into multiple logical sub-blocks according to the minimum read/write unit of the I/O stack. The size of each logical sub-block may equal the minimum read/write unit of the I/O stack.
The minimum read/write unit of the I/O stack is the minimum amount of data written to the I/O stack in one operation during a data write, or the minimum amount of data read from the I/O stack in one operation during a data read. The minimum write size and the minimum read size are usually the same. For example, when 256KB of data is written to the storage system 100 and the minimum read/write unit of the I/O stack is 8KB, the data is written in 32 operations of 8KB each until all 256KB has been written. Similarly, when 256KB of data is read from the storage system 100 and the minimum read/write unit is 8KB, the data is read in 32 operations of 8KB each until all 256KB has been read.
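The count of 32 operations follows directly from dividing the data length by the minimum read/write unit. A short sketch, with the hypothetical helper `transfer_count` rounding up so that a partial final unit still costs one operation (an assumption; the text only covers the exact-multiple case):

```python
KB = 1024


def transfer_count(total_bytes, unit_bytes):
    """Number of minimum-unit operations needed to move total_bytes."""
    return (total_bytes + unit_bytes - 1) // unit_bytes  # ceiling division


ops = transfer_count(256 * KB, 8 * KB)
```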
The global index contains character blocks that point to logical blocks, and each character block contains character sub-blocks that point to logical sub-blocks. For any logical block, the value of the character block pointing to it indicates whether data is stored in that logical block and, if so, in which layer of the I/O stack the data is stored. Similarly, for any logical sub-block, the character sub-blocks pointing to it indicate whether data is stored in that logical sub-block and, if so, in which layer of the I/O stack the data is stored.
The global index includes multiple character blocks, and each character block points to at least one logical block. Each character block includes multiple subgroups, and each subgroup points to one logical sub-block of the at least one logical block. Each subgroup includes multiple character sub-blocks, and each character sub-block corresponds to one layer of the I/O stack. The value of a character sub-block indicates whether data is stored in the logical sub-block and whether that data is stored in the layer of the I/O stack that corresponds to the character sub-block.
The sizes of the character blocks and character sub-blocks differ between the forms of the global index. For the bitmap, the smallest character sub-block is one bit, each subgroup includes multiple bits corresponding to different layers of the I/O stack, and a character block consists of multiple subgroups pointing to different logical sub-blocks; a character block is thus a group of bits. For the counter group, the smallest character sub-block is one counter, and a counter usually requires multiple bits. Each subgroup includes multiple counters corresponding to different layers of the I/O stack, and a character block consists of multiple subgroups pointing to different logical sub-blocks; a character block is thus a group of counters.
Form 1: bitmap.
FIG. 3A is a schematic diagram of the bitmap. The bitmap includes multiple groups of bits, and each group of bits can point to at least one logical block. Each group of bits includes multiple subgroups (grains), and each subgroup includes multiple bits. Each subgroup points to one logical sub-block of the logical block, and different subgroups within a group of bits point to different logical sub-blocks. The number of bits in each subgroup may equal the total number of layers in the I/O stack, or the total number of layers minus one.
Each bit in a subgroup corresponds to one layer of the I/O stack, and different bits correspond to different layers. For a given bit in the subgroup, a value of 1 may indicate that the data in the logical sub-block is located in the corresponding layer, and a value of 0 may indicate that it is not. For example, the bits in a subgroup correspond to the layers of the I/O stack in top-down order: the first bit corresponds to the first layer (for example, the write cache), the second bit to the second layer (for example, the read cache), the third bit to the third layer (for example, the hard disk cache), and the fourth bit to the fourth layer (for example, the capacity layer).
When the first bit is 1, the data in the logical sub-block pointed to by the subgroup is in the write cache; when the first bit is 0, that data is not in the write cache. Likewise, when the second bit is 1, the data is in the read cache, and when it is 0, the data is not in the read cache. When the third bit is 1, the data is in the hard disk cache, and when it is 0, the data is not in the hard disk cache. When the fourth bit is 1 (the fourth bit is not shown in FIG. 2), the data is in the capacity layer, and when it is 0, the data is not in the capacity layer. In some scenarios, the bit corresponding to the last layer may be omitted; for example, the fourth bit may be left out.
The value of each bit in each subgroup of the bitmap changes as data is written, flows between layers, and is evicted.
Data flow refers to data moving into or out of a layer of the I/O stack, such as data being migrated from one layer to the layer below it, or data flowing out of one layer and into another layer. Data flow mainly occurs during flushing of the write cache, garbage collection in the capacity layer, data loading when a read request is executed, data prefetching (such as writing frequently accessed data into the read cache), and data migration under the dynamic storage tiering feature.
Data writing refers to writing data to a layer of the I/O stack.
Data eviction refers to deleting data from a layer of the I/O stack, for example evicting data from the read cache, or deleting overwritten data from the capacity layer when garbage collection is performed on the capacity layer.
Take the first bit in a subgroup as an example. The first bit corresponds to the first layer of the I/O stack, and a value of 1 indicates that the data in the logical sub-block pointed to by the subgroup is in the first layer. When the amount of data in the first layer reaches the threshold W, data in the first layer can be migrated to the second layer. The value of the first bit is then reduced by one and becomes 0, which indicates that the data in the logical sub-block pointed to by the subgroup is no longer in the first layer. If data is later written into the logical sub-block pointed to by the subgroup, the first bit becomes 1 at the time of the write. If another write then targets the same logical sub-block and overwrites the previously written data, the first bit remains 1. With the first bit at 1, if a later eviction finds that the data written in the logical sub-block pointed to by the subgroup is inactive, that data is deleted and the first bit becomes 0. If two writes have occurred in the logical sub-block pointed to by the subgroup, the data from the first write is data that needs to be overwritten; when it is deleted and the most recently written data is kept, the first bit remains 1.
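The bit transitions described above can be traced with a small sketch. The class `Subgroup` and its method names are hypothetical, and setting the destination-layer bit on migration is an assumption consistent with the index indicating where the data resides. The sketch does not model the case in which only the overwritten copy is deleted and the bit stays 1.

```python
class Subgroup:
    """One subgroup of the bitmap: one bit per layer of the I/O stack."""

    def __init__(self, layer_count):
        self.bits = [0] * layer_count

    def on_write(self, layer):
        # A write (including an overwrite) leaves the bit at 1.
        self.bits[layer] = 1

    def on_migrate(self, src, dst):
        # Data leaves src, so its bit drops to 0; dst now holds the data.
        self.bits[src] = 0
        self.bits[dst] = 1

    def on_evict(self, layer):
        # The data in this layer was deleted outright.
        self.bits[layer] = 0


history = []
grain = Subgroup(layer_count=3)
grain.on_write(0);      history.append(list(grain.bits))  # data in layer 1
grain.on_migrate(0, 1); history.append(list(grain.bits))  # flushed downward
grain.on_write(0);      history.append(list(grain.bits))  # new write
grain.on_write(0);      history.append(list(grain.bits))  # overwrite
grain.on_evict(0);      history.append(list(grain.bits))  # inactive data removed
```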
As can be seen, the value of a bit in a subgroup of the bitmap can only indicate whether the data in the logical sub-block pointed to by the subgroup is located in the corresponding layer. It cannot indicate which write produced the data in that logical sub-block.
It should be noted that, in practice, a subgroup need not include a bit corresponding to the last layer of the I/O stack. In this case, the bits in the subgroup correspond to the layers of the I/O stack other than the last one. When all bits in the subgroup are 0, the data in the logical sub-block pointed to by the subgroup is not in any layer other than the last, and therefore can only be in the last layer of the I/O stack.
A logical block is a relatively large region of the logical storage space, and the space indicated by the logical address of the data may be only part of that logical block; that is, it may be some of the logical sub-blocks in the logical block (for convenience, these are referred to as the logical sub-blocks indicated by the logical address). Therefore, when searching for data, it is first necessary to determine the logical block to which the space indicated by the logical address belongs (for convenience, referred to as the logical block to which the logical address belongs), and to determine the one or more groups of bits in the bitmap that point to that logical block. The logical sub-blocks indicated by the logical address are then determined within the logical block, and the one or more subgroups pointing to those logical sub-blocks are determined within the one or more groups of bits.
To determine from a logical address the one or more groups of bits in the bitmap that point to the logical block to which the logical address belongs, a hash table may be set in the storage system 100. The hash table records the correspondence between logical addresses and the groups of bits in the bitmap, so the one or more groups of bits can be determined from the logical address of the data and the hash table. Alternatively, a hash function may be set in the storage system 100. The logical address is used as the input of the hash function, and the resulting hash value indicates the one or more groups of bits in the bitmap that point to the logical block to which the logical address belongs.
When the one or more groups of bits are determined, the logical address can be rounded down to a multiple of the logical block size. The rounded value is then used to query the hash table, or is passed to the hash function, to determine the one or more groups of bits pointing to the logical block to which the logical address belongs.
After the one or more groups of bits in the bitmap have been determined, one or more subgroups can be further determined within them; the logical sub-blocks pointed to by these subgroups are the logical sub-blocks indicated by the logical address. The logical sub-blocks indicated by the logical address can be determined from the offset of the logical address within the logical block and the data length of the target data.
The offset of the logical address within the logical block is the difference between the logical address and the start address of the logical block. As an example, assume that each logical block is 256 KB and contains 32 logical sub-blocks of 8 KB each, and that each group of bits contains 32 subgroups. Suppose the logical block address (LBA) of the data indicates the position 1 MB + 520 KB and the data length is 256 KB. First, 1 MB + 520 KB is rounded down to a multiple of the logical block size (256 KB), which gives 1 MB + 512 KB. Applying the hash function to 1 MB + 512 KB determines the two groups of bits that point to the logical block to which the logical address belongs. 1 MB + 512 KB is the start address of that logical block, and the offset of the logical address within the logical block is the difference between 1 MB + 520 KB and 1 MB + 512 KB, that is, 8 KB. The position indicated by the logical address is therefore 8 KB into the logical block. Because each logical sub-block is 8 KB, the logical sub-block indicated by the logical address is the one located one logical sub-block after the start of the logical block, that is, the second logical sub-block.
Because the data length is 256 KB, the logical sub-blocks covered by the logical address are the second through the 32nd logical sub-blocks of this logical block plus the first logical sub-block of the next logical block, 32 logical sub-blocks in total. The subgroups in the bitmap that point to these 32 logical sub-blocks are the second through the 32nd subgroups in a group of bits pointing to this logical block, plus the first subgroup in the next group of bits.
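The arithmetic in this example can be checked with a short sketch. It is only an illustration under the sizes assumed above (256 KB logical blocks, 8 KB logical sub-blocks). The function name `covered_sub_blocks` and the zero-based sub-block index are choices made here, not part of the application.

```python
KB = 1024
MB = 1024 * KB
BLOCK_SIZE = 256 * KB       # logical block size from the example
SUB_BLOCK_SIZE = 8 * KB     # logical sub-block size from the example


def covered_sub_blocks(lba, length):
    """Return (block_start, sub_block_index) for every sub-block in [lba, lba + length).

    sub_block_index is zero-based, so index 1 is the "second" sub-block in the text.
    """
    per_block = BLOCK_SIZE // SUB_BLOCK_SIZE          # 32 sub-blocks per block
    first = lba // SUB_BLOCK_SIZE                     # global number of the first sub-block
    last = (lba + length - 1) // SUB_BLOCK_SIZE       # global number of the last sub-block
    return [((n // per_block) * BLOCK_SIZE, n % per_block)
            for n in range(first, last + 1)]


cells = covered_sub_blocks(1 * MB + 520 * KB, 256 * KB)
print(len(cells))    # number of sub-blocks covered by the request
print(cells[0])      # (start address of the first block, index of the first sub-block)
print(cells[-1])     # (start address of the next block, index 0)
```

The first entry lies in the block starting at 1 MB + 512 KB with index 1, and the last entry is index 0 of the block starting at 1 MB + 768 KB, which matches the 32 sub-blocks counted in the text.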
Once the subgroups in the bitmap that point to the logical sub-blocks indicated by the logical address have been found, the value of each bit in each subgroup is used to determine the layer of the I/O stack that holds the data of the logical sub-block the subgroup points to.
For each subgroup, after the layer holding the data of the corresponding logical sub-block has been determined, the data can be looked up in that layer. Within the layer, the data index of the layer is searched by the logical address of the data to obtain the metadata of the data, and the data is then read from the location indicated by the metadata.
For example, consider the second subgroup in the first group of bits, and assume the subgroup contains three bits that correspond to the first layer (write cache), the second layer (read cache), and the third layer (hard disk cache) of the I/O stack, respectively. If the three bits are 100, the data of the logical sub-block that the subgroup points to is in the write cache and can be read from the write cache. If the three bits are 010, the data is in the read cache and can be read from the read cache. If the three bits are 000, the data is not in any of the first three layers of the I/O stack but in the fourth layer, the capacity layer, and can be read from the capacity layer. If the three bits are 110, data of the logical sub-block is stored in both of the first two layers of the I/O stack. Because data written to the storage system 100 is written to the first layer of the I/O stack first, the most recently written data of the logical sub-block is in the first layer of the I/O stack.
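The decoding rule in this example can be written as a small function. This is a sketch under the assumptions of the example (three bits per subgroup, with the capacity layer implied when all bits are 0). The function name `layer_to_read` and the string representation of the bits are hypothetical.

```python
# Layer names follow the example above, listed from the top of the I/O stack down.
LAYERS = ["write cache", "read cache", "hard disk cache", "capacity layer"]


def layer_to_read(bits):
    """Pick the layer to read from for a subgroup such as '100' or '110'.

    Each character stands for one of the upper layers, top of the stack first.
    The capacity layer has no bit of its own in this example.
    """
    if len(bits) != len(LAYERS) - 1 or set(bits) - {"0", "1"}:
        raise ValueError("expected one 0/1 character per upper layer")
    for layer, bit in zip(LAYERS, bits):
        if bit == "1":
            return layer        # the highest marked layer holds the newest data
    return LAYERS[-1]           # all zeros: only the capacity layer is left


for pattern in ("100", "010", "000", "110"):
    print(pattern, "->", layer_to_read(pattern))
```

For 110 the function returns the write cache, reflecting the statement that the most recently written data is in the first layer.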
Form 2: counter group.
FIG. 3B is a schematic diagram of a counter group. The counter group includes multiple groups of counters, and each group of counters can point to at least one logical block. Each group of counters includes multiple subgroups (grains), and each subgroup includes multiple counters. Each subgroup points to one logical sub-block in the logical block, and different subgroups in a group of counters point to different logical sub-blocks. The number of counters in each subgroup may equal the total number of layers in the I/O stack, or the total number of layers in the I/O stack minus one.
Each counter in a subgroup corresponds to one layer of the I/O stack, and different counters correspond to different layers. For example, the counters in a subgroup correspond to the layers of the I/O stack in top-down order: the first counter corresponds to the first layer (for example, the write cache), the second counter to the second layer (for example, the read cache), the third counter to the third layer (for example, the hard disk cache), and the fourth counter to the fourth layer (for example, the capacity layer).
For a counter in the subgroup, a value of null or 0 can indicate that the data of the logical sub-block is not in the corresponding layer, and a non-null, non-zero integer value can indicate that the data of the logical sub-block is in that layer. The specific value of the counter can represent the number of times the data of the logical sub-block has been updated. For example, a non-zero first counter can indicate that the data of the logical sub-block that the subgroup points to is in the write cache, and a first counter of 0 or null can indicate that the data is not in the write cache. Similarly, a non-zero second counter can indicate that the data is in the read cache, and a second counter of 0 or null can indicate that the data is not in the read cache. A non-zero third counter can indicate that the data is in the hard disk cache, and a third counter of 0 or null can indicate that the data is not in the hard disk cache. A non-zero fourth counter can indicate that the data is in the capacity layer, and a fourth counter of 0 or null can indicate that the data is not in the capacity layer.
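As a minimal sketch of this interpretation, the function below lists the layers whose counter is neither null nor 0. Representing null as Python `None`, and the function name `layers_with_data`, are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One counter per layer, in top-down order of the I/O stack.
LAYERS = ["write cache", "read cache", "hard disk cache", "capacity layer"]


def layers_with_data(counters: List[Optional[int]]) -> Dict[str, int]:
    """Map each layer whose counter is non-null and non-zero to its counter value."""
    return {layer: value for layer, value in zip(LAYERS, counters) if value}


# The write cache counter is 2 and the capacity layer counter is 1;
# the read cache counter is null and the hard disk cache counter is 0.
print(layers_with_data([2, None, 0, 1]))
```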
Data flows within the I/O stack. For example, data in one layer of the I/O stack is migrated to the next layer, or data flows out of one layer and then into another layer. Data may also be written to the same logical address multiple times: the I/O stack allows repeated writes to the same logical address, and the most recently written data overwrites the previously written data. Data is also evicted from the I/O stack, for example when inactive data or overwritten data is deleted from a layer.
The counter value can record the data flow, data writes, and data eviction that occur at the corresponding layer of the I/O stack. Take the first counter in a subgroup, which corresponds to the first layer of the I/O stack. A value of 1 indicates that the data of the logical sub-block that the subgroup points to is in the first layer. When the amount of data in the first layer reaches a threshold W, the data in the first layer can be migrated to the second layer. The first counter is then decremented by one, indicating that the data of the logical sub-block is no longer in the first layer. As another example, again using the first counter with a current value of 1, suppose two subsequent writes both target the logical sub-block that the subgroup points to. The first of these writes increments the counter to 2, and the second increments it to 3. A value of 2 or 3 indicates that data has been written to the logical sub-block two or three times in total. With the counter at 3, if a later data eviction finds that the data from the first two writes has been overwritten and should be removed, that data is deleted from the logical sub-block, and the first counter then becomes 1.
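The two counter scenarios above can be replayed with a small model of one subgroup. This is a sketch, not the application's implementation: the class `CounterSubgroup`, its method names, and the zero-based layer numbering are hypothetical, and the increment of the destination counter on migration follows the migration description given later for the counter-group form.

```python
class CounterSubgroup:
    """One subgroup of counters, one counter per I/O stack layer (layer 0 is the top)."""

    def __init__(self, layers=4):
        self.counters = [0] * layers

    def _decrease(self, layer, amount):
        if self.counters[layer] < amount:
            raise ValueError("counter would become negative")
        self.counters[layer] -= amount

    def on_write(self, layer=0):
        # New data is written to the first layer by default.
        self.counters[layer] += 1

    def on_migrate(self, src, dst):
        # Data leaves layer src and enters layer dst.
        self._decrease(src, 1)
        self.counters[dst] += 1

    def on_evict(self, layer, copies):
        # `copies` stale versions are deleted from the layer.
        self._decrease(layer, copies)


# Scenario 1: one write, then migration from the first layer to the second.
g = CounterSubgroup()
g.on_write()
g.on_migrate(0, 1)
print(g.counters)

# Scenario 2: three writes in total, then the two oldest versions are evicted.
h = CounterSubgroup()
for _ in range(3):
    h.on_write()
h.on_evict(0, 2)
print(h.counters)
```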
It should be noted that, in practice, a subgroup may omit the counter that corresponds to the last layer of the I/O stack. In that case, the counters in the subgroup correspond to the layers of the I/O stack other than the last layer. When all counters in the subgroup are 0 or null, the data of the logical sub-block that the subgroup points to is not in any layer other than the last layer, and can therefore only be in the last layer of the I/O stack.
The way one or more groups of counters are determined from a logical address, and the way one or more subgroups are found within those groups, are similar to the way groups of bits and their subgroups are determined in Form 1. For details, refer to the foregoing description; they are not repeated here.
As the foregoing description shows, a counter group can describe essentially the same information as a bitmap, and more: the value of each counter in the counter group reflects the data flow, writes, and eviction in the layer corresponding to that counter. Two simple examples illustrate the advantages of the counter group:
Example 1: When the global index is searched for the one or more character blocks (such as one or more groups of bits, or one or more groups of counters) of the logical block to which a logical address belongs, the logical address is mapped to a character block by hash calculation. Different logical addresses may produce the same hash value. This is a hash collision, and it maps two different logical addresses to the same character block. In this case, the value of the character block (that is, the values of the character sub-blocks in each subgroup) must represent the I/O stack layer that holds the data of the logical blocks to which both logical addresses belong.
For ease of understanding, assume the two logical addresses are LBA1 and LBA2 and that both are mapped to character block A in the global index. Only the value of the first character sub-block a in each subgroup of character block A is described here; the other character sub-blocks behave similarly.
When data is written to logical address LBA1, the data is written to the first layer of the I/O stack first. If character sub-block a is a bit, the bit becomes 1. If character sub-block a is a counter, the counter also becomes 1. If data is then written to logical address LBA2, that data is also written to the first layer of the I/O stack first. If character sub-block a is a bit, the bit remains 1. If character sub-block a is a counter, the counter increases from 1 to 2. The counter value therefore records clearly how many times data has been written to the first layer of the I/O stack. At this point, whether the global index is a bitmap or a counter group, it correctly indicates that the data of the logical block to which LBA1 or LBA2 belongs is in the first layer of the I/O stack.
Suppose, however, that the data of the logical block to which LBA1 belongs subsequently flows from the first layer of the I/O stack to the second layer. If character sub-block a is a bit, the bit changes from 1 to 0, and the next bit (the bit corresponding to the second layer of the I/O stack) changes from 0 to 1. If character sub-block a is a counter, the counter changes from 2 to 1, and the next counter (the counter corresponding to the second layer of the I/O stack) changes from 0 to 1.
When the data at logical address LBA2 is queried and the global index is a bitmap, the bit corresponding to the first layer of the I/O stack is 0 and the next bit is 1, so the global index indicates that the data at LBA2 is in the second layer of the I/O stack. In fact, the data at LBA2 has not moved and is still in the first layer. This can cause the subsequent read to fail, because the data at LBA2 cannot be read from the second layer. If the global index is a counter group, the counter corresponding to the first layer of the I/O stack is 1, so the global index indicates that the data at LBA2 is in the first layer. This matches the layer that actually holds the data, and the data at LBA2 can be read correctly.
When the data at logical address LBA1 is queried and the global index is a bitmap, the bit corresponding to the first layer is 0 and the next bit is 1, so the global index indicates that the data at LBA1 is in the second layer. This matches the layer that actually holds the data, and the data can be read correctly. If the global index is a counter group, the counter corresponding to the first layer is 1, so the global index indicates that the data at LBA1 is in the first layer. This does not match the layer that actually holds the data, and the data at LBA1 is not found in the first layer. The read can then continue through the layers of the I/O stack in order and find the data at LBA1 in the second layer, so the data at LBA1 is still read correctly.
This shows that the counter value not only records the flow of data in the first layer of the I/O stack clearly, but also mitigates the hash collision problem to some extent and helps keep data reads accurate.
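Example 1 can be replayed in a few lines to see where the bitmap and the counter group diverge. The sketch assumes that LBA1 and LBA2 collide on the same subgroup and tracks only the first two layers (numbered 0 and 1 here). The names `write`, `migrate_down`, and `read`, and the dictionary holding the real location of each address, are hypothetical.

```python
NUM_TRACKED_LAYERS = 2       # layer 0 = first layer, layer 1 = second layer

actual_layer = {}            # ground truth: lba -> layer that really holds the data
bits = [0, 0]                # bitmap subgroup shared by the colliding addresses
counters = [0, 0]            # counter subgroup shared by the colliding addresses


def write(lba):
    """New data always lands in the first layer."""
    actual_layer[lba] = 0
    bits[0] = 1
    counters[0] += 1


def migrate_down(lba):
    """Data of lba flows from the first layer to the second layer."""
    actual_layer[lba] = 1
    bits[0], bits[1] = 0, 1
    counters[0] -= 1
    counters[1] += 1


def read(lba, entry):
    """Start at the first layer the entry marks, then walk down the stack."""
    start = next((i for i, v in enumerate(entry) if v), NUM_TRACKED_LAYERS)
    for layer in range(start, NUM_TRACKED_LAYERS):
        if actual_layer.get(lba) == layer:
            return layer
    return None              # not found at or below the indicated layer


write("LBA1")
write("LBA2")
migrate_down("LBA1")

print(read("LBA2", bits))       # bitmap points at layer 1, where LBA2 is not stored
print(read("LBA2", counters))   # counters point at layer 0, where LBA2 is stored
print(read("LBA1", counters))   # layer 0 misses, the walk continues to layer 1
```

With the bitmap, the lookup for LBA2 starts below the layer that holds it and never finds it. With counters, both addresses are found, at the cost of one extra layer probe for LBA1.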
Example 2: Again using character sub-block a and LBA1, consider an append-write scenario. When data is written to logical address LBA1 for the first time, it is written to the first layer of the I/O stack first. If character sub-block a is a bit, the bit is 1. If character sub-block a is a counter, the counter is also 1. If data is then written to LBA1 again to overwrite the previously written data, the new data is also written to the first layer of the I/O stack first. If character sub-block a is a bit, the bit remains 1. If character sub-block a is a counter, the counter increases from 1 to 2. During data eviction, the data from the first write is deleted from the storage medium, and the global index can be updated at that time. If character sub-block a is a bit, the bit changes from 1 to 0. If character sub-block a is a counter, the counter changes from 2 to 1. In fact, the data at LBA1 is still in the first layer, so the bit value can be wrong, whereas a global index in the form of a counter group correctly records the layer of the I/O stack that holds the data at LBA1.
Compared with a bitmap, however, each counter in a counter group occupies multiple bits in order to represent different values, so a counter group takes more space than a bitmap.
In this embodiment, the global index may be implemented as a counter group, as a bitmap, or as both. The data access method provided by the embodiments of this application is described below with reference to the accompanying drawings. The entity that executes the method differs between scenarios. For example, the method may be executed by the processor 112 of the storage device 110 in the storage system 100 shown in FIG. 1, or by the network card 114 of the storage device 110 in the storage system 100. The two cases are described below:
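A rough calculation illustrates the space difference. The subgroup count and the number of tracked layers come from the earlier example. The counter width of 4 bits is purely an assumption for illustration, since the application does not specify it.

```python
SUBGROUPS_PER_GROUP = 32    # from the 256 KB block / 8 KB sub-block example
UPPER_LAYERS = 3            # write cache, read cache, hard disk cache
COUNTER_BITS = 4            # assumed counter width; not given in the application

# One bit per tracked layer in each subgroup.
bitmap_bits = SUBGROUPS_PER_GROUP * UPPER_LAYERS
# One COUNTER_BITS-wide counter per tracked layer in each subgroup.
counter_bits = SUBGROUPS_PER_GROUP * UPPER_LAYERS * COUNTER_BITS

print(bitmap_bits, counter_bits)
```

Under these assumptions one group takes 96 bits as a bitmap and 384 bits as counters, so the overhead scales directly with the chosen counter width.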
Scenario 1: The processor 112 of the storage device 110 in the storage system 100 executes the data access method provided by the embodiments of this application.
FIG. 4 shows a data access method provided by this application. The method includes:
Step 401: The processor 112 receives a data write request. The data write request is used to request that target data be written, and carries the target data and the logical address of the target data. The logical address may include a start logical address and a data length. The start logical address of the data can be represented by a logical block address (LBA) and a logical unit number (LUN).
The data write request may be sent directly from the client device 200 to the storage device 110. It may also be sent to the storage device 110 by another storage device 110 in the storage system 100. For example, the storage system 100 may contain a device that manages the storage device 110. This device can allocate a storage device 110 for data and instruct the storage device 110 to write the data. When this device determines that the target data needs to be written to the storage device 110, it can send a data write request to the storage device 110.
Step 402: The processor 112 of the storage device 110 writes the target data to the location indicated by the logical address according to the data write request.
When writing the target data to the location indicated by the logical address, the processor 112 may preferentially write the target data to the first layer of the I/O stack, for example, to the write cache.
When writing the target data to the first layer of the I/O stack, the processor 112 creates an index for the target data. The index indicates the correspondence between the logical address of the target data and the metadata of the target data, and the metadata indicates the physical address of the target data in that layer.
Step 403: The processor 112 updates the global index. The updated global index indicates the layer of the I/O stack that holds the target data.
Based on the logical address of the target data, the processor 112 can determine the character block in the global index that points to the logical block of the logical address, and the subgroup that indicates the logical sub-block in that logical block. For example, the processor 112 can look up, in a hash table, the character block corresponding to the logical address of the target data and the subgroup within the character block. Because the processor 112 stores the target data in the first layer of the I/O stack first, the processor 112 can set the value of the character sub-block that corresponds to the first layer of the I/O stack, in the subgroup indicating the logical sub-block, so that the value indicates that data is stored in the first layer of the I/O stack.
When the global index is a bitmap, the processor 112 updates it as follows. Based on the logical address of the target data, the processor 112 determines the one or more groups of bits that indicate the logical block to which the logical address belongs. It then determines the logical sub-blocks indicated by the logical address from the offset of the logical address within the logical block and the data length of the target data, and thereby determines the subgroups in the bitmap that point to those logical sub-blocks. Because the processor 112 stores the target data in the first layer of the I/O stack first, it can set the first bit (the bit corresponding to the first layer of the I/O stack) in each of these subgroups to 1, to indicate that the target data is stored in the first layer of the I/O stack.
When the global index is a counter group, the processor 112 updates it as follows. Based on the logical address of the target data, the processor 112 determines the one or more groups of counters that indicate the logical block to which the logical address belongs. It then determines the logical sub-blocks indicated by the logical address from the offset of the logical address within the logical block and the data length of the target data, and thereby determines the subgroups in the counter group that point to those logical sub-blocks. Because the processor 112 stores the target data in the first layer of the I/O stack first, it can increment the first counter (the counter corresponding to the first layer of the I/O stack) in each of these subgroups by 1, to indicate that the target data has been written to the first layer of the I/O stack.
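The index update on the write path can be sketched for both forms as follows. To keep the example short, a dictionary keyed by (logical block number, sub-block index) stands in for the hashed groups of bits or counters. That substitution, the function names, and the sizes (256 KB blocks, 8 KB sub-blocks, three tracked layers) are assumptions for illustration.

```python
from collections import defaultdict

KB = 1024
BLOCK_SIZE = 256 * KB
SUB_BLOCK_SIZE = 8 * KB
UPPER_LAYERS = 3             # one bit or counter per tracked layer


def subgroup_keys(lba, length):
    """Return (block number, sub-block index) for each sub-block the write touches."""
    per_block = BLOCK_SIZE // SUB_BLOCK_SIZE
    first = lba // SUB_BLOCK_SIZE
    last = (lba + length - 1) // SUB_BLOCK_SIZE
    return [(n // per_block, n % per_block) for n in range(first, last + 1)]


def update_on_write(index, lba, length, use_counters):
    """Mark the first layer in every subgroup covered by the write."""
    for key in subgroup_keys(lba, length):
        entry = index[key]
        if use_counters:
            entry[0] += 1    # counter form: one more write to the first layer
        else:
            entry[0] = 1     # bitmap form: the first layer holds data


bitmap_index = defaultdict(lambda: [0] * UPPER_LAYERS)
counter_index = defaultdict(lambda: [0] * UPPER_LAYERS)

# Two writes of 16 KB at offset 8 KB, recorded in both forms.
for _ in range(2):
    update_on_write(bitmap_index, 8 * KB, 16 * KB, use_counters=False)
    update_on_write(counter_index, 8 * KB, 16 * KB, use_counters=True)

print(dict(bitmap_index))
print(dict(counter_index))
```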
The embodiments of this application do not limit the order in which step 402 and step 403 are performed. The processor 112 may write the target data first and then update the global index, or update the global index first and then write the target data. Both writing the target data and updating the global index rely on the logical address of the target data, but they are two relatively independent processes with no direct dependency between them, so the order of step 402 and step 403 is not emphasized in the embodiments of this application.
In the embodiments of this application, the processor 112 may update the global index first (step 403) and then write the target data to the storage location indicated by the logical address (step 402). Updating the global index first records the I/O stack layer of the target data in the global index ahead of time. If the processor 112 receives a data read request for the target data shortly after receiving the data write request, the global index has already been updated, so the processor 112 can correctly determine the layer of the I/O stack that holds the target data.
Step 404: The processor 112 returns a data write response indicating that the target data has been written successfully.
Steps 401 to 404 constitute the data write procedure. In addition to the data write procedure, the processor 112 may also perform other data processing procedures, such as data flow and data eviction.
For example, when data stored in the storage medium of one layer of the I/O stack needs to be migrated to the storage medium of the next layer (that is, when the storage medium of that layer is flushed), the processor 112 can migrate the data from the storage medium of that layer to the storage medium of the next layer. After the migration is complete, the processor 112 can update the global index.
When the global index is a bitmap, the processor 112 can update it by setting the bit that corresponds to that layer of the I/O stack to 0, to indicate that the storage medium of that layer no longer holds the data, and setting the bit that corresponds to the next layer of the I/O stack to 1, to indicate that the storage medium of the next layer holds data.
When the global index is a counter group, the processor 112 can update it by decrementing the counter that corresponds to that layer of the I/O stack by 1, to indicate that the data in the storage medium of that layer has been migrated once, and incrementing the counter that corresponds to the next layer of the I/O stack by 1, to indicate that new data has been migrated into the storage medium of the next layer.
As another example, the processor 112 may migrate frequently read data from a layer of the I/O stack to a higher layer, such as the second layer, where the read cache is located. The processor 112 may control the frequently read data to flow out of the storage medium of the source layer and into the storage medium of the second layer. The processor 112 may also update the global index; this update may be performed after the data flows out of the source layer and before it flows into the second layer.
When the global index is a bitmap, the processor 112 may update it as follows. In the subgroup pointed to by the logical address of the data, the bit corresponding to the source layer is set to 0, indicating that the storage medium of that layer no longer holds the data, and the bit corresponding to the second layer is set to 1, indicating that the storage medium of the second layer now holds the data.
When the global index is a counter group, the processor 112 may update it as follows. In the subgroup pointed to by the logical address of the data, the counter corresponding to the source layer is decremented by 1, indicating that data has flowed out of the storage medium of that layer once, and the counter corresponding to the second layer is incremented by 1, indicating that new data has flowed into the storage medium of the second layer.
As yet another example, the processor 112 may delete invalid data from a layer of the I/O stack. The invalid data may be data that has been overwritten in an append-write scenario, or data that is read infrequently. The processor 112 may move the data out of the storage medium of that layer and delete it, and may also update the global index.
When the global index is a bitmap, the processor 112 may update it by setting to 0 the bit corresponding to that layer in the subgroup pointed to by the logical address of the data, indicating that the storage medium of that layer no longer holds the data.
When the global index is a counter group, the processor 112 may update it by decrementing by 1 the counter corresponding to that layer in the subgroup pointed to by the logical address of the data, indicating that data in the storage medium of that layer has been evicted once.
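The three index updates described above (flushing to the next layer, promotion to a higher layer, and eviction) can be sketched as operations on a single subgroup. This is only an illustrative sketch, not the claimed implementation: the four-layer stack, the list representation, and the function names `move` and `evict` are assumptions introduced here.

```python
# Hypothetical sketch: one subgroup of the global index, modeled as a list
# with one entry per I/O stack layer (index 0 is the first, highest layer).
# Assumed layer order: write cache, read cache, disk cache, capacity layer.

def move(subgroup, src, dst, counters=False):
    """Record that data moved from layer `src` to layer `dst`.

    Bitmap form: clear the source bit and set the destination bit.
    Counter form: decrement the source counter, increment the destination.
    """
    if subgroup[src] == 0:
        raise ValueError("source layer holds no data for this sub-block")
    if counters:
        subgroup[src] -= 1
        subgroup[dst] += 1
    else:
        subgroup[src] = 0
        subgroup[dst] = 1


def evict(subgroup, layer, counters=False):
    """Record that data was deleted from `layer`."""
    if subgroup[layer] == 0:
        raise ValueError("layer holds no data for this sub-block")
    subgroup[layer] = subgroup[layer] - 1 if counters else 0


bitmap = [1, 0, 0, 0]
move(bitmap, 0, 1)                 # flush: first layer to the next layer

counts = [2, 0, 1, 0]
move(counts, 2, 1, counters=True)  # promotion: third layer to second layer
evict(counts, 0, counters=True)    # eviction from the first layer
```

Keeping both forms behind the same two operations mirrors the text: only the arithmetic on each entry differs between the bitmap and the counter group.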
In addition to the data write, data movement, and data eviction procedures, the processor 112 may perform a data read procedure; see steps 405 to 408.
Step 405: The processor 112 receives a data read request. The data read request requests reading of the target data and carries the logical address of the target data.
Step 406: The processor 112 queries the global index according to the logical address of the target data and determines the layer of the I/O stack in which the target data is located.
The processor 112 may determine, according to the logical address of the target data, the character block in the global index that points to the logical block to which the logical address belongs. Then, according to the offset of the logical address within the logical block and the data length of the target data, it determines the subgroup in that character block that points to the logical sub-block indicated by the logical address. The processor 112 determines the layer in which the target data is located from the values of the character sub-blocks in the subgroup.
When the global index is a bitmap, the processor 112 determines, according to the logical address of the target data, the one or more groups of bits that point to the logical block to which the logical address belongs. Then, according to the offset of the logical address within the logical block and the data length of the target data, it determines the logical sub-block indicated by the logical address, and thereby the subgroups in the bitmap that point to that logical sub-block. The processor 112 may read the value of each bit in those subgroups and determine that the layer corresponding to a bit whose value is 1 is the layer in which the target data is located.
It should be noted that, when the subgroups of the bitmap contain no bit for the last layer of the I/O stack, and all bits in the subgroups pointing to the logical sub-block are 0, the processor 112 may determine that the target data is located in the last layer of the I/O stack.
When the global index is a counter group, the processor 112 determines, according to the logical address of the target data, the one or more groups of counters that point to the logical block to which the logical address belongs. Then, according to the offset of the logical address within the logical block and the data length of the target data, it determines the logical sub-block indicated by the logical address, and thereby the subgroups in the counter group that point to that logical sub-block. The processor 112 may read the value of each counter in those subgroups and determine that the layer corresponding to a non-zero counter is the layer in which the target data is located. If several counters in one subgroup are non-zero, data has been written to the logical sub-block several times, and the most recently written target data is located in the highest of the layers corresponding to the non-zero counters.
It should be noted that, when the subgroups of the counter group contain no counter for the last layer of the I/O stack, and all counters in the subgroups pointing to the logical sub-block are 0, the processor 112 may determine that the target data is located in the last layer of the I/O stack.
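The lookup rule in step 406 can be illustrated with a short sketch. It assumes a subgroup is a list ordered from the highest layer downward and that layers are numbered from 0; the function name `locate_layer` and the `None` result for an all-zero subgroup that does track the last layer are assumptions made for this example. For the bitmap form the source only states that the layer whose bit is 1 holds the data; taking the highest such layer is likewise an assumption.

```python
# Hypothetical sketch of the layer lookup on one subgroup of the global index.

def locate_layer(subgroup, total_layers):
    """Return the 0-based layer holding the newest copy of the data.

    `subgroup` holds bits (bitmap form) or counters (counter-group form),
    ordered from the first (highest) layer downward. The first non-zero
    entry identifies the highest layer that holds the data.
    """
    for layer, value in enumerate(subgroup):
        if value != 0:
            return layer
    # All entries are zero. If the subgroup has no entry for the last
    # layer, the data is taken to be in the last layer of the I/O stack.
    if len(subgroup) == total_layers - 1:
        return total_layers - 1
    return None  # assumed meaning: no layer is recorded as holding the data


print(locate_layer([0, 1, 0], total_layers=4))  # bitmap, second layer set
print(locate_layer([2, 0, 1], total_layers=4))  # counters, highest non-zero wins
print(locate_layer([0, 0, 0], total_layers=4))  # implicit last layer
```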
Step 407: After determining the layer of the I/O stack in which the target data is located, the processor 112 may read the target data directly from that layer according to the logical address of the target data.
The processor 112 may query the index of the target data in that layer according to the logical address of the target data, determine the metadata of the target data, and then read the target data from the location indicated by the metadata.
Step 408: The processor 112 returns a data read response that includes the target data.
Scenario 2: The network card 114 of a node in the storage system 100 performs the data access method provided in the embodiments of the present application.
FIG. 5 shows a data access method provided by the present application. In this method, the network card 114 of the storage device 110 shown in FIG. 1 performs the data write procedure and the data read procedure. The network card 114 performs these procedures in a manner similar to the processor 112; the only difference is the executing entity. For details, see the description of the embodiment shown in FIG. 4, which is not repeated here.
Step 501: The network card 114 of the storage device 110 receives a data write request.
Step 502: According to the data write request, the network card 114 writes the target data to the storage location indicated by the logical address.
When writing the target data to the storage location indicated by the logical address, the network card 114 may preferentially write the target data to the first layer of the I/O stack, for example to the write cache.
Step 503: The network card 114 updates the global index. The updated global index indicates the layer of the I/O stack in which the target data is located.
Step 504: The network card 114 returns a data write response indicating that the target data has been written successfully.
Steps 501 to 504 constitute the data write procedure. In addition to the data write procedure, the network card 114 may perform a data read procedure; see steps 505 to 508.
Step 505: The network card 114 receives a data read request. The data read request requests reading of the target data and carries the logical address of the target data.
Step 506: The network card 114 queries the global index according to the logical address of the target data and determines the layer of the I/O stack in which the target data is located.
Step 507: After determining the layer of the I/O stack in which the target data is located, the network card 114 may read the target data directly from that layer according to the logical address of the target data.
The network card 114 may query the index of the target data in that layer according to the logical address of the target data, determine the metadata of the target data, and then read the target data from the location indicated by the metadata.
Step 508: The network card 114 returns a data read response that includes the target data.
Having the network card 114 of the storage device 110 process data write requests and data read requests reduces the load on the processor 112 of the storage device 110 and improves the efficiency of the data write and data read procedures.
In this scenario, the network card 114 may also be provided with a cache. When processing a data write request, the network card 114 may preferentially write the target data to its own cache. In this case, the cache of the network card 114 may be added to the I/O stack as a layer. For example, the cache of the network card 114 serves as the first layer of the I/O stack, and the write cache, read cache, hard disk cache, and capacity layer of the storage device 110 serve as the second, third, fourth, and fifth layers, respectively. Data is still written to and moved through the I/O stack from top to bottom; the only difference from the I/O stack described earlier is the additional layer.
In Scenario 2, the network card 114 may perform the data write procedure and the data read procedure, and the processor 112 may perform other data processing procedures, such as data movement and data eviction. For the manner in which the processor 112 performs these other procedures, see the description of the embodiment shown in FIG. 4, which is not repeated here.
In other scenarios, the index of the data in one or more layers of the I/O stack of the storage system 100 supports one-sided RDMA access or pass-through access. That is, the client device 200 can read the metadata of the data in those layers through one-sided RDMA access or pass-through access.
Supporting one-sided RDMA access means that the index of the data in the one or more layers is stored in the memory of the storage device 110, and the client device 200 records the start address of that index. When the client device 200 needs to read the metadata of a piece of data in one of these layers, it can calculate the memory address of the metadata from the start address of the index of the data in that layer and the logical address of the data, and then obtain the metadata through one-sided RDMA based on that memory address.
Supporting pass-through access means that the storage medium of the one or more layers is persistent storage; a hard disk is used as the example here. The index of the data stored in the one or more layers is stored on the hard disk of the storage device 110, and the client device 200 records the start address of that index. When the client device 200 needs to read the metadata of a piece of data in one of these layers, it can calculate the storage address of the metadata on the hard disk from the start address of the index of the data in that layer and the logical address of the data, and then obtain the metadata from the hard disk through pass-through access based on that storage address. Pass-through access here means that the client device 200 reads data stored on the hard disk directly through the network card 114 of the storage device 110 and the controller of the hard disk, without involvement of the processor 112 of the storage device 110.
In this scenario, the embodiments of the present application provide a data access method in which the processor 112 of the storage device 110 does not need to participate. When the client device 200 needs to read the target data, it can read the metadata of the target data from the one or more layers through one-sided RDMA access or pass-through access. The client device 200 can also use one-sided RDMA to access some or all of the character sub-blocks in the target subgroup of the global index that points to the target logical sub-block, where the target logical sub-block is the logical sub-block indicated by the logical address of the target data. From the values of the character sub-blocks it has accessed, the client device 200 can determine the layer of the I/O stack in which the target data is located, and thereby determine whether the metadata it has read is valid, that is, whether the target data is actually stored at the storage address indicated by the metadata. If the metadata is valid and indicates that the target data is in memory, the target data can be read through one-sided RDMA access; if the metadata indicates that the target data is on the hard disk, the target data can be read through pass-through access. The data access method in this scenario is described below.
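The client-side decision described above can be sketched as follows. The source does not spell out the exact validity rule, so this sketch assumes one: the metadata read from the target pass-through layer is treated as invalid whenever any layer above it is marked as holding data for the same logical sub-block. The function name `plan_read` and the string labels are hypothetical.

```python
# Hypothetical sketch of how a client might combine the two pieces of
# information it fetched: the metadata from the target pass-through layer
# and the global-index entries for the layers above that layer.

def plan_read(upper_layer_entries, medium):
    """Decide how the client reads the target data.

    upper_layer_entries: bits or counters for the layers above the target
        pass-through layer, taken from the target subgroup.
    medium: where the fetched metadata says the data lives ("memory" or
        "disk").
    """
    # Assumed rule: a non-zero entry in a higher layer means a newer copy
    # exists there, so the fetched metadata does not describe that copy.
    if any(value != 0 for value in upper_layer_entries):
        return "metadata-invalid"
    if medium == "memory":
        return "one-sided-rdma"
    if medium == "disk":
        return "pass-through"
    raise ValueError(f"unknown medium: {medium!r}")


print(plan_read([0, 0], "memory"))
print(plan_read([0, 0], "disk"))
print(plan_read([1, 0], "disk"))
```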
Scenario 3: The client device 200 accesses the target data through one-sided RDMA or pass-through access.
This embodiment is described using the example of the client accessing the target data through one-sided RDMA. One-sided RDMA and pass-through access differ only because the target data, or the index of the data, resides on different storage media. Whether the target data is read from the storage system 100 through one-sided RDMA or through pass-through access, the basic procedure is the same; only the mechanism used to read the metadata of the target data, or the target data itself, differs. For ease of description, a layer of the I/O stack of the storage system 100 that supports one-sided RDMA access or pass-through access is referred to as a pass-through layer; the I/O stack of the storage system 100 may contain one or more pass-through layers.
FIG. 6 shows a data access method provided by the present application. The method includes:
Step 601: The storage device 110 notifies the client device 200 of the memory address of the global index.
On the storage device 110 side, the global index may be stored in the memory of the storage device 110. The storage device 110 may notify the client device 200 of the start address of the global index in memory and the length of the global index, which together serve as the memory address of the global index.
Step 602: The client device 200 may initiate a first one-sided RDMA to the storage device 110. The first one-sided RDMA is used to read the metadata of the target data in a target pass-through layer of the I/O stack of the storage system 100. The target pass-through layer is one or more of the one or more pass-through layers.
Step 603: The client device 200 may initiate a second one-sided RDMA to the storage device 110. The second one-sided RDMA is used to obtain, from the storage device 110, all or some of the character sub-blocks in the subgroups of the global index that point to the target logical sub-block.
The partial set of character sub-blocks may exclude the character sub-block corresponding to the target pass-through layer. For example, the client device 200 may obtain, from each subgroup, the character sub-blocks corresponding to the layers above the target pass-through layer.
The embodiments of the present application do not limit the order in which the client device 200 initiates the first one-sided RDMA and the second one-sided RDMA. The client device 200 may initiate both within a short period of time, that is, it may initiate them concurrently.
The first one-sided RDMA and the second one-sided RDMA are described separately below.
(1) First one-sided RDMA: reading the metadata of the target data in the target pass-through layer of the I/O stack of the storage system 100.
The client device 200 records the start address of the index of the data in the target pass-through layer. When the client device 200 needs to read the metadata of the target data in the target pass-through layer, it can calculate the memory address of the metadata from the start address of the index of the data in the target pass-through layer and the logical address of the target data. The manner of this calculation is not limited here. For example, the memory address of the metadata may be calculated by looking up the logical address of the target data in a hash table, by applying a hash function to it, or by using a learned index.
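As one possible reading of the hash-function option mentioned above, the following sketch maps a logical address to a slot in a fixed-size index and returns the slot's memory address. The index layout (equal-size slots), the use of CRC32 as the hash, and the parameter names are all assumptions for illustration; the source does not specify them.

```python
import zlib

# Hypothetical sketch: derive the memory address of a metadata entry from
# the index start address and the logical address of the target data.

def metadata_address(index_base, lba, num_slots, entry_size):
    """Return the assumed memory address of the metadata slot for `lba`.

    index_base: start address of the layer's index, as recorded by the client.
    lba: logical address of the target data (non-negative integer).
    num_slots, entry_size: assumed fixed layout of the index.
    """
    key = lba.to_bytes(8, "little")
    slot = zlib.crc32(key) % num_slots   # CRC32 stands in for the hash function
    return index_base + slot * entry_size


addr = metadata_address(index_base=0x1000, lba=520 * 1024,
                        num_slots=1024, entry_size=64)
print(hex(addr))
```

A real index would also need a collision strategy; the sketch leaves that out because the source does not describe one.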
After determining the memory address of the metadata of the target data, the client device 200 may send a first request to the network card 114 of the storage device 110 based on RDMA. The first request requests reading of the metadata of the target data and carries the memory address of that metadata.
After receiving the first request, the network card 114 of the storage device 110 may process it: the network card 114 obtains the metadata of the target data according to the memory address of the metadata, places the metadata in a first response, and returns the first response to the client device 200.
As follows from the description of the composition of the I/O stack, the data of one logical sub-block may be stored in different layers of the I/O stack. Because the data of one logical sub-block can flow into and out of different layers of the I/O stack, the data of that logical sub-block may be indexed in several layers; that is, several layers of the I/O stack may retain metadata for the data of that logical sub-block. The embodiments of the present application therefore do not limit the number of target pass-through layers, and the client device 200 is allowed to obtain the metadata of the target data in multiple target pass-through layers through the first one-sided RDMA. When there is one target pass-through layer, the client device 200 may initiate the first one-sided RDMA once to obtain the metadata of the target data in that layer. When there are multiple target pass-through layers, the client device 200 may initiate the first one-sided RDMA multiple times, each time obtaining the metadata of the target data in one of the target pass-through layers.
However, the metadata of the target data in these target pass-through layers may be invalid; that is, the data stored at the physical address indicated by the metadata may not be the most recently written data. To verify the validity of the metadata of the target data in the target pass-through layer, the client device 200 may perform the second one-sided RDMA.
(2) Second one-sided RDMA: obtaining, from the storage device 110, the subgroups in the global index that point to the logical sub-block indicated by the logical address of the target data.
The client device 200 may determine, according to the logical address of the target data, the position within the global index of the subgroups that point to the logical sub-block indicated by that logical address.
Because the storage device 110 has already informed the client device 200 of the memory address of the global index in the storage device 110, the client device 200 can, once the position of the subgroups within the global index is known, determine the memory address of the subgroups from the memory address of the global index.
Take as an example a logical block size of 256KB, where each logical block contains 32 logical sub-blocks of 8KB each and each group of bits (character block) contains 32 subgroups. If the position indicated by the LBA of the target data is 1MB+520KB, two character blocks pointing to the logical blocks to which the logical address belongs can be determined, for example the third and fourth character blocks in the global index. Then, according to the offset of the logical address within the logical block and the data length of the target data, the 32 logical sub-blocks indicated by the logical address are determined, and with them the 32 subgroups in the third and fourth character blocks that point to those 32 logical sub-blocks. In the global index, these are the second through 32nd subgroups of the third character block and the first subgroup of the fourth character block. The 32 subgroups start at an offset of two character blocks plus one subgroup from the start of the global index, and their total length equals the length of one character block.
Suppose each subgroup of the global index contains N character sub-blocks, one for each tracked layer of the I/O stack. If the global index is a bitmap, one subgroup occupies N bits. A character block is one group of bits, and its size is 32*N bits. The 32 subgroups are then located at an offset of 2*32*N+N bits from the start address of the global index (two character blocks plus one subgroup) and span 32*N bits.
If the global index is a counter group and each counter occupies M bits, one subgroup occupies N*M bits. A character block is one group of counters, and its size is 32*N*M bits. The 32 subgroups are then located at an offset of 2*32*N*M+N*M bits from the start address of the global index and span 32*N*M bits.
The position of the subgroups within the global index is thereby determined. The client device 200 can then determine the memory address of the subgroups from the memory address of the global index. For example, the client device 200 may take the start address of the global index, offset by two character blocks plus one subgroup, as the start address of the subgroups; the length of the subgroups is the length of one character block. The start address and the length together may serve as the memory address of the subgroups. Alternatively, the client device 200 may take the start address of the global index offset by two character blocks plus one subgroup as the start address of the subgroups, and the start address of the global index offset by three character blocks plus one subgroup as the end address of the subgroups; the start address and the end address together may serve as the memory address of the subgroups.
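The offset arithmetic in this example can be checked with a small sketch. It assumes that the 520KB part of the address is the offset measured from the start of the range covered by the global index (which is what places the data in the third and fourth character blocks) and that the data length is 256KB. The function name `locate_subgroups`, the dictionary keys, and the values chosen for N and M are hypothetical.

```python
KB = 1024

# Hypothetical sketch: find which subgroups of the global index cover a
# read, and where they sit in the index, measured in bits from its start.

def locate_subgroups(offset, length, subgroup_bits,
                     block_size=256 * KB, sub_block_size=8 * KB):
    per_block = block_size // sub_block_size           # 32 subgroups per character block
    first = offset // sub_block_size                   # first logical sub-block touched
    last = (offset + length - 1) // sub_block_size     # last logical sub-block touched
    count = last - first + 1
    return {
        "first_block": first // per_block,             # 0-based character block
        "first_subgroup_in_block": first % per_block,  # 0-based subgroup in that block
        "count": count,
        "start_bit": first * subgroup_bits,
        "length_bits": count * subgroup_bits,
    }


N = 4  # assumed number of character sub-blocks per subgroup
M = 8  # assumed counter width in bits
bitmap_span = locate_subgroups(520 * KB, 256 * KB, subgroup_bits=N)
counter_span = locate_subgroups(520 * KB, 256 * KB, subgroup_bits=N * M)
print(bitmap_span)
print(counter_span)
```

With these inputs the span starts in the third character block at its second subgroup and covers 32 subgroups, that is, an offset of two character blocks plus one subgroup and a length of one character block.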
若客户端设备200仅需获取该各个子组中的部分字符子块,获取该各个子组中除了与该目标直通层对应的字符子块的其余字符子块,客户端设备200还可以进一步对各个组的内存地址进行处理,将各个子组的内存地址去除与该目标直通层对应的字符子块的内存地址,获得该各个子组中部分字符子块的内存地址。If the client device 200 only needs to obtain part of the character sub-blocks in each subgroup, and obtains the rest of the character sub-blocks in each subgroup except the character sub-block corresponding to the target direct layer, the client device 200 can further The memory address of each group is processed, and the memory address of the character sub-block corresponding to the target through layer is removed from the memory address of each sub-group to obtain the memory addresses of some character sub-blocks in each sub-group.
Continuing the example in which the LBA of the target data indicates the position 1MB+520KB and the data length is 256KB: if each subgroup contains P character sub-blocks and the character sub-block corresponding to the target pass-through layer is the last one in each subgroup, the memory addresses of the partial character sub-blocks in the 32 subgroups may be 32 address segments. The 32 address segments start at the start address of the global index offset by two character blocks plus one subgroup length. Each address segment is P-1 character sub-blocks long, and consecutive address segments are separated by the length of one character sub-block.
If the global index takes the form of a bitmap, the memory addresses of the partial character sub-blocks in the 32 subgroups start at an offset of 32*N+N bits from the start address of the global index and form 32 address segments in total; each address segment is P-1 bits long, and consecutive address segments are separated by 1 bit.
If the global index takes the form of a counter group and each counter occupies M bits, one subgroup is N*M bits in size. A character block is a group of counters, and that group is 32*N*M bits in size. The memory addresses of the partial character sub-blocks in the 32 subgroups start at an offset of 32*N*M+N*M bits from the start address of the global index; each address segment is (P-1)*M bits long, and consecutive address segments are separated by M bits.
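The address segments described above can be enumerated with a short sketch. It assumes, as in the example, that the entry for the target pass-through layer is the last one in every subgroup, so each segment covers the first P-1 entries. The function name `partial_segments` and its parameters are hypothetical.

```python
def partial_segments(start_bits, subgroup_count, entries_per_subgroup,
                     bits_per_entry=1):
    """List (offset_bits, length_bits) for the partial sub-blocks of each subgroup.

    Assumes the skipped entry (target pass-through layer) is the last entry
    of every subgroup. bits_per_entry is 1 for a bitmap, M for counters.
    """
    segment_bits = (entries_per_subgroup - 1) * bits_per_entry
    stride_bits = entries_per_subgroup * bits_per_entry
    return [(start_bits + i * stride_bits, segment_bits)
            for i in range(subgroup_count)]


# Bitmap form with P = 4: segments of 3 bits separated by a 1-bit gap.
print(partial_segments(0, 3, 4))      # [(0, 3), (4, 3), (8, 3)]
# Counter form with P = 3, M = 8: segments of 16 bits separated by 8 bits.
print(partial_segments(12, 2, 3, 8))  # [(12, 16), (36, 16)]
```

Reading only these segments trades a smaller transfer for a scattered access pattern; whether that is worthwhile depends on the transport and is not settled by the text.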
After determining the memory addresses of the subgroups, the client device 200 may initiate a second one-sided RDMA operation to obtain the subgroups from the storage device 110. The client device 200 may send, over RDMA, a second request to the network card 114 of the storage device 110. The second request asks for the subgroups in the global index and carries the memory addresses of the subgroups.
After receiving the second request, the network card 114 of the storage device 110 may read the subgroups according to their memory addresses, carry the subgroups in a second response, and send the second response to the client device 200.
Alternatively, after determining the memory addresses of the partial character sub-blocks in the subgroups, the client device 200 may initiate the second one-sided RDMA operation to obtain the partial character sub-blocks from the storage device 110. The client device 200 may send, over RDMA, a third request to the network card 114 of the storage device 110. The third request asks for the partial character sub-blocks of the subgroups in the global index and carries the memory addresses of those partial character sub-blocks.
After receiving the third request, the network card 114 of the storage device 110 may read the partial character sub-blocks according to their memory addresses, carry them in a third response, and send the third response to the client device 200.
Step 604: The client device 200 checks the validity of the metadata of the target data in the target pass-through layer according to the specific values of the subgroups or of the partial character sub-blocks in the subgroups.
The client device 200 may determine, according to the specific values of the subgroups, whether the layer holding the target data is the target pass-through layer.
When the global index takes the form of a bitmap, the client device 200 may first determine whether any bit in the subgroups other than the bit corresponding to the target pass-through layer is 1, and whether the layer corresponding to a bit whose value is 1 lies above the target pass-through layer. If, among the bits other than the bit corresponding to the target pass-through layer, some bit is 1 and its layer lies above the target pass-through layer, the data most recently written at the logical address is in the layer corresponding to that bit, and the metadata of the target data in the target pass-through layer is invalid.
If the subgroups contain a bit whose value is 1 and whose layer is the target pass-through layer (optionally together with another bit whose value is 1 and whose layer lies below the target pass-through layer), the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
If none of the bits in the subgroups other than the bit corresponding to the target pass-through layer is 1, the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
When the global index takes the form of a counter group, the client device 200 may first determine whether any counter in the subgroups other than the counter corresponding to the target pass-through layer is non-zero, and whether the layer corresponding to a non-zero counter lies above the target pass-through layer. If, among the counters other than the counter corresponding to the target pass-through layer, some counter is non-zero and its layer lies above the target pass-through layer, the data most recently written at the logical address is in the layer corresponding to that non-zero counter, and the metadata of the target data in the target pass-through layer is invalid.
If the subgroups contain a non-zero counter whose layer is the target pass-through layer (optionally together with another non-zero counter whose layer lies below the target pass-through layer), the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
If none of the counters in the subgroups other than the counter corresponding to the target pass-through layer is non-zero, the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
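The three cases above reduce to one rule: the metadata is invalid exactly when some entry for a layer above the target pass-through layer is non-zero. A minimal sketch of that rule follows. It assumes each subgroup is a list ordered from the topmost layer (index 0) downward, which is an illustrative convention and not stated in the text; the name `metadata_valid` is hypothetical. The same check works for bits and for counters because both treat 0 as "no data in this layer".

```python
def metadata_valid(subgroups, target_layer):
    """Return True if no layer above target_layer holds the target data.

    subgroups: list of subgroups, each a list of bits or counters ordered
    from the topmost I/O-stack layer (index 0) downward (an assumption).
    target_layer: index of the target pass-through layer in each subgroup.
    """
    for subgroup in subgroups:
        # Entries before target_layer belong to layers above it.
        if any(value != 0 for value in subgroup[:target_layer]):
            return False
    return True


# Bitmap form, layers: [write cache, read cache, lower layer], target = read cache.
print(metadata_valid([[0, 1, 0], [0, 1, 1]], target_layer=1))  # True
print(metadata_valid([[0, 1, 0], [1, 1, 0]], target_layer=1))  # False
# Counter form, target = lowest layer.
print(metadata_valid([[0, 0, 5]], target_layer=2))             # True
```

Entries at or below the target layer are ignored on purpose, since a non-zero value there does not indicate a newer copy higher in the stack.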
The client device 200 may also determine, according to the specific values of the partial character sub-blocks in the subgroups, whether the layer holding the target data is the target pass-through layer.
When the global index takes the form of a bitmap, the client device 200 may first determine whether any of the partial bits (that is, the partial character sub-blocks) in the subgroups is 1, and whether the layer corresponding to a bit whose value is 1 lies above the target pass-through layer. If some partial bit is 1 and its layer lies above the target pass-through layer, the data most recently written at the logical address is in the layer corresponding to that bit, and the metadata of the target data in the target pass-through layer is invalid.
If some partial bits in the subgroups are 1 and the layers corresponding to those bits lie below the target pass-through layer, the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
If none of the partial bits in the subgroups is 1, the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
When the global index takes the form of a counter group, the client device 200 may first determine whether any of the partial counters (that is, the partial character sub-blocks) in the subgroups is non-zero, and whether the layer corresponding to a non-zero counter lies above the target pass-through layer. If some partial counter is non-zero and its layer lies above the target pass-through layer, the data most recently written at the logical address is in the layer corresponding to that non-zero counter, and the metadata of the target data in the target pass-through layer is invalid.
If some partial counters in the subgroups are non-zero and the layers corresponding to those counters lie below the target pass-through layer, the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
If none of the partial counters in the subgroups is non-zero, the data most recently written at the logical address is in the target pass-through layer, and the metadata of the target data in the target pass-through layer is valid.
Step 605: If the metadata of the target data in the target pass-through layer is valid, the client device 200 uses that metadata to obtain the target data from the storage device 110 through one-sided RDMA.
That is, the client device 200 may send, over RDMA, a fourth request to the network card 114 of the storage device 110, the fourth request carrying the metadata of the target data in the target pass-through layer. After receiving the fourth request, the network card 114 of the storage device 110 may determine the physical address of the target data according to that metadata, read the target data, and return the target data to the client device 200 in a fourth response.
If the metadata of the target data in the target pass-through layer is invalid, the client device 200 may obtain the target data from the storage device 110 using the embodiment shown in FIG. 4 or FIG. 5. The client device 200 may also obtain the target data from the storage device 110 using two-sided RDMA.
FIG. 5 and FIG. 6 respectively show the data access method provided in this embodiment being executed by the processor 112 and by the network card. In this embodiment, the method may also be executed by a chip other than the processor 112 or the network card. For example, the chip may be a data processing unit (DPU).
To aid understanding of the embodiment shown in FIG. 6, a specific implementation of a data access method in scenario three is described below. Referring to FIG. 7, the I/O stack in the storage system 100 includes at least two layers, a write cache and a read cache, and the global index in the storage device 110 may be stored in memory. Each subgroup of the global index includes two character sub-blocks: the first corresponds to the write cache and the second corresponds to the read cache. The global index takes two forms in memory, a bitmap and a counter group.
Step 701: The storage device 110 notifies the client device 200 of the memory address of the bitmap.
Because the bitmap occupies relatively little space, when the client device 200 later needs to read some subgroups or some bits of the bitmap from the storage device 110, it only has to read a small region, which effectively improves the read efficiency of the global index.
Step 702: The client device 200 may initiate a first one-sided RDMA operation to the storage device 110. The first one-sided RDMA operation is used to read the metadata of the target data in the read cache of the I/O stack of the storage system 100.
Step 703: The client device 200 may initiate a second one-sided RDMA operation to the storage device 110. The second one-sided RDMA operation is used to obtain from the storage device 110 the first bit of each subgroup in the bitmap that points to the target logical sub-block.
Step 704: The client device 200 checks the validity of the metadata of the target data in the read cache according to the specific value of the first bit in each subgroup.
The client device 200 first determines whether the first bit in each subgroup is 1. If the first bit is 1, the target data is in the write cache, and the metadata of the target data in the read cache is invalid.
If the first bit in each subgroup is 0, the data most recently written at the logical address is in the read cache, and the metadata of the target data in the read cache (the target pass-through layer in this example) is valid.
Step 705: If the metadata of the target data in the read cache is valid, the client device 200 uses that metadata to obtain the target data from the storage device 110 through one-sided RDMA. If the metadata of the target data in the read cache is invalid, the client device 200 may obtain the target data from the storage device 110 using the embodiment shown in FIG. 4 or FIG. 5. The client device 200 may also obtain the target data from the storage device 110 using two-sided RDMA.
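Steps 702 to 705 can be summarized as a small control-flow sketch. The four callables passed in are hypothetical stand-ins for the RDMA operations described in the text (they are not a real RDMA API), and the function name `read_via_read_cache` is likewise an assumption made for illustration.

```python
def read_via_read_cache(read_metadata, read_first_bits,
                        read_one_sided, read_fallback):
    """Client-side read flow for the two-layer example (write cache, read cache).

    read_metadata():    stand-in for step 702, returns read-cache metadata.
    read_first_bits():  stand-in for step 703, returns the first bit of each
                        subgroup pointing to the target logical sub-blocks.
    read_one_sided(m):  stand-in for the one-sided read in step 705.
    read_fallback():    stand-in for the FIG. 4/5 path or two-sided RDMA.
    """
    metadata = read_metadata()
    first_bits = read_first_bits()
    # Step 704: any first bit set to 1 means a newer copy is in the write cache.
    if all(bit == 0 for bit in first_bits):
        return read_one_sided(metadata)
    return read_fallback()


result = read_via_read_cache(
    read_metadata=lambda: {"addr": 0x1000, "len": 8192},
    read_first_bits=lambda: [0, 0, 0],
    read_one_sided=lambda meta: ("one-sided", meta["addr"]),
    read_fallback=lambda: ("fallback", None),
)
print(result)  # ('one-sided', 4096)
```

Passing the operations in as callables keeps the sketch independent of any particular transport library.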
Based on the same inventive concept as the method embodiments, an embodiment of the present application further provides a data access apparatus, configured to perform the method performed by the processor or the network card in the method embodiments shown in FIG. 4, FIG. 5, FIG. 6, and FIG. 7. For related features, refer to the method embodiments above; details are not repeated here. As shown in FIG. 8, the data access apparatus 800 includes a transmission module 801 and a reading module 802:
The transmission module 801 is configured to receive a data read request, where the data read request is used to request reading of target data stored in the storage apparatus. The transmission module 801 may perform step 405 shown in FIG. 4 or step 505 shown in FIG. 5.
The reading module 802 is configured to query the global index based on the data read request, where the global index indicates the storage layer of the I/O stack in which the target data is located, and to read the target data according to the storage layer indicated by the global index. The reading module 802 may perform steps 406 to 408 shown in FIG. 4 or steps 506 to 508 shown in FIG. 5.
In a possible implementation, the data access apparatus 800 further includes a writing module 803.
The transmission module 801 may receive a data write request, where the data write request is used to request writing of target data into the storage system. The transmission module 801 may perform step 401 shown in FIG. 4 or step 501 shown in FIG. 5.
The writing module 803 may update the global index according to the target data that the data write request asks to write, where the updated global index indicates the storage layer of the I/O stack in which the target data is located. The writing module 803 may perform steps 402 to 404 shown in FIG. 4 or steps 502 to 504 shown in FIG. 5.
In a possible implementation, the data read request includes the logical address of the target data. When querying the global index based on the data read request, the reading module 802 may determine, in the global index and according to the logical address of the target data, multiple character sub-blocks that point to the logical address of the target data, and then determine the storage layer of the I/O stack in which the target data is located according to the character sub-blocks with non-zero values among them.
In a possible implementation, a character sub-block is one bit whose value is 0 or 1: 1 indicates that the target data is located in the storage layer corresponding to the character sub-block, and 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block.
In a possible implementation, a character sub-block is one counter whose value is 0 or a non-zero integer. A non-zero integer indicates that the target data is located in the storage layer corresponding to the character sub-block, and also indicates the number of times data, including the target data, has been written to the storage layer corresponding to the character sub-block.
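The counter form can be sketched as a small class. Incrementing on write follows the description above. Decrementing on eviction is an assumption added here for illustration, since the text only says that a non-zero counter records how many times data was written to the layer. The class name `CounterSubgroup` and its methods are hypothetical.

```python
class CounterSubgroup:
    """One subgroup of the global index in counter form (illustrative only)."""

    def __init__(self, layer_count):
        # One counter per storage layer of the I/O stack.
        self.counters = [0] * layer_count

    def record_write(self, layer):
        # Each write of data into the layer increments that layer's counter.
        self.counters[layer] += 1

    def record_eviction(self, layer):
        # Assumption: eviction from a layer decrements its counter, never below 0.
        if self.counters[layer] > 0:
            self.counters[layer] -= 1

    def layers_holding_data(self):
        # A non-zero counter indicates the layer holds data for this subgroup.
        return [i for i, count in enumerate(self.counters) if count != 0]


subgroup = CounterSubgroup(layer_count=2)
subgroup.record_write(0)
subgroup.record_write(0)
subgroup.record_write(1)
subgroup.record_eviction(0)
print(subgroup.counters)               # [1, 1]
print(subgroup.layers_holding_data())  # [0, 1]
```

Compared with a single bit, a counter can stay non-zero while several logical addresses that share the sub-block are written and later removed one at a time.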
In a possible implementation, when determining in the global index the multiple character sub-blocks that point to the logical address of the target data, the reading module 802 may determine them according to the result of a hash operation on the logical address of the target data, where the hash operation is a lookup in a hash table or the application of a hash function.
In a possible implementation, when determining the multiple character sub-blocks according to the result of the hash operation on the logical address of the target data, the reading module 802 may first determine, according to the result, the character block in the global index that points to the logical block to which the logical address of the target data belongs, and then determine, from that character block and according to the logical address of the target data, the multiple character sub-blocks that point to the logical address of the target data.
In a possible implementation, the character block includes multiple subgroups, each subgroup corresponds to one logical sub-block of the logical block, and each subgroup includes multiple character sub-blocks. The reading module 802 may determine a target subgroup among the subgroups of the character block according to the offset between the logical address of the target data and the logical block to which that logical address belongs; the character sub-blocks in the target subgroup are the multiple character sub-blocks that point to the logical address of the target data.
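The two-stage lookup described above (hash to a character block, then offset to a subgroup) can be sketched as follows. The block and sub-block sizes and the modulo hash are placeholder assumptions chosen for illustration; the text does not fix any of them, and `locate_subgroup` is a hypothetical name.

```python
KB = 1024


def locate_subgroup(lba, logical_block_size, sub_block_size, hash_block):
    """Return (character_block_index, subgroup_index) for a logical address.

    hash_block maps a logical block number to a character block index; it
    stands in for the hash-table lookup or hash function named in the text.
    """
    logical_block = lba // logical_block_size
    offset_in_block = lba % logical_block_size
    character_block = hash_block(logical_block)
    subgroup = offset_in_block // sub_block_size
    return character_block, subgroup


# Assumed sizes: 256KB logical blocks split into 8KB logical sub-blocks,
# and a toy hash that folds block numbers onto 4 character blocks.
position = locate_subgroup(
    lba=6 * 256 * KB + 8 * KB,
    logical_block_size=256 * KB,
    sub_block_size=8 * KB,
    hash_block=lambda block_no: block_no % 4,
)
print(position)  # (2, 1)
```

A read that spans several logical sub-blocks would repeat the second stage for each sub-block it covers.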
In a possible implementation, the global index and the metadata of the target data are located in the memory of a device in the storage apparatus. The transmission module 801 may return the global index and the metadata of the target data to the client device under a first instruction from the client device, where the first instruction is transmitted over RDMA. The first instruction may be the first request and the second request in the embodiment shown in FIG. 6, or the first request and the third request in the embodiment shown in FIG. 6.
In a possible implementation, the global index is located in the memory of a device in the storage apparatus, and the metadata of the target data is located in the persistent storage of a device in the storage apparatus. The transmission module 801 may return the global index to the client device under a second instruction from the client device, where the second instruction is transmitted over RDMA, and may, under a third instruction from the client device, obtain the metadata of the target data from the persistent storage and return it to the client device.
In a possible implementation, the target data is located in the memory of a device in the storage apparatus. The transmission module 801 may return the target data to the client device under a fourth instruction from the client device, where the fourth instruction is initiated according to the metadata of the target data and transmitted over RDMA. The fourth instruction may be the fourth request in the embodiment shown in FIG. 6.
In a possible implementation, the target data is located in the persistent storage of a device in the storage apparatus. The transmission module 801 may, under a fifth instruction from the client device, obtain the target data from the persistent storage and return it to the client device, where the fifth instruction is initiated according to the metadata of the target data.
In a possible implementation, the data access apparatus 800 further includes a control module 804. The control module 804 may control data movement and data eviction in the I/O stack, and update the global index according to that data movement and data eviction.
It should be noted that the division into modules in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation. The functional modules in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the present application. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and a combination of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a An apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instructions The device realizes the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and variations to the present application without departing from its scope. Thus, if these changes and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to cover them.

Claims (33)

  1. A storage system, characterized in that the storage system comprises a plurality of storage layers and a processing unit, wherein the storage layers have different data read latencies, and
    the processing unit is configured to:
    receive a data read request that requests reading of target data stored in the storage system;
    query a global index based on the data read request, wherein a first index item in the global index indicates the storage layer, among the plurality of storage layers, in which the target data is located; and
    read the target data according to the storage layer indicated by the first index item.
  2. The storage system according to claim 1, characterized in that the processing unit is further configured to:
    receive a data write request that requests writing of the target data into the storage system; and
    record the first index item in the global index according to the data write request.
  3. The storage system according to claim 1 or 2, characterized in that the data read request comprises a logical address of the target data, and when querying the global index based on the data read request, the processing unit is specifically configured to:
    determine, in the global index according to the logical address, a plurality of character sub-blocks pointing to the logical address of the target data, the plurality of character sub-blocks belonging to the first index item; and
    determine, according to the values of the plurality of character sub-blocks, the storage layer in which the target data is located among the plurality of storage layers.
  4. The storage system according to claim 3, characterized in that each character sub-block describes whether data is stored in one of the plurality of storage layers, one character sub-block is one bit, and the value of the bit is 0 or 1, wherein 1 indicates that the target data is located in the storage layer corresponding to the character sub-block, and 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block.
  5. The storage system according to claim 3, characterized in that each character sub-block describes whether data is stored in one of the plurality of storage layers, one character sub-block is one counter, and the value of the counter is 0 or a non-zero integer, wherein 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block, the non-zero integer indicates that the target data is located in the storage layer corresponding to the character sub-block, the non-zero integer further indicates the number of times data has been written into the storage layer corresponding to the character sub-block, and the data includes the target data.
  6. The storage system according to any one of claims 3 to 5, characterized in that, when determining, in the global index according to the logical address of the target data, the plurality of character sub-blocks pointing to the logical address of the target data, the processing unit is specifically configured to:
    determine the plurality of character sub-blocks pointing to the logical address of the target data according to a result of a hash operation on the logical address of the target data, the hash operation being a lookup in a hash table or an application of a hash function.
  7. The storage system according to claim 6, characterized in that, when determining the plurality of character sub-blocks pointing to the logical address of the target data according to the result of the hash operation on the logical address of the target data, the processing unit is specifically configured to:
    determine, according to the result, a character block in the global index that points to the logical block to which the logical address of the target data belongs; and
    determine, from the character block according to the logical address of the target data, the plurality of character sub-blocks pointing to the logical address of the target data.
  8. The storage system according to any one of claims 1 to 7, characterized in that the processing unit is a data processing unit (DPU).
  9. The storage system according to any one of claims 1 to 8, characterized in that the processing unit is located in a network interface card of the storage system or in a central processing unit.
  10. The storage system according to any one of claims 1 to 9, characterized in that the processing unit is further configured to control data flow and data eviction in the plurality of storage layers, and to update the global index according to the data flow and data eviction in the storage layers.
  11. The storage system according to any one of claims 1 to 10, characterized in that the plurality of storage layers comprise a performance layer and a capacity layer, the performance layer comprises one or more of a write cache, a read cache, and a hard disk cache, and the capacity layer comprises one or more of a solid-state drive and a mechanical hard disk drive.
  12. A data access method, characterized in that the method is applied to a storage system, the storage system comprises a plurality of storage layers and a processing unit, the storage layers have different data read latencies, and the method comprises:
    receiving, by the processing unit, a data read request to read target data stored in the storage system;
    querying, by the processing unit, a global index based on the data read request, wherein a first index item in the global index indicates the storage layer, among the plurality of storage layers, in which the target data is located; and
    reading, by the processing unit, the target data according to the storage layer indicated by the first index item.
  13. The method according to claim 12, characterized in that the method further comprises:
    receiving, by the processing unit, a data write request that requests writing of the target data into the storage system; and
    recording, by the processing unit, the first index item in the global index according to the data write request.
  14. The method according to claim 12 or 13, characterized in that the data read request comprises a logical address of the target data, and the querying, by the processing unit, of the global index based on the data read request comprises:
    determining, by the processing unit in the global index according to the logical address, a plurality of character sub-blocks pointing to the logical address of the target data, the plurality of character sub-blocks belonging to the first index item; and
    determining, by the processing unit according to the values of the plurality of character sub-blocks, the storage layer in which the target data is located among the plurality of storage layers.
  15. The method according to claim 14, characterized in that each character sub-block describes whether data is stored in one of the plurality of storage layers, one character sub-block is one bit, and the value of the bit is 0 or 1, wherein 1 indicates that the target data is located in the storage layer corresponding to the character sub-block, and 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block.
  16. The method according to claim 14, characterized in that each character sub-block describes whether data is stored in one of the plurality of storage layers, one character sub-block is one counter, and the value of the counter is 0 or a non-zero integer, wherein 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block, the non-zero integer indicates that the target data is located in the storage layer corresponding to the character sub-block, the non-zero integer further indicates the number of times data has been written into the storage layer corresponding to the character sub-block, and the data includes the target data.
  17. The method according to any one of claims 14 to 16, characterized in that the determining, by the processing unit in the global index according to the logical address of the target data, of the plurality of character sub-blocks pointing to the logical address of the target data comprises:
    determining, by the processing unit, the plurality of character sub-blocks pointing to the logical address of the target data according to a result of a hash operation on the logical address of the target data, the hash operation being a lookup in a hash table or an application of a hash function.
  18. The method according to claim 17, characterized in that the determining, by the processing unit, of the plurality of character sub-blocks pointing to the logical address of the target data according to the result of the hash operation on the logical address of the target data comprises:
    determining, by the processing unit according to the result, a character block in the global index that points to the logical block to which the logical address of the target data belongs; and
    determining, by the processing unit from the character block according to the logical address of the target data, the plurality of character sub-blocks pointing to the logical address of the target data.
  19. The method according to any one of claims 12 to 18, characterized in that the method further comprises:
    controlling, by the processing unit, data flow and data eviction in the plurality of storage layers; and
    updating, by the processing unit, the global index according to the data flow and data eviction in the plurality of storage layers.
  20. A network interface card, characterized in that the network interface card is located in the storage system and is configured to perform the method according to any one of claims 12 to 19.
  21. A processor, characterized in that the processor is located in the storage system and is configured to perform the method according to any one of claims 12 to 19.
  22. A data access apparatus, characterized in that the data processing apparatus is located in a storage system, the storage system further comprises a plurality of storage layers, the storage layers have different data read latencies, and the data access apparatus comprises a transmission module and a reading module;
    the transmission module is configured to receive a data read request to read target data stored in the storage system; and
    the reading module is configured to: query a global index based on the data read request, wherein a first index item in the global index indicates the storage layer, among the plurality of storage layers, in which the target data is located; and read the target data according to the storage layer indicated by the first index item.
  23. The apparatus according to claim 22, characterized in that the data access apparatus further comprises a writing module;
    the transmission module is further configured to receive a data write request that requests writing of the target data into the storage system; and
    the writing module is configured to record the first index item in the global index according to the data write request.
  24. The apparatus according to claim 22 or 23, characterized in that the data read request comprises a logical address of the target data, and when querying the global index based on the data read request, the reading module is specifically configured to:
    determine, in the global index according to the logical address, a plurality of character sub-blocks pointing to the logical address of the target data, the plurality of character sub-blocks belonging to the first index item; and determine, according to the values of the plurality of character sub-blocks, the storage layer in which the target data is located among the plurality of storage layers.
  25. The apparatus according to claim 24, characterized in that each character sub-block describes whether data is stored in one of the plurality of storage layers, one character sub-block is one bit, and the value of the bit is 0 or 1, wherein 1 indicates that the target data is located in the storage layer corresponding to the character sub-block, and 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block.
  26. The apparatus according to claim 24, characterized in that each character sub-block describes whether data is stored in one of the plurality of storage layers, one character sub-block is one counter, and the value of the counter is 0 or a non-zero integer, wherein 0 indicates that the target data is not located in the storage layer corresponding to the character sub-block, the non-zero integer indicates that the target data is located in the storage layer corresponding to the character sub-block, the non-zero integer further indicates the number of times data has been written into the storage layer corresponding to the character sub-block, and the data includes the target data.
  27. The apparatus according to any one of claims 24 to 26, characterized in that, when determining, in the global index according to the logical address of the target data, the plurality of character sub-blocks pointing to the logical address of the target data, the reading module is specifically configured to:
    determine the plurality of character sub-blocks pointing to the logical address of the target data according to a result of a hash operation on the logical address of the target data, the hash operation being a lookup in a hash table or an application of a hash function.
  28. A data access system, characterized in that the data access system comprises the storage system according to any one of claims 1 to 11 and a client device;
    the client device is configured to send a data read request to the storage system to read target data stored in the storage system.
  29. The system according to claim 28, characterized in that the global index and the metadata of the target data are located in a memory of the storage system;
    the client device is further configured to initiate a first indication to the storage system based on remote direct memory access (RDMA), the first indication requesting the global index and the metadata of the target data; and
    the storage system is further configured to send the global index and the metadata of the target data to the client device based on the first indication.
  30. The system according to claim 28, characterized in that the global index is located in a memory of the storage system, and the metadata of the target data is located in a persistent storage of the storage system;
    the client device is further configured to: initiate a second indication to the storage system based on remote direct memory access (RDMA), the second indication requesting the global index; and initiate a third indication to the storage system, the third indication requesting the metadata of the target data; and
    the storage system is further configured to: send the global index to the client device based on the second indication; and, based on the third indication, obtain the metadata of the target data from the persistent storage and send the metadata of the target data to the client device.
  31. The system according to claim 29 or 30, characterized in that:
    the client device is further configured to: check, according to the global index, whether the metadata of the target data is valid; and, when the metadata of the target data is determined to be valid, initiate the data read request to the storage system according to the metadata of the target data; and
    the storage system is further configured to send the target data to the client device based on the data read request.
  32. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 12 to 19.
  33. A computer program product containing instructions, characterized in that, when run on a computer, the computer program product causes the computer to perform the method according to any one of claims 12 to 19.
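
As an illustration only, the index lookup, index recording, and layer migration recited in claims 1 to 7 and 10 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the layer names, the index dimensions, the use of SHA-256 as the hash function, and the class and method names (`GlobalIndex`, `StorageSystem`, `candidate_tiers`, `move`) are all hypothetical. Each sub-block is modeled as a counter per layer, and, as a simplification of claim 5, a counter is incremented once per resident address rather than once per write. Because different addresses may hash to the same sub-blocks, a non-zero counter is treated as a hint that is confirmed against the layer itself.

```python
import hashlib
from typing import Dict, List, Optional

# Hypothetical storage layers, ordered from lowest to highest read latency.
TIERS: List[str] = ["write_cache", "read_cache", "ssd", "hdd"]


class GlobalIndex:
    """Counter-per-layer index; sizes and hash choice are assumptions."""

    def __init__(self, num_blocks: int = 1024, slots_per_block: int = 8) -> None:
        self.num_blocks = num_blocks
        self.slots_per_block = slots_per_block
        # counters[character block][slot] -> one counter (sub-block) per layer
        self.counters = [
            [[0] * len(TIERS) for _ in range(slots_per_block)]
            for _ in range(num_blocks)
        ]

    def _sub_blocks(self, lba: int) -> List[int]:
        # Hash the logical block number to pick a character block, then use
        # the offset inside the logical block to pick the sub-blocks.
        logical_block, slot = divmod(lba, self.slots_per_block)
        digest = hashlib.sha256(logical_block.to_bytes(8, "little")).digest()
        block = int.from_bytes(digest[:8], "little") % self.num_blocks
        return self.counters[block][slot]

    def add(self, lba: int, tier: str) -> None:
        self._sub_blocks(lba)[TIERS.index(tier)] += 1

    def remove(self, lba: int, tier: str) -> None:
        subs = self._sub_blocks(lba)
        i = TIERS.index(tier)
        subs[i] = max(0, subs[i] - 1)

    def candidate_tiers(self, lba: int) -> List[str]:
        # A non-zero counter marks a layer that may hold the address.
        return [t for t, c in zip(TIERS, self._sub_blocks(lba)) if c > 0]


class StorageSystem:
    def __init__(self) -> None:
        self.index = GlobalIndex()
        self.tiers: Dict[str, Dict[int, bytes]] = {t: {} for t in TIERS}

    def write(self, lba: int, data: bytes, tier: str = "write_cache") -> None:
        if lba not in self.tiers[tier]:  # count each resident address once
            self.index.add(lba, tier)
        self.tiers[tier][lba] = data

    def move(self, lba: int, src: str, dst: str) -> None:
        # Data flow between layers, with the index updated to match.
        data = self.tiers[src].pop(lba)
        self.index.remove(lba, src)
        self.write(lba, data, dst)

    def read(self, lba: int) -> Optional[bytes]:
        # Only layers flagged by the index are probed; hash collisions can
        # flag a layer wrongly, so each hit is confirmed against the layer.
        for tier in self.index.candidate_tiers(lba):
            if lba in self.tiers[tier]:
                return self.tiers[tier][lba]
        return None
```

In this sketch, after `s.write(42, b"hello")` on a fresh `StorageSystem`, the index lists only the write cache for address 42; after `s.move(42, "write_cache", "ssd")` it lists only the SSD layer, and `s.read(42)` returns the data without probing the other layers.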
PCT/CN2022/092015 2021-06-07 2022-05-10 Storage system, network interface card, processor, and data access method, apparatus, and system WO2022257685A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110634011.5 2021-06-07
CN202110634011 2021-06-07
CN202110944933.6A CN115509437A (en) 2021-06-07 2021-08-17 Storage system, network card, processor, data access method, device and system
CN202110944933.6 2021-08-17

Publications (1)

Publication Number Publication Date
WO2022257685A1 (en) 2022-12-15

Family

ID=84424578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092015 WO2022257685A1 (en) 2021-06-07 2022-05-10 Storage system, network interface card, processor, and data access method, apparatus, and system

Country Status (1)

Country Link
WO (1) WO2022257685A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573043A (en) * 2024-01-17 2024-02-20 济南浪潮数据技术有限公司 Transmission method, device, system, equipment and medium for distributed storage data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090055351A1 (en) * 2007-08-24 2009-02-26 Microsoft Corporation Direct mass storage device file indexing
CN102999519A (en) * 2011-09-15 2013-03-27 上海盛付通电子商务有限公司 Read-write method and system for database
CN104111898A (en) * 2014-05-26 2014-10-22 中国能源建设集团广东省电力设计研究院 Hybrid storage system based on multidimensional data similarity and data management method
CN111399764A (en) * 2019-12-25 2020-07-10 杭州海康威视系统技术有限公司 Data storage method, data reading device, data storage equipment and data storage medium

Similar Documents

Publication Publication Date Title
US11687446B2 (en) Namespace change propagation in non-volatile memory devices
US20230315290A1 (en) Namespaces allocation in non-volatile memory devices
US20210191615A1 (en) Methods, systems and devices relating to data storage interfaces for managing data address spaces in data storage devices
US10885005B2 (en) Disk optimized paging for column oriented databases
TWI778157B (en) Ssd, distributed data storage system and method for leveraging key-value storage
US9311252B2 (en) Hierarchical storage for LSM-based NoSQL stores
CN114860163B (en) Storage system, memory management method and management node
US20150262632A1 (en) Grouping storage ports based on distance
WO2023185770A1 (en) Cloud data caching method and apparatus, device and storage medium
CN111158602A (en) Data layered storage method, data reading method, storage host and storage system
US11010091B2 (en) Multi-tier storage
US11366609B2 (en) Technique for encoding deferred reference count increments and decrements
CN115509437A (en) Storage system, network card, processor, data access method, device and system
WO2022257685A1 (en) Storage system, network interface card, processor, and data access method, apparatus, and system
US11586353B2 (en) Optimized access to high-speed storage device
CN108984432B (en) Method and device for processing IO (input/output) request
US11150827B2 (en) Storage system and duplicate data management method
JP2019074912A (en) Storage system and control method
WO2023029417A1 (en) Data storage method and device
JP7310110B2 (en) storage and information processing systems;
WO2021017647A1 (en) Method and apparatus for merging data units
CN117917649A (en) Data processing method, device, chip and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22819285; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 22819285; Country of ref document: EP; Kind code of ref document: A1