CN115270033A - Data access system, method, equipment and network card - Google Patents

Data access system, method, equipment and network card

Info

Publication number
CN115270033A
CN115270033A
Authority
CN
China
Prior art keywords
data
network card
target data
storage device
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110697375.8A
Other languages
Chinese (zh)
Inventor
陈灿
蒋凡璐
徐启明
韩兆皎
余博伟
姚建业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to EP22787387.4A priority Critical patent/EP4318251A1/en
Priority to PCT/CN2022/084322 priority patent/WO2022218160A1/en
Publication of CN115270033A publication Critical patent/CN115270033A/en
Priority to US18/485,942 priority patent/US20240039995A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation

Abstract

In this application, a client device sends a first message to a first storage device to request that target data be written, where the first message includes the logical address to which the target data needs to be written. The front-end network card of the first storage device can parse and process the first message: it writes the target data into the first storage device as the first message indicates, generates metadata indicating the physical address at which the target data is stored in the first storage device, and records the correspondence between the logical address of the target data and the metadata. Throughout the data access process, the processor of the first storage device does not need to participate; the front-end network card of the first storage device performs the data access operations instead, which reduces the occupation of the processor's computing resources to a certain extent and improves the access efficiency of the storage device.

Description

Data access system, method, equipment and network card
Cross Reference to Related Applications
This application claims priority to Chinese patent application No. 202110399947.4, entitled "A Method and Apparatus for Data Access", filed with the China National Intellectual Property Administration on April 14, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of storage technologies, and in particular, to a data access system, method, device, and network card.
Background
Currently, a client sends a data access request to a server according to a certain network protocol to read or write data. Take the case where the data access request asks to read data A: the processor of the server needs to retrieve data A, cache it in the server's memory, package data A into a data access response, and feed the data access response back to the client.
When processing a data access request, the processor at the server side needs to copy and package data A. The server side therefore occupies a large amount of the processor's computing resources while processing data access requests, and the read-write efficiency of the entire server-side storage system suffers.
Disclosure of Invention
This application provides a data access system, method, device, and network card, which are used to reduce the consumption of the server-side processor during data access.
In a first aspect, an embodiment of this application provides a data access system that includes a client device and a first storage device, where the first storage device is a device in a storage system. In this data access system, when target data needs to be written, the client device may send a first message to the first storage device to request that the target data be written to it, where the first message includes the logical address to which the target data needs to be written, that is, the logical address of the target data.
When the first message reaches the first storage device, its front-end network card receives the message first; the front-end network card can process the first message and write the target data into the first storage device. In the process of writing the target data, the front-end network card of the first storage device may further generate metadata and record the correspondence between the logical address of the target data and the metadata, where the metadata indicates the physical address at which the target data is stored in the first storage device.
After writing the target data, the front-end network card of the first storage device may further feed back a data access response (in this case, the data access response may be understood as a data write response) to the client device, indicating that the target data has been successfully written.
In the embodiments of this application, the first storage device may also be referred to as the first device for short; similarly, other devices in the storage system, such as the second storage device, may be referred to as the second device for short.
Through this system, in the data access process, in particular the data writing process, the processor of the first storage device does not need to participate; the front-end network card of the first storage device performs the whole data access operation, which includes writing the target data, generating the metadata, and creating or updating the index information (the index information indicates the correspondence between the metadata and the logical address of the target data). This reduces the consumption of the processor, effectively reduces the occupation of the processor's computing resources, and improves data write efficiency.
In a possible implementation manner, the first storage device further includes a memory, and the memory may store index information. When recording the corresponding relationship between the logical address of the target data and the metadata, the front-end network card of the first storage device may directly record the corresponding relationship between the logical address of the target data and the metadata in the index information, for example, record the corresponding relationship between the logical address of the target data and the metadata itself, or record the corresponding relationship between the logical address of the target data and the address of the metadata. The front-end network card of the first storage device may also record the corresponding relationship between the logical address of the target data and the metadata in the index information in an indirect manner. For example, the front-end network card of the first storage device may generate a key (key) from the logical address of the target data, use the metadata itself or the address of the metadata as a value (value), and record the correspondence between the logical address of the target data and the metadata by recording a key-value pair.
Through the system, the front-end network card of the first storage device can maintain the index information without a processor maintaining the index information, and further, the occupation of the processor is reduced.
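The indirect (key-value) recording described above can be sketched as follows. This is a minimal illustration only: the patent says a key is generated from the logical address and the metadata (or its address) is used as the value, but it does not prescribe a key derivation, so the hash below and all names are assumptions.

```python
import hashlib

class IndexTable:
    """Minimal sketch of the index information: maps a key derived
    from a logical address to the metadata (or the metadata's address)."""

    def __init__(self):
        self._entries = {}  # key -> value (metadata or metadata address)

    @staticmethod
    def make_key(logical_address):
        # Derive a fixed-size key from the logical address (assumed scheme;
        # the patent only says the key is generated from the address).
        return hashlib.sha256(logical_address.encode()).hexdigest()[:16]

    def record(self, logical_address, metadata):
        # Record the correspondence as a key-value pair.
        self._entries[self.make_key(logical_address)] = metadata

    def lookup(self, logical_address):
        # Return the recorded metadata, or None if the address is unknown.
        return self._entries.get(self.make_key(logical_address))

# Usage: after writing target data, record where it physically landed.
index = IndexTable()
index.record("LUN1/0x4000", {"physical_address": "disk2:0x9A00", "length": 4096})
assert index.lookup("LUN1/0x4000")["physical_address"] == "disk2:0x9A00"
```

The direct mode the patent also mentions would simply use the logical address itself as the dictionary key, skipping `make_key`.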
In a possible implementation, the front-end network card of the first storage device can parse messages from the client device. Taking the first message as an example: after receiving it, the front-end network card may parse the first message, remove its headers, and obtain a data write command based on the RDMA protocol. It may then continue parsing the data write command to obtain the logical address of the target data carried in the command.
By the method, the front-end network card of the first storage device can completely analyze the first message, can directly obtain the logical address of the target data carried in the first message, and can write the target data conveniently.
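The two-stage parse (message to RDMA write command to logical address) can be illustrated with a toy frame format. The layout below is entirely an assumption for demonstration; the real messages follow the IB/RoCE/iWARP wire formats, which are far richer.

```python
import struct

# Toy frame layout (assumed, not the patent's wire format):
#   2-byte opcode (1 = data write command), 2-byte address length,
#   the logical-address bytes, then the target-data payload.
OP_WRITE = 1

def parse_write_message(frame):
    """Strip the (toy) header and return (logical_address, payload),
    mimicking: parse message -> obtain write command -> obtain the
    logical address of the target data carried in the command."""
    opcode, addr_len = struct.unpack_from(">HH", frame, 0)
    if opcode != OP_WRITE:
        raise ValueError("not a data write command")
    logical_address = frame[4:4 + addr_len].decode()
    payload = frame[4 + addr_len:]
    return logical_address, payload

# Usage: build a frame as a client might, then parse it as the NIC would.
frame = struct.pack(">HH", OP_WRITE, 10) + b"LUN1/0x400" + b"hello"
addr, data = parse_write_message(frame)
assert addr == "LUN1/0x400" and data == b"hello"
```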
In one possible implementation, a message from the client device, such as the first message, may be based on any one of the following protocols: InfiniBand (IB), RoCE, or iWARP.
Through the system, the front-end network card of the first storage device can be suitable for analyzing messages transmitted by different protocols, and application scenes are effectively expanded.
In a possible implementation manner, the data access system further includes a second storage device, and the front-end network card of the first storage device may further instruct the second storage device to perform mirror storage, so as to store a copy of the target data in the second storage device. When the front-end network card of the first storage device indicates the second storage device to perform mirror image storage, a mirror image data write command may be sent to the second storage device, where the mirror image data write command is used to request to write a copy of target data, and the mirror image data write command includes the copy of the target data and a logical address of the target data.
Through the system, the front-end network card of the first storage device can store the copy of the target data in the second storage device by indicating the second storage device to perform mirror image storage, and even if the target data stored in the first storage device is damaged or lost, the copy of the target data is still stored in the second storage device, so that the reliability of the target data can be effectively ensured.
In a possible implementation, before instructing the second storage device to perform mirror storage, the front-end network card of the first storage device needs to determine the second storage device that has a mirror mapping relationship with the first storage device, where the mirror mapping relationship indicates that a correspondence exists between the first storage device and the second storage device. The mirror mapping relationship may be stored in the first storage device, or in a location from which the first storage device can acquire it.
Through the system, the two storage devices which are in mirror image with each other are defined through the mirror image mapping relation, so that the first storage device can conveniently and quickly determine the second storage device which has the mirror image mapping relation with the first storage device.
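The mirror mapping relationship amounts to a small lookup table pairing mirrored devices. A sketch, with device identifiers invented for illustration:

```python
# Assumed representation: each device maps to the device that mirrors it.
MIRROR_MAP = {
    "storage-dev-1": "storage-dev-2",
    "storage-dev-2": "storage-dev-1",
}

def mirror_of(device_id):
    """Return the device that has a mirror mapping relationship with
    the given device; the NIC consults this before sending the
    mirror data write command."""
    return MIRROR_MAP[device_id]

assert mirror_of("storage-dev-1") == "storage-dev-2"
```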
In a possible implementation, the first storage device and the second storage device may establish a connection through switching network cards. For example, the first storage device may include at least two switching network cards and connect to the second storage device through them. On the second storage device side, the second storage device may also include a switching network card, and connections may be established between the switching network cards of the first storage device and the switching network card of the second storage device. When the front-end network card of the first storage device sends a mirror data write command to the second storage device, it may select a switching network card from the at least two switching network cards in the first storage device based on a load balancing policy, and then send the mirror data write command to the network card of the second storage device through the selected switching network card.
Through this system, the first storage device includes at least two switching network cards, so when the first storage device and the second storage device interact, information can be sent through different switching network cards. The information that needs to be exchanged between the two devices is thus distributed across different switching network cards, which effectively improves data interaction efficiency. In addition, because a load balancing policy is used when selecting a switching network card, the amount of information distributed to each switching network card of the first storage device is essentially the same, which further ensures data interaction efficiency.
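The patent does not prescribe a particular load balancing policy; round-robin is one simple instance and is sketched below (card identifiers are invented for illustration):

```python
import itertools

class SwitchNicSelector:
    """Round-robin selection over the switching network cards; one
    simple load balancing policy among many that would satisfy the
    'essentially equal distribution' property described above."""

    def __init__(self, nic_ids):
        if len(nic_ids) < 2:
            raise ValueError("expected at least two switching network cards")
        self._cycle = itertools.cycle(nic_ids)

    def pick(self):
        # Each call returns the next card in rotation, spreading
        # mirror write commands evenly across the cards.
        return next(self._cycle)

selector = SwitchNicSelector(["switch-nic-0", "switch-nic-1"])
picks = [selector.pick() for _ in range(4)]
assert picks == ["switch-nic-0", "switch-nic-1", "switch-nic-0", "switch-nic-1"]
```

A real NIC might instead weight the choice by per-card queue depth; round-robin is shown only because it is the smallest policy with the even-distribution property.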
In one possible implementation, after writing the target data to the first storage device, the client device may also request the target data from the first storage device. For example, the client device may send a second message to the first storage device to request to read the target data, the second message including a logical address of the target data. After receiving the second message, the front-end network card of the first storage device may obtain the metadata according to the logical address of the target data, and obtain the target data from the first storage device according to the metadata.
After the front-end network card of the first storage device acquires the target data, a data access response carrying the target data may be fed back to the client device (in this case, the data access response may be understood as a data read response).
Through the system, in the data access process, such as the data reading process, the processor of the first storage device does not need to participate, but the front-end network card of the first storage device executes the data reading operation, so that the consumption of the processor can be effectively reduced, and the data reading efficiency is improved.
In a possible implementation, after receiving the second message, the front-end network card of the first storage device can parse it to obtain the logical address of the target data carried in it. For example, when the front-end network card of the first storage device receives the second message, it may parse out a data read command based on the RDMA protocol from the second message, and then parse the data read command to obtain the logical address of the target data carried in the command. The second message is based on any one of the following protocols: InfiniBand (IB), RoCE, or iWARP.
Through the system, the front-end network card of the first storage device can completely analyze the second message, can directly obtain the logical address of the target data carried in the second message, and is convenient for reading the target data later.
In a possible implementation, before the front-end network card of the first storage device obtains the metadata, it may perform a home lookup to determine the home node of the target data; when it determines that the home node of the target data is the first storage device, it may obtain the metadata according to the logical address of the target data.
Through this system, the first storage device can determine the home node of the target data through a home lookup, and obtains the metadata only after determining that the home node of the target data is the first storage device, so that the target data can subsequently be read successfully.
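The patent says only that the home node is determined from the logical address; hash-based placement is one common way to do this and is sketched below purely as an assumption:

```python
import hashlib

def home_node(logical_address, nodes):
    """Map a logical address deterministically to its home node.
    Hash-mod placement is assumed for illustration; any deterministic
    address-to-node mapping would serve the home lookup."""
    digest = hashlib.md5(logical_address.encode()).digest()
    idx = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[idx]

nodes = ["first-storage-device", "second-storage-device"]
owner = home_node("LUN1/0x4000", nodes)
assert owner in nodes
# The NIC fetches metadata locally only when the owner is itself;
# otherwise it forwards the read command to the home node's network card.
```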
In a possible implementation, the front-end network card of the first storage device and the switching network card of the first storage device may be deployed together in one card or deployed separately. The numbers of front-end network cards and switching network cards are not limited; there may be one or more of each.
Through the system, the deployment modes of the front-end network card and the switching network card in the first storage device are flexible, and the system is suitable for different device structures.
In a second aspect, an embodiment of the present application provides a data access method, where the method may be applied to a first storage device, and for beneficial effects, reference may be made to relevant descriptions of the first aspect, and details are not described here again. In the method, a front-end network card of a first storage device may receive a first message from a client device, where the first message is used to request to write target data in the first storage device.
When the first message reaches the first storage device, the front-end network card of the first storage device receives the first message first and processes the first message. The front-end network card of the first storage device can write the target data into the first storage device by itself. In the process of writing the target data, metadata can be generated, and the corresponding relation between the logical address of the target data and the metadata is recorded, wherein the metadata is used for indicating the physical address of the target data stored in the first storage device.
After the front-end network card of the first storage device writes the target data, the front-end network card of the first storage device may also feed back a data access response to the client device, indicating that the target data has been successfully written.
In one possible implementation manner, there are many ways for the front-end network card of the first storage device to record the corresponding relationship between the logical address of the target data and the metadata. For example, the front-end network card of the first storage device may directly record the corresponding relationship between the logical address of the target data and the metadata in the index information, such as recording the corresponding relationship between the logical address of the target data and the metadata itself, or recording the corresponding relationship between the logical address of the target data and the address of the metadata. For another example, the front-end network card of the first storage device may record the correspondence between the logical address of the target data and the metadata in the index information in an indirect manner. The front-end network card of the first storage device may generate a key (key) according to the logical address of the target data, use the metadata itself or the address of the metadata as a value (value), and record the corresponding relationship between the logical address of the target data and the metadata by recording a key-value pair.
In a possible implementation, after receiving the first message, the front-end network card of the first storage device may obtain the logical address of the target data from it. For example, the front-end network card may parse the first message to obtain a data write command based on the RDMA protocol, and then parse the data write command to obtain the logical address of the target data carried in the command.
In a possible implementation, the first message is based on any one of the following protocols: InfiniBand (IB), RoCE, or iWARP.
In a possible implementation manner, the front-end network card of the first storage device may further instruct the second storage device to perform mirror image storage. For example, the front-end network card of the first storage device sends a mirror data write command to the network card of the second storage device, the mirror data write command is used for requesting to write a copy of the target data, and the mirror data write command includes the copy of the target data and a logical address of the target data.
In a possible implementation manner, before the front-end network card of the first storage device sends the mirror image data write-in command to the second device (e.g., the network card of the second device), the second device may also be determined according to a mirror image mapping relationship, where the mirror image mapping relationship records a corresponding relationship between the first storage device and the second device. After the second device is determined, a mirrored data write command is sent to the second device.
In a possible implementation manner, the first storage device includes at least two switching network cards, and the first storage device may be connected to the second device through the at least two switching network cards. The front-end network card of the first storage device may select a switching network card from the at least two switching network cards based on a load balancing policy. And then sending mirror image data write-in commands to the network card of the second device through the selected switching network card.
In a possible implementation manner, the front-end network card of the first storage device may further read the target data at the request of the client device. For example, the front-end network card of the first storage device may receive a second message from the client device, where the second message is used to request to read the target data. Then, the front-end network card of the first storage device can acquire the logical address of the target data from the second message; acquiring metadata according to the logical address of the target data, wherein the metadata is used for describing the physical address of the target data stored in the first storage device; and acquiring target data from the first storage device according to the metadata. The front-end network card of the first storage device can also feed back a data access response carrying the target data to the client device.
In a possible implementation, the front-end network card of the first storage device may also parse the second message to obtain a data read command based on the RDMA protocol, and then parse the data read command to obtain the logical address of the target data carried in the command.
The second message is based on any one of the following protocols: InfiniBand (IB), RoCE, or iWARP.
In a possible implementation manner, before the front-end network card of the first storage device obtains the metadata, a home lookup may be further performed, and it is determined that a home node of the target data is the first storage device according to the logical address of the target data.
In a possible implementation manner, the front-end network card of the first storage device and the switching network card of the first storage device may be deployed in a centralized manner, that is, one network card has both the functions of the front-end network card and the switching network card. The front-end network card of the first storage device and the switching network card of the first storage device may also be deployed separately, that is, there are two different network cards, and one network card can interact with a front end, such as a client device, and is a front-end network card. Another network card can interact with other devices in the storage system and is a switching network card.
In a third aspect, an embodiment of this application provides a data access system that includes a client device and a first storage device. In this data access system, the first storage device can, at the request of the client device, read target data locally or from a second storage device.
When determining that the target data needs to be read, the client device may send a second packet to the first storage device to read the target data, where the second packet includes a logical address of the target data. The target data may be the same as or different from the target data mentioned in the foregoing aspects.
If the first storage device is the only device in its storage system, or if the first storage device can manage all the storage space in the storage system, its front-end network card can obtain the metadata directly according to the logical address of the target data and then obtain the target data from the first storage device according to the metadata. If the storage system contains other devices besides the first storage device (for example, a second storage device), or if the first storage device can manage only part of the storage space in the storage system, the first storage device may perform a home lookup; when it determines that the home node of the target data is the first storage device, it obtains the metadata according to the logical address of the target data and then obtains the target data from the first storage device according to the metadata. After the front-end network card of the first storage device obtains the target data, it may feed a data access response carrying the target data back to the client device.
By the method, the front-end network card of the first storage device can replace a processor of the first storage device to execute the data reading operation in the data reading process, the data reading process does not need to occupy the computing resource of the processor, and meanwhile, the data reading efficiency can be improved.
In a possible implementation, the first storage device includes a memory that stores index information. If the index information records the correspondence between the logical address of the target data and the metadata in a direct manner, the front-end network card of the first storage device can obtain the metadata directly according to the logical address of the target data. If the index information records the correspondence in an indirect manner, for example as key-value pairs, then when obtaining the metadata the front-end network card can derive the key of the target data from the logical address of the target data, and then query the index information for the value corresponding to the key, where the value is the metadata or the address of the metadata.
By the method, the front-end network card of the first storage device can acquire the metadata in different modes, so that the target data can be conveniently read subsequently.
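The two lookup modes can be sketched side by side. All names and the key derivation are illustrative assumptions; the patent specifies only that the direct mode indexes metadata by the logical address itself, while the indirect mode goes through a derived key.

```python
import hashlib

def make_key(logical_address):
    # Assumed key derivation for the indirect (key-value) mode.
    return hashlib.sha256(logical_address.encode()).hexdigest()[:16]

# Direct mode: the logical address itself indexes the metadata.
direct_index = {"LUN1/0x4000": {"physical_address": "disk2:0x9A00"}}

# Indirect mode: a derived key maps to the metadata (or its address).
kv_index = {make_key("LUN2/0x8000"): {"physical_address": "disk5:0x1C00"}}

def get_metadata(logical_address):
    """Try the direct index first, then fall back to the key-value
    index; return None when the address is unknown."""
    if logical_address in direct_index:
        return direct_index[logical_address]
    return kv_index.get(make_key(logical_address))

assert get_metadata("LUN1/0x4000")["physical_address"] == "disk2:0x9A00"
assert get_metadata("LUN2/0x8000")["physical_address"] == "disk5:0x1C00"
```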
In a possible implementation manner, the front-end network card of the first storage device has a message parsing capability, and after receiving the second message, the front-end network card can parse the second message, and parse the second message to obtain the RDMA protocol-based data reading command. And then, analyzing the data reading command to acquire the logical address of the target data carried by the data reading command.
Through the system, the front-end network card of the first storage device can completely analyze the second message, can directly obtain the logical address of the target data carried in the second message, and is convenient for reading the target data later.
In a possible implementation, the second message may be based on any one of the following protocols: InfiniBand (IB), RoCE, or iWARP.
In a possible implementation manner, if the front-end network card of the first storage device confirms that the home node of the target data is the second storage device, the front-end network card of the first storage device needs to acquire the target data from the second storage device. The front-end network card of the first storage device may forward the data reading command to the network card of the second storage device.
By the method, the first storage device can acquire the target data from other nodes after determining that the home node is other nodes, and the target data can be successfully fed back to the client device.
In a possible implementation, the first storage device includes at least two switching network cards, and the first storage device may be connected to the second storage device through them. When the front-end network card of the first storage device forwards the data read command, it may select a switching network card from the at least two switching network cards based on a load balancing policy, and then send the data read command to the network card of the second storage device through the selected switching network card.
Through this method, the first storage device is connected to the second storage device through at least two switching network cards, which ensures efficient data interaction between the first storage device and the second storage device and improves data exchange efficiency.
In a possible implementation manner, the front-end network card of the first storage device and the switching network card of the first storage device may be deployed in a centralized manner or may be deployed separately.
In a fourth aspect, an embodiment of this application provides a network card. The network card is located in a storage device and includes a first protocol acceleration module, an index acceleration module, and a read-write module. The method performed by the front-end network card in the second aspect can be implemented through the cooperation of these modules; for beneficial effects, see the foregoing description, which is not repeated here.
The first protocol acceleration module may parse a data write command, the data write command being for requesting to write target data; acquiring a logical address of target data from the data write command; the read-write module may write the target data to the storage device. The index acceleration module may generate metadata describing physical addresses where the target data is stored in the storage device, and record a correspondence between logical addresses of the target data and the metadata.
In a possible implementation manner, the network card further includes a second protocol acceleration module. After the network card receives a first message from the client device, the second protocol acceleration module parses the first message to obtain a data write command, where the data write command is based on the RDMA protocol, and the first message is based on the IB, RoCE, or iWARP protocol.
In one possible implementation, the index acceleration module may be configured to maintain index information indicating a correspondence between the logical address of the target data and the metadata. The index information may indicate the correspondence directly, or indirectly (for example, by recording key-value pairs). For example, when recording the correspondence between the logical address of the target data and the metadata, the index acceleration module may create a correspondence between a key and a value in the index information in the memory of the storage device, where the value is the metadata or the address of the metadata, and the key is determined according to the logical address of the target data.
In a possible implementation manner, the read-write module may further instruct a mirroring device of the storage device to perform mirroring storage. For example, the read-write module may send a mirror data write command to the mirroring device, the mirror data write command being used to request to write the target data, the mirror data write command including the target data and a logical address of the target data.
In a possible implementation manner, the read-write module may further determine the mirror image device according to a mirror image mapping relationship, where the mirror image mapping relationship records a corresponding relationship between the storage device and the mirror image device.
In one possible implementation, the storage device includes at least two switching network cards, and the storage device is connected to the mirroring device through the at least two switching network cards. The read-write module may select a switching network card from the at least two switching network cards based on a load balancing policy, and send the mirror data write command to the network card of the mirroring device through the selected switching network card.
In a possible implementation manner, the modules in the network card may cooperate to implement data reading in addition to data writing. The first protocol acceleration module may parse a data reading command, where the data reading command is used to request to read target data, and obtain the logical address of the target data from the data reading command. The index acceleration module may obtain metadata according to the logical address of the target data, the metadata describing the physical address where the target data is stored in the first storage device. The read-write module may then read the target data from the storage device based on the metadata.
In one possible implementation, before reading the metadata, the index acceleration module may further determine, according to the logical address of the target data, that the home node of the target data is the storage device.
In a possible implementation manner, the second protocol acceleration module is further configured to parse a second message from the client device to obtain a data read command, where the data read command is based on the RDMA protocol, and the second message is based on the IB, RoCE, or iWARP protocol.
In a possible implementation manner, the first protocol acceleration module, the second protocol acceleration module, the index acceleration module, and the read-write module may be software modules; they may also be hardware modules, such as some or all of the following: an ASIC, an FPGA, an AI chip, an SoC, a CPLD, or a GPU.
In a fifth aspect, an embodiment of the present application provides a storage device, where the storage device includes a network card, and the network card is configured to execute the method executed by the front-end network card in the second aspect and each possible implementation manner of the second aspect.
In a possible implementation manner, the storage device further includes a memory, and the memory is configured to store index information, where the index information is used to indicate a correspondence between a logical address of the target data and the metadata.
In a sixth aspect, an embodiment of the present application further provides a network card, where the network card may be a network card of a computer system or a server, and the network card has a function of implementing behaviors in the method examples in the second aspect and each possible implementation manner of the second aspect, and beneficial effects may refer to descriptions of the second aspect and are not described herein again.
In a possible implementation manner, the structure of the network card may include a processor and a memory, and the processor is configured to support the network card to execute the corresponding function of the network card in the second aspect. The memory is coupled to the processor and holds the program instructions and data necessary for the network card. The network card may further include an interface for communicating with other devices, for example, receiving the first message or the second message, or sending a data access response.
In another possible implementation manner, the structure of the network card may include a processor and an interface, and the processor is configured to support the network card in executing the corresponding functions in the method of the second aspect. The processor may also transmit data through the interface, for example, receive the first message or the second message, or send a data access response.
In a seventh aspect, the present application further provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method described in the second aspect and the possible embodiments of the second aspect.
In an eighth aspect, the present application further provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method described above in the second aspect and in each of the possible embodiments of the second aspect.
In a ninth aspect, the present application further provides a computer chip, the chip is connected to a memory, and the chip is configured to read and execute a software program stored in the memory, and perform the method described in the second aspect and each possible implementation manner of the second aspect.
Drawings
FIG. 1A is a schematic diagram of an architecture of a data access system provided herein;
fig. 1B is a schematic structural diagram of a network card provided in the present application;
FIG. 2A is a schematic diagram of a data access system provided herein;
FIG. 2B is a schematic diagram of a data access system provided herein;
FIG. 3A is a schematic diagram of a data access method provided herein;
fig. 3B is a schematic structural diagram of a message provided in the present application;
fig. 3C is a schematic structural diagram of an IB payload provided herein;
FIG. 4 is a schematic diagram of a data access method provided herein;
FIG. 5 is a schematic diagram illustrating data transmission in a storage system under a data read command according to the present application;
FIG. 6 is a schematic diagram of a data access method provided herein;
FIG. 7 is a schematic diagram illustrating data transmission in a storage system under a data write command according to the present application;
fig. 8 is a schematic structural diagram of a network card provided in the present application.
Detailed Description
Before explaining the data access method provided by the present application, the concepts related to the present application will be explained:
1. metadata (metadata)
Metadata is also called intermediate data or relay data. Metadata is data describing data (data about data); it may indicate attributes of the data, for example, it may record the storage address of the data, modification information of the data, and the like.
2. Remote Direct Memory Access (RDMA)
RDMA is a technology for accessing data in memory while bypassing the operating system kernel of a remote device (such as a storage device). Because RDMA does not pass through the operating system, it saves a large amount of CPU resources, improves system throughput, and reduces the network communication latency of the system. It is therefore particularly suitable for wide use in large-scale parallel computer clusters.
RDMA has several characteristics: (1) Data is transmitted over the network to and from the remote device; (2) All content related to sending and transmission is offloaded to the intelligent network card, without participation of the operating system kernel; (3) Data is transmitted directly between user-space virtual memory and the intelligent network card, without involving the operating system kernel and without additional data movement or copying.
Currently, there are roughly three types of RDMA networks: Infiniband (IB), RDMA over Converged Ethernet (RoCE), and the Internet Wide Area RDMA Protocol (iWARP). Infiniband is a network designed specifically for RDMA that guarantees reliable transmission in hardware; it requires network cards and switches that support the technology. RoCE and iWARP are both Ethernet-based RDMA technologies and require only special network cards.
3. One-sided RDMA and two-sided RDMA
The two ends that need to exchange information are referred to as a client device (simply, a client) and a server (in the embodiments of the present application, the server may be understood as the first storage device or the second storage device). The client is deployed on the user side, and the user can initiate requests to the server through the client. The server is typically deployed remotely; it generally refers to the storage system, and may be specifically understood as a device in the storage system.
One-sided RDMA can be divided into RDMA READ and RDMA WRITE.
Taking RDMA READ in one-sided RDMA as an example, the client can directly determine the location, in the memory of the server, of the metadata of the data to be read. The message initiated by the client to request the data therefore carries the location information of the metadata and is sent to the server. On the server side, the server's network card reads the metadata corresponding to the location information and obtains the data to be read according to the metadata. Throughout this process, the processor on the server side is unaware of the client's operations. In other words, the server-side processor does not know that the client performed a read operation, which reduces processor consumption during data transmission and improves the system's service-processing performance. One-sided RDMA thus features high bandwidth, low latency, and low CPU occupancy.
Two-sided RDMA can be divided into RDMA SEND and RDMA RECEIVE.
Taking RDMA RECEIVE in two-sided RDMA as an example, the client does not know where the metadata of the data to be read is stored in the memory of the server, so the message the client initiates to request the data does not carry the location information of the metadata. After receiving the message, the server-side processor queries the location information of the metadata and returns it to the client. The client then initiates another message to the server requesting the data, this time including the location information of the metadata (i.e., the address of the metadata). The server's network card obtains the metadata according to the location information, further obtains the data to be read, and sends the data to the client. This process requires the participation of the server-side processor; that is, in two-sided RDMA the server-side processor must process messages from the client. Compared with two-sided RDMA, one-sided RDMA therefore reads data faster, has lower processor occupancy, and provides a better user experience, so one-sided RDMA is increasingly widely used.
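As an illustration of the difference between the two flows, the following toy model (all names are hypothetical and not part of any RDMA library) counts the client-server messages implied by the description above: a one-sided read needs a single request/response pair handled entirely by the server's network card, while the two-sided flow adds a round trip in which the server-side processor first looks up the metadata location.

```python
def one_sided_read_flow():
    # Client already knows the metadata location in server memory,
    # so a single request/response pair suffices, and the server's
    # network card handles it without the server-side processor.
    return {"messages": 2, "server_cpu_involved": False}

def two_sided_read_flow():
    # Round trip 1: server-side processor queries the metadata
    # location and returns it to the client.
    # Round trip 2: client re-issues the read carrying that location;
    # the server's network card then fetches and returns the data.
    return {"messages": 4, "server_cpu_involved": True}

one_sided = one_sided_read_flow()
two_sided = two_sided_read_flow()
```

This is only a message-count model of the flows described in the text, not a wire-accurate protocol; it makes visible why the one-sided path has lower latency and lower server CPU occupancy.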
4. Modules, e.g. first protocol acceleration module, second protocol acceleration module, read-write module, index acceleration module
In the embodiments of the present application, a module refers to a computer-related entity: hardware, software in execution, firmware, middleware, microcode, and/or any combination thereof. For example, a module may be a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. One or more modules may reside within a process and/or thread of execution, though this is not a limitation. These modules may also execute from various computer-readable media having various data structures stored thereon.
For example, the module related to the present application may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), an Artificial Intelligence (AI) chip, a system on chip (SoC) or a Complex Programmable Logic Device (CPLD), a Graphics Processing Unit (GPU), and the like. Or may be a software program running on a processor such as a CPU or GPU.
5. "A and/or B" may indicate three relationships: A only, B only, or both A and B.
As shown in fig. 1A, a schematic structural diagram of a data access system provided in an embodiment of the present application, the system includes a client device 200 and a storage system 100, the storage system 100 includes multiple devices, and only two devices of the storage system, namely a first device 110 and a second device 120, are exemplarily shown in fig. 1A.
The user accesses the data through the application. The computer running these applications is referred to as the "client device 200". The client device 200 may be a physical machine or a virtual machine. Physical client devices 200 include, but are not limited to, desktop computers, servers, laptops, and mobile devices.
The first device 110 includes one or more memories, which may include a hard disk, a memory or other device capable of storing data, and the specific type of the one or more memories is not limited in this application.
The first device 110 can manage the one or more memories. Management operations herein include, but are not limited to: recording the storage state of the one or more memories, and performing read and write operations on the one or more memories (e.g., writing data in the one or more memories, reading data from the one or more memories).
For convenience of illustration, the one or more memories of the first device 110 may be divided into two types: internal memory located inside the first device 110, and external memory connected to the first device 110, which may also be referred to as auxiliary memory. The present application does not limit the type of the auxiliary memory; in this embodiment a hard disk is used as an example, but a mechanical hard disk or another type of hard disk may equally serve as the auxiliary memory.
Specifically, referring to fig. 1A, the first device 110 includes a bus 111, a processor 112, a memory 113, a network card 114, and a hard disk 115. The processor 112, the memory 113 and the network card 114 communicate with each other through the bus 111.
It should be noted that the first device 110 and the second device 120 have similar structures: the second device 120 also includes a bus 121, a processor 122, a memory 123, a network card 124, and a hard disk 125. Because the function and structure of the second device 120 are similar to those of the first device 110, reference may be made to the relevant description of the first device 110; details are not described here.
The processor 112 may be a Central Processing Unit (CPU), other processing chips, or the like.
The memory 113 may include volatile memory, such as random access memory (RAM) or dynamic random access memory (DRAM). The memory may also be non-volatile memory, such as storage-class memory (SCM), or a combination of volatile and non-volatile memory.
The memory 113 may also include other software modules required for running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, etc.
The network card 114 may execute the data access method provided in the embodiments of the present application by invoking computer-executable instructions stored in the memory 113, or by invoking computer-executable instructions stored in the network card 114 itself. In some possible scenarios, the computer-executable instructions may also be programmed into the network card 114, and the network card 114 then executes the data access method provided in the embodiments of the present application.
The first device 110 may also include one or more hard disks 115. The hard disk 115 may be used for permanent storage of data. In the storage system shown in fig. 1A, the first device 110 and the second device 120 are connected to different hard disks. This is only one possible connection (e.g., in the case where the storage system 100 is a distributed storage system), and in some scenarios, the first device 110 and the second device 120 may also connect to the same hard disk (e.g., in the case where the storage system is a centralized storage system), and the first device 110 and the second device 120 can perform read and write operations on these hard disks. In this embodiment, the network card 114 of the first device 110 and the network card 124 of the second device 120 can perform read/write operations on these hard disks.
In the embodiment of the present application, the network card 114 of the first device may be functionally divided into a front-end network card 114A and a switching network card 114B.
The front-end network card 114A of the first device 110 is a network card 114 in the first device 110 that is responsible for communicating with the client device 200, and for example, receives a message from the client device 200 and feeds back a response to the client device 200. The number of the front-end network cards 114A of the first device 110 is not limited in the embodiment of the present application, and may be one or multiple.
The switching network card 114B of the first device 110 is a network card 114 in the first device 110 that is responsible for interacting with other devices (e.g., the second device 120) in the storage system, such as forwarding a data access command to the second device 120 or sending a mirror data write command to the second device 120. One or more switching network cards 114B may be present in the first device 110 to establish a connection with the second device 120 (e.g., with the switching network card 124B in the second device 120). When multiple switching network cards 114B exist in the first device 110, the first device 110 may send a message to the second device 120 through any one of them. In particular, when the first device 110 and the second device 120 interact frequently, or when the amount of data they need to exchange is large, the multiple switching network cards 114B in the first device 110 can effectively share the traffic that needs to flow between the two devices and improve data transmission efficiency.
Fig. 1A illustrates an example in which the front-end network card 114A and the switching network card 114B of the first device 110 are deployed independently. In some scenarios, they may also be deployed in a centralized manner; that is, the front-end network card 114A and the switching network card 114B of the first device 110 may be presented as one network card 114, which has both the function of the front-end network card 114A (communicating with the client device 200) and the function of the switching network card 114B (interacting with other devices in the storage system). The embodiments of the present application are described taking independent deployment of the front-end network card 114A and the switching network card 114B of the first device 110 as an example. Certainly, the data access method provided in the embodiments is also applicable to the case where the two are deployed in a centralized manner; in that case, the interaction between the front-end network card 114A and the switching network card 114B of the first device 110 may be understood as interaction inside the single network card 114 of the first device 110.
In the embodiment of the present application, the network card 114 of the first device 110 can directly interact with the client device 200, and write data in the first device 110 or read data. The processor 112 of the first device 110 is not required to participate in the entire process. The structure of the network card 114 of the first device 110 is explained below.
The configuration of the network card 114 in the first device 110 is described below as an example. As shown in fig. 1B, a schematic structural diagram of a network card provided in an embodiment of the present application, the network card 114 includes a first protocol acceleration module 114-2, a second protocol acceleration module 114-1, an index acceleration module 114-3, and a read-write module 114-4.
The second protocol acceleration module 114-1 can parse a packet (e.g., the first packet or the second packet) from the client device 200, remove a portion of the header from the packet, and obtain a data access command (e.g., a data read command or a data write command) based on the RDMA protocol.
The first protocol acceleration module 114-2 can further analyze the packet from the client device 200, that is, analyze the data access command obtained after the second protocol acceleration module 114-1 analyzes the packet, and obtain information carried in the data access command, such as a logical address of the obtained data (e.g., a logical address of the first target data or a logical address of the second target data).
The index acceleration module 114-3 is used to maintain the correspondence between the logical address of the data and the metadata of the data. The maintenance includes processing operations such as saving, updating, creating, and the like.
When data needs to be read, if the command parsed from the received message by the second protocol acceleration module 114-1 is a data reading command, the index acceleration module 114-3 can search for the metadata of the data according to the logical address of the data acquired by the first protocol acceleration module 114-2 by using the maintained corresponding relationship between the logical address of the data and the metadata of the data.
When data needs to be written, if the command parsed from the received message by the second protocol acceleration module 114-1 is a data write command, the index acceleration module 114-3 records a corresponding relationship between a logical address of the data and metadata of the data.
The index acceleration module 114-3 may represent the maintained correspondence between logical addresses and metadata in many ways. For example, it may maintain the correspondence directly: it may store a correspondence table in which each row records the logical address of a piece of data together with the metadata of that data, or together with the address of the metadata. Alternatively, it may maintain the correspondence indirectly: it may convert the logical address, for example converting a logical address with a longer data length into data with a shorter length (such as a key, possibly by hashing), and record the correspondence between the converted data and the metadata, or between the converted data and the address of the metadata. For example, the index acceleration module 114-3 may record a key-value pair, where the key is obtained from the logical address conversion and the value is the metadata of the data, or the address of the metadata.
In the embodiment of the present application, the correspondence relationship between the logical address of the data maintained by the index acceleration module 114-3 and the metadata of the data is collectively referred to as index information. As described above, the index information may record the correspondence between the logical address of the data and the metadata of the data in a direct manner, or may record the correspondence between the logical address of the data and the metadata of the data in an indirect manner (for example, the index information records the correspondence between the logical address of the data and the metadata of the data in the form of key-value pairs). In this embodiment of the application, the position where the index information is stored is not limited, and the index information may be stored in the network card 114 (for example, in a memory in the network card 114), or may be stored in a memory of the first device 110, for example, in a memory of the first device 110.
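As a minimal sketch of the indirect, key-value form of the index information described above (the `IndexInfo` name, the hashing scheme, and the metadata fields are assumptions for illustration, not mandated by this application):

```python
import hashlib

def make_key(logical_address: str) -> str:
    # Shorten a (possibly long) logical address into a fixed-size key
    # by hashing, as in the indirect representation described above.
    return hashlib.sha256(logical_address.encode()).hexdigest()[:16]

class IndexInfo:
    """Indirect index information: key (hashed logical address) ->
    value (the metadata, or the address of the metadata)."""

    def __init__(self):
        self._kv = {}

    def record(self, logical_address: str, metadata: dict):
        # Record the correspondence when data is written.
        self._kv[make_key(logical_address)] = metadata

    def lookup(self, logical_address: str):
        # Find the metadata when data is read; None if not indexed here.
        return self._kv.get(make_key(logical_address))

index = IndexInfo()
index.record("LUN1:0x2000", {"physical_address": 0x9F000, "length": 4096})
found = index.lookup("LUN1:0x2000")
```

A direct representation would simply store the logical address itself as the lookup key; the hashed form trades a small collision risk (which a real implementation must handle) for fixed-size keys.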
The read-write module 114-4 is used for reading data from the memory of the first device 110 or writing data into the memory of the first device 110.
When data needs to be read, the read-write module 114-4 may obtain the metadata of the data or the address of the metadata of the data from the index acceleration module 114-3, determine the physical address of the data according to the metadata of the data, and read the data according to the physical address of the data. When the read-write module 114-4 reads data, if the data is stored in an external memory such as a hard disk of the first device 110, the read-write module 114-4 may first read the data from the external memory such as the hard disk of the first device 110 to the internal memory of the first device 110, and then read the data from the internal memory of the first device 110. The read/write module 114-4 may also directly read the data from an external memory such as a hard disk of the first device 110.
When data needs to be written, the read-write module 114-4 may write the data into the memory of the first device 110. The data writing manner of the read-write module 114-4 may be append writing (append-only), in which the data to be written starts from the end address of the data written last time. The read-write module 114-4 may also generate the metadata of the data when writing it, and send the metadata to the index acceleration module 114-3, so that the index acceleration module 114-3 can record the correspondence between the data and its metadata. Of course, the read-write module 114-4 may instead save the metadata after generating it and send only the address of the metadata to the index acceleration module 114-3, so that the index acceleration module 114-3 can record the correspondence in that form. The read-write module 114-4 may also adopt other writing manners; the embodiments of the present application are not limited in this respect.
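The append-writing behavior described above can be sketched as follows (a toy model: the `AppendLog` name and base address are illustrative, and real metadata would carry more fields than a physical address and length):

```python
class AppendLog:
    """Append-only write region: each new write starts at the end
    address of the previously written data, and metadata recording
    the physical placement is generated alongside the write."""

    def __init__(self, base_address: int = 0):
        self._next = base_address
        self._buffer = bytearray()

    def write(self, data: bytes) -> dict:
        physical_address = self._next
        self._buffer += data
        self._next += len(data)
        # The generated metadata is what the read-write module would
        # hand to the index acceleration module for indexing.
        return {"physical_address": physical_address, "length": len(data)}

log = AppendLog(base_address=0x1000)
meta1 = log.write(b"hello")
meta2 = log.write(b"world!")
```

Note how the second write's physical address is exactly the end address of the first, which is the defining property of the append-writing manner.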
The embodiment of the present application does not limit the type of the storage system, and in practice, the storage system in fig. 1A may be represented as a centralized storage system or a distributed storage system, and a storage system to which the embodiment of the present application is applied will be described below.
1. Centralized storage system
Fig. 2A is a schematic diagram of a system architecture provided in the embodiment of the present application, where the system architecture includes a storage system 100, a client device 200, and a switch 300.
The description of the client device 200 can refer to the related description in fig. 1A and is not repeated here. The client device 200 accesses the storage system 100 through the switch 300 to access data. However, the switch 300 is only an optional device; the client device 200 may also communicate with the storage system 100 directly through a network. The storage system 100 shown in FIG. 2A is a centralized storage system 100. The centralized storage system 100 is characterized by a unified entry through which all data from external devices passes; this entry is the engine 130 of the centralized storage system 100. The engine 130 is the most central component of the centralized storage system 100, in which many of the high-level functions of the storage system 100 are implemented.
As shown in fig. 2A, there are one or more controllers in the engine 130; fig. 2A illustrates an example where the engine 130 includes two controllers. A mirror channel is provided between the controller 0 and the controller 1, so after the controller 0 writes a copy of data into its memory, it can send a copy of the data to the controller 1 through the mirror channel, and the controller 1 stores the copy in its local memory. The controller 0 and the controller 1 thus back each other up: when the controller 0 fails, the controller 1 can take over the services of the controller 0, and when the controller 1 fails, the controller 0 can take over the services of the controller 1, thereby preventing a hardware failure from making the whole storage system 100 unavailable. When 4 controllers are deployed in the engine 130, mirror channels exist between any two controllers, so any two controllers back each other up. Here, the controller 0 and the controller 1 correspond to the first device 110 and the second device 120 shown in fig. 1A.
Taking the controller 0 as an example, in terms of hardware, the controller 0 includes a processor 112, a memory 113, and a front-end network card 114A, where the front-end network card 114A is configured to communicate with the client device 200, so as to provide a storage service for the client device 200. The controller 0 also includes a back-end interface 116, the back-end interface 116 for communicating with the hard disk 115 to expand the capacity of the storage system 100. Through the backend interface 116, the engine 130 can connect more hard disks 115, thereby forming a very large pool of storage resources. In fig. 2A, the controller 0 may further include a switching network card 114B, and the controller 0 may be connected to the controller 1 through the switching network card 114B. For the description of the processor 112, the memory 113, the front-end network card 114A, and the switch network card 114B, reference may be made to the description in the embodiment shown in fig. 1A, and details are not repeated here.
The front-end network card 114A in the controller 0 (alone, or in cooperation with the switching network card 114B) may execute the data access method provided by the embodiment of the present application.
In this embodiment, taking the first device 110 as an example, when the front-end network card 114A receives a message (such as a first message or a second message) sent by the client device 200, it executes the data access method provided in this embodiment and processes the message. When the first packet carries a data read command, if the target data to be read is located in the memory 113, the front-end network card 114A may directly read the target data from the memory 113 and feed the target data back to the client device 200 through a data read response. If the target data to be read is located in the hard disk 115, the front-end network card 114A may first read the target data from the hard disk 115 into the memory 113, then read the target data from the memory 113 and feed the target data back to the client device 200 through a data access response.
When the message carries a data write command, the front-end network card 114A may first write target data to be written into the memory 113, and when the total amount of data in the memory 113 reaches a certain threshold, send the data stored in the memory 113 to the hard disk 115 through the back-end interface 116 for persistent storage.
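The threshold-triggered flush described above can be sketched as follows. This is a minimal illustrative model, not the embodiment's implementation: the names `MemoryBuffer` and `FLUSH_THRESHOLD`, and the use of a plain list to stand in for the hard disk 115 behind the back-end interface 116, are all assumptions.

```python
FLUSH_THRESHOLD = 4  # assumed policy: flush once this many entries are buffered

class MemoryBuffer:
    def __init__(self, disk):
        self.entries = []   # data staged in memory (memory 113)
        self.disk = disk    # stands in for the hard disk behind the back-end interface

    def write(self, data):
        self.entries.append(data)
        if len(self.entries) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # persist everything currently staged, then clear the memory buffer
        self.disk.extend(self.entries)
        self.entries.clear()

disk = []
buf = MemoryBuffer(disk)
for block in (b"a", b"b", b"c", b"d", b"e"):
    buf.write(block)
# after five writes with threshold 4, the first four blocks were flushed to disk
```

The fifth block remains staged in memory until the threshold is crossed again.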
The front-end network card 114A may run computer-executable instructions in the memory 113 to manage the hard disk 115, or may run computer-executable instructions stored in the front-end network card 114A itself to manage the hard disk 115. For example, the hard disk 115 (and the memory 113) is abstracted into a storage resource pool, and the storage resource pool is then divided into logical unit numbers (LUNs) provided for the client device 200 to use. The LUN here is the storage space actually visible to the client device 200. Of course, some centralized storage systems 100 are themselves file servers and may provide shared file services to servers.
The hardware components and software structure of controller 1 (and other controllers not shown in fig. 2A) are similar to controller 0 and will not be described again.
FIG. 2A illustrates a disk-separated centralized storage system. In such a system, the engine 130 may have no hard disk slots; the hard disks 115 are placed in a hard disk enclosure, and the back-end interface 116 communicates with the hard disk enclosure. The back-end interface 116 exists in the form of an adapter card in the engine 130, and two or more back-end interfaces 116 can be used simultaneously on one engine 130 to connect multiple hard disk enclosures. Alternatively, the adapter card may be integrated on the motherboard and communicate with the processor 112 via the PCIe bus. In another implementation, the engine 130 may itself have hard disk slots into which the hard disks 115 are directly inserted, with the back-end interface 116 communicating with each hard disk.
In the centralized storage system, the hard disks 115 managed by the controller 0 and the controller 1 may be the same or different. For example, controller 0 may manage a portion of hard disk 115 and controller 1 manages the remaining hard disk 115.
2. Distributed storage system
The data access method provided by the embodiment of the present application is applicable not only to a centralized storage system but also to a distributed storage system. As shown in fig. 2B, which is a schematic diagram of a system architecture of a distributed storage system provided in the embodiment of the present application, the distributed storage system 100 includes a server cluster. The server cluster includes one or more servers (server 150 and server 160 are shown in fig. 2B, but the cluster is not limited to these two servers), and the servers may communicate with each other. A server here is a device that has both computing and storage capabilities, such as an application server or a desktop computer. In terms of software, each server runs an operating system. Here, the server 150 and the server 160 correspond to the first device 110 and the second device 120, respectively, shown in fig. 1A.
Taking the server 150 as an example for description, the network card 114 in the server 150 may execute the data access method provided in the embodiment of the present application, and in terms of hardware, as shown in fig. 2B, the server 150 at least includes the processor 112, the memory 113, the network card 114, and the hard disk 115. The processor 112, the memory 113, the network card 114 and the hard disk 115 are connected through a bus. The functions and types of the processor 112, the memory 113, the network card 114, and the hard disk 115 may refer to the description of the processor 112, the memory 113, the network card 114, and the hard disk 115 of the first device 110 shown in fig. 1A, which is not described herein again.
When the network card 114 receives the message sent by the client device 200, it executes the data access method provided in the embodiment of the present application to process the message. When the packet carries a data read command, if the target data to be read is located in the memory 113, the target data may be directly read from the memory 113 and fed back to the client device 200 through a data read response. If the target data to be read is located in the hard disk 115, the target data may be read from the hard disk 115 into the memory 113, then read from the memory 113 and fed back to the client device 200 through a data access response. When the message carries a data write command, the network card 114 may first write the target data to be written into the memory 113, and when the total amount of data in the memory 113 reaches a certain threshold, send the data stored in the memory 113 to the hard disk 115 through the back-end interface 116 for persistent storage.
The hard disk 115 is used to provide storage resources, for example to store data. It may be a magnetic disk or another type of storage medium, such as a solid-state drive or a shingled magnetic recording hard disk. The network card 114 is used to communicate with the client devices 200.
The centralized storage system and the distributed storage system mentioned above are only examples, and the data access method provided by the embodiment of the present application is also applicable to other centralized storage systems and distributed storage systems.
As shown in fig. 3A, for a data access method provided in this embodiment of the present application, the first device 110 mentioned in this method may be a controller (e.g., controller 0 or controller 1) of fig. 2A, or may be the server 150 or server 160 of fig. 2B.
Step 301: the client device 200 sends a message to the first device 110 requesting access to the target data.
In the embodiment of the present application, the message includes at least two types. One is a packet carrying a data write command for requesting writing of target data. The data write command carries target data and a logical address to which the target data needs to be written (in this embodiment, the logical address may be referred to as a logical address of the target data). The other type is a packet carrying a data read command, where the data read command is used to request to read target data, and the data read command carries a logical address of the target data.
The embodiment of the present application does not limit the protocol supported by the packet; for example, the packet may support InfiniBand, RoCE, or iWARP. For packets supporting different protocols, the information carried is similar; the difference is that the positions of the information within the packet may differ between protocols.
Here, taking a RoCE packet as an example, the structure of the message is described:
As shown in fig. 3B, which is a structural diagram of a RoCE message provided in the embodiment of the present application, the message may be divided into two parts. One part is a data access command based on the RDMA protocol, such as an RDMA SEND command (the command sent in an RDMA SEND operation) or an RDMA RECEIVE command (the command sent in an RDMA RECEIVE operation). The data access command includes an IB transport layer header and an IB payload; the IB transport layer header records information related to the IB transport layer.
The remaining part is the information required to support network transmission, including a layer-2 Ethernet header, an Ethernet type, an Internet Protocol (IP) header, a User Datagram Protocol (UDP) header, an invariant cyclic redundancy check (ICRC), and a frame check sequence (FCS).
The layer-2 Ethernet header is used to record information of the layer-2 network, such as the source address and the destination address in the layer-2 network. The IP header records the source IP address and the destination IP address in the IP network. The destination port number in the UDP header is 4791, which identifies a RoCEv2 frame. The ICRC and FCS are fields set to guarantee data integrity.
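The RoCEv2 identification described above (UDP destination port 4791) can be checked with a short sketch. The function below inspects the destination-port field of a raw UDP header; the function name is illustrative.

```python
import struct

ROCEV2_UDP_PORT = 4791  # destination port identifying a RoCEv2 frame

def is_rocev2(udp_header: bytes) -> bool:
    # UDP header layout: source port (2 B), destination port (2 B),
    # length (2 B), checksum (2 B), all big-endian on the wire
    _, dst_port = struct.unpack_from("!HH", udp_header, 0)
    return dst_port == ROCEV2_UDP_PORT
```

A frame whose UDP destination port is anything other than 4791 is not treated as RoCEv2.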
Step 302: after the message reaches the first device 110, the front-end network card 114A of the first device 110 receives the message, parses it, obtains the data access command carried in it, and obtains the information carried in the data access command, such as the target data and the logical address of the target data (when the data access command is a data write command), or the logical address of the target data (when the data access command is a data read command).
The parsing of the packet by the front-end network card 114A of the first device 110 may be divided into two steps. The first parsing step may be performed by the second protocol acceleration module 114-1, which obtains the RDMA-protocol-based data access command from the packet; that is, the second protocol acceleration module 114-1 parses off the information in the packet other than the RDMA-protocol-based data access command. The second parsing step may be performed by the first protocol acceleration module 114-2, which can parse the data access command, determine the type of the data access command, and obtain the information carried in it.
Still taking a RoCE packet whose data access command is a data write command as an example, fig. 3C shows the structure of the IB payload. The payload includes an operation code (OPcode), a command identifier, a namespace identifier (NSID), a cache address (buffer address), a length, the target data, a logical block address (LBA), and a logical unit number (LUN). The buffer address, length, and target data may be placed in the IB payload in a scatter-gather list (SGL) format. When the data access command is a data read command, the structure of the IB payload is similar to that shown in fig. 3C, except that the IB payload does not include the target data.
The OPcode can indicate an operation corresponding to the data access command, such as write (write) or read (read). The command identifier is used to identify the data access command. The NSID is used to indicate the namespace to which the data access command is directed. The cache address is used for indicating the cache position of the target data after the receiving end receives the data access command. The front-end network card 114A of the first device 110 may temporarily store the target data at the location indicated by the cache address after receiving the data write command. The front-end network card 114A of the first device 110 may temporarily cache the acquired target data at the location indicated by the cache address after receiving the data read command. The length is used to indicate the length of the target data. The logical block address and logical unit number may indicate a starting logical address of the target data. The logical block address, logical unit number, and length may indicate an address segment.
The front-end network card 114A of the first device 110 may obtain the logical address of the target data from the IB payload, such as the length, LBA, and LUN of the target data.
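Extracting the fields named above (OPcode, command identifier, NSID, buffer address, length, LBA, LUN) from an IB payload can be sketched as follows. The embodiment only names the fields; the packed little-endian byte layout below, and the assumption that the target data directly follows these fields, are illustrative assumptions.

```python
import struct

# assumed layout: OPcode, command id, NSID, buffer address, length, LBA, LUN
HDR = struct.Struct("<BHIQIQH")

def parse_write_payload(payload: bytes) -> dict:
    opcode, cmd_id, nsid, buf_addr, length, lba, lun = HDR.unpack_from(payload, 0)
    # for a data write command, the target data follows the header fields
    target_data = payload[HDR.size:HDR.size + length]
    return {"opcode": opcode, "command_id": cmd_id, "nsid": nsid,
            "buffer_address": buf_addr, "length": length,
            "lba": lba, "lun": lun, "data": target_data}
```

The LBA, LUN, and length together describe the address segment of the target data.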
Step 303: after parsing the data access command, the front-end network card 114A of the first device 110 may process the data access command in the memory of the first device 110 according to the logical address of the target data.
When the data access command is a data write command, the front-end network card 114A of the first device 110 may store the target data in the memory of the first device 110 at the location indicated by the logical address.
When the data access command is a data read command, the front-end network card 114A of the first device 110 may read the target data from the location indicated by the logical address in the memory of the first device 110.
Step 304: after processing the data access command, the front-end network card 114A of the first device 110 may feed back a data access response to the client device 200.
When the data access command is a data write command, the data access response is a data write response, and the data write response is used for indicating that the target data is successfully written.
When the data access command is a data read command, the data access response is a data read response, and the data read response carries the target data.
It can be seen that when the types of the data access commands are different, the processing manners of the front-end network card 114A of the first device 110 are different, and the processing manners of the front-end network card 114A of the first device 110 under the data write command and the data read command are described below respectively.
1. The data access command is a data write command.
As shown in fig. 4, in an embodiment of the data access method, the front-end network card 114A of the first device 110 receives a first packet carrying a data write command. The front-end network card 114A can process the first packet independently and write the first target data carried in the data write command into the storage (such as the memory 113 or the hard disk 115) of the first device 110 by itself, without involvement of the processor 112 of the first device 110 in the whole process. This reduces the consumption of the processor 112 of the first device 110, and because the front-end network card 114A writes the first target data into the storage of the first device 110 directly, the data writing efficiency is effectively improved. The method includes the following steps:
Step 401: the front-end network card 114A of the first device 110 receives a first packet from the client device 200, where the first packet carries a data write command, and the data write command is used to request writing of first target data. The data write command carries the first target data and the logical address to which the first target data needs to be written (referred to in this embodiment as the logical address of the first target data).
The logical address of the first target data is a logical address that can be perceived by the client device 200 side, and the embodiment of the present application does not limit the expression manner of the logical address of the first target data. For example, the logical address of the first target data may include a logical start address of the first target data and a data length (length) of the first target data. The logical start address may be represented by a LBA and a LUN.
Step 402: the front-end network card 114A of the first device 110 parses the first packet, and obtains the logical address of the first target data and the first target data from the data write command.
The way for the front-end network card 114A of the first device 110 to analyze the first packet to obtain the logical address of the first target data and the first target data may refer to the embodiment shown in fig. 3A, and details are not repeated here.
Step 403: after the front-end network card 114A of the first device 110 obtains the logical address of the first target data, the front-end network card 114A may determine, according to the logical address of the first target data, whether the home node of the first target data is the first device 110. That is, the front-end network card 114A determines, according to the logical address of the first target data, whether the node where the first target data needs to be stored is the first device 110. This step may be performed by the index acceleration module 114-3 in the front-end network card 114A of the first device 110.
The storage space formed by the plurality of hard disks in the storage system is mapped into a plurality of storage logical layers, and an address segment in each storage logical layer corresponds to an address segment in the next storage logical layer. What the client device 200 can perceive is the address of the highest storage logical layer, i.e., the logical address. The storage space indicated by these logical addresses may be the memory or the hard disk of each device in the storage system. Different devices in the storage system manage their respective memories and their respective external storage such as hard disks. A device in the storage system can only process data access commands whose logical addresses are mapped from the memory or hard disks that the device manages.
After the front-end network card 114A of the first device 110 parses the logical address of the first target data from the data write command, it may first determine, according to the logical address of the first target data, whether the storage space indicated by the logical address is a storage space that can be managed by the first device 110, that is, determine whether the home node of the first target data is the first device 110.
The embodiment of the present application does not limit the way in which the front-end network card 114A of the first device 110 performs the home lookup. Two of these are listed below.
In a first manner, the front-end network card 114A of the first device 110 may hash the logical address of the first target data to determine the home node of the first target data.
The hash may also be referred to as hash calculation; a hash converts input data of arbitrary length into output of a fixed length. In step 403, the front-end network card 114A of the first device 110 may take the logical address of the first target data as the input of the hash, and hash the logical address to obtain a hash value. There are various specific hash functions; different hash functions yield different hash values, and the information indicated by the hash value also differs.
For example, the hash value may indicate a home node of the first target data, e.g., the hash value may be an identification of the home node of the first target data. The front-end network card 114A of the first device 110 may determine whether the home node of the first target data is the first device 110 according to the hash value.
For another example, the hash value may be a key, and the front-end network card 114A of the first device 110 may store the correspondence between different keys and respective devices. The first device 110 may search for a corresponding device according to the key, where the searched device is a home node of the first target data.
In the second mode, the front-end network card 114A of the first device 110 records the corresponding relationship between the logical address of the data and each device, and the front-end network card 114A of the first device 110 may search for the device corresponding to the logical address of the first target data in the corresponding relationship between the logical address of the data and each device, where the searched device is the home node of the first target data.
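The two home-lookup manners above can be sketched as follows: manner one hashes the logical address and lets the hash value select the home node; manner two looks the logical address up in a recorded correspondence. The node list, SHA-256 choice, and table contents are illustrative assumptions.

```python
import hashlib

NODES = ["first_device_110", "second_device_120"]  # assumed node list

def home_by_hash(lun: int, lba: int) -> str:
    # manner one: hash the logical address; the hash value selects the home node
    digest = hashlib.sha256(f"{lun}:{lba}".encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# manner two: a recorded correspondence between logical addresses and devices
ADDRESS_MAP = {(3, 4096): "first_device_110"}

def home_by_table(lun: int, lba: int):
    return ADDRESS_MAP.get((lun, lba))
```

Both manners are deterministic, so every network card that performs the lookup for the same logical address reaches the same home node.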
Step 404: after the front-end network card 114A of the first device 110 determines that the home node of the first target data is the first device 110, the front-end network card 114A of the first device 110 may store the first target data in the memory of the first device 110 according to the logical address of the first target data. This step may be performed by the read-write module 114-4 in the front-end network card 114A of the first device 110.
Since the storage of the first device 110 may be divided into the memory 113 and the hard disk 115, the front-end network card 114A of the first device 110 may preferentially write the first target data into the memory 113 of the first device 110; when the amount of data stored in the memory 113 reaches a threshold, the front-end network card 114A may transfer data from the memory 113 to the hard disk 115 of the first device 110. Whether the first target data is written into the memory 113 or transferred from the memory 113 to the hard disk 115, the front-end network card 114A writes the first target data starting from the end address of the last written data, i.e., by append writing.
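The append-writing behavior just described can be modeled minimally: each new piece of target data is written starting from the end address of the previous write. The class name and `bytearray` backing store are illustrative assumptions.

```python
class AppendLog:
    """Minimal model of append writing into a device's storage."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.end = 0  # end address of the last written data

    def append_write(self, data: bytes) -> int:
        start = self.end                       # continue from the previous end address
        self.buf[start:start + len(data)] = data
        self.end = start + len(data)
        return start                           # start address used for this write
```

The returned start address is the kind of physical location the metadata generated in the next step would record.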
Step 405: the front-end network card 114A of the first device 110 generates first metadata of the first target data, and records a corresponding relationship between a logical address of the first target data and the first metadata of the first target data.
Specifically, inside the front-end network card 114A of the first device 110, the read-write module 114-4 may generate the first metadata of the first target data after writing the first target data. The read/write module 114-4 may operate as follows.
Operation one: the read/write module 114-4 may directly transmit the first metadata of the first target data to the index acceleration module 114-3, and the index acceleration module 114-3 may record the corresponding relationship between the logical address of the first target data and the first metadata of the first target data.
The manner in which the index acceleration module 114-3 records the corresponding relationship may be a direct recording manner, and the index information maintained by the index acceleration module 114-3 directly records the corresponding relationship between the logical address of the first target data and the first metadata of the first target data.
The index acceleration module 114-3 may record the corresponding relationship in an indirect recording manner, and the index acceleration module 114-3 may hash the logical address of the first target data to obtain a key corresponding to the logical address, and create a key value pair in the index information, where a value in the key value pair is the first metadata of the first target data.
Operation two: the read/write module 114-4 may store the first metadata of the first target data in the storage (e.g., the memory 113) of the first device 110 and transmit the address of the first metadata of the first target data (i.e., the storage address of the first metadata in the storage of the first device 110) to the index acceleration module 114-3, and the index acceleration module 114-3 may record the corresponding relationship between the logical address of the first target data and the address of the first metadata of the first target data.
The index acceleration module 114-3 may record the corresponding relationship directly, and the index acceleration module 114-3 directly records the corresponding relationship between the logical address of the first target data and the address of the first metadata of the first target data in the index information maintained by the index acceleration module.
The index acceleration module 114-3 may record the corresponding relationship in an indirect recording manner, and the index acceleration module 114-3 may hash the logical address of the first target data to obtain a key corresponding to the logical address, and create a key value pair in the index information, where a value in the key value pair is an address of the first metadata of the first target data.
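The indirect recording manner described above can be sketched as a key-value index: the logical address is hashed into a key, and the value of the key-value pair is the first metadata (or the address of the first metadata). The class name, SHA-256 choice, and string form of the logical address are illustrative assumptions.

```python
import hashlib

class IndexInfo:
    """Key-value index: key = hash of the logical address,
    value = the metadata itself or the metadata's storage address."""

    def __init__(self):
        self.kv = {}

    @staticmethod
    def _key(logical_address: str) -> str:
        return hashlib.sha256(logical_address.encode()).hexdigest()

    def record(self, logical_address: str, value) -> None:
        # create (or overwrite) the key-value pair for this logical address
        self.kv[self._key(logical_address)] = value

    def lookup(self, logical_address: str):
        return self.kv.get(self._key(logical_address))
```

A later data read command with the same logical address hashes to the same key and so finds the recorded metadata.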
In this embodiment, besides writing the first target data into the storage of the first device 110, the front-end network card 114A of the first device 110 may instruct other devices in the storage system to perform mirror storage, that is, to store a copy of the first target data, so as to ensure the reliability of the data. The manner in which the front-end network card 114A instructs other devices in the storage system to perform mirror storage may refer to steps 406 to 409. The processor 112 of the first device 110 is likewise not required to participate in instructing the other devices to perform mirror storage, which reduces the consumption of the processor 112 of the first device 110; and because the front-end network card 114A directly instructs the other devices to perform mirror storage, the data writing efficiency is effectively improved.
Step 406: the front-end network card 114A of the first device 110 determines the second device 120 that needs to perform the mirror storage. This step may be performed by the read-write module 114-4 in the front-end network card 114A of the first device 110.
In a storage system, a mirror mapping relationship may be configured for the devices in the storage system; two devices with a mirror mapping relationship are mirrors of each other. When one device in the storage system needs to store data, it may carry the data in a mirror data write command and send that command to a device that has a mirror mapping relationship with it. The mirror data write command is used to request the device having the mirror mapping relationship to store a copy of the data (the copy may be another identical copy of the data, or data highly similar to it with redundant data removed), thereby implementing mirror storage. Any device in the storage system may have a mirror mapping relationship with one other device in the storage system, or with a plurality of other devices; the number of devices having a mirror mapping relationship with a given device is not limited in the embodiments of the present application.
Taking the first device 110 in the storage system as an example, the mirror mapping relationship of the first device 110 may be pre-configured in the front-end network card 114A of the first device 110, for example by the processor 112 of the first device 110. The mirror mapping relationship of the first device 110 may be stored in the front-end network card 114A itself, in which case, when the front-end network card 114A determines a device having a mirror mapping relationship with the first device 110, it can do so directly according to the locally stored mirror mapping relationship. The mirror mapping relationship of the first device 110 may also be stored in the memory 113 of the first device 110, in which case the front-end network card 114A first retrieves the mirror mapping relationship from the memory 113 and then determines the device having the mirror mapping relationship according to the obtained mirror mapping relationship.
Here, the mirror mapping relationship of the first device 110 indicates that the first device 110 and the second device 120 are mirror images of each other. The front-end network card 114A of the first device 110 may determine the second device 120 according to the mirror mapping relationship of the first device 110. Of course, if the mirror mapping relationship of the first device 110 indicates that the first device 110 and a plurality of devices (including the second device 120) are mirror images of each other, the first device 110 may select one device from the plurality of devices, such as the second device 120.
In the foregoing description, the manner in which the front-end network card 114A of the first device 110 determines the second device 120 according to the mirror mapping relationship is merely an example. In practical applications, the front-end network card 114A of the first device 110 may also select the second device 120 that needs to perform the mirror storage in other manners. For example, the front-end network card 114A of the first device 110 may select one device from the plurality of devices connected to the first device 110, and instruct the selected device to perform the mirror storage (i.e., perform step 405). Taking the device selected by the front-end network card 114A of the first device 110 as the second device 120 as an example, after the front-end network card 114A of the first device 110 selects the second device 120, the correspondence between the first device 110 and the second device 120 may be locally recorded, and subsequently, if the front-end network card 114A of the first device 110 needs to write other data (for example, the front-end network card 114A of the first device 110 receives a data write command carrying other data), the front-end network card 114A of the first device 110 may determine the second device 120 directly according to the correspondence between the first device 110 and the second device 120, and instruct the second device 120 to perform mirror image storage on the other data.
Step 407: the front-end network card 114A of the first device 110 sends a mirror data write command to the switching network card 124B of the second device 120, where the mirror data write command includes first target data and a logical address to which the first target data needs to be written. This step may be performed by the read-write module 114-4 in the front-end network card 114A of the first device 110.
The front end network card 114A of the first device 110 may generate a mirrored data write command after selecting the second device 120. For example, the front-end network card 114A of the first device 110 may reserve the IB payload in the data write command, change the destination address and the source address in the data write command, change the destination address to the address of the second device 120, and change the source address to the address of the first device 110, where the changed data write command is the mirror data write command.
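Generating the mirror data write command as described above (keep the IB payload, change the destination address to the second device 120 and the source address to the first device 110) can be sketched as follows; the dict representation of a command is an illustrative assumption.

```python
def make_mirror_write_command(data_write_cmd: dict,
                              first_dev: str, second_dev: str) -> dict:
    # keep the IB payload and all other fields; only the addresses change
    mirror = dict(data_write_cmd)
    mirror["dst_addr"] = second_dev   # destination becomes the mirror device
    mirror["src_addr"] = first_dev    # source becomes the first device
    return mirror
```

The original data write command is left untouched, so it can still be processed locally while the mirror copy is sent out.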
When the front-end network card 114A of the first device 110 sends the mirror data write command to the switching network card 124B of the second device 120, if the first device 110 is connected to the second device 120 through at least two switching network cards 114B, the front-end network card 114A may further select one switching network card 114B from the at least two switching network cards 114B, and send the mirror data write command to the switching network card 124B of the second device 120 through the selected switching network card 114B.
The embodiment of the present application does not limit the manner in which the front-end network card 114A of the first device 110 selects the switching network card 114B from the plurality of switching network cards 114B. For example, the front-end network card 114A of the first device 110 may randomly select one of the switching network cards 114B. For another example, the front-end network card 114A of the first device 110 may select one switching network card 114B from the plurality of switching network cards 114B based on a load balancing policy.
The load balancing policy describes the rules to be followed when selecting a switching network card 114B from the plurality of switching network cards 114B, so as to ensure that the different switching network cards 114B achieve load balancing, that is, that the volume of data transmitted by each switching network card 114B is equal or close to equal.
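One possible load balancing policy matching the description above is sketched below: pick the switching network card that has transmitted the least data so far. The byte-counter bookkeeping is an illustrative assumption; the patent does not prescribe a specific policy.

```python
# Pick the switching network card with the smallest transmitted volume, so the
# per-card data volumes stay close to equal over time.

def pick_switching_card(cards, bytes_sent):
    """Return the switching network card that has transmitted the least data."""
    return min(cards, key=lambda card: bytes_sent.get(card, 0))

bytes_sent = {"switch-0": 700_000, "switch-1": 120_000, "switch-2": 450_000}
chosen = pick_switching_card(["switch-0", "switch-1", "switch-2"], bytes_sent)
# switch-1 has transmitted the least, so it carries the next mirror command
```

Random selection, the other example given in the text, would simply replace `min(...)` with a `random.choice(cards)` call.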
Step 408: after receiving the mirror data write command, the network card 124 of the second device 120 may write the first target data into the memory of the second device 120 according to the logical address of the first target data.
The way in which the network card 124 of the second device 120 writes the first target data in the memory of the second device 120 according to the logical address of the first target data is similar to the way in which the network card 114 of the first device 110 writes the first target data in the memory of the first device 110 according to the logical address of the first target data in step 405, which may specifically refer to the foregoing description and is not described again here.
Step 409: after writing the first target data, the network card 124 of the second device 120 may feed back a mirror image data writing response to the front-end network card 114A of the first device 110, indicating that the first target data is successfully written.
Step 410: after receiving the mirror data write response, the front-end network card 114A of the first device 110 may feed back a data write response to the client device 200, indicating that the first target data was written successfully. This step may be performed cooperatively by the first protocol acceleration module 114-2 and the second protocol acceleration module 114-1 in the front-end network card 114A of the first device 110. That is, the first protocol acceleration module 114-2 may encapsulate the information that needs to be fed back into a message format supporting RDMA, and the second protocol acceleration module 114-1 may then encapsulate that message into a data write response supporting RoCE, IB, or iWARP. It should be noted that in step 410 the front-end network card 114A of the first device 110 feeds back the data write response to the client device 200 only after receiving the mirror data write response. In practical applications, the front-end network card 114A of the first device 110 may instead feed back the data write response to the client device 200 as soon as the first target data is stored in the first device 110, and instruct the second device 120 to perform the mirror operation afterwards. Alternatively, in some scenarios, the front-end network card 114A of the first device 110 may feed back the data write response to the client device 200 after sending the mirror data write command to the second device 120. In other words, the embodiment of the present application does not limit the order of the step of feeding back the data write response to the client device 200 and the step of instructing the second device 120 to perform the mirror storage.
If the front-end network card 114A of the first device 110 determines that the home node of the first target data is not the first device 110, the front-end network card 114A of the first device 110 may directly reject the data write request of the client device 200, for example, by sending the client device 200 a response indicating that the data cannot be written. Alternatively, the front-end network card 114A of the first device 110 may forward the first packet to the home node of the first target data. The manner in which the front-end network card 114A of the first device 110 sends the first packet is similar to the manner in which it sends the mirror data write command, which may refer to the foregoing description and is not repeated here. After receiving the first packet, the home node of the first target data may process it with reference to steps 402 to 409, which is likewise not repeated here. It should be noted that the home node of the first target data receives and processes the first packet through its own switching network card.
As shown in fig. 5, which is a schematic diagram of data transmission in the storage system under a data write command, the client device 200 may send a first packet to the first device 110 in the storage system through its network card. After the front-end network card 114A of the first device 110 receives the first packet, the second protocol acceleration module 114-1 may parse the data write command from the first packet, and the first protocol acceleration module 114-2 may parse the target data and the logical address of the target data from the data write command. The read-write module 114-4 may first store the target data in the memory 113 of the first device 110 and generate metadata of the target data. The read-write module 114-4 then sends the metadata of the target data to the index acceleration module 114-3, which records the correspondence between the logical address of the target data and the metadata of the target data in the index information. The front-end network card 114A of the first device 110 (e.g., the read-write module 114-4) may further send a mirror data write command to the switching network card 114B of the first device 110, and after receiving the mirror data write command, the switching network card 114B of the first device 110 may send it to the switching network card 124B of the second device 120. The exchange of mirror data write commands between the switching network card 114B of the first device 110 and the switching network card 124B of the second device 120 may be based on RDMA. After receiving the mirror data write command, the switching network card 124B of the second device 120 may store the target data in the memory 123 of the second device 120.
2. The data access command is a data read command.
As shown in fig. 6, this embodiment of the present application provides a data access method in which the front-end network card 114A of the first device 110 receives a data reading command. In this method embodiment, the front-end network card 114A of the first device 110 can process the data reading command independently, without the processor 112 of the first device 110 participating in the whole process; that is, the front-end network card 114A of the first device 110 no longer needs to interact with the processor 112 of the first device 110. This can shorten the data reading time, improve the data reading efficiency, and reduce the consumption of the processor 112 of the first device 110. The method comprises the following steps:
Step 601: the front-end network card 114A of the first device 110 receives a second packet from the client device 200, where the second packet carries a data reading command, and the data reading command is used to request to read second target data. The data reading command carries a logical address of the second target data. For the description of the logical address of the second target data, reference may be made to the related description in step 301, and details are not described here.
Step 602: the front-end network card 114A of the first device 110 parses the second packet, and obtains the logical address of the second target data from the data reading command.
The manner in which the front-end network card 114A of the first device 110 parses the second packet to obtain the logical address of the second target data may refer to the embodiment shown in fig. 3A, and details are not repeated here.
Step 603: after acquiring the logical address of the second target data, the front-end network card 114A of the first device 110 performs home lookup according to the logical address of the second target data, and determines a home node of the second target data. The home finding refers to determining a device where the second target data is located, where the device is a home node of the second target data. This step may be performed by the index acceleration module 114-3 in the front-end network card 114A of the first device 110. The execution manner of step 603 is similar to that of step 403, and reference may be made to the foregoing specifically, and details are not described here again.
The embodiment of the present application does not limit the way in which the front-end network card 114A of the first device 110 performs the home lookup.
The front-end network card 114A of the first device 110 determines whether the home node of the second target data is the first device 110. If the home node of the second target data is the first device 110, the front-end network card 114A of the first device 110 performs steps 604 to 605. If the home node of the second target data is not the first device 110 (taking the second device 120 as the home node as an example), the front-end network card 114A of the first device 110 may perform steps 606 to 609.
Step 604: the front-end network card 114A of the first device 110 obtains the second target data from the memory of the first device 110 according to the logical address of the second target data.
To acquire the second target data from the memory of the first device 110 according to its logical address, the front-end network card 114A of the first device 110 first searches the index information using the logical address of the second target data, and acquires the second metadata of the second target data.
When the index information directly records the correspondence between the logical address of the data and the second metadata of the data, the front-end network card 114A of the first device 110 (e.g., the index acceleration module 114-3 in the front-end network card 114A of the first device 110) may directly look up, in the index information, the second metadata corresponding to the logical address or the address of the second metadata. If the front-end network card 114A of the first device 110 obtains the address of the second metadata, it (for example, the index acceleration module 114-3) may then obtain the second metadata according to that address.
Once the front-end network card 114A of the first device 110 has obtained the second metadata, it may determine the physical address of the second target data according to the second metadata, and acquire the second target data according to that physical address.
When the second target data is stored in the memory 113 of the first device 110, the front-end network card 114A of the first device 110 may directly read the second target data from the memory 113 of the first device 110. When the second target data is stored in the hard disk 115 of the first device 110, the front-end network card 114A of the first device 110 may directly read the second target data from the hard disk 115 of the first device 110, or may transfer the second target data from the hard disk 115 of the first device 110 to the memory 113 of the first device 110, and then read the second target data from the memory 113 of the first device 110.
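The two read paths above can be sketched as follows. The dict-based stores standing in for the memory 113 and the hard disk 115 are an illustrative assumption, not the device's real media.

```python
# If the second target data is resident in memory, read it directly; if it is
# on the hard disk, one option described in the text is to first transfer it
# into memory and then read it from there.

def read_target_data(physical_address, memory, hard_disk):
    """Read the data at physical_address, staging it from disk if needed."""
    if physical_address in memory:
        return memory[physical_address]      # hit: read straight from memory
    data = hard_disk[physical_address]       # miss: fetch from the hard disk
    memory[physical_address] = data          # transfer into memory first
    return data

memory, hard_disk = {}, {0x9A00: b"second-target-data"}
data = read_target_data(0x9A00, memory, hard_disk)
# after the call, the data is also resident in memory for subsequent reads
```

The other option in the text, reading directly from the hard disk, would simply return `hard_disk[physical_address]` without the staging step.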
Specifically, inside the front-end network card 114A of the first device 110, the index acceleration module 114-3 obtains the second metadata of the second target data from the maintained index information according to the logical address of the second target data. If the correspondence between the logical address of the data and the metadata of the data is recorded in the index information in a direct manner, the index acceleration module 114-3 may directly obtain the second metadata of the second target data, or the address of the second metadata, from the maintained index information according to the logical address. If the correspondence is recorded in an indirect manner, for example as key-value pairs, the index acceleration module 114-3 may hash the logical address of the second target data to obtain a key, and then obtain the second metadata of the second target data, or the address of the second metadata, from the maintained index information according to the key.
After obtaining the second metadata, the index acceleration module 114-3 sends the second metadata to the read-write module 114-4. The read-write module 114-4 reads the second target data from the first device 110 according to the second metadata (the manner in which the read-write module 114-4 reads the second target data according to the second metadata is similar to the foregoing description of reading the first target data, and is not repeated here).
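The indirect key-value index form described above can be sketched as follows. The choice of SHA-256 as the hash and the metadata fields (`physical_address`, `length`) are illustrative assumptions; the patent only says the key is derived by hashing the logical address.

```python
import hashlib

def key_for(logical_address: int) -> str:
    """Hash a logical address into a fixed-size index key."""
    return hashlib.sha256(str(logical_address).encode()).hexdigest()[:16]

class IndexInfo:
    """Key-value index mapping hashed logical addresses to metadata."""
    def __init__(self):
        self._entries = {}

    def record(self, logical_address, metadata):
        # Write path: record the correspondence in the index information.
        self._entries[key_for(logical_address)] = metadata

    def lookup(self, logical_address):
        # Read path: hash the logical address to the key, then fetch metadata.
        return self._entries.get(key_for(logical_address))

index = IndexInfo()
index.record(0x2000, {"physical_address": 0x9A00, "length": 4096})
metadata = index.lookup(0x2000)   # found via the hashed key
```

In the direct form, `key_for` would be the identity and the logical address itself would index the entry; the lookup flow is otherwise the same.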
Step 605: after obtaining the second target data, the front-end network card 114A of the first device 110 feeds back a data reading response to the client device 200, where the data reading response carries the second target data.
Step 606: if the home node of the second target data is the second device 120, the front-end network card 114A of the first device 110 obtains the second target data from the second device 120.
If the front-end network card 114A of the first device 110 determines that the home node of the second target data is the second device 120, the front-end network card 114A of the first device 110 may interact directly with the second device 120 (e.g., with the network card 124 of the second device 120), without the intervention of the processor 112 of the first device 110.
The front-end network card 114A of the first device 110 may process the data reading command and send the processed data reading command to the network card 124 of the second device 120, where the network card 124 of the second device 120 may be the switching network card 124B of the second device 120.
The process of the front-end network card 114A of the first device 110 obtaining the second target data from the second device 120 includes the following steps:
Step 1, the front-end network card 114A of the first device 110 may process the data reading command to generate a data command that needs to be transmitted to the second device 120.
The processing of the data reading command by the front-end network card 114A of the first device 110 includes: updating the destination address of the data reading command to the address of the second device 120, and carrying the address of the first device 110 in the data reading command. There are many ways to carry the address of the first device 110 in the data reading command: for example, the source address of the data reading command may be updated to the address of the first device 110, or the address of the first device 110 may be written into the IB payload of the data reading command.
Step 2, the front-end network card 114A of the first device 110 sends the data command to the switching network card 124B of the second device 120.
When the front-end network card 114A of the first device 110 sends the data command to the switching network card 124B of the second device 120, if there are a plurality of switching network cards 114B in the first device 110 connected to the second device 120, the front-end network card 114A of the first device 110 may first select one switching network card 114B from the plurality of switching network cards 114B and send the processed data reading command to the selected switching network card 114B. The selected switching network card 114B then sends the processed data reading command to the switching network card 124B of the second device 120 according to the destination address of the data reading command.
The embodiment of the present application does not limit the manner in which the front-end network card 114A of the first device 110 selects the switching network card 114B from the plurality of switching network cards 114B. For example, the front-end network card 114A of the first device 110 may randomly select one of the switching network cards 114B. For another example, the front-end network card 114A of the first device 110 may select one switching network card 114B from the plurality of switching network cards 114B based on a load balancing policy.
It should be noted that, when the data volume of the second target data is large, the front-end network card 114A of the first device 110 may split the second target data into a plurality of smaller sub-data and generate a plurality of data commands, where each data command corresponds to one sub-datum. The front-end network card 114A of the first device 110 may then send the plurality of data commands to the switching network card 124B of the second device 120. If there are multiple switching network cards 114B in the first device 110 connected to the second device 120, the front-end network card 114A of the first device 110 may select one switching network card 114B from the multiple switching network cards 114B to send the plurality of data commands.
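The splitting described above can be sketched as follows; the chunk size and the per-command fields (`dst_addr`, `offset`, `payload`) are illustrative assumptions, not from the patent.

```python
# Split a large payload into sub-data, generating one data command per chunk.

def split_into_data_commands(data: bytes, chunk_size: int, dst_addr: str):
    """Generate one data command per chunk, each tagged with its offset."""
    return [{"dst_addr": dst_addr, "offset": offset,
             "payload": data[offset:offset + chunk_size]}
            for offset in range(0, len(data), chunk_size)]

commands = split_into_data_commands(b"0123456789", 4, "device-120")
# three commands with payloads of 4, 4 and 2 bytes; concatenating the payloads
# in offset order reconstructs the original data
```

Carrying the offset in each command lets the receiver reassemble the sub-data regardless of the order in which the commands arrive.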
Step 3, after receiving the data command, the switching network card 124B of the second device 120 acquires the second target data from the memory of the second device 120 according to the logical address of the second target data carried in the data command.
The manner in which the switching network card 124B of the second device 120 acquires the second target data from the memory of the second device 120 according to the logical address of the second target data carried in the data command is similar to the manner in which the front-end network card 114A of the first device 110 acquires the second target data from the memory of the first device 110 according to the logical address carried in the data reading command in step 604, which may refer to the foregoing description and is not repeated here.
Step 4, after the switching network card 124B of the second device 120 acquires the second target data, it may send a data response to the switching network card 114B of the first device 110, where the data response carries the second target data. The destination address of the data response is the address of the first device 110.
Step 5, the switching network card 114B of the first device 110 parses the data response, acquires the second target data from it, and sends the second target data to the front-end network card 114A of the first device 110.
The front-end network card 114A of the first device 110 may execute step 607 after acquiring the second target data.
Step 607: after obtaining the second target data, the front-end network card 114A of the first device 110 feeds back a data reading response to the client device 200, where the data reading response carries the second target data.
As shown in fig. 7, which is a schematic diagram of data transmission in the storage system under a data reading command, the client device 200 may send a second packet to the first device 110 in the storage system through its network card. After receiving the second packet, the front-end network card 114A of the first device 110 parses it. Specifically, inside the front-end network card 114A of the first device 110, the second protocol acceleration module 114-1 may parse the data reading command from the second packet, and the first protocol acceleration module 114-2 may parse the logical address of the second target data from the data reading command. The index acceleration module 114-3 in the front-end network card 114A of the first device 110 may then determine the home node of the second target data. If the home node of the second target data is the first device 110, that is, the second target data is stored locally, the read-write module 114-4 in the front-end network card 114A of the first device 110 may acquire the second target data from local storage (e.g., the memory 113 of the first device 110 or the hard disk 115 of the first device 110). The front-end network card 114A of the first device 110 may acquire the second target data directly from the local memory based on direct memory access (DMA). If the home node of the second target data is the second device 120, that is, the second target data is stored at the remote end, the read-write module 114-4 in the front-end network card 114A of the first device 110 may generate a data command according to the data reading command and send it to the switching network card 114B of the first device 110. The switching network card 114B of the first device 110, upon receiving the data command, may send it to the switching network card 124B of the second device 120.
The switching network card 114B of the first device 110 may send data commands to the switching network card 124B of the second device 120 via an RDMA SEND operation. After obtaining the data command, the switching network card 124B of the second device 120 may acquire the second target data from local storage (e.g., the memory 123 of the second device 120 or the hard disk of the second device 120). The switching network card 124B of the second device 120 may acquire the second target data directly from the local memory based on DMA. After the switching network card 124B of the second device 120 acquires the second target data, it may feed back a data response carrying the second target data to the switching network card 114B of the first device 110. After receiving the data response, the switching network card 114B of the first device 110 may send the second target data to the front-end network card 114A of the first device 110. The front-end network card 114A of the first device 110 may then feed back a data reading response to the client device 200, where the data reading response carries the second target data. The front-end network card 114A of the first device 110 may feed back the data reading response to the client device 200 via an RDMA one-sided operation (e.g., an RDMA WRITE operation).
Based on the same inventive concept as the method embodiment, an embodiment of the present application further provides a network card, where the network card is configured to execute a method executed by the front-end network card 114A or the network card 114 of the first device 110 in the method embodiments shown in fig. 3A, 4, 5, 6, and 7, and related features may refer to the method embodiments, which are not described herein again. As shown in fig. 8, the network card 114 includes a processor 1141 and an interface 1142, and the specific form of the processor 1141 may refer to the relevant description of the processor 112, which is not described herein again. The processor 1141 enables the front-end network card 114A or the network card 114 to have a capability of executing a method executed by the front-end network card 114A or the network card 114 of the first device 110 in the method embodiments shown in fig. 3A, 4, 5, 6, and 7, and the processor 1141 may further send or receive information through the interface 1142, for example, receive a first message or a second message, or send a data access response, or a mirror data write command.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include such modifications and variations.

Claims (32)

1. A data access system comprising a client device and a first storage device;
the client device is configured to send a first packet to the first storage device to write target data in the first storage device, where the first packet includes a logical address of the target data;
the front-end network card of the first storage device is configured to write the target data into the first storage device, generate metadata, and record a correspondence between a logical address of the target data and the metadata, where the metadata is used to indicate a physical address of the target data stored in the first storage device.
2. The data access system of claim 1, wherein the first storage device includes a memory having index information stored therein,
the front-end network card of the first storage device is specifically configured to:
and creating a corresponding relation between a key and a value in the index information, wherein the value is the metadata or the address of the metadata, and the key is determined according to the logical address of the target data.
3. The data access system of claim 1 or 2, wherein the front-end network card of the first storage device is further configured to:
receiving the first packet;
parsing the first packet to obtain a data write command based on the remote direct memory access (RDMA) protocol;
and parsing the data write command to acquire the logical address of the target data carried in the data write command.
4. The data access system according to any one of claims 1 to 3, wherein the first packet is based on any one of the following protocols:
InfiniBand (IB), RDMA over Converged Ethernet (RoCE), or the Internet Wide Area RDMA Protocol (iWARP).
5. The data access system of any one of claims 1 to 4, wherein the data access system further comprises a second storage device, the front-end network card of the first storage device further configured to:
sending a mirror data write command to the second storage device, where the mirror data write command is used to request to write a copy of the target data, and the mirror data write command includes the copy of the target data and a logical address of the target data.
6. The data access system of claim 5, wherein the first storage device comprises at least two switching network cards, the first storage device is connected to the second storage device through the at least two switching network cards, and a front-end network card of the first storage device is specifically configured to:
selecting a switching network card from the at least two switching network cards based on a load balancing strategy;
and sending the mirror image data write-in command to the second storage device through the selected switching network card.
7. The data access system of any one of claims 1 to 6,
the client device is further configured to send a second packet to the first storage device to read the target data, where the second packet includes a logical address of the target data;
the front-end network card of the first storage device is further configured to obtain the metadata according to the logical address of the target data, and obtain the target data from the first storage device according to the metadata.
8. The data access system of claim 7, wherein the front-end network card of the first storage device is further to:
receiving the second packet;
parsing the second packet to obtain an RDMA-based data read command;
and parsing the data read command to obtain the logical address of the target data carried in the data read command.
9. The data access system of claim 7 or 8, wherein before obtaining the metadata according to the logical address of the target data, the front-end network card of the first storage device is further configured to:
determine, according to the logical address of the target data, that the home node of the target data is the first storage device.
10. The data access system of claim 6, wherein the front-end network card of the first storage device and the switching network card of the first storage device are deployed together.
11. A data access method, characterized in that:
a front-end network card of a first storage device receives a first packet from a client device, wherein the first packet is used to request writing of target data into the first storage device;
the front-end network card of the first storage device obtains a logical address of the target data from the first packet, writes the target data into the first storage device, generates metadata, and records a correspondence between the logical address of the target data and the metadata, wherein the metadata describes the physical address at which the target data is stored in the first storage device.
12. The method of claim 11, wherein the recording, by the front-end network card of the first storage device, of the correspondence between the logical address of the target data and the metadata comprises:
the front-end network card of the first storage device creating, in index information in a memory of the first storage device, a correspondence between a key and a value, wherein the value is the metadata or the address of the metadata, and the key is determined according to the logical address of the target data.
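The key-value index of claim 12 can be pictured with a small sketch. This is an illustrative model only, not code from the patent: the key derivation (volume + LBA), the field names, and the dict standing in for index information in device memory are all assumptions.

```python
# Illustrative model of the index described in claim 12: the key is derived
# from the target data's logical address, and the value is the metadata (or
# the metadata's address) describing where the data is physically stored.
# All names here are assumptions made for illustration.

class FrontEndIndex:
    def __init__(self):
        self.index = {}  # stands in for index information in device memory

    @staticmethod
    def make_key(volume_id: int, lba: int) -> tuple:
        # Key determined according to the logical address (volume + LBA here).
        return (volume_id, lba)

    def record_write(self, volume_id: int, lba: int,
                     physical_address: int, length: int) -> dict:
        # After the target data is written, generate metadata and record
        # the logical-address -> metadata correspondence.
        metadata = {"physical_address": physical_address, "length": length}
        self.index[self.make_key(volume_id, lba)] = metadata
        return metadata

    def lookup(self, volume_id: int, lba: int):
        # Read path: fetch metadata by logical address.
        return self.index.get(self.make_key(volume_id, lba))

idx = FrontEndIndex()
idx.record_write(volume_id=7, lba=0x1000, physical_address=0x9F000, length=4096)
```

The same `lookup` by logical address is what the read-path claims (7 to 9 and 17 to 19) rely on.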
13. The method according to claim 11 or 12, wherein the obtaining, by the front-end network card of the first storage device, of the logical address of the target data from the first packet comprises:
the front-end network card of the first storage device parsing the first packet to obtain a data write command based on the Remote Direct Memory Access (RDMA) protocol;
and the front-end network card of the first storage device parsing the data write command to obtain the logical address of the target data carried in the data write command.
14. The method according to any one of claims 11 to 13, wherein the first packet is based on any one of the following protocols:
InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and the internet wide area RDMA protocol (iWARP).
15. The method of any of claims 11 to 14, further comprising:
the front-end network card of the first storage device sends a mirror data write command to a second storage device, wherein the mirror data write command is used to request writing of a copy of the target data, and the mirror data write command includes the copy of the target data and the logical address of the target data.
16. The method of claim 15, wherein the first storage device comprises at least two switching network cards, the first storage device is connected to the second storage device through the at least two switching network cards, and a front-end network card of the first storage device sends a mirror data write command to the second storage device, comprising:
the front-end network card of the first storage device selects a switching network card from the at least two switching network cards based on a load balancing strategy;
and the front-end network card of the first storage device sends the mirror data write command to the second storage device through the selected switching network card.
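The selection step in claim 16 can be sketched as follows. The patent only requires that one of at least two switching network cards be chosen by a load-balancing strategy before the mirror data write command is forwarded; it does not fix the strategy, so the round-robin and least-loaded policies below are assumptions, and all names are illustrative.

```python
# Hedged sketch of claim 16: the front-end network card picks one of at
# least two switching network cards via a load-balancing strategy before
# forwarding the mirror data write command to the second storage device.
# Round-robin and least-loaded are shown as two common example strategies.

import itertools

class SwitchNicSelector:
    def __init__(self, nic_ids):
        self.nic_ids = list(nic_ids)
        self._rr = itertools.cycle(self.nic_ids)
        self.outstanding = {nic: 0 for nic in self.nic_ids}  # in-flight commands

    def pick_round_robin(self):
        return next(self._rr)

    def pick_least_loaded(self):
        # Prefer the card with the fewest outstanding mirror writes.
        return min(self.nic_ids, key=lambda nic: self.outstanding[nic])

    def send_mirror_write(self, command):
        nic = self.pick_least_loaded()
        self.outstanding[nic] += 1
        # ... here the command would be forwarded to the second storage
        # device through the selected switching network card ...
        return nic

sel = SwitchNicSelector(["switch-nic-0", "switch-nic-1"])
first = sel.send_mirror_write("mirror-write-a")
second = sel.send_mirror_write("mirror-write-b")
```

With equal load, the least-loaded policy alternates between the two cards, which is the behavior a simple mirror-write workload would see.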
17. The method of any of claims 11 to 16, further comprising:
the front-end network card of the first storage device receives a second packet from the client device, wherein the second packet is used to request reading of the target data;
the front-end network card of the first storage device obtains the logical address of the target data from the second packet;
the front-end network card of the first storage device obtains metadata according to the logical address of the target data, wherein the metadata describes the physical address at which the target data is stored in the first storage device;
and the front-end network card of the first storage device obtains the target data from the first storage device according to the metadata.
18. The method of claim 17, wherein the obtaining, by the front-end network card of the first storage device, of the logical address of the target data from the second packet comprises:
the front-end network card of the first storage device parsing the second packet to obtain an RDMA-based data read command;
and the front-end network card of the first storage device parsing the data read command to obtain the logical address of the target data carried in the data read command.
19. The method of claim 17 or 18, wherein before the front-end network card of the first storage device obtains the metadata according to the logical address of the target data, the method further comprises:
the front-end network card of the first storage device determining, according to the logical address of the target data, that the home node of the target data is the first storage device.
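The home-node check of claim 19 can be illustrated with a short sketch. The patent only states that the decision is made from the logical address; hashing the address over the set of storage devices is one common way to implement such a mapping and is assumed here, along with all names below.

```python
# Illustrative sketch of claim 19: before looking up metadata, the
# front-end network card decides from the logical address whether this
# storage device is the target data's home node. CRC32-based hashing over
# the node list is an assumed strategy, not the patent's specified design.

import zlib

def home_node(logical_address: int, nodes: list) -> str:
    # Deterministically map a logical address to one storage device.
    digest = zlib.crc32(logical_address.to_bytes(8, "little"))
    return nodes[digest % len(nodes)]

def is_home(logical_address: int, local_node: str, nodes: list) -> bool:
    # True when this device should serve the request locally; otherwise
    # the request would be forwarded to the owning device.
    return home_node(logical_address, nodes) == local_node

nodes = ["storage-dev-1", "storage-dev-2"]
owner = home_node(0x1000, nodes)
```

Because the mapping is deterministic, every front-end network card that sees the same logical address resolves it to the same home node.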
20. The method of claim 16, wherein the front-end network card of the first storage device and the switching network card of the first storage device are deployed together.
21. A network card, characterized in that the network card comprises a first protocol acceleration module, an index acceleration module, and a read-write module;
the first protocol acceleration module is configured to parse a data write command, wherein the data write command is used to request writing of target data, and to obtain a logical address of the target data from the data write command;
the read-write module is used for writing the target data into a storage device;
the index acceleration module is configured to generate metadata, and record a corresponding relationship between a logical address of the target data and the metadata, where the metadata is used to describe a physical address of the target data stored in the storage device.
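The division of the write path across the three modules of claim 21 can be walked through in a toy sketch. The module split follows the claim; every function body, name, and data structure below is an illustrative assumption, not the patent's design.

```python
# Toy walk-through of claim 21's write path across the three network card
# modules: protocol parsing, media write, and index/metadata recording.

class FirstProtocolAccel:
    def parse_write_command(self, cmd: dict):
        # Parse the data write command; extract the target data and its
        # logical address (dict fields are assumed for illustration).
        return cmd["data"], cmd["logical_address"]

class ReadWriteModule:
    def __init__(self):
        self.media = {}   # physical_address -> bytes, stands in for storage
        self.next_pa = 0

    def write(self, data: bytes) -> int:
        pa = self.next_pa
        self.media[pa] = data          # write target data to the device
        self.next_pa += len(data)
        return pa                      # physical address used for metadata

class IndexAccel:
    def __init__(self):
        self.index = {}

    def record(self, logical_address: int, physical_address: int) -> dict:
        # Generate metadata and record logical address -> metadata.
        metadata = {"physical_address": physical_address}
        self.index[logical_address] = metadata
        return metadata

proto, rw, index = FirstProtocolAccel(), ReadWriteModule(), IndexAccel()
data, la = proto.parse_write_command({"data": b"target", "logical_address": 0x20})
meta = index.record(la, rw.write(data))
```

The read path of claim 26 runs the same modules in reverse: parse the read command, look up the metadata by logical address, then fetch the data from the recorded physical address.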
22. The network card of claim 21, wherein the network card further comprises a second protocol acceleration module;
the second protocol acceleration module is configured to parse a first packet from a client device to obtain the data write command, wherein the data write command is based on the RDMA protocol, and the first packet is based on InfiniBand (IB), RDMA over Converged Ethernet (RoCE), or the internet wide area RDMA protocol (iWARP).
23. The network card of claim 21 or 22, wherein the index acceleration module is specifically configured to:
creating, in index information in a memory of the storage device, a correspondence between a key and a value, wherein the value is the metadata or the address of the metadata, and the key is determined according to the logical address of the target data.
24. The network card of any one of claims 21-23, wherein the read-write module is further configured to:
sending a mirror data write command to a mirror device, wherein the mirror data write command is used to request writing of a copy of the target data, and the mirror data write command includes the copy of the target data and the logical address of the target data.
25. The network card of claim 24, wherein the storage device comprises at least two switching network cards connected to the mirror device, and the read-write module is specifically configured to:
selecting a switching network card from the at least two switching network cards based on a load balancing strategy;
and sending the mirror data write command to a network card of the mirror device through the selected switching network card.
26. The network card of any one of claims 21-25,
the first protocol acceleration module is further configured to parse a data read command, wherein the data read command is used to request reading of the target data, and to obtain a logical address of the target data from the data read command;
the index acceleration module is further configured to obtain the metadata according to a logical address of the target data, where the metadata is used to describe a physical address of the target data stored in the storage device;
the read-write module is further configured to obtain the target data from the storage device according to the metadata.
27. The network card of claim 26, wherein the index acceleration module is further configured to:
and determining that the home node of the target data is the storage device according to the logical address of the target data.
28. The network card of claim 26 or 27,
the second protocol acceleration module is further configured to parse a second packet from the client device to obtain the data read command, wherein the data read command is based on the RDMA protocol, and the second packet is based on InfiniBand (IB), RDMA over Converged Ethernet (RoCE), or the internet wide area RDMA protocol (iWARP).
29. The network card according to any one of claims 21 to 28, wherein the first protocol acceleration module is implemented by any one of, or a combination of, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), an artificial intelligence (AI) chip, a system on chip (SoC), a complex programmable logic device (CPLD), and a graphics processing unit (GPU); the index acceleration module is implemented by any one of, or a combination of, an ASIC, an FPGA, an AI chip, an SoC, a CPLD, and a GPU; and the read-write module is implemented by any one of, or a combination of, an ASIC, an FPGA, an AI chip, an SoC, a CPLD, and a GPU.
30. A network card, characterized in that the network card comprises a processor and an interface;
the interface is used for data transmission;
the processor is configured to perform the data access method of any one of claims 11 to 20.
31. A storage device comprising a network card for performing the data access method of any one of claims 11 to 20.
32. The storage device of claim 31, wherein the storage device further comprises a memory for storing index information indicating a correspondence between a logical address of the target data and the metadata.
CN202110697375.8A 2021-04-14 2021-06-23 Data access system, method, equipment and network card Pending CN115270033A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP22787387.4A EP4318251A1 (en) 2021-04-14 2022-03-31 Data access system and method, and device and network card
PCT/CN2022/084322 WO2022218160A1 (en) 2021-04-14 2022-03-31 Data access system and method, and device and network card
US18/485,942 US20240039995A1 (en) 2021-04-14 2023-10-12 Data access system and method, device, and network adapter

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110399947 2021-04-14
CN2021103999474 2021-04-14

Publications (1)

Publication Number Publication Date
CN115270033A true CN115270033A (en) 2022-11-01

Family

ID=83744983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110697375.8A Pending CN115270033A (en) 2021-04-14 2021-06-23 Data access system, method, equipment and network card

Country Status (1)

Country Link
CN (1) CN115270033A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116107516A (en) * 2023-04-10 2023-05-12 苏州浪潮智能科技有限公司 Data writing method and device, solid state disk, electronic equipment and storage medium
CN116107516B (en) * 2023-04-10 2023-07-11 苏州浪潮智能科技有限公司 Data writing method and device, solid state disk, electronic equipment and storage medium
CN116886719A (en) * 2023-09-05 2023-10-13 苏州浪潮智能科技有限公司 Data processing method and device of storage system, equipment and medium
CN116886719B (en) * 2023-09-05 2024-01-23 苏州浪潮智能科技有限公司 Data processing method and device of storage system, equipment and medium
CN117193669A (en) * 2023-11-06 2023-12-08 格创通信(浙江)有限公司 Discrete storage method, device and equipment for message descriptors and storage medium
CN117193669B (en) * 2023-11-06 2024-02-06 格创通信(浙江)有限公司 Discrete storage method, device and equipment for message descriptors and storage medium

Similar Documents

Publication Publication Date Title
EP4318251A1 (en) Data access system and method, and device and network card
CN115270033A (en) Data access system, method, equipment and network card
US11403227B2 (en) Data storage method and apparatus, and server
US9917884B2 (en) File transmission method, apparatus, and distributed cluster file system
CN114201421B (en) Data stream processing method, storage control node and readable storage medium
US20150127649A1 (en) Efficient implementations for mapreduce systems
US8725879B2 (en) Network interface device
JP6724252B2 (en) Data processing method, storage system and switching device
CN110908600B (en) Data access method and device and first computing equipment
CN109564502B (en) Processing method and device applied to access request in storage device
WO2020199760A1 (en) Data storage method, memory and server
CN113014662A (en) Data processing method and storage system based on NVMe-oF protocol
CN113179327A (en) High-concurrency protocol stack unloading method, equipment and medium based on high-capacity memory
CN107920101A (en) A kind of file access method, device, system and electronic equipment
CN115202573A (en) Data storage system and method
US20240126847A1 (en) Authentication method and apparatus, and storage system
US10545667B1 (en) Dynamic data partitioning for stateless request routing
CN110798366B (en) Task logic processing method, device and equipment
CN113411363A (en) Uploading method of image file, related equipment and computer storage medium
WO2023000770A1 (en) Method and apparatus for processing access request, and storage device and storage medium
WO2022218218A1 (en) Method and apparatus for processing data, reduction server, and mapping server
CN106790521B (en) System and method for distributed networking by using node equipment based on FTP
CN107615259A (en) A kind of data processing method and system
WO2024041140A1 (en) Data processing method, accelerator, and computing device
WO2024066904A1 (en) Container creation method, system, and node

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination