WO2019000423A1 - Data storage method and device - Google Patents

Data storage method and device

Info

Publication number
WO2019000423A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
write request
memory
data object
Prior art date
Application number
PCT/CN2017/091240
Other languages
French (fr)
Chinese (zh)
Inventor
于群
徐君
陈荔城
黄永兵
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201780087416.3A priority Critical patent/CN110337633A/en
Priority to PCT/CN2017/091240 priority patent/WO2019000423A1/en
Publication of WO2019000423A1 publication Critical patent/WO2019000423A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage method and device, used to reduce the time required by a distributed storage system to respond to a client with respect to a write request. In the method, after a network adapter in a first node determines that the first node and at least one secondary node intending to store first data have both buffered a first write request comprising the first data, a response message is returned to a client informing of the success of the buffering operation. The method not only decreases the time required by the distributed storage system to respond to the client, but also reduces the time consumed by the node to store the first data, thereby maximally reducing the time required by the distributed storage system to respond to the client with respect to the first write request.

Description

Data storage method and device
Technical field
The present application relates to the field of storage technologies, and in particular to a data storage method and device.
Background
Current distributed storage systems include the Hadoop Distributed File System (HDFS), Ceph, and the like. To ensure data reliability, a distributed storage system usually uses a replica mechanism: the data is saved to multiple storage nodes in the distributed storage system. The multiple storage nodes include one primary node and at least one secondary node.
When a client stores data in the distributed storage system, each storage node that is to store the data (for example, the primary node or a secondary node) must, after receiving the client's write request, first store the write request in the memory of that storage node and then save the data in the write request to the disk of that storage node. After determining that the primary node and every secondary node have successfully saved the data to disk, the primary node returns a success response message to the client, notifying the client that the primary node and the at least one secondary node have all successfully stored the data.
Whenever a storage node persists the data in the write request, it must first transfer the write request to memory and then store the data from the write request in memory to the local disk. Saving the data to disk therefore takes a long time. Moreover, the primary node or a secondary node can send the write request to another secondary node only after it has successfully stored the data in the write request. As a result, the distributed storage system takes a long time to respond to the client after receiving a write request.
Summary of the invention
The embodiments of the present application provide a data storage method and device to reduce the time a distributed storage system takes to respond to a client's write request.
In a first aspect, an embodiment of the present application provides a data storage method applied to a distributed storage system. The distributed storage system includes multiple storage nodes; a first node among the multiple storage nodes includes a network card; the network card contains a first memory; and the first memory is a non-volatile memory. The method includes the following steps:
After caching a received first write request in the first memory, the network card sends the first write request to at least one secondary node that is to store first data, so that the at least one secondary node caches the first write request. The first write request carries the first data to be stored, and the first node is the primary node that is to store the first data. After receiving a first cache success message returned by the at least one secondary node, the network card returns a success response message to the client. The first cache success message indicates that the at least one secondary node has successfully cached the first write request, and the success response message indicates that the first data has been stored in the primary node that is to store the first data and in the at least one secondary node.
With the above method, the network card in the first node returns the success response message to the client as soon as it determines that the first node and the at least one secondary node that is to store the first data have all successfully cached the first write request containing the first data. The method therefore removes from the distributed storage system's response time both the time the first node spends saving the first data and the time the secondary nodes spend saving the first data, which minimizes the time the distributed storage system takes to respond to the client's first write request.
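The message flow of the first aspect can be sketched as follows. This is a minimal in-process model, not the patented implementation: the class `Nic`, its methods, and the string return values are hypothetical names chosen for illustration, and a Python list stands in for the non-volatile first memory.

```python
from dataclasses import dataclass, field


@dataclass
class WriteRequest:
    object_id: str   # identifier of the data object the data belongs to
    data: bytes      # the data to be stored


@dataclass
class Nic:
    """Hypothetical model of a network card with a non-volatile first memory."""
    name: str
    first_memory: list = field(default_factory=list)  # stands in for the NVM cache

    def cache(self, request: WriteRequest) -> bool:
        # Caching in the first memory is the only storage step on the critical path.
        self.first_memory.append(request)
        return True

    def handle_as_secondary(self, request: WriteRequest) -> str:
        # A secondary node acknowledges as soon as the request is cached.
        return "cache_success" if self.cache(request) else "cache_failed"

    def handle_as_primary(self, request: WriteRequest, secondaries: list) -> str:
        # 1. cache locally, 2. forward to every secondary node,
        # 3. answer the client once every secondary has reported a successful cache.
        if not self.cache(request):
            return "failure"
        acks = [s.handle_as_secondary(request) for s in secondaries]
        return "success" if all(a == "cache_success" for a in acks) else "failure"


primary = Nic("node-1")
secondaries = [Nic("node-2"), Nic("node-3")]
result = primary.handle_as_primary(WriteRequest("obj-42", b"hello"), secondaries)
print(result)
```

Writing the data into the second memory is deliberately absent from this path; in the described method it happens later and does not delay the response to the client.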
In a possible design, the first write request further contains an identifier of the data object to which the first data belongs. In this case, before sending the first write request to the at least one secondary node that is to store the first data, the network card may perform the following steps to determine that the first node is the primary node that is to store the first data (that is, the primary node storing the data object) and to determine, in the distributed storage system, the at least one secondary node that is to store the first data (that is, the at least one secondary node storing the data object):
The network card applies a configured data distribution algorithm (for example, the CRUSH algorithm) to the identifier of the data object to obtain the multiple storage nodes that store the data object, where the multiple storage nodes that store the data object include the first node;
According to a configured primary node selection rule, the network card determines that the first node is the primary node storing the data object, and determines that the storage nodes other than the first node among the multiple storage nodes storing the data object are the secondary nodes storing the data object.
With this design, the network card can determine that the first node is the primary node storing the data object and can determine the at least one secondary node storing the data object.
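The two steps above (compute the replica set from the object identifier, then split it into primary and secondary nodes) can be illustrated with a short sketch. It does not implement CRUSH; it substitutes a simple rendezvous-style hash ranking as a stand-in data distribution algorithm, and the rule "the first node in the placement is the primary" is an assumed primary node selection rule. The function names are hypothetical.

```python
import hashlib


def place_object(object_id: str, nodes: list[str], replicas: int) -> list[str]:
    """Rank nodes by a per-(object, node) hash and keep the top `replicas`."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{object_id}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=score, reverse=True)[:replicas]


def split_roles(placement: list[str]) -> tuple[str, list[str]]:
    """Assumed selection rule: the first node in the placement is the primary."""
    return placement[0], placement[1:]


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = place_object("obj-42", nodes, replicas=3)
primary, secondaries = split_roles(placement)
print(primary, secondaries)
```

Because the placement depends only on the object identifier and the node list, every network card that runs the same computation arrives at the same primary and secondary nodes without exchanging messages.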
In a possible design, the network card may also determine the multiple storage nodes that store the data object according to data object replica distribution topology information. The topology information contains all nodes in the distributed storage system that store the data object corresponding to the identifier of the data object, together with the connections between these nodes.
This design ensures that the multiple storage nodes determined by the network card to store the data object can communicate with one another, that is, can transmit the first write request.
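As a rough illustration of how topology information could be consulted, the sketch below keeps, for each object identifier, the replica nodes and the links between them, and returns the replica nodes that the primary node can reach. The dictionary layout and the function name `reachable_replicas` are assumptions made for this example; the source does not specify a format.

```python
def reachable_replicas(topology: dict, object_id: str, primary: str) -> list:
    """Return the replica nodes of `object_id` that `primary` has a link to."""
    entry = topology[object_id]
    links = entry["links"]
    return [
        node for node in entry["nodes"]
        if node != primary and ((primary, node) in links or (node, primary) in links)
    ]


# Hypothetical topology information: replica nodes per object plus their links.
topology = {
    "obj-42": {
        "nodes": ["node-1", "node-2", "node-3"],
        "links": {("node-1", "node-2"), ("node-3", "node-1")},
    }
}
targets = reachable_replicas(topology, "obj-42", primary="node-1")
print(targets)
```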
In a possible design, the first node further includes a processor and a second memory, the second memory being a non-volatile memory (for example, the internal memory or external storage of the first node). The method further includes the following step:
The processor reads the first write request from the first memory and, according to the first write request, writes the first data into the data object in the second memory that corresponds to the identifier of the data object.
With this design, the first node can successfully store the first data in the data object.
In a possible design, the processor reads multiple write requests, including the first write request, from the first memory. For example, when the number of unread write requests stored in the first memory reaches a set number, the processor may read that set number of write requests (including the first write request); alternatively, the processor may read the unread write requests (including the first write request) stored in the first memory according to a set read cycle.
This design saves signaling overhead for the processor and reduces the number of times the processor performs read operations for write requests, which lowers the power consumption of the processor.
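The two batching policies (read once a set number of requests is waiting, or read whatever is unread once per cycle) can be sketched as below. The class `FirstMemory` and its method names are hypothetical. The queue tracks only which requests are still unread; in the described method the requests themselves remain in the first memory until a space release message arrives.

```python
from collections import deque


class FirstMemory:
    """Hypothetical queue of unread write requests inside the network card."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.unread = deque()

    def cache(self, request: str) -> None:
        self.unread.append(request)

    def read_batch_if_ready(self) -> list:
        # Threshold policy: hand over requests only once `batch_size` are waiting.
        if len(self.unread) < self.batch_size:
            return []
        return [self.unread.popleft() for _ in range(self.batch_size)]

    def read_all(self) -> list:
        # Periodic policy: called once per read cycle, returns whatever is unread.
        batch = list(self.unread)
        self.unread.clear()
        return batch


memory = FirstMemory(batch_size=3)
memory.cache("w1")
memory.cache("w2")
too_early = memory.read_batch_if_ready()   # fewer than 3 requests are waiting
memory.cache("w3")
full_batch = memory.read_batch_if_ready()  # threshold reached
memory.cache("w4")
periodic = memory.read_all()               # read cycle fires with one request waiting
```

Either policy lets the processor fetch several write requests in one interaction with the network card instead of one interaction per request.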
In a possible design, the first memory stores the first write request according to the identifier of the data object. In this case, after the processor writes the first data into the data object in the second memory according to the first write request, the processor may further send the network card a space release message containing the identifier of the data object, and the network card then deletes the first write request from the first memory according to the identifier of the data object.
In this way, the network card can release the storage space that the first write request occupies in the first memory, so that the storage space in the first memory can be reused.
In a possible design, after reading the first write request, the processor notifies the network card to add a corresponding read flag to the first write request. When deleting the first write request from the first memory according to the identifier of the data object, the network card deletes only the first write request that contains the identifier of the data object and has a corresponding read flag.
In a possible design, after receiving the first write request, the network card determines a receive timestamp for the first write request and stores the first write request together with the corresponding receive timestamp in the first memory, where the receive timestamp contains the time at which the network card received the first write request. When reading the first write request, the processor also reads the receive timestamp and adds the receive timestamp to the space release message. The network card deletes the first write request that contains the identifier of the data object and corresponds to the receive timestamp.
With the above two designs, the network card can accurately delete the first write request that the processor has already read, without mistakenly deleting other write requests that contain the identifier of the data object but have not yet been read by the processor. This ensures that the first node can subsequently store the data in those other write requests in the second memory.
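The difference between releasing by identifier alone and releasing by identifier plus read flag or receive timestamp can be seen in a small sketch. The record type `CachedRequest` and both release functions are hypothetical; integer timestamps are used only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class CachedRequest:
    object_id: str
    received_at: int      # receive timestamp assigned by the network card
    data: bytes
    read: bool = False    # read flag, set once the processor has read the request


def release_by_read_flag(cache: list, object_id: str) -> list:
    """Keep everything except requests for `object_id` that carry the read flag."""
    return [r for r in cache if not (r.object_id == object_id and r.read)]


def release_by_timestamp(cache: list, object_id: str, received_at: int) -> list:
    """Keep everything except the request matching both identifier and timestamp."""
    return [r for r in cache
            if not (r.object_id == object_id and r.received_at == received_at)]


cache = [
    CachedRequest("obj-42", 100, b"old", read=True),  # already written to the second memory
    CachedRequest("obj-42", 105, b"new"),             # same object, not yet read
    CachedRequest("obj-7", 101, b"other", read=True),
]
after_flag = release_by_read_flag(cache, "obj-42")
after_timestamp = release_by_timestamp(cache, "obj-42", 100)
```

In both variants the unread request for `obj-42` received at time 105 survives, whereas deleting by identifier alone would have removed it before its data reached the second memory.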
In a possible design, when the first node serves as a secondary node that is to store second data, the method further includes:
After caching a received second write request in the first memory, the network card sends a second cache success message to a second node among the multiple storage nodes. The second write request carries the second data to be stored, the second cache success message indicates that the first node has successfully cached the second write request, and the second node is the primary node that is to store the second data.
With this design, the network card in the first node returns the second cache success message to the second node as soon as it determines that the first node has successfully cached the second write request containing the second data. The design therefore removes from the distributed storage system's response time the time the first node spends saving the second data, which minimizes the time the distributed storage system takes to respond to the client's second write request.
In a second aspect, an embodiment of the present application further provides a storage node that has the function of implementing the behavior of the first node in the above method examples. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In a possible design, the structure of the storage node includes a communication unit (which includes a first storage unit), a processing unit, and a second storage unit. These units can perform the corresponding functions in the above method examples; for details, see the detailed description in the method examples, which is not repeated here.
In a third aspect, an embodiment of the present application further provides a storage device. The storage device is a storage node in a distributed storage system that includes multiple storage nodes. The storage device has the function of implementing the behavior of the first node in the above method examples. The storage device includes an interface card, a processor, and at least one memory. The interface card includes a first memory for caching write requests. The at least one memory includes a second memory for storing data. The interface card and the processor are configured to support the storage device in performing the corresponding functions in the above method. The first memory and the second memory are non-volatile memories.
In a fourth aspect, an embodiment of the present application further provides a network card that has the function of implementing the behavior of the network card in the first node in the above method examples. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In a possible design, the structure of the network card includes a receiving unit, a sending unit, and a first memory, the first memory being a non-volatile memory. These units can perform the corresponding functions in the above method examples; for details, see the detailed description in the method examples, which is not repeated here.
In a possible design, the structure of the network card includes a communication interface, a processor, and at least one memory. The at least one memory includes a first memory, the first memory being a non-volatile memory. The processor is configured to support the network card in performing the corresponding functions in the above method. The at least one memory is coupled to the processor and stores the program instructions (or application programs) that the processor needs in order to save write requests.
In a fifth aspect, an embodiment of the present application further provides a distributed storage system that includes multiple storage nodes, each storage node having the function of implementing the behavior of the first node in the above method examples.
In a sixth aspect, an embodiment of the present application further provides a computer storage medium storing a software program that, when read and executed by one or more processors, implements the method described in the first aspect.
In a seventh aspect, an embodiment of the present application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect.
In the embodiments of the present application, the network card in the first node returns the success response message to the client as soon as it determines that the first node and the at least one secondary node that is to store the first data have all successfully cached the first write request containing the first data. The method therefore removes from the distributed storage system's response time both the time the first node spends saving the first data and the time the secondary nodes spend saving the first data, which minimizes the time the distributed storage system takes to respond to the client's first write request.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of a write request according to an embodiment of the present application;
FIG. 2 is a schematic architectural diagram of a distributed storage system according to an embodiment of the present application;
FIG. 3 is a flowchart of a data storage method according to an embodiment of the present application;
FIG. 4 is a flowchart of a data storage example according to an embodiment of the present application;
FIG. 5 is a structural diagram of a storage node according to an embodiment of the present application;
FIG. 6 is a structural diagram of a storage device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a distributed storage system according to an embodiment of the present application.
Detailed description
The embodiments of the present application provide a data storage method and device to reduce the time a distributed storage system takes to respond to a client's write request. The method and the device are based on the same inventive concept. Because they solve the problem on similar principles, the implementations of the device and the method may refer to each other, and repeated descriptions are omitted.
Some terms used in the present application are explained below to aid understanding by those skilled in the art.
1) The distributed storage system involved in the embodiments of the present application contains multiple independent storage nodes. The distributed storage system spreads data across the multiple independent storage nodes, which improves the reliability and security of the data. Optionally, the distributed storage system may be HDFS, Ceph, or the like; the present application does not limit this.
2) The storage node involved in the embodiments of the present application is a device used to store data in the distributed storage system. The device may be a server or another device that has analysis and processing functions and a data storage function.
In a distributed storage system that stores data using a replica mechanism, each piece of data has multiple replicas distributed across multiple storage nodes. The replica stored on the primary node is the primary replica, and a replica stored on a secondary node is a secondary replica.
The primary node that is to store data stores the primary replica of the data. It has a communication connection with every secondary node that is to store the data and with the client that initiates the write request containing the data. The client can distribute the data through the primary node to all secondary nodes that are to store the data. That is, after receiving the write request containing the data, the primary node must save the data locally and must also transmit the write request to the secondary nodes that are to store the data.
A secondary node that is to store data stores a secondary replica of the data and has a communication connection with the primary node that is to store the data. After receiving the write request containing the data from the primary node, the secondary node saves the data locally.
When the storage nodes in the distributed storage system use the data object as the basic unit for storing data, the primary node that is to store data is the primary node storing the data object to which the data belongs; correspondingly, a secondary node that is to store the data is a secondary node storing the data object to which the data belongs.
It should be noted that the primary and secondary roles are defined per piece of data (or per data object). For example, when a storage node in the distributed storage system stores multiple pieces of data, the storage node may be the primary node for some of that data and a secondary node for the rest. Likewise, when a storage node in the distributed storage system stores multiple data objects, the storage node may be the primary node for some of the data objects and a secondary node for the others.
3) The data object involved in the embodiments of the present application can serve as the basic unit in which a storage node in the distributed storage system stores data. As the basic data storage management unit in object storage, a data object is in fact a combination of data and a set of attribute information (metadata), where the attribute information may include redundant array of inexpensive disks (RAID) parameters, data distribution, quality of service parameters, and similar information. A data object can therefore not only store data but also organize and manage the distribution structure of the data.
Each data object corresponds to a unique identifier. A storage node or client can therefore locate the corresponding data object by its identifier, without using the physical address of the data.
4) The client involved in the embodiments of the present application is a program that provides services to a user and is the counterpart of a storage node (server) in the distributed storage system. A communication connection must be established between the client and the storage node so that the program can run normally and serve the user. Optionally, the client may be installed on an ordinary client machine.
As the initiator of a write request, the client can generate a write request containing the data to be stored according to a data instruction from a user or a program, and send the write request to a storage node in the distributed storage system, so that the data contained in the write request is stored in the distributed storage system.
5) The first memory involved in the embodiments of the present application is a non-volatile memory (NVM) that is located inside the network card and serves as the cache of the network card. The first memory can store write requests persistently and does not lose its contents on power failure. Optionally, the first memory may include phase change memory (PCM), resistive random access memory (RRAM), magnetic random access memory (MRAM), spin-transfer torque magnetic random access memory (STT-MRAM), flash memory, and the like.
6) The second memory involved in the embodiments of the present application is also a non-volatile memory (NVM) and may be the internal memory or external storage of the storage node. The second memory can store data persistently and does not lose its contents on power failure. Optionally, the second memory may be PCM, RRAM, MRAM, STT-MRAM, flash memory, a magnetic disk, a floppy disk, a hard disk, a magneto-optical disk, an optical disc, or the like. Optionally, the second memory may use the data object as the basic unit for storing data.
7) The network card (network interface card, NIC) in a storage node involved in the embodiments of the present application is used to communicate with other devices, including other storage nodes and clients. For example, it receives write requests sent by a client or by other storage nodes, and sends write requests or other information to other storage nodes. The network card contains a first memory; after receiving a write request, the network card can store the write request in the first memory.
8) The write request involved in the present application is initiated by a client toward the distributed storage system and instructs the storage node that receives it to store the data to be stored that the write request contains.
Optionally, when the storage nodes in the distributed storage system use the data object as the basic unit for storing data, the write request must also include the identifier of the data object to which the data belongs. The storage node can then determine the data object corresponding to the identifier and store the data in the write request in that data object.
Optionally, when the write request contains the identifier of a data object, it may also contain an offset address within the data object and a data length. The offset address is the offset, relative to the start address of the data object, of the start address at which the storage node stores the data within the data object; the data length is the length of the data.
Optionally, because the write request is transmitted to the distributed storage system over a network, the write request also contains the network protocol stack layers required for network transmission: a transport layer, a network layer, and a data link layer. For example, the write request shown in FIG. 1 contains data, a data semantic layer, a transport layer, a network layer, and a data link layer. The data semantic layer may contain the data object identifier, and may also contain the offset address within the data object, the data length, and so on.
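The fields of the data semantic layer described above can be illustrated with a minimal Python sketch. The byte layout (an 8-byte object identifier, an 8-byte offset, and a 4-byte length, big-endian) and the names `WriteRequest` and `apply_write` are assumptions made for illustration only; the application does not specify an encoding.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout, not taken from the application:
# 8-byte object ID, 8-byte offset within the object, 4-byte data length.
_HEADER = struct.Struct(">QQI")


@dataclass(frozen=True)
class WriteRequest:
    object_id: int   # identifier of the data object the data belongs to
    offset: int      # offset of the data relative to the start of the object
    data: bytes      # payload; its length is carried in the header

    def encode(self) -> bytes:
        """Serialize the data semantic layer followed by the data."""
        return _HEADER.pack(self.object_id, self.offset, len(self.data)) + self.data

    @classmethod
    def decode(cls, raw: bytes) -> "WriteRequest":
        """Parse the data semantic layer and slice out the data."""
        object_id, offset, length = _HEADER.unpack_from(raw)
        data = raw[_HEADER.size:_HEADER.size + length]
        if len(data) != length:
            raise ValueError("write request is truncated")
        return cls(object_id, offset, data)


def apply_write(obj: bytearray, req: WriteRequest) -> None:
    """Store the data at the requested offset, growing the object if needed."""
    end = req.offset + len(req.data)
    if end > len(obj):
        obj.extend(b"\x00" * (end - len(obj)))
    obj[req.offset:end] = req.data
```

In this sketch the transport, network, and data link layers are left to the operating system's network stack; only the data semantic layer and the data are modelled.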
9) A cache success message in the embodiments of this application indicates that a slave node has successfully cached the write request. After the network card in a slave node that is to store the data caches the received write request in the network card's first memory, that network card sends the cache success message to the master node that is to store the data.
10) A success response message in the embodiments of this application is sent by the master node that is to store the data to the client that initiated the write request, after the master node determines that it and all slave nodes that are to store the data have successfully cached the write request. The network card in the master node therefore sends the success response message to the initiating client after receiving the cache success messages from all slave nodes.
Because the success response message is sent only after the master node and all slave nodes have successfully cached the write request, the network cards of the master node and the slave nodes have already cached the write request in their first memories before the master node sends it. Moreover, because the first memory in the network card is non-volatile, the master node and the slave nodes can later save the data in the write request to their own second memories based on the cached write request, which contains the data. The success response message can therefore also indicate that the data in the write request has been stored in the master node and in all slave nodes that are to store the data.
11) "A plurality of" in the embodiments of this application means two or more.
In addition, it should be understood that in the description of this application, terms such as "first" and "second" are used only to distinguish between the items described; they neither indicate or imply relative importance nor indicate or imply an order.
The embodiments of this application are described in further detail below with reference to the accompanying drawings.
FIG. 2 shows a distributed storage system to which the data storage method provided by the embodiments of this application is applicable. As shown in FIG. 2, the distributed storage system includes a plurality of storage nodes 201 (storage node a201, storage node b201, storage node c201, and storage node n201 in the figure). In this system:
Each storage node 201 in the distributed storage system has a communication connection with at least one other storage node 201, so that the storage nodes 201 can communicate with one another. As shown in the figure, storage node a201 has communication connections with storage node b201 and with storage node c201, and storage node c201 has a communication connection with storage node n201.
In addition, at least one storage node 201 in the distributed storage system is connected to a client 202, so that the storage node 201 and the client 202 can communicate. As shown in the figure, storage node a201 has a communication connection with the client 202.
When the distributed storage system stores the data in a write request:
After the master node that is to store the data, among the plurality of storage nodes 201, receives the write request, it may send the write request to each of at least one slave node that is to store the data, among the plurality of storage nodes 201, as soon as the master node's network card has successfully cached the write request (that is, before the data has been successfully stored to the local disk); the master node successfully stores the data after sending the write request to the at least one slave node;
After receiving the write request, each slave node may return a cache success message to the master node once the slave node's network card has successfully cached the write request (that is, before the data has been successfully stored to the local disk), to notify the master node that the slave node has successfully cached the write request;
After receiving the cache success messages from all slave nodes, the master node returns a success response message to the client 202, to notify the client 202 that the master node and the at least one slave node have stored the data.
In this process, the master node sends the write request to the slave nodes before it has successfully stored the data, rather than first saving the data to disk and then sending the write request to the slave nodes as in a conventional data storage method. The time the master node takes to save the data to disk is therefore removed from the time the distributed storage system takes to respond to the client 202. Likewise, each slave node returns a cache success message to the master node before it has successfully stored the data, rather than returning it after saving the data to disk as in a conventional method, so the time the slave node takes to save the data to disk is also removed from the response time. The distributed storage system thus removes from its response time both the master node's and the slave nodes' disk-save time, which minimizes the time it takes to respond to the client's write request.
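The ordering described above (cache, forward, acknowledge, and only later persist) can be sketched as a small in-process Python simulation. The class `Node`, the function `handle_write`, and the use of a list and a dict to stand in for the first and second memories are assumptions for illustration; a real system would exchange these messages over the network between network cards.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Stands in for the non-volatile first memory inside the network card.
    nic_cache: list = field(default_factory=list)
    # Stands in for the second memory (for example, the local disk).
    disk: dict = field(default_factory=dict)

    def cache(self, request):
        """Cache the write request in the NIC and report success."""
        self.nic_cache.append(request)
        return ("cache_ok", self.name)

    def flush(self):
        """Later, move the cached data from the NIC cache to the disk."""
        while self.nic_cache:
            object_id, data = self.nic_cache.pop(0)
            self.disk[object_id] = data


def handle_write(master, slaves, request):
    """Master-side flow: respond once every node has cached the request."""
    master.cache(request)                               # cache locally first
    acks = [slave.cache(request) for slave in slaves]   # forward before persisting
    if all(kind == "cache_ok" for kind, _ in acks):
        return "success"                                # sent before any disk write
    return "failure"
```

Calling `handle_write` returns the success response while every `disk` is still empty; the data reaches the disks only when `flush` runs afterwards, which is the latency saving the paragraph describes.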
An embodiment of this application provides a data storage method applied to a distributed storage system, such as the one shown in FIG. 2. The distributed storage system contains a plurality of storage nodes, and the first node in this embodiment is one of them. The storage node includes a network card, and the network card contains a first memory, which is a non-volatile memory. As shown in FIG. 3, the method proceeds as follows:
S301: The network card receives a first write request, which carries first data to be stored.
Optionally, the network card may receive the first write request from the client that initiated it or from another device in the distributed storage system.
As described above for write requests, when the storage nodes in the distributed storage system use a data object as the basic unit for storing data, the first write request also includes the identifier of the data object to which the first data belongs. The first write request may additionally include information such as the offset address within the data object and the data length.
When the first write request is transmitted to the distributed storage system over a network, its structure is as shown in FIG. 1 and is not described again here.
S302: The network card caches the first write request in the first memory.
In S302, because the first memory is located inside the network card, the network card takes little time to save the first write request to the first memory. In addition, because the first memory is a non-volatile memory, S302 allows the first node to cache the first write request quickly and persistently.
S303: The network card sends the first write request to at least one slave node that is to store the first data, where the first node is the master node that is to store the first data.
Through S303, the first node distributes the first write request to the at least one slave node, so that the at least one slave node can also cache the first write request.
Optionally, when the first write request also contains the identifier of the data object, before performing S303 the network card may use the following steps to determine that the first node is the master node that is to store the first data (that is, the master node storing the data object) and to determine, in the distributed storage system, the at least one slave node that is to store the first data (that is, the at least one slave node storing the data object):
Step a: The network card applies a configured data distribution algorithm to the identifier of the data object to obtain the plurality of storage nodes that store the data object, where these storage nodes include the first node;
Step b: The network card determines, according to a configured master node selection rule, that the first node is the master node storing the data object, and determines that the other storage nodes among those storing the data object are the slave nodes storing the data object.
Optionally, before step a, the network card may read the first write request from the first memory or from another cache of the network card and obtain the identifier of the data object contained in the first write request.
To ensure that every storage node determined in step a actually holds the data object corresponding to the identifier, when the first node also holds data object replica distribution topology information, the network card may optionally, in step a, apply the configured data distribution algorithm to both the identifier of the data object and the topology information to obtain the plurality of storage nodes that store the data object. The topology information covers all nodes in the distributed storage system that store the data object corresponding to the identifier, and the connections among those nodes.
Because the network card can also use the topology information to determine the plurality of storage nodes that store the data object, it can ensure not only that each of these storage nodes holds the data object corresponding to the identifier, but also that these storage nodes can communicate with one another, that is, that the first write request can be transmitted among them.
Optionally, in step a, the data distribution algorithm may be the Controlled Replication Under Scalable Hashing (CRUSH) algorithm.
Step a lets the network card determine the plurality of storage nodes that store the data object but not which of them is the master node; the network card therefore uses step b to determine the master node and the at least one slave node.
Optionally, the master node selection rule may be, but is not limited to, one of the following: select the storage node with the largest storage node identifier as the master node; select the storage node with the smallest storage node identifier as the master node; select a storage node whose identifier falls within a specified range as the master node; or select a storage node located within a specified physical location range or a specified communication location range as the master node.
Note that after determining the first data to be stored and before sending the first write request to the distributed storage system, the client also needs to determine, among the plurality of storage nodes in the distributed storage system, the master node that is to store the first data (that is, the first node), and to send the first write request to that master node. To ensure that the master node determined by the client and the master node determined by the first node are the same storage node, the client must determine the master node using the same method as the network card.
For example, when the first node uses steps a and b to determine that it is the master node, the client may also use steps a and b to determine that the first node is the master node, and the first node and the client need to use the same data distribution algorithm and the same master node selection rule.
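Steps a and b, and the requirement that the client and the network card reach the same result, can be sketched in Python as follows. The placement function below is a simple highest-score hash ranking used as a stand-in; it is not the CRUSH algorithm and ignores topology information. The names `place` and `select_master`, the node names, and the choice of the "largest identifier" rule are assumptions for illustration.

```python
import hashlib


def place(object_id, nodes, replicas):
    """Step a (stand-in): deterministically pick the nodes storing an object.

    Each node is scored by hashing the object ID together with the node ID,
    and the highest-scoring nodes are chosen. This is NOT CRUSH; it only
    shares the property that any party with the same inputs gets the same
    answer.
    """
    def score(node):
        digest = hashlib.sha256(f"{object_id}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replicas]


def select_master(replica_nodes):
    """Step b: apply the 'largest storage node identifier' rule."""
    master = max(replica_nodes)
    slaves = [node for node in replica_nodes if node != master]
    return master, slaves
```

Because both functions depend only on the object identifier, the node list, and the configured rule, a client and a network card that run them independently select the same master node, which is the consistency requirement stated above.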
S304: The network card receives a first cache success message returned by the at least one slave node, where the first cache success message indicates that the at least one slave node has successfully cached the first write request.
Because a first cache success message is sent to the first node by a slave node's network card after that network card has cached the received first write request in its own first memory, the network card of the first node can determine, upon receiving the first cache success message from any slave node, that this slave node has successfully cached the first write request.
S305: The network card returns a success response message to the client, where the success response message indicates that the first data has been stored in the master node that is to store the first data and in the at least one slave node.
Optionally, in this embodiment, the first node further includes a processor and a second memory, where the second memory is a non-volatile memory. Optionally, the second memory may be internal memory or external storage (for example, a magnetic disk, hard disk, floppy disk, or optical disc) in the first node.
Optionally, after S302, the first node may also store the first data in the first write request into the second memory through the following steps:
Step A: The processor reads the first write request from the first memory;
Step B: The processor, according to the first write request, writes the first data into the data object in the second memory that corresponds to the identifier of the data object.
Optionally, the processor's flow for storing the first data is independent of the network card's flow for S303 to S305. The two flows may therefore start at the same time, or the processor may store the first data after the network card has completed S303 to S305. This application does not limit this.
In this way, the network card can distribute the first write request to the at least one slave node before the processor stores the first data in the second memory, rather than distributing it only after the master node has saved the first data to the second memory as in a conventional data storage method. The time the first node takes to save the first data is therefore removed from the time the distributed storage system takes to respond to the client, which shortens the time the distributed storage system takes to respond to the client's write request.
Optionally, the processor may read the first write request from the first memory as follows:
The processor reads a plurality of write requests, including the first write request, from the first memory.
For example, when the number of unread write requests stored in the first memory reaches a set number, the processor may read that set number of write requests (including the first write request); alternatively, the processor may read the unread write requests (including the first write request) stored in the first memory according to a set read period. This application does not limit this.
Reading unread write requests in this way reduces the processor's signaling overhead and the number of read operations the processor performs, and can therefore reduce the processor's power consumption.
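The two batching policies just described (read when a set number of unread requests has accumulated, or read whatever is unread at each period) can be sketched in Python as follows. The class `NicCache` and its method names are assumptions for illustration. The sketch tracks only which requests are still unread; as described in the text, the requests themselves remain in the first memory until they are released.

```python
from collections import deque


class NicCache:
    """Tracks the unread write requests held in the NIC's first memory."""

    def __init__(self, batch_size):
        self.batch_size = batch_size   # the "set number" of unread requests
        self.unread = deque()

    def store(self, request):
        """Called by the network card when it caches a write request."""
        self.unread.append(request)

    def read_batch_if_ready(self):
        """Threshold policy: return a full batch, or nothing if not ready."""
        if len(self.unread) < self.batch_size:
            return []
        return [self.unread.popleft() for _ in range(self.batch_size)]

    def read_all(self):
        """Periodic policy: return everything unread at this tick."""
        batch = list(self.unread)
        self.unread.clear()
        return batch
```

A processor using the threshold policy would call `read_batch_if_ready` when notified, while one using the periodic policy would call `read_all` from a timer; either way a single read returns several write requests.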
Optionally, the first write request may contain not only the identifier of the data object but also information such as the offset address within the data object and the data length. In that case, when performing step B, the processor can use this information to store the first data at the correct position within the data object.
Because the first memory is a non-volatile storage medium, the first write request remains cached in the first memory after the processor has read it. To allow the storage space in the first memory to be reused, the processor may optionally notify the network card to delete the first write request after writing the first data into the data object in the second memory.
Therefore, optionally, after the processor writes the first data into the data object in the second memory, the method further includes:
The processor sends a space release message to the network card, where the space release message contains the identifier of the data object;
The network card deletes the first write request from the first memory according to the identifier of the data object.
In this way, the network card can release the storage space that the first write request occupies in the first memory.
Optionally, the network card may obtain the space release message in either of the following ways:
The processor sends the space release message to the network card; or
The processor writes the space release message into the internal memory of the first node, and the network card reads the space release message from that memory according to a set space-release-message read period.
Optionally, when performing S302, the network card may store the first write request in the first memory according to the identifier of the data object in the first write request; for example, the network card stores write requests that contain the same data object identifier in the same storage space in the first memory. When deleting the first write request, the network card can then locate the storage space holding the first write request by the identifier of the data object it contains.
Optionally, to prevent the network card, when deleting the first write request from the first memory according to the identifier of the data object, from mistakenly deleting write requests that occupy the same storage space, contain the same data object identifier, and have not yet been read by the processor, the first node may delete the first write request saved in the first memory through, but not limited to, the following possible implementations:
First possible implementation: after reading the first write request, the processor notifies the network card to add a corresponding read flag to the first write request; when deleting the first write request from the first memory according to the identifier of the data object, the network card deletes only the first write request, which contains the identifier of the data object and has the read flag.
Second possible implementation: after receiving the first write request, the network card determines a receive timestamp for it and stores the first write request together with the receive timestamp in the first memory, where the receive timestamp contains the time at which the network card received the first write request; when reading the first write request, the processor also reads the receive timestamp and adds it to the space release message; the network card then deletes the first write request that contains the identifier of the data object and corresponds to the receive timestamp.
With either implementation, the network card can accurately delete the first write request that the processor has already read without mistakenly deleting other write requests that contain the same data object identifier but have not yet been read by the processor, so that the first node can later store the data in those other write requests into the second memory.
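The second implementation can be sketched in Python as follows. The class `FirstMemory`, its method names, and the assumption that receive timestamps are unique among the write requests for one data object are illustrative only; the application does not say how timestamp collisions would be handled.

```python
from collections import defaultdict


class FirstMemory:
    """Write requests grouped by data object ID and keyed by receive timestamp."""

    def __init__(self):
        # object_id -> {receive_timestamp: write_request}
        self._slots = defaultdict(dict)

    def store(self, object_id, receive_ts, request):
        """Cache a write request together with its receive timestamp."""
        self._slots[object_id][receive_ts] = request

    def release(self, object_id, receive_ts):
        """Handle a space release message naming an object ID and a timestamp.

        Only the matching request is deleted; other requests for the same
        data object that have not been read yet stay cached.
        """
        slot = self._slots.get(object_id)
        if slot is None or receive_ts not in slot:
            return False
        del slot[receive_ts]
        if not slot:
            del self._slots[object_id]
        return True

    def pending(self, object_id):
        """Requests for this object that are still cached."""
        return list(self._slots.get(object_id, {}).values())
```

Keying on the pair of object identifier and timestamp is what prevents a release for an older request from removing a newer, unread request for the same data object.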
With the method of this embodiment, the network card in the first node returns a success response message to the client as soon as it determines that the first node and the at least one slave node that is to store the first data have all successfully cached the first write request containing the first data. The method therefore removes from the distributed storage system's response time both the time the first node takes to save the first data and the time the slave nodes take to save it, which minimizes the time the distributed storage system takes to respond to the client's first write request.
In addition, optionally, in the above embodiment, when the first node serves as a slave node that is to store second data, the method further includes:
The network card receives a second write request, which carries second data to be stored;
The network card caches the second write request in the first memory;
The network card sends a second cache success message to a second node among the plurality of storage nodes, where the second cache success message indicates that the first node has successfully cached the second write request, the first node is one of at least one slave node that is to store the second data, and the second node is the master node that is to store the second data.
For the process by which the processor writes the second data into the second memory, refer to the description of the processor writing the first data into the second memory; details are not repeated here.
Based on the above embodiments, and as shown in FIG. 4, an embodiment of this application further provides an example of the data storage method. The example involves a client and a plurality of storage nodes that store data in a distributed storage system. In this example, the master node that is to store the data is the first node, and the second node is used as an example of a slave node that is to store the data. As shown in the figure, the data storage flow in this example includes:
Step 1: The client determines the data to be stored and the identifier of the data object to which the data belongs.
Step 2: Based on the data object identifier and the data object replica distribution topology information of the distributed storage system, the client applies a configured data distribution algorithm (for example, the CRUSH algorithm) to determine, among the plurality of nodes in the distributed storage system, the plurality of storage nodes that store the data object (including the first node and the second node).
Step 3: According to a configured master node selection rule (for example, selecting the storage node with the largest storage node identifier as the master node), the client selects the first node, from the storage nodes determined to store the data object, as the master node storing the data object.
Step 4: The client sends a write request containing the data to be stored to the first node. The structure of the write request is as shown in FIG. 1:
The data semantic layer of the write request includes the data object identifier, the offset address within the data object, the data length, and similar information. The transport layer, network layer, and data link layer of the write request are the network protocol stack layers required for network transmission. The transport layer includes a port number that identifies the storage software system that processes the write request.
Step 5: Network card a in the first node receives the write request and caches it in first memory a, which is included in network card a.
Step 6: Network card a reads the write request from first memory a or from the cache of network card a, decapsulates the write request, parses the data semantic layer of the write request, and obtains the identifier of the data object contained in the data semantic layer.
Step 7: Network card a obtains the data object replica distribution topology information of the distributed storage system from the shared memory in the first node. Based on the identifier of the data object and that topology information, it uses the set data distribution algorithm (the same algorithm that the client uses in step 2) to determine, among the multiple storage nodes in the distributed storage system, the multiple storage nodes that store the data object (including the first node and the second node).
Step 8: According to the set primary node selection rule (the same rule that the client uses in step 3), network card a selects the first node, from the storage nodes that store the data object, as the primary node that stores the data object, and determines that the storage nodes that store the data object other than the primary node (here, the second node) are secondary nodes that store the data object.
Step 9: Network card a sends the write request to each of the at least one secondary node (including the second node).
Step 10: Network card b in the second node receives the write request and caches it in first memory b, which is included in network card b.
Step 11: Network card b reads the write request from first memory b or from the cache of network card b, decapsulates the write request, parses the data semantic layer of the write request, and obtains the identifier of the data object contained in the data semantic layer.
Step 12: Network card b obtains the data object replica distribution topology information of the distributed storage system from the shared memory in the second node. Based on the identifier of the data object and that topology information, it uses the set data distribution algorithm (the same algorithm that is used in step 2 and step 7) to determine, among the multiple nodes in the distributed storage system, the multiple storage nodes that store the data object (including the first node and the second node).
Step 13: According to the set primary node selection rule (the same rule that is used in step 3 and step 8), network card b selects the first node, from the storage nodes that store the data object, as the primary node that stores the data object, and determines that the storage nodes that store the data object other than the primary node (here, the second node) are secondary nodes that store the data object.
Step 14: Network card b sends a cache success message to the first node (the primary node). The cache success message indicates that the second node has successfully cached the write request.
Step 15: After network card a in the first node receives the cache success message sent by the second node, it sends a success response message to the client. The success response message indicates that the data has been stored in the first node and the second node.
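The message flow of steps 5 to 15 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: the class `NetworkCard`, the functions `primary_receive` and `secondary_receive`, and the dictionary message format are all hypothetical, network transfer is replaced by direct function calls, and a Python list stands in for the non-volatile first memory of each network card.

```python
class NetworkCard:
    """Hypothetical stand-in for a node's network card and its first memory."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.first_memory: list[dict] = []  # cached write requests

    def cache(self, write_request: dict) -> None:
        self.first_memory.append(write_request)


def secondary_receive(nic: NetworkCard, write_request: dict, primary_id: int) -> dict:
    """Steps 10 to 14: cache the request, then report success to the primary."""
    nic.cache(write_request)
    return {"type": "cache_success", "from": nic.node_id, "to": primary_id}


def primary_receive(nic: NetworkCard, secondaries: list[NetworkCard],
                    write_request: dict) -> dict:
    """Steps 5 to 9 and 15: cache, forward, and answer the client."""
    nic.cache(write_request)
    acks = [secondary_receive(s, write_request, nic.node_id) for s in secondaries]
    if not all(ack["type"] == "cache_success" for ack in acks):
        # The source does not describe a failure path; raising is an assumption.
        raise RuntimeError("a secondary node did not confirm caching")
    # The response is sent before any node has persisted the data.
    return {"type": "success", "object_id": write_request["object_id"]}


nic_a, nic_b = NetworkCard(2), NetworkCard(1)  # first node and second node
request = {"object_id": "obj-1", "offset": 0, "length": 4, "data": b"abcd"}
response = primary_receive(nic_a, [nic_b], request)
```

The point of the sketch is the ordering: the success response depends only on the two cache operations, while persisting the data to the second memory happens later and off the response path.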
In this example, after step 9 above, the first node also needs to store the data in the write request into second memory a in the first node through the following steps:
Step (1-a): When network card a determines that the number of received write requests has reached a set number, it generates a direct memory access (DMA) interrupt and sends the interrupt to DMA controller a in processor a. DMA controller a reads the set number of write requests (including the write request) from first memory a and copies them to the kernel memory buffer of processor a.
Step (1-b): After the kernel process in processor a detects a new write request (the write request) in the kernel memory buffer, it decapsulates the write request to obtain the port number in the transport layer of the write request and the data content, which comprises the data and the data semantic layer.
Step (1-c): Based on the port number, the kernel process starts storage software system a, which corresponds to the port number. Storage software system a copies the decapsulated data content from the kernel memory buffer to the user buffer in storage software system a.
Step (1-d): Storage software system a parses the data content in the user buffer to obtain the data, as well as information in the data semantic layer such as the identifier of the data object, the offset address within the data object, and the data length.
Step (1-e): Based on the offset address within the data object and the data length, storage software system a persistently stores the data in second memory a.
In this example, after step 14 above, the second node may likewise need to store the data in the write request into second memory b in the second node through the above steps (step (1-a) to step (1-e)). For details, refer to the above steps; they are not repeated here.
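Steps (1-d) and (1-e) can be illustrated with a short Python sketch. The byte layout of the data semantic layer is not given in the source, so the layout below (a 16-byte object identifier, an 8-byte offset, and a 4-byte length in network byte order) is purely an assumption, as are the names `SEMANTIC_LAYER`, `encode_content`, `parse_content`, and `persist`. A dictionary of byte arrays stands in for the second memory.

```python
import struct

# Hypothetical layout of the data semantic layer: 16-byte object identifier,
# 8-byte offset within the object, 4-byte data length (network byte order).
SEMANTIC_LAYER = struct.Struct("!16sQI")


def encode_content(object_id: bytes, offset: int, data: bytes) -> bytes:
    """Build the data content: data semantic layer followed by the data."""
    return SEMANTIC_LAYER.pack(object_id, offset, len(data)) + data


def parse_content(content: bytes) -> tuple[bytes, int, bytes]:
    """Step (1-d): recover the object identifier, offset, and data."""
    raw_id, offset, length = SEMANTIC_LAYER.unpack_from(content)
    start = SEMANTIC_LAYER.size
    return raw_id.rstrip(b"\0"), offset, content[start:start + length]


def persist(second_memory: dict, object_id: bytes, offset: int, data: bytes) -> None:
    """Step (1-e): write the data at the given offset of the data object."""
    obj = second_memory.setdefault(object_id, bytearray())
    end = offset + len(data)
    if len(obj) < end:
        obj.extend(bytes(end - len(obj)))  # zero-fill any gap before the write
    obj[offset:end] = data


second_memory = {}
object_id, offset, data = parse_content(encode_content(b"obj-1", 4, b"abcd"))
persist(second_memory, object_id, offset, data)
```

Carrying the offset and length in the request lets the storage software place the data inside the object without consulting any other state.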
In this example, after the first node stores the data in second memory a, the first node also needs to delete the write request from first memory a through the following steps:
Step (2-a): Storage software system a generates a space release message containing the identifier of the data object and sends the space release message to the shared memory.
Step (2-b): Network card a reads the space release message from the shared memory according to a set space release message read period.
Step (2-c): Based on the identifier of the data object in the space release message, network card a deletes from first memory a the write request that contains the identifier of the data object and has a corresponding read flag.
The read flag corresponding to the write request is added by network card a upon notification from DMA controller a when, in step (1-a), DMA controller a reads the write request from first memory a.
In this example, after the second node stores the data in second memory b, the second node also needs to delete the write request from first memory b through the above steps (step (2-a) to step (2-c)). For details, refer to the above steps; they are not repeated here.
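The release procedure in steps (2-a) to (2-c) can be sketched as follows. This is an illustration under assumptions, not the application's implementation: `NicCache`, `dma_read_all`, and `poll_release_messages` are hypothetical names, a `deque` stands in for the shared memory, and the periodic read is reduced to a single function call. One plausible reading of the read flag, adopted here, is that it protects a newer request for the same data object that has not yet been copied out by the DMA controller.

```python
from collections import deque


class NicCache:
    """Hypothetical model of the write requests held in the first memory."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def add(self, object_id: str, payload: bytes) -> None:
        self.entries.append({"object_id": object_id, "payload": payload, "read": False})

    def dma_read_all(self) -> list[dict]:
        """Step (1-a): the DMA copy marks every request it reads with the read flag."""
        for entry in self.entries:
            entry["read"] = True
        return list(self.entries)

    def release(self, object_id: str) -> int:
        """Step (2-c): delete requests for this object that carry the read flag."""
        kept = [e for e in self.entries
                if not (e["object_id"] == object_id and e["read"])]
        removed = len(self.entries) - len(kept)
        self.entries = kept
        return removed


def poll_release_messages(shared_memory: deque, cache: NicCache) -> int:
    """Step (2-b): called once per read period; drains pending release messages."""
    removed = 0
    while shared_memory:
        removed += cache.release(shared_memory.popleft()["object_id"])
    return removed


cache = NicCache()
shared_memory = deque()
cache.add("obj-1", b"old")
cache.dma_read_all()        # the first request now carries the read flag
cache.add("obj-1", b"new")  # arrives later, not yet read by the DMA controller
shared_memory.append({"type": "space_release", "object_id": "obj-1"})  # step (2-a)
removed = poll_release_messages(shared_memory, cache)
```

In this run only the already-read request is deleted; the later request for the same object stays cached until it has been read and persisted in turn.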
In the above example, network card a in the first node returns a success response message to the client as soon as it determines that both the first node and the second node have successfully cached the write request. Therefore, within the time the distributed storage system takes to respond to the client, this example can reduce not only the time consumed by the first node in saving the data but also the time consumed by the second node in saving the data, which minimizes the time the distributed storage system takes to respond to the client's write request.
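The latency argument can be made concrete with a rough model. All symbols below are assumptions introduced for illustration and do not appear in the application: $T_c$ is the network round trip between the client and the primary node, $T_r$ is the round trip between the primary node and the secondary node, $C_1$ and $C_2$ are the times the two network cards take to cache the request, and $P_1$ and $P_2$ are the times the two nodes take to persist the data (DMA copy, decapsulation, copy to the user buffer, and write). The baseline is a hypothetical scheme in which each node acknowledges only after persisting, with both nodes persisting in parallel.

```latex
\begin{aligned}
T_{\text{baseline}} &\approx T_c + \max\left(P_1,\; T_r + P_2\right) \\
T_{\text{cached}}   &\approx T_c + C_1 + T_r + C_2
\end{aligned}
```

Under this model, $P_1$ and $P_2$ drop out of the response time entirely, and the scheme is faster whenever the two cache operations together cost less than the persistence path they replace.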
Based on the foregoing embodiments, an embodiment of the present application further provides a storage node. The storage node may be one of the storage nodes in the distributed storage system shown in FIG. 2. Referring to FIG. 5, the storage node 500 includes a communication unit 501, and the communication unit 501 contains a first storage unit 5011, which is a non-volatile memory, where:
The communication unit 501 is configured to: receive a first write request that carries first data to be stored; cache the first write request in the first storage unit 5011; send the first write request to at least one secondary node that is to store the first data, where the storage node 500 is the primary node that is to store the first data; receive a first cache success message returned by the at least one secondary node, where the first cache success message indicates that the at least one secondary node has successfully cached the first write request; and return a success response message to the client, where the success response message indicates that the first data has been stored in the primary node and the at least one secondary node that are to store the first data;
The first storage unit 5011 is configured to cache the first write request.
Optionally, the first write request further contains an identifier of the data object to which the first data belongs, and the communication unit 501 is further configured to:
before sending the first write request to the at least one secondary node that is to store the first data, perform a calculation on the identifier of the data object according to a set data distribution algorithm to obtain the multiple storage nodes that store the data object, where the multiple storage nodes that store the data object include the storage node 500; and
according to a set primary node selection rule, determine that the storage node 500 is the primary node that stores the data object, and determine that the storage nodes that store the data object other than the storage node 500 are secondary nodes that store the data object.
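The two determinations above can be sketched in Python. The source names CRUSH only as an example of a data distribution algorithm and does not define it here, so the sketch swaps in rendezvous (highest-random-weight) hashing as a much simpler stand-in that shares the one property the method relies on: every party that runs it on the same object identifier and node list obtains the same result. The names `place_object` and `split_roles` are hypothetical, and the primary selection rule is the example rule from the text (largest storage node identifier).

```python
import hashlib


def place_object(object_id: str, node_ids: list[int], replicas: int) -> list[int]:
    """Deterministically choose `replicas` storage nodes for a data object.

    Rendezvous hashing is used here as a stand-in; it is not CRUSH.
    """
    def score(node_id: int) -> int:
        digest = hashlib.sha256(f"{object_id}:{node_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(node_ids, key=score, reverse=True)[:replicas]


def split_roles(storage_nodes: list[int]) -> tuple[int, list[int]]:
    """Example rule from the text: the largest node identifier is the primary."""
    primary = max(storage_nodes)
    secondaries = [n for n in storage_nodes if n != primary]
    return primary, secondaries


nodes = place_object("obj-1", [1, 2, 3, 4, 5], replicas=3)
primary, secondaries = split_roles(nodes)
```

Because both functions are deterministic, the client and the network card of each storage node can each compute the roles independently and still agree, without exchanging any coordination messages.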
Optionally, the storage node 500 further includes a processing unit 502 and a second storage unit 503, and the second storage unit 503 is a non-volatile memory, where:
The processing unit 502 is configured to read the first write request from the first storage unit 5011 and, according to the first write request, write the first data into the data object in the second storage unit 503 that corresponds to the identifier of the data object;
The second storage unit 503 is configured to store the data object.
Optionally, the processing unit 502 is further configured to send a space release message to the communication unit 501 after writing the first data into the data object in the second storage unit 503 according to the first write request, where the space release message contains the identifier of the data object;
The communication unit 501 is further configured to delete the first write request from the first storage unit 5011 according to the identifier of the data object.
Optionally, the communication unit 501 is further configured to: receive a second write request that carries second data to be stored; cache the second write request in the first storage unit 5011; and send a second cache success message to a target storage node in the distributed storage system, where the second cache success message indicates that the storage node 500 has successfully cached the second write request, the storage node 500 is one of the at least one secondary node that is to store the second data, and the target storage node is the primary node that is to store the second data;
The first storage unit 5011 is further configured to cache the second write request.
With the storage node of this embodiment of the present application, the network card in the storage node returns a success response message to the client as soon as it determines that the storage node and the at least one secondary node that is to store the first data have all successfully cached the first write request containing the first data. Therefore, within the time the distributed storage system takes to respond to the client, the storage node can reduce not only the time consumed by the storage node in saving the first data but also the time consumed by the secondary node in saving the first data, which minimizes the time the distributed storage system takes to respond to the client for the first write request.
It should be noted that the division into units in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation. The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Based on the foregoing embodiments, an embodiment of the present application further provides a computer storage medium. The storage medium stores a software program that, when read and executed by one or more processors, implements the data storage method provided by the foregoing embodiments. The computer storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Based on the foregoing embodiments, an embodiment of the present application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the data storage method provided by the foregoing embodiments.
Based on the foregoing embodiments, an embodiment of the present application further provides a storage device, which is one of the storage nodes in the distributed storage system shown in FIG. 2. Referring to FIG. 6, the storage device 600 includes an interface card 601, and the interface card 601 further includes a first memory 6011. The first memory 6011 is a non-volatile memory.
The first memory 6011 is configured to cache write requests received by the interface card 601.
The interface card 601 is configured to implement the data storage method shown in FIG. 3, including:
receiving a first write request that carries first data to be stored;
caching the first write request in the first memory 6011;
sending the first write request to at least one secondary node, in the distributed storage system, that is to store the first data, where the storage device 600 is the primary node, in the distributed storage system, that is to store the first data;
receiving a first cache success message returned by the at least one secondary node, where the first cache success message indicates that the at least one secondary node has successfully cached the first write request; and
returning a success response message to the client, where the success response message indicates that the first data has been stored in the primary node and the at least one secondary node that are to store the first data.
Optionally, the first write request further contains an identifier of the data object to which the first data belongs, and the interface card 601 is further configured to:
before sending the first write request to the at least one secondary node that is to store the first data, perform a calculation on the identifier of the data object according to a set data distribution algorithm to obtain the multiple storage nodes that store the data object, where the multiple storage nodes that store the data object include the storage device 600; and
according to a set primary node selection rule, determine that the storage device 600 is the primary node that stores the data object, and determine that the storage nodes that store the data object other than the storage device 600 are secondary nodes that store the data object.
Optionally, the storage device 600 further includes a processor 602 and at least one memory 603. The at least one memory includes a second memory 603, and the second memory 603 is a non-volatile memory, where:
The processor 602 is configured to read the first write request from the first memory 6011 and, according to the first write request, write the first data into the data object in the second memory 603 that corresponds to the identifier of the data object;
The second memory 603 is configured to store the data object.
Optionally, the processor 602 is further configured to send a space release message to the interface card 601 after writing the first data into the data object in the second memory 603 according to the first write request, where the space release message contains the identifier of the data object;
The interface card 601 is further configured to delete the first write request from the first memory 6011 according to the identifier of the data object.
Optionally, the interface card 601 is further configured to:
receive a second write request that carries second data to be stored;
cache the second write request in the first memory 6011; and
send a second cache success message to a target storage node in the distributed storage system, where the second cache success message indicates that the storage device 600 has successfully cached the second write request, the storage device 600 is one of the at least one secondary node that is to store the second data, and the target storage node is the primary node that is to store the second data.
The at least one memory 603 is configured to store program instructions and the like. Specifically, the program instructions may include program code, and the program code includes computer operation instructions. The at least one memory 603 may include RAM and may further include non-volatile memory, for example, at least one magnetic disk memory. The processor 602 executes the program instructions stored in the memory 603 and writes the first data into the data object in the second memory 603, thereby implementing the data storage method provided by the foregoing embodiments.
With the storage node of this embodiment of the present application, the interface card in the storage node returns a success response message to the client as soon as it determines that the storage node and the at least one secondary node that is to store the first data have all successfully cached the first write request containing the first data. Therefore, within the time the distributed storage system takes to respond to the client, the storage node can reduce not only the time consumed by the storage node in saving the first data but also the time consumed by the secondary node in saving the first data, which minimizes the time the distributed storage system takes to respond to the client for the first write request.
Based on the foregoing embodiments, an embodiment of the present application further provides a distributed storage system. The distributed storage system includes multiple storage nodes, each storage node includes a network card, the network card includes a first memory, and the first memory is a non-volatile memory, where:
The network card in the first node 701 is configured to: receive a write request that carries data to be stored; cache the write request in the first memory in the first node 701; and send the write request to the second node 702, where the first node 701 and the second node 702 are storage nodes among the multiple storage nodes, the first node 701 is the primary node that is to store the data, and the second node 702 is a secondary node that is to store the data;
The network card in the second node 702 is configured to: receive the write request; cache the write request in the first memory in the second node 702; and send a cache success message to the first node 701, where the cache success message indicates that the second node 702 has successfully cached the write request;
The network card in the first node 701 is further configured to receive the cache success message returned by the second node 702 and return a success response message to the client, where the success response message indicates that the data has been stored in the primary node and the secondary node that are to store the data.
Optionally, the write request further contains an identifier of the data object to which the data belongs, and the network card in the first node 701 is further configured to:
before sending the write request to the second node 702, perform a calculation on the identifier of the data object according to a set data distribution algorithm to obtain the multiple storage nodes that store the data object, where the multiple storage nodes that store the data object include the first node 701 and the second node 702; and
according to a set primary node selection rule, determine that the first node 701 is the primary node that stores the data object, and determine that the second node 702 is a secondary node that stores the data object.
Optionally, each storage node further includes a processor and a second memory, and the second memory is a non-volatile memory. In this case, the processor in the first node 701 is configured to read the write request from the first memory in the first node 701 and, according to the write request, write the data into the data object, in the second memory in the first node 701, that corresponds to the identifier of the data object;
The processor in the second node 702 is configured to read the write request from the first memory in the second node 702 and, according to the write request, write the data into the data object, in the second memory in the second node 702, that corresponds to the identifier of the data object.
Optionally, the processor in the first node 701 is further configured to send a space release message to the network card in the first node 701 after the data is written into the data object in the second memory in the first node 701, where the space release message contains the identifier of the data object;
The network card in the first node 701 is further configured to delete the write request from the first memory in the first node 701 according to the identifier of the data object;
The processor in the second node 702 is further configured to send a space release message to the network card in the second node 702 after the data is written into the data object in the second memory in the second node 702, where the space release message contains the identifier of the data object;
The network card in the second node 702 is further configured to delete the write request from the first memory in the second node 702 according to the identifier of the data object.
In this distributed storage system, the network card in the first node returns a success response message to the client as soon as it determines that both the first node and the second node have successfully cached the write request containing the data to be stored. Therefore, within the time the distributed storage system takes to respond to the client, the system can reduce not only the time consumed by the first node in saving the data but also the time consumed by the second node in saving the data, which minimizes the time the distributed storage system takes to respond to the client's write request.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM, and optical memory) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Claims (14)

  1. A data storage method, wherein the method is applied to a distributed storage system, the distributed storage system comprises a plurality of storage nodes, a first node of the plurality of storage nodes comprises a network card, the network card comprises a first memory, the first memory is a non-volatile memory, and the method comprises:
    receiving, by the network card, a first write request, wherein the first write request carries first data to be stored;
    caching, by the network card, the first write request in the first memory;
    sending, by the network card, the first write request to at least one slave node that is to store the first data, wherein the first node is a master node that is to store the first data;
    receiving, by the network card, a first cache success message returned by the at least one slave node, wherein the first cache success message indicates that the at least one slave node has successfully cached the first write request; and
    returning, by the network card, a success response message to a client, wherein the success response message indicates that the first data has been stored in the master node that is to store the first data and in the at least one slave node.
  2. The method according to claim 1, wherein the first write request further comprises an identifier of a data object to which the first data belongs, and before the network card sends the first write request to the at least one slave node that is to store the first data, the method further comprises:
    performing, by the network card, a calculation on the identifier of the data object according to a configured data distribution algorithm, to obtain a plurality of storage nodes that store the data object, wherein the plurality of storage nodes that store the data object comprise the first node; and
    determining, by the network card according to a configured master node selection rule, that the first node is the master node that stores the data object, and that the storage nodes other than the first node among the plurality of storage nodes that store the data object are slave nodes that store the data object.
  3. The method according to claim 2, wherein the first node further comprises a processor and a second memory, the second memory is a non-volatile memory, and the method further comprises:
    reading, by the processor, the first write request from the first memory; and
    writing, by the processor according to the first write request, the first data into the data object that is in the second memory and that corresponds to the identifier of the data object.
  4. The method according to claim 3, wherein after the processor writes the first data into the data object in the second memory according to the first write request, the method further comprises:
    sending, by the processor, a space release message to the network card, wherein the space release message comprises the identifier of the data object; and
    deleting, by the network card, the first write request from the first memory according to the identifier of the data object.
  5. The method according to any one of claims 1 to 4, further comprising:
    receiving, by the network card, a second write request, wherein the second write request carries second data to be stored;
    caching, by the network card, the second write request in the first memory; and
    sending, by the network card, a second cache success message to a second node of the plurality of storage nodes, wherein the second cache success message indicates that the first node has successfully cached the second write request, the first node is one of at least one slave node that is to store the second data, and the second node is a master node that is to store the second data.
  6. A storage device, wherein the storage device is a storage node in a distributed storage system, the distributed storage system comprises a plurality of storage nodes, the storage device comprises an interface card, the interface card comprises a first memory, and the first memory is a non-volatile memory, wherein
    the interface card is configured to:
    receive a first write request, wherein the first write request carries first data to be stored;
    cache the first write request in the first memory;
    send the first write request to at least one slave node that is in the distributed storage system and that is to store the first data, wherein the storage device is a master node that is in the distributed storage system and that is to store the first data;
    receive a first cache success message returned by the at least one slave node, wherein the first cache success message indicates that the at least one slave node has successfully cached the first write request; and
    return a success response message to a client, wherein the success response message indicates that the first data has been stored in the master node that is to store the first data and in the at least one slave node; and
    the first memory is configured to cache the first write request.
  7. The storage device according to claim 6, wherein the first write request further comprises an identifier of a data object to which the first data belongs, and the interface card is further configured to:
    before sending the first write request to the at least one slave node that is to store the first data, perform a calculation on the identifier of the data object according to a configured data distribution algorithm, to obtain a plurality of storage nodes that store the data object, wherein the plurality of storage nodes that store the data object comprise the storage device; and
    determine, according to a configured master node selection rule, that the storage device is the master node that stores the data object, and that the storage nodes other than the storage device among the plurality of storage nodes that store the data object are slave nodes that store the data object.
  8. The storage device according to claim 7, wherein the storage device further comprises a processor and a second memory, and the second memory is a non-volatile memory, wherein
    the processor is configured to read the first write request from the first memory, and write, according to the first write request, the first data into the data object that is in the second memory and that corresponds to the identifier of the data object; and
    the second memory is configured to store the data object.
  9. The storage device according to claim 8, wherein
    the processor is further configured to send a space release message to the interface card after writing the first data into the data object in the second memory according to the first write request, wherein the space release message comprises the identifier of the data object; and
    the interface card is further configured to delete the first write request from the first memory according to the identifier of the data object.
  10. The storage device according to any one of claims 6 to 9, wherein the interface card is further configured to:
    receive a second write request, wherein the second write request carries second data to be stored;
    cache the second write request in the first memory; and
    send a second cache success message to a target storage node in the distributed storage system, wherein the second cache success message indicates that the storage device has successfully cached the second write request, the storage device is one of at least one slave node that is to store the second data, and the target storage node is a master node that is in the distributed storage system and that is to store the second data; and
    the first memory is further configured to cache the second write request.
  11. A distributed storage system, wherein the distributed storage system comprises a plurality of storage nodes, each storage node comprises a network card, the network card comprises a first memory, and the first memory is a non-volatile memory, wherein
    the network card in a first node is configured to: receive a write request, wherein the write request carries data to be stored; cache the write request in the first memory in the first node; and send the write request to a second node, wherein the first node and the second node are storage nodes of the plurality of storage nodes, the first node is a master node that is to store the data, and the second node is a slave node that is to store the data;
    the network card in the second node is configured to: receive the write request; cache the write request in the first memory in the second node; and send a cache success message to the first node, wherein the cache success message indicates that the second node has successfully cached the write request; and
    the network card in the first node is further configured to: receive the cache success message returned by the second node; and return a success response message to a client, wherein the success response message indicates that the data has been stored in the master node and the slave node that are to store the data.
  12. The system according to claim 11, wherein the write request further comprises an identifier of a data object to which the data belongs, and the network card in the first node is further configured to:
    before sending the write request to the second node, perform a calculation on the identifier of the data object according to a configured data distribution algorithm, to obtain a plurality of storage nodes that store the data object, wherein the plurality of storage nodes that store the data object comprise the first node and the second node; and
    determine, according to a configured master node selection rule, that the first node is the master node that stores the data object, and that the second node is a slave node that stores the data object.
  13. The system according to claim 12, wherein each storage node further comprises a processor and a second memory, and the second memory is a non-volatile memory, wherein
    the processor in the first node is configured to read the write request from the first memory in the first node, and write, according to the write request, the data into the data object that is in the second memory in the first node and that corresponds to the identifier of the data object; and
    the processor in the second node is configured to read the write request from the first memory in the second node, and write, according to the write request, the data into the data object that is in the second memory in the second node and that corresponds to the identifier of the data object.
  14. The system according to claim 13, wherein
    the processor in the first node is further configured to send a space release message to the network card in the first node after the data is written into the data object in the second memory in the first node, wherein the space release message comprises an identifier corresponding to the data;
    the network card in the first node is further configured to delete the write request from the first memory in the first node according to the identifier of the data object;
    the processor in the second node is further configured to send a space release message to the network card in the second node after the data is written into the data object in the second memory in the second node, wherein the space release message comprises an identifier corresponding to the data; and
    the network card in the second node is further configured to delete the write request from the first memory in the second node according to the identifier of the data object.
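
The write path recited in claims 1 to 4 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the applicant's implementation: the names `WriteRequest`, `StorageNode`, `place`, and `handle_write` are hypothetical, the hash-based placement stands in for the unspecified "data distribution algorithm", "first chosen node is the master" stands in for the unspecified "master node selection rule", and Python dictionaries stand in for the non-volatile first and second memories.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class WriteRequest:
    object_id: str  # identifier of the data object the data belongs to
    data: bytes     # data to be stored


@dataclass
class StorageNode:
    name: str
    # Stand-in for the "first memory" (non-volatile memory on the network card).
    nic_nvm: dict = field(default_factory=dict)
    # Stand-in for the "second memory" that holds the data objects.
    store: dict = field(default_factory=dict)

    def nic_cache(self, req: WriteRequest) -> bool:
        """Network card caches the write request; True models a cache success message."""
        # Simplification: one pending request per object identifier.
        self.nic_nvm[req.object_id] = req
        return True

    def flush(self, object_id: str) -> None:
        """Processor reads the cached request, writes the data into the data
        object, then the network card releases the cached request."""
        req = self.nic_nvm[object_id]
        # Assumption: writing means appending to the object's existing bytes.
        self.store[object_id] = self.store.get(object_id, b"") + req.data
        # Models the space release message keyed by the object identifier.
        del self.nic_nvm[object_id]


def place(object_id: str, nodes: list, replicas: int = 2):
    """Hypothetical distribution algorithm: hash the identifier to pick nodes.
    Hypothetical selection rule: the first chosen node is the master."""
    digest = hashlib.sha256(object_id.encode()).hexdigest()
    start = int(digest, 16) % len(nodes)
    chosen = [nodes[(start + i) % len(nodes)] for i in range(replicas)]
    return chosen[0], chosen[1:]


def handle_write(req: WriteRequest, nodes: list) -> bool:
    """Master's network card caches the request, forwards it to the slaves,
    and reports success to the client once every slave has cached it."""
    master, slaves = place(req.object_id, nodes)
    master.nic_cache(req)
    acks = [slave.nic_cache(req) for slave in slaves]
    return all(acks)
```

In this model the client receives a success response as soon as the request is cached in the network-card memory of the master and its slaves; `flush` runs later on each node and is the only step that touches the second memory.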
PCT/CN2017/091240 2017-06-30 2017-06-30 Data storage method and device WO2019000423A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780087416.3A CN110337633A (en) 2017-06-30 2017-06-30 Data storage method and device
PCT/CN2017/091240 WO2019000423A1 (en) 2017-06-30 2017-06-30 Data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/091240 WO2019000423A1 (en) 2017-06-30 2017-06-30 Data storage method and device

Publications (1)

Publication Number Publication Date
WO2019000423A1 (en) 2019-01-03

Family

ID=64740801

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091240 WO2019000423A1 (en) 2017-06-30 2017-06-30 Data storage method and device

Country Status (2)

Country Link
CN (1) CN110337633A (en)
WO (1) WO2019000423A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023024656A1 (en) * 2021-08-23 2023-03-02 华为技术有限公司 Data access method, storage system, and storage node

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813334B (en) * 2020-06-04 2024-04-05 中国科学院上海微系统与信息技术研究所 Ceph write performance optimization and double-control node combination method
CN113297324B (en) * 2021-04-13 2023-05-23 福建天泉教育科技有限公司 Data writing optimization method and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514173A (en) * 2012-06-20 2014-01-15 华为技术有限公司 Data processing method and node equipment
CN105450588A (en) * 2014-07-31 2016-03-30 华为技术有限公司 RDMA-based data transmission method and RDMA network cards
CN106569729A (en) * 2015-10-09 2017-04-19 阿里巴巴集团控股有限公司 Method and device for writing in data in distributed system
CN106873902A (en) * 2015-12-11 2017-06-20 杭州华为数字技术有限公司 A kind of document storage system, data dispatching method and back end

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567505B (en) * 2011-12-26 2015-05-20 中兴通讯股份有限公司 Distributed database and data manipulation method
CN104125303B (en) * 2014-08-12 2018-09-04 北京奇虎科技有限公司 Reading and writing data requesting method, client and system
CN104135539B (en) * 2014-08-15 2018-03-16 华为技术有限公司 Date storage method, SDN controllers and distributed network storage system
CN107748702B (en) * 2015-06-04 2021-05-04 华为技术有限公司 Data recovery method and device
CN105278877A (en) * 2015-09-30 2016-01-27 成都华为技术有限公司 Object storage method and device
CN105160039A (en) * 2015-10-13 2015-12-16 四川携创信息技术服务有限公司 Query method based on big data
US9733834B1 (en) * 2016-01-28 2017-08-15 Weka.IO Ltd. Congestion mitigation in a distributed storage system
CN105739924B (en) * 2016-01-29 2018-05-11 华为技术有限公司 Caching method and system based on cache cluster
CN106603692B (en) * 2016-12-27 2020-12-01 中国银联股份有限公司 Data storage method and device in distributed storage system

Also Published As

Publication number Publication date
CN110337633A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN102523279B (en) A kind of distributed file system and focus file access method thereof
US8347050B2 (en) Append-based shared persistent storage
US9507720B2 (en) Block storage-based data processing methods, apparatus, and systems
US11397668B2 (en) Data read/write method and apparatus, and storage server
US20130232313A1 (en) System and method for managing replication in an object storage system
KR101709118B1 (en) Method and appratus for managing file in hybrid storage system
WO2019127018A1 (en) Memory system access method and device
KR20140106588A (en) Application consistent snapshots of a shared volume
CN108540510B (en) Cloud host creation method and device and cloud service system
CN112988680B (en) Data acceleration method, cache unit, electronic device and storage medium
US8090907B2 (en) Method for migration of synchronous remote copy service to a virtualization appliance
WO2019000423A1 (en) Data storage method and device
CN113626184A (en) Super-fusion performance optimization method, device and equipment
CN111399760B (en) NAS cluster metadata processing method and device, NAS gateway and medium
CN106020713A (en) File storage method based on buffer area
US11068299B1 (en) Managing file system metadata using persistent cache
WO2019127017A1 (en) Method and apparatus for managing storage device in storage system
WO2019127021A1 (en) Management method and apparatus for storage device in storage system
US9164904B2 (en) Accessing remote memory on a memory blade
WO2016065610A1 (en) Method for accessing files, distributed storage system and storage node
US11755226B2 (en) Tracking changes of storage volumes during data transfers
WO2017177400A1 (en) Data processing method and system
KR20150010242A (en) Method of data re-replication in asymmetric file system
CN113434290A (en) Data processing method and device based on RAFT protocol, and computer storage medium
KR101486546B1 (en) Duplication system and method for treating system failure

Legal Events

Date Code Title Description
NENP Non-entry into the national phase Ref country code: DE
122 Ep: PCT application non-entry in European phase Ref document number: 17915954; Country of ref document: EP; Kind code of ref document: A1