WO2019086016A1 - Data storage method and device - Google Patents

Data storage method and device

Info

Publication number
WO2019086016A1
WO2019086016A1 PCT/CN2018/113837 CN2018113837W WO2019086016A1 WO 2019086016 A1 WO2019086016 A1 WO 2019086016A1 CN 2018113837 W CN2018113837 W CN 2018113837W WO 2019086016 A1 WO2019086016 A1 WO 2019086016A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
logical address
mirror
written
Application number
PCT/CN2018/113837
Inventor
赵旺 (Zhao Wang)
陈飘 (Chen Piao)
张鹏 (Zhang Peng)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • The embodiments of the present invention relate to the field of computer technologies, and in particular, to a data storage method and device.
  • A distributed redundant array of independent disks (RAID) system generally consists of multiple storage nodes interconnected by a network.
  • In a distributed RAID system, a redundancy algorithm such as an erasure code (EC) algorithm can be used to store data redundantly. To improve the input/output (I/O) performance of the system, a data caching (cache) mechanism is enabled in the distributed RAID system so that storage nodes can read and write data quickly from the cache.
  • In the prior art, after receiving a write request, the local storage node may forward the write request to other storage nodes in the distributed RAID system so that the data in the write request is cached separately on at least two storage nodes, including the local storage node, which ensures data reliability; the local storage node can then reply to the write request. After receiving multiple write requests, the local storage node may determine N data blocks from the cached data that has not yet been written to the hard disk, calculate M check data blocks using the EC algorithm, and send the N data blocks and the M check data blocks to N+M storage nodes respectively, so that each of the N+M storage nodes stores the received data block on its own hard disk.
  • In this process, the local storage node sends data across nodes once when it forwards the write request to other storage nodes, and again when it sends the N data blocks to N storage nodes to store the data on hard disks. Because the data is sent across nodes twice, the I/O bandwidth of the system is reduced.
  • The embodiments of the present application provide a data storage method and device that solve the problem of low system I/O bandwidth caused by data being sent across nodes twice.
  • According to a first aspect, the present application provides a data storage method applied to a distributed RAID system that includes K storage nodes, where K is an integer greater than 0.
  • The method may include: a receiving node receives a first write request that includes data to be written and the logical address of the data to be written, stores the data to be written in its own cache, determines a mirror node according to the logical address of the data to be written, and records a first correspondence, which is the mapping between the logical address of the data to be written and the mirror node. The receiving node sends the logical address of the data to be written and the data to be written to the mirror node, and selects N target data blocks from its cache according to the first correspondence.
  • The mirror nodes of the N target data blocks are different from each other. The receiving node sends a notification message to each of the N mirror nodes corresponding to the N target data blocks.
  • Each notification message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and instructs that mirror node to write the data block corresponding to that logical address from its own cache to its own hard disk.
  • The receiving node is any one of the K storage nodes, and N is an integer greater than 0 and less than K.
  • Because the receiving node determines the mirror node according to the logical address of the data to be written and sends that logical address and the data to be written to the mirror node, the N target data blocks are already stored on the N mirror nodes. The receiving node can therefore send each of the N mirror nodes a notification message that contains only the logical address of the corresponding target data block, instructing the mirror node to write the data block corresponding to that logical address to its hard disk. In the prior art a storage node sends the data to be written across nodes twice; here the receiving node sends it across nodes only once, when it sends the data to be written to the mirror node.
  • The logical address included in a notification message is small and consumes little bandwidth, so system I/O bandwidth is improved compared with the prior art.
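  • The bandwidth argument above can be made concrete with a rough count of cross-node payload bytes. This is a minimal sketch under stated assumptions, not a model taken from the application: every data byte crosses the network once per remote mirror copy; in the prior art, all N data blocks are sent to remote nodes again when they are persisted; each notification carries one logical address of a hypothetical 8 bytes; and check-block traffic, identical in both schemes, is left out. The function names are illustrative.

```python
def prior_art_cross_node_bytes(data_bytes: int, remote_copies: int) -> int:
    """Cross-node payload when data is sent twice (mirroring, then striping)."""
    mirroring = data_bytes * remote_copies  # write request forwarded to other nodes
    striping = data_bytes                   # N data blocks re-sent to N nodes (all assumed remote)
    return mirroring + striping


def proposed_cross_node_bytes(data_bytes: int, remote_copies: int,
                              n_blocks: int, addr_bytes: int = 8) -> int:
    """Cross-node payload when only logical addresses are sent at flush time."""
    mirroring = data_bytes * remote_copies  # data goes to the mirror node(s) once
    notifications = n_blocks * addr_bytes   # one logical address per target block (assumed size)
    return mirroring + notifications


if __name__ == "__main__":
    block = 64 * 1024  # hypothetical 64 KiB data block
    n = 4              # N target data blocks
    total = n * block
    print(prior_art_cross_node_bytes(total, remote_copies=1))
    print(proposed_cross_node_bytes(total, remote_copies=1, n_blocks=n))
```

With one remote copy, the first figure is twice the data size while the second is the data size plus a few bytes per block, which is the "sent once instead of twice" claim in numeric form.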
  • In a possible implementation, the logical address of the data to be written includes at least one sub-logical address. The receiving node divides each sub-logical address by the length of a data block and rounds the quotient down to obtain an integer X.
  • The receiving node then performs a hash calculation on each integer X using a pre-configured hash algorithm, takes the remainder of each hash result divided by the number K of storage nodes, and obtains the number of the mirror node from that remainder.
  • The length of a data block is the number of sub-logical addresses included in each data block.
  • When the receiving node hashes each integer X with a pre-configured hash algorithm and the preset number of mirror nodes is one, it hashes X with the pre-configured hash algorithm.
  • If the preset number of mirror nodes is greater than one, the receiving node may hash X with several different types of pre-configured hash algorithms.
  • Alternatively, if the preset number of mirror nodes is greater than one, the receiving node may hash X with a pre-configured hash algorithm and then derive the numbers of the remaining mirror nodes from that result.
  • In a possible implementation, the logical address of the data to be written includes at least one sub-logical address, and the first correspondence records the mirror node corresponding to each sub-logical address.
  • The receiving node selects the N target data blocks in its cache according to the first correspondence as follows: it determines, in the first correspondence, the sub-logical addresses that share the same mirror node; from the sub-logical addresses corresponding to N different mirror nodes, it selects the sub-logical addresses that make up one data block for each mirror node; and the data corresponding to the selected sub-logical addresses forms the N target data blocks.
  • When selecting, from the sub-logical addresses corresponding to the N different mirror nodes, the sub-logical addresses that make up a data block, the receiving node determines which sub-logical addresses are missing from the data block and whether each missing sub-logical address is recorded in a second correspondence. The second correspondence maps the logical address of data written to the hard disk of a mirror node to the physical address on that hard disk to which the data was written.
  • The method may further include: the receiving node calculates M check data blocks from the N target data blocks and sends the M check data blocks to storage nodes other than the N mirror nodes for storage.
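  • The application leaves the EC algorithm unspecified. As a minimal sketch, assuming the simplest case M = 1, the single check data block can be the byte-wise XOR of the N target data blocks (RAID-5-style parity); M > 1 would need a true erasure code such as Reed-Solomon. The placement helper, which picks the lowest-numbered nodes outside the mirror set, is a hypothetical policy for illustration only.

```python
from functools import reduce


def xor_check_block(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized data blocks (M = 1 parity)."""
    if not blocks:
        raise ValueError("need at least one data block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all data blocks must have the same length")
    # zip(*blocks) walks the blocks column by column; XOR each column of bytes.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving: list[bytes], check: bytes) -> bytes:
    """Recover one lost data block from the other N-1 blocks and the check block."""
    return xor_check_block(surviving + [check])


def choose_check_nodes(k: int, mirror_nodes: set[int], m: int) -> list[int]:
    """Pick M storage nodes (numbered 0..K-1) outside the N mirror nodes.

    Hypothetical placement policy: lowest-numbered eligible nodes first.
    """
    candidates = [node for node in range(k) if node not in mirror_nodes]
    if len(candidates) < m:
        raise ValueError("not enough storage nodes outside the mirror set")
    return candidates[:m]


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
    check = xor_check_block(data)
    assert rebuild_block([data[0], data[2]], check) == data[1]
    print(check.hex(), choose_check_nodes(16, {5, 6, 7}, 1))
```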
  • The method may further include: the receiving node deletes, from the first correspondence, the mapping between the sub-logical addresses of the N target data blocks and their mirror nodes; deletes the N target data blocks from its cache; and sends an indication message to each of the N mirror nodes corresponding to the N target data blocks. Each indication message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and instructs that mirror node to delete the data block corresponding to that logical address from its cache.
  • According to a second aspect, the present application provides a data storage method applied to a distributed RAID system that includes K storage nodes, where K is an integer greater than 0.
  • The method may include: a mirror node receives a first write request sent by at least one receiving node, where the first write request includes mirrored data and the logical address of the mirrored data; the mirror node writes the mirrored data into its own cache and records the correspondence between the logical address of the mirrored data and the receiving node.
  • The mirror node selects N target data blocks in its own cache according to the correspondence between the mirrored data and the receiving nodes, where the receiving nodes of the N target data blocks are different from each other, and sends a notification message to each of the N receiving nodes corresponding to the N target data blocks.
  • Each notification message includes the logical address of the target data block corresponding to the receiving node to which the message is sent, and instructs that receiving node to write the data block corresponding to that logical address from its cache to its hard disk.
  • The receiving node is a node that receives a second write request from the host. The second write request includes data to be written and the logical address of the data to be written, and the mirror node is determined by the receiving node according to the logical address of the data to be written.
  • The mirrored data is the data that the receiving node writes to the mirror node it has determined, and N is an integer greater than 0 and less than K.
  • Because the mirror node writes the mirrored data from each received first write request to its cache and records the correspondence between the logical address of the mirrored data and the receiving node, the N target data blocks are already stored on the N receiving nodes. The mirror node can therefore send each of the N receiving nodes a notification message containing the logical address of the corresponding target data block, instructing the receiving node to write the data block corresponding to that logical address to its hard disk. Compared with the prior art, in which a storage node sends the data to be written across nodes twice, the data is sent across nodes only once, and the notification messages occupy little bandwidth because the logical addresses they carry are small. System I/O bandwidth is therefore improved compared with the prior art.
  • The present application further provides a receiving node, which may include modules capable of implementing the method in the first aspect and its various implementations.
  • The present application further provides a mirror node, which may include modules capable of implementing the method in the second aspect.
  • The present application further provides a storage node that includes at least one processor, a memory, a communication interface, and a communication bus. The at least one processor is connected to the memory and the communication interface through the communication bus, and the memory is configured to store computer-executable instructions. When the storage node runs, the processor executes the computer-executable instructions stored in the memory so that the storage node performs the data storage method of the first aspect or any possible implementation of the first aspect, or the data storage method of the second aspect.
  • The present application further provides a computer storage medium storing computer-executable instructions that, when executed by a processor, implement the data storage method of the first aspect or any possible implementation of the first aspect, or the data storage method of the second aspect.
  • The present application further provides a computer program product that, when run on a computer, causes the computer to perform the method of the first aspect or the second aspect.
  • The present application further provides a communication chip storing computer-executable instructions that, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect.
  • Each of the devices, computer storage media, and computer program products provided above is used to perform the corresponding method provided above; for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method. Details are not repeated here.
  • FIG. 1 is a schematic architectural diagram according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a storage node according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 4 is a flowchart of another data storage method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another storage node according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another storage node according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another storage node according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another storage node according to an embodiment of the present application.
  • FIG. 1 is a schematic architectural diagram according to an embodiment of the present application. As shown in FIG. 1, the architecture may include a host 11 and a distributed RAID system 12.
  • When the host 11 needs to store data in the distributed RAID system 12, it sends a first write request to any storage node in the distributed RAID system 12, and it also receives the response message returned by that storage node.
  • The response message notifies the host 11 that the data to be stored in the first write request has been stored in the distributed RAID system 12.
  • The distributed RAID system 12 consists of K storage nodes interconnected through a network and provides a large amount of storage space, where K is an integer greater than 1. In a specific implementation, each storage node in the distributed RAID system 12 may be a server.
  • FIG. 2 is a schematic diagram of the composition of a storage node according to an embodiment of the present application.
  • As shown in FIG. 2, the storage node may include at least one processor 21, a memory 22, a communication interface 23, and a communication bus 24.
  • The processor 21 is the control center of the storage node and may be a single processor or a collective name for multiple processing elements.
  • For example, the processor 21 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be one or more integrated circuits configured to implement the embodiments of the present application, such as one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).
  • In a specific implementation, the storage node may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2.
  • The storage node may also include a plurality of processors, such as the processor 21 and the processor 25 shown in FIG. 2.
  • Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • A processor here may refer to one or more devices, circuits, and/or processing cores for processing data, such as computer program instructions.
  • The processor 21 may perform the various functions of the storage node by running or executing a software program stored in the memory 22 and invoking data stored in the memory 22.
  • For example, the processor 21 can execute the computer program code stored in the memory 22 to perform the data storage method provided by the present application and save the data to be stored in a write request to a hard disk of the distributed RAID system.
  • The memory 22 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions.
  • It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • The memory 22 may exist independently and be connected to the processor 21 through the communication bus 24, or it may be integrated with the processor 21.
  • The memory 22 is used to store the data in the present application and the software program that executes the present application.
  • For example, the memory 22 can store the computer program code corresponding to the data storage method provided by the embodiments of the present application.
  • The memory 22 may include a cache and a hard disk for storing the data to be stored in write requests.
  • The communication interface 23 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as a host, a radio access network (RAN), or a wireless local area network (WLAN).
  • The communication interface 23 may include a receiving unit that implements a receiving function and a transmitting unit that implements a transmitting function.
  • The communication bus 24 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (EISA) bus.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 2, but this does not mean that there is only one bus or one type of bus.
  • The data storage method shown in FIG. 3 may be used to store data. As shown in FIG. 3, the method may include the following steps.
  • In this method, the receiving node is taken to be the first storage node as an example.
  • The first storage node is the storage node, among the storage nodes in the distributed RAID system, that receives a given first write request sent by the host.
  • The first storage node receives a first write request sent by the host.
  • The first write request may include the data to be written and the logical address of the data to be written, and the logical address of the data to be written includes at least one sub-logical address.
  • The logical address of the data to be written generally consists of a start address and a data length. For example, if the start address is 0 and the data length is 3, the logical address of the data to be written includes three sub-logical addresses: logical address 0, logical address 1, and logical address 2.
  • The first storage node stores the data to be written in the first write request in its own cache.
  • The first storage node determines the mirror node according to the logical address of the data to be written in the first write request.
  • The length of a data block is the number of sub-logical addresses included in each data block.
  • The first storage node divides each sub-logical address by the length of a data block and rounds the quotient down to obtain the integer X corresponding to that sub-logical address.
  • The first storage node may then hash each integer X using a pre-configured hash algorithm, take the remainder of each hash result divided by the number K of storage nodes in the distributed RAID system, and obtain the number of the mirror node from that remainder.
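  • The mapping from a write request to a single mirror node can be sketched as follows. The application does not name the pre-configured hash algorithm, so SHA-256 over the decimal form of X is an assumption here, as is numbering the storage nodes 0 to K-1; the function names are hypothetical.

```python
import hashlib


def sub_logical_addresses(start: int, length: int) -> list[int]:
    """Expand (start address, data length) into individual sub-logical addresses."""
    return list(range(start, start + length))


def block_index(addr: int, s: int) -> int:
    """Integer X: the sub-logical address divided by the block length S, rounded down."""
    return addr // s


def stable_hash(x: int) -> int:
    """Deterministic 64-bit hash of X (SHA-256 is an assumed stand-in)."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def mirror_node(addr: int, s: int, k: int) -> int:
    """Mirror-node number: hash(X) mod K, with nodes assumed numbered 0..K-1."""
    return stable_hash(block_index(addr, s)) % k


if __name__ == "__main__":
    S, K = 4, 16
    for addr in sub_logical_addresses(2, 3):  # logical addresses 2, 3, 4
        print(addr, block_index(addr, S), mirror_node(addr, S, K))
```

A hashlib digest is used because its output is identical on every node and in every Python process, which a mapping shared across storage nodes requires.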
  • When the first storage node hashes each integer X with a pre-configured hash algorithm and the preset number of mirror nodes is only one, the first storage node may hash X with the pre-configured hash algorithm and take the remainder to obtain the number of the mirror node. If the preset number of mirror nodes is greater than one, in one implementation the first storage node may hash X with different types of pre-configured hash algorithms, where the number of hash algorithm types equals the number of mirror nodes, and take the remainder of each hash result to obtain the numbers of the different mirror nodes.
  • In another implementation, the first storage node may first hash X and take the remainder to obtain the number of one mirror node, and then determine the numbers of the other mirror nodes from that number. For example, if the preset number of mirror nodes is three, the first storage node may first hash X and take the remainder to obtain one mirror-node number, and then either use the two node numbers adjacent to it as the other two mirror nodes or perform two further hash calculations on the result to obtain the numbers of the other two mirror nodes.
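  • The two ways of obtaining several mirror nodes can be sketched side by side. Both are hedged illustrations: the three hashlib algorithms stand in for the unnamed "different types of hash algorithm", the adjacent strategy is assumed to take the following node numbers modulo K because the application does not say in which direction "adjacent" runs, and neither function guards against independent hashes landing on the same node, which a real system would have to handle.

```python
import hashlib

# Assumed stand-ins for "different types of pre-configured hash algorithm".
HASH_TYPES = (hashlib.sha256, hashlib.blake2b, hashlib.sha1)


def _hash_int(algorithm, x: int) -> int:
    """Hash the decimal form of X and keep the first 8 bytes as an integer."""
    return int.from_bytes(algorithm(str(x).encode()).digest()[:8], "big")


def mirrors_by_hash_types(x: int, k: int, count: int) -> list[int]:
    """One hash algorithm per mirror node; each result is reduced modulo K."""
    if count > len(HASH_TYPES):
        raise ValueError("not enough hash algorithm types configured")
    return [_hash_int(HASH_TYPES[i], x) % k for i in range(count)]


def mirrors_by_adjacency(x: int, k: int, count: int) -> list[int]:
    """Hash once, then take the following node numbers (assumed direction)."""
    if count > k:
        raise ValueError("cannot pick more distinct mirrors than storage nodes")
    first = _hash_int(hashlib.sha256, x) % k
    return [(first + i) % k for i in range(count)]


if __name__ == "__main__":
    print(mirrors_by_hash_types(0, 16, 2))
    print(mirrors_by_adjacency(0, 16, 3))
```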
  • The length S of a data block may be pre-configured in the first storage node so that, starting from sub-logical address 0, every S consecutive sub-logical addresses form the logical address of one data block.
  • Because the first storage node rounds every sub-logical address using the data block length S, all sub-logical addresses in one data block yield the same integer X, so the mirror nodes calculated for every sub-logical address in a data block are the same; that is, an entire data block resides on the same mirror node.
  • The preset number of mirror nodes is determined by the reliability index of the distributed RAID system.
  • For example, if the reliability index requires that data in the caches of two storage nodes can still be recovered when both of those caches fail, each piece of data must be stored in the caches of at least three storage nodes; that is, the preset number of mirror nodes in the first storage node is two or more.
  • For example, suppose the sub-logical addresses included in the logical address of the data to be written are logical addresses 2, 3, and 4, the length S of a data block pre-configured in the first storage node is 4, two hash algorithms are configured, the preset number of mirror nodes is two, and the number K of storage nodes in the distributed RAID system is 16. The first storage node rounds logical addresses 2, 3, and 4 and obtains X values of 0, 0, and 1 respectively. It then hashes the integer 0 with each of the two hash algorithms and takes the remainder of each result divided by 16 to obtain the mirror-node numbers, assumed here to be 5 and 6; the same calculation on the integer 1 is assumed to yield 6 and 7.
  • The first storage node can then determine that the mirror nodes corresponding to logical address 2 are the storage nodes numbered 5 and 6, the mirror nodes corresponding to logical address 3 are also the storage nodes numbered 5 and 6, and the mirror nodes corresponding to logical address 4 are the storage nodes numbered 6 and 7.
  • The first storage node records the first correspondence.
  • Specifically, the first storage node may save the first correspondence, which maps each sub-logical address to the numbers of its mirror nodes.
  • For example, the first storage node 1 can save the first correspondence shown in Table 1.
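  • The worked example above can be replayed to show what the first correspondence holds. Because the application only assumes the hash results (5 and 6 for X = 0, 6 and 7 for X = 1), they are hard-coded in a hypothetical lookup table instead of being computed; the dictionary layout is an illustrative guess, not the actual format of Table 1.

```python
S = 4   # data block length from the example
K = 16  # number of storage nodes from the example

# Hypothetical stand-in for the two hash algorithms: X -> assumed mirror numbers.
ASSUMED_MIRRORS = {0: (5, 6), 1: (6, 7)}


def build_first_correspondence(sub_addrs, s, mirrors_for_x):
    """Map each sub-logical address to the mirror nodes of its block (X = addr // S)."""
    corr = {}
    for addr in sub_addrs:
        mirrors = mirrors_for_x[addr // s]
        if not all(0 <= m < K for m in mirrors):
            raise ValueError(f"mirror number out of range for address {addr}")
        corr[addr] = mirrors
    return corr


if __name__ == "__main__":
    first_correspondence = build_first_correspondence([2, 3, 4], S, ASSUMED_MIRRORS)
    for addr, mirrors in first_correspondence.items():
        print(f"logical address {addr} -> mirror nodes {mirrors}")
```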
  • The first storage node sends the logical address of the data to be written and the data to be written to the mirror node.
  • Specifically, the first storage node may send a second write request to each mirror node. The second write request includes mirrored data and the logical address of the mirrored data, where the logical address of the mirrored data is the logical address that corresponds to that mirror node in the first correspondence.
  • The mirror node can save the mirrored data contained in the received second write request in its own cache.
  • The mirror node may then send a response message to the first storage node to notify it that the mirrored data in the second write request has been stored in the cache of the mirror node.
  • For example, the data to be written in the first write request is data A, data B, and data C, where data A corresponds to logical address 2, data B corresponds to logical address 3, and data C corresponds to logical address 4.
  • In the second write request sent to mirror node 5, the logical addresses of the mirrored data are logical address 2 and logical address 3, and the mirrored data is data A and data B.
  • In the second write request sent to mirror node 6, the logical addresses of the mirrored data are logical address 2, logical address 3, and logical address 4, and the mirrored data is data A, data B, and data C.
  • In the second write request sent to mirror node 7, the logical address of the mirrored data is logical address 4, and the mirrored data is data C.
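  • Turning the first correspondence into one second write request per mirror node is a simple inversion of the mapping. The sketch below, with hypothetical names and plain strings standing in for data A, B, and C, reproduces the three requests listed above.

```python
from collections import defaultdict


def build_second_write_requests(first_corr, data_by_addr):
    """Group (logical address, data) pairs by mirror node.

    first_corr maps each sub-logical address to a tuple of mirror-node numbers;
    data_by_addr maps each sub-logical address to its cached data.
    """
    requests = defaultdict(list)
    for addr in sorted(first_corr):
        for mirror in first_corr[addr]:
            requests[mirror].append((addr, data_by_addr[addr]))
    return dict(requests)


if __name__ == "__main__":
    first_corr = {2: (5, 6), 3: (5, 6), 4: (6, 7)}
    data = {2: "A", 3: "B", 4: "C"}  # placeholders for data A, B and C
    for mirror, payload in sorted(build_second_write_requests(first_corr, data).items()):
        print(f"mirror node {mirror}: {payload}")
```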
  • The first storage node sends a response message to the host.
  • The first storage node may send a response message to the host to notify it that the data to be written in the first write request has been stored in the distributed RAID system.
  • The host may send multiple write requests to the first storage node.
  • The first storage node may process the received write requests in parallel according to its processing capability, performing steps 301 to 306 when processing each write request.
  • The data can be persisted in the background. Specifically, the following steps 307 and 308 can be performed:
  • The first storage node selects N target data blocks from its own cache according to the first correspondence, where the mirror nodes of the N target data blocks are different from each other.
  • Each target data block consists of the data corresponding to S consecutive logical addresses.
  • Specifically, the first storage node may first determine the sub-logical addresses that have the same mirror node in the first correspondence. A sub-logical address may have one or more mirror nodes; if it has several, the first storage node may first select one of them. In this way, the first storage node can select N mutually different mirror nodes, select from the sub-logical addresses corresponding to these N mirror nodes the sub-logical addresses that make up one data block for each, and take the data corresponding to the selected sub-logical addresses as the N target data blocks.
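  • One way to pick N target data blocks whose mirror nodes are pairwise different is a greedy pass over the mirror nodes, sketched below. The greedy order (lowest mirror number first, lowest block first) is an assumption; it can miss a valid selection that a bipartite matching would find, and the application does not prescribe either approach.

```python
from collections import defaultdict


def select_target_blocks(first_corr, s, n):
    """Return {mirror node: block index X} for N blocks on distinct mirrors, or None.

    first_corr maps each cached sub-logical address to its mirror-node numbers;
    a block with index X covers sub-logical addresses X*S .. X*S + S - 1.
    """
    blocks_by_mirror = defaultdict(set)
    for addr, mirrors in first_corr.items():
        for mirror in mirrors:
            blocks_by_mirror[mirror].add(addr // s)

    chosen, used_blocks = {}, set()
    # Assumed greedy order: lowest mirror number first, then lowest block index.
    for mirror in sorted(blocks_by_mirror):
        free = sorted(blocks_by_mirror[mirror] - used_blocks)
        if free:  # this mirror still holds a block no other mirror has claimed
            chosen[mirror] = free[0]
            used_blocks.add(free[0])
        if len(chosen) == n:
            return chosen
    return None  # fewer than N mirror nodes with an unclaimed block


if __name__ == "__main__":
    corr = {2: (5, 6), 3: (5, 6), 4: (6, 7)}
    print(select_target_blocks(corr, 4, 2))
```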
  • when the first storage node selects the sub-logical addresses constituting a data block, it may first determine which sub-logical addresses are missing from the data block, and then determine whether each missing sub-logical address is recorded in the second correspondence.
  • the second correspondence maps the logical address of data written to the hard disk of the mirror node to the physical address at which that data is written on the mirror node's hard disk. If the missing sub-logical address does not exist in the second correspondence, the first storage node has not written data for that address to the hard disk, so it may set the data corresponding to the missing sub-logical address to 0 to constitute the target data block, and record the missing sub-logical address into the first correspondence.
  • if the missing sub-logical address exists in the second correspondence, the first storage node has already stored the data corresponding to it to the hard disk of the mirror node. In this case, the first storage node may obtain that data from the mirror node according to the second correspondence to constitute the target data block, and record the missing sub-logical address into the first correspondence.
  • the first correspondence stored by the first storage node 1 is as shown in Table 2.
  • first storage node 1 can determine from Table 2 the sub-logical addresses corresponding to mirror nodes 2, 3, 4, and 7, respectively. First storage node 1 can then select three mutually different mirror nodes, such as the mirror nodes numbered 2, 3, and 4, and select, from the sub-logical addresses corresponding to these three mirror nodes, the sub-logical addresses constituting each data block. While doing so, first storage node 1 determines that sub-logical address 5 is missing, supplements the data corresponding to logical address 5 to form a target data block, and records the missing logical address 5 into the first correspondence.
  • the three target data blocks selected by first storage node 1 are: the data corresponding to logical addresses 3, 4, and 5; the data corresponding to logical addresses 9, 10, and 11; and the data corresponding to logical addresses 0, 1, and 2.
  • the first storage node sends a notification message to the N mirror nodes corresponding to the N target data blocks.
  • the notification message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and instructs that mirror node to store the data block in its cache that corresponds to the logical address in the notification message to its own hard disk.
  • the first storage node may send a notification message to the N mirror nodes corresponding to the N target data blocks, respectively.
  • each mirror node may first determine whether the data corresponding to the logical address in the notification message is stored in its own cache. If it is, the mirror node may store the data block corresponding to that logical address to its own hard disk. If the data corresponding to some logical address in the notification message is not in its cache, the mirror node may determine whether the logical address of the missing data is recorded in its mapping between the logical addresses of data written to the hard disk and the physical addresses written on the hard disk.
  • if it is not recorded, the mirror node may set the data corresponding to the logical address of the missing data to 0 to constitute the target data block. If it is recorded, the mirror node can obtain the data corresponding to that logical address according to the mapping between the logical addresses of data written to the hard disk and the physical addresses written on the hard disk, to constitute the target data block. The mirror node then stores the data block corresponding to the logical address in the notification message to its own hard disk.
  • first storage node 1 may send notification messages to the storage nodes numbered 2, 3, and 4, respectively. The notification message sent to storage node 2 includes logical addresses 3, 4, and 5; the one sent to storage node 3 includes logical addresses 9, 10, and 11; and the one sent to storage node 4 includes logical addresses 0, 1, and 2.
  • the first storage node calculates M check data blocks according to the N target data blocks.
  • the first storage node may apply the EC algorithm to the data of the N target data blocks to obtain M check data blocks.
  • the first storage node sends the M check data blocks to storage nodes other than the N mirror nodes for storage.
  • the first storage node may select M storage nodes other than the N mirror nodes and send the M check data blocks to these M storage nodes respectively, so that each of them stores the received check data block in its own hard disk. The N target data blocks stored in the hard disks of the N storage nodes and the M check data blocks stored in the hard disks of the M storage nodes then form a stripe. If the data of Y of the N target data blocks in the stripe is corrupted or lost, it can be recovered from the data of the remaining N-Y uncorrupted target data blocks and the data of the M check data blocks.
  • the first storage node deletes the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence, deletes the N target data blocks from its own cache, and sends indication messages to the N mirror nodes corresponding to the N target data blocks, respectively.
  • at this point, the data of the N target data blocks is stored in the hard disks of N storage nodes of the distributed RAID system and the data of the M check data blocks is stored in the hard disks of M storage nodes, which indicates that the data of the N target data blocks has been reliably stored in the distributed RAID system.
  • the first storage node may delete the correspondence between the sub-logical address of the N target data blocks and the mirror node from the first correspondence.
  • the first storage node may delete the N target data blocks from its own cache and send indication messages to the N mirror nodes corresponding to the N target data blocks. Each indication message includes the logical address of the target data block corresponding to the mirror node to which it is sent, and instructs that mirror node to delete the data block corresponding to that logical address from its own cache.
  • the logical addresses included in the indication message sent by the first storage node to any one of the N mirror nodes are the same as the logical addresses included in the notification message that the first storage node sent to that mirror node in step 308.
  • the receiving node determines the mirror node according to the logical address of the data to be written and sends the data to be written and its logical address to the mirror node, so that the N target data blocks are stored in the N mirror nodes respectively. The receiving node can then directly send the N mirror nodes notification messages containing the logical addresses of the target data blocks, instructing each mirror node to write the data block corresponding to the logical address to its hard disk. In the prior art, a storage node needs to send the data to be written across nodes twice; here, the receiving node sends the data across nodes only once, when sending it to the mirror node. Because the logical addresses included in the notification messages are small and consume little bandwidth, system I/O bandwidth is improved compared with the prior art.
  • the first storage node divides each sub-logical address by the data block length and rounds the result to an integer, so that one whole data block resides on a single mirror node.
  • the result of each hash calculation is divided by K, and the mirror node number is obtained from the remainder, so that the computed mirror node is always a storage node in the distributed RAID system.
  • FIG. 4 is a flowchart of another data storage method according to an embodiment of the present application. As shown in FIG. 4, the method may include:
  • in the following description, the mirror node is taken as the second storage node as an example.
  • the second storage node receives a first write request sent by at least one receiving node.
  • the first write request includes the mirrored data and the logical address of the mirrored data.
  • the receiving node is the node that receives a second write request from the host, and the second write request includes the data to be written and the logical address of the data to be written.
  • the mirror node is determined by the receiving node according to the logical address of the data to be written, and the mirrored data is the data that the receiving node determines is to be written to the mirror node.
  • the logical address of the mirrored data includes at least one sub-logical address.
  • the second storage node writes the mirrored data into its own cache, and records the correspondence between the logical address of the mirrored data and the receiving node.
  • the second storage node may save the mirrored data in the received first write request in its own cache, and record the correspondence between each sub-logical address in the logical address of the mirrored data and the receiving node.
  • the second storage node selects N target data blocks from its own cache according to the correspondence between the logical address of the mirrored data and the receiving node, and the receiving nodes of the N target data blocks are different from each other.
  • the second storage node may first identify, in the correspondence between the logical address of the mirrored data and the receiving node, the sub-logical addresses that share the same receiving node. Then, the second storage node may select N different receiving nodes and, from the sub-logical addresses corresponding to these N receiving nodes, select the sub-logical addresses constituting one data block each. The data corresponding to the selected sub-logical addresses forms the N target data blocks.
  • for a detailed description of how the second storage node selects the sub-logical addresses constituting a data block, refer to the description in step 307 of FIG. 3 of how the first storage node selects them; details are not repeated here.
  • Steps 404-407: the descriptions of steps 404-407 in this embodiment are similar to those of steps 308-311 in FIG. 3 of another embodiment of the present application. For details of steps 404-407, refer to the descriptions of steps 308-311 in FIG. 3; they are not repeated in this embodiment.
  • the mirror node writes the mirrored data in the received first write request to its cache and records the correspondence between the logical address of the mirrored data and the receiving node, so that the N target data blocks are stored in the N receiving nodes respectively. The mirror node can then directly send the N receiving nodes notification messages containing the logical addresses of the target data blocks, instructing each receiving node to write the data block corresponding to the logical address to its hard disk. In the prior art, a storage node needs to send the data to be written across nodes twice; here, the data is sent across nodes only once, and because the logical addresses included in the notification messages are small, little bandwidth is consumed. System I/O bandwidth is therefore improved compared with the prior art.
  • the storage node includes corresponding hardware structures and/or software modules for performing various functions.
  • in combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
  • in the embodiments of the present application, the storage node may be divided into functional modules according to the foregoing method examples.
  • each function module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the modules in the embodiment of the present application is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • FIG. 5 is a schematic diagram of a possible composition of the storage node involved in the foregoing embodiment.
  • the storage node may include: a receiving unit 51, a storage unit 52, a determining unit 53, a sending unit 54, and a selecting unit 55.
  • the receiving unit 51 is configured to support the storage node to perform step 301 in the data storage method shown in FIG. 3.
  • the storage unit 52 is configured to support the storage node to perform step 302 and step 304 in the data storage method shown in FIG. 3.
  • the determining unit 53 is configured to support the storage node to perform step 303 in the data storage method shown in FIG. 3.
  • the sending unit 54 is configured to support the storage node in performing steps 305, 306, 308, and 310 in the data storage method shown in FIG. 3, and in sending, as described in step 311, the indication messages to the N mirror nodes corresponding to the N target data blocks.
  • the selecting unit 55 is configured to support the storage node to perform step 307 in the data storage method shown in FIG. 3.
  • the storage node may further include: a calculating unit 56 and a deleting unit 57.
  • the calculating unit 56 is configured to support the storage node to perform step 309 in the data storage method shown in FIG. 3.
  • the deleting unit 57 is configured to support the storage node in deleting, as described in step 311 in the data storage method shown in FIG. 3, the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence, and in deleting the N target data blocks from its own cache.
  • the storage node provided by the embodiment of the present application is used to execute the data storage method in FIG. 3 above, so that the same effect as the above data storage method can be achieved.
  • FIG. 7 is a schematic diagram of a possible composition of the storage node involved in the foregoing embodiment.
  • the storage node may include: a receiving unit 61, a storage unit 62, a selecting unit 63, and a sending unit 64.
  • the receiving unit 61 is configured to support the storage node to perform step 401 in the data storage method shown in FIG. 4.
  • the storage unit 62 is configured to support the storage node to perform step 402 in the data storage method shown in FIG. 4.
  • the selecting unit 63 is configured to support the storage node to perform step 403 in the data storage method shown in FIG. 4.
  • the sending unit 64 is configured to support the storage node in performing steps 404 and 406 in the data storage method shown in FIG. 4, and in sending, as described in step 407, the indication messages to the N receiving nodes corresponding to the N target data blocks.
  • the storage node provided in the embodiment of the present application is used to execute the data storage method in FIG. 4 above, so that the same effect as the above data storage method can be achieved.
  • FIG. 8 shows another possible composition diagram of the storage node involved in the above embodiment.
  • the storage node includes a processing module 71 and a communication module 72.
  • the processing module 71 is configured to control and manage the action of the storage node.
  • the processing module 71 is configured to support the storage node in performing steps 303, 307, and 309 in FIG. 3; in deleting, as described in step 311, the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence and deleting the N target data blocks from its own cache; in processing according to the correspondence between the logical address and the receiving node; and/or in other processes of the techniques described herein.
  • the communication module 72 is configured to support communication between the storage node and other network entities, such as the host and other storage nodes in the distributed RAID system.
  • the communication module 72 is configured to support the storage node in performing steps 301, 305, 306, 308, 310, and 311 in FIG. (in step 311, sending the indication messages to the N mirror nodes corresponding to the N target data blocks), and steps 401, 404, 406, and 407 shown in FIG. 4 (in step 407, sending the indication messages to the N receiving nodes corresponding to the N target data blocks).
  • the storage node may further include a storage module 73 for storing program code and data of the storage node.
  • the storage module 73 is configured to support the storage node to perform step 302, step 304 in FIG. 3, and step 402 shown in FIG.
  • the processing module 71 can be the processor or controller in FIG. 2. It is possible to implement or carry out the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor can also be a combination of computing functions, for example, including one or more microprocessor combinations, a combination of a DSP and a microprocessor, and the like.
  • the communication module 72 can be the communication interface or the like in FIG.
  • the storage module 73 can be the memory in FIG.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data storage method and device, relating to the technical field of computers, which solve the problem of low system I/O bandwidth caused by data being sent across nodes twice. In a particular solution, a receiving node receives a first write request comprising data to be written and the logical address of the data to be written, stores the data to be written in its own cache, determines a mirror node according to the logical address of the data to be written, and records a first correspondence, the first correspondence being the mapping between the logical address of the data to be written and the mirror node. The receiving node sends the data to be written and its logical address to the mirror node, selects N target data blocks from its own cache according to the first correspondence, the mirror nodes of the N target data blocks being different from one another, and sends a notification message to each of the N mirror nodes corresponding to the N target data blocks.

Description

Data storage method and device
Technical field
The embodiments of the present application relate to the field of computer technologies, and in particular, to a data storage method and device.
Background
A distributed redundant array of independent disks (RAID) system generally consists of multiple storage nodes interconnected through a network. When data is stored in a distributed RAID system, a redundancy algorithm such as an erasure code (EC) algorithm can be used to store the data redundantly and ensure its reliability. To improve system input/output (I/O) performance, a data cache mechanism can also be implemented in the distributed RAID system, allowing storage nodes to read and write data quickly from the cache.
In the prior art, after receiving a write request, a local storage node may send the write request to other storage nodes in the distributed RAID system, so that the data in the write request is stored in the caches of at least two storage nodes, including the local storage node, to ensure the reliability of the data; the local storage node can then reply to the write request. After receiving multiple write requests, the local storage node may determine N data blocks from the data in its cache that has not yet been written to the hard disk, calculate M check data blocks using the EC algorithm, and then send the N data blocks and the M check data blocks to N+M storage nodes respectively, so that each of the N+M storage nodes stores the data in the received data block in its own hard disk.
The prior art has at least the following technical problem: in the process of storing data to the hard disk, data is sent across nodes when the local storage node sends the write request to other storage nodes, and again when the local storage node sends the N data blocks to the N storage nodes. The data is therefore sent across nodes twice, resulting in low system I/O bandwidth.
Summary
The embodiments of the present application provide a data storage method and device, which solve the problem of low system I/O bandwidth caused by data being sent across nodes twice.
To achieve the foregoing objective, the following technical solutions are used in this application:
In a first aspect, the present application provides a data storage method applied to a distributed RAID system that includes K storage nodes, where K is an integer greater than 0. The method may include: a receiving node receives a first write request that includes data to be written and the logical address of the data to be written, stores the data to be written in its own cache, determines a mirror node according to the logical address of the data to be written, and records a first correspondence, which is the mapping between the logical address of the data to be written and the mirror node. The receiving node sends the data to be written and its logical address to the mirror node, and selects N target data blocks from its own cache according to the first correspondence, where the mirror nodes of the N target data blocks are different from each other. The receiving node then sends a notification message to each of the N mirror nodes corresponding to the N target data blocks. Each notification message includes the logical address of the target data block corresponding to the mirror node to which it is sent, and instructs that mirror node to store the data block in its cache that corresponds to that logical address to its own hard disk. The receiving node is any one of the K storage nodes, and N is an integer greater than 0 and less than K.
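The receiving node's write path in the first aspect can be sketched as follows. This is a minimal model, not the patented implementation. `ReceivingNode`, `mirror_of`, and `send` are hypothetical names. The sketch assumes each sub-logical address carries one cached data unit, that the mirror-node function is injected rather than computed, and that the cross-node transfer is a plain callback invoked per sub-address instead of once per request.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReceivingNode:
    """Hypothetical model of a receiving node's write path."""

    mirror_of: Callable[[int], int]          # sub-logical address -> mirror node number
    send: Callable[[int, int, bytes], None]  # (mirror node, sub-address, data) cross-node transfer
    cache: dict[int, bytes] = field(default_factory=dict)     # own cache
    first_corr: dict[int, int] = field(default_factory=dict)  # first correspondence

    def handle_write(self, start_addr: int, units: list[bytes]) -> str:
        """Handle a first write request covering consecutive sub-logical addresses."""
        for offset, data in enumerate(units):
            addr = start_addr + offset
            self.cache[addr] = data       # store the data to be written in own cache
            node = self.mirror_of(addr)   # determine the mirror node
            self.first_corr[addr] = node  # record the first correspondence
            self.send(node, addr, data)   # payload crosses nodes once, to the mirror
        return "ok"                       # response message to the host
```

In this model, `send` is the only call that moves payload to another node. Each data unit therefore crosses the network once on the write path, and the later notification messages carry only addresses.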
In the data storage method provided in the embodiments of the present application, the receiving node determines the mirror node according to the logical address of the data to be written and sends the data to be written and its logical address to the mirror node, so that the N target data blocks are stored in the N mirror nodes respectively. The receiving node can then directly send the N mirror nodes notification messages containing the logical addresses of the target data blocks, instructing each mirror node to write the data block corresponding to that logical address to its hard disk. In the prior art, a storage node must send the data to be written across nodes twice; here, the receiving node sends the data across nodes only once, when sending it to the mirror node. Because the logical addresses carried in the notification messages are small and occupy little bandwidth, system I/O bandwidth is improved compared with the prior art.
With reference to the first aspect, in a possible implementation, the logical address of the data to be written includes at least one sub-logical address. Determining the mirror node by the receiving node according to the logical address of the data to be written may specifically include the following. The receiving node uses the formula X = Int(sub-logical address / data block length): it divides each sub-logical address in the logical address of the data to be written by the data block length and rounds the result, obtaining an integer X for each sub-logical address. The receiving node then hashes each integer X with a preconfigured hash algorithm, takes the remainder of each hash result, and obtains the number of the mirror node from the remainder. The data block length is the number of sub-logical addresses included in each data block.
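The formula above can be sketched in Python as follows. The sketch assumes that SHA-256 over the 8-byte big-endian encoding of X stands in for the unspecified preconfigured hash algorithm. It also assumes the remainder is taken modulo K, the number of storage nodes, and that nodes are numbered 0 to K-1. `mirror_node` is a hypothetical name.

```python
import hashlib


def mirror_node(sub_address: int, block_len: int, k: int) -> int:
    """Return the mirror node number (0..k-1) for one sub-logical address."""
    x = sub_address // block_len                        # X = Int(sub-address / block length)
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    hashed = int.from_bytes(digest[:8], "big")          # assumed hash of X, as an integer
    return hashed % k                                   # remainder selects the node number


# Sub-addresses 3, 4 and 5 share X = 1 when a block holds 3 sub-addresses,
# so they always land on the same mirror node.
print([mirror_node(a, block_len=3, k=8) for a in (3, 4, 5)])
```

Every sub-logical address within one data block has the same X. A whole block therefore always maps to a single mirror node, which is what the integer division achieves.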
With reference to the first aspect and the foregoing possible implementation, in another possible implementation, when the receiving node hashes each integer X with the preconfigured hash algorithm, if the preset number of mirror nodes is 1, the integer X is hashed with the preconfigured hash algorithm.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, when the receiving node hashes each integer X with the preconfigured hash algorithm, if the preset number of mirror nodes is greater than 1, the integer X is computed with different kinds of preconfigured hash algorithms.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, when the receiving node hashes each integer X with the preconfigured hash algorithm, if the preset number of mirror nodes is greater than 1, the integer X is first computed with one preconfigured hash algorithm, and the preset number of other results is then obtained from that result.
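The two multi-mirror variants can be sketched as follows. The source names no hash algorithms and does not say how the "other results" are derived from the first one. The hashlib algorithms below are assumed stand-ins. Deriving further nodes as consecutive node numbers modulo K is also an assumption. `HASH_NAMES`, `hash_int`, `mirrors_distinct_hashes`, and `mirrors_derived` are hypothetical names.

```python
import hashlib

# Assumed stand-ins for the preconfigured hash algorithms (the source names none).
HASH_NAMES = ("sha256", "sha1", "md5")


def hash_int(name: str, x: int) -> int:
    """Hash the integer X with the named hashlib algorithm and return an int."""
    digest = hashlib.new(name, x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def mirrors_distinct_hashes(x: int, count: int, k: int) -> list[int]:
    """Variant 1: apply a different hash algorithm for each mirror copy."""
    if count > len(HASH_NAMES):
        raise ValueError("not enough hash algorithms configured")
    return [hash_int(name, x) % k for name in HASH_NAMES[:count]]


def mirrors_derived(x: int, count: int, k: int) -> list[int]:
    """Variant 2: hash once, then derive the remaining results.

    Taking the following node numbers modulo k is an assumption; the source
    does not say how the other results are derived.
    """
    if count > k:
        raise ValueError("more mirror copies than storage nodes")
    first = hash_int("sha256", x) % k
    return [(first + i) % k for i in range(count)]
```

In variant 1, the independent hashes can collide on the same node number, and the source does not describe how such collisions are resolved. Variant 2, as sketched, always yields distinct nodes as long as `count` does not exceed K.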
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the logical address of the data to be written includes at least one sub-logical address, and the first correspondence records the mirror node corresponding to each sub-logical address. Selecting, by the receiving node, N target data blocks from the cache according to the first correspondence may specifically include the following. The receiving node identifies, in the first correspondence, the sub-logical addresses that have the same mirror node. It then selects, from the sub-logical addresses corresponding to N different mirror nodes, the sub-logical addresses that constitute one data block each. The data corresponding to the selected sub-logical addresses constitutes the N target data blocks.
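Selecting N target data blocks with pairwise different mirror nodes can be sketched as a group-by over the first correspondence. The sketch assumes the first correspondence is a plain dict from sub-logical address to mirror node, and that a block is identified by X = sub-logical address // block length. It also assumes that when one mirror node has several candidate blocks, the one with the lowest X is taken; the source specifies no tie-break. `select_target_blocks` is a hypothetical name. A selected block may still be missing some sub-logical addresses.

```python
from collections import defaultdict


def select_target_blocks(first_corr: dict[int, int], n: int, block_len: int) -> dict[int, list[int]]:
    """Choose up to n cached blocks whose mirror nodes are pairwise different.

    first_corr maps each cached sub-logical address to its mirror node.
    Returns {mirror node: sorted sub-logical addresses of its chosen block}.
    """
    # Group cached sub-addresses by (mirror node, block index X).
    groups: dict[tuple[int, int], list[int]] = defaultdict(list)
    for addr, node in sorted(first_corr.items()):
        groups[(node, addr // block_len)].append(addr)

    chosen: dict[int, list[int]] = {}
    for (node, _x), addrs in sorted(groups.items()):
        if node in chosen:
            continue  # at most one target block per mirror node
        chosen[node] = addrs
        if len(chosen) == n:
            break
    return chosen
```

Consider a hypothetical first correspondence `{3: 2, 4: 2, 9: 3, 10: 3, 11: 3, 0: 4, 1: 4, 2: 4}` with N = 3 and a block length of 3. The function returns `{2: [3, 4], 3: [9, 10, 11], 4: [0, 1, 2]}`, so sub-logical address 5 is still missing from the block on mirror node 2. If fewer than N distinct mirror nodes are available, fewer blocks are returned.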
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, when selecting, from the sub-logical addresses corresponding to the N different mirror nodes, the sub-logical addresses that make up a data block, the receiving node determines which sub-logical addresses are missing from the data block and checks whether each missing sub-logical address is recorded in a second correspondence, namely the correspondence between the logical addresses of data written to a mirror node's hard disk and the physical addresses on that hard disk. If it is not recorded, the receiving node sets the data for the missing sub-logical address to 0 to complete the target data block and records the missing sub-logical address in the first correspondence. If it is recorded, the receiving node uses the second correspondence to fetch the data for the missing sub-logical address from that mirror node to complete the target data block, and adds the missing sub-logical address to the first correspondence.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the method may further include: the receiving node computes M check data blocks from the N target data blocks and sends the M check data blocks to storage nodes other than the N mirror nodes for storage.
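The text does not specify the code used to compute the M check data blocks. As a minimal sketch, assuming M = 1 and plain XOR parity (the RAID-5-style case; M > 1 would need an erasure code such as Reed-Solomon, which is not shown), the check block and the recovery of a single lost block could look like this:

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Return one check block: the bytewise XOR of N equal-length target data blocks."""
    if not blocks:
        raise ValueError("need at least one data block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all data blocks must have the same length")
    return bytes(reduce(lambda acc, blk: [x ^ y for x, y in zip(acc, blk)], blocks, [0] * size))


def recover_lost_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one lost data block: XOR of the surviving blocks and the check block."""
    return xor_parity(surviving + [parity])


blocks = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]  # N = 3 target data blocks
parity = xor_parity(blocks)                        # would go to a node outside the N mirrors
print(parity.hex())                                # 152a
```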
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, after the receiving node sends the M check data blocks to storage nodes other than the N mirror nodes for storage, the method may further include: the receiving node deletes, from the first correspondence, the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes. The receiving node also deletes the N target data blocks from its own cache and sends an indication message to each of the N mirror nodes corresponding to the N target data blocks. Each indication message carries the logical address of the target data block corresponding to the mirror node it is sent to, and instructs that mirror node to delete the data block corresponding to that logical address from its cache.
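The clean-up after the check blocks have been stored can be sketched as follows. The dictionary layouts (first correspondence as sub-logical address to a tuple of mirror node numbers, cache as address to bytes) and the indication message format are hypothetical; the text does not define them.

```python
def release_target_blocks(first_corr: dict, cache: dict, targets: dict) -> dict:
    """
    first_corr: sub-logical address -> tuple of mirror node numbers (first correspondence)
    cache:      sub-logical address -> bytes held in the receiving node's own cache
    targets:    mirror node -> sub-logical addresses of the target block it holds
    Returns one hypothetical indication message per mirror node.
    """
    indications = {}
    for mirror, addrs in targets.items():
        for addr in addrs:
            first_corr.pop(addr, None)  # forget the address -> mirror node mapping
            cache.pop(addr, None)       # drop the target data from the local cache
        # Tell the mirror node to delete its cached copy of these addresses.
        indications[mirror] = {"type": "delete", "logical_addresses": sorted(addrs)}
    return indications


first_corr = {3: (2,), 4: (2,), 5: (2,), 15: (7,)}
cache = {3: b"a", 4: b"b", 5: b"\x00", 15: b"z"}
msgs = release_target_blocks(first_corr, cache, {2: [3, 4, 5]})
print(msgs)        # {2: {'type': 'delete', 'logical_addresses': [3, 4, 5]}}
print(first_corr)  # {15: (7,)}
```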
In a second aspect, this application provides a data storage method applied to a distributed RAID system that includes K storage nodes, where K is an integer greater than 0. The method may include: a mirror node receives a first write request sent by at least one receiving node, where the first write request includes mirror data and the logical address of the mirror data; the mirror node writes the mirror data into its own cache and records the correspondence between the logical address of the mirror data and the receiving node. The mirror node then selects N target data blocks in its own cache according to the correspondence between the mirror data and the receiving nodes, where the receiving nodes of the N target data blocks are all different, and sends a notification message to each of the N receiving nodes corresponding to the N target data blocks. Each notification message includes the logical address of the target data block corresponding to the receiving node it is sent to, and instructs that receiving node to write the data block in its own cache that corresponds to that logical address to the hard disk. Here, a receiving node is a node that has received a second write request from a host, the second write request including data to be written and the logical address of that data; the mirror node is determined by the receiving node from the logical address of the data to be written; the mirror data is the data that the receiving node has determined should be written to that mirror node; and N is an integer greater than 0 and less than K.
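On the mirror-node side of the second aspect, the bookkeeping might look like the following sketch. The class, its method names, and the rule for choosing a block (the lowest cached address per receiving node) are hypothetical, and padding of incomplete blocks is left out.

```python
class MirrorNode:
    """Hypothetical mirror-node state: cached mirror data plus, for every logical
    address, the receiving node that sent it."""

    def __init__(self) -> None:
        self.cache: dict[int, bytes] = {}  # logical address -> mirror data
        self.source: dict[int, int] = {}   # logical address -> receiving node number

    def on_write_request(self, receiving_node: int, mirror_data: dict[int, bytes]) -> None:
        """Cache the mirror data and record which receiving node each address came from."""
        for addr, data in mirror_data.items():
            self.cache[addr] = data
            self.source[addr] = receiving_node

    def notifications(self, n: int, block_len: int) -> dict[int, list[int]]:
        """Choose up to n target blocks whose receiving nodes all differ and return,
        per receiving node, the logical addresses to put in its notification message."""
        chosen: dict[int, int] = {}  # receiving node -> block index
        for addr in sorted(self.source):
            node = self.source[addr]
            if node not in chosen and len(chosen) < n:
                chosen[node] = addr // block_len
        return {
            node: [a for a in range(blk * block_len, (blk + 1) * block_len)
                   if self.source.get(a) == node]
            for node, blk in chosen.items()
        }


m = MirrorNode()
m.on_write_request(1, {0: b"a", 1: b"b", 2: b"c"})
m.on_write_request(3, {6: b"x"})
print(m.notifications(n=2, block_len=3))  # {1: [0, 1, 2], 3: [6]}
```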
In the data storage method provided by this embodiment of this application, the mirror node writes the mirror data from the received first write request into its cache and records the correspondence between the logical address of the mirror data and the receiving node. Because the N target data blocks are therefore already stored on the N receiving nodes, the mirror node can simply send the N receiving nodes notification messages carrying the logical addresses of the target data blocks, instructing each receiving node to write the data block corresponding to that logical address to its hard disk. Whereas in the prior art a storage node has to send the data to be written across nodes twice, here the data crosses nodes only once; and because the logical addresses carried in the notification messages are small and occupy little bandwidth, the system I/O bandwidth is improved compared with the prior art.
In a third aspect, this application provides a receiving node, which may include modules capable of implementing the method of the first aspect and its implementations.
In a fourth aspect, this application provides a mirror node, which may include modules capable of implementing the method of the second aspect.
In a fifth aspect, this application provides a storage node that includes at least one processor, a memory, a communication interface, and a communication bus. The at least one processor is connected to the memory and the communication interface through the communication bus. The memory stores computer-executable instructions; when the storage node runs, the processor executes the computer-executable instructions stored in the memory so that the storage node performs the data storage method of the first aspect or any possible implementation of the first aspect, or the data storage method of the second aspect.
In a sixth aspect, this application provides a computer storage medium storing computer-executable instructions that, when executed by a processor, implement the data storage method of the first aspect or any possible implementation of the first aspect, or the data storage method of the second aspect.
In a seventh aspect, this application further provides a computer program product that, when run on a computer, causes the computer to perform the method of the first aspect or the second aspect.
In an eighth aspect, this application further provides a communication chip storing computer-executable instructions that, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect.
It can be understood that any of the apparatuses, computer storage media, or computer program products provided above is used to perform the corresponding method provided above. For the beneficial effects they can achieve, refer to those of the corresponding method; details are not repeated here.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is an architecture diagram to which the embodiments of this application can be applied;
FIG. 2 is a schematic diagram of the composition of a storage node according to an embodiment of this application;
FIG. 3 is a flowchart of a data storage method according to an embodiment of this application;
FIG. 4 is a flowchart of another data storage method according to an embodiment of this application;
FIG. 5 is a schematic diagram of the composition of another storage node according to an embodiment of this application;
FIG. 6 is a schematic diagram of the composition of another storage node according to an embodiment of this application;
FIG. 7 is a schematic diagram of the composition of another storage node according to an embodiment of this application;
FIG. 8 is a schematic diagram of the composition of another storage node according to an embodiment of this application.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 is an architecture diagram to which the embodiments of this application can be applied. As shown in FIG. 1, the architecture may include a host 11 and a distributed RAID system 12.
The host 11 is configured to send a first write request to any storage node in the distributed RAID system 12 when it needs to store data in the distributed RAID system 12, and to receive the response message returned by that storage node. The response message notifies the host 11 that the data to be stored in the first write request has been stored in the distributed RAID system 12.
The distributed RAID system 12 consists of K storage nodes interconnected over a network and provides large-scale storage space, where K is an integer greater than 1. In a specific implementation, each storage node in the distributed RAID system 12 may be a server.
It should be noted that, in the embodiments of this application, different numbers may be used to identify the different storage nodes in the distributed RAID system 12 so that they can be told apart.
FIG. 2 is a schematic diagram of the composition of a storage node according to an embodiment of this application. As shown in FIG. 2, the storage node may include at least one processor 21, a memory 22, a communication interface 23, and a communication bus 24.
The components of the storage node are described in detail below with reference to FIG. 2:
The processor 21 is the control center of the storage node and may be a single processor or a collective term for multiple processing elements. For example, the processor 21 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application, such as one or more digital signal processors (DSPs) or one or more field-programmable gate arrays (FPGAs).
In a specific implementation, as an embodiment, the storage node may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2. Also, as an embodiment, the storage node may include multiple processors, such as the processor 21 and the processor 25 shown in FIG. 2. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, the processor 21 may perform the various functions of the storage node by running or executing software programs stored in the memory 22 and invoking data stored in the memory 22. For example, the processor 21 may run computer program code stored in the memory 22 to perform the data storage method provided in this application, saving the data to be stored in a write request to the hard disks of the distributed RAID system.
The memory 22 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 22 may exist independently and be connected to the processor 21 through the communication bus 24, or it may be integrated with the processor 21.
In a specific implementation, the memory 22 is used to store the data in this application and the software programs that carry out this application. For example, the memory 22 may store the computer program code corresponding to the data storage method provided in the embodiments of this application. In the embodiments of this application, the memory 22 may also include a cache and hard disks for storing the data to be stored in write requests.
The communication interface 23 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as a host, a radio access network (RAN), or a wireless local area network (WLAN). The communication interface 23 may include a receiving unit that implements the receiving function and a sending unit that implements the sending function.
The communication bus 24 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 2, but this does not mean that there is only one bus or only one type of bus.
In the prior art, data must cross nodes when the local storage node sends a write request to other storage nodes, and must cross nodes again when N data blocks are sent to N storage nodes, so the data is sent across nodes twice and the system I/O bandwidth is low. To solve this problem, data may be stored using the data storage method shown in FIG. 3. As shown in FIG. 3, the method may include the following steps.
It should be noted that the embodiments of this application take the case in which the receiving node is a first storage node as an example. The first storage node is any storage node in the distributed RAID system that receives a first write request sent by the host.
301. The first storage node receives a first write request sent by the host.
The first write request may include the data to be written and the logical address of the data to be written, and the logical address of the data to be written includes at least one sub-logical address. In a specific implementation, the logical address of the data to be written generally consists of a start address and a data length. For example, if the start address is 0 and the data length is 3, the logical address of the data to be written includes three sub-logical addresses: logical address 0, logical address 1, and logical address 2.
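Expanding a logical address given as (start address, data length) into its sub-logical addresses, as in the example above, amounts to a simple range. The function name below is illustrative only.

```python
def sub_logical_addresses(start: int, length: int) -> list[int]:
    """Expand a logical address given as (start address, data length) into sub-logical addresses."""
    if length <= 0:
        raise ValueError("data length must be positive")
    return list(range(start, start + length))


print(sub_logical_addresses(0, 3))  # [0, 1, 2]
```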
302. The first storage node stores the data to be written in the first write request in its own cache.
303. The first storage node determines the mirror nodes according to the logical address of the data to be written in the first write request.
The length of a data block is the number of sub-logical addresses that each data block contains. After storing the data to be written in its own cache, the first storage node may apply the formula X = Int(sub-logical address / data block length): each sub-logical address of the data to be written is divided by the data block length and rounded down, giving an integer X for each sub-logical address. The first storage node can then hash each integer X with a preconfigured hash algorithm, take the remainder of each hash result divided by the number K of storage nodes in the distributed RAID system, and obtain the number of a mirror node from that remainder.
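A minimal sketch of the X = Int(sub-logical address / data block length) step followed by hash-then-modulo-K placement. The text does not name the preconfigured hash algorithm; SHA-256 (via Python's hashlib) is used here purely as an assumed stand-in, and node numbers come out in the range 0 to K-1.

```python
import hashlib


def block_index(sub_addr: int, block_len: int) -> int:
    """X = Int(sub-logical address / data block length) for non-negative addresses."""
    return sub_addr // block_len


def preconfigured_hash(x: int) -> int:
    """Assumed stand-in for the unspecified preconfigured hash algorithm."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def mirror_node(sub_addr: int, block_len: int, k: int) -> int:
    """Hash X and take the remainder modulo the number K of storage nodes."""
    return preconfigured_hash(block_index(sub_addr, block_len)) % k


# All sub-logical addresses in one data block share X, hence the same mirror node.
S, K = 4, 16
print({mirror_node(a, S, K) for a in range(0, S)})  # a set with exactly one node number
```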
When hashing each integer X with the preconfigured hash algorithm, if the preset number of mirror nodes is one, the first storage node hashes X with the preconfigured hash algorithm and takes the remainder to obtain the mirror node number. If the preset number of mirror nodes is greater than one, then in one implementation the first storage node computes X with different preconfigured hash algorithms, as many as there are mirror nodes, and takes the remainder of each hash result to obtain the numbers of the different mirror nodes. In another implementation, the first storage node first hashes X and takes the remainder to obtain the number of one mirror node, and then determines the numbers of the other mirror nodes from that result. For example, if the preset number of mirror nodes is three, the first storage node first hashes X and takes the remainder to obtain one mirror node number, and then either uses the two node numbers immediately to the left and right of that number as the other two mirror nodes, or performs two further hash calculations on the result to obtain the numbers of the other two mirror nodes.
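The strategies for more than one mirror node can be sketched as below. The hash family (SHA-256 with a one-byte tag so the variants differ) is an assumption, as is reading "performs two further hash calculations on the result" as re-hashing the previous hash value. None of these functions handles duplicate node numbers, which the text does not discuss.

```python
import hashlib


def _hash(x: int, variant: int = 0) -> int:
    """Assumed family of preconfigured hash algorithms: SHA-256 with a one-byte variant tag."""
    digest = hashlib.sha256(bytes([variant]) + x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def mirrors_distinct_algorithms(x: int, k: int, count: int) -> list[int]:
    """One hash algorithm per mirror node; collisions are possible and not handled."""
    return [_hash(x, variant) % k for variant in range(count)]


def mirrors_with_neighbours(x: int, k: int) -> list[int]:
    """Three mirrors: the hashed node plus its left and right neighbours (assumes k >= 3)."""
    first = _hash(x) % k
    return [first, (first - 1) % k, (first + 1) % k]


def mirrors_by_rehashing(x: int, k: int, count: int) -> list[int]:
    """Hash once, then hash the previous hash value again for each further mirror node."""
    nodes, h = [], _hash(x)
    for _ in range(count):
        nodes.append(h % k)
        h = _hash(h)  # h < 2**64, so it always fits in 8 bytes
    return nodes


print(mirrors_with_neighbours(7, 16))
```

The neighbour variant guarantees three distinct nodes whenever K is at least 3, at the cost of placing a block's copies on adjacent node numbers; the hash-based variants spread copies more evenly but may need a collision rule.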
It should be noted that, in the embodiments of this application, the data block length S may be preconfigured in the first storage node so that, starting from sub-logical address 0, every S consecutive sub-logical addresses form the logical address of one data block. Because the first storage node divides each sub-logical address by S and rounds down, every sub-logical address in the same data block yields the same integer X and therefore the same mirror node; in other words, an entire data block is kept on the same mirror node.
In the embodiments of this application, the preset number of mirror nodes is determined by the reliability requirement of the distributed RAID system. For example, if the reliability requirement is that the data in the caches of two storage nodes can still be recovered when both of those caches fail, then the data on each storage node must be stored in the caches of at least three storage nodes; that is, the preset number of mirror nodes for the first storage node is two or more.
For example, assume that the sub-logical addresses included in the logical address of the data to be written are logical addresses 2, 3, and 4; the data block length S preconfigured in the first storage node is 4; two hash algorithms are preconfigured; the preset number of mirror nodes is two; and the number K of storage nodes in the distributed RAID system is 16. The first storage node divides logical addresses 2, 3, and 4 by S and rounds down, obtaining X values of 0, 0, and 1 respectively. It then hashes the integer 0 with each of the two hash algorithms and takes each result modulo 16 to obtain the mirror node numbers, assumed here to be 5 and 6; likewise, it hashes the integer 1 with each of the two hash algorithms and takes each result modulo 16, assumed here to give 6 and 7. The first storage node can thus determine that the mirror nodes for logical address 2 are storage nodes 5 and 6, the mirror nodes for logical address 3 are storage nodes 5 and 6, and the mirror nodes for logical address 4 are storage nodes 6 and 7.
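The worked example can be reproduced as follows. Because the two hash results are only assumed in the text, a lookup table (`ASSUMED_MIRRORS`, a hypothetical name) stands in for "hash, then modulo 16" rather than any real hash algorithm.

```python
S = 4  # data block length in the example
# Stand-in for the two preconfigured hash algorithms followed by "mod K" (K = 16):
# the example simply assumes these results for X = 0 and X = 1.
ASSUMED_MIRRORS = {0: (5, 6), 1: (6, 7)}

xs = {addr: addr // S for addr in (2, 3, 4)}
mirrors = {addr: ASSUMED_MIRRORS[x] for addr, x in xs.items()}
print(xs)       # {2: 0, 3: 0, 4: 1}
print(mirrors)  # {2: (5, 6), 3: (5, 6), 4: (6, 7)}
```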
304. The first storage node records the first correspondence.
After determining the mirror nodes from the logical address of the data to be written, the first storage node may save the first correspondence, which is the mapping between each sub-logical address and the numbers of its mirror nodes.
For example, continuing the example in step 303 and assuming that the first storage node is numbered 1, the first storage node 1 may save the first correspondence shown in Table 1.
Table 1
Logical address | Receiving node number | Mirror node numbers
2, 3 | 1 | 5, 6
4 | 1 | 6, 7
305. The first storage node sends the data to be written and its logical address to the mirror nodes.
After recording the first correspondence, the first storage node may send the data to be written and its logical address to the mirror nodes. Specifically, the first storage node may send a second write request to each mirror node. The second write request includes mirror data and the logical address of the mirror data, where the logical address of the mirror data is the logical address that the first correspondence associates with that mirror node. The mirror node saves the mirror data contained in the received second write request in its own cache and may send a response message to the first storage node to notify it that the mirror data in the second write request has been stored in the mirror node's cache.
For example, continuing the example in step 304, assume that the data to be written in the first write request is data A, data B, and data C, where data A corresponds to logical address 2, data B to logical address 3, and data C to logical address 4. Then, in the second write request sent by the first storage node 1 to storage node 5, the logical addresses of the mirror data are logical addresses 2 and 3, which correspond to mirror node 5, and the mirror data is data A and data B. In the second write request sent to storage node 6, the logical addresses of the mirror data are logical addresses 2, 3, and 4, which correspond to mirror node 6, and the mirror data is data A, data B, and data C. In the second write request sent to storage node 7, the logical address of the mirror data is logical address 4, which corresponds to mirror node 7, and the mirror data is data C.
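Deriving the per-mirror write requests from the first correspondence is an inversion of the address-to-mirror mapping. A minimal sketch, with each request represented as a plain dictionary (an assumption), reproducing the example in step 305:

```python
def build_mirror_requests(first_corr: dict[int, tuple[int, ...]],
                          data: dict[int, str]) -> dict[int, dict[int, str]]:
    """
    first_corr: sub-logical address -> mirror node numbers (the first correspondence)
    data:       sub-logical address -> data to be written
    Returns mirror node -> {logical address: mirror data}, one write request per mirror node.
    """
    requests: dict[int, dict[int, str]] = {}
    for addr in sorted(first_corr):
        for node in first_corr[addr]:
            requests.setdefault(node, {})[addr] = data[addr]
    return requests


reqs = build_mirror_requests({2: (5, 6), 3: (5, 6), 4: (6, 7)}, {2: "A", 3: "B", 4: "C"})
print(reqs)  # {5: {2: 'A', 3: 'B'}, 6: {2: 'A', 3: 'B', 4: 'C'}, 7: {4: 'C'}}
```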
306. The first storage node sends a response message to the host.
After receiving the response message from every mirror node, the first storage node may send a response message to the host to notify it that the data to be written in the first write request has been stored in the distributed RAID system.
It should be noted that, in the embodiments of this application, the host sends multiple write requests to the first storage node. The first storage node may process the received write requests in parallel according to its processing capability, performing steps 301 to 306 for each write request. After the first storage node has returned the response message to the host, the data can be persisted in the background, specifically by performing the following steps 307 and 308:
307. The first storage node selects N target data blocks from its own cache according to the first correspondence, where the mirror nodes of the N target data blocks are all different.
A target data block contains the data corresponding to S consecutive logical addresses. The first storage node may first identify, in the first correspondence, the sub-logical addresses that share the same mirror node. The shared mirror node may be a single node or several nodes; if there are several, the first storage node may first choose one of them. In this way, the first storage node selects N mutually different mirror nodes and, from the sub-logical addresses corresponding to those N mirror nodes, selects sub-logical addresses that make up a data block; the data corresponding to the selected sub-logical addresses forms the N target data blocks.
When selecting the sub-logical addresses that make up a data block, the first storage node may first determine which sub-logical addresses are missing from the data block and then check whether each missing logical address is recorded in the second correspondence, which is the correspondence between the logical addresses of data written to a mirror node's hard disk and the physical addresses on that hard disk. If the missing sub-logical address is not in the second correspondence, the first storage node has not written the data corresponding to it to a hard disk; the first storage node may then set the data for the missing sub-logical address to 0 to complete the target data block, and record the missing sub-logical address in the first correspondence. If the missing sub-logical address is in the second correspondence, the first storage node has already stored the data corresponding to it on the mirror node's hard disk; the first storage node may then use the second correspondence to fetch the data for the missing sub-logical address from that mirror node to complete the target data block, and record the missing sub-logical address in the first correspondence.
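Completing a target data block when some sub-logical addresses are missing can be sketched as follows. `read_from_mirror` is a hypothetical callback standing in for whatever transport fetches data from a mirror node's hard disk, and `unit_size` (bytes per sub-logical address) is an assumed parameter.

```python
from typing import Callable


def assemble_target_block(block_idx: int, block_len: int, unit_size: int,
                          cache: dict[int, bytes],
                          second_corr: dict[int, tuple[int, int]],
                          read_from_mirror: Callable[[int, int], bytes]):
    """
    cache:       sub-logical address -> data in the receiving node's cache
    second_corr: sub-logical address -> (mirror node, physical address) already on disk
    Returns (data for each sub-logical address of the block, missing addresses that were filled).
    """
    parts, filled = [], []
    start = block_idx * block_len
    for addr in range(start, start + block_len):
        if addr in cache:
            parts.append(cache[addr])
            continue
        filled.append(addr)  # the caller records these in the first correspondence
        if addr in second_corr:
            node, phys = second_corr[addr]          # already persisted: fetch it back
            parts.append(read_from_mirror(node, phys))
        else:
            parts.append(bytes(unit_size))          # never written: pad with zeros
    return parts, filled


cache = {3: b"aa", 4: b"bb"}
print(assemble_target_block(1, 3, 2, cache, {}, lambda n, p: b""))
# ([b'aa', b'bb', b'\x00\x00'], [5])
```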
For example, assume that the data block length S is 3 and N is 3, and that the first correspondence stored by first storage node 1 is as shown in Table 2.
Table 2
Logical address    Receiving node    Mirror node
3                  1                 2
0, 1, 2            1                 4
4                  1                 2
9, 10              1                 3
11                 1                 3
15                 1                 7
First storage node 1 can then determine from Table 2 the sub-logical addresses associated with mirror nodes 2, 3, 4, and 7, respectively. Next, first storage node 1 may select three mutually different mirror nodes, for example mirror nodes 2, 3, and 4, and select from the sub-logical addresses corresponding to these three mirror nodes the sub-logical addresses that make up one data block each. While doing so, the first storage node determines that sub-logical address 5 is missing, fills in the data corresponding to logical address 5 to complete the target data block, and records logical address 5 in the first correspondence. The three target data blocks selected by first storage node 1 are therefore: the data corresponding to logical addresses 3, 4, and 5; the data corresponding to logical addresses 9, 10, and 11; and the data corresponding to logical addresses 0, 1, and 2.
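The Table 2 example can be reproduced with a short sketch. The text says only that N mutually different mirror nodes are chosen (2, 3, and 4 "for example"), so the tie-breaking policy below is an assumption: prefer the mirror node whose best block already has the most cached sub-addresses, then the lower node number. Block membership uses Int(address / S), as in claim 2. The function name and return format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def select_target_blocks(first_map: Dict[int, int], S: int, N: int) -> Dict[int, List[int]]:
    """Pick N blocks whose mirror nodes are mutually different.

    first_map maps each cached sub-logical address to its mirror node.
    Returns {mirror node: the S sub-logical addresses of its chosen block},
    or {} if fewer than N distinct mirror nodes are available.
    """
    # Group cached sub-addresses by (mirror node, block index Int(addr / S)).
    groups = defaultdict(set)
    for addr, mirror in first_map.items():
        groups[(mirror, addr // S)].add(addr)

    # For each mirror node keep its fullest block; keys are visited in sorted
    # order, so ties go to the lower block index.
    best: Dict[int, int] = {}
    for mirror, idx in sorted(groups):
        if mirror not in best or len(groups[(mirror, idx)]) > len(groups[(mirror, best[mirror])]):
            best[mirror] = idx

    # Assumed policy: fullest blocks first, then lower mirror node number.
    ranked = sorted(best, key=lambda m: (-len(groups[(m, best[m])]), m))
    if len(ranked) < N:
        return {}
    chosen = sorted(ranked[:N])
    return {m: list(range(best[m] * S, best[m] * S + S)) for m in chosen}


# Table 2: sub-logical address -> mirror node (receiving node is node 1 throughout)
table2 = {3: 2, 0: 4, 1: 4, 2: 4, 4: 2, 9: 3, 10: 3, 11: 3, 15: 7}
print(select_target_blocks(table2, S=3, N=3))
# {2: [3, 4, 5], 3: [9, 10, 11], 4: [0, 1, 2]}
```

Address 5 appears in the block for mirror node 2 even though it is not cached; it is the missing sub-address that the completion rule of step 307 fills in.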
308. The first storage node sends notification messages to the N mirror nodes corresponding to the N target data blocks, respectively.
Each notification message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and instructs that mirror node to store, on its own hard disk, the data block in its cache that corresponds to the logical address in the message.
After determining the N target data blocks, the first storage node may send notification messages to the N mirror nodes corresponding to the N target data blocks, respectively. On receiving a notification message, each mirror node may first determine whether its own cache holds the data corresponding to the logical addresses in the message. If it does, the mirror node may store the data block corresponding to those logical addresses on its own hard disk. If its cache does not hold the data for some logical address in the message, the mirror node may determine whether that logical address is recorded in its mapping between the logical addresses of data written to its hard disk and the physical addresses on that hard disk. If it is not recorded, the mirror node may set the data for that logical address to 0 to complete the target data block; if it is recorded, the mirror node may use the mapping to obtain the data for that logical address and complete the target data block. The mirror node then stores the data block corresponding to the logical addresses in the notification message on its own hard disk.
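The mirror-node side of step 308 can be modeled with a toy in-memory class. Everything here is a hypothetical sketch: the `MirrorNode` class, its list-backed "hard disk", and the choice to append every flushed unit at a new physical address (a log-style allocation) are assumptions; the text does not say how physical addresses are allocated.

```python
from typing import Dict, List


class MirrorNode:
    """Toy in-memory mirror node (hypothetical model, not from the source)."""

    def __init__(self, unit: int = 4) -> None:
        self.unit = unit
        self.cache: Dict[int, bytes] = {}  # sub-logical address -> mirrored data
        self.l2p: Dict[int, int] = {}      # logical -> physical mapping of data on disk
        self.disk: List[bytes] = []        # "hard disk"; list index = physical address

    def on_notification(self, addrs: List[int]) -> bytes:
        """Assemble the block named in a notification message and write it to disk."""
        block = []
        for addr in addrs:
            if addr in self.cache:
                block.append(self.cache[addr])
            elif addr in self.l2p:
                block.append(self.disk[self.l2p[addr]])  # already on disk: read it back
            else:
                block.append(bytes(self.unit))  # never written: zero-fill
        # Assumption: each unit is appended to the disk and the mapping updated.
        for addr, data in zip(addrs, block):
            self.l2p[addr] = len(self.disk)
            self.disk.append(data)
        # Cached copies stay until the indication message of step 311 arrives.
        return b"".join(block)
```

For mirror node 2 in the running example, `on_notification([3, 4, 5])` writes the two cached units plus a zero-filled unit for address 5.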
For example, continuing the example in step 307, first storage node 1 may send notification messages to storage nodes 2, 3, and 4, respectively: the notification message sent to storage node 2 includes logical addresses 3, 4, and 5; the one sent to storage node 3 includes logical addresses 9, 10, and 11; and the one sent to storage node 4 includes logical addresses 0, 1, and 2.
309. The first storage node calculates M check data blocks from the N target data blocks.
To ensure data reliability when data is written to the hard disks, after determining the N target data blocks in step 307, the first storage node may apply an erasure coding (EC) algorithm to the data of the N target data blocks to obtain M check data blocks.
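The text does not name the EC algorithm. As the simplest possible illustration, the sketch below covers only the M = 1 special case: a single check block computed as the byte-wise XOR of the N target blocks, as in RAID 5. A system with M > 1 would need a code such as Reed-Solomon; the function name is hypothetical.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """Compute one check data block (M = 1) as the byte-wise XOR of N target blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, and all blocks must have the same length")
    acc = reduce(
        lambda acc, blk: [a ^ b for a, b in zip(acc, blk)],
        blocks,
        [0] * len(blocks[0]),
    )
    return bytes(acc)
```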
310. The first storage node sends the M check data blocks to storage nodes other than the N mirror nodes for storage.
After obtaining the M check data blocks, the first storage node may select M storage nodes other than the N mirror nodes and send one check data block to each of them, and each of these storage nodes stores the received check data block on its own hard disk. The N target data blocks stored on the hard disks of the N storage nodes and the M check data blocks stored on the hard disks of the M storage nodes then form one stripe. If the data of any Y of the N target data blocks in the stripe is corrupted or lost (Y ≤ M), the data of those Y target data blocks can be recovered from the remaining N-Y intact target data blocks in the stripe together with the M check data blocks.
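Continuing the single-parity illustration (an assumed M = 1, not necessarily the code the patent uses), recovery of Y = 1 lost target block is just the XOR of the parity block with the N-1 surviving target blocks. The helper names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)


def recover_stripe(targets: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one lost target block (marked None) in an N + 1 XOR stripe."""
    lost = [i for i, blk in enumerate(targets) if blk is None]
    if len(lost) > 1:
        raise ValueError("single XOR parity tolerates only one lost block (Y <= M = 1)")
    rebuilt = list(targets)
    if lost:
        survivors = [blk for blk in targets if blk is not None]
        rebuilt[lost[0]] = xor_bytes(parity, *survivors)
    return rebuilt
```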
311. The first storage node deletes, from the first correspondence, the mappings between the sub-logical addresses of the N target data blocks and their mirror nodes, deletes the N target data blocks from its own cache, and sends indication messages to the N mirror nodes corresponding to the N target data blocks, respectively.
Once the data of the N target data blocks is stored on the hard disks of N storage nodes in the distributed RAID system and the data of the M check data blocks is stored on the hard disks of M storage nodes, the data of the N target data blocks is reliably stored in the distributed RAID system. At this point, the first storage node may delete from the first correspondence the mappings between the sub-logical addresses of the N target data blocks and the mirror nodes. The first storage node may also delete the N target data blocks from its own cache and send an indication message to each of the N mirror nodes corresponding to the N target data blocks. Each indication message includes the logical address of the target data block corresponding to the mirror node to which it is sent, and instructs that mirror node to delete the data block corresponding to that logical address from its own cache.
Note that, in this embodiment of the present application, the logical address included in the indication message sent by the first storage node to any one of the N mirror nodes is the same as the logical address included in the notification message sent by the first storage node to that mirror node in step 308.
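Step 311 on the receiving node can be sketched as below, reusing for each mirror node the same address list that went into its step-308 notification. The `send` callback and the dictionary message format are hypothetical placeholders for whatever transport a real system uses.

```python
from typing import Callable, Dict, List


def finish_stripe(
    chosen: Dict[int, List[int]],
    first_map: Dict[int, int],
    cache: Dict[int, bytes],
    send: Callable[[int, dict], None],
) -> None:
    """Clean up after a stripe's data and check blocks are safely on disk.

    chosen maps each mirror node to the sub-logical addresses of its target
    block, i.e. the same addresses carried by its step-308 notification.
    """
    for mirror, addrs in chosen.items():
        for addr in addrs:
            first_map.pop(addr, None)  # drop the first-correspondence entry
            cache.pop(addr, None)      # drop the cached target data
        # Indication message: tells the mirror node to evict these addresses.
        send(mirror, {"type": "indication", "logical_addresses": list(addrs)})
```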
In the data storage method provided by this embodiment, the receiving node determines a mirror node based on the logical address of the data to be written and sends the data to be written, together with its logical address, to that mirror node, so that the N target data blocks are already held on N mirror nodes. The receiving node can then simply send each of the N mirror nodes a notification message containing the logical address of its target data block, instructing the mirror node to write the data block corresponding to that logical address to its hard disk. Whereas in the prior art a storage node must send the data to be written across nodes twice, here the receiving node sends the data across nodes only once, when it sends the data to be written to the mirror node. Because the logical addresses carried in the notification messages are small and consume little bandwidth, system I/O bandwidth is improved compared with the prior art.
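A back-of-the-envelope comparison makes the bandwidth argument concrete. The sizes are hypothetical (4 KiB per sub-logical address, 8 bytes per address in a notification message), message headers are ignored, and the M check blocks are left out because both schemes send them.

```python
def cross_node_bytes(data_bytes: int, num_addrs: int, addr_bytes: int = 8):
    """Return (prior_art, this_method) cross-node bytes for one stripe's data."""
    prior_art = 2 * data_bytes                         # data crosses the network twice
    this_method = data_bytes + num_addrs * addr_bytes  # once, plus address-only notifications
    return prior_art, this_method


# Hypothetical sizes: N = 3 blocks of S = 3 sub-addresses, 4 KiB each.
prior, ours = cross_node_bytes(data_bytes=3 * 3 * 4096, num_addrs=9)
print(prior, ours)  # 73728 36936
```

Under these assumptions the data traffic is roughly halved, because the notification overhead (72 bytes) is negligible next to the 36 KiB of data.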
In addition, the first storage node divides each sub-logical address by the data block length and takes the integer part of the quotient, so that all data of one data block is placed on a single mirror node. Dividing each hash result by K, taking the remainder, and deriving the mirror node number from that remainder ensures that the computed mirror node is one of the storage nodes in the distributed RAID system.
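The mirror-node computation of claim 2 can be sketched in a few lines. The "preconfigured hash algorithm" is unspecified, so SHA-256 from Python's `hashlib` stands in for it, and nodes are assumed to be numbered 0 to K-1; both are assumptions, and the function name is hypothetical.

```python
import hashlib


def mirror_node_number(sub_addr: int, block_len: int, K: int) -> int:
    """Mirror node for one sub-logical address, following the steps of claim 2.

    X = Int(sub_addr / block_len) puts every address of a data block in the
    same bucket; hashing X and taking the remainder modulo K yields a node
    number in 0..K-1. SHA-256 is an assumed stand-in for the unspecified hash.
    """
    x = sub_addr // block_len
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % K
```

Because only X is hashed, sub-addresses 3, 4, and 5 (with a block length of 3) always land on the same node, which is what keeps one data block on one mirror node.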
FIG. 4 is a flowchart of another data storage method according to an embodiment of the present application. As shown in FIG. 4, the method may include the following steps.
Note that in this embodiment of the present application, the mirror node is described as the second storage node by way of example.
401. The second storage node receives a first write request sent by at least one receiving node.
The first write request includes mirrored data and the logical address of the mirrored data. A receiving node is a node that receives a second write request from the host; the second write request includes the data to be written and the logical address of the data to be written. The mirror node is determined by the receiving node based on the logical address of the data to be written, and the mirrored data is the data that the receiving node has determined is to be written to this mirror node. The logical address of the mirrored data includes at least one sub-logical address.
402. The second storage node writes the mirrored data into its own cache and records the correspondence between the logical address of the mirrored data and the receiving node.
After receiving the first write request, the second storage node may save the mirrored data from the request in its own cache and record, for each sub-logical address in the logical address of the mirrored data, the corresponding receiving node.
403. Based on the correspondence between the logical addresses of the mirrored data and the receiving nodes, the second storage node selects N target data blocks from its own cache, such that the receiving nodes of the N data blocks are different from one another.
The second storage node may first determine, in the correspondence between the logical addresses of the mirrored data and the receiving nodes, the sub-logical addresses that share the same receiving node. The second storage node can then select N mutually different receiving nodes and, from the sub-logical addresses corresponding to these N receiving nodes, select the sub-logical addresses that make up one data block each. The data corresponding to the selected sub-logical addresses constitutes the N target data blocks.
Note that, for details on how the second storage node selects the sub-logical addresses that make up a data block in this embodiment, refer to the description in step 307 of FIG. 3 of how the first storage node selects them; details are not repeated here.
Steps 404 to 407: in this embodiment of the present application, steps 404 to 407 are similar to steps 308 to 311 in FIG. 3 of another embodiment of the present application. For their description, refer to the description of steps 308 to 311 in FIG. 3; details are not repeated here.
In the data storage method provided by this embodiment, the mirror node writes the mirrored data from each received first write request into its cache and records the correspondence between the logical addresses of the mirrored data and the receiving nodes, so that the N target data blocks are already held on N different receiving nodes. The mirror node can then send each of the N receiving nodes a notification message containing the logical address of its target data block, instructing the receiving node to write the data block corresponding to that logical address to its hard disk. Compared with the prior art, in which a storage node must send the data to be written across nodes twice, the data is sent across nodes only once. Because the logical addresses carried in the notification messages are small and consume little bandwidth, system I/O bandwidth is improved compared with the prior art.
The foregoing mainly describes the solutions provided by the embodiments of the present application from the perspective of the storage node. It can be understood that, to implement the foregoing functions, the storage node includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments of the present application, the storage node may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. Note that the division into modules in the embodiments of the present application is illustrative and is merely a logical functional division; other divisions are possible in actual implementation.
When each functional module corresponds to one function, FIG. 5 shows a possible schematic composition of the storage node in the foregoing embodiments. As shown in FIG. 5, the storage node may include a receiving unit 51, a storage unit 52, a determining unit 53, a sending unit 54, and a selecting unit 55.
The receiving unit 51 is configured to support the storage node in performing step 301 of the data storage method shown in FIG. 3.
The storage unit 52 is configured to support the storage node in performing steps 302 and 304 of the data storage method shown in FIG. 3.
The determining unit 53 is configured to support the storage node in performing step 303 of the data storage method shown in FIG. 3.
The sending unit 54 is configured to support the storage node in performing steps 305, 306, 308, and 310 of the data storage method shown in FIG. 3, as well as the part of step 311 in which indication messages are sent to the N mirror nodes corresponding to the N target data blocks.
The selecting unit 55 is configured to support the storage node in performing step 307 of the data storage method shown in FIG. 3.
Further, in this embodiment of the present application, as shown in FIG. 6, the storage node may also include a calculating unit 56 and a deleting unit 57.
The calculating unit 56 is configured to support the storage node in performing step 309 of the data storage method shown in FIG. 3.
The deleting unit 57 is configured to support the storage node in performing the part of step 311 of the data storage method shown in FIG. 3 in which the mappings between the sub-logical addresses of the N target data blocks and the mirror nodes are deleted from the first correspondence and the N target data blocks are deleted from the node's own cache.
Note that all relevant details of the steps in the foregoing method embodiments apply to the functional descriptions of the corresponding functional modules and are not repeated here.
The storage node provided in this embodiment of the present application is configured to perform the data storage method of FIG. 3 and can therefore achieve the same effects as that method.
When each functional module corresponds to one function, FIG. 7 shows a possible schematic composition of the storage node in the foregoing embodiments. As shown in FIG. 7, the storage node may include a receiving unit 61, a storage unit 62, a selecting unit 63, and a sending unit 64.
The receiving unit 61 is configured to support the storage node in performing step 401 of the data storage method shown in FIG. 4.
The storage unit 62 is configured to support the storage node in performing step 402 of the data storage method shown in FIG. 4.
The selecting unit 63 is configured to support the storage node in performing step 403 of the data storage method shown in FIG. 4.
The sending unit 64 is configured to support the storage node in performing steps 404 and 406 of the data storage method shown in FIG. 4, as well as the part of step 407 in which indication messages are sent to the N receiving nodes corresponding to the N target data blocks.
Note that all relevant details of the steps in the foregoing method embodiments apply to the functional descriptions of the corresponding functional modules and are not repeated here.
The storage node provided in this embodiment of the present application is configured to perform the data storage method of FIG. 4 and can therefore achieve the same effects as that method.
When an integrated unit is used, FIG. 8 shows another possible schematic composition of the storage node in the foregoing embodiments. As shown in FIG. 8, the storage node includes a processing module 71 and a communication module 72.
The processing module 71 is configured to control and manage the actions of the storage node. For example, the processing module 71 is configured to support the storage node in performing steps 303, 307, and 309 in FIG. 3 and the part of step 311 in which the mappings between the sub-logical addresses of the N target data blocks and the mirror nodes are deleted from the first correspondence and the N target data blocks are deleted from the node's own cache; steps 403 and 405 shown in FIG. 4 and the part of step 407 in which the entries for the sub-logical addresses of the N target data blocks are deleted from the correspondence between the logical addresses of the mirrored data and the receiving nodes and the N target data blocks are deleted from the node's own cache; and/or other processes of the techniques described herein.
The communication module 72 is configured to support communication between the storage node and other network entities, such as the host and other storage nodes in the distributed RAID system. For example, the communication module 72 is configured to support the storage node in performing steps 301, 305, 306, 308, and 310 in FIG. 3 and the part of step 311 in which indication messages are sent to the N mirror nodes corresponding to the N target data blocks, as well as steps 401, 404, and 406 shown in FIG. 4 and the part of step 407 in which indication messages are sent to the N receiving nodes corresponding to the N target data blocks. The storage node may further include a storage module 73 for storing program code and data of the storage node. For example, the storage module 73 is configured to support the storage node in performing steps 302 and 304 in FIG. 3 and step 402 shown in FIG. 4.
The processing module 71 may be the processor or controller in FIG. 2, which can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 72 may be the communication interface in FIG. 2 or the like. The storage module 73 may be the memory in FIG. 2.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, the division into the foregoing functional modules is used only as an example. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or some of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for instructing a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

  1. A data storage method, applied to a distributed redundant array of independent disks (RAID) system, wherein the distributed RAID system comprises K storage nodes, K being an integer greater than 1, and the method comprises:
    receiving, by a receiving node, a first write request, wherein the first write request comprises data to be written and a logical address of the data to be written, and the receiving node is any one of the K storage nodes;
    storing, by the receiving node, the data to be written in its own cache;
    determining, by the receiving node, a mirror node based on the logical address of the data to be written, and recording a first correspondence, wherein the first correspondence is a mapping between the logical address of the data to be written and the mirror node;
    sending, by the receiving node, the data to be written and the logical address of the data to be written to the mirror node;
    selecting, by the receiving node, N target data blocks in the cache based on the first correspondence, wherein the mirror nodes of the N target data blocks are different from one another, and N is an integer greater than 0 and less than K; and
    sending, by the receiving node, a notification message to each of the N mirror nodes corresponding to the N target data blocks, wherein each notification message comprises the logical address of the target data block corresponding to the mirror node to which the notification message is sent, and the notification message is used to instruct the mirror node to store, on its own hard disk, the data block in its own cache that corresponds to the logical address in the notification message.
  2. The method according to claim 1, wherein the logical address of the data to be written comprises at least one sub-logical address, and the determining, by the receiving node, a mirror node based on the logical address of the data to be written comprises:
    所述接收节点采用公式:X=Int(子逻辑地址/数据块的长度),对所述待写入数据的逻辑地址中的每个子逻辑地址除以所述数据块的长度之后进行取整运算,获得与每个子逻辑地址对应的整数X,所述数据块的长度为每个数据块所包括的子逻辑地址的个数;The receiving node adopts a formula: X=Int (sub logical address/length of the data block), and performs rounding operation after dividing each sub-logical address in the logical address of the data to be written by the length of the data block. Obtaining an integer X corresponding to each sub-logical address, the length of the data block being the number of sub-logical addresses included in each data block;
    所述接收节点采用预先配置的哈希算法对每个整数X进行哈希计算,再对每个哈希计算的结果在除以K之后进行取余,并根据取余的结果得到镜像节点的编号。The receiving node performs a hash calculation on each integer X by using a pre-configured hash algorithm, and then performs a remainder after dividing the result of each hash calculation by K, and obtains the number of the mirror node according to the result of the redundancy. .
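The mapping in claim 2 is X = Int(sub-logical address / block length), then hash(X) mod K. A minimal sketch follows, under two stated assumptions: the "pre-configured hash algorithm" is stood in for by CRC-32 over X's 8-byte big-endian encoding (the claim names no algorithm), and the remainder is used directly as a node number in 0..K-1. Sub-logical addresses are assumed non-negative, so Python's floor division `//` equals taking the integer part. The function names are hypothetical.

```python
import zlib
from typing import Dict, Iterable


def mirror_node_for(sub_addr: int, block_len: int, k: int) -> int:
    """Map one sub-logical address to a mirror-node number in [0, K)."""
    x = sub_addr // block_len                  # X = Int(sub-logical address / data block length)
    h = zlib.crc32(x.to_bytes(8, "big"))       # assumed stand-in for the pre-configured hash
    return h % k                               # remainder after dividing by K -> node number


def mirror_nodes(sub_addrs: Iterable[int], block_len: int, k: int) -> Dict[int, int]:
    """Per-sub-address entries of the first correspondence."""
    return {a: mirror_node_for(a, block_len, k) for a in sub_addrs}
```

Because X is the same for every sub-logical address inside one aligned block, a whole data block always maps to a single mirror node, whichever hash is configured.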
  3. The method according to claim 1 or 2, wherein the logical address of the to-be-written data includes at least one sub-logical address, and the first correspondence records the mirror node corresponding to each sub-logical address;
    The selecting, by the receiving node, of N target data blocks in the cache according to the first correspondence includes:
    The receiving node determines, in the first correspondence, the sub-logical addresses that have the same mirror node;
    The receiving node selects, for each of N different mirror nodes, sub-logical addresses that constitute one data block from the sub-logical addresses corresponding to that mirror node, and the data corresponding to the selected sub-logical addresses constitutes the N target data blocks.
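The two selection steps of claim 3 (group by mirror node, then take one block's worth of sub-addresses from each of N different mirror nodes) can be sketched as follows. The helper name `select_target_blocks` is hypothetical, and so is the grouping rule: the claim does not say which sub-addresses form a block, so this sketch assumes any `block_len` cached sub-addresses sharing a mirror node may form one, taken in ascending address order.

```python
from collections import defaultdict
from typing import Dict, List


def select_target_blocks(first_corr: Dict[int, int], block_len: int,
                         n: int) -> Dict[int, List[int]]:
    """Pick up to N target data blocks whose mirror nodes are pairwise distinct.

    first_corr maps each cached sub-logical address to its mirror node number.
    Returns {mirror node: sub-addresses forming that node's target block}.
    """
    # Step 1: group sub-logical addresses that have the same mirror node.
    by_mirror: Dict[int, List[int]] = defaultdict(list)
    for addr, node in first_corr.items():
        by_mirror[node].append(addr)

    # Step 2: from N different mirror nodes, take enough sub-addresses for one block each.
    blocks: Dict[int, List[int]] = {}
    for node in sorted(by_mirror):
        addrs = sorted(by_mirror[node])
        if len(addrs) >= block_len:
            blocks[node] = addrs[:block_len]
            if len(blocks) == n:
                break
    return blocks
```

A mirror node that does not yet hold a full block's worth of sub-addresses is skipped, so the function can return fewer than N blocks; in that case a caller would presumably wait for more writes before building a stripe.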
  4. The method according to any one of claims 1 to 3, wherein the method further includes:
    The receiving node calculates M check data blocks from the N target data blocks;
    The receiving node sends the M check data blocks for storage to storage nodes other than the N mirror nodes.
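Claim 4 does not specify how the M check blocks are computed or which remaining nodes receive them. The sketch below covers only the simplest case, M = 1 with byte-wise XOR parity (RAID 5 style); M > 1 would need a different erasure code such as Reed-Solomon. Picking the lowest-numbered nodes outside the N mirror nodes is likewise an assumption, and both function names are hypothetical.

```python
from typing import List, Sequence


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """One check block (M = 1): byte-wise XOR of N equal-length data blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def parity_targets(k: int, mirror_nodes: Sequence[int], m: int) -> List[int]:
    """Choose M storage nodes (numbered 0..K-1) outside the N mirror nodes."""
    excluded = set(mirror_nodes)
    candidates = [node for node in range(k) if node not in excluded]
    if len(candidates) < m:
        raise ValueError("fewer than M nodes remain outside the N mirror nodes")
    return candidates[:m]
```

XOR-ing the parity block with any N - 1 of the data blocks reproduces the missing one, which is what lets a stripe survive the loss of a single node.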
  5. The method according to claim 4, further including, after the receiving node sends the M check data blocks for storage to storage nodes other than the N mirror nodes:
    The receiving node deletes, from the first correspondence, the correspondences between the sub-logical addresses of the N target data blocks and their mirror nodes;
    The receiving node deletes the N target data blocks from its own cache and sends an indication message to each of the N mirror nodes corresponding to the N target data blocks, where each indication message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and the indication message instructs that mirror node to delete from its cache the data block corresponding to the logical address in the message.
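The cleanup in claim 5 amounts to three actions per target block: drop its entries from the first correspondence, evict it from the receiving node's cache, and tell its mirror node to evict the same addresses. In this sketch the name `release_stripe` and the `send_indication` callback are hypothetical placeholders for whatever transport carries the indication message.

```python
from typing import Callable, Dict, List


def release_stripe(first_corr: Dict[int, int], cache: Dict[int, bytes],
                   target_blocks: Dict[int, List[int]],
                   send_indication: Callable[[int, List[int]], None]) -> None:
    """Cleanup after the M check blocks are stored (claim 5 sketch).

    target_blocks maps each mirror node to the sub-addresses of its target block.
    """
    for mirror_id, addrs in target_blocks.items():
        for addr in addrs:
            first_corr.pop(addr, None)   # forget the sub-address -> mirror node mapping
            cache.pop(addr, None)        # drop the receiving node's cached copy
        # Indication message: the mirror node should evict the same addresses from its cache.
        send_indication(mirror_id, list(addrs))
```

Entries for blocks that were not part of the stripe stay in both the correspondence and the cache, so they remain eligible for a later stripe.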
  6. A data storage method, applied to a distributed redundant array of independent disks (RAID) system, the distributed RAID system including K storage nodes, where K is an integer greater than 1, the method including:
    A mirror node receives a first write request sent by at least one receiving node, where the first write request includes mirror data and the logical address of the mirror data; the receiving node is a node that has received a second write request from a host, the second write request including to-be-written data and the logical address of the to-be-written data; the mirror node is determined by the receiving node according to the logical address of the to-be-written data; and the mirror data is the data that the receiving node has determined to write to the mirror node;
    The mirror node writes the mirror data into its own cache and records the correspondence between the logical address of the mirror data and the receiving node;
    The mirror node selects N target data blocks in its own cache according to the correspondence between the mirror data and the receiving nodes, the receiving nodes of the N target data blocks being different from one another, where N is an integer greater than 0 and less than K;
    The mirror node sends a notification message to each of the N receiving nodes corresponding to the N target data blocks, where each notification message includes the logical address of the target data block corresponding to the receiving node to which the message is sent, to instruct that receiving node to write the data block in its own cache corresponding to that logical address to the hard disk.
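Claim 6 is the mirror-side counterpart of claim 1: the mirror node remembers which receiving node sent each logical address and picks N blocks whose receiving nodes are pairwise distinct. The class `MirrorNode` and its method names are hypothetical, message delivery is omitted, and each logical address is treated as one data block for simplicity.

```python
from typing import Dict


class MirrorNode:
    """Hypothetical in-memory model of the mirror-node role in claim 6."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}  # logical address -> mirror data
        self.owner: Dict[int, int] = {}    # logical address -> receiving node id

    def on_first_write(self, receiver_id: int, addr: int, data: bytes) -> None:
        # Cache the mirror data and record the address -> receiving node correspondence.
        self.cache[addr] = data
        self.owner[addr] = receiver_id

    def pick_targets(self, n: int, k: int) -> Dict[int, int]:
        """Pick up to N cached blocks whose receiving nodes are pairwise distinct.

        Returns {receiving node id: logical address to put in its notification message}.
        """
        if not 0 < n < k:
            raise ValueError("N must satisfy 0 < N < K")
        chosen: Dict[int, int] = {}
        for addr, receiver_id in self.owner.items():
            chosen.setdefault(receiver_id, addr)   # first block seen per receiving node
            if len(chosen) == n:
                break
        return chosen
```

Each returned pair corresponds to one notification message telling that receiving node to write its cached copy of the block to disk.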
  7. A receiving node, applied to a distributed redundant array of independent disks (RAID) system, the distributed RAID system including K storage nodes, where K is an integer greater than 1, the receiving node being any one of the K storage nodes, and the receiving node including a receiving unit, a storage unit, a determining unit, a sending unit, and a selecting unit;
    The receiving unit is configured to receive a first write request, the first write request including to-be-written data and the logical address of the to-be-written data;
    The storage unit is configured to store the to-be-written data in the node's own cache;
    The determining unit is configured to determine a mirror node according to the logical address of the to-be-written data;
    The storage unit is further configured to record a first correspondence, the first correspondence being a mapping between the logical address of the to-be-written data and the mirror node;
    The sending unit is configured to send the to-be-written data and the logical address of the to-be-written data to the mirror node;
    The selecting unit is configured to select N target data blocks in the cache according to the first correspondence recorded by the storage unit, the mirror nodes of the N target data blocks being different from one another, where N is an integer greater than 0 and less than K;
    The sending unit is further configured to send a notification message to each of the N mirror nodes corresponding to the N target data blocks, where each notification message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and the notification message instructs that mirror node to store, to its own hard disk, the data block in its cache that corresponds to the logical address in the message.
  8. The receiving node according to claim 7, wherein the logical address of the to-be-written data includes at least one sub-logical address, and the determining unit is specifically configured to:
    use the formula X = Int(sub-logical address / data block length) to divide each sub-logical address in the logical address of the to-be-written data by the data block length and take the integer part of the quotient, obtaining an integer X for each sub-logical address, where the data block length is the number of sub-logical addresses included in each data block;
    hash each integer X with a pre-configured hash algorithm, take the remainder of each hash result, and obtain the number of the mirror node from the remainder.
  9. The receiving node according to claim 7 or 8, wherein the logical address of the to-be-written data includes at least one sub-logical address, and the first correspondence records the mirror node corresponding to each sub-logical address;
    The selecting unit is specifically configured to:
    determine, in the first correspondence, the sub-logical addresses that have the same mirror node;
    select, for each of N different mirror nodes, sub-logical addresses that constitute one data block from the sub-logical addresses corresponding to that mirror node, the data corresponding to the selected sub-logical addresses constituting the N target data blocks.
  10. The receiving node according to any one of claims 7 to 9, wherein the receiving node further includes a calculating unit;
    The calculating unit is configured to calculate M check data blocks from the N target data blocks;
    The sending unit is further configured to send the M check data blocks for storage to storage nodes other than the N mirror nodes.
  11. The receiving node according to claim 10, wherein the receiving unit further includes a deleting unit;
    The deleting unit is configured to delete, from the first correspondence, the correspondences between the sub-logical addresses of the N target data blocks and their mirror nodes, and to delete the N target data blocks from the node's own cache;
    The sending unit is further configured to send an indication message to each of the N mirror nodes corresponding to the N target data blocks, where each indication message includes the logical address of the target data block corresponding to the mirror node to which the message is sent, and the indication message instructs that mirror node to delete from its cache the data block corresponding to the logical address in the message.
  12. A mirror node, applied to a distributed redundant array of independent disks (RAID) system, the distributed RAID system including K storage nodes, where K is an integer greater than 1, the mirror node including a receiving unit, a storage unit, a selecting unit, and a sending unit;
    The receiving unit is configured to receive a first write request sent by at least one receiving node, where the first write request includes mirror data and the logical address of the mirror data; the receiving node is a node that has received a second write request from a host, the second write request including to-be-written data and the logical address of the to-be-written data; the mirror node is determined by the receiving node according to the logical address of the to-be-written data; and the mirror data is the data that the receiving node has determined to write to the mirror node;
    The storage unit is configured to write the mirror data into the node's own cache and record the correspondence between the logical address of the mirror data and the receiving node;
    The selecting unit is configured to select N target data blocks in the node's own cache according to the correspondence between the mirror data and the receiving nodes, the receiving nodes of the N target data blocks being different from one another, where N is an integer greater than 0 and less than K;
    The sending unit is configured to send a notification message to each of the N receiving nodes corresponding to the N target data blocks, where each notification message includes the logical address of the target data block corresponding to the receiving node to which the message is sent, to instruct that receiving node to write the data block in its own cache corresponding to that logical address to the hard disk.
PCT/CN2018/113837 2017-11-03 2018-11-02 Data storage method and device WO2019086016A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711073029.2A CN109753225B (en) 2017-11-03 2017-11-03 Data storage method and equipment
CN201711073029.2 2017-11-03

Publications (1)

Publication Number Publication Date
WO2019086016A1 (en) 2019-05-09

Family

ID=66332466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113837 WO2019086016A1 (en) 2017-11-03 2018-11-02 Data storage method and device

Country Status (2)

Country Link
CN (2) CN109753225B (en)
WO (1) WO2019086016A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502507B (en) * 2019-08-29 2022-02-08 上海达梦数据库有限公司 Management system, method, equipment and storage medium of distributed database
CN116107516B (en) * 2023-04-10 2023-07-11 苏州浪潮智能科技有限公司 Data writing method and device, solid state disk, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5410667A (en) * 1992-04-17 1995-04-25 Storage Technology Corporation Data record copy system for a disk drive array data storage subsystem
CN101504594A (en) * 2009-03-13 2009-08-12 杭州华三通信技术有限公司 Data storage method and apparatus
CN103544045A (en) * 2013-10-16 2014-01-29 南京大学镇江高新技术研究院 HDFS-based virtual machine image storage system and construction method thereof
CN106155937A (en) * 2015-04-07 2016-11-23 龙芯中科技术有限公司 Cache access method, equipment and processor

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6532527B2 (en) * 2000-06-19 2003-03-11 Storage Technology Corporation Using current recovery mechanisms to implement dynamic mapping operations
JP4270371B2 (en) * 2003-05-09 2009-05-27 インターナショナル・ビジネス・マシーンズ・コーポレーション Storage system, control device, control method, and program
US7996608B1 (en) * 2005-10-20 2011-08-09 American Megatrends, Inc. Providing redundancy in a storage system
CN101622595A (en) * 2006-12-06 2010-01-06 弗森多系统公司(dba弗森-艾奥) Apparatus, system, and method for storage space recovery in solid-state storage
US8103904B2 (en) * 2010-02-22 2012-01-24 International Business Machines Corporation Read-other protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
KR20130064521A (en) * 2011-12-08 2013-06-18 삼성전자주식회사 Data storage device and data management method thereof
CN103186554B (en) * 2011-12-28 2016-11-23 阿里巴巴集团控股有限公司 Distributed data mirror method and storage back end
US9778856B2 (en) * 2012-08-30 2017-10-03 Microsoft Technology Licensing, Llc Block-level access to parallel storage
CN103797770B (en) * 2012-12-31 2015-12-02 华为技术有限公司 A kind of method and system of shared storage resources
CN103761058B (en) * 2014-01-23 2016-08-17 天津中科蓝鲸信息技术有限公司 RAID1 and RAID4 mixed structure network store system and method
CN104484130A (en) * 2014-12-04 2015-04-01 北京同有飞骥科技股份有限公司 Construction method of horizontal expansion storage system
US9785575B2 (en) * 2014-12-30 2017-10-10 International Business Machines Corporation Optimizing thin provisioning in a data storage system through selective use of multiple grain sizes
CN107748702B (en) * 2015-06-04 2021-05-04 华为技术有限公司 Data recovery method and device

Also Published As

Publication number Publication date
CN109753225A (en) 2019-05-14
CN111666043A (en) 2020-09-15
CN109753225B (en) 2020-06-02

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 18873128; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 18873128; Country of ref document: EP; Kind code of ref document: A1