WO2019086016A1 - Data storage method and device - Google Patents

Data storage method and device

Info

Publication number
WO2019086016A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
logical address
mirror
written
Prior art date
Application number
PCT/CN2018/113837
Other languages
English (en)
Chinese (zh)
Inventor
赵旺
陈飘
张鹏
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2019086016A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G06F 3/0613 Improving I/O performance in relation to throughput
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629 Configuration or reconfiguration of storage systems
    • G06F 3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • the embodiments of the present invention relate to the field of computer technologies, and in particular, to a data storage method and device.
  • A distributed redundant array of independent disks (RAID) system generally consists of multiple storage nodes interconnected by a network.
  • Redundancy algorithms such as the erasure code (EC) algorithm can be used to implement redundant storage of data, and to improve system input/output (I/O) performance, distributed RAID systems enable data caching mechanisms that allow storage nodes to read and write data quickly from the cache.
  • In the prior art, after receiving a write request, the local storage node may send the write request to other storage nodes in the distributed RAID system, so that the data in the write request is stored in the caches of at least two storage nodes, including the local storage node, to ensure data reliability; the local storage node can then reply to the write request. After the local storage node has received multiple write requests, it may determine N data blocks from the cached data that has not yet been written to the hard disk, calculate M check data blocks according to the EC algorithm, and then send the N data blocks and the M check data blocks to N+M storage nodes, so that each of the N+M storage nodes stores the data in the received block on its own hard disk.
  • However, because the local storage node sends data across nodes when forwarding the write request to other storage nodes, and sends data across nodes again when sending the N data blocks to the N storage nodes while storing the data to the hard disk, the data is sent twice across nodes, resulting in lower I/O bandwidth.
  • The embodiments of the present application provide a data storage method and device, which solve the problem that the system I/O bandwidth is low because data is sent twice across nodes.
  • In a first aspect, the present application provides a data storage method, which is applied to a distributed RAID system, where the distributed RAID system includes K storage nodes, and K is an integer greater than 0.
  • The method may include: a receiving node receives a first write request including data to be written and a logical address of the data to be written, stores the data to be written in its own cache, determines a mirror node according to the logical address of the data to be written, and records a first correspondence, where the first correspondence is the mapping relationship between the logical address of the data to be written and the mirror node. The receiving node sends the logical address of the data to be written and the data to be written to the mirror node, and selects N target data blocks from its own cache according to the first correspondence, where the mirror nodes of the N target data blocks are different from each other. The receiving node then sends a notification message to each of the N mirror nodes corresponding to the N target data blocks, where each notification message includes the logical address of the target data block corresponding to the mirror node to which it is sent, and is used to instruct that mirror node to store the data block in its cache corresponding to the logical address in the notification message to its own hard disk. The receiving node is any one of the K storage nodes, and N is an integer greater than 0 and less than K.
  • In this way, the receiving node determines the mirror node according to the logical address of the data to be written and sends the logical address and the data to be written to the mirror node, so that the N target data blocks are stored in the N mirror nodes; the receiving node can then directly send a notification message including the logical address of the target data block to each of the N mirror nodes to instruct it to write the data block corresponding to that logical address to its hard disk. Compared with the prior art, in which a storage node needs to send the data to be written across nodes twice, the receiving node only needs to send data across nodes once, when sending the data to be written to the mirror node; the logical address carried in the notification message has a much smaller data volume and consumes little bandwidth, so the system I/O bandwidth is improved.
  • In a possible implementation, the logical address of the data to be written includes at least one sub-logical address. The receiving node divides each sub-logical address by the length of the data block and rounds the result to obtain an integer X for each sub-logical address, performs a hash calculation on each integer X using a pre-configured hash algorithm, divides each hash result by K, and obtains the number of the mirror node from the remainder. The length of the data block is the number of sub-logical addresses included in each data block.
  • In a possible implementation, when the receiving node performs the hash calculation on each integer X, if the preset number of mirror nodes is one, a single pre-configured hash algorithm is used to hash the integer X.
  • In a possible implementation, if the preset number of mirror nodes is greater than one, the integer X may be calculated using pre-configured hash algorithms of different types.
  • In another possible implementation, if the preset number of mirror nodes is greater than one, the receiving node may calculate the integer X using one pre-configured hash algorithm and then derive the preset number of further results from that calculation result.
  • In a possible implementation, the logical address of the data to be written includes at least one sub-logical address, and the first correspondence records the mirror node corresponding to each sub-logical address.
  • That the receiving node selects N target data blocks in the cache according to the first correspondence specifically includes: the receiving node determines, in the first correspondence, the sub-logical addresses that have the same mirror node, and selects, from the sub-logical addresses corresponding to N different mirror nodes, the sub-logical addresses constituting one data block each; the data corresponding to the selected sub-logical addresses constitutes the N target data blocks.
  • In a possible implementation, when the receiving node selects, from the sub-logical addresses corresponding to the N different mirror nodes, the sub-logical addresses constituting one data block, it determines any sub-logical address missing from the data block and determines whether the missing sub-logical address is recorded in a second correspondence, where the second correspondence is the correspondence between the logical addresses of data written to the mirror node's hard disk and the physical addresses written to on that hard disk.
  • In a possible implementation, the method may further include: the receiving node calculates M check data blocks according to the N target data blocks, and sends the M check data blocks to storage nodes other than the N mirror nodes for storage.
  • In a possible implementation, the method may further include: the receiving node deletes the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence, deletes the N target data blocks from the cache, and sends an indication message to each of the N mirror nodes corresponding to the N target data blocks, where each indication message includes the logical address of the target data block corresponding to the mirror node receiving it and is used to instruct that mirror node to delete the data block corresponding to the logical address in the indication message from its cache.
  • In a second aspect, the present application provides a data storage method, which is applied to a distributed RAID system, where the distributed RAID system includes K storage nodes, and K is an integer greater than 0.
  • The method may include: a mirror node receives a first write request sent by at least one receiving node, where the first write request includes mirror data and a logical address of the mirror data; the mirror node writes the mirror data into its own cache and records the correspondence between the logical address of the mirror data and the receiving node. The mirror node selects N target data blocks from its own cache according to the correspondence between the mirror data and the receiving nodes, where the receiving nodes of the N target data blocks are different from each other, and sends a notification message to each of the N receiving nodes corresponding to the N target data blocks, where each notification message includes the logical address of the target data block corresponding to the receiving node to which it is sent and is used to instruct that receiving node to write the data block in its cache corresponding to the logical address to the hard disk. The receiving node is a node that receives a second write request from the host, where the second write request includes data to be written and a logical address of the data to be written; the mirror node is determined by the receiving node according to the logical address of the data to be written; the mirror data is the data that the receiving node determined to write to the mirror node; and N is an integer greater than 0 and less than K.
  • In this way, the mirror node writes the mirror data in the received first write request to its cache and records the correspondence between the logical address of the mirror data and the receiving node, so that the N target data blocks are stored in the N receiving nodes; the mirror node can then directly send a notification message including the logical address of the target data block to each of the N receiving nodes to instruct it to write the data block corresponding to that logical address to its hard disk. Compared with the prior art, in which a storage node needs to send the data to be written across nodes twice, data is sent across nodes only once, and because the data volume of the logical address carried in the notification message is small, little bandwidth is occupied; the system I/O bandwidth is therefore improved.
  • In a third aspect, the present application provides a receiving node, which may include modules capable of implementing the method in the first aspect above and its various implementations.
  • In a fourth aspect, the present application provides a mirror node, which may include modules capable of implementing the method in the second aspect above.
  • In a fifth aspect, the application provides a storage node, where the storage node includes: at least one processor, a memory, a communication interface, and a communication bus. The at least one processor is coupled to the memory and the communication interface via the communication bus; the memory is configured to store computer-executable instructions, and when the storage node runs, the processor executes the computer-executable instructions stored in the memory, to cause the storage node to perform the data storage method of the first aspect or any of its possible implementations, or the data storage method of the second aspect.
  • In a sixth aspect, the present application provides a computer storage medium storing computer-executable instructions that, when executed by a processor, implement the data storage method of the first aspect or any of its possible implementations, or the data storage method of the second aspect.
  • In a seventh aspect, the present application also provides a computer program product that, when run on a computer, causes the computer to perform the method of the first aspect or the second aspect described above.
  • In an eighth aspect, the present application also provides a communication chip storing computer-executable instructions that, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect described above.
  • The devices, computer storage media, and computer program products provided above are all used to perform the corresponding methods provided above; for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods, which are not repeated here.
  • FIG. 1 is a schematic structural diagram according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a storage node according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 4 is a flowchart of another data storage method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another storage node according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another storage node according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of another storage node according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of another storage node according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of an embodiment of the present application. As shown in FIG. 1 , the architecture may include: a host 11 and a distributed RAID system 12 .
  • The host 11 is configured to send a first write request to any storage node in the distributed RAID system 12 when the host 11 needs to store data to the distributed RAID system 12, and to receive a response message returned by the storage node.
  • the response message is used to notify the host 11 that the data to be stored in the first write request has been stored in the distributed RAID system 12.
  • The distributed RAID system 12 is composed of K storage nodes interconnected through a network and is used to provide a large amount of storage space, where K is an integer greater than 1; in a specific implementation, each storage node in the distributed RAID system 12 can be a server.
  • FIG. 2 is a schematic diagram of a composition of a storage node according to an embodiment of the present disclosure.
  • the storage node may include: at least one processor 21, a memory 22, a communication interface 23, and a communication bus 24.
  • the processor 21 is a control center of the storage node, and may be a processor or a collective name of a plurality of processing elements.
  • The processor 21 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSPs) or field programmable gate arrays (FPGAs).
  • the storage node may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2.
  • In a specific implementation, the storage node may include a plurality of processors, such as the processor 21 and the processor 25 shown in FIG. 2.
  • Each of these processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data, such as computer program instructions.
  • processor 21 may perform various functions of the storage node by running or executing a software program stored in memory 22, as well as invoking data stored in memory 22.
  • the processor 21 can execute the computer program code stored in the memory 22 to execute the data storage method provided by the present application, and save the data to be stored in the write request to the hard disk of the distributed RAID system.
  • The memory 22 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • Memory 22 may be present independently and coupled to processor 21 via communication bus 24. The memory 22 can also be integrated with the processor 21.
  • the memory 22 is used to store data in the present application and to execute the software program of the present application.
  • the memory 22 can be used to store computer program code corresponding to the data storage method provided by the embodiment of the present application.
  • the memory 22 may include a cache and a hard disk for storing data to be stored in the write request.
  • The communication interface 23 uses any transceiver-like device for communicating with other devices or communication networks, such as a host, a radio access network (RAN), or a wireless local area network (WLAN).
  • the communication interface 23 may include a receiving unit that implements a receiving function, and a transmitting unit that implements a transmitting function.
  • the communication bus 24 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 2, but it does not mean that there is only one bus or one type of bus.
  • the data storage method shown in FIG. 3 may be used to store data. As shown in FIG. 3, the method may include:
  • The following takes the receiving node as the first storage node as an example, where the first storage node is any storage node in the distributed RAID system that receives a first write request sent by the host.
  • 301. The first storage node receives a first write request sent by the host.
  • The first write request may include data to be written and a logical address of the data to be written, and the logical address of the data to be written includes at least one sub-logical address.
  • The logical address of the data to be written generally includes a first address and a data length. For example, if the first address is 0 and the data length is 3, the logical address of the data to be written includes 3 sub-logical addresses, that is, logical address 0, logical address 1, and logical address 2.
  • 302. The first storage node stores the data to be written in the first write request in its own cache.
  • 303. The first storage node determines the mirror node according to the logical address of the data to be written in the first write request.
  • Specifically, each sub-logical address is divided by the length of the data block and rounded down to obtain an integer X corresponding to that sub-logical address, where the length of the data block is the number of sub-logical addresses included in each data block.
  • The first storage node may perform a hash calculation on each integer X using a pre-configured hash algorithm, divide the result of each hash calculation by the number K of storage nodes in the distributed RAID system, take the remainder, and obtain the number of the mirror node from the remainder.
  • When the first storage node performs the hash calculation on each integer X using a pre-configured hash algorithm, if the preset number of mirror nodes is one, the first storage node may calculate the integer X with the pre-configured hash algorithm and then take the remainder to obtain the number of the mirror node. If the preset number of mirror nodes is greater than one, in one implementation the first storage node may calculate the integer X using pre-configured hash algorithms of different types, where the number of hash algorithm types is the same as the number of mirror nodes; after taking the remainder of each hash result, the numbers of the different mirror nodes are obtained.
  • In another implementation, the first storage node may first perform the hash calculation on the integer X and take the remainder to obtain the number of one mirror node, and then determine the numbers of the other mirror nodes according to that result. For example, if the preset number of mirror nodes is three, the first storage node may first perform the hash calculation on the integer X and take the remainder to obtain the number of one mirror node, and then either use the two node numbers adjacent to the determined number as the other two mirror nodes, or perform two further hash calculations on the calculation result to obtain the numbers of the other two mirror nodes.
  • It should be noted that the length S of the data block may be pre-configured in the first storage node, so that, starting from sub-logical address 0, every S consecutive sub-logical addresses form the logical address of one data block.
  • Because the first storage node rounds each sub-logical address using the length S of the data block, the integer X obtained for every sub-logical address in one data block is the same, so the mirror nodes calculated from the sub-logical addresses of one data block are also the same; that is, a complete data block can reside on a single mirror node.
  • the preset number of mirror nodes is determined by the reliability index of the distributed RAID system.
  • For example, if the reliability index of the distributed RAID system requires that the data in the caches of two storage nodes can still be recovered when those two caches both fail, the data in each storage node needs to be stored in the caches of at least three storage nodes; that is, the preset number of mirror nodes in the first storage node is two or more.
  • For example, suppose the sub-logical addresses included in the logical address of the data to be written are logical address 2, logical address 3, and logical address 4; the length S of the data block pre-configured in the first storage node is 4; two hash algorithms are configured; the preset number of mirror nodes is two; and the number K of storage nodes in the distributed RAID system is 16. The first storage node can then perform the rounding calculation on logical addresses 2, 3, and 4, obtaining corresponding X values of 0, 0, and 1. In this way, the first storage node may perform a hash calculation on the integer 0 with each of the two hash algorithms, divide the two hash results by 16, and take the remainders to obtain the numbers of the mirror nodes, assumed here to be 5 and 6.
  • Similarly hashing the integer 1 and taking the remainders is assumed to yield 6 and 7, so the first storage node can determine that the mirror nodes corresponding to logical address 2 are the storage nodes numbered 5 and 6, the mirror nodes corresponding to logical address 3 are the storage nodes numbered 5 and 6, and the mirror nodes corresponding to logical address 4 are the storage nodes numbered 6 and 7.
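  • The mapping from sub-logical addresses to mirror node numbers can be sketched as follows. This is a minimal illustration assuming floor rounding and using md5/sha1 from Python's hashlib as stand-ins for the two pre-configured hash algorithms; the node numbers 5, 6, and 7 above are the text's assumed values, so this sketch produces analogously structured but different numbers.

```python
import hashlib

K = 16  # number of storage nodes in the distributed RAID system
S = 4   # pre-configured data block length (sub-logical addresses per block)

def sub_logical_addresses(first_address, data_length):
    """Expand a logical address (first address + data length) into sub-logical addresses."""
    return list(range(first_address, first_address + data_length))

def mirror_nodes(sub_addr, algorithms=("md5", "sha1")):
    """Map one sub-logical address to its mirror node numbers.

    The sub-logical address is divided by the block length S and rounded down
    to an integer X; X is hashed with each pre-configured algorithm, and each
    digest is divided by K with the remainder taken as a mirror node number.
    """
    x = sub_addr // S
    return [int(hashlib.new(name, str(x).encode()).hexdigest(), 16) % K
            for name in algorithms]

# Logical addresses 2, 3, and 4 give X = 0, 0, 1, so addresses 2 and 3 always
# land on the same pair of mirror nodes, matching the structure of the example.
first_correspondence = {a: mirror_nodes(a) for a in sub_logical_addresses(2, 3)}
print(first_correspondence)
```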
  • 304. The first storage node records the first correspondence.
  • Specifically, the first storage node may save the first correspondence, where the first correspondence is a mapping relationship between each sub-logical address and the number of the mirror node.
  • the first storage node 1 can save the first correspondence as shown in Table 1.
  • 305. The first storage node sends the logical address of the data to be written and the data to be written to the mirror node.
  • Specifically, the first storage node may send a second write request to each mirror node, where the second write request includes mirror data and the logical address of the mirror data, and the logical address of the mirror data is the logical address corresponding to that mirror node in the first correspondence.
  • The mirror node can save the mirror data contained in the received second write request in its own cache.
  • Then, the mirror node may send a response message to the first storage node to notify the first storage node that the mirror data included in the second write request has been stored in the mirror node's cache.
  • For example, suppose the data to be written included in the first write request is data A, data B, and data C, where data A corresponds to logical address 2, data B corresponds to logical address 3, and data C corresponds to logical address 4. For the mirror node 5, the logical address of the mirror data includes logical address 2 and logical address 3, and the mirror data includes data A and data B. For the mirror node 6, the logical address of the mirror data includes logical address 2, logical address 3, and logical address 4, and the mirror data includes data A, data B, and data C. For the mirror node 7, the logical address of the mirror data includes logical address 4, and the mirror data includes data C.
  • 306. The first storage node sends a response message to the host.
  • Specifically, the first storage node may send a response message to the host to notify the host that the data to be written in the first write request has been stored in the distributed RAID system.
  • The host may send multiple write requests to the first storage node. The first storage node may process the received write requests in parallel according to its processing capability, performing steps 301-306 for each write request. The data can then be persisted in the background; specifically, the following steps 307-308 can be performed:
  • 307. The first storage node selects N target data blocks from its own cache according to the first correspondence, where the mirror nodes of the N target data blocks are different from each other.
  • A target data block includes the data corresponding to S consecutive sub-logical addresses.
  • Specifically, the first storage node may first determine, in the first correspondence, the sub-logical addresses that have the same mirror node; a sub-logical address may correspond to one or more mirror nodes, and if there are multiple mirror nodes, the first storage node may first select one of them. In this way, the first storage node can select N mutually different mirror nodes and, from the sub-logical addresses corresponding to the N different mirror nodes, select the sub-logical addresses constituting one data block each; the data corresponding to the selected sub-logical addresses forms the N target data blocks.
  • When the first storage node selects the sub-logical addresses constituting a data block, it may first determine which sub-logical address is missing from the data block and then check whether the missing sub-logical address is recorded in the second correspondence, where the second correspondence is the correspondence between the logical addresses of data written to the mirror node's hard disk and the physical addresses written to on that hard disk. If the missing sub-logical address does not exist in the second correspondence, the first storage node has not written data corresponding to the missing sub-logical address to the hard disk; the first storage node may then set the data corresponding to the missing sub-logical address to 0 to complete the target data block, and record the missing sub-logical address in the first correspondence. If the missing sub-logical address exists in the second correspondence, the first storage node has already stored data corresponding to the missing sub-logical address on the mirror node's hard disk; in that case, the first storage node may, according to the second correspondence, obtain the data corresponding to the missing sub-logical address from the mirror node to complete the target data block, and record the missing sub-logical address in the first correspondence.
  • the first correspondence stored by the first storage node 1 is as shown in Table 2.
  • The first storage node 1 can determine, in Table 2, the sub-logical addresses whose mirror nodes are 2, 3, 4, and 7, respectively. The first storage node 1 can then select three mutually different mirror nodes, for example the mirror nodes numbered 2, 3, and 4, and select, from the sub-logical addresses corresponding to these three mirror nodes, the sub-logical addresses constituting one data block each. When selecting the sub-logical addresses constituting a data block, the first storage node may determine that sub-logical address 5 is missing, add the data corresponding to the missing logical address 5 to complete a target data block, and record the missing logical address 5 in the first correspondence.
  • In this way, the three target data blocks selected by the first storage node 1 are: the data corresponding to logical addresses 3, 4, and 5; the data corresponding to logical addresses 9, 10, and 11; and the data corresponding to logical addresses 0, 1, and 2.
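  • As a minimal sketch of this selection step, the snippet below assumes the first correspondence is kept as a mapping from sub-logical address to mirror node number, uses a block length of S = 3 as in the Table 2 example, and simply zero-fills missing sub-addresses (the patent instead consults the second correspondence and may fetch such data back from the mirror node's hard disk):

```python
S = 3  # block length assumed for the Table 2 example (three sub-addresses per block)

def select_target_blocks(first_correspondence, cache, n):
    """Select up to n target data blocks whose mirror nodes differ from each other.

    first_correspondence: {sub_logical_address: mirror_node_number}
    cache: {sub_logical_address: bytes} not yet written to the hard disk
    Returns {mirror_node: (block_addresses, block_data)}.
    """
    by_mirror = {}
    for addr, node in first_correspondence.items():
        by_mirror.setdefault(node, []).append(addr)  # group addresses per mirror node

    targets = {}
    for node, addrs in by_mirror.items():
        if len(targets) == n:
            break  # n mutually different mirror nodes selected
        base = (min(addrs) // S) * S           # a block starts at a multiple of S
        block_addrs = list(range(base, base + S))
        # A missing sub-address is zero-filled here for simplicity.
        block_data = [cache.get(a, b"\x00") for a in block_addrs]
        targets[node] = (block_addrs, block_data)
    return targets
```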
  • 308. The first storage node sends a notification message to each of the N mirror nodes corresponding to the N target data blocks.
  • Each notification message includes the logical address of the target data block corresponding to the mirror node to which it is sent, and is used to instruct that mirror node to store the data block in its cache corresponding to the logical address in the notification message to its own hard disk.
  • Specifically, the first storage node may send a notification message to each of the N mirror nodes corresponding to the N target data blocks.
  • After receiving the notification message, each mirror node may first determine whether the data corresponding to the logical addresses in the notification message is stored in its own cache. If the data corresponding to the logical addresses in the notification message is stored in its cache, the mirror node may store the data block corresponding to those logical addresses to its own hard disk. If the data corresponding to some logical address in the notification message is not stored in the cache, the mirror node may determine whether the logical address of the missing data is recorded in the mapping between the logical addresses of data written to the hard disk and the physical addresses written to. If not, the mirror node may set the data corresponding to the logical address of the missing data to 0 to complete the target data block; if so, the mirror node can acquire the data corresponding to that logical address according to the mapping to complete the target data block. The mirror node then stores the data block corresponding to the logical addresses in the notification message to its own hard disk.
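  • On the mirror node side, handling such a notification might look like the sketch below, with dict-based cache and disk-mapping structures; read_from_disk and write_block_to_disk are hypothetical helpers standing in for the node's storage layer:

```python
def handle_notification(block_addrs, cache, disk_map, read_from_disk, write_block_to_disk):
    """Persist the data block named by the logical addresses in a notification message.

    block_addrs: logical addresses of the target data block
    cache: {logical_address: bytes} held in this mirror node's cache
    disk_map: {logical_address: physical_address} for data already written to disk
    """
    block = []
    for addr in block_addrs:
        if addr in cache:
            block.append(cache[addr])                     # data found in the cache
        elif addr in disk_map:
            block.append(read_from_disk(disk_map[addr]))  # already persisted once
        else:
            block.append(b"\x00")                         # never written: zero-fill
    write_block_to_disk(block_addrs, block)               # store the block on the hard disk
```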
  • For example, the first storage node 1 may send a notification message to each of the storage nodes numbered 2, 3, and 4, where the logical addresses included in the notification message sent to the storage node numbered 2 are 3, 4, and 5; the logical addresses included in the notification message sent to the storage node numbered 3 are 9, 10, and 11; and the logical addresses included in the notification message sent to the storage node numbered 4 are 0, 1, and 2.
  • 309. The first storage node calculates M check data blocks according to the N target data blocks.
  • Specifically, the first storage node may perform a calculation on the data of the N target data blocks using the EC algorithm to obtain the M check data blocks.
  • 310. The first storage node sends the M check data blocks to storage nodes other than the N mirror nodes for storage.
  • Specifically, the first storage node may select M storage nodes other than the N mirror nodes and send one check data block to each of the M storage nodes for storage, so that each of those storage nodes stores the received check data block on its own hard disk. The N target data blocks stored on the hard disks of the N storage nodes and the M check data blocks stored on the hard disks of the M storage nodes can form a stripe, so that if the data of any Y of the N target data blocks in the stripe is damaged or lost, the data of the damaged Y target data blocks can be recovered from the data of the remaining N-Y undamaged target data blocks and the data of the M check data blocks.
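  • As a concrete illustration of the redundancy step, the sketch below uses single-parity XOR, the M = 1 special case of an erasure code; the text does not specify which EC algorithm is actually used:

```python
def xor_parity(blocks):
    """Compute one check block as the byte-wise XOR of N equal-length data blocks (M = 1)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks_with_gap, parity):
    """Recover the single missing block (marked None) by XOR-ing the parity with the survivors."""
    missing = bytearray(parity)
    for block in blocks_with_gap:
        if block is not None:
            for i, b in enumerate(block):
                missing[i] ^= b
    return bytes(missing)

data = [b"AAAA", b"BBBB", b"CCCC"]                      # N = 3 target data blocks
p = xor_parity(data)                                    # M = 1 check data block
assert recover([b"AAAA", None, b"CCCC"], p) == b"BBBB"  # damaged block restored
```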
  • 311. The first storage node deletes the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence, deletes the N target data blocks from the cache, and sends an indication message to each of the N mirror nodes corresponding to the N target data blocks.
  • After the data of the N target data blocks has been stored on the hard disks of N storage nodes of the distributed RAID system and the data of the M check data blocks has been stored on the hard disks of M storage nodes, the data of the N target data blocks has been reliably stored in the distributed RAID system.
  • At this time, the first storage node may delete the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence.
  • In addition, the first storage node may delete the N target data blocks from the cache and send an indication message to each of the N mirror nodes corresponding to the N target data blocks, where each indication message includes the logical address of the target data block corresponding to the mirror node receiving it and is used to instruct that mirror node to delete the data block corresponding to the logical address in the indication message from its own cache.
  • It should be noted that the logical address included in the indication message sent by the first storage node to any one of the N mirror nodes corresponding to the N target data blocks is the same as the logical address included in the notification message sent by the first storage node to that mirror node in step 308.
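  • A sketch of this cleanup step on the receiving node, reusing the dict-based structures of the earlier sketches; send_indication is a hypothetical messaging helper:

```python
def release_target_blocks(targets, first_correspondence, cache, send_indication):
    """Drop persisted blocks from the cache and tell each mirror node to do the same.

    targets: {mirror_node: (block_addresses, block_data)}, as returned by selection
    """
    for node, (block_addrs, _) in targets.items():
        for addr in block_addrs:
            first_correspondence.pop(addr, None)  # forget the address-to-mirror mapping
            cache.pop(addr, None)                 # free the cached data
        # The indication message carries the same logical addresses as the
        # notification message sent to this mirror node in step 308.
        send_indication(node, block_addrs)
```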
  • According to the data storage method provided by the embodiment of the present application, the receiving node determines the mirror node according to the logical address of the data to be written and sends the logical address and the data to be written to the mirror node, so that the N target data blocks are stored in the N mirror nodes; the receiving node can then directly send a notification message including the logical address of the target data block to each of the N mirror nodes to instruct it to write the data block corresponding to that logical address to its hard disk. Compared with the prior art, in which a storage node needs to send the data to be written across nodes twice, the receiving node only needs to send data across nodes once, when sending the data to be written to the mirror node; the logical address carried in the notification message has a much smaller data volume and consumes little bandwidth, so the system I/O bandwidth is improved.
  • In addition, the first storage node rounds each sub-logical address using the length of the data block, so that a complete data block can reside on a single mirror node; and each hash result is divided by K with the remainder taken to obtain the mirror node number, which ensures that the calculated mirror node is a storage node in the distributed RAID system.
  • FIG. 4 is a flowchart of another data storage method according to an embodiment of the present application. As shown in FIG. 4, the method may include:
  • In this embodiment, the mirror node is taken as the second storage node as an example for description.
  • 401. The second storage node receives a first write request sent by at least one receiving node.
  • The first write request includes mirror data and the logical address of the mirror data. A receiving node is a node that receives a second write request from the host, where the second write request includes data to be written and a logical address of the data to be written; the mirror node is determined by the receiving node according to the logical address of the data to be written, and the mirror data is the data that the receiving node determined to write to the mirror node. The logical address of the mirror data includes at least one sub-logical address.
  • 402. The second storage node writes the mirror data into its own cache, and records the correspondence between the logical address of the mirror data and the receiving node.
  • Specifically, the second storage node may save the mirror data in the received first write request in its own cache, and record the correspondence between each sub-logical address in the logical address of the mirror data and the receiving node.
  • 403. The second storage node selects N target data blocks from its own cache according to the correspondence between the logical address of the mirror data and the receiving node, where the receiving nodes of the N target data blocks are different from each other.
  • Specifically, the second storage node may first determine, in the correspondence between the logical address of the mirror data and the receiving node, the sub-logical addresses that have the same receiving node. The second storage node may then select N different receiving nodes and, from the sub-logical addresses corresponding to the N different receiving nodes, select the sub-logical addresses constituting one data block each; the data corresponding to the selected sub-logical addresses forms the N target data blocks.
  • For a specific description of how the second storage node selects the sub-logical addresses constituting a data block, refer to the description of how the first storage node selects the sub-logical addresses of a data block in step 307 in FIG. 3; details are not repeated here.
  • Step 404 to step 407 in this embodiment of the present application are similar to step 308 to step 311 in FIG. 3; for content not described in detail in this embodiment, refer to the related descriptions of step 308 to step 311 in FIG. 3.
  • According to the data storage method provided by the embodiment of the present application, the mirror node writes the mirror data in the received first write request to its cache and records the correspondence between the logical address of the mirror data and the receiving node, so that the N target data blocks are stored in the N receiving nodes; the mirror node can then directly send a notification message including the logical address of the target data block to each of the N receiving nodes to instruct it to write the data block corresponding to that logical address to its hard disk. Compared with the prior art, in which a storage node needs to send the data to be written across nodes twice, data is sent across nodes only once, and because the data volume of the logical address carried in the notification message is small, little bandwidth is occupied; the system I/O bandwidth is therefore improved.
  • the storage node includes corresponding hardware structures and/or software modules for performing various functions.
  • A person skilled in the art should readily appreciate that, in combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
  • In the embodiments of the present application, the storage node may be divided into function modules according to the foregoing method examples.
  • each function module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the modules in the embodiment of the present application is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • FIG. 5 is a schematic diagram showing a possible composition of the storage node involved in the foregoing embodiments. As shown in FIG. 5, the storage node may include: a receiving unit 51, a storage unit 52, a determining unit 53, a sending unit 54, and a selecting unit 55.
  • the receiving unit 51 is configured to support the storage node to perform step 301 in the data storage method shown in FIG. 3.
  • the storage unit 52 is configured to support the storage node to perform step 302 and step 304 in the data storage method shown in FIG. 3.
  • the determining unit 53 is configured to support the storage node to perform step 303 in the data storage method shown in FIG. 3.
  • The sending unit 54 is configured to support the storage node in performing step 305, step 306, step 308, and step 310 in the data storage method shown in FIG. 3, and in sending the indication messages to the N mirror nodes corresponding to the N target data blocks as described in step 311.
  • the selecting unit 55 is configured to support the storage node to perform step 307 in the data storage method shown in FIG. 3.
  • As shown in FIG. 6, the storage node may further include: a calculating unit 56 and a deleting unit 57.
  • the calculating unit 56 is configured to support the storage node to perform step 309 in the data storage method shown in FIG. 3.
  • The deleting unit 57 is configured to support the storage node in deleting, as described in step 311 in the data storage method shown in FIG. 3, the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence, and in deleting the N target data blocks from its own cache.
  • the storage node provided by the embodiment of the present application is used to execute the data storage method in FIG. 3 above, so that the same effect as the above data storage method can be achieved.
  • FIG. 7 is a schematic diagram showing a possible composition of the storage node involved in the foregoing embodiments. As shown in FIG. 7, the storage node may include: a receiving unit 61, a storage unit 62, a selecting unit 63, and a sending unit 64.
  • the receiving unit 61 is configured to support the storage node to perform step 401 in the data storage method shown in FIG. 4.
  • the storage unit 62 is configured to support the storage node to perform step 402 in the data storage method shown in FIG. 4.
  • the selecting unit 63 is configured to support the storage node to perform step 403 in the data storage method shown in FIG. 4.
  • The sending unit 64 is configured to support the storage node in performing step 404 and step 406 in the data storage method shown in FIG. 4, and in sending the indication messages to the N receiving nodes corresponding to the N target data blocks as described in step 407.
  • the storage node provided in the embodiment of the present application is used to execute the data storage method in FIG. 4 above, so that the same effect as the above data storage method can be achieved.
  • FIG. 8 shows another possible composition of the storage node involved in the foregoing embodiments.
  • the storage node includes a processing module 71 and a communication module 72.
  • the processing module 71 is configured to control and manage the action of the storage node.
  • For example, the processing module 71 is configured to support the storage node in performing step 303, step 307, and step 309 in FIG. 3, in deleting, as described in step 311, the correspondence between the sub-logical addresses of the N target data blocks and the mirror nodes from the first correspondence and deleting the N target data blocks from its own cache, in recording the correspondence between the logical address of the mirror data and the receiving node, and/or in other processes of the techniques described herein.
  • Communication module 72 is used to support communication of storage nodes with other network entities, such as hosts, other storage nodes in a distributed RAID system.
  • For example, the communication module 72 is configured to support the storage node in performing step 301, step 305, step 306, step 308, and step 310 in FIG. 3, in sending the indication messages to the N mirror nodes corresponding to the N target data blocks as described in step 311, in performing step 401, step 404, and step 406 shown in FIG. 4, and in sending the indication messages to the N receiving nodes corresponding to the N target data blocks as described in step 407.
  • the storage node may further include a storage module 73 for storing program code and data of the storage node.
  • The storage module 73 is configured to support the storage node in performing step 302 and step 304 in FIG. 3 and step 402 shown in FIG. 4.
  • The processing module 71 can be the processor or controller in FIG. 2; it can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • The processor can also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • The communication module 72 can be the communication interface or the like in FIG. 2.
  • The storage module 73 can be the memory in FIG. 2.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation there may be another division manner, for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a readable storage medium.
  • Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including a number of instructions that cause a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a data storage method and device in the technical field of computers, solving the problem of low system I/O bandwidth caused by data being sent twice across nodes. A particular solution is as follows: a receiving node receives a first write request comprising data to be written and a logical address of the data to be written, stores the data to be written in its own cache, determines a mirror node according to the logical address of the data to be written, and records a first correspondence, the first correspondence being a mapping relationship between the logical address of the data to be written and the mirror node; and the receiving node sends the data to be written and the logical address of the data to be written to the mirror node, selects N target data blocks from its own cache according to the first correspondence, the mirror nodes of the N target data blocks being different from each other, and sends a notification message respectively to the N mirror nodes corresponding to the N target data blocks.
PCT/CN2018/113837 2017-11-03 2018-11-02 Data storage method and device WO2019086016A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711073029.2 2017-11-03
CN201711073029.2A CN109753225B (zh) Data storage method and device

Publications (1)

Publication Number Publication Date
WO2019086016A1 true WO2019086016A1 (fr) 2019-05-09

Family

ID=66332466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113837 WO2019086016A1 (fr) Data storage method and device

Country Status (2)

Country Link
CN (2) CN111666043A (fr)
WO (1) WO2019086016A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502507B (zh) * 2019-08-29 2022-02-08 上海达梦数据库有限公司 Management system, method, device, and storage medium for a distributed database
CN116107516B (zh) * 2023-04-10 2023-07-11 苏州浪潮智能科技有限公司 Data writing method and apparatus, solid-state drive, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5410667A (en) * 1992-04-17 1995-04-25 Storage Technology Corporation Data record copy system for a disk drive array data storage subsystem
CN101504594A (zh) * 2009-03-13 2009-08-12 杭州华三通信技术有限公司 一种数据存储方法和装置
CN103544045A (zh) * 2013-10-16 2014-01-29 南京大学镇江高新技术研究院 一种基于hdfs的虚拟机镜像存储系统及其构建方法
CN106155937A (zh) * 2015-04-07 2016-11-23 龙芯中科技术有限公司 缓存访问方法、设备和处理器

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6532527B2 (en) * 2000-06-19 2003-03-11 Storage Technology Corporation Using current recovery mechanisms to implement dynamic mapping operations
JP4270371B2 (ja) * 2003-05-09 2009-05-27 インターナショナル・ビジネス・マシーンズ・コーポレーション 記憶システム、制御装置、制御方法、及び、プログラム
US7996608B1 (en) * 2005-10-20 2011-08-09 American Megatrends, Inc. Providing redundancy in a storage system
CN101646994B (zh) * 2006-12-06 2016-06-15 才智知识产权控股公司(2) 利用内存库交错管理固态存储器的命令的装置、系统及方法
US8103904B2 (en) * 2010-02-22 2012-01-24 International Business Machines Corporation Read-other protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
KR20130064521A (ko) * 2011-12-08 2013-06-18 삼성전자주식회사 데이터 저장 장치 및 그것의 데이터 관리 방법
CN103186554B (zh) * 2011-12-28 2016-11-23 阿里巴巴集团控股有限公司 分布式数据镜像方法及存储数据节点
US9778856B2 (en) * 2012-08-30 2017-10-03 Microsoft Technology Licensing, Llc Block-level access to parallel storage
CN103797770B (zh) * 2012-12-31 2015-12-02 华为技术有限公司 一种共享存储资源的方法和系统
CN103761058B (zh) * 2014-01-23 2016-08-17 天津中科蓝鲸信息技术有限公司 Raid1和raid4混合结构网络存储系统及方法
CN104484130A (zh) * 2014-12-04 2015-04-01 北京同有飞骥科技股份有限公司 一种横向扩展存储系统的构建方法
US9785575B2 (en) * 2014-12-30 2017-10-10 International Business Machines Corporation Optimizing thin provisioning in a data storage system through selective use of multiple grain sizes
CN107748702B (zh) * 2015-06-04 2021-05-04 华为技术有限公司 一种数据恢复方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5410667A (en) * 1992-04-17 1995-04-25 Storage Technology Corporation Data record copy system for a disk drive array data storage subsystem
CN101504594A (zh) * 2009-03-13 2009-08-12 杭州华三通信技术有限公司 一种数据存储方法和装置
CN103544045A (zh) * 2013-10-16 2014-01-29 南京大学镇江高新技术研究院 一种基于hdfs的虚拟机镜像存储系统及其构建方法
CN106155937A (zh) * 2015-04-07 2016-11-23 龙芯中科技术有限公司 缓存访问方法、设备和处理器

Also Published As

Publication number Publication date
CN109753225B (zh) 2020-06-02
CN111666043A (zh) 2020-09-15
CN109753225A (zh) 2019-05-14

Similar Documents

Publication Publication Date Title
US10324843B1 (en) System and method for cache management
CN108459826B (zh) 一种处理io请求的方法及装置
US20160371186A1 (en) Access-based eviction of blocks from solid state drive cache memory
US9507720B2 (en) Block storage-based data processing methods, apparatus, and systems
CN108064374B (zh) 一种数据访问方法、装置和系统
US20180267739A1 (en) Data migration method and apparatus applied to computer system, and computer system
WO2019000950A1 (fr) Procédé de gestion de fragment et appareil de gestion de fragment
WO2017140262A1 (fr) Technique de mise à jour de données
CN109445687B (zh) 一种数据存储方法以及协议服务器
US20160170841A1 (en) Non-Disruptive Online Storage Device Firmware Updating
WO2019184012A1 (fr) Procédé d'écriture de données, serveur client, et système
WO2014190501A1 (fr) Procédé de récupération de données, dispositif de stockage et système de stockage
WO2019086016A1 (fr) Procédé et dispositif de stockage de données
US20230163789A1 (en) Stripe management method, storage system, stripe management apparatus, and storage medium
US10983930B1 (en) Efficient non-transparent bridge (NTB) based data transport
JP5893028B2 (ja) キャッシングに対応したストレージ装置上における効率的なシーケンシャルロギングのためのシステム及び方法
US8930626B1 (en) Cache management system and method
US20180307427A1 (en) Storage control apparatus and storage control method
US11157198B2 (en) Generating merge-friendly sequential IO patterns in shared logger page descriptor tiers
CN117149062A (zh) 一种磁带损坏数据的处理方法以及计算装置
JP7067256B2 (ja) データ転送装置およびデータ転送方法
JP6958928B2 (ja) ストレージ装置、ストレージ管理方法、及びプログラム
CN112988034B (zh) 一种分布式系统数据写入方法及装置
US20210311654A1 (en) Distributed Storage System and Computer Program Product
US10782891B1 (en) Aggregated host-array performance tiering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18873128

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18873128

Country of ref document: EP

Kind code of ref document: A1