CN115495010A - Data access method, device and storage system

Data access method, device and storage system

Info

Publication number
CN115495010A
CN115495010A (application CN202111196880.0A)
Authority
CN
China
Prior art keywords
storage
storage nodes
data
nodes
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111196880.0A
Other languages
Chinese (zh)
Inventor
冉欢
陈雷明
姜兆普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN115495010A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G06F 3/0611 Improving I/O performance in relation to response time
    • G06F 3/0613 Improving I/O performance in relation to throughput
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data access method is provided. A storage client obtains N slices, according to a redundancy algorithm, from data written by an access device into a storage space, and sends the N slices to corresponding storage nodes among N storage nodes, where the storage space supports append-only writes. The storage client receives write-success responses returned by W of the N storage nodes, where N and W are positive integers and N is greater than W, and then returns a data-write-success response to the access device.

Description

Data access method, device and storage system
Technical Field
The present application relates to the field of storage technologies, and in particular, to a data access method, an apparatus, and a storage system.
Background
In a distributed system, a storage client writes data to a plurality of storage nodes and returns a write-success response to the upstream access device only after all of those storage nodes have reported success. Because of network latency between the storage client and the storage nodes, the nodes return their write-success responses at different times, which stretches the latency of the entire write flow and degrades both the write-request performance and the throughput of the storage system.
Disclosure of Invention
The present application provides a data access method, a data access apparatus, and a storage system that reduce data access latency and improve storage access performance and throughput.
A first aspect provides a data access method. In the method, a storage client obtains N slices, according to a data redundancy algorithm, from data written by an access device into a storage space, and sends the N slices to corresponding storage nodes among N storage nodes, where the storage space supports append-only writes. The storage client receives write-success responses returned by W of the N storage nodes, where N and W are positive integers and N is greater than W, and then returns a data-write-success response to the access device. The data redundancy algorithm is an erasure coding (EC) algorithm or a multi-copy algorithm. With this scheme, the storage client does not need all N storage nodes to return write-success responses; it can treat the write as successful as soon as W write-success responses have been received, which reduces access latency and improves the write performance and throughput of the storage system.
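As an illustration only, the W-of-N write path described above can be sketched as follows in Python. The `send_slice` helper, the thread pool, and the timeout value are assumptions made for the sketch and are not part of the claimed method.

```python
import concurrent.futures

def quorum_write(slices, nodes, w, send_slice, timeout=5.0):
    """Send one slice to each of the N storage nodes and treat the stripe write
    as successful as soon as W nodes acknowledge, without waiting for all N."""
    assert len(slices) == len(nodes)
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(nodes))
    futures = [pool.submit(send_slice, node, piece)
               for node, piece in zip(nodes, slices)]
    successes = 0
    try:
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                ok = fut.result()          # True means the node returned write success
            except Exception:
                ok = False                 # a failed send counts as no acknowledgement
            if ok:
                successes += 1
                if successes >= w:         # W acknowledgements are enough,
                    return True            # so report success to the access device now
    except concurrent.futures.TimeoutError:
        pass                               # slow nodes catch up later via write logs
    finally:
        pool.shutdown(wait=False)          # do not block on the remaining N - W sends
    return successes >= w
```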
In one possible implementation of the first aspect, the W storage nodes are located in different availability zones (AZ).
In a possible implementation of the first aspect, the storage system is a centralized storage system or a storage array, and the storage nodes are disk enclosures or controller nodes.
In one possible implementation manner of the first aspect of the present invention, the storage system is a distributed storage system, and the storage node is a storage server.
In one possible implementation of the first aspect, the method further includes: the storage client reads the data from the N storage nodes and obtains read results returned by R of the N storage nodes, where R is a positive integer, R is less than N, and R + W is greater than N; the storage client then determines the data with the latest version from the read results returned by the R storage nodes. Because the sum of the number of nodes returning read results and the number of nodes that returned write-success responses is greater than N, the nodes returning read results necessarily include at least one node storing the latest data. This mechanism therefore lets the storage client quickly conclude that reads and writes have succeeded while also guaranteeing data consistency.
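The read side can be sketched in the same spirit. The `(version, data)` result shape and the `read_slice` helper below are assumptions used only to show how collecting R results with R + W > N lets the client return the newest version.

```python
def quorum_read(nodes, r, read_slice):
    """Collect read results from R of the N storage nodes and return the data
    with the newest version; with R + W > N at least one of the R nodes holds
    the latest successfully written data."""
    results = []
    for node in nodes:
        try:
            version, data = read_slice(node)   # assumed helper returning (version, data)
        except IOError:
            continue                           # skip nodes that do not answer
        results.append((version, data))
        if len(results) >= r:                  # R answers are enough
            break
    if len(results) < r:
        raise IOError("fewer than R storage nodes answered the read")
    _, latest_data = max(results, key=lambda vd: vd[0])
    return latest_data
```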
In one possible implementation of the first aspect, the method further includes: before the N slices are sent to the corresponding storage nodes among the N storage nodes, it is determined that a write-success response can be returned to the access device once W storage nodes have written successfully, so that the storage client does not have to wait for write-success responses from all N storage nodes.
In a possible implementation of the first aspect, sending the N slices to the corresponding storage nodes among the N storage nodes specifically includes sending N slice-write commands to the memories of the network interface cards of the corresponding storage nodes; a hard disk in each storage node fetches the write command from the memory of its network interface card, so that the slice is stored on the hard disk of the storage node.
In one possible implementation of the first aspect, to improve data reliability, the storage client and any storage node that has not returned a write-success response record a write log for the slice. In one case, the storage client stores the write logs of all slices and each storage node stores the write logs of the slices it has received; when a slice has been written successfully to the hard disk of a storage node, the storage client and that storage node delete the corresponding log entry. In another case, the storage client stores only the logs of slices that have not yet been written successfully to the hard disks of the storage nodes, and a storage node that has not successfully written a slice to its hard disk stores the log of that slice. A log entry records information about the slice, such as the identifier of the storage space to which the slice belongs, the write position, and the version number.
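For illustration, the log entry described above can be modelled as a small record; the field names are assumptions chosen to match the information the text says a log carries (storage-space identifier, write position, version number).

```python
from dataclasses import dataclass

@dataclass
class SliceWriteLog:
    """Per-slice write log kept until the slice is confirmed on the node's hard disk."""
    space_id: str    # identifier of the storage space the slice belongs to
    offset: int      # write position of the slice within the storage space
    version: int     # version number of the slice

# Illustrative bookkeeping: the storage client (or node) drops an entry once the
# corresponding slice is reported as written to the hard disk.
pending_logs: dict = {}

def on_slice_written(space_id: str, offset: int) -> None:
    pending_logs.pop((space_id, offset), None)
```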
In one implementation of the first aspect, the storage client can, based on the write result of the hard disk in each storage node, continue writing to the hard disk of any storage node that has not returned a write-success response, and it keeps the log of the slice corresponding to that hard disk until the storage node returns success. In another implementation, the catch-up write can be performed by the storage node that has not returned a write-success response: when the hard disk of such a storage node receives a new slice to write, it determines from the slice's address that an earlier address holds no slice, that is, that a slice-write operation was not executed successfully, and the storage node starts a data catch-up procedure. The catch-up operation can replay the storage node's own log, fetch the corresponding slice from other storage nodes in the same partition, or fetch the slice from the storage client. This scheme ensures that every slice is eventually written to the hard disk, preventing lost slices and holes in the storage space.
In one implementation of the first aspect, when the amount of data written to the storage space reaches a threshold or some other condition is met, the storage space is changed to a sealed (seal) state, that is, no more data is written to it. Before the storage space is sealed, the storage client needs to determine whether every storage node has returned a write-success response; if the hard disk of some storage node has not returned success, the data catch-up operation is started. The catch-up can also be initiated by the storage node itself; this is not limited in the embodiments of the present application.
In one implementation of the first aspect, the storage client may also carry timestamp information in each slice sent to a storage node. The storage node can use the timestamps to determine the order of the slices and sort the received slices in its cache queue.
In a second aspect, the present invention provides a data storage apparatus, and in one implementation, the data storage apparatus includes various units, which are used to implement various functions of a storage client in various implementations of the first aspect of the present invention. In another implementation, the data storage device includes an interface and a processor, the interface is in communication with the processor, and the processor is configured to implement various functions of the storage client in various implementations of the first aspect of the invention.
In a third aspect, the present invention provides a storage system, which includes a storage client and a storage node, wherein the storage client is configured to implement functions in various implementations of the first aspect of the present invention.
In a fourth aspect, the present invention provides a computer program product comprising computer program instructions for implementing the functions of the storage client in the various implementations of the first aspect of the present invention.
In a fifth aspect, the present invention provides a computer-readable storage medium containing computer program instructions for implementing the functions of the storage client in the various implementations of the first aspect of the present invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below.
FIG. 1 is a schematic diagram of a memory system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a memory array controller according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a server architecture in a distributed storage system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a data storage process of a storage space according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of striping according to an embodiment of the present invention;
FIG. 7a is a schematic diagram of a write operation according to an embodiment of the present invention;
FIG. 7b is a diagram illustrating a read operation according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a data storage device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, the storage system in this embodiment of the present invention may be a storage array (for example, an 18000-series or V6-series storage array). The storage array includes a controller 101 and a plurality of SSDs; the SSDs may be located in a disk enclosure of the storage array. As shown in FIG. 2, the controller 101 includes a central processing unit (CPU) 201, a memory 202, and an interface 203. The memory 202 stores computer program instructions, and the CPU 201 executes the computer program instructions in the memory 202 to perform operations on the storage system such as management, data access, and data recovery. In addition, to save the computing resources of the CPU 201, a field-programmable gate array (FPGA) or other hardware may also be used to perform all of the operations of the CPU 201 in this embodiment, or the FPGA or other hardware and the CPU 201 may each perform part of those operations. For ease of description, the embodiments of the present invention collectively refer to the combination of the CPU 201 and the memory 202, as well as the variants above, as a processor; the processor communicates with the interface 203. The interface 203 may be a network interface card (NIC) or a host bus adapter (HBA).
Furthermore, the storage system in this embodiment of the present invention may also be a distributed storage system (for example, a 100D-series distributed storage system), and the following description uses such a system as an example. As shown in FIG. 3, the distributed block storage system includes a plurality of servers, such as server 1, server 2, server 3, server 4, server 5, and server 6, which communicate with one another over InfiniBand, Ethernet, or the like. In practical applications, the number of servers in the distributed storage system may be increased as required; this is not limited in the embodiments of the present invention.
The servers of the distributed storage system have the structure shown in FIG. 4. As shown in FIG. 4, each server in the distributed storage system includes a central processing unit (CPU) 401, a memory 402, an interface 403, an SSD 1, an SSD 2, and an SSD 3. The memory 402 stores computer program instructions, and the CPU 401 executes the program instructions in the memory 402 to perform the corresponding operations. The interface 403 may be a hardware interface, such as a network interface card (NIC) or a host bus adapter (HBA), or may be a program interface module. In addition, to save the computing resources of the CPU 401, a field-programmable gate array (FPGA) or other hardware may also be used to perform the corresponding operations instead of the CPU 401, or the FPGA or other hardware and the CPU 401 may perform them together. For ease of description, the embodiments of the present invention collectively refer to the CPU 401 and the memory 402, to the FPGA or other hardware replacing the CPU 401, or to the combination of the two, as a processor. In the distributed storage system, the server responsible for storage management is referred to as the storage manager; specifically, the storage manager performs storage space management, data access management, and the like.
The storage system provides storage space for an access device, where the access device may be a host or a server. The access device accesses the storage space provided by the storage system, typically through an interface provided by a storage client. The storage client may be located on the access device or in the storage system, and it presents the storage space provided by the storage system to the access device.
In this embodiment of the present invention, the storage space is a segment of storage space that supports append-only writes, that is, data is written into the storage space forward in address order. In append-only mode, when data already written to the storage space is modified, the modified data must be written sequentially to new addresses in the storage space, and the data originally written becomes invalid. The storage client presents the storage space to the access device and converts the access device's accesses to the storage space into accesses to the storage nodes.
The write flow in the storage system according to an embodiment of the present invention is described next. FIG. 5 shows a schematic diagram of the write flow for a storage space. The storage client receives a write request sent by the access device; the write request carries data and usually also a storage space identifier and an address. The storage client writes the data at the position indicated by the address, appending it in accordance with the append-only property of the storage space, so every position before that address already holds data. This embodiment is described using an erasure coding (EC) algorithm to write the data of the storage space to the storage nodes. The storage client divides the data stored in the storage space into fixed-size data slices in the address order of the storage space. The EC stripe in this embodiment has a length of 6, that is, 6 slices per stripe: 4 data slices and 2 parity slices, all of the same size. As shown in FIG. 6, the client divides the data in the storage space into data slices 0 through 11 in address order. Data slices 0 through 3 belong to stripe 0, which generates parity slices P01 and P02 from them; data slices 4 through 7 belong to stripe 1, which generates parity slices P11 and P12; data slices 8 through 11 belong to stripe 2, which generates parity slices P21 and P22. The storage client determines the storage location of each slice in a stripe, that is, the hardware in a storage node, according to a routing table, also called a partition view, provided by the storage manager. Usually, the stripes of one storage space belong to the same partition, and the stripes of the same partition are distributed across the hardware of the same set of storage nodes; that is, slices at the same position in different stripes of the same partition are located on the hard disk of the same storage node.
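The striping step can be sketched as follows, assuming 4 data slices and 2 parity slices per stripe as in the example above. A real implementation would use an erasure code such as Reed-Solomon for the two parity slices; the byte-wise XOR below is only a placeholder so the sketch stays self-contained.

```python
SLICE_SIZE = 4096      # fixed slice size, value chosen only for illustration
DATA_SLICES = 4        # data slices per stripe
PARITY_SLICES = 2      # parity slices per stripe (stripe length N = 6)

def xor_parity(data_slices):
    """Placeholder parity: byte-wise XOR of the data slices."""
    out = bytearray(SLICE_SIZE)
    for s in data_slices:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)

def build_stripes(data: bytes):
    """Split the append-only data into fixed-size slices in address order and
    group every DATA_SLICES of them into a stripe with PARITY_SLICES parity slices."""
    padded = data + b"\x00" * (-len(data) % (SLICE_SIZE * DATA_SLICES))
    slices = [padded[i:i + SLICE_SIZE] for i in range(0, len(padded), SLICE_SIZE)]
    stripes = []
    for start in range(0, len(slices), DATA_SLICES):
        data_part = slices[start:start + DATA_SLICES]
        parity = [xor_parity(data_part) for _ in range(PARITY_SLICES)]
        stripes.append(data_part + parity)    # 6 slices per stripe
    return stripes
```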
In this embodiment of the present invention, the storage manager manages the storage space and the data redundancy within it, and also creates the partition view. Storage space management includes allocating, creating, and deleting storage space. Data redundancy management covers the data redundancy algorithm used to store the data of the storage space on the storage nodes, for example an EC algorithm or a multi-copy algorithm. As shown in FIG. 7, the storage manager creates the partition view to manage the relationship between a storage space and the partition to which it belongs, and maintains the relationship between the partition and the hard disks in the storage nodes according to the redundancy algorithm. Usually, each hard disk in a storage node corresponds to one storage process or storage thread, which stores one slice of the partition corresponding to the storage space. Specifically, as shown in FIG. 7, the storage manager sends the partition view, that is, the relationship between the storage space and its partition and the relationship between the partition and the hard disks in the storage nodes, to the storage client and the corresponding storage nodes.
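The partition view can be pictured as two small maps, one from storage space to partition and one from partition to the hard disks that store each slice position; the identifiers below are purely illustrative.

```python
# Partition view distributed by the storage manager (illustrative contents).
space_to_partition = {"space-42": "partition-7"}

# For each partition, the (node, disk) that stores slice position i of every stripe.
partition_to_disks = {
    "partition-7": [("node0", "ssd1"), ("node1", "ssd1"), ("node2", "ssd2"),
                    ("node3", "ssd1"), ("node4", "ssd3"), ("node5", "ssd2")],
}

def route_slice(space_id: str, slice_index: int):
    """Return the (node, disk) that stores slice position slice_index of the
    stripes belonging to the given storage space."""
    partition = space_to_partition[space_id]
    return partition_to_disks[partition][slice_index]

# Slices at the same position in different stripes of the same partition land on
# the same disk, e.g. slice 0 of every stripe of space-42 goes to node0/ssd1.
print(route_slice("space-42", 0))   # ('node0', 'ssd1')
```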
The storage client writes the slices of a stripe to the hard disks in the corresponding storage nodes according to the partition view. In one implementation, the storage client sends the slices to the storage node where the primary hard disk is located, and that storage node forwards the slices belonging to hard disks of other storage nodes to those nodes according to the mapping between the partition and the hard disks in the partition view. In another implementation, the storage client sends each slice directly to the storage node whose hard disk holds the corresponding slice of the stripe. The storage client may send a slice to a hard disk by means of remote direct memory access (RDMA): it sends a command to write the data slice into the memory of the storage node, and the hard disk fetches the command from the storage node's memory to store the slice. In yet another implementation, the queue of a hard disk in a storage node is stored in the memory of the storage node's network interface card; the storage client sends the command to write the data slice into the network interface card's memory, and the hard disk fetches the command from that memory to store the slice.
In this embodiment of the present invention, different storage nodes may be located in different availability zones (Available Zones, AZ). Taking the storage nodes shown in FIG. 6 as an example, storage nodes 0 and 1 are located in AZ0, storage nodes 2 and 3 in AZ1, and storage nodes 4 and 5 in AZ2.
To guarantee data reliability and data-access consistency, after the storage client receives a write request from the access device it usually ensures that the data of the storage space has been written successfully to the corresponding hard disks of the storage nodes before returning a data-write-success response to the access device. In this process, however, the storage client has to wait for writes to the hard disks of multiple storage nodes to complete, which increases access latency and degrades access performance and throughput.
The embodiments of the present invention aim to improve the access performance of the storage system, allow the data-access consistency to be customized, and reduce the time the storage client waits on the corresponding hard disks in the storage nodes. Read and write consistency levels are defined in terms of the number of copies N (the stripe length). When an EC algorithm is used as the data redundancy algorithm, the number of copies N of a stripe is the sum of the number of data slices and the number of parity slices; when a multi-copy algorithm is used, N is the number of copies.
Write consistency means that one write operation by the storage client, that is, one stripe write, is considered successful once the updates of W copies have completed successfully; the storage client may then return a write-success response to the access device. Read consistency means that for one read operation the storage client reads R copies and returns the data with the latest version.
When W + R is greater than N, the storage system as a whole guarantees strong consistency from the storage client's point of view. As shown in FIG. 7a, taking the EC algorithm as an example, when the storage client writes the corresponding slices to the hard disks of the storage nodes, each storage node records the operation in the form of a log. The log records information about the slice, for example the identifier of the storage space to which the slice belongs, the write position, and the version number, and is kept until the write to the hard disk succeeds. The storage client may likewise keep a log recording the information of the corresponding slices, and may record the write state of each storage node's hard disk, for example whether the write succeeded. Take the stripe length N = 6 shown in FIG. 6, with W = 5 and R = 2, as an example. When the storage client writes the slices of a stripe and receives write-success responses returned by 5 storage nodes, it determines that the stripe write succeeded and returns a write-success response to the access device; the storage client and the storage node that has not returned success record the corresponding logs. As shown in FIG. 7b, when the storage client reads the stripe, two storage nodes return read-success responses; the versions of the two returned results are compared to determine the latest version and thus the latest data, and the storage client returns the requested range of the latest data to the access device.
The same holds for the multi-copy algorithm: when W + R is greater than N, the storage system guarantees strong consistency for the storage client. Still taking the stripe shown in FIG. 7a as an example, with the multi-copy algorithm the stripe length is 6, that is, the number of copies N = 6, with W = 4 and R = 3. When the storage client writes the copies to the hard disks of the corresponding 6 storage nodes and receives write-success responses returned by 4 storage nodes, it determines that the write succeeded and returns a data-write-success response to the access device; the storage client and the storage nodes that have not returned success record the corresponding logs. As shown in FIG. 7b, when the storage client reads the data, R storage nodes return read results, and the versions of the returned results are compared to determine the latest version and thus the latest data.
According to the embodiments of the present invention, once a consistency mechanism has been determined for the storage space, the storage client can return access-success responses to the access device more quickly, which reduces the access latency of the storage system and improves its performance and throughput. The consistency mechanism may be set by the storage manager, in which case the storage client follows the mechanism the storage manager has set; alternatively, the consistency mechanism of a storage space may be set by the access device to which the storage space is allocated.
In one implementation, the storage client can, based on the write result of the hard disk in each storage node, continue writing to the hard disk of any storage node that has not returned a write-success response, and it keeps the log of the corresponding slice until that storage node returns success. In another implementation, the catch-up write is performed by the storage node that has not returned a write-success response: when the hard disk of such a storage node receives a new slice to write, it determines from the slice's address that an earlier address holds no slice, that is, that a slice-write operation was not executed successfully, and the storage node starts a data catch-up procedure, as sketched below. The catch-up operation can replay the storage node's own log, fetch the corresponding slice from other storage nodes in the same partition, or fetch the slice from the storage client.
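The node-driven catch-up hinges on noticing a gap in the append-only address sequence when a new slice arrives. A minimal sketch follows; the `replay_from_log`, `fetch_from_peers`, and `fetch_from_client` recovery hooks are assumptions standing in for the three catch-up sources listed above.

```python
def on_new_slice(node_state, write_offset, slice_size,
                 replay_from_log, fetch_from_peers, fetch_from_client):
    """On receiving a slice at write_offset, detect whether an earlier slice of
    the append-only space was never written on this node and, if so, catch up."""
    expected = node_state["next_offset"]       # where the next slice should land
    if write_offset > expected:                # hole before this slice: a write was missed
        for missing in range(expected, write_offset, slice_size):
            # try the catch-up sources in the order the text lists them
            if not replay_from_log(missing):
                if not fetch_from_peers(missing):
                    fetch_from_client(missing)
    node_state["next_offset"] = max(expected, write_offset + slice_size)
```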
In this embodiment of the present invention, when the amount of data written to the storage space reaches a threshold or some other condition is met, that is, no more data will be written, the storage space is changed to a sealed (seal) state. Before the storage space is sealed, the storage client needs to determine whether every storage node has returned a write-success response; if the hard disk of some storage node has not returned success, the data catch-up operation is started. The catch-up can also be initiated by the storage node itself; this is not limited in the embodiments of the present invention.
In this embodiment of the present invention, the storage client may also carry timestamp information in each slice sent to a storage node. The storage node can use the timestamps to determine the order of the slices and sort the received slices in its cache queue.
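Sorting received slices by the client-supplied timestamp can be sketched with a small priority queue; the `(timestamp, offset, payload)` tuple layout is an assumption made for the sketch.

```python
import heapq

class CacheQueue:
    """Order received slices by the timestamp the storage client attached to them."""

    def __init__(self):
        self._heap = []

    def push(self, timestamp: int, offset: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (timestamp, offset, payload))

    def pop_oldest(self):
        """Return the slice with the smallest timestamp, regardless of arrival order."""
        return heapq.heappop(self._heap)
```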
To implement the functions of the storage client in the foregoing embodiments of the present invention, an embodiment of the present invention provides a data storage apparatus. As shown in FIG. 8, the apparatus includes a dividing unit 801, a transceiving unit 802, and a returning unit 803. The dividing unit 801 is configured to obtain N slices, according to a redundancy algorithm, from the data written into the storage space by the access device. The transceiving unit 802 is configured to send the N slices to the corresponding storage nodes among the N storage nodes and to receive write-success responses returned by W of the N storage nodes, where the storage space supports append-only writes, N and W are positive integers, and N is greater than W. The returning unit 803 is configured to return a data-write-success response to the access device. In one implementation, the W storage nodes are located in different availability zones (AZ).
Optionally, the data storage apparatus shown in FIG. 8 further includes: a reading unit, configured to read the data from the N storage nodes and obtain read results returned by R of the N storage nodes, where R is a positive integer, R is less than N, and R + W is greater than N; and a first determining unit, configured to determine the data with the latest version from the read results returned by the R storage nodes.
Optionally, the data storage apparatus shown in FIG. 8 further includes: a second determining unit, configured to determine, before the N slices are sent to the corresponding storage nodes among the N storage nodes, that a write-success response can be returned to the access device once the W storage nodes have written successfully.
The data storage apparatus shown in FIG. 8 is further used to implement the other implementations of the foregoing embodiments of the present invention; this is not limited in the embodiments of the present invention.
Another implementation of a storage client may refer to the architecture shown in fig. 4, comprising a processor and an interface. The interface is in communication with the processor, and the processor is used for realizing the functions of the storage clients in the embodiment of the invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium or computer program product. Based on such understanding, the technical solution of the present invention or a part thereof, which essentially contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium and includes several computer instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing computer instructions, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (16)

1. A method of data access, the method comprising:
the storage client obtains N slices from the data written into the storage space by the access device according to a redundancy algorithm, and respectively sends the N slices to corresponding storage nodes in the N storage nodes; wherein the storage space supports data append writing;
the storage client receives write success responses returned by W storage nodes in the N storage nodes; wherein N and W are positive integers and N is greater than W;
and the storage client returns a data writing success response to the access equipment.
2. The method of claim 1,
the W storage nodes are located in different availability zones (AZ).
3. The method according to claim 1 or 2, characterized in that the method further comprises:
the storage client reads the data from the N storage nodes;
the storage client obtains returned reading results from R storage nodes in the N storage nodes; wherein R is a positive integer, and R is less than N, and R + W is greater than N;
and the storage client determines the data with the latest version from the read results returned by the R storage nodes.
4. The method according to any one of claims 1-3, further comprising:
determining, before the N slices are respectively sent to the corresponding storage nodes among the N storage nodes, that a write success response is to be returned to the access device once the W storage nodes have written successfully.
5. A data storage apparatus, comprising:
a dividing unit for obtaining N slices, according to a redundancy algorithm, from data written into a storage space by an access device;
a transceiving unit for sending the N slices to corresponding storage nodes among the N storage nodes respectively, and
receiving write success responses returned by W storage nodes among the N storage nodes; wherein the storage space supports data append writing, N and W are positive integers, and N is greater than W; and
a return unit for returning a data write success response to the access device.
6. The apparatus of claim 5,
the W storage nodes are located in different availability zones (AZ).
7. The apparatus of claim 5 or 6, further comprising:
a reading unit, configured to read the data from the N storage nodes, and obtain returned reading results from R storage nodes of the N storage nodes; wherein R is a positive integer, and R is less than N, and R + W is greater than N;
and the first determining unit is used for determining the latest data in the version from the reading results returned by the R storage nodes.
8. The apparatus of any of claims 5-7, further comprising:
the second determining unit is further configured to determine that write-back to the access device is successful if the W storage nodes are successfully written before the N slices are respectively sent to corresponding storage nodes of the N storage nodes.
9. A data storage device comprising a processor and an interface; the processor is in communication with the interface; wherein the processor is configured to:
obtaining N slices, according to a redundancy algorithm, from data written into the storage space by an access device;
respectively sending the N slices to corresponding storage nodes in the N storage nodes; wherein the storage space supports data append writing;
receiving write success responses returned by W storage nodes in the N storage nodes; wherein N and W are positive integers and N is greater than W;
and returning a data write success response to the access device.
10. The apparatus of claim 9,
the W storage nodes are located in different availability zones (AZ).
11. The apparatus of claim 9 or 10, wherein the processor is further configured to:
reading the data from the N storage nodes, and obtaining returned reading results from R storage nodes in the N storage nodes; wherein R is a positive integer, and R is less than N, and R + W is greater than N;
and determining the data with the latest version from the read results returned by the R storage nodes.
12. The apparatus according to any of claims 9-11, wherein the processor is further configured to:
determining, before the N slices are respectively sent to the corresponding storage nodes among the N storage nodes, that a write success response is to be returned to the access device once the W storage nodes have written successfully.
13. A storage system, comprising a storage client and N storage nodes, wherein the storage client is configured to: obtaining N slices, according to a redundancy algorithm, from the data written into the storage space by the access device;
respectively sending the N slices to corresponding storage nodes in the N storage nodes; wherein the storage space supports data append writing;
receiving write success responses returned by W storage nodes in the N storage nodes; wherein N and W are positive integers and N is greater than W;
and returning a data write success response to the access device.
14. The storage system of claim 13,
the W storage nodes are located in different availability zones (AZ).
15. The storage system according to claim 13 or 14, wherein the storage client is further configured to:
reading the data from the N storage nodes, and obtaining returned reading results from R storage nodes in the N storage nodes; wherein R is a positive integer, and R is less than N, and R + W is greater than N;
and determining the data with the latest version from the read results returned by the R storage nodes.
16. The storage system according to any of claims 13-15, wherein the storage client is further configured to:
determining, before the N slices are respectively sent to the corresponding storage nodes among the N storage nodes, that a write success response is to be returned to the access device once the W storage nodes have written successfully.
CN202111196880.0A 2021-06-17 2021-10-14 Data access method, device and storage system Pending CN115495010A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110672272 2021-06-17
CN2021106722726 2021-06-17

Publications (1)

Publication Number Publication Date
CN115495010A true CN115495010A (en) 2022-12-20

Family

ID=84465460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111196880.0A Pending CN115495010A (en) 2021-06-17 2021-10-14 Data access method, device and storage system

Country Status (1)

Country Link
CN (1) CN115495010A (en)

Similar Documents

Publication Publication Date Title
US9697219B1 (en) Managing log transactions in storage systems
CN108459826B (en) Method and device for processing IO (input/output) request
US9910777B2 (en) Enhanced integrity through atomic writes in cache
US20150058291A1 (en) Log-structured storage device format
CN109445687B (en) Data storage method and protocol server
WO2021017782A1 (en) Method for accessing distributed storage system, client, and computer program product
US11314454B2 (en) Method and apparatus for managing storage device in storage system
US10482029B1 (en) Distributed shared memory paging
US11340829B1 (en) Techniques for log space management involving storing a plurality of page descriptor (PDESC) page block (PB) pairs in the log
CN111949210A (en) Metadata storage method, system and storage medium in distributed storage system
US11921695B2 (en) Techniques for recording metadata changes
US20190114076A1 (en) Method and Apparatus for Storing Data in Distributed Block Storage System, and Computer Readable Storage Medium
US9798638B2 (en) Systems and methods providing mount catalogs for rapid volume mount
US11775194B2 (en) Data storage method and apparatus in distributed storage system, and computer program product
WO2019127017A1 (en) Method and apparatus for managing storage device in storage system
US11210236B2 (en) Managing global counters using local delta counters
WO2021046693A1 (en) Data processing method in storage system, device, and storage system
US20210311654A1 (en) Distributed Storage System and Computer Program Product
CN115495010A (en) Data access method, device and storage system
WO2019086016A1 (en) Data storage method and device
CN112000289B (en) Data management method for full flash storage server system and related components
US10809918B2 (en) Cache performance on a storage system
CN112988034B (en) Distributed system data writing method and device
US11782842B1 (en) Techniques for reclaiming dirty cache pages
US20230132442A1 (en) Method for processing data by using intermediate device, computer system, and intermediate device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication