CN107181773B - Data storage and data management method and device of distributed storage system - Google Patents

Info

Publication number
CN107181773B
Authority
CN
China
Prior art keywords
writable
file
blocks
client
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610134600.6A
Other languages
Chinese (zh)
Other versions
CN107181773A (en)
Inventor
刘善阳
张海勇
石超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610134600.6A
Publication of CN107181773A
Application granted
Publication of CN107181773B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

A data storage and data management method and device for a distributed storage system. The method comprises the following steps: a client obtains, from a master server, information on a plurality of writable blocks of a first file and information on the plurality of data servers corresponding to each of those writable blocks; when the client performs a single write operation on the first file, it initiates write operations on the plurality of writable blocks concurrently and writes the data to the corresponding data servers; and if the client detects that all the data servers corresponding to any one of the writable blocks have reported a successful write, the client determines that the write operation on the first file has succeeded. The method and device can effectively reduce the latency glitch rate of write operations on a single file.

Description

Data storage and data management method and device of distributed storage system
Technical Field
The present invention relates to data storage, and more particularly, to a data storage and data management method, a client, and a master server in a distributed storage system.
Background
Cloud computing technology is now becoming increasingly popular, and distributed storage is one of the most fundamental problems to be solved by cloud computing. The distributed storage system stores data on a plurality of physically dispersed storage nodes, performs unified management and distribution on resources of the nodes, provides a file access interface for a user, and solves the problem of limitation of a local storage system on file size, file quantity, number of opened files and the like.
A typical distributed storage system is deployed in three parts, as shown in FIG. 1, and generally includes a master server, data servers, and clients. The master server is also called a metadata server, namespace management module, management server, and so on, and may be deployed redundantly in practice. Data servers are also referred to as data storage servers, storage nodes, block servers, data management modules, and the like. The client may be any of a variety of application servers or an end user. Common distributed storage systems include the Google File System (GFS), the Taobao File System (TFS), MooseFS, and the like.
Taking GFS as an example, a Client (Client) is used to provide various interfaces for users of the distributed storage system; the data server (Chunkservers) is used for specifically managing user data; the Master server (Master) is used to manage metadata (metadata).
In GFS, all metadata of a user file is stored in Master, including the namespace and block (Chunk) namespace of the file, the mapping from file to block, and the data server corresponding to the block, etc. To improve the throughput of accessing the metadata, all the metadata is cached in the memory of the master server. The size of the memory occupied by the master control server is in direct proportion to the number of files, and the physical memory size of the machine where the master control server is located determines the number of files that can be stored by the cluster.
A file is divided into fixed-size blocks, and the master server keeps only one writable block per file. Each data server stores blocks as Linux files on its local disk. For reliability, each block is replicated on multiple data servers; a data server that stores a copy of a writable block is referred to as a data server corresponding to that writable block. By default, 3 copies are kept, each stored on one data server. The client never reads or writes file data through the master server. It only asks the master server which data servers it should contact, caches that information for a period of time, and interacts directly with the data servers in subsequent operations.
A typical write flow of an existing distributed storage system is described below, taking GFS as an example. After receiving a user request to write data to a certain file (hereinafter referred to as the first file), the client proceeds as follows:
Step one: the client requests the writable block information of the first file from the master server;
Step two: the master server returns to the client the information of the one writable block of the first file and the information of its corresponding data servers;
The information of the writable block may be identification information of the writable block, such as a block handle (Chunk-handle). The information of a data server may be its address, or other information from which the address can be obtained, such as a name or an identifier.
A single write operation by the client on the first file then proceeds as follows:
Step three: the client initiates the write of the first file's data to the data servers and waits for them to return the result;
Step four: if all the data servers report a successful write, the client determines that the write operation on the first file has succeeded and reports success to the user.
Through research, the inventors found that the existing scheme suffers from a high rate of write latency glitches (abnormally slow writes). Because the distributed storage system stores multiple copies, generally 3 or more, on multiple data servers, a distributed write succeeds only after every data server has written the data successfully. The latency of one distributed write is therefore determined by the slowest copy. For an online system where latency matters, the overall latency distribution of 3 copies is much worse than that of a single copy. For example, if the proportion of single-copy writes with a delay under 20 ms is 0.9, then (treating the copies as independent) the proportion of 3-copy writes with a delay under 20 ms is 0.9^3 = 0.729, because all 3 copies must finish in under 20 ms for the 3-copy write to finish in under 20 ms. In terms of glitches, if any one of the 3 copies suffers a glitch, the whole distributed write is a glitch. Existing distributed storage systems therefore have a high write latency glitch rate.
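The 3-copy figure in the example above is simply the single-copy probability multiplied across the copies. A minimal Python sketch of that arithmetic follows; it assumes the copies' latencies are independent, and the function name is made up for illustration.

```python
def p_all_copies_within(p_single: float, copies: int) -> float:
    """Probability that every copy is written within the deadline.

    Assumes each copy meets the deadline independently with probability p_single.
    """
    return p_single ** copies


# Single-copy writes finish within 20 ms with probability 0.9.
print(round(p_all_copies_within(0.9, 3), 3))  # 0.729
```

The more copies a write must wait for, the lower this probability becomes, which is the glitch problem the background describes.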
Disclosure of Invention
In view of this, the present invention provides the following.
A data storage method of a distributed storage system, comprising:
a client obtains, from a master server, information on a plurality of writable blocks of a first file and information on a plurality of data servers corresponding to each of the plurality of writable blocks;
when performing a single write operation on the first file, the client initiates write operations on the plurality of writable blocks concurrently and writes the data to the corresponding data servers;
and if the client detects that all the data servers corresponding to any one of the plurality of writable blocks have reported a successful write, the client determines that the write operation on the first file has succeeded.
A client in a distributed storage system, comprising a write data module, wherein the write data module comprises:
an information acquisition unit, configured to obtain, from a master server, information on a plurality of writable blocks of a first file and information on a plurality of data servers corresponding to each of the plurality of writable blocks;
an operation initiating unit, configured to initiate write operations on the plurality of writable blocks concurrently when a write operation is performed on the first file, and to write the data to the corresponding data servers;
and an operation judging unit, configured to determine, after the operation initiating unit has initiated write operations on the writable blocks concurrently, that the write operation on the first file has succeeded if all the data servers corresponding to any one of the writable blocks report a successful write.
A data management method of a distributed storage system, comprising:
a master server allocates a plurality of writable blocks for a first file, and allocates a plurality of data servers for each of the plurality of writable blocks;
the master server stores writable block information of the first file, wherein the writable block information comprises information on the plurality of writable blocks allocated to the first file and information on the plurality of data servers allocated to each of the plurality of writable blocks.
A master server in a distributed storage system, comprising:
an allocation module, configured to allocate a plurality of writable blocks for a first file and to allocate a plurality of data servers for each of the plurality of writable blocks;
and a metadata storage module, configured to store writable block information of the first file, wherein the writable block information comprises information on the plurality of writable blocks and information on the plurality of data servers corresponding to each of the plurality of writable blocks.
This scheme allows a single file to have a plurality of writable blocks at the same time, initiates write operations on those writable blocks concurrently, and treats the write operation on the file as successful as soon as the write operation on any one writable block succeeds. This can effectively reduce the glitch rate of write operations on a single file.
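As a rough illustration of why accepting the first successful writable block helps, the following Python sketch estimates the chance that at least one block finishes within a deadline. It rests on assumptions not stated in the source: all blocks are started at the same moment, their latencies are independent, and each block meets the deadline with the same probability. The function name is hypothetical.

```python
def p_any_block_within(p_block: float, blocks: int) -> float:
    """Probability that at least one of `blocks` writable blocks meets the deadline.

    Assumes the blocks are started simultaneously and behave independently,
    each meeting the deadline with probability p_block.
    """
    return 1.0 - (1.0 - p_block) ** blocks


p_block = 0.9 ** 3  # a 3-copy block, using the 0.9 per-copy figure from the background
for m in (1, 2, 3):
    print(m, round(p_any_block_within(p_block, m), 3))
```

Under these assumptions, two writable blocks raise the within-deadline probability from 0.729 to about 0.927. With staggered starts, as in the embodiment described later, the gain depends on the interval between starts, so these numbers are only an upper-bound style estimate.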
Drawings
FIG. 1 is a schematic diagram of a network architecture of a distributed storage system;
FIG. 2 is a flow chart of a data storage method according to an embodiment of the invention;
FIG. 3 is a block diagram of a client according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method for managing data according to an embodiment of the present invention;
FIG. 5 is a block diagram of a master server according to the second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
Embodiment one
The present embodiment provides a data storage method of a distributed storage system, as shown in fig. 2, including:
Step 110, the client acquires, from the master server, information on a plurality of writable blocks of the first file and information on a plurality of data servers corresponding to each of the plurality of writable blocks;
In this embodiment, on the master-server-facing side, the client extends the protocol for acquiring a file's writable block information so that it supports multiple writable blocks per file. The client sends the master server a query request for the writable block information of the first file and obtains that information in response. The query request may be triggered by a user request, by a timer, or by other means. After obtaining the writable block information, the client caches it locally; if the client goes down and restarts, the information can be obtained again from the master server.
In one example, assuming the number of writable blocks configured for the first file is 3, the client acquires the block handles of 3 writable blocks of the first file from the master server. Each block handle corresponds to the addresses of 3 data servers, and each data server stores 1 copy of the block.
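The information returned in this example can be pictured as a small data structure. The Python sketch below is illustrative only: the class names, the string form of the block handles, and the server addresses are all assumptions, since the source does not define a wire format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WritableBlock:
    handle: str                # block handle (hypothetical string form)
    servers: Tuple[str, ...]   # data server addresses, one copy of the block on each


@dataclass(frozen=True)
class WritableBlockInfo:
    file_name: str
    blocks: Tuple[WritableBlock, ...]


# 3 writable blocks for the first file, each with 3 data servers (made-up values).
info = WritableBlockInfo(
    file_name="first_file",
    blocks=(
        WritableBlock("chunk-0", ("ds1:9000", "ds2:9000", "ds3:9000")),
        WritableBlock("chunk-1", ("ds4:9000", "ds5:9000", "ds6:9000")),
        WritableBlock("chunk-2", ("ds7:9000", "ds8:9000", "ds9:9000")),
    ),
)
print(len(info.blocks), [len(b.servers) for b in info.blocks])
```

The client would cache a structure like this locally and refresh it from the master server after a restart.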
Step 120, when the client performs a write operation on the first file once, the client initiates a write operation on the plurality of writable blocks in a concurrent manner, and writes data to the corresponding data server;
Here, initiating write operations concurrently means that the next write operation can be started before the previous one has completed, i.e., multiple write operations can be in progress at the same time; it does not mean that all write operations are started at the same moment. In this embodiment, write operations are initiated on the writable blocks one after another, which reduces the amount of data written while still effectively reducing glitches, and prevents any one writable block from becoming a write hot spot. The invention does not exclude starting all the write operations simultaneously.
In this embodiment, the client initiates write operations on the plurality of writable blocks concurrently as follows: the client initiates write operations on the M writable blocks one after another. After initiating a write operation on one writable block, the client stops initiating further write operations if any of the set conditions is met; otherwise, when the configured time interval T expires, it initiates a write operation on the next writable block. The set conditions comprise one or more of the following:
Condition one: the write operation on the first file succeeds within the configured time interval T;
Condition two: write operations have been initiated on C of the plurality of writable blocks, where C is the configured maximum concurrency of writable blocks, 2 ≤ C ≤ M, and M is the configured number of writable blocks of the file;
Condition three: write operations have been initiated on all of the plurality of writable blocks.
Condition two limits the maximum number of writable blocks written concurrently, so as to achieve the best balance between glitch rate and write volume.
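The staggered start described above can be sketched in Python with a thread pool. This is a minimal illustration, not the patented implementation: `write_block` is a hypothetical callable that returns True only when every data server of the given block has acknowledged the write, and the choice to move on immediately when all in-flight writes have already failed is an assumption made here.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence, Tuple


def staggered_write(
    blocks: Sequence[str],
    write_block: Callable[[str], bool],   # hypothetical: True if all replicas acked
    interval_t: float,                    # time interval T, in seconds
    max_concurrency_c: int,               # maximum concurrency C
) -> Tuple[bool, int]:
    """Return (success, number of blocks on which a write was initiated)."""
    limit = min(max_concurrency_c, len(blocks))   # conditions two and three
    pool = cf.ThreadPoolExecutor(max_workers=max(limit, 1))
    pending = set()
    launched = 0
    try:
        while launched < limit:
            pending.add(pool.submit(write_block, blocks[launched]))
            launched += 1
            # After the last allowed launch there is no timer; just wait for results.
            deadline = None if launched == limit else time.monotonic() + interval_t
            while pending:
                timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
                done, pending = cf.wait(pending, timeout=timeout,
                                        return_when=cf.FIRST_COMPLETED)
                if any(f.exception() is None and f.result() for f in done):
                    return True, launched          # condition one, or a later success
                if not done:
                    break                          # interval T expired: start next block
            # If every in-flight write failed early, continue without waiting
            # for T (an assumption of this sketch).
        return False, launched
    finally:
        pool.shutdown(wait=False)                  # do not block on slow stragglers
```

A call such as `staggered_write(["chunk-0", "chunk-1", "chunk-2"], write_block, interval_t=0.01, max_concurrency_c=2)` would start the second block only if the first has not succeeded within 10 ms, and would never start the third.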
In this embodiment, the interface protocol between the client and the user is extended so that the time interval T and the maximum concurrency C of writable blocks can be configured through that interface; that is, the client is configured according to the user's configuration instruction. T is less than the write latency threshold that defines a glitch. Here, the configuration parameters used in a write operation on the first file, such as the number of writable blocks of the file, are those applicable to the first file. Specifically, parameters may be configured for the first file, in which case the configuration instruction specifies the first file or a file set containing it and the parameters are saved as attribute information of the first file; or they may be configured uniformly for all files, in which case the configuration instruction does not need to specify a file. Note that C and M above, and N below, denote numbers of writable blocks and are therefore integers; this is not repeated each time they appear.
Step 130, if the client detects that all data servers corresponding to any writable block in the plurality of writable blocks return successful data writing, the client determines that the write operation on the first file is successful.
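The success rule in step 130 combines "all" within a block and "any" across blocks. A minimal Python sketch follows; the function name and the dictionary shapes used to hold server lists and acknowledgements are hypothetical.

```python
from typing import Mapping, Sequence


def file_write_succeeded(
    block_servers: Mapping[str, Sequence[str]],   # block handle -> its data servers
    acks: Mapping[str, Mapping[str, bool]],       # block handle -> server -> acked?
) -> bool:
    """True if, for at least one block, every one of its data servers acked the write."""
    return any(
        len(servers) > 0
        and all(acks.get(handle, {}).get(server, False) for server in servers)
        for handle, servers in block_servers.items()
    )


servers = {"chunk-0": ["ds1", "ds2", "ds3"], "chunk-1": ["ds4", "ds5", "ds6"]}
acks = {
    "chunk-0": {"ds1": True, "ds2": True},                 # ds3 has not answered yet
    "chunk-1": {"ds4": True, "ds5": True, "ds6": True},    # fully acknowledged
}
print(file_write_succeeded(servers, acks))  # True
```

A server that has not yet answered counts as not acknowledged, so a block only qualifies once all of its copies are confirmed.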
If the write operation is performed on the first file according to the request of the user, the client returns a prompt of the success of the write operation to the user after judging that the write operation on the first file is successful.
In this embodiment, the operation of writing data to the data server by the client may be performed in the same manner as in the prior art, and the data server does not need to be modified.
In this embodiment, an interface between the client and the user and an interface between the client and the main control server are extended to support the user to query and configure the number of writable blocks of the file through the client, and accordingly, the method of this embodiment includes the following query process and configuration process:
the query process comprises the following steps: the client sends a query request for the number of writable blocks of the first file or all files to the master control server according to the user instruction; and the client receives the query result returned by the master control server.
The configuration process comprises the following steps: the client sends a configuration request for the number of the writable blocks of the first file or all files to the master control server according to the user instruction, wherein the configuration request carries the number of the writable blocks of the files configured for the first file or all files by the user; and the client receives the configuration result returned by the master control server.
This embodiment further provides a client in a distributed storage system, including a data writing module, as shown in fig. 3, where the data writing module includes:
an information obtaining unit 10, configured to obtain, from a master server, information of a plurality of writable blocks of a first file and information of a plurality of data servers corresponding to each of the plurality of writable blocks;
an operation initiating unit 20, configured to, when performing a write operation on the first file once, initiate a write operation on the plurality of writable blocks in a concurrent manner, and write data to a corresponding data server;
an operation determining unit 30, configured to, after the operation initiating unit initiates a write operation on the multiple writable blocks in a concurrent manner, determine that the write operation on the first file is successful if all data servers corresponding to any writable block in the multiple writable blocks return that the write data is successful.
Optionally,
the operation initiating unit initiates write operations on the plurality of writable blocks concurrently as follows: it initiates write operations on the M writable blocks one after another; after initiating a write operation on one writable block, it stops initiating write operations if it receives a stop notification from the operation judging unit, and otherwise initiates a write operation on the next writable block when the configured time interval T expires;
after the operation initiating unit initiates a write operation on a writable block, the operation judging unit sends a stop notification to the operation initiating unit if it determines that any of the set conditions is met; the set conditions comprise one or more of the following:
the write operation on the first file succeeds within the configured time interval T;
write operations have been initiated on C of the plurality of writable blocks, where C is the configured maximum concurrency of writable blocks, 2 ≤ C ≤ M, and M is the configured number of writable blocks of the file;
write operations have been initiated on all of the plurality of writable blocks.
Optionally,
the client further comprises a user interface module, configured to set the time interval T and/or the maximum concurrency C of writable blocks for the first file or for all files according to a configuration instruction from the user, where T is less than the write latency threshold that defines a glitch.
Optionally,
the client further comprises a master server interface module, which comprises at least one of the following units:
the writable block quantity query unit is used for sending a query request for the writable block quantity of the first file or all files to the master control server according to a user instruction and receiving a query result returned by the master control server;
and the writable block quantity configuration unit is used for sending a configuration request for the number of the writable blocks of the first file or all files to the master control server according to a user instruction and receiving a configuration result returned by the master control server, wherein the configuration request carries the number of the writable blocks of the files configured for the first file or all files by the user.
Embodiment two
The present embodiment relates to a data management method of a distributed storage system, as shown in fig. 4, including:
Step 210, the master server allocates a plurality of writable blocks for the first file, and allocates a plurality of data servers for each of the plurality of writable blocks;
In this embodiment, when the master server allocates data servers to one of the writable blocks, it preferentially selects a plurality of different data servers from among those not yet allocated to any of the writable blocks. Keeping the data servers of the writable blocks distinct preserves the relative independence of the writes to each block, which effectively reduces glitches and prevents the failure of a single data server from causing writes to several writable blocks to fail. If the system has too few writable data servers to avoid reuse, the allocation is still allowed to succeed.
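One way to express this preference is to pick unused data servers first and fall back to reused ones only when necessary. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, the server list is assumed to contain no duplicates, and real placement would also weigh load, racks, and free space.

```python
from typing import List, Sequence, Set


def allocate_servers(available: Sequence[str], used: Set[str], copies: int) -> List[str]:
    """Pick `copies` distinct servers, preferring ones not used by any other block."""
    fresh = [s for s in available if s not in used]
    reused = [s for s in available if s in used]
    chosen = (fresh + reused)[:copies]   # fall back to reuse if fresh servers run out
    if len(chosen) < copies:
        raise ValueError("not enough data servers for the configured number of copies")
    return chosen


def allocate_blocks(available: Sequence[str], num_blocks: int, copies: int) -> List[List[str]]:
    """Allocate servers for several writable blocks of one file."""
    used: Set[str] = set()
    result = []
    for _ in range(num_blocks):
        servers = allocate_servers(available, used, copies)
        used.update(servers)
        result.append(servers)
    return result


print(allocate_blocks([f"ds{i}" for i in range(9)], num_blocks=3, copies=3))
print(allocate_blocks(["a", "b", "c", "d"], num_blocks=2, copies=3))
```

With nine servers the three blocks get disjoint server sets; with only four servers the second block reuses two of them, matching the rule that allocation may still succeed when reuse cannot be avoided.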
The number of data servers allocated for each writable block is determined according to the number of copies of the configuration.
Step 220, the master control server stores writable block information of the first file, where the writable block information includes information of a plurality of writable blocks allocated to the first file and information of a plurality of data servers allocated to each writable block of the plurality of writable blocks.
In this embodiment, the master server stores the number of writable blocks of the file and the information on the plurality of writable blocks allocated to the first file in non-volatile memory, so that this information persists and is not lost after a crash and restart. The master server may also keep this information in a cache at the same time. The master server stores the information on the data servers corresponding to each of the writable blocks in volatile memory, such as a cache, to improve access speed; after a crash and restart, this information can be recovered from the recorded metadata.
In this embodiment, the main control server receives, through an interface with the client, a query request of the client for information on writable blocks of the first file, and responds, and in terms of corresponding services and protocols, adds support for multiple writable blocks of one file. The method specifically comprises the following steps:
after receiving a query request of a client for information of writable blocks of a first file, a master control server judges whether the number N of the writable blocks allocated to the first file reaches a configured number M of the writable blocks of the file, wherein N is more than or equal to 0, M is more than or equal to 2, and N is less than or equal to M:
if yes, returning a query result to the client, wherein the query result carries the stored writable block information of the first file;
if not, distributing M-N writable blocks for the first file, distributing a plurality of data servers for each writable block in the M-N writable blocks, updating the stored writable block information of the first file, returning a query result to the client, and carrying the updated writable block information of the first file in the query result.
After a writable block is fully written, the writable block is no longer a writable block. The main control server can update the writable block information of the file according to the notification of the client, delete the fully written blocks from the stored writable blocks, and modify the number of the allocated writable blocks. The main control server may allocate a new writable block for the first file after a writable block of the first file is full or is unavailable due to other reasons, or may allocate a new writable block for the first file after receiving an inquiry request from the client, so that the current number of writable blocks of the first file reaches the configured number of writable blocks of the file.
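The query handling and the replacement of full blocks described above amount to topping the file's writable blocks back up to M. The Python sketch below illustrates this under stated assumptions: the class and method names are hypothetical, block handles are generated strings, and data servers are chosen by a simple round-robin placeholder rather than the preference rule of this embodiment.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    handle: str
    servers: Tuple[str, ...]


class MasterSketch:
    def __init__(self, servers: List[str], configured_m: int = 3, copies: int = 3):
        self.configured_m = configured_m             # M: writable blocks per file
        self.copies = copies                         # data servers per block
        self.writable: Dict[str, List[Block]] = {}   # file name -> writable blocks
        self._ids = itertools.count()
        self._round_robin = itertools.cycle(servers) # placeholder placement policy

    def _allocate_block(self) -> Block:
        handle = f"chunk-{next(self._ids)}"
        servers = tuple(next(self._round_robin) for _ in range(self.copies))
        return Block(handle, servers)

    def query_writable_blocks(self, file_name: str) -> List[Block]:
        """If N < M, allocate M - N new blocks, then return the stored information."""
        blocks = self.writable.setdefault(file_name, [])
        for _ in range(self.configured_m - len(blocks)):
            blocks.append(self._allocate_block())
        return list(blocks)

    def mark_full(self, file_name: str, handle: str) -> None:
        """Drop a block that is fully written; it is no longer writable."""
        blocks = self.writable.get(file_name, [])
        self.writable[file_name] = [b for b in blocks if b.handle != handle]


master = MasterSketch([f"ds{i}" for i in range(9)])
first = master.query_writable_blocks("first_file")
master.mark_full("first_file", first[0].handle)
second = master.query_writable_blocks("first_file")
print([b.handle for b in first], [b.handle for b in second])
```

Here the replacement block is allocated lazily on the next query; the text also allows allocating it as soon as a block becomes full or unavailable.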
In this embodiment, the main control server receives, through an interface with the client, a query and configuration request of the client for the number of writable blocks of the file, and responds, specifically:
The processing flow of the query request comprises the following steps: the master server receives a client's query request for the number of writable blocks of the first file or of all files, and looks up the stored number of writable blocks of the file; the master server then returns a query result carrying that number to the client.
The processing flow of the configuration request comprises the following steps: the master server receives a client's configuration request for the number of writable blocks of the first file or of all files, the configuration request carrying the number of writable blocks that the user has configured for the first file or for all files; the master server then updates the stored number of writable blocks of the file to the number given in the configuration request, and returns a configuration result to the client indicating whether the configuration succeeded.
This embodiment further provides a master server in a distributed storage system, as shown in fig. 5, including:
an allocating module 50, configured to allocate a plurality of writable blocks for a first file, and allocate a plurality of data servers for each writable block in the plurality of writable blocks;
the metadata storage module 60 is configured to store writable block information of the first file, where the writable block information includes information of the plurality of writable blocks and information of a plurality of data servers corresponding to each of the plurality of writable blocks.
Optionally,
the master server further comprises a client interface module, wherein the client interface module comprises a writable block information query unit:
the writable block information query unit is used for notifying the distribution module after receiving a query request of a client for the first file writable block information;
after receiving the notification of the writable block information query unit, the allocation module determines whether the number N of writable blocks allocated to the first file reaches a configured number M of writable blocks of the file, where N is greater than or equal to 0, M is greater than or equal to 2, and N is less than or equal to M: if yes, notifying the client interface module; if not, allocating M-N writable blocks to the first file, allocating a plurality of data servers to each writable block in the M-N writable blocks, updating the writable block information of the first file in the metadata storage module, and then notifying the writable block information query unit;
the writable block information query unit is further configured to search the writable block information of the first file in the metadata storage module after receiving the notification of the allocation module, and return the searched writable block information of the first file to the client by carrying the writable block information in a query result.
Optionally,
the allocation module allocates a plurality of data servers for each of the plurality of writable blocks, comprising: when a data server is allocated to a writable block in the plurality of writable blocks, a plurality of different data servers are preferentially selected from the data servers which are not allocated to any writable block in the plurality of writable blocks to be allocated to the writable block.
Optionally,
the metadata storage module is also used for storing the configured number of the writable blocks of the file;
the client interface module further comprises a writable block number query unit and/or a writable block number configuration unit, wherein:
the writable block quantity query unit is configured to receive a client's query request for the number of writable blocks of the first file or of all files, read the number of writable blocks of the file from the metadata storage module, and return it to the client in a query result;
the writable block quantity configuration unit is configured to receive a configuration request of the client for the number of the writable blocks of the first file or all files, update the number of the writable blocks of the file stored in the metadata storage module to the number of the writable blocks of the file configured by the user in the configuration request, and then return a configuration result to the client to indicate whether the configuration is successful.
Optionally,
the metadata storage module stores the number of writable blocks of the file and information of a plurality of writable blocks allocated to the first file in a nonvolatile memory, and stores information of a plurality of data servers corresponding to each writable block in the plurality of writable blocks in a volatile memory.
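The split between nonvolatile and volatile metadata might look like the sketch below. The class name `MetadataStore`, the JSON file used as the nonvolatile medium and the default block count are assumptions for illustration; this section does not say how the volatile block-to-server mapping is rebuilt after a restart, so the sketch simply starts with it empty.

```python
import json
import os


class MetadataStore:
    """Hypothetical metadata store: durable part on disk, server locations in memory."""

    def __init__(self, path):
        self.path = path
        # Nonvolatile part: configured block count and each file's writable blocks.
        self.durable = {"block_count": 2, "file_blocks": {}}
        # Volatile part: block id -> data servers; lost when the process exits.
        self.locations = {}
        if os.path.exists(path):
            with open(path) as f:
                self.durable = json.load(f)

    def _persist(self):
        with open(self.path, "w") as f:
            json.dump(self.durable, f)

    def set_block_count(self, m):
        self.durable["block_count"] = m
        self._persist()

    def add_block(self, file_name, block_id, servers):
        self.durable["file_blocks"].setdefault(file_name, []).append(block_id)
        self._persist()
        self.locations[block_id] = list(servers)   # kept in memory only
```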
In the above embodiment, a single file may have a plurality of writable blocks. For a write operation on the file, the client initiates write operations to a plurality of Chunks in a concurrent manner, and the write operation on the file is determined to be successful as soon as the write operation on any one Chunk succeeds. In addition, configuring the maximum concurrency number C of writable blocks and the time interval T allows a trade-off between performance and resources: higher concurrency and a shorter interval between write initiations reduce the glitch rate, but consume more network and disk resources.
The invention is further illustrated by examples of several applications.
Example 1
This example describes an exemplary data writing flow and configuration flow from the perspective of the system. The data writing flow comprises:
First, the Client is started and the user configures the data writing parameters of the first file: the maximum concurrency number C of writable blocks (Chunks), and the time interval T between initiating write operations to successive Chunks;
The maximum concurrency number C is greater than or equal to 2 and less than or equal to the number M of writable blocks of the first file.
Second, the Client sends a request for the writable block information of the first file to the master control server (Master);
Third, upon receiving the request, the Master looks up in memory the existing Chunk information of the first file and the configured number M of writable blocks of the file;
Fourth, the Master returns the writable block information of the first file to the Client:
In this step, the writable block information of the first file comprises information on M Chunks and information on the data servers (Chunkservers) corresponding to each Chunk. If the first file already has M Chunks, the writable block information is returned to the Client directly; if it has fewer than M, the Master first allocates new Chunks to the first file so that the number of newly allocated Chunks plus the number of existing Chunks equals M, and then returns the writable block information to the Client.
Fifth, the Client caches the received writable block information of the first file locally;
The Client may number the M Chunks of the first file so that write operations are initiated to them in numerical order.
Sixth, the Client receives a write request from the user and initiates write operations to the M Chunks one after another in a concurrent manner;
In this step, the Client first selects a Chunk and initiates a write operation to it by sending a write data request to all the Chunkservers corresponding to that Chunk. At the same time the Client starts its background checking logic, which, after the preset time interval T, checks whether either of the following conditions is met:
Condition one: all the Chunkservers corresponding to a Chunk have returned a successful data write;
Condition two: the number of Chunks to which a write operation has been initiated has reached the maximum concurrency number C.
If either condition is met, no further write operation is initiated; otherwise the Client initiates a write operation to the next Chunk and repeats the same check, until one of the conditions is met. If condition one is met, that is, the write operation on one Chunk has succeeded, the flow proceeds to the seventh step; if condition two is met, the flow proceeds to the eighth step.
Seventh, the Client determines that the write operation on the first file has succeeded and returns a success prompt to the user;
If data was being written to several Chunks before the write operation succeeded, the Client continues to receive the write results of the other Chunks after it has declared success. Whether those writes succeed or fail, they do not affect the success of the write operation and are transparent to the user.
Eighth, the Client may wait for a set length of time and, if no Chunk write operation has succeeded, return a corresponding prompt to the user.
When the Client finds that a Chunk is full, it can return to the second step, that is, send the Master a request for the writable block information of the first file.
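The sixth to eighth steps above can be sketched with a thread pool as follows. This is an illustration under stated assumptions, not the patented implementation: `hedged_write`, `write_chunk` (a caller-supplied function that returns True once every Chunkserver of a Chunk has acknowledged the data) and `final_wait` are hypothetical names. The sketch also moves on to the next Chunk early if every write started so far has already failed, a detail the text does not specify.

```python
import concurrent.futures as cf
import time


def _wait_for_success(pending, timeout):
    """Wait up to `timeout` seconds; report whether any pending write succeeded."""
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        done, pending = cf.wait(pending, timeout=remaining,
                                return_when=cf.FIRST_COMPLETED)
        if any(f.exception() is None and f.result() for f in done):
            return True, pending
    return False, pending


def hedged_write(chunks, write_chunk, max_concurrency, interval, final_wait):
    """Start writes to at most C Chunks, T seconds apart; succeed on the first success."""
    pool = cf.ThreadPoolExecutor(max_workers=max_concurrency)
    pending = set()
    try:
        # Condition two: never start more than C (max_concurrency) Chunk writes.
        for chunk in chunks[:max_concurrency]:
            pending.add(pool.submit(write_chunk, chunk))
            ok, pending = _wait_for_success(pending, interval)
            if ok:                      # condition one: one Chunk fully written
                return True
        # Eighth step: wait a set time for the writes still in flight.
        ok, _ = _wait_for_success(pending, final_wait)
        return ok
    finally:
        # Writes still in flight finish in the background; their results are ignored.
        pool.shutdown(wait=False)
```

If the first Chunk answers within T, no second write is ever started, so the extra network and disk load appears only on the slow tail of requests.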
In this example, the query and configuration flow for the number of writable blocks of the file is as follows:
First, the user calls a Client interface to obtain the currently configured number of writable blocks of the file;
Second, the Client sends a query request to the Master for the currently configured number of writable blocks of the file;
Third, upon receiving the request, the Master looks up the saved number M' of writable blocks of the file and returns M' to the Client;
Fourth, the Client returns the queried M' to the user;
Fifth, the user compares the current value M' with the target value M. If they are equal, no modification is needed and the flow ends; if they are not equal, the user calls the Client interface to configure the number of writable blocks of the file as M, and the flow proceeds to the sixth step;
The user can determine the target value M of the number of writable blocks of the file according to the required write throughput.
Sixth, the Client sends a configuration request to the Master carrying the number M of writable blocks of the file;
Seventh, upon receiving the configuration request, the Master updates the saved number of writable blocks of the file to M and returns a success result to the Client;
Eighth, the Client reports the successful change to the user.
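The query-then-configure flow above reduces to a compare-and-update, sketched below. `MasterStub`, `ensure_block_count` and the method names are hypothetical; rejecting values below 2 is an assumption derived from the constraint M ≥ 2 stated elsewhere in the description, since the text does not say how the Master treats invalid values.

```python
class MasterStub:
    """Hypothetical stand-in for the Master's block-count interface."""

    def __init__(self, block_count):
        self.block_count = block_count      # saved value M'

    def query_block_count(self):
        return self.block_count

    def configure_block_count(self, m):
        if m < 2:                           # assumed validation: M must be >= 2
            return False
        self.block_count = m
        return True


def ensure_block_count(master, target_m):
    """Query M'; reconfigure only if it differs from the target M."""
    current = master.query_block_count()
    if current == target_m:
        return True                         # equal: nothing to modify
    return master.configure_block_count(target_m)
```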
Example 2
This example uses concrete figures to illustrate how the glitch rate is reduced.
This example defines a write whose latency exceeds 100 ms as a glitch. For simplicity of calculation, assume that the maximum concurrency number C of writable blocks is 2 and that the time interval T between initiating write operations to successive writable blocks is 50 ms. Under the prior-art scheme, in which only one writable block is written, a typical write latency distribution is as follows:
proportion of writes with latency over 50 ms: 0.01;
proportion of writes with latency over 100 ms: 0.001.
With the optimization of this example there can be two concurrent write operations, denoted A and B: A is issued first, and B is issued 50 ms after A. A and B write to the data servers of two different writable blocks, and the data servers corresponding to the two Chunks differ from each other. In most cases, A and B can therefore be treated as two independent events. Since only 0.01 of writes take more than 50 ms, the additional B requests have a negligible effect on the write latency distribution, and A and B each follow the distribution given above.
After the optimization, the glitch proportion is again calculated with a write latency over 100 ms counted as a glitch. For an optimized write to exceed 100 ms, two conditions must hold at the same time: the latency of A exceeds 100 ms, and, because B is issued 50 ms after A, the latency of B exceeds 50 ms. The probability is therefore 0.001 × 0.01 = 0.00001. That is, the optimization lowers the write latency glitch rate from 0.001 to 0.00001, a reduction by a factor of 100.
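The calculation above generalizes to any C and T under the same independence assumption: the i-th write starts i·T after the first, so the overall write is a glitch only if every started write overruns its remaining budget. The function `glitch_probability` and the `tail` callback are hypothetical names for this sketch; the two tail values are the ones given in this example.

```python
def glitch_probability(tail, glitch_ms, interval_ms, concurrency):
    """Probability that all staggered writes miss the glitch threshold.

    tail(ms) -- probability that a single write takes longer than `ms`
    Assumes the writes are independent, as the example does.
    """
    p = 1.0
    for i in range(concurrency):
        budget = glitch_ms - i * interval_ms   # time left for the i-th write
        if budget <= 0:
            break          # issued too late to beat the threshold; factor of 1
        p *= tail(budget)
    return p


# Tail values taken from the example: P(>100 ms) = 0.001, P(>50 ms) = 0.01.
example_tail = {100: 0.001, 50: 0.01}.__getitem__
single = glitch_probability(example_tail, 100, 50, 1)   # prior-art case
hedged = glitch_probability(example_tail, 100, 50, 2)   # C = 2, T = 50 ms
```

With these figures `single` is 0.001 and `hedged` is approximately 0.00001, matching the factor of 100 stated above.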
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (20)

1. A data storage method of a distributed storage system, comprising:
a client obtains, from a master control server, information on a plurality of writable blocks of a first file and information on a plurality of data servers corresponding to each of the plurality of writable blocks;
each time the client performs a write operation on the first file, the client initiates write operations to the plurality of writable blocks in a concurrent manner and writes the data to the corresponding data servers;
and if the client detects that all the data servers corresponding to any one of the plurality of writable blocks have returned a successful data write, the client determines that the write operation on the first file has succeeded.
2. The method of claim 1, wherein:
the client initiates a write operation to the plurality of writable blocks in a concurrent manner, including:
the client side sequentially initiates write operation to the M writable blocks, after one writable block is initiated, if any condition in the set conditions is met, the client side does not continue to initiate write operation, otherwise, when the configured time interval T is up, the client side continues to initiate write operation to the next writable block;
wherein the set condition comprises one or more of the following conditions:
the write operation on the first file succeeds within the configured time interval T;
the number of writable blocks to which a write operation has been initiated reaches C, where C is the configured maximum concurrency number of writable blocks, C is greater than or equal to 2 and less than or equal to M, and M is the configured number of writable blocks of the file;
a write operation has been initiated to all of the writable blocks in the plurality of writable blocks.
3. The method of claim 2, wherein:
the time interval T and/or the maximum concurrency number C of writable blocks is configured by the client for the first file or for all files according to a configuration instruction of a user, and T is smaller than the write latency defined as a glitch.
4. A method as claimed in claim 1, 2 or 3, characterized by:
the method further comprises the following steps:
the client sends a query request for the number of writable blocks of the first file or all files to the master control server according to a user instruction;
and the client receives the query result returned by the master control server.
5. A method as claimed in claim 1, 2 or 3, characterized by:
the method further comprises the following steps:
the client sends a configuration request for the number of the writable blocks of the first file or all files to the master control server according to a user instruction, wherein the configuration request carries the number of the writable blocks of the files configured for the first file or all files by a user;
and the client receives a configuration result returned by the master control server.
6. A data management method of a distributed storage system, comprising:
a master control server allocates to a first file a plurality of writable blocks that can be written concurrently, and allocates a plurality of data servers to each of the plurality of writable blocks; the plurality of writable blocks are for a client to initiate write operations to in a concurrent manner, the write operation on the first file being determined to be successful when the write operation on any one writable block succeeds;
the master control server stores writable block information of the first file, wherein the writable block information comprises information on the plurality of writable blocks allocated to the first file and information on the plurality of data servers allocated to each of the plurality of writable blocks.
7. The method of claim 6, wherein:
after the master control server saves the writable block information of the first file, the method further includes:
the master control server receives a query request of a client for the first file writable block information;
the master control server determines whether the number N of writable blocks allocated to the first file has reached the configured number M of writable blocks of the file, wherein N is greater than or equal to 0, M is greater than or equal to 2, and N is less than or equal to M:
if yes, returning a query result to the client, wherein the query result carries the stored writable block information of the first file;
if not, allocating M-N writable blocks to the first file, allocating a plurality of data servers to each writable block in the M-N writable blocks, updating the stored writable block information of the first file, returning a query result to the client, and carrying the updated writable block information of the first file.
8. The method of claim 6, wherein:
the master server allocating a plurality of data servers for each of the plurality of writable blocks, comprising:
when the master control server allocates data servers to one of the plurality of writable blocks, it preferentially selects a plurality of different data servers from among the data servers not yet allocated to any of the plurality of writable blocks.
9. The method of claim 6, wherein:
the method further comprises the following steps:
the master control server receives a query request of a client for the number of the writable blocks of the first file or all files, and searches the number of the stored writable blocks of the files;
and the master control server returns the number of writable blocks of the file that it found to the client in a query result.
10. The method of claim 6, wherein:
the method further comprises the following steps:
the master control server receives a configuration request of a client for the number of the writable blocks of the first file or all files, wherein the configuration request carries the number of the writable blocks of the files configured for the first file or all files by the user;
and the master control server updates the saved number of writable blocks of the file to the number configured in the configuration request, and returns a configuration result to the client indicating whether the configuration succeeded.
11. The method of claim 7, 9 or 10, wherein:
the method further comprises the following steps:
and the master control server stores the number of the writable blocks of the file and the information of a plurality of writable blocks allocated to the first file in a nonvolatile memory, and stores the information of a plurality of data servers corresponding to each writable block in the plurality of writable blocks in a volatile memory.
12. A client in a distributed storage system, comprising a write data module, wherein the write data module comprises:
an information acquisition unit, configured to obtain, from the master control server, information on a plurality of writable blocks of the first file and information on a plurality of data servers corresponding to each of the plurality of writable blocks;
an operation initiating unit, configured to, when a write operation is performed on the first file, initiate write operations to the plurality of writable blocks in a concurrent manner and write the data to the corresponding data servers;
and an operation judging unit, configured to determine, after the operation initiating unit has initiated write operations to the plurality of writable blocks in a concurrent manner, that the write operation on the first file has succeeded if all the data servers corresponding to any one of the plurality of writable blocks have returned a successful data write.
13. The client of claim 12, wherein:
the operation initiating unit initiates a write operation to the plurality of writable blocks in a concurrent manner, including: sequentially initiating write operations to the M writable blocks, and after initiating a write operation to one writable block, if receiving a stop notification of the operation judging unit, not initiating the write operation any more, otherwise, when the configured time interval T is up, continuing to initiate the write operation to the next writable block;
the operation judging unit sends a stop notification to the operation initiating unit if judging that any one of the set conditions is satisfied after the operation initiating unit initiates a write operation to a writable block; wherein the set condition comprises one or more of the following conditions:
the write operation on the first file succeeds within the configured time interval T;
the number of writable blocks to which a write operation has been initiated reaches C, where C is the configured maximum concurrency number of writable blocks, C is greater than or equal to 2 and less than or equal to M, and M is the configured number of writable blocks of the file;
a write operation has been initiated to all of the writable blocks in the plurality of writable blocks.
14. The client of claim 13, wherein:
the client further comprises:
a user interface module, configured to configure the time interval T and/or the maximum concurrency number C of writable blocks for the first file or for all files according to a configuration instruction of a user, wherein T is smaller than the write latency defined as a glitch.
15. The client of claim 12, 13 or 14, wherein:
the client further comprises a master control server interface module, and the master control server interface module comprises:
a writable block quantity query unit, configured to send a query request for the number of writable blocks of the first file or of all files to the master control server according to a user instruction, and to receive the query result returned by the master control server; and/or
a writable block quantity configuration unit, configured to send a configuration request for the number of writable blocks of the first file or of all files to the master control server according to a user instruction, and to receive the configuration result returned by the master control server, wherein the configuration request carries the number of writable blocks of the file configured by the user for the first file or for all files.
16. A master server in a distributed storage system, comprising:
an allocation module, configured to allocate to a first file a plurality of writable blocks that can be written concurrently, and to allocate a plurality of data servers to each of the plurality of writable blocks; the plurality of writable blocks are for a client to initiate write operations to in a concurrent manner, the write operation on the first file being determined to be successful when the write operation on any one writable block succeeds;
and a metadata storage module, configured to store writable block information of the first file, wherein the writable block information comprises information on the plurality of writable blocks and information on the plurality of data servers corresponding to each of the plurality of writable blocks.
17. The master server of claim 16, wherein:
the master control server also comprises a client interface module, wherein the client interface module comprises a writable block information query unit:
the writable block information query unit is used for notifying the distribution module after receiving a query request of a client for the first file writable block information;
after receiving the notification of the writable block information query unit, the allocation module determines whether the number N of writable blocks allocated to the first file reaches a configured number M of writable blocks of the file, where N is greater than or equal to 0, M is greater than or equal to 2, and N is less than or equal to M: if yes, notifying the client interface module; if not, allocating M-N writable blocks to the first file, allocating a plurality of data servers to each writable block in the M-N writable blocks, updating the writable block information of the first file in the metadata storage module, and then notifying the writable block information query unit;
the writable block information query unit is further configured to search the writable block information of the first file in the metadata storage module after receiving the notification of the allocation module, and return the searched writable block information of the first file to the client by carrying the writable block information in a query result.
18. The master server of claim 16, wherein:
the allocation module allocates a plurality of data servers for each of the plurality of writable blocks, comprising: when a data server is allocated to a writable block in the plurality of writable blocks, a plurality of different data servers are preferentially selected from the data servers which are not allocated to any writable block in the plurality of writable blocks to be allocated to the writable block.
19. The master server of claim 17, wherein:
the metadata storage module is also used for storing the configured number of the writable blocks of the file;
the client interface module further comprises a writable block number query unit and/or a writable block number configuration unit, wherein:
the writable block quantity query unit is configured to receive a query request from the client for the number of writable blocks of the first file or of all files, read the number of writable blocks of the file from the metadata storage module, and return it to the client in a query result;
the writable block quantity configuration unit is configured to receive a configuration request of the client for the number of the writable blocks of the first file or all files, update the number of the writable blocks of the file stored in the metadata storage module to the number of the writable blocks of the file configured by the user in the configuration request, and then return a configuration result to the client to indicate whether the configuration is successful.
20. The master server of claim 17, 18 or 19, wherein:
the metadata storage module stores the number of writable blocks of the file and information of a plurality of writable blocks allocated to the first file in a nonvolatile memory, and stores information of a plurality of data servers corresponding to each writable block in the plurality of writable blocks in a volatile memory.
CN201610134600.6A 2016-03-09 2016-03-09 Data storage and data management method and device of distributed storage system Active CN107181773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610134600.6A CN107181773B (en) 2016-03-09 2016-03-09 Data storage and data management method and device of distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610134600.6A CN107181773B (en) 2016-03-09 2016-03-09 Data storage and data management method and device of distributed storage system

Publications (2)

Publication Number Publication Date
CN107181773A CN107181773A (en) 2017-09-19
CN107181773B true CN107181773B (en) 2020-12-25

Family

ID=59829595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610134600.6A Active CN107181773B (en) 2016-03-09 2016-03-09 Data storage and data management method and device of distributed storage system

Country Status (1)

Country Link
CN (1) CN107181773B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109510730B (en) * 2017-09-15 2022-04-26 阿里巴巴集团控股有限公司 Distributed system, monitoring method and device thereof, electronic equipment and storage medium
CN110659251B (en) * 2018-06-13 2023-03-21 阿里巴巴集团控股有限公司 Data processing method and system and electronic equipment
CN109756708B (en) * 2018-12-28 2021-05-14 深圳英飞拓智能技术有限公司 Continuous transmission method and device of audio and video data
CN109871365A (en) * 2019-01-15 2019-06-11 苏州链读文化传媒有限公司 A kind of distributed file system
CN114661240A (en) * 2022-03-30 2022-06-24 阿里巴巴(中国)有限公司 Data processing method and storage system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102387204A (en) * 2011-10-21 2012-03-21 中国科学院计算技术研究所 Method and system for maintaining consistency of cluster caching
CN102857554A (en) * 2012-07-26 2013-01-02 福建网龙计算机网络信息技术有限公司 Data redundancy processing method based on distributed storage system
CN102882983A (en) * 2012-10-22 2013-01-16 南京云创存储科技有限公司 Rapid data memory method for improving concurrent visiting performance in cloud memory system
CN103116552A (en) * 2013-03-18 2013-05-22 华为技术有限公司 Method and device for distributing storage space in distributed type storage system
CN103150123A (en) * 2011-11-18 2013-06-12 株式会社日立制作所 Volume copy management method on thin provisioning pool of storage subsystem

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140304452A1 (en) * 2013-04-03 2014-10-09 Violin Memory Inc. Method for increasing storage media performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant