CN105912612A - Distributed file system and data equilibrium distribution method orienting same - Google Patents

Info

Publication number
CN105912612A
Authority
CN
China
Prior art keywords
storage
module
data
client
storage server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610211732.4A
Other languages
Chinese (zh)
Other versions
CN105912612B (en)
Inventor
曹碧舟
张建
孟军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN ZHONGBO KECHUANG INFORMATION TECHNOLOGY Co Ltd
Tvzone Media Co Ltd
Original Assignee
SHENZHEN ZHONGBO KECHUANG INFORMATION TECHNOLOGY Co Ltd
Tvzone Media Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN ZHONGBO KECHUANG INFORMATION TECHNOLOGY Co Ltd, Tvzone Media Co Ltd
Priority to CN201610211732.4A
Publication of CN105912612A
Application granted
Publication of CN105912612B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed file system includes a client, a storage cluster module, and a management cluster module. The client includes a user input port and an intelligent selection algorithm module for evenly distributing stored data. The storage cluster module includes a plurality of storage servers, each of which includes a plurality of disks for storing the data that the client inputs through the user input port. The management cluster module includes a plurality of memory managers for maintaining and managing state data among all the modules of the distributed file system. The client, the storage cluster module, and the management cluster module are communicatively connected through a network protocol. The invention also includes a balanced data distribution method for the distributed file system. The distributed file system has a simple structure, is easy to operate, sends client data evenly to the storage devices of the storage cluster, and improves the degree of separation of the stored data across different storage zones.

Description

Distributed file system and data balanced distribution method oriented to same
Technical Field
The invention relates to the technical field of computers, in particular to a distributed file system and a data balanced distribution method oriented to the system.
Background
In recent years, distributed file systems have become a new trend in the development of computer technology. Rapidly emerging distributed storage systems face the problem of distributing petabyte-scale data among thousands of storage devices. Such systems need to distribute data and load so as to fully utilize the available resources and maximize system performance, while accommodating system growth and managing hardware failures. Most systems simply write new data, and a common problem is that data, once written, is rarely moved again. The system can also become unbalanced when storage is expanded, because newly added storage devices are either empty or hold only new data. Depending on the workload, either the old or the new devices may be busy, and the available resources are fully utilized only when both are used equally. A robust solution is to distribute all data randomly over the available storage devices. This yields a probabilistically balanced distribution and a uniform mix of old and new data. When a new storage device is added, a random sample of the existing data is migrated to it to keep the system balanced. The key advantage of this approach is that all devices are loaded equally, so the system performs well under any potential workload.
Furthermore, in a large storage system that provides a high level of parallelism and aggregate bandwidth, a large file is distributed randomly across a large number of available devices. However, distribution based on a simple hash cannot cope with changes in the number of devices and results in a massive reorganization of data. In addition, existing random allocation schemes spread data copies throughout the storage cluster, so the risk of data loss when storage devices fail is high.
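The reorganization cost of simple hash placement can be illustrated with a short experiment. This is a minimal sketch, not part of the invention: the placement rule `hash(key) mod device_count`, the SHA-256 based `stable_hash` helper, and the object names are all assumptions chosen for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def naive_placement(key: str, num_devices: int) -> int:
    """Simple hash placement: device index = hash(key) mod device count."""
    return stable_hash(key) % num_devices


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose device changes when the device count changes."""
    moved = sum(
        1 for k in keys
        if naive_placement(k, old_count) != naive_placement(k, new_count)
    )
    return moved / len(keys)


keys = [f"object-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 10, 11)
print(f"{fraction:.1%} of objects change device when growing from 10 to 11 devices")
```

Under a uniform hash, a key keeps its device only when its hash modulo 110 is below 10, so roughly 10 out of 11 objects move, whereas filling the new device evenly would require moving only about 1 in 11.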
Therefore, an intelligent selection algorithm has been developed: a scalable, pseudo-random algorithm for uniform data distribution, designed specifically for mapping data to storage devices in a distributed file system without relying on a directory in the storage cluster. Because distributed storage systems are inherently dynamic and scalable, the intelligent selection algorithm accommodates the addition and removal of storage devices while minimizing the movement of data within the storage distribution. The algorithm works with a wide range of data replication and reliability mechanisms and distributes data evenly according to a user-defined strategy, so that data copies are separated across different failure domains.
Disclosure of Invention
The invention provides a distributed file system and a data balanced distribution algorithm based on the same, which are used for solving the defects of the prior art, realizing the mapping between stored data and storage equipment, carrying out balanced distribution on the stored data according to a strategy defined by a user and improving the separation degree of the stored data in different storage areas.
The present invention solves the above technical problems, and provides the following technical solutions: a distributed file system comprises a client, a storage cluster module and a management cluster module; the client comprises a user input port for inputting storage data and an intelligent selection algorithm module for balanced distribution of the storage data;
the storage cluster module comprises a plurality of storage servers, and each storage server comprises a plurality of magnetic disks and is used for storing storage data input by the client through the user input port;
the management cluster module comprises a plurality of memory managers and is used for maintaining and managing state data among the modules in the distributed file system;
the intelligent selection algorithm module of the client comprises a first selection module for selecting storage server ID values, an iteration operation module for acquiring a plurality of storage server ID values corresponding to the storage server ID values, a second selection module for selecting a proper storage server ID value from the plurality of storage server ID values, and a result module for storing the storage server ID value selected by the second selection module; the first selection module is connected with the storage cluster module, and the first selection module, the iteration operation module, the second selection module and the result module are sequentially connected with one another;
the client, the storage cluster module and the management cluster module are communicatively connected to one another through a network protocol.
In the distributed file system of the invention, the storage cluster module comprises a hierarchical structure module, a copy distribution strategy module and a data module; wherein,
the hierarchical structure module comprises storage servers and buckets, and is used for counting the storage servers in the storage cluster module and selecting the storage servers for storing data in different modes;
the copy distribution strategy module is used for setting N copies of each piece of storage data and storing the N copies on N different storage servers;
the data module is used for storing each storage server ID value.
In the distributed file system of the present invention, the buckets of the hierarchy module include a plurality of storage servers or a plurality of buckets;
in the distributed file system of the present invention, the buckets include a balance bucket, a list bucket, a tree bucket, and a draw bucket;
the balance bucket comprises buckets having the same weight;
the list bucket is a bucket with a linked list structure, wherein the list bucket comprises a plurality of storage servers, and each storage server may have any weight;
the tree bucket is a bucket with a tree structure, wherein the tree bucket comprises a plurality of storage servers, and the weight of each storage server is the sum of the weights of adjacent storage servers in the tree structure;
the draw bucket comprises a plurality of storage servers, and a storage server is drawn at random according to the weights of the storage servers;
wherein the buckets in the hierarchy store the corresponding storage data in the manner of balance buckets.
In the distributed file system of the present invention, the storage server has a storage server ID value and a weight; the bucket has a bucket ID value and a weight, wherein the bucket weight is the sum of the weights of the plurality of storage servers.
In the distributed file system, the client, the management cluster module and the storage cluster module are each provided with a configuration file. Using the ID value and the port information in its configuration file, the client calls the management cluster module to acquire the storage information and the state information of the storage servers of the storage cluster module, so that the intelligent selection algorithm module of the client can compute and allocate in a timely manner.
In the distributed file system of the invention, the network protocol adopts a TCP/IP protocol to connect each module of the distributed file system in a communication way.
The invention also comprises a data balanced distribution method facing the distributed file system, which comprises the following steps:
S1, inputting storage data and setting information through the user input port of the client;
S2, the client acquires the state information of the storage cluster module through the management cluster module and establishes a communication connection with the storage cluster module;
S3, the client calls the intelligent selection algorithm module to calculate, according to the state information of the storage cluster module, N storage server ID values capable of storing the corresponding storage data, wherein the N storage server ID values correspond to N storage servers;
S4, the client stores the storage data, by means of the acquired storage server ID values, on a first storage server among the corresponding N storage servers, and the first storage server sends feedback information to the client after receiving the storage data to confirm that the storage data has been stored successfully; wherein the first storage server is the main storage server;
S5, the client stores the storage data, by means of the acquired storage server ID values, on each of the N-1 storage servers other than the first storage server, thereby ensuring that the storage data has N-1 copies.
In the data balanced distribution method for the distributed file system, the step S3 includes the following steps:
S31, the client calls the intelligent selection algorithm module of the client according to the state information of the storage cluster module;
S32, the first selection module of the intelligent selection algorithm module selects one storage server ID value in the hierarchical structure of the storage cluster module according to the state information of the storage cluster module, and returns the N storage server ID values corresponding to that storage server ID value to a bucket of the storage cluster module;
S33, the iterative operation module performs an iterative operation on the N storage server ID values in the bucket of the storage cluster module: for each storage server ID value, a downward traversal returns N storage server ID values, and the N storage server ID values returned for each storage server ID value are stored in N buckets respectively;
S34, the iterative operation module calls the second selection module, and the second selection module pseudo-randomly selects one storage server ID value from each of the N buckets;
S35, the second selection module stores the N selected storage server ID values in the result module, so that the client can identify where its storage data is located.
In the data balanced distribution method for the distributed file system, the client, the storage cluster module and the management cluster module are in network communication connection with each other in a network protocol mode.
The distributed file system and the data balanced distribution method oriented to the system have the following beneficial effects: the structure is simple, operation is convenient, and client data is sent evenly to the storage devices in the storage cluster. Storage data is mapped to storage devices and distributed evenly according to a user-defined strategy, which improves the degree of separation of the storage data across different storage areas. This solves the problem of balancing data distribution and load and makes maximum use of system performance.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of a distributed file system architecture of the present invention;
FIG. 2 is a schematic diagram of a distributed file system module relationship of the present invention;
FIG. 3 is a schematic diagram of an abstract workflow of the intelligent selection algorithm module of the present invention;
FIG. 4 is a flowchart of the data balanced distribution method of the present invention.
In the figures: 100, client; 200, storage cluster module; 300, management cluster module; 110, user input port; 120, intelligent selection algorithm module; 121, first selection module; 122, iterative operation module; 123, second selection module; 124, result module; 1211, take(item1); 1212, take(item2); 1213, take(item3); 1221, select(host1); 1222, select(host2); 1223, select(host3); 1231, select(disk1); 1232, select(disk1); 1233, select(disk1).
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a diagram illustrating a distributed file system structure according to the present invention;
a distributed file system comprises a client 100, a storage cluster module 200 and a management cluster module 300; wherein,
the client 100 comprises a user input port 110 for inputting storage data and an intelligent selection algorithm module 120 for balanced allocation of the storage data;
the storage cluster module 200 includes a plurality of storage servers, and the storage servers include a plurality of disks, and are used for storing the storage data input by the client 100 through the user input port 110;
the management cluster module 300 includes a plurality of memory managers for maintaining and managing state data between modules in the distributed file system;
the intelligent selection algorithm module 120 of the client 100 includes a first selection module 121 for selecting a storage server ID value, an iterative operation module 122 for obtaining a plurality of storage server ID values corresponding to the storage server ID value, a second selection module 123 for selecting an appropriate storage server ID value from the plurality of storage server ID values, and a result module 124 for storing the storage server ID value selected by the second selection module 123; the first selection module 121 is connected to the storage cluster module 200, and the first selection module 121, the iterative operation module 122, the second selection module 123 and the result module 124 are sequentially connected to each other;
the client 100, the storage cluster module 200, and the management cluster module 300 are respectively connected to each other in communication via a network protocol.
FIG. 2 is a schematic diagram of a relationship between modules of the distributed file system according to the present invention;
in the distributed file system of the present invention, the storage cluster module 200 includes a hierarchical structure module, a copy distribution policy module, and a data module; wherein,
the hierarchical structure module comprises storage servers and buckets, and is used for counting the storage servers in the storage cluster module 200 and selecting the storage servers for storing data in different modes;
Through the intelligent selection algorithm, any client can independently compute the location of each data block (object) in the storage distribution, which removes centralization and eliminates the possibility of a single point of failure.
A bucket may contain any number of items. The items may be storage devices or other buckets, so that buckets form the internal nodes of a storage tree and storage devices are always its leaf nodes. The administrator controls the weight of each storage device. The weight is related to the capacity of the storage device, so that the average load of a device is proportional to the amount of data it stores. This yields a one-dimensional placement metric, the weight, which should be derived from the capabilities of the storage device. The weight of a bucket is defined as the sum of the weights of all the items it contains. Buckets can be composed into a hierarchy that represents any available storage. For example, a distributed map may be created with buckets named "host" at the lowest level, each containing the disk devices of one host, and buckets named "frame" containing the hosts mounted in the same rack. In a large system, the "frame" buckets representing racks may in turn be contained in a "row" bucket or a "from" bucket. The client recursively distributes data blocks among the elements of the bucket hierarchy via a pseudo-random, hash-like function. The intelligent selection algorithm module is based on four different bucket types:
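The hierarchy described above, with devices as leaves and bucket weights summed from their contents, could be modelled along the following lines. This is only a sketch: the class names `Device` and `Bucket`, the `kind` labels, and the example IDs and weights are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Device:
    """Leaf node: a storage device with an administrator-assigned weight."""
    id: int
    weight: float


@dataclass
class Bucket:
    """Internal node: contains devices or other buckets."""
    id: int
    kind: str  # illustrative label such as "host" or "frame"
    items: List[Union[Device, "Bucket"]] = field(default_factory=list)

    @property
    def weight(self) -> float:
        # A bucket's weight is the sum of the weights of the items it contains.
        return sum(item.weight for item in self.items)


# Two hosts with two disks each, mounted in the same rack.
host1 = Bucket(id=-1, kind="host", items=[Device(0, 1.0), Device(1, 2.0)])
host2 = Bucket(id=-2, kind="host", items=[Device(2, 1.0), Device(3, 1.0)])
rack = Bucket(id=-3, kind="frame", items=[host1, host2])
print(rack.weight)
```

Computing the bucket weight on demand keeps the sketch short; a real system would more likely cache the sum and update it when devices are added or removed.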
1. Balance bucket:
The balance bucket of the intelligent selection algorithm module 120 represents a set of devices that all have the same weight. The intelligent selection algorithm module can map replicas into a balance bucket in constant time. Where the equal-weight restriction is not appropriate, other bucket types can be used. Given an input value x and a replica number r, the intelligent selection algorithm module selects an item from a balance bucket of size m using the function c(r, x) = (hash(x) + r·p) mod m, where p is a randomly (but deterministically) chosen prime number greater than m. For any r ≤ m, simple number theory guarantees that a distinct item is selected each time. For r > m this guarantee no longer holds, meaning that two different replica numbers r with the same input x may resolve to the same item. In practice, this means nothing more than a non-zero probability of collision, in which case the placement function has to be repeated. If the size of a balance bucket changes, data is completely reshuffled between devices, as in conventional hash-based allocation strategies.
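The selection function c(r, x) = (hash(x) + r·p) mod m can be sketched as follows. The SHA-256 based `stable_hash` and the choice of p as the smallest prime above m are assumptions made for illustration; the text only requires p to be a deterministically chosen prime greater than m.

```python
import hashlib


def stable_hash(x: str) -> int:
    """Deterministic 64-bit hash of the input value x."""
    return int.from_bytes(hashlib.sha256(x.encode()).digest()[:8], "big")


def next_prime_above(m: int) -> int:
    """Smallest prime greater than m (trial division; fine for small buckets)."""
    candidate = m + 1
    while True:
        if candidate >= 2 and all(
            candidate % d for d in range(2, int(candidate ** 0.5) + 1)
        ):
            return candidate
        candidate += 1


def balance_select(x: str, r: int, items: list):
    """c(r, x) = (hash(x) + r * p) mod m over a bucket of m equally weighted items."""
    m = len(items)
    p = next_prime_above(m)  # assumption: any fixed prime greater than m would do
    return items[(stable_hash(x) + r * p) % m]


servers = ["s0", "s1", "s2", "s3", "s4"]
picks = [balance_select("object-7", r, servers) for r in range(len(servers))]
print(picks)
```

Because p is coprime to m, m consecutive replica numbers map to m distinct items; once r reaches m, the sequence wraps around and repeats the item chosen for r = 0.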
2. List bucket:
The list bucket is structured as a linked list, and the items it contains can have any weight. The intelligent selection algorithm module searches for the position of a replica starting from the head of the list. It first obtains the weight Wh of the head item and the sum Ws of the weights of all items in the remaining linked list, then computes a value v in [0, 1) from hash(x, r, item). If v lies in [0, Wh/Ws), the replica is placed in the head item and the ID of the head item is returned. Otherwise, the module continues to traverse the remaining linked list.
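The head-first traversal can be sketched as below. Two assumptions are made: Ws is taken to include the current head item, so that Wh/Ws never exceeds 1, and `unit_hash` is a hypothetical SHA-256 based helper that maps its arguments to [0, 1).

```python
import hashlib


def unit_hash(*parts) -> float:
    """Deterministic hash of the arguments mapped into [0, 1)."""
    data = "/".join(str(p) for p in parts).encode()
    return (int.from_bytes(hashlib.sha256(data).digest()[:8], "big") >> 11) / 2**53


def list_select(x, r, items):
    """items: list of (item_id, weight) pairs, head of the list first."""
    remaining = sum(weight for _, weight in items)  # Ws, head included (assumption)
    for item_id, weight in items:
        v = unit_hash(x, r, item_id)
        if v < weight / remaining:      # v falls in [0, Wh/Ws): place the replica here
            return item_id
        remaining -= weight             # otherwise continue with the rest of the list
    return items[-1][0]                 # only reachable through float rounding


bucket = [("a", 1.0), ("b", 2.0), ("c", 3.0)]
print(list_select("object-7", 0, bucket))
```

With this reading, the probability of stopping at each item works out to its share of the total weight, so items are chosen in proportion to their weights.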
3. Tree bucket:
The lookup complexity of the linked list of a list bucket is O(n), whereas the lookup complexity of the decision tree of a tree bucket is O(log n). The items of a tree bucket are the leaf nodes of the decision tree. Every other node in the decision tree knows the weights of its left and right subtrees, and its own weight equals the sum of the two. The intelligent selection algorithm module searches for the position of a replica starting from the root node. It first obtains the weight Wl of the node's left subtree and the weight Wn of the node itself, then computes a value v in [0, 1) from hash(x, r, node_id). If v lies in [0, Wl/Wn), the replica is in the left subtree; otherwise it is in the right subtree. Traversal continues until a leaf node is reached. The key property of the tree bucket is that when a leaf node is added or deleted, the node_id of every other node in the decision tree remains unchanged. The node_id of a node is determined by an in-order traversal of the binary tree (node_id is equal neither to the ID of an item nor to the weight of the node).
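One way to obtain node labels that survive growth of the tree is to number the nodes of a complete binary tree in in-order sequence, which puts leaves on odd labels and internal nodes on even labels. The sketch below assumes that labelling; the array layout, the `unit_hash` helper and the example weights are illustrative and not prescribed by the text.

```python
import hashlib


def unit_hash(*parts) -> float:
    """Deterministic hash of the arguments mapped into [0, 1)."""
    data = "/".join(str(p) for p in parts).encode()
    return (int.from_bytes(hashlib.sha256(data).digest()[:8], "big") >> 11) / 2**53


def build_tree(items):
    """items: non-empty list of (item_id, weight). Returns (weights, leaf_item, root).

    Labels follow an in-order numbering of a complete binary tree:
    leaf k gets the odd label 2k + 1, internal nodes get even labels.
    """
    depth = 0
    while (1 << depth) < len(items):
        depth += 1
    size = 1 << (depth + 1)
    weights = [0.0] * size               # indexed by label; unused leaves weigh 0
    leaf_item = {}
    for k, (item_id, weight) in enumerate(items):
        weights[2 * k + 1] = weight
        leaf_item[2 * k + 1] = item_id
    for height in range(1, depth + 1):   # fill internal nodes bottom-up
        step = 1 << height
        for node in range(step, size, 2 * step):
            half = step >> 1
            weights[node] = weights[node - half] + weights[node + half]
    return weights, leaf_item, 1 << depth


def tree_select(x, r, tree):
    weights, leaf_item, node = tree
    while node % 2 == 0:                 # even label: internal node
        half = (node & -node) >> 1       # offset from a node to its children
        left = node - half
        v = unit_hash(x, r, node)
        node = left if v < weights[left] / weights[node] else node + half
    return leaf_item[node]


four = [("a", 1.0), ("b", 1.0), ("c", 2.0), ("d", 4.0)]
tree4 = build_tree(four)
tree5 = build_tree(four + [("e", 2.0)])  # growing the bucket adds a new root
print(tree_select("object-7", 0, tree4))
```

When a fifth item is appended, the old root (label 4) becomes the left child of a new root (label 8), and the labels and weights of all existing nodes stay the same, which is the stability property described above.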
4. Draw bucket:
This type lets all the items contained in a bucket compete fairly, without the ordered traversal that list buckets and tree buckets require. The algorithm works like drawing straws: every item has a chance of being drawn, and the item with the longest straw wins. The length of each straw is length = f(Wi) × hash(x, r, i), where f(Wi) depends on the weight of the item and i is the ID value of the item. The selected item is the one that attains c(r, x) = max_i (f(Wi) × hash(x, r, i)).
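The straw-drawing rule can be sketched in a few lines. The text does not specify f, so the sketch takes f(w) = w as a placeholder; this favours heavier items but does not make the selection probability exactly proportional to weight. The `unit_hash` helper is likewise an assumption.

```python
import hashlib


def unit_hash(*parts) -> float:
    """Deterministic hash of the arguments mapped into [0, 1)."""
    data = "/".join(str(p) for p in parts).encode()
    return (int.from_bytes(hashlib.sha256(data).digest()[:8], "big") >> 11) / 2**53


def draw_select(x, r, items, f=lambda w: w):
    """items: list of (item_id, weight). Returns the ID with the longest straw.

    Straw length is f(weight) * hash(x, r, item_id); f is a placeholder here.
    """
    return max(items, key=lambda it: f(it[1]) * unit_hash(x, r, it[0]))[0]


bucket = [("a", 1.0), ("b", 2.0), ("c", 3.0)]
winner = draw_select("object-7", 0, bucket)
print(winner)
```

Because every straw is computed independently of the others, removing an item that did not win leaves the winner unchanged, so only data held by the removed item has to move.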
In the distributed file system of the present invention, the buckets include a balance bucket, a list bucket, a tree bucket, and a draw bucket;
the balance bucket comprises buckets having the same weight;
the list bucket is a bucket with a linked list structure, wherein the list bucket comprises a plurality of storage servers, and each storage server may have any weight;
the tree bucket is a bucket with a tree structure, wherein the tree bucket comprises a plurality of storage servers, and the weight of each storage server is the sum of the weights of adjacent storage servers in the tree structure;
the draw bucket comprises a plurality of storage servers, and a storage server is drawn at random according to the weights of the storage servers;
wherein the buckets in the hierarchy store the corresponding storage data in the manner of balance buckets.
The copy distribution strategy module is used for setting N copies of each piece of storage data and storing the N copies on N different storage servers. The intelligent selection algorithm distributes data evenly, weighted by the storage capacity and bandwidth resources of the devices, and maintains a relative probabilistic balance. The constraints placed on where these copies are stored (for example, three copies in different racks) should therefore ensure that the copies are placed in different racks, hosts, power supplies, controllers, or other physical locations.
Because the copies are placed on storage devices organized in a hierarchy, the placement also has a significant impact on data safety. By reflecting the physical organization of the installation, the intelligent selection algorithm module 120 can model the system and thereby address potential sources of device failure. These sources include shared physical resources, such as a common power supply or a common network. By encoding this information into the distributed map, the copy placement strategy of the intelligent selection algorithm module 120 can place the copies of a data object in different fault domains while still maintaining the desired distribution. For example, to guard against possible concurrent failures, it should be ensured that the copies of the data are placed in different racks, hosts, power supplies, controllers, or other physical locations. Isolating the fault domains in this way ensures data safety to a greater extent.
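Separating copies across fault domains can be sketched as follows, taking the rack as the fault domain. Everything here is an illustrative assumption: the rack and device names, the unweighted `pick` helper, and the retry-on-collision loop with its `max_attempts` bound stand in for whichever bucket selection functions a real map would use.

```python
import hashlib


def unit_hash(*parts) -> float:
    """Deterministic hash of the arguments mapped into [0, 1)."""
    data = "/".join(str(p) for p in parts).encode()
    return (int.from_bytes(hashlib.sha256(data).digest()[:8], "big") >> 11) / 2**53


def pick(x, attempt, candidates):
    """Pseudo-randomly pick one candidate; deterministic in (x, attempt)."""
    return max(candidates, key=lambda c: unit_hash(x, attempt, c))


def place_copies(x, n, racks, max_attempts=200):
    """racks: dict of rack name -> list of device names.

    Returns n devices that all sit in different racks.
    """
    if n > len(racks):
        raise ValueError("need at least n racks to separate n copies")
    used_racks, devices = set(), []
    for attempt in range(max_attempts):
        rack = pick(x, attempt, sorted(racks))
        if rack in used_racks:
            continue                     # collision: repeat with the next attempt
        used_racks.add(rack)
        devices.append(pick(x, attempt, racks[rack]))
        if len(devices) == n:
            return devices
    raise RuntimeError("placement did not converge")


racks = {
    "rack1": ["d1", "d2"],
    "rack2": ["d3", "d4"],
    "rack3": ["d5", "d6"],
    "rack4": ["d7", "d8"],
}
copies = place_copies("object-7", 3, racks)
print(copies)
```

The same pattern applies to any other fault domain, such as hosts or power supplies, by changing what the top-level keys of the map represent.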
The data module is used for storing each storage server ID value.
In the distributed file system of the present invention, the buckets of the hierarchy module include a plurality of storage servers or a plurality of buckets;
In the distributed file system of the present invention, the storage server has a storage server ID value and a weight; the bucket has a bucket ID value and a weight, wherein the bucket weight is the sum of the weights of the plurality of storage servers.
In the distributed file system of the present invention, the client 100, the management cluster module 300, and the storage cluster module 200 are each provided with a configuration file. Using the ID value and the port information in its configuration file, the client 100 calls the management cluster module 300 to obtain the storage information and the state information of the storage servers of the storage cluster module 200, so that the intelligent selection algorithm module 120 of the client 100 can compute and allocate in a timely manner.
In the distributed file system of the invention, the network protocol adopts a TCP/IP protocol to connect each module of the distributed file system in a communication way.
FIG. 3 is a schematic diagram illustrating an abstract workflow of the intelligent selection algorithm module according to the present invention;
As shown in the figure, the first selection module 121 represents take(item): it selects an item (a node in the distributed hierarchy), typically a bucket, and returns all items (child nodes) contained in that bucket. These items form a vector i and are the parameters of the subsequent operations. In the figure, 1211 denotes the first child node at the first selection module, take(item1); 1212 denotes the second child node, take(item2); and 1213 denotes the third child node, take(item3).
The iterative operation module 122 in the figure represents select(host): it iterates over each item in the vector i, traverses downward through the items contained in that item, returns n distinct items of type t, and puts them into the vector i. In the figure, 1221 denotes the first child node at the iterative operation module, select(host1); 1222 denotes the second child node, select(host2); and 1223 denotes the third child node, select(host3).
The second selection module 123 in the figure represents select(disk): it calls the function c(r, x), which pseudo-randomly selects one item in each bucket. In the figure, 1231 denotes the first child node at the second selection module, select(disk1); 1232 denotes the second child node, select(disk1); and 1233 denotes the third child node, select(disk1).
Emit in the figure represents the result module 124: the vector i is placed into the result module, which yields the result.
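The take, select and emit stages described for FIG. 3 can be sketched as a small pipeline. The cluster map below (one root, three racks, two hosts per rack, two disks per host), the node names, and the hash-ranking used inside `select` are assumptions for illustration; in particular, the ranking stands in for the bucket function c(r, x).

```python
import hashlib


def unit_hash(*parts) -> float:
    """Deterministic hash of the arguments mapped into [0, 1)."""
    data = "/".join(str(p) for p in parts).encode()
    return (int.from_bytes(hashlib.sha256(data).digest()[:8], "big") >> 11) / 2**53


# Hypothetical cluster map: root -> 3 racks -> 2 hosts each -> 2 disks each.
CHILDREN = {"root": ["rack1", "rack2", "rack3"]}
for rack in CHILDREN["root"]:
    CHILDREN[rack] = [f"{rack}-host{h}" for h in (1, 2)]
    for host in CHILDREN[rack]:
        CHILDREN[host] = [f"{host}-disk{d}" for d in (1, 2)]


def take(item):
    """Return the child nodes of the chosen item as the working vector."""
    return list(CHILDREN[item])


def select(x, vector, n=1):
    """For each item in the vector, descend one level and keep n distinct children."""
    out = []
    for item in vector:
        ranked = sorted(CHILDREN[item], key=lambda c: unit_hash(x, c), reverse=True)
        out.extend(ranked[:n])
    return out


def emit(vector, result):
    """Append the working vector to the result."""
    result.extend(vector)


def locate(x):
    result = []
    vector = take("root")        # take(item): the three child nodes of the root
    vector = select(x, vector)   # select(host): one host under each item
    vector = select(x, vector)   # select(disk): one disk under each host
    emit(vector, result)
    return result


print(locate("object-7"))
```

Since the computation depends only on the object name and the map, any client holding the same map arrives at the same three disks without consulting a central directory.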
FIG. 4 is a schematic diagram illustrating a process flow of the data equal distribution method according to the present invention;
The invention also provides a data balanced distribution method for the distributed file system, which comprises the following steps:
S1, the storage data and the setting information are input through the user input port 110 of the client 100;
S2, the client 100 obtains the state information of the storage cluster module 200 through the management cluster module 300 and establishes a communication connection with the storage cluster module 200;
S3, the client 100 calls the intelligent selection algorithm module 120 to calculate, according to the state information of the storage cluster module 200, N storage server ID values capable of storing the corresponding storage data, where the N storage server ID values correspond to N storage servers;
S4, the client 100 stores the storage data in the first of the N corresponding storage servers according to the obtained storage server ID values, and the first storage server, after receiving the storage data, sends feedback information to the client to confirm that the storage data was stored successfully; the first storage server is the primary storage server;
S5, the client 100 stores the storage data in each of the N-1 storage servers other than the first storage server according to the obtained storage server ID values, which ensures that the storage data has N-1 replica copies.
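As a non-limiting illustration, steps S3 to S5 might be sketched in Python as follows. The names `StorageServer`, `write_object` and `toy_select` are hypothetical and do not appear in the specification; `toy_select` is only a deterministic placeholder for the intelligent selection algorithm module 120, and the `servers` dictionary stands in for the state information obtained in step S2.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical in-memory stand-in for one storage server."""
    server_id: int
    objects: dict = field(default_factory=dict)

    def store(self, key: str, data: bytes) -> bool:
        self.objects[key] = data
        return True  # feedback information returned to the client (S4)


def toy_select(key, server_ids, n):
    """Placeholder for module 120: a deterministic rotation, not the patented algorithm."""
    start = sum(key.encode()) % len(server_ids)
    return [server_ids[(start + i) % len(server_ids)] for i in range(n)]


def write_object(key, data, servers, select_ids, n):
    """S3: compute N server IDs; S4: write the primary; S5: write N-1 replicas."""
    ids = select_ids(key, sorted(servers), n)
    primary, replicas = ids[0], ids[1:]
    if not servers[primary].store(key, data):      # primary must confirm the write
        raise IOError(f"primary server {primary} did not confirm the write")
    for sid in replicas:                           # remaining N-1 servers hold replicas
        servers[sid].store(key, data)
    return ids


servers = {i: StorageServer(i) for i in range(5)}
placed = write_object("video-0001", b"payload", servers, toy_select, n=3)
```

In this sketch the client writes the replicas itself after the primary has acknowledged, which mirrors the order of steps S4 and S5.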
In the data balanced distribution method for the distributed file system, step S3 includes the following steps:
S31, the client 100 calls its intelligent selection algorithm module 120 according to the state information of the storage cluster module 200;
S32, the first selection module 121 of the intelligent selection algorithm module 120 selects a storage server ID value in the hierarchical structure of the storage cluster module 200 according to the state information of the storage cluster module 200, and returns N storage server ID values corresponding to that storage server ID value to the bucket of the storage cluster module 200;
S33, the iteration operation module 122 iterates over the N storage server ID values; through downward traversal each storage server correspondingly returns N storage server ID values, and the N values returned for each storage server ID value are stored in N buckets respectively;
S34, the iteration operation module 122 invokes the second selection module 123, and the second selection module 123 pseudo-randomly selects one storage server ID value from each of the N buckets;
S35, the second selection module 123 stores the selected N storage server ID values in the result module 124, so that the client 100 can identify where the storage data is to be stored.
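The sub-steps above can be pictured with a small two-level example (hosts containing disks). This is a sketch under stated assumptions, not the implementation of modules 121 to 124: the hash-based function `c` is only an assumed stand-in for c(r, x), and the names `place` and `cluster` are hypothetical.

```python
import hashlib


def c(r, x, items):
    """Assumed stand-in for c(r, x): a hash-based pseudo-random pick from one bucket."""
    digest = hashlib.sha256(f"{x}/{r}".encode()).digest()
    return items[int.from_bytes(digest[:8], "big") % len(items)]


def place(x, cluster, n):
    """take -> select(host) -> select(disk) -> emit for object identifier x."""
    root = sorted(cluster)                         # take: start from the root bucket
    if n > len(root):
        raise ValueError("not enough hosts for n distinct selections")
    hosts = []
    for r in range(n):                             # select(host): n distinct hosts
        remaining = [h for h in root if h not in hosts]
        hosts.append(c(r, x, remaining))
    result = []
    for r, host in enumerate(hosts):               # select(disk): one disk per host bucket
        result.append((host, c(r, x, cluster[host])))
    return result                                  # emit: hand the vector to the result


cluster = {
    "host1": ["disk1", "disk2"],
    "host2": ["disk1", "disk2"],
    "host3": ["disk1", "disk2"],
    "host4": ["disk1", "disk2"],
}
placement = place("object-42", cluster, 3)
```

Because the selection depends only on the object identifier and the cluster description, any client holding the same state information computes the same placement without consulting a central table.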
In the data balanced distribution method for the distributed file system of the present invention, the client 100, the storage cluster module 200, and the management cluster module 300 communicate with one another over a network protocol.
In the intelligent selection algorithm module 120, the take(item) operation of the first selection module 121 selects an item (usually a bucket) in the storage hierarchy and assigns it to the vector i, which then serves as the input for subsequent operations.
The select(host) operation of the iteration operation module 122 operates on each element of the vector i.
In a distributed file system, in order to ensure the reliability of data, a redundant file is generated for each file. The primary storage unit is used to store original data (blocks), and the secondary storage unit is used to store redundant data (blocks).
Within the subtree rooted at each such element, the select(host) operation chooses n distinct items of type t.
Storage devices have a fixed, known type, and each bucket in the system has a type field used to distinguish between different kinds of buckets (e.g., buckets representing "rows" and buckets representing "cabinets").
For each i ∈ i, the select(host) operation of the iteration operation module 122 iterates over the requested items r ∈ 1, ..., n and recursively descends through all intermediate buckets, pseudo-randomly selecting a nested item in each bucket using the function c(r, x) (the corresponding c(r, x) is defined in claim 1), until an item of type t is found. The resulting n × |i| distinct items are put back into the input i, where they either serve as input for a subsequent select(host) operation of the iteration operation module or are moved into the result vector by the emit operation of the result module.
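The recursive descent described above can be sketched as follows. This is an illustrative sketch only: the `Node` class, the hash used inside `c`, and the `max_tries` limit are assumptions introduced for the example, and the hierarchy is assumed to be well formed (every downward path reaches an item of the requested type).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical hierarchy item; `type` plays the role of the bucket type field."""
    name: str
    type: str
    children: list = field(default_factory=list)


def c(r, x, bucket):
    """Assumed stand-in for c(r, x): pseudo-randomly pick one nested item of a bucket."""
    digest = hashlib.sha256(f"{x}/{r}/{bucket.name}".encode()).digest()
    return bucket.children[int.from_bytes(digest[:8], "big") % len(bucket.children)]


def descend(x, r, node, t):
    """Walk down through intermediate buckets until an item of type t is found."""
    while node.type != t:
        node = c(r, x, node)
    return node


def select(x, vector, n, t, max_tries=50):
    """For each element of the input vector, choose n distinct items of type t below it."""
    out = []
    for start in vector:
        picked, r = [], 0
        while len(picked) < n:
            if r >= max_tries:
                raise RuntimeError("could not find n distinct items")
            item = descend(x, r, start, t)
            if item not in picked:                 # a collision is retried with the next r
                picked.append(item)
            r += 1
        out.extend(picked)
    return out                                     # n * len(vector) items


root = Node("root", "root", [
    Node(f"host{h}", "host", [Node(f"host{h}-disk{d}", "disk") for d in (1, 2)])
    for h in (1, 2, 3)
])
hosts = select("obj-7", [root], 2, "host")
disks = select("obj-7", hosts, 1, "disk")
```

Here `hosts` feeds a second `select` call, matching the statement that the returned items either form the input of a subsequent select operation or are emitted as the result.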
The select(host) operation of the iteration operation module 122 may traverse multiple levels of the storage hierarchy beneath its starting point in order to locate n distinct items of the specified type t; this recursive process is parameterized in part by r = 1, ..., n. During this process, the intelligent selection algorithm module 120 may reject an item and reselect using a modified input r', for three reasons: the item is already in the current selection set (a collision, since the select(host) result must be distinct); the device has failed; or the device is overloaded.
A failed or overloaded device is flagged in the cluster map but remains in the hierarchy to avoid unnecessary data movement. The intelligent selection algorithm module 120 selectively diverts part of the data of an overloaded device by pseudo-randomly rejecting it with the probability recorded in the cluster map, which is typically derived from the device's reported over-utilization. For a failed or overloaded device, the intelligent selection algorithm module 120 redistributes the items uniformly across the storage cluster module 200 simply by restarting the select(host) operation of the iteration operation module 122 from its starting point. In the case of a collision, an alternative r' is first used at the inner levels of the recursion to attempt a local search, which avoids skewing the overall data distribution away from subtrees where collisions are most likely (e.g., buckets holding fewer than n items).
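The three rejection reasons can be illustrated with a simplified, single-level sketch. It does not distinguish the local search used for collisions from the restart used for failed or overloaded devices; every rejection simply moves on to the next input value. The helper names, the hash functions, and the `max_tries` limit are assumptions made for the example.

```python
import hashlib


def pick(x, r, devices):
    """Assumed hash-based pseudo-random choice of one device for input (x, r)."""
    digest = hashlib.sha256(f"pick/{x}/{r}".encode()).digest()
    return devices[int.from_bytes(digest[:8], "big") % len(devices)]


def unit_value(x, device):
    """Deterministic pseudo-random value in [0, 1) for the overload test."""
    digest = hashlib.sha256(f"overload/{x}/{device}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def select_devices(x, devices, n, failed=(), overload=None, max_tries=200):
    """Choose n distinct usable devices, rejecting and reselecting where necessary."""
    overload = overload or {}                      # device -> rejection probability
    chosen, r = [], 0
    while len(chosen) < n:
        if r >= max_tries:
            raise RuntimeError("could not find enough usable devices")
        device = pick(x, r, devices)
        r += 1                                     # each rejection leads to a modified r'
        if device in chosen:                       # reason 1: collision
            continue
        if device in failed:                       # reason 2: device flagged as failed
            continue
        if unit_value(x, device) < overload.get(device, 0.0):
            continue                               # reason 3: diverted from overloaded device
        chosen.append(device)
    return chosen


devices = ["d0", "d1", "d2", "d3", "d4", "d5"]
chosen = select_devices("obj-9", devices, 3, failed={"d2"}, overload={"d4": 1.0})
```

Note that the failed device stays in the `devices` list, in line with the description that flagged devices remain in the hierarchy.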
In a primary-copy replication scheme, it is often desirable after a failure for a previous replica target, which already holds a copy of the data, to become the new primary. In this case, the intelligent selection algorithm module 120 can use the "first n" suitable targets by reselecting with r' = r + f, where f is the number of failed placement attempts by the current select(host) operation of the iteration operation module. For parity and erasure-coding schemes, however, the rank or position of a storage device in the output of the intelligent selection algorithm module 120 is critical, because each target stores different bits of the data object. In particular, if a storage device fails, it should be replaced in place in the output list R of the intelligent selection algorithm module 120, so that the other devices in the list keep their rank. In this case, the intelligent selection algorithm module 120 reselects using r' = r + f_r·n, where f_r is the number of failed attempts on r. This defines a sequence of candidates for each replica rank that is probabilistically independent of failures at other ranks. By contrast, RUSH has no special handling for failed devices; like other existing hash distribution functions, it implicitly uses the "first n" approach to skip failed devices in the final result, which makes it unsuitable for parity schemes.
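The difference between the two reselection rules can be checked with a small worked example. In this sketch the candidate for an input r' is simply the r'-th entry of a fixed list, which is an assumption made purely to keep the arithmetic visible; the function names are hypothetical.

```python
def choose(r_prime, candidates):
    """Toy mapping: input r' selects the r'-th candidate (1-based, wrapping around)."""
    return candidates[(r_prime - 1) % len(candidates)]


def place_replicas(n, candidates, failed, in_place):
    """Return n devices using r' = r + f ("first n") or r' = r + f_r * n (in place)."""
    result = []
    f = 0                                          # failures so far in this select call
    for r in range(1, n + 1):
        f_r = 0                                    # failures on this particular rank r
        while f_r < len(candidates):
            r_prime = r + f_r * n if in_place else r + f
            device = choose(r_prime, candidates)
            if device not in failed and device not in result:
                result.append(device)
                break
            f += 1
            f_r += 1
        else:
            raise RuntimeError(f"no usable candidate for rank {r}")
    return result


candidates = list("abcdefghij")
healthy = place_replicas(3, candidates, set(), in_place=False)     # ['a', 'b', 'c']
first_n = place_replicas(3, candidates, {"b"}, in_place=False)     # ['a', 'c', 'd']
in_place = place_replicas(3, candidates, {"b"}, in_place=True)     # ['a', 'e', 'c']
```

With n = 3 and device b failed, the "first n" rule shifts c from rank 3 to rank 2, whereas the in-place rule replaces only rank 2 (r' = 2 + 1·3 = 5) and leaves a and c at their original ranks.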
Although the intelligent selection algorithm module 120 achieves good balance in a large storage system (little variation in device utilization), as with any random process this still leaves a non-zero probability that the space allocated on a particular device is significantly larger than the average. Unlike existing probabilistic mapping algorithms (including RUSH), the intelligent selection algorithm module 120 includes a per-device overload correction mechanism that can redistribute any fraction of a device's data. This can be used to selectively "level off" a device that is at risk of overfilling by withdrawing part of its data. In a cluster of 1000 devices at 99% capacity, the mapping execution time of the intelligent selection algorithm module increased by less than 20%, even though overload adjustment occurred on 47% of the devices, and the variance dropped by a factor of four, as expected.
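The effect of a per-device overload correction can be illustrated with a small simulation. This is a sketch under assumed details: the rejection probability of 0.25 for device d0, the hash-based helpers, and the retry limit are all hypothetical values chosen for the example and are not taken from the specification.

```python
import hashlib


def unit_hash(*parts):
    """Deterministic pseudo-random value in [0, 1) derived from the given parts."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def place_one(x, devices, overload):
    """Pick one device for object x, diverting a fraction of data from overloaded devices."""
    for r in range(100):
        device = devices[int(unit_hash("pick", x, r) * len(devices))]
        # The diversion test depends only on (x, device), so a diverted object
        # never returns to the same overloaded device on a later attempt.
        if unit_hash("divert", x, device) >= overload.get(device, 0.0):
            return device
    raise RuntimeError("every attempt was diverted")


devices = ["d0", "d1", "d2", "d3"]
objects = [f"obj-{i}" for i in range(4000)]
before = sum(place_one(x, devices, {}) == "d0" for x in objects)
after = sum(place_one(x, devices, {"d0": 0.25}) == "d0" for x in objects)
```

In this sketch roughly a quarter of the objects previously mapped to d0 are diverted to other devices, while objects on the remaining devices are never moved by the adjustment.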
The distributed file system and the data balanced distribution method for the system have the following beneficial effects: the structure is simple, operation is convenient, and client data is transmitted to the storage devices in the storage cluster in a balanced manner. Storage data is mapped to storage devices and distributed evenly according to the user-defined policy, which improves the separation of storage data across different storage areas. This solves the problem of balanced data distribution and load and makes maximum use of the system's performance.
The above embodiments express only some implementations of the present invention; their description is specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A distributed file system, characterized in that the distributed file system comprises a client (100), a storage cluster module (200), a management cluster module (300); wherein,
the client (100) comprises a user input port (110) for inputting storage data and an intelligent selection algorithm module (120) for balanced allocation of storage data;
the storage cluster module (200) comprises a plurality of storage servers, wherein each storage server comprises a plurality of magnetic disks and is used for storing the storage data input by the client (100) through the user input port (110);
the management cluster module (300) comprises a plurality of memory managers and is used for maintaining and managing state data among all modules in the distributed file system;
the intelligent selection algorithm module (120) of the client (100) comprises a first selection module (121) for selecting storage server ID values, an iteration operation module (122) for acquiring a plurality of storage server ID values corresponding to the storage server ID values, a second selection module (123) for selecting a proper storage server ID value from the plurality of storage server ID values, and a result module (124) for storing the storage server ID value selected by the second selection module; the first selection module (121) is connected with the storage cluster module (200), and the first selection module (121), the iteration operation module (122), the second selection module (123) and the result module (124) are sequentially connected with one another;
the client (100), the storage cluster module (200) and the management cluster module (300) are respectively and mutually connected in communication through a network protocol.
2. The distributed file system of claim 1, wherein the storage cluster module (200) comprises a hierarchy module, a replica distribution policy module, and a data module; wherein,
the hierarchical structure module comprises storage servers and buckets, and is used for counting the storage servers in the storage cluster module and selecting the storage servers for storing data in different modes;
the copy distribution strategy module is used for setting N copy storage data for each storage data and respectively storing the N copy storage data in different N storage servers;
the data module is to store each storage server ID value.
3. The distributed file system in accordance with claim 2, wherein the buckets of the hierarchy module contain multiple storage servers or multiple buckets.
4. The distributed file system in accordance with claim 3, wherein the buckets include a balance bucket, a list bucket, a tree bucket, and a draw bucket;
the balance bucket is composed of buckets having the same weight;
the list bucket is composed of buckets with a linked list structure, wherein the list bucket comprises a plurality of storage servers, and the weight of each storage server is any weight;
the tree bucket is composed of buckets with tree structures, wherein the tree bucket comprises a plurality of storage servers, and the weight of each storage server is the sum of the weights of adjacent storage servers in the tree structures;
the draw bucket comprises a plurality of storage servers, and the storage servers are drawn at random according to their weights;
wherein the buckets in the hierarchy store the corresponding storage data in the manner of balance buckets.
5. The distributed file system in accordance with claim 3, wherein the storage server has a storage server ID value and a weight; the bucket has a bucket ID value and a weight, wherein the bucket weight is the sum of the weights of the plurality of storage servers.
6. The distributed file system according to claim 1, wherein the client (100), the management cluster module (300) and the storage cluster module (200) have configuration files, respectively, and the client (100) obtains the storage information and the status information of the storage server corresponding to the storage cluster module (200) by calling the management cluster module (300) according to the ID value and the port information of the configuration files, so as to be used for timely operation allocation of the intelligent selection algorithm module (120) of the client.
7. The distributed file system of claim 1, wherein the network protocol communicatively couples the modules of the distributed file system using a TCP/IP protocol.
8. A data balanced distribution method for a distributed file system, characterized in that the method comprises the following steps:
S1, inputting the storage data and the setting information through the user input port (110) of the client (100);
S2, the client (100) acquires the state information of the storage cluster module (200) through the management cluster module (300), and is in communication connection with the storage cluster module (200);
S3, the client (100) calls an intelligent selection algorithm module (120) to calculate N storage server ID values capable of storing corresponding storage data according to the state information of the storage cluster module (200), wherein the N storage server ID values correspond to the N storage servers;
S4, the client (100) stores the storage data to a first storage server of the corresponding N storage servers through the acquired storage server ID value, and the first storage server sends feedback information to the client after receiving the storage data to confirm the successful storage of the storage data; wherein the first storage server is a primary storage server;
S5, the client (100) stores the storage data in each of the N-1 storage servers other than the first storage server through the acquired storage server ID values, so as to ensure that the storage data has N-1 replica copies.
9. The data balanced distribution method for a distributed file system according to claim 8, wherein the step S3 includes the following steps:
S31, the client (100) calls the intelligent selection algorithm module (120) of the client (100) according to the state information of the storage cluster module (200);
S32, selecting one storage server ID value in the hierarchical structure of the storage cluster module (200) by a first selection module (121) of the intelligent selection algorithm module (120) through the state information of the storage cluster module (200), and returning N storage server ID values corresponding to the storage server ID value to a bucket of the storage cluster module (200);
S33, the N storage server ID values in the bucket of the storage cluster module (200) are iterated through the iteration operation module (122), each storage server correspondingly returns N storage server ID values through downward traversal, and the N storage server ID values returned for each storage server ID value are respectively stored in N buckets;
S34, the iteration operation module (122) calls a second selection module (123), and the second selection module (123) pseudo-randomly selects one storage server ID value from each of the N buckets;
S35, the second selection module (123) stores the selected N storage server ID values in the result module (124), so that the client (100) can identify where the storage data is to be stored.
10. The distributed file system-oriented data balanced distribution method according to claim 7, wherein the client (100), the storage cluster module (200) and the management cluster module (300) are connected in network communication with each other by means of a network protocol.
CN201610211732.4A 2016-04-06 2016-04-06 A kind of distributed file system and the data balancing location mode towards the system Expired - Fee Related CN105912612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610211732.4A CN105912612B (en) 2016-04-06 2016-04-06 A kind of distributed file system and the data balancing location mode towards the system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610211732.4A CN105912612B (en) 2016-04-06 2016-04-06 A kind of distributed file system and the data balancing location mode towards the system

Publications (2)

Publication Number Publication Date
CN105912612A true CN105912612A (en) 2016-08-31
CN105912612B CN105912612B (en) 2019-04-05

Family

ID=56744683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610211732.4A Expired - Fee Related CN105912612B (en) 2016-04-06 2016-04-06 A kind of distributed file system and the data balancing location mode towards the system

Country Status (1)

Country Link
CN (1) CN105912612B (en)

Also Published As

Publication number Publication date
CN105912612B (en) 2019-04-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190405

CF01 Termination of patent right due to non-payment of annual fee