CN109951506B - Method and equipment for evaluating performance of storage cluster - Google Patents


Info

Publication number: CN109951506B
Authority: CN (China)
Application number: CN201711385142.4A
Other languages: Chinese (zh)
Other versions: CN109951506A
Inventors: 李宏杰, 刘鸿
Current assignees: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Original assignees: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Legal status: Active
Prior art keywords: data, node, storage, cluster, bandwidth

Events:
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN201711385142.4A
Publication of CN109951506A
Application granted
Publication of CN109951506B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for evaluating the performance of a storage cluster, addressing the problem that the prior art offers no way to evaluate the performance of a distributed storage cluster. Based on a data redundancy policy, the bandwidth occupied by each node in a data read-write scenario is calculated and the maximum of these occupied bandwidths is determined; the original data volume of the storage nodes in that scenario is estimated from the maximum occupied bandwidth; and the cluster performance of the distributed storage cluster is calculated from the original data volume of the storage nodes and the number of storage nodes in the cluster. With the method of the embodiments of the invention, the original data volume determined from the maximum occupied bandwidth approaches the maximum data volume the distributed storage cluster can sustain, so the cluster performance calculated from it is closer to reality and more accurate.

Description

Method and equipment for evaluating performance of storage cluster
Technical Field
The present invention relates to the field of communications technologies, and in particular to a method and apparatus for evaluating the performance of a storage cluster.
Background
With the rapid development of information technology in the digital era, the total amount of data is growing exponentially, which poses a serious challenge for data storage. The traditional approach of storing data on a single local server can no longer handle the storage workload effectively, and cloud storage systems, with their high efficiency, strong scalability, and low cost, have become the preferred solution for data storage.
A cloud storage system uses application software to integrate many storage devices across a network so that they work together, by means of clustering, networking technology, distributed file systems, and similar functions, and it provides data storage and service access to external users. A cloud storage system can therefore also be called a distributed storage cluster.
During scheme design and deployment of a distributed storage cluster, it is difficult in practice to evaluate and calculate the performance of a distributed object storage cluster when it reads and writes data. As a result, a high-performance, high-utilization distributed storage cluster cannot be designed for a real environment.
In summary, no existing scheme evaluates the performance of a distributed storage cluster.
Disclosure of Invention
The invention provides a method and apparatus for evaluating the performance of a storage cluster, to solve the problem that the prior art cannot evaluate the performance of a distributed storage cluster.
The invention provides a method for evaluating the performance of a storage cluster, comprising the following steps:
calculating, based on a data redundancy policy of a distributed storage cluster, the maximum among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scenario;
estimating, according to the maximum occupied bandwidth, the original data volume of the storage nodes in the distributed storage cluster in the data read-write scenario, where the original data volume of a storage node is the data volume on that storage node before data redundancy is applied in the data read-write scenario; and
calculating the cluster performance of the distributed storage cluster according to the original data volume of the storage nodes and the number of storage nodes in the distributed storage cluster, where the cluster performance represents the data throughput limit of the distributed storage cluster in the data read-write scenario.
The invention further provides an apparatus for evaluating storage cluster performance, the apparatus comprising:
a bandwidth determination module, configured to calculate, based on the data redundancy policy of the distributed storage cluster, the maximum among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scenario;
a data calculation module, configured to estimate, according to the maximum occupied bandwidth, the original data volume of the storage nodes in the distributed storage cluster in the data read-write scenario, where the original data volume of a storage node is the data volume on that storage node before data redundancy is applied in the data read-write scenario; and
an evaluation module, configured to calculate the cluster performance of the distributed storage cluster according to the original data volume of the storage nodes and the number of storage nodes in the distributed storage cluster, where the cluster performance represents the data throughput limit of the distributed storage cluster in the data read-write scenario.
In the embodiments of the present invention, the bandwidth occupied by each node of the distributed storage cluster in a data read-write scenario is calculated based on the data redundancy policy, and the maximum of these occupied bandwidths is determined; the original data volume of the storage nodes in the data read-write scenario, that is, the data volume on a storage node before data redundancy, is estimated from this maximum occupied bandwidth; and the cluster performance, which represents the data throughput limit of the distributed storage cluster in the data read-write scenario, is calculated from the original data volume of the storage nodes and the number of storage nodes in the cluster. Because the original data volume determined from the maximum occupied bandwidth approaches the maximum data volume the distributed storage cluster can sustain, the cluster performance calculated from it is closer to reality and more accurate.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a method for evaluating the performance of a storage cluster according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of how the bandwidth occupied by each node in a data read-write scenario is calculated according to an embodiment of the present invention;
FIG. 3 is a flowchart of evaluating the performance of a storage cluster according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for evaluating storage cluster performance according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, an embodiment of the present invention provides a method for evaluating the performance of a storage cluster, the method comprising:
step 101: calculating, based on the data redundancy policy of the distributed storage cluster, the maximum among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scenario;
step 102: estimating, according to the maximum occupied bandwidth, the original data volume of the storage nodes in the distributed storage cluster in the data read-write scenario, where the original data volume of a storage node is the data volume on that storage node before data redundancy is applied in the data read-write scenario;
step 103: calculating the cluster performance of the distributed storage cluster according to the original data volume of the storage nodes and the number of storage nodes in the distributed storage cluster, where the cluster performance represents the data throughput limit of the distributed storage cluster in the data read-write scenario.
The distributed storage cluster comprises load balancing nodes, gateway nodes, and storage nodes. A load balancing node performs load balancing: it divides the data it receives into parts and distributes them to the gateway nodes connected to it. The amount of data allocated to each gateway node can depend on the number of storage nodes connected to that gateway node, so a gateway node with more storage nodes is allocated more data and one with fewer storage nodes is allocated less. A gateway node performs the gateway function, assigning storage nodes to the data to be distributed: after receiving data from the load balancing node, it distributes that data evenly to the storage nodes connected to it. Storage nodes store the data. Because the load balancing nodes and gateway nodes distribute the received data evenly, the original data allocated to each storage node is of the same size, and each storage node then applies data redundancy to its original data to complete the storage operation. Within an acceptable error range, the original data volume allocated to each storage node is essentially the same. In the embodiments of the present invention, the original data volume of a storage node is the data volume on one storage node of the distributed storage cluster before data redundancy is applied.
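The two-tier distribution described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes a single load balancing tier whose split is proportional to the number of storage nodes behind each gateway (one of the options the text allows), and the function name `allocate_original_data` is hypothetical.

```python
from typing import List


def allocate_original_data(total: float, storage_per_gateway: List[int]) -> List[List[float]]:
    """Return the original data each storage node receives, grouped by gateway.

    total: data arriving at the load balancing tier (e.g. MB/s).
    storage_per_gateway: number of storage nodes attached to each gateway.
    """
    total_storage = sum(storage_per_gateway)
    if total_storage == 0:
        raise ValueError("the cluster must contain at least one storage node")
    allocation = []
    for n in storage_per_gateway:
        # Load balancer: a gateway's share is proportional to its storage nodes.
        gateway_share = total * n / total_storage
        # Gateway: split its share evenly across its own storage nodes.
        allocation.append([gateway_share / n] * n if n else [])
    return allocation


# Gateways with 4 and 6 storage nodes: every storage node ends up with the same x.
print(allocate_original_data(1000.0, [4, 6]))
```

Splitting in proportion to attached storage nodes is what makes the per-node original data x identical across gateways of different sizes.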
The original data allocated to each storage node is that node's data before data redundancy. After receiving its original data, each storage node must store it, and the way it is stored depends on the redundancy policy of the distributed storage cluster.
If the redundancy policy of the distributed storage cluster is N copies, a storage node copies its original data into N copies, keeps one copy on its own device, and sends the remaining N-1 copies to N-1 other storage nodes for storage, so each of those storage nodes receives one copy. If the redundancy policy is a K + M erasure code, a storage node divides its original data into K parts and encodes them to produce M parts of parity data; when data is corrupted and must be recovered, the parity data is decoded to restore the original data. All K + M parts have the same size. The storage node keeps one of the K data parts or one of the M parity parts and sends the remaining K + M - 1 parts to K + M - 1 other storage nodes for storage, so each storage node holds one data part or one parity part.
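The storage-network traffic that this write path generates per storage node can be written as two small functions. This sketch assumes every storage node behaves alike, so each one receives as much as it sends; the function names are hypothetical. For the erasure code, each of the K + M parts has size x/K.

```python
def replica_write_traffic(x: float, n: int) -> float:
    """Storage-network data a node sends when writing x under an N-copy policy."""
    if n < 1:
        raise ValueError("N must be at least 1")
    # One copy stays local; N - 1 full copies go to other storage nodes.
    return (n - 1) * x


def erasure_write_traffic(x: float, k: int, m: int) -> float:
    """Storage-network data a node sends when writing x under a K + M erasure code."""
    if k < 1 or m < 0:
        raise ValueError("need K >= 1 and M >= 0")
    part = x / k  # K data parts and M parity parts, all of equal size
    # One part stays local; the other K + M - 1 parts go to other storage nodes.
    return (k + m - 1) * part


x = 100.0
print(replica_write_traffic(x, 3))     # two extra full copies
print(erasure_write_traffic(x, 4, 2))  # five parts of size x/4
```

The erasure-code result (K + M - 1)x/K equals (1 + (M - 1)/K)x, the storage-network figure the text gives for a write under a K + M erasure code.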
The process above writes data into the storage nodes of the distributed storage cluster. When data is read back from the storage nodes under an N-copy policy, every copy is identical to the original data before redundancy, so the data only needs to be obtained from one of the storage nodes holding it. Under a K + M erasure code, each storage node holds only one of the K data parts or one of the M parity parts, so a storage node holding one of the K data parts reconstructs the original data by obtaining the other K - 1 data parts from other storage nodes. After determining the original data, the storage node sends it to the gateway node, which in turn sends it to the load balancing node.
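The read path above can be sketched in the same way. This assumes, as the text does, that the node serving a read under the erasure code already holds one of the K data parts; the function names are hypothetical.

```python
def replica_read_traffic(x: float) -> float:
    """Storage-network data a node fetches to serve a read of x under N copies."""
    # Every replica is a full copy of the original data, so nothing is fetched.
    # x is accepted only to keep the signature parallel to the erasure-code case.
    return 0.0


def erasure_read_traffic(x: float, k: int) -> float:
    """Storage-network data a node fetches to rebuild x under a K + M erasure code."""
    if k < 1:
        raise ValueError("K must be at least 1")
    # The node holds one data part of size x/K and fetches the other K - 1 parts.
    return (k - 1) * x / k


print(replica_read_traffic(100.0), erasure_read_traffic(100.0, 4))
```

The erasure-code result (K - 1)x/K is the same as (1 - 1/K)x, which matches the read-scenario storage-network figure given in the text.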
Note that the traffic in which the load balancing node distributes data to the gateway nodes, the gateway nodes distribute data to the storage nodes, the storage nodes send original data to the gateway nodes, and the gateway nodes send it on to the load balancing node travels over the front-end network of the distributed storage cluster, also called the service network. The traffic in which each storage node stores its original data according to the data redundancy policy, and in which a storage node obtains data from other storage nodes, travels over the back-end network of the distributed storage cluster, also called the storage network. The front-end network and the back-end network may be different networks or the same network. A load balancing node may also have a storage function, that is, one node of the distributed storage cluster may provide both load balancing and storage; likewise, a gateway node may also have a storage function, so one node may provide both the gateway function and the storage function. Such a node is called a hybrid node: a hybrid node with both load balancing and storage functions acts as both a load balancing node and a storage node, and a hybrid node with both gateway and storage functions acts as both a gateway node and a storage node.
To avoid a complex node architecture, the distributed storage cluster may instead include only storage nodes and a management node, without gateway nodes or load balancing nodes. The management node distributes the data it receives evenly to the storage nodes, and each storage node applies data redundancy to the data it receives and stores it.
To calculate, based on the data redundancy policy of the distributed storage cluster, the maximum among the bandwidths occupied by the nodes in a data read-write scenario, the bandwidth occupied by each node in that scenario must first be determined, as follows:
First, determine, according to the data redundancy policy of the distributed storage cluster, how the bandwidth occupied by each node in a data read-write scenario is calculated. Under different redundancy policies, the same node occupies bandwidth differently in the same data read-write scenario, so the data redundancy policy of the cluster is determined first, and the corresponding preset calculation for each node's occupied bandwidth is then selected. FIG. 2 shows how the bandwidth occupied by each node in a data read-write scenario is calculated when the data redundancy policy is N copies and when it is a K + M erasure code. In FIG. 2, x denotes the original data allocated to each storage node; a hybrid node also has a storage function, so it is also allocated original data x and also occupies bandwidth during data redundancy. R denotes the number of storage nodes in the distributed storage cluster, including hybrid nodes with a storage function, and S denotes the number of gateway nodes in the distributed storage cluster.
In FIG. 2, the storage network and the service network are calculated separately. Storage network uplink is the bandwidth a storage node occupies when sending data to another storage node, and storage network downlink is the bandwidth it occupies when receiving data from another storage node. Service network downlink is the bandwidth a node occupies when sending data to another node, for example a load balancing node sending data to a gateway node or a gateway node sending data to a storage node; service network uplink is the bandwidth a node occupies when receiving data from another node, for example a storage node receiving data from a gateway node, or a gateway node receiving data from a load balancing node or a storage node. If the storage network and the service network are the same network, a node's occupied bandwidth during data reading and writing is the sum of its storage network and service network bandwidths. In the figure, "independent storage" denotes a storage node with only the storage function; "storage + gateway" denotes a hybrid node with both storage and gateway functions; "independent gateway" denotes a gateway node with only the gateway function; "independent load balancing" denotes a load balancing node with only the load balancing function; and "storage + load balancing" denotes a hybrid node with both storage and load balancing functions.
Data operations are divided into write, read, and mixed read-write. Write means writing data into the distributed storage cluster, read means reading data from it, and mixed read-write means writing and reading data at the same time with a read-write ratio of p:q, that is, the ratio of original data to be read to original data to be written is p:q. Original data does not include the data produced by data redundancy.
Take a hybrid node with both storage and gateway functions as an example. When the data redundancy policy is N copies: in a write scenario, its storage network uplink and downlink bandwidths are both (N - 1)x, and on the service network it occupies (R/S - 1)x when sending and Rx/S when receiving; in a read scenario, its storage network uplink and downlink bandwidths are both 0, and on the service network it occupies Rx/S when sending and (R/S - 1)x when receiving; in a mixed read-write scenario, its storage network uplink and downlink bandwidths are both (N - 1)qx, and on the service network it occupies (R/S - q)x when sending and (R/S - p)x when receiving. When the data redundancy policy is a K + M erasure code: in a write scenario, its storage network uplink and downlink bandwidths are both (1 + (M - 1)/K)x, and on the service network it occupies (R/S - 1)x when sending and Rx/S when receiving; in a read scenario, its storage network uplink and downlink bandwidths are both (1 - 1/K)x, and on the service network it occupies Rx/S when sending and (R/S - 1)x when receiving; in a mixed read-write scenario, its storage network uplink and downlink bandwidths are both (1 + (qM - 1)/K)x, and on the service network it occupies (R/S - q)x when sending and (R/S - p)x when receiving.
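The formulas for the storage + gateway hybrid node can be collected into one function and checked for consistency. This is a sketch under stated assumptions: R/S is taken as the number of storage nodes served by one gateway (the node itself included), the "send"/"receive" reading of the two service-network values comes from tracing the data flow described above, the default N, K, M values are hypothetical, and FIG. 2 itself is not reproduced.

```python
from typing import Tuple

# (storage uplink, storage downlink, service send, service receive), each in units of x
Coeffs = Tuple[float, float, float, float]


def storage_gateway_coeffs(policy: str, scenario: str, r: int, s: int,
                           n: int = 3, k: int = 4, m: int = 2,
                           p: float = 0.0, q: float = 0.0) -> Coeffs:
    """Bandwidth coefficients of a storage + gateway hybrid node, as given in the text."""
    g = r / s  # storage nodes served by one gateway, itself included
    if policy == "replica":
        write_st, read_st, mixed_st = float(n - 1), 0.0, (n - 1) * q
    elif policy == "erasure":
        write_st, read_st, mixed_st = 1 + (m - 1) / k, 1 - 1 / k, 1 + (q * m - 1) / k
    else:
        raise ValueError(f"unknown policy: {policy}")
    if scenario == "write":
        return (write_st, write_st, g - 1, g)
    if scenario == "read":
        return (read_st, read_st, g, g - 1)
    if scenario == "mixed":
        # These closed forms only hold when p + q = 1.
        return (mixed_st, mixed_st, g - q, g - p)
    raise ValueError(f"unknown scenario: {scenario}")


# Consistency check: each mixed coefficient equals p * read + q * write.
p, q = 0.8, 0.2
for policy in ("replica", "erasure"):
    rd = storage_gateway_coeffs(policy, "read", 10, 2)
    wr = storage_gateway_coeffs(policy, "write", 10, 2)
    mx = storage_gateway_coeffs(policy, "mixed", 10, 2, p=p, q=q)
    assert all(abs(a - (p * b + q * c)) < 1e-9 for a, b, c in zip(mx, rd, wr))
```

The check shows that the mixed read-write formulas are the read and write formulas weighted by p and q, which is why they assume p + q = 1 (as in the 0.8:0.2 example).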
Second, based on the determined calculation method, estimate the bandwidth occupied by each node in the data read-write scenario from a preset original data volume of the storage nodes. As FIG. 2 shows, calculating the bandwidth occupied by each node requires the original data x of each storage node before data redundancy. A value of x can therefore be preset and substituted into each formula in FIG. 2 to estimate the bandwidth occupied by each node in the data read-write scenario. Because each node's occupied bandwidth depends on x, it changes whenever x changes.
Third, determine the maximum among the estimated occupied bandwidths of the nodes by comparing their values, and estimate the original data volume of each storage node in the data read-write scenario from this maximum occupied bandwidth. During data reading and writing, network performance constrains the cluster: no node's occupied bandwidth may exceed the bandwidth occupation threshold allowed by the network of the distributed storage cluster. Therefore, as long as the maximum occupied bandwidth among the nodes does not exceed the threshold, no other node's occupied bandwidth exceeds it either.
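Finding the busiest node for a preset x can be sketched as follows. The sketch assumes that each node's bandwidth is x multiplied by a coefficient, as the text states for the FIG. 2 formulas; the node labels and coefficient values are hypothetical and only illustrate the comparison.

```python
from typing import Dict, Tuple


def max_occupied_bandwidth(coeffs: Dict[str, float], x: float) -> Tuple[str, float]:
    """Evaluate every node's bandwidth at a preset x and return the largest one."""
    if not coeffs:
        raise ValueError("no nodes to evaluate")
    bandwidths = {node: c * x for node, c in coeffs.items()}  # bandwidth = coefficient * x
    busiest = max(bandwidths, key=bandwidths.__getitem__)
    return busiest, bandwidths[busiest]


# Hypothetical per-node coefficients, for illustration only.
coeffs = {"storage-only": 2.0, "storage+gateway": 5.0, "storage+lb": 11.0}
node, bw = max_occupied_bandwidth(coeffs, x=120.0)
threshold = 1100.0
print(node, bw, "within threshold" if bw <= threshold else "exceeds threshold")
```

Only the busiest node needs to be compared with the threshold: if it fits, every other node fits too.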
Next, adjust the preset original data volume of the storage nodes according to the bandwidth occupation threshold and the maximum occupied bandwidth, so that the maximum occupied bandwidth does not exceed the bandwidth occupation threshold.
Compare the maximum occupied bandwidth with the bandwidth occupation threshold. If the maximum occupied bandwidth is below the threshold, the preset x is too small and can be increased until the maximum occupied bandwidth equals the threshold; the adjusted x is then the maximum original data volume achievable in the distributed storage cluster. If the maximum occupied bandwidth exceeds the threshold, the preset x is too large and can be decreased until the maximum occupied bandwidth equals the threshold; again, the adjusted x is the maximum original data volume achievable in the distributed storage cluster.
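The text only says to increase or decrease x "appropriately". One way to do this, swapped in here as an illustration, is a doubling search followed by bisection. The sketch assumes the maximum occupied bandwidth grows with x and is within the threshold at x = 0; the function name `adjust_x` is hypothetical.

```python
from typing import Callable


def adjust_x(max_bw: Callable[[float], float], threshold: float, x0: float,
             tol: float = 1e-6, max_iter: int = 200) -> float:
    """Raise or lower a preset x until the maximum bandwidth just reaches the threshold.

    Assumes max_bw is increasing in x and max_bw(0) <= threshold.
    Returns the largest x found with max_bw(x) <= threshold (to within tol).
    """
    if x0 <= 0:
        raise ValueError("the preset x must be positive")
    lo, hi = 0.0, x0
    # If the preset x is too small, keep doubling it until the threshold is crossed.
    for _ in range(max_iter):
        if max_bw(hi) > threshold:
            break
        lo, hi = hi, hi * 2
    else:
        raise ValueError("threshold never exceeded; is max_bw increasing?")
    # Bisect: lo always satisfies the threshold, hi always violates it.
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = (lo + hi) / 2
        if max_bw(mid) > threshold:
            hi = mid
        else:
            lo = mid
    return lo


# Whether x starts too high (120) or too low (30), it converges to about 100.
print(adjust_x(lambda v: 11 * v, 1100.0, x0=120.0))
print(adjust_x(lambda v: 11 * v, 1100.0, x0=30.0))
```

Returning the lower bracket guarantees that the result never pushes the maximum occupied bandwidth above the threshold.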
Alternatively, x can be treated as an unknown and substituted directly into each formula in FIG. 2. Each node's occupied bandwidth is then x multiplied by a real coefficient; the result with the largest coefficient is the maximum occupied bandwidth, and setting it equal to the bandwidth occupation threshold gives x.
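The closed-form alternative reduces to x = threshold / largest coefficient. This sketch also applies the earlier rule that storage and service bandwidths are added when both run on one network. It assumes a single threshold applies to each network, as in the later example, and the labels and coefficients are hypothetical.

```python
from typing import Dict, Tuple


def solve_x(coeffs: Dict[str, Tuple[float, float]], threshold: float,
            shared_network: bool) -> float:
    """Solve max_coefficient * x = threshold for x.

    coeffs maps a node/direction label to its (storage network, service network)
    coefficients of x. On a shared network the two are added; otherwise each
    network is checked on its own.
    """
    if shared_network:
        candidates = [st + sv for st, sv in coeffs.values()]
    else:
        candidates = [c for pair in coeffs.values() for c in pair]
    c_max = max(candidates)
    if c_max <= 0:
        raise ValueError("no node occupies bandwidth; x is unbounded")
    return threshold / c_max


# Illustrative coefficients only.
example = {"node-a send": (2.0, 5.0), "node-b receive": (2.0, 11.0)}
print(solve_x(example, 1100.0, shared_network=False))
print(solve_x(example, 1100.0, shared_network=True))
```

Because every bandwidth is linear in x, a single division replaces the iterative adjustment.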
Note that in the embodiments of the present invention, the original data of each storage node before data redundancy refers to the amount of original data received per unit time. For example, if each storage node receives original data X before data redundancy over a time T, the per-unit-time amount is x = X/T, and this x is used to calculate the occupied bandwidth.
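The conversion from a total X over a period T to the rate x can be sketched as below. The units (MB and seconds) and the function name are assumptions for illustration.

```python
def per_unit_time(total_original: float, period: float) -> float:
    """Convert original data X received over a period T into the rate x = X / T."""
    if period <= 0:
        raise ValueError("the period T must be positive")
    return total_original / period


# e.g. 6000 MB of original data received in 60 s corresponds to x = 100 MB/s
x = per_unit_time(6000.0, 60.0)
print(x)
```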
Then, adjust the original data volume of the storage nodes according to the device performance of each storage node.
The device performance of each storage node in the distributed storage cluster limits how much original data that node can receive: for example, the disk read-write speed, the disk size, and the performance of the data interface of each storage node all affect the original data volume it can handle. Suppose x is determined to be 120 MB/s from the bandwidth occupation threshold, but the device performance of each storage node cannot guarantee reading and writing 120 MB of data per unit time and allows at most 100 MB/s; x must then be reduced accordingly to meet the device performance requirement.
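Capping the network-derived x by device performance can be sketched as taking the minimum over the node's components. The component names and the reduction of each component to one sustainable rate are assumptions; disk size is a capacity rather than a rate and is not modelled.

```python
from typing import Dict


def cap_by_device(x_network: float, device_limits: Dict[str, float]) -> float:
    """Cap the network-derived x by the slowest device component of a storage node.

    device_limits maps a component name to the original-data rate it can sustain
    (same unit as x, e.g. MB/s).
    """
    if not device_limits:
        return x_network
    return min(x_network, min(device_limits.values()))


# The text's example: the network allows 120 MB/s, the node sustains only 100 MB/s.
limits = {"disk read/write": 100.0, "data interface": 150.0}  # hypothetical split
print(cap_by_device(120.0, limits))
```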
Preferably, all storage nodes in the distributed storage cluster have the same device performance, in which case x does not need to be adjusted separately for each storage node; it only needs to be adjusted according to the device performance of one storage node.
After x is adjusted, the cluster performance of the distributed storage cluster is calculated from the original data volume of each storage node and the number of storage nodes in the cluster, where the cluster performance represents the data throughput limit of the distributed storage cluster in the data read-write scenario.
Specifically, the cluster performance of the distributed storage cluster may be calculated according to Table 1:
TABLE 1
[Table 1 image: cluster performance formulas for each data operation]
Here R is the number of storage nodes, including hybrid nodes with a storage function. When data is written into the distributed storage cluster, the maximum allowable write throughput is Rx; when data is read from the distributed storage cluster, the maximum allowable read throughput is Rx; and during mixed read-write, the maximum allowable read throughput is Rpx and the maximum allowable write throughput is Rqx.
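The throughput formulas stated for Table 1 translate directly into code. For brevity this sketch passes a single x for all scenarios, whereas an x derived separately for each scenario could be passed instead; the function name is hypothetical.

```python
from typing import Dict


def cluster_performance(r: int, x: float, p: float, q: float) -> Dict[str, float]:
    """Throughput limits from the formulas stated for Table 1.

    r: number of storage nodes (hybrid nodes with storage included)
    x: original data rate per storage node
    p, q: read and write shares in the mixed read-write scenario
    """
    return {
        "write": r * x,
        "read": r * x,
        "mixed read": r * p * x,
        "mixed write": r * q * x,
    }


print(cluster_performance(4, 50.0, 0.5, 0.5))
```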
As the cluster performance evaluation method above shows, the original data volume of each storage node needs to be adjusted in two ways: first, according to the bandwidth occupation threshold of the network in the distributed storage cluster; and second, according to the device performance of the storage nodes in the distributed storage cluster.
The limiting factor of the distributed storage cluster can be identified from how the original data volume of the storage nodes was adjusted. If the original data volume made the maximum occupied bandwidth exceed the bandwidth occupation threshold, so that it had to be adjusted according to the threshold, the limiting factor is the network performance of the distributed storage cluster: improving network performance would allow a larger original data volume per storage node and hence higher cluster performance. If the original data volume exceeded the maximum allowed by the device performance of the storage nodes, so that it had to be adjusted according to device performance, the limiting factor is the device performance of the storage nodes: improving device performance would allow a larger original data volume per storage node and hence higher cluster performance. In other words, the limiting factor reveals what constrains the performance of the distributed storage cluster, and the network performance of the cluster or the device performance of its nodes can be adjusted accordingly to optimize the cluster design and improve its performance.
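Identifying the limiting factor amounts to seeing which of the two bounds on x is tighter. A minimal sketch, assuming both bounds have already been computed in the same unit; the function name is hypothetical.

```python
from typing import Tuple


def limiting_factor(x_network: float, x_device: float) -> Tuple[str, float]:
    """Report which bound sets x: the network threshold or the storage device.

    x_network: largest x the bandwidth occupation threshold allows
    x_device: largest x a storage node's device performance allows
    """
    if x_device < x_network:
        return "device performance", x_device
    if x_network < x_device:
        return "network performance", x_network
    return "network and device performance", x_network


print(limiting_factor(100.0, 120.0))
print(limiting_factor(120.0, 100.0))
```

Whichever bound wins is the component worth upgrading first.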
For example, suppose a distributed storage cluster comprises 10 nodes: 6 storage nodes with only the storage function, 2 hybrid nodes with gateway and storage functions, and 2 hybrid nodes with load balancing and storage functions. The data redundancy policy is 3 copies; the storage network and the service network are different networks, and the bandwidth occupation threshold is 1100 MB/s; during mixed read-write, the read-write ratio is p:q = 0.8:0.2.
The parameters for calculating the bandwidth occupied by each node are therefore N = 3, R = 10, S = 2, p = 0.8, and q = 0.2.
Substituting these parameters into the formulas shown in FIG. 2 gives the results in Table 2:
TABLE 2
The maximum occupied bandwidth is 11x. Setting it equal to the bandwidth occupation threshold of 1100 MB/s gives x = 100 MB/s.
Substituting this value of x into the formulas shown in Table 1 gives the cluster performance of the distributed storage cluster, as shown in Table 3:
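Assuming, as the 11x maximum suggests, that each node's occupied bandwidth in Table 2 is a multiple of x, solving for x amounts to dividing the threshold by the largest coefficient. A minimal sketch; the helper name `solve_x` is hypothetical, and only the maximum coefficient (11) is passed because the other Table 2 entries are not reproduced in this text.

```python
from typing import Iterable


def solve_x(coefficients: Iterable[float], threshold_mb_s: float) -> float:
    """Return the x (MB/s) at which the busiest node exactly reaches the threshold.

    Each node's occupied bandwidth is modeled as coefficient * x, so the
    maximum occupied bandwidth is max(coefficients) * x.
    """
    k_max = max(coefficients)
    if k_max <= 0:
        raise ValueError("bandwidth coefficients must be positive")
    return threshold_mb_s / k_max


x = solve_x([11], 1100.0)
print(x)  # 100.0
```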
TABLE 3
If the device performance of the storage nodes cannot sustain x = 100 MB/s, x may be reduced according to the device performance, and the adjusted x is then substituted into Table 1 to calculate the cluster performance.
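The adjustment described above takes the smaller of the network-derived x and the largest x the devices can sustain, then feeds it into the cluster-performance calculation. The sketch below assumes, purely for illustration, that cluster performance is x multiplied by the number of nodes that store data; the actual Table 1 formulas (which may treat reads and writes separately using p and q) are not reproduced in this text, so the output is not the content of Table 3. The names `effective_x` and `cluster_throughput` and the 80 MB/s device limit are hypothetical.

```python
from typing import Optional


def effective_x(x_network: float, x_device: Optional[float]) -> float:
    """Cap the network-derived x by the device limit, if one is known."""
    if x_device is None:
        return x_network
    return min(x_network, x_device)


def cluster_throughput(x: float, storage_nodes: int) -> float:
    """Assumed model: aggregate original-data throughput = x * storage nodes."""
    return x * storage_nodes


# x = 100 MB/s comes from the bandwidth threshold; all 10 nodes in the
# example store data; the 80 MB/s device limit is a hypothetical value.
x = effective_x(100.0, 80.0)
print(x, cluster_throughput(x, 10))  # 80.0 800.0
```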
Fig. 3 is a flowchart illustrating cluster performance evaluation according to an embodiment of the present invention.
Step 301: determine, according to the data redundancy policy of the distributed storage cluster, how the bandwidth occupied by each node in the cluster is calculated in a data read-write scenario;
Step 302: using the calculation determined in step 301, estimate the bandwidth occupied by each node in the data read-write scenario from the preset original data volume of the storage nodes;
Step 303: determine the maximum occupied bandwidth among the estimated occupied bandwidths of all nodes;
Step 304: adjust the preset original data volume of the storage nodes according to the bandwidth occupation threshold and the maximum occupied bandwidth, so that the maximum occupied bandwidth does not exceed the bandwidth occupation threshold;
Step 305: adjust the original data volume of the storage nodes according to the device performance of each storage node;
Step 306: calculate the cluster performance of the distributed storage cluster from the original data volume of the storage nodes and the number of storage nodes in the cluster.
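Steps 301 to 306 can be chained as in the sketch below. It is illustrative only and assumes that each node's occupied-bandwidth formula from step 301 is linear in x (bandwidth = k * x), so step 304 can rescale x until the busiest node lands exactly on the threshold, as in the 11x = 1100 MB/s example. Step 306 uses a hypothetical "x times the number of storage nodes" model. The function name `evaluate_cluster`, the node labels and the second coefficient are placeholders, not values from the source.

```python
from typing import Callable, Dict, Optional, Tuple

# Step 301 output: per-node occupied bandwidth as a function of x (MB/s).
BandwidthFormulas = Dict[str, Callable[[float], float]]


def evaluate_cluster(
    formulas: BandwidthFormulas,
    preset_x: float,
    threshold: float,
    device_max_x: Optional[float],
    storage_nodes: int,
) -> Tuple[float, float]:
    """Return (adjusted x, assumed cluster throughput) in MB/s."""
    # Step 302: estimate each node's occupied bandwidth at the preset x.
    occupied = {node: f(preset_x) for node, f in formulas.items()}
    # Step 303: find the maximum occupied bandwidth.
    max_bw = max(occupied.values())
    if max_bw <= 0:
        raise ValueError("occupied bandwidth must be positive")
    # Step 304: rescale x so the maximum equals the threshold
    # (valid only because the formulas are assumed linear in x).
    x = preset_x * threshold / max_bw
    # Step 305: cap x by what the storage devices can sustain, if known.
    if device_max_x is not None:
        x = min(x, device_max_x)
    # Step 306: assumed model, x times the number of storage nodes.
    return x, x * storage_nodes


formulas = {
    "node-a": lambda x: 11 * x,  # maximum coefficient from Table 2
    "node-b": lambda x: 5 * x,   # placeholder coefficient, not from the source
}
print(evaluate_cluster(formulas, 50.0, 1100.0, None, 10))  # (100.0, 1000.0)
```

Passing the formulas as callables keeps step 301 (which depends on the redundancy policy) separate from the generic adjustment logic in steps 302 to 306.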
Based on the same inventive concept, the embodiment of the invention also provides an evaluation device for the performance of the storage cluster. Since the principle of solving the problem by the device is similar to the method for evaluating the performance of the storage cluster in the embodiment of the present invention, the implementation of the device may refer to the implementation of the method, and repeated details are not described herein.
As shown in fig. 4, an evaluation device for storage cluster performance according to an embodiment of the present invention includes a bandwidth determination module 401, a data calculation module 402, and an evaluation module 403:
a bandwidth determination module 401, configured to calculate, based on the data redundancy policy of a distributed storage cluster, the maximum occupied bandwidth among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scenario;
a data calculation module 402, configured to estimate, according to the maximum occupied bandwidth, the original data volume of the storage nodes in the distributed storage cluster in the data read-write scenario, where the original data volume of a storage node is its data volume before data redundancy is applied in that scenario;
an evaluation module 403, configured to calculate the cluster performance of the distributed storage cluster from the original data volume of the storage nodes and the number of storage nodes in the distributed storage cluster, where the cluster performance represents the data throughput limit of the distributed storage cluster in the data read-write scenario.
The bandwidth determination module 401 calculates, based on the data redundancy policy of the distributed storage cluster, the maximum occupied bandwidth among the bandwidths occupied by the nodes in a data read-write scenario as follows:
determining, according to the data redundancy policy of the distributed storage cluster, how the bandwidth occupied by each node in the cluster is calculated in a data read-write scenario;
estimating, using the calculation so determined, the bandwidth occupied by each node in the data read-write scenario from the preset original data volume of the storage nodes;
and determining the maximum occupied bandwidth among the estimated occupied bandwidths of all nodes.
When the data calculation module 402 estimates, according to the maximum occupied bandwidth, the original data volume of the storage nodes in the distributed storage cluster in the data read-write scenario, it must take the bandwidth occupation threshold into account: it adjusts the preset original data volume of the storage nodes according to the bandwidth occupation threshold and the maximum occupied bandwidth, so that the maximum occupied bandwidth does not exceed the threshold.
After adjusting the preset original data volume of the storage nodes according to the bandwidth occupation threshold and the maximum occupied bandwidth, the data calculation module 402 may further adjust the original data volume according to the device performance of each storage node.
Preferably, the device performance of each storage node in the distributed storage cluster is the same.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagram and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for evaluating the performance of a storage cluster, the method comprising:
calculating, based on a data redundancy strategy of a distributed storage cluster, the maximum occupied bandwidth among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scene;
adjusting the original data volume of a preset storage node according to a bandwidth occupation threshold and a maximum occupied bandwidth, so that the maximum occupied bandwidth is not larger than the bandwidth occupation threshold, wherein the original data volume of the storage node is the data volume of the storage node before data redundancy is carried out in a data reading and writing scene;
and calculating the cluster performance of the distributed storage cluster according to the original data volume of the storage nodes and the number of the storage nodes included in the distributed storage cluster, wherein the cluster performance is used for representing the data throughput limit of the distributed storage cluster in a data read-write scene.
2. The method of claim 1, wherein calculating, based on a data redundancy policy of the distributed storage cluster, the maximum occupied bandwidth among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scene comprises:
determining a calculation mode of occupied bandwidth of each node in the distributed storage cluster under a data read-write scene according to the data redundancy strategy of the distributed storage cluster;
based on the determined calculation mode of the occupied bandwidth of each node, estimating the occupied bandwidth of each node in a data reading and writing scene according to the original data quantity of the preset storage node;
and determining the maximum occupied bandwidth in the estimated occupied bandwidths of all the nodes.
3. The method of claim 2, further comprising, after adjusting the original data volume of the preset storage node according to the bandwidth occupation threshold and the maximum occupied bandwidth:
and adjusting the original data volume of each storage node according to the equipment performance of the storage node.
4. The method of claim 3, wherein the device performance of each storage node in the distributed storage cluster is the same.
5. An evaluation device for storage cluster performance, the evaluation device comprising:
the bandwidth determination module is used for calculating, based on a data redundancy strategy of a distributed storage cluster, the maximum occupied bandwidth among the bandwidths occupied by the nodes of the distributed storage cluster in a data read-write scene;
the data calculation module is used for adjusting the original data volume of a preset storage node according to a bandwidth occupation threshold and a maximum occupied bandwidth, so that the maximum occupied bandwidth is not greater than the bandwidth occupation threshold, wherein the original data volume of the storage node is the data volume of the storage node before data redundancy is carried out in a data reading and writing scene;
and the evaluation module is used for calculating the cluster performance of the distributed storage cluster according to the original data volume of the storage nodes and the number of the storage nodes in the distributed storage cluster, wherein the cluster performance is used for expressing the data throughput limit of the distributed storage cluster in a data read-write scene.
6. The evaluation device of claim 5, wherein the bandwidth determination module is specifically configured to:
determining a calculation mode of occupied bandwidth of each node in the distributed storage cluster under a data read-write scene according to the data redundancy strategy of the distributed storage cluster;
based on the determined calculation mode of the occupied bandwidth of each node, estimating the occupied bandwidth of each node in a data reading and writing scene according to the original data quantity of the preset storage node;
and determining the maximum occupied bandwidth in the estimated occupied bandwidths of all the nodes.
7. The evaluation device of claim 6, wherein the data calculation module is further configured to:
and adjusting the original data volume of each storage node according to the equipment performance of the storage node.
8. The evaluation device of claim 7, wherein the device performance of each storage node in the distributed storage cluster is the same.
CN201711385142.4A 2017-12-20 2017-12-20 Method and equipment for evaluating performance of storage cluster Active CN109951506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711385142.4A CN109951506B (en) 2017-12-20 2017-12-20 Method and equipment for evaluating performance of storage cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711385142.4A CN109951506B (en) 2017-12-20 2017-12-20 Method and equipment for evaluating performance of storage cluster

Publications (2)

Publication Number Publication Date
CN109951506A CN109951506A (en) 2019-06-28
CN109951506B true CN109951506B (en) 2021-11-30

Family

ID=67005169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711385142.4A Active CN109951506B (en) 2017-12-20 2017-12-20 Method and equipment for evaluating performance of storage cluster

Country Status (1)

Country Link
CN (1) CN109951506B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294255B (en) * 2020-01-22 2022-12-27 上海极熵数据科技有限公司 Gateway testing method and storage medium
CN112667166A (en) * 2020-12-30 2021-04-16 浪潮云信息技术股份公司 Cloud hard disk dynamic QoS setting method and tool based on cloud platform

Citations (2)

Publication number Priority date Publication date Assignee Title
CN107276849A (en) * 2017-06-15 2017-10-20 北京奇艺世纪科技有限公司 The method for analyzing performance and device of a kind of cluster
CN107450854A (en) * 2017-08-07 2017-12-08 郑州云海信息技术有限公司 The determination method and system of maximum thread under a kind of expected rate

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20140341568A1 (en) * 2013-05-20 2014-11-20 Sodero Networks, Inc. High-Throughput Network Traffic Monitoring through Optical Circuit Switching and Broadcast-and-Select Communications
US9444764B2 (en) * 2015-01-20 2016-09-13 State Farm Mutual Automobile Insurance Company Scalable and secure interconnectivity in server cluster environments
US20160349993A1 (en) * 2015-05-29 2016-12-01 Cisco Technology, Inc. Data-driven ceph performance optimizations
CN107145414B (en) * 2017-04-27 2021-03-09 苏州浪潮智能科技有限公司 Method and system for testing distributed object storage
CN107181626B (en) * 2017-07-18 2020-05-26 苏州浪潮智能科技有限公司 Method and system for monitoring network bandwidth of distributed storage cluster system

Also Published As

Publication number Publication date
CN109951506A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
EP3419247B1 (en) Method and device for storage resource allocation for video cloud storage
US20200364608A1 (en) Communicating in a federated learning environment
WO2018176385A1 (en) System and method for network slicing for service-oriented networks
US20150160972A1 (en) Virtual machine migration management method, apparatus and system
CN112015583A (en) Data storage method, device and system
CN108512890B (en) Container cloud platform resource scheduling method and system based on rack sensing
CN112153700A (en) Network slice resource management method and equipment
CN108279974B (en) Cloud resource allocation method and device
CN108512672B (en) Service arranging method, service management method and device
CN107967164B (en) Method and system for live migration of virtual machine
JP2018525743A (en) Load balancing method and apparatus
CN109951506B (en) Method and equipment for evaluating performance of storage cluster
CN110297743B (en) Load testing method and device and storage medium
US9880883B2 (en) Virtual resource control system determining new allocation of resources at a hub
CN113923216A (en) Distributed cluster current limiting system and method and distributed cluster nodes
CN107357649B (en) Method and device for determining system resource deployment strategy and electronic equipment
CN111400241B (en) Data reconstruction method and device
US10659304B2 (en) Method of allocating processes on node devices, apparatus, and storage medium
CN110178119B (en) Method, device and storage system for processing service request
CN105335376A (en) Stream processing method, device and system
CN112911708A (en) Resource allocation method, server and storage medium
CN111046004A (en) Data file storage method, device, equipment and storage medium
CN114827079B (en) Capacity expansion method, device and storage medium of network address translation gateway
WO2020076394A1 (en) Resource allocation using restore credits
CN116150067A (en) Bandwidth adjustment method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant