CN115344466A - Cluster robustness testing method and device and electronic equipment - Google Patents


Info

Publication number
CN115344466A
Authority
CN
China
Prior art keywords
cluster
tested
node
read
robustness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110524276.XA
Other languages
Chinese (zh)
Inventor
张洪铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110524276.XA priority Critical patent/CN115344466A/en
Publication of CN115344466A publication Critical patent/CN115344466A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3457 Performance evaluation by simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention provides a cluster robustness testing method and device and electronic equipment. The method comprises the following steps: performing a first read-write operation on a cluster to be tested to obtain a first completion duration taken by the cluster to be tested to complete the first read-write operation, wherein each node in the cluster to be tested is initially in a normal state; controlling at least one node in the cluster to be tested to enter a preset simulation state, wherein a node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested; performing a second read-write operation on the cluster to be tested to obtain a second completion duration taken by the cluster to be tested to complete the second read-write operation; and determining the robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration, wherein the robustness is negatively correlated with the first difference degree. The robustness of the cluster can thereby be determined accurately.

Description

Cluster robustness testing method and device and electronic equipment
Technical Field
The invention relates to the technical field of cloud storage, in particular to a cluster robustness testing method and device and electronic equipment.
Background
A distributed cluster (hereinafter referred to as a cluster), such as an ETCD cluster, comprises a plurality of nodes among which data synchronization can be maintained, which gives the cluster a certain robustness. For example, in a cluster of three nodes, when one node fails, the other two nodes can take over from the failed node and continue to complete the service logic.
However, for various reasons the actual robustness of a cluster may fall short of the expected robustness, so that the cluster may fail during actual operation. How to accurately test the robustness of a cluster, so that a cluster with insufficient robustness is found in time and failures during actual operation are avoided, has therefore become a technical problem to be solved urgently.
Disclosure of Invention
The embodiment of the invention aims to provide a cluster robustness testing method, a cluster robustness testing device and electronic equipment, so as to accurately test the robustness of a cluster. The specific technical scheme is as follows:
in a first aspect of the embodiments of the present invention, a method for testing cluster robustness is provided, where the method includes:
performing a first read-write operation on a cluster to be tested to obtain a first completion duration of the first read-write operation completed by the cluster to be tested, wherein each node in the cluster to be tested is in a normal state initially;
controlling at least one node in the cluster to be tested to enter a preset simulation state, wherein the node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested;
performing a second read-write operation on the cluster to be tested to obtain a second completion duration of the second read-write operation by the cluster to be tested;
and determining the robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration, wherein the robustness is negatively correlated with the first difference degree.
In a possible embodiment, before said controlling at least one node in the cluster to be tested to enter a preset simulation state, the method further comprises:
acquiring operating environment information, wherein the operating environment information is used for representing performance parameters of the cluster to be tested when the cluster to be tested runs in a specified environment;
the controlling at least one node in the cluster to be tested to enter a preset simulation state comprises the following steps:
and configuring the performance parameter of at least one node in the cluster to be tested as the performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
In one possible embodiment, the performance parameters include one or more of the following parameters:
network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate, and memory usage rate.
In a possible embodiment, the performance parameter is a parameter that varies over time.
In one possible embodiment, the method further comprises:
controlling at least one node in the cluster to be tested to enter a preset closing state, wherein the node in the preset closing state does not have the capability of sending heartbeat messages to other nodes in the cluster to be tested;
performing a third read-write operation on the cluster to be tested to obtain a third completion duration of the third read-write operation by the cluster to be tested;
determining the robustness of the cluster to be tested according to the first difference degree between the first completion duration and the second completion duration, including:
and determining the robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration and a second difference degree between the first completion duration and the third completion duration, wherein the robustness is negatively correlated with the first difference degree and negatively correlated with the second difference degree.
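As a rough illustration of how two difference degrees might be combined, the following Python sketch assumes that the second difference degree compares the first completion duration with the third completion duration, that a difference degree is a relative difference, and that the two are merged by a weighted sum before being mapped to a value in (0, 1]. None of these choices is specified in the text; the function names and the weight are hypothetical.

```python
def relative_difference(baseline: float, measured: float) -> float:
    """Assumed definition of a difference degree: relative deviation from the baseline."""
    if baseline <= 0:
        raise ValueError("baseline duration must be positive")
    return abs(measured - baseline) / baseline


def combined_robustness(
    first_duration: float,
    second_duration: float,
    third_duration: float,
    weight_simulated: float = 0.5,
) -> float:
    """Map both difference degrees to one robustness value in (0, 1]."""
    # First difference degree: normal state vs. preset simulation state.
    first_degree = relative_difference(first_duration, second_duration)
    # Second difference degree: normal state vs. preset closing state (assumed).
    second_degree = relative_difference(first_duration, third_duration)
    # Hypothetical weighted sum; larger penalties mean lower robustness.
    penalty = weight_simulated * first_degree + (1.0 - weight_simulated) * second_degree
    # 1 / (1 + x) is decreasing in x, so robustness falls as either degree grows.
    return 1.0 / (1.0 + penalty)
```

Because the mapping is decreasing in both difference degrees, the resulting value is negatively correlated with each of them, which is the only property the text requires.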
In a second aspect of the embodiments of the present invention, there is provided a cluster robustness testing apparatus, including:
a first testing module, configured to perform a first read-write operation on a cluster to be tested to obtain a first completion duration taken by the cluster to be tested to complete the first read-write operation, wherein each node in the cluster to be tested is initially in a normal state;
the simulation module is used for controlling at least one node in the cluster to be tested to enter a preset simulation state, wherein the node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested;
the second testing module is used for performing second read-write operation on the cluster to be tested to obtain second completion duration of the second read-write operation of the cluster to be tested;
and the determining module is used for determining the robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration, wherein the robustness is negatively correlated with the first difference degree.
In a possible embodiment, the apparatus further includes an information obtaining module, configured to obtain operating environment information, where the operating environment information is used to indicate a performance parameter of the cluster to be tested when the cluster to be tested operates in a specified environment;
the simulation module is specifically configured to configure a performance parameter of at least one node in the cluster to be tested as a performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
In one possible embodiment, the performance parameters include one or more of the following parameters:
network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate, and memory usage rate.
In a possible embodiment, the performance parameter is a parameter that varies over time.
In a possible embodiment, the simulation module is further configured to control at least one node in the cluster to be tested to enter a preset closing state, where a node in the preset closing state does not have the capability of sending heartbeat messages to other nodes in the cluster to be tested;
the device further comprises:
the third testing module is used for carrying out third read-write operation on the cluster to be tested to obtain third completion time length for the cluster to be tested to complete the third read-write operation;
the determining module is specifically configured to determine the robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration and a second difference degree between the first completion duration and the third completion duration, where the robustness is negatively correlated with the first difference degree and negatively correlated with the second difference degree.
In a third aspect of the embodiments of the present invention, an electronic device is provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor adapted to perform the method steps of any of the above first aspects when executing a program stored in the memory.
In a fourth aspect of embodiments of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the method steps of any one of the above-mentioned first aspects.
The embodiment of the invention has the following beneficial effects:
according to the cluster robustness testing method, the cluster robustness testing device and the electronic equipment provided by the embodiments of the invention, the degree to which the performance of the cluster to be tested is affected by the simulation state can be judged by comparing the time the cluster takes to complete a read-write operation in the ideal state with the time it takes in the simulation state. Theoretically, the higher the robustness of the cluster to be tested, the less it is affected by the simulation state, and the lower the robustness, the more it is affected. The robustness of the cluster to be tested can therefore be accurately determined from the degree to which its performance is affected by the simulation state.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a cluster robustness testing method according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a cluster robustness testing method according to an embodiment of the present invention;
Fig. 3 is another schematic flowchart of a cluster robustness testing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a cluster robustness testing apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
In order to more clearly describe the cluster robustness testing method provided by the embodiment of the present invention, a possible application scenario of the cluster robustness testing method provided by the embodiment of the present invention will be exemplarily described below. It should be understood that the following example is only one possible application scenario of the cluster robustness testing method provided in the embodiment of the present invention, and in other possible embodiments, the cluster robustness testing method provided in the embodiment of the present invention may also be applied to other possible application scenarios, which is not limited in this embodiment.
An ETCD cluster can be used to provide a distributed, highly available, consistent key-value storage database service, and can also be used for message subscription/publication. An ETCD cluster contains a plurality of nodes, and theoretically the cluster can work normally as long as more than half of its nodes work normally. For example, if the ETCD cluster includes three nodes, the cluster can theoretically work normally when two of the nodes work normally; that is, an ETCD cluster of three nodes can tolerate the failure of one node. The ETCD cluster therefore has a certain fault tolerance, that is, a certain robustness.
However, for various reasons the actual robustness of the ETCD cluster may fall short of the expected robustness, so that during actual operation the ETCD cluster may be unable to withstand a fault that it should theoretically withstand, and therefore cannot work normally.
To prevent the ETCD cluster from failing to work normally, a robustness test can be performed on the ETCD cluster in advance. In a possible embodiment, fewer than half of the nodes in a running ETCD cluster may be shut down. Taking an ETCD cluster of three nodes as an example, one node may be shut down and the cluster observed to see whether it can continue to run normally. If it can, the robustness of the ETCD cluster may be considered high enough; if it cannot, the robustness may be considered low.
However, the reason a node fails during actual operation of the ETCD cluster is not necessarily that the node goes down, so the robustness testing method of this embodiment cannot accurately judge the true robustness of the ETCD cluster; that is, its accuracy is low.
Based on this, an embodiment of the present invention provides a cluster robustness testing method, which may be as shown in fig. 1, where fig. 1 is a schematic flow diagram of the cluster robustness testing method provided in the embodiment of the present invention, and the method may include:
S101, performing a first read-write operation on the cluster to be tested to obtain a first completion duration of the first read-write operation of the cluster to be tested.
S102, controlling at least one node in the cluster to be tested to enter a preset simulation state.
S103, performing a second read-write operation on the cluster to be tested to obtain a second completion duration of the second read-write operation of the cluster to be tested.
S104, determining the robustness of the cluster to be tested according to the first difference degree between the first completion duration and the second completion duration.
With this embodiment, the degree to which the performance of the cluster to be tested is affected by the simulation state can be judged by comparing the time the cluster takes to complete a read-write operation in the ideal state with the time it takes in the simulation state. Theoretically, the higher the robustness of the cluster to be tested, the less it is affected by the simulation state, and the lower the robustness, the more it is affected. The robustness of the cluster to be tested can therefore be accurately determined from the degree to which its performance is affected by the simulation state.
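The flow of S101 to S104 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the read-write operations and the hook that puts nodes into the preset simulation state are hypothetical callables supplied by the caller, the completion duration is taken as the sum over all operations, and the difference degree is taken as a relative difference. The text does not prescribe any of these choices.

```python
import time
from typing import Callable, Sequence

Clock = Callable[[], float]


def timed(operation: Callable[[], None], clock: Clock) -> float:
    """Return the time one read-write operation takes, as measured by clock."""
    start = clock()
    operation()
    return clock() - start


def first_difference_degree(
    operations: Sequence[Callable[[], None]],
    enter_simulation_state: Callable[[], None],
    clock: Clock = time.perf_counter,
) -> float:
    """Run S101 to S103 and return the difference degree used in S104."""
    # S101: baseline run while every node is in the normal state.
    first_duration = sum(timed(op, clock) for op in operations)
    if first_duration <= 0:
        raise ValueError("baseline duration must be positive")
    # S102: hypothetical hook that degrades at least one node.
    enter_simulation_state()
    # S103: repeat the same workload against the degraded cluster.
    second_duration = sum(timed(op, clock) for op in operations)
    # Input to S104: relative difference (an assumed definition).
    return abs(second_duration - first_duration) / first_duration
```

The clock is injectable so that the procedure can be exercised with a fake clock; against a real cluster the default `time.perf_counter` would be used.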
In S101, each node in the cluster to be tested is initially in a normal state, and the normal state of the node may refer to a state in which the node can work normally theoretically, or may refer to a state in which the performance of the node is the highest theoretically. The performance parameters of the nodes can be adjusted in advance by related personnel according to actual requirements and/or experience, so that the nodes are in a normal state in advance.
The first read-write operation may be one read-write operation or a plurality of read-write operations, and the completion duration of the read-write operation may be a duration of a time period from when an instruction indicating the read-write operation is input to the cluster to be tested until the cluster to be tested returns information indicating completion of the read-write operation.
When the first read-write operation is a single read-write operation, the first completion duration is the completion duration of that read-write operation. When the first read-write operation is a plurality of read-write operations, the first completion duration may be the set of completion durations of the plurality of read-write operations, or a statistical value of the set, where the statistical value may be one or more of the maximum value, the minimum value, the average value, the median, the sum, and the like.
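Reducing a set of completion durations to a single statistical value can be sketched as follows. The function name and the string keys are hypothetical; the available statistics mirror those listed above.

```python
import statistics
from typing import Callable, Dict, Sequence

# Statistics named in the text; the keys are illustrative labels.
STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
}


def completion_duration(durations: Sequence[float], statistic: str = "mean") -> float:
    """Reduce per-operation completion durations to one statistical value."""
    if not durations:
        raise ValueError("at least one completion duration is required")
    if statistic not in STATISTICS:
        raise ValueError(f"unknown statistic: {statistic}")
    return STATISTICS[statistic](durations)
```

Using the same `statistic` argument for both measurement runs keeps the two completion durations comparable.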
In S102, a node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested, as does a node in the normal state. A node in the cluster may determine whether another node is down according to whether it receives heartbeat messages sent by that node. For example, if a node receives at least one heartbeat message from another node within a time window, it may determine that the other node is not down; if it receives no heartbeat message from the other node within the time window, it may determine that the other node is down.
The fact that one node has the capability of sending heartbeat messages to other nodes in the cluster to be tested means that at least one heartbeat message in all heartbeat messages sent by the node can be received by the other nodes in the cluster.
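The time-window rule described above can be sketched as a small predicate. This is a simplified illustration of the rule as stated in the text, not the failure detection actually implemented by any particular cluster software; the function name and the half-open window convention are assumptions.

```python
from typing import Iterable


def is_node_down(
    heartbeat_times: Iterable[float],
    window_start: float,
    window_length: float,
) -> bool:
    """Return True when no heartbeat from the peer arrived inside the window."""
    window_end = window_start + window_length
    # One heartbeat inside [window_start, window_end) is enough to treat
    # the peer as alive, matching the "at least one heartbeat" rule.
    return not any(window_start <= t < window_end for t in heartbeat_times)
```

Under this rule a node that loses most of its heartbeats, but not all of them, is still considered alive, which is why a node in the preset simulation state is not treated as down by its peers.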
The preset simulation state may be set according to actual needs and/or user experience, and for example, the preset simulation state may refer to any one of the following states:
1. a state in which the network packet loss rate is higher than a preset packet loss rate threshold;
2. a state in which the network delay is higher than a preset delay threshold;
3. a state in which the system disk load rate is higher than a preset first load rate threshold;
4. a state in which the data disk load rate is higher than a preset second load rate threshold;
5. a state in which the CPU load rate is higher than a preset third load rate threshold;
6. a state in which the memory usage rate is higher than a preset usage rate threshold.
In other possible embodiments, the preset simulation state may also be formed by a combination of any multiple of the above six states, and for example, the preset simulation state may be a state in which the network packet loss rate is higher than a preset packet loss rate threshold and the CPU load rate is higher than a third load rate threshold. And the preset simulation state can also be formed by combining any one or more states in the six states with other states except the six states.
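A preset simulation state built from one or more of the six threshold conditions can be modelled as a set of optional thresholds. The class and field names below are hypothetical, and the units (fractions for rates, milliseconds for delay) are assumptions made for the sketch.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class NodeMetrics:
    """Observed performance parameters of one node (illustrative units)."""
    packet_loss_rate: float
    network_delay_ms: float
    system_disk_load: float
    data_disk_load: float
    cpu_load: float
    memory_usage: float


@dataclass(frozen=True)
class SimulationState:
    """Thresholds defining a preset simulation state; None means 'not part of it'."""
    packet_loss_rate: Optional[float] = None
    network_delay_ms: Optional[float] = None
    system_disk_load: Optional[float] = None
    data_disk_load: Optional[float] = None
    cpu_load: Optional[float] = None
    memory_usage: Optional[float] = None

    def matches(self, metrics: NodeMetrics) -> bool:
        """True when every configured threshold is exceeded by the node."""
        checks = [
            getattr(metrics, f.name) > getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        ]
        # A state with no thresholds configured matches nothing.
        return bool(checks) and all(checks)
```

For instance, `SimulationState(packet_loss_rate=0.05, cpu_load=0.8)` corresponds to the combined example above of high packet loss together with high CPU load.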
In S103, the second read/write operation may be the same as the first read/write operation or different from the first read/write operation, and when the second read/write operation is different from the first read/write operation, the second read/write operation should theoretically be a read/write operation with an operation amount similar to that of the first read/write operation.
And the second read/write operation may be one read/write operation or a plurality of read/write operations. And the number of the second read-write operations may be the same as or different from the number of the first read-write operations.
When the second read-write operation is a single read-write operation, the second completion duration is the completion duration of that read-write operation. When the second read-write operation is a plurality of read-write operations, the second completion duration may be the set of completion durations of the plurality of read-write operations, or a statistical value of the set, where the statistical value may be one or more of the maximum value, the minimum value, the average value, the median, the sum, and the like. It is to be understood that if the first completion duration and the second completion duration are both statistical values, they should theoretically be the same type of statistical value.
Illustratively, assume that the first read-write operation includes read-write operations 1 to 3 and the second read-write operation includes read-write operations 4 to 6. If the first completion duration is the maximum of the completion durations of read-write operations 1 to 3, the second completion duration should theoretically be the maximum of the completion durations of read-write operations 4 to 6.
Since at least one node in the cluster is in the preset simulation state when the second read/write operation is performed, theoretically, the performance of the at least one node may be affected by the preset simulation state, and thus the efficiency of the cluster is reduced, and therefore the second completion time period may be longer than the first completion time period.
In S104, the robustness is negatively correlated with the first difference degree; that is, with other parameters held unchanged, the smaller the first difference degree, the higher the robustness, and the larger the first difference degree, the lower the robustness. As the foregoing analysis shows, if the robustness of the cluster is higher, the performance of the cluster as a whole is less affected by nodes being in the preset simulation state, so the second completion duration is theoretically closer to the first completion duration, i.e., the first difference degree is smaller. If the robustness of the cluster is lower, the performance of the cluster as a whole is more affected, so the second completion duration theoretically deviates further from the first completion duration, i.e., the first difference degree is larger. The first difference degree is therefore negatively correlated with the robustness of the cluster.
The robustness can be expressed in different ways depending on the application scenario; for example, it can be expressed as a robustness level or as a robustness value. For example, if the first difference degree is smaller than a preset degree threshold, the robustness level of the cluster is determined to be qualified, and if the first difference degree is not smaller than the preset degree threshold, the robustness level of the cluster is determined to be unqualified. As another example, a decreasing function is used to map the first difference degree into the interval [0,1], and the resulting mapping value is used as the robustness value of the cluster.
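Both representations can be sketched in a few lines. The text does not name a particular decreasing function; 1 / (1 + d) is used here only as one simple choice that maps a non-negative difference degree d into (0, 1], and the function names are hypothetical.

```python
def robustness_level(difference_degree: float, degree_threshold: float) -> str:
    """Two-level grading: qualified only when strictly below the threshold."""
    return "qualified" if difference_degree < degree_threshold else "unqualified"


def robustness_value(difference_degree: float) -> float:
    """Map a non-negative difference degree into (0, 1] with a decreasing function."""
    if difference_degree < 0:
        raise ValueError("difference degree must be non-negative")
    # Assumed mapping: 1 / (1 + d) equals 1 at d = 0 and decreases toward 0.
    return 1.0 / (1.0 + difference_degree)
```

With this mapping, a cluster whose completion duration is unchanged under the simulation state scores 1.0, and one whose duration doubles (relative difference 1.0) scores 0.5.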
It is understood that the robustness level can also be divided into three or more different levels according to actual requirements. For example, in one possible embodiment, the robustness level can be divided into three levels: good, normal and bad. In another possible embodiment, the robustness level can be divided into five levels: excellent, good, normal, poor and bad.
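Grading into more than two levels amounts to looking up the difference degree in a sorted list of boundaries. The boundary values and labels below are purely illustrative and would be chosen by the tester.

```python
from bisect import bisect_right
from typing import Sequence


def graded_level(
    difference_degree: float,
    boundaries: Sequence[float] = (0.1, 0.5),
    labels: Sequence[str] = ("good", "normal", "bad"),
) -> str:
    """Return the label of the interval that contains the difference degree."""
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    # boundaries must be sorted ascending; a value equal to a boundary
    # falls into the higher (worse) interval.
    return labels[bisect_right(boundaries, difference_degree)]
```

A five-level scheme only needs four boundaries and five labels passed to the same function.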
Referring to fig. 2, fig. 2 is a schematic flow chart of another method for testing cluster robustness according to an embodiment of the present invention, where the method may include:
S201, performing a first read-write operation on the cluster to be tested to obtain a first completion duration of the first read-write operation of the cluster to be tested.
The step is the same as the step S101, and reference may be made to the related description about S101, which is not described herein again.
S202, acquiring the operating environment information.
The operating environment information is used for representing performance parameters of the cluster to be tested when the cluster to be tested runs in a specified environment. The specified environment may be a theoretical working environment of the cluster to be tested, or an environment constructed by a user according to actual experience and/or requirements. The performance parameters of the cluster to be tested when running in the specified environment may be estimated by the user according to experience, or determined by a related test or algorithm. For example, taking the case where the specified environment is the theoretical working environment of the cluster to be tested, a user may know from actual experience that the network quality of this working environment is poor and that packet loss is likely to occur there, so the performance parameter in the operating environment information that represents the probability of packet loss during data interaction (i.e., the network packet loss rate) may be set to a higher value.
It is understood that the performance parameters represented by the operating environment information may vary according to actual needs, and for example, in one possible embodiment, the performance parameters represented by the operating environment information may include one or more of the following parameters: network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate, and memory usage rate.
In other possible embodiments, other parameters that can affect the performance of the node may be included in the performance parameters.
It is understood that a node may be in a fluctuating state rather than a steady state during actual operation. For example, when the load borne by the cluster is large, the CPU load rate of the node is high, and when the load borne by the cluster is small, the CPU load rate of the node is low. Therefore, in a possible embodiment, in order for the preset simulation state to better simulate the state in which the node actually operates, the performance parameter represented by the operating environment information may be a parameter that changes with time.
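One simple way to represent a performance parameter that changes with time is a step schedule of (start time, value) pairs. The representation and the function name are assumptions made for this sketch; the text does not specify how the time variation is encoded.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def parameter_at(schedule: Sequence[Tuple[float, float]], elapsed: float) -> float:
    """Return the parameter value in force `elapsed` seconds into the test.

    `schedule` is a list of (start_time_seconds, value) steps sorted by
    start time; each value applies until the next step begins.
    """
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, elapsed) - 1
    if index < 0:
        raise ValueError("elapsed time precedes the first step of the schedule")
    return schedule[index][1]
```

For example, a CPU load schedule of `[(0.0, 0.3), (60.0, 0.9), (120.0, 0.4)]` describes a node that is lightly loaded for the first minute, heavily loaded for the second, and moderately loaded afterwards.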
S203, configuring the performance parameter of at least one node in the cluster to be tested as the performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
Because the operating environment information represents the performance parameters of the cluster to be tested when it runs in the specified environment, configuring the performance parameters of at least one node in the cluster to be tested as the performance parameters represented by the operating environment information brings the state of the at least one node close to its state when the cluster to be tested runs in the specified environment.
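The text does not say which tool is used to configure a node's performance parameters. As one possibility on a Linux node, network packet loss and delay could be imposed with the `tc` traffic-control utility and its `netem` queueing discipline. The sketch below only builds the command line and does not execute it; the function name and the interface name used in the example are hypothetical.

```python
from typing import List


def netem_command(interface: str, loss_percent: float, delay_ms: float) -> List[str]:
    """Build (but do not run) a tc/netem command that adds delay and packet loss."""
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{delay_ms:g}ms",
        "loss", f"{loss_percent:g}%",
    ]
```

Running the resulting command requires root privileges on the node, and the rule stays in place until the qdisc is deleted, so a test harness would also need a matching cleanup step. Because the emulated link still passes some traffic, heartbeat messages can still reach the other nodes, which is consistent with the definition of the preset simulation state.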
S204, performing second read-write operation on the cluster to be tested to obtain second completion duration of the second read-write operation of the cluster to be tested.
This step is the same as S103, and reference may be made to the foregoing description related to S103, which is not described herein again.
S205, determining the robustness of the cluster to be tested according to the first difference degree between the first completion duration and the second completion duration.
The step is the same as S104, and reference may be made to the foregoing description about S104, which is not repeated herein.
With this embodiment, on the one hand, the performance of the cluster in a specified environment can be better simulated through the operating environment information, so that the robustness of the cluster can be tested in a more targeted manner. On the other hand, users can flexibly configure the operating environment information according to their actual requirements and test the robustness of the cluster in different environments, which gives the method broader applicability.
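The flow of this embodiment can be sketched as follows, under stated assumptions. The text only requires that robustness be negatively correlated with the first difference degree. The relative-difference formula in `difference_degree` and the mapping `1 / (1 + d)` in `robustness` are illustrative choices, and all function names are hypothetical.

```python
import time
from typing import Callable


def timed(operation: Callable[[], None]) -> float:
    """Run a read-write operation and return its completion duration in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def difference_degree(baseline_s: float, degraded_s: float) -> float:
    """Assumed metric: relative change of the completion duration."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return abs(degraded_s - baseline_s) / baseline_s


def robustness(diff: float) -> float:
    """Assumed mapping: decreases monotonically as the difference degree grows."""
    return 1.0 / (1.0 + diff)


def run_test(read_write: Callable[[], None],
             enter_simulation_state: Callable[[], None]) -> float:
    first = timed(read_write)       # all nodes in a normal state
    enter_simulation_state()        # configure at least one node
    second = timed(read_write)      # same workload, degraded node(s)
    return robustness(difference_degree(first, second))
```

With these choices, a cluster whose completion duration is unchanged scores 1.0, and the score approaches 0 as the degraded run takes longer relative to the baseline.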
Referring to fig. 3, fig. 3 is a schematic diagram illustrating another flow of cluster robustness testing according to an embodiment of the present invention, where the schematic diagram may include:
S301, performing a first read-write operation on the cluster to be tested to obtain a first completion duration of the first read-write operation of the cluster to be tested.
The step is the same as the step S101, and reference may be made to the related description about S101, which is not described herein again.
S302, controlling at least one node in the cluster to be tested to enter a preset simulation state.
The step is the same as the step S102, and reference may be made to the related description about S102, which is not repeated herein.
S303, performing a second read-write operation on the cluster to be tested to obtain a second completion duration of the second read-write operation of the cluster to be tested.
This step is the same as S103, and reference may be made to the related description about S103, which is not described herein again.
S304, controlling at least one node in the cluster to be tested to enter a preset closing state.
A node in the preset closing state does not have the capability of sending heartbeat messages to other nodes in the cluster to be tested. For example, the network of the at least one node may be cut off, or the at least one node may be shut down.
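The distinction between the two preset states can be illustrated with a small model: a node in the preset simulation state is degraded but can still send heartbeat messages, whereas a node in the preset closing state cannot. The enum `NodeState` and the helper names below are hypothetical and serve only as a sketch.

```python
from enum import Enum
from typing import Dict, List


class NodeState(Enum):
    NORMAL = "normal"
    SIMULATION = "preset simulation"  # degraded, but still sends heartbeats
    OFF = "preset closing"            # network cut off or node shut down


def can_send_heartbeat(state: NodeState) -> bool:
    """Only a node in the preset closing state loses the heartbeat capability."""
    return state is not NodeState.OFF


def nodes_seen_alive(states: Dict[str, NodeState]) -> List[str]:
    """Names of the nodes whose heartbeats the rest of the cluster can receive."""
    return sorted(name for name, state in states.items() if can_send_heartbeat(state))


cluster = {"n1": NodeState.NORMAL, "n2": NodeState.SIMULATION, "n3": NodeState.OFF}
alive = nodes_seen_alive(cluster)
```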
S305, performing a third read-write operation on the cluster to be tested to obtain a third completion duration of the third read-write operation of the cluster to be tested.
When S305 is executed, all nodes in the cluster other than the at least one node that has entered the preset off state are in a normal state. Likewise, when the aforementioned S303 is executed, all nodes in the cluster other than the at least one node that has entered the preset simulation state are in a normal state.
It should be understood that fig. 3 is only a schematic flow chart of a possible cluster robustness testing method provided by the embodiment of the present invention, and in other possible embodiments, S304 and S305 may be executed before S302 and S303, or may be executed alternately with S302 and S303.
S306, determining the robustness of the cluster to be tested according to the first difference degree between the first completion time length and the second completion time length and the second difference degree between the first completion time length and the third completion time length.
Wherein the robustness is negatively correlated with the first difference degree and negatively correlated with the second difference degree.
Because a node in the preset off state does not have the capability of sending heartbeat messages to other nodes in the cluster, the node is effectively down. Therefore, this embodiment takes the influence of node downtime on cluster performance into account when determining the robustness of the cluster, so that the robustness of the cluster can be determined more comprehensively and accurately.
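One possible way to combine the two difference degrees in S306 is sketched below. The text only requires robustness to be negatively correlated with each difference degree. The weighted sum, the default weights, and the function names are assumptions made for illustration.

```python
def relative_difference(baseline_s: float, other_s: float) -> float:
    """Assumed difference degree: relative change against the first completion duration."""
    if baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return abs(other_s - baseline_s) / baseline_s


def combined_robustness(first_s: float, second_s: float, third_s: float,
                        w_sim: float = 0.5, w_off: float = 0.5) -> float:
    """Score that decreases when either difference degree increases."""
    d_sim = relative_difference(first_s, second_s)  # simulation state vs. normal
    d_off = relative_difference(first_s, third_s)   # off state vs. normal
    return 1.0 / (1.0 + w_sim * d_sim + w_off * d_off)


# Example: 1.0 s baseline, 2.0 s with a degraded node, 3.0 s with a node down.
score = combined_robustness(1.0, 2.0, 3.0)
```

The weights let a user decide how strongly node downtime counts relative to node degradation; equal weights are used here only as a neutral default.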
Referring to fig. 4, fig. 4 is a schematic structural diagram of a cluster robustness testing apparatus provided in the embodiment of the present invention, which may include:
the first testing module 401 is configured to perform a first read-write operation on a cluster to be tested, so as to obtain a first completion time length for the cluster to be tested to complete the first read-write operation, where each node in the cluster to be tested is initially in a normal state;
a simulation module 402, configured to control at least one node in the to-be-tested cluster to enter a preset simulation state, where the node in the preset simulation state has a capability of sending a heartbeat message to other nodes in the to-be-tested cluster;
a second testing module 403, configured to perform a second read/write operation on the to-be-tested cluster to obtain a second completion duration for the to-be-tested cluster to complete the second read/write operation;
a determining module 404, configured to determine robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration, where the robustness is negatively correlated to the first difference degree.
In a possible embodiment, the apparatus further includes an information obtaining module, configured to obtain operating environment information, where the operating environment information is used to represent a performance parameter of the cluster to be tested when the cluster to be tested operates in a specified environment;
the simulation module 402 is specifically configured to configure a performance parameter of at least one node in the cluster to be tested as a performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
In one possible embodiment, the performance parameters include one or more of the following parameters:
network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate, and memory usage rate.
In a possible embodiment, the performance parameter is a parameter that varies over time.
In a possible embodiment, the simulation module 402 is further configured to control at least one node in the to-be-tested cluster to enter a preset off state, where the node in the preset off state does not have a capability of sending a heartbeat packet to other nodes in the to-be-tested cluster;
the device further comprises:
the third testing module is used for carrying out third read-write operation on the cluster to be tested to obtain third completion time length for the cluster to be tested to complete the third read-write operation;
the determining module 404 is specifically configured to determine robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration and a second difference degree between the first completion duration and the third completion duration, where the robustness is negatively related to the first difference degree and negatively related to the second difference degree.
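The basic module structure of the apparatus of fig. 4 can be sketched as a composition of four callables, one per module 401 to 404. This is a minimal illustration under stated assumptions: the class name `RobustnessTestApparatus`, the field names, and the use of plain callables are hypothetical, and the example durations are made-up values rather than measurements.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RobustnessTestApparatus:
    first_test: Callable[[], float]             # module 401: first completion duration
    simulate: Callable[[], None]                # module 402: enter preset simulation state
    second_test: Callable[[], float]            # module 403: second completion duration
    determine: Callable[[float, float], float]  # module 404: robustness from durations

    def run(self) -> float:
        first = self.first_test()
        self.simulate()
        second = self.second_test()
        return self.determine(first, second)


events: List[str] = []
apparatus = RobustnessTestApparatus(
    first_test=lambda: 2.0,                     # made-up duration, seconds
    simulate=lambda: events.append("simulation state entered"),
    second_test=lambda: 3.0,                    # made-up duration, seconds
    determine=lambda a, b: 1.0 / (1.0 + abs(b - a) / a),  # assumed formula
)
result = apparatus.run()
```

Keeping the modules as injected callables makes it straightforward to add an information obtaining module or a third testing module without changing the existing ones.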
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 complete mutual communication through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501, when executing the program stored in the memory 503, implements the following steps:
performing a first read-write operation on a cluster to be tested to obtain a first completion duration of the first read-write operation completed by the cluster to be tested, wherein each node in the cluster to be tested is in a normal state initially;
controlling at least one node in the cluster to be tested to enter a preset simulation state, wherein the node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested;
performing a second read-write operation on the cluster to be tested to obtain a second completion duration of the second read-write operation by the cluster to be tested;
and determining the robustness of the cluster to be tested according to a first difference degree between the first completion time length and the second completion time length, wherein the robustness is inversely related to the first difference degree.
In a possible embodiment, before said controlling at least one node in the cluster to be tested to enter a preset simulation state, the method further comprises:
acquiring running environment information, wherein the running environment information is used for representing performance parameters of the cluster to be tested when the cluster to be tested runs in a specified environment;
the controlling at least one node in the cluster to be tested to enter a preset simulation state comprises the following steps:
and configuring the performance parameter of at least one node in the cluster to be tested as the performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
In one possible embodiment, the performance parameters include one or more of the following parameters:
network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate, and memory usage rate.
In a possible embodiment, the performance parameter is a parameter that varies over time.
In a possible embodiment, the method further comprises:
controlling at least one node in the cluster to be tested to enter a preset closing state, wherein the node in the preset closing state does not have the capability of sending heartbeat messages to other nodes in the cluster to be tested;
performing a third read-write operation on the cluster to be tested to obtain a third completion duration of the third read-write operation by the cluster to be tested;
determining the robustness of the cluster to be tested according to the first difference degree between the first completion duration and the second completion duration, including:
and determining the robustness of the cluster to be tested according to a first difference degree between the first completion time length and the second completion time length and a second difference degree between the first completion time length and the third completion time length, wherein the robustness is negatively correlated with the first difference degree and the second difference degree.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a further embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above cluster robustness testing methods.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the cluster robustness testing methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, the computer-readable storage medium, and the computer program product, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A cluster robustness testing method, the method comprising:
performing a first read-write operation on a cluster to be tested to obtain a first completion duration of the first read-write operation completed by the cluster to be tested, wherein each node in the cluster to be tested is in a normal state initially;
controlling at least one node in the cluster to be tested to enter a preset simulation state, wherein the node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested;
performing a second read-write operation on the cluster to be tested to obtain a second completion duration of the second read-write operation by the cluster to be tested;
and determining the robustness of the cluster to be tested according to a first difference degree between the first completion time length and the second completion time length, wherein the robustness is inversely related to the first difference degree.
2. The method of claim 1, wherein prior to said controlling at least one node in the cluster under test to enter a preset simulation state, the method further comprises:
acquiring running environment information, wherein the running environment information is used for representing performance parameters of the cluster to be tested when the cluster to be tested runs in a specified environment;
the controlling at least one node in the cluster to be tested to enter a preset simulation state comprises the following steps:
and configuring the performance parameter of at least one node in the cluster to be tested as the performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
3. The method of claim 2, wherein the performance parameters comprise one or more of the following parameters:
network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate and memory utilization rate.
4. A method according to any of claims 2-3, characterized in that the performance parameter is a parameter that varies with time.
5. The method of claim 1, further comprising:
controlling at least one node in the cluster to be tested to enter a preset closing state, wherein the node in the preset closing state does not have the capability of sending heartbeat messages to other nodes in the cluster to be tested;
performing a third read-write operation on the cluster to be tested to obtain a third completion time length for the cluster to be tested to complete the third read-write operation;
determining the robustness of the cluster to be tested according to the first difference degree between the first completion duration and the second completion duration, including:
and determining the robustness of the cluster to be tested according to a first difference degree between the first completion time length and the second completion time length and a second difference degree between the first completion time length and the third completion time length, wherein the robustness is negatively correlated with the first difference degree and the second difference degree.
6. A cluster robustness testing apparatus, the apparatus comprising:
the first testing module is used for performing a first read-write operation on a cluster to be tested to obtain a first completion duration for the cluster to be tested to complete the first read-write operation, wherein each node in the cluster to be tested is in a normal state initially;
the simulation module is used for controlling at least one node in the cluster to be tested to enter a preset simulation state, wherein the node in the preset simulation state has the capability of sending heartbeat messages to other nodes in the cluster to be tested;
the second testing module is used for performing second read-write operation on the cluster to be tested to obtain second completion duration of the second read-write operation of the cluster to be tested;
and the determining module is used for determining the robustness of the cluster to be tested according to a first difference degree between the first completion time length and the second completion time length, wherein the robustness is negatively correlated with the first difference degree.
7. The apparatus according to claim 6, further comprising an information obtaining module, configured to obtain operating environment information, where the operating environment information is used to represent a performance parameter of the cluster to be tested when operating in a specified environment;
the simulation module is specifically configured to configure a performance parameter of at least one node in the cluster to be tested as a performance parameter represented by the operating environment information according to the operating environment information, so that the at least one node enters a preset simulation state.
8. The apparatus of claim 7, wherein the performance parameters comprise one or more of:
network packet loss rate, network delay, system disk load rate, data disk load rate, CPU load rate, and memory usage rate.
9. The apparatus of any of claims 7-8, wherein the performance parameter is a time-varying parameter.
10. The apparatus according to claim 6, wherein the simulation module is further configured to control at least one node in the cluster to be tested to enter a preset off state, where the node in the preset off state does not have a capability of sending a heartbeat packet to other nodes in the cluster to be tested;
the device further comprises:
the third testing module is used for carrying out third read-write operation on the cluster to be tested to obtain third completion time length for the cluster to be tested to complete the third read-write operation;
the determining module is specifically configured to determine robustness of the cluster to be tested according to a first difference degree between the first completion duration and the second completion duration and a second difference degree between the first completion duration and the third completion duration, where the robustness is negatively related to the first difference degree and negatively related to the second difference degree.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.
CN202110524276.XA 2021-05-13 2021-05-13 Cluster robustness testing method and device and electronic equipment Pending CN115344466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110524276.XA CN115344466A (en) 2021-05-13 2021-05-13 Cluster robustness testing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110524276.XA CN115344466A (en) 2021-05-13 2021-05-13 Cluster robustness testing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115344466A true CN115344466A (en) 2022-11-15

Family

ID=83977842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110524276.XA Pending CN115344466A (en) 2021-05-13 2021-05-13 Cluster robustness testing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115344466A (en)

Similar Documents

Publication Publication Date Title
CN111181801B (en) Node cluster testing method and device, electronic equipment and storage medium
US7783744B2 (en) Facilitating root cause analysis for abnormal behavior of systems in a networked environment
US11138163B2 (en) Automatic root cause diagnosis in networks based on hypothesis testing
US20210216303A1 (en) Deployment routing of clients by analytics
US20130282354A1 (en) Generating load scenarios based on real user behavior
CN103699474A (en) Storage equipment monitoring system and method
CN110059068B (en) Data verification method and data verification system in distributed storage system
CN113472607A (en) Application program network environment detection method, device, equipment and storage medium
CN112737800A (en) Service node fault positioning method, call chain generation method and server
CN108170366A (en) Storage medium management method, device and storage device in storage device
CN111562884A (en) Data storage method and device and electronic equipment
US10938666B2 (en) Network testing simulation
US8582444B2 (en) Method for detecting hardware faults by determining a ratio of released connections
CN108512698B (en) Network disaster tolerance method and device and electronic equipment
CN115344466A (en) Cluster robustness testing method and device and electronic equipment
CN115811483A (en) Network state monitoring method and device, electronic equipment and storage medium
CN116185799A (en) Interrupt time acquisition method, device, system, communication equipment and storage medium
CN112671590B (en) Data transmission method and device, electronic equipment and computer storage medium
CN108712284B (en) Fault service positioning method and device and service server
CN114281611A (en) Method, system, equipment and storage medium for comprehensively detecting system disk
TWI841511B (en) System and method of testing memory device and non-transitory computer readable medium
US8977901B1 (en) Generating service call patterns for systems under test
TWI833172B (en) System and method of testing memory device and non-transitory computer readable medium
CN115086156B (en) Method for positioning abnormal application in storage area network
CN109753405B (en) Application resource consumption detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination