CN107919980B

CN107919980B - Evaluation method and device for clustered system

Info

Publication number: CN107919980B
Application number: CN201711037523.3A
Authority: CN
Inventors: 符立佳; 苗辉
Original assignee: Guizhou Baishancloud Technology Co Ltd
Current assignee: Guizhou Baishancloud Technology Co Ltd
Priority date: 2017-10-30
Filing date: 2017-10-30
Publication date: 2020-02-21
Anticipated expiration: 2037-10-30
Also published as: CN107919980A

Abstract

The invention discloses an evaluation method and device for a clustered system. The method comprises: step 1, obtaining a cluster calling program information table of a central server, the table comprising a calling target address, a configuration file path and a disaster recovery switching time limit; step 2, (A) reading the configuration file according to the configuration file path and judging, from the content of the configuration file, whether the calling target address is reasonably configured, and, when it is not, generating first alarm information and an evaluation report containing the first alarm information; and/or (B) judging through a simulation test whether the clustered system completes disaster recovery switching within the disaster recovery switching time limit, and, when it does not, generating second alarm information and an evaluation report containing the second alarm information; and step 3, outputting the evaluation report.

Description

Evaluation method and device for clustered system

Technical Field

The invention relates to the technical field of computer networks, in particular to an evaluation method and device for a clustered system.

Background

With the development of the internet, users place ever higher demands on the quality of network access and have zero tolerance for network access faults or service faults. At present, to achieve high availability, many network service systems are built with a clustered structure, so that the availability of system services can still be guaranteed when a single server or a single node fails. In many cases, however, such a clustered system cannot fail over because the cluster calling program is improperly configured or calls the cluster in the wrong way. When a single server or node fails, the clustered nature of the system then cannot be exploited, and the service fails.

For example, system A has two servers, server A and server B, which provide equivalent services, and system B calls the data of system A as the data source for its service. Normally, when server A of system A fails, server B can still provide the service. In practice, however, system B's calling program may be configured with server A only; or both servers may be configured, but because of a problem in system B's switching program, system B cannot switch to server B to acquire data when server A fails. Either way a service failure occurs, which reduces the reliability of the service system.

Therefore, it is necessary to evaluate the clustered system so as to find the above abnormal conditions existing in the clustered system in time.

Disclosure of Invention

In order to solve the technical problem, the invention provides an evaluating method and an evaluating device for a clustered system, which can evaluate the calling strategy and the performance of the clustered system.

The invention provides an evaluating method of a clustering system, which comprises the following steps:

step 1, obtaining a cluster calling program information table of a central server, wherein the cluster calling program information table comprises: calling a target address, a configuration file path and disaster recovery switching time limit;

step 2, A, reading the configuration file according to the path of the configuration file, and judging whether the calling target address is reasonably configured according to the content of the configuration file; when the judgment result is unreasonable, generating first alarm information and generating an evaluation report containing the first alarm information; and/or B, judging whether the clustered system completes disaster recovery switching within the disaster recovery switching time limit or not through simulation test; when the clustered system does not finish disaster recovery switching within the disaster recovery switching time limit, generating second alarm information and generating an evaluation report containing the second alarm information;

step 3, outputting an evaluation report.

Further, in the foregoing scheme, the method further includes:

judging whether the cluster calling program information table is complete or not according to the access log of the clustered system;

when the information table of the cluster calling program is complete, executing the step 2;

and when the information table of the cluster calling program is incomplete, generating third alarm information, generating an evaluation report containing the third alarm information, and executing the step 3.

Further, in the above solution, the information table of the cluster calling program further includes: a first calling source address and a first application service name; the judging whether the cluster calling program information table is complete according to the access log of the clustered system comprises:

extracting a second calling source address and a second application service name in an access log of the clustered system within preset time;

when the first calling source address is consistent with the second calling source address and the first application service name is consistent with the second application service name, judging that the cluster calling program information table is complete;

otherwise, judging that the information table of the cluster calling program is incomplete.

Further, in the foregoing scheme, the determining whether the calling target address is configured reasonably according to the content of the configuration file includes:

acquiring the address of the called server and a preset calling configuration strategy according to the configuration file;

when the calling target address is different from the called server address, judging that the calling target address is unreasonable; and/or

And when the calling target address does not accord with a preset calling configuration strategy, judging that the calling target address is unreasonable.

Further, in the foregoing scheme, the determining, by the simulation test, whether the clustered system completes the disaster recovery switching within the disaster recovery switching time limit includes:

respectively appointing a server pointed by any address in the calling target addresses to actively shield the request sent by the calling source address;

observing whether the clustered system switches the request to a server pointed to by other addresses in the calling target address;

when the cluster system switches the request to a server pointed by other addresses in the calling target address, recording the switching time used in the switching process;

when all the servers pointed by all the addresses in the calling target address are designated, each request is switched, and the corresponding switching time is less than or equal to the disaster recovery switching time limit, judging that the clustered system completes disaster recovery switching within the disaster recovery switching time limit;

otherwise, the cluster system does not complete the disaster recovery switching within the disaster recovery switching time limit.

The invention also provides an evaluation device of the clustered system, which comprises: an information table acquisition module, a configuration judgment module and/or a disaster tolerance test module, and a report output module; wherein:

an information table obtaining module, configured to obtain a cluster calling program information table of the central server, where the cluster calling program information table includes: calling a target address, a configuration file path and disaster recovery switching time limit;

the configuration judging module is used for reading a configuration file according to the path of the configuration file and judging whether the calling target address is reasonably configured according to the content of the configuration file; when the judgment result is unreasonable, generating first alarm information;

the disaster tolerance test module is used for judging whether the clustered system completes disaster tolerance switching within the disaster tolerance switching time limit through simulation test; when the clustered system does not finish disaster recovery switching within the disaster recovery switching time limit, generating second alarm information;

and the report output module is used for outputting an evaluation report, and the evaluation report comprises the first alarm information and/or the second alarm information.

Further, in the foregoing solution, the apparatus further includes an information table checking module, where the information table checking module includes:

a complete judgment unit, configured to judge whether a cluster calling program information table of a central server is complete according to an access log of the clustered system after the cluster calling program information table of the central server is obtained;

the first skipping unit is used for skipping to the configuration judging module and/or the disaster tolerance testing module when the information table of the cluster calling program is complete;

and the second skipping unit is used for generating third alarm information when the information table of the cluster calling program is incomplete, generating an evaluation report containing the third alarm information and skipping to the report output module.

Further, in the above solution, the information table of the cluster calling program further includes: a first calling source address and a first application service name; the judging complete unit includes:

the extraction subunit is used for extracting a second calling source address and a second application service name in an access log of the clustered system within preset time;

a determining subunit, configured to determine that the cluster calling program information table is complete when the first calling source address is consistent with the second calling source address and the first application service name is consistent with the second application service name; otherwise, judging that the information table of the cluster calling program is incomplete.

Further, in the foregoing solution, the configuration determining module includes:

the obtaining unit is used for obtaining the called server address and a preset calling configuration strategy according to the configuration file;

the server judging unit is used for judging that the calling target address is unreasonable when the calling target address is different from the called server address; and/or

And the strategy judgment unit is used for judging that the calling target address is unreasonable when the calling target address does not accord with a preset calling configuration strategy.

Further, in the above scheme, the disaster recovery testing module includes:

the shielding unit is used for respectively appointing a server pointed by any address in the calling target addresses to actively shield the request sent by the calling source address;

the observation unit is used for observing whether the clustered system switches the request to a server pointed by other addresses in the calling target address;

the recording unit is used for recording the switching time used in the switching process when the clustered system switches the request to the server pointed by other addresses in the calling target address;

a disaster recovery determining unit, configured to determine that the clustered system completes disaster recovery switching within the disaster recovery switching time limit when all servers pointed to by all addresses in the call target address are designated, each request is switched, and the corresponding switching time is less than or equal to the disaster recovery switching time limit; otherwise, the cluster system does not complete the disaster recovery switching within the disaster recovery switching time limit.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

fig. 1 is a schematic flow chart illustrating an implementation of an evaluation method of a clustered system according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a composition of an evaluation apparatus of a clustered system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.

The invention provides an evaluating device which is used for implementing the evaluating method of the clustered system.

Fig. 1 is a schematic flow chart of an implementation of an evaluation method for a clustered system according to an embodiment of the present invention, as shown in fig. 1, the method includes:

Step 1, obtaining a cluster calling program information table of a central server;

specifically, the evaluation device obtains a cluster calling program information table of the central server, and the cluster calling program information table usually includes information such as: a first calling source address, a first application service name, a clustered system name, a first access resource, a calling target address, a configuration file path, a disaster recovery switching time limit and the like;

here, "cluster calling program" is a generic name for any system, module or interface that calls the cluster service.

Table 1 is an example of a table of cluster caller information in one embodiment.

TABLE 1

The evaluating device obtains a cluster calling program information table, and can obtain a plurality of key information for evaluating whether the clustered system has high reliability, including: a first calling source address, a first application service name, a calling target address, a configuration file path, a disaster recovery switching time limit and the like.
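One row of such an information table could be modelled as a small record type. The sketch below is only illustrative: the field names, the time unit (seconds) and the sample values for the access resource and time limit are assumptions, since the source lists the kinds of information but not their concrete representation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClusterCallerEntry:
    """One row of the cluster calling program information table.

    All field names are hypothetical; the source only names the kinds of data.
    """
    call_source_address: str                 # first calling source address
    application_service_name: str            # first application service name
    clustered_system_name: str
    access_resource: str                     # first access resource
    call_target_addresses: Tuple[str, ...]   # calling target address(es)
    config_file_path: str
    failover_time_limit_s: float             # DR switching time limit (unit assumed: seconds)


def key_fields(entry: ClusterCallerEntry) -> Dict[str, object]:
    """Return the fields that the later evaluation steps rely on."""
    return {
        "caller": (entry.call_source_address, entry.application_service_name),
        "targets": entry.call_target_addresses,
        "config": entry.config_file_path,
        "limit_s": entry.failover_time_limit_s,
    }


# Sample row; the resource path and the 3-second limit are made-up values.
row = ClusterCallerEntry(
    "1.1.1.1", "application 1", "cluster 1", "/resource",
    ("4.4.4.1", "4.4.4.2"), "Config1", 3.0,
)
```

Keeping the target addresses as a tuple makes the row immutable, which suits a table that is read once and then checked against logs and configuration files.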

Whether the content of the cluster calling program information table is complete and correct therefore directly affects the reliability of the evaluation result that the evaluation device produces for the clustered system. Accordingly, in some embodiments, the above evaluation method further includes:

after a cluster calling program information table of a central server is obtained, whether the cluster calling program information table is complete or not is judged according to an access log of the clustered system;

when the information table of the cluster calling program is complete, executing the following step 2;

and when the information table of the cluster calling program is incomplete, generating third alarm information, executing the following step 3, and generating an evaluation report containing the third alarm information.

Specifically, the determining, according to the access log of the clustered system, whether the information table of the cluster calling program is complete includes:

When the cluster calling program accesses the clustered system, it puts the IP address of its server (i.e. the first calling source address) and the first application service name into the request information. On receiving the call request, the clustered system records the calling time, the accessed resource, the calling source IP address, the calling source application service and similar information in its access log. The access log of the clustered system therefore usually contains: calling time, a second access resource, a second calling source address, a second application service name and the like.

For example, the cluster calling program information table of the central server acquired by the evaluation device is shown in table 1. When the calling programs access cluster 1, the pairs (1.1.1.1, application 1) and (2.2.2.1, application 2) are carried in the requests to cluster 1. Cluster 1 prints an access log containing two entries, (1.1.1.1, application 1) and (2.2.2.1, application 2); these are two pairs of second calling source address and second application service name. They are compared with the first calling source addresses and first application service names acquired from the cluster calling program information table (table 1); if they are consistent, the cluster calling program information table is judged to be complete. If instead the log contains an access from (3.3.3.3, application 3), or contains only (1.1.1.1, application 1), or only (2.2.2.1, application 2), the cluster calling program information table is judged to be incomplete.
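The comparison in this example amounts to a set equality test over (calling source address, application service name) pairs. A minimal sketch follows, assuming the pairs have already been extracted from the information table and from the access log for the preset time window; the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Caller = Tuple[str, str]  # (calling source address, application service name)


def is_table_complete(table_callers: Iterable[Caller],
                      log_callers: Iterable[Caller]) -> bool:
    """The table is complete only if both sides name exactly the same callers."""
    return set(table_callers) == set(log_callers)


def differences(table_callers: Iterable[Caller],
                log_callers: Iterable[Caller]) -> Tuple[Set[Caller], Set[Caller]]:
    """Return (callers seen in the log but not in the table,
    callers in the table but not seen in the log)."""
    table, log = set(table_callers), set(log_callers)
    return log - table, table - log


table = [("1.1.1.1", "application 1"), ("2.2.2.1", "application 2")]
matching_log = [("2.2.2.1", "application 2"), ("1.1.1.1", "application 1")]
extra_log = matching_log + [("3.3.3.3", "application 3")]
partial_log = [("1.1.1.1", "application 1")]
```

Reporting the two difference sets, rather than only a boolean, would let the third alarm information say which caller is unregistered or unobserved.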

Further, in some embodiments, when the evaluation device finds that the cluster calling program information table is incomplete, it not only generates the third alarm information but also generates an evaluation report containing the third alarm information, jumps to step 3 below, and exits the current evaluation process of the clustered system.

Step 2A, reading the configuration file according to the path of the configuration file, and judging whether the calling target address is reasonably configured according to the content of the configuration file; when the judgment result is unreasonable, generating first alarm information and generating an evaluation report containing the first alarm information;

specifically, the evaluation device reads a configuration file according to the path of the configuration file, and judges whether the calling target address is reasonably configured according to the content of the configuration file; when the judgment result is unreasonable, generating first alarm information and generating an evaluation report containing the first alarm information;

in the foregoing solution, the determining whether the calling target address is configured reasonably according to the content of the configuration file includes:

Here, the preset calling configuration policy includes, but is not limited to, "minimum number of configured cluster server IPs", "servers in different machine rooms", "servers in different network areas", "servers of different ISPs" and the like. Generally, the preset calling configuration policy is contained in the configuration file, with a unique identifier as its key; if no calling configuration policy is specifically specified in the configuration file, the minimum number of configured cluster server IPs defaults to 0. For example, in "server_ip: 1.1.1.1, 2.2.2.2", server_ip is the unique identifier of a calling configuration policy preset in the configuration file.
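Reading such keyed entries could look like the sketch below. It assumes a plain "key: value, value" line format, which the source shows only through the single server_ip example; the key name min_server_ip_count and the comment syntax are hypothetical, and the fallback to 0 reflects the default minimum described above.

```python
from typing import Dict, List


def parse_config(text: str) -> Dict[str, List[str]]:
    """Parse 'key: v1, v2' lines into a dict of value lists (format assumed)."""
    result: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, comments and lines without a key
        key, _, value = line.partition(":")
        result[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return result


def min_server_count(config: Dict[str, List[str]],
                     key: str = "min_server_ip_count") -> int:
    """Minimum number of cluster server IPs; 0 when the policy key is absent."""
    values = config.get(key)
    return int(values[0]) if values else 0


config = parse_config("server_ip: 1.1.1.1, 2.2.2.2\n")
```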

For example, the cluster calling program information table of the central server acquired by the evaluation device is shown in table 1, and whether the calling target address is reasonably configured is judged for the first item in table 1. The evaluation device reads the cluster server IP data in Config1 and judges whether it is consistent with the calling target addresses "4.4.4.1, 4.4.4.2": if consistent, the configuration is judged reasonable; if inconsistent, it is judged unreasonable. If consistent, the calling target address configuration is determined to be 4.4.4.1, 4.4.4.2;

suppose that the preset evaluation strategy requires a minimum of N IPs and that the IPs be in different machine rooms. The actual calling target addresses comprise 2 IPs. If N is greater than 2, the number of actual calling target addresses does not meet the preset evaluation strategy, and the configuration is judged unreasonable; if N is less than or equal to 2, the number meets the preset evaluation strategy, and this part is judged reasonable. The evaluation device then judges whether the IP addresses 4.4.4.1 and 4.4.4.2 are in the same machine room: if so, the configuration is judged unreasonable; otherwise it is judged reasonable.
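The three checks in this example (address consistency, minimum count N, different machine rooms) can be sketched as one function that returns a list of problems. This is an illustration under assumptions: the function name and the IP-to-room mapping are hypothetical, the mapping is assumed to cover every target address, and for more than two addresses "different machine rooms" is read as "no two addresses share a room", which the source does not spell out.

```python
from typing import Dict, List, Sequence


def check_target_addresses(call_targets: Sequence[str],
                           configured_servers: Sequence[str],
                           min_ip_count: int,
                           room_of: Dict[str, str]) -> List[str]:
    """Return the reasons the configuration is unreasonable (empty = reasonable)."""
    problems: List[str] = []
    # Check 1: table addresses must match the servers in the configuration file.
    if set(call_targets) != set(configured_servers):
        problems.append("calling target addresses differ from configured servers")
    # Check 2: at least N target addresses must be configured.
    if len(call_targets) < min_ip_count:
        problems.append(
            f"only {len(call_targets)} target addresses, policy requires {min_ip_count}"
        )
    # Check 3 (assumed reading): no two targets may sit in the same machine room.
    rooms = [room_of[ip] for ip in call_targets]
    if len(set(rooms)) < len(rooms):
        problems.append("target addresses share a machine room")
    return problems


targets = ["4.4.4.1", "4.4.4.2"]
different_rooms = {"4.4.4.1": "room-a", "4.4.4.2": "room-b"}
same_room = {"4.4.4.1": "room-a", "4.4.4.2": "room-a"}
```

A non-empty result would correspond to generating the first alarm information.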

Therefore, the evaluating device can complete the check of the cluster calling program configuration rationality of the clustered system.

Further, in order to more perfectly evaluate the reliability of the clustered system, the evaluating method may further include:

Step 2B, judging whether the clustered system completes disaster recovery switching within the disaster recovery switching time limit or not through simulation test; when the clustered system does not finish disaster recovery switching within the disaster recovery switching time limit, generating second alarm information and generating an evaluation report containing the second alarm information;

specifically, the evaluation device can judge whether the clustered system completes disaster recovery switching within the disaster recovery switching time limit through an actual test process; when the clustered system does not finish disaster recovery switching within the disaster recovery switching time limit, generating second alarm information and generating an evaluation report containing the second alarm information;

wherein the determining, through the simulation test, whether the clustered system completes the disaster recovery switching within the disaster recovery switching time limit includes:

For example, the cluster calling program information table of the central server acquired by the evaluation device is shown in table 1, and the test is performed on the first item in table 1. On the server with IP address 4.4.4.1, the evaluation device shields call requests coming from IP address 1.1.1.1 with application service name application 1, and records the time. It then monitors the access log on the server with IP address 4.4.4.2 and confirms the time at which the access request reaches the backup server 4.4.4.2. It calculates the switching time and compares it with the disaster recovery switching time limit in the cluster calling program information table: if the switching time is less than or equal to the limit, disaster recovery switching is judged to have completed within the set time limit; if the limit is exceeded and no switching has occurred, the switching is judged to have failed and the disaster recovery switching is judged abnormal. Next, the evaluation device repeats the procedure with the roles reversed: it shields the same call requests on the server with IP address 4.4.4.2, records the time, monitors the access log on the server with IP address 4.4.4.1, confirms the time at which the access request reaches the backup server 4.4.4.1, and applies the same comparison against the disaster recovery switching time limit. If both switches complete within the set time limit, the disaster recovery capability of the clustered system is judged to be normal.
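The block-observe-time loop in this example might be organised as follows. It is a sketch, not the patented implementation: how requests are shielded and how access logs are monitored is left to three injected callables (block, unblock and first_arrival_elsewhere), all hypothetical names, and restoring the shielded server after each round is an added assumption.

```python
from typing import Callable, Dict, Optional, Sequence


def run_failover_test(
    targets: Sequence[str],
    time_limit_s: float,
    block: Callable[[str], float],
    unblock: Callable[[str], None],
    first_arrival_elsewhere: Callable[[str, float], Optional[float]],
) -> Dict[str, Optional[float]]:
    """Shield each target in turn and measure how long the switch takes.

    block(server) shields the caller's requests and returns the timestamp.
    first_arrival_elsewhere(server, deadline) returns the time at which a
    request first reached a different target, or None if none was observed.
    """
    results: Dict[str, Optional[float]] = {}
    for server in targets:
        blocked_at = block(server)
        try:
            arrived_at = first_arrival_elsewhere(server, blocked_at + time_limit_s)
        finally:
            unblock(server)  # always restore the server before the next round
        results[server] = None if arrived_at is None else arrived_at - blocked_at
    return results


def failover_ok(results: Dict[str, Optional[float]], time_limit_s: float) -> bool:
    """True only if every shielded server saw a switch within the limit."""
    return all(t is not None and t <= time_limit_s for t in results.values())
```

With stub callables the loop can be exercised offline; a False result from failover_ok would correspond to generating the second alarm information.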

Particularly, the step 2B can complete evaluation of the disaster tolerance capability of the clustered system, and does not depend on the step 2A, so in some embodiments, the evaluation device can skip the step 2A and directly execute the step 2B; of course, in some embodiments, the evaluation device may only perform step 2A, skipping step 2B.

And 3, outputting an evaluation report.

Specifically, the evaluation device outputs the generated evaluation report according to the evaluation condition.

For example, the evaluation device outputs the alarm information when detecting the abnormality according to the above scheme, and when the abnormality is not detected, the evaluation report may be the normal evaluation result and the data of the evaluation process of each link.
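Assembling the report from the three possible alarms could be sketched as below. The report layout (a dict with a status and a list of alarms), the alarm wording and the convention that None means "step skipped" are all assumptions; the source only states which alarm each condition produces.

```python
from typing import Dict, List, Optional


def build_report(table_complete: bool,
                 config_problems: Optional[List[str]] = None,
                 failover_passed: Optional[bool] = None) -> Dict[str, object]:
    """Combine step results into an evaluation report.

    config_problems is None when step 2A was skipped; failover_passed is
    None when step 2B was skipped.
    """
    alarms: List[str] = []
    if not table_complete:
        # An incomplete table ends the evaluation: jump straight to the report.
        alarms.append("third alarm: cluster calling program information table is incomplete")
        return {"status": "abnormal", "alarms": alarms}
    if config_problems:
        alarms.append("first alarm: " + "; ".join(config_problems))
    if failover_passed is False:
        alarms.append("second alarm: disaster recovery switching not completed within the time limit")
    return {"status": "abnormal" if alarms else "normal", "alarms": alarms}
```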

By using the evaluation method of the clustered system provided by the embodiment, which configuration faults and omissions exist in the clustered system for the cluster calling program can be found in time, and meanwhile, the disaster tolerance capability of the clustered system can be actually detected, and the defect of the disaster tolerance capability can be found in time; therefore, when the fault does not occur, the fault is prevented in advance, the risk of the fault is reduced, and the reliability of the clustered system is improved.

Fig. 2 is a schematic structural diagram of an evaluation device of a clustered system according to an embodiment of the present invention. As shown in fig. 2, the evaluation device includes: an information table acquisition module 201, a configuration judgment module 202 and/or a disaster tolerance test module 203, and a report output module 204; wherein:

an information table obtaining module 201, configured to obtain a cluster calling program information table of a central server, where the cluster calling program information table includes: a first calling source address, a first application service name, a calling target address, a configuration file path and a disaster recovery switching time limit;

a configuration determining module 202, configured to read a configuration file according to the configuration file path, and determine whether the calling target address is configured reasonably according to the configuration file content; when the judgment result is unreasonable, generating first alarm information and generating an evaluation report containing the first alarm information;

the disaster tolerance testing module 203 is configured to determine, through a simulation test, whether the clustered system completes disaster tolerance switching within the disaster tolerance switching time limit; when the clustered system does not finish disaster recovery switching within the disaster recovery switching time limit, generating second alarm information and generating an evaluation report containing the second alarm information;

a report output module 204, configured to output an evaluation report, where the evaluation report includes the first warning information and/or the second warning information.

Further, the above evaluation apparatus further includes an information table checking module, where the information table checking module includes:

a first jumping unit, configured to jump to the configuration determining module 202 and/or the disaster tolerance testing module 203 when the information table of the cluster calling program is complete;

and the second skipping unit is used for generating third alarm information when the information table of the cluster calling program is incomplete, generating an evaluation report containing the third alarm information and skipping to the report output module 204.

Furthermore, in the foregoing solution, the determining unit includes:

In the above solution, the configuration determining module 202 includes:

In the above solution, the disaster recovery testing module 203 includes:

In practical applications, each module and each unit can be implemented by a Central Processing Unit (CPU), a microprocessor unit (MPU), a Digital Signal Processor (DSP), or a Field Programmable Gate Array (FPGA) in the evaluation device.

The above-described aspects may be implemented individually or in various combinations, and such variations are within the scope of the present invention.

It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits, and accordingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.

It is to be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that an article or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of additional like elements in the article or device comprising the element.

The above embodiments are merely to illustrate the technical solutions of the present invention and not to limit the present invention, and the present invention has been described in detail with reference to the preferred embodiments. It will be understood by those skilled in the art that various modifications and equivalent arrangements may be made without departing from the spirit and scope of the present invention and it should be understood that the present invention is to be covered by the appended claims.

Claims

1. A method for evaluating a clustered system, the method comprising:

step 2, A, reading the configuration file according to the path of the configuration file, and judging whether the calling target address is reasonably configured according to the content of the configuration file; when the judgment result is unreasonable, generating first alarm information and generating an evaluation report containing the first alarm information; and

b, judging whether the clustered system completes disaster recovery switching within the disaster recovery switching time limit or not through simulation test; when the clustered system does not finish disaster recovery switching within the disaster recovery switching time limit, generating second alarm information and generating an evaluation report containing the second alarm information;

step 3, outputting an evaluation report;

after step 1, before step 2, the method further comprises:

when the information table of the cluster calling program is incomplete, generating third alarm information, generating an evaluation report containing the third alarm information, and executing the step 3;

in step 2, the determining whether the calling target address is configured reasonably according to the content of the configuration file includes:

2. An evaluating method according to claim 1, wherein the cluster caller information table further comprises: a first calling source address and a first application service name; the judging whether the cluster calling program information table is complete according to the access log of the clustered system comprises:

3. The evaluating method according to claim 1, wherein the determining whether the clustered system completes the disaster recovery switching within the disaster recovery switching time limit through the simulation test comprises:

4. An evaluation apparatus of a clustered system, the apparatus comprising: an information table acquisition module, a configuration judgment module, a disaster tolerance test module and a report output module; wherein:

wherein the configuration determination module comprises:

The strategy judgment unit is used for judging that the calling target address is unreasonable when the calling target address does not accord with a preset calling configuration strategy;

the report output module is used for outputting an evaluation report, and the evaluation report comprises the first alarm information and the second alarm information;

the apparatus further comprises an information table checking module, the information table checking module comprising:

the first skipping unit is used for skipping to the configuration judging module and the disaster tolerance testing module when the information table of the cluster calling program is complete;

5. The evaluation apparatus according to claim 4, wherein the cluster calling program information table further comprises: a first calling source address and a first application service name; the complete judgment unit includes:

6. The evaluation device according to claim 4, wherein the disaster recovery test module comprises: