CN106656684B - Grid resource reliability monitoring method and device - Google Patents


Info

Publication number
CN106656684B
Authority
CN
China
Prior art keywords
resource
test
test program
reliability
attribute
Legal status
Active
Application number
CN201710187307.0A
Other languages
Chinese (zh)
Other versions
CN106656684A (en)
Inventor
陈炯 (Chen Jiong)
孙涌 (Sun Yong)
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Application filed by Suzhou University
Priority to CN201710187307.0A
Publication of CN106656684A
Application granted
Publication of CN106656684B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/14 Arrangements for monitoring or testing data switching networks using software, i.e. software packages

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for monitoring the reliability of grid resources. The method comprises: acquiring resources from a grid environment according to a test instruction or at a preset time interval; loading a corresponding test program according to the attributes of the resource; and executing the test program and outputting a reliability evaluation result of the resource. Because test programs are loaded according to the attributes of the resource, resources that differ in attributes such as resource type and management domain each receive the test programs corresponding to those attributes, which ensures the accuracy of the reliability evaluation. Because executing the test programs yields a reliability evaluation result for each attribute of each resource, the reliability of every attribute can be displayed more intuitively and the root cause of a resource fault can be located quickly and accurately, which reduces the time a user spends on troubleshooting and improves the user experience.

Description

Grid resource reliability monitoring method and device
Technical Field
The present invention relates to the field of resource monitoring, and in particular to a method and an apparatus for monitoring the reliability of grid resources.
Background
With the development of modern computer technology, distributed computing has been widely applied in high-energy physics as an efficient computing model. At present, the data processing platforms for high-energy physics experiments are generally built on grid computing technology. Grid computing uses grid middleware to integrate data processing sites scattered across the cooperating organizations so that they serve one or more common physics goals. The grid computing platform serves physicists in the form of jobs: the data analysis jobs of a physics experiment are dispatched by a job scheduling system to the individual sites, where they run. The computing resources of a site provide the basic environment in which jobs run, while its storage resources hold the experimental data the jobs need. The sites are interconnected by high-speed networks, which enables the sharing of resources and data.
In a large-scale grid environment, resources are distributed, heterogeneous, dynamic and autonomous. Distribution means that resources are typically spread geographically across regions, countries or even continents. Heterogeneity means that there are many types of resources, and even the same kind of resource may differ greatly in architecture because of different implementation technologies. Dynamism means that a resource can join or leave the grid at any time without being constrained by the grid environment. Autonomy means that resources in different regions belong to different management domains, each with its own resource management policy, so that grid resources can be managed independently. To cope with this complexity, grid computing uses middleware that builds an intermediate layer between users and resources, hides the complexity of the resources and gives users a transparent resource access mechanism. Hiding this complexity, however, greatly increases the difficulty of maintaining the grid environment: once some resources fail, the root cause of the failure is hard to locate accurately, which delays system recovery. There is therefore a need to monitor the reliability of individual resources in a grid environment accurately and efficiently.
In the prior art, traditional grid monitoring methods merely collect and display information related to resources; the user must judge the reliability of the resources from this information using his or her own expertise, identify resource faults, and find their causes in order to eliminate them. Such methods therefore do not express resource reliability intuitively, demand a high level of expertise from the user, and degrade the user experience. How to reflect resource reliability more intuitively, locate the source of resource faults quickly and accurately, and reduce the time users spend on troubleshooting is thus a problem that urgently needs to be solved.
Disclosure of Invention
The object of the invention is to provide a method and a device for monitoring the reliability of grid resources that reflect resource reliability more intuitively, locate the root cause of resource faults quickly and accurately, reduce the time a user spends on troubleshooting, and improve the user experience.
In order to solve the above technical problem, the present invention provides a method for monitoring grid resource reliability, which comprises:
acquiring resources from a grid environment according to a test instruction or according to a preset time interval;
loading a corresponding test program according to the attribute of the resource;
and executing the test program and outputting the reliability evaluation result of the resource.
Optionally, the loading a corresponding test program according to the attribute of the resource includes:
according to a preset matching rule, sequentially matching the attribute value of each attribute of the resource with the preset attribute values of all the test programs;
judging whether each attribute of the resource is matched with the corresponding test program;
and if so, determining that the matching succeeds, loading and executing the test program, and outputting the reliability evaluation result of the resource.
Optionally, the executing the test program and outputting the reliability evaluation result of the resource includes:
judging whether a test job needs to be submitted;
if not, executing a CLI command and outputting the reliability evaluation result of the resource;
if so, submitting the test job to a site, acquiring the output file produced by the site executing the test job, and outputting the reliability evaluation result of the resource according to the output file.
Optionally, the method further includes:
judging whether all the test programs are executed completely;
if yes, integrating the reliability evaluation results of the resources output by all the test programs to obtain the reliability state of the resources.
Optionally, the executing the test program and outputting the reliability evaluation result of the resource further includes:
in the execution process of the test program, continuously judging whether the execution time of the test program is greater than a preset time;
and if so, directly outputting the reliability evaluation result of the resource.
In addition, the invention also provides a device for monitoring the reliability of grid resources, which comprises:
the resource acquisition module is used for acquiring resources from the grid environment according to the test instruction or according to a preset time interval;
the test program loading module is used for loading the corresponding test program according to the attribute of the resource;
and the test program execution module is used for executing the test program and outputting the reliability evaluation result of the resource.
Optionally, the test program loading module includes:
the matching submodule is used for matching the attribute value of each attribute of the resource with the preset attribute values of all the test programs in sequence according to a preset matching rule;
the first judgment submodule is used for judging whether each attribute of the resource is matched with the corresponding test program; if yes, a matching success signal is sent to the loading sub-module;
and the loading submodule is used for receiving the matching success signal and loading the test program.
Optionally, the test program execution module includes:
the second judgment submodule is used for judging whether a test job needs to be submitted; if not, it sends a first starting signal to the CLI test submodule, and if so, it sends a second starting signal to the test job submodule;
the CLI test submodule is used for receiving the first starting signal, executing a CLI command and outputting the reliability evaluation result of the resource;
and the test job submodule is used for receiving the second starting signal, submitting the test job to a site, acquiring the output file produced by the site executing the test job, and outputting the reliability evaluation result of the resource according to the output file.
Optionally, the apparatus further comprises:
the judging module is used for judging whether all the test programs are executed completely; if so, sending a third starting signal to the reliability state module;
and the reliability state module is used for receiving a third starting signal, synthesizing the reliability evaluation results of the resources output by all the test programs and acquiring the reliability state of the resources.
Optionally, the test program execution module further includes:
the continuous judgment submodule is used for continuously judging, during the execution of the test program, whether the execution time of the test program is greater than the preset time; if so, it sends a fourth starting signal to the timeout output submodule;
and the timeout output submodule is used for receiving the fourth starting signal and directly outputting the reliability evaluation result of the resource.
The invention provides a method for monitoring the reliability of grid resources, which comprises the following steps: acquiring resources from a grid environment according to a test instruction or according to a preset time interval; loading a corresponding test program according to the attribute of the resource; executing the test program and outputting a reliability evaluation result of the resource;
therefore, by loading test programs according to the attributes of the resource, the method can load, for resources that differ in attributes such as resource type and management domain, the test programs corresponding to those attributes, which ensures the accuracy of the reliability evaluation. By executing the test programs and outputting the reliability evaluation results, a result is obtained for each attribute of each resource, so the reliability of every attribute can be displayed more intuitively, the root cause of a resource fault can be located quickly and accurately, the time a user spends on troubleshooting is reduced, and the user experience is improved. The invention also provides a device for monitoring the reliability of grid resources, which has the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for monitoring the reliability of grid resources according to an embodiment of the present invention;
Fig. 2 is a flowchart of another method for monitoring the reliability of grid resources according to an embodiment of the present invention;
Fig. 3 is a flowchart of the execution of a test program in another grid resource reliability monitoring method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the execution states of a test program in another grid resource reliability monitoring method according to an embodiment of the present invention;
Fig. 5 is a structural diagram of a grid resource reliability monitoring apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the framework of a grid resource reliability monitoring apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for monitoring reliability of grid resources according to an embodiment of the present invention. The method can comprise the following steps:
Step 101: Acquire resources from the grid environment according to a test instruction or at a preset time interval.
The test instruction may be an instruction to check the reliability of the resource, issued by a user (for example, an administrator) when a grid resource fails or at any other time. The preset time interval may be an interval, set by the user, at which the reliability of grid resources is monitored. This embodiment does not limit the specific content of the test instruction or the specific value of the preset time interval.
It can be understood that this embodiment describes the reliability monitoring of grid resources using a single resource as an example; resources with different attributes, such as resource type and management domain, can be monitored in a similar way, and this embodiment imposes no limitation in this respect.
It should be noted that the resource may be acquired directly from its storage location in the grid environment or received through other transmission means; this embodiment does not limit the specific manner of acquisition.
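The two triggers in step 101 can be sketched in Python as a single wait that ends either when a test instruction arrives or when the preset interval runs out. This is a minimal sketch, not the patented implementation: modelling the test instruction as a `threading.Event` and the helper name `wait_for_trigger` are assumptions made for illustration.

```python
import threading


def wait_for_trigger(instruction: threading.Event, interval_s: float) -> str:
    """Wait for the next monitoring round (hypothetical helper).

    Returns "instruction" if a user-issued test instruction arrives first,
    or "interval" if the preset time interval elapses without one.
    """
    if instruction.wait(timeout=interval_s):
        instruction.clear()  # consume the instruction so the next call waits again
        return "instruction"
    return "interval"


if __name__ == "__main__":
    test_instruction = threading.Event()
    # Simulate an administrator issuing a test instruction after 0.1 s.
    threading.Timer(0.1, test_instruction.set).start()
    print(wait_for_trigger(test_instruction, interval_s=1.0))   # the instruction arrives first
    print(wait_for_trigger(test_instruction, interval_s=0.05))  # no instruction, interval elapses
```

In a monitoring loop, each return value would start one round of steps 102 and 103 for the acquired resources.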
Step 102: Load the corresponding test program according to the attributes of the resource.
A test program may be matched to the resource through the attributes of the resource or loaded according to other features that distinguish different resources, or a test program applicable to all resources may simply be loaded. This embodiment does not limit this.
It can be understood that, when test programs are loaded according to the attributes of the resource, a common attribute-matching approach may be used, that is, loading the test programs corresponding to the different attributes of the resource; other approaches may also be used, and this embodiment does not limit this.
It should be noted that test programs may be designed according to the principles of correctness, comprehensiveness and single functionality. Correctness means that the test program itself must execute correctly and contain no logical errors; comprehensiveness means that all functional aspects of the resource are considered so that the resource is tested comprehensively; single functionality means that the testing task for a resource is divided at a fine granularity so that each test checks only a single function. In other words, while ensuring that all attributes of the resource are tested, each test program correctly tests one attribute of the resource. Test programs may also be designed according to other principles; the specific design may be chosen by a designer or user according to the actual scenario and requirements, and this embodiment does not limit this.
Step 103: Execute the test program and output the reliability evaluation result of the resource.
The reliability evaluation result output in this step may be the result produced by executing the test program. If only one test program is executed, the output may be the reliability state of the resource; if several test programs are executed, the output may be the reliability evaluation results corresponding to the respective attributes of the resource. This embodiment does not limit this.
It can be understood that, when several reliability evaluation results are output after the test programs for the respective attributes of the resource are executed, the method may further include integrating the results output by all the test programs to obtain the reliability state of the resource. That is, after all the test programs have been executed, the results output by the test programs for the different attributes of the resource are analyzed together to determine the reliability state of the resource. This embodiment does not limit this.
It should be noted that, in the grid environment, the resources to be monitored can be divided into computing resources, storage resources and network resources. Computing resources are the concrete environment in which jobs run; they may be a personal computer (PC), a computing cluster or a cloud computing environment, with the software required by the jobs installed. Storage resources are the various file storage systems that hold experimental data; jobs read the data they need from storage resources and write their output data back to them. Network resources are the high-speed networks connecting sites in different regions, and they guarantee data transmission.
Reliability test programs for these three types of grid resources fall into two categories: Job Test and CLI Test. A Job Test submits a test job to a site for execution and outputs the reliability evaluation result of the resource according to the output file of that test job; the test programs for computing resources belong to this category. A CLI Test only executes a CLI test command locally and outputs the reliability evaluation result of the resource according to the output of the command; the tests for storage resources and network resources belong to this category.
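The classification above can be captured as a lookup from resource type to test category; the same lookup also answers the question asked later in step 301. The enum and function names below are illustrative assumptions, not identifiers from the patent.

```python
from enum import Enum


class ResourceType(Enum):
    COMPUTING = "computing"
    STORAGE = "storage"
    NETWORK = "network"


class TestCategory(Enum):
    JOB_TEST = "Job Test"  # submits a test job to a site, judges its output file
    CLI_TEST = "CLI Test"  # runs a CLI test command locally, judges its output


# Computing resources use Job Tests; storage and network resources use CLI Tests.
CATEGORY_BY_RESOURCE_TYPE = {
    ResourceType.COMPUTING: TestCategory.JOB_TEST,
    ResourceType.STORAGE: TestCategory.CLI_TEST,
    ResourceType.NETWORK: TestCategory.CLI_TEST,
}


def needs_test_job(resource_type: ResourceType) -> bool:
    """True when the tests for this resource type must submit a test job to a site."""
    return CATEGORY_BY_RESOURCE_TYPE[resource_type] is TestCategory.JOB_TEST
```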
Specifically, the reliability evaluation result output in this step may be a result for each attribute of the resource. Its value may be OK, Bad or Unknown: OK indicates that the attribute of the resource tested by the test program runs without exception; Bad indicates that this attribute has failed; Unknown indicates that the reliability state of this attribute could not be determined for some reason, for example because the test program timed out. Other values may also be used as long as they evaluate the reliability of the attribute; the designer or user may define them according to the actual scenario and requirements, and this embodiment does not limit this.
In this embodiment, by loading test programs according to the attributes of the resource, the test programs corresponding to attributes such as resource type and management domain can be loaded for resources that differ in those attributes, which ensures the accuracy of the reliability evaluation. Executing the test programs yields a reliability evaluation result for each attribute of each resource, so the reliability of every attribute can be displayed more intuitively, the root cause of a resource fault can be located quickly and accurately, the time a user spends on troubleshooting is reduced, and the user experience is improved.
Referring to fig. 2 and fig. 3, fig. 2 is a flowchart illustrating another method for monitoring reliability of grid resources according to an embodiment of the present invention; fig. 3 is a flowchart of a test procedure executed by another grid resource reliability monitoring method according to an embodiment of the present invention. The method can comprise the following steps:
Step 201: Acquire resources from the grid environment according to a test instruction or at a preset time interval.
The step is similar to step 101, and is not described herein again.
Step 202: According to a preset matching rule, match the attribute value of each attribute of the resource in turn against the preset attribute values of all the test programs.
It can be understood that a resource may have multiple attributes, and each test program may define a target attribute value for each attribute of the resource. When test programs are loaded for the resource, the attribute value of each attribute of the resource is matched one by one against the target (preset) attribute value of the test program: if all attributes match, the test program matches the resource; if any attribute fails to match, the test program does not match.
The specific rules for testing attribute matching may be as follows:
If RA is NULL and TA is ANY,then S.
If RA is ANY and TA is NULL,then S.
If RA is MUL and TA is SIG and TA in RA,then S,else F.
If RA is SIG and TA is MUL and RA in TA,then S,else F.
If RA is MUL and TA is MUL and RA ins TA,then S,else F.
If RA is SIG and TA is SIG and RA eq TA,then S,else F.
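The source does not define the abbreviations in these rules. The sketch below reads them as follows, and every one of these readings is an assumption: RA is the resource's attribute value and TA the test program's preset value; NULL is an unset value, ANY a wildcard, SIG a single value and MUL a set of values; S and F are success and failure; `in` is set membership, and `ins` is taken to mean a non-empty intersection (a subset test would be another plausible reading). Combinations that no listed rule covers are treated as failures. The helper names are hypothetical.

```python
from typing import FrozenSet, Mapping, Optional, Union

ANY = "ANY"  # assumed wildcard marker
# Assumed encoding: None = NULL, "ANY" = ANY, other str = SIG, frozenset = MUL
AttrValue = Optional[Union[str, FrozenSet[str]]]


def value_kind(value: AttrValue) -> str:
    """Classify an attribute value as NULL, MUL, ANY or SIG."""
    if value is None:
        return "NULL"
    if isinstance(value, frozenset):
        return "MUL"
    if value == ANY:
        return "ANY"
    return "SIG"


def match_attribute(ra: AttrValue, ta: AttrValue) -> bool:
    """Apply the six listed rules; True stands for S, False for F."""
    rk, tk = value_kind(ra), value_kind(ta)
    if rk == "NULL" and tk == "ANY":
        return True
    if rk == "ANY" and tk == "NULL":
        return True
    if rk == "MUL" and tk == "SIG":
        return ta in ra
    if rk == "SIG" and tk == "MUL":
        return ra in ta
    if rk == "MUL" and tk == "MUL":
        return bool(ra & ta)  # assumed meaning of "ins": the two sets share a value
    if rk == "SIG" and tk == "SIG":
        return ra == ta
    return False  # combination not covered by any listed rule (assumption)


def match_test_program(resource_attrs: Mapping[str, AttrValue],
                       test_attrs: Mapping[str, AttrValue]) -> bool:
    """Steps 202-203: a test program matches only if every resource attribute matches."""
    return all(match_attribute(value, test_attrs.get(name))
               for name, value in resource_attrs.items())
```

For example, a resource `{"type": "storage", "domain": "site-A"}` matches a test program that declares `{"type": "storage", "domain": frozenset({"site-A", "site-B"})}`, because both attributes succeed under the SIG/SIG and SIG/MUL rules.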
It should be noted that the preset matching rule, that is, the way test programs are matched to the different attributes of a resource, may be attribute-value matching as above or some other approach; this embodiment does not limit this.
Step 203: Judge whether each attribute of the resource matches the corresponding test program; if so, go to step 204.
It can be understood that the method of this embodiment may first match all the test programs corresponding to the resource and only then load and execute them, which ensures the accuracy of grid resource reliability monitoring. Alternatively, the loading and execution steps below may be performed each time a test program is matched; this embodiment does not limit this.
It can be understood that, if the attributes of a resource are not all matched with corresponding test programs, the reliability evaluation of the resource may be abandoned, that is, the matched test programs are not loaded or executed; the user may be notified of the situation; the evaluation may continue while the user is notified; or whether to continue may be decided according to the number of test programs matched to the resource. This embodiment does not limit this.
Step 204: Load the test program.
For the loading of the test program, a plurality of test programs corresponding to the resource can be loaded simultaneously; or sequentially loading a plurality of test programs corresponding to the resources. The present embodiment is not limited to this.
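Whether the matched test programs run one after another or simultaneously changes only the elapsed time, not their results. A minimal Python sketch, assuming each test program is modelled as a zero-argument callable that returns "OK", "Bad" or "Unknown" (a hypothetical representation, not one given in the patent):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

TestProgram = Callable[[], str]  # hypothetical: returns "OK", "Bad" or "Unknown"


def run_sequentially(programs: Dict[str, TestProgram]) -> Dict[str, str]:
    """Execute the matched test programs one after another."""
    return {name: program() for name, program in programs.items()}


def run_concurrently(programs: Dict[str, TestProgram], max_workers: int = 4) -> Dict[str, str]:
    """Execute the matched test programs simultaneously in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(program) for name, program in programs.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this case because the tests mostly wait on local commands or remote sites rather than compute.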
Step 205: Execute the test program.
The test programs in the method of this embodiment may be of two types, Job Test and CLI Test, and the specific flow of this step may be as shown in Fig. 3, including:
Step 301: Judge whether a test job needs to be submitted; if not, go to step 302; if so, go to step 303.
It can be understood that this step distinguishes between the two types of test programs, Job Test and CLI Test: since a Job Test must submit a test job to a site, whether a test job needs to be submitted can serve as the distinguishing criterion.
Step 302: Execute the CLI command and output the reliability evaluation result of the resource.
It should be noted that, for a CLI Test program, that is, a test program for storage resources or network resources, the reliability evaluation result of the resource can be output by directly executing the CLI command.
Step 303: Submit the test job to the site, acquire the output file produced by the site executing the test job, and output the reliability evaluation result of the resource according to the output file.
A Job Test program, that is, a test program for computing resources, must submit a test job to the site; after the site has executed the test job, the output file is downloaded from the site and the reliability evaluation result of the resource is output according to it.
It should be understood that this embodiment does not limit the specific interaction with the site in this step, as long as the reliability evaluation result of the resource can be output. Specifically, the method may further include monitoring the execution of the test job at the site, so that the progress of the test program is known at all times.
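Steps 301 to 303 can be sketched as a branch between a local CLI command and a submitted test job. This is illustrative only: treating a zero exit status as OK is an assumption, and `submit_job`, `fetch_output` and `evaluate_output` are hypothetical hooks standing in for whatever grid middleware submits jobs and downloads their output files; the patent names no such API.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence, Tuple


def run_cli_test(command: Sequence[str], timeout_s: float) -> str:
    """CLI Test: run a local command and judge the result by its exit status."""
    try:
        completed = subprocess.run(list(command), capture_output=True,
                                   text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "Unknown"  # no judgment is possible when the command times out
    return "OK" if completed.returncode == 0 else "Bad"  # assumed criterion


def run_job_test(submit_job: Callable[[], str],
                 fetch_output: Callable[[str], str],
                 evaluate_output: Callable[[str], str]) -> str:
    """Job Test: submit a test job, download its output file, evaluate it.

    All three hooks are hypothetical placeholders for grid middleware calls.
    """
    job_id = submit_job()
    output_text = fetch_output(job_id)
    return evaluate_output(output_text)


def execute_test_program(needs_job: bool,
                         cli_command: Optional[Sequence[str]] = None,
                         timeout_s: float = 60.0,
                         job_hooks: Optional[Tuple[Callable, Callable, Callable]] = None) -> str:
    """Step 301: branch on whether a test job has to be submitted."""
    if needs_job:
        if job_hooks is None:
            raise ValueError("a Job Test needs job_hooks")
        return run_job_test(*job_hooks)  # step 303
    if cli_command is None:
        raise ValueError("a CLI Test needs cli_command")
    return run_cli_test(cli_command, timeout_s)  # step 302


if __name__ == "__main__":
    # A harmless stand-in CLI command: the current Python interpreter exiting with status 0.
    print(execute_test_program(False, cli_command=[sys.executable, "-c", "pass"], timeout_s=30))
```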
It should be noted that the execution states of a test program may be as shown in Fig. 4: Submitted indicates that the test job has been submitted; Running indicates that the test job or the CLI command is running; Completed indicates that the test job has finished running but its output file has not yet been downloaded; Done indicates that the Job Test output file has been downloaded or the CLI command has finished; and Timeout indicates that the execution of the test program has timed out.
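The five states can be modelled as an enum with a transition table. Fig. 4 is not reproduced in this text, so the transitions below are inferred from the state descriptions and are assumptions: a Job Test is taken to pass through Submitted, Running, Completed and Done, a CLI Test through Running and Done, and any unfinished state may end in Timeout.

```python
from enum import Enum


class ExecState(Enum):
    SUBMITTED = "Submitted"  # test job submitted to the site
    RUNNING = "Running"      # test job or CLI command running
    COMPLETED = "Completed"  # test job finished, output file not yet downloaded
    DONE = "Done"            # output file downloaded, or CLI command finished
    TIMEOUT = "Timeout"      # test program exceeded the preset time


# Assumed transitions; Done and Timeout are final.
ALLOWED = {
    ExecState.SUBMITTED: frozenset({ExecState.RUNNING, ExecState.TIMEOUT}),
    ExecState.RUNNING: frozenset({ExecState.COMPLETED, ExecState.DONE, ExecState.TIMEOUT}),
    ExecState.COMPLETED: frozenset({ExecState.DONE, ExecState.TIMEOUT}),
    ExecState.DONE: frozenset(),
    ExecState.TIMEOUT: frozenset(),
}


def advance(current: ExecState, new: ExecState) -> ExecState:
    """Move a test program to a new state, rejecting transitions not in ALLOWED."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```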
Step 206: Judge whether the execution of the test program is finished; if not, go to step 207; if so, go to step 208.
It can be understood that this step may be performed continuously during the execution of each test program or at preset time intervals; this embodiment does not limit this.
Step 207: Judge whether the execution time of the test program is greater than the preset time; if so, go to step 208; if not, go to step 206.
The preset time may be the maximum execution time of a test program, set by the user or designer; if it is exceeded, the test program is judged unable to evaluate the reliability of the resource. The specific value of the preset time may be set by the user or designer according to the actual scenario and requirements; this embodiment does not limit this.
It should be noted that steps 206 and 207 together judge whether each test program has timed out. Whether a test program has finished may be judged in the manner of this embodiment or in other ways; this embodiment does not limit this.
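Steps 206 and 207 amount to a polling loop with a deadline. The sketch below assumes a hypothetical `is_finished` callback that reports whether the test program has finished; it returns False when the preset time is exceeded, which would then lead to step 208 with an Unknown result.

```python
import time
from typing import Callable


def wait_until_finished(is_finished: Callable[[], bool], preset_time_s: float,
                        poll_interval_s: float = 0.05) -> bool:
    """Return True if the test program finished, False if it exceeded preset_time_s."""
    start = time.monotonic()
    while not is_finished():                          # step 206: finished yet?
        if time.monotonic() - start > preset_time_s:  # step 207: over the preset time?
            return False                              # timed out: go to step 208
        time.sleep(poll_interval_s)                   # re-check at a preset interval
    return True                                       # finished: go to step 208
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted while a test runs.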
Step 208: Output the reliability evaluation result of the resource.
In the method of this embodiment, the reliability evaluation result can be represented by OK, Bad or Unknown. OK may indicate that the resource runs without exception, Bad may indicate that the resource has failed, and Unknown may indicate that the reliability state of the resource cannot be determined for some reason. Each test program evaluates the reliability of the corresponding attribute of the resource, and the correspondence between the execution state of a test program and its reliability evaluation result may be as follows: a test in the Submitted, Completed or Timeout state corresponds to Unknown; a test in the Done state corresponds to OK or Bad, depending on its output.
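The state-to-result correspondence in step 208 can be written as a small function. The `output_ok` flag is a hypothetical stand-in for however a Done test's output is judged; the Running state is not covered by the correspondence in the text, so this sketch rejects it rather than guess.

```python
from typing import Optional

UNKNOWN_STATES = {"Submitted", "Completed", "Timeout"}


def evaluation_result(state: str, output_ok: Optional[bool] = None) -> str:
    """Map a test program's execution state to OK, Bad or Unknown."""
    if state in UNKNOWN_STATES:
        return "Unknown"
    if state == "Done":
        if output_ok is None:
            raise ValueError("a Done test needs its output judged as OK or Bad")
        return "OK" if output_ok else "Bad"
    raise ValueError(f"state {state!r} is not covered by the correspondence")
```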
It can be understood that the specific content of the reliability evaluation result is not limited to what is shown in this embodiment; it may be changed according to the settings of different test programs, and this embodiment places no limitation on this.
Step 209: determine whether all the test programs have finished executing; if so, go to step 210; if not, go to step 205.
It can be understood that this step ensures that all test programs corresponding to the resource have completed before the overall reliability status of the resource is evaluated in step 210. This step may also be omitted, or performed as part of evaluating the overall reliability state of the resource; this embodiment places no limitation on this.
Step 210: integrate the reliability evaluation results output by all the test programs to obtain the reliability state of the resource.
The reliability state of the resource is evaluated using a worst-first principle: if any test program reports Bad, the state is Bad; if no test program reports Bad and at least one reports OK, the state is OK; and if all tests report Unknown, the state is Unknown. Other principles may also be adopted to evaluate the reliability status of the resource, and this embodiment places no limitation on this.
It can be understood that, when the worst-first principle of this embodiment is adopted, the state may be evaluated by the method of this embodiment or by other similar methods. For example, step 209 may be omitted: after each test program outputs a result, it is determined whether the result is Bad; if so, the reliability state of the resource is Bad; if not, it is determined whether all the test programs have finished executing. This embodiment places no limitation on this.
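A minimal sketch of the worst-first principle, assuming each test program's result is one of the strings `"OK"`, `"Bad"`, or `"Unknown"`. Treating an empty result list as Unknown is an assumption of this example; the text does not cover that case.

```python
from typing import Iterable

OK, BAD, UNKNOWN = "OK", "Bad", "Unknown"


def resource_state(results: Iterable[str]) -> str:
    """Worst-first: any Bad -> Bad; else any OK -> OK; else Unknown."""
    results = list(results)
    unexpected = set(results) - {OK, BAD, UNKNOWN}
    if unexpected:
        raise ValueError(f"unexpected results: {sorted(unexpected)}")
    if BAD in results:
        return BAD
    if OK in results:
        return OK
    # All results are Unknown (or, by assumption here, there are none).
    return UNKNOWN
```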
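The variant without step 209 can be sketched as a loop that consumes results as they arrive and stops at the first Bad. `resource_state_early` is a hypothetical name, and the results are assumed to arrive through a Python iterator.

```python
from typing import Iterable


def resource_state_early(results: Iterable[str]) -> str:
    """Return Bad as soon as any test reports Bad; otherwise apply worst-first at the end."""
    seen_ok = False
    for result in results:
        if result == "Bad":
            return "Bad"  # no need to wait for the remaining test programs
        if result == "OK":
            seen_ok = True
    # Every test program has finished without a Bad result.
    return "OK" if seen_ok else "Unknown"
```

Because the loop returns on the first Bad, results still pending from other test programs are never awaited; whether those programs should also be cancelled is left open by the text.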
In this embodiment, the attribute value of each attribute of a resource is matched in turn against the preset attribute values of all test programs according to a preset matching rule, so that each attribute of the resource is matched with a corresponding test program. Loading and executing these test programs evaluates the reliability of each attribute of the resource, and integrating the reliability evaluation results output by all the test programs yields the overall reliability state of the resource. The reliability of each resource is thus presented more intuitively, the user can locate the root cause of a resource fault more conveniently and accurately, the time spent on troubleshooting is reduced, and the user experience is improved.
Referring to Fig. 5 and Fig. 6, Fig. 5 is a structural diagram of a grid resource reliability monitoring device according to an embodiment of the present invention, and Fig. 6 is a schematic diagram of the framework of a grid resource reliability monitoring device according to an embodiment of the present invention. The apparatus may include:
a resource obtaining module 100, configured to obtain resources from a grid environment according to a test instruction or at preset time intervals;
a test program loading module 200, configured to load the corresponding test program according to the attributes of the resource;
a test program execution module 300, configured to execute the test program and output the reliability evaluation result of the resource.
Optionally, the test program loading module 200 may include:
a matching submodule, configured to match the attribute value of each attribute of the resource in turn against the preset attribute values of all the test programs according to a preset matching rule;
a first judgment submodule, configured to judge whether each attribute of the resource is matched with a corresponding test program and, if so, send a matching success signal to the loading submodule;
a loading submodule, configured to receive the matching success signal and load the test program.
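The matching and loading submodules can be illustrated with the following sketch. The text only refers to "a preset matching rule"; here it is assumed, purely for illustration, that a test program matches an attribute when the resource's value for that attribute appears among the program's preset values. `ProgramSpec`, `match_programs` and the attribute keys `type` and `domain` (suggested by the resource type and management domain mentioned in this description) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProgramSpec:
    """Hypothetical description of one test program and its preset attribute values."""
    name: str
    preset: Dict[str, Set[str]] = field(default_factory=dict)


def match_programs(resource: Dict[str, str], programs: List[ProgramSpec]) -> List[ProgramSpec]:
    """Return the test programs to load, in first-match order, without duplicates."""
    selected: List[ProgramSpec] = []
    seen: Set[str] = set()
    for attr, value in resource.items():      # each attribute of the resource, in turn
        for prog in programs:                 # compared with every test program
            # Assumed matching rule: the value is one of the program's preset values.
            if value in prog.preset.get(attr, set()) and prog.name not in seen:
                seen.add(prog.name)           # matching success: this program is loaded
                selected.append(prog)
    return selected
```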
Optionally, the test program executing module 300 may include:
a second judgment submodule, configured to judge whether a test operation needs to be submitted; if not, it sends a first starting signal to the CLI test submodule, and if so, it sends a second starting signal to the test operation submodule;
a CLI test submodule, configured to receive the first starting signal, execute a CLI command, and output the reliability evaluation result of the resource;
a test operation submodule, configured to receive the second starting signal, submit the test operation to a site, obtain the output file produced by the site when executing the test operation, and output the reliability evaluation result of the resource according to the output file.
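The branch taken by the second judgment submodule can be sketched in Python as follows. It is an illustration under stated assumptions, not the patented implementation: the CLI branch is assumed to judge the command by its exit status, and the site-side test operation (job) handling is represented by caller-supplied functions (`submit`, `fetch_output`, `judge`, all hypothetical), since no concrete grid middleware API is named in the text.

```python
import subprocess
from typing import Callable, List, Optional


def run_cli_test(argv: List[str], timeout: float = 60.0) -> str:
    """CLI branch: run a command locally (assumed rule: exit 0 -> OK, non-zero -> Bad)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "Unknown"  # mirrors the Timeout -> Unknown mapping
    except FileNotFoundError:
        return "Unknown"  # command not available on this host
    return "OK" if proc.returncode == 0 else "Bad"


def execute_test(
    needs_job: bool,
    argv: Optional[List[str]] = None,
    submit: Optional[Callable[[], str]] = None,
    fetch_output: Optional[Callable[[str], str]] = None,
    judge: Optional[Callable[[str], str]] = None,
) -> str:
    """Second judgment submodule: choose the CLI branch or the test operation branch."""
    if not needs_job:
        if argv is None:
            raise ValueError("a CLI test needs a command")
        return run_cli_test(argv)
    if submit is None or fetch_output is None or judge is None:
        raise ValueError("a test operation needs submit, fetch_output and judge")
    job_id = submit()                # submit the test operation to a site
    output = fetch_output(job_id)    # obtain the site's output file contents
    return judge(output)             # derive OK / Bad / Unknown from the output
```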
Optionally, the apparatus may further include:
a judging module, configured to judge whether all the test programs have finished executing and, if so, send a third starting signal to the reliability state module;
a reliability state module, configured to receive the third starting signal and integrate the reliability evaluation results output by all the test programs to obtain the reliability state of the resource.
Optionally, the test program execution module 300 may further include:
a continuous judgment submodule, configured to continuously judge, while the test program is executing, whether its execution time exceeds the preset time and, if so, send a fourth starting signal to the timeout output submodule;
a timeout output submodule, configured to receive the fourth starting signal and directly output the reliability evaluation result of the resource.
It can be understood that the framework of the grid resource reliability monitoring apparatus provided by this embodiment may be as shown in Fig. 6. The Resource Container controls the whole monitoring process: it matches the test programs to be executed according to the attributes of the resource, loads them through the Test Loader, and passes them to Test Execution. After all the test programs have been executed, the reliability state of the resource is evaluated by the State Observer, and the result is written into the database (DB).
In this embodiment, the test program loading module 200 loads the test programs corresponding to the attributes of the resource, so that resources with different attributes, such as the resource type and the management domain to which the resource belongs, are each tested by matching test programs, which ensures an accurate reliability evaluation. The test program execution module 300 executes the test programs and outputs a reliability evaluation result for each attribute of each resource, so that the reliability of each attribute is presented more intuitively, the source of a resource fault can be located quickly and accurately, the time the user spends on troubleshooting is reduced, and the user experience is improved.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
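The Fig. 6 flow can be sketched end to end as plain functions, one per framework component. Everything here is hypothetical: the registry of `(attribute, value, check)` triples, the function names, and the `resource_status` table layout are assumptions, and an in-memory SQLite database stands in for the DB.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

Resource = Dict[str, str]
Check = Callable[[Resource], str]  # hypothetical loaded test: returns "OK", "Bad" or "Unknown"


def select_checks(resource: Resource, registry: List[Tuple[str, str, Check]]) -> List[Check]:
    """Test Loader role: pick the checks whose (attribute, value) matches the resource."""
    return [check for attr, value, check in registry if resource.get(attr) == value]


def execute_checks(resource: Resource, checks: List[Check]) -> List[str]:
    """Test Execution role: run every loaded check and collect its result."""
    return [check(resource) for check in checks]


def observe_state(results: List[str]) -> str:
    """State Observer role: worst-first aggregation over all results."""
    if "Bad" in results:
        return "Bad"
    return "OK" if "OK" in results else "Unknown"


def resource_container(resource: Resource, registry: List[Tuple[str, str, Check]],
                       db: sqlite3.Connection) -> str:
    """Resource Container role: drive the whole flow and store the state in the DB."""
    state = observe_state(execute_checks(resource, select_checks(resource, registry)))
    db.execute("CREATE TABLE IF NOT EXISTS resource_status (name TEXT PRIMARY KEY, state TEXT)")
    db.execute("INSERT OR REPLACE INTO resource_status VALUES (?, ?)", (resource["name"], state))
    db.commit()
    return state
```

For example, a resource whose only matching checks return OK and Unknown is stored with state OK, consistent with the worst-first principle described above.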
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and the device for monitoring the reliability of the grid resources provided by the invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (8)

1. A grid resource reliability monitoring method, characterized by comprising the following steps:
acquiring resources from a grid environment according to a test instruction or according to a preset time interval;
loading a corresponding test program according to the attributes of the resource, wherein the test program is matched according to the attributes of the resource and is designed according to the principles of correctness, comprehensiveness, and single functionality;
executing the test program and outputting a reliability evaluation result of the resource;
the loading of the corresponding test program according to the attribute of the resource comprises:
according to a preset matching rule, sequentially matching the attribute value of each attribute of the resource with the preset attribute values of all the test programs;
judging whether each attribute of the resource is matched with the corresponding test program;
if so, determining that the matching is successful, loading the test program, executing the test program, and outputting the reliability evaluation result of the resource.
2. The grid resource reliability monitoring method according to claim 1, wherein the executing the test program and outputting the reliability evaluation result of the resource comprises:
judging whether a test operation needs to be submitted;
if not, executing a CLI command and outputting the reliability evaluation result of the resource;
if so, submitting the test operation to a site, acquiring an output file of the site for executing the test operation, and outputting the reliability evaluation result of the resource according to the output file.
3. The grid resource reliability monitoring method according to claim 1 or 2, further comprising:
judging whether all the test programs have finished executing;
if so, integrating the reliability evaluation results output by all the test programs to obtain the reliability state of the resource.
4. The grid resource reliability monitoring method according to claim 3, wherein the executing the test program and outputting the reliability evaluation result of the resource further comprises:
in the execution process of the test program, continuously judging whether the execution time of the test program is greater than a preset time;
and if so, directly outputting the reliability evaluation result of the resource.
5. A grid resource reliability monitoring apparatus, comprising:
the resource acquisition module is used for acquiring resources from the grid environment according to the test instruction or according to a preset time interval;
the test program loading module is used for loading the corresponding test program according to the attributes of the resource, wherein the test program is matched according to the attributes of the resource and is designed according to the principles of correctness, comprehensiveness, and single functionality;
the test program execution module is used for executing the test program and outputting the reliability evaluation result of the resource;
the test program loading module comprises:
the matching submodule is used for matching the attribute value of each attribute of the resource with the preset attribute values of all the test programs in sequence according to a preset matching rule;
the first judgment submodule is used for judging whether each attribute of the resource is matched with the corresponding test program and, if so, sending a matching success signal to the loading submodule;
and the loading submodule is used for receiving the matching success signal and loading the test program.
6. The grid resource reliability monitoring device of claim 5, wherein the test program execution module comprises:
the second judgment submodule is used for judging whether a test operation needs to be submitted; if not, it sends the first starting signal to the CLI test submodule, and if so, it sends the second starting signal to the test operation submodule;
the CLI test submodule is used for receiving the first starting signal, executing a CLI command, and outputting the reliability evaluation result of the resource;
and the test operation submodule is used for receiving the second starting signal, submitting the test operation to a site, acquiring an output file of the site for executing the test operation, and outputting the reliability evaluation result of the resource according to the output file.
7. The grid resource reliability monitoring apparatus according to claim 5 or 6, further comprising:
the judging module is used for judging whether all the test programs have finished executing and, if so, sending a third starting signal to the reliability state module;
and the reliability state module is used for receiving a third starting signal, synthesizing the reliability evaluation results of the resources output by all the test programs and acquiring the reliability state of the resources.
8. The grid resource reliability monitoring apparatus of claim 7, wherein the test program execution module further comprises:
the continuous judgment submodule is used for continuously judging, during execution of the test program, whether its execution time is greater than the preset time and, if so, sending a fourth starting signal to the timeout output submodule;
and the timeout output submodule is used for receiving the fourth starting signal and directly outputting the reliability evaluation result of the resource.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710187307.0A CN106656684B (en) 2017-03-27 2017-03-27 Grid resource reliability monitoring method and device


Publications (2)

Publication Number Publication Date
CN106656684A CN106656684A (en) 2017-05-10
CN106656684B true CN106656684B (en) 2020-07-24

Family

ID=58848574






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant