CN114513401A - Automatic operation and maintenance repair method and device for private cloud and computer readable medium - Google Patents


Publication number
CN114513401A
Authority
CN
China
Prior art keywords
health
private cloud
score
inspection
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210106160.9A
Other languages
Chinese (zh)
Inventor
白杰文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunzhou Information Technology Co ltd
Original Assignee
Shanghai Yunzhou Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunzhou Information Technology Co ltd
Priority to CN202210106160.9A
Publication of CN114513401A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/60 Business processes related to postal services
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C1/00 Registering, indicating or recording the time of events or elapsed time, e.g. time-recorders for work people
    • G07C1/20 Checking timed patrols, e.g. of watchman
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Marketing (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The scheme first inspects the private cloud according to preset inspection items to obtain inspection results, then scores the inspection results based on preset scoring rules to obtain a health score of the private cloud, and finally triggers execution of the automated repair processing corresponding to the relevant inspection items according to the health score. Because the health score is calculated automatically from the inspection results and the preset scoring rules, it is not affected by the subjective judgment of individual operation and maintenance personnel, and health assessments follow a more uniform standard. Because the automated repair processing is triggered by the health score itself, there is no need to wait for professional operation and maintenance personnel to analyze the problem and propose repair measures, which reduces dependence on those personnel and addresses the problem of delayed repair.

Description

Automatic operation and maintenance repair method and device for private cloud and computer readable medium
Technical Field
The present application relates to the field of information technologies, and in particular, to an automated operation and maintenance repair method and device for a private cloud, and a computer-readable medium.
Background
A private cloud platform generally undergoes regular health inspections, with data collected through technical means such as inspection tools. A traditional inspection tool can collect data from a private cloud platform, but the collected data is generally unformatted. In most cases, experienced operation and maintenance personnel who are familiar with the platform must interpret the inspection results, judge from experience whether the current cloud platform environment is healthy, and propose a repair scheme for each health problem found.
However, this approach relies heavily on the personnel's mastery of the private cloud platform and on extensive operation and maintenance experience. The health evaluation of the platform tends to rest on subjective judgment, and different personnel may apply different standards to the same inspection result and so reach different health evaluations. Moreover, each health problem found must wait for professional personnel to analyze it and propose repair measures, so repairs lag considerably and depend heavily on the personnel's working hours; when staff are in short supply, the problems found are difficult to repair in time.
Disclosure of Invention
An object of the present application is to provide an automated operation and maintenance repair method and device for a private cloud, and a computer-readable medium, so as to solve the problems in the prior art that inspection standards for private clouds are not uniform, that problem repair depends on manpower, and that repairs lag considerably.
To achieve the above object, the present application provides an automated operation and maintenance repair method for a private cloud, the method including:
inspecting the private cloud according to preset inspection items to obtain inspection results;
scoring the inspection results based on preset scoring rules to obtain a health score of the private cloud; and
triggering execution of automated repair processing corresponding to the relevant inspection items according to the health score.
Further, scoring the inspection results based on preset scoring rules to obtain a health score of the private cloud includes:
for an inspection item executed on a single node, determining the score corresponding to the level of the inspection result, and taking that score as the first health score of the inspection item on that node in the private cloud.
Further, scoring the inspection results based on preset scoring rules to obtain a health score of the private cloud includes:
for an inspection item executed on multiple nodes, determining the score corresponding to the level of each inspection result, and taking those scores as the first health scores of the inspection item on the respective nodes in the private cloud; and
aggregating the first health scores of the inspection item on the nodes in the private cloud to obtain a second health score of the inspection item.
Further, scoring the inspection results based on preset scoring rules to obtain a health score of the private cloud includes:
obtaining the inspection results of the inspection items executed on a node;
determining the score corresponding to the level of each inspection result, and taking that score as the first health score of the corresponding inspection item on the node in the private cloud; and
aggregating the first health scores of the inspection items on the node to obtain a third health score of the node.
Further, scoring the inspection results based on preset scoring rules to obtain a health score of the private cloud includes:
obtaining the health score of each inspection item in the private cloud based on the preset scoring rules; and
aggregating the health scores of the inspection items in the private cloud to obtain an overall health score of the private cloud.
Further, the aggregation calculates an arithmetic average or a weighted average.
Further, triggering execution of automated repair processing corresponding to the relevant inspection items according to the health score includes:
when the health score meets a preset condition, checking whether an automated repair method corresponding to the relevant inspection item exists, and if so, executing that method.
Further, the method also includes:
generating, according to the health score, an inspection report that includes handling suggestions for the relevant inspection items, and sending the inspection report to a preset user.
According to another aspect of the present application, an automated operation and maintenance repair device for a private cloud is also provided. The device includes a memory for storing computer program instructions and a processor for executing them; when executed by the processor, the computer program instructions cause the device to perform the automated operation and maintenance repair method for the private cloud.
Embodiments of the present application also provide a computer-readable medium on which computer program instructions are stored, the instructions being executable by a processor to implement the automated operation and maintenance repair method for the private cloud.
Compared with the prior art, the present application provides an automated operation and maintenance repair scheme for a private cloud. The scheme first inspects the private cloud according to preset inspection items to obtain inspection results, then scores the inspection results based on preset scoring rules to obtain a health score of the private cloud, and finally triggers execution of the automated repair processing corresponding to the relevant inspection items according to the health score. Because the health score is calculated automatically from the inspection results and the preset scoring rules, it is not affected by the subjective judgment of individual operation and maintenance personnel, and health assessments follow a more uniform standard. Because the automated repair processing is triggered by the health score itself, there is no need to wait for professional operation and maintenance personnel to analyze the problem and propose repair measures, which reduces dependence on those personnel and addresses the problem of delayed repair.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
Fig. 1 is a processing flow chart of an automated operation and maintenance repair method for a private cloud according to an embodiment of the present application;
Fig. 2 is a processing flow chart of automated operation and maintenance repair of a private cloud implemented with the scheme of the embodiment of the present application;
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a typical configuration of the present application, the terminal and the devices of the service network each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Some embodiments of the present application provide an automated operation and maintenance repair method for a private cloud. The method first inspects the private cloud according to preset inspection items to obtain inspection results, then scores the inspection results based on preset scoring rules to obtain a health score of the private cloud, and finally triggers execution of the automated repair processing corresponding to the relevant inspection items according to the health score. Because the health score is calculated automatically from the inspection results and the preset scoring rules, it is not affected by the subjective judgment of individual operation and maintenance personnel, and health assessments follow a more uniform standard. Because the automated repair processing is triggered by the health score itself, there is no need to wait for professional operation and maintenance personnel to analyze the problem and propose repair measures, which reduces dependence on those personnel and addresses the problem of delayed repair.
In a practical scenario, the method may be executed by a user device, a network device, a device formed by integrating a user device and a network device through a network, or an application program running on such a device. The user device includes, but is not limited to, terminal devices such as computers, mobile phones and tablet computers; the network device includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, a type of distributed computing in which a collection of loosely coupled computers forms one virtual computer.
Fig. 1 shows a processing flow of an automated operation and maintenance repair method for a private cloud according to an embodiment of the present application, which at least includes the following steps:
Step S101: inspect the private cloud according to preset inspection items to obtain inspection results. Inspection items are checks on the various health states of the private cloud and may include, for example, a check of the management node's database backup tasks, a check for expired licenses, a check of the used capacity of the management node's system disk, a check of the average CPU utilization of the physical machines, and a check of the memory utilization of the physical machines.
Step S102: score the inspection results based on preset scoring rules to obtain the health score of the private cloud. In a practical scenario, scoring can be divided into three dimensions: the score of a single inspection item, the score of a host across multiple inspection items, and the score of the private cloud platform as a whole.
Scoring in the single-inspection-item dimension distinguishes inspection items executed on a single node from inspection items executed on multiple nodes. An inspection item executed on a single node checks a particular state of one private cloud node, so only the inspection result obtained by executing the item on that node is considered.
For an inspection item executed on a single node, scoring the inspection result based on the preset scoring rules determines the score corresponding to the level of the inspection result and takes that score as the first health score of the inspection item on that node in the private cloud. Take the license expiry check as an example: if more than N days remain before the license expires, the inspection result may be set to Normal; if fewer than N days remain, it may be set to Warn; and if the license has already expired, it may be set to Critical. The scores corresponding to the levels may be preset in the scoring rules; in this embodiment, for example, the three levels correspond to the following three scores:
normal:100
warn:50
critical:0
Therefore, for an inspection item executed on a single node, the score can be determined from the level of the inspection result. For example, if the level of the inspection result of inspection item p1 on node n1 is normal, the corresponding score is 100, and this score is taken as the first health score of the item on that node; that is, the first health score of inspection item p1 on node n1 is 100. Similarly, if the level of the inspection result of inspection item p1 on node n2 is warn, the first health score of inspection item p1 on node n2 is 50.
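The level-to-score lookup in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the names `LEVEL_SCORES` and `first_health_score` are hypothetical, and the three scores are simply the example values of this embodiment.

```python
# Hypothetical mapping from inspection-result level to score, using the
# example values given in this embodiment (normal/warn/critical).
LEVEL_SCORES = {"normal": 100, "warn": 50, "critical": 0}


def first_health_score(level: str) -> int:
    """Return the score for the level of one inspection result on one node."""
    try:
        return LEVEL_SCORES[level.lower()]
    except KeyError:
        raise ValueError(f"unknown inspection level: {level!r}") from None


# Inspection item p1 on nodes n1 and n2, as in the example above.
print(first_health_score("normal"))  # 100
print(first_health_score("warn"))    # 50
```

Keeping the mapping in a single table makes it straightforward to add finer-grained levels or change the score assigned to a level without touching the lookup logic.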
Those skilled in the art should understand that the levels of the inspection results and the scores assigned to them above are only examples; other forms based on similar principles, whether existing now or appearing in the future, fall within the scope of protection of the present application where applicable and are incorporated herein by reference. For example, for finer-grained evaluation, more levels may be defined, such as four, five or more inspection result levels, and the score for each level may be set to a different specific value according to the needs of the actual scenario.
For an inspection item executed on multiple nodes, scoring the inspection results based on the preset scoring rules first determines the score corresponding to the level of each inspection result and takes those scores as the first health scores of the inspection item on the respective nodes in the private cloud; the first health scores of the inspection item on the nodes are then aggregated to obtain the second health score of the inspection item.
An inspection item executed on multiple nodes is one whose inspection object spans several nodes rather than a single node. The processing principle is therefore to determine the first health score of the item on each individual node in the same way as for a single-node inspection item, and then to aggregate those first health scores into the second health score of the item across the nodes. For example, if inspection item p2 is executed on nodes n1, n2 and n3, and the levels of its inspection results on the three nodes are normal, warn and critical respectively, then the first health scores of p2 on n1, n2 and n3 are 100, 50 and 0, and these three first health scores are aggregated to obtain the second health score of the item.
The aggregation may compute an arithmetic mean, in which case the second health score of the inspection item is calculated with the following formula:
$$B = \frac{A_{n1} + A_{n2} + \cdots + A_{ni}}{i}$$
where B is the second health score of an inspection item executed on multiple nodes, A_ni is the first health score of the inspection item on a single node ni, and i is the number of nodes involved in the inspection item. In the scenario above, the first health scores of inspection item p2 on nodes n1, n2 and n3 are 100, 50 and 0, and aggregating them gives a second health score of 50 for p2. The numerical precision of the aggregation may be set according to the needs of the actual scenario; for example, the second health score may be retained to m decimal places or rounded to an integer. In some embodiments of the present application, the formula may be adjusted somewhat, and the second health score is calculated with the following formula:
[Formula image BDA0003493548110000072 not reproduced]
where the partial result [formula image BDA0003493548110000073 not reproduced] is retained to 2 decimal places.
In this way, scoring in the single-inspection-item dimension is achieved.
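As a hedged illustration of the arithmetic-mean aggregation described above, the following Python sketch computes the second health score of a multi-node inspection item from the levels reported on each node. The function name, the dictionary layout and the `ndigits` parameter are assumptions made for this example; the level scores are the example values of this embodiment.

```python
# Example level-to-score mapping from this embodiment.
LEVEL_SCORES = {"normal": 100, "warn": 50, "critical": 0}


def second_health_score(levels_by_node: dict, ndigits: int = 2) -> float:
    """Arithmetic mean of the per-node first health scores of one item.

    levels_by_node maps a node name to the level of the inspection result
    on that node. ndigits is the number of decimal places to retain.
    """
    if not levels_by_node:
        raise ValueError("at least one node result is required")
    first_scores = [LEVEL_SCORES[level] for level in levels_by_node.values()]
    return round(sum(first_scores) / len(first_scores), ndigits)


# Inspection item p2 on nodes n1, n2 and n3, as in the example above.
p2 = {"n1": "normal", "n2": "warn", "n3": "critical"}
print(second_health_score(p2))  # 50.0
```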
Scoring in the host dimension across multiple inspection items can be implemented as follows. A host corresponds to a node in the private cloud. To score a host across multiple inspection items, the inspection results of the inspection items executed on the node are obtained first; the score corresponding to the level of each inspection result is then determined and taken as the first health score of the corresponding inspection item on the node; finally, the first health scores of the inspection items on the node are aggregated to obtain the third health score of the node. For example, if the inspection items executed on node n1 include p1, p2, p3 and p4, and the levels of the corresponding inspection results are normal, warn and critical, respectively, then the first health scores of p1, p2, p3 and p4 on node n1 can be determined from the scores for those levels as 100, 50 and 0, respectively. The first health scores of the inspection items on node n1 are then aggregated; in this embodiment the aggregation may again compute an arithmetic mean, in which case the third health score is calculated as follows:
$$D = \frac{C_{p1} + C_{p2} + \cdots + C_{pj}}{j}$$
where D is the third health score of a node across its inspection items, C_pj is the first health score of inspection item pj executed on the node, and j is the number of inspection items executed on the node. In the scenario above, the first health scores of the four inspection items p1, p2, p3 and p4 on node n1 are 100, 50 and 0, respectively, and aggregating them gives the third health score of node n1. As before, the numerical precision of the aggregation in this embodiment may be set according to the needs of the actual scenario, for example by retaining m decimal places or rounding to an integer. In some embodiments of the present application, the formula may also be adjusted somewhat, and the third health score is calculated with the following formula:
[Formula image BDA0003493548110000082 not reproduced]
where the partial result [formula image BDA0003493548110000083 not reproduced] is retained to 2 decimal places.
In addition, because inspection items may differ in how much they matter to the health assessment, a different weight may be set for each inspection item. In this embodiment, for example, the weight of each inspection item may be set in the range 1 to 100, with a larger weight indicating a larger influence on the health assessment of the host. The third health score may then be calculated with the following formula:
$$D = \frac{C_{p1} W_{p1} + C_{p2} W_{p2} + \cdots + C_{pj} W_{pj}}{W_{p1} + W_{p2} + \cdots + W_{pj}}$$
where W_pj is the weight of inspection item pj executed on the node.
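The weighted host-dimension score can be sketched as follows. This sketch assumes the weighted form is the standard weighted mean (sum of score times weight, divided by the sum of weights); the function name, the item names and the weights in the usage example are hypothetical, and the 1 to 100 weight range follows this embodiment.

```python
# Example level-to-score mapping from this embodiment.
LEVEL_SCORES = {"normal": 100, "warn": 50, "critical": 0}


def third_health_score(item_results: dict, ndigits: int = 2) -> float:
    """Weighted mean of the first health scores of the items run on one node.

    item_results maps an inspection item name to a (level, weight) pair.
    The weighted-mean form is an assumption made for this sketch.
    """
    if not item_results:
        raise ValueError("at least one inspection item is required")
    weighted_sum = 0
    total_weight = 0
    for name, (level, weight) in item_results.items():
        if not 1 <= weight <= 100:
            raise ValueError(f"weight for {name} must be between 1 and 100")
        weighted_sum += LEVEL_SCORES[level] * weight
        total_weight += weight
    return round(weighted_sum / total_weight, ndigits)


# Hypothetical items and weights on node n1.
node_n1 = {"p1": ("normal", 80), "p2": ("warn", 20)}
print(third_health_score(node_n1))  # 90.0
```

With equal weights the function reduces to the arithmetic mean, so one routine can cover both aggregation options.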
For scoring in the dimension of the private cloud platform as a whole, the health score of each inspection item in the private cloud is obtained based on the preset scoring rules, and the health scores of the inspection items are then aggregated to obtain the overall health score of the private cloud. The aggregation may compute an arithmetic mean or a weighted mean, as described above. If a weighted mean is used in this embodiment, for example, the overall health score of the private cloud is obtained with the following formula:
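$$\text{Grade} = \frac{\text{sub1Grade} \times \text{sub1Weight} + \cdots + \text{submGrade} \times \text{submWeight}}{\text{sub1Weight} + \cdots + \text{submWeight}}$$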
where Grade is the overall health score of the private cloud, submGrade is the health score of inspection item pm, submWeight is the weight of inspection item pm, and m is the number of inspection items executed in the private cloud. As before, the numerical precision of the aggregation in this embodiment may be set according to the needs of the actual scenario, for example by retaining a set number of decimal places or rounding to an integer. In some embodiments of the present application, the formula may also be adjusted somewhat, and the overall health score of the private cloud is calculated with the following formula:
[Formula image BDA0003493548110000092 not reproduced]
where the partial result [formula image BDA0003493548110000093 not reproduced] is retained to 2 decimal places.
In an actual scenario, the inspection items may include items executed by a single node and items executed by multiple nodes, and the health scores of both types may be determined by the calculation methods described above. For an inspection item executed by a single node, the score corresponding to the level of its inspection result is determined and taken as the first health score of that inspection item on the node in the private cloud. For an inspection item executed by multiple nodes, the score corresponding to the level of each inspection result is determined and taken as the first health score of the item on the respective node, and the first health scores of the item across the nodes are then summarized to obtain the second health score of the item. Therefore, when the health scores of the inspection items are summarized to obtain the overall health score of the private cloud, the first health score is used for an item executed by a single node, and the second health score is used for an item executed by multiple nodes.
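The choice between the first and second health score, and the weighted summary into an overall score, can be sketched as below. All names (`LEVEL_SCORES`, `item_health_score`, `overall_grade`) and the record layout are hypothetical. The scores 0 and 50 for the critical and war levels follow the example condition given in the text; the "normal" level with 100 points, the arithmetic mean across nodes, and the two-decimal rounding are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List

# Level-to-score mapping: 0 and 50 follow the text, "normal" -> 100 is assumed.
LEVEL_SCORES = {"critical": 0, "war": 50, "normal": 100}


def item_health_score(levels_by_node: Dict[str, str]) -> float:
    """First health score for a single-node item, second health score otherwise."""
    first_scores = [LEVEL_SCORES[level] for level in levels_by_node.values()]
    if len(first_scores) == 1:
        # Single node: the first health score is used directly.
        return float(first_scores[0])
    # Multiple nodes: summarize first scores (arithmetic mean assumed here).
    return round(mean(first_scores), 2)


def overall_grade(items: List[dict]) -> float:
    """Weighted mean of the per-item health scores across the private cloud."""
    total_weight = sum(item["weight"] for item in items)
    weighted_sum = sum(item_health_score(item["levels"]) * item["weight"] for item in items)
    return round(weighted_sum / total_weight, 2)


items = [
    {"levels": {"management-node": "normal"}, "weight": 40},            # single node
    {"levels": {"host-1": "normal", "host-2": "critical"}, "weight": 60},  # multi node
]
print(overall_grade(items))  # 70.0
```

Here the multi-node item scores 50 (the mean of 100 and 0), so the overall score is (100 × 40 + 50 × 60) / 100 = 70.0.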
Step S103, triggering execution of the automatic repair processing corresponding to the relevant inspection items according to the health score. The automatic repair processing may be based on an automated processing method set in advance for each inspection item; when the relevant health score meets the trigger condition, that method is triggered and executed, thereby carrying out the automatic repair processing for the relevant inspection item. For example, if inspection item p1 checks the reserved capacity of the main storage, its automated processing method may be preset to adjust the reserved capacity configuration to the preset value 200G.
In an actual scenario, after the health scores are obtained through the above processing, the overall health score and the individual health scores (such as the health score of an inspection item or of a node) may be evaluated, and if a health score satisfies a certain condition, execution of the automatic repair processing corresponding to the relevant inspection items may be triggered. Specifically, the health score is matched against a preset condition; when the health score meets the preset condition, it is checked whether an automatic repair processing method corresponding to the relevant inspection item exists, and if so, that method is executed.
For example, the preset condition may be that the health score is 0 points, corresponding to the critical level, or is less than or equal to 50 points, corresponding to the war level. When the preset condition is satisfied, it is checked whether a corresponding automated processing method exists. After all inspection items have been checked, all automated processing methods found in this way are loaded and executed, thereby carrying out the automatic repair processing for the relevant inspection items.
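The matching and loading steps just described can be sketched as follows. This is an illustrative outline under stated assumptions: `collect_repairs`, `run_repairs`, the item names, and the callables standing in for automated processing methods are all hypothetical; only the threshold of 50 points and the check-then-load-then-execute order come from the text.

```python
from typing import Callable, Dict, List

REPAIR_THRESHOLD = 50  # example condition from the text: score <= 50 (war or critical)


def collect_repairs(
    scores: Dict[str, float],
    repair_methods: Dict[str, Callable[[], str]],
    threshold: float = REPAIR_THRESHOLD,
) -> List[Callable[[], str]]:
    """Check every inspection item and collect the repair methods that apply."""
    pending = []
    for item_name, score in scores.items():
        # Only items that meet the preset condition and have a method are kept.
        if score <= threshold and item_name in repair_methods:
            pending.append(repair_methods[item_name])
    return pending


def run_repairs(pending: List[Callable[[], str]]) -> List[str]:
    """Execute the collected methods after all items have been checked."""
    return [method() for method in pending]


scores = {"primary_storage_reserved_capacity": 50, "ntp_sync": 0, "cpu_usage": 100}
repair_methods = {
    # Hypothetical stand-ins; a real method would change the cloud configuration.
    "primary_storage_reserved_capacity": lambda: "reserved capacity set to 200G",
    "cpu_usage": lambda: "cpu repair",
}
print(run_repairs(collect_repairs(scores, repair_methods)))
```

In this example `ntp_sync` meets the condition but has no registered method, and `cpu_usage` has a method but a healthy score, so only the reserved capacity repair runs.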
Taking the inspection item for the main storage reserved capacity in the global configuration as an example, the corresponding code includes at least the following:
Figure BDA0003493548110000101
Figure BDA0003493548110000111
For this inspection item, the preset condition that triggers the automatic repair processing may be configured as a health score of no more than 50 points, that is, the level of the inspection result is war or critical, which indicates that the reserved capacity of the main storage in the private cloud environment is smaller than a preset capacity threshold; in this embodiment, for example, the threshold may be set to 200G. Through the configuration file, the preset condition, the automated processing method, and other settings of each inspection item can be configured, thereby enabling automatic operation and maintenance repair. The following is part of the code of the automated processing method corresponding to this inspection item:
Figure BDA0003493548110000112
Figure BDA0003493548110000121
Automatic repair processing of an inspection item can therefore be carried out according to the name of the inspection item and its health score. In the inspection item named "check primary storage reserved capacity", the content of the expr field is the automated processing method for a health score at the war level: a zstack-cli command that updates the global configuration so that the reserved capacity of the main storage is set to the preset value 200G. Once the reserved capacity has been raised to 200G, the shortage of storage space caused by too small a reserved capacity can be avoided, which keeps the health status of the private cloud normal. Through this mechanism, the closed loop from scoring after inspection to repair is completed, and the dependence on professional after-sales operation and maintenance personnel is reduced.
In other embodiments of the present application, in addition to executing the automatic repair processing corresponding to the relevant inspection items, processing suggestions for the relevant inspection items may be generated according to the health score and sent to a preset user. The preset user may be an operation and maintenance engineer of the private cloud or another person with the relevant permissions, such as an administrator of the private cloud or a person authorized by the administrator. The processing suggestions may be sent to a designated communication account of the preset user, for example a designated mailbox, so that the preset user can review them promptly and intervene manually as the actual situation requires, keeping the health status of the private cloud normal.
In an actual scenario, after an automatic repair processing method corresponding to the relevant inspection item is found, the method may be presented to the preset user as a processing suggestion before it is executed, and whether it is executed is decided by the preset user's feedback, which either confirms or rejects execution. When the preset user confirms execution, the automatic repair processing of the private cloud is carried out; when the preset user rejects execution, the relevant operation and maintenance personnel can intervene manually and solve the problem in another way, which covers special problems that the automated processing method cannot solve effectively.
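The confirm-or-reject step can be sketched as a small gate in front of the repair method. The function name `repair_with_confirmation`, the `confirm` callback standing in for the preset user's feedback, and the returned status strings are assumptions made for illustration; how the feedback is actually collected is not specified in the text.

```python
from typing import Callable, List


def repair_with_confirmation(
    suggestion: str,
    repair: Callable[[], None],
    confirm: Callable[[str], bool],
) -> str:
    """Present the suggestion first; run `repair` only if the user confirms."""
    if confirm(suggestion):
        repair()
        return "repaired automatically"
    # Rejected: nothing is executed and the problem is left to the operators.
    return "left for manual intervention"


executed: List[str] = []
outcome = repair_with_confirmation(
    "Raise main storage reserved capacity to 200G",
    repair=lambda: executed.append("reserved capacity set to 200G"),
    confirm=lambda text: True,  # stand-in for the preset user's confirmation
)
print(outcome, executed)
```

Passing a `confirm` callback that returns False leaves `executed` untouched, which corresponds to the case where operators handle the problem manually.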
Fig. 2 shows a processing flow when the method provided by the embodiment of the present application is used to implement automatic operation and maintenance repair of a private cloud, where the processing flow at least includes the following steps:
Step S201, inspecting the private cloud according to the inspection items.
Step S202, loading the inspection result data.
Step S203, judging whether any unprocessed inspection item exists; if yes, executing step S204, otherwise executing step S206.
Step S204, scoring the single inspection item to obtain its health score.
Step S205, saving the health score of the single inspection item.
Step S206, determining the scoring dimension; if the dimension is the cloud platform level, executing step S207; if it is the host level, executing step S208.
Step S207, performing the private cloud platform level scoring process to obtain the corresponding health score.
Step S208, grouping by host according to the IP addresses of the hosts, and determining the inspection items corresponding to each host.
Step S209, performing the host level scoring process to obtain the corresponding health score.
Step S210, evaluating the health status according to the health score.
Step S211, loading the processing suggestions and the automated processing methods according to the evaluation result.
Step S212, determining whether to perform automatic repair; if yes, executing step S213, and if not, executing step S214.
Step S213, executing the corresponding automatic repair processing method.
Step S214, generating a report containing the processing suggestions for the inspection items, and sending the report to the preset user.
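The host-level branch of this flow (steps S208 and S209) can be sketched as a grouping of saved scores by host IP address followed by a per-host summary. The record fields `host_ip`, `item`, and `score`, the function name `host_scores`, and the use of an arithmetic mean with two-decimal rounding are assumptions for illustration; the text also allows a weighted mean.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def host_scores(results: List[dict]) -> Dict[str, float]:
    """Group saved item scores by host IP (S208) and score each host (S209)."""
    by_host: Dict[str, List[float]] = defaultdict(list)
    for record in results:
        by_host[record["host_ip"]].append(record["score"])
    # Arithmetic mean is assumed here as the summary processing.
    return {ip: round(mean(scores), 2) for ip, scores in by_host.items()}


results = [
    {"host_ip": "10.0.0.1", "item": "disk", "score": 100},
    {"host_ip": "10.0.0.1", "item": "ntp", "score": 50},
    {"host_ip": "10.0.0.2", "item": "disk", "score": 0},
]
print(host_scores(results))
```

With these records the first host scores 75 and the second scores 0, so only the second would meet an example condition of 50 points or lower in step S210.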
Based on the same inventive concept, the embodiments of the present application also provide an automatic operation and maintenance repair device for a private cloud. The method corresponding to the device is the automatic operation and maintenance repair method for a private cloud of the foregoing embodiments, and the device solves the problem on a principle similar to that of the method. The device provided by the embodiments of the present application comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to implement the methods and/or technical solutions of the embodiments of the present application.
In particular, the methods and/or embodiments in the embodiments of the present application may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. The computer program, when executed by a processing unit, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart or block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer-readable medium carries one or more computer program instructions that are executable by a processor to implement the methods and/or aspects of the embodiments of the present application as described above.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In some embodiments, the software programs of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order. The numerical sequence of the sequence numbers corresponding to the steps does not indicate any specific execution sequence, and the steps can be executed in any sequence combination on the premise of conforming to the execution logic.

Claims (10)

1. An automated operation and maintenance repair method for a private cloud, the method comprising:
inspecting the private cloud according to preset inspection items to obtain an inspection result;
performing score calculation on the inspection result based on a preset scoring rule to obtain a health score of the private cloud;
and triggering execution of automatic repair processing corresponding to the relevant inspection items according to the health score.
2. The method according to claim 1, wherein the step of performing score calculation on the inspection result based on a preset score rule to obtain the health score of the private cloud comprises the following steps:
and for the inspection item executed by the single node, determining a score corresponding to the level according to the level corresponding to the inspection result, and taking the score as a first health score of the inspection item on the node in the private cloud.
3. The method according to claim 1, wherein the step of performing score calculation on the inspection result based on a preset score rule to obtain the health score of the private cloud comprises the following steps:
for an inspection item executed by multiple nodes, determining the score corresponding to the level of each inspection result, and taking the score as a first health score of the inspection item on the respective node in the private cloud;
and summarizing the first health scores of the inspection item on the nodes in the private cloud to obtain a second health score of the inspection item.
4. The method according to claim 1, wherein the step of performing score calculation on the inspection result based on a preset score rule to obtain the health score of the private cloud comprises the following steps:
acquiring inspection results of the inspection items executed on a node;
determining the score corresponding to the level of each inspection result, and taking the score as a first health score of the corresponding inspection item on the node in the private cloud;
and summarizing the first health scores of the inspection items on the node to obtain a third health score of the node.
5. The method according to claim 1, wherein the step of performing score calculation on the inspection result based on a preset score rule to obtain the health score of the private cloud comprises the following steps:
acquiring the health score of each inspection item in the private cloud based on the preset scoring rule;
and summarizing the health scores of the inspection items in the private cloud to obtain the overall health score of the private cloud.
6. The method according to any one of claims 3 to 5, wherein the summary process is calculating an arithmetic mean or a weighted mean.
7. The method according to claim 1, wherein according to the health score, triggering execution of automatic repair processing corresponding to the related inspection items comprises:
and when the health score meets a preset condition, checking whether an automatic repairing processing method corresponding to the related inspection item exists, and if the corresponding automatic repairing processing method exists, executing the automatic repairing processing method.
8. The method of claim 1, further comprising:
generating, according to the health score, an inspection report comprising processing suggestions for the relevant inspection items, and sending the inspection report to a preset user.
9. An automated operation and maintenance repair device of a private cloud, the device comprising a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the method of any one of claims 1 to 8.
10. A computer readable medium having stored thereon computer program instructions executable by a processor to implement the method of any one of claims 1 to 8.
CN202210106160.9A 2022-01-28 2022-01-28 Automatic operation and maintenance repair method and device for private cloud and computer readable medium Pending CN114513401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210106160.9A CN114513401A (en) 2022-01-28 2022-01-28 Automatic operation and maintenance repair method and device for private cloud and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210106160.9A CN114513401A (en) 2022-01-28 2022-01-28 Automatic operation and maintenance repair method and device for private cloud and computer readable medium

Publications (1)

Publication Number Publication Date
CN114513401A true CN114513401A (en) 2022-05-17

Family

ID=81550987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210106160.9A Pending CN114513401A (en) 2022-01-28 2022-01-28 Automatic operation and maintenance repair method and device for private cloud and computer readable medium

Country Status (1)

Country Link
CN (1) CN114513401A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019413A (en) * 2022-07-02 2022-09-06 深圳市海曼科技股份有限公司 Method and device for automatic inspection management and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345658A (en) * 2018-10-29 2019-02-15 百度在线网络技术(北京)有限公司 Restorative procedure, device, equipment, medium and the vehicle of Vehicular system failure
CN112054937A (en) * 2020-08-18 2020-12-08 浪潮思科网络科技有限公司 SDN health inspection method, equipment and device in cloud network fusion environment
CN113537415A (en) * 2021-09-17 2021-10-22 中国南方电网有限责任公司超高压输电公司广州局 Convertor station inspection method and device based on multi-information fusion and computer equipment
CN113535474A (en) * 2021-06-30 2021-10-22 重庆紫光华山智安科技有限公司 Method, system, medium and terminal for automatically repairing heterogeneous cloud storage cluster fault

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination