CN117240688A - Cloud platform operation maintenance method, system, equipment and medium

Publication number
CN117240688A
Authority
CN
China
Prior art keywords
target service
state
dimensions
abnormal
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311193149.1A
Other languages
Chinese (zh)
Inventor
娄云磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Jinan data Technology Co ltd
Original Assignee
Inspur Jinan data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of computers, and particularly relates to a cloud platform operation maintenance method, a cloud platform operation maintenance system, cloud platform operation maintenance equipment and a cloud platform operation maintenance medium. The method comprises the following steps: acquiring state data of different dimensions of a target service; judging the state of the target service from a plurality of dimensions based on the state data of the different dimensions; and responding to the judgment that the target service is abnormal, acquiring a repairing scheme matched with the abnormality, and repairing the target service through the repairing scheme. According to the cloud platform operation maintenance method provided by the invention, the operation state of the target service of the cloud platform is evaluated from multiple dimensions, and when the evaluation results based on the multiple dimensions are abnormal, a scheme for solving the abnormality is obtained to automatically maintain the cloud platform.

Description

Cloud platform operation maintenance method, system, equipment and medium
Technical Field
The invention belongs to the field of computers, and particularly relates to a cloud platform operation maintenance method, a cloud platform operation maintenance system, cloud platform operation maintenance equipment and a cloud platform operation maintenance medium.
Background
The cloud management platform provides comprehensive cloud resource provisioning and operation and maintenance management capabilities. Its core strengths include integrated management and control, automated operation and maintenance, intelligent analysis, and personalized extension. It can quickly build a cloud data center for a customer and provides products including servers, compute virtualization, storage virtualization, and network virtualization, thereby simplifying IT, improving operation and maintenance efficiency, accelerating enterprise digital transformation, and increasing the customer's return on digital transformation. With the development of cloud computing technology, more and more cloud platforms and service types have appeared, such as VM, KVM, OpenStack, and Kubernetes, and how to effectively monitor each service and ensure its normal and stable operation is a problem in the current construction of cloud management platforms.
In the traditional state maintenance method of a cloud platform, the operation state of the cloud platform needs to be monitored, and various monitoring modes are used; for example, various kinds of operation state data are collected and judged by an artificial intelligence model.
However, the traditional approach, which relies on an artificial intelligence model or a preset strategy, cannot effectively judge sudden situations, so the judgment of the state of the cloud platform is not accurate enough.
Disclosure of Invention
In order to solve the above problems, the present invention provides a cloud platform operation and maintenance method, including:
acquiring state data of different dimensions of a target service;
judging the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and responding to the judgment that the target service is abnormal, acquiring a repairing scheme matched with the abnormality, and repairing the target service through the repairing scheme.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
calculating quantization scales of the quantifiable data of multiple dimensions according to the time relation;
in response to inconsistent judgment results in multiple dimensions, comparing whether quantization scales of quantized data corresponding to the multiple dimensions are consistent or not, and judging whether comparison results of the quantization scales are consistent with judgment results of the multiple dimensions or not;
setting the judgment result of the dimension, of which the judgment result of the plurality of dimensions is abnormal, as misjudgment in response to the comparison result of the quantization scale being consistent with the judgment result of the plurality of dimensions;
and in response to the comparison result of the quantization scales being inconsistent with the judgment results of the plurality of dimensions, setting the judgment result of the dimension whose judgment result is abnormal as correct, and setting the target service as abnormal.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
analyzing the resource state data of the computer where the target service is located from the state data of different dimensions, and judging whether the utilization rate of the resource state data exceeds a preset reference threshold value;
setting the running state of the target server to be abnormal in response to the utilization rate of the resource state data exceeding the reference threshold;
and setting the running state of the target server to be normal in response to the utilization rate of the resource state data not exceeding the reference threshold.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
analyzing business data processed by the target service from the state data with different dimensions, judging whether the execution state of the business data is abnormal or not, and counting the number of abnormal execution times;
setting the running state of the target server as abnormal in response to the execution abnormal times exceeding a preset value;
and setting the running state of the target server to be normal in response to the abnormal execution times not exceeding a preset value.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
analyzing the task logs output by the target service from the state data with different dimensionalities, judging whether the time for outputting the task logs is overtime or not, and counting the overtime times;
setting the running state of the target server to be abnormal in response to the timeout number of the target service exceeding a predetermined value;
and setting the running state of the target server to be normal in response to the timeout times of the target service not exceeding a predetermined value.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
and acquiring judgment results of multiple dimensions, respectively counting the times of different results, and taking the result with the largest times as the state of the target service.
In some embodiments of the present invention, in response to the determination that the abnormality is detected, the step of obtaining a repair scheme that matches the abnormality and repairing the target service by the repair scheme includes:
responding to the generation of the abnormality, and recording the operation of the target service when the target service is abnormal and all operations and operation results of a computer where the target service is located;
judging whether all operations on the computer are effective according to the operation result, and taking the effective operations as a repairing scheme of the corresponding abnormality.
Another aspect of the present invention further provides a cloud platform operation maintenance system, including:
the data acquisition module is configured to acquire state data of different dimensions of the target service;
a state judgment module configured to judge a state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and the state repair module is configured to respond to the judgment that the target service is abnormal, acquire a repair scheme matched with the abnormality, and repair the target service through the repair scheme.
Yet another aspect of the present invention is directed to a computer device comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method of any of the above embodiments.
Yet another aspect of the invention also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any of the above embodiments.
According to the cloud platform operation maintenance method provided by the invention, the operation state of the target service of the cloud platform is evaluated from multiple dimensions, and when the evaluation results based on the multiple dimensions are abnormal, a scheme for solving the abnormality is obtained to automatically maintain the cloud platform.
Further, the quantifiable data in the evaluation data of different dimensions are correlated. When the evaluation mechanisms of the multiple dimensions disagree, whether the quantifiable data match is determined from the relationship among the quantifiable data. If they match, the data themselves are consistent with one another, which suggests that the evaluation mechanism may be defective, and the result of the dimension whose evaluation result is abnormal can be ignored.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a cloud platform operation and maintenance method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a cloud platform operation maintenance system according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The present invention aims to solve the problem that, in traditional cloud platform state maintenance scenarios, it is difficult to give early warning of an emergency.
As shown in fig. 1, in order to solve the above-mentioned problems, the present invention provides a cloud platform operation maintenance method, which includes:
Step S1, acquiring state data of different dimensions of a target service;
Step S2, judging the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
Step S3, responding to the judgment that the target service is abnormal, acquiring a repairing scheme matched with the abnormality, and repairing the target service through the repairing scheme.
In the embodiment of the invention, the target service refers to a service program of the cloud platform, namely, a service program for providing a cloud platform management service, and runs on a corresponding node in the cloud platform.
In some embodiments of the present invention, in step S1, multiple kinds of data about a target service program of the cloud platform are acquired. For example, the state data related to cloud platform operation include: the operation state of the computing node (or computing nodes) where the target service is located, that is, state data such as the CPU utilization, memory utilization, disk resources, and network bandwidth of the corresponding computing node; the volume of business processed by the target service, the speed at which the business is processed, and the like; or log data output by the target service, and the like.
In step S2, evaluation mechanisms of different dimensions are set up according to the dimensions of the collected state data, and the state of the target service is judged from each of these dimensions. The judgments of the individual dimensions do not affect one another, and the final judgment result is determined by combining the judgment results of the plurality of dimensions. For example, if the abnormal results among the judgment results of the plurality of dimensions outnumber the normal results, the state of the target service is considered abnormal.
In step S3, if the state of the target service is determined to be abnormal, an appropriate solution is found according to specific abnormality information to automatically repair the abnormality.
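Steps S1 to S3 can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the function name `maintain`, the dimension keys, and the string states are hypothetical, state collection (S1) is assumed to have already happened, and the combination rule is the simple "more abnormal than normal" example given for step S2.

```python
from typing import Callable, Dict

# Hypothetical types: a judge maps the raw state data of one dimension to
# "normal" or "abnormal"; a repair action returns True if the repair worked.
Judge = Callable[[dict], str]
Repair = Callable[[], bool]


def maintain(state_data: Dict[str, dict],
             judges: Dict[str, Judge],
             repairs: Dict[str, Repair]) -> str:
    """Run steps S1-S3 once and return the resulting service state."""
    # S1: state_data is assumed to be already collected, keyed by dimension.
    # S2: judge every dimension independently, then combine the verdicts.
    verdicts = {dim: judges[dim](data) for dim, data in state_data.items()}
    abnormal = [dim for dim, v in verdicts.items() if v == "abnormal"]
    normal_count = len(verdicts) - len(abnormal)
    if len(abnormal) <= normal_count:
        return "normal"
    # S3: look up a repair scheme matching the abnormality and apply it.
    for dim in abnormal:
        repair = repairs.get(dim)
        if repair is not None and repair():
            return "repaired"
    return "abnormal"
```

Keying the repair table by dimension is a simplification; the text only requires that the repair scheme match the abnormality.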
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
calculating quantization scales of the quantifiable data of multiple dimensions according to the time relation;
in response to inconsistent judgment results in multiple dimensions, comparing whether quantization scales of quantized data corresponding to the multiple dimensions are consistent or not, and judging whether comparison results of the quantization scales are consistent with judgment results of the multiple dimensions or not;
setting the judgment result of the dimension, of which the judgment result of the plurality of dimensions is abnormal, as misjudgment in response to the comparison result of the quantization scale being consistent with the judgment result of the plurality of dimensions;
and in response to the comparison result of the quantization scales being inconsistent with the judgment results of the plurality of dimensions, setting the judgment result of the dimension whose judgment result is abnormal as correct, and setting the target service as abnormal.
In some embodiments of the present invention, the quantifiable data among the data of the multiple dimensions are correlated across dimensions according to their time relationship, and their quantization scales are calculated. For example, among the state data of the computing nodes, data such as CPU usage, memory occupancy, network bandwidth, and disk IO rate are quantifiable. Similarly, in the state data dimension of the target service, the number of requests completed by the target service and the time consumed in processing the requests are also quantifiable. In addition, in the dimension of the log data of the target service, the logs output by the target service are tied to the corresponding requests: for each request accessing the target service of the cloud platform, the log records which services the cloud platform is required to provide. The data of these three dimensions are all positively correlated, that is, the more requests the target service processes, the more resources of the computing node are consumed and the more log data are generated. Therefore, the corresponding quantization scales can be calculated from the quantifiable data.
The quantization scales are then compared to see whether they are the same. With three dimensions, three quantization scales can be calculated, and comparing them shows how one dimension relates to the other two on the quantized data. For example, if the quantization scale between dimension 1 and dimension 2 is 1.2 while the quantization scale between dimension 1 and dimension 3 is 1, the quantized data of dimension 2 deviates from that of the other two dimensions. In that case, if dimension 2 judges the state of the target service to be abnormal while the other dimensions judge it to be operating normally, that is, the number of abnormal results among the plurality of dimensions is smaller than the number of normal results, the judgment result of dimension 2 is nevertheless adopted (the "majority" obeys the "minority"), and the state of the target service is no longer determined by the majority-based judgment mechanism; the state of the target service is judged to be abnormal on the basis of dimension 2.
Further, in some embodiments of the present invention, if the quantization scales of the multiple dimensions are the same but the judgment result of an individual dimension differs from the judgment results of the other dimensions, the rule that the "minority" obeys the "majority" is adopted, that is, the state of the target service is determined by the most frequent judgment result.
The data are quantized, and the pairwise ratios of the quantized data of different dimensions at the same time are calculated according to the time at which the data of each dimension were generated, so as to obtain a quantization scale that holds globally (that is, at different resource levels and within an allowable error range, multiplying the input data of any dimension by the pairwise ratio yields a reliable value for the quantized data of another dimension).
Further, the quantifiable data of each dimension are multiplied by the corresponding global quantization scales to obtain several groups of predicted quantized data for the other dimensions, and it is judged whether the predicted data of the other dimensions match the data actually obtained for those dimensions (a difference within a certain range is allowed). If they match, the quantized data of the dimension are considered normal; if not, the quantized data of the dimension are abnormal. If some dimension is abnormal on the quantized data, the above "majority" obeys "minority" mechanism for judging the target service state is employed. If no dimension is abnormal on the quantized data, the above "minority" obeys "majority" judgment mechanism is employed.
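The quantization-scale check described above can be sketched in Python as follows. This is a hedged illustration, not the patented algorithm itself: the function names, the use of a mean over time-aligned history as the "global" scale, the relative tolerance of 10%, and the rule that a dimension deviates only when it mismatches every other dimension are all assumptions made for the example. All values are assumed to be positive.

```python
from itertools import permutations
from statistics import mean
from typing import Dict, List, Tuple

Sample = Dict[str, float]  # one time-aligned reading per dimension


def global_scales(history: List[Sample]) -> Dict[Tuple[str, str], float]:
    """Average pairwise ratio b/a over time-aligned historical samples."""
    dims = sorted(history[0])
    return {
        (a, b): mean(s[b] / s[a] for s in history)
        for a, b in permutations(dims, 2)
    }


def deviating_dimensions(sample: Sample,
                         scales: Dict[Tuple[str, str], float],
                         tolerance: float = 0.1) -> List[str]:
    """Return dimensions whose value no other dimension can predict."""
    bad = []
    for b in sorted(sample):
        others = [a for a in sample if a != b]
        # Predict b from each other dimension: value_a * scale(a -> b),
        # and flag a mismatch when the error exceeds the relative tolerance.
        mismatches = [
            abs(sample[a] * scales[(a, b)] - sample[b]) > tolerance * sample[b]
            for a in others
        ]
        if all(mismatches):
            bad.append(b)
    return bad
```

With a history in which CPU load is always half the request count and log volume is always twice the request count, a new sample of 100 requests, CPU 90, and 200 log lines flags only `cpu`: requests and logs still predict each other, while neither predicts the CPU value. An empty result corresponds to the case where the "minority" obeys "majority" rule applies; a non-empty result corresponds to the case where the judgment of the deviating dimension is preferred.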
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
analyzing the resource state data of the computer where the target service is located from the state data of different dimensions, and judging whether the utilization rate of the resource state data exceeds a preset reference threshold value;
setting the running state of the target server to be abnormal in response to the utilization rate of the resource state data exceeding the reference threshold;
and setting the running state of the target server to be normal in response to the utilization rate of the resource state data not exceeding the reference threshold.
In some embodiments of the present invention, monitoring the operational state data of a computing node and determining from the operational state data whether an abnormality has occurred in the computing node comprises: judging whether the heartbeat information of the node is normal; judging whether the resource utilization rate of the node is lower than a certain threshold or higher than a certain threshold; and judging whether the startup time of the node exceeds a certain threshold.
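A minimal Python sketch of the node-resource dimension follows. The function name `judge_node`, the metric names, and the example thresholds are hypothetical; only the upper-threshold check and the heartbeat check are modeled, and the startup-time and lower-threshold checks are omitted for brevity.

```python
from typing import Dict


def judge_node(usage: Dict[str, float],
               thresholds: Dict[str, float],
               heartbeat_ok: bool = True) -> str:
    """Judge one computing node from its resource utilization."""
    # A missing heartbeat is treated as abnormal regardless of utilization.
    if not heartbeat_ok:
        return "abnormal"
    # Any metric above its reference threshold marks the node abnormal.
    for name, limit in thresholds.items():
        if usage.get(name, 0.0) > limit:
            return "abnormal"
    return "normal"
```

Metrics missing from `usage` are treated here as zero; a stricter variant could treat missing data as abnormal instead.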
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
analyzing business data processed by the target service from the state data with different dimensions, judging whether the execution state of the business data is abnormal or not, and counting the number of abnormal execution times;
setting the running state of the target server as abnormal in response to the execution abnormal times exceeding a preset value;
and setting the running state of the target server to be normal in response to the abnormal execution times not exceeding a preset value.
In some embodiments of the present invention, the running state data of the target service is monitored, that is, whether the service request data processed by the target service is abnormal is judged, the number of abnormal service requests is counted, and if the number exceeds a preset value, the state of the target server is set to be abnormal.
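The business-data dimension reduces to counting abnormal executions against a preset value. The Python sketch below assumes, purely for illustration, that each processed request carries a status string and that any status other than `"success"` counts as abnormal; the function name and the default limit of 3 are hypothetical.

```python
from typing import Iterable


def judge_business(statuses: Iterable[str], max_failures: int = 3) -> str:
    """Judge the service from the execution status of its business requests."""
    # Count requests whose execution state is abnormal.
    failures = sum(1 for status in statuses if status != "success")
    # The service is abnormal only when the count exceeds the preset value.
    return "abnormal" if failures > max_failures else "normal"
```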
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
analyzing the task logs output by the target service from the state data with different dimensionalities, judging whether the time for outputting the task logs is overtime or not, and counting the overtime times;
setting the running state of the target server to be abnormal in response to the timeout number of the target service exceeding a predetermined value;
and setting the running state of the target server to be normal in response to the timeout times of the target service not exceeding a predetermined value.
In some embodiments of the present invention, the log data of the target service are parsed. A time interval for the monitoring task log is set, with a default of 1 minute. The return time of each execution of the monitoring task log is measured; if the execution has not completed when this time exceeds the timeout, the health check is considered to have timed out. The timeout can be set to, for example, 5 seconds or 10 seconds. The number of occurrences of preset abnormal information in the monitoring task log check is also set: when abnormal log information is detected three times in the monitoring task log, the health checker of the monitoring service log system considers the monitored service to be abnormal.
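The log-timeout dimension can be sketched in Python as below. The function name `judge_logs` and its parameters are hypothetical; the 5-second timeout and the count of three are the example values mentioned in the text, and representing a check that never returned as `None` is an assumption of this sketch.

```python
from typing import Iterable, Optional


def judge_logs(durations: Iterable[Optional[float]],
               timeout_s: float = 5.0,
               abnormal_after: int = 3) -> str:
    """Judge the service from how long each monitoring-log check took.

    Each entry is the return time of one check in seconds, or None if the
    check never returned.
    """
    # A check times out when it never returned or ran longer than the timeout.
    timeouts = sum(1 for d in durations if d is None or d > timeout_s)
    # Three (by default) timed-out checks mark the service abnormal.
    return "abnormal" if timeouts >= abnormal_after else "normal"
```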
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
and acquiring judgment results of multiple dimensions, respectively counting the times of different results, and taking the result with the largest times as the state of the target service.
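Counting the judgment results and taking the most frequent one could look like the following Python sketch. The function name is hypothetical, and the text does not say how a tie is resolved; here a tie falls to whichever result was seen first, which is simply the behavior of `Counter.most_common` and not something the source specifies.

```python
from collections import Counter
from typing import Iterable


def majority_state(verdicts: Iterable[str]) -> str:
    """Return the judgment result that occurs most often."""
    counts = Counter(verdicts)
    # most_common(1) yields a one-element list of (result, count) pairs.
    return counts.most_common(1)[0][0]
```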
In some embodiments of the present invention, in response to the determination that the abnormality is detected, the step of obtaining a repair scheme that matches the abnormality and repairing the target service by the repair scheme includes:
responding to the generation of the abnormality, and recording the operation of the target service when the target service is abnormal and all operations and operation results of a computer where the target service is located;
judging whether all operations on the computer are effective according to the operation result, and taking the effective operations as a repairing scheme of the corresponding abnormality.
In some embodiments of the present invention, when an abnormality occurs in a target service, all repair operations performed on the cloud platform and their results are recorded, and if an operation result indicates that the repair was effective, the corresponding operation is saved as a repair policy for that abnormality.
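One way to picture how effective operations become a reusable repair scheme is the small Python sketch below. The class name `RepairBook`, the anomaly keys, and the representation of an operation as a command string paired with a success flag are all assumptions made for illustration; the text does not prescribe a storage format.

```python
from typing import Dict, List, Tuple


class RepairBook:
    """Remembers which recorded operations actually fixed an abnormality."""

    def __init__(self) -> None:
        self._schemes: Dict[str, List[str]] = {}

    def record(self, anomaly: str, operations: List[Tuple[str, bool]]) -> None:
        """Store the effective operations observed while handling an anomaly."""
        # Keep only operations whose recorded result shows they were effective.
        effective = [op for op, succeeded in operations if succeeded]
        if effective:
            self._schemes[anomaly] = effective

    def lookup(self, anomaly: str) -> List[str]:
        """Return the saved repair scheme, or an empty list if none is known."""
        return list(self._schemes.get(anomaly, []))
```

When the same abnormality is detected again, `lookup` returns the saved operations so they can be replayed automatically.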
As shown in fig. 2, another aspect of the present invention further proposes a cloud platform operation maintenance system, including:
the system comprises a data acquisition module 1, wherein the data acquisition module 1 is configured to acquire state data of different dimensions of a target service;
a state judgment module 2, wherein the state judgment module 2 is configured to judge the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and the state repair module 3 is configured to respond to the judgment that the target service is abnormal, acquire a repair scheme matched with the abnormality, and repair the target service through the repair scheme.
As shown in fig. 3, a further aspect of the present invention further proposes a computer device, including:
at least one processor 21; and
a memory 22, said memory 22 storing computer instructions 23 executable on said processor 21, said instructions 23 when executed by said processor 21 implementing a cloud platform operation and maintenance method comprising:
acquiring state data of different dimensions of a target service;
judging the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and responding to the judgment that the target service is abnormal, acquiring a repairing scheme matched with the abnormality, and repairing the target service through the repairing scheme.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
calculating quantization scales of the quantifiable data of multiple dimensions according to the time relation;
in response to inconsistent judgment results in multiple dimensions, comparing whether quantization scales of quantized data corresponding to the multiple dimensions are consistent or not, and judging whether comparison results of the quantization scales are consistent with judgment results of the multiple dimensions or not;
setting the judgment result of the dimension, of which the judgment result of the plurality of dimensions is abnormal, as misjudgment in response to the comparison result of the quantization scale being consistent with the judgment result of the plurality of dimensions;
and in response to the comparison result of the quantization scales being inconsistent with the judgment results of the plurality of dimensions, setting the judgment result of the dimension whose judgment result is abnormal as correct, and setting the target service as abnormal.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
analyzing the resource state data of the computer where the target service is located from the state data of different dimensions, and judging whether the utilization rate of the resource state data exceeds a preset reference threshold value;
setting the running state of the target server to be abnormal in response to the utilization rate of the resource state data exceeding the reference threshold;
and setting the running state of the target server to be normal in response to the utilization rate of the resource state data not exceeding the reference threshold.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
analyzing business data processed by the target service from the state data with different dimensions, judging whether the execution state of the business data is abnormal or not, and counting the number of abnormal execution times;
setting the running state of the target server as abnormal in response to the execution abnormal times exceeding a preset value;
and setting the running state of the target server to be normal in response to the abnormal execution times not exceeding a preset value.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
analyzing the task logs output by the target service from the state data with different dimensionalities, judging whether the time for outputting the task logs is overtime or not, and counting the overtime times;
setting the running state of the target server to be abnormal in response to the timeout number of the target service exceeding a predetermined value;
and setting the running state of the target server to be normal in response to the timeout times of the target service not exceeding a predetermined value.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
and acquiring judgment results of multiple dimensions, respectively counting the times of different results, and taking the result with the largest times as the state of the target service.
In some embodiments of the present invention, in response to the determination that the abnormality is detected, the step of obtaining a repair scheme that matches the abnormality and repairing the target service by the repair scheme includes:
responding to the generation of the abnormality, and recording the operation of the target service when the target service is abnormal and all operations and operation results of a computer where the target service is located;
judging whether all operations on the computer are effective according to the operation result, and taking the effective operations as a repairing scheme of the corresponding abnormality.
As shown in fig. 4, still another aspect of the present invention further proposes a computer readable storage medium 401, where the computer readable storage medium 401 stores a computer program 402, where the computer program 402 when executed by a processor implements a cloud platform operation maintenance method, including:
acquiring state data of different dimensions of a target service;
judging the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and responding to the judgment that the target service is abnormal, acquiring a repairing scheme matched with the abnormality, and repairing the target service through the repairing scheme.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
calculating quantization scales of the quantifiable data of the plurality of dimensions according to their time relation;
in response to the judgment results of the plurality of dimensions being inconsistent, comparing whether the quantization scales of the quantified data corresponding to the plurality of dimensions are consistent, and judging whether the comparison result of the quantization scales is consistent with the judgment results of the plurality of dimensions;
in response to the comparison result of the quantization scales being consistent with the judgment results of the plurality of dimensions, marking as a misjudgment the judgment result of each dimension whose judgment result is abnormal;
and in response to the comparison result of the quantization scales being inconsistent with the judgment results of the plurality of dimensions, treating as correct the judgment result of each dimension whose judgment result is abnormal, and setting the target service as abnormal.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions includes:
parsing, from the state data of the different dimensions, the resource state data of the computer where the target service is located, and judging whether the utilization rate indicated by the resource state data exceeds a preset reference threshold;
setting the running state of the target service to abnormal in response to the utilization rate exceeding the reference threshold;
and setting the running state of the target service to normal in response to the utilization rate not exceeding the reference threshold.
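The resource-dimension check can be illustrated with a short sketch. The function name, the resource names, and the default threshold of 0.9 are assumptions made for illustration; the source states only that a utilization rate is compared against a preset reference threshold. The sketch also assumes that a single resource exceeding the threshold is enough to mark the dimension abnormal.

```python
from typing import Mapping


def judge_resource_state(utilization: Mapping[str, float],
                         reference_threshold: float = 0.9) -> str:
    """Return "abnormal" if any utilization rate exceeds the threshold, else "normal".

    `utilization` maps a hypothetical resource name (e.g. "cpu") to a rate in [0, 1].
    """
    exceeded = [name for name, used in utilization.items()
                if used > reference_threshold]
    return "abnormal" if exceeded else "normal"
```

A rate exactly equal to the threshold is treated as normal here, following the wording "not exceeding the reference threshold".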
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
parsing, from the state data of the different dimensions, the business data processed by the target service, judging whether the execution state of the business data is abnormal, and counting the number of abnormal executions;
setting the running state of the target service to abnormal in response to the number of abnormal executions exceeding a preset value;
and setting the running state of the target service to normal in response to the number of abnormal executions not exceeding the preset value.
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
parsing, from the state data of the different dimensions, the task logs output by the target service, judging whether the output of the task logs has timed out, and counting the number of timeouts;
setting the running state of the target service to abnormal in response to the number of timeouts of the target service exceeding a predetermined value;
and setting the running state of the target service to normal in response to the number of timeouts of the target service not exceeding the predetermined value.
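One way to read the task-log check is sketched below. The source does not say how a log timeout is detected, so this sketch assumes that a timeout is a gap between two consecutive log outputs that is longer than an allowed interval. The function names and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def count_log_timeouts(log_times: Sequence[datetime],
                       max_interval: timedelta) -> int:
    """Count gaps between consecutive log outputs that exceed max_interval."""
    return sum(1 for earlier, later in zip(log_times, log_times[1:])
               if later - earlier > max_interval)


def judge_log_dimension(log_times: Sequence[datetime],
                        max_interval: timedelta,
                        predetermined_value: int) -> str:
    """Return "abnormal" when the timeout count exceeds the predetermined value."""
    timeouts = count_log_timeouts(log_times, max_interval)
    return "abnormal" if timeouts > predetermined_value else "normal"
```

Counting timeouts before deciding, rather than reacting to the first late log line, keeps a single slow task from flagging the whole service.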
In some embodiments of the present invention, the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
acquiring the judgment results of the plurality of dimensions, counting the number of occurrences of each distinct result, and taking the most frequent result as the state of the target service.
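This voting step amounts to a majority count over the per-dimension results. A minimal sketch follows; the function name and the dimension labels are hypothetical, and the source does not say how ties are resolved, so the tie behaviour here is an assumption.

```python
from collections import Counter
from typing import Mapping


def vote_state(results: Mapping[str, str]) -> str:
    """Return the most frequent per-dimension result as the service state.

    `results` maps a dimension name to "normal" or "abnormal". On a tie,
    the result that was encountered first wins (an assumption of this sketch).
    """
    if not results:
        raise ValueError("at least one dimension result is required")
    counts = Counter(results.values())
    state, _ = counts.most_common(1)[0]
    return state
```

For example, with two dimensions judged abnormal and one judged normal, the target service is reported as abnormal.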
In some embodiments of the present invention, the step of, in response to the target service being judged abnormal, acquiring a repair scheme matching the abnormality and repairing the target service through the repair scheme comprises:
in response to the abnormality occurring, recording the operation of the target service at the time of the abnormality, together with all operations performed on the computer where the target service is located and their results;
judging, according to the operation results, whether each operation performed on the computer was effective, and taking the effective operations as the repair scheme for the corresponding abnormality.
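The recording-and-filtering idea can be sketched as follows. The record type, the `succeeded` flag, and the dictionary used as a store of repair schemes are all hypothetical; the source says only that operations and their results are recorded and that the effective operations become the repair scheme for that abnormality.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OperationRecord:
    command: str     # operation performed on the computer hosting the service
    succeeded: bool  # hypothetical flag: the operation's result showed it was effective


def learn_repair_scheme(knowledge_base: Dict[str, List[str]],
                        anomaly_key: str,
                        records: List[OperationRecord]) -> List[str]:
    """Keep only the effective operations and store them as the repair scheme."""
    effective = [record.command for record in records if record.succeeded]
    if effective:
        # Store the scheme only when at least one operation actually helped.
        knowledge_base[anomaly_key] = effective
    return effective
```

When the same abnormality recurs, a lookup such as `knowledge_base.get(anomaly_key)` would return the stored operations for replay.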
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. A cloud platform operation maintenance method, characterized by comprising the following steps:
acquiring state data of different dimensions of a target service;
judging the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and, in response to the target service being judged abnormal, acquiring a repair scheme matching the abnormality and repairing the target service through the repair scheme.
2. The method of claim 1, wherein the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions comprises:
calculating quantization scales of the quantifiable data of the plurality of dimensions according to their time relation;
in response to the judgment results of the plurality of dimensions being inconsistent, comparing whether the quantization scales of the quantified data corresponding to the plurality of dimensions are consistent, and judging whether the comparison result of the quantization scales is consistent with the judgment results of the plurality of dimensions;
in response to the comparison result of the quantization scales being consistent with the judgment results of the plurality of dimensions, marking as a misjudgment the judgment result of each dimension whose judgment result is abnormal;
and in response to the comparison result of the quantization scales being inconsistent with the judgment results of the plurality of dimensions, treating as correct the judgment result of each dimension whose judgment result is abnormal, and setting the target service as abnormal.
3. The method of claim 1, wherein the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions comprises:
parsing, from the state data of the different dimensions, the resource state data of the computer where the target service is located, and judging whether the utilization rate indicated by the resource state data exceeds a preset reference threshold;
setting the running state of the target service to abnormal in response to the utilization rate exceeding the reference threshold;
and setting the running state of the target service to normal in response to the utilization rate not exceeding the reference threshold.
4. The method of claim 1, wherein the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
parsing, from the state data of the different dimensions, the business data processed by the target service, judging whether the execution state of the business data is abnormal, and counting the number of abnormal executions;
setting the running state of the target service to abnormal in response to the number of abnormal executions exceeding a preset value;
and setting the running state of the target service to normal in response to the number of abnormal executions not exceeding the preset value.
5. The method of claim 1, wherein the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
parsing, from the state data of the different dimensions, the task logs output by the target service, judging whether the output of the task logs has timed out, and counting the number of timeouts;
setting the running state of the target service to abnormal in response to the number of timeouts of the target service exceeding a predetermined value;
and setting the running state of the target service to normal in response to the number of timeouts of the target service not exceeding the predetermined value.
6. The method of claim 1, wherein the step of determining the state of the target service from a plurality of dimensions based on the state data of the different dimensions further comprises:
acquiring the judgment results of the plurality of dimensions, counting the number of occurrences of each distinct result, and taking the most frequent result as the state of the target service.
7. The method according to claim 1, wherein the step of, in response to the target service being judged abnormal, acquiring a repair scheme matching the abnormality and repairing the target service through the repair scheme comprises:
in response to the abnormality occurring, recording the operation of the target service at the time of the abnormality, together with all operations performed on the computer where the target service is located and their results;
judging, according to the operation results, whether each operation performed on the computer was effective, and taking the effective operations as the repair scheme for the corresponding abnormality.
8. A cloud platform operation and maintenance system, comprising:
a data acquisition module configured to acquire state data of different dimensions of a target service;
a state judgment module configured to judge the state of the target service from a plurality of dimensions based on the state data of the different dimensions;
and a state repair module configured to, in response to the target service being judged abnormal, acquire a repair scheme matching the abnormality and repair the target service through the repair scheme.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method of any one of claims 1-7.
10. A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1-7.
CN202311193149.1A 2023-09-15 2023-09-15 Cloud platform operation maintenance method, system, equipment and medium Pending CN117240688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311193149.1A CN117240688A (en) 2023-09-15 2023-09-15 Cloud platform operation maintenance method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311193149.1A CN117240688A (en) 2023-09-15 2023-09-15 Cloud platform operation maintenance method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN117240688A true CN117240688A (en) 2023-12-15

Family

ID=89082067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311193149.1A Pending CN117240688A (en) 2023-09-15 2023-09-15 Cloud platform operation maintenance method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN117240688A (en)

Similar Documents

Publication Publication Date Title
CN108491305B (en) Method and system for detecting server fault
US8516499B2 (en) Assistance in performing action responsive to detected event
KR102522005B1 (en) Apparatus for VNF Anomaly Detection based on Machine Learning for Virtual Network Management and a method thereof
US9389946B2 (en) Operation management apparatus, operation management method, and program
CN111045894B (en) Database abnormality detection method, database abnormality detection device, computer device and storage medium
CN109586952B (en) Server capacity expansion method and device
CN112162878A (en) Database fault discovery method and device, electronic equipment and storage medium
CN107992410B (en) Software quality monitoring method and device, computer equipment and storage medium
WO2020220437A1 (en) Method for virtual machine software aging prediction based on adaboost-elman
CN110569166A (en) Abnormality detection method, abnormality detection device, electronic apparatus, and medium
US20120317069A1 (en) Throughput sustaining support system, device, method, and program
US20120174231A1 (en) Assessing System Performance Impact of Security Attacks
CN108334427B (en) Fault diagnosis method and device in storage system
CN108390793A (en) A kind of method and device of analysis system stability
CN114490078A (en) Dynamic capacity reduction and expansion method, device and equipment for micro-service
CN117061335A (en) Cloud platform equipment health management and control method and device, storage medium and electronic equipment
CN108268351B (en) Method and system for accurately monitoring process running state
CN109992408B (en) Resource allocation method, device, electronic equipment and storage medium
CN114328078A (en) Threshold dynamic calculation method and device and computer readable storage medium
CN115114124A (en) Host risk assessment method and device
CN110413482B (en) Detection method and device
WO2020044898A1 (en) Device status monitoring device and program
CN117240688A (en) Cloud platform operation maintenance method, system, equipment and medium
CN114297034B (en) Cloud platform monitoring method and cloud platform
CN112685390B (en) Database instance management method and device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination