CN110502390B

CN110502390B - Automatic operation and maintenance management system of colleges and universities cloud computing center

Info

Publication number: CN110502390B
Application number: CN201910611693.0A
Authority: CN
Inventors: 宋焘
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2019-07-08
Filing date: 2019-07-08
Publication date: 2021-06-01
Anticipated expiration: 2039-07-08
Also published as: CN110502390A

Abstract

The invention provides an automatic operation and maintenance management system for a cloud computing center in colleges and universities, comprising a monitoring system, an operation and maintenance system and a management system. The monitoring system monitors state information of the cloud computing center and judges whether an abnormal event has occurred. The operation and maintenance system analyzes the data of the abnormal event when the monitoring system detects one, and matches corresponding problem information from a problem library. The management system generates a corresponding operation and maintenance task according to the problem information, synchronizes the operation and maintenance task information to one or more user terminals corresponding to the task, and updates the task information according to feedback from the user terminals. The invention can find problems in the cloud computing center in time, allocate operation and maintenance roles at a fine granularity, trace operation and maintenance tasks effectively, and improve the operation and maintenance management of the cloud computing center.

Description

Automatic operation and maintenance management system of colleges and universities cloud computing center

Technical Field

The invention relates to the technical field of operation and maintenance of cloud computing centers, in particular to an automatic operation and maintenance management system of a cloud computing center in colleges and universities.

Background

With the rapid development of educational informatization, operation and maintenance management has become one of the main informatization tasks of colleges and universities. New technologies such as virtualization and cloud computing are gradually changing how information systems are built: the traditional computing platform is being replaced by the cloud computing platform, and the operating efficiency of the data center is improved. While these technologies bring benefits, they also bring new problems: the number, scale and complexity of the objects under operation and maintenance management have increased greatly. In the traditional 'who builds it, maintains it' mode, operation and maintenance is undertaken only by the operation and maintenance personnel, and outsiders cannot know, supervise or give feedback on the process. This mode clearly cannot meet the requirements of services, technologies and management.

Disclosure of Invention

Aiming at the problems, the invention aims to provide an automatic operation and maintenance management system of a cloud computing center in colleges and universities.

The purpose of the invention is realized by adopting the following technical scheme:

an automated operation and maintenance management system of a college cloud computing center, comprising: a monitoring system, an operation and maintenance system and a management system, wherein,

the monitoring system is used for monitoring the state information of the cloud computing center and judging whether the cloud computing center has an abnormal event or not;

the operation and maintenance system is used for analyzing the data of the abnormal event when the monitoring system detects the abnormal event and matching corresponding problem information from the problem library;

and the management system is used for generating a corresponding operation and maintenance task according to the problem information, synchronizing the operation and maintenance task information to one or more user terminals corresponding to the operation and maintenance task, and updating the operation and maintenance task information according to the feedback information of the user terminals.

In one embodiment, the operation and maintenance system is further used for automatically processing the operation and maintenance task, specifically:

calling and executing a corresponding operation and maintenance script from the knowledge base according to the problem information in the operation and maintenance task, so as to process the abnormal event.

In one embodiment, the state information includes computing resource information, network resource information, storage resource information, and the like of the cloud computing center;

in one embodiment, the operation and maintenance task information includes: abnormal event information, operation and maintenance task types, related personnel information, operation and maintenance task processing information and the like;

the abnormal event information comprises problem information of the abnormal event, the occurrence time of the abnormal event, the range influenced by the abnormal event and the like;

the operation and maintenance task types comprise automatic maintenance, manual maintenance, daily maintenance, fault treatment and the like;

the related personnel information comprises user information corresponding to the influence range of the abnormal event, agent maintenance personnel information of the influence range, operation and maintenance personnel information, administrator information and the like;

the operation and maintenance task processing information comprises a processing flow, a processing result, a processing log and the like.
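
As a minimal sketch, assuming Python dataclasses, the operation and maintenance task information listed above could be held in a record like the following. All class and field names are illustrative assumptions and do not appear in the original text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class TaskType(Enum):
    # The four task types named in the text.
    AUTOMATIC = "automatic maintenance"
    MANUAL = "manual maintenance"
    DAILY = "daily maintenance"
    FAULT = "fault treatment"


@dataclass
class AbnormalEvent:
    problem: str            # problem information of the abnormal event
    occurred_at: datetime   # occurrence time
    affected_scope: list    # range influenced by the event


@dataclass
class Personnel:
    users: list = field(default_factory=list)
    agent_maintainers: list = field(default_factory=list)
    operators: list = field(default_factory=list)
    administrators: list = field(default_factory=list)


@dataclass
class Processing:
    flow: list = field(default_factory=list)   # processing flow steps
    result: str = ""                           # processing result
    log: list = field(default_factory=list)    # processing log lines


@dataclass
class OpsTask:
    event: AbnormalEvent
    task_type: TaskType
    personnel: Personnel
    processing: Processing = field(default_factory=Processing)
```

Keeping the four groups as separate records mirrors the grouping in the text, so each role only needs to touch the part of the task it is responsible for.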

In one embodiment, the system further comprises a database module, wherein the database module further comprises an operation and maintenance task database, a history database, a problem base and a knowledge base;

the operation and maintenance task database is used for storing the operation and maintenance task data generated by the management system;

the historical database is used for recording historical state information of the cloud computing center, the processing process of abnormal events and processing result information, and is also used for synchronizing operation and maintenance scripts generated by operation records in the processing process into the knowledge base;

the problem library is used for storing common problems in the operation and maintenance process and abnormal event data characteristics corresponding to the problems;

and the knowledge base is used for storing the operation and maintenance script corresponding to the problem.

In one embodiment, the management system further comprises a work order module,

and the work order module is used for receiving the operation and maintenance feedback work order sent by the user and generating a corresponding operation and maintenance task according to the work order information, wherein the work order information comprises the problem information fed back by the user.

In one embodiment, the operation and maintenance system comprises: a question bank input module and a knowledge bank input module,

the problem bank input module is used for inputting the problems and the abnormal event data characteristics corresponding to the problems into the problem bank by operation and maintenance personnel;

and the knowledge base input module is used for inputting the operation and maintenance script and the problem information correspondingly solved by the script into the knowledge base by the operation and maintenance personnel.

In one embodiment, the monitoring system includes an anomaly monitoring module, wherein,

the anomaly monitoring module is used for monitoring the state information of the cloud computing center and judging whether the state information is abnormal or not, and specifically comprises the following steps:

acquiring state information of a cloud computing center within a period of time;

and inputting the state information in a period of time into the trained anomaly detection model, and acquiring the running state detection result output by the model.

The invention has the beneficial effects that: the monitoring system is arranged in the cloud computing center to monitor the state information of the cloud computing center, detect abnormal events and find problems in the cloud computing center in time; setting an operation and maintenance system to analyze the abnormal events and matching corresponding problem information of the abnormal events from a problem library; and the management system generates corresponding operation and maintenance tasks, realizes an automatic work order of the operation and maintenance tasks, realizes the fine distribution of operation and maintenance roles and the effective tracing of the operation and maintenance tasks, establishes a complete operation and maintenance system of the cloud computing center, and improves the operation and maintenance management effect of the cloud computing center.

Meanwhile, the operation and maintenance system can automatically process the operation and maintenance tasks, corresponding processing scripts are called according to different problems to process abnormal events, automatic monitoring and automatic operation and maintenance of the cloud computing center are achieved, and the workload of operation and maintenance personnel can be effectively reduced.

Drawings

The invention is further illustrated by means of the attached drawings, but the embodiments in the drawings do not constitute any limitation to the invention, and for a person skilled in the art, other drawings can be obtained on the basis of the following drawings without inventive effort.

Fig. 1 is a frame structure diagram of the present invention.

Reference numerals:

the monitoring system 1, the alarm module 11, the abnormality monitoring module 12, the abnormality detection model training unit 121, the operation and maintenance system 2, the automatic operation and maintenance module 21, the question bank entry module 22, the knowledge bank entry module 23, the management system 3, the task generation module 31, the work order module 32, the database module 4, the operation and maintenance task database 41, the history database 42, the question bank 43, and the knowledge bank 44

Detailed Description

The invention is further described in connection with the following application scenarios.

Referring to fig. 1, there is shown an automated operation and maintenance management system of a college cloud computing center, including: a monitoring system 1, an operation and maintenance system 2 and a management system 3, wherein,

the monitoring system 1 is used for monitoring the state information of the cloud computing center and judging whether the cloud computing center has an abnormal event or not;

the operation and maintenance system 2 is used for analyzing the data of the abnormal event when the monitoring system 1 detects the abnormal event, and matching corresponding problem information from the problem database 43;

and the management system 3 is used for generating a corresponding operation and maintenance task according to the problem information, synchronizing the operation and maintenance task information to one or more user terminals corresponding to the operation and maintenance task, and updating the operation and maintenance task information according to the feedback information of the user terminals.

According to the embodiment of the invention, the monitoring system 1 is arranged in the cloud computing center to monitor the state information of the cloud computing center, detect abnormal events and find problems in the cloud computing center in time; the operation and maintenance system 2 is set to analyze the abnormal events and match the corresponding problem information of the abnormal events from the problem database 43; and the management system 3 generates corresponding operation and maintenance tasks, realizes an automatic work order of the operation and maintenance tasks, realizes the fine distribution of operation and maintenance roles and the effective tracing of the operation and maintenance tasks, establishes a complete operation and maintenance system of the cloud computing center, and improves the operation and maintenance management effect of the cloud computing center.
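
The flow from monitoring to task update can be sketched as follows. This is only an illustration under stated assumptions: the fixed thresholds stand in for the trained anomaly detection model described later, and the problem names, metric keys and function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical problem library: problem name -> predicate over one state sample.
PROBLEM_LIBRARY = {
    "cpu_overload": lambda s: s["cpu"] > 0.90,
    "memory_exhaustion": lambda s: s["mem"] > 0.95,
}


@dataclass
class Task:
    problem: str
    recipients: list
    status: str = "open"
    feedback: list = field(default_factory=list)


def detect_anomaly(state: dict) -> bool:
    """Monitoring system: assumed threshold check, not the SOM model."""
    return state["cpu"] > 0.90 or state["mem"] > 0.95


def match_problem(state: dict) -> Optional[str]:
    """Operation and maintenance system: first matching library entry."""
    for name, matches in PROBLEM_LIBRARY.items():
        if matches(state):
            return name
    return None


def handle_sample(state: dict, recipients: list) -> Optional[Task]:
    """Management system: create a task and address it to the terminals."""
    if not detect_anomaly(state):
        return None
    problem = match_problem(state) or "unclassified"
    return Task(problem=problem, recipients=list(recipients))


def apply_feedback(task: Task, who: str, message: str, done: bool = False) -> Task:
    """Update the task information from a user terminal's feedback."""
    task.feedback.append((who, message))
    if done:
        task.status = "closed"
    return task
```

For example, `handle_sample({"cpu": 0.97, "mem": 0.40}, ["admin", "ops"])` yields an open task for `cpu_overload`, and a later `apply_feedback(..., done=True)` closes it.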

In one embodiment, the state information includes computing resource information, network resource information, storage resource information, and the like of the cloud computing center.

In one scenario, the monitoring system 1 mainly monitors the cloud computing center status information, including: CPU utilization, memory utilization, IO latency information.

In one embodiment, the operation and maintenance task types comprise automatic maintenance, manual maintenance, daily maintenance and the like.

In one scenario, after an abnormal event occurs and an operation and maintenance task is generated, the operation and maintenance task information is sent simultaneously to the terminals of the administrator, the user, the agent maintenance personnel and the operation and maintenance personnel, so that everyone related to the task is notified at the first opportunity and operation and maintenance efficiency is improved. The personnel related to the task can each perform their own operations on it, so that the task is processed by multiple roles at the same time (for example, the user supplements the problem description, the agent maintenance personnel directly perform simple operation and maintenance operations, the operation and maintenance personnel perform complex ones, and the administrator sets the operation and maintenance permission, progress, time limit and the like), which further improves the processing efficiency and effect of the operation and maintenance task.
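
The multi-role handling in this scenario can be illustrated with a small permission table. The role keys and action names below are assumptions chosen to mirror the examples in the parenthesis; the text itself does not define them.

```python
# Hypothetical mapping from role to the actions that role may take on a task.
ROLE_ACTIONS = {
    "user": {"add_description"},
    "agent_maintainer": {"add_description", "simple_operation"},
    "operator": {"simple_operation", "complex_operation"},
    "administrator": {"set_permission", "set_progress", "set_deadline"},
}


def notify_all(task_id: int, people: list) -> list:
    """Build one notification per (name, role) pair, sent at the same time."""
    return [f"task {task_id} -> {name} ({role})" for name, role in people]


def perform(role: str, action: str) -> str:
    """Allow an action only if the role is permitted to take it."""
    if action not in ROLE_ACTIONS.get(role, set()):
        raise PermissionError(f"{role} may not perform {action}")
    return f"{role} performed {action}"
```

A user calling `perform("user", "complex_operation")` is rejected, while an operator is allowed, which is one simple way to keep simple and complex operations with the intended roles.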

After the operation and maintenance task is completed, the operation and maintenance personnel or the agent maintenance personnel upload the processing result of the operation and maintenance task to the management system, and the management system updates the state information of the operation and maintenance task, so that a manager can master the progress of the operation and maintenance task according to the needs, count the workload condition of the operation and maintenance personnel, and make allocation flexibly.

The traditional 'who builds it, maintains it' mode has the problem that other users and managers cannot know the whole operation and maintenance process. To address this, the operation and maintenance roles are set at a fine granularity according to actual conditions, and different roles are added to an operation and maintenance task at the same time to process and manage it, so that the users and managers related to the task can know its progress and condition in real time. This makes the operation and maintenance information transparent and improves the quality of operation and maintenance management.

In one embodiment, the operation and maintenance system 2 is further used for automatically processing the operation and maintenance task, specifically:

calling and executing a corresponding operation and maintenance script from the knowledge base 44 according to the problem information in the operation and maintenance task, so as to process the abnormal event.

In one embodiment, the operation and maintenance system 2 further includes an automatic operation and maintenance module 21;

the automatic operation and maintenance module 21 is used for calling and executing the corresponding operation and maintenance script from the knowledge base 44 according to the problem information in the operation and maintenance task.

According to the embodiment of the invention, the operation and maintenance system 2 can also automatically process the operation and maintenance tasks, call the corresponding processing scripts to process the abnormal events according to different problems, realize automatic monitoring and automatic operation and maintenance of the cloud computing center, and effectively reduce the workload of the operation and maintenance personnel.
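
A minimal sketch of this automatic processing follows. It assumes the knowledge base maps a problem name to a Python callable standing in for an operation and maintenance script; the dictionary layout, the `restart_service` helper and the fallback to manual maintenance are illustrative assumptions.

```python
def restart_service(context: dict) -> str:
    """Stand-in for a stored script; a real script would act on the host."""
    return f"restarted {context['service']}"


# Hypothetical knowledge base: problem name -> script (here a callable).
KNOWLEDGE_BASE = {"service_down": restart_service}


def auto_process(task: dict) -> dict:
    """Run the script matching the task's problem, if the knowledge base has one."""
    script = KNOWLEDGE_BASE.get(task["problem"])
    if script is None:
        # No script yet: leave the task for manual handling (assumed fallback).
        task["type"] = "manual maintenance"
        return task
    task["result"] = script(task.get("context", {}))
    task["type"] = "automatic maintenance"
    task["status"] = "done"
    return task
```

A task whose problem has no script is left for manual handling, which matches the later embodiment in which operation and maintenance personnel process the event by hand.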

In one embodiment, the system further comprises a database module 4, which further comprises an operation and maintenance task database 41, a history database 42, a question bank 43 and a knowledge bank 44;

the operation and maintenance task database 41 is used for storing the operation and maintenance task data generated by the management system 3;

the historical database 42 is used for recording historical state information of the cloud computing center, processing procedures and processing result information of abnormal events, and is also used for synchronizing operation and maintenance scripts generated by operation records in the processing procedures into the knowledge base 44;

the problem library 43 is used for storing common problems in the operation and maintenance process and abnormal event data characteristics corresponding to the problems;

and the knowledge base 44 is used for storing the operation and maintenance script corresponding to the problem.
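
One possible layout for the four stores of the database module 4 is sketched below with Python's built-in `sqlite3` module. The table and column names are assumptions made for illustration; the text does not prescribe a storage engine or schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE problem_library (
    problem_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    features   TEXT NOT NULL
);
CREATE TABLE knowledge_base (
    script_id  INTEGER PRIMARY KEY,
    problem_id INTEGER NOT NULL REFERENCES problem_library(problem_id),
    script     TEXT NOT NULL
);
CREATE TABLE ops_tasks (
    task_id    INTEGER PRIMARY KEY,
    problem_id INTEGER REFERENCES problem_library(problem_id),
    status     TEXT NOT NULL DEFAULT 'open'
);
CREATE TABLE history (
    entry_id   INTEGER PRIMARY KEY,
    task_id    INTEGER REFERENCES ops_tasks(task_id),
    detail     TEXT NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding the four stores."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_problem(conn, name: str, features: str, script: str) -> int:
    """Enter a problem with its data characteristics and its script."""
    cur = conn.execute(
        "INSERT INTO problem_library (name, features) VALUES (?, ?)",
        (name, features),
    )
    conn.execute(
        "INSERT INTO knowledge_base (problem_id, script) VALUES (?, ?)",
        (cur.lastrowid, script),
    )
    return cur.lastrowid


def script_for(conn, name: str):
    """Look up the script stored for a named problem, or None."""
    row = conn.execute(
        "SELECT k.script FROM knowledge_base k "
        "JOIN problem_library p ON p.problem_id = k.problem_id "
        "WHERE p.name = ?",
        (name,),
    ).fetchone()
    return row[0] if row else None
```

Linking the knowledge base to the problem library by `problem_id` reflects the statement that each script corresponds to a problem.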

In one embodiment, when an abnormal event is detected, the operation and maintenance personnel can also process the abnormal event through manual operation, and the processing procedure and the processing result information of the manual operation are recorded in the historical database 42.

In one scenario, the operation and maintenance task database 41 stores data including historical operation and maintenance tasks that have been completed, as well as ongoing operation and maintenance tasks for the system and user to call when needed.

In the above embodiment of the present invention, the operation and maintenance task database 41 is configured to uniformly manage the generated operation and maintenance task data for a user to call when needed, and the historical database 42 stores the historical state information of the cloud computing center and the processing procedure and processing result information of the abnormal event, so that the active recording of the operation and maintenance procedure can be realized, and the traceability of the operation and maintenance information can be improved; meanwhile, the problem base 43 and the knowledge base 44 are established to sort the common problems and the corresponding solutions in the operation and maintenance process, and the operation and maintenance script is established in the knowledge base 44 to manage script resources, so that the efficiency of operation and maintenance work can be effectively improved, the automatic operation of repeated operation and maintenance operation is realized, and the work load of operation and maintenance personnel is effectively reduced.

In one embodiment, the management system 3 further comprises a task generating module 31,

the task generating module 31 is configured to generate a corresponding operation and maintenance task according to the problem information.

In one embodiment, the management system 3 further comprises a work order module 32,

and the work order module 32 is configured to receive an operation and maintenance feedback work order sent by a user, and generate a corresponding operation and maintenance task according to the work order information, where the work order information includes problem information fed back by the user.

In one embodiment, the operation and maintenance system 2 includes: a question bank entry module 22 and a knowledge bank entry module 23,

the question bank entry module 22 is used for the operation and maintenance personnel to enter the questions and the abnormal event data characteristics corresponding to the questions into the question bank 43;

and the knowledge base entry module 23 is used for the operation and maintenance personnel to enter the operation and maintenance script and the problem information correspondingly solved by the script into the knowledge base 44.

In the traditional operation and maintenance process, many of the problems encountered are repetitive, have low technical content, or do not even need the involvement of operation and maintenance personnel. These problems are continuously collected and sorted through the problem database 43, which keeps growing through the automatic discovery of abnormal events and the calibration of problems, laying a foundation for automatic operation and maintenance. Meanwhile, corresponding processing scripts are developed from the operation and maintenance process and the problems solved, so that the knowledge base 44 is continuously improved and the degree of automation keeps rising.

In one embodiment, the operation and maintenance personnel can also perform problem calibration on the abnormal event information from the historical database 42, and the calibrated abnormal event information and the corresponding problem information are synchronized into the problem database 43.

Similarly, the operation and maintenance personnel can calibrate the abnormal event handling process from the historical database 42, and the calibrated abnormal event handling process is generated into the operation and maintenance script and synchronized into the knowledge base 44 together with the calibrated problem information.

In the above embodiment of the present invention, when the operation and maintenance personnel process an abnormal event manually or for the first time, the historical database 42 records the operation and maintenance process. After the work is completed, the operation and maintenance personnel can manually calibrate the historical operation data and the abnormal event it solved, helping the operation and maintenance system 2 to record the operations, generate the corresponding operation and maintenance script and store it in the knowledge base 44, so that the operation and maintenance system 2 can process the same abnormal event automatically when it occurs again. In this way the operation and maintenance database is built up effectively, and the burden that repeated operations place on operation and maintenance personnel is reduced.
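
The record-then-calibrate loop described here might look like the sketch below. It assumes that manual operations are recorded as shell command strings and that a generated script is simply those commands in order; both the in-memory stores and the function names are hypothetical.

```python
# In-memory stand-ins for the historical database 42 and the knowledge base 44.
history = []
knowledge_base = {}


def record(event_id: str, command: str) -> None:
    """Record one manual operation performed while handling an event."""
    history.append({"event": event_id, "command": command})


def calibrate(event_id: str, problem: str) -> str:
    """Turn the recorded operations for an event into a script for a problem."""
    commands = [h["command"] for h in history if h["event"] == event_id]
    if not commands:
        raise ValueError(f"no recorded operations for {event_id}")
    script = "\n".join(["#!/bin/sh", "set -e", *commands]) + "\n"
    knowledge_base[problem] = script   # synchronize into the knowledge base
    return script
```

After calibration, the next occurrence of the same problem can be looked up in `knowledge_base` instead of being handled by hand.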

In one embodiment, the monitoring system 1 comprises an alarm module 11,

and the alarm module 11 is configured to send a corresponding alarm message to the user terminal when the abnormal event is detected.

In one embodiment, according to the area where the abnormal event occurs, the abnormal event information is sent to the management terminal corresponding to the area, and the corresponding operation and maintenance personnel can process the abnormal event.

According to the above embodiment of the invention, after the abnormal event is detected, the specified operation and maintenance personnel can be notified to process the abnormal event according to the area where the abnormal event occurs and the preset rules (such as the operation and maintenance role, the operation and maintenance responsibility, the operation and maintenance requirement and the like), so that the operation and maintenance role and the operation and maintenance task can be clearly allocated.
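
Area-based notification can be sketched as a lookup from area to terminal and personnel. The area names, terminal identifiers and the central fallback rule are assumptions for illustration only.

```python
# Hypothetical preset rules: area -> management terminal and responsible staff.
AREA_RULES = {
    "teaching-cluster": {"terminal": "mgmt-teaching", "operators": ["ops-a"]},
    "research-cluster": {"terminal": "mgmt-research", "operators": ["ops-b", "ops-c"]},
}
# Assumed fallback for areas without a rule of their own.
DEFAULT_RULE = {"terminal": "mgmt-central", "operators": ["ops-on-call"]}


def route_event(event: dict) -> dict:
    """Pick the terminal and personnel for the area where the event occurred."""
    rule = AREA_RULES.get(event["area"], DEFAULT_RULE)
    return {
        "terminal": rule["terminal"],
        "operators": list(rule["operators"]),
        "message": f"[{event['area']}] {event['problem']}",
    }
```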

In one embodiment, the monitoring system 1 comprises an anomaly monitoring module 12, wherein,

the anomaly monitoring module 12 is configured to monitor state information of the cloud computing center, and determine whether the state information is anomalous, specifically including:

and inputting the state information in a period of time into the trained anomaly detection model, and acquiring the operation state detection result output by the model.
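
Feeding the state information of a period of time to a model can be sketched with a fixed-length sliding window. The window length, the `cpu` metric key and the `mean_cpu_model` stand-in are assumptions; in the described system the model would be the trained anomaly detection model.

```python
from collections import deque


class WindowedMonitor:
    """Keep the most recent samples and pass a full window to the model."""

    def __init__(self, model, window: int = 5):
        self.model = model
        self.samples = deque(maxlen=window)

    def observe(self, sample: dict):
        """Add one sample; return the model output once the window is full."""
        self.samples.append(sample)
        if len(self.samples) < self.samples.maxlen:
            return None
        return self.model(list(self.samples))


def mean_cpu_model(window: list) -> str:
    """Stand-in model: flags the window when mean CPU utilization is high."""
    mean = sum(s["cpu"] for s in window) / len(window)
    return "abnormal" if mean > 0.9 else "normal"
```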

According to the embodiment of the invention, the state information of the cloud computing center is monitored by adopting the trained anomaly detection model through the machine learning technology, the running state of the cloud computing center can be intelligently detected, the anomaly condition is timely found, and the operation and maintenance efficiency and effect are improved.

In one embodiment, the anomaly monitoring module 12 further includes an anomaly detection model training unit 121: training an anomaly detection model based on the historical state information, wherein the anomaly detection model is modeled based on an SOM network; the abnormality detection model training unit 121 includes:

preparing a layer:

acquiring historical state information of the cloud computing center within a period of time as a training sample, and setting the historical state information vector g(t) = [g₁(t), g₂(t), …, g_m(t)] as an input vector, wherein m represents the dimension of the input vector and t represents the time sequence number of the input vector; initializing the input vector and the weight vector δ_xy(0) of each neuron, where x, y = 1, 2, …, S, x and y denote the position of a neuron in the SOM network, the dimension of the neuron weight vector δ_xy(0) is the same as that of the input vector g(t), and S denotes the size of the SOM neural network;

matching layer:

inputting an input vector in a training sample into an SOM network, acquiring the position of a neuron matched with the input vector, taking the Euclidean distance between the input vector and each neuron weight vector as a judgment basis, and selecting the corresponding neuron with the minimum Euclidean distance as the neuron matched with the input vector;

a training layer:

taking neurons matched with input vectors g (t) corresponding to the current time sequence number as a training region center phi, and acquiring an updating control factor of each neuron in a training region consisting of the training region center and a neighborhood thereof:

wherein the formula yields the update control factor of the neuron (x, y); I_Φ represents the coordinate vector of the training region center Φ in the SOM network; I(x, y) represents the coordinate vector of the neuron node (x, y) in the SOM network; λ(t) represents the update adjustment factor; d(t) represents the set neighborhood width, the training region being the region whose distance from the training region center is less than the set neighborhood width; and S_e represents the size of the SOM network;

updating each neuron in the training area according to the input vector, wherein the specifically adopted updating function is as follows:

wherein δ_xy(t) represents the weight vector of the neuron (x, y) at the current time index t, δ_xy(t−1) represents the weight vector of the neuron (x, y) at the previous time index t−1, g(t) represents the input vector at time index t, and the remaining factor is the update control factor of the neuron (x, y);

and after the training of the current time sequence number t is finished, starting the training of the next time sequence number t +1, and repeating the training of the training layer until the SOM network converges or exceeds the maximum time sequence number.
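
The three layers can be illustrated with a plain-Python SOM training pass. This is a sketch under stated assumptions, not the formulas of this document: the update control factor is taken to be the textbook Gaussian neighborhood λ(t)·exp(−‖I_Φ − I(x, y)‖² / (2·d(t)²)), the weight update is the standard δ_xy(t) = δ_xy(t−1) + h·(g(t) − δ_xy(t−1)), λ(t) and d(t) decay linearly, indices are zero-based, and training stops after a single pass over the samples rather than on a convergence test.

```python
import math
import random


def train_som(samples, size, lam0=0.5, seed=0):
    """Train a size x size SOM on a list of equal-length input vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Preparation layer: random initial weight vectors with the input dimension.
    w = [[[rng.random() for _ in range(dim)] for _ in range(size)]
         for _ in range(size)]
    total = len(samples)
    for t, g in enumerate(samples, start=1):
        # Matching layer: neuron with the minimum Euclidean distance to g(t).
        cx, cy = min(
            ((x, y) for x in range(size) for y in range(size)),
            key=lambda p: math.dist(g, w[p[0]][p[1]]),
        )
        # Assumed linear schedules for the update adjustment factor and width.
        frac = 1.0 - (t - 1) / total
        lam = lam0 * frac
        width = max(1.0, (size / 2) * frac)
        # Training layer: update neurons closer to the center than the width.
        for x in range(size):
            for y in range(size):
                grid = math.hypot(x - cx, y - cy)
                if grid >= width:
                    continue
                h = lam * math.exp(-grid ** 2 / (2 * width ** 2))
                w[x][y] = [wi + h * (gi - wi) for wi, gi in zip(w[x][y], g)]
    return w
```

Because each update is a convex combination of the old weight and the input, weights initialized in [0, 1] stay in [0, 1] when the inputs are scaled to that range, which is one reason to normalize the state information before training.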

In one embodiment, the training sample further includes running state detection information corresponding to the historical state information.

In one scenario, the running state detection information is either normal or abnormal, and the output result of the anomaly detection model is correspondingly normal or abnormal.

In another scenario, the running state detection information is either normal or the problem information corresponding to one of several abnormal events, and the result output by the anomaly detection model is correspondingly normal or one of the different kinds of problem information.
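
The text does not state how the running state detection information is attached to the trained network. One common approach, shown here purely as an assumption, is to label each neuron by majority vote of the training samples it matches and to classify a new input by the label of its matched neuron. The dictionary layout and function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def bmu(g, weights):
    """Position of the neuron whose weight vector is nearest to g."""
    return min(weights, key=lambda pos: math.dist(g, weights[pos]))


def label_neurons(weights, samples, labels):
    """Give each matched neuron the most frequent label among its samples."""
    votes = defaultdict(Counter)
    for g, label in zip(samples, labels):
        votes[bmu(g, weights)][label] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in votes.items()}


def classify(g, weights, neuron_labels, default="unknown"):
    """Return the label of the neuron matched by g, or a default."""
    return neuron_labels.get(bmu(g, weights), default)
```

Here `weights` maps a neuron position `(x, y)` to its weight vector; with labels of only normal and abnormal this covers the first scenario, and with per-problem labels it covers the second.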

In one scenario, the input vector consists of the state information at the same time.

In one scenario, the size of the SOM neural network is S × S, that is, it contains a total of S × S neuron nodes.

According to the embodiment of the invention, the anomaly detection model is trained by adopting the method, the historical running state information of the cloud computing center can be used as a training sample, the adaptability and the accuracy of model training are improved, meanwhile, the anomaly detection model is constructed by adopting the SOM network, the characteristic of huge data volume in the cloud computing center can be adapted, and the reliability of anomaly event detection is improved.

In one embodiment, in the anomaly detection model training unit 121, the set neighborhood width d (t) is obtained according to the following custom function:

where d(t) represents the neighborhood width at time index t, L represents the size of the SOM network, b represents the set control factor, λ(1) represents the update adjustment factor at time index t = 1, λ(t) represents the update adjustment factor at time index t, the remaining symbol represents the set neighborhood width adjustment factor, and t represents the time index of the input vector, where t = 1, 2, …, T and T represents the total number of input vectors in the training sample.

According to the embodiment of the invention, as the number of input vectors increases, the training of the anomaly detection model converges further and the neuron nodes in the SOM network become clearly differentiated. Reducing the neighborhood width appropriately as the number of iterations increases therefore improves the accuracy and stability of model training.

In one embodiment, in the anomaly detection model training unit 121, the updated adjustment factor λ (t) is obtained according to the following custom function:

in the formula, λ (t) represents an update adjustment factor at time index t.

According to the above embodiment of the invention, as the convergence degree in the training process of the anomaly detection model is higher and higher, the stability of the SOM network is improved, so that the updated adjustment factor is synchronously adjusted by adopting the above method, the condition of 'over-training' of the anomaly detection model can be avoided, and the stability of the training of the anomaly detection model is improved.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit its protection scope. Although the present invention is described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from their spirit and scope.

Claims

1. An automated operation and maintenance management system of a college cloud computing center is characterized by comprising: a monitoring system, an operation and maintenance system and a management system, wherein,

the operation and maintenance system is used for analyzing the data of the abnormal event when the monitoring system detects the abnormal event and matching corresponding problem information from a problem library;

the management system is used for generating a corresponding operation and maintenance task according to the problem information, synchronizing the operation and maintenance task information to one or more user terminals corresponding to the operation and maintenance task, and updating the operation and maintenance task information according to feedback information of the user terminals;

the monitoring system includes an anomaly monitoring module, wherein the anomaly monitoring module is used for:

acquiring state information of the cloud computing center within a period of time;

inputting the state information in the period of time into a trained anomaly detection model, and acquiring an operation state detection result output by the model;

the anomaly monitoring module further comprises an anomaly detection model training unit for training an anomaly detection model based on historical state information, wherein the anomaly detection model is built on an SOM network; the anomaly detection model training unit includes:

a preparation layer:

acquiring historical state information of the cloud computing center within a period of time as training samples, and setting the historical state information vector g(t) = [g₁(t), g₂(t), …, g_m(t)] as the input vector, wherein m represents the dimension of the input vector and t represents the time index of the input vector; initializing the input vector and the neuron weight vectors δ_xy(0), where x, y = 1, 2, …, S, x and y denote the position of a neuron in the SOM network, the dimension of each neuron weight vector δ_xy(0) is the same as that of the input vector g(t), and S represents the size of the SOM network;

matching layer:

inputting an input vector in a training sample into an SOM network, acquiring a neuron position matched with the input vector, taking Euclidean distance between the input vector and each neuron weight vector as a judgment basis, and selecting a corresponding neuron with the minimum Euclidean distance as a neuron matched with the input vector;

a training layer:

wherein,

represents the update control factor of neuron (x, y), I_Φ represents the coordinate vector of the training region center Φ in the SOM network, I(x, y) represents the coordinate vector of neuron node (x, y) in the SOM network, λ(t) represents the update adjustment factor, d(t) represents the set neighborhood width, the training region being the region whose distance from the training region center is less than the set neighborhood width, and S represents the size of the SOM network;

wherein δ_xy(t) represents the weight vector of neuron (x, y) at the current time index t, δ_xy(t-1) represents the weight vector of neuron (x, y) at the previous time index t-1, g(t) represents the input vector at time index t, and

represents the update control factor of neuron (x, y);

after the training for the current time index t is finished, starting the training for the next time index t+1, and repeating the training of the training layer until the SOM network converges or the maximum time index is exceeded;

in the anomaly detection model training unit, the set neighborhood width d (t) is obtained according to the following custom function:

where d(t) represents the neighborhood width at time index t, S represents the size of the SOM network, b represents a set control factor, λ(1) represents the update adjustment factor at time index t = 1, λ(t) represents the update adjustment factor at time index t,

2. The automated operation and maintenance management system according to claim 1, further comprising:

automatically processing the operation and maintenance task, which specifically comprises:

calling and executing a corresponding operation and maintenance script from a knowledge base according to the problem information in the operation and maintenance task, so as to process the abnormal event.

3. The automated operation and maintenance management system according to claim 1, wherein the status information comprises computing resource information, network resource information and storage resource information of the cloud computing center; and/or,

the operation and maintenance task information comprises: abnormal event information, operation and maintenance task types, related personnel information and operation and maintenance task processing information;

the abnormal event information comprises problem information of an abnormal event, the occurrence time of the abnormal event and the range influenced by the abnormal event;

the operation and maintenance task types comprise automatic maintenance and manual maintenance;

the related personnel information comprises user information corresponding to an influence range of an abnormal event, agent maintenance personnel information of the influence range, operation maintenance personnel information and administrator information;

the operation and maintenance task processing information comprises a processing flow, a processing result and a processing log.

4. The automated operation and maintenance management system according to claim 2, further comprising a database module, the database module comprising an operation and maintenance task database, a historical database, the problem library and the knowledge base;

the historical database is used for recording historical state information of the cloud computing center and the processing procedures and processing results of abnormal events, and is also used for synchronizing operation and maintenance scripts generated from the operation records made during processing into the knowledge base;

5. The automated operation and maintenance management system according to claim 1, wherein the management system further comprises a work order module,

the work order module is used for receiving an operation and maintenance feedback work order sent by a user and generating a corresponding operation and maintenance task according to the work order information, wherein the work order information comprises problem information fed back by the user.

6. The automated operation and maintenance management system according to claim 2, wherein the operation and maintenance system comprises: a problem library input module and a knowledge base input module,

7. The automated operation and maintenance management system according to claim 1, wherein the monitoring system comprises an alarm module,

and the alarm module is used for sending a corresponding alarm message to the user terminal when the abnormal event is detected.