CN115208742A - Intelligent operation and maintenance management method and system - Google Patents

Intelligent operation and maintenance management method and system

Info

Publication number
CN115208742A
Authority
CN
China
Prior art keywords
self
healing
platform
maintenance
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210789759.7A
Other languages
Chinese (zh)
Other versions
CN115208742B (en)
Inventor
陈磊
文建全
李畅
付彪
张强
马云露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Trasen Technology Co ltd
Original Assignee
Hunan Trasen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Trasen Technology Co ltd
Priority to CN202210789759.7A
Publication of CN115208742A
Application granted
Publication of CN115208742B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an intelligent operation and maintenance management method and system. The method performs self-healing repair of fault events occurring on a service platform. Specifically, a supervision platform queries for a fault event, obtains the fault information corresponding to the fault event, and determines a corresponding self-healing program based on the fault information; a self-healing platform then starts the self-healing program to repair the service platform. Fault events on the service platform are thereby repaired automatically. Compared with traditional manual operation and maintenance, this greatly reduces operation and maintenance difficulty and saves manpower and material resources, thereby improving operation and maintenance efficiency; and because manual operation on the service platform by operation and maintenance personnel is reduced, the safety risk is also reduced.

Description

Intelligent operation and maintenance management method and system
Technical Field
The invention relates to the technical field of operation and maintenance, in particular to an intelligent operation and maintenance management method and system.
Background
Information Technology (IT) is a generic term for the various technologies used mainly to manage and process information. It mainly applies computer science and communication technology to design, develop, install and implement information systems and application software. Information technology is also often referred to as Information and Communication Technology (ICT). Research on information technology covers science, technology, engineering, and management.
Applications of information technology include computer hardware and software, network and communication technology, application software development tools, and the like. In enterprises, schools, and other organizations, the information technology architecture is an integrated structure that employs and develops information technology to achieve strategic goals. It has a management component and a technical component. The management component comprises mission, function and information requirements, system configuration and information flow; the technical component comprises the information technology standards, rules, and the like used to implement the management architecture.
IT operation and maintenance refers to the operation, management and maintenance of IT facilities and business systems to ensure that users' network devices and business systems run normally. Stable and effective IT operation and maintenance is especially important for customers in many industries, such as telecommunications, power, education, service organizations, finance and banking, medical treatment, transportation, and government.
The traditional IT operation and maintenance mode depends on the skills and manual operations of operation and maintenance personnel, who must switch manually among different systems and platforms. Because operation and maintenance teams are small and the workload is large, the traditional manual mode makes it hard to improve operation and maintenance efficiency and makes troubleshooting difficult; in addition, manual operation by operation and maintenance personnel carries a higher safety risk.
Disclosure of Invention
The invention mainly aims to provide an intelligent operation and maintenance management method and system, and aims to solve the problems that operation and maintenance efficiency is difficult to improve, troubleshooting difficulty is high, and safety risk is high in a traditional manual mode.
The technical scheme provided by the invention is as follows:
an intelligent operation and maintenance management method is applied to an intelligent operation and maintenance management system; the system comprises a supervision platform, a service platform and a self-healing platform; the supervision platform is in communication connection with the service platform and the self-healing platform; the self-healing platform is in communication connection with the service platform; the service platform runs a service program; the self-healing platform stores a self-healing program; the self-healing program is a preset program for repairing a fault event; the method comprises the following steps:
the supervision platform inquires whether a fault event occurs when the service platform runs a service program;
if so, the supervision platform acquires fault information corresponding to the fault event;
the supervision platform acquires a corresponding self-healing program based on the fault information and marks the self-healing program as a target program;
and the self-healing platform starts the target program to repair the service platform and generates a repair result.
Preferably, the fault information includes a fault description; the supervision platform comprises a first database; the first database stores self-healing information, wherein the self-healing information comprises a fault description and a corresponding self-healing name, and the self-healing name is the name of a self-healing program; the step in which the supervision platform acquires a corresponding self-healing program based on the fault information and marks the self-healing program as a target program comprises the following steps:
the supervision platform acquires the fault description in the fault information;
the supervision platform retrieves whether the self-healing information containing the fault description exists in the first database;
if so, the supervision platform marks the self-healing information containing the fault description as target information;
and the supervision platform marks the self-healing program corresponding to the self-healing name in the target information as a target program.
Preferably, the system further comprises an operation and maintenance client in communication connection with the supervision platform; after the supervision platform retrieves whether the self-healing information containing the fault description exists in the first database, the method further comprises:
if not, the supervision platform sends the fault information to the operation and maintenance client.
Preferably, after the supervision platform sends the fault information to the operation and maintenance client, the method further comprises:
the operation and maintenance client displays the fault information;
the operation and maintenance client side judges whether a self-healing name input by a user is acquired;
if so, the operation and maintenance client marks the self-healing name input by the user as an input name;
the operation and maintenance client sends the input name to the supervision platform;
and the supervision platform marks the self-healing program corresponding to the input name as a target program.
Preferably, the fault information further includes a fault location; the self-healing platform starts the target program to repair the service platform and generates a repair result, including:
the supervision platform sends the fault position to the self-healing platform;
the self-healing platform starts the target program to repair the fault position of the service platform;
and the self-healing platform generates a repairing result.
Preferably, after the self-healing platform generates a repair result, the method further comprises:
and displaying the repair result through the operation and maintenance client, wherein the repair result comprises repair success and repair failure.
Preferably, the operation and maintenance client is in communication connection with the self-healing platform; after the operation and maintenance client judges whether a self-healing name input by the user is acquired, the method further comprises the following steps:
if not, the operation and maintenance client obtains a written program input by the user, wherein the written program is a program written and input by the user for repairing the fault event;
the operation and maintenance client sends the written program to the self-healing platform;
and the self-healing platform starts the written program to repair the service platform and generates a repair result.
Preferably, the fault information further includes a fault level; the step in which the supervision platform sends the fault information to the operation and maintenance client comprises:
the supervision platform acquires the fault level in the fault information and generates a corresponding processing time limit based on the fault level, wherein the fault level comprises, from low to high, an ordinary level, a serious level and a severe level, and the lower the fault level, the longer the corresponding processing time limit;
the supervision platform sends the fault information and the corresponding processing time limit to the operation and maintenance client;
and the operation and maintenance client displays the processing time limit.
Preferably, the operation and maintenance client is used for logging in to different operation and maintenance accounts; each operation and maintenance account is preset with a unique operation and maintenance authority, and the operation and maintenance authority comprises, from low to high, a common level, a management level and a master supervision level; after the operation and maintenance client displays the processing time limit, the method further comprises:
the operation and maintenance client judges whether a written program corresponding to the fault information and input by the user is obtained within the processing time limit;
if so, the operation and maintenance client sends the written program to the self-healing platform;
the self-healing platform starts the written program to repair the service platform and generates a repair result;
if not, the operation and maintenance client judges whether a reporting instruction input by the user is obtained;
when a reporting instruction input by the user is obtained, the operation and maintenance client sends the fault information to an operation and maintenance client that is logged in to an operation and maintenance account with a higher operation and maintenance authority than the currently logged-in operation and maintenance account.
The invention also provides an intelligent operation and maintenance management system, which applies the above intelligent operation and maintenance management method; the system comprises a supervision platform, a service platform and a self-healing platform; the supervision platform is in communication connection with the service platform and the self-healing platform; the self-healing platform is in communication connection with the service platform; the service platform runs a service program; the self-healing platform stores a self-healing program; the self-healing program is a preset program for repairing a fault event.
The above technical solution achieves the following beneficial effects:
The intelligent operation and maintenance management method provided by the invention performs self-healing repair of fault events occurring on the service platform. Specifically, a supervision platform queries for a fault event, obtains the fault information corresponding to the fault event, and determines a corresponding self-healing program based on the fault information; a self-healing platform then starts the self-healing program to repair the service platform. Fault events on the service platform are thereby repaired automatically. Compared with traditional manual operation and maintenance, this greatly reduces operation and maintenance difficulty and saves manpower and material resources, thereby improving operation and maintenance efficiency; and because manual operation on the service platform by operation and maintenance personnel is reduced, the safety risk is also reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown in these drawings without creative effort.
Fig. 1 is a flowchart of a first embodiment of an intelligent operation and maintenance management method according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent operation and maintenance management method.
As shown in Fig. 1, in a first embodiment of the intelligent operation and maintenance management method provided by the present invention, the method is applied to an intelligent operation and maintenance management system. The system comprises a supervision platform, a service platform and a self-healing platform; the supervision platform is in communication connection with the service platform and the self-healing platform; the self-healing platform is in communication connection with the service platform; the service platform runs a service program; the self-healing platform stores a self-healing program. In this embodiment, the self-healing program is built by operation and maintenance personnel from long-term accumulated experience: they automate most of the routine alarm troubleshooting and fault recovery steps into a self-healing program, so that when the service platform fails, the self-healing platform automatically executes the self-healing program to repair the service platform. This embodiment comprises the following steps:
Step S110: the supervision platform queries whether a fault event occurs when the service platform runs a service program.
Specifically, the supervision platform is a platform for querying or receiving fault events of the service platform. In this embodiment, the supervision platform is developed based on a whale operation and maintenance platform; in practical applications, it may also be a platform developed in other ways.
Fault events include, but are not limited to, service anomalies and health anomalies. Common service anomalies include: service anomalies caused by network or Internet Data Center (IDC) anomalies, service anomalies caused by performance problems in key modules, service anomalies caused by host hardware or system anomalies, false service anomalies caused by invalid error notifications, and the like. Of these, service anomalies caused by host hardware or system anomalies occur most often.
Health anomalies are found from the collected system indicators (such as the temperature of each component in the server, the fan speed of the radiator, the mainboard current and the mainboard voltage). Each indicator is compared with its reference value to determine whether an anomaly exists, and such an anomaly may be regarded as a fault event.
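The indicator comparison described above can be sketched as follows. This is a minimal illustration only: the indicator names, the reference ranges in `LIMITS`, and the function `find_health_anomalies` are hypothetical and do not come from the patent, which does not specify how reference values are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorLimit:
    """Allowed range for one collected indicator (assumed representation)."""
    low: float
    high: float


# Hypothetical reference ranges; real limits depend on the hardware in use.
LIMITS = {
    "cpu_temp_c": IndicatorLimit(10.0, 85.0),
    "fan_rpm": IndicatorLimit(800.0, 6000.0),
    "board_voltage_v": IndicatorLimit(11.4, 12.6),
}


def find_health_anomalies(readings: dict[str, float]) -> list[str]:
    """Return the names of indicators whose reading is outside its range."""
    anomalies = []
    for name, value in readings.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no reference value is known for this indicator
        if not (limit.low <= value <= limit.high):
            anomalies.append(name)
    return anomalies
```

Each name returned by `find_health_anomalies` could then be treated as a candidate fault event.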
If so, step S120 is executed: the supervision platform acquires fault information corresponding to the fault event.
Specifically, the fault information includes a fault description, a fault type, a fault level and a fault location; together these give the details of the fault event. The fault location identifies the failed machine, for example by its IP address or MAC address, and may also be a logical disk on the failed machine, such as the C drive or the D drive. The fault type is the type of the fault event, namely the service anomaly or health anomaly described above. The fault description is a summary of the fault event. For example, if the fault event shows that the C drive of a computer has a maximum capacity of 100 GB and currently holds 99.5 GB, the fault location is the C drive and the fault description is "insufficient disk space".
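A fault-information record with the four fields named above might be modelled as in the following sketch, which also reproduces the disk example. The class `FaultInfo`, the function `check_disk`, the 95 % usage threshold, and the type and level assigned to the disk fault are all assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultInfo:
    description: str  # summary of the fault event
    fault_type: str   # "service anomaly" or "health anomaly"
    level: str        # fault level
    location: str     # failed machine and, optionally, a drive on it


def check_disk(host_ip: str, drive: str, capacity_gb: float,
               used_gb: float, threshold: float = 0.95) -> Optional[FaultInfo]:
    """Return a FaultInfo when the drive is nearly full, otherwise None.

    The threshold, fault type and level below are assumed values.
    """
    if used_gb / capacity_gb < threshold:
        return None
    return FaultInfo(
        description="insufficient disk space",
        fault_type="health anomaly",
        level="ordinary",
        location=f"{host_ip}:{drive}",
    )
```

With the figures from the example, `check_disk("10.0.0.5", "C", 100.0, 99.5)` yields a record whose description is "insufficient disk space" and whose location names the C drive.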
Step S130: the supervision platform acquires a corresponding self-healing program based on the fault information and marks the self-healing program as a target program.
Specifically, the self-healing program for repairing the fault is determined directly from the fault information.
Step S140: the self-healing platform starts the target program to repair the service platform and generates a repair result.
That is, the target program is started to repair the service platform and a repair result is generated, thereby achieving self-healing repair.
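Steps S110 to S140 can be sketched end to end as below. This is a simplified, single-process illustration under stated assumptions: the three platforms are reduced to plain functions and dictionaries, the fault check is a disk-usage test with an assumed 95 % threshold, and every name (`query_fault`, `select_target_program`, `run_cycle`, `clear_temp_files`) is hypothetical.

```python
from typing import Callable, Optional


def clear_temp_files(fault: dict) -> bool:
    """Stand-in for a preset self-healing program; always reports success."""
    return True


# Self-healing platform: preset programs keyed by name (assumed layout).
PROGRAMS: dict[str, Callable[[dict], bool]] = {
    "clear_temp_files": clear_temp_files,
}

# Supervision platform: fault description -> self-healing name (assumed).
SELF_HEALING_INFO = {"insufficient disk space": "clear_temp_files"}


def query_fault(service_state: dict) -> Optional[dict]:
    """S110/S120: detect a fault event and build its fault information."""
    if service_state["used_gb"] / service_state["capacity_gb"] >= 0.95:
        return {"description": "insufficient disk space",
                "location": service_state["drive"]}
    return None


def select_target_program(fault: dict) -> Optional[str]:
    """S130: pick the self-healing program that matches the fault."""
    return SELF_HEALING_INFO.get(fault["description"])


def run_cycle(service_state: dict) -> str:
    """Run one supervision cycle and return the repair result (S140)."""
    fault = query_fault(service_state)
    if fault is None:
        return "no fault"
    target = select_target_program(fault)
    if target is None:
        return "no self-healing program"
    succeeded = PROGRAMS[target](fault)
    return "repair success" if succeeded else "repair failure"
```

In a real deployment the three platforms communicate over a network; the function calls here only stand in for that communication.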
The intelligent operation and maintenance management method provided by the invention performs self-healing repair of fault events occurring on the service platform. Specifically, a supervision platform queries for a fault event, obtains the fault information corresponding to the fault event, and determines a corresponding self-healing program based on the fault information; a self-healing platform then starts the self-healing program to repair the service platform. Fault events on the service platform are thereby repaired automatically. Compared with traditional manual operation and maintenance, this greatly reduces operation and maintenance difficulty and saves manpower and material resources, thereby improving operation and maintenance efficiency; and because manual operation on the service platform by operation and maintenance personnel is reduced, the safety risk is also reduced.
In a second embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the first embodiment, the fault information includes a fault description; the supervision platform comprises a first database; the first database stores self-healing information, wherein the self-healing information comprises a fault description and a corresponding self-healing name, and the self-healing name is the name of a self-healing program. Step S130 comprises the following steps:
Step S210: the supervision platform acquires the fault description in the fault information.
For example, the fault description in the fault information is "insufficient disk space".
Step S220: the supervision platform retrieves whether self-healing information containing the fault description exists in the first database.
Specifically, the first database is searched for self-healing information containing "insufficient disk space".
If so, step S230 is executed: the supervision platform marks the self-healing information containing the fault description as target information.
Specifically, the self-healing information containing "insufficient disk space" in the first database is marked as target information.
Step S240: the supervision platform marks the self-healing program corresponding to the self-healing name in the target information as a target program.
Specifically, the self-healing program capable of repairing "insufficient disk space" faults is acquired and marked as the target program.
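The lookup in steps S210 to S240 can be sketched with Python's built-in `sqlite3` module. The table name `self_healing_info`, its two columns, the exact-match query, and the program name `clear_temp_files` are assumptions for illustration; the patent does not describe the schema of the first database.

```python
import sqlite3
from typing import Optional


def open_first_database() -> sqlite3.Connection:
    """Create an in-memory stand-in for the first database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE self_healing_info ("
        " fault_description TEXT PRIMARY KEY,"
        " self_healing_name TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO self_healing_info VALUES (?, ?)",
        ("insufficient disk space", "clear_temp_files"),
    )
    conn.commit()
    return conn


def find_target_name(conn: sqlite3.Connection,
                     fault_description: str) -> Optional[str]:
    """S220-S240: return the self-healing name for a description, or None."""
    row = conn.execute(
        "SELECT self_healing_name FROM self_healing_info"
        " WHERE fault_description = ?",
        (fault_description,),
    ).fetchone()
    return row[0] if row else None
```

A `None` result corresponds to the "if not" branch, in which no matching self-healing information exists.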
In a third embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the second embodiment, the system further includes an operation and maintenance client in communication connection with the supervision platform (the operation and maintenance client may be a PC terminal or a mobile phone terminal, and is the client operated by operation and maintenance personnel). After step S220, the method further comprises the following step:
If not, step S310 is executed: the supervision platform sends the fault information to the operation and maintenance client.
Specifically, a negative result means that no self-healing information containing the fault description exists in the first database; that is, no self-healing program capable of solving the fault event has been found, and the fault cannot be repaired by self-healing. Operation and maintenance personnel therefore need to repair the fault event manually, and the supervision platform sends the fault information directly to the operation and maintenance client.
In a fourth embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the third embodiment, the method further comprises the following steps after step S310:
Step S410: the operation and maintenance client displays the fault information.
Step S420: the operation and maintenance client judges whether a self-healing name input by the user is acquired.
If so, step S430 is executed: the operation and maintenance client marks the self-healing name input by the user as an input name.
Step S440: the operation and maintenance client sends the input name to the supervision platform.
Step S450: the supervision platform marks the self-healing program corresponding to the input name as a target program.
In this embodiment, operation and maintenance personnel manually input the name of a self-healing program, thereby manually selecting a self-healing program capable of solving the fault event, so that the self-healing platform can start that program to repair the service platform. This embodiment addresses the case in which the supervision platform cannot find a self-healing program corresponding to the fault event although the self-healing platform actually holds one capable of solving it.
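The fallback from automatic lookup to an operator-supplied name (steps S220, S310 and S410 to S450) can be sketched as below. The function `resolve_target`, the name sets, and the check that the input name is known to the self-healing platform are assumptions; the patent does not say how an unknown input name is handled.

```python
from typing import Optional

# Names of programs held by the self-healing platform (assumed).
PROGRAM_NAMES = {"clear_temp_files", "restart_service"}

# First database: fault description -> self-healing name (assumed).
SELF_HEALING_INFO = {"insufficient disk space": "clear_temp_files"}


def resolve_target(description: str,
                   operator_input: Optional[str]) -> Optional[str]:
    """Return the name of the target program, or None if none is selected."""
    name = SELF_HEALING_INFO.get(description)  # S220: automatic lookup
    if name is not None:
        return name
    # S310/S410: the fault information would be sent to and shown on the
    # operation and maintenance client; operator_input models S420.
    if operator_input in PROGRAM_NAMES:  # S430-S450: accept the input name
        return operator_input
    return None
```

A `None` result here corresponds to the case where the operator supplies no self-healing name.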
In a fifth embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the second embodiment, the fault information further includes a fault location. Step S140 comprises the following steps:
Step S510: the supervision platform sends the fault location to the self-healing platform.
Step S520: the self-healing platform starts the target program to repair the fault location of the service platform.
Step S530: the self-healing platform generates a repair result.
The purpose of this embodiment is to run the self-healing program against the fault location, so that the fault event is repaired precisely.
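Steps S510 to S530 can be sketched as a function that runs a named program against a fault location and reports one of the two repair results. The program `clear_temp_files`, its rule for which locations it can repair, and the decision to treat an exception as a failed repair are assumptions for illustration.

```python
from typing import Callable


def clear_temp_files(location: str) -> bool:
    """Stand-in repair that only knows how to clean drives C and D."""
    return location in {"C", "D"}


# Programs stored on the self-healing platform (assumed layout).
PROGRAMS: dict[str, Callable[[str], bool]] = {
    "clear_temp_files": clear_temp_files,
}


def repair_at(target_name: str, fault_location: str) -> str:
    """S520/S530: run the target program on the location, return the result."""
    program = PROGRAMS[target_name]
    try:
        succeeded = program(fault_location)
    except Exception:
        succeeded = False  # assumed: a crashing program counts as a failure
    return "repair success" if succeeded else "repair failure"
```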
In a sixth embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the fifth embodiment, the method further comprises the following step after step S530:
Step S610: the repair result is displayed through the operation and maintenance client, wherein the repair result is either repair success or repair failure.
The purpose of this embodiment is to send the self-healing repair result to the operation and maintenance client for display, so that operation and maintenance personnel learn the result of the self-healing program promptly.
In a seventh embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the sixth embodiment, the operation and maintenance client is in communication connection with the self-healing platform. After step S420, the method further comprises the following steps:
If not, step S710 is executed: the operation and maintenance client obtains a written program input by the user, wherein the written program is a program written and input by the user for repairing the fault event.
Step S720: the operation and maintenance client sends the written program to the self-healing platform.
Step S730: the self-healing platform starts the written program to repair the service platform and generates a repair result.
Specifically, when the self-healing platform holds no self-healing program capable of solving the fault event, operation and maintenance personnel handle the fault manually: they write a self-healing program capable of solving the fault event and upload it to the self-healing platform, which then runs the manually written program to repair the service platform and generates a repair result.
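Running an operator-written program (step S730) might look like the sketch below. It assumes, purely for illustration, that the written program is Python source text, that it runs in a separate interpreter process, and that exit code 0 means the repair succeeded; the patent specifies none of these, and `run_written_program` is a hypothetical name.

```python
import subprocess
import sys


def run_written_program(source: str, timeout_s: float = 30.0) -> str:
    """Run operator-supplied source in a child process, return the result."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "repair failure"  # assumed: a hung program counts as a failure
    return "repair success" if completed.returncode == 0 else "repair failure"
```

Running the program in a child process with a timeout keeps a faulty written program from blocking the self-healing platform itself; a production system would also need to restrict what such a program is allowed to do.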
In an eighth embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the seventh embodiment, the method further comprises the following steps after step S730:
Step S810: the supervision platform acquires the repair result and judges whether the repair succeeded.
If so, step S820 is executed: the self-healing platform stores the written program.
Step S830: the supervision platform acquires the name of the written program and marks it as the written name.
Step S840: the supervision platform acquires the fault description in the fault information corresponding to the fault event repaired by the written program and marks it as the description to be added.
Step S850: the supervision platform establishes a correspondence between the written name and the description to be added, and then stores both in the first database.
The purpose of this embodiment is as follows: if a manually written self-healing program successfully repairs the fault event, a correspondence between that program and the fault description of the fault event is established and stored in the first database, so that the manually written program can be called automatically for self-healing repair in the future.
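Steps S810 to S850 can be sketched as a single registration function. Plain dictionaries stand in for the program store of the self-healing platform and for the first database, and the name `register_written_program` is hypothetical.

```python
def register_written_program(repair_succeeded: bool,
                             written_name: str,
                             written_program: str,
                             fault_description: str,
                             programs: dict[str, str],
                             first_database: dict[str, str]) -> bool:
    """Store a written program and its mapping only after a successful repair."""
    if not repair_succeeded:  # S810: nothing is stored on failure
        return False
    programs[written_name] = written_program          # S820
    first_database[fault_description] = written_name  # S830-S850
    return True
```

After registration, a later fault with the same description resolves to the written program through the ordinary lookup, without operator involvement.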
In a ninth embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the third embodiment, the fault information further includes a fault level. Step S310 comprises the following steps:
Step S910: the supervision platform acquires the fault level in the fault information and generates a corresponding processing time limit based on the fault level, wherein the fault level comprises, from low to high, an ordinary level, a serious level and a severe level, and the lower the fault level, the longer the corresponding processing time limit.
Step S920: the supervision platform sends the fault information and the corresponding processing time limit to the operation and maintenance client.
Step S930: the operation and maintenance client displays the processing time limit.
Specifically, this embodiment addresses the case in which operation and maintenance personnel must repair the fault event manually. The supervision platform generates a processing time limit according to the fault level of the fault event: the lower the fault level, the smaller the adverse effect of the fault event, the more time operation and maintenance personnel can be allowed for handling it, and the longer the corresponding processing time limit. The processing time limit given to operation and maintenance personnel is thus better matched to the fault.
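The mapping from fault level to processing time limit (step S910) can be sketched as a lookup table. The level names `LOW`, `MEDIUM` and `HIGH` are placeholders for the three levels ordered from low to high, and the durations are invented for illustration; the patent only requires that a lower level gets a longer limit.

```python
from datetime import timedelta
from enum import IntEnum


class FaultLevel(IntEnum):
    """Three fault levels ordered from low to high (placeholder names)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed durations; the only constraint is lower level -> longer limit.
TIME_LIMITS = {
    FaultLevel.LOW: timedelta(hours=24),
    FaultLevel.MEDIUM: timedelta(hours=4),
    FaultLevel.HIGH: timedelta(hours=1),
}


def processing_time_limit(level: FaultLevel) -> timedelta:
    """S910: return the processing time limit for a fault level."""
    return TIME_LIMITS[level]
```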
In a tenth embodiment of the intelligent operation and maintenance management method provided by the present invention, based on the ninth embodiment, the operation and maintenance client is configured to log in different operation and maintenance accounts; each operation and maintenance account is preset with a unique operation and maintenance authority, and the operation and maintenance authority comprises a common level, a management level and a master supervision level from low to high; step S930, which further includes the following steps:
Step S1010: the operation and maintenance client judges whether a written program corresponding to the fault information and input by the user is obtained within the processing time limit.
If yes, go to step S1020: the operation and maintenance client sends the written program to the self-healing platform.
Step S1030: the self-healing platform starts the written program to repair the service platform and generates a repair result.
Specifically, if the operation and maintenance personnel upload a written program for repairing the fault event within the specified processing time limit, the written program is automatically started to repair the service platform, and a repair result is generated.
If not, go to step S1040: the operation and maintenance client judges whether a reporting instruction input by the user is obtained.
Step S1050: when a reporting instruction input by the user is obtained, the operation and maintenance client sends the fault information to an operation and maintenance client logged in to an operation and maintenance account whose operation and maintenance authority is higher than that of the currently logged-in operation and maintenance account.
Specifically, if the operation and maintenance personnel do not upload a written program for repairing the fault event within the specified processing time limit, then once a reporting instruction is input, the fault information corresponding to the fault event is sent to the operation and maintenance client logged in to an operation and maintenance account with higher operation and maintenance authority; that is, the fault information is reported to operation and maintenance personnel with higher authority for processing.
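The decision flow of steps S1010 to S1050 can be sketched as follows. This is a hedged Python illustration only: the names `Authority` and `next_action`, the action strings, the choice to escalate to the nearest higher authority, and the behaviour at the highest authority are assumptions that the embodiment does not specify.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Authority(IntEnum):
    """Operation and maintenance authority, ordered from low to high."""
    COMMON = 1
    MANAGEMENT = 2
    MASTER_SUPERVISION = 3


def next_action(
    authority: Authority,
    program_received_in_time: bool,
    report_requested: bool,
) -> Tuple[str, Optional[Authority]]:
    """Decide what the client does after displaying the time limit.

    Returns an (action, target_authority) pair.
    """
    # S1010 -> S1020/S1030: a written program arrived within the limit,
    # so it is forwarded to the self-healing platform.
    if program_received_in_time:
        return ("send_to_self_healing", None)

    # S1040 -> S1050: no program in time; report only on user request.
    if report_requested:
        higher = [a for a in Authority if a > authority]
        if higher:
            # Assumption: report to the nearest higher authority.
            return ("escalate", min(higher))
        # Assumption: the embodiment does not cover the highest level.
        return ("no_higher_authority", None)

    return ("await_report_instruction", None)
```

For example, `next_action(Authority.COMMON, False, True)` selects the management level as the reporting target under these assumptions.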
The invention also provides an intelligent operation and maintenance management system to which the intelligent operation and maintenance management method is applied; the system comprises a supervision platform, a service platform and a self-healing platform; the supervision platform is in communication connection with the service platform and the self-healing platform; the self-healing platform is in communication connection with the service platform; the service platform runs a service program; the self-healing platform stores a self-healing program; the self-healing program is a preset program for repairing a fault event.
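The division of roles between the supervision platform and the self-healing platform can be sketched in Python as below. This is an illustrative model under stated assumptions, not the claimed system: the class and function names, the use of in-memory dictionaries in place of the first database and the stored programs, exact-match lookup on the fault description, and self-healing programs modelled as zero-argument callables returning a success flag are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SelfHealingPlatform:
    """Stores preset self-healing programs, keyed by self-healing name."""
    programs: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def repair(self, name: str) -> str:
        """Start the named program and report the repair result."""
        succeeded = self.programs[name]()
        return "repair success" if succeeded else "repair failure"


@dataclass
class SupervisionPlatform:
    """Maps fault descriptions to self-healing names (stand-in database)."""
    self_healing_info: Dict[str, str] = field(default_factory=dict)

    def select_target(self, fault_description: str) -> Optional[str]:
        """Return the self-healing name for a fault, or None if absent."""
        return self.self_healing_info.get(fault_description)


def handle_fault(
    supervision: SupervisionPlatform,
    self_healing: SelfHealingPlatform,
    fault_description: str,
) -> Optional[str]:
    """Select a target program and run it; None means no preset program."""
    target = supervision.select_target(fault_description)
    if target is None:
        # No preset program: the fault goes to the operation and
        # maintenance client instead (not modelled here).
        return None
    return self_healing.repair(target)
```

Keeping the lookup in the supervision platform and the execution in the self-healing platform mirrors the separation described above, so that either side could be replaced without changing the other.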
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present invention, or the portions thereof that contribute over the prior art, may be embodied in the form of a computer software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the particular illustrative embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications, equivalent arrangements, and equivalents thereof, which may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An intelligent operation and maintenance management method, characterized in that the method is applied to an intelligent operation and maintenance management system; the system comprises a supervision platform, a service platform and a self-healing platform; the supervision platform is in communication connection with the service platform and the self-healing platform; the self-healing platform is in communication connection with the service platform; the service platform runs a service program; the self-healing platform stores a self-healing program; the self-healing program is a preset program for repairing a fault event; the method comprises the following steps:
the supervision platform queries whether a fault event occurs while the service platform runs the service program;
if so, the supervision platform acquires fault information corresponding to the fault event;
the supervision platform acquires a corresponding self-healing program based on the fault information and marks the self-healing program as a target program;
and the self-healing platform starts the target program to repair the service platform and generates a repair result.
2. The intelligent operation and maintenance management method according to claim 1, wherein the fault information comprises a fault description; the supervision platform comprises a first database; the first database stores self-healing information, wherein the self-healing information comprises a fault description and a self-healing name corresponding to the fault description, and the self-healing name is the name of a self-healing program; the step in which the supervision platform acquires a corresponding self-healing program based on the fault information and marks the self-healing program as a target program comprises:
the supervision platform acquires the fault description in the fault information;
the supervision platform retrieves whether the self-healing information containing the fault description exists in the first database;
if so, the supervision platform marks the self-healing information containing the fault description as target information;
and the supervision platform marks the self-healing program corresponding to the self-healing name in the target information as a target program.
3. The intelligent operation and maintenance management method according to claim 2, wherein the system further comprises an operation and maintenance client communicatively connected to the supervision platform; after the supervision platform retrieves whether the self-healing information containing the fault description exists in the first database, the method further comprises:
if not, the supervision platform sends the fault information to the operation and maintenance client.
4. The intelligent operation and maintenance management method according to claim 3, wherein after the supervision platform sends the fault information to the operation and maintenance client, the method further comprises:
the operation and maintenance client displays the fault information;
the operation and maintenance client judges whether a self-healing name input by a user is acquired;
if so, the operation and maintenance client marks the self-healing name input by the user as an input name;
the operation and maintenance client sends the input name to the supervision platform;
and the supervision platform marks the self-healing program corresponding to the input name as a target program.
5. The intelligent operation and maintenance management method according to claim 2, wherein the fault information further includes a fault location; the step in which the self-healing platform starts the target program to repair the service platform and generates a repair result comprises:
the supervision platform sends the fault position to the self-healing platform;
the self-healing platform starts the target program to repair the fault position of the service platform;
and the self-healing platform generates a repair result.
6. The intelligent operation and maintenance management method according to claim 5, wherein after the self-healing platform generates a repair result, the method further comprises:
displaying the repair result through the operation and maintenance client, wherein the repair result is either repair success or repair failure.
7. The intelligent operation and maintenance management method according to claim 6, wherein the operation and maintenance client is in communication connection with the self-healing platform; after the operation and maintenance client judges whether the self-healing name input by the user is acquired, the method further comprises:
if not, the operation and maintenance client obtains a written program input by the user, wherein the written program is a program written by the user for repairing the fault event;
the operation and maintenance client sends the written program to the self-healing platform;
and the self-healing platform starts the written program to repair the service platform and generates a repair result.
8. The intelligent operation and maintenance management method according to claim 3, wherein the fault information further includes a fault level; the step in which the supervision platform sends the fault information to the operation and maintenance client comprises:
the supervision platform acquires the fault level in the fault information and generates a corresponding processing time limit based on the fault level, wherein the fault level comprises, from low to high, a common level, a serious level and a severe level, and the lower the fault level is, the longer the corresponding processing time limit is;
the supervision platform sends the fault information and the corresponding processing time limit to the operation and maintenance client;
and the operation and maintenance client displays the processing time limit.
9. The intelligent operation and maintenance management method according to claim 8, wherein the operation and maintenance client is used for logging in to different operation and maintenance accounts; each operation and maintenance account is preset with a unique operation and maintenance authority, and the operation and maintenance authority comprises, from low to high, a common level, a management level and a master supervision level; after the operation and maintenance client displays the processing time limit, the method further comprises:
the operation and maintenance client judges whether a written program corresponding to the fault information and input by the user is obtained within the processing time limit;
if yes, the operation and maintenance client sends the written program to the self-healing platform;
the self-healing platform starts the written program to repair the service platform and generates a repair result;
if not, the operation and maintenance client judges whether a reporting instruction input by the user is obtained;
when a reporting instruction input by the user is obtained, the operation and maintenance client sends the fault information to an operation and maintenance client logged in to an operation and maintenance account whose operation and maintenance authority is higher than that of the currently logged-in operation and maintenance account.
10. An intelligent operation and maintenance management system, to which the intelligent operation and maintenance management method according to any one of claims 1-9 is applied; the system comprises a supervision platform, a service platform and a self-healing platform; the supervision platform is in communication connection with the service platform and the self-healing platform; the self-healing platform is in communication connection with the service platform; the service platform runs a service program; the self-healing platform stores a self-healing program; the self-healing program is a preset program for repairing a fault event.
CN202210789759.7A 2022-07-06 2022-07-06 Intelligent operation and maintenance management method and system Active CN115208742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210789759.7A CN115208742B (en) 2022-07-06 2022-07-06 Intelligent operation and maintenance management method and system

Publications (2)

Publication Number Publication Date
CN115208742A (en) 2022-10-18
CN115208742B (en) 2024-03-29

Family

ID=83581055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210789759.7A Active CN115208742B (en) 2022-07-06 2022-07-06 Intelligent operation and maintenance management method and system

Country Status (1)

Country Link
CN (1) CN115208742B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105656672A (en) * 2016-01-08 2016-06-08 成都网丁科技有限公司 APP-based fault prejudgment diagnostic tool and method
CN106921526A (en) * 2017-04-13 2017-07-04 湖南森纳信息科技有限公司 Intelligent campus network O&M system
CN109088773A (en) * 2018-08-24 2018-12-25 广州视源电子科技股份有限公司 Fault self-recovery method, apparatus, server and storage medium
CN110650036A (en) * 2019-08-30 2020-01-03 中国人民财产保险股份有限公司 Alarm processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN115208742B (en) 2024-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant