CN112149975A - APM monitoring system and method based on artificial intelligence - Google Patents

APM monitoring system and method based on artificial intelligence

Info

Publication number
CN112149975A
Authority
CN
China
Prior art keywords
performance
index
unit
application
application program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010956247.6A
Other languages
Chinese (zh)
Other versions
CN112149975B (en)
Inventor
朱桂芝
杨克伟
康俊健
林小莎
伍闵
许宜斌
李雅辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Eastcom Software Technology Co ltd
Original Assignee
Hangzhou Eastcom Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Eastcom Software Technology Co ltd
Priority to CN202010956247.6A
Publication of CN112149975A
Application granted
Publication of CN112149975B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an APM monitoring system based on artificial intelligence. In one embodiment, an index collection unit collects the application performance indexes of an application microservice operation platform and the relationships between applications; a data analysis unit analyzes the performance indexes through an artificial intelligence analysis model; an alarm unit raises performance index alarms in real time and tracks and locates abnormal indexes according to the data analysis unit's analysis results; an automated operation and maintenance unit, according to the same analysis results, automatically triggers scaling (capacity expansion and contraction) of the virtualization devices of the application microservice operation platform and restores the service; and an APM call chain topology display unit displays the relationships between applications as a topology graph. From the collection of application performance indexes through to the automatic scaling of the virtualization devices and the service recovery triggered by the automated operation and maintenance model, the system forms an automated operation and maintenance closed loop.

Description

APM monitoring system and method based on artificial intelligence
Technical Field
The invention relates to the technical field of automatic operation and maintenance, in particular to an APM monitoring system and method based on artificial intelligence.
Background
APM (Application Performance Management) is a branch of IT operation and maintenance management. It monitors and optimizes the performance and user experience of the IT applications behind an enterprise's key business, improves the reliability and quality of those applications, ensures that users receive good service, and reduces the total cost of ownership (TCO) of IT.
In the prior art, IT devices and the application software running on them are monitored by layer. As shown in Fig. 1, the hierarchy is divided into an IaaS layer, a PaaS layer, and a SaaS layer, covering the infrastructure (such as the network, hosts, and the CPU, memory, and disks of virtual machines), the system (operating system, middleware, and database), the application (subsystems, modules, and the application volume, transaction volume, success rate, and failure rate of functions), and the front end (user application pages, actions, and the like). The monitoring process is as follows:
Monitoring index collection: maintenance personnel issue collection tasks, periodically or automatically, to collect monitoring indexes such as resources, performance, and alarms from the IaaS layer up to the SaaS layer.
Resource management: build a resource model, present resource data, and perform simple statistical analysis on the data.
Topology management: manually construct the topology from the IaaS layer to the SaaS layer.
Performance management: define a monitoring strategy, present performance data, and sample and statistically analyze historical data over a given period using a baseline algorithm.
Alarm display: set thresholds and automatically raise alarms on service performance and on the performance indexes of the IT devices the service runs on.
Existing technical solutions mainly address monitoring and alarming at the IaaS and PaaS layers and pay little attention to application performance and management. Moreover, technology keeps evolving. On the hardware side, network function virtualization (NFV) and the cloud allow on-demand dynamic scaling and automated deployment. On the software side, software is moving to microservices and distributed architectures. As a result, application performance, alarm monitoring, and automated operation and maintenance pose more and more problems. For example, one request may involve several services, each of which may depend on other services, so the whole request path forms a mesh call chain; once an exception occurs at one node, the stability of the whole call chain is affected.
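To make the call chain problem concrete, the following is a minimal sketch of how a fault at one node reaches every service that calls it, directly or transitively. The service names and the (caller, callee) edge list are hypothetical and are not taken from the patent; the traversal is an ordinary breadth-first search, used here only as an illustration.

```python
from collections import defaultdict, deque


def affected_services(calls, failed):
    """Return every service that directly or transitively calls `failed`.

    `calls` is an iterable of (caller, callee) pairs (hypothetical format).
    """
    callers = defaultdict(set)  # callee -> set of direct callers
    for caller, callee in calls:
        callers[callee].add(caller)

    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for caller in callers[node]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical mesh call chain: one request fans out across several services.
calls = [
    ("gateway", "order"),
    ("order", "inventory"),
    ("order", "payment"),
    ("gateway", "profile"),
]
impacted = affected_services(calls, "payment")
print(sorted(impacted))
```

In this example a fault in the hypothetical "payment" service affects "order" and, through it, "gateway", while "profile" and "inventory" are untouched.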
In addition, conventional APM monitoring systems apply artificial intelligence only partially and cannot form an automated operation and maintenance closed loop from index collection to application performance self-healing. Existing APM monitoring systems have the following disadvantages:
Index collection: maintenance personnel must issue collection tasks, periodically or automatically; the work is complex and error-prone.
APM topology: the topology graph of the service system and the IT devices it runs on cannot be updated automatically, so timeliness is poor.
Alarm display: alarms cannot be raised automatically for APM service performance and the performance indexes of the IT devices the service runs on.
No rapid fault self-healing: application deployment models have changed greatly, and rapid self-healing of service faults is difficult to achieve with traditional automated operation and maintenance approaches.
Disclosure of Invention
In view of this, the embodiment of the present application provides an APM monitoring system and a monitoring method based on artificial intelligence.
In a first aspect, the present application provides an artificial intelligence-based APM monitoring system, including:
an index collection unit, configured to collect the application performance indexes of an application microservice operation platform and the relationships between applications;
a data analysis unit, configured to analyze the performance indexes through an artificial intelligence analysis model;
an alarm unit, configured to raise performance index alarms in real time and to track and locate abnormal indexes according to the data analysis unit's analysis results for the performance indexes;
an automated operation and maintenance unit, configured to automatically trigger scaling (capacity expansion and contraction) of the virtualization devices of the application microservice operation platform according to the data analysis unit's analysis results for the performance indexes, and to restore the service; and
an APM call chain topology display unit, configured to display the relationships between applications as a topology graph.
Optionally, the system further comprises: a data storage unit;
the data storage unit is configured to store the performance indexes and the relationships between applications collected by the index collection unit, and to store the analysis and statistics results of the data analysis unit.
Optionally, the system further comprises: a data query unit;
the data query unit is configured to let a user query the performance indexes, the relationships between applications, and the analysis results for the performance indexes.
Optionally, the system further comprises: a data display unit;
the data display unit is configured to display the application's performance index data according to the user's query result;
the APM call chain topology display unit is configured to display, according to the user's query result, the hardware and software components related to the application and the interactions among the software components, and to graphically display the path of a real-time business transaction.
Optionally, the index collection unit is specifically configured to collect the performance indexes of the application through log instrumentation points ("buried points") and to automatically discover the relationships between applications by deploying an Agent.
Optionally, the performance indexes of the application include one or more of: network monitoring indexes, host indexes, storage indexes, middleware indexes, virtual machine indexes, application and module indexes, and inter-service call indexes.
In a second aspect, the present application provides an artificial intelligence-based APM monitoring method, including:
collecting the performance indexes of an application;
monitoring and analyzing the performance indexes of the application in real time, and storing the performance indexes and their analysis results;
according to the analysis results, triggering an alarm for any performance index that exceeds the upper threshold or falls below the lower threshold, or triggering automatic scaling of the virtualization device.
Optionally, collecting the performance indexes of the application and the relationships between applications includes: collecting the performance indexes through log instrumentation points ("buried points"), and automatically discovering the application relationships by deploying an Agent.
Optionally, after monitoring and analyzing the performance indexes of the application in real time and storing the performance indexes and their analysis results, the method further includes:
in response to a user's query operation, displaying the collected performance indexes of the application and their analysis results;
or, in response to the user's query operation, graphically displaying the hardware and software components related to the application, the interactions among the software components, and the path of a real-time business transaction.
Optionally, triggering automatic scaling of the virtualization device includes:
based on the automated operation and maintenance model, triggering automatic scaling when the application's virtual memory is insufficient, and restoring the service.
In one embodiment, the index collection unit collects the application's performance indexes from the application microservice operation platform, the performance monitoring unit sends the collected indexes to the data analysis unit for statistical analysis, and the results are stored in the data storage unit. Based on the automated operation and maintenance model and the data analysis unit's results, the automated operation and maintenance unit triggers automatic scaling of the virtualization device and restores the service. In this embodiment, everything from the collection of application performance indexes through to the automatic scaling and service recovery triggered by the automated operation and maintenance model forms an automated operation and maintenance closed loop. Furthermore, the performance monitoring unit monitors the application's performance indexes in real time, the APM call chain topology display unit displays the call relationships between applications as a topology graph so that they can be monitored more intuitively, and the alarm unit raises alarms for abnormal indexes and quickly locates performance faults, so that application performance is optimized comprehensively.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a monitoring hierarchy in the prior art;
FIG. 2 is a schematic structural diagram of an artificial intelligence based APM monitoring system according to the present invention;
FIG. 3 is a flowchart of an artificial intelligence-based APM monitoring method according to the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
An embodiment of the present invention provides an artificial-intelligence-based APM monitoring system that focuses on application performance and management and forms a closed loop of application service index collection, performance index monitoring, performance alarming, topology display, intelligent operation and maintenance, and application service self-healing.
Fig. 2 is a schematic structural diagram of an artificial-intelligence-based APM monitoring system according to the present invention. As shown in Fig. 2, the system according to an embodiment of the present invention includes: an application microservice operation platform 201, an index collection unit 202, a performance monitoring unit 203, a data analysis unit 204, a data storage unit 205, an automated operation and maintenance unit 206, an alarm unit 207, a data query unit 208, a data display unit 209, and an APM call chain topology display unit 210.
The application microservice operation platform 201 includes at least one microservice together with the infrastructure and containers on which the microservice runs.
The index collection unit 202 collects performance data of applications and automatically discovers the relationships between applications.
In one possible embodiment, the index collection unit 202 collects the performance indexes of the application through log instrumentation points ("buried points") and automatically discovers the relationships between applications by deploying agents.
The indexes collected through log instrumentation points include: network monitoring indexes, host indexes, storage indexes, middleware indexes, virtual machine indexes, application and module indexes, inter-service call indexes, and the like.
The network monitoring indexes include: port outbound utilization, port inbound utilization, CPU utilization, and memory utilization.
The host indexes include: CPU utilization, memory utilization, disk utilization, network card MAC address, and server resource information (running time, number of CPU cores, CPU type, memory size, operating system identification).
The storage indexes include, for example, SQL Server indexes: total number of available pages of the database, number of transactions started per second, rate of distribution transactions of the instance, log size, and the like.
The middleware indexes include, for example, Nginx monitoring indexes: connectivity, number of requests processed, number of currently active connections, average number of connections per second, and the like.
The virtual machine indexes include: CPU utilization, memory utilization, disk utilization, network card input (bps), network card output (bps), and the like.
The application and module indexes include: the application volume, success rate, failure rate, and the like of subsystems, modules, and functions.
The inter-service call indexes include: availability, exceptions, response time, current number of waiting transactions, number of threads, number of service calls, access volume, service availability, and the like.
Automatically discovering the relationships between applications by deploying agents includes: the server periodically scans the network, and once the agents are deployed the application relationships are discovered automatically.
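As an illustration of the collection step, the following minimal sketch parses instrumented log lines into index samples and derives application relationships from them. The key=value log format, the field names (app, metric, value, caller), and the idea that a deployed agent stamps the caller on each record are all assumptions made for this example; the patent does not specify a record format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricSample:
    app: str                      # application that emitted the record
    metric: str                   # index name, e.g. response_time_ms
    value: float
    caller: Optional[str] = None  # upstream application, if recorded


def parse_line(line: str) -> Optional[MetricSample]:
    """Parse one hypothetical key=value log line; return None if malformed."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    try:
        return MetricSample(
            app=fields["app"],
            metric=fields["metric"],
            value=float(fields["value"]),
            caller=fields.get("caller"),
        )
    except (KeyError, ValueError):
        return None


def discover_relations(samples):
    """Collect (caller, callee) pairs from samples that carry a caller."""
    return {(s.caller, s.app) for s in samples if s.caller}


log_lines = [
    "ts=1599811200 app=order metric=response_time_ms value=120 caller=gateway",
    "ts=1599811201 app=payment metric=response_time_ms value=340 caller=order",
    "ts=1599811202 app=gateway metric=cpu_utilization value=0.42",
    "malformed line without fields",
]
samples = [s for s in map(parse_line, log_lines) if s is not None]
relations = discover_relations(samples)
```

Malformed lines are dropped rather than raising, on the assumption that a collector should tolerate noisy logs; a production collector would more likely count and report them.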
The performance monitoring unit 203 analyzes log data through the ELK platform and monitors the application performance indexes collected by the index collection unit 202.
The data analysis unit 204 analyzes the application performance indexes collected by the index collection unit 202 through an artificial intelligence analysis model.
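The patent does not say which artificial intelligence analysis model is used. Purely as a stand-in, the sketch below flags an index value as abnormal when it deviates from a historical baseline by more than a chosen number of standard deviations (a z-score test). The function name, the z_limit parameter, and the sample data are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, current, z_limit=3.0):
    """Flag `current` if it lies more than `z_limit` sample standard
    deviations from the mean of `history` (a simple statistical stand-in,
    not the model described in the patent)."""
    if len(history) < 2:
        return False              # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu      # flat baseline: any change is abnormal
    return abs(current - mu) / sigma > z_limit


# Hypothetical response-time history in milliseconds.
response_ms = [100, 104, 98, 101, 97, 103, 99, 102]
spike = is_anomalous(response_ms, 180)
normal = is_anomalous(response_ms, 102)
```

A real deployment would probably need seasonality handling and a learned model; the z-score test is shown only because it is small enough to verify by hand.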
The data storage unit 205 stores the application performance indexes collected by the index collection unit 202 and the analysis and statistics results of the data analysis unit 204.
Based on the automated operation and maintenance model, the automated operation and maintenance unit 206 triggers automatic scaling to restore the service when application performance deteriorates, for example when virtual memory is insufficient.
In one possible embodiment, the automated operation and maintenance unit 206 uses the collected memory usage rate of the application: when the memory usage rate falls below a lower threshold, the system reports that virtual memory is insufficient, and the automated operation and maintenance unit 206 creates a new replica according to a preconfigured elastic scaling policy and the configuration type of the current container, and automatically adds the new replica to the application's existing cluster.
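The scale-out step described above can be sketched as follows. The ScalingPolicy and Cluster types, the replica naming scheme, and the max_replicas cap are assumptions introduced for illustration; the trigger condition (the collected memory index falling below a lower threshold) follows the text as stated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScalingPolicy:
    memory_lower_limit: float  # lower threshold on the collected memory index
    max_replicas: int          # hypothetical cap, not stated in the source


@dataclass
class Cluster:
    container_type: str        # configuration type of the current container
    replicas: List[str] = field(default_factory=list)


def scale_out_if_needed(cluster: Cluster, memory_index: float,
                        policy: ScalingPolicy) -> bool:
    """Add one replica when the memory index is below the lower threshold.

    Returns True if a replica was created and joined to the cluster.
    """
    if memory_index >= policy.memory_lower_limit:
        return False           # threshold not breached
    if len(cluster.replicas) >= policy.max_replicas:
        return False           # policy forbids further expansion
    # The new replica copies the configuration type of the current container.
    new_replica = f"{cluster.container_type}-{len(cluster.replicas) + 1}"
    cluster.replicas.append(new_replica)
    return True


cluster = Cluster("order-svc", ["order-svc-1"])
policy = ScalingPolicy(memory_lower_limit=0.20, max_replicas=3)
created = scale_out_if_needed(cluster, memory_index=0.10, policy=policy)
```

The cap on replicas is a design choice added here so that a persistently breached threshold cannot expand the cluster without bound.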
The alarm unit 207 raises alarms for application performance indexes that exceed a threshold. Based on an alarm analysis model, it raises real-time alarms for the performance indexes of the APM service and of the IT devices the service runs on. It uses non-intrusive instrumentation and, based on a distributed-tracing application performance monitoring system, tracks and locates faults at the code level.
In one possible embodiment, the upper threshold of the CPU utilization of the virtual machine is set to 70%, and when the obtained current index exceeds the upper threshold, an alarm is triggered.
The data query unit 208 is used for querying the performance index and the analysis statistic result of the application.
The data display unit 209 is used to display performance index data of the application.
The APM call chain topology display unit 210 displays the hardware and software components related to an application and the interactions among those components, and clearly and graphically displays the path of a real-time business transaction. Specifically, a topology node can be traced to its service module, so that the application's call relation chain can be monitored more intuitively.
The data query unit 208 is user-facing: when a user needs to query the currently collected application performance indexes, or the analysis and statistics results for them, the user does so by triggering the data query unit 208. The query result is displayed through the data display unit 209.
Further, by triggering the data query unit 208, the user may also query the relationships between applications, the hardware and software components related to an application, the interactions among those components, and the path of a real-time business transaction; the APM call chain topology display unit 210 displays the result as a topology graph, so that the application's call relation chain can be monitored more intuitively.
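One way to derive such a topology is from distributed-tracing records. The sketch below assumes each record (span) carries a span_id, a parent_id, and a service name; these field names are hypothetical and are not defined in the patent. It builds the service-to-service topology and reconstructs the path of one transaction.

```python
from collections import defaultdict


def build_topology(spans):
    """Return {caller_service: sorted list of callee services}."""
    by_id = {s["span_id"]: s for s in spans}
    edges = defaultdict(set)
    for s in spans:
        parent = by_id.get(s["parent_id"])
        # An edge exists when a span's parent belongs to a different service.
        if parent is not None and parent["service"] != s["service"]:
            edges[parent["service"]].add(s["service"])
    return {svc: sorted(callees) for svc, callees in edges.items()}


def transaction_path(spans, leaf_span_id):
    """Walk parent links from a span back to the root; return root-first.

    Assumes the parent links contain no cycles.
    """
    by_id = {s["span_id"]: s for s in spans}
    path = []
    current = by_id.get(leaf_span_id)
    while current is not None:
        path.append(current["service"])
        current = by_id.get(current["parent_id"])
    return list(reversed(path))


# Hypothetical trace of a single business transaction.
spans = [
    {"span_id": "a", "parent_id": None, "service": "gateway"},
    {"span_id": "b", "parent_id": "a", "service": "order"},
    {"span_id": "c", "parent_id": "b", "service": "payment"},
    {"span_id": "d", "parent_id": "b", "service": "inventory"},
]
topology = build_topology(spans)
path = transaction_path(spans, "c")
```

Because each topology node is keyed by service name, a node in the rendered graph can be mapped straight back to the service module that produced it.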
In the embodiment of the present invention, the index collection unit 202 collects the application's performance indexes from the application microservice operation platform 201, the performance monitoring unit 203 sends the collected indexes to the data analysis unit 204 for statistical analysis, and the results are stored in the data storage unit 205. Based on the automated operation and maintenance model and the analysis results of the data analysis unit 204, the automated operation and maintenance unit 206 triggers automatic scaling of the virtualization device and restores the service. In this embodiment, everything from the collection of application performance indexes through to the automatic scaling and service recovery triggered by the automated operation and maintenance model forms an automated operation and maintenance closed loop.
FIG. 3 is a flowchart of an artificial-intelligence-based APM monitoring method according to the present invention. As shown in FIG. 3, the method includes steps S301 to S303.
Step S301: collecting the performance indexes and application relationships of the application;
Application performance indexes are collected through log instrumentation points, application relationships are automatically discovered by deploying agents, and both are stored in the data storage unit.
Step S302: monitoring and analyzing the collected performance indexes in real time, and storing the analysis results;
The log data of the application is analyzed through the ELK platform, and the collected performance indexes of the application are monitored in real time.
The data analysis unit analyzes the collected application performance indexes through the artificial intelligence model and stores the analysis results in the data storage unit.
Step S303: according to the analysis results for the application's performance indexes, raising an alarm for any performance index that exceeds the upper threshold or falls below the lower threshold, or scaling the application's infrastructure and containers;
The automated operation and maintenance unit and the alarm unit evaluate the analysis results. When a performance index of the application is found to exceed the preset upper threshold or fall below the preset lower threshold, the alarm unit is triggered to raise an alarm for that index, or the automated operation and maintenance unit is triggered to scale automatically and restore the service.
In one possible embodiment, the upper threshold of the virtual machine's CPU utilization is set to 70%; when the collected performance indexes indicate that the virtual machine's CPU utilization is 90%, the alarm unit is triggered to raise an alarm. When the collected performance indexes indicate that the application's memory usage rate is below the preset lower threshold, the alarm unit reports that virtual memory is insufficient, and the automated operation and maintenance unit is triggered to create a new replica according to the preconfigured elastic scaling policy and the configuration type of the current container, and to add the new replica automatically to the application's existing cluster.
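The decision in step S303 can be sketched as a small rule evaluation. The 70% upper threshold and the 90% observed CPU utilization come from the embodiment above; the index names, the rule dictionary layout, the scale_out flag, and the 20% lower threshold for the memory index are assumptions made for illustration.

```python
def evaluate(sample, thresholds):
    """Return the actions triggered by one {index_name: value} sample.

    Each rule may define an "upper" and/or "lower" threshold; a breach
    always raises an alarm, and also requests scale-out when the rule
    sets the (hypothetical) "scale_out" flag.
    """
    actions = []
    for name, value in sample.items():
        rule = thresholds.get(name)
        if rule is None:
            continue              # no rule configured for this index
        upper = rule.get("upper")
        lower = rule.get("lower")
        breached = (upper is not None and value > upper) or (
            lower is not None and value < lower
        )
        if not breached:
            continue
        actions.append(("alarm", name, value))
        if rule.get("scale_out"):
            actions.append(("scale_out", name, value))
    return actions


thresholds = {
    "vm_cpu_utilization": {"upper": 0.70},                    # from the embodiment
    "app_memory_index": {"lower": 0.20, "scale_out": True},   # assumed values
}
actions = evaluate(
    {"vm_cpu_utilization": 0.90, "app_memory_index": 0.10}, thresholds
)
for action in actions:
    print(action)
```

With these inputs the CPU index raises an alarm only, while the memory index raises an alarm and also requests scale-out, mirroring the two cases in the embodiment.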
In one possible embodiment, after the collected performance indexes of the application and their analysis results are stored in the data storage unit, the user may query them through the data query unit, and the query results are displayed through the data display unit.
Furthermore, the user can also query, through the data query unit, the hardware and software components related to the application and the interactions among those components, or query the path of a real-time business transaction. The query result is displayed as a topology chain. The APM call chain topology display unit presents the APM application service call chain, and a topology node in the call chain can be traced to its service module, so that the system can monitor the application's call relationships more intuitively.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The embodiments above describe the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (10)

1. An artificial intelligence based APM monitoring system comprising:
an index collection unit, configured to collect the application performance indexes of an application microservice operation platform and the relationships between applications;
a data analysis unit, configured to analyze the performance indexes through an artificial intelligence analysis model;
an alarm unit, configured to raise performance index alarms in real time and to track and locate abnormal indexes according to the data analysis unit's analysis results for the performance indexes;
an automated operation and maintenance unit, configured to automatically trigger scaling (capacity expansion and contraction) of the virtualization devices of the application microservice operation platform according to the data analysis unit's analysis results for the performance indexes, and to restore the service; and
an APM call chain topology display unit, configured to display the relationships between applications as a topology graph.
2. The system of claim 1, further comprising: a data storage unit;
the data storage unit is configured to store the performance indexes and the relationships between applications collected by the index collection unit, and to store the analysis and statistics results of the data analysis unit.
3. The system of claim 1 or 2, further comprising: a data query unit;
the data query unit is configured to let a user query the performance indexes, the relationships between applications, and the analysis results for the performance indexes.
4. The system of claim 3, further comprising: a data display unit;
the data display unit is configured to display the application's performance index data according to the user's query result;
the APM call chain topology display unit is configured to display, according to the user's query result, the hardware and software components related to the application and the interactions among the software components, and to graphically display the path of a real-time business transaction.
5. The system of claim 1, wherein the index collection unit is specifically configured to collect the performance indexes of the application through log instrumentation points ("buried points") and to automatically discover the relationships between applications by deploying an Agent.
6. The system of claim 1, wherein the performance indexes of the application include one or more of: network monitoring indexes, host indexes, storage indexes, middleware indexes, virtual machine indexes, application and module indexes, and inter-service call indexes.
7. An APM monitoring method based on artificial intelligence comprises the following steps:
collecting performance indexes of an application program;
monitoring and analyzing the performance index of the application program in real time, and storing the performance index and the analysis result of the performance index;
and triggering to alarm the performance index exceeding the upper threshold or lower than the lower threshold according to the analysis result of the performance index, or triggering to automatically expand and contract the virtual equipment.
8. The method of claim 7, wherein collecting the performance indexes of the application program and the relationships between applications comprises: collecting the performance indexes of the application program through log instrumentation (embedded tracking points), and automatically discovering the relationships between applications by deploying an Agent.
9. The method of claim 7, wherein after monitoring and analyzing the performance indexes of the application program in real time and storing the performance indexes and their analysis results, the method further comprises:
in response to a query operation of a user, displaying the collected performance indexes of the application program and their analysis results;
or, in response to the query operation of the user, graphically displaying the hardware and software components related to the application program, the interactions among the software components, and the path of real-time business transactions.
10. The method of claim 7, wherein triggering automatic scaling of the virtualized device comprises:
based on an automatic operation and maintenance model, triggering automatic capacity expansion or contraction when the virtual memory of the application program is insufficient, so that the service is recovered.
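The decision step of claims 7 and 10 can be illustrated with a minimal sketch. It is not the patented implementation: the names `Threshold`, `Action`, `analyze`, and the index names such as `memory_used_pct` are hypothetical, and treating a memory index below its lower threshold as a reason to scale in is an assumption that the claims do not spell out. The sketch only shows how a collected sample could be compared against upper and lower thresholds to choose between an alarm and a scaling request.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical per-index bounds; either bound may be omitted."""
    lower: Optional[float] = None
    upper: Optional[float] = None


@dataclass(frozen=True)
class Action:
    """Hypothetical result record: what to trigger and why."""
    kind: str    # "alarm", "scale_out" or "scale_in"
    index: str   # name of the performance index
    value: float
    reason: str


def analyze(sample: Dict[str, float],
            thresholds: Dict[str, Threshold],
            scaling_indexes: FrozenSet[str] = frozenset({"memory_used_pct"}),
            ) -> List[Action]:
    """Compare each collected index with its bounds and pick an action."""
    actions: List[Action] = []
    for name, value in sample.items():
        bounds = thresholds.get(name)
        if bounds is None:
            continue  # no thresholds configured for this index
        if bounds.upper is not None and value > bounds.upper:
            breach = "upper"
        elif bounds.lower is not None and value < bounds.lower:
            breach = "lower"
        else:
            continue  # value is within bounds
        if name in scaling_indexes:
            # Assumption: memory-type indexes drive scaling instead of alarms.
            kind = "scale_out" if breach == "upper" else "scale_in"
        else:
            kind = "alarm"
        reason = ("above upper threshold" if breach == "upper"
                  else "below lower threshold")
        actions.append(Action(kind, name, value, reason))
    return actions


if __name__ == "__main__":
    # Illustrative values only; real thresholds would come from configuration.
    demo_thresholds = {
        "response_time_ms": Threshold(upper=500.0),
        "throughput_rps": Threshold(lower=10.0),
        "memory_used_pct": Threshold(lower=20.0, upper=90.0),
    }
    demo_sample = {
        "response_time_ms": 620.0,
        "throughput_rps": 35.0,
        "memory_used_pct": 93.5,
    }
    for action in analyze(demo_sample, demo_thresholds):
        print(action.kind, action.index, action.value, action.reason)
```

In this sketch the analysis is a pure function that returns a list of actions, so the alarm channel and the scaling backend can be attached separately. A real deployment would presumably replace the static thresholds with the output of the automatic operation and maintenance model named in claim 10.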
CN202010956247.6A 2020-09-11 2020-09-11 APM monitoring system and method based on artificial intelligence Active CN112149975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010956247.6A CN112149975B (en) 2020-09-11 2020-09-11 APM monitoring system and method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010956247.6A CN112149975B (en) 2020-09-11 2020-09-11 APM monitoring system and method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN112149975A (en) 2020-12-29
CN112149975B (en) 2023-04-18

Family

ID=73890902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010956247.6A Active CN112149975B (en) 2020-09-11 2020-09-11 APM monitoring system and method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN112149975B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115102828A (en) * 2022-08-26 2022-09-23 歌尔股份有限公司 Fault analysis method and device
WO2024051723A1 (en) * 2022-09-08 2024-03-14 中电信数智科技有限公司 Multi-interface platform-based task monitoring and anomaly self-healing method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017028697A1 (en) * 2015-08-17 2017-02-23 阿里巴巴集团控股有限公司 Method and device for growing or shrinking computer cluster
CN109934361A (en) * 2019-02-25 2019-06-25 江苏电力信息技术有限公司 A kind of automation operation platform model based on container and big data
CN110581773A (en) * 2018-06-07 2019-12-17 北京怡合春天科技有限公司 automatic service monitoring and alarm management system
WO2020015061A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Monitoring alarm method, device and system for weblogic server, and computer storage medium
CN111181767A (en) * 2019-12-10 2020-05-19 中国航空工业集团公司成都飞机设计研究所 Monitoring and fault self-healing system and method for complex system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017028697A1 (en) * 2015-08-17 2017-02-23 阿里巴巴集团控股有限公司 Method and device for growing or shrinking computer cluster
CN110581773A (en) * 2018-06-07 2019-12-17 北京怡合春天科技有限公司 automatic service monitoring and alarm management system
WO2020015061A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Monitoring alarm method, device and system for weblogic server, and computer storage medium
CN109934361A (en) * 2019-02-25 2019-06-25 江苏电力信息技术有限公司 A kind of automation operation platform model based on container and big data
CN111181767A (en) * 2019-12-10 2020-05-19 中国航空工业集团公司成都飞机设计研究所 Monitoring and fault self-healing system and method for complex system

Also Published As

Publication number Publication date
CN112149975B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110661659B (en) Alarm method, device and system and electronic equipment
CN109714192B (en) Monitoring method and system for monitoring cloud platform
CN104407964B (en) A kind of centralized monitoring system and method based on data center
US8954971B2 (en) Data collecting method, data collecting apparatus and network management device
CN108365985A (en) A kind of cluster management method, device, terminal device and storage medium
CN102231681A (en) High availability cluster computer system and fault treatment method thereof
CN112149975B (en) APM monitoring system and method based on artificial intelligence
CN112162821B (en) Container cluster resource monitoring method, device and system
CN114500250B (en) System linkage comprehensive operation and maintenance system and method in cloud mode
CN108809701A (en) A kind of data center's wisdom data platform and its implementation
US20110160923A1 (en) Method and apparatus for monitoring the performance of a power delivery control system
CN105556499A (en) Intelligent auto-scaling
CN102929773A (en) Information collection method and device
CN109901969B (en) Design method and device of centralized monitoring management platform
CN110611597A (en) Cross-domain operation and maintenance system based on unidirectional network gate environment
CN110727508A (en) Task scheduling system and scheduling method
CN114356499A (en) Kubernetes cluster alarm root cause analysis method and device
CN113760652A (en) Method, system, device and storage medium for full link monitoring based on application
CN114095333A (en) Network troubleshooting method, device, equipment and readable storage medium
CN111339466A (en) Interface management method and device, electronic equipment and readable storage medium
CN103823743A (en) Monitoring method and monitoring device of software system
CN116895046A (en) Abnormal operation and maintenance data processing method based on virtualization
CN115858499A (en) Database partition processing method and device, computer equipment and storage medium
CN105072161A (en) Application program management system based on cloud computing
CN115840656A (en) Automatic operation and maintenance method and system for application program based on fault self-healing

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant