CN117255004A - Intelligent operation and maintenance method based on Zabbix and Expect - Google Patents

Info

Publication number
CN117255004A
Authority
CN
China
Prior art keywords
alarm
server
zabbix
processing
maintenance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311336169.XA
Other languages
Chinese (zh)
Inventor
刘冬辉
田磊
孙嘉怿
黄习瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Benxi Steel Group Information Automation Co ltd
Original Assignee
Benxi Steel Group Information Automation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Benxi Steel Group Information Automation Co ltd filed Critical Benxi Steel Group Information Automation Co ltd
Priority to CN202311336169.XA priority Critical patent/CN117255004A/en
Publication of CN117255004A publication Critical patent/CN117255004A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0631Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08Configuration management of networks or network elements
    • H04L41/0876Aspects of the degree of configuration automation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Automation & Control Theory (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an intelligent operation and maintenance method based on Zabbix and Expect, and relates to the technical field of computers. The method comprises the following steps: Zabbix monitors the target server cluster on which the application runs; when an alarm occurs, the intelligent operation and maintenance server acquires the alarm information; the intelligent operation and maintenance server queries processing methods according to the alarm content; the intelligent operation and maintenance server calls Expect, connects remotely to the alarming server through the server IP in the alarm information, and processes the alarming server according to the weights of the queried methods. The technical scheme of the invention solves the prior-art problem that technicians work inefficiently when the service scenario is complex and the number of servers under operation and maintenance is large, and can automatically handle complex scenarios involving many different conditions; operation and maintenance work can be completed efficiently in a recordable, traceable and repeatable way.

Description

Intelligent operation and maintenance method based on Zabbix and Expect
Technical Field
The invention relates to the technical field of computers, in particular to an intelligent operation and maintenance method based on Zabbix and Expect.
Background
With the development of internet technology, Linux servers have come to play a key role. In most scenarios, Linux servers are used to run applications, and application operation and maintenance is therefore also mostly performed on Linux servers.
Operation and maintenance work generally consists of a technician regularly inspecting the target servers, or, after being notified by a Zabbix alarm, connecting to the server through ssh and handling the problem manually. When the service scenario is complex and the number of servers under operation and maintenance is large, this produces a large amount of repeated work, and the technician's efficiency is low.
In view of these circumstances, the present invention provides an intelligent operation and maintenance method based on Zabbix and Expect.
Zabbix is an enterprise-level open source solution with a web interface that provides distributed system monitoring and network monitoring. Zabbix can monitor various network parameters to ensure the safe operation of server systems, and provides a flexible notification mechanism that lets system administrators quickly locate and resolve problems.
Expect is a free programming tool language used to automate interactive tasks. Expect simulates standard input in response to a program's prompts, supplying the input the program needs so that interactive programs can be executed automatically.
Disclosure of Invention
To address the problem that technicians perform a large amount of repeated work when the service scenario is complex and the number of servers under operation and maintenance is large, an intelligent operation and maintenance method based on Zabbix and Expect is provided.
The invention adopts the following technical means: an intelligent operation and maintenance method based on Zabbix and Expect comprises the following steps:
First step: monitoring, based on Zabbix, the target server cluster on which the application runs;
Second step: when a value monitored by Zabbix on a target server exceeds the set threshold and an alarm occurs, acquiring the Zabbix alarm notification;
Third step: querying a processing method database for processing methods according to the alarm information provided for the target server;
Fourth step: invoking Expect, connecting remotely to the alarming server through the server IP in the alarm information, and processing the alarming server according to the queried processing method with the highest weight, wherein the command stored in the database is obtained dynamically by executing a query within Expect and is sent to the target server, which executes the command to handle the alarm;
Fifth step: continuing to monitor the alarming target server for t seconds; when feedback that the alarm has been resolved is obtained through the Zabbix API within t seconds, terminating the processing and recording the success by increasing the weight of the method by the value a; if the alarm has not been resolved, returning to the third step;
Sixth step: if no further processing method is found by the query, aborting the processing, sending a notification message, and ending.
Further, the Zabbix alarm notification is obtained by calling the trigger interface of api_jsonrpc.php in the Zabbix API.
Further, the alarm information comprises the alarm reason, the alarm duration and the IP of the alarming server.
Further, the specific information in the processing method database is set through a processing WEB page and persisted in the processing method database, wherein the specific information comprises alarm conditions, processing methods, processing weights, timed tasks, the error-processing waiting time t and the success weight increment a.
Further, the content body of the Zabbix API is in JSON format, and the jsonrpc value is 2.0.
Further, t is 30 seconds, and a is 1.
Further, the weight of a processing method is influenced by its number of successes, and the execution order of the processing methods is determined by the resulting success weight.
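The six steps above can be sketched as a single control loop. This is a minimal illustration under stated assumptions rather than the patented implementation: the names Method, remediate, run, is_resolved, notify and poll are hypothetical, the callbacks stand in for the Expect execution, the Zabbix API check and the notification channel, and the candidate list is queried once up front instead of re-querying the database on every retry.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
import time


@dataclass
class Method:
    """One stored processing method (field names are illustrative)."""
    name: str
    command: str
    weight: int


def remediate(
    methods: List[Method],
    run: Callable[[Method], None],
    is_resolved: Callable[[], bool],
    notify: Callable[[str], None],
    t: float = 30.0,
    a: int = 1,
    poll: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Method]:
    """Try methods from highest to lowest weight until the alarm clears."""
    for method in sorted(methods, key=lambda m: m.weight, reverse=True):
        run(method)                  # fourth step: execute on the alarming server
        waited = 0.0
        while waited < t:            # fifth step: watch the alarm for up to t seconds
            if is_resolved():
                method.weight += a   # record the success by raising the weight
                return method
            sleep(poll)
            waited += poll
    # sixth step: nothing left to try, hand over to a technician
    notify("no stored processing method resolved the alarm")
    return None
```

Passing the side effects in as callables keeps the loop testable without a real Zabbix server or ssh connection; the defaults t = 30 and a = 1 are the preferred values stated in the text.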
An intelligent operation and maintenance server based on Zabbix and Expect comprises:
a monitoring module, used for monitoring, based on Zabbix, the target server cluster on which the application runs;
an acquisition module, used for acquiring the Zabbix alarm notification when a value monitored by Zabbix on a target server exceeds the set threshold and an alarm occurs;
a query module, used for querying a processing method database for processing methods according to the alarm information provided for the target server;
a processing module, used for invoking Expect, connecting remotely to the alarming server through the server IP in the alarm information, and processing the alarming server according to the queried processing method with the highest weight, wherein the command stored in the database is obtained dynamically by executing a query within Expect and is sent to the target server, which executes the command to handle the alarm;
a judging module, used for continuing to monitor the alarming target server for t seconds, terminating the processing and recording the success when feedback that the alarm has been resolved is obtained through the Zabbix API within t seconds, and returning to the query module if the alarm has not been resolved;
a suspension module, used for aborting the processing, notifying the technician, and ending if no further processing method is found by the query.
Compared with the prior art, the intelligent operation and maintenance method based on Zabbix and Expect has the following beneficial effects:
1. When there are many servers under operation and maintenance, the invention resolves common server alarms by intelligent processing. Using the method, business personnel can have common problems solved automatically according to the configured settings, which removes the repeated work of technicians during operation and maintenance, greatly improves operation and maintenance efficiency, and automatically handles complex scenarios involving many different conditions. The intelligence of the invention lies in the fact that the number of successes influences the success weight, and the execution order of the processing methods is determined by the success weight.
2. The invention can realize customized intelligent operation and maintenance. Processing commands specific to an enterprise can be configured and entered according to the characteristics of the applications running on the enterprise's server cluster. Such commands are effective, high-priority commands for that particular enterprise, but may be ineffective, low-priority commands in the application scenarios of other enterprises; the invention can therefore realize customized, enterprise-specific intelligent operation and maintenance. For example, in an ERP operation and maintenance scenario, when an insufficient-memory alarm occurs, other enterprises may treat it as a case of too many system processes, whereas in the operation and maintenance scenario of a specific enterprise restarting the JVM is performed first; that command is given the highest weight value, so that customized intelligent operation and maintenance is realized.
3. When there are many servers under operation and maintenance, timed commands can be set through the Linux system, and the server failure rate can be further reduced through preventive processing. The invention also records the operation and maintenance process, so that operation and maintenance work can be completed efficiently in a recordable, traceable and repeatable way.
4. The invention selects Expect as the application that carries out the processing. Expect is automated and interactive, so processing commands can be set dynamically, which meets the requirements of intelligent operation and maintenance and enriches its application scenarios.
The invention mainly uses Zabbix to monitor the servers and collects solutions to both common and enterprise-specific alarm problems. When a problem or fault occurs, the stored processing methods are executed in order of weight, and the problem is handled through Expect, thereby achieving intelligent operation and maintenance and automatic processing. The method completes basic data monitoring and collection through Zabbix and executes the related commands through Expect.
In conclusion, the technical scheme of the invention, an intelligent operation and maintenance method based on Zabbix and Expect, meets the requirements of operation and maintenance scenarios. It solves the prior-art problem that technicians work inefficiently because of the large amount of repeated work generated when the service scenario is complex and the number of servers under operation and maintenance is large.
For these reasons, the invention can be widely applied in fields such as computer technology and intelligent operation and maintenance.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the intelligent operation and maintenance method based on Zabbix and Expect according to the invention;
FIG. 2 is a schematic diagram of the implementation structure of the intelligent operation and maintenance method based on Zabbix and Expect according to the invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The intelligent operation and maintenance method based on Zabbix and Expect adopts two key technologies, Zabbix and Expect. Basic data monitoring and collection are completed through Zabbix, and the related commands are executed through Expect.
The method comprises the following steps:
First step: at the start, the Zabbix server monitors the target server cluster on which the application runs.
Second step: when an alarm occurs, the intelligent operation and maintenance server acquires the Zabbix alarm notification, and the intelligent operation and maintenance application calls the trigger interface of the Zabbix API, api_jsonrpc.php, to obtain the specific alarm information, such as the alarm reason, the alarm duration and the IP of the alarming server.
Third step: the intelligent operation and maintenance application queries the processing method database for processing methods according to the alarm information.
Fourth step: the intelligent operation and maintenance application starts an Expect command, connects remotely to the alarming server by its IP, and executes the queried processing method with the highest weight; within Expect, the command stored in the database is obtained dynamically by executing a query.
Fifth step: after execution is completed, the intelligent operation and maintenance application continues to monitor the alarming target server for t seconds. If feedback that the alarm has been resolved is obtained through the Zabbix API within t seconds, the processing is terminated and the success is recorded by increasing the weight by the value a. If not, the third and fourth steps are repeated until the problem is solved.
Sixth step: if no further processing method is found in the fifth step, the processing is stopped, the technician is notified by enterprise WeChat or e-mail, and the process ends.
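The remote execution in the fourth step can be illustrated by generating a small script for the Expect tool. The sketch below only builds the script text and does not run it; it is an illustration under assumptions, not the patented implementation. It assumes key-based ssh login (no password prompt), a shell prompt ending in "$ " or "# ", and a prompt pattern with balanced braces; the function names and parameters are hypothetical. As a simplification, the commands are passed in from Python rather than being fetched by a query inside Expect as the text describes.

```python
from typing import List


def tcl_quote(text: str) -> str:
    """Escape characters that Tcl substitutes inside double quotes."""
    for ch in ("\\", '"', "$", "[", "]"):   # backslash first, so added escapes survive
        text = text.replace(ch, "\\" + ch)
    return text


def build_expect_script(
    host: str,
    user: str,
    commands: List[str],
    prompt_re: str = "[#$] $",   # assumed shell prompt pattern
    timeout: int = 30,
) -> str:
    """Return the text of an Expect script that runs commands over ssh."""
    lines = [
        "#!/usr/bin/expect -f",
        f"set timeout {timeout}",
        f"spawn ssh {user}@{host}",
        f"expect -re {{{prompt_re}}}",        # wait for the first shell prompt
    ]
    for command in commands:
        lines.append(f'send "{tcl_quote(command)}\\r"')
        lines.append(f"expect -re {{{prompt_re}}}")   # wait until the command returns
    lines.append('send "exit\\r"')
    lines.append("expect eof")
    return "\n".join(lines) + "\n"
```

For example, build_expect_script("192.0.2.10", "ops", ["systemctl restart tomcat"]) yields a script that could be saved to a file and run with the expect interpreter; the address and user name here are placeholders.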
Further, the specific information in the processing method database of the third step is set through the intelligent processing WEB page and persisted in the processing method database; it comprises alarm conditions, processing methods, processing weights, timed tasks, the error-processing waiting time t, the success weight increment a, and the like.
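One possible shape for the processing method database is sketched below with SQLite. The patent lists the stored fields but not a schema, so the table name, column names, types and defaults are all assumptions made for illustration; only the field list and the preferred values t = 30 and a = 1 come from the text.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per processing method.
SCHEMA = """
CREATE TABLE processing_method (
    id           INTEGER PRIMARY KEY,
    alarm_cond   TEXT    NOT NULL,             -- alarm condition the method applies to
    command      TEXT    NOT NULL,             -- processing method (shell command)
    weight       INTEGER NOT NULL DEFAULT 1,   -- processing weight
    timed_task   TEXT,                         -- optional schedule for preventive runs
    wait_seconds INTEGER NOT NULL DEFAULT 30,  -- error-processing waiting time t
    success_step INTEGER NOT NULL DEFAULT 1    -- success weight increment a
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database holding the processing methods."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_method(conn: sqlite3.Connection, alarm_cond: str, command: str, weight: int = 1) -> int:
    """Persist one processing method and return its row id."""
    cur = conn.execute(
        "INSERT INTO processing_method (alarm_cond, command, weight) VALUES (?, ?, ?)",
        (alarm_cond, command, weight),
    )
    return cur.lastrowid


def methods_for(conn: sqlite3.Connection, alarm_cond: str) -> List[Tuple]:
    """Return the methods for one alarm condition, highest weight first."""
    return conn.execute(
        "SELECT id, command, weight, wait_seconds, success_step "
        "FROM processing_method WHERE alarm_cond = ? "
        "ORDER BY weight DESC, id ASC",
        (alarm_cond,),
    ).fetchall()
```

Keeping t and a per row rather than as global settings follows the text, which lists them among the values set for each entry.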
Preferably, the content body of the Zabbix API in the second step is in JSON format, and the jsonrpc value is 2.0.
Preferably, the values set in the third step are t = 30 seconds and a = 1.
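A request body for the trigger interface of api_jsonrpc.php might look like the following sketch. It only builds the JSON payload and sends nothing. The method name trigger.get and the parameters shown are taken from the public Zabbix API, but the selection of fields is an assumption, and authentication is deliberately omitted because it differs between Zabbix versions (an auth property in the body in older releases, an Authorization header in newer ones). The helper names are hypothetical; looking up the IP of the alarming server would likely need a separate host query, which is not shown.

```python
import json
from typing import Any, Dict


def trigger_get_request(request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 body asking Zabbix for triggers in the problem state."""
    return {
        "jsonrpc": "2.0",                      # value required by the text
        "method": "trigger.get",
        "params": {
            "output": ["triggerid", "description", "priority", "lastchange"],
            "filter": {"value": 1},            # 1 = trigger is currently in problem state
            "selectHosts": ["hostid", "host"],
            "expandDescription": True,
        },
        "id": request_id,
    }


def alarm_duration_seconds(lastchange: int, now: int) -> int:
    """Alarm duration derived from the trigger's last state change (Unix seconds)."""
    return max(0, now - lastchange)


def encode(body: Dict[str, Any]) -> bytes:
    """Serialize the request body for an HTTP POST to api_jsonrpc.php."""
    return json.dumps(body).encode("utf-8")
```

The same request can be repeated during the t-second monitoring window of the fifth step: once the trigger no longer appears in the result, the alarm can be treated as resolved.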
The working process of the invention is as follows: the Zabbix server monitors the target servers; a target server raises an alarm; the intelligent operation and maintenance application collects the alarm information, queries the processing methods according to the alarm content, and calls Expect to execute a processing method on the alarming server; the application then judges whether the problem has been solved, and repeatedly queries and executes other processing methods until the alarm is resolved. If no further processing method is found, a technician is notified; if the processing succeeds, the success is recorded and the weight value is increased.
The processing methods used in the third and fourth steps are entered as commands for resolving the corresponding problems. Weights are set when the commands are entered: the highest-priority command is given the highest weight, the next-priority command the next weight, and so on, and the application executes the commands in descending order of weight. Alternatively, all entries may be initialized with the same weight value, namely 1.
The weight values change dynamically and automatically according to the following logic: when execution completes, the number of recorded successes can be set to influence the weight value, so that each time a method succeeds, its success count increases and its weight value increases. After many accumulated successes the order of the weight values changes, so that the order in which the methods are called differs according to the situation of the specific servers.
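The weight logic described above can be shown with two small functions. This is a minimal sketch: the dictionary representation and the names record_success and call_order are assumptions, while the starting weight of 1 and the increment a = 1 follow the text.

```python
from typing import Dict, List


def record_success(weights: Dict[str, int], method: str, a: int = 1) -> None:
    """Increase a method's weight by a after it resolves an alarm."""
    weights[method] = weights.get(method, 1) + a


def call_order(weights: Dict[str, int]) -> List[str]:
    """Return method names in descending weight order.

    sorted() is stable, so methods with equal weights keep their entry order.
    """
    return sorted(weights, key=lambda name: weights[name], reverse=True)
```

Starting from {"restart_tomcat": 1, "kill_java": 1}, the call order is the entry order; after kill_java succeeds twice its weight is 3 and it moves to the front, which is the reordering effect the text describes.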
Example 1
An intelligent operation and maintenance embodiment for a steel ERP system is as follows:
First step: the target steel ERP server cluster on which the application runs is monitored based on Zabbix.
Second step: when a value monitored by Zabbix on a target ERP server exceeds the set threshold, an alarm occurs. For example, when the memory space of the Linux server numbered 01 is insufficient, Zabbix sends the alarm information to the intelligent operation and maintenance server, which receives the Zabbix alarm notification; its specific content is that the memory space of the Linux server numbered 01 is insufficient.
The alarm conditions may be set, for example, as follows: when the CPU usage of a server exceeds 70%, the alarm is regarded as a high-CPU alarm, and a corresponding processing method can be set; when the CPU usage exceeds 70%, the memory usage exceeds 50%, and the top three processes by memory usage are all Java processes, the alarm is regarded instead as a Java process abnormality alarm, and a completely different processing method can be set for it.
Handling errors after Zabbix raises an alarm is passive processing. The intelligent operation and maintenance application can also be used to set timed tasks and their frequency in order to actively check server processes, Java states and the like, so that problems are discovered proactively and handled preventively, further reducing the server failure rate.
Third step: the processing method database is queried according to the alarm information provided for the Linux server numbered 01, for example '01 Linux server insufficient memory space'. The query conditions are the three conditions server, memory and insufficient space. The query returns two processing methods: method one consists of processing step 1, 'ps -ef | grep java | xargs kill -9', and processing step 2, 'systemctl start tomcat'; method two consists of the command 'systemctl restart tomcat'. The weight value of method one is 5578, and the weight value of method two is 284.
Fourth step: Expect is called, the 01 server is connected remotely using the information for the Linux server numbered 01 in the alarm information, and the 01 server is processed according to the queried weights. Since the weight value of method one is clearly larger, the commands of method one are sent to the 01 server, which executes them to handle the alarm.
Fifth step: the 01 server continues to be monitored; after 1 second the memory alarm has been handled, and the Zabbix server sends the content '01 Linux server insufficient memory space - repaired' to the intelligent operation and maintenance server. At this point the processing terminates, and the weight of method one is increased to 5579.
In particular, for the 01 server, which frequently raises this memory alarm, a timed task is set to execute the commands of method one every 5 days, which effectively prevents the memory alarm.
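The query and selection in Example 1 can be replayed with the numbers given above. The sketch is illustrative only: matching by keyword containment, the keyword tuple, the wording of the alarm text and the function names are assumptions, and the commands are kept as plain strings and never executed.

```python
from typing import Any, Dict, List, Tuple

# The two processing methods of Example 1 with their stated weights.
METHODS: List[Dict[str, Any]] = [
    {
        "name": "method one",
        "keywords": ("server", "memory", "insufficient"),
        "commands": ["ps -ef | grep java | xargs kill -9", "systemctl start tomcat"],
        "weight": 5578,
    },
    {
        "name": "method two",
        "keywords": ("server", "memory", "insufficient"),
        "commands": ["systemctl restart tomcat"],
        "weight": 284,
    },
]


def query_methods(alarm_text: str, methods: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the methods whose keywords all occur in the alarm text, by weight."""
    text = alarm_text.lower()
    hits = [m for m in methods if all(k in text for k in m["keywords"])]
    return sorted(hits, key=lambda m: m["weight"], reverse=True)


def replay_example(a: int = 1) -> Tuple[str, int]:
    """Select the highest-weight method for the Example 1 alarm and record a success."""
    methods = [dict(m) for m in METHODS]          # work on copies of the entries
    candidates = query_methods("01 Linux server: insufficient memory space", methods)
    chosen = candidates[0]
    chosen["weight"] += a                         # the alarm was repaired
    return chosen["name"], chosen["weight"]
```

Calling replay_example() selects method one, because 5578 is larger than 284, and returns its weight after the recorded success, 5579, matching the fifth step of the example.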
In the embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. An intelligent operation and maintenance method based on Zabbix and Expect, characterized by comprising the following steps:
First step: monitoring, based on Zabbix, the target server cluster on which the application runs;
Second step: when a value monitored by Zabbix on a target server exceeds the set threshold and an alarm occurs, acquiring the Zabbix alarm notification;
Third step: querying a processing method database for processing methods according to the alarm information provided for the target server;
Fourth step: invoking Expect, connecting remotely to the alarming server through the server IP in the alarm information, and processing the alarming server according to the queried processing method with the highest weight, wherein the command stored in the database is obtained dynamically by executing a query within Expect and is sent to the target server, which executes the command to handle the alarm;
Fifth step: continuing to monitor the alarming target server for t seconds; when feedback that the alarm has been resolved is obtained through the Zabbix API within t seconds, terminating the processing and recording the success by increasing the weight of the method by the value a; if the alarm has not been resolved, returning to the third step;
Sixth step: if no further processing method is found by the query, aborting the processing, sending a notification message, and ending.
2. The intelligent operation and maintenance method based on Zabbix and Expect according to claim 1, wherein the Zabbix alarm notification is obtained by calling the trigger interface of api_jsonrpc.php in the Zabbix API.
3. The intelligent operation and maintenance method based on Zabbix and Expect according to claim 1, wherein the alarm information comprises the alarm reason, the alarm duration and the IP of the alarming server.
4. The intelligent operation and maintenance method based on Zabbix and Expect according to claim 1, wherein the specific information in the processing method database is set through a processing WEB page and persisted in the processing method database, and the specific information comprises alarm conditions, processing methods, processing weights, timed tasks, the error-processing waiting time t and the success weight increment a.
5. The intelligent operation and maintenance method based on Zabbix and Expect according to claim 1, wherein the content body of the Zabbix API is in JSON format and the jsonrpc value is 2.0.
6. The intelligent operation and maintenance method based on Zabbix and Expect according to claim 1, wherein t is 30 seconds and a is 1.
7. The intelligent operation and maintenance method based on Zabbix and Expect according to claim 1, wherein the weight of a processing method is influenced by its number of successes, and the execution order of the processing methods is determined by the resulting success weight.
8. An intelligent operation and maintenance server based on Zabbix and Expect, characterized by comprising:
a monitoring module, used for monitoring, based on Zabbix, the target server cluster on which the application runs;
an acquisition module, used for acquiring the Zabbix alarm notification when a value monitored by Zabbix on a target server exceeds the set threshold and an alarm occurs;
a query module, used for querying a processing method database for processing methods according to the alarm information provided for the target server;
a processing module, used for invoking Expect, connecting remotely to the alarming server through the server IP in the alarm information, and processing the alarming server according to the queried processing method with the highest weight, wherein the command stored in the database is obtained dynamically by executing a query within Expect and is sent to the target server, which executes the command to handle the alarm;
a judging module, used for continuing to monitor the alarming target server for t seconds, terminating the processing and recording the success when feedback that the alarm has been resolved is obtained through the Zabbix API within t seconds, and returning to the query module if the alarm has not been resolved;
a suspension module, used for aborting the processing, notifying the technician, and ending if no further processing method is found by the query.
CN202311336169.XA 2023-10-16 2023-10-16 Intelligent operation and maintenance method based on Zabbix and Expect Pending CN117255004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311336169.XA CN117255004A (en) 2023-10-16 2023-10-16 Intelligent operation and maintenance method based on Zabbix and Expect

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311336169.XA CN117255004A (en) 2023-10-16 2023-10-16 Intelligent operation and maintenance method based on Zabbix and Expect

Publications (1)

Publication Number Publication Date
CN117255004A true CN117255004A (en) 2023-12-19

Family

ID=89134990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311336169.XA Pending CN117255004A (en) 2023-10-16 2023-10-16 Intelligent operation and maintenance method based on Zabbix and Expect

Country Status (1)

Country Link
CN (1) CN117255004A (en)

Similar Documents

Publication Publication Date Title
CN107515796B (en) Equipment abnormity monitoring processing method and device
US20100131952A1 (en) Assistance In Performing Action Responsive To Detected Event
CN111400104A (en) Data synchronization method and device, electronic equipment and storage medium
CN110798339A (en) Task disaster tolerance method based on distributed task scheduling framework
US20050114867A1 (en) Program reactivation using triggering
CN109409948B (en) Transaction abnormity detection method, device, equipment and computer readable storage medium
CN100359865C (en) Detecting method
CN109740345A (en) A kind of method and device of monitoring process
CN117255004A (en) Intelligent operation and maintenance method based on Zabbix and Expect
CN108011906A (en) Digital signage management system and monitoring method with intelligent monitoring function
CN111324482A (en) Computer application program running data fault processing system
CN109040286B (en) Client online state maintenance method based on memory database
Wei et al. An agent-based services framework with adaptive monitoring in cloud environments
CN115222181B (en) Robot operation state monitoring system and method
CN115509719A (en) Automatic flow execution method and device
CN112799921A (en) Multi-device and multi-network environment operation and maintenance monitoring method and device and storage medium
CN114153583A (en) Task state management method, task management system and task calling system
US8595172B2 (en) Ensuring high availability of services via three phase exception handling
CN109922141A (en) The real time acquiring method and device of activity request list in Java application server
JP7360077B2 (en) Control device, control method, and control program
CN111756778A (en) Server disk cleaning script pushing method and device and storage medium
CN115934252A (en) Method and system for realizing full-automatic load balancing consumption message based on Kafka
CN116366508A (en) Container exception handling method and device, processor and electronic equipment
CN114385731A (en) Information processing method and system based on water turbine control feedback
CN101582796B (en) Implementation method and device of general electronic bulletin

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination