CN114389937A - Operation and maintenance monitoring and management system - Google Patents

Operation and maintenance monitoring and management system

Info

Publication number
CN114389937A
Authority
CN
China
Prior art keywords
monitoring
layer
management
data
resource
Legal status
Withdrawn
Application number
CN202210047374.3A
Other languages
Chinese (zh)
Inventor
徐皓原
梅雪娇
张入丹
韩嘉骝
Current Assignee
Individual
Original Assignee
Individual
Priority date
2022-01-17
Filing date
2022-01-17
Publication date
2022-04-22
Application filed by Individual
Priority to CN202210047374.3A
Publication of CN114389937A
Current legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/042 Network management architectures or arrangements comprising distributed management centres cooperatively managing the network
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation

Abstract

The invention discloses an operation and maintenance monitoring and management system comprising a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer. The resource layer describes the resources on which the system is deployed and monitors server resources, including field equipment resources; the data storage layer stores the collected data by category; the technical layer handles data acquisition, data processing, user management and permission management; the application layer provides the monitoring overview, resource management, data monitoring, service interfaces, host profiles, equipment management, system logs and system management; and the presentation layer presents the visual design on PCs and large screens. Because the server and the clients work cooperatively, the system is lighter and more efficient and can monitor thousands of hosts online simultaneously. Its functions include real-time collection of monitoring data, real-time feedback of monitoring status, fault prediction and alarming, and assistance with fault localization, performance tuning, capacity planning and automated operation and maintenance.

Description

Operation and maintenance monitoring and management system
Technical Field
The invention relates to the technical field of operation and maintenance monitoring, and in particular to an operation and maintenance monitoring and management system.
Background
Existing operation and maintenance monitoring and management systems offer a low level of management, low efficiency and few functions. No single system covers cluster automatic installation, centralized management, cluster monitoring, alarming and related functions, so existing systems cannot meet users' requirements.
Disclosure of Invention
In view of these technical deficiencies, the invention aims to provide an operation and maintenance monitoring and management system that covers cluster automatic installation, centralized management, cluster monitoring, alarming and related functions.
To solve the above technical problems, the invention adopts the following technical solution:
an operation and maintenance monitoring and management system, comprising a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer;
the resource layer is used to describe the resources on which the system is deployed and to monitor server resources, including field equipment resources;
the data storage layer is used to store the collected data by category;
the technical layer is used for data acquisition, data processing, user management and permission management;
the application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host profiles, equipment management, system logs and system management;
the presentation layer is used to present the visual design on PCs and large screens.
Preferably, server resources are monitored by collecting the monitored resource data from each server with a data probe.
The beneficial effects of the invention are as follows: the server and the clients work cooperatively, which makes the system lighter and more efficient and allows thousands of hosts to be monitored online simultaneously; the system collects monitoring data in real time, feeds back monitoring status in real time, predicts faults and raises alarms, and assists with fault localization, performance tuning, capacity planning and automated operation and maintenance.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of an operation and maintenance monitoring and management system provided by the present invention;
Fig. 2 is a functional architecture diagram of an operation and maintenance monitoring and management system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Figs. 1-2, an operation and maintenance monitoring and management system comprises a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer;
the resource layer is used to describe the resources on which the system is deployed and to monitor server resources, including field equipment resources;
the data storage layer is used to store the collected data by category;
the technical layer is used for data acquisition, data processing, user management and permission management;
the application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host profiles, equipment management, system logs and system management; the presentation layer is used to present the visual design on PCs and large screens. Large-screen visualization shows all of the server indicators under operation and maintenance monitoring in a single view, including the number of servers, network, alarms, host status and so on.
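The large-screen view condenses every monitored server into one picture. As a minimal sketch only, assuming each host reports an online flag and a count of active alarms (the HostStatus record, its fields and the summary keys are hypothetical, not defined in the patent), the headline figures for such a screen could be aggregated like this:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HostStatus:
    # Hypothetical per-host record; the patent does not define a data model.
    host: str
    online: bool
    active_alarms: int


def summarize_for_big_screen(hosts: Iterable[HostStatus]) -> dict:
    """Aggregate per-host status into the headline counts of a dashboard."""
    hosts = list(hosts)
    online = sum(1 for h in hosts if h.online)
    return {
        "servers": len(hosts),
        "online": online,
        "offline": len(hosts) - online,
        "active_alarms": sum(h.active_alarms for h in hosts),
        "hosts_with_alarms": sum(1 for h in hosts if h.active_alarms > 0),
    }
```

Keeping the aggregation separate from rendering means the same summary could feed both the PC view and the large screen.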
Further, server resources are monitored by collecting data from each server with a data probe: CPU, memory, storage and network information is acquired through SigarProxy probe technology, and the CPU usage, memory usage, storage usage and port usage of each monitored process are recorded in detail.
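The description names SigarProxy, an interface of the Java SIGAR library, as the probe. The Python sketch below is not that probe; it only illustrates what one probe snapshot might contain, using standard-library calls for CPU count, load average and disk usage. Memory, network and per-process figures are not portably available from the standard library and would need an extra package such as psutil. The snapshot field names are hypothetical.

```python
import os
import shutil
import time


def collect_snapshot(path: str = "/") -> dict:
    """Take one probe-style snapshot of basic host metrics (stdlib only).

    Field names are hypothetical; a real probe would also report memory,
    network and per-process usage.
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    snapshot = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),  # may be None if it cannot be determined
        "disk_total_bytes": usage.total,
        "disk_used_bytes": usage.used,
        "disk_used_pct": round(100.0 * usage.used / usage.total, 1) if usage.total else 0.0,
    }
    try:
        # Unix only; raises OSError if the load average is unobtainable.
        snapshot["load_1m"], snapshot["load_5m"], snapshot["load_15m"] = os.getloadavg()
    except (AttributeError, OSError):
        pass  # e.g. Windows has no os.getloadavg()
    return snapshot
```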
The technical capability indicators of the system are as follows:
1. Host cluster monitoring: the default configuration supports 500+ hosts monitored online simultaneously;
2. Basic indicator monitoring: CPU (central processing unit), memory, system load, disk and so on;
3. Data monitoring (MySQL, Oracle, PostgreSQL and others): statistics can be defined by writing SQL, the trend of the data is monitored, and sensitive words are masked. An alarm is raised if the connection to a data source fails;
4. Service heartbeat detection: health checks on service interfaces. An interface counts as healthy only if it returns HTTP 200; any other result is treated as a failure, and alarms are supported;
5. Process monitoring: whether a process is running normally can be monitored by PID file, process ID or process name, with alarms on memory and CPU usage;
6. Docker monitoring: the usage status of Docker is monitored, and alarms are supported;
7. Disk monitoring: disk usage is monitored;
8. Port monitoring: checks whether a port can be reached via telnet. Network firewall factors are excluded, since the check is equivalent to telnetting the port on localhost; alarms are supported;
9. Log file monitoring: checks whether given keywords appear in a log and raises an alarm if they do. Either a specific log file or the directory containing the logs can be specified, e.g. /usr/local/nginx/logs/access.log or /usr/local/nginx/logs/; when a directory is specified, the most recent log file in it is read;
10. Alarm methods: email by default; running an alarm script is also supported, so alarms can be sent via DingTalk, WeChat and similar channels from within the script. Alarms for every indicator can be switched on or off in the configuration file;
11. Device management: used to manage the company's various devices;
12. Host profile: shows all of the host's information, including CPU, memory, disk, load, monitored ports, processes, Docker, log files and so on.
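Item 9 describes keyword monitoring over either a single log file or a directory, in which case the most recent file is read. A minimal sketch of that rule, assuming "most recent" means latest modification time and that a plain substring match is enough (resolve_log_file and find_keyword_hits are hypothetical names). A production monitor would also remember its read offset so each check scans only new lines.

```python
import os


def resolve_log_file(path: str) -> str:
    """Return path itself for a file, or the most recently modified file in a directory."""
    if os.path.isdir(path):
        candidates = [
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        ]
        if not candidates:
            raise FileNotFoundError(f"no log files in {path}")
        return max(candidates, key=os.path.getmtime)
    return path


def find_keyword_hits(path: str, keywords: list) -> list:
    """Return (line_number, line) pairs for every line containing any keyword."""
    log_file = resolve_log_file(path)
    hits = []
    with open(log_file, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if any(keyword in line for keyword in keywords):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

A non-empty result from find_keyword_hits would be the condition that triggers the alarm described in item 9.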
The monitoring system uses a microservice architecture. The microservice architecture pattern organizes an entire web application as a set of small web services. Each service can be compiled and deployed independently, and the services communicate through the API interfaces each of them exposes. Together they provide the user-facing functionality as a whole, while each can be scaled independently.
System overview: the system home page shows the number of monitored processes, data sources (databases), data tables, log monitors, service interface monitors and monitored host indicators, presenting the operation and maintenance information comprehensively in chart form.
Host management: a probe is installed on each application host to collect data. Host management shows the real-time running status of all server hosts under operation and maintenance monitoring, including operating system, IP, memory, CPU, disk and network information.
Process management: for each configured process, memory and CPU usage are monitored in real time, and the historical running status can be viewed in charts.
Port management: the status of each configured port is checked, and its historical status can be viewed in charts.
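Port management reduces to asking whether a TCP connection to the configured port succeeds, which is what a telnet test shows. A sketch under that assumption (is_port_open is a hypothetical helper name, and the 2-second default timeout is arbitrary):

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like a telnet test."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host and timeouts are all OSError subclasses.
        return False
```

Recording each result with a timestamp would give the historical status that the port management charts display.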
Data source management: configured data sources, i.e. databases such as Oracle, MySQL and PostgreSQL, are monitored.
Data interface monitoring: configured interfaces are monitored, and their online status is shown in real time.
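The heartbeat rule in item 4 of the capability list treats an interface as healthy only when it answers HTTP 200. A minimal sketch with urllib from the standard library; check_interface and its 5-second default timeout are hypothetical. Note that urlopen follows redirects, so a redirect chain ending in 200 counts as healthy here, an interpretation the patent does not settle.

```python
import http.client
import urllib.error
import urllib.request


def check_interface(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the interface answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTPError (non-success status), refused connections, timeouts,
        # malformed responses and invalid URLs are all treated as failures.
        return False
```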
Host profile: the overall operation of a server is assessed through visual charts covering all of its configured monitoring items.
Data storage design. Retention period: logs are kept for 6 months; all other data are kept long-term.
Backup frequency: data are backed up monthly and copied to a backup server as files.
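The retention and backup rules are simple enough to state as code. A minimal sketch, assuming "6 months" means six calendar months and that a monthly backup is due once a full calendar month has passed since the last one; both interpretations, the record kinds and all function names are hypothetical.

```python
import calendar
from datetime import date

LOG_RETENTION_MONTHS = 6  # from the description: logs are kept for 6 months


def months_before(day: date, months: int) -> date:
    """Same day-of-month `months` calendar months earlier, clamped to the month's end."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def is_expired(record_kind: str, created: date, today: date) -> bool:
    """Logs expire after 6 months; every other record kind is kept long-term."""
    if record_kind != "log":
        return False
    return created < months_before(today, LOG_RETENTION_MONTHS)


def is_backup_due(last_backup: date, today: date) -> bool:
    """Monthly cycle: due once a calendar month has passed since the last backup."""
    return months_before(today, 1) >= last_backup
```

Clamping to the month's end keeps the arithmetic well defined for dates such as 31 August, whose six-months-earlier counterpart does not exist.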
The system provides data services externally in a unified way through Nginx load balancing and a gateway; the application cluster retrieves the data collected by all probes and performs data cleaning, data management, data analysis and external data services;
data storage management: storage uses MySQL with primary and standby databases plus a Redis cache to improve data display performance;
Nacos provides service registration, ensuring the normal operation of the operation and maintenance system.
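The description says a Redis cache in front of MySQL improves display performance but does not say how the two are combined. One common arrangement is the cache-aside pattern, sketched below with a plain dictionary standing in for Redis and a caller-supplied loader standing in for the MySQL query. The class name, the 30-second TTL and the injected clock are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads: try the cache first, fall back to the database on a miss.

    A dict stands in for Redis and load_from_db stands in for a MySQL query.
    """

    def __init__(self, load_from_db: Callable[[str], object], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not queried
        value = self._load(key)  # miss or expired entry: read from the database
        self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the database so the next read reloads fresh data."""
        self._cache.pop(key, None)
```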
The monitoring system is developed on the Spring Boot microservice framework and is a lightweight, high-performance distributed monitoring system. Its core collected indicators include: host system information, network traffic, CPU state, CPU temperature, memory state, disk space and I/O monitoring, hard-disk S.M.A.R.T. health checks, system load, large-screen visualization, ES cluster state, data visualization monitoring (MySQL, Oracle, PostgreSQL and others), service interface detection, application process monitoring, network topology, port monitoring, log file monitoring, Docker monitoring, file tamper protection, data communication equipment monitoring, Web SSH, a bastion host, command dispatch, and alarm notification push (email, WeChat, DingTalk, SMS and others).
The core collected indicators comprise: host system information, network traffic, CPU state, CPU temperature, memory state, disk space and I/O (input/output) monitoring, system load, large-screen visualization, data visualization monitoring (MySQL, Oracle, PostgreSQL and others), service interface detection, application process monitoring, port monitoring, log file monitoring and data communication equipment monitoring.
1. The server and the clients work cooperatively, which makes the system lighter and more efficient and allows thousands of hosts to be monitored online simultaneously.
2. The server is responsible for receiving and processing data and generating chart displays; by default the agent reports indicator data every 30 seconds (the interval is adjustable).
3. Installation and deployment are supported on mainstream server platforms such as Linux, Windows, macOS and Unix.
4. The distributed monitoring system is implemented with Spring Boot and Bootstrap.
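Point 2 describes a push model: the agent reports indicator data to the server every 30 seconds by default, with an adjustable interval. A sketch of such a loop, where the payload fields, the collect and send callables and the max_reports cut-off are hypothetical stand-ins for the probe and for the transport to the server, neither of which the patent specifies:

```python
import json
import time
from typing import Callable, Optional

DEFAULT_REPORT_INTERVAL_S = 30.0  # default from the description; adjustable


def build_report(host_id: str, metrics: dict, now: Optional[float] = None) -> str:
    """Serialize one report for the server (field names are hypothetical)."""
    ts = now if now is not None else time.time()
    return json.dumps({"host": host_id, "ts": int(ts), "metrics": metrics}, sort_keys=True)


def run_agent(host_id: str,
              collect: Callable[[], dict],
              send: Callable[[str], None],
              interval_s: float = DEFAULT_REPORT_INTERVAL_S,
              max_reports: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Collect and push metrics every interval_s seconds; return the number of reports sent."""
    sent = 0
    while max_reports is None or sent < max_reports:
        started = time.monotonic()
        send(build_report(host_id, collect()))
        sent += 1
        # Subtract the time spent collecting and sending so reports stay roughly on schedule.
        sleep(max(0.0, interval_s - (time.monotonic() - started)))
    return sent
```

Injecting sleep and send keeps the loop testable without waiting 30 seconds or contacting a real server.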
The system provides the following functions:
1. Real-time collection of monitoring data: data across every dimension, including hardware, operating system, middleware and applications;
2. Real-time feedback of monitoring status: multi-dimensional statistics and visual display of the collected data reflect the state of monitored objects in real time;
3. Fault prediction and alarming: fault risks can be predicted in advance and alarm information sent promptly;
4. Assisted fault localization: indicator data from the time of a fault are provided to help analyze and locate it;
5. Assisted performance tuning: data support for performance tuning, such as slow SQL and interface response times;
6. Assisted capacity planning: data support for capacity planning of servers, middleware and application clusters;
7. Assisted automated operation and maintenance: data support for intelligent operation and maintenance such as system scale-out.
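The patent does not explain how fault risk is predicted in advance. One simple technique that fits the description is linear trend extrapolation: fit a least-squares line to recent samples of an indicator such as disk usage and estimate when it will cross an alarm threshold. The sketch below implements only that technique; it is an assumption, not the patented method, and predict_threshold_crossing is a hypothetical name.

```python
from typing import Optional, Sequence


def predict_threshold_crossing(times: Sequence[float], values: Sequence[float],
                               threshold: float) -> Optional[float]:
    """Fit value = intercept + slope * t by least squares; return the t where it reaches threshold.

    Returns None when the trend is flat or falling, i.e. no crossing is predicted.
    """
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (time, value) samples of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov_tv / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope
```

For example, disk usage samples of 50%, 60% and 70% at hours 0, 1 and 2 extrapolate to 100% at hour 5, which could trigger an early warning well before the disk fills.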
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (2)

1. An operation and maintenance monitoring and management system, comprising a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer;
the resource layer is used to describe the resources on which the system is deployed and to monitor server resources, including field equipment resources;
the data storage layer is used to store the collected data by category;
the technical layer is used for data acquisition, data processing, user management and permission management;
the application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host profiles, equipment management, system logs and system management;
the presentation layer is used to present the visual design on PCs and large screens.
2. The operation and maintenance monitoring and management system according to claim 1, wherein server resources are monitored by obtaining the monitored resource data from each server with a data probe.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210047374.3A CN114389937A (en) 2022-01-17 2022-01-17 Operation and maintenance monitoring and management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210047374.3A CN114389937A (en) 2022-01-17 2022-01-17 Operation and maintenance monitoring and management system

Publications (1)

Publication Number Publication Date
CN114389937A 2022-04-22

Family

ID=81202074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210047374.3A Withdrawn CN114389937A (en) 2022-01-17 2022-01-17 Operation and maintenance monitoring and management system

Country Status (1)

Country Link
CN (1) CN114389937A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827678A (en) * 2022-04-29 2022-07-29 广东省广播电视网络股份有限公司中山分公司 Operation and maintenance monitoring and analyzing system for digital television front-end platform
CN115442223A (en) * 2022-07-19 2022-12-06 写逸网络科技(上海)有限公司 Automatic operation and maintenance method for distributed cluster
CN115811458A (en) * 2022-11-17 2023-03-17 浪潮云信息技术股份公司 Monitoring method and system based on springboot micro-service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20220422