CN114389937A - Operation and maintenance monitoring and management system - Google Patents
- Publication number
- CN114389937A (application number CN202210047374.3A)
- Authority
- CN
- China
- Prior art keywords
- monitoring
- layer
- management
- data
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/042—Network management architectures or arrangements comprising distributed management centres cooperatively managing the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an operation and maintenance monitoring and management system comprising a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer. The resource layer is used for describing system deployment resources and monitoring server resources, and includes external field equipment resources. The data storage layer stores the acquired data by category. The technical layer is used for data acquisition, data processing, user management and authority management. The application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host portrait, equipment management, system logs and system management. The presentation layer is used for visual design presentation through a PC and a large screen. The system uses a cooperative server and client working mode, which keeps it lightweight and efficient, and it can support simultaneous online monitoring of thousands of hosts. Its functions are real-time collection of monitoring data, real-time feedback of monitoring states, fault prediction and alarming, and assistance with fault location, performance tuning, capacity planning and automated operation and maintenance.
Description
Technical Field
The invention relates to the technical field of operation and maintenance monitoring, in particular to an operation and maintenance monitoring and management system.
Background
Existing operation and maintenance monitoring and management systems offer a low level of management, low efficiency and few functions. No single system covers automatic cluster installation, centralized management, cluster monitoring, alarming and related functions, so the requirements of users cannot be met.
Disclosure of Invention
In view of these technical deficiencies, the invention aims to provide an operation and maintenance monitoring and management system that covers automatic cluster installation, centralized management, cluster monitoring, alarming and related functions.
To solve the above technical problem, the invention adopts the following technical solution:
an operation and maintenance monitoring and management system, comprising: a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer;
the resource layer is used for describing system deployment resources and monitoring server resources, and comprises external field equipment resources;
the data storage layer is used for storing the acquired data in a classified manner;
the technical layer is used for data acquisition, data processing, user management and authority management;
the application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host portrait, equipment management, system logs and system management;
the presentation layer is used for visual design presentation through a PC and a large screen.
Preferably, the monitoring server resource obtains the monitored resource from the server by means of a data probe.
The beneficial effects of the invention are as follows: the system uses a cooperative server and client working mode, which keeps it lightweight and efficient, and it can support simultaneous online monitoring of thousands of hosts; it collects monitoring data in real time, feeds back monitoring states in real time, predicts faults and raises alarms, and assists with fault location, performance tuning, capacity planning and automated operation and maintenance.
Drawings
To illustrate the embodiments of the invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of the operation and maintenance monitoring and management system provided by the invention;
Fig. 2 is a functional architecture diagram of the operation and maintenance monitoring and management system provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Figs. 1 and 2, an operation and maintenance monitoring and management system comprises a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer;
the resource layer is used for describing system deployment resources and monitoring server resources, and comprises external field equipment resources;
the data storage layer is used for storing the acquired data in a classified manner;
the technical layer is used for data acquisition, data processing, user management and authority management;
the application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host portrait, equipment management, system logs and system management; the presentation layer is used for visual design presentation through a PC and a large screen. Large-screen visualization presents all server indicators of operation and maintenance monitoring in a single view, including the number of servers, network, alarms and host status.
Further, the monitoring server resources acquire the monitored resources from the server by means of a data probe: CPU, memory, storage and network information are obtained through the SigarProxy probe technology, and the CPU usage, memory usage, storage usage and ports used by each monitored process are recorded in detail.
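The kind of snapshot such a probe might collect can be sketched as follows. This is only an illustration in Python using the standard library, not the SigarProxy-based probe the description refers to; the function name `collect_host_metrics` and the field names are assumptions made for the example, and memory and network figures are left out because the standard library does not expose them portably.

```python
import os
import shutil
import time


def collect_host_metrics(path="/"):
    """Return a small snapshot of host indicators (hypothetical field names)."""
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_percent = round(100.0 * disk.used / disk.total, 2) if disk.total else 0.0
    metrics = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_percent": used_percent,
    }
    # The load average is only exposed on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass  # load average could not be obtained on this host
    return metrics
```

A probe process would call `collect_host_metrics()` periodically and send the resulting dictionary to the server side.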
The technical capability indicators of the system are as follows:
1. Host cluster monitoring: the default configuration supports simultaneous online monitoring of 500+ hosts;
2. Basic indicator monitoring: CPU, memory, system load, disk and so on;
3. Data monitoring (mysql, oracle, pg and the like): data statistics are supported by writing sql, the change trend of the data is monitored, and sensitive words are desensitized. An alarm is raised when the connection to a data source fails;
4. Service heartbeat detection: a health check of the service interface. A returned status of 200 means the interface is healthy; any other result is treated as a service failure. Alarms are supported;
5. Process monitoring: whether a process is running normally is monitored by pid file, process id or process name; alarms on memory and cpu usage are supported;
6. Docker monitoring: the running state of Docker is monitored, and alarms are supported;
7. Disk monitoring: the usage of the disks is monitored;
8. Port monitoring: whether a port can be reached is checked in the manner of telnet. The check is equivalent to telnet localhost port, which rules out network firewall factors. Alarms are supported;
9. Log file monitoring: whether keywords appear in the logs is monitored, and an alarm is raised if they do. Either a specific log file or the directory containing the logs can be specified, such as /usr/local/nginx/logs/access.log or /usr/local/nginx/logs/; when a directory is specified, the latest log file in that directory is read;
10. Alarm mode: email by default. Executing an alarm script is also supported, and the script can send alarms through DingTalk, WeChat and other channels. Every indicator alarm can be switched off and on in the configuration file;
11. Device management: used for managing the various devices of a company;
12. Host portrait: displays all information of a host, including cpu, memory, disk, load, monitored ports, processes, docker and log files.
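The log file monitoring item in the list above (a file or a directory is specified; for a directory the latest log file is read; a keyword match triggers an alarm) could be sketched as below. The function names `resolve_log_file` and `find_keyword_lines` are hypothetical, and "latest" is assumed here to mean the most recently modified file, which the description does not state explicitly.

```python
import os


def resolve_log_file(path):
    """Return path if it is a file, else the most recently modified file in it."""
    if os.path.isdir(path):
        candidates = [os.path.join(path, name) for name in os.listdir(path)]
        files = [item for item in candidates if os.path.isfile(item)]
        if not files:
            return None  # empty directory: nothing to scan
        return max(files, key=os.path.getmtime)
    return path


def find_keyword_lines(path, keywords):
    """Return (line_number, line) pairs that contain any of the keywords."""
    log_file = resolve_log_file(path)
    if log_file is None:
        return []
    hits = []
    with open(log_file, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(keyword in line for keyword in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

A non-empty result from `find_keyword_lines` is the condition under which an alarm would be raised; how the alarm is delivered (email or script) is outside this sketch.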
The monitoring system adopts a microservice architecture. The microservice architecture pattern organizes the whole Web application into a series of small Web services. These small Web services can be compiled and deployed independently and communicate with each other through the API interfaces that each of them exposes. They cooperate to provide functionality to the user as a whole, but can be scaled independently.
System overview: the system home page shows the number of monitored processes, the data sources (databases), data tables, log monitoring, service interface monitoring and monitored host indicator information, and presents the operation and maintenance information comprehensively in chart form.
Host management: a probe is installed on each application host to acquire data. Host management mainly shows the real-time running status of all server hosts under operation and maintenance monitoring, including operating system, IP, memory, CPU, disk and network information.
Process management: for each configured process, the memory and CPU usage of the process are monitored in real time, and its historical running status can be viewed in charts.
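A minimal sketch of the liveness part of process monitoring, assuming a POSIX host and a pid file as mentioned among the capability indicators. The names `read_pid` and `is_process_running` are hypothetical; memory and CPU usage per process are not covered here, since they need platform-specific sources.

```python
import os


def read_pid(pid_file):
    """Read the process id stored in a pid file."""
    with open(pid_file, "r", encoding="ascii") as handle:
        return int(handle.read().strip())


def is_process_running(pid):
    """POSIX only: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Note that this check should not be used on Windows, where `os.kill` treats its second argument differently.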
Port management: the running status of each configured port is monitored, and its historical running status can be viewed in charts.
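A port check of this kind, equivalent in spirit to `telnet localhost port`, can be sketched with a plain TCP connection attempt. The function name `is_port_open` and the two-second default timeout are assumptions for illustration.

```python
import socket


def is_port_open(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out or otherwise unreachable
```

Because the probe runs on the monitored host and connects to the loopback address, a failed check points at the service itself rather than at a firewall in between.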
Data source management: data sources, that is, databases such as Oracle, mysql and PG, are monitored.
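The data monitoring idea (a user-written sql statistic whose value is tracked over time, with sensitive words masked) might look like the sketch below. SQLite from the Python standard library stands in for Oracle, mysql or PG purely so that the example is self-contained; the function names and the choice to mask words with `***` are assumptions, since the description does not say what exactly is desensitized or how.

```python
import re
import sqlite3


def run_statistic(connection, sql):
    """Run a user-supplied counting query and return its single value."""
    row = connection.execute(sql).fetchone()
    return row[0]


def mask_sensitive(text, words, mask="***"):
    """Replace each sensitive word in text, ignoring case."""
    for word in words:
        text = re.sub(re.escape(word), mask, text, flags=re.IGNORECASE)
    return text


# Usage with an in-memory database as a stand-in data source.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
demo.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "paid"), (2, "new")])
paid_orders = run_statistic(demo, "SELECT COUNT(*) FROM orders WHERE status = 'paid'")
demo.close()
```

Storing `paid_orders` with a timestamp at each collection would give the change trend that the system charts.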
Data interface monitoring: the configured interfaces are monitored, and the online status of each interface is tracked in real time.
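An interface health check following the rule given among the capability indicators (a status of 200 counts as healthy, anything else as a failure) can be sketched as follows. The names `fetch_status` and `is_interface_healthy` are hypothetical, and the status is assumed to be an HTTP status code.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # no answer at all, or a malformed URL


def is_interface_healthy(url, fetch=fetch_status):
    """Healthy only when the interface answers with status 200."""
    return fetch(url) == 200
```

The `fetch` parameter exists only so that the rule can be exercised without a network, for example `is_interface_healthy(url, fetch=lambda _: 503)` evaluates to `False`.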
Host portrait: the overall running status of a server is evaluated through visual charts that contain all configured monitoring items.
Data storage design. Data storage period: logs are kept for 6 months, and other data is kept long term.
Backup frequency: data is backed up monthly, and the backup is written to a backup server in file form.
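The six-month log retention rule can be expressed as a small date check. This is one possible reading, counting whole calendar months; the function names are hypothetical and the description does not say how the period is measured.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month has not been completed yet
    return months


def is_log_expired(created, today, retention_months=6):
    """True once a log record is at least retention_months old."""
    return months_between(created, today) >= retention_months
```

A cleanup job would delete, or move to the backup server, the log records for which `is_log_expired` returns `True`; other data is never expired under the stated policy.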
The system provides data services externally in a unified way through Nginx load balancing and a gateway. The application cluster retrieves the data acquired by all probes and performs data cleaning, data management, data analysis and external data services;
data storage management: data is stored in a MySQL database with primary and standby storage, and a Redis cache is used to improve data display performance;
Nacos provides service registration and ensures the normal operation of the operation and maintenance system.
The monitoring system is developed on the springboot microservice framework and is a lightweight, high-performance distributed monitoring system. Its core acquisition indicators include: host system information, network traffic, CPU state, CPU temperature, memory state, disk space and IO monitoring, hard disk SMART health detection, system load, large-screen visualization, ES cluster state, data visualization monitoring (mysql, oracle, pgsql and the like), service interface detection, application process monitoring, network topology, port monitoring, log file monitoring, docker monitoring, file tamper protection, data communication equipment monitoring, Web SSH, bastion host, command issuing, and pushing of alarm information (email, WeChat, DingTalk, SMS and the like).
The core acquisition indicators include: host system information, network traffic, CPU state, CPU temperature, memory state, disk space and IO monitoring, system load, large-screen visualization, data visualization monitoring (mysql, oracle, pgsql and the like), service interface detection, application process monitoring, port monitoring, log file monitoring and data communication equipment monitoring.
1. A cooperative server and client working mode is adopted, which makes the system lighter and more efficient and allows simultaneous online monitoring of thousands of hosts.
2. The server side is responsible for receiving data, processing data and generating chart displays. The agent side reports indicator data every 30 seconds by default (the interval is adjustable).
3. Installation and deployment are supported on mainstream server platforms such as Linux, Windows, macOS and Unix.
4. The distributed monitoring system is implemented with springboot + bootstrap.
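On the server side, the 30-second agent report can double as a host heartbeat. The sketch below tracks the last report per host and treats a host as offline after a number of missed reports. The class name `HeartbeatRegistry` and the tolerance of three missed intervals are assumptions; the description only gives the default reporting interval.

```python
import time


class HeartbeatRegistry:
    """Track the last report time of each agent (hypothetical helper)."""

    def __init__(self, interval_seconds=30.0, missed_reports_allowed=3,
                 clock=time.monotonic):
        self.interval_seconds = interval_seconds
        self.missed_reports_allowed = missed_reports_allowed
        self._clock = clock  # injectable so the logic can be tested
        self._last_seen = {}

    def record_report(self, host):
        """Call whenever an agent delivers its indicator data."""
        self._last_seen[host] = self._clock()

    def is_online(self, host):
        last = self._last_seen.get(host)
        if last is None:
            return False  # this host has never reported
        allowed_gap = self.interval_seconds * self.missed_reports_allowed
        return self._clock() - last <= allowed_gap

    def offline_hosts(self):
        """Hosts that reported before but have since gone quiet."""
        return sorted(h for h in self._last_seen if not self.is_online(h))
```

With the defaults, a host is reported offline once more than 90 seconds have passed since its last report; changing `interval_seconds` mirrors the adjustable reporting time.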
The system has the following functions:
1. Real-time collection of monitoring data: data of every dimension, including hardware, operating system, middleware and applications;
2. Real-time feedback of monitoring state: the collected data is aggregated along multiple dimensions and visualized, so the state of each monitored object is reflected in real time;
3. Fault prediction and alarming: fault risks can be predicted in advance, and warning information can be sent out in time;
4. Assisted fault location: indicator data from the time of a fault is provided to help analyse and locate the fault;
5. Assisted performance tuning: data support is provided for performance tuning, such as slow SQL and interface response time;
6. Assisted capacity planning: data support is provided for capacity planning of servers, middleware and application clusters;
7. Assisted automated operation and maintenance: data support is provided for intelligent operation and maintenance such as system capacity expansion.
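The description does not disclose how fault risk is predicted. As one simple possibility, a linear trend fitted to recent disk usage samples can estimate how long it will take to reach an alarm threshold. Everything in this sketch (the least-squares fit, the 90 percent threshold, the function names) is an assumption made for illustration, not the method of the invention.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(set(xs)) < 2:
        raise ValueError("need at least two samples at distinct times")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, usage_percent, threshold=90.0):
    """Hours after the last sample until the trend reaches threshold, or None."""
    slope, intercept = linear_fit(hours, usage_percent)
    if slope <= 0:
        return None  # usage is flat or falling, no crossing predicted
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - hours[-1])
```

For example, samples of 70, 72, 74 and 76 percent taken at hours 0 to 3 grow by 2 points per hour, so the 90 percent threshold is projected about 7 hours after the last sample; a warning could be sent when the projected time falls below a chosen lead time.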
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (2)
1. An operation and maintenance monitoring and management system, comprising: a resource layer, a data storage layer, a technical layer, an application layer and a presentation layer;
the resource layer is used for describing system deployment resources and monitoring server resources, and comprises external field equipment resources;
the data storage layer is used for storing the acquired data in a classified manner;
the technical layer is used for data acquisition, data processing, user management and authority management;
the application layer is used for the monitoring overview, resource management, data monitoring, service interfaces, host portrait, equipment management, system logs and system management;
the presentation layer is used for visual design presentation through a PC and a large screen.
2. The operation and maintenance monitoring and management system according to claim 1, wherein the monitoring server resource obtains the monitored resource from the server by means of a data probe.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047374.3A CN114389937A (en) | 2022-01-17 | 2022-01-17 | Operation and maintenance monitoring and management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047374.3A CN114389937A (en) | 2022-01-17 | 2022-01-17 | Operation and maintenance monitoring and management system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114389937A true CN114389937A (en) | 2022-04-22 |
Family
ID=81202074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210047374.3A Withdrawn CN114389937A (en) | 2022-01-17 | 2022-01-17 | Operation and maintenance monitoring and management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114389937A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114827678A (en) * | 2022-04-29 | 2022-07-29 | 广东省广播电视网络股份有限公司中山分公司 | Operation and maintenance monitoring and analyzing system for digital television front-end platform |
CN115442223A (en) * | 2022-07-19 | 2022-12-06 | 写逸网络科技(上海)有限公司 | Automatic operation and maintenance method for distributed cluster |
CN115811458A (en) * | 2022-11-17 | 2023-03-17 | 浪潮云信息技术股份公司 | Monitoring method and system based on springboot micro-service |
-
2022
- 2022-01-17 CN CN202210047374.3A patent/CN114389937A/en not_active Withdrawn
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114389937A (en) | Operation and maintenance monitoring and management system | |
US6871228B2 (en) | Methods and apparatus in distributed remote logging system for remote adhoc data analysis customized with multilevel hierarchical logger tree | |
RU2636848C2 (en) | Method of estimating power consumption | |
Castelli et al. | Proactive management of software aging | |
CN104506393B (en) | A kind of system monitoring method based on cloud platform | |
US5668944A (en) | Method and system for providing performance diagnosis of a computer system | |
US5495607A (en) | Network management system having virtual catalog overview of files distributively stored across network domain | |
US7251588B2 (en) | System for metric introspection in monitoring sources | |
US7917536B2 (en) | Systems, methods and computer program products for managing a plurality of remotely located data storage systems | |
CN111209011A (en) | Cross-platform container cloud automatic deployment system | |
CN108365985A (en) | A kind of cluster management method, device, terminal device and storage medium | |
CN106487574A (en) | Automatic operating safeguards monitoring system | |
CN108259270A (en) | A kind of data center's system for unified management design method | |
CN104022903A (en) | One-stop automatic operation and maintaining system | |
EP1759303A2 (en) | Agent-less systems, methods and computer program products for managing a plurality of remotely located data storage systems | |
CN113076229B (en) | General enterprise-level information technology monitoring system | |
CN110598051A (en) | Power industry monitoring system, method and device | |
CN107678915A (en) | A kind of power transmission and transforming equipment monitoring platform basic resource monitoring method | |
CN107704361A (en) | A kind of power transmission and transforming equipment monitoring platform basic resource monitoring system | |
US7779063B2 (en) | Automatic benefit analysis of dynamic cluster management solutions | |
WO2019241199A1 (en) | System and method for predictive maintenance of networked devices | |
CN115934464A (en) | Information platform monitoring and collecting system | |
Yuan et al. | Design and implementation of accelerator control monitoring system | |
Doliwa et al. | Network monitoring and management for company with hybrid and distributed infrastructure | |
CN116737514B (en) | Automatic operation and maintenance method based on log and probe analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220422 |