CN110287079A - Automated cluster monitoring system and method - Google Patents

Automated cluster monitoring system and method

Info

Publication number
CN110287079A
Authority
CN
China
Prior art keywords
data
server
cluster
agent
early warning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910402304.3A
Other languages
Chinese (zh)
Inventor
杨杰
卢宇彤
杜云飞
颜辉
曾凌波
彭运勇
蒋迁谦
王红颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University
Priority to CN201910402304.3A
Publication of CN110287079A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/328 Computer systems status display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/875 Monitoring of systems including the internet

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention relates to the field of network monitoring, and more particularly to an automated cluster monitoring system and method. The system comprises a data acquisition module, a main monitoring server, a database server cluster, a web server and a front-end server. The data acquisition module used by the invention is simple to deploy and lends itself to large-scale monitoring deployment, with high accuracy and real-time performance. The invention is highly scalable: the monitoring function modules are mutually independent and loosely coupled, which makes it easy to extend system functions. The operating status of the entire cluster can be viewed in a browser without downloading a client, which is simple and efficient.

Description

Automated cluster monitoring system and method
Technical field
The present invention relates to the field of network monitoring, and more particularly to an automated cluster monitoring system and method.
Background art
With the rapid development of hardware technology and the falling cost of hardware, high-performance computing clusters keep growing in scale, and cluster management becomes increasingly complex. Managing a cluster means knowing the running state of each key service in the cluster, monitoring the logs of key services, and tracking the hardware state of each server (load, memory usage, etc.) and the resource usage of the entire cluster. Automated monitoring software is indispensable for gaining a quick and comprehensive understanding of the operating status of the whole cluster.
Supercomputing centers in China mainly use three kinds of cluster monitoring techniques:
(1) Command-line monitoring: commands must be run by hand to check the system state. Because it depends on manual execution, command-line monitoring is not real-time.
(2) Script monitoring: the various system-check commands are combined logically into an executable program, which the operating system then invokes periodically to check the system state. Script monitoring removes the need for manual execution, but it lacks a visual monitoring interface and cannot actively push early-warning and alarm information.
(3) Automated monitoring: in such systems the underlying data acquisition is still script-based. Scripts are placed in the system's periodic tasks to collect data at regular intervals, and a web service is provided. Although the basic state of the system can be seen on a front-end page, the timeliness and correctness of the monitoring data are poor, there is no unified data acquisition agent, and early-warning and alarm information cannot be actively pushed.
At present, automated monitoring at supercomputing centers in China commonly suffers from three problems: monitoring data collection is not automatic, early-warning and alarm information is not actively pushed, and there is no unified monitoring page that presents the system state.
Summary of the invention
To overcome the deficiencies of the prior art in automated monitoring, namely that monitoring data collection is not automatic, that early-warning and alarm information is not actively pushed, and that there is no unified monitoring page presenting the system state, the present invention provides an automated cluster monitoring system.
To solve the above technical problems, the technical solution of the present invention is as follows:
An automated cluster monitoring system comprises a data acquisition module, a main monitoring server, a database server cluster, a web server and a front-end server;
The data acquisition module comprises a data acquisition agent, which is deployed on the monitored server cluster and collects data from the monitored server cluster;
The main monitoring server communicates with the data acquisition agent and receives the data collected by the agent;
The database server cluster stores the data that the main monitoring server receives from the data acquisition agent;
The web server retrieves the data in the database server cluster, analyzes and processes it, screens out early-warning and alarm information, and actively pushes that information;
The front-end server visualizes the data from the web server.
Preferably, the main monitoring server comprises two servers, which form a master-slave high-availability architecture through Keepalived+Nginx. While the master server is working, the slave server is in a listening standby state; when the master server goes down, the slave server takes over its work; after the master server recovers, the monitoring service switches smoothly back to the master server.
Preferably, each server of the main monitoring server comprises a main monitoring service data acquisition module and a main monitoring service data filtering module. The main monitoring service data acquisition module processes the data sent by the data acquisition agent and passes it to the main monitoring service data filtering module; the main monitoring service data filtering module processes the data and stores the processed data in the database server cluster.
Preferably, the Python microservice framework nameko is deployed on the web server, and the nameko service contains a timer function and a RESTful API function. The timer function periodically connects to the database server cluster, processes the data in it, screens out early-warning and alarm information from the collected data, and actively pushes that information. The RESTful API function serializes the collected data to JSON for the front-end server to call.
Preferably, the front-end server visualizes the JSON data it obtains through an Nginx+Vue architecture, evaluates the data and marks early-warning and alarm information, and presents the operating status of the cluster as charts, curves and the like.
The present invention also provides an automated cluster monitoring method, comprising the following steps:
Step S1: deploy the data acquisition agent on the monitored servers and start it;
Step S2: the data acquisition agent actively collects the various data on the monitored server cluster and sends the collected data to the designated database server cluster;
Step S3: the timer function on the web server periodically connects to the database server cluster, processes the data in it, and screens out early-warning and alarm information from the data;
Step S4: if the data contains early-warning or alarm information, the timer function actively pushes that information; the RESTful API function serializes the monitoring data to JSON for the front-end server to call; if there is no early-warning or alarm information, step S5 is executed directly;
Step S5: the front-end server obtains data from the web server through the RESTful API, visualizes it through the Nginx+Vue architecture, evaluates the data and marks early-warning and alarm information, and presents the operating status of the cluster as charts, curves and the like.
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
The data acquisition agent used by the invention is simple to deploy and easy to extend. The underlying agents are deployed independently, which makes large-scale monitoring deployment easy, with high accuracy and real-time performance. The monitoring function modules are mutually independent and loosely coupled, which makes it easy to extend system functions. The invention is easy to use: the operating status of the entire cluster can be viewed in a browser without downloading a client, which is simple and efficient.
Brief description of the drawings
Fig. 1 is a structural diagram of the system of the present invention.
Fig. 2 is a flow chart of the method of the present invention.
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent;
To better illustrate the embodiments, some components in the drawings may be omitted, enlarged or reduced, and do not represent the dimensions of the actual product;
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, an automated cluster monitoring system comprises a data acquisition module 1, a main monitoring server 2, a database server cluster 3, a web server 4 and a front-end server 5;
The data acquisition module 1 comprises a data acquisition agent, which is deployed on the monitored server cluster and collects data from the monitored server cluster;
The main monitoring server 2 communicates with the data acquisition agent and receives the data collected by the agent;
The database server cluster 3 stores the data that the main monitoring server 2 receives from the data acquisition agent;
The web server 4 retrieves the data in the database server cluster 3, analyzes and processes it, screens out early-warning and alarm information, and actively pushes that information;
The front-end server 5 visualizes the data from the web server 4.
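As an illustration only, the kind of sample a data acquisition agent might gather can be sketched in Python with the standard library. The metric names, the JSON message layout and all function names below are assumptions made for this sketch; the patent does not specify them. Memory usage is derived from text in the Linux `/proc/meminfo` format.

```python
import json
import os
import socket
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_usage_percent(meminfo):
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


def build_sample(hostname, load1, meminfo_text, timestamp):
    """Assemble one monitoring sample as a JSON string (hypothetical layout)."""
    meminfo = parse_meminfo(meminfo_text)
    return json.dumps({
        "host": hostname,
        "timestamp": timestamp,
        "load1": load1,
        "mem_used_percent": memory_usage_percent(meminfo),
    })


def collect_once():
    """Collect a live sample; this part only works on a Linux host."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    load1 = os.getloadavg()[0]
    return build_sample(socket.gethostname(), load1, meminfo_text, time.time())


# Fixed input, so the result does not depend on the machine running the sketch.
SAMPLE_MEMINFO = "MemTotal:       16000000 kB\nMemAvailable:    4000000 kB\n"
message = build_sample("node01", 0.75, SAMPLE_MEMINFO, 1557800000)
```

In a deployment along the lines described above, the agent would send such a message to the main monitoring server 2 at regular intervals; the transport is left out here because the patent does not name one.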
As a preferred embodiment, the main monitoring server 2 comprises two servers, which form a master-slave high-availability architecture through Keepalived+Nginx. While the master server is working, the slave server is in a listening standby state; when the master server goes down, the slave server takes over its work; after the master server recovers, the monitoring service switches smoothly back to the master server.
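The failover and failback behavior described here can be modeled as a very small state machine. The Python below is a toy model of that behavior only: it is not how Keepalived is implemented (Keepalived relies on VRRP), and the class and method names are hypothetical.

```python
class HaPair:
    """Toy model of a master/slave pair behind one service address."""

    def __init__(self):
        self.master_up = True
        self.slave_up = True

    def active(self):
        """Return which server currently handles monitoring traffic."""
        if self.master_up:
            return "master"   # the master is preferred whenever it is healthy
        if self.slave_up:
            return "slave"    # the slave takes over while the master is down
        return None           # both down: the service is unavailable

    def set_health(self, server, is_up):
        """Record a health change for one of the two servers."""
        if server == "master":
            self.master_up = is_up
        elif server == "slave":
            self.slave_up = is_up
        else:
            raise ValueError(f"unknown server: {server}")


pair = HaPair()
history = [pair.active()]          # normal operation
pair.set_health("master", False)   # the master goes down
history.append(pair.active())
pair.set_health("master", True)    # the master recovers
history.append(pair.active())
```

The `history` list records the three phases in the paragraph above: the master serves, the slave takes over, and service returns to the master after recovery.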
As a preferred embodiment, each server of the main monitoring server 2 comprises a main monitoring service data acquisition module 1 and a main monitoring service data filtering module. The main monitoring service data acquisition module 1 processes the data sent by the data acquisition agent and passes it to the main monitoring service data filtering module; the main monitoring service data filtering module processes the data and stores the processed data in the database server cluster 3.
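A minimal sketch of this acquire, filter and store pipeline follows. It makes several assumptions that are not in the patent: agent messages are JSON objects with the four fields listed in `REQUIRED_FIELDS`, filtering means rejecting malformed or implausible records, and an in-memory SQLite database stands in for the database server cluster 3.

```python
import json
import sqlite3

REQUIRED_FIELDS = ("host", "timestamp", "load1", "mem_used_percent")


def acquire(raw_message):
    """Acquisition step: decode one agent message, or return None if malformed."""
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def filter_record(record):
    """Filtering step: keep only complete records with plausible values."""
    if record is None or any(field not in record for field in REQUIRED_FIELDS):
        return None
    numeric = (record["load1"], record["mem_used_percent"])
    if not all(isinstance(value, (int, float)) for value in numeric):
        return None
    if not 0 <= record["mem_used_percent"] <= 100 or record["load1"] < 0:
        return None
    return {field: record[field] for field in REQUIRED_FIELDS}


def store(conn, record):
    """Storage step: insert one cleaned record using named parameters."""
    conn.execute(
        "INSERT INTO samples (host, timestamp, load1, mem_used_percent) "
        "VALUES (:host, :timestamp, :load1, :mem_used_percent)",
        record,
    )


conn = sqlite3.connect(":memory:")  # stand-in for the database server cluster
conn.execute(
    "CREATE TABLE samples "
    "(host TEXT, timestamp REAL, load1 REAL, mem_used_percent REAL)"
)

incoming = [
    '{"host": "node01", "timestamp": 1557800000, "load1": 0.75, "mem_used_percent": 75.0}',
    'not json at all',
    '{"host": "node02", "timestamp": 1557800000, "load1": 1.2}',
    '{"host": "node03", "timestamp": 1557800000, "load1": 0.1, "mem_used_percent": 140.0}',
]
for raw in incoming:
    cleaned = filter_record(acquire(raw))
    if cleaned is not None:
        store(conn, cleaned)

stored_hosts = [row[0] for row in conn.execute("SELECT host FROM samples")]
```

Keeping acquisition and filtering as separate functions mirrors the two modules named in the text, so either one can be changed without touching the other.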
As a preferred embodiment, the Python microservice framework nameko is deployed on the web server 4, and the nameko service contains a timer function and a RESTful API function. The timer function periodically connects to the database server cluster 3, processes the data in it, screens out early-warning and alarm information from the collected data, and actively pushes that information. The RESTful API function serializes the collected data to JSON for the front-end server 5 to call.
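The two functions can be sketched as plain Python so that the example runs without any framework installed; in a nameko service they would presumably be attached to the framework's timer and HTTP entrypoints. The threshold values, the use of memory usage as the screened metric and the message format are hypothetical, since the patent does not state how early-warning and alarm levels are decided.

```python
import json

# Hypothetical thresholds; the patent does not specify any values.
WARNING_MEM_PERCENT = 80.0
ALARM_MEM_PERCENT = 95.0


def classify(sample):
    """Return 'alarm', 'warning' or 'ok' for one stored sample."""
    used = sample["mem_used_percent"]
    if used >= ALARM_MEM_PERCENT:
        return "alarm"
    if used >= WARNING_MEM_PERCENT:
        return "warning"
    return "ok"


def timer_tick(samples, push):
    """Body of the periodic task: screen samples and push what needs attention."""
    pushed = 0
    for sample in samples:
        level = classify(sample)
        if level != "ok":
            push(f"[{level}] {sample['host']}: memory {sample['mem_used_percent']}%")
            pushed += 1
    return pushed


def api_samples(samples):
    """Body of the RESTful endpoint: serialize samples to JSON for the front end."""
    return json.dumps([dict(sample, level=classify(sample)) for sample in samples])


samples = [
    {"host": "node01", "mem_used_percent": 42.0},
    {"host": "node02", "mem_used_percent": 85.5},
    {"host": "node03", "mem_used_percent": 97.0},
]
outbox = []  # stands in for whatever delivery channel carries the pushed messages
pushed_count = timer_tick(samples, outbox.append)
payload = api_samples(samples)
```

Passing the delivery channel in as the `push` callable keeps the screening logic independent of how notifications are actually sent.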
As a preferred embodiment, the front-end server 5 visualizes the JSON data it obtains through an Nginx+Vue architecture, evaluates the data and marks early-warning and alarm information, and presents the operating status of the cluster as charts, curves and the like.
Embodiment 2
As shown in Fig. 2, this embodiment further provides an automated cluster monitoring method, comprising the following steps:
Step S1: deploy the data acquisition agent on the monitored servers and start it;
Step S2: the data acquisition agent actively collects the various data on the monitored server cluster and sends the collected data to the designated database server cluster;
Step S3: the timer function on the web server 4 periodically connects to the database server cluster 3, processes the data in it, and screens out early-warning and alarm information from the data;
Step S4: if the data contains early-warning or alarm information, the timer function actively pushes that information; the RESTful API function serializes the monitoring data to JSON for the front-end server 5 to call; if there is no early-warning or alarm information, step S5 is executed directly;
Step S5: the front-end server 5 obtains data from the web server 4 through the RESTful API, visualizes it through the Nginx+Vue architecture, evaluates the data and marks early-warning and alarm information, and presents the operating status of the cluster as charts, curves and the like.
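The control flow of steps S3 to S5 can be illustrated with a short Python sketch. It reads step S4 as pushing only when something was screened out while always producing JSON for the front end, which is one interpretation of the text. The load limit, the record layout and the function names are hypothetical, and the front-end marking is written in Python here only to keep the sketch in one language; the embodiment uses Vue for that part.

```python
import json


def screen(records, limit):
    """Step S3: pick out records whose load exceeds the limit."""
    return [record for record in records if record["load1"] > limit]


def run_cycle(records, limit, push):
    """Steps S3 and S4: push alerts only if there are any, then serialize."""
    alerts = screen(records, limit)
    if alerts:                     # S4: push only when something was found
        push(alerts)
    return json.dumps(records)     # S4: JSON for the front end in either case


def mark_series(json_text, limit):
    """Step S5 (front-end side): flag the points a chart should highlight."""
    return [
        {"x": record["timestamp"], "y": record["load1"],
         "flagged": record["load1"] > limit}
        for record in json.loads(json_text)
    ]


pushed = []
quiet = [{"timestamp": 1, "load1": 0.4}, {"timestamp": 2, "load1": 0.6}]
busy = [{"timestamp": 3, "load1": 0.5}, {"timestamp": 4, "load1": 9.1}]

series_quiet = mark_series(run_cycle(quiet, 8.0, pushed.append), 8.0)
series_busy = mark_series(run_cycle(busy, 8.0, pushed.append), 8.0)
```

With these inputs only the second cycle triggers a push, while both cycles yield a series that a chart component could draw, with the overloaded point flagged.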
The same or similar reference numerals correspond to the same or similar components;
The terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly, and are not a limitation on its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (6)

1. An automated cluster monitoring system, characterized by comprising a data acquisition module (1), a main monitoring server (2), a database server cluster (3), a web server (4) and a front-end server (5);
The data acquisition module (1) comprises a data acquisition agent, which is deployed on the monitored server cluster and collects data from the monitored server cluster;
The main monitoring server (2) communicates with the data acquisition agent and receives the data collected by the agent;
The database server cluster (3) stores the data that the main monitoring server (2) receives from the data acquisition agent;
The web server (4) retrieves the data in the database server cluster (3), analyzes and processes it, screens out early-warning and alarm information, and actively pushes that information;
The front-end server (5) visualizes the data from the web server (4).
2. The automated cluster monitoring system according to claim 1, characterized in that the main monitoring server (2) comprises two servers, which form a master-slave high-availability architecture through Keepalived+Nginx.
3. The automated cluster monitoring system according to claim 2, characterized in that each server of the main monitoring server (2) comprises a main monitoring service data acquisition module (1) and a main monitoring service data filtering module; the main monitoring service data acquisition module (1) processes the data sent by the data acquisition agent and passes it to the main monitoring service data filtering module; the main monitoring service data filtering module processes the data and stores the processed data in the database server cluster (3).
4. The automated cluster monitoring system according to claim 2, characterized in that the Python microservice framework nameko is deployed on the web server (4), and the nameko service contains a timer function and a RESTful API function; the timer function periodically connects to the database server cluster (3), processes the data in the database server cluster (3), screens out early-warning and alarm information from the collected data, and actively pushes that information; the RESTful API function serializes the collected data to JSON for the front-end server (5) to call.
5. The automated cluster monitoring system according to claim 3, characterized in that the front-end server (5) visualizes the JSON data it obtains through an Nginx+Vue architecture, evaluates the data and marks early-warning and alarm information, and presents the operating status of the cluster as charts, curves and the like.
6. An automated cluster monitoring method, characterized in that the method is based on the system according to any one of claims 1 to 5 and comprises the following steps:
Step S1: deploy the data acquisition agent on the monitored servers and start it;
Step S2: the data acquisition agent actively collects the various data on the monitored server cluster and sends the collected data to the designated database server cluster (3);
Step S3: the timer function on the web server (4) periodically connects to the database server cluster (3), processes the data in the database server cluster (3), and screens out early-warning and alarm information from the data;
Step S4: if the data contains early-warning or alarm information, the timer function actively pushes that information; the RESTful API function serializes the monitoring data to JSON for the front-end server (5) to call; if there is no early-warning or alarm information, step S5 is executed directly;
Step S5: the front-end server (5) obtains data from the web server (4) through the RESTful API, visualizes it through the Nginx+Vue architecture, evaluates the data and marks early-warning and alarm information, and presents the operating status of the cluster as charts, curves and the like.
CN201910402304.3A 2019-05-14 2019-05-14 Automated cluster monitoring system and method Pending CN110287079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910402304.3A CN110287079A (en) 2019-05-14 2019-05-14 Automated cluster monitoring system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910402304.3A CN110287079A (en) 2019-05-14 2019-05-14 Automated cluster monitoring system and method

Publications (1)

Publication Number Publication Date
CN110287079A (en) 2019-09-27

Family

ID=68001893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910402304.3A Pending CN110287079A (en) 2019-05-14 2019-05-14 Automated cluster monitoring system and method

Country Status (1)

Country Link
CN (1) CN110287079A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110798348A (en) * 2019-10-28 2020-02-14 海南电网有限责任公司 Fault warning method, server and system for power distribution communication network
CN111949487A (en) * 2020-08-14 2020-11-17 杭州溪塔科技有限公司 Block chain monitoring system and method with dynamically pluggable modules

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106453504A (en) * 2016-09-13 2017-02-22 杭州东方通信软件技术有限公司 Monitoring system and method based on NGINX server cluster
CN106993037A (en) * 2017-03-31 2017-07-28 山东超越数控电子有限公司 Method for achieving high availability of a load-balancing server based on a distributed system
US20180048545A1 (en) * 2016-08-11 2018-02-15 Hewlett Packard Enterprise Development Lp Container monitoring configuration deployment
CN107943668A (en) * 2017-12-15 2018-04-20 江苏神威云数据科技有限公司 Computer server cluster log monitoring method and monitoring platform


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190927
