CN105718351B - A distributed monitoring and management system for Hadoop clusters - Google Patents
- Publication number
- CN105718351B (application number CN201610010050.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- data
- monitoring
- distributed
- hadoop
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Claims (10)
- 1. A distributed monitoring and management system for a Hadoop cluster system, characterized by comprising a performance monitoring module, a fault alarm module, a comprehensive analysis and query module, an overview display module, a data storage module, a configuration management module, and a system management module, wherein: the performance monitoring module monitors the performance of each monitored node in the distributed cluster system and stores the collected monitoring data in the data storage module, the monitoring data including server resources, Hadoop Metrics, Hadoop component logs, and logs of other components; the fault alarm module raises fault alarms based on the monitoring data stored in the data storage module, or receives alarm data sent by monitoring nodes and monitored nodes that are part of, or independent of, the distributed cluster system, stores the received alarm data in the data storage module, and raises fault alarms based on that data; fault alarming includes detecting the monitoring platform status, faulty nodes, crashed processes, and failed services, recording faults and how they were handled, and notifying administrators of the corresponding level to handle faults of each severity level; the alarm data includes module information, Hadoop cluster status information, and server information; the comprehensive analysis and query module reads the monitoring data or alarm data in the data storage module, performs computational analysis, and stores the analysis results in the data storage module; the data storage module stores the monitoring data and the alarm data; the overview display module displays the analysis results of the comprehensive analysis and query module by calling that module to obtain the various indicator data and visualizing the results; the system management module performs user management and permission management: configuration management of the distributed-cluster Hadoop platform is available only to system administrators, while ordinary users can only monitor the platform; and the configuration management module performs unified configuration of the distributed cluster system by implementing a distributed unified configuration service based on ZooKeeper.
- 2. The distributed monitoring and management system according to claim 1, characterized in that the performance monitoring module comprises a collection module and an aggregation module; the collection module reads the monitoring data of the monitored nodes and transmits the collected monitoring data to the aggregation module; and the aggregation module gathers the monitoring data, summarizes it, and stores it in the data storage module.
- 3. The distributed monitoring and management system according to claim 1, characterized in that the fault alarm module scans the data in the data storage module, determines the level and type of the alarm information, and sends an SMS or e-mail alarm; or receives alarm data transmitted by the alarm information collection modules on the monitoring nodes and monitored nodes, stores the received alarm data in the data storage module, and sends an SMS or e-mail alarm according to the level and type of the alarm data.
- 4. The distributed monitoring and management system according to claim 1, characterized in that the overview display module presents one of the following displays, or a combination of them: (1) today's alarm statistics: the current cluster fault status shown as a bar chart, giving the number of faulty servers, failed services, and faulty components; (2) cluster server status: cluster servers are classified into three states, namely normal, faulty, and high load; (3) unresolved alarm list: all unresolved alarms; (4) resource usage time-series charts with adjustable granularity, including CPU utilization and memory utilization.
- 5. The distributed monitoring and management system according to claim 1, characterized in that the data storage module comprises RRD and MySQL; monitoring data is stored in RRD, and alarm data is stored in MySQL.
- 6. A distributed monitoring and management method for a Hadoop cluster system, implemented by the distributed monitoring and management system according to any one of claims 1 to 5, characterized by comprising the following steps: Step 1: monitor the monitored nodes in the distributed cluster system and store the monitoring data in the data storage module, the monitoring data including server resources, Hadoop Metrics, Hadoop component logs, and logs of other components; Step 2: raise fault alarms based on the stored monitoring data, or receive alarm data transmitted by the alarm information collection modules on monitoring nodes and monitored nodes that are part of, or independent of, the distributed cluster system, store the received alarm data in the data storage module, and raise fault alarms based on that alarm data; fault alarming includes detecting the monitoring platform status, faulty nodes, crashed processes, and failed services, recording faults and how they were handled, and notifying administrators of the corresponding level to handle faults of each severity level; the alarm data includes module information, Hadoop cluster status information, and server information; Step 3: read the monitoring data or alarm data in the data storage module, perform computational analysis, and save the analysis results; Step 4: display the analysis results of the comprehensive analysis and query module by calling that module to obtain the various indicator data and visualizing the results; Step 5: perform user management and permission management: configuration management of the distributed-cluster Hadoop platform is available only to system administrators, while ordinary users can only monitor the platform; Step 6: perform unified configuration of the distributed cluster system by implementing a distributed unified configuration service based on ZooKeeper.
- 7. The distributed monitoring and management method according to claim 6, characterized in that the monitoring in Step 1 comprises reading the monitoring data of the monitored nodes, then aggregating and storing the collected monitoring data.
- 8. The distributed monitoring and management method according to claim 6, characterized in that the fault alarming in Step 2 specifically comprises scanning the data in the data storage module, determining the level and type of the alarm information, and sending an SMS or e-mail alarm; or receiving alarm data transmitted by the alarm information collection modules on the monitoring nodes and monitored nodes, storing the received alarm data in the data storage module, and sending an SMS or e-mail alarm according to the level and type of the alarm data.
- 9. The distributed monitoring and management method according to claim 6, characterized in that the overview display in Step 4 comprises one of the following displays, or a combination of them: (1) today's alarm statistics: the current cluster fault status shown as a bar chart, giving the number of faulty servers, failed services, and faulty components; (2) cluster server status: cluster servers are classified into three states, namely normal, faulty, and high load; (3) unresolved alarm list: all unresolved alarms; (4) resource usage time-series charts with adjustable granularity, including CPU utilization and memory utilization.
- 10. The distributed monitoring and management method according to claim 6, characterized in that the data storage module comprises RRD and MySQL; monitoring data is stored in RRD, and alarm data is stored in MySQL.
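Claims 2 and 7 divide performance monitoring into a collection module on the monitored nodes and an aggregation module that gathers, summarizes, and stores the data, without specifying how the summary is computed. A minimal Python sketch of that split follows, assuming that samples are buffered per (node, metric) pair and reduced to a count, average, and maximum; `Sample`, `collect`, `Aggregator`, and the metric names are hypothetical illustrations, not part of the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One metric reading taken by a collector on a monitored node."""
    node: str
    metric: str      # e.g. "cpu_util" or "mem_util" (illustrative names)
    value: float


def collect(node, readings):
    """Hypothetical collection module: turn raw readings on one node into samples."""
    return [Sample(node, metric, value) for metric, value in readings.items()]


class Aggregator:
    """Hypothetical aggregation module: buffers samples from many collectors,
    summarizes them per (node, metric), and writes the summaries to a store."""

    def __init__(self, store):
        self.store = store                  # stands in for the data storage module
        self.pending = defaultdict(list)    # (node, metric) -> buffered raw values

    def receive(self, samples):
        for s in samples:
            self.pending[(s.node, s.metric)].append(s.value)

    def flush(self):
        # Reduce each buffered series to a summary record, store it, then reset.
        for (node, metric), values in self.pending.items():
            self.store.append({
                "node": node,
                "metric": metric,
                "count": len(values),
                "avg": mean(values),
                "max": max(values),
            })
        self.pending.clear()
```

A plain list stands in for the data storage module; under claim 5 the summaries would presumably be written to the RRD store instead.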
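Claims 3 and 8 have the fault alarm module scan stored data, determine the level and type of each alarm, and send SMS or e-mail notifications, and claim 1 adds that faults of different levels go to administrators of different levels. The patent gives no concrete levels, alarm types, or escalation rules, so the Python sketch below assumes a three-level scale and a fixed routing table. Every name in it (`Level`, `KIND_LEVEL`, `ROUTES`, `scan`, the recipients) is hypothetical, and `route` returns the notifications instead of calling a real SMS or mail gateway.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Assumed three-level severity scale; the patent does not define levels."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Alarm:
    kind: str      # alarm type, e.g. "node_down" or "process_crash"
    target: str    # affected node, process, or service
    level: Level


# Hypothetical mapping from alarm type to severity level.
KIND_LEVEL = {
    "node_down": Level.CRITICAL,
    "service_failed": Level.CRITICAL,
    "process_crash": Level.WARNING,
    "high_load": Level.INFO,
}

# Hypothetical escalation table: (channel, recipient) pairs per severity level.
ROUTES = {
    Level.INFO: [("email", "ops-team")],
    Level.WARNING: [("email", "ops-team"), ("sms", "on-call-admin")],
    Level.CRITICAL: [("email", "ops-team"), ("sms", "on-call-admin"),
                     ("sms", "cluster-owner")],
}


def classify(record):
    """Turn a raw record scanned from storage into an Alarm, or None if benign."""
    kind = record.get("kind")
    if kind not in KIND_LEVEL:
        return None
    return Alarm(kind, record["target"], KIND_LEVEL[kind])


def route(alarm):
    """Return (channel, recipient, message) tuples rather than sending them."""
    msg = f"[{alarm.level.name}] {alarm.kind} on {alarm.target}"
    return [(channel, who, msg) for channel, who in ROUTES[alarm.level]]


def scan(records):
    """Scan stored records, classify each one, and collect the notifications."""
    out = []
    for record in records:
        alarm = classify(record)
        if alarm is not None:
            out.extend(route(alarm))
    return out
```

Keeping the escalation policy in a table rather than in branching logic lets it be changed without touching the scan loop.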
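Claims 4 and 9 name three server states (normal, faulty, high load) and resource usage charts whose granularity can be changed, but do not define thresholds or say how the granularity changes. The Python sketch below assumes fixed CPU and memory cut-offs and implements adjustable granularity as averaging into fixed-width time buckets; the threshold values and the `server_state` and `downsample` names are hypothetical.

```python
from statistics import mean

# Assumed thresholds (percent); the patent names the three states but gives no cut-offs.
HIGH_LOAD_CPU = 85.0
HIGH_LOAD_MEM = 90.0


def server_state(alive, cpu_util, mem_util):
    """Classify a server as 'fault', 'high_load', or 'normal'."""
    if not alive:
        return "fault"
    if cpu_util >= HIGH_LOAD_CPU or mem_util >= HIGH_LOAD_MEM:
        return "high_load"
    return "normal"


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    A larger bucket_seconds gives a coarser chart; each bucket is labeled
    with its start time.
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds
        buckets.setdefault(start, []).append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

Re-bucketing the same raw series with a larger `bucket_seconds` yields a coarser chart without storing a second copy of the data.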
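Claims 5 and 10 store monitoring data in RRD and alarm data in MySQL. RRD here presumably refers to a round-robin database of the kind provided by RRDtool, which keeps a fixed number of consolidated rows per archive so that its size never grows. The Python class below is a simplified, hypothetical illustration of that idea (one archive, AVERAGE consolidation, in-order timestamps); it is not the RRDtool file format or API.

```python
class RoundRobinArchive:
    """Fixed-size, RRD-style archive: samples within one step are averaged
    into a single row, and rows live in a ring buffer, so the oldest row is
    overwritten once all slots are used. Hypothetical illustration only."""

    def __init__(self, step, rows):
        self.step = step                # seconds covered by one row
        self.rows = rows                # fixed number of rows kept
        self.times = [None] * rows      # ring buffer of row start times
        self.values = [None] * rows     # ring buffer of consolidated values
        self._acc = []                  # raw samples for the step in progress
        self._acc_start = None

    def update(self, ts, value):
        """Add a sample; assumes timestamps arrive in non-decreasing order."""
        slot_start = ts - ts % self.step
        if self._acc_start is not None and slot_start != self._acc_start:
            self._consolidate()
        self._acc_start = slot_start
        self._acc.append(value)

    def _consolidate(self):
        # AVERAGE consolidation of the finished step into its ring-buffer slot.
        idx = (self._acc_start // self.step) % self.rows
        self.times[idx] = self._acc_start
        self.values[idx] = sum(self._acc) / len(self._acc)
        self._acc = []

    def fetch(self):
        """Return the stored (start_time, value) rows in chronological order."""
        return sorted((t, v) for t, v in zip(self.times, self.values) if t is not None)
```

The fixed `rows` count is the design point: old rows are overwritten rather than accumulated, which keeps metric storage bounded, whereas alarm records that must be kept and queried individually fit a relational store such as MySQL.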
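Claim 1 (and Step 5 of claim 6) restricts configuration management of the Hadoop platform to system administrators and gives ordinary users monitoring access only. A minimal role-check sketch in Python follows; `Role`, `PERMISSIONS`, `require`, `update_config`, and `view_metrics` are hypothetical names, and the patent does not say how users or roles are stored.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    USER = "user"


# Hypothetical permission sets following claim 1:
# configuration is admin-only, monitoring is open to every user.
PERMISSIONS = {
    Role.ADMIN: {"monitor", "configure"},
    Role.USER: {"monitor"},
}


class PermissionDenied(Exception):
    pass


def require(role, action):
    """Raise PermissionDenied unless the role may perform the action."""
    if action not in PERMISSIONS[role]:
        raise PermissionDenied(f"{role.value} may not {action}")


def update_config(role, config, key, value):
    """Change one configuration entry; administrators only."""
    require(role, "configure")
    config[key] = value
    return config


def view_metrics(role, metrics):
    """Return a copy of the current metrics; any user may monitor."""
    require(role, "monitor")
    return dict(metrics)
```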
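Claim 1 (and Step 6 of claim 6) implements unified configuration as a distributed configuration service based on ZooKeeper. A common ZooKeeper pattern keeps each configuration item in a znode and has every node watch it, so a change made once reaches all nodes. The Python sketch below models that pattern in process with a hypothetical `ConfigStore` class; it does not connect to a ZooKeeper ensemble, and unlike native ZooKeeper watches, which fire once and must be re-registered, its callbacks stay registered. The znode path and `NodeAgent` class are illustrative assumptions.

```python
class ConfigStore:
    """In-memory stand-in for a ZooKeeper-style configuration tree.

    Each path holds (data, version); every successful set bumps the version
    and notifies the callbacks watching that path. Not a ZooKeeper client.
    """

    def __init__(self):
        self._nodes = {}      # path -> (data, version)
        self._watchers = {}   # path -> list of callbacks

    def set(self, path, data, expected_version=None):
        current = self._nodes.get(path)
        version = current[1] if current else -1
        # Optional optimistic-concurrency check, similar in spirit to the
        # version argument ZooKeeper accepts on updates.
        if expected_version is not None and expected_version != version:
            raise ValueError(f"version conflict on {path}: have {version}")
        self._nodes[path] = (data, version + 1)
        for callback in self._watchers.get(path, []):
            callback(path, data, version + 1)

    def get(self, path):
        return self._nodes[path]

    def watch(self, path, callback):
        self._watchers.setdefault(path, []).append(callback)


class NodeAgent:
    """A monitored node that applies whatever configuration it is notified of."""

    def __init__(self, name, store, path):
        self.name = name
        self.applied = {}
        store.watch(path, self.on_change)

    def on_change(self, path, data, version):
        self.applied[path] = (data, version)
```

The `expected_version` check lets two administrators' concurrent edits be detected instead of one silently overwriting the other.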
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610010050.7A (CN105718351B) | 2016-01-08 | 2016-01-08 | A distributed monitoring and management system for Hadoop clusters |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105718351A | 2016-06-29 |
CN105718351B | 2018-02-09 |
Family ID: 56147721
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610010050.7A (CN105718351B, Active) | 2016-01-08 | 2016-01-08 | A distributed monitoring and management system for Hadoop clusters |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105718351B |
Families Citing this family (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106375113B * | 2016-08-25 | 2020-01-17 | 新华三技术有限公司 | Method, device and system for recording faults of distributed equipment |
CN106407075B * | 2016-09-19 | 2019-09-13 | 广州视源电子科技股份有限公司 | Management method and system for big data platform |
CN106487597A * | 2016-10-26 | 2017-03-08 | 努比亚技术有限公司 | A service monitoring system and method based on Zookeeper |
CN106453377B * | 2016-10-28 | 2021-03-02 | 中金云金融(北京)大数据科技股份有限公司 | Block chain based distributed network intelligent monitoring system and method |
CN106776288B * | 2016-11-25 | 2019-11-19 | 北京航空航天大学 | A health metric method for Hadoop-based distributed systems |
CN106533792A * | 2016-12-12 | 2017-03-22 | 北京锐安科技有限公司 | Method and device for monitoring and configuring resources |
CN108255661A * | 2016-12-29 | 2018-07-06 | 北京京东尚科信息技术有限公司 | A method and system for implementing Hadoop cluster monitoring |
CN107135119B * | 2017-04-18 | 2020-05-05 | 国网福建省电力有限公司 | Business response tracking and interface state monitoring development system |
CN107168847A * | 2017-04-21 | 2017-09-15 | 国家电网公司 | A full-link application monitoring method and device supporting distributed architectures |
CN107483568A * | 2017-08-04 | 2017-12-15 | 中兴软创科技股份有限公司 | A flexibly schedulable network and service monitoring system based on a cloud platform |
CN107729096A * | 2017-09-20 | 2018-02-23 | 中国银行股份有限公司 | Shunting information method and system |
CN109697070B * | 2017-10-23 | 2022-02-18 | 中移(苏州)软件技术有限公司 | Ambari-based cluster management method, device and medium |
CN107908526A * | 2017-10-26 | 2018-04-13 | 北京人大金仓信息技术股份有限公司 | Centralized large-scale cluster monitoring early-warning system based on Web |
CN108111600A * | 2017-12-20 | 2018-06-01 | 山东浪潮云服务信息科技有限公司 | A data management method and intelligent operation platform |
CN108134697B * | 2017-12-21 | 2021-01-19 | 四川管理职业学院 | Hadoop architecture cloud platform risk assessment and early warning method |
CN108390907B * | 2018-01-09 | 2021-06-22 | 浙江航天恒嘉数据科技有限公司 | Management monitoring system and method based on Hadoop cluster |
CN108418710B * | 2018-02-09 | 2021-03-26 | 北京奇艺世纪科技有限公司 | Distributed monitoring system, method and device |
CN108459944A * | 2018-03-29 | 2018-08-28 | 中科创能实业有限公司 | System operation monitoring method, device and server |
CN108449438B * | 2018-05-22 | 2023-08-22 | 郑州云海信息技术有限公司 | Cluster CDC data monitoring device, system and method |
CN108959048A * | 2018-06-22 | 2018-12-07 | 北京优特捷信息技术有限公司 | Performance analysis method and device for a modular environment, and storage medium |
CN109165137A * | 2018-07-27 | 2019-01-08 | 曙光信息产业(北京)有限公司 | Data analysis and alarm method and system |
CN108763038B * | 2018-08-08 | 2022-04-12 | 平安科技(深圳)有限公司 | Alarm data management method and device, computer equipment and storage medium |
CN109298945A * | 2018-10-17 | 2019-02-01 | 北京京航计算通讯研究所 | Ceph distributed storage monitoring and tuning management method for big data platforms |
CN109347703B * | 2018-11-21 | 2022-05-03 | 中国船舶重工集团公司第七一六研究所 | CPS node fault detection device and method |
CN109726077A * | 2018-12-21 | 2019-05-07 | 中冶建筑研究总院有限公司 | A lightweight safety management and control data platform for enterprise projects |
CN109726211B * | 2018-12-27 | 2020-02-04 | 无锡华云数据技术服务有限公司 | Distributed time sequence database |
CN109885544A * | 2019-01-14 | 2019-06-14 | 中国海洋大学 | A log storage method and system for ocean big data clusters |
CN109951313B * | 2019-01-18 | 2022-04-19 | 长江大学 | Monitoring device and method for Hadoop cloud platform |
CN109886327B * | 2019-02-12 | 2021-11-19 | 北京奇艺世纪科技有限公司 | System and method for processing Java data in distributed system |
CN111694705A * | 2019-03-15 | 2020-09-22 | 北京沃东天骏信息技术有限公司 | Monitoring method, device, equipment and computer readable storage medium |
WO2021102617A1 * | 2019-11-25 | 2021-06-03 | 深圳晶泰科技有限公司 | Multi-public cloud computing platform-oriented cluster monitoring system and monitoring method therefor |
CN112104493A * | 2020-09-07 | 2020-12-18 | 成都精灵云科技有限公司 | Acquisition and analysis system for low-delay host resource monitoring in cluster environment |
CN112328445B * | 2020-10-27 | 2023-11-14 | 许继集团有限公司 | Multi-node management system based on condul |
CN112526974A * | 2020-12-04 | 2021-03-19 | 中国航空工业集团公司成都飞机设计研究所 | Universal test data acquisition system adopting plug-in management architecture |
CN112486776B * | 2020-12-07 | 2024-08-02 | 中国船舶集团有限公司第七一六研究所 | Cluster member node availability monitoring device and method |
CN112636979B * | 2020-12-24 | 2022-08-12 | 北京浪潮数据技术有限公司 | Cluster alarm method and related device |
CN112667430A * | 2021-01-14 | 2021-04-16 | 电子科技大学中山学院 | Big data cluster management method and device |
CN113626280B * | 2021-06-30 | 2024-02-09 | 广东浪潮智慧计算技术有限公司 | Cluster state control method and device, electronic equipment and readable storage medium |
CN113419925A * | 2021-08-25 | 2021-09-21 | 天津南大通用数据技术股份有限公司 | Monitoring method and system for monitoring and alarming multiple distributed MPP clusters |
CN113868099A * | 2021-10-20 | 2021-12-31 | 苏州中科先进技术研究院有限公司 | Data monitoring system |
CN114458968A * | 2021-12-29 | 2022-05-10 | 浙江中控技术股份有限公司 | Alarm integrated management system of oil-gas long-distance pipeline |
CN114584593A * | 2022-03-28 | 2022-06-03 | 中国电子科技集团公司第三十八研究所 | Data acquisition system and method based on cluster state perception |
CN114629812A * | 2022-03-28 | 2022-06-14 | 中国电子科技集团公司第三十八研究所 | Cluster visualization system and method based on autonomous controllable platform |
CN115296868A * | 2022-07-22 | 2022-11-04 | 联通沃音乐文化有限公司 | Music operation background management system and method based on cloud computing |
CN118503073B * | 2024-07-22 | 2024-10-11 | 浙江智臾科技有限公司 | Account separating and charging method based on user-level resource tracking |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103236949A * | 2013-04-27 | 2013-08-07 | 北京搜狐新媒体信息技术有限公司 | Monitoring method, device and system for server cluster |
CN104268695A * | 2014-09-26 | 2015-01-07 | 武汉大学 | Multi-center watershed water environment distributed cluster management system and method |
CN105024877A * | 2015-06-01 | 2015-11-04 | 北京理工大学 | Hadoop malicious node detection system based on network behavior analysis |
- 2016-01-08: Application CN201610010050.7A filed in CN (CN105718351B, Active)
Also Published As
Publication number | Publication date |
---|---|
CN105718351A | 2016-06-29 |
Similar Documents
Publication | Title |
---|---|
CN105718351B | A distributed monitoring and management system for Hadoop clusters |
CN108874640B | Cluster performance evaluation method and device |
CN106487574A | Automated operation and maintenance monitoring system |
CN107943668A | Computer server cluster log monitoring method and monitoring platform |
CN104881352A | System resource monitoring device based on mobile terminal |
US20030135382A1 | Self-monitoring service system for providing historical and current operating status |
CN109783322A | A monitoring and analysis system for the operating status of enterprise information systems, and method thereof |
CN108197261A | A smart traffic operating system |
US20100070981A1 | System and Method for Performing Complex Event Processing |
CN107070692A | A cloud platform monitoring service system and method based on big data analysis |
CN108306980A | A big data log analysis system for engineering flight support |
CN112162907A | Health degree evaluation method based on monitoring index data |
CN106685703A | Data acquisition and visual monitoring intelligent system |
EP1889161A2 | Automated reporting of computer system metrics |
CN108092813A | Server hardware management framework for a data center integrated management system, and implementation method |
CN112688819A | Comprehensive management system for network operation and maintenance |
CN109885453A | Big data platform monitoring system based on stream data processing |
CN101989931A | Operation alarm processing method and device |
KR20150118963A | Queue monitoring and visualization |
CN109240863A | A CPU fault localization method, device, equipment and storage medium |
CN109165137A | Data analysis and alarm method and system |
KR20220166760A | Apparatus and method for managing trouble using big data of 5G distributed cloud system |
CN113608457A | Network operation and maintenance monitoring system |
CN115134262B | RocktMQ monitoring method and device, storage medium and electronic equipment |
CN109951313A | A monitoring device and method for a Hadoop cloud platform |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2019-07-24. Patentee before: BEIJING HUISHANG RONGTONG INFORMATION TECHNOLOGY Co.,Ltd., Chinese Creative Building No. 4, No. 18 Keyuan Road, Economic Development Zone, Daxing District, Beijing 100028. Patentee after: Beijing Xiaodunbird Information Technology Co.,Ltd., Room 206, 2nd floor, No. 18 Keyuan Road, Daxing Economic Development Zone, Beijing 102600. |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A Distributed Monitoring and Management System for Hadoop Cluster. Effective date of registration: 2022-10-28. Granted publication date: 2018-02-09. Pledgee: Shaanxi Pharmaceutical Holding Group Paeon Pharmaceutical Co.,Ltd. Pledgor: Beijing Xiaodunbird Information Technology Co.,Ltd. Registration number: Y2022110000284 |