CN106789398A - A kind of method of media big data hadoop cluster monitoring - Google Patents
A kind of method of media big data hadoop cluster monitoring
- Publication number
- CN106789398A
- Authority
- CN
- China
- Prior art keywords
- monitoring
- hadoop
- index
- short message
- big data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A method for monitoring a media big data Hadoop cluster, relating to the field of system operations and maintenance (O&M). It addresses the problem that a Hadoop cluster cannot be monitored as a whole, so that monitoring results may contain errors and the actual impact of a fault on the system is hard to judge accurately. The invention deploys a monitoring process on each monitored device of the Hadoop cluster, distributes monitoring thresholds, and directs the monitoring process to collect the Hadoop operating status at regular intervals; after automatic analysis and comparison, abnormal events are sent as short messages through a short message gateway to raise alarms. The advantages of the invention are as follows: it establishes a set of Hadoop cluster monitoring indicators and connects them to the centralized monitoring platform for media big data, resolving problems in current media big data Hadoop monitoring such as incomplete indicators and the need to implement monitoring device by device, thereby reducing O&M risk and improving operating efficiency.
Description
Technical field
The present invention relates to the field of system operations and maintenance (O&M), and in particular to a method for monitoring a media big data Hadoop cluster.
Background technology
Big data has now spread worldwide, and Hadoop, as an outstanding big data product, has been introduced into and used in many business areas, such as storing unstructured data and archiving historical data. As business develops, more and more Hadoop clusters will be put into production. Hadoop clusters provide an effective guarantee for the development of media big data business, but traditional monitoring methods struggle to monitor their running status accurately.
At present, the media big data monitoring platform monitors open systems fairly thoroughly. For the newer Hadoop clusters, however, monitoring indicators have not yet been organized into a system, and cluster monitoring relies on ad hoc checks that the O&M department implements on its own, such as monitoring log keywords and monitoring processes. Because a Hadoop cluster is made up of many servers, monitoring it is a major difficulty. As more Hadoop clusters go into production, implementing monitoring device by device for each cluster is inefficient, and the monitoring indicators are also incomplete, which creates hidden operational risks. Moreover, traditional methods can only monitor individual devices; they cannot monitor a Hadoop cluster as a whole, so monitoring results may contain errors and the actual impact of a fault on the system is hard to judge accurately. The invention therefore proposes to build a complete Hadoop monitoring system from the perspective of the cluster as a whole, to sort out how each class of Hadoop monitoring indicator affects the system and the business, and to use the centralized monitoring system for media big data to configure Hadoop cluster monitoring quickly.
Summary of the invention
Conventional methods can only monitor individual devices and cannot monitor a Hadoop cluster as a whole, so monitoring results may contain errors and the actual impact of a fault on the system is hard to judge accurately. To solve this problem, the present invention provides a method for monitoring a media big data Hadoop cluster. The technical scheme is as follows.
The steps of the method are:
Step 1: set up a monitoring management machine and a short message gateway, connect the short message gateway to the monitoring management machine, and connect the monitoring management machine to the Hadoop cluster.
Step 2: deploy the monitoring process, which receives control commands from the monitoring management machine: start, stop, update monitoring thresholds, update monitoring indicators, and update monitoring scripts. The process checks the monitoring interval by time slice and, when the interval has elapsed, runs an indicator collection cycle: it obtains the status of Hadoop's key services through process status query commands; it reads Hadoop's syslog log files and runs the monitoring scripts to extract keywords and key indicators; and it obtains system resource indicators through memory, storage, and CPU utilization query commands. It compares each collected indicator with its threshold; if the threshold is reached, it generates alarm event data and pushes the data to the monitoring management machine.
Step 3: the monitoring management machine provides an operation interface on which the user sets monitoring indicators, thresholds, monitoring scripts, alarm message recipient numbers, and so on, and pushes the indicators, thresholds, and scripts to the monitoring process. It also provides an operation interface on which the user issues start and stop commands, and pushes those commands to the monitoring process. After receiving alarm event data pushed by the monitoring process, it converts the data to the short message gateway interface format, adds the recipient numbers, and sends the result to the short message gateway, which delivers the alarm message.
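The collection cycle in step 2 can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the names `AlarmEvent`, `collect_once`, and `run`, the indicator names, and the "value greater than or equal to threshold" comparison are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlarmEvent:
    # Hypothetical alarm record; the patent does not define its fields.
    indicator: str
    value: float
    threshold: float
    timestamp: float


def collect_once(collectors: Dict[str, Callable[[], float]],
                 thresholds: Dict[str, float],
                 now: float) -> List[AlarmEvent]:
    """Run one collection cycle and return an alarm event for every
    indicator whose collected value reaches its threshold."""
    events = []
    for name, collect in collectors.items():
        value = collect()
        limit = thresholds.get(name)
        if limit is not None and value >= limit:
            events.append(AlarmEvent(name, value, limit, now))
    return events


def run(collectors, thresholds, push, interval_s, cycles,
        clock=time.monotonic, sleep=time.sleep):
    """Check the clock in short time slices; when the monitoring interval
    has elapsed, collect indicators and push any alarm events."""
    next_due = clock()
    done = 0
    while done < cycles:
        now = clock()
        if now >= next_due:
            for event in collect_once(collectors, thresholds, now):
                push(event)  # stands in for pushing to the management machine
            next_due = now + interval_s
            done += 1
        else:
            sleep(min(0.1, next_due - now))
```

Here `push` is any callable that accepts an event; in a real deployment it would forward the event to the monitoring management machine, and updated thresholds received from that machine would replace the `thresholds` mapping.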
The advantages of the method are as follows: it establishes a set of Hadoop cluster monitoring indicators and connects them to the centralized monitoring platform for media big data, resolving problems in current media big data Hadoop monitoring such as incomplete indicators and the need to implement monitoring manually on each device, thereby reducing O&M risk and improving operating efficiency.
Brief description of the drawings
Fig. 1 is an architecture diagram of the invention; Fig. 2 is an example of a Hadoop monitoring configuration.
Specific embodiment
Specific embodiment one: The method of this embodiment is carried out as follows. First, a monitoring management machine and a short message gateway are set up; the short message gateway is connected to the monitoring management machine, and the monitoring management machine is connected to the Hadoop cluster. Second, the monitoring process is deployed and started. It receives control commands from the monitoring management machine: start, stop, update monitoring thresholds, update monitoring indicators, and update monitoring scripts. It checks the monitoring interval by time slice and, when the interval has elapsed, runs an indicator collection cycle: it obtains the status of Hadoop's key services through process status query commands; it reads Hadoop's syslog log files and runs the monitoring scripts to extract keywords and key indicators; and it obtains system resource indicators through memory, storage, and CPU utilization query commands. It compares each collected indicator with its threshold; if the threshold is reached, it generates alarm event data and pushes the data to the monitoring management machine. Third, the monitoring management machine provides an operation interface on which the user sets monitoring indicators, thresholds, monitoring scripts, alarm message recipient numbers, and so on, and pushes the indicators, thresholds, and scripts to the monitoring process. It also provides an operation interface on which the user issues start and stop commands, and pushes those commands to the monitoring process. It is connected to the short message gateway: after receiving alarm event data pushed by the monitoring process, it converts the data to the short message gateway interface format, adds the recipient numbers, and sends the result to the short message gateway, which delivers the alarm message.
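The conversion of an alarm event into short message gateway requests can be illustrated with a short Python sketch. The source does not specify the gateway interface format, so the JSON layout, the field names `to` and `text`, the alarm dictionary keys, and the function name `to_sms_requests` are all hypothetical.

```python
import json
from typing import Dict, List


def to_sms_requests(alarm: Dict[str, object], recipients: List[str]) -> List[str]:
    """Convert one alarm event into one gateway request per recipient number.

    The JSON layout is a placeholder: the real short message gateway
    interface format is not given in the source.
    """
    text = ("[Hadoop alarm] {indicator} = {value} "
            "(threshold {threshold}) on {host}").format(**alarm)
    return [json.dumps({"to": number, "text": text}) for number in recipients]
```

Keeping the conversion in one function means that only this function has to change when the gateway interface differs; the monitoring process and the alarm data stay the same.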
Specific embodiment two: The monitoring management machine of this embodiment is a minicomputer.
Specific embodiment three: In this embodiment, the monitoring and parsing code is common to all Hadoop clusters and is deployed uniformly to run in the background on the syslog server. For Hadoop clusters added in the future, monitoring can therefore be enabled simply by performing the relevant configuration as shown in Fig. 2.
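One way to picture "common parsing code plus per-cluster configuration" is the Python sketch below. Fig. 2 is not reproduced in this text, so the configuration layout, the cluster names, the keyword lists, and the function name `scan_log_lines` are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical per-cluster configuration; adding a cluster means adding an
# entry here, with no change to the parsing code below.
CLUSTER_CONFIG: Dict[str, Dict[str, List[str]]] = {
    "cluster-a": {"keywords": ["ERROR", "FATAL"]},
    "cluster-b": {"keywords": ["OutOfMemoryError"]},
}


def scan_log_lines(cluster: str, lines: Iterable[str],
                   config: Dict[str, Dict[str, List[str]]] = CLUSTER_CONFIG) -> List[str]:
    """Return the log lines that contain any keyword configured for the cluster."""
    keywords = config[cluster]["keywords"]
    return [line for line in lines if any(k in line for k in keywords)]
```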
Specific embodiment four: The short message gateway of this embodiment sends the monitoring information to the short message operator.
Monitoring Hadoop services: This type monitors the services that run on the Hadoop cluster, which fall into two classes: key services and non-key services. Key services are the service processes required for Hadoop to run normally; if one fails, the normal operation of the Hadoop cluster is affected. For example, if the HDFS service or the MapReduce service fails, the data storage and data processing of the Hadoop cluster are affected, and so is the normal operation of other related services. Non-key services are generally service processes deployed on the management node; if one fails, the management node's ability to manage the Hadoop cluster is affected, but the normal operation of the cluster is not. For example, a Kerberos anomaly can prevent users from logging in to the management interface of the Hadoop cluster. Note that Hadoop is a high-availability cluster and that this class of indicators monitors from the perspective of the cluster as a whole: if a service becomes abnormal but a high-availability operation such as active/standby switchover completes smoothly, the event does not fall under this class of indicators. This monitoring type has 20 monitoring indicators.
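The rule described above, where a service that failed but switched over smoothly is not reported under this class, can be sketched in Python. The `ServiceStatus` fields and the "critical"/"warning" severity labels are assumptions made for the example; the source names only the key and non-key classes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ServiceStatus:
    name: str
    key_service: bool          # True for key services such as HDFS or MapReduce
    healthy: bool
    failed_over: bool = False  # True if a standby took over smoothly


def service_alarms(statuses: Iterable[ServiceStatus]) -> List[str]:
    """Report services that are down from the cluster's point of view."""
    alarms = []
    for s in statuses:
        if s.healthy or s.failed_over:
            # A smooth active/standby switchover is not a service alarm.
            continue
        level = "critical" if s.key_service else "warning"
        alarms.append(f"{level}: {s.name} is down")
    return alarms
```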
Monitoring Hadoop high availability: High availability is a fundamental design principle of Hadoop; server failures and low-level software failures in the cluster generally do not affect the normal operation of Hadoop. For management nodes and control nodes, Hadoop mostly achieves high availability through active/standby pairs: if the active host fails, its services switch automatically to the standby host. For data nodes, Hadoop continuously monitors their running status; a node that fails is isolated automatically and rejoins the cluster after it recovers. This class of indicators monitors the process by which Hadoop achieves high availability, such as a service undergoing active/standby switchover or an active/standby data synchronization anomaly. This monitoring also prompts O&M personnel to attend to and handle anomalies on the primary node promptly. This monitoring type has 15 monitoring indicators.
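Detecting an active/standby switchover can be illustrated by comparing two snapshots of which host is active for each service. This Python sketch is an assumption about how such an indicator could be computed; the function name `detect_switchovers` and the snapshot layout are hypothetical.

```python
from typing import Dict, List


def detect_switchovers(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Compare two snapshots that map a service name to its active host and
    describe every service whose active host changed between them."""
    events = []
    for service, active_host in current.items():
        old = previous.get(service)
        if old is not None and old != active_host:
            events.append(f"{service}: active switched from {old} to {active_host}")
    return events
```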
Monitoring resource usage: Each class of Hadoop service occupies corresponding resources, and this class of indicators monitors the resource usage of each service, for example HDFS disk space utilization exceeding its threshold or NameNode memory utilization exceeding its threshold. This monitoring can be combined with the first two types to analyze the state of Hadoop jointly, so that the point of failure in the cluster can be located quickly. This monitoring type has 8 monitoring indicators.
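A resource usage check of this kind can be sketched with the Python standard library. Note the assumption: `shutil.disk_usage` reports usage of a local filesystem path, not HDFS space, so this stands in for the storage query command only as an illustration; the function names and the threshold comparison are hypothetical.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # guard against pseudo-filesystems that report no capacity
    return 100.0 * usage.used / usage.total


def exceeds_threshold(value: float, threshold: float) -> bool:
    """True when a collected utilization value reaches its threshold."""
    return value >= threshold
```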
Claims (3)
1. A method for monitoring a media big data Hadoop cluster, characterized in that the steps of the method are as follows:
Step 1: set up a monitoring management machine and a short message gateway, connect the short message gateway to the monitoring management machine, and connect the monitoring management machine to the Hadoop cluster;
Step 2: deploy the monitoring process, which receives control commands from the monitoring management machine: start, stop, update monitoring thresholds, update monitoring indicators, and update monitoring scripts; the process checks the monitoring interval by time slice and, when the interval has elapsed, runs an indicator collection cycle; it obtains the status of Hadoop's key services through process status query commands; it reads Hadoop's syslog log files and runs the monitoring scripts to extract keywords and key indicators; it obtains system resource indicators through memory, storage, and CPU utilization query commands; it compares each collected indicator with its threshold and, if the threshold is reached, generates alarm event data and pushes the data to the monitoring management machine;
Step 3: the monitoring management machine provides an operation interface on which the user sets monitoring indicators, thresholds, monitoring scripts, alarm message recipient numbers, and so on, and pushes the indicators, thresholds, and scripts to the monitoring process; it provides an operation interface on which the user issues start and stop commands, and pushes those commands to the monitoring process; after receiving alarm event data pushed by the monitoring process, it converts the data to the short message gateway interface format, adds the recipient numbers, and sends the result to the short message gateway, which delivers the alarm message.
2. The method for monitoring a media big data Hadoop cluster according to claim 1, characterized in that the monitoring management machine is a minicomputer.
3. The method for monitoring a media big data Hadoop cluster according to claim 1, characterized in that the monitoring and parsing code is common to all Hadoop clusters and is deployed uniformly to run in the background on the syslog server, so that for Hadoop clusters added in the future, monitoring can be enabled by performing the relevant configuration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611061673.3A CN106789398A (en) | 2016-11-25 | 2016-11-25 | A kind of method of media big data hadoop cluster monitoring |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611061673.3A CN106789398A (en) | 2016-11-25 | 2016-11-25 | A kind of method of media big data hadoop cluster monitoring |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106789398A true CN106789398A (en) | 2017-05-31 |
Family
ID=58912792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611061673.3A Pending CN106789398A (en) | 2016-11-25 | 2016-11-25 | A kind of method of media big data hadoop cluster monitoring |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106789398A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103618644A (en) * | 2013-11-26 | 2014-03-05 | 曙光信息产业股份有限公司 | Distributed monitoring system based on hadoop cluster and method thereof |
CN103678521A (en) * | 2013-11-30 | 2014-03-26 | 电子科技大学 | Distributed file monitoring system based on Hadoop frame |
Non-Patent Citations (1)
Title |
---|
李晋: ""Hadoop集群监控系统的研究与应用"", 《中国优秀硕士学位论文全文数据库》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018233630A1 (en) * | 2017-06-21 | 2018-12-27 | 新华三大数据技术有限公司 | Fault discovery |
CN107704359A (en) * | 2017-09-04 | 2018-02-16 | 北京天平检验行有限公司 | A kind of monitoring system of big data platform |
CN107704359B (en) * | 2017-09-04 | 2021-03-16 | 北京天平检验行有限公司 | Monitoring system of big data platform |
CN109165137A (en) * | 2018-07-27 | 2019-01-08 | 曙光信息产业(北京)有限公司 | data analysis and alarm method and system |
CN109672581A (en) * | 2018-09-25 | 2019-04-23 | 平安科技(深圳)有限公司 | Monitoring method, device, equipment and the storage medium of zookeeper |
CN111224819A (en) * | 2019-12-30 | 2020-06-02 | 上海汇付数据服务有限公司 | Distributed messaging system |
WO2021147481A1 (en) * | 2020-01-22 | 2021-07-29 | 北京字节跳动网络技术有限公司 | Monitoring method and apparatus, and electronic device |
CN112732528A (en) * | 2021-01-08 | 2021-04-30 | 卓望数码技术(深圳)有限公司 | Index acquisition method, system, equipment and storage medium based on IT operation and maintenance monitoring |
CN112765044A (en) * | 2021-04-06 | 2021-05-07 | 上海钐昆网络科技有限公司 | Abnormal data detection method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106789398A (en) | A kind of method of media big data hadoop cluster monitoring | |
CN103873279B (en) | Server management method and server management device | |
Lou et al. | Mining dependency in distributed systems through unstructured logs analysis | |
CN111884878A (en) | Data monitoring method based on block chain | |
CN102135929B (en) | Distributed fault-tolerant service system | |
CN102624554B (en) | Comprehensive network management method combining equipment management mode with service management mode | |
CN103607297A (en) | Fault processing method of computer cluster system | |
CN103490919A (en) | Fault management system and fault management method | |
CN105763395A (en) | Method and system for monitoring and managing virtual machine and container in cloud environment | |
CN115658420A (en) | Database monitoring method and system | |
CN105574590A (en) | Adaptive general control disaster recovery switching device and system, and signal generation method | |
CN103067209A (en) | Heartbeat module self-testing method | |
JP2019049802A (en) | Failure analysis supporting device, incident managing system, failure analysis supporting method, and program | |
CN110727508A (en) | Task scheduling system and scheduling method | |
KR20180037342A (en) | Application software error monitoring, statistics management service and solution method. | |
CN108304293A (en) | A kind of software systems monitoring method based on big data technology | |
CN111082998A (en) | Architecture system of operation and maintenance monitoring campus convergence layer | |
KR102188987B1 (en) | Operation method of cloud computing system for zero client device using cloud server having device for managing server and local server | |
CN103995759A (en) | High-availability computer system failure handling method and device based on core internal-external synergy | |
CN109147975A (en) | A kind of PWR nuclear power plant reactor core status monitoring and analysis system | |
CN115801545A (en) | Method, system, equipment and medium for reporting abnormity of hybrid cloud pipe in real time | |
CN110333973A (en) | A kind of method and system of multi-host hot swap | |
CN114218329A (en) | Data synchronization method, device, storage medium and computer terminal | |
CN109993840A (en) | For the big data analysis system of railway automatic ticket selling and checking monitoring of tools state | |
CN113708967A (en) | System monitoring disaster tolerance early warning device and early warning method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170531 |