CN106789398A - A method for monitoring media big data Hadoop clusters - Google Patents

A method for monitoring media big data Hadoop clusters

Info

Publication number
CN106789398A
Authority
CN
China
Prior art keywords
monitoring
hadoop
index
short message
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611061673.3A
Other languages
Chinese (zh)
Inventor
吴梅梅
王永滨
冯爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China
Priority to CN201611061673.3A
Publication of CN106789398A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for monitoring media big data Hadoop clusters, relating to the field of system operation and maintenance (O&M). It addresses the problem that a Hadoop cluster cannot be monitored as a whole, so that monitoring results may contain errors and the actual impact of a fault on the system is hard to judge accurately. The invention deploys a monitoring process on each monitored device of the Hadoop cluster and distributes monitoring thresholds to it; the monitoring process collects the Hadoop running status at timed intervals, and after automatic analysis and comparison, abnormal events are sent as short messages through an SMS gateway to raise alarms. The advantages of the invention are as follows: it establishes a set of Hadoop cluster monitoring indicators and interfaces them with the media big data centralized monitoring platform, which resolves problems in current media big data Hadoop monitoring such as incomplete indicators and the need to implement monitoring device by device, and thereby reduces O&M risk and improves working efficiency.

Description

A method for monitoring media big data Hadoop clusters
Technical field
The present invention relates to the field of system operation and maintenance (O&M), and in particular to a method for monitoring media big data Hadoop clusters.
Background
Big data is now sweeping the globe, and Hadoop, as an outstanding big data product, has been introduced and used in many lines of business, for example for storing unstructured data and archiving historical data. As business grows, more and more Hadoop clusters will be put into production. Hadoop clusters provide an effective foundation for the development of media big data services, but at the same time traditional monitoring methods struggle to monitor the running state of Hadoop clusters accurately.
At present, the media big data monitoring platform monitors open systems fairly thoroughly. For the newer Hadoop clusters, however, the monitoring indicators do not yet form a system, and cluster monitoring relies on ad hoc checks that the O&M department implements separately, such as monitoring log keywords and monitoring processes. Because Hadoop is a cluster composed of many servers, monitoring such a computer cluster is a major difficulty. As more and more Hadoop clusters go into production, implementing monitoring device by device for each Hadoop cluster is inefficient on the one hand, and on the other hand the monitoring indicators are incomplete, which creates hidden operational risks. Moreover, traditional methods can only monitor individual devices; because Hadoop is a cluster, they cannot monitor it as a whole, so monitoring results may contain errors and it is hard to judge accurately the actual impact of a fault on the system. The aim is therefore to build a complete Hadoop monitoring system from the perspective of the whole cluster, to sort out how each class of Hadoop monitoring indicator affects the system and the business, and to use the media big data centralized monitoring system to configure Hadoop cluster monitoring quickly.
Summary of the invention
Conventional methods can only monitor individual devices, whereas Hadoop is a cluster that they cannot monitor as a whole, so monitoring results may contain errors and it is hard to judge accurately the actual impact of a fault on the system. To solve this problem, the present invention provides a method for monitoring media big data Hadoop clusters. The technical solution is as follows:
The steps of the method for monitoring media big data Hadoop clusters according to the invention are as follows:
Step 1: set up a monitoring management machine and an SMS gateway, connect the SMS gateway to the monitoring management machine, and connect the monitoring management machine to the Hadoop cluster;
Step 2: the monitoring process receives control commands from the monitoring management machine: start, stop, update monitoring thresholds, update monitoring indicators, and update monitoring scripts. It uses time slices to determine whether the monitoring interval has elapsed, and when it has, it runs one cycle of indicator collection: for the key services of Hadoop, it obtains the state of each service through process status query commands; for the Hadoop Syslog log files, it runs the monitoring scripts to read the keywords and key indicators in them; for system resources, it obtains indicator data through memory, storage, and CPU utilization query commands. It compares the collected indicators with the thresholds, generates alarm event data when a threshold is reached, and pushes the alarm event data to the monitoring management machine;
Step 3: the monitoring management machine provides an operation interface through which users set the monitoring indicators, thresholds, monitoring scripts, alarm SMS recipient numbers, and so on, and pushes the monitoring indicators, thresholds, and monitoring scripts to the monitoring process. It also provides an operation interface through which users issue commands to start and stop monitoring, and pushes these commands to the monitoring process. After receiving the alarm event data pushed by the monitoring process, it converts the data to the SMS gateway interface format, adds the recipient numbers, and sends the result to the SMS gateway, which delivers the alarm short message.
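The interval check and threshold comparison of Step 2 can be illustrated with a minimal Python sketch. The patent does not disclose any code, so the names `AlarmEvent`, `interval_elapsed`, and `compare_with_thresholds`, as well as the indicator names used in the usage note, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlarmEvent:
    """Hypothetical alarm record pushed to the monitoring management machine."""
    indicator: str
    value: float
    threshold: float


def interval_elapsed(now, last_run, interval):
    """Return True once the monitoring interval has elapsed since the last cycle."""
    return now - last_run >= interval


def compare_with_thresholds(metrics, thresholds):
    """Produce one alarm event for every indicator that reaches its threshold.

    Indicators without a configured threshold are skipped.
    """
    events = []
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is not None and value >= limit:
            events.append(AlarmEvent(name, value, limit))
    return events
```

For example, with collected values `{"hdfs_disk_usage": 0.92, "namenode_mem_usage": 0.40}` and thresholds `{"hdfs_disk_usage": 0.85, "namenode_mem_usage": 0.90}`, only the HDFS disk indicator yields an alarm event. Treating "reached" as greater than or equal is one reading of the text; a deployment might equally compare in the other direction for indicators where low values are abnormal.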
The advantages of the method are as follows: it establishes a set of Hadoop cluster monitoring indicators and interfaces them with the media big data centralized monitoring platform, which resolves problems in current media big data Hadoop monitoring such as incomplete indicators and the need to implement monitoring manually, device by device, and thereby reduces O&M risk and improves working efficiency.
Brief description of the drawings
Fig. 1 is an architecture diagram of the invention; Fig. 2 is an example of a Hadoop monitoring configuration.
Detailed description of the embodiments
Embodiment 1: The method of this embodiment is implemented as follows. First, a monitoring management machine and an SMS gateway are set up; the SMS gateway is connected to the monitoring management machine, and the monitoring management machine is connected to the Hadoop cluster. Second, the monitoring process is deployed and started. It receives control commands from the monitoring management machine: start, stop, update monitoring thresholds, update monitoring indicators, and update monitoring scripts. It uses time slices to determine whether the monitoring interval has elapsed, and when it has, it runs one cycle of indicator collection: for the key services of Hadoop, it obtains the state of each service through process status query commands; for the Hadoop Syslog log files, it runs the monitoring scripts to read the keywords and key indicators in them; for system resources, it obtains indicator data through memory, storage, and CPU utilization query commands. It compares the collected indicators with the thresholds, generates alarm event data when a threshold is reached, and pushes the alarm event data to the monitoring management machine. Then the monitoring management machine provides an operation interface through which users set the monitoring indicators, thresholds, monitoring scripts, alarm SMS recipient numbers, and so on, and pushes the monitoring indicators, thresholds, and monitoring scripts to the monitoring process. It also provides an operation interface through which users issue commands to start and stop monitoring, and pushes these commands to the monitoring process. The monitoring management machine is connected to the SMS gateway: after receiving the alarm event data pushed by the monitoring process, it converts the data to the SMS gateway interface format, adds the recipient numbers, and sends the result to the SMS gateway, which delivers the alarm short message.
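The last stage of this embodiment, converting an alarm event to the SMS gateway interface format and adding recipient numbers, might look like the following Python sketch. The patent does not specify the gateway format, so the JSON layout with `to` and `text` fields, the function name `build_sms_requests`, and the event field names are all hypothetical.

```python
import json


def build_sms_requests(event, recipients):
    """Convert one alarm event into one gateway request per recipient number.

    The JSON layout ("to", "text") is an assumed gateway format,
    not one described in the patent.
    """
    text = (
        f"[{event['cluster']}] {event['indicator']} = {event['value']} "
        f"(threshold {event['threshold']})"
    )
    return [json.dumps({"to": number, "text": text}) for number in recipients]
```

Producing one request per recipient keeps each message independent, so a failed delivery to one number would not block the others; a real gateway may instead accept a list of recipients in a single request.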
Embodiment 2: The monitoring management machine of this embodiment is a minicomputer.
Embodiment 3: In this embodiment, the monitoring and parsing code is common to all Hadoop clusters and is deployed centrally, running in the background on the syslog server. For Hadoop clusters added in the future, it is therefore only necessary to carry out the relevant configuration as shown in Fig. 2 to bring them under monitoring.
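The idea of shared parsing code plus per-cluster configuration can be sketched in Python as follows. Fig. 2 is not reproduced in this text, so the configuration layout (a dictionary of keyword lists per cluster) and the function names `scan_log`, `add_cluster`, and `scan_cluster` are assumptions for illustration only.

```python
def scan_log(lines, keywords):
    """Shared parsing code: return (keyword, line) for each line with a keyword."""
    hits = []
    for line in lines:
        for keyword in keywords:
            if keyword in line:
                hits.append((keyword, line))
                break  # report each line once, under the first matching keyword
    return hits


def add_cluster(configs, name, keywords):
    """Register a new cluster by configuration only; no parsing code changes."""
    configs[name] = {"keywords": list(keywords)}
    return configs


def scan_cluster(configs, name, lines):
    """Run the shared parser with the keywords configured for one cluster."""
    return scan_log(lines, configs[name]["keywords"])
```

Adding a cluster then amounts to one `add_cluster` call with that cluster's keywords, which mirrors the claim that new clusters need configuration rather than new code.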
Embodiment 4: The SMS gateway of this embodiment sends the monitoring information to the SMS operator.
Monitoring Hadoop services: this category monitors the various services that run on the Hadoop cluster, which fall into two classes, key services and non-key services. Key services are the service processes that Hadoop needs in order to run normally; if one fails, the normal operation of the Hadoop cluster is affected. Examples are the HDFS service and the MapReduce service: a failure affects data storage and data processing on the Hadoop cluster and also the normal operation of other related services. Non-key services are generally the service processes deployed on the management node; if one fails, the management node's ability to manage the Hadoop cluster is affected, but the cluster itself continues to run normally. For example, an abnormal Kerberos resource can prevent users from logging in to the administration interface of the Hadoop cluster. Note that Hadoop is a high-availability cluster and these indicators monitor the cluster as a whole: if a service becomes abnormal but a high-availability operation such as an active/standby switchover completes smoothly, the event does not fall under this category of indicators. This category has 20 monitoring indicators.
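A cluster-level view of service health, as opposed to a per-device view, might be sketched in Python as below. The catalog layout, the instance names, and the severity labels are assumptions; the patent only states that a service which fails over smoothly is not reported under this category.

```python
def evaluate_services(catalog, running):
    """Report services that have no running instance anywhere in the cluster.

    catalog: {service: {"key": bool, "instances": [instance names]}}
    running: set of instance names whose process status query succeeded.
    A service with at least one live instance (for example after a smooth
    active/standby switchover) raises no alarm in this category.
    """
    alarms = []
    for service, spec in catalog.items():
        if not any(instance in running for instance in spec["instances"]):
            severity = "critical" if spec["key"] else "warning"
            alarms.append((service, severity))
    return alarms
```

With a catalog where `hdfs-namenode` is a key service with instances `nn1` and `nn2`, and `kerberos` is a non-key service with instance `kdc1`, a running set of `{"nn2"}` yields only a warning for `kerberos`: the NameNode service is still available through `nn2`.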
Monitoring Hadoop high availability: high availability is a fundamental design principle of Hadoop, and server failures or underlying software failures within the cluster generally do not affect its normal operation. On management nodes and control nodes, Hadoop mostly implements high availability with an active/standby pair: if the active host fails, services switch over automatically to the standby host. For data nodes, Hadoop monitors their running state continuously; a node that fails is isolated automatically and rejoins the cluster after it recovers. These indicators monitor the processes by which Hadoop achieves high availability, for example a service undergoing an active/standby switchover or active/standby data synchronization becoming abnormal. This monitoring also prompts O&M staff to notice and handle abnormalities on the active node promptly. This category has 15 monitoring indicators.
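One way to detect an active/standby switchover is to compare which host holds the active role in two successive collection cycles. The Python sketch below assumes the monitoring process already knows the active host per service (on a real cluster this could come from a query such as `hdfs haadmin -getServiceState`, though the patent does not say how it is obtained); the function name and message format are hypothetical.

```python
def detect_switchovers(previous_active, current_active):
    """Return a message for each service whose active host changed.

    Both arguments map a service name to the host currently in the active role.
    Services seen for the first time are not reported.
    """
    events = []
    for service, host in current_active.items():
        before = previous_active.get(service)
        if before is not None and before != host:
            events.append(f"{service}: active role moved from {before} to {host}")
    return events
```

Such an event does not indicate an outage, since the service kept running, but it tells O&M staff that the former active host needs attention.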
Monitoring resource usage: each class of Hadoop service occupies corresponding resources, and these indicators monitor the resource usage of each service, for example HDFS disk space utilization exceeding its threshold or NameNode memory utilization exceeding its threshold. This category can be analyzed together with the first two to assess the state of Hadoop and locate the point of failure in the cluster quickly. This category has 8 monitoring indicators.
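Collecting a storage utilization figure can be sketched with the Python standard library. This example measures a local filesystem with `shutil.disk_usage` as a stand-in; it does not query HDFS itself, and the function name and the indicator key `disk_usage` are assumptions.

```python
import shutil


def collect_resource_metrics(path):
    """Return the used fraction of the filesystem that contains `path`.

    This is a local-disk stand-in for the storage utilization query
    described in the text, not an HDFS capacity query.
    """
    usage = shutil.disk_usage(path)
    ratio = usage.used / usage.total if usage.total else 0.0
    return {"disk_usage": ratio}
```

The resulting value is a fraction between 0 and 1 that can be compared against a configured threshold in the same way as any other collected indicator.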

Claims (3)

1. A method for monitoring media big data Hadoop clusters, characterized in that the steps of the method are as follows:
Step 1: set up a monitoring management machine and an SMS gateway, connect the SMS gateway to the monitoring management machine, and connect the monitoring management machine to the Hadoop cluster;
Step 2: the monitoring process receives control commands from the monitoring management machine: start, stop, update monitoring thresholds, update monitoring indicators, and update monitoring scripts; it uses time slices to determine whether the monitoring interval has elapsed, and when it has, it runs one cycle of indicator collection; for the key services of Hadoop, it obtains the state of each service through process status query commands; for the Hadoop Syslog log files, it runs the monitoring scripts to read the keywords and key indicators in them; for system resources, it obtains indicator data through memory, storage, and CPU utilization query commands; it compares the collected indicators with the thresholds, generates alarm event data when a threshold is reached, and pushes the alarm event data to the monitoring management machine;
Step 3: the monitoring management machine provides an operation interface through which users set the monitoring indicators, thresholds, monitoring scripts, alarm SMS recipient numbers, and so on, and pushes the monitoring indicators, thresholds, and monitoring scripts to the monitoring process; it provides an operation interface through which users issue commands to start and stop monitoring, and pushes these commands to the monitoring process; after receiving the alarm event data pushed by the monitoring process, it converts the data to the SMS gateway interface format, adds the recipient numbers, and sends the result to the SMS gateway, which delivers the alarm short message.
2. The method for monitoring media big data Hadoop clusters according to claim 1, characterized in that the monitoring management machine is a minicomputer.
3. The method for monitoring media big data Hadoop clusters according to claim 1, characterized in that the monitoring and parsing code is common to different Hadoop clusters and is deployed centrally, running in the background on the syslog server; for Hadoop clusters added in the future, carrying out the relevant configuration is sufficient to bring them under monitoring.
CN201611061673.3A 2016-11-25 2016-11-25 A method for monitoring media big data Hadoop clusters Pending CN106789398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611061673.3A CN106789398A (en) 2016-11-25 2016-11-25 A method for monitoring media big data Hadoop clusters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611061673.3A CN106789398A (en) 2016-11-25 2016-11-25 A method for monitoring media big data Hadoop clusters

Publications (1)

Publication Number Publication Date
CN106789398A true CN106789398A (en) 2017-05-31

Family

ID=58912792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611061673.3A Pending CN106789398A (en) 2016-11-25 2016-11-25 A method for monitoring media big data Hadoop clusters

Country Status (1)

Country Link
CN (1) CN106789398A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704359A (en) * 2017-09-04 2018-02-16 北京天平检验行有限公司 A kind of monitoring system of big data platform
WO2018233630A1 (en) * 2017-06-21 2018-12-27 新华三大数据技术有限公司 Fault discovery
CN109165137A (en) * 2018-07-27 2019-01-08 曙光信息产业(北京)有限公司 data analysis and alarm method and system
CN109672581A (en) * 2018-09-25 2019-04-23 平安科技(深圳)有限公司 Monitoring method, device, equipment and the storage medium of zookeeper
CN111224819A (en) * 2019-12-30 2020-06-02 上海汇付数据服务有限公司 Distributed messaging system
CN112732528A (en) * 2021-01-08 2021-04-30 卓望数码技术(深圳)有限公司 Index acquisition method, system, equipment and storage medium based on IT operation and maintenance monitoring
CN112765044A (en) * 2021-04-06 2021-05-07 上海钐昆网络科技有限公司 Abnormal data detection method, device, equipment and storage medium
WO2021147481A1 (en) * 2020-01-22 2021-07-29 北京字节跳动网络技术有限公司 Monitoring method and apparatus, and electronic device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103618644A (en) * 2013-11-26 2014-03-05 曙光信息产业股份有限公司 Distributed monitoring system based on hadoop cluster and method thereof
CN103678521A (en) * 2013-11-30 2014-03-26 电子科技大学 Distributed file monitoring system based on Hadoop frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Jin: "Research and Application of a Hadoop Cluster Monitoring System", China Master's Theses Full-text Database *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018233630A1 (en) * 2017-06-21 2018-12-27 新华三大数据技术有限公司 Fault discovery
CN107704359A (en) * 2017-09-04 2018-02-16 北京天平检验行有限公司 A kind of monitoring system of big data platform
CN107704359B (en) * 2017-09-04 2021-03-16 北京天平检验行有限公司 Monitoring system of big data platform
CN109165137A (en) * 2018-07-27 2019-01-08 曙光信息产业(北京)有限公司 data analysis and alarm method and system
CN109672581A (en) * 2018-09-25 2019-04-23 平安科技(深圳)有限公司 Monitoring method, device, equipment and the storage medium of zookeeper
CN111224819A (en) * 2019-12-30 2020-06-02 上海汇付数据服务有限公司 Distributed messaging system
WO2021147481A1 (en) * 2020-01-22 2021-07-29 北京字节跳动网络技术有限公司 Monitoring method and apparatus, and electronic device
CN112732528A (en) * 2021-01-08 2021-04-30 卓望数码技术(深圳)有限公司 Index acquisition method, system, equipment and storage medium based on IT operation and maintenance monitoring
CN112765044A (en) * 2021-04-06 2021-05-07 上海钐昆网络科技有限公司 Abnormal data detection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106789398A (en) A method for monitoring media big data Hadoop clusters
CN103873279B (en) Server management method and server management device
Lou et al. Mining dependency in distributed systems through unstructured logs analysis
CN111884878A (en) Data monitoring method based on block chain
CN102135929B (en) Distributed fault-tolerant service system
CN102624554B (en) Comprehensive network management method combining equipment management mode with service management mode
CN103607297A (en) Fault processing method of computer cluster system
CN103490919A (en) Fault management system and fault management method
CN105763395A (en) Method and system for monitoring and managing virtual machine and container in cloud environment
CN115658420A (en) Database monitoring method and system
CN105574590A (en) Adaptive general control disaster recovery switching device and system, and signal generation method
CN103067209A (en) Heartbeat module self-testing method
JP2019049802A (en) Failure analysis supporting device, incident managing system, failure analysis supporting method, and program
CN110727508A (en) Task scheduling system and scheduling method
KR20180037342A (en) Application software error monitoring, statistics management service and solution method.
CN108304293A (en) A kind of software systems monitoring method based on big data technology
CN111082998A (en) Architecture system of operation and maintenance monitoring campus convergence layer
KR102188987B1 (en) Operation method of cloud computing system for zero client device using cloud server having device for managing server and local server
CN103995759A (en) High-availability computer system failure handling method and device based on core internal-external synergy
CN109147975A (en) A kind of PWR nuclear power plant reactor core status monitoring and analysis system
CN115801545A (en) Method, system, equipment and medium for reporting abnormity of hybrid cloud pipe in real time
CN110333973A (en) A kind of method and system of multi-host hot swap
CN114218329A (en) Data synchronization method, device, storage medium and computer terminal
CN109993840A (en) For the big data analysis system of railway automatic ticket selling and checking monitoring of tools state
CN113708967A (en) System monitoring disaster tolerance early warning device and early warning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170531
