CN103701661A - Method and system for realizing node monitoring - Google Patents

Method and system for realizing node monitoring

Info

Publication number
CN103701661A
Authority
CN
China
Prior art keywords
information
data node
node
control command
proxy server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310717518.2A
Other languages
Chinese (zh)
Other versions
CN103701661B (en)
Inventor
刘璧怡
郭美思
宗栋瑞
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201310717518.2A
Publication of CN103701661A
Application granted
Publication of CN103701661B
Legal status: Active
Anticipated expiration

Abstract

The application discloses a method and a system for realizing node monitoring. The system comprises a master server and proxy servers, with one independent proxy server running on each data node. The master server is connected to a name node and is used to acquire cluster configuration information, to send status commands and control commands to the proxy servers based on a heartbeat protocol, and to receive the node state information uploaded by the proxy servers so as to update the cluster configuration information. Each proxy server is used to receive the status commands and control commands of the master server, to acquire the data node state information according to the status commands and upload it to the master server, and to control the working state of the components of the data node according to the control commands and feed the control command results back to the master server. In this way the proxy servers acquire data node state information, execute control commands, and feed back the control command results, so that monitoring and management of the data nodes is realized.

Description

A method and system for realizing node monitoring
Technical field
The present invention relates to big data processing technology, and in particular to a method and system for realizing node monitoring that is applicable to big data platforms built on the Hadoop distributed system architecture.
Background art
With the development of digital life, the volume of data is growing at an astonishing rate, and the resulting big data is becoming increasingly difficult to process. Big data is a data processing and application model based on cloud computing: through the integration, sharing, and cross-reuse of data, it forms intellectual resources and knowledge-service capabilities. The big data platform is the foundation that supports applications of big data technology.
Hadoop, currently the most popular big data platform, is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without understanding the underlying distributed details, and makes full use of the power of a cluster for high-speed computation and storage. A Hadoop cluster often comprises tens, hundreds, or even thousands of data nodes; at that scale, monitoring and managing the data nodes of a cluster quickly and accurately becomes extremely difficult.
At present, the node status of a Hadoop cluster is checked through the shell command line provided by the cluster or through a browser. To perform a control operation on a particular node, an operator must log in to that node individually and operate on it with shell commands. When a node in the cluster goes down abnormally, the services that were running on the node before the failure must be restored manually, and the node must then be added back to the cluster before the cluster can resume normal operation. Manual recovery is cumbersome, consumes considerable manpower, and easily introduces new errors, which makes monitoring and recovering cluster nodes in a large-scale cluster environment very inconvenient.
Summary of the invention
To solve the above technical problem, the invention discloses a method and system for realizing node monitoring that can effectively monitor the state information of data nodes and, when a data node goes down abnormally, restore it in a timely and effective manner.
The invention provides a system for realizing node monitoring, comprising:
a master server and proxy servers, one independent proxy server running on each data node; wherein
the master server is connected to the name node and is configured to obtain cluster configuration information from the name node; to issue status commands and control commands to the proxy servers based on a heartbeat protocol; and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information;
each proxy server is configured to receive the status commands and control commands of the master server; to obtain the state information of its data node according to a status command and upload it to the master server; and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server.
Further, the master server is also configured, when a data node goes down abnormally, to send the proxy server, according to the cluster configuration information, a control command for restoring the configuration of the downed node;
the proxy server is also configured, according to the control command, to restore the working state of each component of the data node in accordance with the cluster configuration information, and to feed the control command result back to the master server.
Further, the master server is also configured to determine, from the data node state information obtained by the proxy server, whether a data node has gone down abnormally.
Further, the master server is specifically configured to issue the status commands and control commands via a message queue.
Further, the proxy server is specifically configured to upload the data node state information and feed back the control command result information via a message queue.
In another aspect, the application also provides a method for realizing node monitoring, comprising:
setting up a master server on the name node, and setting up an independent proxy server on each data node;
the master server obtaining cluster configuration information from the name node and, based on a heartbeat protocol, issuing status commands and control commands to the proxy servers;
each proxy server obtaining data node state information according to the status command, and controlling the working state of each component of the data node according to the control command;
sending the data node state information and the control command result information to the master server, which updates the cluster configuration information.
Further, the method also comprises:
when a data node goes down abnormally, the master server sending the proxy server, according to the cluster configuration information, a control command for restoring the configuration of the downed node;
the proxy server, according to the control command, restoring the working state of each component of the data node in accordance with the cluster configuration information.
Further, the master server determines, from the data node state information obtained by the proxy server, whether a data node has gone down abnormally.
Further, the master server issues the status commands and control commands via a message queue.
Further, the proxy server uploads the data node state information and feeds back the control command result information via a message queue.
The technical scheme provided by the application comprises a master server and proxy servers, one independent proxy server running on each data node. The master server is connected to the name node and is configured to obtain cluster configuration information from the name node, to issue status commands and control commands to the proxy servers based on a heartbeat protocol, and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information. Each proxy server is configured to receive the status commands and control commands of the master server, to obtain data node state information according to a status command and upload it to the master server, and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server. Because the proxy servers receive the status commands and control commands of the master server, obtain data node state information, execute the control commands, and feed back the results, the invention realizes monitoring and management of the data nodes.
Brief description of the drawings
The drawings described here are provided for a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a structural block diagram of the system for realizing node monitoring according to the invention;
Fig. 2 is a flow chart of the method for realizing node monitoring according to the invention.
Detailed description of the embodiments
So that the technical scheme of the invention can be fully understood, the heartbeat protocol is first described briefly. Sending and receiving data over a network is implemented with sockets (the original text refers to SOCKET on Windows). If a socket has been disconnected, any attempt to send or receive data on it will fail. A heartbeat protocol is the means of determining whether a socket is still usable. TCP itself implements a mechanism of this kind: when it is enabled, TCP sends a configured number of heartbeats within a certain period, and this traffic does not affect the application-defined protocol. A so-called "heartbeat" is a self-defined structure sent at regular intervals so that the other side knows the service is "online", thereby ensuring the validity of the link.
Periodically sending a self-defined structure (a heartbeat packet) to ensure that the connection remains valid is the essence of the heartbeat protocol.
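The heartbeat idea can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `HeartbeatPacket` and `HeartbeatMonitor`, the packet fields, and the 3-second timeout are all assumptions chosen for illustration. The receiver records when each peer was last heard from and treats a peer as offline once it has been silent for longer than the timeout.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatPacket:
    """A self-defined structure sent at regular intervals (fields are hypothetical)."""
    node_id: str
    sent_at: float  # sender timestamp, in seconds


@dataclass
class HeartbeatMonitor:
    """Records the last heartbeat from each peer and reports whether it is online."""
    timeout_s: float = 3.0  # assumed: a peer silent for longer than this is offline
    last_seen: dict = field(default_factory=dict)

    def receive(self, packet: HeartbeatPacket) -> None:
        # Remember the most recent heartbeat time for this peer.
        self.last_seen[packet.node_id] = packet.sent_at

    def is_online(self, node_id: str, now: float) -> bool:
        # A peer that never sent a heartbeat is treated as offline.
        seen = self.last_seen.get(node_id)
        return seen is not None and (now - seen) <= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.receive(HeartbeatPacket("datanode-1", sent_at=100.0))
online_soon_after = monitor.is_online("datanode-1", now=102.0)   # 2 s of silence
online_much_later = monitor.is_online("datanode-1", now=110.0)   # 10 s of silence
```

Explicit timestamps are passed in rather than read from the clock so that the behaviour is easy to check; a real sender would stamp packets with the current time on a timer.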
Fig. 1 is a structural block diagram of the system for realizing node monitoring according to the invention. As shown in Fig. 1, the system comprises:
a master server and proxy servers, one independent proxy server running on each data node; wherein
the master server is connected to the name node and is configured to obtain cluster configuration information from the name node; to issue status commands and control commands to the proxy servers based on a heartbeat protocol; and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information;
each proxy server is configured to receive the status commands and control commands of the master server; to obtain the state information of its data node according to a status command and upload it to the master server; and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server.
Further, the master server is also configured, when a data node goes down abnormally, to send the proxy server, according to the cluster configuration information, a control command for restoring the configuration of the downed node;
the proxy server is also configured, according to the control command, to restore the working state of each component of the data node in accordance with the cluster configuration information, and to feed the control command result back to the master server.
The master server is also configured to determine, from the data node state information obtained by the proxy server, whether a data node has gone down abnormally.
The master server is specifically configured to issue the status commands and control commands via a message queue.
The proxy server is specifically configured to upload the data node state information and feed back the control command results via a message queue.
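The division of work between the two roles can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name a message queue product or a message format, so Python's in-process `queue.Queue` stands in for the message queue, and the class names, dictionary keys, and component names (`datanode`, `tasktracker`) are hypothetical.

```python
import queue


class ProxyServer:
    """Runs on one data node: consumes commands and reports back (illustrative only)."""

    def __init__(self, node_id, components, commands, reports):
        self.node_id = node_id
        self.components = components  # component name -> "running" or "stopped"
        self.commands = commands      # queue carrying master -> proxy messages
        self.reports = reports        # queue carrying proxy -> master messages

    def handle_next(self):
        cmd = self.commands.get_nowait()
        if cmd["type"] == "status":
            # Status command: collect the node's state and upload it.
            self.reports.put({"node": self.node_id, "kind": "state",
                              "components": dict(self.components)})
        elif cmd["type"] == "control":
            # Control command: change a component's working state, then feed back the result.
            self.components[cmd["component"]] = cmd["target_state"]
            self.reports.put({"node": self.node_id, "kind": "result",
                              "component": cmd["component"], "ok": True})


class MasterServer:
    """Issues commands and keeps the cluster configuration up to date (illustrative only)."""

    def __init__(self, reports):
        self.reports = reports
        self.cluster_config = {}  # node id -> last reported component states

    def issue(self, command_queue, command):
        command_queue.put(command)

    def drain_reports(self):
        # State reports update the cluster configuration; command results are returned.
        results = []
        while not self.reports.empty():
            msg = self.reports.get_nowait()
            if msg["kind"] == "state":
                self.cluster_config[msg["node"]] = msg["components"]
            else:
                results.append(msg)
        return results


commands, reports = queue.Queue(), queue.Queue()
master = MasterServer(reports)
proxy = ProxyServer("datanode-1", {"datanode": "running", "tasktracker": "stopped"},
                    commands, reports)
master.issue(commands, {"type": "control", "component": "tasktracker",
                        "target_state": "running"})
master.issue(commands, {"type": "status"})
proxy.handle_next()
proxy.handle_next()
results = master.drain_reports()
```

In a deployment the two queues would be replaced by a networked message queue with one command channel per proxy server; the sketch keeps everything in one process so the message flow is easy to follow.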
Fig. 2 is a flow chart of the method for realizing node monitoring according to the invention. As shown in Fig. 2, the method comprises:
Step 200: a master server is set up on the name node, and an independent proxy server is set up on each data node.
Step 201: the master server obtains cluster configuration information from the name node and, based on a heartbeat protocol, issues status commands and control commands to the proxy servers.
In this step, the master server issues the status commands and control commands via a message queue.
Step 202: each proxy server obtains data node state information according to the status command, and controls the working state of each component of the data node according to the control command.
In this step, the proxy server uploads the data node state information and feeds back the control command result information via a message queue.
Step 203: the data node state information and the control command result information are sent to the master server, and the cluster configuration information is updated.
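Steps 201 to 203 can be read as one monitoring round, sketched below. The sketch is a simplification rather than the patented implementation: the name node is modelled as a plain dictionary with a hypothetical `data_nodes` key, each proxy server is reduced to a probe function that returns a state string, and the state values are invented for the example.

```python
def fetch_cluster_config(name_node):
    """Step 201: build the initial cluster configuration from the name node's node list."""
    return {node_id: {"state": "unknown"} for node_id in name_node["data_nodes"]}


def collect_state(node_id, probes):
    """Step 202: the proxy on a data node gathers its state in response to a status command."""
    return {"node": node_id, "state": probes[node_id]()}


def update_cluster_config(config, report):
    """Step 203: the master server folds an uploaded report into the cluster configuration."""
    config[report["node"]]["state"] = report["state"]


def run_monitoring_round(name_node, probes):
    config = fetch_cluster_config(name_node)
    for node_id in config:
        report = collect_state(node_id, probes)
        update_cluster_config(config, report)
    return config


name_node = {"data_nodes": ["dn1", "dn2"]}
probes = {"dn1": lambda: "healthy", "dn2": lambda: "degraded"}
config = run_monitoring_round(name_node, probes)
```

Keeping each step in its own function mirrors the numbered steps of the flow chart, so a transport such as a message queue could be placed between steps 201 and 202 without changing the update logic of step 203.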
The method of the invention further comprises:
When a data node goes down abnormally, the master server sends the proxy server, according to the cluster configuration information, a control command for restoring the configuration of the downed node.
In this step, the master server determines from the data node state information obtained by the proxy server whether the data node has gone down abnormally.
The proxy server, according to the control command, restores the working state of each component of the data node in accordance with the cluster configuration information.
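A sketch of this recovery path is given below. The patent states only that the master server decides from the node state information whether a node has gone down; treating a node as down when its last report is older than a timeout is an assumption made here, as are the function names, the command format, and the component names.

```python
def find_down_nodes(last_report_time, now, timeout_s):
    """Assumed detection rule: a node is down if its last report is older than timeout_s."""
    return sorted(node for node, t in last_report_time.items() if now - t > timeout_s)


def build_recovery_commands(node_id, cluster_config):
    """Master side: one control command per component, taken from the recorded configuration."""
    expected = cluster_config[node_id]
    return [{"type": "control", "node": node_id,
             "component": name, "target_state": state}
            for name, state in sorted(expected.items())]


def apply_recovery(components, commands):
    """Proxy side: restore each component and return one result per command."""
    results = []
    for cmd in commands:
        components[cmd["component"]] = cmd["target_state"]
        results.append({"node": cmd["node"], "component": cmd["component"], "ok": True})
    return results


cluster_config = {"dn2": {"datanode": "running", "tasktracker": "running"}}
last_report_time = {"dn1": 100.0, "dn2": 91.0}
down = find_down_nodes(last_report_time, now=102.0, timeout_s=5.0)

components_after_crash = {"datanode": "stopped", "tasktracker": "stopped"}
recovery = build_recovery_commands("dn2", cluster_config)
results = apply_recovery(components_after_crash, recovery)
```

Because the recovery commands are derived from the cluster configuration held by the master server, the node is returned to the component states it had before the failure without an operator logging in to it.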
A person of ordinary skill in the art will appreciate that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in hardware or as a software functional module. The application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A system for realizing node monitoring, characterized in that it comprises: a master server and proxy servers, one independent proxy server running on each data node; wherein
the master server is connected to the name node and is configured to obtain cluster configuration information from the name node; to issue status commands and control commands to the proxy servers based on a heartbeat protocol; and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information;
each proxy server is configured to receive the status commands and control commands of the master server; to obtain data node state information according to a status command and upload it to the master server; and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server.
2. The system according to claim 1, characterized in that
the master server is also configured, when a data node goes down abnormally, to send the proxy server, according to the cluster configuration information, a control command for restoring the configuration of the downed node;
the proxy server is also configured, according to the control command, to restore the working state of each component of the data node in accordance with the cluster configuration information, and to feed the control command result back to the master server.
3. The system according to claim 2, characterized in that the master server is also configured to determine, from the data node state information obtained by the proxy server, whether a data node has gone down abnormally.
4. The system according to claim 1, characterized in that the master server is specifically configured to issue the status commands and control commands via a message queue.
5. The system according to claim 1, characterized in that the proxy server is specifically configured to upload the data node state information and feed back the control command result information via a message queue.
6. A method for realizing node monitoring, characterized in that it comprises:
setting up a master server on the name node, and setting up an independent proxy server on each data node;
the master server obtaining cluster configuration information from the name node and, based on a heartbeat protocol, issuing status commands and control commands to the proxy servers;
each proxy server obtaining data node state information according to the status command, and controlling the working state of each component of the data node according to the control command;
sending the data node state information and the control command result information to the master server, and updating the cluster configuration information.
7. The method according to claim 6, characterized in that the method further comprises:
when a data node goes down abnormally, the master server sending the proxy server, according to the cluster configuration information, a control command for restoring the configuration of the downed node;
the proxy server, according to the control command, restoring the working state of each component of the data node in accordance with the cluster configuration information.
8. The method according to claim 7, characterized in that the master server determines, from the data node state information obtained by the proxy server, whether a data node has gone down abnormally.
9. The method according to claim 6, characterized in that the master server issues the status commands and control commands via a message queue.
10. The method according to claim 6, characterized in that the proxy server uploads the data node state information and feeds back the control command result information via a message queue.
CN201310717518.2A 2013-12-23 2013-12-23 Method and system for realizing node monitoring Active CN103701661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310717518.2A CN103701661B (en) 2013-12-23 2013-12-23 Method and system for realizing node monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310717518.2A CN103701661B (en) 2013-12-23 2013-12-23 Method and system for realizing node monitoring

Publications (2)

Publication Number Publication Date
CN103701661A (en) 2014-04-02
CN103701661B (en) 2017-08-25

Family

ID=50363064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310717518.2A Active CN103701661B (en) 2013-12-23 2013-12-23 Method and system for realizing node monitoring

Country Status (1)

Country Link
CN (1) CN103701661B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104135388A (en) * 2014-08-15 2014-11-05 曙光信息产业(北京)有限公司 Safety management method of data nodes in distributed system
CN104866380A (en) * 2015-06-18 2015-08-26 北京搜狐新媒体信息技术有限公司 Method and device for processing state transition of cluster management system
CN105007193A (en) * 2015-08-19 2015-10-28 浪潮(北京)电子信息产业有限公司 Multi-layer information processing method, system thereof and cluster management node
CN105872055A (en) * 2016-03-31 2016-08-17 浪潮通用软件有限公司 Online monitoring method and system for computer systems in network distributed deployment mode
CN105915405A (en) * 2016-03-29 2016-08-31 深圳市中博科创信息技术有限公司 Large-scale cluster node performance monitoring system
CN106126283A (en) * 2016-06-21 2016-11-16 浪潮电子信息产业股份有限公司 A kind of method of product allocation, Apparatus and system
CN106506203A (en) * 2016-10-25 2017-03-15 杭州云象网络技术有限公司 A kind of monitoring nodes system for being applied to block chain
CN106557543A (en) * 2016-10-14 2017-04-05 深圳前海微众银行股份有限公司 Node switching method and system
CN106802852A (en) * 2017-01-19 2017-06-06 郑州云海信息技术有限公司 A kind of method of Linux platform component unified monitoring
CN107819553A (en) * 2017-09-28 2018-03-20 青岛海信网络科技股份有限公司 A kind of control instruction feedback method and device
CN108363610A (en) * 2018-02-09 2018-08-03 华为技术有限公司 A kind of control method and equipment of virtual machine monitoring plug-in unit
CN109656570A (en) * 2018-12-18 2019-04-19 江苏满运软件科技有限公司 Group system and its operation method, electronic equipment and storage medium
CN111506480A (en) * 2020-04-23 2020-08-07 上海达梦数据库有限公司 State detection method, device and system for components in cluster
WO2020206638A1 (en) * 2019-04-10 2020-10-15 Beijing Voyager Technology Co., Ltd. Systems and methods for data storage
CN113051102A (en) * 2019-12-26 2021-06-29 中国移动通信集团云南有限公司 File backup method, device, system, storage medium and computer equipment
CN115379012A (en) * 2022-10-25 2022-11-22 航天云网数据研究院(广东)有限公司 Industrial interconnection platform message queue deployment method and device based on identification analysis

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109324834A (en) * 2018-09-19 2019-02-12 郑州云海信息技术有限公司 A kind of system and method that distributed storage server is restarted automatically

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101667034A (en) * 2009-09-21 2010-03-10 北京航空航天大学 Scalable monitoring system supporting hybrid clusters
CN102104628A (en) * 2010-12-29 2011-06-22 北京新媒传信科技有限公司 Server cluster system and management method thereof
CN102394791A (en) * 2011-10-26 2012-03-28 浪潮(北京)电子信息产业有限公司 Downtime recovery method and system
CN102761570A (en) * 2011-04-28 2012-10-31 同济大学 System and method for monitoring grid resources based on agents
KR20130073372A (en) * 2011-12-23 2013-07-03 주식회사 포스코 Heat recovery apparatus of coke oven and method of the same

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104135388A (en) * 2014-08-15 2014-11-05 曙光信息产业(北京)有限公司 Safety management method of data nodes in distributed system
CN104135388B (en) * 2014-08-15 2017-06-06 曙光信息产业(北京)有限公司 The method for managing security of back end in a kind of distributed system
CN104866380A (en) * 2015-06-18 2015-08-26 北京搜狐新媒体信息技术有限公司 Method and device for processing state transition of cluster management system
CN104866380B (en) * 2015-06-18 2018-07-06 北京搜狐新媒体信息技术有限公司 A kind for the treatment of method and apparatus of the state conversion of cluster management system
CN105007193A (en) * 2015-08-19 2015-10-28 浪潮(北京)电子信息产业有限公司 Multi-layer information processing method, system thereof and cluster management node
CN105915405A (en) * 2016-03-29 2016-08-31 深圳市中博科创信息技术有限公司 Large-scale cluster node performance monitoring system
CN105872055A (en) * 2016-03-31 2016-08-17 浪潮通用软件有限公司 Online monitoring method and system for computer systems in network distributed deployment mode
CN106126283A (en) * 2016-06-21 2016-11-16 浪潮电子信息产业股份有限公司 A kind of method of product allocation, Apparatus and system
CN106126283B (en) * 2016-06-21 2019-05-14 浪潮电子信息产业股份有限公司 A kind of method, apparatus and system of product allocation
CN106557543A (en) * 2016-10-14 2017-04-05 深圳前海微众银行股份有限公司 Node switching method and system
CN106506203A (en) * 2016-10-25 2017-03-15 杭州云象网络技术有限公司 A kind of monitoring nodes system for being applied to block chain
CN106506203B (en) * 2016-10-25 2019-12-10 杭州云象网络技术有限公司 Node monitoring system applied to block chain
CN106802852A (en) * 2017-01-19 2017-06-06 郑州云海信息技术有限公司 A kind of method of Linux platform component unified monitoring
CN107819553B (en) * 2017-09-28 2020-10-30 青岛海信网络科技股份有限公司 Control instruction feedback method and device
CN107819553A (en) * 2017-09-28 2018-03-20 青岛海信网络科技股份有限公司 A kind of control instruction feedback method and device
CN108363610A (en) * 2018-02-09 2018-08-03 华为技术有限公司 A kind of control method and equipment of virtual machine monitoring plug-in unit
CN109656570A (en) * 2018-12-18 2019-04-19 江苏满运软件科技有限公司 Group system and its operation method, electronic equipment and storage medium
CN109656570B (en) * 2018-12-18 2022-03-22 江苏满运软件科技有限公司 Cluster system, operation method thereof, electronic device and storage medium
WO2020206638A1 (en) * 2019-04-10 2020-10-15 Beijing Voyager Technology Co., Ltd. Systems and methods for data storage
CN112352228A (en) * 2019-04-10 2021-02-09 北京航迹科技有限公司 Data storage system and method
CN113051102A (en) * 2019-12-26 2021-06-29 中国移动通信集团云南有限公司 File backup method, device, system, storage medium and computer equipment
CN113051102B (en) * 2019-12-26 2024-03-19 中国移动通信集团云南有限公司 File backup method, device, system, storage medium and computer equipment
CN111506480A (en) * 2020-04-23 2020-08-07 上海达梦数据库有限公司 State detection method, device and system for components in cluster
CN111506480B (en) * 2020-04-23 2024-03-08 上海达梦数据库有限公司 Method, device and system for detecting states of components in cluster
CN115379012A (en) * 2022-10-25 2022-11-22 航天云网数据研究院(广东)有限公司 Industrial interconnection platform message queue deployment method and device based on identification analysis
CN115379012B (en) * 2022-10-25 2023-03-24 航天云网数据研究院(广东)有限公司 Industrial interconnection platform message queue deployment method and device based on identification analysis

Also Published As

Publication number Publication date
CN103701661B (en) 2017-08-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant