CN103701661A - Method and system for realizing node monitoring - Google Patents
- Publication number: CN103701661A
- Application number: CN201310717518.2A
- Authority: CN (China)
- Prior art keywords: information, data node, node, control command, proxy server
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The application discloses a method and a system for realizing node monitoring. The system comprises a master server and proxy servers, with one independent proxy server running on each data node. The master server is connected to the name node and is used to acquire cluster configuration information, to send status commands and control commands to the proxy servers based on a heartbeat protocol, and to receive the node state information uploaded by the proxy servers so as to update the cluster configuration information. Each proxy server is used to receive the status commands and control commands of the master server, to acquire the state information of its data node according to the status commands and upload it to the master server, to control the working state of the components of the data node according to the control commands, and to feed the control command results back to the master server. Because the proxy servers receive the status commands and control commands of the master server, acquire the data node state information, execute the control commands, and feed back the control command results, monitoring and management of the data nodes is realized.
Description
Technical field
The present invention relates to big data processing technology, and in particular to a method and system for realizing node monitoring that are suitable for a big data platform with a distributed system architecture (Hadoop).
Background
With the development of digital life, the volume of data is growing at an astonishing rate, and the resulting big data is becoming harder and harder to process. Big data is a data processing and application model based on cloud computing: by integrating and sharing data and reusing it across domains, it forms intellectual resources and knowledge service capabilities. The big data platform is the foundation that supports the application of big data technology.
Hadoop, currently the most popular big data platform, is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without understanding the underlying distributed details, and makes full use of the power of the cluster for high-speed computation and storage. A Hadoop cluster often comprises tens, hundreds, or even thousands of data nodes. At this scale, monitoring and managing the data nodes in the cluster quickly and accurately becomes extremely difficult.
At present, the node state of a Hadoop cluster is checked through the shell command line provided by the cluster or through a browser. To perform a control operation on a node in the cluster, an operator must log in to that node individually and control it through shell commands. When a node in the cluster goes down abnormally, the services that were running before the crash must be restored manually, and the node must then be added back to the cluster before the cluster can resume normal operation. Manual recovery is cumbersome, consumes a great deal of labor, and easily introduces new errors, which makes monitoring and recovering cluster nodes in a large-scale cluster environment very inconvenient.
Summary of the invention
To solve the above technical problem, the invention discloses a method and system for realizing node monitoring that can effectively monitor the state information of data nodes and, when a data node goes down abnormally, recover the crashed data node in a timely and effective manner.
The invention provides a system for realizing node monitoring, comprising:
a master server and an independent proxy server running on each data node, one proxy server per data node, wherein:
the master server is connected to the name node and is used to obtain cluster configuration information from the name node; to issue status commands and control commands to the proxy servers based on a heartbeat protocol; and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information;
each proxy server is used to receive the status commands and control commands of the master server; to obtain the state information of its data node according to a status command and upload it to the master server; and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server.
Further, the master server is also used to, when a data node goes down abnormally, send the proxy server a control command that restores the configuration of the crashed node according to the cluster configuration information;
the proxy server is also used to, according to the control command, make the data node restore the working state of each of its components according to the cluster configuration information, and to feed the control command result back to the master server.
Further, the master server is also used to determine, from the data node state information obtained by the proxy server, whether the data node has gone down abnormally.
Further, the master server is specifically used to issue the status commands and control commands by means of a message queue.
Further, the proxy server is specifically used to upload the data node state information and feed back the control command results by means of a message queue.
In another aspect, the application also provides a method for realizing node monitoring, comprising:
setting one master server on the name node, and setting an independent proxy server on each data node;
the master server obtaining cluster configuration information from the name node and issuing status commands and control commands to the proxy servers based on a heartbeat protocol;
the proxy server obtaining the data node state information according to the status command, and controlling the working state of each component of the node according to the control command;
sending the data node state information and the control command result information to the master server, which updates the cluster configuration information.
Further, the method also comprises:
when a data node goes down abnormally, the master server sending the proxy server control command information that restores the configuration of the crashed node according to the cluster configuration information;
the proxy server, according to the control command, making the data node restore the working state of each of its components according to the cluster configuration information.
Further, the master server determines, from the data node state information obtained by the proxy server, whether the data node has gone down abnormally.
Further, the master server issues the status commands and control commands by means of a message queue.
Further, the proxy server uploads the data node state information and feeds back the control command results by means of a message queue.
The application provides a technical solution comprising a master server and an independent proxy server running on each data node. The master server is connected to the name node and is used to obtain cluster configuration information from the name node, to issue status commands and control commands to the proxy servers based on a heartbeat protocol, and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information. Each proxy server is used to receive the status commands and control commands of the master server, to obtain the state information of its data node according to a status command and upload it to the master server, and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server. With the proxy servers receiving the status commands and control commands of the master server, obtaining the data node state information, executing the control commands, and feeding back the control command results, the invention realizes monitoring and management of the data nodes.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the invention and form a part of this application. The illustrative embodiments of the invention and their description are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a structural block diagram of the system for realizing node monitoring according to the invention;
Fig. 2 is a flow chart of the method for realizing node monitoring according to the invention.
Detailed description of the embodiments
So that the technical solution of the invention can be fully understood, the heartbeat protocol is briefly described first. Data is received and sent over a network through sockets, such as SOCKET in Windows. If a socket has been disconnected, sending and receiving data over it will certainly fail. Whether a socket is still usable can be judged by means of a heartbeat protocol. TCP itself implements such a mechanism (keep-alive): if it is enabled, TCP sends the configured number of heartbeat probes within a certain period, and these probes do not affect the application-defined protocol. A so-called "heartbeat" is the periodic sending of a self-defined structure that lets the other side know that the service is "online", so as to guarantee the validity of the link.
Periodically sending a self-defined structure (a heartbeat packet) to confirm that the connection is still valid is the main content of the heartbeat protocol.
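The two levels of heartbeat described above can be illustrated with a minimal Python sketch. The patent does not specify the layout of the heartbeat packet, so the JSON fields (`type`, `node`, `seq`, `ts`) and the function names below are assumptions made for illustration only. `SO_KEEPALIVE` is the standard socket option that enables the TCP-level mechanism.

```python
import json
import socket


def enable_tcp_keepalive(sock: socket.socket) -> None:
    """Turn on the TCP-level heartbeat (keep-alive probes) for a socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def make_heartbeat(node_id: str, seq: int, now: float) -> bytes:
    """Build one application-level heartbeat packet (hypothetical layout)."""
    packet = {"type": "heartbeat", "node": node_id, "seq": seq, "ts": now}
    # One JSON object per line keeps the framing trivial for the receiver.
    return json.dumps(packet).encode("utf-8") + b"\n"


def parse_heartbeat(raw: bytes) -> dict:
    """Decode a heartbeat packet and reject any other message type."""
    packet = json.loads(raw.decode("utf-8"))
    if packet.get("type") != "heartbeat":
        raise ValueError("not a heartbeat packet")
    return packet
```

A sender would call `make_heartbeat` on a timer and write the bytes to the connection. The receiver treats a long gap between parsed packets as a sign that the link is no longer valid.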
Fig. 1 is a structural block diagram of the system for realizing node monitoring according to the invention. As shown in Fig. 1, the system comprises:
a master server and an independent proxy server running on each data node, one proxy server per data node, wherein:
the master server is connected to the name node and is used to obtain cluster configuration information from the name node; to issue status commands and control commands to the proxy servers based on a heartbeat protocol; and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information;
each proxy server is used to receive the status commands and control commands of the master server; to obtain the state information of its data node according to a status command and upload it to the master server; and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server.
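The division of work between the two kinds of command can be sketched as follows. This is only an illustration under stated assumptions: the class names `StatusCommand`, `ControlCommand`, and `ProxyServer`, the `start`/`stop` actions, and the dictionary-shaped replies are hypothetical and do not come from the patent, which leaves the message formats open.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StatusCommand:
    node_id: str  # data node whose state is requested


@dataclass
class ControlCommand:
    node_id: str
    component: str  # component on the data node (hypothetical name)
    action: str     # assumed to be "start" or "stop"


@dataclass
class ProxyServer:
    node_id: str
    # component name -> "running" or "stopped"
    components: Dict[str, str] = field(default_factory=dict)

    def handle_status(self, cmd: StatusCommand) -> dict:
        """Collect the data node state that is uploaded to the master server."""
        return {"node": self.node_id, "components": dict(self.components)}

    def handle_control(self, cmd: ControlCommand) -> dict:
        """Change a component's working state and report the result."""
        reply = {"node": self.node_id, "component": cmd.component}
        if cmd.component not in self.components:
            return {**reply, "ok": False, "error": "unknown component"}
        if cmd.action not in ("start", "stop"):
            return {**reply, "ok": False, "error": "unknown action"}
        state = "running" if cmd.action == "start" else "stopped"
        self.components[cmd.component] = state
        return {**reply, "ok": True, "state": state}
```

Returning a result for every control command, including the failures, mirrors the requirement that the proxy server feed the control command result back to the master server.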
Further, the master server is also used to, when a data node goes down abnormally, send the proxy server a control command that restores the configuration of the crashed node according to the cluster configuration information.
The proxy server is also used to, according to the control command, make the data node restore the working state of each of its components according to the cluster configuration information, and to feed the control command result back to the master server.
The master server is also used to determine, from the data node state information obtained by the proxy server, whether the data node has gone down abnormally.
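The patent does not state the rule by which the master server decides that a data node has gone down. One common choice, assumed here purely for illustration, is a timeout: a node is treated as crashed when no state report has arrived within a fixed interval. The class name `CrashDetector` and the parameter `timeout_s` are hypothetical.

```python
from typing import Dict, List


class CrashDetector:
    """Flags data nodes whose last state report is older than a timeout."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}  # node id -> time of last report

    def record_status(self, node_id: str, now: float) -> None:
        """Remember when a proxy server last uploaded state for this node."""
        self.last_seen[node_id] = now

    def crashed_nodes(self, now: float) -> List[str]:
        """Return the nodes that have been silent for longer than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the rule easy to test and leaves the choice of clock to the caller.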
The master server is specifically used to issue the status commands and control commands by means of a message queue.
The proxy server is specifically used to upload the data node state information and feed back the control command results by means of a message queue.
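The message queue exchange can be sketched in-process with Python's standard `queue` module. A real deployment would presumably use a networked message broker, which the patent does not name. The two queues (`downlink` for commands, `uplink` for reports and results) and the function `proxy_drain` are assumptions for illustration.

```python
import queue
from typing import Callable


def proxy_drain(
    downlink: "queue.Queue[dict]",
    uplink: "queue.Queue[dict]",
    handler: Callable[[dict], dict],
) -> int:
    """Handle every pending command and push each result onto the uplink.

    Returns the number of commands processed in this pass.
    """
    handled = 0
    while True:
        try:
            command = downlink.get_nowait()  # command issued by the master
        except queue.Empty:
            return handled
        uplink.put(handler(command))  # state report or command result
        handled += 1
```

Decoupling the two directions through queues means the master server never blocks on a slow proxy server. It issues commands and reads the results whenever they arrive.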
Fig. 2 is a flow chart of the method for realizing node monitoring according to the invention. As shown in Fig. 2, the method comprises:
Step 200: one master server is set on the name node, and an independent proxy server is set on each data node.
Step 201: the master server obtains cluster configuration information from the name node and, based on a heartbeat protocol, issues status commands and control commands to the proxy servers.
In this step, the master server issues the status commands and control commands by means of a message queue.
Step 202: the proxy server obtains the data node state information according to the status command, and controls the working state of each component of the node according to the control command.
In this step, the proxy server uploads the data node state information and feeds back the control command result information by means of a message queue.
Step 203: the data node state information and the control command result information are sent to the master server, which updates the cluster configuration information.
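Steps 201 to 203 can be read as one monitoring round, sketched below for the status path only. The shape of the cluster configuration (a dictionary keyed by node id with a `state` entry) and the representation of each proxy server as a callable are assumptions, since the patent does not describe these structures.

```python
from typing import Callable, Dict


def monitoring_round(
    cluster_config: Dict[str, dict],
    proxies: Dict[str, Callable[[dict], dict]],
) -> Dict[str, dict]:
    """Run one status round and return the updated cluster configuration."""
    # Work on a copy so the caller's configuration is not modified in place.
    updated = {node: dict(entry) for node, entry in cluster_config.items()}
    for node_id, proxy in proxies.items():
        # Step 201: the master issues a status command to the proxy server.
        command = {"type": "status", "node": node_id}
        # Step 202: the proxy server collects the state of its data node.
        report = proxy(command)
        # Step 203: the master records the reported state in the configuration.
        updated.setdefault(node_id, {})["state"] = report["state"]
    return updated
```

Nodes without a registered proxy server keep their previous entry, so a missing report never overwrites known configuration.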
The method of the invention further comprises:
When a data node goes down abnormally, the master server sends the proxy server control command information that restores the configuration of the crashed node according to the cluster configuration information.
In this step, the master server determines, from the data node state information obtained by the proxy server, whether the data node has gone down abnormally.
The proxy server, according to the control command, makes the data node restore the working state of each of its components according to the cluster configuration information.
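The recovery flow can be sketched as two small functions: the master server derives control commands from the recorded cluster configuration, and the proxy server applies them. The configuration shape (node id mapped to component name and expected state) and both function names are hypothetical. The sketch also assumes that restoring a component simply means marking it as running.

```python
from typing import Dict, List


def recovery_commands(
    cluster_config: Dict[str, Dict[str, str]], node_id: str
) -> List[dict]:
    """Master side: one start command per component that should be running."""
    expected = cluster_config.get(node_id, {})
    return [
        {"type": "control", "node": node_id, "component": name, "action": "start"}
        for name, state in sorted(expected.items())
        if state == "running"
    ]


def apply_recovery(components: Dict[str, str], commands: List[dict]) -> List[dict]:
    """Proxy side: restore each component and collect the results to feed back."""
    results = []
    for cmd in commands:
        components[cmd["component"]] = "running"
        results.append(
            {"node": cmd["node"], "component": cmd["component"], "ok": True}
        )
    return results
```

Because the commands are derived from the configuration recorded before the crash, components that were deliberately stopped are not restarted.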
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or part of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments can be implemented in hardware or as a software functional module. The application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. A system for realizing node monitoring, characterized by comprising: a master server and an independent proxy server running on each data node, one proxy server per data node, wherein:
the master server is connected to the name node and is used to obtain cluster configuration information from the name node; to issue status commands and control commands to the proxy servers based on a heartbeat protocol; and to receive the node state information uploaded by the proxy servers in order to update the cluster configuration information;
each proxy server is used to receive the status commands and control commands of the master server; to obtain the state information of its data node according to a status command and upload it to the master server; and to control the working state of each component of the data node according to a control command and feed the control command result back to the master server.
2. The system according to claim 1, characterized in that:
the master server is also used to, when a data node goes down abnormally, send the proxy server a control command that restores the configuration of the crashed node according to the cluster configuration information;
the proxy server is also used to, according to the control command, make the data node restore the working state of each of its components according to the cluster configuration information, and to feed the control command result back to the master server.
3. The system according to claim 2, characterized in that the master server is also used to determine, from the data node state information obtained by the proxy server, whether the data node has gone down abnormally.
4. The system according to claim 1, characterized in that the master server is specifically used to issue the status commands and control commands by means of a message queue.
5. The system according to claim 1, characterized in that the proxy server is specifically used to upload the data node state information and feed back the control command results by means of a message queue.
6. A method for realizing node monitoring, characterized by comprising:
setting one master server on the name node, and setting an independent proxy server on each data node;
the master server obtaining cluster configuration information from the name node and issuing status commands and control commands to the proxy servers based on a heartbeat protocol;
the proxy server obtaining the data node state information according to the status command, and controlling the working state of each component of the node according to the control command;
sending the data node state information and the control command result information to the master server, which updates the cluster configuration information.
7. The method according to claim 6, characterized in that the method further comprises:
when a data node goes down abnormally, the master server sending the proxy server control command information that restores the configuration of the crashed node according to the cluster configuration information;
the proxy server, according to the control command, making the data node restore the working state of each of its components according to the cluster configuration information.
8. The method according to claim 7, characterized in that the master server determines, from the data node state information obtained by the proxy server, whether the data node has gone down abnormally.
9. The method according to claim 6, characterized in that the master server issues the status commands and control commands by means of a message queue.
10. The method according to claim 6, characterized in that the proxy server uploads the data node state information and feeds back the control command results by means of a message queue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310717518.2A CN103701661B (en) | 2013-12-23 | 2013-12-23 | Method and system for realizing node monitoring |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103701661A (en) | 2014-04-02 |
CN103701661B (en) | 2017-08-25 |
Family
ID=50363064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310717518.2A Active CN103701661B (en) | Method and system for realizing node monitoring | 2013-12-23 | 2013-12-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103701661B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN103701661B (en) | 2017-08-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||