CN103701661A - Method and system for realizing node monitoring

Info

Publication number: CN103701661A
Application number: CN201310717518.2A
Authority: CN
Inventors: 刘璧怡; 郭美思; 宗栋瑞; 吴楠
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2013-12-23
Filing date: 2013-12-23
Publication date: 2014-04-02
Anticipated expiration: 2033-12-23
Also published as: CN103701661B

Abstract

The application discloses a method and a system for realizing node monitoring. The system comprises a master server and agent servers, wherein an independent agent server runs on each data node; the master server is connected with a name node and is used for acquiring cluster configuration information, sending status instructions and control instructions to the agent servers based on a heartbeat protocol, and receiving the node state information uploaded by the agent servers so as to update the cluster configuration information; the agent servers are used for receiving the status instructions and control instructions from the master server, acquiring data node state information according to the status instructions, uploading the data node state information to the master server, controlling the working state of the components of the data nodes according to the control instructions, and feeding the control instruction results back to the master server. In this way, the agent servers receive the status instructions and control instructions of the master server, acquire the data node state information, carry out the control instructions, and feed back the control instruction results, so that monitoring and management of the data nodes is realized.

Description

A method and system for realizing node monitoring

Technical field

The present invention relates to big data processing technology, and in particular to a method and system for realizing node monitoring that is applicable to big data platforms based on the distributed system architecture Hadoop.

Background art

With the development of digital life, the volume of data is growing at an astonishing rate, and the resulting big data is becoming increasingly difficult to process. Big data is a data processing and application model based on cloud computing: through the integration, sharing, and cross-reuse of data, it forms intellectual resources and knowledge service capabilities. The big data platform is the foundational support for applying big data technology.

Hadoop, currently the most popular big data platform, is a distributed system infrastructure platform developed by the Apache Foundation. It allows users to develop distributed programs without understanding the underlying distributed details, making full use of the power of a cluster for high-speed computation and storage. A Hadoop cluster often comprises tens, hundreds, or even thousands of data nodes; at that scale, monitoring and managing the data nodes in the cluster quickly and accurately becomes extremely difficult.

At present, the Hadoop big data platform lets users check the node status of a cluster through the shell command line provided by the cluster or through a browser. To perform a control operation on a particular node in the cluster, one must log in to that node individually and operate on it with shell commands. When a node in the cluster crashes abnormally, the services that were running on the node before the crash must be restored manually, and the node must then be added back to the cluster before the cluster can resume normal operation. Manual recovery is cumbersome, consumes a great deal of labor, and easily introduces new errors, which makes monitoring and recovering cluster nodes very inconvenient in a large-scale cluster environment.

Summary of the invention

To solve the above technical problem, the invention discloses a method and system for realizing node monitoring that can effectively monitor the state information of data nodes and, when a data node crashes abnormally, can recover and control the crashed data node in a timely and effective manner.

The invention provides a system for realizing node monitoring, comprising:

a master server and a corresponding independent agent server running on each data node; wherein

the master server is connected to the name node and is configured to obtain cluster configuration information from the name node; to issue status instructions and control instructions to the agent servers based on a heartbeat protocol; and to receive the node state information uploaded by the agent servers so as to update the cluster configuration information;

each agent server is configured to receive the status instructions and control instructions from the master server; to acquire data node state information according to the status instructions and upload it to the master server; and to control the working state of each component of the data node according to the control instructions and feed the control instruction results back to the master server.

Further, the master server is also configured to, when a data node crashes abnormally, send to the agent server, according to the cluster configuration information, a control instruction for restoring the configuration of the crashed node;

the agent server is also configured to control the data node, according to the control instruction, to restore the working state of each component of the data node according to the cluster configuration information, and to feed the control instruction result back to the master server.

Further, the master server is also configured to determine, from the data node state information obtained by the agent server, whether a data node has crashed abnormally.

Further, the master server is specifically configured to issue the status instructions and control instructions by means of a message queue.

Further, the agent server is specifically configured to upload the data node state information and feed back the control instruction results by means of a message queue.

In another aspect, the application also provides a method for realizing node monitoring, comprising:

setting up a master server on the name node, and setting up a corresponding independent agent server on each data node;

the master server obtaining cluster configuration information from the name node and issuing status instructions and control instructions to the agent servers based on a heartbeat protocol;

the agent server acquiring data node state information according to the status instruction, and controlling the working state of each component of the data node according to the control instruction;

sending the data node state information and the control instruction results to the master server, and updating the cluster configuration information.

Further, the method also comprises:

when a data node crashes abnormally, the master server sending to the agent server, according to the cluster configuration information, a control instruction for restoring the configuration of the crashed node;

the agent server controlling the data node, according to the control instruction, to restore the working state of each component of the data node according to the cluster configuration information.

Further, the master server determines, from the data node state information obtained by the agent server, whether a data node has crashed abnormally.

Further, the master server issues the status instructions and control instructions by means of a message queue.

Further, the agent server uploads the data node state information and feeds back the control instruction results by means of a message queue.

The technical scheme provided by the application comprises: a master server and a corresponding independent agent server running on each data node; wherein the master server is connected to the name node and is configured to obtain cluster configuration information from the name node, to issue status instructions and control instructions to the agent servers based on a heartbeat protocol, and to receive the node state information uploaded by the agent servers so as to update the cluster configuration information; and each agent server is configured to receive the status instructions and control instructions from the master server, to acquire data node state information according to the status instructions and upload it to the master server, and to control the working state of each component of the data node according to the control instructions and feed the control instruction results back to the master server. In the invention, the agent servers receive the status instructions and control instructions of the master server, acquire the data node state information, carry out the control instructions, and feed back the control instruction results, thereby realizing monitoring and management of the data nodes.

Brief description of the drawings

The drawings described here provide a further understanding of the invention and form a part of the application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 is a structural block diagram of the system for realizing node monitoring according to the invention;

Fig. 2 is a flow chart of the method for realizing node monitoring according to the invention.

Detailed description of the embodiments

So that the technical scheme of the invention can be fully understood, the heartbeat protocol is briefly described first. Data in a network is sent and received through sockets (the SOCKET interface in Windows, for example). If a socket has been disconnected, sending and receiving data on it will inevitably fail. A heartbeat protocol is the means of determining whether a socket is still usable. TCP itself implements such a mechanism (TCP keep-alive): if it is enabled, TCP sends the configured number of heartbeat probes within a certain period, and these probes do not affect the application-defined protocol. A so-called "heartbeat" is a user-defined structure sent at regular intervals to let the peer know that the service is "online", thereby guaranteeing the validity of the link.

Sending a user-defined structure (a heartbeat packet) at regular intervals to ensure that the connection remains valid is the essence of a heartbeat protocol.
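By way of illustration only, the liveness check that a heartbeat protocol enables might be sketched as follows in Python. The class name HeartbeatMonitor, the timeout value, and the node identifiers are assumptions introduced for this sketch and do not appear in the application; timestamps are passed in explicitly so that the example does not depend on a real clock or a real socket.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical helper)."""
    timeout: float                                  # assumed policy: seconds of silence tolerated
    last_seen: dict = field(default_factory=dict)   # node id -> time of last heartbeat

    def beat(self, node_id: str, now: float) -> None:
        # Record the arrival time of a heartbeat packet from node_id.
        self.last_seen[node_id] = now

    def is_alive(self, node_id: str, now: float) -> bool:
        # A peer counts as online if it sent a heartbeat within the timeout window.
        seen = self.last_seen.get(node_id)
        return seen is not None and (now - seen) <= self.timeout

    def silent_nodes(self, now: float) -> list:
        # Known peers whose most recent heartbeat is older than the timeout.
        return sorted(n for n in self.last_seen if not self.is_alive(n, now))


monitor = HeartbeatMonitor(timeout=10.0)
monitor.beat("datanode-1", now=0.0)
monitor.beat("datanode-2", now=0.0)
monitor.beat("datanode-1", now=8.0)    # datanode-2 stays silent after t=0
print(monitor.silent_nodes(now=12.0))  # prints ['datanode-2']
```

In a deployment, the beat call would be driven by heartbeat packets actually received over the network, and the timeout would be chosen relative to the sending interval.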

Fig. 1 is a structural block diagram of the system for realizing node monitoring according to the invention. As shown in Fig. 1, the system comprises:

Further, the master server is also configured to, when a data node crashes abnormally, send to the agent server, according to the cluster configuration information, a control instruction for restoring the configuration of the crashed node;

The master server is also configured to determine, from the data node state information obtained by the agent server, whether a data node has crashed abnormally.

The master server is specifically configured to issue the status instructions and control instructions by means of a message queue.

The agent server is specifically configured to upload the data node state information and feed back the control instruction results by means of a message queue.
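As a non-limiting sketch of how an agent server might receive instructions and return results through message queues, consider the following Python example. The application does not name a particular message queue product, so the standard-library queue.Queue stands in for it here. The Instruction and Agent types, the "start"/"stop" actions, and the component names are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass


@dataclass
class Instruction:
    kind: str             # "status" or "control"
    component: str = ""   # target component (control instructions only)
    action: str = ""      # assumed actions: "start" or "stop"


class Agent:
    """Hypothetical agent server attached to one data node."""

    def __init__(self, node_id, components):
        self.node_id = node_id
        self.components = dict(components)  # component name -> "running" / "stopped"
        self.inbox = queue.Queue()           # instructions issued by the master server
        self.outbox = queue.Queue()          # state reports and control results

    def handle_pending(self):
        # Drain the inbox and answer every instruction on the outbox.
        while not self.inbox.empty():
            ins = self.inbox.get()
            if ins.kind == "status":
                self.outbox.put(("state", self.node_id, dict(self.components)))
            elif ins.kind == "control":
                ok = ins.component in self.components and ins.action in ("start", "stop")
                if ok:
                    new_state = "running" if ins.action == "start" else "stopped"
                    self.components[ins.component] = new_state
                self.outbox.put(("result", self.node_id, ins.component, ok))


agent = Agent("datanode-1", {"DataNode": "running", "TaskTracker": "stopped"})
agent.inbox.put(Instruction("control", "TaskTracker", "start"))
agent.inbox.put(Instruction("status"))
agent.handle_pending()
replies = [agent.outbox.get() for _ in range(2)]
```

Here the first reply is the control instruction result and the second is the state report, which already reflects the component that was just started. A real agent would start or stop actual services rather than update a dictionary.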

Fig. 2 is a flow chart of the method for realizing node monitoring according to the invention. As shown in Fig. 2, the method comprises:

Step 200: a master server is set up on the name node, and a corresponding independent agent server is set up on each data node.

Step 201: the master server obtains cluster configuration information from the name node and, based on the heartbeat protocol, issues status instructions and control instructions to the agent servers.

In this step, the master server issues the status instructions and control instructions by means of a message queue.

Step 202: the agent server acquires data node state information according to the status instruction and controls the working state of each component of the data node according to the control instruction.

In this step, the agent server uploads the data node state information and feeds back the control instruction results by means of a message queue.

Step 203: the data node state information and the control instruction results are sent to the master server, and the cluster configuration information is updated.
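The update of the cluster configuration information in step 203 might look like the following Python sketch. The application does not specify the layout of the cluster configuration information, so the nested dictionary used here (node identifier to component states) and the function names apply_state_report and apply_control_result are assumptions, as are all sample values.

```python
def apply_state_report(cluster_config, node_id, components):
    """Merge one node's reported component states into a copy of the configuration."""
    updated = {n: dict(c) for n, c in cluster_config.items()}  # leave the input untouched
    updated.setdefault(node_id, {}).update(components)
    return updated


def apply_control_result(cluster_config, node_id, component, new_state, ok):
    """Record the outcome of a control instruction only if the agent reported success."""
    if not ok:
        return cluster_config
    return apply_state_report(cluster_config, node_id, {component: new_state})


# Step 201: configuration as obtained from the name node (made-up values).
config = {"datanode-1": {"DataNode": "running"}, "datanode-2": {"DataNode": "running"}}

# Step 202 runs on the agent servers; their replies are supplied directly here.
config = apply_state_report(config, "datanode-2", {"DataNode": "running", "TaskTracker": "stopped"})
config = apply_control_result(config, "datanode-2", "TaskTracker", "running", ok=True)
config = apply_control_result(config, "datanode-1", "DataNode", "stopped", ok=False)  # failed, ignored
```

Returning a new dictionary rather than modifying the old one is a design choice of this sketch; it keeps each update easy to compare against the previous configuration.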

The method of the invention also comprises:

When a data node crashes abnormally, the master server sends to the agent server, according to the cluster configuration information, a control instruction for restoring the configuration of the crashed node.

In this step, the master server determines, from the data node state information obtained by the agent server, whether a data node has crashed abnormally.

The agent server controls the data node, according to the control instruction, to restore the working state of each component of the data node according to the cluster configuration information.
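The recovery procedure above can be illustrated with a short Python sketch. The application does not state the criterion by which the master server decides that a node has crashed; this sketch assumes that a node listed in the cluster configuration information but absent from the latest round of state reports is treated as crashed. The function names, the instruction format, and the sample data are likewise hypothetical.

```python
def find_crashed(cluster_config, reported_states):
    """Nodes known to the configuration that sent no state report this round (assumed criterion)."""
    return sorted(n for n in cluster_config if n not in reported_states)


def build_restore_instructions(cluster_config, crashed):
    """One 'start' control instruction per component recorded as running on a crashed node."""
    return [
        {"node": node, "component": comp, "action": "start"}
        for node in crashed
        for comp, state in sorted(cluster_config[node].items())
        if state == "running"
    ]


# Last known configuration held by the master server (made-up values).
cluster_config = {
    "datanode-1": {"DataNode": "running", "TaskTracker": "running"},
    "datanode-2": {"DataNode": "running", "TaskTracker": "stopped"},
}
# Only datanode-1 answered the latest status instruction.
reports = {"datanode-1": {"DataNode": "running", "TaskTracker": "running"}}

crashed = find_crashed(cluster_config, reports)
instructions = build_restore_instructions(cluster_config, crashed)
```

In this example only datanode-2 is considered crashed, and a single instruction is produced to start its DataNode component, since its TaskTracker was recorded as stopped before the crash.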

A person of ordinary skill in the art will appreciate that all or part of the steps of the above method can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or part of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments can be implemented in hardware or as a software function module. The application is not limited to any particular combination of hardware and software.

The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A system for realizing node monitoring, characterized by comprising: a master server and a corresponding independent agent server running on each data node; wherein,

2. The system according to claim 1, characterized in that

the master server is also configured to, when a data node crashes abnormally, send to the agent server, according to the cluster configuration information, a control instruction for restoring the configuration of the crashed node;

the agent server is also configured to control the data node, according to the control instruction, to restore the working state of each component of the data node according to the cluster configuration information, and to feed the control instruction result back to the master server.

3. The system according to claim 2, characterized in that the master server is also configured to determine, from the data node state information obtained by the agent server, whether a data node has crashed abnormally.

4. The system according to claim 1, characterized in that the master server is specifically configured to issue the status instructions and control instructions by means of a message queue.

5. The system according to claim 1, characterized in that the agent server is specifically configured to upload the data node state information and feed back the control instruction results by means of a message queue.

6. A method for realizing node monitoring, characterized by comprising:

7. The method according to claim 6, characterized in that the method also comprises:

8. The method according to claim 7, characterized in that the master server determines, from the data node state information obtained by the agent server, whether a data node has crashed abnormally.

9. The method according to claim 6, characterized in that the master server issues the status instructions and control instructions by means of a message queue.

10. The method according to claim 6, characterized in that the agent server uploads the data node state information and feeds back the control instruction results by means of a message queue.