Summary of the invention
To address the above problems in the related art, the present invention proposes a job stream management method and device for a cluster that can quickly locate the job stream controllers running in the cluster.
The technical solution of the present invention is implemented as follows.
According to one aspect of the present invention, a job stream management method for a cluster is provided.
The job stream management method comprises:
scanning, within a target node range, the process numbers of the job stream control processes running on each node, to determine the job stream control processes running on each node;
determining, based on the process information of the job stream control processes running on each node, the port number of the job stream controller corresponding to each job stream control process; and
determining, according to the port numbers of the job stream controllers, the job stream controllers configured on each node.
In addition, the job stream management method further comprises:
determining, based on the process information of the job stream control processes running on each node, information about the user who started each job stream control process.
Further, the job stream management method comprises:
controlling, according to preconfigured user priority information, the visibility to users of the job streams running on each node.
In addition, the job stream management method further comprises:
determining, according to the IP address information of each node and the port numbers of the job stream controllers configured on each node, the distribution information of the job stream controllers over the nodes within the target node range.
In addition, the job stream management method further comprises:
managing multiple job stream controllers by calling pre-encapsulated job stream control commands.
When any one job stream controller is managed, the multiple job streams corresponding to that job stream controller on its node are determined; the workflow information and job status information of each of these job streams are obtained; and each job stream corresponding to the job stream controller is managed according to its workflow information and job status information.
In addition, the job stream management method further comprises:
tracking the job states of the job streams corresponding to the job stream controllers configured on each node; and
when the job state of a job stream changes, updating, in a job scheduling system, the status information of the batch job corresponding to the job whose state changed.
Further, the job stream management method comprises:
a pre-association step: associating, according to the attribute information of the jobs in the job stream, each job in the job stream with the job number of the corresponding batch job in the job scheduling system, to generate job association information.
Correspondingly, the job stream management method further comprises:
before updating, in the job scheduling system, the status information of the batch job corresponding to the job whose state changed, searching the job scheduling system according to the job association information for a batch job corresponding to the job in the job stream whose state changed; and
when no batch job corresponding to the job whose state changed is found, performing the pre-association step.
According to another aspect of the present invention, a job stream management device for a cluster is provided.
The job stream management device comprises:
a scanning module, configured to scan, within a target node range, the process numbers of the job stream control processes running on each node, to determine the job stream control processes running on each node;
a first determination module, configured to determine, based on the process information of the job stream control processes running on each node, the port number of the job stream controller corresponding to each job stream control process; and
a second determination module, configured to determine, according to the port numbers of the job stream controllers, the job stream controllers configured on each node.
By scanning for and determining the process numbers of the job stream control processes running on the nodes, together with the port numbers of the corresponding job stream controllers, the present invention can quickly locate the job stream controllers running in a cluster.
Embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments fall within the protection scope of the present invention.
According to an embodiment of the present invention, a job stream management method for a cluster is provided.
As shown in Fig. 1, the job stream management method according to this embodiment comprises:
Step S101: scan, within the target node range, the process numbers of the job stream control processes running on each node, to determine the job stream control processes running on each node;
Step S103: based on the process information of the job stream control processes running on each node, determine the port number of the job stream controller corresponding to each job stream control process;
Step S105: according to the port numbers of the job stream controllers, determine the job stream controllers configured on each node.
The above technical solution is described in detail below using ecflow as an example. In a specific embodiment, the nodes in the cluster are divided into three classes: a management node, a web monitoring node, and other nodes. The management node runs the programs for server detection, job stream state acquisition, and monitoring. The web monitoring node provides the page entry point for monitoring and managing job streams, i.e., a visual interface through which users manage the servers on each node of the cluster. All nodes in the cluster other than the management node and the web monitoring node belong to the other nodes. The management node and the web monitoring node may be deployed on the same node or on separate nodes. Moreover, any node in the cluster can run an ecflow server and can at the same time serve as a job execution node (i.e., one of the other nodes mentioned above).
Because multiple servers may be distributed across different nodes of the cluster, or on the same node, one embodiment of the job stream management method includes an ecflow server detection flow so that the positions and states of the ecflow servers in the cluster can be located quickly. As shown in Fig. 2, to determine which ecflow servers are distributed on which nodes of the cluster, the user can enter a specified node range (i.e., the target node range) for cluster-wide detection of ecflow servers. After receiving the specified node range, the system uses a parallel command such as pssh to scan, in parallel, the port numbers and process numbers of the ecflow servers on each node in that range. Specifically, the system scans each node in parallel for the process numbers of the job stream control processes (i.e., ecflow server processes) running on it, and thereby determines which ecflow server processes are running on each node.
Next, based on the process information of the ecflow server processes running on each node, the system determines the port number of the job stream controller (i.e., ecflow server) corresponding to each ecflow server process. To distinguish different ecflow servers, the system assigns each ecflow server a unique port number in advance.
Finally, according to the ecflow server port numbers, the system determines which job stream controllers are configured on each node.
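The scan in steps S101 to S105 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described above: it assumes each node's process list is available as `ps -eo pid,user,args` output, and that an ecflow server's port is visible on its command line as `--port=NNNN` or `--port NNNN`. `ServerProcess`, `parse_ps_output`, and `scan_nodes` are hypothetical names, and `fetch_ps` stands in for whatever remote mechanism (pssh in the embodiment) collects the output.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple, Optional


class ServerProcess(NamedTuple):
    node: str
    pid: int
    user: str            # user who started the process (used later for visibility control)
    port: Optional[int]  # None if the port is not visible on the command line


# Assumption: the server port appears on the command line as "--port=NNNN" or "--port NNNN".
PORT_RE = re.compile(r"--port[= ](\d+)")


def parse_ps_output(node: str, ps_text: str) -> List[ServerProcess]:
    """Keep only ecflow_server processes from `ps -eo pid,user,args` output."""
    found = []
    for line in ps_text.splitlines()[1:]:  # first line is the ps header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, user, args = parts
        # Match on the executable name so that e.g. "grep ecflow_server" is ignored.
        if not args.split()[0].endswith("ecflow_server"):
            continue
        match = PORT_RE.search(args)
        port = int(match.group(1)) if match else None
        found.append(ServerProcess(node, int(pid), user, port))
    return found


def scan_nodes(nodes: List[str], fetch_ps: Callable[[str], str],
               workers: int = 16) -> Dict[str, List[ServerProcess]]:
    """Scan every node concurrently; fetch_ps(node) must return that node's ps output."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda node: parse_ps_output(node, fetch_ps(node)), nodes)
        return dict(zip(nodes, results))
```

In a real deployment `fetch_ps` might wrap something like `subprocess.run(["ssh", node, "ps", "-eo", "pid,user,args"], capture_output=True, text=True).stdout`. Passing `workers=1` turns the parallel scan into a serial one, which corresponds to the serial alternative mentioned in the text. If the port is not on the command line (for example, when it is supplied through an environment variable), it would have to be read from the process's listening sockets instead; the sketch then simply records `None`.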
Note that this embodiment uses parallel scanning to speed up ecflow server detection. Depending on the required detection speed, the search can also be performed serially, or by combining serial and parallel scanning.
As also shown in Fig. 2, the method according to this embodiment further includes obtaining the owning user of each ecflow server (i.e., the user who started that ecflow server process). Specifically, the system determines the user who started each ecflow server process from the process information of the ecflow server processes running on each node. After determining the owning users of the ecflow servers configured on each node, the system can, according to preconfigured user priority information, control the visibility to users of the job streams running under the ecflow servers on each node. For example, an ordinary user with a lower rank can see only the job streams running under the ecflow server processes that user started, whereas for a higher-ranked administrator the system can make visible the job streams running under multiple ecflow servers on multiple nodes, or under multiple ecflow servers on one node, regardless of whether those ecflow server processes were started by that administrator. This enables differentiated management of different ecflow servers by users of different ranks.
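A sketch of the visibility rule, assuming (hypothetically) that user priority is a simple integer table and that a fixed threshold marks administrators; the document does not specify how priority information is stored. `ServerInfo`, `ADMIN_PRIORITY`, and `visible_servers` are illustrative names.

```python
from typing import Dict, List, NamedTuple


class ServerInfo(NamedTuple):
    node: str
    port: int
    owner: str  # user who started the ecflow_server process


# Hypothetical threshold: users at or above this priority count as administrators.
ADMIN_PRIORITY = 10


def visible_servers(user: str, servers: List[ServerInfo],
                    priority: Dict[str, int]) -> List[ServerInfo]:
    """Return the servers whose job streams `user` may see."""
    if priority.get(user, 0) >= ADMIN_PRIORITY:
        # Administrators see every server, whoever started it.
        return list(servers)
    # Ordinary users see only the servers they started themselves.
    return [s for s in servers if s.owner == user]
```

Users missing from the priority table are treated as ordinary users here, which is one conservative default among several possible ones.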
In this embodiment, the system by default scans only the ecflow servers configured on the management node described above. In practice, the present invention places no specific restriction on the types of nodes included in the user-specified target node range, which can be adjusted flexibly as needed.
In addition, to further clarify the distribution of ecflow servers in the cluster, the method according to this embodiment may further include determining the distribution information of the ecflow servers on the nodes within the target node range, according to the IP address information of each node and the port numbers of the ecflow servers configured on each node. Because nodes in the cluster are distinguished by IP address and ecflow servers are distinguished by port number, the combination of IP address and ecflow server port number can be used to detect automatically, across multiple nodes of the cluster, which ecflow servers are distributed on any node within the target node range and on which node any given ecflow server is located.
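The (IP address, port) combination supports lookups in both directions. The sketch below assumes, as in the embodiment, that ports are assigned uniquely across the cluster, so a port reported on two different nodes is treated as a conflict; if ports were unique only per node, the key would have to be the `(ip, port)` pair instead. The function name `build_distribution` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_distribution(found: Iterable[Tuple[str, int]]) -> Tuple[Dict[str, List[int]], Dict[int, str]]:
    """From (node_ip, server_port) pairs, build both lookup directions.

    Returns (ports_on_node, node_of_port): which servers run on each node,
    and which node each server runs on.
    """
    ports_on_node: Dict[str, List[int]] = defaultdict(list)
    node_of_port: Dict[int, str] = {}
    for ip, port in found:
        ports_on_node[ip].append(port)
        # Ports are assumed unique cluster-wide, so a repeat on another node is a conflict.
        if port in node_of_port and node_of_port[port] != ip:
            raise ValueError(f"port {port} seen on both {node_of_port[port]} and {ip}")
        node_of_port[port] = ip
    return {ip: sorted(ports) for ip, ports in ports_on_node.items()}, node_of_port
```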
After the distribution of ecflow servers in the cluster has been determined, the status information of the ecflow servers can be viewed.
As can be seen from the above description, the technical solution of the present invention makes it possible to quickly find and locate the ecflow servers running in a cluster and to view the status information of the located ecflow servers.
Although the above embodiments search for ecflow servers within a certain node range of the cluster, those skilled in the art will understand that, even without a specified target node range, the technical solution of the present invention can still locate, and check the state of, the ecflow servers on all nodes of the entire cluster.
In another embodiment, to achieve centralized management of multiple ecflow servers, the job stream management method according to the embodiment of the present invention may further include managing multiple job stream controllers by calling pre-encapsulated job stream control commands. Specifically:
On the one hand, the present invention manages multiple ecflow servers simultaneously by adopting a B/S (browser/server) architecture; on the other hand, it encapsulates the job stream control command (here, the ecflow_client command), thereby avoiding having to manage each ecflow server separately.
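One way to encapsulate ecflow_client is to build each call with an explicit server address (`--host`, `--port`), so that a single backend can address any discovered server without holding a dedicated connection to it. The mapping from UI actions (run, stop, suspend, release, rerun) to ecflow_client options below is an assumption for illustration and should be checked against `ecflow_client --help` for the installed version; viewing output is not covered. `build_command` and `run_action` are hypothetical helpers.

```python
import subprocess
from typing import List

# Assumed mapping from web UI actions to ecflow_client options; check it
# against `ecflow_client --help` for the installed ecflow version.
ACTION_OPTIONS = {
    "run": "--run",
    "stop": "--kill",
    "suspend": "--suspend",
    "release": "--resume",
    "rerun": "--requeue",
}


def build_command(host: str, port: int, action: str, node_path: str) -> List[str]:
    """Build one ecflow_client call aimed at a specific server (host:port)."""
    option = ACTION_OPTIONS[action]  # KeyError for unsupported actions
    return ["ecflow_client", f"--host={host}", f"--port={port}", f"{option}={node_path}"]


def run_action(host: str, port: int, action: str, node_path: str) -> str:
    """Execute the command; raises CalledProcessError if ecflow_client fails."""
    cmd = build_command(host, port, action, node_path)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Keeping command construction separate from execution makes the wrapper easy to test and to log before anything is sent to a server.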
In one embodiment, Fig. 3 shows the job stream monitoring flow for multiple ecflow servers, applied when managing any one ecflow server. As shown in Fig. 3, the system first uses ecflow_client to obtain all applications (i.e., all job streams) from the specified ecflow server; since one ecflow server can correspond to one or more job streams, there may be one or several. In other words, the system determines the job streams corresponding to the specified ecflow server on its node.
The system then uses ecflow_client to obtain the workflow information and job status information of each of these job streams; specifically, it obtains the workflow information and job status information of a specified application among all the applications. The workflow information includes the execution order of the jobs in the job stream and the dependencies between them, while the job status information describes the execution status of the job stream.
After obtaining the workflow information and job status information of each job stream, the system parses them and returns the parsed information to a graphical web interface. When returning the data, the system can display jobs in different states in different colors, thereby providing job stream monitoring. Based on the returned data, the system can also provide management functions for the jobs in a job stream, such as run, stop, suspend, release, rerun, and view output.
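Before the parsed data is returned to the web interface, the dependency information can be used to order jobs and the state information to choose display colors. This sketch assumes the ecflow_client output has already been parsed into two dictionaries (that parsing step is not shown), uses the node state names ecflow reports (unknown, queued, submitted, active, complete, aborted, suspended), and picks an arbitrary illustrative palette. `to_ui_payload` is a hypothetical name; `graphlib.TopologicalSorter` requires Python 3.9 or later and raises `CycleError` if the dependencies contain a cycle.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Illustrative palette only; the colors actually used are a UI design choice.
STATE_COLORS = {
    "unknown": "grey", "queued": "lightblue", "submitted": "turquoise",
    "active": "green", "complete": "yellow", "aborted": "red", "suspended": "orange",
}


def to_ui_payload(states: Dict[str, str], deps: Dict[str, List[str]]) -> List[dict]:
    """Order jobs by their dependencies and attach a display color to each state.

    states: job path -> state; deps: job path -> paths it depends on.
    """
    sorter = TopologicalSorter()
    for path in states:
        sorter.add(path, *deps.get(path, []))
    payload = []
    for path in sorter.static_order():  # predecessors come before their dependants
        state = states.get(path, "unknown")
        payload.append({
            "path": path,
            "state": state,
            "color": STATE_COLORS.get(state, "grey"),
            "depends_on": deps.get(path, []),
        })
    return payload
```

The resulting list of plain dictionaries can be serialized to JSON directly for the browser side of the B/S architecture.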
Because the monitoring flow is the same for the job streams of every ecflow server, Fig. 3 shows only the monitoring flow for the job streams under a single ecflow server.
It can also be seen from the above description that the technical solution of the present invention achieves centralized management of multiple ecflow servers: a connection to another ecflow server can be established without closing the connection to the current one, so the job streams under multiple ecflow servers can be monitored and managed at the same time. This makes it convenient to manage and monitor ecflow state in a large-scale cluster and lets administrators quickly grasp the overall condition of the cluster. In addition, the invention provides a graphical web interface that facilitates monitoring and managing the state of ecflow application job streams, and because this monitoring and management is web-based, users can view and manage job stream states without tools such as VNC.
In addition, to quickly locate where a job task runs in the cluster and how it is running, in one embodiment the method may further include: tracking the job states of the job streams corresponding to the ecflow servers configured on each node; and, when the job state of a job stream changes, updating, in the job scheduling system, the status information of the batch job corresponding to the job whose state changed. Specifically, as shown in the flowchart of Fig. 4 for the associated monitoring of job stream jobs and job scheduling system jobs:
The system tracks the job states of the job streams running under the ecflow servers configured on each node to check whether any job state has changed. For example, suppose a job stream contains three jobs A, B, and C that execute in sequence. Initially all three are queued. When job A starts, its state changes from queued to running while the states of B and C remain unchanged; this change in A's state constitutes a change in the job state of the job stream, i.e., some job in the job stream has changed state. The system must then determine whether the job whose state changed exists in the job scheduling system (e.g., PBS); this is the "corresponding job ID?" decision step in Fig. 4.
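The change check in the A/B/C example amounts to comparing two successive state snapshots of a job stream. A minimal sketch, assuming snapshots are plain dictionaries from job name to state obtained by periodic polling (the polling itself is not shown); `detect_changes` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def detect_changes(previous: Dict[str, str],
                   current: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Compare two state snapshots of one job stream.

    Returns job -> (old_state, new_state) for every job whose state differs;
    a job absent from the previous snapshot reports old_state as None.
    """
    return {
        job: (previous.get(job), state)
        for job, state in current.items()
        if previous.get(job) != state
    }
```

For the example in the text, a snapshot where A, B, and C are all queued followed by one where A is running yields a single change for A, and only that job goes on to the "corresponding job ID?" check.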
To determine whether the job whose state changed exists in the job scheduling system (e.g., PBS), in one embodiment the method further includes a pre-association step: according to the attribute information of a job in the job stream (e.g., the job name, the user who submitted the job, and the job's keyword), the job in the job stream is associated with the job number of the corresponding batch job actually running in the job scheduling system, thereby generating job association information. In a concrete example, this can be understood as associating the keyword of a job in the job stream with the job number (i.e., job ID) of the same job in PBS; the two refer to the same job. The pre-association step is needed because the attribute and status information of a job in the ecflow server are not visible to users; only once the job from the ecflow server has been submitted to PBS does its information become visible to users.
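The pre-association step can be pictured as a join on the matching attributes. In this sketch the key is the (job name, submitting user, keyword) triple from the text; it is assumed, hypothetically, that the same triple can be recovered for each job from the scheduler's job listing. `JobKey` and `build_association` are illustrative names.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class JobKey(NamedTuple):
    """Attributes used to match a workflow job with a scheduler job."""
    name: str
    user: str
    keyword: str


def build_association(flow_jobs: Iterable[JobKey],
                      scheduler_jobs: Iterable[Tuple[str, JobKey]]) -> Dict[JobKey, str]:
    """Pre-association step: map each workflow job to the scheduler job ID with the same key.

    scheduler_jobs holds (job_id, key) pairs, assumed to be recovered from the
    scheduler's own job listing. Workflow jobs without a match are left out.
    """
    id_by_key = {key: job_id for job_id, key in scheduler_jobs}
    return {key: id_by_key[key] for key in flow_jobs if key in id_by_key}
```

Jobs that are left out of the map are exactly the ones that, in the flow below, have not yet been submitted to the scheduler.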
With the pre-association step in place, i.e., once the job association information has been generated, the method can perform the "corresponding job ID?" decision step described above: the system searches PBS, according to the job association information, for a batch job corresponding to the job in the job stream whose state changed, i.e., it looks up in PBS the job ID corresponding to the keyword of that job.
If the job ID is found in PBS, the current job in the job stream (the job whose state changed) has already been submitted to PBS, and the system updates the status information of the corresponding batch job in PBS. For example, if the current job is job A above, the state of job A in PBS is updated from queued to running.
If the job ID is not found in PBS, the job whose state changed has not been submitted to PBS (for example, because job A was preceded by other statements that prevented its submission). The system then submits the job to PBS and performs the pre-association step, so that the job association information contains the mapping between the keyword of this job in the job stream and its job number in PBS.
The system can then find the job ID in PBS and update the status information of the corresponding batch job.
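The lookup, submit-if-missing, and update sequence can be sketched against an in-memory stand-in for PBS. `InMemoryScheduler` and `sync_job_state` are purely hypothetical; the stand-in models only the two operations the flow needs, and a real integration would go through the scheduler's own commands or API. Recording the new association right after submission stands in for re-running the pre-association step.

```python
from typing import Dict, Hashable


class InMemoryScheduler:
    """Hypothetical stand-in for PBS, used only to illustrate the control flow."""

    def __init__(self) -> None:
        self.jobs: Dict[str, dict] = {}
        self._next_id = 100

    def submit(self, key: Hashable) -> str:
        job_id = f"{self._next_id}.pbs"
        self._next_id += 1
        self.jobs[job_id] = {"key": key, "state": "queued"}
        return job_id

    def update_state(self, job_id: str, state: str) -> None:
        self.jobs[job_id]["state"] = state


def sync_job_state(key: Hashable, new_state: str,
                   assoc: Dict[Hashable, str], scheduler: InMemoryScheduler) -> str:
    """Look up the scheduler job for `key`; submit and associate it first if missing."""
    job_id = assoc.get(key)
    if job_id is None:
        # Not found: submit the job, then record the association (pre-association step).
        job_id = scheduler.submit(key)
        assoc[key] = job_id
    scheduler.update_state(job_id, new_state)
    return job_id
```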
Finally, once the status information of the changed batch job has been updated in PBS, its attribute and status information can be viewed from PBS. For example, the system can quickly locate where the batch job runs in the cluster (i.e., on which node) and how it is running, and obtain its detailed data, thereby achieving associated display and monitoring of jobs in the job stream and jobs in the job scheduling system.
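To find which node a PBS job runs on, its attributes can be read from `qstat -f <job_id>`, which prints `attribute = value` lines such as `job_state` and `exec_host`. A minimal parser, assuming PBS Pro or Torque style output; `parse_qstat_full` and `exec_nodes` are hypothetical names, and wrapped continuation lines are not rejoined.

```python
from typing import Dict, List


def parse_qstat_full(text: str) -> Dict[str, str]:
    """Parse the `attribute = value` lines of `qstat -f <job_id>` output for one job.

    Wrapped continuation lines (long values split across lines) are not
    re-joined in this sketch.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            info["Job Id"] = line.split(":", 1)[1].strip()
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            info[key] = value
    return info


def exec_nodes(info: Dict[str, str]) -> List[str]:
    """Node names from exec_host, e.g. "node07/0*4+node08/0*4" -> ["node07", "node08"]."""
    hosts = info.get("exec_host", "")
    return sorted({chunk.split("/")[0] for chunk in hosts.split("+") if chunk})
```

A job that has not started yet has no `exec_host`, in which case `exec_nodes` returns an empty list.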
Note that although the above description implements the technical solution with the ecflow job stream control software, the solution can equally be applied to other types of job stream control software (e.g., SMS) as needed; the present invention places no specific limitation on this.
According to an embodiment of the present invention, a job stream management device for a cluster is further provided.
As shown in Fig. 5, the job stream management device according to this embodiment comprises:
a scanning module 51, configured to scan, within the target node range, the process numbers of the job stream control processes running on each node, to determine the job stream control processes running on each node;
a first determination module 52, configured to determine, based on the process information of the job stream control processes running on each node, the port number of the job stream controller corresponding to each job stream control process; and
a second determination module 53, configured to determine, according to the port numbers of the job stream controllers, the job stream controllers configured on each node.
In one embodiment, the job stream management device according to this embodiment further comprises:
a third determination module (not shown), configured to determine, based on the process information of the job stream control processes running on each node, information about the user who started each job stream control process.
Further, in one embodiment, the job stream management device further comprises:
a control module (not shown), configured to control, according to preconfigured user priority information, the visibility to users of the job streams running on each node.
In addition, in one embodiment, the job stream management device further comprises:
a fourth determination module (not shown), configured to determine, according to the IP address information of each node and the port numbers of the job stream controllers configured on each node, the distribution information of the job stream controllers on the nodes within the target node range.
In addition, in one embodiment, the job stream management device further comprises:
a management module (not shown), configured to manage multiple job stream controllers by calling pre-encapsulated job stream control commands.
In one embodiment, the management module (not shown) comprises:
a determination submodule (not shown), configured to determine the multiple job streams corresponding to a job stream controller on its node;
an acquisition module (not shown), configured to obtain the workflow information and job status information of each of the multiple job streams; and
a management submodule (not shown), configured to manage each job stream corresponding to the job stream controller according to its workflow information and job status information.
In addition, in one embodiment, the job stream management device further comprises:
a tracking module (not shown), configured to track the job states of the job streams corresponding to the job stream controllers configured on each node; and
an update module (not shown), configured to update, when the job state of a job stream changes, the status information of the batch job in the job scheduling system corresponding to the job whose state changed.
In addition, in one embodiment, the job stream management device further comprises:
an association module (not shown), configured to associate, according to the attribute information of the jobs in the job stream, each job in the job stream with the job number of the corresponding batch job in the job scheduling system, to generate job association information.
In addition, in one embodiment, the job stream management device further comprises:
a search module (not shown), configured to search the job scheduling system, according to the job association information and before the status information of the batch job corresponding to the job whose state changed is updated, for a batch job corresponding to the job in the job stream whose state changed; and
a calling module (not shown), configured to call the association module (not shown) when no batch job corresponding to the job whose state changed is found.
In summary, based on a high-performance computing user environment built on workflows, the present invention automatically discovers the ecflow servers running in a cluster and provides centralized management of multiple ecflow servers, making it convenient to manage and monitor ecflow state in a large-scale cluster so that administrators can quickly grasp the overall condition of the cluster. It also associates ecflow tasks with the actual jobs in the job scheduling system, so that the location and status of a running task in the cluster can be determined quickly and its detailed run data obtained. Finally, by providing a graphical web interface, it makes it convenient to monitor and manage the state of ecflow application job streams; because this monitoring and management is web-based, users do not need tools such as VNC to view and manage job stream states.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.