CN101308468A - Grid calculation environment task cross-domain control method - Google Patents

Publication number
CN101308468A
Authority
CN
China
Prior art keywords
grid
resource
user
virtual organization
job
Prior art date
Application number
CNA2008101241334A
Other languages
Chinese (zh)
Other versions
CN100570569C (en)
Inventor
王汝传
莫晓莉
张琳
王海艳
陈建刚
王杨
Original Assignee
南京邮电大学
Priority date
Filing date
Publication date
Application filed by 南京邮电大学
Priority to CNB2008101241334A
Publication of CN101308468A
Application granted
Publication of CN100570569C

Abstract

Disclosed is a cross-domain job control method for a grid computing environment. The method uses a trust mechanism to realize job control in the grid environment: it evaluates the trust of the available resources in the grid, analyzes the job to be processed, and, according to the information provided by the grid's control system, uses a mobile agent to migrate the job to an appropriate resource for execution. The scheme overcomes the poor reliability, unguaranteed response time, overly long running time on resources, and imperfect fault management of other job control schemes. It achieves adaptive control of resources and jobs in the grid, reduces grid traffic, improves network utilization, and solves jobs in parallel, thereby improving the utilization of grid resources and the execution efficiency of grid computing, speeding up task execution, and improving the accuracy of results, so as to enhance the processing efficiency of distributed systems.

Description

Cross-domain job control method in a grid computing environment

Technical field

The present invention is a job control scheme for grids, implemented with a trust mechanism, that realizes cross-domain job execution. It belongs to the interdisciplinary application field of grid computing and distributed computing.

Background technology

Grid computing has developed into an important area of the computer industry. It differs from distributed computing in that it focuses more on resource sharing, collaborative work, and high performance, and is devoted to solving resource-sharing problems among multiple individuals or organizations. A grid computing environment adopts a distributed management and control model based on virtual organizations, which frees servers at all levels and job entities from complex work such as resource control and job scheduling and control. Servers at each level are only responsible for collecting the resource information within their scope and building the corresponding distributed database. When an entity submits a job processing request, the system automatically creates a virtual organization for the job, which is solely responsible for the job's scheduling, resource usage, and security assurance, as shown in Fig. 1.

In fact, a grid system provides services to users through the cooperation of a series of basic function modules. The basic function modules of a grid system are shown in Fig. 2.

A large number of applications run in a grid system and share the grid's various resources. These applications are commonly called jobs; a job is the set of user code, data, tasks, and related resource descriptions. Job management and control is a technology that grew up with the development of cluster and network applications. According to the resource requirements of a job and the state of the grid resources, it selects and allocates the resources the job requires, schedules tasks, and controls job execution. Its goal is to optimize the use of grid resources and provide better QoS to grid users.

The goals of grid job control are as follows:

1. Provide a good user interface, control data input/output, and guarantee the correctness of data;

2. Control the job life cycle. This is the basic function of grid job control: it is responsible for the whole process of a grid job from submission and creation to completion and return of the computation result;

3. Match resources and jobs, using the resource control module to find suitable resources for a job;

4. Be responsible for job decomposition and migration, achieving load balance across resources.

At present the main grid job management and control systems are Condor-G, Sun Grid Engine, and GRAM.

The grid computing technology developed by the Globus project provides standard protocols and services for remote resource access. In particular, through protocols such as GSI (Grid Security Infrastructure), GRAM (Grid Resource Allocation Manager), and GASS (Global Access to Secondary Storage), a secure inter-organizational remote execution system can be built that uses existing batch systems without modification. Combining these three protocols successfully is not easy, however: it requires a client that can handle a large number of jobs even in the presence of complex errors. To achieve this, the Condor and Globus projects cooperated to develop Condor-G.

Sun Grid Engine (SGE) defines a grid as a collection of computing resources that execute tasks. The system provides a single entry point; a user can submit multiple jobs at once without having to consider execution details.

In GT3, several services are combined to form GRAM. GRAM provides users with a simple interface for using remote systems. Through this interface a user can run "jobs" on remote resources; the most commonly used GRAM functions are job submission and job control.

Summary of the invention

Technical problem: The purpose of the invention is to provide a cross-domain job control method for a grid computing environment. With the proposed method, secure dynamic lookup of available grid resources and adaptive job control can be achieved, grid traffic in the current domain is reduced, network utilization is improved, and jobs are solved in parallel, thereby improving the utilization of grid resources and the execution efficiency of grid computing. If a resource leaves the grid because of force majeure such as a power failure and the job node has not yet reported to the grid control mechanism, the results cannot be integrated and the final correct result cannot be obtained. So that the user can still obtain a correct result when such a sudden failure occurs, each resource node must be monitored, sub-job faults must be handled immediately, and cross-domain execution must be carried out promptly when resources in the current domain are scarce.

Technical scheme: The ultimate purpose of a grid is to provide users with a convenient environment for high-performance computing. To execute jobs as close to the data source as possible, reduce network communication cost, save bandwidth, balance load, strengthen the monitoring of subtask nodes, and speed up task execution, thereby improving the processing efficiency of the distributed system and the accuracy of results, we propose a job control scheme that uses a trust mechanism.

Trust is an assessment of the credibility of an entity's identity and behavior, and is related to the entity's reliability, integrity, and performance. Trust is a subjective concept that depends on experience. The level of trust is usually expressed by a trust value, which changes dynamically with the entity's behavior. A grid connects geographically distributed resources such as personal computers, workstations, clusters, and scientific instruments with users. Grid entities include resources and users. According to the organizations the entities belong to and their geographic locations, we divide the grid into several independent autonomous domains. Each autonomous domain contains a number of grid entities and has its own management and security policies, and the autonomous domains are connected by the network. Dividing the grid into autonomous domains makes it easier to address scalability, site autonomy, and heterogeneity. When different entities in the grid are to transact, the trust relationship between them must be known. According to the autonomous domains the entities are in, we divide trust relationships between entities into intra-domain and inter-domain trust relationships. Here only an intra-domain trust model is applied to compute trust values between entities; inter-domain trust relationships are not considered.

The job control in the cross-domain job control method of the invention uses a trust mechanism and realizes cross-domain job execution in the grid. The concrete steps are as follows:

Step 1: Before submitting a job, the user must first register to become a user of the grid.

Step 2: Before the user joins the grid, the grid application layer initializes the environment in preparation for the grid user's subsequent work.

Step 3: If the user's identity is valid, the grid determines the user's access rights to resources, and the grid user submits a job request:

1. The grid user fills in the job to be submitted. When submitting a grid job the user must provide the task name, the job description, and the start and end times of job execution. During submission, the submitting host automatically attaches its local IP address and host name to the job description.

2. The grid user submits the job. The control mechanism of the grid virtual organization checks the legality of the submitted job and the user's access control level. If the job request is legal and has no semantic conflict, the job controller of the grid virtual organization accepts it.

3. The job enters the job waiting queue of the grid virtual organization, its request state is set to submitted, and it waits to be scheduled for execution.

Step 4: The job control mechanism at the center of the grid virtual organization schedules jobs in order, periodically taking the job at the head of the job waiting queue. If the queue is not empty, go to step 5; otherwise the job control mechanism waits until a user-submitted job enters the queue.

Step 5: The job control mechanism obtains the job's description information.

Step 6: The grid job control mechanism filters out the available computing resources according to the trust mechanism.

Step 7: The job control mechanism performs resource-matching scheduling for the job and determines the subtask assigned to each computing resource.

Step 8: Job decomposition and migration. After the matching resource nodes are obtained, the virtual organization server divides the user's job according to the performance of the matched resources. The job allocation algorithm divides by resource performance weight: a resource with high overall performance receives a large workload, and a weaker one a small workload. The virtual organization server then starts mobile agents through the mobile agent platform to deliver each divided job to the corresponding resource node. If migration succeeds, the job control mechanism sets the job state to ready and goes to step 9; otherwise it sets the job state to error and goes to step 11.

Step 9: The subtask is migrated to the computing resource and is scheduled by the local resource's operating system.

Step 10: When mobile agents move the jobs of the virtual organization to resource nodes to run, the virtual organization server starts a monitoring thread that monitors the jobs and resource nodes to see whether results are returned.

If the user's query shows that the job has finished, the user can view the job's result by entering the job's identification number; otherwise go to step 11.

Step 11: If a resource returns a failed result during this process, the virtual organization server must redistribute the job that was assigned to that resource. If the other running resource nodes have not finished their tasks at this time, the server must ask the server of another virtual organization to help complete this part of the job. The server of virtual organization 1 sends the job to the server of virtual organization 2, together with the user's authentication assertion. The server of virtual organization 2 verifies the assertion; if it passes, it accepts the job and allocates resources in the domain of virtual organization 2 to run it. Finally the result is returned to the server of virtual organization 1, which integrates the results and returns them to the user. In this way cross-domain dynamic migration scheduling of grid jobs is achieved.
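The queueing and state changes in steps 3, 4, and 8 can be sketched as follows. This is only an illustrative sketch: the class and field names (JobState, Job, JobController) and the migrate callback are hypothetical and do not come from the patent, and the legality check of step 3 is reduced to a boolean argument.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    SUBMITTED = "submitted"   # waiting in the queue (step 3)
    READY = "ready"           # migration succeeded (step 8)
    ERROR = "error"           # migration failed, handled in step 11


@dataclass
class Job:
    name: str
    description: str
    start: str
    end: str
    host_ip: str = ""         # attached by the submitting host (step 3.1)
    host_name: str = ""
    state: JobState = JobState.SUBMITTED


class JobController:
    """Hypothetical stand-in for the virtual organization's job controller."""

    def __init__(self):
        self.waiting = deque()

    def submit(self, job, legal=True):
        # Step 3.2: only legal requests are accepted into the waiting queue.
        if not legal:
            return False
        job.state = JobState.SUBMITTED
        self.waiting.append(job)
        return True

    def schedule_next(self, migrate):
        # Step 4: take the job at the head of the queue, if there is one.
        if not self.waiting:
            return None
        job = self.waiting.popleft()
        # Step 8: migrate(job) is assumed to return True on success.
        job.state = JobState.READY if migrate(job) else JobState.ERROR
        return job
```

A caller would poll schedule_next periodically and route jobs in the ERROR state to the redistribution logic of step 11.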

The trust model is designed starting from the direct and indirect trust relationships between the grid's resource nodes and users, and is modeled and implemented in code. Users and resource nodes have a tree-shaped relationship, as in Fig. 3.

The tree has 4 layers, where h denotes the height. A given user is at the first layer, the root (h=1), and so on down to the leaves at h=4. If the user wants to find all resource nodes in the grid that meet the user's requirements, all nodes are treated as target nodes and traversed one by one, and the available nodes are then filtered out.

1. First open the user's trust record and look up the resource nodes that have a direct trust relationship with the user. Identifying each node by the last octet of its IP address, there are three: 120, 170, and 190. Start from node 120; if it is not the target node, take node 120 as the source and perform a depth-first traversal.

2. Only after the depth-first traversal finishes (on reaching 170 at h=4) is the breadth, that is horizontal, traversal carried out: return to layer h=2 and look up node 170; if it is not the target node and has no trust record, move on to the next node in that layer, 190.

3. Perform the depth-first traversal again; the procedure ends once all nodes have been traversed.

4. Each node in the grid has one or more trust paths. These are integrated, and a weighted average gives the user's final trust value for that node. Nodes that meet the requirement are then selected according to the trust threshold to carry out the job submitted by the user.
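The traversal and filtering above can be sketched in a few lines. The text does not say how the trust of a single path is computed or which weights the weighted average uses, so this sketch assumes that a path's trust is the product of the trust values along it and that each path is weighted by the inverse of its length; both choices, and all function names, are assumptions made for illustration.

```python
def trust_paths(graph, src, dst, path_trust=1.0, visited=None):
    """Depth-first search yielding (trust, edge_count) for each simple path."""
    visited = (visited or set()) | {src}
    for nxt, t in graph.get(src, {}).items():
        if nxt in visited:
            continue
        if nxt == dst:
            # len(visited) equals the number of edges on the path to dst.
            yield path_trust * t, len(visited)
        else:
            yield from trust_paths(graph, nxt, dst, path_trust * t, visited)


def final_trust(graph, user, node):
    """Weighted mean over all trust paths (assumed weight: 1 / path length)."""
    paths = list(trust_paths(graph, user, node))
    if not paths:
        return 0.0
    weights = [1.0 / length for _, length in paths]
    return sum(t * w for (t, _), w in zip(paths, weights)) / sum(weights)


def trusted_nodes(graph, user, candidates, threshold):
    """Keep the candidates whose final trust value reaches the threshold."""
    return [n for n in candidates if final_trust(graph, user, n) >= threshold]
```

Here graph maps each entity to a dictionary of the entities it trusts directly and the corresponding trust values, for example {"ZL": {"120": 0.9, "170": 0.8}, "120": {"170": 0.5}}.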

A mobile agent is a program that can migrate autonomously from one host to another in a heterogeneous network and can interact with other agents or resources. It is in effect a combination of agent technology and distributed computing technology.

For a job to be processed in the grid, available resources are first looked up dynamically. Available resources here are resource nodes that have been selected according to their trust values, meet the requirements, and are online and idle. They are evaluated comprehensively according to their performance, and the job is decomposed dynamically. Using the information provided by the grid's control system, a mobile agent migrates the job to suitable resources for execution. If one or more resource nodes become abnormal during execution and cannot return a correct result, exception handling is performed: while a job is running, a copy of it, including the program, input data, and description information, is kept on other grid nodes. To reduce job execution time and network communication load, the mobile agent migrates the sub-jobs of an abnormal node to other nodes in the local domain whenever possible. Only a flag needs to be set: when a job node has any abnormality and cannot return normal information or results, an exception is raised and the flag is set to another value, which starts the exception-handling stage. The abnormal node is marked offline, the job description in its information is extracted, and other normal nodes in the domain that meet the requirements continue execution; finally the correct results are collected. If resources in the local domain are scarce and it is difficult to find a resource that matches the job description, the mobile agent sends the job's execution copy to other domains to find suitable resources for it, realizing cross-domain execution during scheduling.
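A minimal sketch of this exception-handling order (local domain first, other domains only as a fallback) is given below. The dictionary-based node records, the meets predicate, and the function name reassign_subjob are hypothetical; the patent does not prescribe a data layout, and the actual migration by a mobile agent is not modeled.

```python
def reassign_subjob(subjob, failed_node, local_nodes, remote_domains, meets):
    """Return (domain, node_name) that should continue the sub-job, or None.

    Nodes are dicts with 'name', 'online', and 'idle' keys (an assumed layout).
    meets(node, subjob) decides whether a node satisfies the job description.
    """
    # Mark the abnormal node offline so that it is not selected again.
    failed_node["online"] = False

    # Prefer another normal node in the local domain.
    for node in local_nodes:
        if node is failed_node:
            continue
        if node["online"] and node["idle"] and meets(node, subjob):
            return ("local", node["name"])

    # Local resources are scarce: send the job copy to another domain.
    for domain_name, nodes in remote_domains.items():
        for node in nodes:
            if node["online"] and node["idle"] and meets(node, subjob):
                return (domain_name, node["name"])
    return None
```

Returning None signals that no domain currently has a suitable resource, in which case a caller could leave the sub-job queued and retry later.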

A resource has a life cycle comprising registration, sharing, and deregistration. The detailed process is as follows:

1. The resource registers itself with the resource controller.

2. The resource controller writes the resource's registration information into the resource information database. As a result of registration, the resource information database contains the resource's information, and the registered resource becomes a grid resource.

3. When a user needs a resource, the user sends a request to the resource controller.

4. The resource controller retrieves the information of the matching resources from the resource information database and returns it to the user, who thus obtains the resource information.

5. With the resource information, the server can interact with the resource in various ways.
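The life cycle above can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: the class name ResourceController mirrors the text, but its methods and the dictionary that stands in for the resource information database are hypothetical.

```python
class ResourceController:
    """Toy resource controller; a dict stands in for the information database."""

    def __init__(self):
        self._db = {}

    def register(self, name, info):
        # Steps 1-2: write the registration information into the database.
        self._db[name] = dict(info)

    def deregister(self, name):
        # End of the life cycle: the resource leaves the grid.
        self._db.pop(name, None)

    def find(self, predicate):
        # Steps 3-4: return the information of every matching resource.
        return {name: info for name, info in self._db.items() if predicate(info)}
```

A user request is expressed here as a predicate over the stored information, for example lambda info: info["cpus"] >= 4.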

The grid job control center needs to decompose the jobs submitted by grid users into tasks. Job decomposition here uses a tree-shaped branch structure, as shown in Fig. 4.

The root node A is the original job submitted through the grid job submission interface, and the jobs actually executed on grid computing resource nodes are the leaf nodes E, F, G, H, and I. Job decomposition should take the static load problem of the grid environment into account: every task assignment requires that the computing power of the resource node satisfy the computation demand of the task node. This avoids assigning computation-heavy tasks to weak resources or light tasks to powerful resources, and achieves static load balance.
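One simple way to respect this static constraint is a best-fit assignment: handle the most demanding task first and give each task the weakest free resource that still satisfies it. The text does not name a specific matching algorithm, so the sketch below is an assumed illustration; it also assumes one task per resource and scalar demand and capacity values.

```python
def best_fit(tasks, resources):
    """Assign each task to the weakest free resource that satisfies its demand.

    tasks: {task_name: demand}; resources: {resource_name: capacity}.
    Returns {task_name: resource_name}. One task per resource (an assumption).
    """
    free = dict(resources)
    assignment = {}
    # Handle the most demanding tasks first so they get the strong resources.
    for task, demand in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        fitting = [r for r, cap in free.items() if cap >= demand]
        if not fitting:
            raise ValueError(f"no free resource can satisfy task {task!r}")
        chosen = min(fitting, key=lambda r: free[r])
        assignment[task] = chosen
        del free[chosen]
    return assignment
```

Picking the weakest resource that fits keeps light tasks off powerful nodes, which is the second half of the constraint stated above.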

After task decomposition, the next step is to publish the decomposed grid tasks in the grid and migrate them to hosts with currently available resources to continue execution. Job migration serves the following purposes:

1. Achieve load balance. Load balance is the precondition for users to obtain good quality of service and for resources to be fully shared. During job execution a migration mechanism moves part of the jobs on heavily loaded nodes to lightly loaded nodes, so that the load on each resource in the system is roughly balanced.

2. Handle job faults and resource departure requests. When a resource can no longer run the jobs on it because of a fault or capacity limits, those jobs can be migrated to other resources to continue running. When a resource asks to leave the grid, the grid jobs running on it are moved to other resources and the resource is allowed to leave, respecting the wishes of the resource owner.

3. Make full use of grid resources and reduce the overall overhead of jobs.

The mobile agent decides job migration freely. Depending on the object migrated, migration is divided into code migration and data migration. To reduce job execution time and network communication load, the mobile agent migrates grid jobs within the local LAN whenever possible. Only when it is difficult to find a resource in the local LAN that matches the job description does the mobile agent send the agent carrying the job's execution copy to the gateway, to find suitable resources in one or more other LANs and continue execution, as shown in Fig. 5.

Regarding load in the grid environment: the composition of computing resources in a grid computing environment is very complex. It may consist of tens of thousands of individual PCs, multiple clusters, or even the LANs of several organizations. Differences in computational load, processor architecture, cache efficiency, and so on all cause load imbalance between resource nodes, with some compute nodes idle and waiting while others are overloaded.

We require a reasonably accurate quantitative description of both the computing power $c_i$ of each computing resource and the computation demand $\psi_j$ of each parallel task, so that every task assignment requires the computing power of the resource node to satisfy the computation demand of the task node. This avoids assigning computation-heavy tasks to weak resources or light tasks to powerful resources, and so achieves task load balance. If the computing power parameters of the resources and the computation demands of the parallel tasks reflect the real situation accurately, the resources with strong computing power obtain more tasks, which meets the load balance requirement of the grid environment.

Regarding network communication load: the grid owes its powerful distributed computing capability to using grid resources as fully as possible. This brings another problem that needs attention: network communication load. The communication network is the physical basis of the grid; the handling of grid jobs, such as migrating job entities to remote resource nodes and inter-process communication, all depends on it. This inevitably produces a large network communication load, and reducing this load as much as possible is also a problem our design must consider.

A grid provides the physical basis for parallel computing. As noted earlier, differences in computational load, processor architecture, cache efficiency, and so on cause load imbalance between the resource nodes of a grid, leaving some compute nodes idle while others are overloaded.

When an atomic task is assigned to a computing resource and starts computing, it occupies all or part of that resource's computing power, and the resource controller deducts the part occupied by the atomic task from the resource's current computing power. In addition, to ensure that correct computing power values are obtained when other parallel tasks are assigned, when scheduling of a non-atomic task begins at a non-leaf node of the resource tree, the total demand of that non-atomic task is likewise deducted from the computing power of that resource node. After the atomic task finishes, the resource controller restores the part it occupied to the resource's computing power parameter, as shown in Fig. 6.
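The deduct-and-restore bookkeeping for atomic tasks can be sketched as follows. The class ResourceNode and its start and finish methods are hypothetical names, and computing power is modeled as a single number; the handling of non-atomic tasks on the resource tree is not covered by this sketch.

```python
class ResourceNode:
    """Tracks how much computing power of one resource is still available."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.available = capacity
        self._held = {}  # task id -> computing power currently occupied

    def start(self, task_id, demand):
        # Deduct the part occupied by the task when it starts computing.
        if demand > self.available:
            raise ValueError("insufficient computing power")
        self.available -= demand
        self._held[task_id] = demand

    def finish(self, task_id):
        # Restore the occupied part once the task has completed.
        self.available += self._held.pop(task_id)
```

Keeping the per-task amounts in a dictionary ensures that exactly the deducted amount is restored, even if tasks finish in a different order than they started.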

Define the following variables: $T_i$: a job submitted by the user; $R_i$: a grid resource node; $c_i$: the estimated CPU computing power of $R_i$; $link_{i,j}$: the bandwidth between $R_i$ and $R_j$; $\psi_{j,i}$: the workload of $T_j$ assigned to $R_i$. Taking the CPU computing power of each grid resource into account, for any $R_i$ the computing capability is $e_i = \frac{1}{\frac{1}{c_i} + \frac{A}{link_{0,i}}}$. The grid job control center decomposes $T_j$. Since $\psi_j = \sum_{i=1}^{n} \psi_{j,i}$, the workload assigned to $R_k$ is $\psi_{j,k} = \frac{\psi_j}{\sum_{i=1}^{n} \frac{c_i \cdot link_{0,i}}{link_{0,i} + A \cdot c_i}} \cdot \frac{c_k \cdot link_{0,k}}{link_{0,k} + A \cdot c_k}$, that is, $\psi_{j,k} = \psi_j \, e_k / \sum_{i=1}^{n} e_i$. As tasks migrate into and out of the computing resources, the value of $c_i$ is adjusted accordingly.
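The allocation rule can be checked numerically with a short sketch. The text does not define the constant A or the node with index 0, so the code simply takes A as a given non-negative parameter and link as the list of bandwidths from node 0 to each resource; the function names are hypothetical.

```python
def effective_power(c, link, A):
    """e_i = 1 / (1/c_i + A/link_{0,i}) for one resource."""
    return 1.0 / (1.0 / c + A / link)


def split_workload(psi, c, link, A):
    """Split the total workload psi in proportion to each resource's e_i.

    c: list of CPU computing power estimates c_i.
    link: list of bandwidths link_{0,i} (meaning of node 0 is an assumption).
    A: constant from the formula, not defined in the text.
    """
    e = [effective_power(ci, li, A) for ci, li in zip(c, link)]
    total = sum(e)
    return [psi * ek / total for ek in e]
```

With A = 0 the bandwidth term vanishes and the split is proportional to the CPU estimates alone; a larger A shifts work toward resources with higher bandwidth.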

Beneficial effect:

(1) Using the trust mechanism, available resource nodes can be found effectively.

(2) A mobile agent can move to client servers at various levels or to the central server of the grid environment and communicate with them locally at high speed without occupying further network resources, which greatly reduces grid traffic and improves the utilization efficiency of network resources.

(3) In a geographically distributed heterogeneous grid computing environment, computation tasks can be moved autonomously from one node to another and can interact with other agents or resources, achieving control and adaptation of jobs and resources.

(4) In grid computing, mobile agents do not need unified scheduling. Agents created by a user can run asynchronously on different compute nodes and send results back to the user when their tasks finish. The same user or compute node can create multiple agents that run simultaneously on one or more nodes, forming a parallel solving capability.

(5) The method overcomes shortcomings such as unguaranteed response time and possibly excessive running time on resources. When resources in the local domain are scarce, resources in other domains are called on, and the job is completed effectively across domains.

(6) Job control effectively provides a good user interface, controls data input/output, guarantees the correctness of data, and is responsible for the whole process of a grid job from submission and creation to completion and return of the computation result.

(7) The method effectively handles job decomposition and migration, achieving load balance across resources.

Description of drawings

Fig. 1 is a schematic diagram of job entities and virtual organizations in a grid computing environment.

Fig. 2 is a schematic diagram of the basic function modules of a grid system.

Fig. 3 is the associated-node tree of user ZL.

Fig. 4 is the task tree branch structure.

Fig. 5 is the grid job migration diagram.

Fig. 6 is the job decomposition diagram.

Fig. 7 is the structure diagram of the grid job control system using mobile agents.

Fig. 8 is the life-cycle diagram of job control execution.

Fig. 9 is the job scheduling diagram of the grid security platform using mobile agents.

Fig. 10 is the cross-domain scheduling diagram for an abnormal job on the grid security platform using mobile agents.

Embodiment

1. Architecture

The main grid components using the trust mechanism:

Mobile agent support environment: the middleware on which mobile agents run. It provides low-level support for agent mobility, security, and intelligence, and can be integrated with other grid components.

Node: a provider of grid computing resources, broadly covering all kinds of computer equipment, instruments, and so on.

Grid control system: responsible for unified command and coordination of different grid users' use of resources, and for deciding whether cross-domain execution is needed when an abnormality occurs. It provides the information service of grid computing and can use mobile-agent-based methods for information query, collection, and dissemination.

Job agent: a mobile agent (or sub-agent) generated according to a job description specification, used to cooperatively complete a complex grid task.

Structure of the grid job control system:

Grid job control is the module responsible for controlling the life cycle of grid jobs. A grid job control architecture using mobile agents is proposed here, as shown in Fig. 7.

Client local mobile agent: a valid user enters a job request description at the grid client; the client's local mobile agent generates a grid job from this description and submits it to the job control center of the grid virtual organization.

Job information: stores the grid jobs in all state queues together with their execution information, such as job execution state and execution data.

Job scheduling: performs sequential scheduling and matching scheduling for grid jobs.

Job decomposition: jobs must be decomposed dynamically according to the resource control information in the control center of the grid virtual organization.

Job assignment: matches subtasks with resources.

Server-side agent: communicates with the mobile agents on the hosts according to the job assignment module.

Host agent: a host represents a grid resource. Once it starts its mobile agent and registers in its region, the grid resource becomes valid in the virtual organization.

Grid job control must accomplish the following tasks:

1. Control the whole life cycle of a job, covering the entire process from submission by the user to the return of the computation result to the user;

2. Find suitable resources for the job and match the job's requirements. According to the requirements of the user's job, suitable resources are selected from those currently available in the grid and allocated to the user;

3. Control the job's input and output. The I/O of a grid job generally takes place between remote nodes, but this is not necessarily reflected in the job's code: input may be read from the keyboard and output may be written to the screen. The grid job control mechanism must be able to read data from the correct location and write data to the correct location;

4. Handle job migration: move a job from one resource to a new resource and continue running it there, so that load is balanced across resources. Because the actual behavior of a running job cannot be predicted accurately, load imbalance can arise in the grid and require job migration; resources dynamically joining and leaving the grid also require it.

The job control mechanism must also provide a job information query interface, so that users can obtain the status of the jobs they have submitted at any time.
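A job information query interface of this kind could look like the following minimal sketch. The class name `JobInfoService`, its methods, and the record layout are hypothetical; the source only states that users must be able to query the status of their own jobs.

```python
class JobInfoService:
    """Hypothetical in-memory store for job status (not from the source)."""

    def __init__(self):
        # job_id -> {"owner": ..., "state": ..., "result": ...}
        self._records = {}

    def record(self, job_id, owner, state, result=None):
        """Create or overwrite the status record of a job."""
        self._records[job_id] = {"owner": owner, "state": state, "result": result}

    def query(self, job_id, user):
        """Return (state, result) for a job, but only to the user who owns it."""
        rec = self._records.get(job_id)
        if rec is None or rec["owner"] != user:
            raise KeyError(f"no job {job_id} for user {user!r}")
        return rec["state"], rec["result"]
```

A caller would record state changes as the job moves through the queues and let the user poll `query` with the job's identification number.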

At present, most jobs supported on grids are batch jobs: after the user submits a job, the grid finds suitable nodes to run it and returns the result to the user when the run ends. Such jobs seldom, if ever, need to interact with the user while they run.

Job scheduling in a grid computing environment comprises six aspects: job decomposition, resource discovery and selection, task allocation, task execution, task monitoring and recovery, and task coordination and integration.

1. Job decomposition: the main function is to decompose the submitted job into multiple subtasks with as high a degree of parallelism as possible.

2. Resource discovery and selection: resource owners publish their resources and access policies to a resource matchmaker; the matchmaker stores these advertisements; a resource requester publishes its resource requirements to the matchmaker; the matchmaker selects a suitable set of resources according to the requester's requirements and returns it to the requester.

3. Task allocation: a job is decomposed into m tasks T = {T1, T2, ..., Tm}, and the system has n available resources R = {R1, R2, ..., Rn}. The purpose of task allocation is to assign these m tasks to the n resources so that the expected value of the performance objective function is minimized.

4. Task execution: reserve resources; submit the task to the resources; a preparation stage may include setup, staged file transfer, claiming the reservation, or other actions needed to prepare the resources to run the application; the task then runs under the control of the local scheduling policy.

5. Task monitoring and recovery: task monitoring serves two purposes. It facilitates interaction between the user and the job, and it feeds information back to the job control program in a timely way so that the program can make decisions quickly.

6. Task coordination and integration: synchronization between tasks can be carried out by a coordinator. After all tasks have finished, their execution results must be integrated into the result of the whole job. In addition, grid job scheduling includes functions such as scheduling performance evaluation and QoS considerations.
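The task allocation aspect (item 3) can be illustrated with a simple greedy heuristic. The source does not specify the objective function or the allocation algorithm; the sketch below assumes the objective is the makespan (the latest finish time over all resources) and places the largest tasks first. The function name `allocate`, the task costs, and the resource speeds are all hypothetical.

```python
def allocate(task_costs, resource_speeds):
    """Greedy largest-task-first assignment of m tasks to n resources.

    task_costs[i]      -- work units of task T(i+1), assumed known in advance
    resource_speeds[r] -- work units per second of resource R(r+1)
    Returns (assignment, makespan), where assignment[i] is the index of
    the resource chosen for task i.
    """
    loads = [0.0] * len(resource_speeds)      # busy time per resource
    assignment = [None] * len(task_costs)
    order = sorted(range(len(task_costs)),
                   key=lambda i: task_costs[i], reverse=True)
    for t in order:
        # Pick the resource on which this task would finish earliest.
        best = min(range(len(resource_speeds)),
                   key=lambda r: loads[r] + task_costs[t] / resource_speeds[r])
        loads[best] += task_costs[t] / resource_speeds[best]
        assignment[t] = best
    return assignment, max(loads)
```

This heuristic does not guarantee the minimum makespan; it only shows how an objective function can drive the assignment of T to R.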

Two. Method flow

Grid job control execution flow:

In general, grid jobs are executed on remote nodes. A complete grid job control execution cycle is shown in Figure 8.

1. Before submitting a job, the user must first register to become a user of this grid.

2. Before the user joins the grid, the grid application layer first initializes the environment in preparation for the grid user's subsequent activities.

3. If the user's identity is legitimate, the grid determines the user's access control rights to resources, and the grid user requests job submission.

(1) The grid user fills in the job to be submitted.

When submitting a grid job, the grid user must provide the task name, the job description, and the start and end times of job execution. During submission, the host that submits the job automatically appends its local IP address and host name to the job description. This prevents errors in later steps, such as job publication, that would occur if the user mistyped the machine's IP address or host name.

(2) The grid user submits the job. The control mechanism of the grid virtual organization checks the legitimacy of the submitted job and the user's access control privilege level. If the job request is legitimate and has no semantic conflicts, the job controller of the grid virtual organization accepts the request.

(3) The user's job enters the job waiting queue of the grid virtual organization, and the request state is set to "submitted", awaiting scheduling and execution.

After the grid user submits a job, the job enters the job waiting queue at the center of the grid virtual organization. Each submitted job is assigned a unique identification number. The scheduling center of the virtual organization performs sequential scheduling and matching scheduling for grid jobs. In sequential scheduling, grid jobs follow the first-in, first-out (FIFO) principle: the job control mechanism at the VO center always selects the job at the head of the waiting queue to process first. The virtual organization records the resources of its grid users; in matching scheduling, it selects suitable grid resources for the current job according to the trust mechanism described earlier.
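The submission behavior described so far (host information attached automatically, a unique identification number, a "submitted" state, and a FIFO waiting queue) can be sketched as follows. The names `Job`, `JobQueue`, and `local_host_info` are hypothetical, and the legitimacy and access control checks are not modelled.

```python
import itertools
import socket
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    description: str
    start_time: str
    end_time: str
    host_name: str = ""
    host_ip: str = ""
    job_id: int = 0
    state: str = "new"


def local_host_info():
    """Look up this host's name and address so the user never types them."""
    name = socket.gethostname()
    try:
        ip = socket.gethostbyname(name)
    except OSError:
        ip = "127.0.0.1"  # fallback when the host name does not resolve
    return name, ip


class JobQueue:
    """FIFO waiting queue of one virtual organization (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)   # source of unique identification numbers
        self._waiting = deque()

    def submit(self, job, host_info=None):
        job.host_name, job.host_ip = host_info or local_host_info()
        job.job_id = next(self._ids)
        job.state = "submitted"
        self._waiting.append(job)
        return job.job_id

    def next_job(self):
        """Return the job at the head of the queue, or None if it is empty."""
        return self._waiting.popleft() if self._waiting else None
```

The `host_info` parameter exists only so that the host lookup can be replaced in tests; in normal use the submitting host fills it in itself.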

4. The job control mechanism at the center of the grid virtual organization performs sequential scheduling: it periodically checks the job waiting queue for the job at its head. If the queue is not empty, go to step 5; otherwise the job control mechanism waits until a job submitted by a user enters the queue.

5. The job control mechanism obtains the job's descriptive information, such as the submitter's user information and the job content.

6. The grid job control mechanism selects the available computing resources according to the trust mechanism (their number does not exceed the maximum number of available resources set in resource control).

7. The job control mechanism performs resource matching scheduling for the job and determines the subtask to be assigned to each computing resource.

8. The job is decomposed and migrated.

After the matching resource nodes have been obtained, the VO server divides the job submitted by the user according to the performance of the matched resources. At present, this job assignment algorithm divides the job by resource performance weight (a composite of CPU performance, bandwidth performance, and memory performance): a resource with higher overall performance receives a larger share of the workload, and a resource with lower overall performance receives a smaller share. The VO server then starts agents through the mobile agent platform to dispatch each part of the job to its corresponding resource node, as shown in Figure 9. If the job migration succeeds, the job control mechanism sets the job state to "ready" and proceeds to step 9; otherwise it sets the job state to "error" and proceeds to step 11.
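Steps 6 to 8 can be sketched as a trust filter followed by a proportional split. The source gives neither the trust threshold nor the weights used to combine CPU, bandwidth, and memory performance, so the threshold parameter and the default weights 0.5, 0.25, and 0.25 below are assumptions, as are the names `Resource`, `select_trusted`, `performance`, and `partition`.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    trust: float       # value produced by the trust mechanism (assumed scale 0..1)
    cpu: float         # normalized performance scores (assumed scale 0..1)
    bandwidth: float
    memory: float


def select_trusted(resources, threshold, max_count):
    """Step 6: keep sufficiently trusted resources, at most max_count of them."""
    trusted = [r for r in resources if r.trust >= threshold]
    trusted.sort(key=lambda r: r.trust, reverse=True)
    return trusted[:max_count]


def performance(r, w_cpu=0.5, w_bw=0.25, w_mem=0.25):
    """Composite performance weight; the weights are illustrative assumptions."""
    return w_cpu * r.cpu + w_bw * r.bandwidth + w_mem * r.memory


def partition(total_units, resources):
    """Steps 7 and 8: split integer work units in proportion to performance."""
    scores = [performance(r) for r in resources]
    total = sum(scores)
    shares = [int(total_units * s / total) for s in scores]
    # Units lost to rounding down go to the best-performing resources first.
    leftover = total_units - sum(shares)
    by_score = sorted(range(len(resources)),
                      key=lambda i: scores[i], reverse=True)
    for i in by_score[:leftover]:
        shares[i] += 1
    return {r.name: n for r, n in zip(resources, shares)}
```

The sketch assumes that at least one selected resource has a non-zero performance score; a real implementation would have to handle the case where no resource passes the trust filter.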

9. The subtasks are migrated to the computing resources, where they are scheduled by the operating system of the local resource.

10. When the jobs in the VO have been migrated by agents to the resource nodes to run, the VO server starts a monitoring thread that monitors the jobs and the resource nodes and checks whether results are returned.

The administrator of the grid virtual organization can query the running status of the grid jobs currently in the queue through the job control center. A grid user can use a job's identification number to query the status of a submitted job. The query shows the job's current running status: which resources are currently executing the job, the content of the sub-job assigned to each resource, and the current process state of the job on each resource.

If the query shows that the job has finished, the user can view its execution result by entering the job's identification number; otherwise go to step 11.

11. If, during this process, a resource returns a failed result (for example because the resource node went offline or crashed from overload), the VO server must reassign the part of the job that had been assigned to that resource. If the other running resource nodes have not yet finished their own tasks, the VO server must ask another VO server (for example VO2) to help complete this part of the job, as shown in Figure 10. The VO1 server therefore sends this part of the job to the VO2 server, together with the user's SAML assertion. The VO2 server verifies the assertion; if verification succeeds, it accepts the job part and allocates resources in the VO2 domain to run it. Finally, the results are returned to the VO1 server, which integrates them and returns them to the user. This realizes cross-domain dynamic migration scheduling of grid jobs.
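The failure handling in step 11 can be sketched as follows. The `Assertion` class is only a stand-in: a real SAML assertion is a signed XML document, and verifying it involves signature and validity checks that are reduced here to a lookup in a set of trusted issuers. The names `VOServer`, `run_locally`, `accept_remote`, and `reassign` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    """Stand-in for the user's SAML assertion (no real signature checking)."""
    user: str
    issuer: str


@dataclass
class VOServer:
    name: str
    idle_nodes: list = field(default_factory=list)
    trusted_issuers: set = field(default_factory=set)

    def run_locally(self, part):
        """Run a job part on an idle node; return None if no node is free."""
        if not self.idle_nodes:
            return None
        node = self.idle_nodes.pop(0)
        return f"{part} done on {self.name}/{node}"

    def accept_remote(self, part, assertion):
        """Accept work from another VO only if the assertion verifies."""
        if assertion.issuer not in self.trusted_issuers:
            return None
        return self.run_locally(part)


def reassign(part, home, partners, assertion):
    """Retry a failed job part at home first, then in partner VOs."""
    result = home.run_locally(part)
    if result is not None:
        return result
    for vo in partners:
        result = vo.accept_remote(part, assertion)
        if result is not None:
            return result   # the home VO would merge this with the other results
    raise RuntimeError(f"no virtual organization could run {part!r}")
```

In the scenario of Figure 10, `home` would be the VO1 server with all of its nodes busy and `partners` would contain the VO2 server, which trusts assertions issued by VO1.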

Claims (1)

1. A cross-domain job control method for a grid computing environment, characterized in that job control in the method uses a trust mechanism and realizes cross-domain execution of grid jobs, with the following specific steps:
Step 1: before submitting a job, the user first registers to become a user of this grid,
Step 2: before the user joins the grid, the grid application layer initializes the environment in preparation for the grid user's subsequent activities,
Step 3: if the user's identity is legitimate, the grid determines the user's access control rights to resources, and the grid user requests job submission,
(1) the grid user fills in the job to be submitted: when submitting a grid job, the grid user provides the task name, the job description, and the start and end times of job execution; during submission, the host that submits the job automatically appends its local IP address and host name to the job description,
(2) the grid user submits the job; the control mechanism of the grid virtual organization checks the legitimacy of the submitted job and the user's access control privilege level; if the job request is legitimate and has no semantic conflicts, the job controller of the grid virtual organization accepts the request,
(3) the user's job enters the job waiting queue of the grid virtual organization, and the request state is set to "submitted", awaiting scheduling and execution,
Step 4: the job control mechanism at the center of the grid virtual organization performs sequential scheduling: it periodically checks the job waiting queue for the job at its head; if the queue is not empty, step 5 is executed; otherwise the job control mechanism waits until a job submitted by a user enters the queue,
Step 5: the job control mechanism obtains the job's descriptive information,
Step 6: the grid job control mechanism selects the available computing resources according to the trust mechanism,
Step 7: the job control mechanism performs resource matching scheduling for the job and determines the subtask to be assigned to each computing resource,
Step 8: the job is decomposed and migrated: after the matching resource nodes have been obtained, the server of the virtual organization divides the job submitted by the user according to the performance of the matched resources; this job assignment algorithm divides the job by resource performance weight, so that a resource with higher overall performance receives a larger share of the workload and a resource with lower overall performance receives a smaller share; the server of the virtual organization then starts mobile agents through the mobile agent platform to dispatch each part of the job to its corresponding resource node; if the job migration succeeds, the job control mechanism sets the job state to "ready" and proceeds to step 9; otherwise the job state is set to "error" and the method proceeds to step 11,
Step 9: the subtasks are migrated to the computing resources, where they are scheduled by the operating system of the local resource,
Step 10: when the jobs in the virtual organization have been migrated by mobile agents to the resource nodes to run, the server of the virtual organization starts a monitoring thread that monitors the jobs and the resource nodes and checks whether results are returned,
if the user's query shows that the job has finished, the user can view the execution result by entering the job's identification number; otherwise the method proceeds to step 11,
Step 11: if, during this process, a resource returns a failed result, the server of the virtual organization must reassign the part of the job that had been assigned to that resource; if the other running resource nodes have not yet finished their own tasks, the server of the virtual organization must ask the server of another virtual organization to help complete this part of the job; the server of virtual organization 1 therefore sends this part of the job to the server of virtual organization 2, together with the user's authentication assertion; the server of virtual organization 2 verifies the assertion and, if verification succeeds, accepts the job part and allocates resources in the domain of virtual organization 2 to run it; finally the results are returned to the server of virtual organization 1, which integrates them and returns them to the user, thereby realizing cross-domain dynamic migration scheduling of grid jobs.
CNB2008101241334A 2008-06-13 2008-06-13 Operation cross-domain control method under the grid computing environment CN100570569C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2008101241334A CN100570569C (en) 2008-06-13 2008-06-13 Operation cross-domain control method under the grid computing environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2008101241334A CN100570569C (en) 2008-06-13 2008-06-13 Operation cross-domain control method under the grid computing environment

Publications (2)

Publication Number Publication Date
CN101308468A true CN101308468A (en) 2008-11-19
CN100570569C CN100570569C (en) 2009-12-16

Family

ID=40124933

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2008101241334A CN100570569C (en) 2008-06-13 2008-06-13 Operation cross-domain control method under the grid computing environment

Country Status (1)

Country Link
CN (1) CN100570569C (en)



Also Published As

Publication number Publication date
CN100570569C (en) 2009-12-16


Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
GR01 Patent grant
C14 Grant of patent or utility model
ASS Succession or assignment of patent right

Owner name: JIANGSU YITONG HIGH-TECH CO., LTD.

Free format text: FORMER OWNER: NANJING POST + TELECOMMUNICATION UNIV.

Effective date: 20101117

TR01 Transfer of patent right

Effective date of registration: 20101117

Address after: 215500 No. 28, Tonglin Road, Changshu City, Jiangsu Province

Patentee after: JIANGSU YITONG HIGH-TECH Co., LTD.

Address before: 210003 No. 66, Xinmofan Road, Nanjing City, Jiangsu Province

Patentee before: Nanjing Post & Telecommunication Univ.

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 210003 NO. 66, XINMOFAN ROAD, NANJING CITY, JIANGSU PROVINCE TO: 215500 NO.28, TONGLIN ROAD, CHANGSHU CITY, JIANGSU PROVINCE