CN106708624B - Self-adaptive adjustment method for multi-working-domain computing resources - Google Patents

Publication number
CN106708624B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611048286.6A
Other languages
Chinese (zh)
Other versions
CN106708624A (en)
Inventor
王胜明
黄河
徐泰山
苏寅生
徐健
周剑
郭剑
梅勇
邵伟
姚海成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING NANRUI GROUP CO
China Southern Power Grid Co Ltd
Nari Technology Co Ltd
Original Assignee
NANJING NANRUI GROUP CO
China Southern Power Grid Co Ltd
Nari Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING NANRUI GROUP CO, China Southern Power Grid Co Ltd, Nari Technology Co Ltd
Priority to CN201611048286.6A
Publication of CN106708624A
Application granted
Publication of CN106708624B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool

Abstract

The invention belongs to the field of distributed computing and provides a self-adaptive adjustment method for computing resources across multiple working domains. The method takes into account the constraints of each working domain, including its reference working period, computing resource allocation priority, shortest computing period and longest allowed computing period. It standardizes the computing resources of computing nodes with different hardware configurations in terms of computing units and, based on the execution time of computing tasks on a computing unit, determines the number of computing units allocated to each activated working domain, thereby achieving self-adaptive adjustment of computing resources among multiple working domains.

Description

Self-adaptive adjustment method for multi-working-domain computing resources
Technical Field
The invention belongs to the field of distributed computing, and particularly relates to a self-adaptive adjustment method for computing resources of multiple working domains.
Background
In distributed computing, especially in computation-intensive fields such as the safety and stability analysis of power systems, computation speed has become a key factor in making application functions practical. Parallel computing is an effective way to improve analysis and computation performance for large, complex power grids, and the maturing of distributed computing management platforms has promoted the online application of power system safety and stability analysis. However, many online application functions have real-time requirements on their computing cycle, while offline use by multiple users is random in nature, so computing resources that are configured independently for each are poorly utilized. Building on the organization and control of computation by working domain, a distributed computing management platform urgently needs support for dynamically optimizing the allocation of computing resources and adaptively adjusting them among multiple working domains, so as to improve the reliability of online applications and promote efficient use of computing resources by offline applications.
Document one, "A fault-tolerant method of mutual backup by cluster nodes" (patent No. CN02159479.1), proposes a fault-tolerant method in which cluster nodes back each other up. In this method, the cluster nodes are connected, communicate and back each other up through a heartbeat ring; the master node assigns the position of a newly added node in the cluster and returns the service information that node is to carry; when a node in the cluster finds that an adjacent node is abnormal, it confirms the state of that node; and the master node takes over the failed service. The method mainly solves node hot standby in cluster management, but it does not consider the dynamic allocation of computing resources among multiple computer clusters.
Document two, "A cluster application management system and its application management method" (patent No. CN201010286186.3), proposes a cluster application management system for large-scale cluster management. The system comprises an execution engine module and a database module. The database module stores the processing results of all applications in real time and maintains a monitoring table that records changes in the processing results of the applications on which other applications depend. The execution engine module executes each application in the cluster system, writes each application's processing result to the database module in real time, periodically reads the monitoring table, judges from the recorded changes whether the trigger condition of each application is met, and triggers the corresponding application if it is.
Document three, "A distributed computing multi-application function asynchronous concurrent scheduling method" (patent No. CN201110005759.5), proposes a method for asynchronous concurrent scheduling of multiple application functions on a distributed computing management platform. The platform combines the time-consumption characteristics and number of computing tasks of each application function with the scale and performance of the cluster nodes, sets a suitable computing job scheduling granularity independently for each application function to form its computing jobs, and adds these jobs to the platform's scheduling sequence. This achieves asynchronous concurrent submission of the computing tasks of multiple application functions, unified scheduling of their computing jobs, and asynchronous collection of their computing results.
Of the three methods, document one only solves node hot standby within a single working domain and does not consider node hot standby or dynamic allocation of computing resources among multiple working domains. Document two implements a database-based cluster application management system for large-scale cluster management, but does not consider the different computing cycle requirements of application functions that run on different data sources of the power system (online and offline application scenarios). Document three only addresses asynchronous concurrent scheduling of multiple application functions within a single computer cluster, where the cluster is allocated fixed computing resources; it cannot dynamically adjust computing resources in response to events that occur while the distributed computing management platform is running (such as changes in the activation state of working domains or in the running state of cluster nodes) so as to meet the system's computing cycle requirements. Consequently, none of the three methods adequately solves the dynamic management of computing resources for power systems: the computing cycle of the online system cannot be fully guaranteed, and the computing resources of the offline system cannot be used to the fullest.
Disclosure of Invention
The purpose of the invention is to provide a multi-working-domain computing resource self-adaptive adjustment method suitable for a distributed computing management platform, addressing the computing characteristics and working periods of multiple application functions in different online and offline application scenarios and the dynamic demands on computing resources that they generate.
Specifically, the invention is realized by adopting the following technical scheme, which comprises the following steps:
1) according to the different requirements of computing tasks for various computing resources, define the minimum combination of computing resources required to complete one computing task as a computing unit, standardize each computing node in terms of the computing resources contained in a computing unit, measure the computing capacity of each computing node as a non-negative integer multiple of the computing unit, and go to step 2);
2) according to the computing requirements of each application function in the online and offline application scenarios of the power system, divide the computer cluster into multiple working domains; each working domain independently organizes, schedules and manages its computing tasks on the distributed computing management platform, and working domains do not directly exchange computing data or control information during computation;
according to the importance level of each working domain and its requirements on computing time, set for each working domain the parameters reference working period, computing resource allocation priority, shortest computing period and longest allowed computing period, and convert the shortest computing period and the longest allowed computing period into the maximum number of allocated computing units and the minimum number of reserved computing units, respectively, according to the expected execution time of a single computing task on one computing unit;
when the activation state of any working domain or the running state of any computing node changes, go to step 3);
3) compute the number of computing units pre-allocated to each activated working domain, based on the reference working period, computing resource allocation priority, shortest computing period and longest allowed computing period parameters set for each activated working domain, together with the currently activated working domains and the normally operating computing nodes; if the total number of currently available computing units is less than or equal to the sum of the minimum reserved computing units of all currently activated working domains, pre-allocate computing units to the activated working domains in order of computing resource allocation priority from high to low, each according to its minimum number of reserved computing units, until all computing units have been allocated;
4) for all pre-allocated working domains, compute the number of normally operating computing units of each working domain that take part in the adjustment, based on the number of computing units pre-allocated to each working domain and the number of normally operating computing units allocated to it before the adjustment; then, on the principle that as few computing nodes as possible take part in the adjustment, determine the working domain to which each computing node belongs after the adjustment, taking into account the switching priority of each computing node and the number of computing units it contains; and issue a prompt for any activated working domain that has not been allocated its minimum number of reserved computing units;
5) the distributed computing management platform on each computing node detects the working domain change, switches from the original working domain to the adjusted working domain, and completes the self-adaptive adjustment of computing nodes among the multiple working domains.
The above technical solution is further characterized in that, in step 1), each computing node performs computing unit standardization processing according to formula (1) based on the computing resources it has, and determines the number of effective computing units included in each computing node, so as to measure its computing power:
[Formula (1): equation image, not reproduced in this text]
where n is the number of cluster computing nodes in the system; u_i is the number of computing units contained in the i-th computing node, a non-negative integer;
R_cpu(i), R_mem(i) and R_io(i) are the CPU, memory and IO computing resources of the i-th computing node, respectively;
R_cpu^0, R_mem^0 and R_io^0 are the CPU, memory and IO computing resources of a standard computing unit, respectively.
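Because formula (1) is not reproduced in this text, the following is only a minimal sketch of one plausible reading of the standardization step. It assumes that a node holds as many computing units as its scarcest resource allows, rounded down. The function name `computing_units`, the dictionary keys and the sample figures are illustrative assumptions, not part of the patent.

```python
from math import floor


def computing_units(node, unit):
    """Whole standard computing units that fit in one node.

    ASSUMPTION: u_i is limited by the scarcest of CPU, memory and IO,
    rounded down; the patent's formula (1) may differ in detail.
    """
    ratios = [node[r] / unit[r] for r in ("cpu", "mem", "io")]
    return max(0, floor(min(ratios)))


# Hypothetical standard computing unit: 4 cores, 8 GB memory, 100 MB/s IO.
unit = {"cpu": 4, "mem": 8, "io": 100}

# Hypothetical nodes with different hardware configurations.
nodes = [
    {"cpu": 16, "mem": 64, "io": 400},  # limited by CPU and IO: 4 units
    {"cpu": 8, "mem": 12, "io": 500},   # limited by memory: 1 unit
    {"cpu": 2, "mem": 64, "io": 400},   # too few cores for one unit: 0 units
]

units = [computing_units(n, unit) for n in nodes]
print(units)
```

Under this reading, a node that cannot host even one complete computing unit contributes 0, which matches the statement that u_i is a non-negative integer.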
The above technical solution is further characterized in that, in the step 2), the shortest calculation period and the longest allowable calculation period of each work domain are converted into the maximum allocation calculation unit number and the minimum reservation calculation unit number by formulas (2) and (3), respectively:
[Formula (2): equation image, not reproduced in this text]
[Formula (3): equation image, not reproduced in this text]
where m is the number of working domains in the system; t_min(j) is the shortest computing period of the j-th working domain; t_max(j) is the longest allowed computing period of the j-th working domain; f_j is the number of application functions to be run in the j-th working domain; s_jk is the number of computing tasks of the k-th application function run in the j-th working domain; the execution-time symbol in the formulas [symbol image, not reproduced in this text] is the expected execution time of a single computing task of the k-th application function on one computing unit; and ⌈·⌉ denotes rounding up;
c_max(j) is the maximum number of allocated computing units of the j-th working domain, a non-negative integer; c_min(j) is the minimum number of reserved computing units of the j-th working domain, a non-negative integer.
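Formulas (2) and (3) are not reproduced in this text, so the sketch below only illustrates one plausible reading of the conversion. It assumes that the total work of a working domain is the sum, over its application functions, of task count times expected per-task execution time on one computing unit, and that dividing this work by a period and rounding up gives the number of units needed to finish within that period. The function name `unit_bounds` and the sample numbers are hypothetical.

```python
from math import ceil


def unit_bounds(task_counts, task_times, t_min, t_max):
    """Return (c_max, c_min) for one working domain.

    task_counts[k]: number of computing tasks of application function k (s_jk)
    task_times[k]:  expected time of one such task on one computing unit
    t_min, t_max:   shortest and longest allowed computing periods

    ASSUMPTION: units needed = ceil(total work / period); the patent's
    formulas (2) and (3) may differ in detail.
    """
    total_work = sum(s * t for s, t in zip(task_counts, task_times))
    c_max = ceil(total_work / t_min)  # finishing in the shortest period needs the most units
    c_min = ceil(total_work / t_max)  # finishing in the longest allowed period needs the fewest
    return c_max, c_min


# Hypothetical working domain: 100 tasks of 2 s each and 50 tasks of 4 s each,
# shortest period 20 s, longest allowed period 60 s.
bounds = unit_bounds([100, 50], [2.0, 4.0], t_min=20, t_max=60)
print(bounds)
```

The shorter period maps to the larger unit count, which is consistent with the text pairing the shortest computing period with the maximum allocation and the longest allowed period with the minimum reservation.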
The above technical solution is further characterized in that the step 3) specifically comprises the following steps:
3.1) take the sum of the numbers of computing units contained in all normally operating computing nodes as the number c_u of computing units to be allocated, and initialize the number of pre-allocated computing units of each working domain to 0; compute the sum of the minimum reserved computing units of all currently activated working domains,
Σ_{j=1}^{m} d_j · c_min(j)
where m is the number of working domains in the system and d_j is the activation state of the j-th working domain: a value of 1 indicates that the working domain is activated and a value of 0 indicates that it is not;
if c_u > Σ_{j=1}^{m} d_j · c_min(j), go to step 3.2); otherwise, go to step 3.8);
3.2) performing descending arrangement on each working domain in the activated state according to the distribution priority of the computing resources from high to low, selecting the working domain with the top order as the working domain to be pre-distributed, and entering the step 3.3);
3.3) solving the number of pre-allocation calculation units of the working domain to be pre-allocated according to the formula (4):
[Formula (4): equation image, not reproduced in this text]
where r_j is the number of pre-allocated computing units of the j-th working domain; c_i is the operating state of the i-th computing node, where a value of 1 indicates that the computing node is operating normally and a value of 0 indicates that it is faulty; η_j is the reference working period set for the j-th working domain, which serves as the computing resource allocation coefficient among different working domains; and ⌊·⌋ denotes rounding down;
3.4) if the number of the pre-distribution computing units of the working domain to be pre-distributed is larger than the maximum number of the distribution computing units of the working domain, updating the value of the number of the pre-distribution computing units of the working domain to be pre-distributed to the maximum number of the distribution computing units of the working domain;
3.5) determine whether the number of pre-allocated computing units of the working domain to be pre-allocated is less than or equal to the number c_u of computing units to be allocated; if so, update c_u to c_u minus the number of pre-allocated computing units of that working domain; otherwise, set the number of pre-allocated computing units of that working domain to c_u and set c_u to 0;
3.6) determine whether c_u is 0; if so, go to step 4); otherwise, take the working domain ranked immediately after the current one as the new working domain to be pre-allocated and return to step 3.3) to pre-allocate computing resources for the next activated working domain, until computing resources have been pre-allocated for all activated working domains;
3.7) if, after computing resources have been pre-allocated for all activated working domains, the number c_u of computing units to be allocated is still greater than 0, allocate one computing unit at a time, in order of computing resource allocation priority from high to low, to each activated working domain that has not reached its maximum number of allocated computing units, until all computing units have been allocated or every activated working domain has reached its maximum number of allocated computing units; go to step 4);
3.8) for all activated working domains, pre-allocate computing units in order of computing resource allocation priority from high to low, each according to its minimum number of reserved computing units, until all computing units have been allocated; go to step 4).
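Steps 3.1) to 3.8) can be sketched as follows. Since formula (4) is not reproduced in this text, the sketch does not try to compute it: each working domain carries a hypothetical `share` field that stands in for the value formula (4) would produce. The field names, the convention that a larger `priority` number means higher priority, and the sample data are all assumptions made for illustration.

```python
def preallocate(domains, total_units):
    """Pre-allocate computing units to activated working domains.

    domains: list of dicts with keys name, active, priority, c_min, c_max, share
             (share is a stand-in for the result of formula (4); ASSUMPTION)
    total_units: computing units on all normally operating nodes (c_u)
    """
    # Step 3.2: activated domains, highest allocation priority first.
    active = sorted((d for d in domains if d["active"]),
                    key=lambda d: d["priority"], reverse=True)
    alloc = {d["name"]: 0 for d in active}   # step 3.1: r_j starts at 0
    remaining = total_units

    if remaining > sum(d["c_min"] for d in active):
        for d in active:
            r = min(d["share"], d["c_max"])  # step 3.4: cap at the maximum
            r = min(r, remaining)            # step 3.5: cannot exceed what is left
            alloc[d["name"]] = r
            remaining -= r
            if remaining == 0:               # step 3.6
                break
        # Step 3.7: hand out leftovers one unit at a time by priority.
        while remaining > 0:
            progressed = False
            for d in active:
                if remaining > 0 and alloc[d["name"]] < d["c_max"]:
                    alloc[d["name"]] += 1
                    remaining -= 1
                    progressed = True
            if not progressed:               # every domain is at its maximum
                break
    else:
        # Step 3.8: scarce resources, serve minimum reservations by priority.
        for d in active:
            r = min(d["c_min"], remaining)
            alloc[d["name"]] = r
            remaining -= r
    return alloc


# Hypothetical working domains.
domains = [
    {"name": "online", "active": True, "priority": 3, "c_min": 4, "c_max": 10, "share": 6},
    {"name": "study", "active": True, "priority": 2, "c_min": 2, "c_max": 5, "share": 5},
    {"name": "offline", "active": True, "priority": 1, "c_min": 1, "c_max": 8, "share": 3},
]

plenty = preallocate(domains, 16)  # more units than the sum of minimum reservations
scarce = preallocate(domains, 5)   # fewer units than the sum of minimum reservations
print(plenty, scarce)
```

With 16 units, the two leftover units after the first pass go to `online` and `offline`, because `study` is already at its maximum. With 5 units, the highest-priority domain receives its full reservation and the lowest-priority domain receives nothing, which is the situation step 4.5) later reports with a prompt.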
The above technical solution is further characterized in that the step 4) specifically comprises the following steps:
4.1) for all working domains, compute the number of computing units of each working domain that take part in the adjustment according to formula (7), based on the number of computing units pre-allocated to each working domain and the number of computing units allocated to it before the adjustment:
Δs_j = p_j - r_j (1 ≤ j ≤ m) (7)
where p_j is the number of normally operating computing units allocated to the j-th working domain before this adjustment; r_j is the number of pre-allocated computing units of the j-th working domain; and Δs_j is the number of computing units of the j-th working domain that take part in switching after the optimized allocation. A value greater than 0 indicates that the j-th working domain gives up computing units for use by other working domains; a value less than 0 indicates that other working domains give up computing units for use by the j-th working domain.
4.2) for each working domain in the working domain set whose number of computing units taking part in the adjustment is greater than 0, select normally operating computing nodes in that working domain one at a time in order of switching priority from high to low; denote the selected node as k, and update the working domain's number of computing units taking part in the adjustment to the difference between that number and u_k; if the updated number is greater than or equal to 0, add computing node k to the set of computing nodes to be switched and continue with the next normally operating computing node in the working domain; if the updated number is less than 0, or all computing nodes in the working domain have been processed, go to step 4.3);
4.3) for each working domain in the working domain set whose number of computing units taking part in the adjustment is less than 0 and whose number of normally operating computing units allocated before the adjustment is equal to 0, select one computing node from the set of computing nodes to be switched in order of switching priority from low to high; denote it as k, switch it to that working domain, update the working domain's number of computing units taking part in the adjustment to the sum of that number and u_k, and update the working domain to which computing node k belongs to that working domain; if the set of computing nodes to be switched is empty, go to step 4.5); otherwise, go to step 4.4);
4.4) sort the working domains in the working domain set whose number of computing units taking part in the adjustment is less than 0 in descending order of computing resource allocation priority; for each such working domain, select computing nodes from the set of computing nodes to be switched one at a time in order of switching priority from low to high; denote the selected node as k, switch it to that working domain, update the working domain's number of computing units taking part in the adjustment to the sum of that number and u_k, and update the working domain to which computing node k belongs to that working domain, until the working domain's number of computing units taking part in the adjustment is greater than or equal to 0 or the set of computing nodes to be switched is empty; then go to step 4.5);
4.5) for all activated working domains, count the number of computing units actually allocated to each from the working domain to which each computing node belongs and the number of computing units it contains, compare that number with the working domain's minimum number of reserved computing units, issue a prompt for any activated working domain that has not been allocated its minimum number of reserved computing units, and go to step 5).
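Steps 4.1) to 4.4) can be illustrated with the sketch below. It is a simplified reading, not the patent's implementation: when releasing a node would push a surplus working domain below its target, the sketch stops without changing that domain's counter, and it omits the prompt of step 4.5). The function name `plan_switches`, the field names, the convention that a larger number means higher priority, and the sample data are assumptions.

```python
def plan_switches(nodes, target, domain_priority):
    """Decide which normally operating nodes move to which working domain.

    nodes: list of dicts with keys name, domain, units (u_k), switch_priority
    target: dict working domain -> pre-allocated units r_j
    domain_priority: dict working domain -> allocation priority
    Returns a dict node name -> new working domain.
    """
    # Step 4.1: delta_j = p_j - r_j, per formula (7).
    current = {j: 0 for j in target}
    for n in nodes:
        current[n["domain"]] = current.get(n["domain"], 0) + n["units"]
    delta = {j: current[j] - target.get(j, 0) for j in current}

    # Step 4.2: surplus domains release nodes, highest switching priority first.
    pool = []
    for j in list(delta):
        if delta[j] <= 0:
            continue
        members = sorted((n for n in nodes if n["domain"] == j),
                         key=lambda n: n["switch_priority"], reverse=True)
        for n in members:
            if delta[j] - n["units"] < 0:   # releasing this node would overshoot
                break
            delta[j] -= n["units"]
            pool.append(n)

    # Nodes are taken from the pool from low to high switching priority.
    pool.sort(key=lambda n: n["switch_priority"])
    moves = {}

    def give(j):
        n = pool.pop(0)
        delta[j] += n["units"]
        moves[n["name"]] = j

    # Step 4.3: domains that currently have no units get one node first.
    for j in list(delta):
        if delta[j] < 0 and current[j] == 0 and pool:
            give(j)

    # Step 4.4: remaining deficits, highest allocation priority first.
    for j in sorted(delta, key=lambda j: domain_priority.get(j, 0), reverse=True):
        while delta[j] < 0 and pool:
            give(j)
    return moves


# Hypothetical cluster: domain A holds 8 units, B holds 2, C holds none.
nodes = [
    {"name": "n1", "domain": "A", "units": 4, "switch_priority": 1},
    {"name": "n2", "domain": "A", "units": 2, "switch_priority": 3},
    {"name": "n3", "domain": "A", "units": 2, "switch_priority": 2},
    {"name": "n4", "domain": "B", "units": 2, "switch_priority": 1},
]
target = {"A": 4, "B": 4, "C": 2}
moves = plan_switches(nodes, target, {"A": 3, "B": 2, "C": 1})
print(moves)
```

In this example domain A has a surplus of 4 units and releases its two 2-unit nodes rather than its single 4-unit node, because nodes are offered in order of switching priority. Domain C, which had no units at all, is served before domain B even though B has the higher allocation priority.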
The invention has the following beneficial effects. Based on the reference working period of each working domain, the invention combines the real-time activation state of each working domain with the real-time running state of each computing node and takes into account constraints such as the shortest computing period and the longest allowed computing period of each working domain, thereby achieving dynamic allocation of computing resources among working domains. On this basis, each computing node is standardized in terms of computing units and, by further considering the allocation of computing nodes to working domains before the adjustment, an optimized node switching scheme is produced that adjusts as few computing nodes as possible. The invention thus not only achieves online dynamic optimized allocation of computing resources across multiple working domains based on the activation state of the working domains and the running state of the computing nodes, but also keeps the number of computing nodes taking part in the adjustment as small as possible. While achieving self-adaptive management of computing resources, it therefore also accounts for the impact of node switching on activated working domains, and improves both the utilization efficiency of computing resources and the reliability of working domain operation.
Drawings
FIG. 1 is a schematic diagram of data interaction of multiple working domains according to the method of the present invention.
FIG. 2 is the first part of a schematic flow chart of one embodiment of the method of the present invention.
FIG. 3 is the second part of the schematic flow chart of the same embodiment of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples with reference to the accompanying drawings.
The basic principle of the invention is that: setting a corresponding reference working period based on the requirement of each working domain on computing resources, and taking the reference working period as a computing resource distribution coefficient among different working domains; carrying out standardization processing on each computing node based on various computing resources contained in the computing unit, and determining the number of effective computing units contained in each computing node; according to the real-time activation state of each working domain and the real-time running state of each computing node, comprehensively considering the computing resource allocation priority of each working domain, the constraints such as the shortest computing period and the longest allowed computing period, and the execution time of computing tasks in computing units, and determining the number of computing units pre-allocated to each activation working domain; on the basis, according to the number of effective computing units contained in the computing nodes, and further in combination with the allocation relation between each activated working domain and the computing nodes before adjustment, an optimized switching scheme of the computing nodes is provided on the basis of the principle that the number of the computing nodes participating in adjustment is as small as possible, and the self-adaptive adjustment of computing resources among a plurality of working domains is realized.
Therefore, the invention divides the computer group into a plurality of working domains according to the computing requirements of each application function in the online and offline application scenes of the power system, as shown in fig. 1. Each work domain independently performs calculation task organization, scheduling and management based on a distributed calculation management platform, and interaction of calculation data and control information is not directly performed among the work domains in the calculation process.
The flow of the present invention is shown in fig. 2 and 3. Step 1 in fig. 2 describes that, according to different requirements of a computing task on various computing resources, a minimum combination of the various computing resources required for completing the computing of one computing task is defined as a computing unit, each computing node is standardized according to formula (1) based on the computing resources included in the computing unit, and the computing power of each computing node is measured by a non-negative integer multiple of the computing unit:
[Formula (1): equation image, not reproduced in this text]
where n is the number of cluster computing nodes in the system; u_i is the number of computing units contained in the i-th computing node, a non-negative integer; R_cpu(i), R_mem(i) and R_io(i) are the CPU, memory and IO computing resources of the i-th computing node, respectively; R_cpu^0, R_mem^0 and R_io^0 are the CPU, memory and IO computing resources of a standard computing unit, respectively.
Step 2 in fig. 2 illustrates that, based on the demands of each work domain for computing resources, corresponding reference work cycle, computing resource allocation priority, shortest computing cycle, and longest allowed computing cycle parameters are manually set for each work domain, and according to the expected execution time of a single computing task on one computing unit, the shortest computing cycle and the longest allowed computing cycle are converted into the maximum allocation computing unit number and the minimum reservation computing unit number according to formulas (2) and (3), respectively:
Figure BDA0001161602340000092
Figure BDA0001161602340000093
where $m$ is the number of working domains in the system; $t_{\min}(j)$ is the shortest computing cycle of the $j$-th working domain; $t_{\max}(j)$ is the longest allowed computing cycle of the $j$-th working domain; $f_j$ is the number of application functions to be run by the $j$-th working domain; $s_{jk}$ is the number of computing tasks of the $k$-th application function run by the $j$-th working domain;
Figure BDA0001161602340000094
is the expected execution time of a single computing task of the $k$-th application function on one computing unit; and $\lceil\cdot\rceil$ denotes rounding up.
$c_{\max}(j)$, a non-negative integer, is the maximum number of allocated computing units of the $j$-th working domain; $c_{\min}(j)$, a non-negative integer, is the minimum number of reserved computing units of the $j$-th working domain.
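The conversion in Step 2 can be sketched in Python as follows. Formulas (2) and (3) appear only as images in this text, so the sketch assumes a plausible form: the total work of a working domain, measured in unit-seconds, divided by the cycle length and rounded up. The function name `unit_bounds` and the sample numbers are hypothetical.

```python
from math import ceil


def unit_bounds(task_counts, task_times, t_min, t_max):
    """Return (c_max, c_min) for one working domain.

    task_counts[k]: number of computing tasks of the k-th application function (s_jk)
    task_times[k]:  expected execution time of one such task on one computing unit
    t_min, t_max:   shortest and longest allowed computing cycle, same time unit

    Assumed form of formulas (2) and (3): ceil(total work / cycle length).
    """
    if len(task_counts) != len(task_times):
        raise ValueError("one expected execution time is needed per application function")
    if not 0 < t_min <= t_max:
        raise ValueError("expected 0 < t_min <= t_max")
    work = sum(s * t for s, t in zip(task_counts, task_times))
    c_max = ceil(work / t_min)  # units needed to finish within the shortest cycle
    c_min = ceil(work / t_max)  # units needed to finish within the longest allowed cycle
    return c_max, c_min


# Two application functions: 100 tasks of 3 s each and 40 tasks of 5 s each.
bounds = unit_bounds([100, 40], [3.0, 5.0], t_min=60.0, t_max=300.0)
```

Because the shortest cycle is never longer than the longest allowed cycle, this form always yields a maximum that is at least as large as the minimum.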
Step 3 in Fig. 2: whenever the activation state of any working domain or the running state of any computing node changes, the following processing is performed:
3.1) Take the total number of computing units contained in all computing nodes in the normal running state as the initial value of the number of computing units to be allocated, $c_u$, and initialize the number of pre-allocated computing units $r_j$ of every working domain to 0. Compute the sum of the minimum reserved computing units of all currently activated working domains, $\sum_{j=1}^{m} d_j\,c_{\min}(j)$. If $c_u > \sum_{j=1}^{m} d_j\,c_{\min}(j)$, go to step 3.2); otherwise, go to step 3.8). Here $m$ is the number of working domains in the system and $d_j$ is the activation state of the $j$-th working domain: a value of 1 means the working domain is activated, and a value of 0 means it is not.
3.2) Sort the activated working domains in descending order of computing resource allocation priority, select the first one as the working domain to be pre-allocated, and go to step 3.3);
3.3) Compute the number of pre-allocated computing units of the working domain to be pre-allocated according to formula (4):
Figure BDA0001161602340000103
where $r_j$ is the number of pre-allocated computing units of the $j$-th working domain; $c_i$ is the running state of the $i$-th computing node, where a value of 1 means the node is running normally and a value of 0 means it is faulty; $\eta_j$ is the reference working cycle set for the $j$-th working domain, which serves as the computing resource allocation coefficient among the working domains; and $\lfloor\cdot\rfloor$ denotes rounding down.
The value of $r_j$ is a non-negative integer: because a computing unit can be allocated to only one working domain at a time, the result of formula (4) is rounded down.
3.4) Check whether $r_j$ satisfies the constraint on the maximum number of allocated computing units of the working domain defined by formula (5); if $r_j$ is greater than $c_{\max}(j)$, set $r_j$ to $c_{\max}(j)$;
$$r_j \le c_{\max}(j) \quad (1 \le j \le m) \tag{5}$$
3.5) Check whether $r_j$ is less than or equal to the number of computing units to be allocated, $c_u$. If $r_j \le c_u$, update $c_u$ to $c_u - r_j$; if $r_j > c_u$, set $r_j$ to $c_u$ and set $c_u$ to 0;
3.6) Check whether $c_u$ is 0. If it is, go to step 4; otherwise, take the working domain that follows the current one in the sorted order as the new working domain to be pre-allocated and return to step 3.3), until computing resources have been pre-allocated for all activated working domains;
3.7) If $c_u$ is still greater than 0 after computing resources have been pre-allocated for all activated working domains, then, for the activated working domains that have not reached their maximum number of allocated computing units, allocate one computing unit at a time to each such working domain in descending order of computing resource allocation priority, until all computing units have been allocated or every activated working domain has reached its maximum number of allocated computing units; go to step 4;
3.8) For all activated working domains, pre-allocate computing units in descending order of computing resource allocation priority, giving each activated working domain its minimum number of reserved computing units, until all computing units have been allocated; go to step 4.
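Steps 3.1) to 3.8) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: formula (4) appears only as an image in this text, so the proportional share is replaced by a stand-in (each activated working domain receives `floor(total_units * weight / sum of weights)`), where `weight` is a hypothetical coefficient standing in for whatever the formula derives from $\eta_j$. A larger `priority` value is assumed to mean a higher allocation priority, and the names `Domain` and `preallocate` are illustrative.

```python
from dataclasses import dataclass
from math import floor


@dataclass
class Domain:
    name: str
    priority: int    # assumption: larger value = higher allocation priority
    weight: float    # hypothetical stand-in for the eta_j-based coefficient
    c_min: int       # minimum number of reserved computing units
    c_max: int       # maximum number of allocated computing units
    active: bool = True


def preallocate(domains, total_units):
    """Return {domain name: pre-allocated units r_j} following steps 3.1) to 3.8)."""
    active = sorted((d for d in domains if d.active),
                    key=lambda d: d.priority, reverse=True)
    r = {d.name: 0 for d in domains}   # step 3.1: every r_j starts at 0
    c_u = total_units                  # units still to be allocated

    if c_u <= sum(d.c_min for d in active):
        # Step 3.8: scarce resources, hand out minimum reservations by priority.
        for d in active:
            r[d.name] = min(d.c_min, c_u)
            c_u -= r[d.name]
        return r

    if any(d.weight <= 0 for d in active):
        raise ValueError("weights of activated working domains must be positive")
    weight_sum = sum(d.weight for d in active)
    for d in active:                                          # steps 3.2 to 3.6
        share = floor(total_units * d.weight / weight_sum)    # assumed formula (4)
        share = min(share, d.c_max)                           # step 3.4, formula (5)
        share = min(share, c_u)                               # step 3.5
        r[d.name] = share
        c_u -= share
        if c_u == 0:                                          # step 3.6
            return r

    while c_u > 0:                                            # step 3.7
        open_domains = [d for d in active if r[d.name] < d.c_max]
        if not open_domains:
            break          # every activated domain is at its maximum
        for d in open_domains:
            if c_u == 0:
                break
            r[d.name] += 1
            c_u -= 1
    return r


example = [
    Domain("A", priority=3, weight=2, c_min=2, c_max=6),
    Domain("B", priority=2, weight=1, c_min=1, c_max=10),
    Domain("C", priority=1, weight=1, c_min=1, c_max=3),
]
plenty = preallocate(example, total_units=20)
scarce = preallocate(example, total_units=3)
```

With 20 units available, every working domain in the example ends at its maximum and one unit stays unallocated; with 3 units, only the two highest-priority domains receive their minimum reservation.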
Step 4 in Fig. 3: based on the number of pre-allocated computing units of each working domain and the number of computing units allocated before adjustment, the working domain to which each computing node belongs after adjustment is determined, on the principle that as few computing nodes as possible are adjusted. The specific processing is as follows:
4.1) For every working domain, compute the number of computing units participating in the adjustment, $\Delta s_j$ ($1 \le j \le m$), from the number of pre-allocated computing units $r_j$ and the number of computing units $p_j$ allocated before adjustment, using formula (7), and go to step 4.2);
$$\Delta s_j = p_j - r_j \quad (1 \le j \le m) \tag{7}$$
where $p_j$ is the number of normally running computing units allocated to working domain $j$ before adjustment, and $\Delta s_j$ is the number of computing units of working domain $j$ that participate in switching after the optimized allocation. A value greater than 0 means that working domain $j$ gives up computing units to other working domains; a value less than 0 means that computing units are switched from other working domains to working domain $j$.
4.2) For each working domain $j$ in the working domain set with $\Delta s_j > 0$, select the normally running computing nodes in that working domain one at a time in descending order of switching priority. Denote the selected node by $k$ and update $\Delta s_j$ to $\Delta s_j - u_k$. If $\Delta s_j \ge 0$, add computing node $k$ to the set of computing nodes to be switched and continue with the next normally running computing node in the working domain; if $\Delta s_j < 0$, or all computing nodes in the working domain have been processed, go to step 4.3);
4.3) For each working domain $j$ in the working domain set with $\Delta s_j < 0$ and $p_j = 0$, select one computing node from the set of computing nodes to be switched, in ascending order of switching priority, denote it by $k$, switch it to working domain $j$, update $\Delta s_j$ to $\Delta s_j + u_k$, and update the working domain of computing node $k$ to $j$. If the set of computing nodes to be switched is then empty, go to step 4.5); otherwise, go to step 4.4);
4.4) Sort the working domains with $\Delta s_j < 0$ in descending order of computing resource allocation priority. For each such working domain $j$, select computing nodes one at a time from the set of computing nodes to be switched, in ascending order of switching priority; denote the selected node by $k$, switch it to working domain $j$, update $\Delta s_j$ to $\Delta s_j + u_k$, and update the working domain of the selected node to $j$, until $\Delta s_j \ge 0$ or the set of computing nodes to be switched is empty. Go to step 4.5);
4.5) For all activated working domains, count the number of computing units actually allocated to each one, based on the working domain to which each computing node belongs and the number of computing units it contains, and update $r_j$ to this value. Then check each activated working domain against the minimum reserved computing unit constraint of formula (6), issue a prompt for any activated working domain that has not been allocated its minimum number of reserved computing units, and go to step 5.
$$r_j \ge c_{\min}(j) \quad (1 \le j \le m) \tag{6}$$
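The node-switching procedure of steps 4.1) to 4.5) can be sketched in Python as follows. It is an illustrative sketch under several assumptions: a larger `switch_priority` or allocation priority value means a higher priority; a donor node is released only if its units fit within the remaining surplus (the text subtracts first and stops once the result is negative, whereas the sketch tests before subtracting so that a donor's surplus never goes below zero); and nodes left in the pool stay in their original working domain. All names (`Node`, `plan_switches`, the sample data) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    domain: str
    units: int             # u_k, computing units on this node
    switch_priority: int   # assumption: larger value = higher switching priority
    healthy: bool = True


def plan_switches(nodes, r, alloc_priority, c_min):
    """Reassign nodes so each domain approaches its pre-allocated units r[j].

    Returns (moves, actual, short): the (node, from, to) moves, the units each
    domain actually holds afterwards, and the domains below their minimum.
    """
    domains = list(r)

    def held(j):
        return sum(n.units for n in nodes if n.healthy and n.domain == j)

    p = {j: held(j) for j in domains}
    delta = {j: p[j] - r[j] for j in domains}            # step 4.1, formula (7)

    pool = []                                            # nodes to be switched
    for j in domains:                                    # step 4.2
        if delta[j] <= 0:
            continue
        members = sorted((n for n in nodes if n.healthy and n.domain == j),
                         key=lambda n: n.switch_priority, reverse=True)
        for n in members:
            if n.units > delta[j]:
                break        # releasing this node would overshoot the surplus
            delta[j] -= n.units
            pool.append(n)
    pool.sort(key=lambda n: n.switch_priority)           # hand out low priority first

    moves = []

    def give(j):
        n = pool.pop(0)
        moves.append((n.name, n.domain, j))
        n.domain = j
        delta[j] += n.units

    for j in domains:                                    # step 4.3: empty domains first
        if pool and delta[j] < 0 and p[j] == 0:
            give(j)
    receivers = sorted((j for j in domains if delta[j] < 0),
                       key=lambda j: alloc_priority[j], reverse=True)
    for j in receivers:                                  # step 4.4
        while pool and delta[j] < 0:
            give(j)

    actual = {j: held(j) for j in domains}               # step 4.5
    short = [j for j in domains if actual[j] < c_min.get(j, 0)]   # formula (6)
    return moves, actual, short


cluster = [
    Node("a1", "A", units=4, switch_priority=1),
    Node("a2", "A", units=2, switch_priority=2),
    Node("a3", "A", units=2, switch_priority=3),
    Node("b1", "B", units=2, switch_priority=1),
]
moves, actual, short = plan_switches(
    cluster,
    r={"A": 4, "B": 4, "C": 2},
    alloc_priority={"A": 3, "B": 2, "C": 1},
    c_min={"A": 2, "B": 2, "C": 1},
)
```

In this example working domain A gives up two 2-unit nodes: the one with the lower switching priority goes to the previously empty domain C, and the other goes to B.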
Step 5 in Fig. 3: on every computing node, the distributed computing management platform detects the change in the working domain to which the node belongs and switches the node from its original working domain to the adjusted working domain, which completes the adaptive adjustment of computing resources among the working domains.
Although the present invention has been described in terms of the preferred embodiment, it is not intended that the invention be limited to the embodiment. Any equivalent changes or modifications made without departing from the spirit and scope of the present invention also belong to the protection scope of the present invention. The scope of the invention should therefore be determined with reference to the appended claims.

Claims (3)

1. A self-adaptive adjustment method for multi-working-domain computing resources is characterized by comprising the following steps:
1) defining the minimum combination of various computing resources required for completing the computing of one computing task as a computing unit according to different requirements of the computing task on various computing resources, carrying out standardized processing on each computing node according to the computing resources contained in the computing unit, measuring the computing capacity of each computing node through non-negative integer multiples of the computing unit, and entering step 2);
2) aiming at the computing requirements of each application function in the online and offline application scenes of the power system, a computer group is divided into a plurality of working domains, each working domain independently organizes, schedules and manages computing tasks based on a distributed computing management platform, and the interaction of computing data and control information is not directly performed between the working domains in the computing process;
according to the importance levels of all the working domains and the requirements on the calculation time, uniformly setting parameters of a reference working period, a calculation resource allocation priority, a shortest calculation period and a longest allowable calculation period for the working domains, and converting the shortest calculation period and the longest allowable calculation period into the maximum allocation calculation unit number and the minimum reservation calculation unit number according to the expected execution time of a single calculation task on one calculation unit;
when the activation state of any working domain or the running state of any computing node changes, turning to step 3);
3) calculating the number of computing units pre-distributed by each activated working domain based on the reference working cycle, the computing resource distribution priority, the shortest computing cycle and the longest allowed computing cycle parameters set by each activated working domain as well as the currently activated working domain and the normally operated computing nodes; if the total number of the currently available computing units is less than or equal to the sum of the minimum reserved computing units of all the currently activated working domains, sequentially performing computing unit pre-allocation according to the minimum reserved computing unit number according to the sequence from high to low of the computing resource allocation priority of each activated working domain until all the computing units are allocated;
4) aiming at all the working domains which are pre-distributed, calculating the number of normal operation calculation units which participate in the regulation of each working domain based on the number of calculation units pre-distributed by each working domain and the number of normal operation calculation units distributed by each working domain before the regulation, and determining the working domain to which each calculation node belongs after the regulation by combining the switching priority of each calculation node and the number of calculation units contained by each calculation node based on the principle that the number of calculation nodes participating in the regulation is as small as possible; and a prompt is given to an activation work domain which is not allocated to the computing resource with the minimum reserved computing unit number;
5) the distributed computing management platform on each computing node senses the working domain change information, switches from the original working domain to the adjusted working domain, and completes the self-adaptive adjustment of the computing nodes among a plurality of working domains;
in the step 1), each computing node performs computing unit standardization processing according to a formula (1) according to computing resources of the computing node, and determines the number of effective computing units contained in each computing node, so as to measure the computing capacity of the computing node:
Figure FDA0002449180750000021
wherein $n$ is the number of cluster computing nodes in the system, and $u_i$, a non-negative integer, is the number of computing units contained in the $i$-th computing node;
$R_{cpu}(i)$, $R_{mem}(i)$ and $R_{io}(i)$ are the CPU, memory and IO computing resources of the $i$-th computing node;
$R_{cpu}^{0}$, $R_{mem}^{0}$ and $R_{io}^{0}$ are, respectively, the CPU, memory and IO computing resources of the standard computing unit;
in the step 2), the shortest calculation cycle and the longest allowable calculation cycle of each work domain are converted into the maximum allocation calculation unit number and the minimum reservation calculation unit number respectively through formulas (2) and (3):
Figure FDA0002449180750000022
Figure FDA0002449180750000023
wherein $m$ is the number of working domains in the system, $t_{\min}(j)$ is the shortest computing cycle of the $j$-th working domain, $t_{\max}(j)$ is the longest allowed computing cycle of the $j$-th working domain, $f_j$ is the number of application functions to be run by the $j$-th working domain, $s_{jk}$ is the number of computing tasks of the $k$-th application function run by the $j$-th working domain,
Figure FDA0002449180750000024
is the expected execution time of a single computing task of the $k$-th application function on one computing unit, and $\lceil\cdot\rceil$ denotes rounding up;
$c_{\max}(j)$, a non-negative integer, is the maximum number of allocated computing units of the $j$-th working domain; $c_{\min}(j)$, a non-negative integer, is the minimum number of reserved computing units of the $j$-th working domain.
2. The adaptive adjustment method for multi-working-domain computing resources of claim 1, wherein the step 3) specifically comprises the following steps:
3.1) taking the sum of the numbers of computing units contained in all normally running computing nodes as the number of computing units to be allocated, $c_u$, and initializing the number of pre-allocated computing units of each working domain to 0; computing the sum of the minimum reserved computing units of all currently activated working domains, $\sum_{j=1}^{m} d_j\,c_{\min}(j)$, wherein $m$ is the number of working domains in the system and $d_j$ is the activation state of the $j$-th working domain, a value of 1 indicating that the working domain is activated and a value of 0 indicating that it is not activated;
if $c_u > \sum_{j=1}^{m} d_j\,c_{\min}(j)$, going to step 3.2); otherwise, going to step 3.8);
3.2) performing descending arrangement on each working domain in the activated state according to the distribution priority of the computing resources from high to low, selecting the working domain with the top order as the working domain to be pre-distributed, and entering the step 3.3);
3.3) solving the number of pre-allocation calculation units of the working domain to be pre-allocated according to the formula (4):
Figure FDA0002449180750000033
wherein $r_j$ is the number of pre-allocated computing units of the $j$-th working domain; $c_i$ is the running state of the $i$-th computing node, a value of 1 indicating that the computing node is running normally and a value of 0 indicating that the computing node is faulty; $\eta_j$ is the reference working cycle set for the $j$-th working domain, which is the computing resource allocation coefficient among the working domains; and $\lfloor\cdot\rfloor$ denotes rounding down;
3.4) if the number of the pre-distribution computing units of the working domain to be pre-distributed is larger than the maximum number of the distribution computing units of the working domain, updating the value of the number of the pre-distribution computing units of the working domain to be pre-distributed to the maximum number of the distribution computing units of the working domain;
3.5) judging whether the number of pre-allocated computing units of the working domain to be pre-allocated is less than or equal to the number of computing units to be allocated, $c_u$; if so, updating $c_u$ to $c_u$ minus the number of pre-allocated computing units of the working domain to be pre-allocated; otherwise, updating the number of pre-allocated computing units of the working domain to be pre-allocated to $c_u$ and updating $c_u$ to 0;
3.6) judging whether $c_u$ is 0; if so, going to step 4); otherwise, taking the working domain that follows the current working domain to be pre-allocated in the sorted order as the new working domain to be pre-allocated and returning to step 3.3) to pre-allocate the computing resources of the next activated working domain, until the computing resources of all activated working domains have been pre-allocated;
3.7) if the number of computing units to be allocated, $c_u$, is still greater than 0 after the computing resources of all activated working domains have been pre-allocated, then, for all activated working domains that have not reached the constraint on the maximum number of allocated computing units, sequentially allocating one computing unit to each such working domain in descending order of computing resource allocation priority, until all computing units have been allocated or all activated working domains have reached their maximum number of allocated computing units; going to step 4);
3.8) for all the activated working domains, pre-distributing the computing units to all the activated working domains in sequence according to the minimum reserved computing unit number of each activated working domain from high to low according to the computing resource distribution priority of each activated working domain until all the computing units are distributed; step 4) is entered.
3. The adaptive adjustment method for multi-working-domain computing resources of claim 1, wherein the step 4) specifically comprises the following steps:
4.1) calculating the number of the calculation units participating in the adjustment of each working domain based on the formula (7) according to the number of the calculation units pre-allocated to each working domain and the number of the calculation units allocated to each working domain before the adjustment, aiming at all the working domains:
$$\Delta s_j = p_j - r_j \quad (1 \le j \le m) \tag{7}$$
wherein $p_j$ is the number of normally running computing units allocated to the $j$-th working domain before the adjustment; $r_j$ is the number of pre-allocated computing units of the $j$-th working domain; and $\Delta s_j$ is the number of computing units of the $j$-th working domain participating in the switching adjustment after the optimized allocation, a value greater than 0 indicating that computing units of the $j$-th working domain are switched to other working domains for use, and a value less than 0 indicating that computing units of other working domains are switched to the $j$-th working domain for use;
4.2) for each working domain in the working domain set whose number of computing units participating in the adjustment is greater than 0, sequentially selecting one computing node in the normal running state in descending order of the switching priority of the computing nodes in that working domain, denoting it as $k$, and updating the number of computing units participating in the adjustment of the working domain to the difference between that number and $u_k$; if the updated number is greater than or equal to 0, adding computing node $k$ to the set of computing nodes to be switched and continuing to select the next computing node in the normal running state in the working domain; if the updated number is less than 0, or all computing nodes in the working domain have been processed, going to step 4.3);
4.3) for each working domain in the working domain set whose number of computing units participating in the adjustment is less than 0 and whose number of normally running computing units allocated before the adjustment is equal to 0, selecting one computing node from the set of computing nodes to be switched in ascending order of switching priority, denoting it as $k$, switching it to that working domain for use, updating the number of computing units participating in the adjustment of that working domain to the sum of that number and $u_k$, and updating the working domain to which computing node $k$ belongs to that working domain; if the set of computing nodes to be switched is empty, going to step 4.5); if the set of computing nodes to be switched is not empty, going to step 4.4);
4.4) sorting the working domains in the working domain set whose number of computing units participating in the adjustment is less than 0 in descending order of computing resource allocation priority; for each such working domain, sequentially selecting one computing node from the set of computing nodes to be switched in ascending order of switching priority, denoting it as $k$, switching it to that working domain for use, updating the number of computing units participating in the adjustment of that working domain to the sum of that number and $u_k$, and updating the working domain to which computing node $k$ belongs to that working domain, until the number of computing units participating in the adjustment of that working domain is greater than or equal to 0 or the set of computing nodes to be switched is empty; going to step 4.5);
4.5) for all activated working domains, counting the number of computing units actually allocated to each activated working domain according to the working domain to which each computing node belongs and the number of computing units it contains, comparing the number of computing units actually allocated to each activated working domain with its minimum number of reserved computing units, giving a prompt for any activated working domain that has not been allocated its minimum number of reserved computing units, and going to step 5).
CN201611048286.6A 2016-11-25 2016-11-25 Self-adaptive adjustment method for multi-working-domain computing resources Active CN106708624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611048286.6A CN106708624B (en) 2016-11-25 2016-11-25 Self-adaptive adjustment method for multi-working-domain computing resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611048286.6A CN106708624B (en) 2016-11-25 2016-11-25 Self-adaptive adjustment method for multi-working-domain computing resources

Publications (2)

Publication Number Publication Date
CN106708624A CN106708624A (en) 2017-05-24
CN106708624B true CN106708624B (en) 2020-08-11

Family

ID=58934953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611048286.6A Active CN106708624B (en) 2016-11-25 2016-11-25 Self-adaptive adjustment method for multi-working-domain computing resources

Country Status (1)

Country Link
CN (1) CN106708624B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738322B (en) * 2018-07-03 2023-06-02 杭州海康威视数字技术股份有限公司 Distributed training method, device, equipment and system
CN112988372B (en) * 2019-12-16 2023-10-24 杭州海康威视数字技术股份有限公司 Method and device for determining allocation mode of hardware operation platform
CN111753997B (en) * 2020-06-28 2021-08-27 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7900206B1 (en) * 2004-03-31 2011-03-01 Symantec Operating Corporation Information technology process workflow for data centers
CN102325054A (en) * 2011-10-18 2012-01-18 国网电力科学研究院 Self-adaptive adjusting method for hierarchy management of distributed type calculation management platform cluster
CN102063336B (en) * 2011-01-12 2013-02-27 国网电力科学研究院 Distributed computing multiple application function asynchronous concurrent scheduling method
CN104598318A (en) * 2014-12-30 2015-05-06 北京奇艺世纪科技有限公司 Node calculating capability reporting method and calculating node

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060195578A1 (en) * 2005-02-28 2006-08-31 Fujitsu Limited Resource allocation method for network area and allocation program therefor, and network system

Also Published As

Publication number Publication date
CN106708624A (en) 2017-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant