EP3123329A1 - Resource utilisation control - Google Patents
Resource utilisation control
- Publication number
- EP3123329A1 (application EP15714256.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- group
- utilisation
- resource
- nodes
- agent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G06F9/5061—Partitioning or combining of resources
- G06F1/329—Power saving characterised by the action undertaken by task scheduling
- G06F11/3409—Recording or statistical evaluation of computer activity for performance assessment
- G06F11/3442—Recording or statistical evaluation of computer activity for planning or managing the needed capacity
- G06F9/5094—Allocation of resources where the allocation takes into account power or heat criteria
- G06F2209/5011—Pool (indexing scheme relating to G06F9/50)
- G06F2209/504—Resource capping (indexing scheme relating to G06F9/50)
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present invention relates to resource access control, and specifically to a decentralised method for keeping the total utilisation of a resource across a population of distributed consumers under a specific target.
- Maintaining the total utilisation of a resource across a population of distributed consumers under a specific target can be simultaneously useful and difficult to achieve.
- The difficulty comes in many forms, one of them being random and/or unpredictable fluctuation in demand.
- One way to prevent this situation is centralised orchestration, with a resource controller prioritising activation of devices so as not to exceed the target while simultaneously trying to meet certain quality-of-service criteria.
- A first aspect of the present invention discloses a method for operating a distributed system comprising a plurality of resource consumer computer nodes utilising a resource; each resource consumer computer node comprising at least one software agent and said software agents communicating with each other over a communications network; said method comprising: setting a total utilisation cap value f for the resource; signalling said total utilisation cap value to one or more of said software agents over said communications network; forming groups of resource consumer computer nodes and their associated agents; computing a local resource utilisation cap value in each group based on the set total utilisation cap value; and using the local resource utilisation cap value to locally enforce control of utilisation of the resource for the nodes in each group such that the local utilisation cap value is not exceeded.
- A system for decentralised control of utilisation of a resource for which a total utilisation cap value is set, comprising: a communications network; a plurality of resource consumer computer nodes arranged in operation to utilise said resource, where each resource consumer computer node is associated with one or more software agents; said resource consumer computer nodes arranged in operation to communicate with each other via their associated software agents over said communications network; a module arranged to signal said total utilisation cap value to one or more of said software agents; a group forming module or at least one software agent adapted to form one or more groups of resource consumer computer nodes and their associated software agents; and a local controller in each group adapted to compute a local utilisation cap value for the group of resource consumer computer nodes based on said set total utilisation cap value and to use the local resource utilisation cap value to enforce control of utilisation of the resource for the nodes in the group such that resource utilisation in the group does not exceed said local utilisation cap value.
- Figure 1 shows a system implementing decentralised access control to a resource.
- Figure 2 shows the system of figure 1 where resource consumers or nodes are grouped into subpopulations.
- Figure 3 shows a flowchart for one exemplary embodiment of a server loop.
- Figure 4 shows a flowchart for one exemplary embodiment of a software agent loop.
- Figure 5 shows a flowchart for a general embodiment of the invention.
- Figure 6 shows the components of an exemplary server.
- The servers consume a resource, in this case electricity, delivered from a power network 20.
- The servers 10 are connected to a central server 12 over a network 14.
- The servers 10, 12 are preferably general-purpose computers, each server comprising at least an interface 22, a processor 23, at least one memory 24, at least one store 25 and a bus 21 over which the different parts of the server communicate with each other.
- Each server is configured with one or more software modules 16 which, when loaded from the store 25 into one of the memories 24 and run by the processor 23, adapt the server to perform the different embodiments of the present invention.
- The central server 12 comprises a software module 16 arranged to form groups of nodes, as well as a module 16b for defining and signalling global parameters such as a total utilisation cap value for a resource and, if set centrally, group forming parameter values.
- Each server 10, and optionally also server 12, is paired with at least one software agent 16a, which is part of an overlay signalling network 18.
- Each agent 16a is configured to communicate via interface 22 with other software agents 16a over the overlay network 18, preferably using peer-to-peer (P2P) communication.
- Each server 10 preferably stores in the store 25 one or more status lists 26 listing the status and identity (id) of the server itself, and the status and identity (id) of its associated agents 16a.
- The status for the server 10 can for example be "idle", "loaded" or "busy".
- The status for the agents 16a can for instance be "available" or "recruited".
- Consumer systems which do not normally themselves comprise a built-in computer need to be equipped with such features. Examples of such consumers are for example water consumer systems, gas consumer systems etc.
- A total utilisation cap value is set, limiting the number of servers or nodes that can be active or run simultaneously.
- Embodiments of the present invention implement decentralised access control to resources by creating groups of resource consumers, or nodes; calculating a local utilisation cap value in each group based on the set total utilisation cap value; and enforcing the localised cap on resource usage in each group such that the total cap value for the resource is not exceeded.
- The total utilisation cap value is signalled, possibly from the central server 12, over the overlay network 18 to one or more, but preferably all, of the software agents 16a and stored in the respective servers 10. Agents 16a which received the message can thereafter communicate the total utilisation cap value to any agents that did not initially receive it.
- A large number of end users require services from the servers 10; however, in accordance with the total utilisation cap only a percentage of the servers are allowed to work at a time, in order to save energy or to balance energy supply and demand.
- The number of nodes allowed to be active in a group is determined in the following way. For a group or subpopulation i having n_subpop_i nodes, the number of nodes allowed to be active, or servers allowed to run, is:
- m_subpop_i = floor(f * n_subpop_i)
- max_demand_subpop_i = (n_subpop_i / total_population_size) * Maximum_use
- A local resource utilisation cap value can then be defined, in terms of a local resource consumption level, to be f * max_demand_subpop_i.
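As an illustration, the cap computations above can be sketched as follows (a minimal sketch; the function name and argument order are illustrative, not taken from the patent):

```python
import math

def local_caps(f, n_subpop, total_population, maximum_use):
    """Compute per-group values from the global utilisation cap f:
    m_subpop   - nodes allowed to be active: floor(f * n_subpop)
    max_demand - the group's pro-rata share of Maximum_use
    local_cap  - the local consumption cap: f * max_demand
    """
    m_subpop = math.floor(f * n_subpop)
    max_demand = (n_subpop / total_population) * maximum_use
    return m_subpop, max_demand, f * max_demand
```

For example, with f = 0.5, a group of 5 nodes in a 16-node population sharing a Maximum_use of 160 units gets m_subpop = 2, a pro-rata demand of 50 units and a local cap of 25 units.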
- Whether the nodes in the distributed system consume broadly the same or different levels of resources, the global population of nodes initially has to be divided into groups.
- The nodes or servers can be divided into subpopulations or groups using either centralised or decentralised methods.
- One or more global parameters can be used to assist in both types of group formation, such as the following integer parameters:
- The parameter target_subpop_size specifies a preferred target size of a group;
- the parameter max_subpop_size specifies the maximum number of nodes in a group; and
- the parameter min_subpop_size specifies the minimum size of a group.
- Centralised methods for creating groups comprise, for example: creating a number of empty subpopulation lists or groups and then assigning all nodes randomly across those lists/groups, or sequentially by node identifier.
- The number of lists/groups could preferably be determined by dividing the total number of nodes by the defined target group size, target_subpop_size.
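This centralised assignment can be sketched as follows (the helper name is illustrative):

```python
import random

def form_groups_centrally(node_ids, target_subpop_size):
    """Create (total nodes / target size) empty group lists and
    assign every node to a randomly chosen list."""
    num_groups = max(1, len(node_ids) // target_subpop_size)
    groups = [[] for _ in range(num_groups)]
    for node in node_ids:
        random.choice(groups).append(node)
    return groups
```

With 400 nodes and a target group size of 10, this yields forty groups, as in the second embodiment described later.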
- Decentralised methods for forming groups comprise, for example, peer-to-peer (P2P) signalling over the overlay network 18, whereby nodes merge or recruit each other until group sizes conform to the set parameters.
- In each group it is useful to designate a local controller arranged to enforce the local cap and to manage aspects such as membership and the transfer of nodes to or from other groups.
- The designation could be by various means, such as a winner-takes-all competition between subpopulation members; arbitrarily, based on unique node identifiers, where for example the node with the highest identifier value is designated as the local controller; or by always designating the active node with the longest or shortest queue as the local controller.
- The nodes in a group could be arranged to signal each other with a certain frequency to check which node is the local controller and, if there is none in the group, because the group has just formed or the server or node of the selected local controller has gone down, to initiate selection of a new local controller using any of the methods described above.
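One of the designation rules above can be sketched with an assumed node representation (dicts with 'id' and 'queue' keys, which are not from the patent):

```python
def select_local_controller(group):
    """Designate the node with the longest job queue as local
    controller, using the highest identifier as a tie-break; the
    other rules from the text (e.g. highest id only) can be
    substituted for the key function."""
    return max(group, key=lambda node: (node["queue"], node["id"]))
```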
- A test to decide whether to seek to gain a member for a group might be: if there exists x ≥ 1 such that floor(f * (n_subpop_i + x)) > floor(f * n_subpop_i), then seek to add a member, i.e. only seek to add a member to a subpopulation if it would increase the number of nodes that can be active at a time in the group, m_subpop_i.
- The value of x can be changed.
- A test to decide whether to seek to lose, or make available for transfer, a member of a group might be the converse: if floor(f * (n_subpop_i - x)) = floor(f * n_subpop_i), where x ≥ 1, then the group can lose x members without reducing the number of nodes that can be active at a time.
- A group i preferably signals to other groups that it would like to gain a member by maintaining a Boolean state wantmore_subpop_i which is true if it is seeking to add a member, or it can maintain a list Spare_nodes_subpop_i which lists any nodes it is seeking to lose (if any) from its set of Inactive_nodes_subpop_i.
- System efficiency as a whole depends on the efficiencies achieved across all subpopulations or groups, assuming that every node is always a member of a subpopulation. Since subpopulations may differ in size, their local pro-rata caps are also likely to differ, which according to the above tests means that some may be seeking to add members whilst others are seeking to lose members. Matching up pairs of such subpopulations to enable a transfer of a node requires either coordination via the central controller 12 or P2P signalling between subpopulations over the signalling network 18. These mutually beneficial transfers help to increase the efficiency of the system as a whole.
- Transfer of a node from one subpopulation to another requires removal of the node from subpopulation i and addition of the node to subpopulation j. Care must be taken to enforce the respective local caps, and this can be managed by ensuring that a node is only made available for transfer from subpopulation i provided that either it is inactive or it is an active node that is deactivated prior to transfer, and that the remaining nodes still fulfil the requirement that the number of inactive nodes in the group is (n_subpop_i - m_subpop_i). Also, a node is only added to subpopulation j as an initially inactive node, within the updated subpopulation.
- A hard upper limit on the size of subpopulations could be enforced, if necessary, by only allowing a subpopulation to grow by adding a member if n_subpop_i < max_subpop_size.
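The gain/lose tests, with the size-limit guards from the group forming parameters, can be sketched as follows (the 'lose' condition is the converse of the 'gain' test, as described above; the min-size guard is an assumption based on min_subpop_size):

```python
import math

def seeks_to_gain(f, n_subpop, max_subpop_size, x=1):
    """True if adding x members would raise m = floor(f * n)
    without exceeding the hard size limit."""
    if n_subpop + x > max_subpop_size:
        return False
    return math.floor(f * (n_subpop + x)) > math.floor(f * n_subpop)

def can_spare(f, n_subpop, min_subpop_size, x=1):
    """True if losing x members leaves m = floor(f * n) unchanged
    and the group no smaller than the minimum size."""
    if n_subpop - x < min_subpop_size:
        return False
    return math.floor(f * (n_subpop - x)) == math.floor(f * n_subpop)
```

For example, with f = 0.5 a three-node group gains by adding a fourth member (m rises from 1 to 2), whereas a five-node group can spare one member without changing m = 2.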
- This embodiment relates to a power management system for distributed servers 10. Only sixteen distributed servers are shown in the figure; however, the actual number of servers is typically much higher.
- The central server 12 manages the setting of the total power utilisation cap value f as well as the setting of the current group forming parameters.
- The central server 12 signals the servers, or rather their associated agents 16a, that the total power utilisation cap value f is set to 0.5, the parameter target_subpop_size is set to 4, the parameter max_subpop_size is set to 5 and the parameter min_subpop_size is set to 3.
- The central server 12 also instructs the servers 10 on how to select a local group controller; in this embodiment the rule states that the node with the largest workload should be the local controller.
- In order to be allowed to process their workloads, the servers 10 begin to form groups based on the received parameter values. Initially each server 10a, 10b, 10c and 10d creates a group for itself, followed by a merger phase, where the "one server" groups merge to form groups of sizes that conform to the set parameter values. As a result, group 101 having four members, group 102 having three members, group 103 having four members and group 104 having five members are formed. Next a local controller 101C, 102C, 103C, 104C is selected in each group. In group 101 server 10d is active and has the largest workload and is hence elected as the local controller 101C.
- Each controller 101C, 102C, 103C and 104C determines a local utilisation cap value, m, for its group 101, 102, 103 and 104 based on the set total utilisation cap value f, such that the sum of the power consumption of all the servers in the groups does not exceed the total utilisation cap value f set for the power utilisation.
- The local controller 101C, 102C, 103C and 104C ensures via local P2P signalling to all other group members that (n_subpop_i - m_subpop_i) servers in its subpopulation are always inactive.
- The local controller should ensure that if the number of servers processing jobs equals the maximum number of servers allowed to process jobs concurrently, one active server is always deactivated before another server is allowed to activate, so that at most m_subpop_i servers are active at any given time.
- Each local controller for a group maintains the addresses and/or identities of all members of the group so that it can communicate with them; it also maintains a list of which servers are currently active and which servers are inactive. Table 1 shows an example of such a list maintained by the controller 101C for the group 101 when the total utilisation cap value f is set to 0.5:
- The list also states the current workload for each node.
- The activation of nodes is here prioritised according to need, e.g. inactive nodes that are loaded with more jobs would be prioritised for activation first.
- The local controller 101C/10d would signal the servers 10a and 10d (itself) that they are allowed to process their workload, and signal server 10c that it can start to process its workload only when either of the servers 10a and 10d has processed all its jobs and changed into an inactive state.
- Group 102 takes over one server from group 104, whereafter the groups 101, 102, 103 and 104 would all comprise four members, of which two are allowed to run simultaneously in each group in order not to exceed the local resource utilisation cap.
- The groups are formed centrally by the central server 12.
- The total number of distributed servers is in this embodiment four hundred, all initially inactive.
- The central server 12 generates forty lists with space for ten servers in each list and randomly assigns all servers across these lists, hence forming forty groups each having ten members.
- The central server signals the agents 16a over the network 18 that the global utilisation cap value f is set to 0.4 and that the agent associated with the server having the longest queue of jobs should be the local controller. Based on the total utilisation cap value 0.4, the local controller determines that at most four servers in the group may run at a time, ranks all the servers according to their workload, and instructs itself and the other three highest-ranked servers to change to an active state and to process their workload.
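The controller's ranking step in this embodiment can be sketched as follows (servers are represented as assumed (id, queue_length) pairs):

```python
import math

def activate_top_ranked(f, group):
    """Rank servers by workload and activate the top floor(f * n);
    group is a list of (server_id, queue_length) tuples. Returns
    the active and inactive server ids."""
    m = math.floor(f * len(group))
    ranked = sorted(group, key=lambda s: s[1], reverse=True)
    return [sid for sid, _ in ranked[:m]], [sid for sid, _ in ranked[m:]]
```

With f = 0.4 and a group of ten servers, exactly four are activated at a time.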
- The server stores a list comprising the identity, address, current status and load for each member in the group.
- The list is continuously updated to keep track of the status and workload of each member.
- This embodiment uses an alternative way of specifying the total utilisation cap value f, in terms of a rational number p/q representing the maximum fraction of consumers allowed to be active at a time.
- The embodiment is focused on a special case where a subpopulation is defined via a recruitment process over the overlay network.
- Each server 10 is paired with one or more software agents 16a and linked to a tuneable number of other servers 10 in the virtual or overlay network 18 of arbitrary topology.
- The local utilisation cap is equal to the total utilisation cap value, and hence the agent determines that the local cap value, here defined as a ratio, is also 1/4, and stores this value as well.
- Each server 10 stores in the store one or more status lists listing the status and identity (id) of the server itself, and the status and identity (id) of its associated agents 16a.
- Each server here has a single associated agent, but in fact the flowcharts shown in figures 3 and 4 can similarly be applied to the case where a server has more than one agent.
- The status for the server 10 is idle, loaded or busy. Loaded indicates that the server has a queue of jobs waiting to be processed by the server. The length of the queue, or number of jobs waiting, is preferably also indicated.
- The status for the agent is either available or recruited. If the agent of a server has been asked, or recruited, by another agent to form a group with this other agent, the identity of this agent is indicated in the table as well; see Table 2 (columns: Id, Status, No. of jobs in queue, Recruited by).
- Each server also stores a list with the identities of other agents/servers which the agent has recruited and formed a group or subpopulation with; see Table 3.
- A server loop for a server will now be described in relation to figure 3.
- While a server's job queue is empty, its agent remains dormant or idle, in a "wait" state [300], for a time t.
- The server then checks if the job queue is still empty, [301]. If empty, the status for the server is set to idle, [302], and the server releases any recruits earlier recruited by the agent, [303]. If the job queue is not empty, the server checks whether the size n of the workforce assembled by the agent, i.e. the number of recruits plus available agents, meets a threshold, [304]. This threshold can be defined as n ≥ ceiling(n_0 * q/p), where n_0 is the number of agents per server.
- If not, the server checks its status [306] and, if loaded, instructs the agent to perform a random walk in the virtual network, "visiting" other servers 10, [307]. If the agent paired with the "visited" server has the status "available" (this information is maintained by the visited server), the agent is recruited and its ID and the ID of its paired server are added to the list of recruits for the "visiting" agent's "home" server. The state of the recruited agent is changed from "available" to "recruited". The server loop then returns to the waiting state, [300]. The server repeats the applicable steps [300] to [307] until the workforce assembled by the agent meets or exceeds the set threshold, thus ending its random walk; whereafter the status of the server is changed to "busy", [308].
- The server then checks if the assembled workforce is above target, [309], and if so releases at least one recruit, [310], before processing the job. If the workforce matches the target workforce, the server processes the job directly, [312], whereafter it returns to the wait state [300].
- This process forms the basis for the distributed control algorithm and the forming of groups in this specific embodiment: in the 25% target example, in order to be allowed to process a job, a server needs to have secured three recruits plus its own, or "resident", agent, which are no longer available to other servers, guaranteeing that the local utilisation ratio of 1/4 is respected.
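The recruitment threshold for a cap expressed as p/q can be sketched as:

```python
import math

def workforce_target(p, q, n0=1):
    """Workforce (own agents plus recruits) a server must assemble
    before it may process jobs, for a cap of p/q and n0 agents per
    server: ceiling(n0 * q/p)."""
    return math.ceil(n0 * q / p)

def may_process(n_assembled, p, q, n0=1):
    """Step [304]: True once the assembled workforce meets the target."""
    return n_assembled >= workforce_target(p, q, n0)
```

For the 25% example (p/q = 1/4, n0 = 1) the target is 4: the resident agent plus three recruits.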
- “Home” always refers to the server with which the agent was paired at initialisation.
- “Host” always refers to the server that the agent is currently investigating. Note that the agent doesn't need to be physically located on the "host”, the term rather designates its logical position in the overlay network.
- The agent checks the state of its "home" server which, as stated earlier, can be in three states: "idle" - there are no jobs in the queue; "busy" - the server is processing jobs; or "loaded" - the queue is not empty but no processing is taking place, because the server doesn't have enough recruits. If the "home" is "idle" or "busy", the agent has nothing to do.
- If the "home" is "loaded", it means that recruits are needed and the agent must search for them. Before the agent starts to search for recruits it checks its own status, namely whether it is recruited, [401]. If it is already recruited it returns to the wait state [400]. If the agent is not recruited it begins a random walk in the overlay network 18, one hop per time-step, whereby the agent moves from one "host" to the next, [402]. In order to avoid back-tracking, the agent keeps a record of hosts that have already been visited during the course of the current random walk. If it finds itself trapped, that is, all the neighbours of its current "host" have already been visited, the random walk is reset with the agent's "home" as its first "host" and the record is erased, i.e. the process starts over.
- Upon arrival at a new "host", the agent first checks if the "host" is busy, [403]. If it is, then recruiting its paired agent could lead to the data-centre exceeding the global 25% target, so the agent returns to the wait state [400]. If the "host" is not busy, the agent checks if the agent with which the "host" is paired is still "available", [404]. If it is available, the agent next checks whether the workload at the local "host" is lower than or equal to that at "home", [408]. This is a regulatory mechanism designed to give an advantage, i.e. a higher priority, to those servers that have accumulated the longest backlog.
- If so, the local agent's state is turned from "available" into "recruited" and it is logically transferred, [409].
- If the "host's" paired agent is not available, the visiting agent attempts a secondary recruitment procedure that consists in trying to "capture" one of its "host's" recruits.
- The agent therefore first checks that the "host" has at least one recruit, [405].
- If so, the agent checks if the number of recruits at the "host" is lower than or equal to that at the agent's "home", [406], and if so it hijacks one recruit from the "host", [407], before returning to the wait state [400].
- The hijacked agent's status remains the same but it is transferred from the "host's" list of recruited agents to the "home" server's list of recruited agents.
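The decision an agent takes on arrival at a new host (steps [403] to [409]) can be sketched as follows; the dict keys used for the agent and host are assumptions for this sketch, not the patent's data structures:

```python
def visit_host(agent, host):
    """One arrival of a visiting agent at a new host. Returns a
    string naming the action taken."""
    if host["busy"]:                                              # [403]
        return "wait"
    if host["agent_available"]:                                   # [404]
        if host["queue"] <= agent["home_queue"]:                  # [408]
            host["agent_available"] = False                       # [409]
            agent["home_recruits"].append(host["agent_id"])
            return "recruited"
        return "wait"
    if host["recruits"]:                                          # [405]
        if len(host["recruits"]) <= len(agent["home_recruits"]):  # [406]
            agent["home_recruits"].append(host["recruits"].pop()) # [407]
            return "hijacked"
    return "wait"
```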
- A further embodiment of the invention will now be described with reference to figure 5.
- This embodiment could apply to several scenarios. For example, a large fleet of consumer devices could require an overnight software update while the total demand on the backend server infrastructure must remain below a certain target.
- Each customer device is deployed with a client or agent configured to cause de-synchronisation of server connection attempts by forcing each device to recruit a target number of counterparts before being allowed to place an update request.
- A server in the backend server infrastructure is in this embodiment responsible for setting the total utilisation cap for the backend servers as well as setting the group forming parameters. The server thereafter signals the total resource utilisation cap value and the parameters to the clients in the user devices.
- target_subpop_size ≥ min_subpop_size
- n_subpop_i > target_subpop_size.
- The local controller creates a list of all nodes within the group, ranked from longest job queue to shortest job queue.
- m_subpop_i = floor(f * n_subpop_i).
- The local controller defines Active_nodes_subpop_i to contain the first m_subpop_i nodes from the prioritised list, and Inactive_nodes_subpop_i to contain the remaining nodes from that list.
- For each group, the local controller signals all Inactive_nodes_subpop_i to become inactive and, upon confirmation, then signals all Active_nodes_subpop_i to become active.
- The local controller updates its ranked list of nodes from longest job queue to shortest job queue according to jobs being completed by nodes and newly arriving jobs. The status of nodes for which all jobs are complete is changed to 'Inactive'.
- Nodes having the status 'Inactive' for which jobs are waiting may be activated and their status changed to 'Active', provided that the total number of active nodes would not exceed m_subpop_i, i.e. respecting the local utilisation cap.
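One refresh pass of the controller's list, covering the deactivation and re-activation steps above, can be sketched as follows (nodes are assumed to be dicts with 'queue' and 'active' keys; this is a sketch, not the patent's exact bookkeeping):

```python
import math

def refresh(f, nodes):
    """Deactivate nodes whose queues are empty, then activate
    waiting inactive nodes, longest queue first, while keeping the
    active count within m = floor(f * n)."""
    m = math.floor(f * len(nodes))
    for node in nodes:
        if node["active"] and node["queue"] == 0:
            node["active"] = False
    active = sum(1 for node in nodes if node["active"])
    for node in sorted(nodes, key=lambda n: n["queue"], reverse=True):
        if not node["active"] and node["queue"] > 0 and active < m:
            node["active"] = True
            active += 1
    return nodes
```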
- Any mutually beneficial node transfers are carried out between groups, coordinated by a central server.
- The local controller determines whether it would benefit from gaining or losing one or more nodes from/to another group, by updating a Boolean state wantmore_subpop_i and a list Spare_nodes_subpop_i which lists any nodes it is seeking to lose (if any) from its set of Inactive_nodes_subpop_i.
- In step [510] the process checks if the local controller is about to become inactive; if so, the process returns to step [503], where a new local controller is selected, namely the most loaded node at that time-step, and control is transferred to this node. If the local controller is not about to become inactive, the process loops back to [507], where the current status of each node is signalled to the respective node.
- the nodes in the distributed system consume different amounts of energy, e_j for each node j, i.e. some nodes are less efficient and consume more resources than others.
- max_demand_subpop_i is the pro-rata maximum level of resource demand for a group of such nodes.
- Maximum_use is the total resource consumption of all the nodes in the distributed system.
- a local resource utilisation cap value can then be defined in terms of a local resource consumption level as f * max_demand_subpop_i.
- a subset of the group's nodes can be chosen to comprise Active_nodes_subpop_i, indicated in the list maintained by each local controller, such that the total resource required for these nodes to be active, i.e. the sum of the e_j for all nodes in the list, is less than or equal to f * max_demand_subpop_i.
- m_subpop_i is set to be the number of nodes contained in the Active_nodes_subpop_i list.
- the remaining (n_subpop_i - m_subpop_i) nodes in the group comprise the Inactive_nodes_subpop_i list.
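For the heterogeneous case, one way to choose such a subset is a greedy walk down the queue-ranked list, admitting each node whose per-node energy e_j still fits under the local cap f * max_demand_subpop_i. This is an illustrative sketch; the patent does not prescribe a particular selection order, and the names are ours:

```python
def select_active_heterogeneous(queues, energy, f, max_demand):
    """Choose Active_nodes so that the summed per-node energy e_j stays
    at or below the local cap f * max_demand (greedy sketch)."""
    cap = f * max_demand
    ranked = sorted(queues, key=queues.get, reverse=True)
    active, used = [], 0.0
    for node in ranked:
        if used + energy[node] <= cap:
            active.append(node)
            used += energy[node]
    inactive = [n for n in ranked if n not in active]
    return active, inactive
```

With the 10 kWh and 20 kWh figures from the worked example in the text, this reproduces the described behaviour: the 20 kWh server alone fits under a 27.5 kWh cap, while adding any 10 kWh server would exceed it.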
- Group 101 comprises four server nodes 10a-d, where each server consumes a different amount of energy when processing jobs; three of the nodes consume 10 kWh and one node consumes 20 kWh.
- the selected local controller 101C, which in this case is node 10d since it has the longest job queue, maintains a list of all the nodes in the group, indicating the identity of each node, whether the node is active or inactive, the load at the node and the level of power consumed by each node; see table 4.
- the local controller 101C thereafter determines that, since it consumes 20 kWh for processing its jobs, only its own server can be allowed to operate at this time. The controller then signals server 10d (itself) that it can process its jobs and signals all the other servers that they are to remain inactive, since otherwise the local utilisation cap value of 27.5 kWh would be exceeded. Once server 10d has processed all its jobs, and therefore changes to an inactive state, server 10a is selected as the new local controller 101C. The controller signals server 10a (itself) and server 10c to become active and process their jobs, and signals servers 10b and 10d to remain inactive. The list maintained by the new local controller is updated accordingly; see table 5 below.
- the number of servers in the group that can be active at a time varies based on the energy consumption of each server.
- node A has recruited agents B1, C1 and C2 while at the same time its own agents A1 and A2 have been recruited by node C.
- Node C has further recruited agent B2 from node B.
- Node B has not recruited any agents.
- the system configuration is: A[B1, C1, C2] + B[] + C[A1, A2, B2].
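This configuration can be held as a simple mapping from each node to the agents it currently hosts (an illustrative data structure, not prescribed by the text):

```python
# A hosts B's and C's agents; B hosts none; C hosts A's agents plus B2.
config = {"A": ["B1", "C1", "C2"], "B": [], "C": ["A1", "A2", "B2"]}

# Recruitment only moves agents between nodes, so all six agents
# (two originally owned by each node) remain hosted somewhere:
hosted = sorted(a for agents in config.values() for a in agents)
```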
- the target ratio or total resource utilisation cap value / may vary over time, for instance to reflect fluctuating availability of a resource, such as for example the output of a solar power plant which will vary throughout the day.
- the easy approach is to have time-of-day ratios or f values built in from the start.
- the solar-powered facility is configured to operate on battery between 8pm and 8am and to recruit six agents to achieve a lower p/q ratio during this time in order to be allowed to operate.
- the target could be as low as two, which in the two-agents-per-node scenario sketched above would effectively mean that the data-centre can work at 100% capacity.
- a more flexible way would be to include a total resource utilisation cap value f, or target ratio, with every new submission, making it job-specific.
- although this approach would require re-introducing a measure of centralised management to assign a ratio to each job and to choose which server to send it to, it has other advantages in that it could also be used to assign variable priorities. Indeed, assigning a higher f value to a high-value job would statistically result in it being processed faster, making it possible to support multiple service level agreement (SLA) profiles. For instance, allocating an f value of 0.75 to high-priority job A and a value of 0.25 to low-priority job B makes it much more likely that the server "in charge" of A will reach its threshold first and start processing. This, however, comes at the expense of a hard guarantee: in the example above, although the average f between jobs A and B is 0.5, only their (likely improbable) simultaneous arrival at the head of the queue of their respective servers would safeguard the global target.
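The intuition that a higher f value means statistically earlier activation can be illustrated with a deterministic toy model, in which each server accumulates credit equal to its job's f value per tick and may start once the credit reaches a threshold of 1.0. This is only a sketch of the statistical effect; the patent's actual mechanism is agent-based:

```python
def ticks_to_start(f):
    """Ticks until accumulated credit reaches the activation threshold
    of 1.0 (toy model: higher f reaches its threshold sooner)."""
    credit, ticks = 0.0, 0
    while credit < 1.0:
        credit += f
        ticks += 1
    return ticks

ticks_to_start(0.75)  # high-priority job A: 2 ticks
ticks_to_start(0.25)  # low-priority job B: 4 ticks
```

In this model job A (f = 0.75) starts after 2 ticks and job B (f = 0.25) after 4, mirroring the claimed soft prioritisation.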
- Exemplary embodiments of the invention are realised, at least in part, by executable computer program code which may be embodied in application program data provided by program modules in the respective nodes 10 and 12.
- when such computer program code is loaded into the memory of each device for execution by the respective processor, it provides a computer program code structure capable of performing at least part of the methods in accordance with the above-described exemplary embodiments of the invention.
- each step of the processes can correspond to at least one line of computer program code, and such code, in combination with the processor (CPU) in the respective node or server 10, 12, provides apparatus for effecting the described processes.
- the modules or part of the modules for effecting the described processes can be implemented in hardware or a combination of hardware and software.
- a method and system configured to cap utilisation of a resource to a defined global target value in a decentralised manner is disclosed.
- Groups of resource consumers are formed and a local utilisation cap value is determined for each group based on the global target value.
- a local controller in each group controls access to the resource for each member such that the local utilisation cap value is not exceeded, thereby ensuring that the resources consumed by all the resource consumers in the respective groups do not exceed the defined global target value.
- the method and system are particularly useful in large and dynamic environments such as "virtual" server farms comprising a variable number of physical facilities.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Hardware Redundancy (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14250055 | 2014-03-28 | ||
PCT/GB2015/050862 WO2015145126A1 (en) | 2014-03-28 | 2015-03-24 | Resource utilisation control |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3123329A1 true EP3123329A1 (en) | 2017-02-01 |
Family
ID=50628727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP15714256.3A Ceased EP3123329A1 (en) | 2014-03-28 | 2015-03-24 | Resource utilisation control |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3123329A1 (en) |
WO (1) | WO2015145126A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100191994A1 (en) * | 2009-01-29 | 2010-07-29 | Nokia Corporation | Method and apparatus for controlling energy consumption during resource sharing |
US20100205469A1 (en) * | 2009-02-06 | 2010-08-12 | Mccarthy Clifford A | Power budgeting for a group of computer systems |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8037329B2 (en) * | 2007-01-31 | 2011-10-11 | Hewlett-Packard Development Company, L.P. | Systems and methods for determining power consumption profiles for resource users and using the profiles for resource allocation |
KR100990412B1 (en) * | 2009-10-29 | 2010-10-29 | 주식회사 팀스톤 | Computer server capable of supporting cpu virtualization |
US8631253B2 (en) * | 2010-08-17 | 2014-01-14 | Red Hat Israel, Ltd. | Manager and host-based integrated power saving policy in virtualization systems |
-
2015
- 2015-03-24 WO PCT/GB2015/050862 patent/WO2015145126A1/en active Application Filing
- 2015-03-24 EP EP15714256.3A patent/EP3123329A1/en not_active Ceased
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100191994A1 (en) * | 2009-01-29 | 2010-07-29 | Nokia Corporation | Method and apparatus for controlling energy consumption during resource sharing |
US20100205469A1 (en) * | 2009-02-06 | 2010-08-12 | Mccarthy Clifford A | Power budgeting for a group of computer systems |
Non-Patent Citations (1)
Title |
---|
See also references of WO2015145126A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2015145126A1 (en) | 2015-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sharma et al. | A survey of job scheduling and resource management in grid computing | |
Kaur et al. | A systematic review on task scheduling in Fog computing: Taxonomy, tools, challenges, and future directions | |
Guo et al. | Improving mapreduce performance in heterogeneous network environments and resource utilization | |
Chakraborty et al. | Intelligent Latency-aware tasks prioritization and offloading strategy in Distributed Fog-Cloud of Things | |
CN109564528B (en) | System and method for computing resource allocation in distributed computing | |
Yanggratoke et al. | Gossip-based resource allocation for green computing in large clouds | |
Wided et al. | Load balancing with Job Migration Algorithm for improving performance on grid computing: Experimental Results | |
Cao | Self-organizing agents for grid load balancing | |
Garala et al. | A performance analysis of load Balancing algorithms in Cloud environment | |
Srivastava et al. | CGP: Cluster-based gossip protocol for dynamic resource environment in cloud | |
Hu et al. | Requirement-aware scheduling of bag-of-tasks applications on grids with dynamic resilience | |
Wu et al. | ABP scheduler: Speeding up service spread in docker swarm | |
CN112860442A (en) | Resource quota adjusting method and device, computer equipment and storage medium | |
CN117149382A (en) | Virtual machine scheduling method, device, computer equipment and storage medium | |
SM et al. | Priority based resource allocation and demand based pricing model in peer-to-peer clouds | |
Wagner et al. | Autonomous, collaborative control for resilient cyber defense (ACCORD) | |
CN114265676B (en) | Cluster resource scheduling method, device, equipment and medium | |
CN115629854A (en) | Distributed task scheduling method, system, electronic device and storage medium | |
EP3123329A1 (en) | Resource utilisation control | |
CN111556126B (en) | Model management method, system, computer device and storage medium | |
Naaz et al. | Load balancing algorithms for peer to peer and client server distributed environments | |
Guo et al. | A data distribution aware task scheduling strategy for mapreduce system | |
Nehra et al. | Towards dynamic load balancing in heterogeneous cluster using mobile agent | |
Wei et al. | A novel scheduling mechanism for hybrid cloud systems | |
Mour et al. | Load management model for cloud computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20160926 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: SAFFRE, FABRICE Inventor name: SHACKLETON, MARK |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20181108 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
18R | Application refused |
Effective date: 20201204 |