EP3123329A1 - Resource utilisation control - Google Patents
Resource utilisation control
- Publication number
- EP3123329A1 (application EP15714256.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- group
- utilisation
- resource
- nodes
- agent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G06F9/5061—Partitioning or combining of resources
- G06F1/329—Power saving characterised by the action undertaken by task scheduling
- G06F11/3409—Recording or statistical evaluation of computer activity for performance assessment
- G06F11/3442—Recording or statistical evaluation of computer activity for planning or managing the needed capacity
- G06F9/5094—Allocation of resources where the allocation takes into account power or heat criteria
- G06F2209/5011—Pool (indexing scheme relating to G06F9/50)
- G06F2209/504—Resource capping (indexing scheme relating to G06F9/50)
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present invention relates to resource access control, and specifically to a decentralised method for keeping the total utilisation of a resource across a population of distributed consumers under a specific target.
- Maintaining the total utilisation of a resource across a population of distributed consumers under a specific target can be simultaneously useful and difficult to achieve.
- The difficulty comes in many forms, one of them being random and/or unpredictable fluctuation in demand.
- One way to prevent this situation is centralised orchestration, with a resource controller prioritising activation of devices so as not to exceed the target while simultaneously trying to meet certain quality-of-service criteria.
- A first aspect of the present invention discloses a method for operating a distributed system comprising a plurality of resource consumer computer nodes utilising a resource; each resource consumer computer node comprising at least one software agent and said software agents communicating with each other over a communications network; said method comprising: setting a total utilisation cap value f for the resource; signalling said total utilisation cap value to one or more of said software agents over said communications network; forming groups of resource consumer computer nodes and their associated agents; computing a local resource utilisation cap value in each group based on the set total utilisation cap value; and using the local resource utilisation cap value to locally enforce control of utilisation of the resource for the nodes in each group such that the local utilisation cap value is not exceeded.
- A system for decentralised control of utilisation of a resource for which a total utilisation cap value is set, comprising: a communications network; a plurality of resource consumer computer nodes arranged in operation to utilise said resource, where each resource consumer computer node is associated with one or more software agents; said resource consumer computer nodes arranged in operation to communicate with each other via their associated software agents over said communications network; a module arranged to signal said total utilisation cap value to one or more of said software agents; a group forming module or at least one software agent adapted to form one or more groups of resource consumer computer nodes and their associated software agents; and a local controller in each group adapted to compute a local utilisation cap value for the group of resource consumer computer nodes based on said set total utilisation cap value and to use the local resource utilisation cap value to enforce control of utilisation of the resource for the nodes in the group such that resource utilisation in the group does not exceed said local utilisation cap value.
- Figure 1 shows a system implementing decentralised access control to a resource.
- Figure 2 shows the system of figure 1 where resource consumers or nodes are grouped into subpopulations.
- Figure 3 shows a flowchart for one exemplary embodiment of a server loop.
- Figure 4 shows a flowchart for one exemplary embodiment of a software agent loop.
- Figure 5 shows a flowchart for a general embodiment of the invention.
- Figure 6 shows the components of an exemplary server.
- The servers consume a resource, in this case electricity, delivered from a power network 20.
- The servers 10 are connected to a central server 12 over a network 14.
- The servers 10, 12 are preferably general-purpose computers, each server comprising at least an interface 22, a processor 23, at least one memory 24, at least one store 25 and a bus 21 over which the different parts of the server communicate with each other.
- Each server is configured with one or more software modules 16 which, when loaded from the store 25 into one of the memories 24 and run by the processor 23, adapt the server to perform the different embodiments of the present invention.
- The central server 12 comprises a software module 16 arranged to form groups of nodes, as well as a module 16b for defining and signalling global parameters such as a total utilisation cap value for a resource and, if set centrally, group forming parameter values.
- Each server 10, and optionally also server 12, is paired with at least one software agent 16a, which is part of an overlay signalling network 18.
- Each agent 16a is configured to communicate via interface 22 with other software agents 16a over the overlay network 18, preferably using peer-to-peer (P2P) communication.
- Each server 10 preferably stores in the store 25 one or more status lists 26 listing the status and identity (id) of the server itself, and the status and identity (id) of its associated agents 16a.
- The status for the server 10 can for example be "idle", "loaded" or "busy".
- The status for the agents 16a can for instance be "available" or "recruited".
- Consumer systems which do not normally themselves comprise a built-in computer need to be equipped with such features. Examples of such consumers are for example water consumer systems, gas consumer systems etc.
- A total utilisation cap value is set, limiting the number of servers or nodes that can be active or run simultaneously.
- Embodiments of the present invention implement decentralised access control to resources by creating groups of resource consumers, or nodes; calculating a local utilisation cap value in each group based on the set total utilisation cap value; and enforcing the localised cap on resource usage in each group such that the total cap value for the resource is not exceeded.
- The total utilisation cap value is signalled, possibly from the central server 12, over the overlay network 18 to one or more, but preferably all, of the software agents 16a and stored in the respective servers 10. Agents 16a which received the message can thereafter communicate the total utilisation cap value to any agents that did not initially receive it.
- A large number of end users require services from the servers 10; however, in accordance with the total utilisation cap only a percentage of the servers are allowed to work at a time, in order to save energy or to balance energy supply and demand.
- The number of nodes allowed to be active in a group is determined in the following way. For a group or subpopulation i having n_subpop_i nodes, the number of nodes allowed to be active, or servers allowed to run, is:
- m_subpop_i = floor(f * n_subpop_i)
- max_demand_subpop_i = (n_subpop_i / total_population_size) * Maximum_use
- A local resource utilisation cap value can then be defined, in terms of a local resource consumption level, to be f * max_demand_subpop_i.
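As an illustration, the cap computations above can be sketched as follows (a minimal sketch; the function name and argument order are illustrative, not taken from the patent):

```python
import math

def local_caps(f, n_subpop, total_population, maximum_use):
    """Compute per-group values from the global utilisation cap f:
    m_subpop   - nodes allowed to be active: floor(f * n_subpop)
    max_demand - the group's pro-rata share of Maximum_use
    local_cap  - the local consumption cap: f * max_demand
    """
    m_subpop = math.floor(f * n_subpop)
    max_demand = (n_subpop / total_population) * maximum_use
    return m_subpop, max_demand, f * max_demand
```

For example, with f = 0.5, a group of 5 nodes in a 16-node population sharing a Maximum_use of 160 units gets m_subpop = 2, a pro-rata demand of 50 units and a local cap of 25 units.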
- Whether the nodes in the distributed system consume broadly the same or different levels of resources, the global population of nodes initially has to be divided into groups.
- The nodes or servers can be divided into subpopulations or groups using either centralised or decentralised methods.
- One or more global parameters can be used to assist in both types of group formation, such as the following integer parameters:
- The parameter target_subpop_size specifies a preferred target size of a group;
- the parameter max_subpop_size specifies the maximum number of nodes in a group; and
- the parameter min_subpop_size specifies the minimum size of a group.
- Centralised methods for creating groups comprise, for example: creating a number of empty subpopulation lists or groups and then assigning all nodes randomly across those lists/groups, or sequentially by node identifier.
- The number of lists/groups could preferably be determined by dividing the total number of nodes by the defined target group size, target_subpop_size.
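This centralised assignment can be sketched as follows (the helper name is illustrative):

```python
import random

def form_groups_centrally(node_ids, target_subpop_size):
    """Create (total nodes / target size) empty group lists and
    assign every node to a randomly chosen list."""
    num_groups = max(1, len(node_ids) // target_subpop_size)
    groups = [[] for _ in range(num_groups)]
    for node in node_ids:
        random.choice(groups).append(node)
    return groups
```

With 400 nodes and a target group size of 10, this yields forty groups, as in the second embodiment described later.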
- Decentralised methods for forming groups comprise, for example, peer-to-peer (P2P) signalling over the overlay network 18, whereby nodes merge or recruit each other until group sizes conform to the set parameters.
- In each group it is useful to designate a local controller arranged to enforce the local cap and to manage aspects such as membership and the transfer of nodes to or from other groups.
- The designation could be by various means, such as a winner-takes-all competition between subpopulation members; arbitrarily, based on unique node identifiers, where for example the node with the highest identifier value is designated as the local controller; or by always designating the active node with the longest or shortest queue as the local controller.
- The nodes in a group could be arranged to signal each other with a certain frequency to check which node is the local controller and, if there is none in the group, because the group has just formed or the server or node of the selected local controller has gone down, to initiate selection of a new local controller using any of the methods described above.
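One of the designation rules above can be sketched with an assumed node representation (dicts with 'id' and 'queue' keys, which are not from the patent):

```python
def select_local_controller(group):
    """Designate the node with the longest job queue as local
    controller, using the highest identifier as a tie-break; the
    other rules from the text (e.g. highest id only) can be
    substituted for the key function."""
    return max(group, key=lambda node: (node["queue"], node["id"]))
```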
- A test to decide whether to seek to gain a member for a group might be: if there exists x ≥ 1 such that floor(f * (n_subpop_i + x)) > floor(f * n_subpop_i), then seek to add a member, i.e. only seek to add a member to a subpopulation if it would increase the number of nodes that can be active at a time in the group, m_subpop_i.
- The value of x can be changed.
- A test to decide whether to seek to lose, or make available for transfer, a member of a group might be the converse: if floor(f * (n_subpop_i - x)) = floor(f * n_subpop_i), where x ≥ 1, then the group can lose x members without reducing the number of nodes that can be active at a time.
- A group i preferably signals to other groups that it would like to gain a member by maintaining a Boolean state wantmore_subpop_i which is true if it is seeking to add a member, or it can maintain a list Spare_nodes_subpop_i which lists any nodes it is seeking to lose (if any) from its set of Inactive_nodes_subpop_i.
- System efficiency as a whole depends on the efficiencies achieved across all subpopulations or groups, assuming that every node is always a member of a subpopulation. Since subpopulations may differ in size, their local pro-rata caps are also likely to differ, which according to the above tests means that some may be seeking to add members whilst others are seeking to lose members. Matching up pairs of such subpopulations to enable a transfer of a node requires either coordination via the central controller 12 or P2P signalling between subpopulations over the signalling network 18. These mutually beneficial transfers help to increase the efficiency of the system as a whole.
- Transfer of a node from one subpopulation to another requires removal of the node from subpopulation i and addition of the node to subpopulation j. Care must be taken to enforce the respective local caps, and this can be managed by ensuring that a node is only made available for transfer from subpopulation i provided that either it is inactive or it is an active node that is deactivated prior to transfer, and that the remaining nodes still fulfil the requirement that the number of inactive nodes in the group is (n_subpop_i - m_subpop_i). Also, a node is only added to subpopulation j as an initially inactive node, within the updated subpopulation.
- A hard upper limit on the size of subpopulations could be enforced, if necessary, by only allowing a subpopulation to grow by adding a member if n_subpop_i < max_subpop_size.
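The gain/lose tests, with the size-limit guards from the group forming parameters, can be sketched as follows (the 'lose' condition is the converse of the 'gain' test, as described above; the min-size guard is an assumption based on min_subpop_size):

```python
import math

def seeks_to_gain(f, n_subpop, max_subpop_size, x=1):
    """True if adding x members would raise m = floor(f * n)
    without exceeding the hard size limit."""
    if n_subpop + x > max_subpop_size:
        return False
    return math.floor(f * (n_subpop + x)) > math.floor(f * n_subpop)

def can_spare(f, n_subpop, min_subpop_size, x=1):
    """True if losing x members leaves m = floor(f * n) unchanged
    and the group no smaller than the minimum size."""
    if n_subpop - x < min_subpop_size:
        return False
    return math.floor(f * (n_subpop - x)) == math.floor(f * n_subpop)
```

For example, with f = 0.5 a three-node group gains by adding a fourth member (m rises from 1 to 2), whereas a five-node group can spare one member without changing m = 2.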
- This embodiment relates to a power management system for distributed servers 10. Only sixteen distributed servers are shown in the figure; however, the actual number of servers is typically much higher.
- The central server 12 manages the setting of the total power utilisation cap value f as well as the setting of the current group forming parameters.
- The central server 12 signals the servers, or rather their associated agents 16a, that the total power utilisation cap value f is set to 0.5, the parameter target_subpop_size is set to 4, the parameter max_subpop_size is set to 5 and the parameter min_subpop_size is set to 3.
- The central server 12 also instructs the servers 10 on how to select a local group controller; in this embodiment the rule states that the node with the largest workload should be the local controller.
- In order to be allowed to process their workloads, the servers 10 begin to form groups based on the received parameter values. Initially each server 10a, 10b, 10c and 10d creates a group for itself, followed by a merger phase, where the "one server" groups merge to form groups of sizes that conform to the set parameter values. As a result, group 101 having four members, group 102 having three members, group 103 having four members and group 104 having five members are formed. Next a local controller 101C, 102C, 103C, 104C is selected in each group. In group 101 server 10d is active and has the largest workload and is hence elected as the local controller 101C.
- Each controller 101C, 102C, 103C and 104C determines a local utilisation cap value, m, for its group 101, 102, 103 and 104 based on the set total utilisation cap value f, such that the sum of the power consumption of all the servers in the groups does not exceed the total utilisation cap value f set for the power utilisation.
- The local controller 101C, 102C, 103C and 104C ensures via local P2P signalling to all other group members that (n_subpop_i - m_subpop_i) servers in its subpopulation are always inactive.
- The local controller should ensure that if the number of servers processing jobs equals the maximum number of servers allowed to process jobs concurrently, one active server is always deactivated before another server is allowed to activate, so that at most m_subpop_i servers are active at any given time.
- Each local controller for a group maintains the addresses and/or identities of all members of the group so that it can communicate with them; it also maintains a list of which servers are currently active and which servers are inactive. Table 1 shows an example of such a list maintained by the controller 101C for the group 101 when the total utilisation cap value f is set to 0.5:
- The list also states the current workload for each node.
- The activation of nodes is here prioritised according to need, e.g. inactive nodes that are loaded with more jobs would be prioritised for activation first.
- The local controller 101C/10d would signal the servers 10a and 10d (itself) that they are allowed to process their workload, and signal server 10c that it can start to process its workload only when either of the servers 10a and 10d has processed all its jobs and changed into an inactive state.
- Group 102 takes over one server from group 104, whereafter the groups 101, 102, 103 and 104 would all comprise four members, of which two are allowed to run simultaneously in each group in order not to exceed the local resource utilisation cap.
- The groups are formed centrally by the central server 12.
- The total number of distributed servers is in this embodiment four hundred, all initially inactive.
- The central server 12 generates forty lists with space for ten servers in each list and randomly assigns all servers across these lists, hence forming forty groups each having ten members.
- The central server signals the agents 16a over the network 18 that the global utilisation cap value f is set to 0.4 and that the agent associated with the server having the longest queue of jobs should be the local controller. Based on the total utilisation cap value 0.4, the local controller determines that at most four servers in the group may run at a time, ranks all the servers according to their workload, and instructs itself and the other three highest-ranked servers to change to an active state and to process their workload.
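The controller's ranking step in this embodiment can be sketched as follows (servers are represented as assumed (id, queue_length) pairs):

```python
import math

def activate_top_ranked(f, group):
    """Rank servers by workload and activate the top floor(f * n);
    group is a list of (server_id, queue_length) tuples. Returns
    the active and inactive server ids."""
    m = math.floor(f * len(group))
    ranked = sorted(group, key=lambda s: s[1], reverse=True)
    return [sid for sid, _ in ranked[:m]], [sid for sid, _ in ranked[m:]]
```

With f = 0.4 and a group of ten servers, exactly four are activated at a time.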
- The server stores a list comprising the identity, address, current status and load for each member in the group.
- The list is continuously updated to keep track of the status and workload of each member.
- This embodiment uses an alternative way of specifying the total utilisation cap value f, in terms of a rational number p/q representing the maximum fraction of consumers allowed to be active at a time.
- The embodiment is focused on a special case where a subpopulation is defined via a recruitment process over the overlay network.
- Each server 10 is paired with one or more software agents 16a and linked to a tuneable number of other servers 10 in the virtual or overlay network 18 of arbitrary topology.
- The local utilisation cap is equal to the total utilisation cap value, and hence the agent determines that the local cap value, here defined as a ratio, is also 1/4, and stores this value as well.
- Each server 10 stores in the store one or more status lists listing the status and identity (id) of the server itself, and the status and identity (id) of its associated agents 16a.
- Each server here has a single associated agent, but in fact the flowcharts shown in figures 3 and 4 can similarly be applied to the case where a server has more than one agent.
- The status for the server 10 is idle, loaded or busy. Loaded indicates that the server has a queue of jobs waiting to be processed by the server. The length of the queue, or number of jobs waiting, is preferably also indicated.
- The status for the agent is either available or recruited. If the agent of a server has been asked, or recruited, by another agent to form a group with this other agent, the identity of this agent is indicated in the table as well; see Table 2 (columns: Id, Status, No. of jobs in queue, Recruited by).
- Each server also stores a list with the identities of other agents/servers which the agent has recruited and formed a group or subpopulation with; see Table 3.
- A server loop for a server will now be described in relation to figure 3.
- While a server's job queue is empty, its agent remains dormant or idle, in a "wait" state [300], for a time t.
- The server then checks if the job queue is still empty, [301]. If empty, the status for the server is set to idle, [302], and the server releases any recruits earlier recruited by the agent, [303]. If the job queue is not empty, the server checks whether the size n of the workforce assembled by the agent, i.e. the number of recruits plus available agents, meets a threshold, [304]. This threshold can be defined as n ≥ ceiling(n_0 * q/p), where n_0 is the number of agents per server.
- If not, the server checks its status [306] and, if loaded, instructs the agent to perform a random walk in the virtual network, "visiting" other servers 10, [307]. If the agent paired with the "visited" server has the status "available" (this information is maintained by the visited server), the agent is recruited and its ID and the ID of its paired server are added to the list of recruits for the "visiting" agent's "home" server. The state of the recruited agent is changed from "available" to "recruited". The server loop then returns to the waiting state, [300]. The server repeats the applicable steps [300] to [307] until the workforce assembled by the agent meets or exceeds the set threshold, thus ending its random walk; whereafter the status of the server is changed to "busy", [308].
- The server then checks if the assembled workforce is above target, [309], and if so releases at least one recruit, [310], before processing the job. If the workforce matches the target workforce, the server processes the job directly, [312], whereafter it returns to the wait state [300].
- This process forms the basis for the distributed control algorithm and the forming of groups in this specific embodiment: in the 25% target example, in order to be allowed to process a job, a server needs to have secured three recruits plus its own, or "resident", agent, which are no longer available to other servers, guaranteeing that the local utilisation ratio of 1/4 is respected.
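The recruitment threshold for a cap expressed as p/q can be sketched as:

```python
import math

def workforce_target(p, q, n0=1):
    """Workforce (own agents plus recruits) a server must assemble
    before it may process jobs, for a cap of p/q and n0 agents per
    server: ceiling(n0 * q/p)."""
    return math.ceil(n0 * q / p)

def may_process(n_assembled, p, q, n0=1):
    """Step [304]: True once the assembled workforce meets the target."""
    return n_assembled >= workforce_target(p, q, n0)
```

For the 25% example (p/q = 1/4, n0 = 1) the target is 4: the resident agent plus three recruits.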
- “Home” always refers to the server with which the agent was paired at initialisation.
- “Host” always refers to the server that the agent is currently investigating. Note that the agent doesn't need to be physically located on the "host”, the term rather designates its logical position in the overlay network.
- The agent checks the state of its "home" server which, as stated earlier, can be in three states: "idle" - there are no jobs in the queue; "busy" - the server is processing jobs; or "loaded" - the queue is not empty but no processing is taking place, because the server doesn't have enough recruits. If the "home" is "idle" or "busy", the agent has nothing to do.
- If the "home" is "loaded", it means that recruits are needed and the agent must search for them. Before the agent starts to search for recruits it checks its own status, namely whether it is recruited, [401]. If it is already recruited it returns to the wait state [400]. If the agent is not recruited it begins a random walk in the overlay network 18, one hop per time-step, whereby the agent moves from one "host" to the next, [402]. In order to avoid back-tracking, the agent keeps a record of hosts that have already been visited during the course of the current random walk. If it finds itself trapped, that is, all the neighbours of its current "host" have already been visited, the random walk is reset with the agent's "home" as its first "host" and the record is erased, i.e. the process starts over.
- Upon arrival at a new "host", the agent first checks if the "host" is busy, [403]. If it is, then recruiting its paired agent could lead to the data-centre exceeding the global 25% target, so the agent returns to the wait state [400]. If the "host" is not busy, the agent checks if the agent with which the "host" is paired is still "available", [404]. If it is available, the agent next checks whether the workload at the local "host" is lower than or equal to that at "home", [408]. This is a regulatory mechanism designed to give an advantage, i.e. a higher priority, to those servers that have accumulated the longest backlog.
- If so, the local agent's state is turned from "available" into "recruited" and it is logically transferred, [409].
- If the "host's" paired agent is not available, the visiting agent attempts a secondary recruitment procedure that consists in trying to "capture" one of its "host's" recruits.
- The agent therefore first checks that the "host" has at least one recruit, [405].
- If so, the agent checks if the number of recruits at the "host" is lower than or equal to that at the agent's "home", [406], and if so it hijacks one recruit from the "host", [407], before returning to the wait state [400].
- The hijacked agent's status remains the same but it is transferred from the "host's" list of recruited agents to the "home" server's list of recruited agents.
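The decision an agent takes on arrival at a new host (steps [403] to [409]) can be sketched as follows; the dict keys used for the agent and host are assumptions for this sketch, not the patent's data structures:

```python
def visit_host(agent, host):
    """One arrival of a visiting agent at a new host. Returns a
    string naming the action taken."""
    if host["busy"]:                                              # [403]
        return "wait"
    if host["agent_available"]:                                   # [404]
        if host["queue"] <= agent["home_queue"]:                  # [408]
            host["agent_available"] = False                       # [409]
            agent["home_recruits"].append(host["agent_id"])
            return "recruited"
        return "wait"
    if host["recruits"]:                                          # [405]
        if len(host["recruits"]) <= len(agent["home_recruits"]):  # [406]
            agent["home_recruits"].append(host["recruits"].pop()) # [407]
            return "hijacked"
    return "wait"
```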
- A further embodiment of the invention will now be described with reference to figure 5.
- This embodiment could apply to several scenarios. For example, a large fleet of consumer devices could require an overnight software update while the total demand on the backend server infrastructure must remain below a certain target.
- Each customer device is deployed with a client or agent configured to cause de-synchronisation of server connection attempts by forcing each device to recruit a target number of counterparts before being allowed to place an update request.
- A server in the backend server infrastructure is in this embodiment responsible for setting the total utilisation cap for the backend servers as well as setting the group forming parameters. The server thereafter signals the total resource utilisation cap value and the parameters to the clients in the user devices.
- target_subpop_size ≥ min_subpop_size
- n_subpop_i > target_subpop_size.
- The local controller creates a list of all nodes within the group, ranked from longest job queue to shortest job queue.
- m_subpop_i = floor(f * n_subpop_i).
- The local controller defines Active_nodes_subpop_i to contain the first m_subpop_i nodes from the prioritised list, and Inactive_nodes_subpop_i to contain the remaining nodes from that list.
- For each group, the local controller signals all Inactive_nodes_subpop_i to become inactive and, upon confirmation, then signals all Active_nodes_subpop_i to become active.
- The local controller updates its ranked list of nodes from longest job queue to shortest job queue according to jobs being completed by nodes and newly arriving jobs. The status of nodes for which all jobs are complete is changed to 'Inactive'.
- Nodes having the status 'Inactive' for which jobs are waiting may be activated and their status changed to 'Active', provided that the total number of active nodes would not exceed m_subpop_i, i.e. respecting the local utilisation cap.
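One refresh pass of the controller's list, covering the deactivation and re-activation steps above, can be sketched as follows (nodes are assumed to be dicts with 'queue' and 'active' keys; this is a sketch, not the patent's exact bookkeeping):

```python
import math

def refresh(f, nodes):
    """Deactivate nodes whose queues are empty, then activate
    waiting inactive nodes, longest queue first, while keeping the
    active count within m = floor(f * n)."""
    m = math.floor(f * len(nodes))
    for node in nodes:
        if node["active"] and node["queue"] == 0:
            node["active"] = False
    active = sum(1 for node in nodes if node["active"])
    for node in sorted(nodes, key=lambda n: n["queue"], reverse=True):
        if not node["active"] and node["queue"] > 0 and active < m:
            node["active"] = True
            active += 1
    return nodes
```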
- Any mutually beneficial node transfers are carried out between groups, coordinated by a central server.
- The local controller determines whether it would benefit from gaining or losing one or more nodes from/to another group, by updating a Boolean state wantmore_subpop_i and a list Spare_nodes_subpop_i which lists any nodes it is seeking to lose (if any) from its set of Inactive_nodes_subpop_i.
- In step [510] the process checks if the local controller is about to become inactive; if so, the process returns to step [503], where a new local controller is selected, namely the most loaded node at that time-step, and control is transferred to this node. If the local controller is not about to become inactive, the process loops back to [507], where the current status of each node is signalled to the respective node.
- the nodes in the distributed system consume different amounts of energy, e_j for each node j, i.e. some nodes are less efficient and consume more resources than others.
- max_demand_subpop_i is the pro-rata maximum level of resource demand for a group of such nodes.
- Maximum_use is the total resource consumption of all the nodes in the distributed system.
- a local resource utilisation cap value can then be defined in terms of a local resource consumption level as f * max_demand_subpop_i.
- a subset of the group's nodes can be chosen to comprise Active_nodes_subpop_i, indicated in the list maintained by each local controller, such that the total resource required for these nodes to be active, i.e. the sum of the e_j for all nodes in the list, is less than or equal to f * max_demand_subpop_i.
- m_subpop_i is set to be the number of nodes contained in the Active_nodes_subpop_i list.
- the remaining (n_subpop_i - m_subpop_i) nodes in the group comprise the Inactive_nodes_subpop_i list.
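For the heterogeneous case, one way to choose such a subset is a greedy walk down the queue-ranked list, admitting each node whose per-node energy e_j still fits under the local cap f * max_demand_subpop_i. This is an illustrative sketch; the patent does not prescribe a particular selection order, and the names are ours:

```python
def select_active_heterogeneous(queues, energy, f, max_demand):
    """Choose Active_nodes so that the summed per-node energy e_j stays
    at or below the local cap f * max_demand (greedy sketch)."""
    cap = f * max_demand
    ranked = sorted(queues, key=queues.get, reverse=True)
    active, used = [], 0.0
    for node in ranked:
        if used + energy[node] <= cap:
            active.append(node)
            used += energy[node]
    inactive = [n for n in ranked if n not in active]
    return active, inactive
```

With the 10 kWh and 20 kWh figures from the worked example in the text, this reproduces the described behaviour: the 20 kWh server alone fits under a 27.5 kWh cap, while adding any 10 kWh server would exceed it.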
- Group 101 comprises four server nodes 10a-d, where each server consumes a different amount of energy when processing jobs; three of the nodes consume 10 kWh and one node consumes 20 kWh.
- the selected local controller 101C, which in this case is node 10d since it has the longest job queue, maintains a list of all the nodes in the group, indicating the identity of each node, whether the node is active or inactive, the load at the node and the level of power consumed by each node; see table 4.
- the local controller 101C thereafter determines that, since it consumes 20 kWh for processing its jobs, only its own server can be allowed to operate at this time. The controller then signals server 10d (itself) that it can process its jobs and signals all the other servers that they are to remain inactive, since otherwise the local utilisation cap value of 27.5 kWh would be exceeded. Once server 10d has processed all its jobs, and therefore changes to an inactive state, server 10a is selected as the new local controller 101C. The controller signals server 10a (itself) and server 10c to become active and process their jobs, and signals servers 10b and 10d to remain inactive. The list maintained by the new local controller is updated accordingly; see table 5 below.
- the number of servers in the group that can be active at a time varies based on the energy consumption of each server.
- node A has recruited agents B1, C1 and C2 while at the same time its own agents A1 and A2 have been recruited by node C.
- Node C has further recruited agent B2 from node B.
- Node B has not recruited any agents.
- the system configuration is: A[B1, C1, C2] + B[] + C[A1, A2, B2].
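This configuration can be held as a simple mapping from each node to the agents it currently hosts (an illustrative data structure, not prescribed by the text):

```python
# A hosts B's and C's agents; B hosts none; C hosts A's agents plus B2.
config = {"A": ["B1", "C1", "C2"], "B": [], "C": ["A1", "A2", "B2"]}

# Recruitment only moves agents between nodes, so all six agents
# (two originally owned by each node) remain hosted somewhere:
hosted = sorted(a for agents in config.values() for a in agents)
```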
- the target ratio or total resource utilisation cap value / may vary over time, for instance to reflect fluctuating availability of a resource, such as for example the output of a solar power plant which will vary throughout the day.
- the easy approach is to have time-of-day ratios or f values built in from the start.
- the solar-powered facility is configured to operate on battery between 8pm and 8am and to recruit six agents to achieve a lower p/q ratio during this time in order to be allowed to operate.
- the target could be as low as two, which in the two-agents-per-node scenario sketched above would effectively mean that the data-centre can work at 100% capacity.
- a more flexible way would be to include a total resource utilisation cap value f, or target ratio, with every new submission, making it job-specific.
- although this approach would require re-introducing a measure of centralised management to assign a ratio to each job and to choose which server to send it to, it has other advantages in that it could also be used to assign variable priorities. Indeed, assigning a higher f value to a high-value job would statistically result in it being processed faster, making it possible to support multiple service level agreement (SLA) profiles. For instance, allocating an f value of 0.75 to high-priority job A and a value of 0.25 to low-priority job B makes it much more likely that the server "in charge" of A will reach its threshold first and start processing. This, however, comes at the expense of a hard guarantee: in the example above, although the average f between jobs A and B is 0.5, only their (likely improbable) simultaneous arrival at the head of the queue of their respective servers would safeguard the global target.
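The intuition that a higher f value means statistically earlier activation can be illustrated with a deterministic toy model, in which each server accumulates credit equal to its job's f value per tick and may start once the credit reaches a threshold of 1.0. This is only a sketch of the statistical effect; the patent's actual mechanism is agent-based:

```python
def ticks_to_start(f):
    """Ticks until accumulated credit reaches the activation threshold
    of 1.0 (toy model: higher f reaches its threshold sooner)."""
    credit, ticks = 0.0, 0
    while credit < 1.0:
        credit += f
        ticks += 1
    return ticks

ticks_to_start(0.75)  # high-priority job A: 2 ticks
ticks_to_start(0.25)  # low-priority job B: 4 ticks
```

In this model job A (f = 0.75) starts after 2 ticks and job B (f = 0.25) after 4, mirroring the claimed soft prioritisation.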
- Exemplary embodiments of the invention are realised, at least in part, by executable computer program code which may be embodied in application program data provided by program modules in the respective nodes 10 and 12.
- when such computer program code is loaded into the memory of each device for execution by the respective processor, it provides a computer program code structure capable of performing at least part of the methods in accordance with the above-described exemplary embodiments of the invention.
- each step of the processes can correspond to at least one line of computer program code, and such code, in combination with the processor (CPU) in the respective node or server 10, 12, provides apparatus for effecting the described processes.
- the modules or part of the modules for effecting the described processes can be implemented in hardware or a combination of hardware and software.
- a method and system configured to cap utilisation of a resource to a defined global target value in a decentralised manner is disclosed.
- Groups of resource consumers are formed and a local utilisation cap value is determined for each group based on the global target value.
- a local controller in each group controls access to the resource for each member such that the local utilisation cap value is not exceeded, thereby ensuring that the resources consumed by all the resource consumers in the respective groups do not exceed the defined global target value.
- the method and system are particularly useful in large and dynamic environments such as "virtual" server farms comprising a variable number of physical facilities.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Hardware Redundancy (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14250055 | 2014-03-28 | ||
PCT/GB2015/050862 WO2015145126A1 (en) | 2014-03-28 | 2015-03-24 | Resource utilisation control |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3123329A1 true EP3123329A1 (en) | 2017-02-01 |
Family
ID=50628727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP15714256.3A Ceased EP3123329A1 (en) | 2014-03-28 | 2015-03-24 | Resource utilisation control |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3123329A1 (en) |
WO (1) | WO2015145126A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100191994A1 (en) * | 2009-01-29 | 2010-07-29 | Nokia Corporation | Method and apparatus for controlling energy consumption during resource sharing |
US20100205469A1 (en) * | 2009-02-06 | 2010-08-12 | Mccarthy Clifford A | Power budgeting for a group of computer systems |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8037329B2 (en) * | 2007-01-31 | 2011-10-11 | Hewlett-Packard Development Company, L.P. | Systems and methods for determining power consumption profiles for resource users and using the profiles for resource allocation |
KR100990412B1 (en) * | 2009-10-29 | 2010-10-29 | 주식회사 팀스톤 | Computer server capable of supporting cpu virtualization |
US8631253B2 (en) * | 2010-08-17 | 2014-01-14 | Red Hat Israel, Ltd. | Manager and host-based integrated power saving policy in virtualization systems |
-
2015
- 2015-03-24 WO PCT/GB2015/050862 patent/WO2015145126A1/en active Application Filing
- 2015-03-24 EP EP15714256.3A patent/EP3123329A1/en not_active Ceased
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100191994A1 (en) * | 2009-01-29 | 2010-07-29 | Nokia Corporation | Method and apparatus for controlling energy consumption during resource sharing |
US20100205469A1 (en) * | 2009-02-06 | 2010-08-12 | Mccarthy Clifford A | Power budgeting for a group of computer systems |
Non-Patent Citations (1)
Title |
---|
See also references of WO2015145126A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2015145126A1 (en) | 2015-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sharma et al. | A survey of job scheduling and resource management in grid computing | |
Kaur et al. | A systematic review on task scheduling in Fog computing: Taxonomy, tools, challenges, and future directions | |
Guo et al. | Improving mapreduce performance in heterogeneous network environments and resource utilization | |
Chakraborty et al. | Intelligent Latency-aware tasks prioritization and offloading strategy in Distributed Fog-Cloud of Things | |
CN109564528B (en) | System and method for computing resource allocation in distributed computing | |
Yanggratoke et al. | Gossip-based resource allocation for green computing in large clouds | |
Wided et al. | Load balancing with Job Migration Algorithm for improving performance on grid computing: Experimental Results | |
Cao | Self-organizing agents for grid load balancing | |
Garala et al. | A performance analysis of load Balancing algorithms in Cloud environment | |
Srivastava et al. | CGP: Cluster-based gossip protocol for dynamic resource environment in cloud | |
Hu et al. | Requirement-aware scheduling of bag-of-tasks applications on grids with dynamic resilience | |
Wu et al. | ABP scheduler: Speeding up service spread in docker swarm | |
CN112860442A (en) | Resource quota adjusting method and device, computer equipment and storage medium | |
CN117149382A (en) | Virtual machine scheduling method, device, computer equipment and storage medium | |
SM et al. | Priority based resource allocation and demand based pricing model in peer-to-peer clouds | |
Wagner et al. | Autonomous, collaborative control for resilient cyber defense (ACCORD) | |
CN114265676B (en) | Cluster resource scheduling method, device, equipment and medium | |
CN115629854A (en) | Distributed task scheduling method, system, electronic device and storage medium | |
EP3123329A1 (en) | Resource utilisation control | |
CN111556126B (en) | Model management method, system, computer device and storage medium | |
Naaz et al. | Load balancing algorithms for peer to peer and client server distributed environments | |
Guo et al. | A data distribution aware task scheduling strategy for mapreduce system | |
Nehra et al. | Towards dynamic load balancing in heterogeneous cluster using mobile agent | |
Wei et al. | A novel scheduling mechanism for hybrid cloud systems | |
Mour et al. | Load management model for cloud computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20160926 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: SAFFRE, FABRICE Inventor name: SHACKLETON, MARK |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20181108 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
18R | Application refused |
Effective date: 20201204 |