WO2012140401A1 - Automated construction of usage policies - Google Patents

Automated construction of usage policies

Publication number
WO2012140401A1
Authority
WO
WIPO (PCT)
Prior art keywords
battery
policy
load
charge
batteries
Application number
PCT/GB2012/000346
Other languages
French (fr)
Inventor
Maria FOX
Derek LONG
Daniele MAGAZZERI
Original Assignee
University Of Strathclyde
Application filed by University Of Strathclyde
Publication of WO2012140401A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • the present invention relates to a system and method for optimising decision-making in a system that has multiple choices of action, for example a multiple battery system, where the choice is between which battery to load next, a traffic control system, where the choice is between alternative roads along which to direct traffic, and a navigation system where the choice is between alternative bearings and speeds.
  • the load can be serviced entirely by a single battery at a time, so the problem is distinct from the management of cells within a single battery.
  • An example of a context in which multiple battery use occurs is in laptops equipped with multiple battery bays. More efficient use of multiple batteries can be achieved by exploiting the phenomenon of recovery, which is a consequence of the chemical properties of a battery: as charge is drawn from a battery, the stored charge is released by a chemical reaction, which takes time to replenish the charge. In general, charge is drawn from a battery faster than the reaction can replenish it, and this can lead to a battery appearing to be dead when, in fact, it still contains stored charge. By allowing the battery to rest, the reaction can replenish the charge and the battery becomes functional once again.
  • the policy is constructed to select a new battery whenever the voltage of the battery currently servicing a load drops below a certain threshold.
  • the next battery is selected according to one of four alternative policies: V_max: select the battery pack with the highest state of charge; V_min: select the battery pack with the lowest state of charge; T_max: select the battery pack that has been unused for the longest time; T_min: select the battery that has been unused for the shortest time.
  • V_max is the best of these policies, tested on up to four batteries.
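The four threshold-triggered selection rules listed above can be illustrated with a minimal sketch. The `Battery` record, its field names and the `select_battery` function are hypothetical names introduced only for this illustration; they are not taken from the cited prior art.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    # Hypothetical record: field names are assumptions made for this sketch.
    name: str
    state_of_charge: float  # any consistent unit
    idle_time: float        # time since the battery last serviced a load


def select_battery(batteries, policy):
    """Pick the next battery under one of the four named rules."""
    rules = {
        "V_max": lambda b: b.state_of_charge,    # highest state of charge
        "V_min": lambda b: -b.state_of_charge,   # lowest state of charge
        "T_max": lambda b: b.idle_time,          # unused for the longest time
        "T_min": lambda b: -b.idle_time,         # unused for the shortest time
    }
    return max(batteries, key=rules[policy])


packs = [
    Battery("A", state_of_charge=0.8, idle_time=5.0),
    Battery("B", state_of_charge=0.3, idle_time=20.0),
    Battery("C", state_of_charge=0.5, idle_time=1.0),
]
chosen = {rule: select_battery(packs, rule).name
          for rule in ("V_max", "V_min", "T_max", "T_min")}
```

In such a scheme the selection would only be invoked when the voltage of the battery in use drops below the chosen threshold.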
  • a method for optimising performance of a system comprising modelling the system behaviour; generating sample instances of the system behaviour; solving the sample instances; constructing training data and learning a policy from the training data.
  • measurable improvements can be observed in the performance of the system.
  • a sample is defined with reference to a problem, which is itself defined by a set of parameters whose values are determined by a collection of probability distributions.
  • a sample is an instance of the problem where the parameters are chosen according to the underlying probability distributions.
  • Modelling system behaviour may be done using any suitable model.
  • the KiBaM and a specification of the distributions may be used. Modelling may be done in any language (for example PDDL+) amenable to solution by search-based approaches.
  • the specification of load distributions can be any form that permits sampling.
  • Samples may be generated using Monte Carlo sampling of the distributions.
  • Alternative sampling strategies are possible, including weighted sampling strategies to encourage particular parts of the solution space to be explored.
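As an illustration of Monte Carlo sampling of problem instances, the following sketch draws load profiles from three distributions (waiting time between loads, load length and load size). The exponential distributions, the mean values and the discrete set of amplitudes are assumptions chosen only for the example; any specification that permits sampling could be substituted.

```python
import random


def sample_load_profile(horizon, rng, mean_wait=2.0, mean_length=5.0,
                        amplitudes=(0.1, 0.25, 0.5, 0.75)):
    """Return one sampled instance as a list of (start, end, amplitude).

    Loads do not overlap in this sketch. All distribution choices are
    illustrative assumptions, not values from the described method.
    """
    profile = []
    t = 0.0
    while True:
        t += rng.expovariate(1.0 / mean_wait)        # waiting time before the load
        if t >= horizon:
            break
        end = min(t + rng.expovariate(1.0 / mean_length), horizon)  # load length
        profile.append((t, end, rng.choice(amplitudes)))            # load size
        t = end
    return profile


rng = random.Random(0)
samples = [sample_load_profile(100.0, rng) for _ in range(5)]
```

Each sampled profile is a deterministic problem instance that can then be passed to the solver.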
  • the known planner UPMurphi may be used taking the PDDL+ model as input, together with the individual samples (also converted to PDDL+).
  • alternative planners could be used.
  • the planner must be capable of solving the instances (so must have a way to handle non-linear continuous dynamics).
  • the planner may use a dynamic (variable) discretise-and-validate cycle. Fixed discretisation is possible (though less efficient).
  • a tailored search-based solver could be used.
  • Constructing the training data can be done by generating discretised state trajectories using a fixed discretisation.
  • An alternative is to generate training data by perturbation of the solved instances. It is possible that a variable discretisation could be used.
  • a classifier can be used. More generally, any machine-learning approach could be applied here, including neural networks, support vectors and so on. In the case of the multiple battery application, measurable performance improvements are achieved in increased battery lifetime and reduced switching between batteries, when compared with other load management techniques.
  • Figure 1 is a flow diagram of a method for constructing a policy for controlling a multi-component system
  • Figure 2 is a flow diagram of a method for constructing a policy for controlling a multi-battery system
  • Figure 3 is a schematic view of a planner for use in the method of Figure 2;
  • Figure 4 is a block diagram of a multi-battery system that has a controller for controlling operation of the batteries to optimise use;
  • Figure 5 is a flow diagram of a method for controlling the system of Figure 4.
  • Figure 6 is an illustration of the charge states of an individual battery
  • Figure 7 is a simulation of a plan search with variable discretisation
  • Figure 8 is a simulation of a load test for a system with two batteries of type B2;
  • Figure 9 shows simulated results for a load test for the same system used for Figure 8, but determined using the V_max technique of the prior art;
  • Figure 10 is a simulation of a policy for eight batteries with a stochastic load
  • Figure 11 is a diagram of an experimental two battery system for conducting comparative tests
  • Figure 12 shows a distribution of load durations used for experiments on the system of Figure 11;
  • Figure 13 is a run showing the behaviour of a best-of-two policy when applied to the two battery system of Figure 11;
  • Figure 14 is a run showing the behaviour of a plan-based policy in accordance with the invention when applied to the two battery system of Figure 11;
  • Figure 15 shows the best-of-two and the plan-based policy of the invention both running on the same load profile for the two battery system of Figure 11;
  • Figure 16 shows two executions of the plan-based policy on different load profiles for the two battery system of Figure 11;
  • Figure 17 shows the plan-based policy of the invention plotted with estimated available charge
  • Figure 18 shows the sequencing policy showing its shorter lifetime on load profile
  • the present invention provides a method for getting as close as possible to optimal system behaviour, depending on system constraints. This involves modelling the system behaviour; generating sample instances of the system behaviour based on an estimate of system usage; solving the sample instances using the system model; constructing training data and learning a usage policy from the training data, as shown in Figure 1.
  • the invention will be described in the context of a method for optimising multiple battery performance, but could equally be applied to any system that has a choice of outputs, for example a navigation system in which multiple options exist in terms of direction, and the time taken to reach the destination, or the fuel consumed, is a measurable output.
  • Another example is a traffic light system in which traffic lights can be switched according to a defined policy in order to optimise traffic flow. In this case, reduction in observable congestion is a measurable output.
  • Figure 2 shows a flow diagram for determining a policy for controlling a multiple battery system to optimise the battery lifetime. This involves determining the distribution of loads. This can be done by determining a probability distribution for each of waiting time between loads, length of load, and size of load - loads may overlap in some systems and not in others. Then samples from the load profiles are generated. These are solved, and the resultant solutions are processed into classification examples. Once this is done, a classifier is applied to learn an optimised policy (a decision tree). The samples may be solved using any suitable technique, for example using a planner.
  • Figure 3 shows an example of a planner. This takes as input a model of the continuous dynamics of the batteries and a description of a specific problem (i.e. load profile).
  • the planner constructs a search space to find a high quality solution by expanding the alternative action choices.
  • Two important elements make this efficient. The first is a dynamic discretisation: one action choice is to wait (while a load is serviced or until a load arrives), and waiting times can be selected from a range of discrete choices of decreasing size.
  • The second is the search heuristic used to guide the search through the space of alternatives, which avoids constructing large parts of the reachable space and makes the approach scalable.
  • the control policy can be used to map states in the system, i.e. level of charge in the batteries and load, to actions, i.e. which battery to switch to.
  • the control policy is provided in a controller that is software-based, although hardware such as FPGAs may be used. Since the structure of the decision tree is stateless, the controller could be implemented without a memory.
  • the controller may include or be able to communicate with sensors associated with the batteries and the load, so that both can be monitored. Equally, the controller may include or be connected to a switch, for example a simple multiplexing switch, that is operable to select the battery of interest in response to a signal from the controller, as shown in Figure 4.
  • Figure 5 shows the steps taken by the policy in use.
  • the policy samples the states of the batteries and the load. Based on this, the decision tree is used to identify the battery that has to be used to optimise performance. If necessary, a control signal is generated to cause switching to another battery. This process is repeated at an appropriate frequency.
  • the frequency may be determined by, for example, hardware limitations and/or load profile characteristics - a high frequency load profile demands similarly high frequency responsiveness. However, in most applications a process cycle of around 1 Hz is sufficient.
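The sample, decide and switch cycle described above can be sketched as a single stateless step that a controller would call at its chosen rate (for example about once per second). The function names, the dictionary layout of the sensed state and the hand-written two-battery tree are hypothetical; a deployed controller would use the learned decision tree instead.

```python
def toy_tree_policy(state):
    """Hypothetical hand-written decision tree for two batteries.

    It stands in for a learned tree and depends only on the current state.
    """
    if state["load"] > 0.3:
        # Under heavy load, prefer the battery with more available charge.
        return 0 if state["available"][0] >= state["available"][1] else 1
    # Under light load, use battery 1 while it has some charge left.
    return 1 if state["available"][1] > 0.2 else 0


def control_step(read_state, policy, active, switch_to):
    """One cycle: sample the state, consult the policy, switch if needed."""
    state = read_state()
    chosen = policy(state)
    if chosen != active:
        switch_to(chosen)   # e.g. drive a multiplexing switch
    return chosen


# Usage with simulated sensor readings in place of real hardware.
readings = iter([
    {"available": [0.5, 0.9], "load": 0.5},
    {"available": [0.5, 0.9], "load": 0.5},
    {"available": [0.5, 0.1], "load": 0.1},
])
switches = []
active = 0
for _ in range(3):
    active = control_step(lambda: next(readings), toy_tree_policy,
                          active, switches.append)
```

Because the step keeps no history, the only value carried between cycles is the index of the active battery.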
  • KiBaM: Kinetic Battery Model
  • This is a deterministic non-linear continuous model of battery performance.
  • the KiBaM is described in "Extension of the kinetic battery model for wind/hybrid power systems" by Manwell, J., and McGowan, J. 1994, Proceedings of the 5th European Wind Energy Association Conference (EWEC 1994), 284-289, the contents of which are incorporated herein by reference.
  • This model is based on the assumption that the battery charge is distributed over two wells: the available-charge well and the bound-charge well, as shown in Figure 6.
  • a fraction c of the total charge is stored in the available-charge well and a fraction (1 - c) in the bound-charge well.
  • the available-charge well supplies electrons directly to the load (i(t)), where t denotes the time, whereas the bound-charge well supplies electrons only to the available-charge well.
  • the charge flows from the bound charge well to the available-charge well through a "valve" with fixed conductance, k. When a load is applied to the battery, the available charge reduces, and the height difference between the two wells grows. When the load is removed, charge flows from the bound-charge well to the available charge well until the heights are equal again.
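The two-well behaviour described above can be sketched numerically. The sketch assumes the usual KiBaM formulation in which the well heights are the available charge divided by c and the bound charge divided by (1 - c), and the flow through the valve is k times the height difference. The forward-Euler integration, the step size and the values c = 0.2 and k = 0.1 are illustrative assumptions, not parameters from this document.

```python
def kibam_step(available, bound, load, dt, c=0.2, k=0.1):
    """Advance the two wells by one forward-Euler step of length dt."""
    h_available = available / c        # height of the available-charge well
    h_bound = bound / (1.0 - c)        # height of the bound-charge well
    flow = k * (h_bound - h_available)  # charge flowing through the valve
    return available + (flow - load) * dt, bound - flow * dt


def simulate(available, bound, load, duration, dt=0.001, c=0.2, k=0.1):
    """Apply a constant load for `duration` and return the final wells."""
    for _ in range(int(round(duration / dt))):
        available, bound = kibam_step(available, bound, load, dt, c, k)
    return available, bound


# Start at rest with equal heights, draw a load, then let the battery rest.
a0, b0 = 0.2, 0.8
a1, b1 = simulate(a0, b0, load=0.05, duration=1.0)   # available charge falls
a2, b2 = simulate(a1, b1, load=0.0, duration=1.0)    # recovery while resting
```

During the rest period the total charge stays constant while the available charge rises, which is the recovery effect that a switching policy can exploit.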
  • the KiBaM lends itself, in principle, to use in an optimisation problem solver that can find the best battery usage plan, given a load profile.
  • the load profile is generated by external processes, typically controlled directly or indirectly by user demands. These demands can often be modelled probabilistically, reflecting typical patterns of use. It is assumed that the profiles are drawn from a known distribution. Consequently, the planning problem ceases to be deterministic and becomes a probabilistic planning problem.
  • the problem can be cast as a hybrid temporal Markov Decision Process, in which the states are characterised by the states of charge of the batteries, the current load and the currently active battery. Battery switching actions are deterministic, but the events that cause load to change are not. The time between events is also governed by a stochastic process, but the timing of switching actions is controllable. More formally, for a problem with n batteries, a state is characterised by the tuple: (sb_1, sa_1, sb_2, sa_2, …, sb_n, sa_n, S, t, …
  • wait(T) causes a transition to a state in which time has advanced to time t', which is less than or equal to (t + T)
  • the state of charge of battery B is updated according to the battery model and the load might be different (according to the probability distribution governing loads).
  • the interpretation of the action is that it advances time to the next event, which will be when a battery is depleted of available charge, or when the load changes, or when T time has passed, whichever is first.
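The timing rule for the wait action can be written as a small function: time advances to the earliest of the three candidate events. The function and argument names are hypothetical, and the times of the next load change and of depletion are assumed to be supplied by the caller (for example from the sampled load profile and the battery model).

```python
def wait_until(t, T, next_load_change=None, depletion_time=None):
    """Return the time t' reached by waiting up to T from time t.

    t' is the earliest of: t + T, the next load change, or the moment the
    active battery runs out of available charge. Events that are unknown
    or not in the future are ignored.
    """
    candidates = [t + T]
    for event_time in (next_load_change, depletion_time):
        if event_time is not None and event_time > t:
            candidates.append(event_time)
    return min(candidates)
```

For instance, `wait_until(0.0, 1.0, next_load_change=0.4)` stops at 0.4 rather than 1.0, so t' never exceeds t + T.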
  • hindsight optimisation is used, in which a deterministic sampled problem is solved using an optimising solver to generate an ideal trajectory for the problem instance.
  • a classifier is learned that characterises the policy for the part of the space sampled. This approach works well when the policy structure is less complex to represent than the value function for the problem space.
  • To extend the learned classifier to a complete policy involves ensuring that an action is assigned to every possible state. This can be achieved by adding some default behaviour to cover states that are otherwise not handled by the classifier, or else by managing run-time errors in the use of an incomplete policy in a way appropriate to the application.
  • the MDP is solved using an approach based on a combination of two ideas. Firstly, sampling from the distribution of loads to arrive at a deterministic problem, which is then solved using the continuous KiBaM as the battery model. This leads to a continuous non-linear optimisation problem, which is solved using a "discretise and validate" approach. Secondly, the solutions to the sample problem instances are combined to arrive at a policy for the MDP from which the problems are drawn.
  • the approach is domain-specific in some respects: the discretisation scheme, while based on general principles, is selected for the problem domain and load distribution. A search heuristic is used that, while generic, is not suited to all problems.
  • the aggregation of solutions into a policy makes use of an entirely general approach, but the extent to which the approach yields good policies depends on the nature of the problem space in which it is applied.
  • the multiple battery management problem can be considered an optimisation problem, when faced with a known and deterministic load profile.
  • dynamics of KiBaM are identified. This can be done using PDDL+ (M. Fox and D. Long, "Modelling mixed discrete-continuous domains for planning" in Journal of Artificial Intelligence Research vol. 27, pages 235-297, 2006, the contents of which are incorporated herein by reference), which is an extension of the standard planning domain modelling language, PDDL, to capture continuous processes and events.
  • there is a durative action of variable duration that allows the planner to use a cell over an interval.
  • This action can be applied at any time and terminated after any positive duration, granting a planner complete freedom over the scheduling of cells in the battery.
  • a specific problem instance lists the available cells and their characteristics, together with a collection of loads as timed initial literals. The goal is to service all the loads and this is expressed by ensuring that each active cell increments the number of services currently available. An event is triggered if there is ever a positive load and no active service. This mechanism allows the planner to switch cells during a single load period.
  • Using PDDL+ as the modelling language grants several benefits. Firstly, it allows the use of VAL (R. Howey, D. Long and M. Fox, "VAL: Automatic plan validation, continuous effects and mixed initiative planning using PDDL" in Proc.
  • Planning-as-model-checking is a well-researched theme (A. Cimatti, F. Giunchiglia, E. Giunchiglia and P. Traverso, "Planning via Model Checking: A Decision Procedure for AR" in Proc. of 4th European Conference on Planning (ECP), 1997, the contents of which are incorporated herein by reference; S. Edelkamp, "Taming numbers and durations in the model checking integrated planning system" in Journal of Artificial Intelligence Research Special issue on the 3rd International Planning Competition, volume 20, 2002, the contents of which are incorporated herein by reference).
  • the key to this approach is the observation that a proof of state reachability corresponds to a plan.
  • a hybrid system is a system whose state description involves continuous as well as discrete variables.
  • In order to apply model checking algorithms and exploit reachability analysis, the system should have a finite number of states. Therefore the system is approximated by discretising the continuous components of the state (which are assumed to be bounded) and their dynamic behaviours.
  • the state reached from state s by applying action a for duration d is denoted by F(s, a, d) ∈ S.
  • each state s ∈ S is assumed to contain a special temporal variable t denoting the time elapsed in the current path from the initial state to s.
  • the notation t(s) is used for the value of variable t in state s.
  • if π is a trajectory, π_s(k), π_a(k) and π_d(k) denote the state s_k, the action a_k and the duration d_k, respectively.
  • the length of π is denoted by |π|.
  • This set can be refined by the addition of smaller durations if successive searches fail to find a solution. Allowing different durations within the same search enables the model-checker to construct states that interact with executing processes at different time points, while stepping quickly along the timeline where there are no interesting features. To use variable discretisation efficiently, search space redundancy is eliminated by disallowing the use of long duration actions immediately following short duration actions.
  • Long duration actions can only be used if an event or other action has intervened since the last short action in the family.
  • the repeated consecutive use of short duration actions is disallowed beyond the accumulated duration of the next longer duration action.
  • the longest duration action can be repeated arbitrarily often. Provided an appropriate family of durations is used, this allows the space of reachable states to remain equivalent to the relevant subset of the reachable states using uniform discretisation with the smallest duration.
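One possible reading of the redundancy-elimination rules above is sketched below. The function name and the representation of the search history (the list of consecutive wait durations since the last event or other action) are hypothetical. The sketch also assumes that a run of a short duration may accumulate up to, but not beyond, the next longer duration; the exact boundary case is an assumption.

```python
def allowed_durations(durations, recent_waits, eps=1e-9):
    """Return the wait durations that may be tried next.

    `recent_waits` holds the consecutive wait durations used since the last
    event or other action; it is empty right after such an interruption.
    """
    durations = sorted(durations)
    if not recent_waits:
        return durations
    last = recent_waits[-1]
    allowed = []
    for d in durations:
        if d > last:
            continue  # no longer wait immediately after a shorter one
        if d < durations[-1]:
            next_longer = min(x for x in durations if x > d)
            run = 0.0
            for w in reversed(recent_waits):
                if w != d:
                    break
                run += w  # accumulated length of the current run of d
            if run + d > next_longer + eps:
                continue  # repeating d again would exceed the next longer step
        allowed.append(d)  # the longest duration may always be repeated
    return allowed


D = [0.01, 0.4, 0.5, 1.0]
after_long = allowed_durations(D, [1.0])
after_half = allowed_durations(D, [0.5])
after_two_halves = allowed_durations(D, [0.5, 0.5])
```

With this family of durations, two consecutive waits of 0.5 are permitted but a third is pruned, because the same time point is reachable with a single wait of 1.0 followed by 0.5.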
  • the battery domain has an important property that supports a simple heuristic evaluation function for states: the charge in the battery monotonically decreases over time and the optimal solution is the one that gives the longest possible plan.
  • An upper bound on the duration of the solution can be found using the observation that the optimal duration cannot exceed that of a single battery with combined capacity equal to the sum of the capacities of the multiple batteries (assuming the same discharging and flow behaviours).
  • the set of durations used for this example is D = {0.01, 0.4, 0.5, 1.0} (measured in minutes).
  • the set of durations is a systematically scaled set, with an initial collection determined by a manual selection based on the known properties of the load profile. A small duration is included in order to handle very sensitive interactions. Should this set prove inadequate, iterative refinement of the discretisation adds successively smaller durations to the set. In the initial state s_0 there is no load and no active service and both cells have a limited initial capacity.
  • Cell 2 is used for 0.5 minute. In the next period, no load is applied, so no cell is used. The transition <s_3, wait, 0.5, s_4> is considered, but it would lead to a positive load and no active service, so the duration of action wait has to be reduced to 0.4. 5. To service the last load period of 0.02 minute, cell 1 could be used. However, the remaining charge in cell 1 allows it to service only 0.01 minute. So, finally, cell 2 is used until the end of the load profile. The validity of a transition is dynamically checked during the search, since invalid transitions trigger specific events (e.g. event cellDead is triggered at step 2 and event disaster is triggered at step 4) which, in turn, violate the invariant conditions of the corresponding actions (a cell must not die during use).
  • using variable discretisation, only six states have to be visited in order to reach the goal, whereas with a uniform discretisation it is necessary to explore at least 242 states.
  • a further benefit of using differently sized durations in the discretisation is that favouring longer durations reduces the number of switches in the solutions generated, leading to solutions that are better in practical terms than those based on a high frequency switching between batteries.
  • these load profiles were used to validate the variable-range discretisation KiBaM model (DD-KiBaM), and to find an appropriate discretisation for the continuous variables involved in the system dynamics (i.e. the continuous state variables of the model and the process durations).
  • VAL was used to validate solutions for the discretised model against the continuous model, using single cell batteries.
  • the second column shows the theoretical upper bound given by an extremely high-frequency switching.
  • the approach of the invention significantly outperformed the UPPAAL-based one, providing solutions that achieve more than 99% efficiency compared with the theoretical limit.
  • a battery switching policy can be determined.
  • the load profile applied to the batteries is not known in advance, but it is assumed that a probability distribution characterising typical use of the batteries is available.
  • the policy is a function that determines which battery to use when load must be serviced, using the current states of charge of the available batteries as the basis for making the decision.
  • One way to approach this problem is to see the mapping as a classification, where the state of the batteries is mapped to a class corresponding to the correct choice of battery.
  • the solutions to the determinised problems can be used as the basis of a classifier construction problem.
  • An existing machine learning approach can be used to build a good classifier.
  • the successful construction of a classifier depends on there being exploitable structure in the space defined by the solutions to the determinised problems.
  • the states are described by continuous variables: these are discretised for the purpose of building the classifier.
  • the solution generally does not cover the whole space of reachable states, so it is important that the policy is completed with a sensible default rule when the input state is too distant from any of the previously encountered states.
  • the best default rule is a best-of-n rule, which is the best of the published hand-constructed policies for this problem. Deployment of the final policies requires that they can be efficiently implemented in cheap hardware.
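Completing a learned policy with a default rule can be sketched as a wrapper that falls back whenever the input state is too far from every state seen during training. The function names, the Euclidean distance measure and the threshold are hypothetical choices for this illustration, and best-of-n is taken here to mean choosing the battery with the most available charge, which is an assumption.

```python
import math


def best_of_n(available):
    """Default rule (assumed meaning): battery with the most available charge."""
    return max(range(len(available)), key=lambda i: available[i])


def complete_policy(learned, seen_states, max_distance):
    """Wrap a learned classifier so that every state is assigned an action."""
    def policy(state):
        nearest = min(math.dist(state, s) for s in seen_states)
        if nearest > max_distance:
            return best_of_n(state)   # state not covered by the training data
        return learned(state)
    return policy


# Toy usage: the "learned" part always answers battery 0.
policy = complete_policy(lambda state: 0, seen_states=[(0.5, 0.5)],
                         max_distance=0.1)
near = policy((0.5, 0.55))   # close to a seen state: learned answer
far = policy((0.1, 0.9))     # far from any seen state: default rule
```

The linear scan over seen states is only for clarity; a table-based implementation would encode the covered region directly.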
  • Simple classifier rule systems can be very effectively implemented in look-up tables, which are ideal for implementation on Field Programmable Gate Arrays (FPGAs) or as purpose-built hardware.
  • WEKA (M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I.H. Witten, "The WEKA data mining software: An update" in SIGKDD Explorations volume 11(1), 2009, the contents of which are incorporated herein by reference) is a machine learning framework, developed at the University of Waikato, that provides a set of classification and clustering algorithms for data-mining tasks.
  • WEKA takes as input a training set, comprising a list of instances sharing a set of attributes.
  • each instance is a tuple that lists the available charge and the total charge of each battery in turn, followed by B and L, where
  • B is the currently active battery
  • L is the current load.
  • the attribute used as the class is the battery B.
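Turning solved trajectories into classification examples can be sketched as follows. The layout of a trajectory step (a dictionary with `charges`, `load` and `battery` keys) and the rounding used to discretise the continuous values are assumptions made for the example; the resulting rows could then be exported to whichever learner is used.

```python
def to_training_rows(trajectory, n_digits=2):
    """Convert one solved trajectory into (features, class) rows.

    Each step is assumed to hold the (available, total) charge of every
    battery, the current load, and the battery chosen by the solver.
    """
    rows = []
    for step in trajectory:
        features = []
        for available, total in step["charges"]:
            # Discretise the continuous attributes by rounding (assumption).
            features += [round(available, n_digits), round(total, n_digits)]
        features.append(step["load"])
        rows.append((tuple(features), step["battery"]))  # class = battery
    return rows


trajectory = [
    {"charges": [(0.50, 2.00), (0.50, 2.00)], "load": 0.25, "battery": 0},
    {"charges": [(0.25, 1.75), (0.50, 2.00)], "load": 0.25, "battery": 1},
]
rows = to_training_rows(trajectory)
```

For n batteries each row carries 2n charge attributes plus the load, with the battery to use as the class label.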
  • the stochastic load profiles have been defined with a distribution of:
  • the generated model requires significant memory to store (more than 500 MB of RAM), or it is too slow to be used. These parameters have also been used to determine the number of training examples to classify, as the bigger the training set, the better the performance and the higher the memory and time requirements.
  • the J48 classifier, which implements the machine learning algorithm C4.5, was selected.
  • the output is a decision tree whose leaves represent, in our case study, the battery to be used.
  • This classifier proved very well suited to this task, as 99% of instances were correctly classified during the cross-validation performed by WEKA.
  • an empirical evaluation showed that the best result is obtained using 250,000 training examples (note that this involves considering about 4. 0 8 real values) since further extending the training set does not make any significant improvement in the performance but increases memory and time requirements.
  • the WEKA classes were embedded for loading the classification model into a battery simulation framework.
  • the model for the 8 battery case is represented by a tree with 61 levels and consists of 7645 nodes. Applying the decision tree to determine which battery to load at each decision point takes negligible time.
  • To evaluate the performance of the policy, four probability distributions were considered with different average values for the load amplitude, namely 100, 250, 500 and 750 mA. For each distribution, 100 stochastic load profiles were generated and the policy was used to service them.
  • Table 2 (columns: load profile; best-of-8 time; best-of-8 sw; DD-Policy time; DD-Policy sw) shows the average value and standard deviation for the system lifetime and the number of switches obtained using the best-of-8 policy at high frequency switching, compared with our policy.
  • the policy of the invention achieved on average 99% efficiency compared with the theoretical upper bound given by the best-of-8 policy executed at very high frequency (recall that this is infeasible in practice).
  • the number of switches invoked by the policy is slightly greater than in the corresponding deterministic solving, but it remains one or two orders of magnitude smaller than the number invoked by the best-of-8 policy and is completely acceptable.
  • the present invention adapts several existing technologies from automated planning to solve a problem that can be seen as an MDP.
  • a form of hindsight optimisation is used to generate samples of determinised load profiles and these problems are solved using an optimal deterministic solver, before combining the solutions to form a policy.
  • the policy construction approach adapts the use of machine learning to construct a classifier.
  • variable-range discretisation is used to solve a non-linear continuous optimisation problem with very high accuracy, while exploring a very small proportion of the state space.
  • This approach is scalable and effective, and incurs much lower mechanical wear than would be required to achieve the theoretical upper bound on lifetime.
  • although the solution is domain-specific in several respects, the components are general. The elements that are most tailored to the problem are the selection of the discretisation range and the search heuristic.
  • the best-of-two policy was restricted to switch at most every five minutes, so that the best-of-two policy and the plan-based policy switched a similar number of times in an entire run.
  • Simulation results suggest that the plan-based policy should switch no more than about 20 times, but the experiments reveal that the noise in the sensor data leads to errors in the estimation of the state of charge which cause the policy to switch more frequently than anticipated. Frequent switching indicates that the policy is responding to spurious artifacts in the sensed data and to the variability in the real behaviour of the
  • the plan-based policy was applied every 36 seconds (0.01 hours), reflecting the granularity of the plans and learned policy.
  • An experiment was conducted in which the best-of-two policy was allowed to switch every 36 seconds, to ensure that the results obtained were not biased by offering the plan-based policy a faster reaction time to changes in the battery state of charge than best-of-two.
  • Figure 13 shows the best-of-two policy running on the second load profile.
  • the curves show the characteristic discharge/recovery pattern, separated by a step separation caused by the internal resistance of the battery (when the battery is recovering its voltage is open circuit, when it is loaded it is then reduced by the internal resistance).
  • the load and voltage curves for the red curve (battery B1) are fuzzy because there is more noise in the readings from these sensors than for the other battery. This phenomenon is consistently a problem for B1 and is not dependent on the battery, but appears to be a feature of the circuit itself.
  • Figure 14 shows the behaviour of the plan-based policy running on the second load profile.
  • the top two curves represent the usage of the two batteries, B1 and B2.
  • Battery B1 (the red curve) is used for the first 10,000 half-seconds, then B2 is briefly used before the policy switches back to B1 until about half way through the run.
  • the two batteries are interleaved, and the rising curves of B1 correspond to the periods in which B2 is in use and B1 is resting.
  • the alternating load is represented by the bottom two curves. It can be seen that when the load changes, the measured voltage changes (the top curve registers a slight blip). This is because of the internal resistance which means that there is a lower voltage loss in the battery when the current changes. This would be expected to be about 34 mV (if the internal resistance is 0.34 Ohms) because the difference in current is 0.1 A.
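The expected size of the voltage step follows from Ohm's law applied to the internal resistance, using the two values quoted above (the symbol names are introduced here only for the worked calculation):

```latex
\Delta V = R_{\text{int}} \,\Delta I
         = 0.34\,\Omega \times 0.1\,\text{A}
         = 0.034\,\text{V}
         = 34\,\text{mV}
```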
  • FIG 15 shows the best-of-two policy and the plan-based policy both being run on the second load profile side-by-side.
  • the red plots are B1 and green are B2.
  • the blue and purple points show where B1/B2 serviced the load (and the value of the load) for best-of-two, while the black points, slightly displaced above these, show where B2 serviced the load under the plan-based policy (B1 serviced the load the rest of the time).
  • the voltage curves for the plan-based policy have been offset from curves for best-of-two so that they can be displayed on the same plot.
  • the labelling on the y-axis has been removed to avoid confusion.
  • the plan-based policy tends to use B1 first and B2 second, although not sequentially.
  • FIG. 16 shows a comparison of the plan-based policy working on the first and second load profiles. The performance of the policy on the first load profile is shown in the upper voltage curves and the upper load curves, while the curves for the second load profile have been displaced to differentiate them.
  • the plot highlights the similarity in the way the policy manages the batteries in each case: the general strategy is to run B1 until it is at the knee, resting it only briefly in this period, then oscillate between B1 and B2 at low frequency for a while, before entering a period in which B1 is rapidly switched with B2 as B1 converges on empty. The policy then finishes off with B2.
  • B1 is faced with heavier loads during the first part of the second profile, so it dies faster than in the first profile.
  • B2 faces a slightly less arduous time during the second half of the second profile and manages to last considerably longer.
  • the load in the interval 30000-33000 was a high load serviced by B2 in the first profile, while the same period happens to be a lower load in the second profile. This is a key reason why B2 dies faster in the first profile: its available charge is depleted in that period and there is no real opportunity to rest it after that point.
  • the final period of load in the first profile is a high load and that kills B2 quickly, while the final period of load in the second profile is a lower one. This allows B2 to recover some of its bound charge over that period, depleting its available charge more slowly and sustaining it a little longer in that critical period.
  • the upper policy execution switches frequently in the window between 41,000 and 43,000 half seconds, just before B1 dies.
  • the plan-based policy includes a default action to switch to the other battery to avoid the currently loaded battery dying prematurely.
  • the reason for this is to protect the batteries and the policy from the effects of errors in the sensor data that propagate into the state of charge model.
  • the effect of the default action in this case is to cause the policy to switch to B2 when B1 is almost out of charge, but back to B1 as soon as it has recovered enough to be able to be loaded once again (according to the state of charge model).
  • Figure 17 shows the policy for the first load profile again, this time plotted with the estimated available charge (based on the voltage readings and the voltage model).
  • the graph shows several important features.
  • the black crosshairs mark the estimated available charge (measured in 0.1 mAh units) for B1 and the grey crosshairs show it for B2.
  • the discontinuities are due to the changing load values. There should be no discontinuity, because the model adjusts for the load (using our estimated internal resistance), but it is clear that there is an additional effect here that cannot be captured this way.
  • the voltage-capacity model seems to be marginally less unstable for lower states of charge (the steps get slightly smaller in these cases for the black curve).
  • the available charge model breaks down in some situations (when the observations cannot be fitted consistently to the initial state assumed for the battery). This leads to some of the available charge values being negative (particularly in the 42000-45000 period). This causes the policy to revert to the default action, but the somewhat simplistic implementation of the default leads to the oscillation between batteries during this period.
  • Figure 18 shows the results obtained by draining the batteries in sequence, using the second load profile. This performance is optimal in terms of switching, but the lifetime achieved is much shorter than that achieved by the plan-based policy and similar to the lifetime of the best-of-2 for this case. The fact that best-of-2 does worse than sequential scheduling for this profile is probably due to variation in the battery behaviour. It seems likely that best-of-2 should perform more similarly to the results in the other load profiles.
  • the plan-based policy of the present invention achieves a consistently longer lifetime than the best-of-two policy, with significantly reduced switching.
  • the results are summarised below: Load Plan-based Policy Best-of-two Sequential
  • the present invention may be embodied as an apparatus (including, for example, a system, machine, device, computer program product, and/or the like), as a method (including, for example, a business process, computer-implemented process, and/or the like), or as any combination of the foregoing.
  • Embodiments of the present invention are described above with reference to flowchart illustrations and/or block diagrams of such methods and apparatuses. It will be understood that blocks of the flowchart illustrations and/or block diagrams, and/or combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-executable program instructions (i.e., computer-executable program code).
  • These computer-executable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a particular machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • a processor may be "configured to" perform a certain function in a variety of ways, including, for example, by having one or more general-purpose circuits perform the function by executing one or more computer-executable program instructions embodied in a computer-readable medium, and/or by having one or more application-specific circuits perform the function.
  • may be stored or embodied in a computer-readable medium to form a computer program product that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block(s). Any combination of one or more computer-readable media/medium may be utilized.
  • a computer-readable storage medium may be any medium that can contain or store data, such as a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the computer-readable medium may be a transitory computer-readable medium or a non-transitory computer-readable medium.
  • a transitory computer-readable medium may be, for example, but not limited to, a propagation signal capable of carrying or otherwise communicating data, such as computer-executable program instructions.
  • a transitory computer-readable medium may include a propagated data signal with computer-executable program instructions embodied therein, for example, in base band or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a transitory computer-readable medium may be any computer-readable medium that can contain, store, communicate, propagate, or transport program code for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied in a transitory computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc.
  • a non-transitory computer-readable medium may be, for example, but not limited to, a tangible electronic, magnetic, optical, electromagnetic, infrared, or semiconductor storage system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the non-transitory computer-readable medium would include, but is not limited to, the following: an electrical device having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • one or more computer-executable program instructions for carrying out operations of the present invention may include object-oriented, scripted, and/or unscripted programming languages, such as, for example, Java, Perl, Smalltalk, C++, SAS, SQL, Python, Objective C, and/or the like.
  • the one or more computer-executable program instructions for carrying out operations of embodiments of the present invention are written in conventional procedural programming languages, such as the "C" programming languages and/or similar programming languages.
  • the computer program instructions may alternatively or additionally be written in one or more multi-paradigm programming languages, such as, for example, F#.
  • the computer-executable program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block(s).
  • computer program implemented steps or acts may be combined with operator or human implemented steps or acts in order to carry out an embodiment of the invention.
  • Embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may generally be referred to herein as a "module", "application", or "system".

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A method for determining a control policy for optimising performance of a system that has a variable input and multiple controllable components that can be switched between multiple different configurations, the method comprising: modelling the system; determining a distribution of possible system inputs; generating input samples based on those inputs; analysing the samples to provide possible solutions, each solution representing a system configuration taking into account one or more system constraints; constructing training data using the possible solutions and learning a control policy from the training data.

Description

Automated Construction of Usage Policies
Field of the Invention
The present invention relates to a system and method for optimising decision-making in a system that has multiple choices of action, for example a multiple battery system, where the choice is between which battery to load next, a traffic control system, where the choice is between alternative roads along which to direct traffic, and a navigation system where the choice is between alternative bearings and speeds.
Background of the Invention
There is a growing number of systems that depend on batteries for power supply, ranging from small mobile devices to very large high-powered devices such as batteries used for local storage in electrical substations. In most of these systems, there are significant user-benefits or engineering reasons to base the supply on multiple batteries with load being switched between batteries by a control system. Due to the physical and chemical properties of batteries, it is possible to extract a greater proportion of the energy stored in a single battery of capacity C than of that stored in n batteries each of capacity C/n, for n > 1. The key to efficient use of multiple batteries lies in the design of effective policies for the management of the switching of load between them, to get as close as possible to this upper bound. Here, the load can be serviced entirely by a single battery at a time, so the problem is distinct from the management of cells within a single battery. An example of a context in which multiple battery use occurs is in laptops equipped with multiple battery bays. More efficient use of multiple batteries can be achieved by exploiting the phenomenon of recovery, which is a consequence of the chemical properties of a battery: as charge is drawn from a battery, the stored charge is released by a chemical reaction, which takes time to replenish the charge. In general, charge is drawn from a battery faster than the reaction can replenish it and this can lead to a battery appearing to become dead when, in fact, it still contains stored charge. By allowing the battery to rest, the reaction can replenish the charge and the battery becomes functional once again. Thus, efficient use of multiple batteries involves carefully timing the use and rest periods. This careful timing can best be achieved using combinatorial problem-solving technology that does not artificially discretise the problem or assume that the load profiles are known.
In the paper "Scheduling battery usage in mobile systems", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 11(6): 1136-1143, 2003, Benini et al present a solution to the multiple battery usage planning problem. To do this, a very accurate battery model is constructed and parameterised to capture lithium-ion, cadmium-nickel and lead-acid battery types, and it is shown how hand constructed policies can achieve efficiency, relative to a single battery, between 70% and 97.5%. The average performance is around 80%. To achieve this, the policy is constructed to select a new battery whenever the voltage of the battery currently servicing a load drops below a certain threshold. The next battery is selected according to one of four alternative policies: V_max: select the battery pack with highest state of charge; V_min: select the battery pack with lowest state of charge; T_max: select the battery pack that has been unused for the longest time; T_min: select the battery that has been unused for the shortest time. The authors show that V_max is the best of these policies, tested on up to four batteries.
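The four selection rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not code from Benini et al; the `Pack` record, its field names and the `select_next` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    # Hypothetical record for one battery pack (field names are assumptions).
    name: str
    state_of_charge: float  # any monotone estimate of remaining charge
    idle_time: float        # time since the pack last serviced the load


def select_next(packs, current, rule):
    """Choose the pack to switch to once the current pack's voltage
    has dropped below the switching threshold."""
    candidates = [p for p in packs if p is not current]
    if rule == "V_max":   # highest state of charge
        return max(candidates, key=lambda p: p.state_of_charge)
    if rule == "V_min":   # lowest state of charge
        return min(candidates, key=lambda p: p.state_of_charge)
    if rule == "T_max":   # unused for the longest time
        return max(candidates, key=lambda p: p.idle_time)
    if rule == "T_min":   # unused for the shortest time
        return min(candidates, key=lambda p: p.idle_time)
    raise ValueError(f"unknown rule: {rule}")
```

With two packs, V_max reduces to the best-of-two policy used as a baseline elsewhere in this description.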
In the paper "Maximizing system lifetime by battery scheduling" by Jongerden et al, (2009 Proceedings of the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009), 63-72), a model checking strategy is described, based on UPPAAL, to schedule battery use given a known load profile. The approach uses the Kinetic Battery Model. This is a non-linear continuous model and the authors treat it by discretisation and planning to a horizon. This allows them to find effective schedules, but it does not scale well because of the need to use a fine-grained discretisation of the temporal dimension. In deployed systems, the standard policies are typically static policies based on rapid switching between available batteries. In fact, an optimal use of multiple batteries can be achieved theoretically by switching between them at extremely high frequency, when the behaviour converges on that of a single battery. Unfortunately, this theoretical solution is not achievable in practice because of the losses in the physical process of switching between batteries, as the frequency increases. Round-robin (which is similar to T_max above) or best-of-n (which is similar to V_max above) policies applied at fixed frequencies are the most commonly fielded solutions, but these often achieve less than 80% efficiency. The efficiency of the use of multiple batteries can be assessed both by the relative lifetime compared with a single battery and by the number of switches required to achieve it.
Summary of the Invention
According to the present invention, there is provided a method for optimising performance of a system comprising modelling the system behaviour; generating sample instances of the system behaviour; solving the sample instances; constructing training data and learning a policy from the training data. Using the invention, measurable improvements can be observed in the performance of the system.
In this context, a sample is defined with reference to a problem, which is itself defined by a set of parameters whose values are determined by a collection of probability distributions. A sample is an instance of the problem where the parameters are chosen according to the underlying probability distributions.
Modelling system behaviour may be done using any suitable model. For a multiple battery system, the KiBaM and a specification of the distributions may be used. Modelling may be done in any language (for example PDDL+) amenable to solution by search-based approaches. The specification of load distributions can be any form that permits sampling.
Samples may be generated using Monte Carlo sampling of the distributions. Alternative sampling strategies are possible, including weighted sampling strategies to encourage particular parts of the solution space to be explored.
To solve instances, the known planner UPMurphi may be used taking the PDDL+ model as input, together with the individual samples (also converted to PDDL+). However, alternative planners could be used. The planner must be capable of solving the instances (so must have a way to handle non-linear continuous dynamics). The planner may use a dynamic (variable) discretise-and-validate cycle. Fixed discretisation is possible (though less efficient). A tailored search-based solver could be used.
Constructing the training data can be done by generating discretised state trajectories using a fixed discretisation. An alternative is to generate training data by perturbation of the solved instances. It is possible that a variable discretisation could be used.
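As a minimal sketch of the fixed-discretisation option, a solved instance can be sampled at regular time points and each sampled state labelled with the battery the plan uses at that instant. The plan representation (a list of half-open `(start, end, battery)` intervals), the `state_at` callback and the function name are assumptions made for illustration only.

```python
def training_examples(plan, state_at, horizon, step):
    """Turn one solved instance into (state, action) pairs.

    plan     -- list of (start, end, battery) intervals, half-open [start, end)
    state_at -- hypothetical callback returning the feature tuple at time t
    horizon  -- end time of the solved instance
    step     -- fixed discretisation step
    """
    examples = []
    n_steps = int(round(horizon / step))
    for k in range(n_steps):
        t = k * step
        # Label: the battery the plan has chosen at time t, if any.
        label = next((b for (s, e, b) in plan if s <= t < e), None)
        if label is not None:
            examples.append((state_at(t), label))
    return examples
```

The resulting pairs can be passed to any classifier learner. A step equal to the plan granularity is one natural choice, but that is a design decision rather than a requirement.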
To learn the policy from the training data, a classifier can be used. More generally, any machine-learning approach could be applied here, including neural networks, support vector machines and so on. In the case of the multiple battery application, measurable performance improvements are achieved in increased battery lifetime and reduced switching between batteries, when compared with other load management techniques.
Brief description of the Drawings
Various aspects of the invention will now be described by way of example only and with reference to the following drawings:
Figure 1 is a flow diagram of a method for constructing a policy for controlling a multi-component system;
Figure 2 is a flow diagram of a method for constructing a policy for controlling a multi-battery system;
Figure 3 is a schematic view of a planner for use in the method of Figure 2;
Figure 4 is a block diagram of a multi-battery system that has a controller for controlling operation of the batteries to optimise use;
Figure 5 is a flow diagram of a method for controlling the system of Figure 4;
Figure 6 is an illustration of the charge states of an individual battery;
Figure 7 is a simulation of a plan search with variable discretisation;
Figure 8 is a simulation of a load test for a system with two batteries of type B2;
Figure 9 shows simulated results for a load test for the same system used for Figure 8, but determined using the V_max technique of the prior art;
Figure 10 is a simulation of a policy for eight batteries with a stochastic load;
Figure 11 is a diagram of an experimental two battery system for conducting comparative tests;
Figure 12 shows a distribution of load durations used for experiments on the system of Figure 11;
Figure 13 is a run showing the behaviour of a best-of-two policy when applied to the two battery system of Figure 11;
Figure 14 is a run showing the behaviour of a plan-based policy in accordance with the invention when applied to the two battery system of Figure 11;
Figure 15 shows the best-of-two and the plan-based policy of the invention both running on the same load profile for the two battery system of Figure 11;
Figure 16 shows two executions of the plan-based policy on different load profiles for the two battery system of Figure 11;
Figure 17 shows the plan-based policy of the invention plotted with estimated available charge, and
Figure 18 shows the sequencing policy showing its shorter lifetime on load profile 2.
Detailed description of the Invention
The present invention provides a method for getting as close as possible to optimal system behaviour, depending on system constraints. This involves modelling the system behaviour; generating sample instances of the system behaviour based on an estimate of system usage; solving the sample instances using the system model; constructing training data and learning a usage policy from the training data, as shown in Figure 1.
The invention will be described in the context of a method for optimising multiple battery performance, but could equally be applied to any system that has a choice of outputs, for example a navigation system in which multiple options exist in terms of direction, and the time taken to reach the destination, or the fuel consumed, is a measurable output. Another example is a traffic light system in which traffic lights can be switched according to a defined policy in order to optimise traffic flow. In this case, reduction in observable congestion is a measurable output.
Figure 2 shows a flow diagram for determining a policy for controlling a multiple battery system to optimise the battery lifetime. This involves determining the distribution of loads. This can be done by determining a probability distribution for each of waiting time between loads, length of load, and size of load - loads may overlap in some systems and not in others. Then sample load profiles are generated from these distributions. These are solved, and the resultant solutions are processed into classification examples. Once this is done, a classifier is applied to learn an optimised policy (a decision tree).
The samples may be solved using any suitable technique, for example using a planner. Figure 3 shows an example of a planner. This takes as input a model of the continuous dynamics of the batteries and a description of a specific problem (i.e. load profile). Using these, the planner constructs a search space to find a high quality solution by expanding the alternative action choices. Two elements make this efficient. The first is a dynamic discretisation: one action choice is to wait (while a load is serviced or until a load arrives), and with a dynamic discretisation the waiting times can be selected from a range of discrete choices of decreasing sizes. The second is the search heuristic used to guide the search through the space of alternatives, which avoids constructing large parts of the reachable space and makes the approach scalable.
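The load sampling step can be illustrated with a short Monte Carlo sketch. It assumes non-overlapping loads and takes the three distributions (waiting time, load length, load size) as caller-supplied functions. The function name, the argument names and the particular distributions in the usage lines are assumptions for illustration, not values taken from this description.

```python
import random


def sample_load_profile(rng, horizon, wait_dist, duration_dist, size_dist):
    """Draw one load profile as a list of (start, end, size) tuples.

    Each *_dist argument is a function taking the random generator and
    returning one sample. Loads are assumed not to overlap.
    """
    loads = []
    t = 0.0
    while True:
        t += wait_dist(rng)              # idle gap before the next load
        duration = duration_dist(rng)
        if t + duration > horizon:       # stop at the planning horizon
            break
        loads.append((t, t + duration, size_dist(rng)))
        t += duration
    return loads


# Example usage with placeholder distributions (assumptions only).
rng = random.Random(0)
profile = sample_load_profile(
    rng,
    horizon=100.0,
    wait_dist=lambda r: r.expovariate(0.5),
    duration_dist=lambda r: r.uniform(1.0, 5.0),
    size_dist=lambda r: r.choice([0.25, 0.5]),
)
```

Each sampled profile is a deterministic problem instance that can then be handed to the planner.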
Once the control policy is determined, it can be used to map states in the system, i.e. level of charge in the batteries and load, to actions, i.e. which battery to switch to. Typically, the control policy is provided in a controller that is software-based, although hardware such as FPGAs may be used. Since the structure of the decision tree is stateless, the controller could be implemented without a memory. The controller may include or be able to communicate with sensors associated with the batteries and the load, so that both can be monitored. Equally, the controller may include or be connected to a switch, for example a simple multiplexing switch, that is operable to select the battery of interest in response to a signal from the controller, as shown in Figure 4.
Figure 5 shows the steps taken by the policy in use. The policy samples the states of the batteries and the load. Based on this, the decision tree is used to identify the battery that has to be used to optimise performance. If necessary, a control signal is generated to cause switching to another battery. This process is repeated at an appropriate frequency. The frequency may be determined by, for example, hardware limitations and/or load profile characteristics - a high frequency load profile demands similarly high frequency responsiveness. However, in most applications a process cycle of around 1 Hz is sufficient.
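One cycle of this loop can be sketched as follows. The decision tree here is a hand-written placeholder with made-up thresholds, standing in for a learned tree. `read_state` and `switch_to` are hypothetical hooks for the sensors and the multiplexing switch.

```python
def toy_policy(state):
    """Placeholder decision tree (thresholds are illustrative assumptions).

    state is a dict with the sensed load and a per-battery charge estimate.
    """
    charge = state["charge"]
    if state["load"] > 0.5:
        # Under heavy load, prefer the battery with more estimated charge.
        return 1 if charge[1] >= charge[2] else 2
    # Under light load, use battery 2 while it has some charge left.
    return 2 if charge[2] > 0.2 else 1


def control_step(policy, read_state, active, switch_to):
    """Sample the state, consult the policy, and switch only if needed."""
    state = read_state()
    chosen = policy(state)
    if chosen != active:
        switch_to(chosen)   # emit the control signal to the switch
    return chosen
```

Because the policy is a pure function of the sampled state, the only value carried between cycles is the identity of the active battery. A caller would invoke `control_step` once per cycle at the chosen frequency.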
There are numerous ways to implement the invention. In one embodiment, to model the behaviour of a multiple battery system, the known Kinetic Battery Model (KiBaM) is used. This is a deterministic non-linear continuous model of battery performance. The KiBaM is described in "Extension of the kinetic battery model for wind/hybrid power systems" by Manwell, J., and McGowan, J. 1994, Proceedings of the 5th European Wind Energy Association Conference (EWEC 1994), 284-289, the contents of which are incorporated herein by reference. This model is based on the assumption that the battery charge is distributed over two wells: the available-charge well and the bound-charge well, as shown in Figure 6. A fraction c of the total charge is stored in the available-charge well and a fraction (1 - c) in the bound-charge well. The available-charge well supplies electrons directly to the load (i(t)), where t denotes the time, whereas the bound-charge well supplies electrons only to the available-charge well. The charge flows from the bound-charge well to the available-charge well through a "valve" with fixed conductance, k. When a load is applied to the battery, the available charge reduces, and the height difference between the two wells grows. When the load is removed, charge flows from the bound-charge well to the available-charge well until the heights are equal again.
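The two-well behaviour can be made concrete with a simple forward-Euler sketch. It assumes the usual KiBaM formulation from the literature, in which the well heights are h1 = available / c and h2 = bound / (1 - c) and the flow through the valve is k (h2 - h1). The function name, the step size and the parameter values in the usage lines are illustrative assumptions, and a fixed Euler step is only a crude stand-in for the continuous model.

```python
def kibam_step(available, bound, current, c, k, dt):
    """Advance the two-well model by one small time step dt.

    available, bound -- charge in the available and bound wells
    current          -- load current i(t) drawn from the available well
    c                -- fraction of total charge held in the available well
    k                -- conductance of the valve between the wells
    """
    h1 = available / c            # height of the available-charge well
    h2 = bound / (1.0 - c)        # height of the bound-charge well
    flow = k * (h2 - h1)          # charge moving from bound to available
    available += (flow - current) * dt
    bound -= flow * dt
    return available, bound


# Discharge for a while, then rest and observe recovery (values are assumptions).
a, b = 0.2, 0.8
for _ in range(10):
    a, b = kibam_step(a, b, current=0.5, c=0.2, k=0.1, dt=0.01)
loaded = a
for _ in range(10):
    a, b = kibam_step(a, b, current=0.0, c=0.2, k=0.1, dt=0.01)
recovered = a  # larger than `loaded`: charge has flowed back while resting
```

During the rest period the total charge is unchanged, and only its split between the two wells moves back towards equal heights.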
The KiBaM lends itself, in principle, to use in an optimisation problem solver that can find the best battery usage plan, given a load profile. However, in most real battery usage problems the load profile is generated by external processes, typically controlled directly or indirectly by user demands. These demands can often be modelled probabilistically, reflecting typical patterns of use. It is assumed that the profiles are drawn from a known distribution. Consequently, the planning problem ceases to be deterministic and becomes a probabilistic planning problem.
The problem can be cast as a hybrid temporal Markov Decision Process, in which the states are characterised by the states of charge of the batteries, the current load and the currently active battery. Battery switching actions are deterministic, but the events that cause load to change are not. The time between events is also governed by a stochastic process, but the timing of switching actions is controllable. More formally, for a problem with n batteries, a state is characterised by the tuple (sb_1, sa_1, sb_2, sa_2, ..., sb_n, sa_n, B, t, L), where sb_i is the bound charge in battery i, sa_i is the available charge in battery i, B is the number of the battery currently servicing load (where B is between 1 and n), t is the time of the state and L is the current load. Out of each state there is a deterministic action, Use B', which causes a transition to the state (sb_1, sa_1, sb_2, sa_2, ..., sb_n, sa_n, B', t, L), in which battery B' is the battery servicing load. There is also a non-deterministic action, wait(T), which causes a transition to a state in which time has advanced to time t', which is less than or equal to (t + T), the state of charge of battery B is updated according to the battery model and the load might be different (according to the probability distribution governing loads). The interpretation of the action is that it advances time to the next event, which will be when a battery is depleted of available charge, or when the load changes, or when T time has passed, whichever is first.
To solve the MDP, a variant of hindsight optimisation is used, in which a deterministic sampled problem is solved using an optimising solver to generate an ideal trajectory for the problem instance. Using a collection of such samples as a base, a classifier is learned that characterises the policy for the part of the space sampled. This approach works well when the policy structure is less complex to represent than the value function for the problem space. To extend the learned classifier to a complete policy involves ensuring that an action is assigned to every possible state. This can be achieved by adding some default behaviour to cover states that are otherwise not handled by the classifier, or else by managing run-time errors in the use of an incomplete policy in a way appropriate to the application.
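The state tuple and the two actions can be written down as a small data structure. The sketch below only encodes the bookkeeping described above: the class and function names are hypothetical, and the times of the next load change and of depletion are passed in as given values rather than computed from a battery model or sampled from the load distribution.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class State:
    charges: Tuple[Tuple[float, float], ...]  # (bound, available) per battery
    active: int                               # B, numbered from 1 to n
    t: float                                  # time of the state
    load: float                               # current load L


def use(state, battery):
    """Deterministic action: only the battery servicing the load changes."""
    return replace(state, active=battery)


def wait_end_time(state, T, next_load_change, depletion_time):
    """Time t' reached by wait(T): the earliest of the three events.

    next_load_change and depletion_time are absolute times and are
    assumptions supplied by the caller.
    """
    return min(state.t + T, next_load_change, depletion_time)
```

In a sampled (deterministic) instance the load change times are known in advance, which is what lets an optimising solver treat wait as an ordinary action.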
More specifically, the MDP is solved using an approach based on a combination of two ideas. Firstly, sampling from the distribution of loads to arrive at a deterministic problem, which is then solved using the continuous KiBaM as the battery model. This leads to a continuous non-linear optimisation problem, which is solved using a "discretise and validate" approach. Secondly, the solutions to the sample problem instances are combined to arrive at a policy for the MDP from which the problems are drawn. The approach is domain-specific in some respects: the discretisation scheme, while based on general principles, is selected for the problem domain and load distribution. A search heuristic is used that, while generic, is not suited to all problems. The aggregation of solutions into a policy makes use of an entirely general approach, but the extent to which the approach yields good policies depends on the nature of the problem space in which it is applied.
The multiple battery management problem can be considered an optimisation problem, when faced with a known and deterministic load profile. To solve this, dynamics of KiBaM are identified. This can be done using PDDL+ ( M. Fox and D. Long, "Modelling mixed discrete-continuous domains for planning" in Journal of Artificial Intelligence Research vol. 27, pages 235-297, 2006, the contents of which are incorporated herein by reference), which is an extension of the standard planning domain modelling language, PDDL, to capture continuous processes and events. There are two battery processes: consume and recover. These govern the behaviour of cells and the event triggered by attempting to load a cell once its available charge is exhausted. In addition, there is a durative action of variable duration that allows the planner to use a cell over an interval. This action can be applied at any time and terminated after any positive duration, granting a planner complete freedom over the scheduling of cells in the battery. A specific problem instance lists the available cells and their characteristics, together with a collection of loads as timed initial literals. The goal is to service all the loads and this is expressed by ensuring that each active cell increments the number of services currently available. An event is triggered if there is ever a positive load and no active service. This mechanism allows the planner to switch cells during a single load period. The use of PDDL+ as the modelling language grants several benefits. Firstly, it allows the use of VAL (R. Howey, D. Long and M. Fox, "VAL: Automatic plan validation, continuous effects and mixed initiative planning using PDDL" in Proc. 
of International Conference on Tools with AI (ICTAI), 294-301, 2004, the contents of which are incorporated herein by reference) to validate solutions analytically against the continuous model, providing a means to confirm that the discretisation used during construction of solutions does not compromise the correctness of the plan. Secondly, it provides a semantics for the model in terms of a timed hybrid automaton (following from M. Fox and D. Long, "Modelling mixed discrete-continuous domains for planning" in Journal of Artificial Intelligence Research vol. 27, pages 235-297, 2006, the contents of which are incorporated herein by reference). Finally, existing tools that construct and search in spaces defined by PDDL+ models can be used, such as UPMurphi (G. Della Penna, B. Intrigila, D. Magazzeni and F. Mercorio, "UPMurphi: A tool for universal planning on PDDL+ problems" in Proc. of 19th International Conference on Automated Planning and Scheduling (ICAPS), 106-113, 2009, the contents of which are incorporated herein by reference). UPMurphi uses a discretise and validate approach to construct and search state spaces for PDDL+ problems. In the discretise and validate approach the continuous dynamics of the problem are relaxed into a discretised model, where discrete time steps and corresponding step functions for resource values are used in place of the original continuous dynamics. This relaxed problem is solved using a forward reachability analysis and then solutions are validated against the continuous model using the validator, VAL (R. Howey, D. Long and M. Fox, "VAL: Automatic plan validation, continuous effects and mixed initiative planning using PDDL" in Proc. of International Conference on Tools with AI (ICTAI), 294-301, 2004, the contents of which are incorporated herein by reference). The validation process is used to identify whether a finer discretisation is required and to guide remodelling of the relaxed problem.
VAL provides analytic solutions to differential equations involved in the models.
Planning-as-model-checking is a well-researched theme (A. Cimatti, F. Giunchiglia, E. Giunchiglia and P. Traverso, "Planning via Model Checking: A Decision Procedure for AR" in Proc. of 4th European Conference on Planning (ECP), 1997, the contents of which are incorporated herein by reference; S. Edelkamp, "Taming numbers and durations in the model checking integrated planning system" in Journal of Artificial Intelligence Research Special issue on the 3rd International Planning Competition, volume 20, 2002, the contents of which are incorporated herein by reference). The key to this approach is the observation that a proof of state reachability corresponds to a plan. Unlike classical model checking, where the problem usually involves exploring all paths to make guarantees about behaviour, in planning it is only necessary to find a single trajectory to prove plan existence. The formal statement of the deterministic version of the problem is as follows: a hybrid system is a system whose state description involves continuous as well as discrete variables. In order to apply model checking algorithms and exploit reachability analysis, the system should have a finite number of states. Therefore the system is approximated by discretising the continuous components of the state (which are assumed to be bounded) and their dynamic behaviours.
Definition 1: A Finite State Temporal System (FSTS) $\mathcal{S}$ is a 5-tuple $(S, s_0, A, D, F)$, where: $S$ is a finite set of states, $s_0 \in S$ is the initial state, $A$ is a finite set of actions, $D$ is a finite set of durations and $F : S \times A \times D \times S \to \{0, 1\}$ is the transition function, i.e. $F(s, a, d, s') = 1$ iff the system can reach state $s'$ from state $s$ via action $a$ having a duration $d$. The state $s'$ s.t. $F(s, a, d, s') = 1$ is denoted by $F(s, a, d)$. For each state $s \in S$, the set $EnAct(s) = \{a \in A \mid \exists d \in D : F(s, a, d) \in S\}$ is defined as the set of all the actions enabled at state $s$.
In an FSTS, each state $s \in S$ is assumed to contain a special temporal variable $t$ denoting the time elapsed in the current path from the initial state to $s$. In the following, the notation $t(s)$ is used for the value of variable $t$ in state $s$. For all $s_i, s_j \in S$ such that $F(s_i, a, d, s_j) = 1$, $t(s_j) = t(s_i) + d$.
Definition 2: A trajectory in the FSTS $\mathcal{S} = (S, s_0, A, D, F)$ is a sequence $\pi = s_0 a_0 d_0 s_1 a_1 d_1 s_2 a_2 d_2 \ldots s_n$ where, $\forall i \ge 0$, $s_i \in S$ is a state, $a_i \in A$ is an action, $d_i \in D$ is a duration, and $F(s_i, a_i, d_i, s_{i+1}) = 1$. If $\pi$ is a trajectory, $\pi_s(k)$, $\pi_a(k)$ and $\pi_d(k)$ denote the state $s_k$, the action $a_k$ and the duration $d_k$, respectively. Finally, the length of $\pi$ is denoted by $|\pi|$, given by the number of actions in the trajectory, and the duration of $\pi$ by $\tilde{\pi}$, i.e. $\tilde{\pi} = \sum_{i=0}^{|\pi|-1} \pi_d(i)$.
To define the planning problem for such a system, it is assumed that a set of goal states $G \subseteq S$ has been specified. Moreover, to have a finite state system, a finite temporal horizon, $T$, is fixed and a plan is needed to reach the goal within time $T$. In the case of the battery usage planning problem, this horizon is important because it represents the target duration for the service provided by the battery.
Definition 3: (Planning Problem on FSTS) Let $\mathcal{S} = (S, s_0, A, D, F)$ be an FSTS. Then, a planning problem (PP) is a triple $P = (\mathcal{S}, G, T)$ where $G \subseteq S$ is the set of the goal states and $T$ is the finite temporal horizon. A solution for $P$ is a trajectory $\pi^*$ in $\mathcal{S}$ s.t. $|\pi^*| = n$, $\tilde{\pi}^* \le T$, $\pi^*_s(0) = s_0$ and $\pi^*_s(n) \in G$.
The constraints added to the temporal planning problem are parameterised and can be iteratively relaxed in order to explore successively larger spaces for plans. A finite collection of possible durations for segments of processes (definition 2) is used. This set can be refined by the addition of smaller durations if successive searches fail to find a solution. Allowing different durations within the same search enables the model-checker to construct states that interact with executing processes at different time points, while stepping quickly along the timeline where there are no interesting features. To use variable discretisation efficiently, search space redundancy is eliminated by disallowing the use of long duration actions immediately following short duration actions.
Long duration actions can only be used if an event or other action has intervened since the last short action in the family. The repeated consecutive use of short duration actions is disallowed beyond the accumulated duration of the next longer duration action. The longest duration action can be repeated arbitrarily often. Provided an appropriate family of durations is used, this allows the space of reachable states to remain equivalent to the relevant subset of the reachable states using uniform discretisation with the smallest duration.
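As an illustration of Definitions 1 to 3, the following minimal Python sketch stores the transition function as a partial map and checks whether a candidate plan is a solution. The class name `FSTS`, the helper `is_solution` and the three-state toy system are hypothetical and are not taken from UPMurphi or from the described implementation.

```python
from dataclasses import dataclass


@dataclass
class FSTS:
    """Finite State Temporal System (S, s0, A, D, F) in the sense of Definition 1."""
    states: set
    s0: object
    actions: set
    durations: set
    # F is stored as a partial map (s, a, d) -> s'; a missing key means F = 0.
    transitions: dict

    def step(self, s, a, d):
        """Return F(s, a, d), or None when the transition is not allowed."""
        return self.transitions.get((s, a, d))

    def enabled_actions(self, s):
        """EnAct(s): actions for which some duration leads to a state."""
        return {a for a in self.actions
                if any(self.step(s, a, d) is not None for d in self.durations)}


def is_solution(system, goals, horizon, plan):
    """Check Definition 3 for a plan given as a list of (action, duration) pairs."""
    state, elapsed = system.s0, 0.0
    for action, duration in plan:
        state = system.step(state, action, duration)
        if state is None:          # invalid transition: not a trajectory
            return False
        elapsed += duration        # accumulates the trajectory duration
    return elapsed <= horizon and state in goals


# Hypothetical toy system: wait for one minute, then use cell 1 for half a minute.
toy = FSTS(
    states={"s0", "s1", "s2"},
    s0="s0",
    actions={"wait", "useC1"},
    durations={0.5, 1.0},
    transitions={("s0", "wait", 1.0): "s1", ("s1", "useC1", 0.5): "s2"},
)
```

With this toy system, `is_solution(toy, {"s2"}, 1.5, [("wait", 1.0), ("useC1", 0.5)])` holds, while the same plan fails for a horizon of one minute.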
The battery domain has an important property that supports a simple heuristic evaluation function for states: the charge in the battery monotonically decreases over time and the optimal solution is the one that gives the longest possible plan. An upper bound on the duration of the solution can be found using the observation that the optimal duration cannot exceed that of a single battery with combined capacity equal to the sum of the capacities of the multiple batteries (assuming the same discharging and flow behaviours).
Once there is a horizon, a discretised search space is constructed and searched. To make this approach practical, an informed heuristic is needed to search the space. For this domain, the duration of the plan to the current state plus the total remaining charge is admissible but uninformative, whereas the duration plus the total available charge is highly informative. This is equivalent to minimising the total bound charge. A best-first enumeration of the reachable states is used to efficiently explore the reachable space. This heuristic is suitable for a class of domains: in any domain where there is a monotonically decreasing resource and the longest plan is required, a heuristic that sums plan duration with available resource will be informative. Such a heuristic is admissible if the resource units can be spent to extend the total execution time at a fixed and constant rate at any given moment during the execution of a plan. The range of differently sized duration intervals can lead to significant benefits in the size of the set of visited nodes in the search space, compared with using a fixed duration increment.
Consider the load profile shown at the top of Figure 7. The planning problem for two cells is defined according to definitions 1 and 3, with $G = \{s \in S \mid t(s) = 2.42\}$, i.e. the goal is to service the whole load profile. The temporal horizon $T$ is set to the duration of the profile as well. The definition of the FSTS is straightforward: the set of actions is A = {useC1, useC2, wait}, where the former two actions refer to the cell being used while the latter is applicable when there is no active service. The set of durations used for this example is D = {0.01, 0.4, 0.5, 1.0} (measured in minutes). In practice, the set of durations is a systematically scaled set, with an initial collection determined by a manual selection based on the known properties of the load profile. A small duration is included in order to handle very sensitive interactions. Should this set prove inadequate, iterative refinement of the discretisation adds successively smaller durations to the set. In the initial state s0 there is no load and no active service and both cells have a limited initial capacity.
In the example shown in Figure 7, the plan search with variable discretisation proceeds as follows:
1. No cell is used for a period of 1 minute (when the load is idle). The corresponding transition is shown in Figure 7.
2. After one minute a load is applied and cell 1 is used. This corresponds to transition <s1, useC1, 1.0, s2>. However, due to their limited capacity, cells cannot be used continuously for one minute (these loads are very high for cells of this capacity). The transition is thus not valid and a shorter duration has to be considered.
3. Cell 1 is used for 0.5 minute. Then, since a load is still applied, the second cell is used. As before, the transition <s2, useC2, 1.0, s3> can be considered, but in this case there would be an active service and no load.
4. Cell 2 is used for 0.5 minute. In the next period no load is applied, so no cell is used. The transition <s3, wait, 0.5, s4> is considered, but it would lead to a positive load and no active service, so the duration of action wait has to be reduced to 0.4.
5. To service the last load period of 0.02 minute, cell 1 could be used. However, the remaining charge in cell 1 allows it to service only 0.01 minute. So, finally, cell 2 is used until the end of the load profile.
The validity of a transition is dynamically checked during the search, since invalid transitions trigger specific events (e.g. event cellDead is triggered at step 2 and event disaster is triggered at step 4) which, in turn, violate the invariant conditions of the corresponding actions (a cell must not die during use). Moreover, with variable discretisation only six states have to be visited in order to reach the goal, whereas with a uniform discretisation it is necessary to explore at least 242 states. A further benefit of using differently sized durations in the discretisation is that favouring longer durations reduces the number of switches in the solutions generated, leading to solutions that are better in practical terms than those based on high frequency switching between batteries.
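The figure of 242 states for uniform discretisation follows from the length of the load profile and the smallest duration in D. The short Python check below assumes the profile implied by the walkthrough (one minute idle, one minute of load, 0.4 minute idle, 0.02 minute of load); the segment list is read off the steps above rather than taken from Figure 7 itself.

```python
from fractions import Fraction

# Load profile implied by the walkthrough: (label, duration in minutes).
# Durations are kept as decimal strings so that Fraction is exact.
profile = [("idle", "1.0"), ("load", "1.0"), ("idle", "0.4"), ("load", "0.02")]
total = sum(Fraction(duration) for _, duration in profile)

# With a uniform discretisation every step has the smallest duration in D.
smallest_step = Fraction("0.01")
uniform_steps = total / smallest_step

print(float(total))        # 2.42, the goal time t(s) and the horizon T
print(int(uniform_steps))  # 242 steps of 0.01 minute along the timeline
```

Exact rational arithmetic is used here only to avoid floating-point rounding when dividing by 0.01.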
A first set of experimental results shows the performance of the solver on the deterministic battery usage optimisation problem. The case study used is the same as that proposed in M. Jongerden, B. Haverkort and J-P. Katoen, "Maximizing system lifetime by battery scheduling" in Proc. of 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009), 68-72, 2009, the contents of which are incorporated herein by reference, where two types of jobs are considered, a low current job (250 mA) and a high current job (500 mA), according to the following load profiles:
• continuous loads: one load with only low current jobs (CL_250), one with only high current jobs (CL_500) and one alternating between a low current job and a high current job (CL_alt);
• intermittent loads with short idle periods of one minute between the jobs: one with only low current jobs (ILs_250), one with only high current jobs (ILs_500), and one alternating between a low current job and a high current job (ILs_alt);
• intermittent loads with long idle periods of two minutes between the jobs: one with only low current jobs (ILl_250) and one with only high current jobs (ILl_500).
As a first step, these load profiles were used to validate the variable-range discretisation KiBaM model (DD-KiBaM), and to find an appropriate discretisation for the continuous variables involved in the system dynamics (i.e. variables γ and δ and process durations). To do this, VAL was used to validate solutions for the discretised model against the continuous model, using single cell batteries. Two battery types were considered, one with capacity 5.5 Ahr (B1) and one with capacity 11 Ahr (B2). Both battery types have the same parameters: c = 0.166 and k' = 0.122 h⁻¹. With γ and δ discretised by rounding them to 0.0001, the lifetimes obtained for all the load profiles above and for both battery types were the same as those computed with the original KiBaM and validated in (M. Jongerden and B. Haverkort, "Battery modeling", Technical Report TR-CTIT-08-01, Centre for Telematics and Information Technology, University of Twente, the contents of which are incorporated herein by reference).
To generate the scheduling plans for multicell batteries, the approach described above was used with the set of durations D = {0.01, 0.02, 0.05, 0.1, 0.25, 0.5, 1.0}. The solutions were compared to those obtained using the UPPAAL-based approach. The resulting lifetimes are shown in Table 1:
| load profile | best-of-two lifetime, B1 | best-of-two lifetime, B2 | UPPAAL-KiBaM lifetime, B1 | UPPAAL-KiBaM lifetime, B2 | DHA-KiBaM lifetime (visited states), B1 | DHA-KiBaM lifetime (visited states), B2 | 8 batteries, best-of-8: lifetime (number of switches) | 8 batteries, DHA: lifetime (number of switches) | 8 batteries, OnLine: lifetime (number of switches) |
| CL_250 | 12.16 | 46.92 | 12.04 | N/A | 12.14 (194) | 46.91 (691) | 252.38 (17,227) | 239.77 (442) | 239.77 (787) |
| CL_500 | 4.59 | 12.16 | 4.58 | N/A | 4.59 (116) | 12.14 (194) | 263.17 (16,640) | 250.02 (424) | 250.02 (852) |
| CL_alt | 7.03 | 21.26 | 6.48 | N/A | 7.03 (136) | 21.2 (350) | 257.29 (17,410) | 244.43 (406) | 244.43 (855) |
| ILs_250 | 44.79 | 132.76 | 40.80 | N/A | 44.76 (552) | 132.73 (1068) | 256.4 (17,105) | 243.58 (387) | 243.58 (730) |
| ILs_500 | 10.82 | 44.79 | 10.48 | N/A | 10.8 (131) | 44.76 (552) | 264.42 (17,374) | 251.20 (422) | 251.20 (931) |
| ILs_alt | 16.95 | 72.75 | 16.91 | N/A | 16.92 (159) | 72.55 (599) | 265.95 (17,272) | 252.66 (389) | 252.66 (882) |
| ILl_250 | 84.91 | 216.91 | 78.96 | N/A | 84.88 (488) | 216.88 (1123) | 266.8 (17,233) | 253.46 (401) | 253.46 (853) |
| ILl_500 | 21.86 | 84.91 | 18.68 | N/A | 21.85 (173) | 84.88 (488) | 265.41 (17,149) | 252.14 (433) | 252.14 (700) |
Table 1: System lifetime (in minutes) for all load profiles according to different battery usages
Here the second column shows the theoretical upper bound given by extremely high-frequency switching. In all load profiles considered, the approach of the invention significantly outperformed the UPPAAL-based one, providing solutions that achieve more than 99% efficiency compared with the theoretical limit.
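The efficiency claim can be checked directly from Table 1. The sketch below, given only as an illustration, divides the planner lifetime by the best-of-two upper bound for the type B2 cells; the dictionary layout and the profile labels used as keys are choices made for this example.

```python
# (best-of-two upper bound, planner lifetime) in minutes for type B2 cells,
# copied from Table 1; the keys are shorthand labels for the eight load profiles.
table1_b2 = {
    "CL_250": (46.92, 46.91),
    "CL_500": (12.16, 12.14),
    "CL_alt": (21.26, 21.2),
    "ILs_250": (132.76, 132.73),
    "ILs_500": (44.79, 44.76),
    "ILs_alt": (72.75, 72.55),
    "ILl_250": (216.91, 216.88),
    "ILl_500": (84.91, 84.88),
}

# Efficiency is the achieved lifetime as a fraction of the theoretical limit.
efficiency = {name: plan / bound for name, (bound, plan) in table1_b2.items()}
for name, ratio in efficiency.items():
    print(f"{name}: {100 * ratio:.2f}%")

# Profile on which the planner is furthest from the bound.
worst = min(efficiency, key=efficiency.get)
```

Every ratio computed this way is above 0.99, with the alternating continuous load being the least favourable case for these cells.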
By using variable discretisation, it is possible to consider a much finer discretisation for variables than is used by Jongerden et al in "Maximizing system lifetime by battery scheduling" in Proc. of 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009), 68-72, 2009, the contents of which are incorporated herein by reference, and to handle very sensitive interactions. This is important, particularly when the available charge in the cells is almost exhausted. Jongerden et al (2009) describe their plans as optimal, but it is important to note that this is only with respect to the discretisation that they use; a finer-grained discretisation offers the opportunity for a higher quality solution to be found at the cost of a much larger state space. Despite the very large state space the model creates, the solver visits a very small collection of states. These problems are all solved in less than a second. Moreover, when dealing with larger batteries of type B2, the state space becomes so large that any exhaustive approach is infeasible. High quality solutions for batteries of type B2 were found (an example is shown in Figure 8 compared with the V_max solution, which is shown in Figure 9, illustrating the huge improvement that can be obtained over this policy). Results of performance on an 8 battery system (see Figure 10) show that the method of the invention can scale effectively to much larger problems. The number of switches used to produce the results is very significantly smaller than for the best-of-8 policy. However, the resulting solutions achieve more than 99% efficiency. The final column, labelled DDPolicy, shows the performance of the policies applied to these load profiles. These generate slightly worse performance in switches (although still much better than competing approaches), but maintain the lifetime performance. This will be described in more detail later.
Having shown how to generate high quality plans for deterministic multiple battery management problems, a battery switching policy can be determined. In general, the load profile applied to the batteries is not known in advance, but it is assumed that a probability distribution characterising typical use of the batteries is available. The policy is a function that determines which battery to use when load must be serviced, using the current states of charge of the available batteries as the basis for making the decision. One way to approach this problem is to see the mapping as a classification, where the state of the batteries is mapped to a class corresponding to the correct choice of battery. The solutions to the determinised problems can be used as the basis of a classifier construction problem. An existing machine learning approach can be used to build a good classifier.
The successful construction of a classifier depends on there being exploitable structure in the space defined by the solutions to the determinised problems. The states are described by continuous variables: these are discretised for the purpose of building the classifier. The solution generally does not cover the whole space of reachable states, so it is important that the policy is completed with a sensible default rule when the input state is too distant from any of the previously encountered states. The best default rule is a best-of-n rule, which is the best of the published hand-constructed policies for this problem. Deployment of the final policies requires that they can be efficiently implemented in cheap hardware. Simple classifier rule systems can be very effectively implemented in look-up tables, which are ideal for implementation on Field Programmable Gate Arrays (FPGAs) or as purpose-built hardware.
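A look-up table policy with a default rule might be sketched as follows. This is a simplified illustration only: the state is reduced to the available charge of each battery, the distance measure and the parameters `bin_width` and `max_distance` are hypothetical, and best-of-n is assumed here to mean choosing the battery with the most available charge.

```python
def best_of_n(charges):
    """Default rule (assumed form): index of the battery with most available charge."""
    return max(range(len(charges)), key=lambda i: charges[i])


def make_policy(table, bin_width, max_distance):
    """Build a policy from a table mapping discretised states to battery indices."""
    def distance(a, b):
        # Largest per-battery difference, measured in discretisation bins.
        return max(abs(x - y) for x, y in zip(a, b))

    def policy(charges):
        key = tuple(int(c // bin_width) for c in charges)
        # Nearest previously encountered state, if the table is not empty.
        nearest = min(table, key=lambda k: distance(k, key), default=None)
        if nearest is not None and distance(nearest, key) <= max_distance:
            return table[nearest]
        # Input state too distant from any stored state: use the default rule.
        return best_of_n(charges)

    return policy


# Hypothetical table with a single learned entry for a two-battery system.
policy = make_policy({(5, 1): 1}, bin_width=10.0, max_distance=1)
```

In this toy setting `policy([55.0, 12.0])` returns the learned choice 1, whereas `policy([95.0, 12.0])` is too far from the stored state and falls back to battery 0, which holds more charge.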
WEKA (M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I.H. Witten, "The WEKA data mining software: An update" in SIGKDD Explorations volume 11(1), 2009, the contents of which are incorporated herein by reference) is a machine learning framework, developed at the University of Waikato, that provides a set of classification and clustering algorithms for data-mining tasks. WEKA takes as input a training set, comprising a list of instances sharing a set of attributes. In order to perform the classification on the battery usage problem data, instances of the following form are considered: $\tau = (\sigma_1, \gamma_1, \ldots, \sigma_N, \gamma_N, B, L)$ where $\sigma_n$ and $\gamma_n$ denote the available charge and total charge of the $n$th battery, respectively, $B$ is the currently active battery and $L$ is the current load. In this setting, the attribute used as the class is the battery $B$.
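The instance layout can be illustrated with a small Python helper that flattens one decision point of a plan into such a tuple. The function name `make_instance` and its argument order are hypothetical; the actual training files were prepared for WEKA, whose input format is not reproduced here.

```python
def make_instance(available, total, active_battery, load):
    """Flatten one decision point into (σ1, γ1, ..., σN, γN, B, L)."""
    if len(available) != len(total):
        raise ValueError("need one available and one total charge per battery")
    # Interleave available and total charge battery by battery.
    features = [value for pair in zip(available, total) for value in pair]
    # The class attribute B and the current load L close the tuple.
    return (*features, active_battery, load)


# Two-battery example: battery 2 is active under a 250 mA load.
example = make_instance([1.0, 2.0], [5.0, 6.0], 2, 250.0)
```

For an eight battery system each instance therefore carries 16 charge values plus the active battery and the load.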
The stochastic load profiles have been defined with a distribution of:
• the load amplitude $l \in [100, 750]$ mA
• the load/idle period duration $d \in [0.1, 5]$ min
• the load frequency $f \in [0.3, 0.7]$.
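A generator for such profiles could look like the sketch below. The text gives only the ranges, so two assumptions are made and should be read as such: each quantity is drawn uniformly from its range, and the load frequency f is interpreted as the probability that a given period carries a load rather than being idle.

```python
import random


def sample_load_profile(total_minutes, rng):
    """Return a list of (amplitude_mA, duration_min); amplitude 0.0 means idle."""
    # Assumption: f is drawn once per profile and used as a per-period probability.
    f = rng.uniform(0.3, 0.7)
    profile, elapsed = [], 0.0
    while elapsed < total_minutes:
        duration = rng.uniform(0.1, 5.0)            # load/idle period duration
        if rng.random() < f:
            amplitude = rng.uniform(100.0, 750.0)   # load amplitude in mA
        else:
            amplitude = 0.0                         # idle period
        profile.append((amplitude, duration))
        elapsed += duration
    return profile


# Seeded generator so that the same profile can be regenerated.
profile = sample_load_profile(60.0, random.Random(0))
```

Each sampled profile can then be treated as a known, deterministic load for the purposes of planning.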
This leads to load profiles that are very irregular (see the bottom of Figure 9) and therefore harder to handle than the very regular profiles considered by Jongerden et al (2009). A set of stochastic load profiles was generated and for each a near-optimal policy was created using the deterministic solving described above. The resultant set of policies was used as the training set for the classification process. In order to select the most suitable classification algorithm, all the classifiers provided by WEKA were applied to a data set of 10,000 training examples. Their performance was evaluated as the number of correctly classified instances during the cross-validation. Classifiers providing less than 70% correctness were discarded. The memory and the time required to use the classifier were then considered. The output of the classification process is a model encoding the resulting decision tree. In some cases, the generated model requires significant memory to store (more than 500Mb of RAM memory), or it is too slow to be used. These parameters have also been used to determine the number of training examples to classify, as the bigger the training set, the better the performance and the higher the memory and time requirements.
According to these criteria, the J48 classifier, which implements the machine learning algorithm C4.5, was selected. The output is a decision tree whose leaves represent, in our case study, the battery to be used. This classifier proved very well suited to this task, as 99% of instances were correctly classified during the cross-validation performed by WEKA. For the cardinality of the training set, an empirical evaluation showed that the best result is obtained using 250,000 training examples (note that this involves considering about 4·10^6 real values), since further extending the training set does not make any significant improvement in the performance but increases memory and time requirements.
In order to use the decision tree, the WEKA classes for loading the classification model were embedded into a battery simulation framework. The model for the 8 battery case is represented by a tree with 61 levels and consists of 7645 nodes. Applying the decision tree to determine which battery to load at each decision point takes negligible time. To evaluate the performance of the policy, four probability distributions were considered with different average values for the load amplitude, namely 100, 250, 500 and 750 mA. For each distribution 100 stochastic load profiles were generated and the policy was used to service them.
| load profile | best-of-8 time | best-of-8 sw | DD-Policy time | DD-Policy sw |
| R100 | 792.6 (15.5) | 71383 (1379) | 786.2 (15.4) | 1667 (16 ) |
| R250 | 369.8 ( S1) | 28952 (853) | 366.7 (202) | 1518 (H3) |
| R500 | 226.7 (2.13) | 14671 (512) | 224.6 (2.27) | 987 (122) |
| R750 | 188.3 (0.8) | 11519 (463) | 186.4 (0.T) | 302 (33) |
Table 2: Average system lifetime and number of switches for stochastic load profiles for eight battery systems.
Table 2 shows the average value and standard deviation for the system lifetime and the number of switches obtained using the best-of-8 policy at high frequency switching, compared with our policy. The policy of the invention achieved on average 99% efficiency compared with the theoretical upper bound given by the best-of-8 policy executed at very high frequency (recall that this is infeasible in practice). The number of switches invoked by the policy is slightly greater than in the corresponding deterministic solving, but it remains one or two orders of magnitude smaller than the number invoked by the best-of-8 policy and is completely acceptable.
The present invention adapts several existing technologies from automated planning to solve a problem that can be seen as an MDP. A form of hindsight optimisation is used to generate samples of determinised load profiles and these problems are solved using an optimal deterministic solver, before combining the solutions to form a policy. The policy construction approach adapts the use of machine learning to construct a classifier. In the construction of high quality solutions to deterministic problems, variable-range discretisation is used to solve a non-linear continuous optimisation problem with very high accuracy, while exploring a very small proportion of the state space. This approach is scalable and effective, and incurs much lower mechanical wear than would be required to achieve the theoretical upper bound on lifetime. Although the solution is domain-specific in several respects, the components are general. The elements that are most tailored to the problem are the selection of the discretisation range and the search heuristic.
To test the real life effectiveness of the plan based policy of the present invention, three sets of experiments were done on an apparatus consisting of two Ritar 6V batteries connected to a circuit, as shown in Figure 11. An Arduino microprocessor controller was mounted on the circuit to allow access to the input and output pins. The Arduino is a standard electronic prototyping platform well known in the art. Each of the two-battery experiments took over eleven hours to drain the batteries. In the case of a two-battery setup, sequencing involves only one switch (the minimum number of switches possible in the two battery case). Thirteen experiments were run in total. In all of the plots showing battery voltages during these experiments, the last, lowest point on each battery voltage curve (the red and green curves) is the point at which the corresponding battery died. When performing the experiments it was noticed that the Arduino distorts all measured values: time and voltages, and therefore amps and internal resistance. Its distortions appear consistent across all experiments, resulting in systematic error. In particular, all of the times measured suggest that the Arduino measures 1 hour every 1.4 hours of real time, so a 7 or 8 hour lifetime measured by the Arduino is actually approximately 10 to 11 hours of real time. All data values come directly from the Arduino measurements, unadjusted for the systematic errors. Hence, the lifetime values are likely to be considerably longer when measured in "real" time.
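The time correction implied by this observation is a simple scaling, sketched below in Python. The constant and function names are illustrative only; no such correction was applied to the reported data.

```python
# Real hours that elapse for each hour reported by the Arduino (from the text).
ARDUINO_TIME_SCALE = 1.4


def real_hours(measured_hours):
    """Convert a lifetime measured by the Arduino into approximate real time."""
    return measured_hours * ARDUINO_TIME_SCALE


for measured in (7, 8):
    print(measured, "h measured is about", round(real_hours(measured), 1), "h real")
```

A measured lifetime of 7 to 8 hours thus corresponds to roughly 9.8 to 11.2 hours of real time, consistent with the approximate figures given above.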
Six different load profiles were randomly drawn from the same distribution used to train the policy, each alternating between 0.2 and 0.3 Amps and having intervals of constant load whose durations are distributed around 30 minutes, with a distribution as shown in Figure 12. For each load profile, a best-of-two policy and the plan-based policy of the invention were run so that a direct comparison of the lifetime achieved could be made.
For the first two load profiles, the best-of-two policy was restricted to switch at most every five minutes, so that the best-of-two policy and the plan-based policy switched a similar number of times in an entire run. Simulation results suggest that the plan-based policy should switch no more than about 20 times, but the experiments reveal that the noise in the sensor data leads to errors in the estimation of the state of charge which cause the policy to switch more frequently than anticipated. Frequent switching indicates that the policy is responding to spurious artifacts in the sensed data and to the variability in the real behaviour of the batteries.
The plan-based policy was applied every 36 seconds (0.01 hours), reflecting the granularity of the plans and learned policy. An experiment was conducted in which the best-of-two policy was allowed to switch every 36 seconds, to ensure that the results obtained were not biased by offering the plan-based policy a faster reaction time, to changes in the battery state of charge, than best-of-two.
Figure 13 shows the best-of-two policy running on the second load profile. The curves show the characteristic discharge/recovery pattern, separated by a step caused by the internal resistance of the battery (when the battery is recovering its voltage is open circuit; when it is loaded the voltage is reduced by the internal resistance). The load and voltage curves for the red curve (battery B1) are fuzzy because there is more noise in the readings from these sensors than for the other battery. This phenomenon is consistently a problem for B1 and is not dependent on the battery, but appears to be a feature of the circuit itself. The strange striations for the green (B2) curve at the start of the graph are due to a failure of the Arduino to correctly capture the battery voltage over this period, but this does not affect the performance of the policy (simple fail safes are in place to ensure that spurious data of this sort do not affect performance). Figure 14 shows the behaviour of the plan-based policy running on the second load profile. The top two curves represent the usage of the two batteries, B1 and B2. Battery B1 (the red curve) is used for the first 10,000 half-seconds, then B2 is briefly used before the policy switches back to B1 until about half way through the run. In the second half of the graph, the two batteries are interleaved, and the rising curves of B1 correspond to the periods in which B2 is in use and B1 is resting. The alternating load is represented by the bottom two curves. It can be seen that when the load changes, the measured voltage changes (the top curve registers a slight blip). This is because of the internal resistance, which means that the voltage loss in the battery changes when the current changes. This would be expected to be about 34 mV (if the internal resistance is 0.34 Ohms) because the difference in current is 0.1 A.
It is actually higher than that, but this appears to be because there is a slight over-reaction to changes in the load, causing the battery voltage to drop sharply when the battery is first loaded, and then pull back, while the battery tends to recover sharply, and then fall back in line, when its load is reduced. Figure 15 shows the best-of-two policy and the plan-based policy both being run on the second load profile side-by-side. The red plots are B1 and green are B2. The blue and purple points show where B1/B2 serviced the load (and the value of the load) for best-of-two, while the black points, slightly displaced above these, show where B2 serviced the load under the plan-based policy (B1 serviced the load the rest of the time). The voltage curves for the plan-based policy have been offset from the curves for best-of-two so that they can be displayed on the same plot. The labelling on the y-axis has been removed to avoid confusion. There are three interesting features:
1. The plan-based policy tends to use B1 first and B2 second, although not sequentially.
2. The plan-based policy runs for longer, demonstrating that increased lifetime is achieved.
3. Best-of-two essentially alternates between the batteries (minor variations are due to slight discrepancies in the batteries and other factors).
Figure 16 shows a comparison of the plan-based policy working on the first and second load profiles. The performance of the policy on the first load profile is shown in the upper voltage curves and the upper load curves, while the curves for the second load profile have been displaced to differentiate them. The plot highlights the similarity in the way the policy manages the batteries in each case: the general strategy is to run B1 until it is at the knee, resting it only briefly in this period, then oscillate between B1 and B2 at low frequency for a while, before entering a period in which B1 is rapidly switched with B2 as B1 converges on empty. The policy then finishes off with B2.
An interesting difference is a consequence of the (random) loads: B1 is faced with heavier loads during the first part of the second profile, so it dies faster than in the first profile. However, B2 faces a slightly less arduous time during the second half of the second profile and manages to last considerably longer. In particular, the load in the interval 30000-33000 was a high load serviced by B2 in the first profile, while the same period happens to be a lower load in the second profile. This is a key reason why B2 dies faster in the first profile: its available charge is depleted in that period and there is no real opportunity to rest it after that point. The final period of load in the first profile is a high load and that kills B2 quickly, while the final period of load in the second profile is a lower one. This allows B2 to recover some of its bound charge over that period, depleting its available charge more slowly and sustaining it a little longer in that critical period.
In Figure 16 the upper policy execution switches frequently in the window between 41,000 and 43,000 half seconds, just before B1 dies. This is because the plan-based policy includes a default action to switch to the other battery to avoid the currently loaded battery dying prematurely. The reason for this is to protect the batteries and the policy from the effects of errors in the sensor data that propagate into the state of charge model. The effect of the default action in this case is to cause the policy to switch to B2 when B1 is almost out of charge, but back to B1 as soon as it has recovered enough to be able to be loaded once again (according to the state of charge model).
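The behaviour of the default action can be sketched as follows. This is a simplified illustration rather than the implementation used in the experiments: the function name, the `min_charge` threshold and the charge estimates are hypothetical, and the learned policy is reduced to a fixed preference for B1.

```python
def select_battery(preferred, estimates, min_charge):
    """Return the battery to load next.

    preferred  -- battery the learned policy would choose ("B1" or "B2")
    estimates  -- estimated available charge for each battery
    min_charge -- hypothetical threshold below which a battery is not loaded
    """
    if estimates[preferred] >= min_charge:
        return preferred
    # Default action: switch to the other battery.
    return "B2" if preferred == "B1" else "B1"


# Hypothetical estimates for B1 near the end of its life; B2 stays healthy.
b1_estimates = [5.0, -1.0, 4.0, -2.0, 6.0]
choices = [select_battery("B1", {"B1": e, "B2": 50.0}, min_charge=3.0)
           for e in b1_estimates]
switches = sum(a != b for a, b in zip(choices, choices[1:]))
print(choices, switches)
```

Because the estimate for B1 crosses the threshold in both directions, the selection flips on every step, which mirrors the rapid switching seen just before B1 dies.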
Figure 17 shows the policy for the first load profile again, this time plotted with the estimated available charge (based on the voltage readings and the voltage model). The graph shows several important features. The black crosshairs mark the estimated available charge (measured in 0.1 mAh units) for B1 and the grey crosshairs show it for B2. The discontinuities are due to the changing load values. There should be no discontinuity, because the model adjusts for the load (using our estimated internal resistance), but it is clear that there is an additional effect here that cannot be captured this way. As already mentioned, it is also the case that the discrepancy between battery terminal voltage readings for the different loads should be 0.1 A × 0.34 Ω = 34 mV, where 0.1 A is the difference in load and 0.34 Ω is the internal resistance, but the graph shows differences that are much greater. This effect appears to worsen as the battery discharges (see the widening gaps between the loaded and unloaded voltages recorded for the batteries in the red/green curves, particularly for the red curve). However, interestingly, the voltage-capacity model seems to be marginally less unstable for lower states of charge (the steps get slightly smaller in these cases for the black curve). As can also be seen, the available charge model breaks down in some situations (when the observations cannot be fitted consistently to the initial state assumed for the battery). This leads to some of the available charge values being negative (particularly in the 42000-45000 period). This causes the policy to revert to the default action, but the somewhat simplistic implementation of the default leads to the oscillation between batteries during this period.
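The expected voltage step follows directly from Ohm's law, and the same relation can be inverted to see what internal resistance a larger observed step would imply. The sketch below uses the two figures quoted in the text; the function names and the observed step of 80 mV are hypothetical and serve only to illustrate the calculation.

```python
LOAD_STEP_A = 0.1                # difference between the load levels (from the text)
INTERNAL_RESISTANCE_OHM = 0.34   # estimated internal resistance (from the text)


def expected_step_mv(load_step_a, resistance_ohm):
    """Terminal-voltage step predicted by V = I * R, in millivolts."""
    return load_step_a * resistance_ohm * 1000.0


def apparent_resistance_ohm(observed_step_mv, load_step_a):
    """Resistance that would be needed to explain an observed voltage step."""
    return observed_step_mv / 1000.0 / load_step_a


print(expected_step_mv(LOAD_STEP_A, INTERNAL_RESISTANCE_OHM))

# Hypothetical observed step, for illustration only.
print(apparent_resistance_ohm(80.0, LOAD_STEP_A))
```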
Figure 18 shows the results obtained by draining the batteries in sequence, using the second load profile. This performance is optimal in terms of switching, but the lifetime achieved is much shorter than that achieved by the plan-based policy and similar to the lifetime of best-of-two for this case. The fact that best-of-two does worse than sequential scheduling for this profile is probably due to variation in the battery behaviour. It seems likely that best-of-two would otherwise perform much as it does on the other load profiles.
The plan-based policy of the present invention achieves a consistently longer lifetime than the best-of-two policy, with significantly reduced switching. The results are summarised below:

Load      Plan-based Policy     Best-of-two           Sequential
Profile   Lifetime  Switches    Lifetime  Switches    Lifetime  Switches
1         7.887     71          7.534     73          -         -
2         8.033     47          7.000     81          7.079     1
3         7.974     91          7.563     705         -         -
4         7.831     158         6.998     701         -         -
5         8.056     11          7.749     787         -         -
6         7.120     36          7.085     706         -         -
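The lifetime figures in the table can be turned into relative gains with a few lines of code. The sketch below copies the plan-based and best-of-two lifetimes from the table; the function and variable names are illustrative only.

```python
# Lifetimes copied from the table: profile -> (plan-based, best-of-two).
lifetimes = {
    1: (7.887, 7.534),
    2: (8.033, 7.000),
    3: (7.974, 7.563),
    4: (7.831, 6.998),
    5: (8.056, 7.749),
    6: (7.120, 7.085),
}


def relative_gain_percent(plan_based, best_of_two):
    """Lifetime gain of the plan-based policy over best-of-two, in percent."""
    return 100.0 * (plan_based - best_of_two) / best_of_two


gains = {profile: relative_gain_percent(*pair)
         for profile, pair in lifetimes.items()}
for profile, gain in sorted(gains.items()):
    print(f"profile {profile}: {gain:.2f}% longer than best-of-two")
```

On these figures the gain is positive for every profile, largest for profile 2 and smallest for profile 6.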
As will be appreciated by one of ordinary skill in the art in view of this disclosure, the present invention may be embodied as an apparatus (including, for example, a system, machine, device, computer program product, and/or the like), as a method (including, for example, a business process, computer-implemented process, and/or the like), or as any combination of the foregoing. Embodiments of the present invention are described above with reference to flowchart illustrations and/or block diagrams of such methods and apparatuses. It will be understood that blocks of the flowchart illustrations and/or block diagrams, and/or combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-executable program instructions (i.e., computer-executable program code). These computer-executable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a particular machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. As used herein, a processor may be "configured to" perform a certain function in a variety of ways, including, for example, by having one or more general-purpose circuits perform the function by executing one or more computer-executable program instructions embodied in a computer-readable medium, and/or by having one or more application-specific circuits perform the function.
These computer-executable program instructions may be stored or embodied in a computer-readable medium to form a computer program product that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block(s). Any combination of one or more computer-readable media/medium may be utilized. In the context of this document, a computer-readable storage medium may be any medium that can contain or store data, such as a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable medium may be a transitory computer-readable medium or a non-transitory computer-readable medium.
A transitory computer-readable medium may be, for example, but not limited to, a propagation signal capable of carrying or otherwise communicating data, such as computer-executable program instructions. For example, a transitory computer-readable medium may include a propagated data signal with computer-executable program instructions embodied therein, for example, in base band or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A transitory computer-readable medium may be any computer-readable medium that can contain, store, communicate, propagate, or transport program code for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied in a transitory computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc.
A non-transitory computer-readable medium may be, for example, but not limited to, a tangible electronic, magnetic, optical, electromagnetic, infrared, or semiconductor storage system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the non-transitory computer-readable medium include, but are not limited to, the following: an electrical device having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It will also be understood that one or more computer-executable program instructions for carrying out operations of the present invention may include object-oriented, scripted, and/or unscripted programming languages, such as, for example, Java, Perl, Smalltalk, C++, SAS, SQL, Python, Objective C, and/or the like. In some embodiments, the one or more computer-executable program instructions for carrying out operations of embodiments of the present invention are written in conventional procedural programming languages, such as the "C" programming languages and/or similar programming languages. The computer program instructions may alternatively or additionally be written in one or more multi-paradigm programming languages, such as, for example, F#.
The computer-executable program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block(s). Alternatively, computer program implemented steps or acts may be combined with operator or human implemented steps or acts in order to carry out an embodiment of the invention.
Embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may generally be referred to herein as a "module," "application," or "system."
A skilled person will appreciate that variations of the disclosed arrangements are possible without departing from the invention. Accordingly, the above description of the specific embodiment is made by way of example only and not for the purposes of limitation. It will be clear to the skilled person that minor modifications may be made without significant changes to the operation described.

Claims

1. A method for determining a control policy for optimising performance of a system that has a variable input and multiple controllable components that can be switched between multiple different configurations, the method comprising:
modelling the system;
determining a distribution of possible system inputs;
generating input samples based on those inputs;
analysing the samples to provide possible solutions, each solution representing a system configuration taking into account one or more system constraints;
constructing training data using the possible solutions; and
learning a control policy from the training data.
2. A method as claimed in claim 1 wherein determining the distribution of possible system inputs involves determining a probability distribution.
3. A method as claimed in claim 1 or claim 2 wherein each sample is a function of time and analysing the samples is done over discrete time intervals.
4. A method as claimed in claim 3 wherein the time intervals are of variable durations.
5. A method as claimed in any of the preceding claims wherein analysing the samples to provide possible solutions involves using a planner.
6. A method as claimed in claim 5 wherein the planner is operable to validate a plan for each sample, thereby to provide a solution for that sample.
7. A method as claimed in any of the preceding claims wherein learning a control policy from the training data involves using a classifier.
8. A method as claimed in any of the preceding claims wherein the system is a multiple battery system.
9. A method as claimed in claim 8 wherein modelling the system involves using the KiBa .
10. A method as claimed in claim 8 or claim 9 wherein the system input comprises the load applied to the multiple battery system.
11. A policy that is an output of the method of any of the preceding claims.
12. A policy as claimed in claim 11 consisting of a decision tree.
13. A controller or system that includes a policy as claimed in claim 11 or claim 12.
14. A controller as claimed in claim 13 that is implemented in software or hardware.
15. A computer program or computer program product having code or instructions for implementing the method of any of claims 1 to 10.
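The steps of claim 1 can be sketched end to end on a toy problem. The sketch below is an illustration under strong simplifying assumptions and is not the claimed implementation: the two-battery model with a fixed recovery amount, the uniform distribution over three load levels, the exhaustive search standing in for a planner, and the single-threshold rule standing in for a classifier are all hypothetical choices made to keep the example short.

```python
import itertools
import random

CAPACITY = 10.0   # hypothetical charge per battery
RECOVERY = 0.5    # hypothetical charge a resting battery regains per step


def step(charges, choice, load):
    """Toy system model: the chosen battery serves the load, the other rests."""
    new = list(charges)
    new[choice] -= load
    other = 1 - choice
    new[other] = min(CAPACITY, new[other] + RECOVERY)
    return tuple(new)


def lifetime(plan, loads):
    """Number of steps served before the chosen battery cannot cover the load."""
    charges = (CAPACITY, CAPACITY)
    for t, (choice, load) in enumerate(zip(plan, loads)):
        if charges[choice] < load:
            return t
        charges = step(charges, choice, load)
    return len(loads)


def best_plan(loads):
    """Stand-in for a planner: exhaustive search over all switching plans."""
    plans = itertools.product((0, 1), repeat=len(loads))
    return max(plans, key=lambda plan: lifetime(plan, loads))


def training_data(n_samples, horizon, rng):
    """Sample load profiles, solve each, and record (charge difference, action)."""
    rows = []
    for _ in range(n_samples):
        loads = [rng.choice((1.0, 2.0, 3.0)) for _ in range(horizon)]
        charges = (CAPACITY, CAPACITY)
        for choice, load in zip(best_plan(loads), loads):
            if charges[choice] < load:
                break
            rows.append((charges[0] - charges[1], choice))
            charges = step(charges, choice, load)
    return rows


def learn_stump(rows):
    """Learn a rule 'use battery 0 when the charge difference >= threshold'."""
    def accuracy(threshold):
        hits = sum((0 if diff >= threshold else 1) == action
                   for diff, action in rows)
        return hits / len(rows)
    threshold = max(sorted({diff for diff, _ in rows}), key=accuracy)
    return threshold, accuracy(threshold)


rows = training_data(n_samples=20, horizon=10, rng=random.Random(0))
threshold, train_accuracy = learn_stump(rows)
print(len(rows), threshold, train_accuracy)
```

Because many plans tie on lifetime in such a small model, the labels are noisy and the learned rule should be read only as a demonstration of how the steps fit together.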
PCT/GB2012/000346 2011-04-14 2012-04-16 Automated construction of usage policies WO2012140401A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1106356.7 2011-04-14
GBGB1106356.7A GB201106356D0 (en) 2011-04-14 2011-04-14 Automated construction of usage policies

Publications (1)

Publication Number Publication Date
WO2012140401A1 true WO2012140401A1 (en) 2012-10-18

Family

ID=44147032

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2012/000346 WO2012140401A1 (en) 2011-04-14 2012-04-16 Automated construction of usage policies

Country Status (2)

Country Link
GB (1) GB201106356D0 (en)
WO (1) WO2012140401A1 (en)

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
A. CIMATTI; F. GIUNCHIGLIA; E. GIUNCHIGLIA; P. TRAVERSO: "Planning via Model Checking: A Decision Procedure for R", PROC. OF 4TH EUROPEAN CONFERENCE ON PLANNING (ECP), 1997
BENINI: "Scheduling battery usage in mobile systems", VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, IEEE TRANSACTIONS, vol. 11, no. 6, 2003, pages 1136 - 1143, XP011104625, DOI: doi:10.1109/TVLSI.2003.817555
EPO: "Notice from the European Patent Office dated 1 October 2007 concerning business methods", OFFICIAL JOURNAL OF THE EUROPEAN PATENT OFFICE, EPO, MUNCHEN, DE, vol. 30, no. 11, 1 November 2007 (2007-11-01), pages 592 - 593, XP007905525, ISSN: 0170-9291 *
G. DELLA PENNA; B. INTRIGILA; D. MAGAZZENI; F. MERCORIO: "UPMurphi: A tool for universal planning on PDDL+ problems", PROC. OF 19TH INTERNATIONAL CONFERENCE ON AUTOMATED PLANNING AND SCHEDULING (ICAPS), 2009, pages 106 - 113
JONGERDEN ET AL.: "Proceedings of the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks", 2009, article "Maximizing system lifetime by battery scheduling", pages: 63 - 72
JONGERDEN: "Maximizing system lifetime by battery scheduling", PROC. OF 39TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2009), 2009, pages 68 - 72
M. FOX; D. LONG: "Modelling mixed discrete-continuous domains for planning", JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, vol. 27, 2006, pages 235 - 297
M. HALL; E. FRANK; G. HOLMES; G. PFAHRINGER; P. REUTEMANN; I.H. WITTEN: "The WEKA data mining software: An update", SIGKDD EXPLORATIONS, vol. 11, no. 1, 2009, XP055058312, DOI: doi:10.1145/1656274.1656278
M. JONGERDEN; B. HAVERKORT: "Technical Report TR-CTIT-08-01", CENTRE FOR TELEMATICS AND INFORMATION TECHNOLOGY, article "Battery modeling"
M. JONGERDEN; B. HAVERKORT; J-P. KATOEN: "Proc. of 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009)", 2009, article "Maximizing system lifetime by battery scheduling", pages: 68 - 72
MANWELL, J.; MCGOWAN, J.: "Extension of the kinetic battery model for wind/hybrid power systems", PROCEEDINGS OF THE 5TH EUROPEAN WIND ENERGY ASSOCIATION CONFERENCE, 1994, pages 284 - 289
R. HOWEY; D. LONG; M. FOX: "VAL: Automatic plan validation, continuous effects and mixed initiative planning using PDDL", PROC. OF INTERNATIONAL CONFERENCE ON TOOLS WITH AI (ICTAI), 2004, pages 294 - 301, XP010759689
S. EDELKAMP: "Taming numbers and durations in the model checking integrated planning system", JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH SPECIAL ISSUE ON THE 3RD INTERNATIONAL PLANNING COMPETITION, vol. 20, 2002

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9696782B2 (en) 2015-02-09 2017-07-04 Microsoft Technology Licensing, Llc Battery parameter-based power management for suppressing power spikes
US10228747B2 (en) 2015-02-09 2019-03-12 Microsoft Technology Licensing, Llc Battery parameter-based power management for suppressing power spikes
US10158148B2 (en) 2015-02-18 2018-12-18 Microsoft Technology Licensing, Llc Dynamically changing internal state of a battery
US9748765B2 (en) 2015-02-26 2017-08-29 Microsoft Technology Licensing, Llc Load allocation for multi-battery devices
US10263421B2 (en) 2015-02-26 2019-04-16 Microsoft Technology Licensing, Llc Load allocation for multi-battery devices
US9939862B2 (en) 2015-11-13 2018-04-10 Microsoft Technology Licensing, Llc Latency-based energy storage device selection
US10061366B2 (en) 2015-11-17 2018-08-28 Microsoft Technology Licensing, Llc Schedule-based energy storage device selection
US9793570B2 (en) 2015-12-04 2017-10-17 Microsoft Technology Licensing, Llc Shared electrode battery

Also Published As

Publication number Publication date
GB201106356D0 (en) 2011-06-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12719999; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 12719999; Country of ref document: EP; Kind code of ref document: A1)