WO2011041846A1 - A dynamically controlled learning system - Google Patents

A dynamically controlled learning system

Info

Publication number
WO2011041846A1
Authority
WO
WIPO (PCT)
Prior art keywords
fitness
programs
population
subpopulation
learning
Prior art date
Application number
PCT/AU2010/001326
Other languages
French (fr)
Inventor
Philip Valencia
Raja Jurdak
Original Assignee
Commonwealth Scientific And Industrial Research Organisation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2009904927A external-priority patent/AU2009904927A0/en
Application filed by Commonwealth Scientific And Industrial Research Organisation filed Critical Commonwealth Scientific And Industrial Research Organisation
Publication of WO2011041846A1 publication Critical patent/WO2011041846A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Definitions

  • a generation i has a population of N_P,i programs.
  • the fitness of the program that has the k-th highest fitness within a population, as determined by the evaluation process 304 and the fitness process 306, is defined as f_k.
  • the average fitness of generation i is then defined as: f̄_i = (1 / N_P,i) Σ_{k=1}^{N_P,i} f_k.
  • the Fitness Importance (FI) at time t is Φ_t, and is generated by an FI function Φ(t) executed by the process 312.
  • the values of Φ range from 0 to 1 inclusive.
  • a value of 0 corresponds to no importance being placed on the current fitness of the system, but allows the system to converge optimally (i.e. with minimum convergence time to an optimal solution).
  • a value of 1 for Φ, by contrast, corresponds to maximum importance of fitness, meaning that the system should perform as well as possible with the current knowledge of the existing population programs and their fitnesses.
  • in that case the next generation is built from the program with the highest expected fitness, i.e. E_1.
  • the FI component 312 sets the data value of the FI parameter based on one or more input data signals received and/or generated by the node 102, 202.
  • the data signals represent the current operating conditions of the learning system 100, 200, which may include internal operating conditions such as battery level, energy level, etc. or external environmental conditions, such as temperature, time of day, humidity and any other conditions that can be sensed by sensors associated with the node 102, 202 or the network connected to the node 102, 202.
  • the parameter X_i is the set of subpopulation sizes for the i-th generation, {N_E,i, N_M,i, N_C,i, N_R,i, N_O,i} (the elite, mutated, crossover, random and obtained subpopulations respectively), and Y_i is the set of corresponding program fitnesses.
  • the function Δ finds appropriate subpopulation sizes N_E,i, N_M,i, N_C,i, N_R,i and N_O,i which provide optimal exploration (fastest convergence to an optimal solution) given the constraint of satisfying the desired performance criterion Φ_t.
  • the number of possible combinations scales exponentially as Np increases, so exhaustively searching for the optimal distribution quickly becomes impractical.
  • a heuristic implementation of the subpopulation distribution function Δ instead generates Np fixed distributions, containing from 1 to Np elite programs, with the remaining subpopulations scaled and quantised iteratively using a ceiling function.
  • the distribution process 310 can start with the subpopulations that have high expected fitness and iteratively apply the ceiling-based scaling until the subpopulation with the lowest expected fitness is reached or until Np programs have been allocated.
  • One example of implementation of the IDGP framework is in the dynamic control of the packet data rate used by the wireless sensor nodes 102 of a wireless sensor network.
  • the communication programs in the nodes 102 used to set the packet data rate are subject to an inherent trade-off: setting a high packet data rate provides a high degree of information return at a high energy cost, while setting a low data rate saves energy but returns less information.
  • the optimal packet data rate can vary according to changes in the deployment environment, which precludes setting a fixed data rate.
  • the data rate of the network can be set at a nominal initial value upon deployment.
  • the network nodes 102 can learn and evolve their control programs (logic) and thus their data rate based on the sensed conditions of their deployment environment.
  • the sensor nodes 102 can include environmental sensors, such as leaf wetness and wind speed sensors. During a fine day, the samples of these sensors will remain fairly constant, so sending back sensor data at a low rate is sufficient to be representative of the environmental conditions.
  • the FI component 312 sets the FI meta-heuristic to 0 during fine days, so the sensor nodes need only provide coarse information (a low performance period) and can use this period of low activity to maximise their learning by evolving their executable logic.
  • when an event of high interest, such as a storm, occurs, the FI process 312, on detecting the event from received sensor signals, sets the FI value to a positive value greater than 0, which forces the network nodes to bias their next generations towards better-performing programs over ones that would provide more learning capability.
  • the extent to which the nodes 102 bias their evolution strategies and subpopulation distributions, and hence improve the current data delivery rate, is dependent on the value of FI. As FI approaches 1, the data delivery rate will approach the performance of the best delivery rate strategy learnt thus far. Simultaneously, the learning potential of the nodes 102 approaches zero.
  • once the event has passed, the FI process 312 can set the FI value low (close to zero) again to ensure that the nodes continue to learn (evolve program logic) in order to deliver higher performance under the local and often dynamic environmental conditions. This increases the system's adaptability to unexpected environmental changes.
  • the IDGP framework can be applied to online learning systems operating in dynamic settings, where there are periods in which performance is needed and other periods in which learning is desired.
  • in applications such as traffic light optimisation, for example, the FI value can be set relatively high during peak-hour periods, and low during off-peak periods.
  • more generally, the FI parameter value can be set based on fluctuations in demand periods and loading conditions to enable the distribution component 310 to control the subpopulation distributions accordingly.
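The data rate trade-off described above (more information return at higher energy cost, with information more valuable when conditions change quickly) can be captured by a toy fitness function. The functional form, function name and parameters below are illustrative assumptions only, not the patent's fitness process:

```python
import math

def data_rate_fitness(rate_hz, signal_variability, energy_cost_per_packet=1.0):
    # Information return grows sublinearly with rate, and is worth more
    # when the sensed environment is changing quickly (e.g. during a storm).
    information = signal_variability * math.log1p(rate_hz)
    # Energy cost grows linearly with the number of packets sent.
    energy = energy_cost_per_packet * rate_hz
    return information - energy
```

Under this sketch, a fine day (low variability) scores low rates best, while a storm (high variability) makes higher rates worthwhile, mirroring the behaviour the nodes are intended to learn.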

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Feedback Control In General (AREA)

Abstract

A dynamically controlled learning system, including: a fitness importance component for executing a process to generate fitness importance data representative of a target performance level; and a learning control component for dynamically controlling the balance between a performance level and a learning level of the learning system based on the fitness importance data.

Description

A DYNAMICALLY CONTROLLED LEARNING SYSTEM
FIELD
The present invention relates to a dynamically controlled learning system, and in particular to a genetic programming framework that allows the evolution of an online learning system to be dynamically controlled.
BACKGROUND
Learning systems are computer systems that apply artificial intelligence techniques to solve a wide variety of difficult but relatively common optimisation problems. For example, the problems may relate to traffic light optimisation, optimisation of heating ventilation and air-conditioning (HVAC) systems, actuation of wireless sensor network (WSN) nodes and robot control, i.e. robotics.
One of the difficulties in developing a learning system that is able to evolve is the complexity and computation time required to deal with a dynamically changing environment. Accordingly, most systems are first evolved offline and have their system logic fixed when deployed. Even logic deployed for wireless sensor networks and robotics is subject to constraints associated with explicit and implicit idealisations about the real world, thereby confining its operation to limited or restricted conditions. This logic is sometimes referred to as "brittle" due to the inability of the learning systems to predict the outcome of, or handle, complex interactions between numerous nodes and/or a complex dynamic physical environment.
Biological systems, however, are able to learn and evolve in a manner that enables the system to adapt or change behaviour to survive despite encountering unexpected conditions. Seeking to emulate these characteristics, genetic algorithms have been developed for solving optimisation problems, and genetic programming can be used to generate unique logic, as discussed in J.R. Koza et al., "Genetic Programming IV: Routine Human-Competitive Machine Intelligence", 2003, Springer Science & Business Media Inc., ISBN 1-4020-7446-8 ("Koza"). An online learning system may then adapt a single parameter of the logic or may create entirely unique program logic. Yet systems that employ genetic programming rarely adopt online evolution, i.e. evolution applied post deployment of an application system or during the lifetime of the application system. A survey of evolutionary approaches for WSNs is given in G. Nan and M. Li, "Evolutionary based approaches in wireless sensor networks: a survey", Natural Computation, 2008, ICNC '08, Fourth International Conference on, vol. 5, pp. 217-222, Sep 2008. Even for these proposals, the evolution is constrained once the system is deployed.
It is desired to address the above or at least provide a useful alternative.
SUMMARY
According to one aspect of the present invention there is provided a dynamically controlled learning system, including:
a fitness importance component for executing a process to generate fitness importance data representative of a target performance level; and
a learning control component for dynamically controlling the balance between a performance level and a learning level of said learning system based on said fitness importance data.
Embodiments of the invention include a genetic programming engine configured to:
(i) execute and evaluate a population of programs of said system and rank the programs using a fitness process; and
(ii) generate a next generation population of executable programs based on said population;
wherein said learning control component determines a subpopulation distribution of said next generation population based on said fitness importance data.
According to another aspect of the present invention there is provided a dynamic learning process, executed by a computer node, including:
generating fitness importance data representative of a target performance level of said node; and
dynamically controlling the balance between a performance level and a learning level of said node based on said fitness importance data.
DRAWINGS
Embodiments of the present invention are hereinafter described, by way of example only, with reference to the accompanying drawings, wherein:
Figure 1 is a block diagram of one embodiment of an online learning system including wireless sensor nodes;
Figure 2 is a block diagram of another embodiment of an online learning system based on a network computer;
Figure 3 is a block flow diagram of components and processes of an embodiment of the online learning system.
DESCRIPTION
A dynamically controlled online learning system 100, 200 is implemented on one or more computer devices or systems. The online learning system is based on an in-situ distributed genetic programming (IDGP) framework that includes an IDGP engine 330 and supporting components 306, 312 to be installed on a computer device. For example, the framework may be installed on wireless sensor nodes 102 of a wireless sensor network. The nodes 102 each include a microcontroller 104 connected to a radio transceiver and volatile and non-volatile memory 106, all powered by a battery 110. The framework is installed on the microcontroller 104 and the memory 106 of each node 102.
The IDGP framework may also be installed on a computer system 202, as shown in Figure 2. The computer system 202 is based on a standard computer, such as a 32 or 64 bit Intel architecture computer produced by Lenovo Corporation, IBM Corporation, or Apple Inc. The processes executed by the computer system 202 are defined and controlled by computer program instruction code and data of software components or modules 250 of the IDGP framework stored on non-volatile (e.g. hard disk) storage 204 of the computer 202. The corresponding processes performed or executed by the modules 250 can, alternatively, be performed by firmware stored in read only memory (ROM) or at least in part by dedicated hardware circuits of the computer 202, such as application specific integrated circuits (ASICs) and/or field programmable gate arrays (FPGAs).
The computer 202 includes random access memory (RAM) 206, at least one microprocessor 208, and external interfaces 210, 212, 214 that are all connected by a system bus 216. The external interfaces include universal serial bus (USB) interfaces 210, a network interface connector (NIC) 212, and a display adapter 214. The USB interfaces 210 are connected to input/output devices, such as a keyboard and mouse 218.
The display adapter 214 is connected to a display device, such as an LCD display screen 222. The NIC 212 enables the computer 202 to connect to a communications network 220, such as the Internet. The computer 202 includes an operating system (OS) 224, such as Linux Enterprise Server or Microsoft Windows Server (if the system 200 is operating as a server), or Windows XP or Vista if it is operating as a personal computer. The computer 202 also includes web server code 226, such as Apache, and a database management system (DBMS) 230, such as MySQL, to provide structured query language support and to enable maintenance of and access to data, such as parameter values, stored in an SQL database 232 of the system 200. The web server 226, DBMS 230 and the modules 250 all run on the OS 224.
The modules 250 of the IDGP framework may include markup language code (such as HTML, XML, XHTML), scripts (such as PHP, ASP and CGI), image files, style sheets and program code written using languages such as C, Ruby or PHP. For the scripts, the modules would include a runtime engine for compilation in real time. The modules 250 may also be implemented using a code development framework, such as Microsoft .NET or Ruby on Rails. The modules 250 can then include computer program classes, methods and files as part of the framework using computer program instruction code such as Ruby or Java. The computer 202 is able to generate and provide code elements for user interfaces that are served by the web server 226 and requested by a remote web browser on a client computer 260 that is connected to the computer 202 over the network 220.
The IDGP components or modules 250 include, as shown in Figure 3, the IDGP engine 330, a fitness component 306 and a fitness importance component 312.
The engine 330 includes a generate initial programs component 302, an evaluation component 304 that relies upon the fitness component 306, a ranking component 308, a distribution component 310 that relies upon the fitness importance (FI) component 312, and a generate next population component 314.
The instantiated IDGP engine 330 on a node 102, 202 uses the generate initial programs component 302 to generate a population of randomly generated programs based on the values of initial parameters, such as LMAX, the maximum length in lines of each program, and Np, the number of programs in the population. Seeded programs for the task(s) performed by the system 100, 200 can also be provided by a user. Operation of the engine 330 then proceeds by invoking the evaluation component 304, which executes and evaluates each program in turn by applying a fitness process of the called fitness component 306. For some applications this may involve only the application of a performance evaluation metric for the task of a system application. The fitness process 306 generates a performance data value, or score, for each program by evaluating each of the programs of the population according to a performance criterion for the specific application or task.
The engine 330 then invokes the ranking component 308 to perform a ranking process to apply a rank level to each program in the population based on the performance data values obtained during the evaluation process 304. Programs can then be broadcast for evaluation by other nodes 102, 202 in the network.
Rather than simply generate a next generation population from the ranked population, the IDGP engine 330 invokes a distribution component 310 to control the distribution of subpopulations, as discussed below, based on a called fitness importance process of the FI component 312. The distribution component 310 provides a learning control module which controls the level of performance and learning associated with a generated next generation population. The IDGP engine 330 uses a generate next population component 314 to generate subpopulations of the next generation population based on parameters generated during the distribution process 310. Operation of the engine 330 then returns to the evaluation component and process 304, where the next generation population that has just been generated is executed and evaluated.
The next generation population process 314 generates a number of subpopulations of programs for the next generation by applying different genetic programming operations to the previous generation of programs. The operations include cloning, mutation and crossover of programs (as discussed in Koza), or obtaining programs from other sources, such as receiving programs from other nodes or human-devised (injected) programs. The subpopulations can be categorised based on their expected performance, which can be determined from knowledge of a program's previous performance and knowledge of the effect of the genetic programming operations on the performance of programs. For example, an "elite" program generated by a clone operation (meaning the program is copied unaltered) will have the same performance in the next generation as it did in the previous generation, provided that there are no significant changes in the operating environment. By contrast, the expected performance of "children" programs, generated by crossover and mutation operations on a set of probabilistically selected (fitness-biased) parent programs, is typically less than the performance of either "parent" program of the previous generation. Yet the expected learning potential of children programs is greater than that of elite programs, since they explore more of the search space, in other words they yield more knowledge about the "fitness landscape" than a parent program that has been tested recently. Using the expected performance and learning capabilities, the IDGP engine 330 is able to construct and select subpopulations to generate a next generation population which maximises the learning capability given a desired expected performance constraint. This performance constraint is set by the fitness importance process 312, which determines a fitness importance level that represents the importance of performance relative to the importance of optimal learning for the next population.
This is used by the distribution component 310 to determine the distribution of subpopulations in the next generation population generated by the component 314. The fitness importance parameter used by the distribution component 310 has a normalised data value range from 0 to 1, where 0 indicates that a population of optimal learning is to be generated, 1 indicates that a population of optimal performance is to be generated, and the data values between them represent populations to be generated to meet a performance constraint. The performance constraint, as represented by the generated fitness importance data, is met by adjusting the size of the elite subpopulation (i.e. the number of elite programs) and scaling the remaining subpopulations over the fixed population size (N_P) as discussed below.
Once a new population is generated from the subpopulations the evolutionary process of evaluation and constructing new populations is repeated by returning operation to evaluation process 304. The evolutionary process performed by the IDGP engine 330 effectively performs continual optimisation and learning simultaneously, whilst the fitness importance process 312 can operate to bias the evolutionary process towards either better performance or faster learning.
The fitness importance (FI) parameter data value generated by the fitness importance process 312 is a meta-heuristic that enables online learning systems to control the balance between exploration and exploitation (i.e. performance) and adapt to changing environmental conditions. For example, in the case of wireless sensor networks, which operate under relatively tight learning constraints, excessive exploration can lead to quick depletion of the battery 110 without any certain benefits. To augment their energy reserves, WSNs may use solar energy harvesting, which can mean that a network harvests more energy during the day than it actually needs to consume. The harvested energy may exceed the network storage capacity and would normally be wasted unless deployed for other activities. By setting a low FI value during sunny conditions, a node 102 can use this excess energy to further evolve its program code during periods that do not require high performance. Also, during the evolutionary process a computer node 102, 202 may ultimately acquire a population that meets an acceptable fitness threshold, allowing greater exploration for programs that further improve performance. Over time this situation may change due to, for example, an external event, which will require the selection of programs more devoted to performance before exploration can again occur. The operation period can be considered to move from a convergence phase into an acceptable performance phase where there is more exploration opportunity, and then, when the external event occurs, back into a reconvergence phase. The balance between exploration and exploitation (i.e. performance) is controlled by the FI process 312 and the subpopulation distribution control process 310.
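A minimal FI-setting policy consistent with the battery and solar-harvesting discussion above can be sketched as follows. The thresholds, the specific FI levels and the linear ramp are all illustrative assumptions; the patent leaves the FI function Φ application specific.

```python
def fitness_importance(energy_level, event_active,
                       low_energy=0.2, surplus=0.8):
    """Map node conditions to an FI value in [0, 1] (illustrative policy).

    High-interest event -> favour performance (FI = 1).
    Energy surplus      -> spend excess energy on exploration (FI = 0).
    Low battery         -> exploit known-good programs to save energy.
    """
    if event_active:
        return 1.0
    if energy_level >= surplus:      # e.g. solar harvest exceeds storage
        return 0.0
    if energy_level <= low_energy:
        return 0.8                   # mostly exploit; exploration is costly
    # Linear ramp between the surplus and low-energy thresholds.
    return 0.8 * (surplus - energy_level) / (surplus - low_energy)
```

The policy returns 0 in sunny/surplus conditions (maximum learning) and 1 while an event of interest is active (maximum performance), matching the phases described in the text.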
To describe operation of the FI process 312, we can first assume that a generation i has a population of N_Pi programs. The program population is split into five subpopulations with sizes defined by the set Γ_i, comprising N_Ei elite programs, N_Hi highly ranked programs with mutation, N_Ci children programs generated through crossover and mutation, N_Ri randomly generated programs and N_Oi other externally generated programs, i.e. N_Pi = N_Ei + N_Hi + N_Ci + N_Ri + N_Oi.
The fitness of a program, determined by the evaluation process 304 and the fitness process 306, that has the k-th highest fitness within a population is defined as f_k. The average fitness of generation i is defined as:

F_i = (1 / N_Pi) Σ_{S ∈ Q} Σ_{k=1..N_Si} f_Ski    (1)

where f_Ski is the fitness of the k-th program of subpopulation S from the set of subpopulations Q = {E, H, C, R, O} in generation i.
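The average fitness above is simply the mean fitness over all programs in all subpopulations; a minimal sketch (the dictionary representation of the subpopulation fitnesses is an assumption):

```python
def average_fitness(fitnesses_by_subpop):
    """Average fitness of a generation: the mean of f_Ski over all
    subpopulations S in Q = {E, H, C, R, O} and all programs k."""
    all_f = [f for fs in fitnesses_by_subpop.values() for f in fs]
    return sum(all_f) / len(all_f)
```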
The Fitness Importance (FI) at time t is φ_t, and is generated by an FI function Φ(t) executed by the process 312. For convenience t is described in discrete units of generations; hence t = 1 represents the period corresponding to the first generation of programs that were evaluated. The values of φ range from 0 to 1 inclusive. A value of 0 corresponds to no importance being placed on the current fitness of the system, but allows the system to provide optimal convergence (i.e. minimum convergence time for an optimal solution). A value of 1 for φ, however, corresponds to maximum importance of fitness, meaning that the system should perform as well as possible with the current knowledge of the existing population programs and their fitnesses. To achieve this, the program with the highest expected fitness (i.e. the current elite) should be exploited. This is similar to offline evolutionary approaches where the best solution evolved offline is placed into the online environment and then not altered. However, no exploration (learning) occurs while φ = 1, because of the lack of diversity in the program population.
The values between these limits represent a linear scale of desired expected performance between the desired expected performance when φ = 0 and φ = 1, i.e.

E(F_j)|φ_j = E(F_j)|φ=0 + ( E(F_j)|φ=1 − E(F_j)|φ=0 ) φ_j    (2)

or, in terms of φ_j,

φ_j = ( E(F_j)|φ_j − E(F_j)|φ=0 ) / ( E(F_j)|φ=1 − E(F_j)|φ=0 )    (3)

Accordingly, a value of φ_j = 0.5 indicates the desire for the expected average fitness to lie exactly half way between the expected average fitness of a standard genetic algorithm process, E(F_j)|φ=0, and the expected average fitness of using only the current elite solution, E(F_j)|φ=1.
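As a worked sketch of Equations (2) and (3), with purely illustrative fitness values:

```python
def desired_expected_fitness(phi, f_learning, f_elite):
    """Equation (2): interpolate linearly between the expected average
    fitness at phi = 0 (optimal learning) and phi = 1 (elite only)."""
    return f_learning + (f_elite - f_learning) * phi

def implied_fi(target, f_learning, f_elite):
    """Equation (3): the FI value implied by a target expected fitness."""
    return (target - f_learning) / (f_elite - f_learning)
```

For example, if the standard-GA expected average fitness is 400 and the elite-only expected average fitness is 1000, then φ = 0.5 requests an expected average fitness of 700, exactly half way between the two.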
The FI component 312 sets the data value of the FI parameter based on one or more input data signals received and/or generated by the node 102, 202. The data signals represent the current operating conditions of the learning system 100, 200, which may include internal operating conditions, such as battery level, energy level, etc., or external environmental conditions, such as temperature, time of day, humidity and any other conditions that can be sensed by sensors associated with the node 102, 202 or the network connected to the node 102, 202. The parameter Γ_i is the set of subpopulation sizes for the i-th generation as previously described, {N_Ei, N_Hi, N_Ci, N_Ri, N_Oi}, and Y_i is the set of corresponding program fitnesses. If a new population (generation j) is constructed with the same demographics, Γ_i, but not necessarily with exactly the same programs or the same fitnesses, i.e. Γ_j = Γ_i and Y_j ≠ Y_i, then an expected average fitness for generation j can be determined based on knowledge of the current generation i, as E(F_j)|Γ_i, Y_i.
To obtain this, the distribution component 310 executes a subpopulation distribution function, γ(x, Γ_i, Y_i) = Γ_j, to produce a set Γ_j of subpopulation sizes which will yield an expected average fitness of x based on the subpopulation distribution, Γ_i, and results, Y_i, of the previous generation. Since the FI process 312 generates an FI value to yield an expected performance, the subpopulation distribution function produces a distribution of programs whose expected performance equals that specified by the value of the FI parameter. The distribution process therefore executes

γ( E(F_j)|φ_j, Γ_i, Y_i ) = Γ_j    (4)

where E(F_j)|φ_j is determined by Equation (2) above.
The function γ finds appropriate subpopulation sizes N_Ej, N_Hj, N_Cj, N_Rj and N_Oj which provide optimal exploration (faster convergence to an optimal solution) given the constraint of satisfying the desired performance criterion φ_j. For the five subpopulations, the number of possible combinations grows rapidly as N_P increases, i.e. the number of possible Γ_j sets is

N_Γ = (N_P + 5 − 1)! / ( N_P! (5 − 1)! )

Within these, only a subset will meet the minimum desired expected performance, and the subpopulation distribution that retains the maximum learning capacity is to be retained. A heuristic implementation of the subpopulation distribution function γ is executed by generating N_P fixed distributions by including 1 to N_P elite programs, with the remaining distribution scaled and quantised in an iterative manner using a ceiling function. The distribution process 310 can start with subpopulations that have high expected fitness and iteratively apply the scaling using the ceiling function until the subpopulation with the lowest expected fitness is reached or until N_P programs have been allocated.
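The count of possible Γ_j sets above is the multiset coefficient for distributing N_P programs among five subpopulations, which Python's standard library can compute directly:

```python
from math import comb

def num_distributions(n_p, n_subpops=5):
    """Number of ways to split N_P programs across the subpopulations:
    (N_P + 5 - 1)! / (N_P! (5 - 1)!), i.e. C(N_P + 4, 4) for 5 groups."""
    return comb(n_p + n_subpops - 1, n_subpops - 1)
```

For the Table 1 population of N_P = 21 this already gives 12,650 candidate distributions, which motivates the heuristic described in the text rather than exhaustive search.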
An example is illustrated below in Table 1, for a population of N_P = 21. The Γ|φ=0 subpopulation distribution is defined by the first entry, and the Γ|φ=1 subpopulation distribution by the last entry. The subpopulation distribution function creates subpopulation distributions by linearly increasing the number of elites from φ = 0 to φ = 1. Since the set Γ|φ=0 contains only one elite, N_P unique discrete distributions are created. With the size of the elite subpopulation, N_E, known, the remaining subpopulations are scaled across the remaining N_P − N_E available positions by calculating the fraction of the total set that each subpopulation occupied in Γ|φ=0 and then applying the ceiling function, which simply rounds a real number up to the next integer. This transform is applied iteratively, starting with subpopulations that have higher expected performance (and less learning potential) and working down to subpopulations with lower expected performance (i.e. from E to H to C to R to O). In Table 1, φ_d is the desired fitness importance value, whereas φ_o is the observed fitness importance value obtained from the fitness level of generation j relative to the estimated fitness at φ = 0 for generation j. The difference between them provides an error measurement.
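The ceiling-based scaling just described can be sketched as below. The φ = 0 base distribution used here is a hypothetical example (the first table entry is not reproduced in the text), and the exact quantisation order beyond "E to H to C to R to O" is an assumption, so the resulting rows need not match Table 1 exactly.

```python
from math import ceil

def scale_distribution(base, n_e, n_p):
    """Fix the elite count at n_e, then scale the remaining subpopulations
    over the N_P - N_E remaining slots with the ceiling function, working
    from higher to lower expected performance (H, C, R, O)."""
    sizes = {'E': n_e}
    slots = n_p - n_e
    base_rest = sum(v for k, v in base.items() if k != 'E')
    left = slots
    for label in ('H', 'C', 'R', 'O'):
        want = ceil(slots * base.get(label, 0) / base_rest)
        sizes[label] = min(want, left)   # stop once N_P programs allocated
        left -= sizes[label]
    return sizes
```

Because each scaled count is rounded up, the quantised counts always fill all N_P positions, with later (more exploratory) subpopulations truncated first as N_E grows.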
[Header and first four rows of Table 1 (N_E = 1 to 4) appear as an image (imgf000012_0001) in the original publication.]
5 EEEEEHHHCCCCCCCCCCRRO 653 0.277 0.20
6 EEEEEEHHHCCCCCCCCCRRO 673 0.320 0.25
7 EEEEEEEHHHCCCCCCCCRRO 693 0.364 0.30
8 EEEEEEEEHHCCCCCCCCRRO 695 0.368 0.35
9 EEEEEEEEEHHCCCCCCCCRO 737 0.459 0.40
10 EEEEEEEEEEHHCCCCCCCRO 757 0.502 0.45
11 EEEEEEEEEEEHHCCCCCCRO 777 0.545 0.50
12 EEEEEEEEEEEEHHCCCCCRO 797 0.589 0.55
13 EEEEEEEEEEEEEHHCCCCCR 841 0.684 0.60
14 EEEEEEEEEEEEEEHHCCCCR 861 0.727 0.65
15 EEEEEEEEEEEEEEEHCCCCR 863 0.732 0.70
16 EEEEEEEEEEEEEEEEHCCCR 883 0.775 0.75
17 EEEEEEEEEEEEEEEEEHCCC 925 0.866 0.80
18 EEEEEEEEEEEEEEEEEEHCC 945 0.909 0.85
19 EEEEEEEEEEEEEEEEEEEHC 965 0.952 0.90
20 EEEEEEEEEEEEEEEEEEEEH 985 0.996 0.95
21 EEEEEEEEEEEEEEEEEEEEE 987 1.000 1.00
Table 1
One example of implementation of the IDGP framework is in the dynamic control of the packet data rate used by the wireless sensor nodes 102 of a wireless sensor network. The communication programs in the nodes 102 used to set the packet data rate are subject to an inherent trade-off: setting a high packet data rate provides a high degree of information return at a high energy cost, while setting a low data rate saves energy but returns less information. The optimal packet data rate can vary according to changes in the deployment environment, which precludes setting a fixed data rate.
With the IDGP framework installed, the data rate of the network can be set at a nominal initial value upon deployment. As time passes, the network nodes 102 can learn and evolve their control programs (logic), and thus their data rate, based on the sensed conditions of their deployment environment. For instance, the sensor nodes 102 can include environmental sensors, such as leaf wetness and wind speed sensors. During a fine day, the samples of these sensors will remain fairly constant, so sending back sensor data at a low rate is sufficient to be representative of the environmental conditions. As such, the FI component 312 sets the FI meta-heuristic to 0 during fine days, so that the sensor nodes need only provide coarse information (a low performance period) and can use this period of low activity to maximise their learning by evolving their executable logic.
At some point during the deployment, an event of high interest, such as a storm, can occur, requiring the network to maximise its data delivery performance. As a result, the FI process 312, on detecting the event based on sensor signals received, sets the FI value to a value greater than 0, which forces the network nodes to bias their evolution towards better performing programs over ones that would provide more learning capability. The extent to which the nodes 102 bias their evolution strategies and subpopulation distributions, and hence improve the current data delivery rate, is dependent on the value of FI. As FI approaches 1, the data delivery rate will approach the performance of the best delivery rate strategy learnt thus far. Simultaneously, the learning potential of the nodes 102 approaches zero.
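In the framework itself this bias is realised through the subpopulation sizes of the next generation; as a loose single-decision analogy only (not the patented mechanism), FI can be read as the probability of exploiting the best-known data rate rather than exploring an alternative. All names and the policy itself are illustrative:

```python
import random

def choose_rate(phi, best_rate, candidate_rates, rng=random):
    """Analogy only: with probability phi exploit the best packet rate
    learnt so far; otherwise explore another candidate rate."""
    if rng.random() < phi:
        return best_rate
    return rng.choice(candidate_rates)
```

At φ = 1 the node always uses the best strategy learnt thus far (no learning); at φ = 0 it always explores.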
Once the event of interest is detected as finished, the FI process 312 can set the FI value low (close to zero) again to ensure that the nodes continue to learn (evolve program logic) in order to deliver higher performance under the local and often dynamic environmental conditions. This increases the system's adaptability to unexpected environmental changes.
The IDGP framework can be applied to online learning systems that are dynamic, where there are periods when performance is needed and other periods when learning is desired. For example, for an online learning system that deals with traffic light optimisation, the FI value can be set relatively high during peak hour periods, and low during off-peak periods. For systems that handle optimisation of HVAC, the FI parameter value can be set based on fluctuations in demand periods and loading conditions to enable the distribution component 310 to control the subpopulation distributions accordingly. Many modifications will be apparent to those skilled in the art without departing from the scope of the present invention.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:
1. A dynamically controlled learning system, including:
a fitness importance component for executing a process to generate fitness importance data representative of a target performance level; and
a learning control component for dynamically controlling the balance between a performance level and a learning level of said learning system based on said fitness importance data.
2. A dynamically controlled learning system as claimed in claim 1, wherein the learning system is online and said fitness importance data is generated on the basis of operating data representing detected operating conditions of said learning system.
3. A dynamically controlled learning system as claimed in claim 1 or 2, including a genetic programming engine configured to:
(i) execute and evaluate a population of programs of said system and rank the programs using a fitness process; and
(ii) generate a next generation population of executable programs based on said population;
wherein said learning control component determines a subpopulation distribution of said next generation population based on said fitness importance data.
4. A dynamically controlled learning system as claimed in claim 3, including:
a fitness component for performing said fitness process to generate performance data representative of a performance level of a program of said population; and
said genetic program engine including:
(i) an evaluation component for executing and evaluating said population using said fitness component;
(ii) a ranking component for applying said rank to said programs based on said performance data;
(iii) a distribution component comprising said learning control component; and
(iv) a generate next population component for generating said next generation population using said subpopulation distribution.
5. A dynamically controlled learning system as claimed in claim 4, wherein said subpopulation distribution is a distribution of the number of programs in each subpopulation.
6. A dynamically controlled learning system as claimed in claim 5, wherein the next generation population includes at least one of:
(i) a subpopulation of elite programs generated by a clone operation;
(ii) a subpopulation of highly ranked programs generated by cloning highly ranked programs and applying a mutation operation;
(iii) a subpopulation of children programs generated by crossover and/or mutation operations;
(iv) random programs generated by the genetic programming engine; and
(v) programs generated externally of said genetic programming engine.
7. A dynamically controlled learning system as claimed in claim 4, 5 or 6, wherein said genetic programming engine executes the evaluation, ranking, distribution and generate next population components iteratively, and said fitness importance component varies said fitness importance data over time.
8. A dynamically controlled learning system as claimed in claim 7, wherein said fitness importance process applies a performance level to an evaluated program based on performance criteria for an application of the programs of said population.
9. A dynamically controlled learning system as claimed in any one of the preceding claims, including computer nodes each executing said genetic programming engine and communicating using a communications network, wherein the nodes broadcast ranked programs associated with a high rank to other nodes, and receive high rank programs from other nodes for a subpopulation.
10. A dynamically controlled learning system as claimed in any one of claims 3 to 9, including wireless sensor nodes each including said genetic programming engine and said fitness importance component.
11. A dynamically controlled learning system as claimed in claim 10, wherein said fitness importance component adjusts said fitness importance data based on energy levels of said nodes.
12. A dynamically controlled learning system as claimed in any one of the preceding claims, wherein said system is a traffic light optimisation system.
13. A dynamically controlled learning system as claimed in any one of the preceding claims, wherein said system is a heating, ventilation and cooling system.
14. A dynamically controlled learning system as claimed in any one of the preceding claims, wherein said system is a robotics system.
15. A dynamic learning process, executed by a computer node, including:
generating fitness importance data representative of a target performance level of said node; and
dynamically controlling the balance between a performance level and a learning level of said node based on said fitness importance data.
16. A dynamic learning process as claimed in claim 15, wherein said fitness importance data is generated on the basis of operating data representing detected operating conditions of said node.
17. A dynamic learning process as claimed in claim 15 or 16, including:
(i) executing and evaluating a population of programs of said system; (ii) ranking the programs using a fitness process; and
(iii) generating a next generation population of executable programs based on said population;
wherein said dynamically controlling includes determining a subpopulation distribution of said next generation population based on said fitness importance data.
18. A dynamic learning process as claimed in claim 17, including performing said fitness process to generate performance data representative of a performance level of a program of said population; wherein said ranking applies a rank to said programs based on said performance data.
19. A dynamic learning process as claimed in claim 18, wherein said subpopulation distribution is a distribution of the number of programs in each subpopulation.
20. A dynamic learning process as claimed in claim 19, wherein the next generation population includes at least one of:
(i) a subpopulation of elite programs generated by cloning;
(ii) a subpopulation of highly ranked programs generated by cloning highly ranked programs and applying a mutation operation;
(iii) a subpopulation of children programs generated by crossover and/or mutation operations;
(iv) randomly generated programs; and
(v) programs generated externally of said node.
21. A dynamic learning process as claimed in claim 18, 19 or 20, including iteratively executing said evaluating, ranking, distribution determining and generating next generation population steps, and varying said fitness importance data over time.
22. A dynamic learning process as claimed in claim 21, wherein said fitness importance process applies a performance level to an evaluated program based on performance criteria for an application of the programs of said population.
23. A dynamic learning process as claimed in any one of claims 15 to 22, including broadcasting ranked programs associated with a high rank to other nodes, and receiving high rank programs from other nodes for a subpopulation.
24. A dynamic learning process as claimed in claim 23, including adjusting said fitness importance data based on an energy level of said node.
25. Computer readable storage media including program code for performing a process as claimed in any one of claims 1 to 24.
PCT/AU2010/001326 2009-10-09 2010-10-08 A dynamically controlled learning system WO2011041846A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2009904927A AU2009904927A0 (en) 2009-10-09 A dynamically controlled learning system
AU2009904927 2009-10-09

Publications (1)

Publication Number Publication Date
WO2011041846A1 true WO2011041846A1 (en) 2011-04-14

Family

ID=43856306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2010/001326 WO2011041846A1 (en) 2009-10-09 2010-10-08 A dynamically controlled learning system

Country Status (1)

Country Link
WO (1) WO2011041846A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1033637A2 (en) * 1999-03-02 2000-09-06 Yamaha Hatsudoki Kabushiki Kaisha Method and apparatus for optimizing overall characteristic of device, using heuristic method
US6128579A (en) * 1997-03-14 2000-10-03 Atlantic Richfield Corporation Automated material balance system for hydrocarbon reservoirs using a genetic procedure
JP2004178574A (en) * 2002-11-11 2004-06-24 Yamaha Motor Co Ltd Control parameter optimization method, control parameter optimization device, and control apparatus optimization program
US20040133308A1 (en) * 2002-08-30 2004-07-08 Keisuke Kato Robot apparatus and motion controlling method therefor



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10821488

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10821488

Country of ref document: EP

Kind code of ref document: A1