CN117136343A - System for balancing energy source exploration and observation time of autonomous sensing vehicles - Google Patents

System for balancing energy source exploration and observation time of autonomous sensing vehicles

Info

Publication number
CN117136343A
Authority
CN
China
Prior art keywords
observation
auto
energy
point
asv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180094725.XA
Other languages
Chinese (zh)
Inventor
法尔哈德·拉巴尼亚
亚历山大·波罗杰克
雅各布·威廉·朗格兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shewhart Aerospace Co
Original Assignee
Shewhart Aerospace Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shewhart Aerospace Co filed Critical Shewhart Aerospace Co
Publication of CN117136343A
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G5/00Traffic control systems for aircraft, e.g. air-traffic control [ATC]
    • G08G5/0047Navigation or guidance aids for a single aircraft
    • G08G5/0069Navigation or guidance aids for a single aircraft specially adapted for an unmanned aircraft
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64UUNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
    • B64U10/00Type of UAV
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64UUNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
    • B64U50/00Propulsion; Power supply
    • B64U50/30Supply or distribution of electrical power
    • B64U50/31Supply or distribution of electrical power generated by photovoltaics
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B63SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
    • B63GOFFENSIVE OR DEFENSIVE ARRANGEMENTS ON VESSELS; MINE-LAYING; MINE-SWEEPING; SUBMARINES; AIRCRAFT CARRIERS
    • B63G8/00Underwater vessels, e.g. submarines; Equipment specially adapted therefor
    • B63G8/001Underwater vessels adapted for special purposes, e.g. unmanned underwater vessels; Equipment specially adapted therefor, e.g. docking stations
    • B63G2008/002Underwater vessels adapted for special purposes, e.g. unmanned underwater vessels; Equipment specially adapted therefor, e.g. docking stations unmanned
    • B63G2008/004Underwater vessels adapted for special purposes, e.g. unmanned underwater vessels; Equipment specially adapted therefor, e.g. docking stations unmanned autonomously operating
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64UUNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
    • B64U2201/00UAVs characterised by their flight controls
    • B64U2201/10UAVs characterised by their flight controls autonomous, i.e. by navigating independently from ground or air stations, e.g. by using inertial navigation systems [INS]

Abstract

The present invention relates to a multi-objective method of optimizing the "observation" or "sensing" time of an autonomous sensing vehicle (ASV) and its "energy replenishment" or "energy harvesting" time. The method comprises the following steps: collecting data about observation points of interest; determining whether energy harvesting is required; and efficiently visiting the observation points of interest between searches for energy sources.

Description

System for balancing energy source exploration and observation time of autonomous sensing vehicles
Technical Field
The following relates to autonomous sensing vehicles, and more particularly to energy harvesting methods for such vehicles.
Background
Autonomous sensing vehicles (ASVs), such as unmanned aerial vehicles, unmanned aircraft, steerable balloons (e.g., hot air balloons), remotely operated vehicles, remotely operated underwater vehicles, and the like, are typically unmanned and typically highly mobile, and may be remotely operated by a user in the vicinity of the vehicle or may be self-piloted. An autonomous vehicle does not require a user to maneuver it, and self-piloting can greatly improve the range and endurance of an unmanned vehicle. ASVs may be used for a variety of purposes including, but not limited to, remote sensing, commercial monitoring, film production, disaster relief, geological exploration, agriculture, rescue operations, and the like. For these and other uses, it is desirable to improve the operating time and endurance of the ASV. ASVs may carry a plurality of sensors, which may include, but are not limited to, accelerometers, altimeters, barometers, gyroscopes, thermal imagers, cameras, light detection and ranging (LiDAR) sensors, and the like. These sensors may be used to increase the running time and endurance of the ASV, or may serve the purposes mentioned above. For example, gyroscopes may be used to measure or maintain the orientation and angular velocity of an ASV and may thereby improve its running time, while cameras may be used to capture images during geological exploration.
One of the key limitations on ASV performance is energy. Energy directly affects the endurance, range, and payload capacity of the ASV. To better manage energy levels, ASVs may extract energy from their environment, referred to herein as "energy harvesting." The ASV may employ any energy harvesting method, or combination of methods, to enhance its endurance and range. In one example, an underwater ASV may harvest energy from wave motion. In another example, a terrestrial ASV may harvest solar energy. In yet another example, an aerial ASV may exploit thermal updrafts and ridge lift (referred to herein as "soaring") to harvest energy.
Soaring uses rising air to increase the flight time of an aerial ASV, and has been studied and tested over the last two decades. For example, in 2010, Edwards and Silverberg demonstrated soaring by a remote-controlled glider against manned competitors in the Montague Cross Country Challenge. However, there are challenges associated with soaring.
Some challenges include: sensing (an effective soaring system should be able to sense the motion of the surrounding air); energy harvesting (an aerial ASV should be equipped to decide when to harness energy and to avoid sinking air); and energy-level awareness (i.e., an aerial ASV should be able to account for its own energy state and the energy state of the environment while it is flying).
AutoSoar (Depenbusch, Nathan T., John J. Bird, and Jack W. Langelaan, "The AutoSOAR autonomous soaring aircraft part 2: Hardware implementation and flight results," Journal of Field Robotics 35.4 (2018): 435-458) addresses some of these problems. AutoSoar proposes a method of autonomous flight using thermal updrafts and ridge lift. AutoSoar aims to handle all phases of thermal soaring, such as: thermal detection, thermal latching and unlatching, thermal centering control, mapping, exploration, and flight management. AutoSoar is directed to increasing flight time by using thermals and efficiently searching for them.
However, AutoSoar does not optimize the run time of an ASV employing energy harvesting while simultaneously achieving the "sensing" or "observing" goals of the ASV's mission. There remains a need for a method of optimizing and balancing the "observation" or "sensing" time of an ASV against its "energy replenishment" or "energy harvesting" time.
Disclosure of Invention
The invention provides a multi-objective method for optimizing the "observation" or "sensing" time of an ASV and the "energy replenishment" or "energy harvesting" time of the ASV. The method comprises the following steps: collecting data about observation points of interest; determining whether energy harvesting is required; and efficiently visiting the observation points of interest between searches for energy sources.
Drawings
Embodiments will now be described in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an off-board algorithm;
FIG. 2 is a schematic diagram of an on-board path planning system;
FIG. 3 is a schematic diagram of a time-based algorithm;
FIG. 4 is a schematic illustration of a reward map;
FIG. 5 is a schematic diagram of a value function map;
FIG. 6 is a schematic illustration of a probability map of thermals previously found in a given region;
FIG. 7 is a schematic diagram of an embodiment of a combined map;
FIG. 8 is a schematic diagram of the AutoSoar decision system adjusted to include a greedy decision algorithm;
FIG. 9 is a schematic diagram of an on-board system for an intelligent decision algorithm; and
FIG. 10 is a schematic diagram of an on-board system for a decision algorithm with a reinforcement learning system.
Detailed Description
The invention provides a multi-objective method for optimizing/balancing the "observation" or "sensing" time of an ASV and the "energy replenishment" or "energy harvesting" time of the ASV. The method comprises the following steps: collecting data about observation points of interest; determining whether energy harvesting is required; and efficiently visiting the observation points of interest between searches for energy sources.
The method provided by the invention efficiently visits the observation points of interest while improving the endurance of the ASV. The invention determines a balance between energy harvesting, exploration for energy sources, and visiting observation points. By acting on different input signals, the ASV is directed to extend its energy level and run time while still pursuing its observation targets.
The invention provides an optimized ASV observation system. The system includes off-board computer software and a local on-board intelligent system. The off-board computer software acquires historical flight data, weather forecasts, mission objectives, and ASV characteristics. The program then uses this information to generate paths and value maps of potential paths. These maps and paths are planned with awareness of the weather forecast, although a forecast is not required to generate the maps.
The local on-board intelligent system obtains information from the off-board computer, signals from the sensors, and autopilot instructions. The system may (or may not) also access local and global weather systems from third parties. Based on the presented information, the system may select the next waypoint. The system uses an intelligent decision-making system to strike a balance between exploring the environment and exploiting it, and it updates the energy profile. For example, the system may select and/or modify the pitch angle or speed of the aircraft so that it operates in a more optimal manner. In one embodiment, the proposed solution allows an aerial ASV to exploit thermal updrafts while performing as expected in an observation task. In another embodiment, the proposed solution allows an underwater ASV to exploit wave motion while performing as expected in an observation task.
The method can improve the endurance of the ASV while achieving its observation targets. This approach enables any ASV to perform its tasks efficiently and to use free energy available in the environment (e.g., thermal updrafts, tidal energy, solar energy).
Off-board path planning and map generation
FIG. 1 shows a schematic diagram of an off-board algorithm. The off-board algorithm may calculate and generate the required paths for the on-board computing agent, and may create a pool of potential actions from which the on-board computer can make decisions. An off-board computer may take a number of inputs 111 and generate outputs 112. Outputs 112 include, but are not limited to, a value function map 109 and a list 110 of paths with associated values. Inputs 111 include, but are not limited to: a start point 101, an end point 101, one or more regions of interest 101, no-fly areas, boundaries, historical flight data 102, aircraft parameters 103, the energy capacity of the ASV 104, maps (terrain, land cover, underwater, etc.) 105, weather forecasts 106, observation importance factors 107, and one or more desired energy harvesting types 100. These inputs 111 are processed by the off-board path planner 108 to generate the outputs 112. In a preferred embodiment, the outputs 112 include a potential value function map 109 and/or a path list 110 with associated values.
Thus, in one embodiment, using dynamic programming and available information such as the locations of observation points, historical flight information 102, wind and weather forecasts 106, and the vehicle energy status 104, the off-board algorithm 108 generates a grid 109 with a value function associated with each cell. The system passes this as an input to an on-board computer system that manages vehicle behavior during operation and determines when it is appropriate to change behavior.
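By way of illustration only, the following minimal sketch shows one way such a value-function grid could be computed with dynamic programming (value iteration) over a reward grid. The eight-neighbour move set, the discount factor, and the convergence threshold are assumptions of the sketch, not parameters specified by the invention.

```python
import numpy as np

def value_iteration(rewards, gamma=0.9, tol=1e-6):
    """Dynamic-programming sweep: each cell's value becomes the best
    achievable "reward on arrival plus discounted future value" over
    its eight neighbouring cells (fewer at the grid edges)."""
    rows, cols = rewards.shape
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]
    values = np.zeros_like(rewards, dtype=float)
    while True:
        new_values = np.zeros_like(values)
        for r in range(rows):
            for c in range(cols):
                neighbours = [(r + dr, c + dc) for dr, dc in moves
                              if 0 <= r + dr < rows and 0 <= c + dc < cols]
                new_values[r, c] = max(rewards[n] + gamma * values[n]
                                       for n in neighbours)
        if np.max(np.abs(new_values - values)) < tol:
            return new_values
        values = new_values
```

The resulting grid can then be handed to the on-board system as described above.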
On-board path planning system
FIG. 2 shows a schematic diagram of an on-board path planning system. Once the off-board global path planner 108 has been computed, the on-board controller can make decisions using the output of the off-board system. The on-board path planner 113 may: generate potential ASV behaviors using the value function map 109, based on the observation points (targets) and the fastest paths to reach them; account for sensor readings 114 (i.e., wind direction, energy level, airspace restrictions, etc.); make decisions that balance exploring for and exploiting thermals against visiting observation points; optimize behavior near thermals and observation points (e.g., a thermalling or circling sequence around a thermal updraft or an observation point); and generate local waypoints, tilt angles, speeds, and the like for the ASV.
Inputs 111 to the on-board path planner 113 include, but are not limited to: sensor readings 114, the ASV's energy capacity 104, energy readings 104b, autopilot instructions 115, weather forecasts 106, maps (terrain, land cover, underwater, etc.) 105, the output 112 from the off-board planner 108, the potential value function map 109, the path list 110 with associated values, waypoints 116a, one or more desired energy harvesting types 100, and observation importance factors 107. The inputs 111 of the on-board path planner 113 may also include an end point 101, one or more regions of interest 101, no-fly regions, boundaries, historical flight data 102, and aircraft parameters 103. In a preferred embodiment, the output of the on-board (local) path planner 113 includes a map 117 with potential probabilities and/or a new list of waypoints 116.
Various methods may be employed in the local path planner 113, including: a time-based algorithm; greedy-algorithm decision making; and an intelligent decision-making system.
Time-based algorithm
A time-based system may be used to balance energy exploration time against task execution time. After a defined time in the energy exploration mode, the system can switch directly to observing the current task. For example, an ASV employing a time-based system may perform observation tasks for a specified period. If the ASV is still in observation mode when this period expires, the system may switch to the energy exploration mode for another specified period. The system may switch between the exploration mode and the observation mode multiple times. FIG. 3 shows the time-based algorithm.
The ASV system may access a list of observation points. After completing the exploration mode through climb 118 and decision 119, the ASV finds the first observation point (i.e., the nearest observation point 120). The ASV may then decide to observe 124 the first observation point 123 and update the observation list 125; or the ASV may explore for thermals 126 and, when one is found, use it to perform energy harvesting 130 and update the thermal list 131 as needed. In one embodiment, the balancing decision comes from an on-board timer. If the timer expires during observation mode, the system switches the ASV to exploration mode 126 for a specified period of time. In one embodiment, the system may repeat this action until the ASV reaches the observation point. Once an observation point has been observed, the system moves it to the end of the observation point list and sets the next observation point as the next target. This sequence may be repeated. In another embodiment, if the timer expires during exploration mode, the system switches the ASV to observation mode for a specified period of time.
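A minimal sketch of the timer-based switching described above follows; the mode names and the time budgets are illustrative assumptions.

```python
import time

class TimeBasedModeSwitcher:
    """Alternates between observation and energy-exploration modes on a
    fixed timer, switching whenever the current mode's budget expires."""

    def __init__(self, observe_budget_s=300.0, explore_budget_s=120.0):
        self.mode = "OBSERVE"
        self.budget = {"OBSERVE": observe_budget_s, "EXPLORE": explore_budget_s}
        self.deadline = time.monotonic() + self.budget[self.mode]

    def current_mode(self):
        # When the timer expires, flip modes and restart the countdown.
        if time.monotonic() >= self.deadline:
            self.mode = "EXPLORE" if self.mode == "OBSERVE" else "OBSERVE"
            self.deadline = time.monotonic() + self.budget[self.mode]
        return self.mode
```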
Value function map
Since the locations of the observation points may be known, a grid map or value function map covering the entire region of interest may be created. FIG. 4 shows one example of a potential value function map 109 created by the off-board or on-board path planners 108, 113. The user may input latitude, longitude, and/or altitude information for the observation targets, and the system then assigns a value to each target. In one embodiment, the reward map is created by assigning a large positive reward 403 to points of interest and a negative reward 401 to "no-fly" areas (e.g., areas with bad weather). FIG. 4 shows one example with two observation points 403 located at [6,9] and [10,10]; the starting point is located at [0,0]. Each observation point is given a large positive value (5). In one embodiment, the origin 401 is given a negative value (-1). The remainder of the region 402 is assigned a value of 0.
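The FIG. 4 reward map can be reproduced in a few lines; the grid shape and the specific reward values (+5, -1) follow the example above, while the function name and interface are assumptions of the sketch.

```python
import numpy as np

def build_reward_map(shape=(12, 12),
                     observation_points=((6, 9), (10, 10)),
                     negative_cells=((0, 0),),
                     observe_reward=5.0, negative_reward=-1.0):
    """Reward map as in FIG. 4: a large positive reward at each
    observation point, a negative reward at designated cells (here the
    origin), and zero everywhere else."""
    rewards = np.zeros(shape)
    for cell in observation_points:
        rewards[cell] = observe_reward
    for cell in negative_cells:
        rewards[cell] = negative_reward
    return rewards
```

Such a reward grid can serve as the input to the value-iteration sketch given earlier.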
FIG. 5 shows another example of a value function map, produced from the reward map shown in FIG. 4 and the value function equation defined below. The system may calculate a relative value for each cell/location. The method for generating the value function map may comprise the following steps: segmenting the region of interest into a grid; adding a starting point; and defining the potential observation points as positive cells. The method may further comprise the step of defining the grid states, i.e., each cell is defined as a state of the environment. Note that the value 404 of each cell reflects how "good" that cell is. One approach is to define a variable that indicates how good each cell is by assigning a reward to each cell. In one embodiment, the map is created by assigning a large positive reward to points of interest, a negative reward to "no-fly" areas (e.g., areas of bad weather), and a small positive reward to past locations where thermals recur. When an observation point is crossed, the reward is increased. In FIG. 5, the highest value 403 is assigned to cell [11,10]. Cells [12,10], [8,7], etc., are assigned medium-level values 408, 405. Cells [13,10] and [9,6] are assigned low-level values 406, 407. Cells [4,10] and [6,0] are assigned negligible values 401, 402.
The notion of "how good" 404 is defined herein in terms of expected future rewards, or expected return. Accordingly, the value function is defined with respect to a policy: a policy is a mapping from each state and action to the probability of taking that action when in that state.
The method may further include defining a set of possible actions. In a particular example, the action set consists of the eight moves an ASV may take, all with the same probability of being selected: the ASV may move in eight directions from any cell to a neighboring cell. Note that edge cells impose a restricted action set (e.g., movement from [0,0] is possible in only three directions).
To define the value function equation, a state s ∈ S may be defined, where s is a point in a grid of size m × n representing a geographic location; s may store values for weather, energy probability, and the presence or absence of an observation point. Consistent with the reward map described above, the reward may be defined piecewise, for example:

$$r(s) = \begin{cases} \beta, & \text{if } s \text{ contains an observation point} \\ -\theta, & \text{if } s \text{ lies in a no-fly region} \\ 0, & \text{otherwise} \end{cases}$$

where β and θ are positive real numbers. An action at grid step t is then defined as a_t ∈ A.
Note that "w.p." denotes "with probability". A policy, π(s, a), can then be defined that assigns a probability to each action in each state. For simplicity, a uniform policy is assumed from here on, but it could be any policy, even a learned one.
G_t is defined as the expected return from position t:

$$G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}$$

where 0 ≤ γ ≤ 1 is the discount factor for future rewards, and r_t, r_{t+1}, ... are the rewards generated by the policy π starting from state s.
Then the value function for each grid point may be:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid s_t = s \,\right] = \mathbb{E}_{\pi}\left[\, \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1} \;\middle|\; s_t = s \,\right]$$

where r_t, r_{t+1}, ... are generated by the policy π starting from state s.
A value function map over the grid is generated using these value functions.
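As an illustrative sketch of the equations above, iterative policy evaluation under the uniform eight-neighbour policy produces such a map; treating the reward as collected on arrival at a neighbour is an assumption of the sketch.

```python
import numpy as np

def evaluate_uniform_policy(rewards, gamma=0.9, iters=200):
    """Iterative policy evaluation of V^pi on a grid under a uniform
    random policy over the eight neighbouring moves (fewer at edges):
    V(s) <- mean over legal neighbours s' of [ r(s') + gamma * V(s') ]."""
    rows, cols = rewards.shape
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]
    values = np.zeros_like(rewards, dtype=float)
    for _ in range(iters):
        new_values = np.zeros_like(values)
        for r in range(rows):
            for c in range(cols):
                neighbours = [(r + dr, c + dc) for dr, dc in moves
                              if 0 <= r + dr < rows and 0 <= c + dc < cols]
                # Uniform probability over the legal moves from this cell.
                new_values[r, c] = sum(rewards[n] + gamma * values[n]
                                       for n in neighbours) / len(neighbours)
        values = new_values
    return values
```

For example, running this on the FIG. 4 reward map yields values that rise toward the two observation points, qualitatively matching FIG. 5.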
The above steps may also be applied to multiple input sources (e.g., historical flight information or wind). In one embodiment, one of the plurality of input sources may include historical flight information.
FIG. 6 shows a probability map of thermals previously found in a given region. The closer a cell's value is to 1, the greater the likelihood that a thermal has previously been encountered at that location. The probability map may show the likelihood of finding energy based on historical flight/weather information. For example, the likelihood of finding an energy source at [5,1] 602 is 99%; the likelihood of finding an energy source at [7,2] 603 is 97%; and the likelihood of finding an energy source at the black square 601 labeled 0.38 is 38%.
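One simple way to form such a map from historical data is an empirical encounter rate per cell; the additive smoothing and the record format are assumptions of this sketch.

```python
import numpy as np

def thermal_probability_map(shape, flight_records, smoothing=1.0):
    """Estimate the per-cell probability of finding a thermal as the
    fraction of past visits to that cell in which a thermal was found,
    with additive smoothing to avoid divide-by-zero on unvisited cells."""
    found = np.zeros(shape)
    visits = np.full(shape, smoothing)
    for cell, thermal_found in flight_records:  # e.g. ((5, 1), True)
        visits[cell] += 1.0
        if thermal_found:
            found[cell] += 1.0
    return found / visits
```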
FIG. 7 illustrates one embodiment of a combined map. The combined map may merge the value function map and the probability map. The combined map may be dynamic, changing based on a user-input alpha value. The greedy decision algorithm below explains the user-defined alpha value in more detail. The value of each cell is weighted differently according to the user-defined alpha value. For example, if the user-defined alpha value leans toward the observation mode, the observation points receive a higher relative weight, making the system more likely to default to the observation mode. Conversely, if the user-defined alpha value leans toward energy exploration, the energy sources receive a higher relative weight, making the system more likely to default to the exploration mode.
Notably, there are many ways to combine the value function map and probability map information; for example, a high reward value may be added to regions where the probability of a thermal is high, and a low reward value otherwise. In one embodiment, an importance multiplier may be introduced to balance the rewards associated with observation points against those of thermal updraft points. The importance multiplier may be adjusted for different tasks, where sometimes exploring for thermals is more important than visiting observation points, and vice versa.
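A minimal sketch of one such combination is given below: both maps are normalised and blended with the user-defined alpha value and the importance multiplier. The normalisation and the linear blend are assumptions of the sketch, one of the "many ways" noted above.

```python
import numpy as np

def combine_maps(value_map, probability_map, alpha, importance=1.0):
    """Blend the value-function map (observation objective) with the
    thermal probability map (energy objective). Here alpha near 1
    favours observation and alpha near 0 favours energy exploration."""
    def normalise(m):
        span = m.max() - m.min()
        return (m - m.min()) / span if span > 0 else np.zeros_like(m, dtype=float)
    return (alpha * normalise(value_map)
            + (1.0 - alpha) * importance * normalise(probability_map))
```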
The observation points may also be moving. The algorithm may track any fixed or moving observation points. Moving targets may require an online connection to refresh the map.
Greedy decision algorithm
The greedy decision algorithm may balance between the energy exploration and observation modes through a greedy probability.
Once the value function map is defined, a map of optimal behavior can be defined as follows. Define a step or action as the ASV moving one (or n) cells. The value function map then shows the optimal behavior as the action the ASV can take from each given cell to its highest-value neighbor.
The greedy decision algorithm may then balance between the observation mode and the exploration mode. The algorithm may use various modes to determine the behavior of the ASV, such as an exploration mode and an observation mode. In one embodiment, the system chooses to visit the highest-value neighbor, since this is the cell that defines the optimal behavior path. In another embodiment, a bias map may be combined with the value function map in order to achieve more accurate decisions and behavior. The bias map selects, among the best-value cells, one that matches the bias direction; the algorithm thereby narrows the set of candidate cells.
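The greedy move, with an optional bias direction narrowing the candidate neighbours, might look as follows. The map is assumed to be a 2-D NumPy array, and the dot-product test used to encode the bias is an assumption of the sketch.

```python
def greedy_step(cell, combined_map, bias_direction=None):
    """Pick the neighbouring cell with the highest map value. If a bias
    direction (dr, dc) is given, neighbours that move against it are
    excluded, narrowing the candidate set as described above."""
    rows, cols = combined_map.shape
    r, c = cell
    candidates = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            if (bias_direction is not None
                    and dr * bias_direction[0] + dc * bias_direction[1] < 0):
                continue  # this move points against the bias direction
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                candidates.append((nr, nc))
    if not candidates:  # bias excluded everything (e.g. in a corner)
        return greedy_step(cell, combined_map, bias_direction=None)
    return max(candidates, key=lambda n: combined_map[n])
```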
The greedy decision algorithm may then initiate an exploration mode to find new energy sources. In one embodiment, the algorithm may combine a bias map with the exploration map, biasing the map toward the observation points.
In another embodiment, the algorithm may use a greedy function or a random function to switch between the exploration mode and the observation mode. An alpha value may be defined, representing the balance between exploring and heading to an observation point. The alpha value varies between 0 and 1. In one embodiment, the closer the alpha value is to zero, the more the system favors the exploration mode; the closer it is to 1, the more it favors the observation mode. As time passes and exploration proceeds, the alpha value is driven steadily toward 1, which forces a visit to the observation point. Once the ASV reaches the observation point, the alpha value is reset close to 0 (e.g., 0.0001), allowing the ASV to continue exploring; the ASV then explores until, over time, the alpha value rises again. The rate of increase of the alpha value depends on a hyperparameter, which may be selected by the user and may range between 1% and 99%, with a preferred range of about 5% to 15%. For example, with a 10% hyperparameter, the map is updated every time an observation point is visited: the reward for that observation point is reduced to approximately 0, so that another observation point becomes preferred over the current one. This action may be repeated until all observation points have been observed. Alternatively, once the next observation point is reached, the reward value of the previous observation point may be restored.
The greedy decision algorithm is advantageous because many observation points can be defined. It generalizes to different observation points with different importance levels. It can incorporate priority and meteorological information to easily generate value functions and make the map more intelligent. In addition, it can run on the ASV and provide real-time updates.
The following step function may be employed (a code sketch of this decision loop follows the numbered steps below):
fig. 8 provides a state diagram of a decision 819 autoso system that adjusts to include a greedy decision algorithm. The "go to observe" tab 824 simply moves directly to observe mode 825. The user may adjust or select the amount of time consumed in the probe mode 826. The user may adjust or select the amount of time consumed by the observation mode 825.
1. With probability α, enter the "go to observe" state 824.
2. Once the observation point 825 has been observed, return from the observation state to the decision state 819 and set α = 0.001.
3. With probability 1 − α, enter the "explore" state 826.
4. In the "explore" state, perform one exploration as defined by the AutoSoar exploration method. On returning to the decision state 819, increase α by setting α = α + 0.1.
5. If very close to an observation point, go to the observation state 825 (marked 5 and 6 in the schematic).
6. After moving only one cell, return to the decision state 819.
7. If a thermal is encountered, latch onto the thermal 829.
8. Once latched 829 and finished, return to the decision state 819.
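The numbered steps above can be sketched as a small state machine, with the alpha reset and increment from steps 2 and 4; the thermal-latching and map-update details are omitted, and the transition handling is an assumption of the sketch.

```python
import random

class GreedyDecisionPolicy:
    """State-machine sketch of FIG. 8: with probability alpha head to an
    observation point, otherwise explore for energy. Alpha is reset low
    after an observation (step 2) and stepped up after each exploration
    (step 4), so the vehicle drifts back toward observing over time."""

    def __init__(self, alpha_reset=0.001, alpha_step=0.1):
        self.alpha = alpha_reset
        self.alpha_reset = alpha_reset
        self.alpha_step = alpha_step

    def decide(self):
        # Steps 1 and 3: probabilistic choice between the two states.
        return "GO_TO_OBSERVE" if random.random() < self.alpha else "EXPLORE"

    def on_observation_complete(self):
        self.alpha = self.alpha_reset                         # step 2

    def on_exploration_complete(self):
        self.alpha = min(1.0, self.alpha + self.alpha_step)   # step 4
```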
Intelligent decision-making system
The intelligent decision system is a combination of the earlier approaches. In this algorithm, the actions of the system are governed by a set of rules defined before flight. A bias algorithm may be used to select the optimal action that maximizes both the probability of finding thermals and the observation behavior.
FIG. 9 is a schematic diagram of an on-board system for the intelligent decision algorithm 119. The algorithm combines the time-based algorithm, the greedy algorithm, and sensor readings to detect the most desirable energy sources in the environment. Actions are optimized so that the system maximizes behavior toward observation points and historically known energy sources, using the bias defined in the foregoing optimization to maximize exploitation efficiently and safely.
The algorithm may evaluate the readings of the sensors 114 against its value maps. In one embodiment, if the original set of maps is deemed insufficiently accurate, the algorithm may be configured to trigger a new global path-planner sequence.
The intelligent AI decision system is an online decision system, and may be on-board or off-board. The algorithm uses the input signals to determine the next waypoint, tilt angle, and speed of the ASV. The AI system 132 first checks 133 the readings from the inputs; if there is any uncertainty, or the readings differ from their value functions, the AI system recalculates 134 and updates its environmental value functions.
If the readings are within an acceptable range of the system's value function, the system generates an observation map (such as a value function map), an uncertainty map, and energy, wind, and glide maps. The AI system 132 then combines these maps using the alpha factor defined before flight.
The observation map may also be modified by a time factor, a value between 0 and 1 that modifies the rewards of observation points before the value function map is updated. If an observation point has been observed, its reward is reduced.
Because of this combination of maps, the generated map 135 is biased toward the energy sources, the observation points, the wind direction, and the heading of the ASV.
The intelligent AI decision system 132 then calculates the trajectory and direction of travel to the next point and generates the waypoint 116, tilt angle, and speed.
Reinforcement Learning (RL) proxy decision system
The RL system 136 is similar to the intelligent decision-making system. FIG. 10 is a schematic diagram of an on-board system for a decision algorithm with a reinforcement learning system 136. In this embodiment, the system uses the available information to decide whether to visit an observation point or an energy harvesting point.
The RL system 136 can be trained or designed to make decisions. One training method is to train the RL agent in a simulated environment. Evaluation may be performed through human feedback or by comparing the results with those of other systems.
A reward function may also be defined to train the RL agent, evaluating how much energy was used, whether observation points were visited, and the time spent doing so. The RL system may also employ deep neural networks. The RL system makes decisions based on the input signals, the processed data, and the subsequent points.
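A hypothetical shaping of such a reward function is sketched below; the three terms follow the description above, while the weights and units are purely illustrative assumptions.

```python
def rl_reward(energy_used_j, observation_points_visited, elapsed_s,
              w_visit=10.0, w_energy=0.001, w_time=0.01):
    """Illustrative training reward: credit visited observation points,
    penalise energy consumption (joules) and elapsed mission time."""
    return (w_visit * observation_points_visited
            - w_energy * energy_used_j
            - w_time * elapsed_s)
```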
For purposes of simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. Furthermore, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the examples described herein. Furthermore, this description should not be taken as limiting the scope of the examples described herein.
It should be understood that the examples and corresponding schematic diagrams used herein are for illustrative purposes only. Different configurations and terms may be employed without departing from the principles of the present invention. For example, components and modules may be added, deleted, and modified or arranged in various connections without departing from these principles.
It should also be appreciated that any module or component of the invention that executes instructions illustrated herein may include or otherwise access a computer-readable medium, such as a storage medium, a computer storage medium, or a data storage device (removable and/or non-removable), e.g., magnetic disks, optical disks, or tape. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other storage technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of, any component of, or associated with, the system, etc., or may be accessible by, or connected with, the system. Any of the applications or modules described herein may be implemented using computer-readable/executable instructions that may be stored or otherwise maintained by such computer-readable media.
The steps or operations in the flowcharts and diagrams described herein are exemplary only. There may be many variations to these steps or operations without departing from the principles described above. For example, the steps may be performed in a differing order, or steps may be added, deleted or modified.
While the foregoing principles have been described in connection with certain specific examples thereof, various modifications thereof will be apparent to those skilled in the art, as set forth in the appended claims.

Claims (6)

1. A system for controlling an autonomous sensing vehicle, comprising:
- a local on-board system of the autonomous sensing vehicle;
-an off-board system;
- at least one sensor located on the autonomous sensing vehicle;
-an autopilot command system;
wherein the local on-board system obtains information from the off-board system, the at least one sensor, and the autopilot command system; and
wherein the off-board system includes a decision algorithm to select, based on the received information, whether to visit an observation point or an energy harvesting point.
2. The system of claim 1, wherein the autonomous sensing vehicle is an aircraft and the energy harvesting point is a thermal updraft.
3. The system of claim 1, wherein the autonomous sensing vehicle is an underwater vehicle and the energy harvesting point is a wave flow.
4. A method of controlling an autonomous sensing vehicle, comprising:
- generating a reward map corresponding to at least one observation point;
- generating a value function map;
- generating a probability map corresponding to at least one energy harvesting point;
- generating a combined map by combining the value function map and the probability map;
- making a decision to visit the observation point or to visit the energy harvesting point based on the generated combined map; and
- sending instructions to an on-board system on the autonomous sensing vehicle to visit the observation point or to visit the energy harvesting point according to the decision.
5. The method of claim 4, wherein the step of generating the value function map comprises:
- segmenting the region of interest into a grid;
-adding a starting point; and
-assigning a positive value to the observation point.
6. The method according to claim 5, further comprising the step of:
-assigning a negative value to the restricted area.
CN202180094725.XA 2020-12-23 2021-12-22 System for balancing energy source exploration and observation time of autonomous sensing vehicles Pending CN117136343A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063130308P 2020-12-23 2020-12-23
US63/130,308 2020-12-23
PCT/CA2021/051871 WO2022133605A1 (en) 2020-12-23 2021-12-22 System for balancing energy source exploration and observation time of autonomous sensing vehicles

Publications (1)

Publication Number Publication Date
CN117136343A (en) 2023-11-28

Family

ID=82157089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180094725.XA Pending CN117136343A (en) 2020-12-23 2021-12-22 System for balancing energy detection and observation times for auto-induction vehicles

Country Status (5)

Country Link
US (1) US20240038080A1 (en)
EP (1) EP4268041A1 (en)
CN (1) CN117136343A (en)
CA (1) CA3202847A1 (en)
WO (1) WO2022133605A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115097868B (en) * 2022-08-24 2022-11-22 深圳市瓴鹰智能科技有限公司 Flight control method and related device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8935057B2 (en) * 2012-01-17 2015-01-13 LimnTech LLC Roadway mark data acquisition and analysis apparatus, systems, and methods
US10586464B2 (en) * 2015-07-29 2020-03-10 Warren F. LeBlanc Unmanned aerial vehicles
WO2018175349A1 (en) * 2017-03-19 2018-09-27 Zunum Aero, Inc. Hybrid-electric aircraft, and methods, apparatus and systems for facilitating same

Also Published As

Publication number Publication date
CA3202847A1 (en) 2022-06-30
WO2022133605A1 (en) 2022-06-30
US20240038080A1 (en) 2024-02-01
EP4268041A1 (en) 2023-11-01

Similar Documents

Publication Publication Date Title
Yan et al. Towards real-time path planning through deep reinforcement learning for a UAV in dynamic environments
San Juan et al. Intelligent UAV map generation and discrete path planning for search and rescue operations
JP6357176B2 (en) Moving path setting device, moving path setting method, and moving path setting program
CN107168380B (en) Multi-step optimization method for coverage of unmanned aerial vehicle cluster area based on ant colony algorithm
CN109655066A (en) One kind being based on the unmanned plane paths planning method of Q (λ) algorithm
CN111950873B (en) Satellite real-time guiding task planning method and system based on deep reinforcement learning
Ma et al. Improved ant colony algorithm for global optimal trajectory planning of UAV under complex environment.
Maciel-Pearson et al. Online deep reinforcement learning for autonomous UAV navigation and exploration of outdoor environments
CN113342047A (en) Unmanned aerial vehicle path planning method for improving artificial potential field method based on obstacle position prediction in unknown environment
US20210325891A1 (en) Graph construction and execution ml techniques
Ru et al. Distributed cooperative search control method of multiple UAVs for moving target
Albert et al. UAV path planning using MILP with experiments
KR20160048530A (en) Method and apparatus for generating pathe of autonomous vehicle
CN115060263A (en) Flight path planning method considering low-altitude wind and energy consumption of unmanned aerial vehicle
CN113095504A (en) Target track prediction system and prediction method
CN117136343A (en) System for balancing energy source exploration and observation time of autonomous sensing vehicles
Gao Autonomous soaring and surveillance in wind fields with an unmanned aerial vehicle
Wagner et al. An adaptive real time atmospheric prediction algorithm for entry vehicles
Yue et al. A new search scheme using multi‐bee‐colony elite learning method for unmanned aerial vehicles in unknown environments
Zheng et al. A method for UAV tracking target in obstacle environment
Zhang et al. Route planning for unmanned air vehicles with multiple missions using an evolutionary algorithm
Menezes et al. An evaluation of stochastic model-dependent and model-independent glider flight management
Chen et al. Multi-objective monte-carlo tree search based aerial maneuvering control
Li et al. Research on the Control Method of Unmanned Helicopter Under the Background of Artificial Intelligence
Baxevani et al. Resilient Ground Vehicle Autonomous Navigation in GPS-Denied Environments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination