CN114356535A - Resource management method and device for wireless sensor network - Google Patents

Resource management method and device for wireless sensor network Download PDF

Info

Publication number
CN114356535A
Authority
CN
China
Prior art keywords
wireless sensor
sensor network
agent
reward
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210255790.2A
Other languages
Chinese (zh)
Inventor
曾勇
万子金
熊山山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jincheng Century Consulting Service Co ltd
Original Assignee
Beijing Jincheng Century Consulting Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jincheng Century Consulting Service Co ltd filed Critical Beijing Jincheng Century Consulting Service Co ltd
Priority to CN202210255790.2A
Publication of CN114356535A
Legal status: Pending

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The application relates to a resource management method and device for a wireless sensor network. The method comprises the following steps: taking each sensor node in the wireless sensor network as an agent; setting network parameters for the wireless sensor network, wherein the network parameters at least comprise: environmental state, action list, and reward function; performing iterative interaction of multiple agents based on the network parameters to determine an optimal strategy; and performing resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy. According to the scheme, the dynamic interaction theory of multiple agents is applied to the wireless sensor network to solve the problems of resource allocation and task scheduling, so that the wireless sensor network can actively perform resource allocation and task scheduling and provide an online monitoring function even when the network is inaccessible and cannot be intervened from the outside.

Description

Resource management method and device for wireless sensor network
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a resource management method and device of a wireless sensor network.
Background
Typically in wireless sensor networks, wireless sensor nodes are heterogeneous, energy-constrained, and tend to operate under dynamic and ambiguous conditions. In these cases, the nodes need to know how to collaborate on tasks and resources (including power and bandwidth).
In the related art, in some application scenarios the wireless sensor network may lose its connection to the outside and become inaccessible, so that it cannot be scheduled or managed externally. In such cases, the wireless sensor network needs to perform resource allocation and task scheduling on its own.
Disclosure of Invention
To overcome at least some of the problems in the related art, the present application provides a resource management method and apparatus for a wireless sensor network.
According to a first aspect of embodiments of the present application, there is provided a resource management method for a wireless sensor network, including:
taking each sensor node in the wireless sensor network as an agent;
setting network parameters for a wireless sensor network, wherein the network parameters at least comprise: environmental status, action space, and reward function;
performing iterative interaction of multiple agents based on the network parameters to determine an optimal strategy;
and performing resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy.
Further, the environmental state includes: battery power and/or spectrum availability; the action list includes: receiving or sending a specified packet, and/or performing a specified task; the reward function includes: internal rewards and/or external rewards.
Further, the internal reward is a reward function defined based on internal variables, and the external reward is a reward function defined according to feedback from a central controller or other nodes;
each sensor node is provided with a corresponding reward function; the other nodes are the sensor nodes in the wireless sensor network other than the node itself.
Further, the taking each sensor node in the wireless sensor network as an agent includes:
modeling the wireless sensor network as a set of agents $\mathcal{N} = \{1, 2, \dots, N\}$, wherein $N$ is the number of sensor nodes in the wireless sensor network;
letting $S = S_0 \times S_1 \times \dots \times S_N$ represent the state space, wherein $S_0$ is the shared state space, $S_i$ is the local state space of agent $i$, and $i \in \{1, \dots, N\}$;
letting $A = A_1 \times A_2 \times \dots \times A_N$ represent the action space, wherein $A_i$ is the action space of the $i$-th agent.
Further, the reward function is:
$$r_i = R_i(s, u^{1}, u^{2}, \dots, u^{N})$$
wherein $r_i$ is the reward obtained by agent $i$, $R_i: S \times A \to \mathbb{R}$, and $i \in \{1, \dots, N\}$.
further, the performing iterative interactions of the multi-agent includes:
defining an action value function and a cost function;
converging to an optimal action value function through iterative interaction of multiple agents;
and determining an optimal strategy according to the optimal action value function.
Further, the action value function is:
$$Q^{\pi}(s, u) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\Big|\, s_t = s,\; u_t = u\Big]$$
the state value function is:
$$V^{\pi}(s) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\Big|\, s_t = s\Big]$$
wherein $r(s, u, s')$ indicates the reward obtained by the agent when it starts from state $s$, selects action $u$ from the action space, and enters the next state $s'$; $\gamma$ is the discount factor, with value range $0 \leq \gamma \leq 1$.
Further, the step of the iterative interaction of the multiple agents comprises:
$$Q(s_t, u_t) \leftarrow (1-\alpha)\, Q(s_t, u_t) + \alpha \big[ r_t + \gamma \max_{u} Q(s_{t+1}, u) \big]$$
wherein $\alpha$ indicates the learning rate.
Further, the determining an optimal strategy according to the optimal action value function includes:
$$\pi^{*}(s) = \arg\max_{u \in A} Q^{*}(s, u)$$
wherein $\pi^{*}(s)$ indicates that, in state $s$, selecting from the action space the action $u$ that maximizes $Q^{*}(s, u)$ is the optimal strategy.
According to a second aspect of the embodiments of the present application, there is provided a resource management apparatus for a wireless sensor network, including:
the setting module is used for taking each sensor node in the wireless sensor network as an agent and setting network parameters for the wireless sensor network; the network parameters include at least: environmental status, action lists, and reward functions;
the iteration module is used for carrying out iterative interaction of the multiple intelligent agents based on the network parameters and determining an optimal strategy;
and the management module is used for carrying out resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the scheme of the application applies the multi-agent dynamic interaction theory to the wireless sensor network, and solves the problems of resource allocation and task scheduling in the wireless sensor network, so that the wireless sensor network can actively carry out resource allocation and task scheduling and provide an online monitoring function under the conditions of no access and no intervention from the outside, for example: controlling the temperature of a nuclear reactor, or invasive brain or muscle signal monitoring.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a resource management method of a wireless sensor network according to an example embodiment.
FIG. 2 is a schematic diagram of the interaction of an agent with an environment in multi-agent reinforcement learning.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a resource management method of a wireless sensor network according to an example embodiment. The method may comprise the steps of:
step S1, taking each sensor node in the wireless sensor network as an agent;
step S2, setting network parameters for the wireless sensor network, wherein the network parameters at least comprise: environmental status, action lists, and reward functions;
step S3, carrying out iterative interaction of multiple agents based on the network parameters, and determining an optimal strategy;
and step S4, performing resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy.
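For illustration only, the following is a minimal sketch of how steps S1-S4 could be organized in code; it is not the claimed implementation, and all names (SensorAgent, train, allocate_and_schedule, the env interface) are assumptions introduced here.

```python
# Illustrative sketch of steps S1-S4; names and interfaces are assumptions.
import random


class SensorAgent:
    """Step S1: one agent per sensor node."""

    def __init__(self, node_id, action_list):
        self.node_id = node_id
        self.action_list = action_list   # step S2: the action list of this node
        self.q = {}                      # learned state-action values

    def act(self, state, epsilon=0.1):
        if random.random() < epsilon:    # explore during training
            return random.choice(self.action_list)
        return max(self.action_list,
                   key=lambda a: self.q.get((state, a), 0.0))


def train(agents, env, episodes):
    """Step S3: iterative multi-agent interaction with the environment.
    `env` is any object exposing reset() and step(joint_action)."""
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            joint_action = {a.node_id: a.act(state) for a in agents}
            state, rewards, done = env.step(joint_action)
            # ... per-agent Q-value updates go here (see Equation 4 below) ...


def allocate_and_schedule(agents, state):
    """Step S4: apply the learned greedy policy for resource allocation
    and task scheduling."""
    return {a.node_id: a.act(state, epsilon=0.0) for a in agents}
```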
The scheme of the application applies the multi-agent dynamic interaction theory to the wireless sensor network and solves the problems of resource allocation and task scheduling in the wireless sensor network, so that the wireless sensor network can actively carry out resource allocation and task scheduling and provide an online monitoring function in scenarios where the network is inaccessible and cannot be intervened from the outside, for example: controlling the temperature of a nuclear reactor, or invasive monitoring of brain or muscle signals.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and described, and may be performed in other orders. Moreover, at least a portion of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; the order of performing these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
To further detail the technical solution of the present application, the multi-agent reinforcement learning problem is first introduced briefly.
Multi-agent reinforcement learning consists of several agents that interact with the environment and receive rewards based on those interactions. To model a wireless sensor network with reinforcement learning, the present solution treats the wireless sensor nodes as agents; the environment may be the physical surroundings the nodes are in, or the other nodes with which they tend to interact over a period of time.
In reinforcement learning, there is an environmental state; in some embodiments, this may be a series of measurements made by the nodes, such as their battery power and spectrum availability. The set of all environment states is defined as the state space, and the size of the state space grows exponentially as the number of parameters in the set increases.
Another parameter to be defined is the action list. A node may receive or send a specified packet, and may even perform a specified task.
Finally, it is necessary to define how the reward function is set. Two types of reward functions are considered: (1) internal rewards, i.e., the agent defines a reward function for itself based on some internal variables, such as energy usage; (2) external rewards, i.e., the agent receives certain rewards from a central controller or other nodes, e.g., confirmation that a packet has been successfully received.
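As a concrete illustration of the two reward types, the following sketch defines an internal reward that penalizes energy use and an external reward tied to a packet acknowledgement; the variable names and the weighted combination are assumptions, not part of the disclosed scheme.

```python
# Illustrative reward definitions; energy_used_mj, ack_received and the
# weights are invented for demonstration.

def internal_reward(energy_used_mj: float) -> float:
    """Internal reward: defined by the node from its own variables,
    e.g. penalizing energy consumption."""
    return -energy_used_mj


def external_reward(ack_received: bool) -> float:
    """External reward: granted by a central controller or another node,
    e.g. an acknowledgement that a packet was received."""
    return 1.0 if ack_received else 0.0


def total_reward(energy_used_mj: float, ack_received: bool,
                 w_int: float = 0.5, w_ext: float = 0.5) -> float:
    """One possible way to combine both signals (the weighting is an assumption)."""
    return w_int * internal_reward(energy_used_mj) + w_ext * external_reward(ack_received)
```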
The problem of multi-agent reinforcement learning is a broad research topic. This scheme mainly considers solutions related to Q-Learning, which is one of the classical approaches for scenarios where no model of the environment is available.
To model the environment, Q-Learning treats it as a Markov decision process, defined by a state set and a transition probability function that depends on the current state, the action of the agent, and the next state.
The scheme applies multi-agent Q-Learning to the management of wireless sensor communication resources. Three main frameworks are used to solve the multi-agent Q-Learning problem of resource allocation in wireless sensor networks: (1) the wireless nodes are independent learners; (2) the scenario of joint-action learners is modeled using the framework of stochastic games; (3) for the case of one leader and several followers, convergence to the optimal action value function is faster.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
1. Q-Learning
The present solution first defines the main parameters. A stochastic game is a tuple $(\mathcal{N}, S, \{A_i\}_{i \in \mathcal{N}}, P, \{R_i\}_{i \in \mathcal{N}})$, in which:
the agent set is $\mathcal{N} = \{1, 2, \dots, N\}$;
$S = S_0 \times S_1 \times \dots \times S_N$ is the state space, composed of a shared state space and local state spaces, wherein $S_0$ is the shared state space and $S_i$ is the local state space of agent $i$, $i \in \{1, \dots, N\}$;
$A = A_1 \times A_2 \times \dots \times A_N$ is the action space, wherein $A_i$ is the action space of agent $i$;
$P: S \times A \times S \to [0, 1]$ is the transition function;
$R_i: S \times A \to \mathbb{R}$ is the real-valued reward function of agent $i$.
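The tuple above can be captured directly as a data structure. The sketch below is an illustrative assumption (the class name, field names, and the two-node example values are invented for demonstration), not the patent's implementation.

```python
# Illustrative encoding of the stochastic-game tuple (N, S, {A_i}, P, {R_i}).
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StochasticGame:
    agents: List[int]                          # N = {1, ..., n} sensor nodes
    shared_states: List[str]                   # S0: shared state space
    local_states: Dict[int, List[str]]         # S_i: local state space of agent i
    actions: Dict[int, List[str]]              # A_i: action space of agent i
    transition: Callable[[Tuple, Tuple, Tuple], float]   # P(s, joint_action, s')
    rewards: Dict[int, Callable[[Tuple, Tuple], float]]  # R_i(s, joint_action)


# Example instantiation for two nodes (all values are illustrative):
game = StochasticGame(
    agents=[1, 2],
    shared_states=["spectrum_free", "spectrum_busy"],
    local_states={1: ["battery_high", "battery_low"],
                  2: ["battery_high", "battery_low"]},
    actions={1: ["transmit", "sleep"], 2: ["transmit", "sleep"]},
    transition=lambda s, a, s2: 1.0 / 8.0,     # placeholder uniform dynamics
    rewards={i: (lambda s, a: 0.0) for i in (1, 2)},
)
```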
In Q-Learning, the agent finds the optimal strategy through iterative interaction with the environment. In each step, the agent first observes the environmental state; assuming full observability, it models the environment as a Markov Decision Process (MDP) and acts according to its current policy function. It then decides which action to take in order to change the environmental state and maximize its expected cumulative reward.
Based on the reward it receives and the next state it observes, it updates its decision.
To discuss Q-Learning mathematically, it is first necessary to define the state-action value function and the state value function:
$$Q^{\pi}(s, u) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\Big|\, s_t = s,\; u_t = u\Big] \qquad (1)$$
$$V^{\pi}(s) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\Big|\, s_t = s\Big] \qquad (2)$$
The state-action value function (or Q-function, Equation 1) gives the expected cumulative reward obtained if the agent starts from state $s$ and takes action $u$ from the set of available actions; the state value function (or V-function, Equation 2) gives the expected cumulative reward obtained if the agent starts from state $s$. It should be noted that Equation 1 (the Q-function) is the expected reward conditioned on both the state and the action, while Equation 2 (the V-function) is the expected reward conditioned on the state only; both functions are expected values, but over different arguments.
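A small numeric example may clarify the expectations in Equations 1 and 2: both accumulate rewards discounted by $\gamma$, the Q-function fixing the first action and the V-function not. The reward sequence below is invented for illustration.

```python
# Discounted return along one sampled trajectory (values are illustrative).

def discounted_return(rewards, gamma=0.9):
    """Sum_k gamma^k * r_k for one trajectory."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


trajectory_rewards = [1.0, 0.0, 2.0]
print(discounted_return(trajectory_rewards))   # 1.0 + 0.9*0.0 + 0.81*2.0 = 2.62
```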
The discount factor $\gamma$, with value range $0 \leq \gamma \leq 1$, indicates how far into the future the agent looks when making a decision. The larger $\gamma$ is, the more steps ahead the agent considers, but the harder training becomes; the smaller $\gamma$ is, the more the agent focuses on immediate benefits, and the lower the training difficulty.
If the optimal action value function is known, the optimal strategy can be calculated as follows:
$$\pi^{*}(s) = \arg\max_{u \in A} Q^{*}(s, u) \qquad (3)$$
In Q-Learning, the agent interacts with the environment iteratively. In each step, it updates its state-action value function and state value function based on the state it started from, the action it took, the reward it earned, and the next state it reached (Equation 4), wherein $\alpha$ indicates the learning rate. The goal of Q-Learning is to converge iteratively to the optimal state-action value function and state value function:
$$Q(s_t, u_t) \leftarrow (1-\alpha)\, Q(s_t, u_t) + \alpha \big[ r_t + \gamma \max_{u} Q(s_{t+1}, u) \big] \qquad (4)$$
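The update in Equation 4 and the greedy policy of Equation 3 can be realized with a simple lookup table when the state and action sets are finite. The following sketch is a minimal tabular illustration; the state and action names are assumptions.

```python
# Minimal tabular Q-Learning step (Equation 4) and greedy policy (Equation 3).
from collections import defaultdict

q = defaultdict(float)          # Q(s, u), initialised to 0


def q_update(s, u, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Move Q(s, u) toward r + gamma * max_u' Q(s', u')."""
    best_next = max(q[(s_next, u2)] for u2 in actions)
    q[(s, u)] += alpha * (reward + gamma * best_next - q[(s, u)])


def greedy_policy(s, actions):
    """Equation 3: pick the action maximising Q(s, u)."""
    return max(actions, key=lambda u: q[(s, u)])


# Example step (all values are illustrative):
actions = ["transmit", "sleep"]
q_update(s="battery_high", u="transmit", reward=1.0,
         s_next="battery_low", actions=actions)
print(greedy_policy("battery_high", actions))   # -> "transmit"
```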
2. Extending Q-Learning to multi-agent scenarios
As shown in fig. 2, multiple agents interact with the same environment. The most obvious solution is to treat them as independent learners, each interacting with the environment on its own, and to add an agent index $i$ to the state-action value function, the state value function, and the reward function (Equation 5):
$$Q_i(s, u^{i}), \qquad V_i(s), \qquad R_i(s, u^{i}) \qquad (5)$$
This approach has several problems:
First, in this case each agent may selfishly attempt to maximize its expected cumulative reward without regard to the actions of the other agents.
Second, an agent cannot unilaterally maximize its expected cumulative reward without taking the behavior of the other agents into account.
Finally, the definition of the value function is no longer valid: the expected cumulative reward can no longer be obtained by maximizing the action value function over only the set of actions available to agent $i$.
To address the first and second problems, the actions of the other agents may be added to the state-action value function and the reward function (Equation 6):
$$Q_i(s, u^{1}, u^{2}, \dots, u^{N}), \qquad R_i(s, u^{1}, u^{2}, \dots, u^{N}) \qquad (6)$$
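Equation 6 mainly changes how each agent's table is indexed: $Q_i$ now depends on the joint action of all agents, so its size grows exponentially with the number of agents. The sketch below shows only this data structure; how the next-state value is computed (max, minimax, Nash, ...) is left to the frameworks discussed in the next section, and all names are illustrative.

```python
# Joint-action Q table per agent (Equation 6); names are illustrative.
from collections import defaultdict

n_agents = 3

# Q_i(s, u_1, ..., u_N) for each agent i
q_joint = [defaultdict(float) for _ in range(n_agents)]


def joint_q_update(i, s, joint_u, reward_i, v_next, alpha=0.1, gamma=0.9):
    """Move Q_i(s, u_1..u_N) toward r_i + gamma * V_i(s'), where V_i(s')
    is supplied by the chosen multi-agent framework."""
    key = (s,) + tuple(joint_u)
    q_joint[i][key] += alpha * (reward_i + gamma * v_next - q_joint[i][key])


# Example call (all values are illustrative):
joint_q_update(0, s="spectrum_free", joint_u=("transmit", "sleep", "sleep"),
               reward_i=1.0, v_next=0.0)
```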
3. Methods for finding the optimal value function
Generally, there are two main methods for updating the value function:
A. adopting a stochastic policy framework, which is a generalized form of Markov policies and is suitable for multiple agents interacting with the same environment at the same time; B. using stochastic games to model scenarios in which actions are taken continually.
In the application of wireless sensor network resource management, the methods for finding the optimal value function can be divided into two main frameworks.
3.1 Independent agents
For the problem of wireless sensor network resource management, a multi-agent Q-Learning algorithm based on independent learners is provided. Although training the sensor nodes as joint-action learners is more accurate, in most cases the performance of the agents under the two frameworks is nearly the same.
This approach reduces the training cost, whether for the entire network or just one new sensor node, as well as the need for communication between nodes.
There are two cases in which this approach is not feasible: (1) when the agents must perform strictly coordinated tasks for specific purposes; (2) when there is a delay between the action taken by an agent and the reward it receives, e.g., when the agent needs to wait for some confirmation from the recipient, so that the node cannot connect the delayed reward with its action.
3.2 Stochastic games
Modeling the multi-agent Q-Learning problem with the framework of stochastic games is the primary and classical approach. Three of the most successful algorithms for updating the value function are Nash Q-Learning, Friend-or-Foe Q-Learning, and Minimax Q-Learning.
The authors of these three methods all show that, under certain conditions, the action value function converges to the optimal value.
The main challenge of this framework is dimensionality, which makes it difficult to train a large number of agents.
Based on Minimax Q-Learning, the value function of each of the two agents in a zero-sum game can be updated as follows:
$$V_1(s) = \max_{\pi_1(s,\cdot)} \min_{u^{2} \in A_2} \sum_{u^{1} \in A_1} \pi_1(s, u^{1})\, Q_1(s, u^{1}, u^{2}) \qquad (7)$$
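The following sketch illustrates the zero-sum Minimax-Q update of Equation 7 for two agents. Note that the full algorithm maximizes over mixed strategies by solving a small linear program; the sketch restricts the maximization to pure strategies for brevity, which is a simplification rather than the exact rule, and the action names are assumptions.

```python
# Simplified two-agent zero-sum Minimax-Q update (pure-strategy approximation).
from collections import defaultdict

actions_1 = ["transmit", "sleep"]        # our agent's actions (illustrative)
actions_2 = ["jam", "idle"]              # opponent's actions (illustrative)

q1 = defaultdict(float)                  # Q_1(s, u1, u2)


def minimax_value(s):
    """V_1(s) ~ max_{u1} min_{u2} Q_1(s, u1, u2); the exact rule maximises
    over mixed strategies via a linear program."""
    return max(min(q1[(s, u1, u2)] for u2 in actions_2) for u1 in actions_1)


def minimax_q_update(s, u1, u2, reward, s_next, alpha=0.1, gamma=0.9):
    target = reward + gamma * minimax_value(s_next)
    q1[(s, u1, u2)] += alpha * (target - q1[(s, u1, u2)])


# Example step (values are illustrative):
minimax_q_update("spectrum_free", "transmit", "jam", reward=-1.0,
                 s_next="spectrum_busy")
```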
For a more general scenario, each agent has friends and foes. Based on this assumption, the value function can be updated as follows:
$$V_i(s) = \max_{\pi_{\text{friends}}(s,\cdot)} \min_{u^{\text{foes}}} \sum_{u^{\text{friends}}} \pi_{\text{friends}}(s, u^{\text{friends}})\, Q_i(s, u^{\text{friends}}, u^{\text{foes}}) \qquad (8)$$
A common solution, called Nash Q-Learning, updates the value function as follows:
$$V_i(s) = \mathrm{Nash}_i\big(Q_1(s, \cdot), Q_2(s, \cdot), \dots, Q_N(s, \cdot)\big) \qquad (9)$$
that is, the value of state $s$ for agent $i$ is agent $i$'s payoff at a Nash equilibrium of the stage game defined by the agents' Q-functions.
the scheme aims to investigate the Q-Learning algorithm of the multi-agent, analyze different game theory frameworks and solve the application of each framework. The target application of the scheme is resource management in the wireless sensor network, the Q-Learning algorithm is expanded to be used in a multi-agent scene, and a game theory framework for solving the problems of resource allocation and task scheduling in the wireless sensor network is solved.
An embodiment of the present application further provides a resource management device for a wireless sensor network, including:
the setting module is used for taking each sensor node in the wireless sensor network as an agent and setting network parameters for the wireless sensor network; the network parameters include at least: environmental status, action lists, and reward functions;
the iteration module is used for carrying out iterative interaction of the multiple intelligent agents based on the network parameters and determining an optimal strategy;
and the management module is used for carrying out resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy.
With regard to the apparatus in the above embodiment, the specific steps in which the respective modules perform operations have been described in detail in the embodiment related to the method, and are not described in detail herein. The modules in the resource management device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A resource management method of a wireless sensor network is characterized by comprising the following steps:
taking each sensor node in the wireless sensor network as an agent;
setting network parameters for a wireless sensor network, wherein the network parameters at least comprise: environmental status, action lists, and reward functions;
performing iterative interaction of multiple agents based on the network parameters to determine an optimal strategy;
and performing resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy.
2. The method of claim 1, wherein the environmental state comprises: battery power and/or spectrum availability; the action list includes: receiving or sending a specified packet, and/or performing a specified task; the reward function includes: internal rewards and/or external rewards.
3. The method of claim 2, wherein the internal reward is a reward function defined based on an internal variable, and the external reward is a reward function defined from feedback from a central controller or other node;
each sensor node is provided with a corresponding reward function; the other nodes are the sensor nodes in the wireless sensor network other than the node itself.
4. The method according to any one of claims 1-3, wherein said using each sensor node in the wireless sensor network as an agent comprises:
modeling the wireless sensor network as a set of agents $\mathcal{N} = \{1, 2, \dots, N\}$, wherein $N$ is the number of sensor nodes in the wireless sensor network;
letting $S = S_0 \times S_1 \times \dots \times S_N$ represent the state space, wherein $S_0$ is the shared state space, $S_i$ is the local state space of agent $i$, and $i \in \{1, \dots, N\}$;
letting $A = A_1 \times A_2 \times \dots \times A_N$ represent the action space, wherein $A_i$ is the action space of the $i$-th agent.
5. The method of claim 4, wherein the reward function is:
$$r_i = R_i(s, u^{1}, u^{2}, \dots, u^{N})$$
wherein $r_i$ is the reward obtained by agent $i$, $R_i: S \times A \to \mathbb{R}$, and $i \in \{1, \dots, N\}$.
6. the method of claim 5, wherein said performing iterative interactions of the multi-agent comprises:
defining an action value function and a state value function;
converging to an optimal action value function through iterative interaction of multiple agents;
and determining an optimal strategy according to the optimal action value function.
7. The method of claim 6, wherein the action value function is:
$$Q^{\pi}(s, u) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\Big|\, s_t = s,\; u_t = u\Big]$$
the state value function is:
$$V^{\pi}(s) = \mathbb{E}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\Big|\, s_t = s\Big]$$
wherein $r(s, u, s')$ indicates the reward obtained by the agent when it starts from state $s$, selects action $u$ from the action space, and enters the next state $s'$; $\gamma$ is the discount factor, with value range $0 \leq \gamma \leq 1$.
8. The method of claim 7, wherein the step of iterative interaction of the multi-agent comprises:
$$Q(s_t, u_t) \leftarrow (1-\alpha)\, Q(s_t, u_t) + \alpha \big[ r_t + \gamma \max_{u} Q(s_{t+1}, u) \big]$$
wherein $\alpha$ indicates the learning rate.
9. The method of claim 8, wherein determining an optimal policy according to an optimal action value function comprises:
$$\pi^{*}(s) = \arg\max_{u \in A} Q^{*}(s, u)$$
wherein $\pi^{*}(s)$ indicates that, in state $s$, selecting from the action space the action $u$ that maximizes $Q^{*}(s, u)$ is the optimal strategy.
10. A resource management apparatus of a wireless sensor network, comprising:
the setting module is used for taking each sensor node in the wireless sensor network as an agent and setting network parameters for the wireless sensor network; the network parameters include at least: environmental status, action lists, and reward functions;
the iteration module is used for carrying out iterative interaction of the multiple intelligent agents based on the network parameters and determining an optimal strategy;
and the management module is used for carrying out resource allocation and task scheduling on the sensor nodes in the wireless sensor network according to the optimal strategy.
CN202210255790.2A 2022-03-16 2022-03-16 Resource management method and device for wireless sensor network Pending CN114356535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210255790.2A CN114356535A (en) 2022-03-16 2022-03-16 Resource management method and device for wireless sensor network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210255790.2A CN114356535A (en) 2022-03-16 2022-03-16 Resource management method and device for wireless sensor network

Publications (1)

Publication Number Publication Date
CN114356535A true CN114356535A (en) 2022-04-15

Family

ID=81095210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210255790.2A Pending CN114356535A (en) 2022-03-16 2022-03-16 Resource management method and device for wireless sensor network

Country Status (1)

Country Link
CN (1) CN114356535A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090187641A1 (en) * 2006-03-29 2009-07-23 Cong Li Optimization of network protocol options by reinforcement learning and propagation
CN106358203A (en) * 2016-08-30 2017-01-25 湖南大学 Method for spectrum allocation in distributed cognition wireless sensor network on basis of Q study
CN109462858A (en) * 2017-11-08 2019-03-12 北京邮电大学 A kind of wireless sensor network parameter adaptive adjusting method
CN111065145A (en) * 2020-01-13 2020-04-24 清华大学 Q learning ant colony routing method for underwater multi-agent
CN111641681A (en) * 2020-05-11 2020-09-08 国家电网有限公司 Internet of things service unloading decision method based on edge calculation and deep reinforcement learning
CN113141592A (en) * 2021-04-11 2021-07-20 西北工业大学 Long-life-cycle underwater acoustic sensor network self-adaptive multi-path routing mechanism
CN113938917A (en) * 2021-08-30 2022-01-14 北京工业大学 Heterogeneous B5G/RFID intelligent resource distribution system applied to industrial Internet of things
CN114095940A (en) * 2021-11-17 2022-02-25 北京邮电大学 Slice resource allocation method and equipment for hybrid access cognitive wireless network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
我勒个矗: "A Compilation of Reinforcement Learning (强化学习) Knowledge", https://zhuanlan.zhihu.com/p/25319023 *

Similar Documents

Publication Publication Date Title
Fox et al. Multi-level discovery of deep options
CN113225377B (en) Internet of things edge task unloading method and device
CN112329948A (en) Multi-agent strategy prediction method and device
CN114375066B (en) Distributed channel competition method based on multi-agent reinforcement learning
CN116069512B (en) Serverless efficient resource allocation method and system based on reinforcement learning
Gallego et al. Opponent aware reinforcement learning
Yang et al. Keeping in Touch with Collaborative UAVs: A Deep Reinforcement Learning Approach.
Xu et al. Living with artificial intelligence: A paradigm shift toward future network traffic control
CN114090108B (en) Method and device for executing computing task, electronic equipment and storage medium
Sun et al. Markov decision evolutionary game theoretic learning for cooperative sensing of unmanned aerial vehicles
CN114356535A (en) Resource management method and device for wireless sensor network
Zhang et al. Clique-based cooperative multiagent reinforcement learning using factor graphs
CN115150335B (en) Optimal flow segmentation method and system based on deep reinforcement learning
Rapetswa et al. Towards a multi-agent reinforcement learning approach for joint sensing and sharing in cognitive radio networks
CN116193516A (en) Cost optimization method for efficient federation learning in Internet of things scene
CN115903901A (en) Output synchronization optimization control method for unmanned cluster system with unknown internal state
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Taylor et al. Two decades of multiagent teamwork research: past, present, and future
EP4226279A1 (en) Interactive agent
Jin et al. Hector: A reinforcement learning-based scheduler for minimizing casualties of a military drone swarm
Chen: Weight Speedy Q-Learning for Feedback Stabilization of Probabilistic Boolean Control Networks, http://dx.doi.org/10.17654/0972096023009
Liu et al. A novel data-driven model-free synchronization protocol for discrete-time multi-agent systems via TD3 based algorithm
US20230281277A1 (en) Remote agent implementation of reinforcement learning policies
Burger et al. Developing Action Policies with Q-Learning and Shallow Neural Networks on Reconfigurable Embedded Devices
Peng et al. A review of the development of distributed task planning in command and control domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220415