CN111225380A - Dynamic access method for air-space-earth-sea integrated multi-user cooperative learning

Info

Publication number
CN111225380A
Authority
CN
China
Prior art keywords
learning
action
value
user
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010033307.7A
Other languages
Chinese (zh)
Inventor
谷林海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongfanghong Satellite Mobile Communication Co Ltd
Original Assignee
Dongfanghong Satellite Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10 Dynamic resource partitioning
    • H04W16/14 Spectrum sharing arrangements between different networks
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H04W74/00 Wireless channel access, e.g. scheduled or random access
    • H04W74/08 Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access]

Abstract

The invention belongs to the technical field of air-space-ground-sea integrated communication, and in particular relates to a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning. In the method, each end user in the air-space-ground-sea integrated communication system learns independently using an agent reinforcement learning algorithm, while the end users share their strategies through a blackboard model. After self-learning, the shared strategies are fused and improved by a fusion algorithm, and the fused strategy is then used for further learning. This increases the prior knowledge of each end user, which speeds up learning, improves learning efficiency, reduces the collision probability of the air-space-ground-sea integrated communication system, and raises its average capacity.

Description

Dynamic access method for air-space-earth-sea integrated multi-user cooperative learning
Technical Field
The invention relates to the technical field of air-space-ground-sea integrated communication, and in particular to a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning.
Background
The air-space-ground-sea integrated information network takes the terrestrial network as its foundation and the space-based network as its extension. It adopts a unified technical architecture, a unified technical system and unified standards and specifications, and is formed by interconnecting the space-based information network, the Internet and the mobile communication network. It is characterized by diversified service bearing, heterogeneous network interconnection and global resource management. As an important national information infrastructure, the network is significant in many fields, such as homeland security, emergency and disaster relief, transportation and economic development.
To meet the spectrum demand of the air-space-ground-sea integrated communication system, the available spectrum needs to be expanded on the one hand, for example by using the terahertz and visible-light bands. On the other hand, the rules of spectrum use need to change: the current dominance of the licensed-carrier usage mode has to be broken, and spectrum has to be allocated and used more flexibly so that spectrum utilization improves.
At present, terrestrial and satellite communication mainly use the licensed-carrier mode, in which the owner of a spectrum resource holds exclusive usage rights and other parties have no opportunity to use the spectrum even when it is temporarily idle. Exclusively licensed spectrum places strict limits and requirements on users' technical parameters, areas of use and so on; it effectively avoids inter-system interference and can be used over the long term. However, although this mode is stable and reliable, exclusive occupation of the band by licensed users leaves spectrum idle and under-utilized, which aggravates the mismatch between spectrum supply and demand.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning.
In one aspect, the invention provides a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning, comprising the following steps:
S1: preset the values of M and T; set the time t = 0; randomly initialize Q(s, a) for each of the N end users; and set the initial learning rate λ and the discount factor β;
S2: each of the N end users executes the standard Q-learning algorithm;
S3: determine whether t is divisible by M; if not, proceed directly to step S4; if so, the N end users exchange and fuse their strategies, that is, after every M learning steps each end user publishes its currently accumulated Q values to the blackboard and obtains the Q values of the other end users from the blackboard, then fuses the strategies according to the fusion algorithm and selects an action according to the fused strategy, after which the method proceeds to step S4;
S4: t = t + 1;
S5: determine whether t ≥ T; if so, actions are selected entirely by the greedy strategy; if not, return to step S2.
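The control flow of steps S1 to S5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `q_step` and `fuse` are hypothetical callables standing in for the standard Q-learning step and the fusion algorithm, and the dictionary-based Q table and its random initial values are illustrative choices.

```python
import random
from typing import Callable, Dict, List, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action) -> Q value


def run_cooperative_learning(
    n_users: int,
    n_states: int,
    n_actions: int,
    M: int,
    T: int,
    q_step: Callable[[int, QTable, int], None],
    fuse: Callable[[List[QTable]], List[QTable]],
    seed: int = 0,
) -> Tuple[List[QTable], int]:
    """Follow the S1-S5 step order; returns the Q tables and the fusion count."""
    rng = random.Random(seed)
    # S1: t = 0 and a randomly initialised Q(s, a) for each of the N users.
    tables = [
        {(s, a): rng.random() for s in range(n_states) for a in range(n_actions)}
        for _ in range(n_users)
    ]
    t = 0
    fusions = 0
    while True:
        # S2: every user performs one Q-learning step on its own table.
        for i, table in enumerate(tables):
            q_step(i, table, t)
        # S3: when t is divisible by M, exchange Q values and fuse.
        if t % M == 0:
            tables = fuse(tables)
            fusions += 1
        # S4: advance time.
        t += 1
        # S5: stop learning once t >= T (actions are then chosen greedily).
        if t >= T:
            return tables, fusions
```

Because the sketch follows the step order literally, the divisibility test also succeeds at t = 0, right after the first learning step. An implementation that should fuse only after M completed steps might test `(t + 1) % M == 0` instead; that variant is an assumption, not something the text specifies.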
Optionally, the greedy strategy a*(s) is computed as:
Figure BDA0002365121600000021
where A denotes the set of selectable actions and b denotes an action in that set, namely occupying one frequency point; arg(·) is the index-taking operation, which returns the argument at which its operand is attained; and max(·) is the maximum operation.
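The equation itself survives here only as an image reference. A form consistent with the symbols defined for it, and with the usual greedy rule in Q-learning, would be the following; it is a reconstruction under that assumption, not a transcription of the original figure:

```latex
a^{*}(s) = \arg\max_{b \in A} Q(s, b)
```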
Optionally, the standard Q-learning algorithm comprises: Step A: observe the current state s_t of the environment; Step B: select an action a_t according to the Boltzmann action selection strategy and execute it; Step C: observe the subsequent state s_{t+1} of the environment and obtain a reinforcement signal r from the environment; Step D: update the value Q(s_t, a_t) of the corresponding state-action pair (s_t, a_t).
Optionally, the action selection strategy is computed as follows:
Figure BDA0002365121600000022
where Q(s_t, a_i) is the Q value of each state-action pair; p(a_i | s_t, Q) is the probability of selecting action a_i in state s_t; A denotes the set of selectable actions and b denotes an action in that set, namely occupying one frequency point; T is an adjustable temperature parameter, and the larger T is, the more random the selected action; exp(·) is the exponential operation.
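The selection formula is likewise present only as an image reference. Assuming the conventional Boltzmann (softmax) form, which matches the symbols p, Q, A, b, T and exp(·) defined for it, the expression would read as follows; this is a reconstruction, not a transcription of the original figure:

```latex
p(a_i \mid s_t, Q) = \frac{\exp\!\left(Q(s_t, a_i)/T\right)}{\sum_{b \in A} \exp\!\left(Q(s_t, b)/T\right)}
```

As T grows, the exponents shrink toward zero and the distribution approaches uniform, which agrees with the statement that a larger T makes the selected action more random.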
Optionally, the Q-value table is updated according to the following formula:
Figure BDA0002365121600000031
r = r_t(s, a″_t, a_2)
where i denotes the i-th end user; a_1, a′_1 ∈ A; a_2, a′_2 ∈ A′, where A′ is the joint action of all end users; r_t(s, a″_t, a_2) is the environment's reward function for the joint action; and s′ is the observed environment state.
Optionally, the fusion algorithm comprises the following steps:
Step A: take M steps as one learning period. After each learning period ends, each end user sends its current Q values to the blackboard, reads the Q values that the other end users have shared on the blackboard, and finds the end user with the maximum Q value,
Figure BDA0002365121600000032
which is computed as:
Figure BDA0002365121600000033
Step B: compute
Figure BDA0002365121600000034
Step C: compute
Figure BDA0002365121600000035
Step D: for all end users i ∈ {1, 2, …, N},
Figure BDA0002365121600000036
The beneficial effects of the invention are as follows:
(1) The dynamic access method for air-space-ground-sea integrated multi-user cooperative learning provides a dynamic spectrum access theory and model suited to the air-space-ground-sea integrated communication system.
(2) The fusion algorithm in the method takes the interaction and communication among end users into account; through cooperation among the end users it eliminates redundant actions in the strategies as far as possible, so that the final goal is reached in a relatively efficient way and the execution efficiency of the system improves.
(3) The method uses a shared blackboard model for information sharing, which enables cooperation and accelerates learning.
(4) By combining the sharing algorithm with the Q-learning algorithm, the method reduces the probability of collisions, and the communication and strategy sharing improve the learning speed and learning effect of the system.
(5) The method allocates and uses spectrum in a more flexible way, which improves spectrum utilization.
(6) The method combines reinforcement learning from artificial intelligence with spectrum sharing technology to realize intelligent dynamic spectrum sharing.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flow chart of the dynamic access method for air-space-ground-sea integrated multi-user cooperative learning according to the invention;
FIG. 2 is a schematic diagram of the model of the dynamic access method for air-space-ground-sea integrated multi-user cooperative learning according to the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
At present, terrestrial and satellite communication mainly use the licensed-carrier mode, in which the owner of a spectrum resource holds exclusive usage rights and other parties have no opportunity to use the spectrum even when it is temporarily idle. Exclusively licensed spectrum places strict limits and requirements on users' technical parameters, areas of use and so on; it effectively avoids inter-system interference and can be used over the long term. However, although this mode is stable and reliable, exclusive occupation of the band by licensed users leaves spectrum idle and under-utilized, which aggravates the mismatch between spectrum supply and demand. To solve these problems, it is necessary to develop a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning in which, after self-learning, the strategies of multiple end users are fused and improved by a fusion algorithm and the fused strategy is used for further learning. This increases the prior knowledge of each end user, thereby speeding up learning, improving learning efficiency, reducing the collision probability of the air-space-ground-sea integrated communication system and raising its average capacity.
A specific embodiment of the invention provides a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning which, as shown in FIGS. 1 and 2, comprises the following steps:
In step S1, the values of M and T are preset, the time is set to t = 0, Q(s, a) is randomly initialized for each of the N end users, and the initial learning rate λ and the discount factor β are set.
In step S2, each of the N end users executes the standard Q-learning algorithm.
In this embodiment, the standard Q-learning algorithm comprises: Step A: observe the current state s_t of the environment; Step B: select an action a_t according to the Boltzmann action selection strategy and execute it; Step C: observe the subsequent state s_{t+1} of the environment and obtain a reinforcement signal r from the environment; Step D: update the value Q(s_t, a_t) of the corresponding state-action pair (s_t, a_t).
The action selection strategy is computed as follows:
Figure BDA0002365121600000051
where Q(s_t, a_i) is the Q value of each state-action pair; p(a_i | s_t, Q) is the probability of selecting action a_i in state s_t; A denotes the set of selectable actions and b denotes an action in that set, namely occupying one frequency point; T is an adjustable temperature parameter, and the larger T is, the more random the selected action; exp(·) is the exponential operation.
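A Boltzmann selection step of this kind can be sketched in Python as below. The sketch assumes the usual softmax form, in which the probability of an action is proportional to exp(Q/T); the function names and the list-of-Q-values interface are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import List, Sequence


def boltzmann_probabilities(q_values: Sequence[float], temperature: float) -> List[float]:
    """Softmax over Q values; a larger temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating to avoid overflow;
    # this leaves the resulting probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw an action index (here: a frequency point) from the Boltzmann distribution."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A large temperature makes every frequency point almost equally likely, which favours exploration; a small temperature concentrates the probability on the action with the largest Q value.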
The Q-value table is updated according to the following formula:
Figure BDA0002365121600000061
r = r_t(s, a′_t, a_2)
where i denotes the i-th end user; a_1, a′_1 ∈ A; a_2, a′_2 ∈ A′, where A′ is the joint action of all end users; r_t(s, a′_t, a_2) is the environment's reward function for the joint action; and s′ is the observed environment state.
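For orientation, the textbook single-agent Q-learning update is sketched below. It is not the update formula of this embodiment, which is available only as an image reference and involves the joint action of all end users; the sketch merely shows how a learning rate λ and a discount factor β, as set in step S1, enter a standard update. The function name and the dictionary-based table are hypothetical.

```python
from typing import Dict, Sequence, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action) -> Q value


def q_update(
    q: QTable,
    state: int,
    action: int,
    reward: float,
    next_state: int,
    actions: Sequence[int],
    learning_rate: float,
    discount: float,
) -> float:
    """Textbook rule: Q(s,a) += lr * (r + discount * max_b Q(s',b) - Q(s,a))."""
    # Best value reachable from the observed next state.
    best_next = max(q[(next_state, b)] for b in actions)
    target = reward + discount * best_next
    # Move the stored value a fraction learning_rate toward the target.
    q[(state, action)] += learning_rate * (target - q[(state, action)])
    return q[(state, action)]
```

With `learning_rate` close to 1 the table tracks the most recent reward closely, while smaller values average over more experience.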
In step S3, it is determined whether t is divisible by M. If not, the method proceeds directly to step S4. If so, the N end users exchange and fuse their strategies: after every M learning steps, each end user publishes its currently accumulated Q values to the blackboard and obtains the Q values of the other end users from the blackboard, then fuses the strategies according to the fusion algorithm and selects an action according to the fused strategy, after which the method proceeds to step S4.
In this embodiment, the fusion algorithm takes the interaction and communication among end users into account and aims to eliminate redundant actions in the strategies as far as possible through cooperation among the end users, so that the final goal is reached in a relatively efficient way and the execution efficiency and convergence of the system improve. The fusion algorithm comprises the following steps: Step A: take M steps as one learning period. After each learning period ends, each end user sends its current Q values to the blackboard, reads the Q values that the other end users have shared on the blackboard, and finds the end user with the maximum Q value,
Figure BDA0002365121600000062
which is computed as:
Figure BDA0002365121600000063
Step B: compute
Figure BDA0002365121600000064
Step C: compute
Figure BDA0002365121600000065
Step D: for all end users i ∈ {1, 2, …, N},
Figure BDA0002365121600000066
In step S4, t = t + 1.
In step S5, it is determined whether t ≥ T; if so, actions are selected entirely by the greedy strategy; if not, the method returns to step S2.
In this embodiment, the greedy strategy a*(s) is computed as:
Figure BDA0002365121600000071
where A denotes the set of selectable actions and b denotes an action in that set, namely occupying one frequency point; arg(·) is the index-taking operation, which returns the argument at which its operand is attained; and max(·) is the maximum operation.
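Greedy selection over a Q table can be sketched in a few lines of Python. The sketch assumes the usual rule of picking the action with the largest Q value in the current state; the function name and the dictionary-based table are hypothetical.

```python
from typing import Dict, Sequence, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action) -> Q value


def greedy_action(q: QTable, state: int, actions: Sequence[int]) -> int:
    """Return the action b in `actions` that maximises Q(state, b)."""
    # max() with a key returns the first maximiser, so ties resolve
    # to the action listed earliest in `actions`.
    return max(actions, key=lambda b: q[(state, b)])
```

Once t ≥ T, calling this function in every state replaces the probabilistic Boltzmann draw, so no further exploration takes place.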
As shown in FIG. 2, the model of the dynamic access method for air-space-ground-sea integrated multi-user cooperative learning mainly comprises: the end users, Q-learning, the shared blackboard model, the fusion algorithm, the selector, the actuator and the air-space-ground-sea integrated environment.
The end users are all end users able to access the air-space-ground-sea integrated communication system, including mobile terminals, Internet of Things terminals and the like.
Q-learning intelligently adjusts the action strategy of an end user by learning, through the Q-learning algorithm, from the environment state s, the action a taken and the reward function r.
The shared blackboard model works as follows: after the N end users have each learned for a set number of steps M, every end user publishes its current Q values to the blackboard and obtains the Q values of the other end users from it, which realizes strategy sharing.
The fusion algorithm fuses the strategies obtained from the blackboard in order to obtain a higher reward value.
The selector selects an action based on the Q values and the chosen action selection strategy.
The actuator executes the action selected by the selector and acts on the environment, so that the environment state s_t transitions to the next state s_{t+1}.
The air-space-ground-sea integrated environment is the communication system environment formed jointly by the air, space, ground and sea segments.
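The publish-and-read behaviour of the shared blackboard described above can be sketched as a small Python class. This is a minimal in-memory sketch under stated assumptions: the class name, method names and dictionary-based Q table are hypothetical, and a real system would exchange these values over the communication network rather than through a shared object.

```python
import copy
from typing import Dict, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action) -> Q value


class Blackboard:
    """Shared store where each end user posts its Q table and reads the others'."""

    def __init__(self) -> None:
        self._posts: Dict[int, QTable] = {}

    def publish(self, user_id: int, q_table: QTable) -> None:
        # Store a copy so later local learning does not alter the posted values.
        self._posts[user_id] = copy.deepcopy(q_table)

    def read_others(self, user_id: int) -> Dict[int, QTable]:
        # Return copies of every posted table except the caller's own.
        return {
            uid: copy.deepcopy(table)
            for uid, table in self._posts.items()
            if uid != user_id
        }
```

Each end user would call `publish` at the end of a learning period and then `read_others` to collect the inputs for the fusion algorithm.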
The invention provides a dynamic access method for air-space-ground-sea integrated multi-user cooperative learning in which each end user in the air-space-ground-sea integrated communication system learns independently using an agent reinforcement learning algorithm, while the end users share their strategies through a blackboard model. After self-learning, the strategies are fused and improved by the fusion algorithm, and the fused strategy is used for further learning, which increases the prior knowledge of each end user, speeds up learning, improves learning efficiency, reduces the collision probability of the air-space-ground-sea integrated communication system and raises its average capacity. The method provides a dynamic spectrum access theory and model suited to the air-space-ground-sea integrated communication system. The fusion algorithm takes the interaction and communication among end users into account; through cooperation among the end users it eliminates redundant actions in the strategies as far as possible, so that the final goal is reached in a relatively efficient way and the execution efficiency of the system improves. The method uses a shared blackboard model for information sharing, which enables cooperation and accelerates learning. Combining the sharing algorithm with the Q-learning algorithm reduces the probability of collisions, and the communication and strategy sharing improve the learning speed and learning effect of the system. Spectrum is allocated and used more flexibly, which improves spectrum utilization. Intelligent dynamic spectrum sharing is realized by combining reinforcement learning from artificial intelligence with spectrum sharing technology.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (6)

1. A dynamic access method for air-space-ground-sea integrated multi-user cooperative learning, characterized by comprising the following steps:
S1: preset the values of M and T; set the time t = 0; randomly initialize Q(s, a) for each of the N end users; and set the initial learning rate λ and the discount factor β;
S2: each of the N end users executes the standard Q-learning algorithm;
S3: determine whether t is divisible by M; if not, proceed directly to step S4; if so, the N end users exchange and fuse their strategies, that is, after every M learning steps each end user publishes its currently accumulated Q values to the blackboard and obtains the Q values of the other end users from the blackboard, then fuses the strategies according to the fusion algorithm and selects an action according to the fused strategy, after which the method proceeds to step S4;
S4: t = t + 1;
S5: determine whether t ≥ T; if so, actions are selected entirely by the greedy strategy; if not, return to step S2.
2. The method of claim 1, wherein the greedy strategy a*(s) is computed as:
Figure FDA0002365121590000011
where A denotes the set of selectable actions and b denotes an action in that set, namely occupying one frequency point; arg(·) is the index-taking operation, which returns the argument at which its operand is attained; and max(·) is the maximum operation.
3. The method of claim 1, wherein the standard Q-learning algorithm comprises:
Step A: observe the current state s_t of the environment;
Step B: select an action a_t according to the Boltzmann action selection strategy and execute it;
Step C: observe the subsequent state s_{t+1} of the environment and obtain a reinforcement signal r from the environment;
Step D: update the value Q(s_t, a_t) of the corresponding state-action pair (s_t, a_t).
4. The method of claim 3, wherein the action selection strategy is computed as follows:
Figure FDA0002365121590000021
where Q(s_t, a_i) is the Q value of each state-action pair; p(a_i | s_t, Q) is the probability of selecting action a_i in state s_t; A denotes the set of selectable actions and b denotes an action in that set, namely occupying one frequency point; T is an adjustable temperature parameter, and the larger T is, the more random the selected action; exp(·) is the exponential operation.
5. The method of claim 3, wherein the Q-value table is updated according to the following formula:
Figure FDA0002365121590000022
r = r_t(s, a″_t, a_2)
where i denotes the i-th end user; a_1, a′_1 ∈ A; a_2, a′_2 ∈ A′, where A′ is the joint action of all end users; r_t(s, a″_t, a_2) is the environment's reward function for the joint action; and s′ is the observed environment state.
6. The method of claim 1, wherein the fusion algorithm comprises the following steps:
Step A: take M steps as one learning period. After each learning period ends, each end user sends its current Q values to the blackboard, reads the Q values that the other end users have shared on the blackboard, and finds the end user with the maximum Q value,
Figure FDA0002365121590000023
which is computed as:
Figure FDA0002365121590000024
Step B: compute
Figure FDA0002365121590000025
Step C: compute
Figure FDA0002365121590000026
Step D: for all end users i ∈ {1, 2, …, N},
Figure FDA0002365121590000027
CN202010033307.7A 2020-01-13 2020-01-13 Dynamic access method for air-space-earth-sea integrated multi-user cooperative learning Pending CN111225380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010033307.7A CN111225380A (en) 2020-01-13 2020-01-13 Dynamic access method for air-space-earth-sea integrated multi-user cooperative learning

Publications (1)

Publication Number Publication Date
CN111225380A true CN111225380A (en) 2020-06-02

Family

ID=70828364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010033307.7A Pending CN111225380A (en) 2020-01-13 2020-01-13 Dynamic access method for air-space-earth-sea integrated multi-user cooperative learning

Country Status (1)

Country Link
CN (1) CN111225380A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613301A (en) * 2021-08-04 2021-11-05 Beihang University Air-space-ground integrated network intelligent switching method based on DQN

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100302961A1 (en) * 2009-04-01 2010-12-02 Imec Method for resolving network contention
CN102238555A (en) * 2011-07-18 2011-11-09 Nanjing University of Posts and Telecommunications Collaborative learning based method for multi-user dynamic spectrum access in cognitive radio
CN102256262A (en) * 2011-07-14 2011-11-23 Nanjing University of Posts and Telecommunications Multi-user dynamic spectrum accessing method based on distributed independent learning
US20140198703A1 (en) * 2013-01-17 2014-07-17 Raytheon Bbn Technologies Corp. Interface and link selection for a multi-frequency multi-rate multi-transceiver communication device
CN108777872A (en) * 2018-05-22 2018-11-09 Army Engineering University of PLA Anti-interference model based on a deep Q neural network and intelligent anti-interference algorithm
CN108809452A (en) * 2018-05-02 2018-11-13 Hohai University, Changzhou Campus Optimal perceived channel selecting method in dynamic spectrum access system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
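YUNG-FA HUANG et al.: "Resource Allocation For D2D Communications With A Novel Distributed Q-Learning Algorithm In Heterogeneous Networks", 《2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC)》 *
李晓静: "Research on Dynamic Spectrum Allocation Algorithms Based on Reinforcement Learning", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series (Monthly), 2012, No. 04》 *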

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613301A (en) * 2021-08-04 2021-11-05 Beihang University Air-space-ground integrated network intelligent switching method based on DQN
CN113613301B (en) * 2021-08-04 2022-05-13 Beihang University Air-ground integrated network intelligent switching method based on DQN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200602