CN111225380A

CN111225380A - Dynamic access method for air-space-ground-sea integrated multi-user cooperative learning

Info

Publication number: CN111225380A
Application number: CN202010033307.7A
Authority: CN
Inventors: 谷林海
Original assignee: Dongfanghong Satellite Mobile Communication Co Ltd
Current assignee: Dongfanghong Satellite Mobile Communication Co Ltd
Priority date: 2020-01-13
Filing date: 2020-01-13
Publication date: 2020-06-02

Abstract

The invention belongs to the technical field of air-space-ground-sea integrated communication, and in particular relates to a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning. In the method, each end user in the air-space-ground-sea integrated communication system learns independently using an agent-based reinforcement learning algorithm, while the end users share their strategies through a blackboard model. After self-learning, the strategies are fused and improved by a fusion algorithm, and the fused strategy is then used for further learning. This increases the prior knowledge of each end user, which accelerates learning, improves learning efficiency, reduces the collision probability of the air-space-ground-sea integrated communication system, and increases its average capacity.

Description

Dynamic access method for air-space-ground-sea integrated multi-user cooperative learning

Technical Field

The invention relates to the technical field of air-space-ground-sea integrated communication, and in particular to a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning.

Background

The air-space-ground-sea integrated information network takes the ground network as its foundation and the space-based network as its extension. It adopts a unified technical architecture, a unified technical system and unified standards, and is formed by interconnecting the space-based information network, the Internet and the mobile communication network. It is characterized by diversified service bearing, heterogeneous network interconnection, global resource management and the like. As an important national information infrastructure, it is significant in many fields such as homeland security, emergency disaster relief, transportation and economic development.

To meet the spectrum demand of the air-space-ground-sea integrated communication system, on the one hand the available spectrum needs to be expanded, for example by adopting the terahertz and visible-light bands; on the other hand the rules of spectrum use need to change, moving beyond the present dominance of licensed carrier use so that spectrum is allocated and used more flexibly, thereby improving spectrum utilization.

At present, terrestrial and satellite communication mainly use licensed carriers: the spectrum owner holds exclusive rights of use, and other parties have no opportunity to use the spectrum even when it is temporarily idle. Exclusively licensed spectrum imposes strict limits and requirements on users' technical parameters, areas of use and so on; it effectively avoids inter-system interference and can be used over the long term. However, although this approach offers high stability and reliability, exclusive occupation of a band by the licensed user leaves spectrum idle and under-utilized, which aggravates the imbalance between spectrum supply and demand.

Disclosure of Invention

In view of the deficiencies of the prior art, the invention provides a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning.

In one aspect, the invention provides a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning, comprising the following steps:

S1: preset the values of M and T; at time t = 0, randomly initialize Q(s, a) for each of the N end users, and set the initial learning rate λ and the discount factor β;

S2: each of the N end users executes the standard Q-learning algorithm;

S3: determine whether t is divisible by M; if not, proceed directly to step S4; if so, the N end users exchange and fuse their strategies, that is, after learning for M steps, each end user publishes its currently accumulated Q values to the blackboard and obtains the Q values of the other end users from the blackboard, fuses the strategies according to the fusion algorithm, selects an action according to the fused strategy, and then proceeds to step S4;

S4: t = t + 1;

S5: determine whether t ≥ T; if so, select actions entirely according to the greedy strategy; if not, return to step S2.
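The control flow of steps S1 to S5 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `run_cooperative_learning` and the callables `q_learning_step` and `fuse` are hypothetical placeholders for the Q-learning step and the fusion algorithm, and the sketch reads "t is divisible by M" literally, so t = 0 also triggers a fusion.

```python
import random
from typing import Callable, List

QTable = List[List[float]]  # Q[s][a] for one end user


def run_cooperative_learning(
    n_users: int,
    n_states: int,
    n_actions: int,
    M: int,
    T: int,
    q_learning_step: Callable[[int, QTable], None],   # hypothetical: one step for one user
    fuse: Callable[[List[QTable]], List[QTable]],     # hypothetical: fusion algorithm
    seed: int = 0,
) -> List[QTable]:
    rng = random.Random(seed)
    # S1: t = 0 and a randomly initialized Q(s, a) for each of the N end users.
    q_tables = [
        [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
        for _ in range(n_users)
    ]
    t = 0
    while True:
        # S2: every end user performs one standard Q-learning step.
        for user, q in enumerate(q_tables):
            q_learning_step(user, q)
        # S3: when t is divisible by M, exchange and fuse the strategies.
        if t % M == 0:
            q_tables = fuse(q_tables)
        # S4: advance the step counter.
        t += 1
        # S5: once t >= T, stop learning; actions are then chosen greedily.
        if t >= T:
            return q_tables
```

The learning rate λ and discount factor β of step S1 are assumed to live inside `q_learning_step`, which keeps the loop itself independent of the update rule.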

Optionally, the greedy strategy a^*(s) is computed as:

a^*(s) = \arg\max_{b \in A} Q(s, b)

where A denotes the set of selectable actions and b denotes an action in that set, namely occupying a frequency point; arg(·) returns the argument at which the maximum is attained; max(·) is the maximum operation.
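As a small illustration of the greedy strategy, the sketch below picks the action with the largest Q value in one state. The function name `greedy_action` and the representation of Q(s, ·) as a list indexed by frequency point are assumptions made for the example; ties are resolved in favour of the lowest index.

```python
from typing import Sequence


def greedy_action(q_row: Sequence[float]) -> int:
    """Return the index b that maximizes Q(s, b) for one state's row of Q values."""
    return max(range(len(q_row)), key=lambda b: q_row[b])
```

For example, with Q values `[0.1, 0.7, 0.3]` for three frequency points, the function selects frequency point 1.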

Optionally, the standard Q-learning algorithm comprises: Step A, observing the current state s_t of the environment; Step B, selecting an action a_t according to the Boltzmann action selection strategy and executing it; Step C, observing the subsequent state s_{t+1} of the environment and obtaining a reinforcement signal r from the environment; Step D, updating the value Q(s_t, a_t) corresponding to the state-action pair (s_t, a_t).

Optionally, the action selection strategy is computed as:

p(a_i | s_t, Q) = \frac{\exp(Q(s_t, a_i)/T)}{\sum_{b \in A} \exp(Q(s_t, b)/T)}

where Q(s_t, a_i) is the Q value of each state-action pair; p(a_i | s_t, Q) is the probability of selecting action a_i in state s_t; A denotes the set of selectable actions and b denotes an action in that set, namely occupying a frequency point; T is an adjustable temperature parameter, and the larger T is, the more random the selected action; exp(·) is the exponential operation.
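The Boltzmann action selection strategy can be illustrated with the sketch below, which assumes the usual softmax form over the Q values of one state. The names `boltzmann_probabilities` and `boltzmann_action` are hypothetical, and the parameter is called `temperature` to avoid confusion with the step limit T of step S5. Subtracting the largest Q value before exponentiation is a numerical-stability measure added here; it does not change the probabilities.

```python
import math
import random
from typing import List, Sequence


def boltzmann_probabilities(q_row: Sequence[float], temperature: float) -> List[float]:
    """Probability of each action given one state's Q values and a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    q_max = max(q_row)  # shift for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_row: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Sample one action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_row, temperature)
    return rng.choices(range(len(q_row)), weights=probs, k=1)[0]
```

A large temperature flattens the distribution towards uniform exploration, while a small temperature concentrates the probability on the action with the largest Q value.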

Optionally, the Q-value table is updated as follows:

r = r_t(s, a″_t, a₂)

where i denotes the i-th end user; a₁, a′₁ ∈ A; a₂, a′₂ ∈ A′ denote the joint action of all end users; r_t(s, a″_t, a₂) is the reward function returned by the environment for the joint action; s′ is the observed state of the environment.
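The update expression of the method is stated in terms of the joint action of all end users and is not fully legible in this text. As a stand-in, the sketch below shows the standard single-agent tabular Q-learning update, Q(s, a) ← Q(s, a) + λ[r + β·max_b Q(s′, b) − Q(s, a)], using the learning rate λ and discount factor β of step S1. The function name `q_update` and the list-of-lists table layout are assumptions, and the joint-action dependence is deliberately left out.

```python
from typing import List

QTable = List[List[float]]  # Q[s][a]


def q_update(q: QTable, s: int, a: int, r: float, s_next: int,
             lam: float, beta: float) -> float:
    """Apply one standard Q-learning update in place and return the new Q(s, a)."""
    # Target: immediate reward plus discounted best value of the next state.
    td_target = r + beta * max(q[s_next])
    # Move Q(s, a) towards the target by the learning rate lam.
    q[s][a] += lam * (td_target - q[s][a])
    return q[s][a]
```

For instance, with Q(s′, ·) = [1.0, 2.0], r = 1.0, λ = 0.5, β = 0.9 and an initial Q(s, a) = 0, the target is 2.8 and the updated value is 1.4.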

Optionally, the fusion algorithm comprises:

Step A: taking M steps as one learning period; after each learning period ends, each end user sends its current Q values to the blackboard, shares the Q values of the other end users on the blackboard, and finds the end user with the maximum Q value.

The expression for this end user is:

Step B: calculating

Step C: calculating

Step D: for all end users i ∈ {1, 2, …, N},

The beneficial effects of the invention are as follows:

(1) The method provides a dynamic spectrum access theory and model suitable for the air-space-ground-sea integrated communication system.

(2) The fusion algorithm in the method takes into account the interaction and communication among end users, eliminates redundant actions in the strategy as far as possible through cooperation among the end users, and thus reaches the final goal more efficiently, improving the execution efficiency of the system.

(3) The method uses a shared blackboard model to share information, achieving cooperation and accelerated learning.

(4) By combining the sharing algorithm with the Q-learning algorithm, the method reduces the probability of collisions, and communicating and sharing strategies improves the learning speed and learning effect of the system.

(5) The method allocates and uses spectrum more flexibly, thereby improving spectrum utilization.

(6) The method combines reinforcement learning in artificial intelligence with spectrum sharing technology to realize intelligent dynamic spectrum sharing.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a flow chart of the air-space-ground-sea integrated multi-user cooperative learning dynamic access method of the present invention;

FIG. 2 is a schematic diagram of the model of the air-space-ground-sea integrated multi-user cooperative learning dynamic access method of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

At present, terrestrial and satellite communication mainly use licensed carriers: the spectrum owner holds exclusive rights of use, and other parties have no opportunity to use the spectrum even when it is temporarily idle. Exclusively licensed spectrum imposes strict limits and requirements on users' technical parameters, areas of use and so on; it effectively avoids inter-system interference and can be used over the long term. However, although this approach offers high stability and reliability, exclusive occupation of a band by the licensed user leaves spectrum idle and under-utilized, which aggravates the imbalance between spectrum supply and demand. To solve these problems, a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning is needed, in which the strategies are fused and improved by a fusion algorithm after self-learning and the fused strategy is then used for further learning, so as to increase the prior knowledge of each end user, thereby accelerating learning, improving learning efficiency, reducing the collision probability of the air-space-ground-sea integrated communication system, and increasing its average capacity.

This embodiment of the invention provides a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning which, as shown in FIGS. 1 and 2, comprises the following steps:

In step S1, the values of M and T are preset; at time t = 0, Q(s, a) is randomly initialized for each of the N end users, and the initial learning rate λ and the discount factor β are set.

In step S2, each of the N end users executes the standard Q-learning algorithm.

In an embodiment of the present invention, the standard Q-learning algorithm comprises: Step A, observing the current state s_t of the environment; Step B, selecting an action a_t according to the Boltzmann action selection strategy and executing it; Step C, observing the subsequent state s_{t+1} of the environment and obtaining a reinforcement signal r from the environment; Step D, updating the value Q(s_t, a_t) corresponding to the state-action pair (s_t, a_t).

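The action selection strategy is computed as:

p(a_i | s_t, Q) = \frac{\exp(Q(s_t, a_i)/T)}{\sum_{b \in A} \exp(Q(s_t, b)/T)}

where A denotes the set of selectable actions and T is an adjustable temperature parameter.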

The Q-value table is updated as follows:

r = r_t(s, a′_t, a₂)

where i denotes the i-th end user; a₁, a′₁ ∈ A; a₂, a′₂ ∈ A′ denote the joint action of all end users; r_t(s, a′_t, a₂) is the reward function returned by the environment for the joint action; s′ is the observed state of the environment.

In step S3, it is determined whether t is divisible by M; if not, the process proceeds directly to step S4; if so, the N end users exchange and fuse their strategies, that is, after learning for M steps, each end user publishes its currently accumulated Q values to the blackboard and obtains the Q values of the other end users from the blackboard, fuses the strategies according to the fusion algorithm, selects an action according to the fused strategy, and then proceeds to step S4.

In the embodiment of the invention, the fusion algorithm takes into account the interaction and communication among end users. It aims to eliminate redundant actions in the strategy as far as possible through cooperation among the end users and thus to reach the final goal more efficiently, improving the execution efficiency and convergence performance of the system. The fusion algorithm comprises the following steps:

Step A: taking M steps as one learning period; after each learning period ends, each end user sends its current Q values to the blackboard, shares the Q values of the other end users on the blackboard, and finds the end user with the maximum Q value.

The expression for this end user is:

Step B: calculating

Step C: calculating

Step D: for all end users i ∈ {1, 2, …, N},
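Because the expressions of Steps B to D are not legible in this text, the fusion rule itself cannot be reproduced here. The sketch below illustrates only the idea of Step A, under the assumption that "the end user with the maximum Q value" is determined separately for each state-action pair, and then applies a purely hypothetical blending rule in which every end user moves its own Q value towards that maximum. The names `best_user_per_entry` and `fuse_towards_best` and the parameter `weight` are illustrative and do not come from the source.

```python
from typing import List

QTable = List[List[float]]  # Q[s][a] for one end user


def best_user_per_entry(q_tables: List[QTable]) -> List[List[int]]:
    """For each (s, a), return the index of the end user with the largest Q value."""
    n_states, n_actions = len(q_tables[0]), len(q_tables[0][0])
    return [
        [max(range(len(q_tables)), key=lambda i: q_tables[i][s][a])
         for a in range(n_actions)]
        for s in range(n_states)
    ]


def fuse_towards_best(q_tables: List[QTable], weight: float = 1.0) -> List[QTable]:
    """Hypothetical fusion: blend each user's Q(s, a) with the best user's value."""
    best = best_user_per_entry(q_tables)
    return [
        [[(1.0 - weight) * q[s][a] + weight * q_tables[best[s][a]][s][a]
          for a in range(len(q[s]))]
         for s in range(len(q))]
        for q in q_tables
    ]
```

With `weight = 1.0` every end user simply adopts the element-wise maximum of all published tables; smaller weights retain more of each end user's own experience.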

In step S4, t = t + 1.

In step S5, it is determined whether t ≥ T; if so, actions are selected entirely according to the greedy strategy; if not, the process returns to step S2.

In the embodiment of the invention, the greedy strategy a^*(s) is computed as:

a^*(s) = \arg\max_{b \in A} Q(s, b)

where A denotes the set of selectable actions and b denotes an action in that set, namely occupying a frequency point; arg(·) returns the argument at which the maximum is attained; max(·) is the maximum operation.

As shown in FIG. 2, the model of the air-space-ground-sea integrated multi-user cooperative learning dynamic access method of the present invention mainly comprises: the end users, Q-learning, the shared blackboard model, the fusion algorithm, the selector, the actuator, and the air-space-ground-sea integrated environment.

The end users are all end users that can access the air-space-ground-sea integrated communication system, including mobile terminals, Internet of Things terminals and the like.

Q-learning means learning with the Q-learning algorithm from the environment state s, the action a taken and the reward function r, so as to intelligently adjust the action strategy of the end user.

In the shared blackboard model, after the N end users have learned for a certain number of steps M, each end user publishes its current Q values to the blackboard and obtains the Q values of the other end users from the blackboard, thereby sharing strategies.
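The publish-and-read behaviour of the shared blackboard can be illustrated with a small in-memory data structure. This is only a sketch under the assumption that all end users can reach a common store; the class name `Blackboard` and its methods `publish` and `read_others` are hypothetical, and how the blackboard is hosted or transported in a real network is outside its scope.

```python
import copy
from typing import Dict, List

QTable = List[List[float]]  # Q[s][a] for one end user


class Blackboard:
    """Shared store to which end users publish Q tables and from which they read."""

    def __init__(self) -> None:
        self._tables: Dict[int, QTable] = {}

    def publish(self, user_id: int, q: QTable) -> None:
        # Store a snapshot so later local learning does not alter the shared copy.
        self._tables[user_id] = copy.deepcopy(q)

    def read_others(self, user_id: int) -> Dict[int, QTable]:
        # Return copies of every table except the caller's own.
        return {uid: copy.deepcopy(q)
                for uid, q in self._tables.items() if uid != user_id}
```

Each end user would call `publish` at the end of a learning period of M steps and then `read_others` to obtain the inputs to the fusion algorithm.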

The fusion algorithm fuses the strategies obtained from the blackboard in order to obtain a higher reward value.

The selector selects an action according to the Q values and the chosen action selection strategy.

The actuator executes the action selected by the selector and acts on the environment, so that the environment transitions from state s_t to the next state s_{t+1}.

The air-space-ground-sea integrated environment is the environment of the communication system formed by the air, space, ground and sea segments.
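The source does not specify the reward function returned by the environment. Purely as an illustration of how collisions between end users could feed back into learning, the toy sketch below assumes that each end user's action is the index of the frequency point it occupies, that an end user is rewarded when it is alone on its frequency point, and that it is penalized otherwise. The function names and the reward values are hypothetical.

```python
from collections import Counter
from typing import List


def collision_rewards(chosen_points: List[int],
                      success_reward: float = 1.0,
                      collision_penalty: float = -1.0) -> List[float]:
    """Reward each end user according to whether its frequency point is contested."""
    counts = Counter(chosen_points)
    return [success_reward if counts[p] == 1 else collision_penalty
            for p in chosen_points]


def collision_rate(chosen_points: List[int]) -> float:
    """Fraction of end users that share their frequency point with another user."""
    counts = Counter(chosen_points)
    collided = sum(1 for p in chosen_points if counts[p] > 1)
    return collided / len(chosen_points)
```

Under these assumptions, four end users choosing frequency points 0, 1, 1 and 2 receive rewards 1, −1, −1 and 1, and half of them experience a collision.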

The invention provides a dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning. Each end user in the air-space-ground-sea integrated communication system learns independently using an agent-based reinforcement learning algorithm, while the end users share strategies through a blackboard model; after self-learning, the strategies are fused and improved by a fusion algorithm, and the fused strategy is then used for further learning. This increases the prior knowledge of each end user, accelerates learning, improves learning efficiency, reduces the collision probability of the air-space-ground-sea integrated communication system, and increases its average capacity. The method provides a dynamic spectrum access theory and model suitable for the air-space-ground-sea integrated communication system. The fusion algorithm in the method takes into account the interaction and communication among end users, eliminates redundant actions in the strategy as far as possible through cooperation among the end users, and thus reaches the final goal more efficiently, improving the execution efficiency of the system. The method uses a shared blackboard model to share information, achieving cooperation and accelerated learning. By combining the sharing algorithm with the Q-learning algorithm, the probability of collisions is reduced, and communicating and sharing strategies improves the learning speed and learning effect of the system. Spectrum is allocated and used more flexibly, thereby improving spectrum utilization. Intelligent dynamic spectrum sharing is realized by combining reinforcement learning in artificial intelligence with spectrum sharing technology.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A dynamic access method based on air-space-ground-sea integrated multi-user cooperative learning, characterized by comprising the following steps:

S3: determining whether t is divisible by M; if not, proceeding directly to step S4; if so, the N end users exchange and fuse their strategies, that is, after learning for M steps, each end user publishes its currently accumulated Q values to a blackboard and obtains the Q values of the other end users from the blackboard, fuses the strategies according to a fusion algorithm, selects an action according to the fused strategy, and then proceeds to step S4;

S4: t = t + 1;

2. The method of claim 1, wherein the greedy strategy a^*(s) is computed as:

a^*(s) = \arg\max_{b \in A} Q(s, b)

3. The method of claim 1, wherein the standard Q-learning algorithm comprises:

Step A: observing the current state s_t of the environment;

Step B: selecting an action a_t according to the Boltzmann action selection strategy and executing it;

Step C: observing the subsequent state s_{t+1} of the environment and obtaining a reinforcement signal r from the environment;

Step D: updating the value Q(s_t, a_t) corresponding to the state-action pair (s_t, a_t).

4. The method of claim 3, wherein the action selection strategy is computed as:

p(a_i | s_t, Q) = \frac{\exp(Q(s_t, a_i)/T)}{\sum_{b \in A} \exp(Q(s_t, b)/T)}

5. The method of claim 3, wherein the Q-value table is updated as follows:

r = r_t(s, a″_t, a₂)

6. The method of claim 1, wherein the fusion algorithm comprises:

The expression for this end user is:

Step B: calculating

Step C: calculating

Step D: for all end users i ∈ {1, 2, …, N},