CN115996475A - Ultra-dense networking multi-service slice resource allocation method and device - Google Patents

Ultra-dense networking multi-service slice resource allocation method and device Download PDF

Info

Publication number
CN115996475A
CN115996475A CN202211487474.4A CN202211487474A CN115996475A CN 115996475 A CN115996475 A CN 115996475A CN 202211487474 A CN202211487474 A CN 202211487474A CN 115996475 A CN115996475 A CN 115996475A
Authority
CN
China
Prior art keywords
base station
micro base
representing
value
base stations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211487474.4A
Other languages
Chinese (zh)
Inventor
张勇
滕颖蕾
柴玉昊
张震宇
袁思雨
白昊男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202211487474.4A priority Critical patent/CN115996475A/en
Publication of CN115996475A publication Critical patent/CN115996475A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a method and a device for distributing resources of ultra-dense networking multi-service slices, comprising the following steps: acquiring a multi-agent reinforcement learning model, wherein a strategy network and a value network are deployed on each micro base station, a transmission power equalization solution is solved in advance, the strategy network takes the transmission rate and the transmission power of the micro base station as state parameters, and a correlation parameter set of each micro base station and a predicted transmission power set of other micro base stations as action parameters; each micro base station acquires own state parameters, generates corresponding action strategies, calculates estimated Q values for the action strategies generated by the corresponding micro base stations according to global information by a value network, and is used for updating strategy network parameters; constructing a loss function of the estimated Q value and the actual Q value by taking the maximized rewarding value as a target, and updating parameters of the value network until the model reaches a preset performance requirement; and inputting the state parameters of each micro base station into a trained multi-agent reinforcement learning model to generate corresponding action strategies so as to realize multi-service slice resource allocation.

Description

Ultra-dense networking multi-service slice resource allocation method and device
Technical Field
The invention relates to the technical field of communication, in particular to a method and a device for distributing ultra-dense networking multi-service slice resources.
Background
Network slicing technology has become one of the key technologies of the fifth generation mobile network (Fifth Generation Mobile Network, 5G), and in the next generation mobile network and other technical fields, the network has raised requirements for flexibility, isolation, privacy, customization and the like for differentiated services of users, and at the same time, the importance of a small-scale network for providing specific services has also increased to meet the requirements of different scenes and different crowds.
One of the emerging solutions is to introduce ultra-dense networking of macro-micro base station isomerism, which meets the transmission capacity and coverage requirements of users. The ultra-dense networking of the base stations can improve the spectrum efficiency of the system to a certain extent, dynamic radio resource allocation is carried out through rapid resource scheduling, the licensed spectrum of the macro base station (Macro Base Station, MBS) is multiplexed at the micro base station (Small Base Station, SBS), and the utilization rate and the spectrum efficiency of the system radio resource are improved, but the problems of system interference and system cost are brought. In order to provide reliable service, the micro base station needs to acquire multiplexing permission of the macro base station spectrum, which needs to perform interference coordination between the macro base station and the micro base station to ensure that the operation of the micro base station is not affected by harmful interference.
Therefore, a method for reducing contention interference of the micro base station and optimizing radio resource allocation under the premise of guaranteeing the communication quality and the communication requirement of the user is needed.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method and an apparatus for distributing resources of ultra-dense networking multi-service slices, so as to eliminate or improve one or more drawbacks existing in the prior art, and solve the problems of system interference and system cost caused when the wireless resource utilization rate and spectrum efficiency of the system are improved in the prior art.
In one aspect, the invention provides a resource allocation method for multi-service slices of ultra-dense networking, which is characterized in that the ultra-dense networking comprises at least one macro base station, and each macro base station is also connected with a plurality of micro base stations for service; the users of the micro base stations multiplex slice resources of the Ying Hong base stations, and the method is used for carrying out multi-service slice resource allocation based on cross-layer interference generated between the micro base stations and the macro base stations and same-layer interference generated between adjacent micro base stations; the method comprises the following steps:
acquiring a multi-agent reinforcement learning model, wherein the multi-agent reinforcement learning model deploys a strategy network and a value network on each micro base station; each strategy network constructs a state space by taking the transmission rate and the total transmitting power of each user in a corresponding single micro base station as state parameters; acquiring associated parameters used for indicating whether users in all micro base stations multiplex resource blocks in macro base stations or not, and constructing an action space by taking an associated parameter set of each micro base station and a predicted transmitting power set of other micro base stations as action parameters; each micro base station obtains the state parameters of the micro base station, selects corresponding actions according to the strategy network, generates estimated Q values according to the state parameters of the corresponding micro base station, the selected actions and the state parameters and actions of other micro base stations by the value network of each micro base station, and is used for carrying out parameter updating on the strategy network of the corresponding micro base station; constructing a loss function of the estimated Q value and the actual Q value of the model by taking the maximized rewarding value as an optimization target, and carrying out parameter updating on the value network; until reaching the preset performance requirement;
In the state updating process, the macro base station constructs a macro base station income calculation formula according to the cross-layer interference price and the cross-layer interference generated by multiplexing resource blocks in the micro base station by the user; the micro base station constructs a micro base station gain calculation formula according to the association parameter, the fixed bandwidth length of the resource block, the signal-to-interference-plus-noise ratio, the same-layer interference price, the same-layer interference, the cross-layer interference price and the cross-layer interference; taking the macro base station as a leader and taking each micro base station as a follower to construct a non-cooperative game; fixing the values of the associated parameters, and solving the micro base station profit calculation by adopting a reverse induction method to obtain the transmitting power equalization solution of each micro base station so as to update the state space of each strategy network; substituting the transmission power equalization solution into the macro base station income calculation to obtain the cross-layer interference price equalization solution;
and inputting the state parameters of each micro base station into the multi-agent reinforcement learning model to generate corresponding action strategies so as to realize multi-service slice resource allocation.
In some embodiments of the present invention, the macro base station constructs a macro base station benefit calculation formula according to a cross-layer interference price and cross-layer interference generated by multiplexing resource blocks in the micro base station by a user, where the macro base station benefit calculation formula is:
Figure BDA0003963158810000021
wherein ,UMBS Representing the macro base station benefit; u (U) UE Representing a set of users for all micro base stations; u (U) PRB Representing the total number of resource blocks; u (U) BS Representing the macro base station and a set of all micro base stations;
Figure BDA0003963158810000022
representing the cross-layer interference price of a user i using a resource block j at a micro base station b; />
Figure BDA0003963158810000023
Indicating cross-layer interference caused by the use of resource block j by user i at micro base station b.
In some embodiments of the present invention, the micro base station constructs a micro base station benefit calculation formula according to the association parameter, the resource block fixed bandwidth length, the signal to interference plus noise ratio, the same layer interference price, the same layer interference, the cross layer interference price and the cross layer interference, wherein the micro base station benefit calculation formula is:
Figure BDA0003963158810000031
s.t.
Figure BDA0003963158810000032
Figure BDA0003963158810000033
Figure BDA0003963158810000034
Figure BDA0003963158810000035
Figure BDA0003963158810000036
Figure BDA0003963158810000037
wherein ,Ub Representing the micro base station benefit; u (U) UE,b Representing a set of users of the micro base station b; u (U) s The type of slice is represented;
Figure BDA0003963158810000038
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b; b represents the fixed bandwidth length of the resource block;
Figure BDA0003963158810000039
indicating that the user i uses the resource block j to cause the signal to interference plus noise ratio at the micro base station b; />
Figure BDA00039631588100000310
Indicating the same-layer interference price of user i using resource block j at micro base station b;/>
Figure BDA00039631588100000311
The same-layer interference caused by the use of the resource block j by the user i at the micro base station b is shown;
Figure BDA00039631588100000312
representing the cross-layer interference price of a user i using a resource block j at a micro base station b; / >
Figure BDA00039631588100000313
Indicating cross-layer interference caused by a user i using a resource block j at a micro base station b; />
Figure BDA00039631588100000314
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b; u (U) UE Representing a set of users for all micro base stations; u (U) BS Representing the macro base station and a set of all micro base stations; i max Representing the interference maximum; u (U) PRB Representing the total number of resource blocks; τ represents the total number of resource blocks.
In some embodiments of the present invention, each policy network constructs a state space with a transmission rate of each user in a corresponding single micro base station and a total transmission power as state parameters, where the total transmission power uses the transmission power equalization solution, and the state parameters are expressed as:
Figure BDA00039631588100000315
wherein ,sj (t) represents a state parameter of the micro base station at the time t; r is (r) N,j (t) represents the transmission rate of the nth user multiplexing resource block j at the time t;
Figure BDA00039631588100000316
the user i allocates the transmitting power of the resource block j in the micro base station b; u (U) j Representing a set of users multiplexing resource blocks j;
the value network of each micro base station generates a predicted Q value according to the state parameters and the selected actions of the corresponding micro base station and the state parameters and actions of other micro base stations, and the state parameters of the value network are expressed as follows:
s j '(t)=(s j (t),a j (t),s -j (t),a -j (t));
wherein ,sj (t) represents a state parameter of the micro base station at the time t; a, a j (t) represents an operation parameter of the micro base station at the time t; s is(s) -j (t) represents a state parameter set of other micro base stations at the moment t; a, a -j And (t) represents the action parameter set of other micro base stations at the moment t.
In some embodiments of the present invention, an association parameter for indicating whether a user in each micro base station multiplexes a resource block in a macro base station is obtained, and an action space is constructed by using an association parameter set of each micro base station and a predicted transmit power set of other micro base stations as action parameters, where the action parameters are expressed as:
a j (t)={W j ,P -j };
wherein ,
Figure BDA0003963158810000041
Figure BDA0003963158810000042
wherein ,aj (t) represents an operation parameter of the micro base station at the time t; w (W) j Representing a set of associated parameters; p (P) -j Representing a set of predicted transmit powers of other micro base stations;
Figure BDA0003963158810000043
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b;
Figure BDA0003963158810000044
representing the transmitting power allocated to the resource block j by the user i at the micro base station b; u (U) j Representing the set of users multiplexing resource block j.
In some embodiments of the present invention, the value network of each micro base station generates a predicted Q value according to the state parameter and the selected action of the corresponding micro base station and the state parameters and actions of other micro base stations, and constructs a policy gradient according to the predicted Q value, and is used for updating parameters of the policy network of the corresponding micro base station, where a calculation formula of the policy gradient is as follows:
Figure BDA0003963158810000045
wherein ,
Figure BDA0003963158810000046
representing the policy gradient; θ represents a policy parameter; j (u) j ) Representing a cumulative estimated prize value; d represents an experience playback pool; u (u) j (a j |s j ) Representing an action strategy made by the micro base station according to the state; />
Figure BDA0003963158810000047
Representing the value network; s is(s) j Representing the state of the micro base station estimated by the value network; a, a j Representing the motion of the micro base station estimated by the value network; s is(s) other Representing the state of other micro base stations estimated by the value network; a, a other And representing the actions of other micro base stations estimated by the value network.
In some embodiments of the present invention, a loss function of the estimated Q value and the actual Q value of the model is constructed with the maximized reward value as an optimization target, and the value network is updated with parameters, where a calculation formula of the loss function is as follows:
Figure BDA0003963158810000048
wherein ,
Figure BDA0003963158810000049
representing the loss function; θ represents a policy parameterA number; u (u) j Representing adaptive weight parameters; r represents the updated result of the action; />
Figure BDA00039631588100000410
Representing the value network; s is(s) j Representing the state of the micro base station estimated by the value network; a, a j Representing the motion of the micro base station estimated by the value network; s is(s) other Representing the state of other micro base stations estimated by the value network; a, a other Representing the actions of other micro base stations estimated by the value network; / >
Figure BDA00039631588100000411
Representing the actual Q value.
In some embodiments of the present invention, a loss function of the estimated Q value and the actual Q value of the model is constructed with the maximized reward value as an optimization target, and the value network is updated with parameters, where the calculation formula of the reward value is:
Figure BDA00039631588100000412
wherein, reward j Representing the prize value; u (u) j Representing the adaptive weight parameters; r is (r) j Representing the total transmission rate of each user multiplexing resource block j of the micro base station; n represents the total number of users; r is (r) -j Representing the total transmission rate of other micro base stations;
Figure BDA0003963158810000051
representing the interference price of the same layer; />
Figure BDA0003963158810000052
Representing co-layer interference; />
Figure BDA0003963158810000053
Representing cross-layer interference prices; />
Figure BDA0003963158810000054
Representation ofCross-layer interference; u (U) j Representing the set of users multiplexing resource block j.
In some embodiments of the present invention, the adaptive weight parameter is learned according to a state of a global environment;
when u is j When=1, the prize value is only related to the transmission rate of the micro base station, is zero and is played;
when 0 is<u j <1, the prize value is related to the transmission rate of the micro base station itself and the transmission rate of other micro base stations, so as to form a hybrid game.
In another aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of a method as described in any one of the above-mentioned.
The invention has the advantages that:
the invention provides a method and a device for distributing resources of ultra-dense networking multi-service slices, which are characterized in that a multi-agent reinforcement learning model obtained through pre-training is obtained, and state parameters of each micro base station are input into the multi-agent reinforcement learning model, so that action strategies are correspondingly generated, and the purposes of reducing competition and interference of the micro base stations on the premise of guaranteeing the communication quality and the communication requirement of users are achieved, so that the wireless resource distribution is optimized, and the frequency spectrum tension is relieved.
In multi-agent reinforcement learning model training, constructing a macro base station income calculation formula and a micro base station income calculation formula in advance, modeling a resource allocation problem as a non-cooperative game, and obtaining a transmitting power balance solution and a cross-layer interference price balance solution; and then, using the transmission power equalization solution to update the state space of each strategy network in the multi-agent reinforcement learning model, guiding the model to update and optimize to a preset direction, and simplifying the calculation amount of the model. Meanwhile, the multi-agent reinforcement learning model deploys a strategy network and a value network on each micro base station, and the value network can acquire global information and generate more accurate estimated Q values, so that the strategy network generates more optimal action strategies.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate and together with the description serve to explain the invention. In the drawings:
fig. 1 is a schematic structural diagram of a resource allocation method for ultra-dense networking multi-service slices according to an embodiment of the invention.
FIG. 2 is a schematic diagram of an ultra-dense networking architecture in accordance with an embodiment of the present invention.
Fig. 3 is a flowchart of a method for resource allocation of ultra-dense networking multi-service slices according to an embodiment of the invention.
Fig. 4 (a) is a schematic diagram of a profit situation when the number of resource blocks is 20 according to an embodiment of the present invention.
Fig. 4 (b) is a schematic diagram illustrating a profit situation when the number of resource blocks is 24 according to an embodiment of the present invention.
Fig. 4 (c) is a schematic diagram illustrating a profit situation when the number of resource blocks is 28 according to an embodiment of the present invention.
Fig. 5 is a schematic diagram showing the comparison between the effects of the madmpg algorithm and the stamperberg game on the average gain impact of the base station for the cross-layer interference price in an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.
It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled" may refer to not only a direct connection, but also an indirect connection in which an intermediate is present, unless otherwise specified.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.
In order to solve the problems of system interference and system cost caused by the improvement of the utilization rate of system wireless resources and the spectrum efficiency in the prior art, namely the problems of dynamic multi-service slicing and power distribution of macro base stations and micro base stations in the ultra-dense networking when sharing a macro base station network, the invention provides a resource distribution method for the multi-service slicing of the ultra-dense networking. The method comprises the following steps S101 to S102:
step S101: and obtaining a multi-agent reinforcement learning model, wherein the model is obtained through training by a preset method.
Step S102: and inputting the state parameters of each micro base station into a multi-agent reinforcement learning model to generate corresponding action strategies so as to realize multi-service slice resource allocation.
As shown in fig. 1, the price is formulated and charged by researching cross-layer interference caused by macro base stations to multiplexing resource blocks of micro base stations, each micro base station carries out dynamic slicing and power distribution after receiving the pricing scheme of the macro base station, on the basis of not influencing macro base station users, the respective economic benefits and communication quality of the macro base station users and the micro base station users are improved, the problems are decomposed into a transmitting power distribution problem and a slicing resource block distribution problem, and the problems are converted into an optimization problem of maximizing the benefits of the macro base station and the micro base station on the premise of guaranteeing the communication quality of the users and reducing the interference.
Fig. 2 is a schematic diagram of a group of ultra-dense networking architecture, where the ultra-dense networking network includes a macro base station and a plurality of micro base stations, where the macro base station covers a range of the plurality of micro base stations, and users of the micro base stations may multiplex slice resources of the macro base station for communication, but may cause cross-layer interference to the macro base station users. The slice manager is responsible for coordinating the number of resource blocks required by each slice, realizing isolation communication among the slices, and the slices can be deployed in a plurality of micro base stations. To reduce communication signaling overhead, each micro base station performs resource allocation to the respective micro base station and connected users independently and in a distributed manner. It should be noted that fig. 1 and fig. 2 only show a group of architecture of ultra-dense networking as a reference, and the number of macro base stations and micro base stations can be adjusted according to specific requirements and/or specific application scenarios in actual use.
In step S101, in order to obtain a multi-agent reinforcement learning model suitable for the application scenario of the present invention, the initial reinforcement learning model needs to be trained. However, considering that the difficulty of training the initial reinforcement learning model to learn the resource allocation strategy is high, the problems that the model is not optimized according to the preset direction and further cannot obtain a better strategy, the training amount, the operation amount and the like can be solved, so that in order to better describe the resource allocation strategy of a user, the operation amount of the initial reinforcement learning model is simplified, the problem of analysis by a reverse induction method is firstly solved, the game balance is solved, the initial reinforcement learning model is guided to train and optimize in the preset direction, and finally the multi-agent reinforcement learning model required by the invention is obtained, so that the resource allocation is realized.
The method for training the initial reinforcement learning model to obtain the multi-agent reinforcement learning model comprises the following steps:
a strategy network and a value network are deployed on each micro base station; each strategy network constructs a state space by taking the transmission rate and the total transmitting power of each user in a corresponding single micro base station as state parameters; acquiring associated parameters used for indicating whether users in all micro base stations multiplex resource blocks in macro base stations or not, and constructing an action space by taking an associated parameter set of each micro base station and a predicted transmitting power set of other micro base stations as action parameters; each micro base station obtains the state parameters of the micro base station, selects corresponding actions according to the strategy network, generates estimated Q values according to the state parameters of the corresponding micro base station, the selected actions and the state parameters and actions of other micro base stations, and is used for carrying out parameter updating on the strategy network of the corresponding micro base station; constructing a loss function of the estimated Q value and the actual Q value of the model by taking the maximized rewarding value as an optimization target, and carrying out parameter updating on the value network; until the preset performance requirement is reached.
In the state updating process, the macro base station constructs a macro base station income calculation formula according to the cross-layer interference price and the cross-layer interference generated by multiplexing resource blocks in the micro base station by the user; the micro base station constructs a micro base station gain calculation formula according to the association parameter, the fixed bandwidth length of the resource block, the signal to interference plus noise ratio, the same-layer interference price, the same-layer interference, the cross-layer interference price and the cross-layer interference; taking the macro base station as a leader, and taking each micro base station as a follower to construct a non-cooperative game; fixing the values of the associated parameters, and solving the benefit calculation formula of the micro base stations by adopting a reverse induction method to obtain the transmission power equalization solution of each micro base station so as to update the state space of each strategy network; and (5) balancing the transmitting power, namely, jie Dairu, and obtaining a cross-layer interference price balancing solution by using a macro base station gain calculation formula.
Specifically, macro base station benefits and micro base station benefits form a set of Stankleber game, wherein the macro base station is a leader and is responsible for making cross-layer interference prices, and the micro base station is a follower and is responsible for giving associated parameters and transmitting power. For the multi-objective joint optimization problem of the model, solving the game equilibrium solution and the design algorithm is more difficult than the single-objective optimization problem, and the traditional iterative algorithm is insufficient for solving the communication problem among multiple base stations. Therefore, in the invention, a reverse induction method is adopted, the value of the associated parameter is fixed at first, the micro base station gain calculation formula is solved, the transmission power equalization solution of each micro base station is obtained, and the result of the transmission power equalization solution is substituted into the macro base station gain calculation formula to obtain the cross-layer interference price equalization solution. After a fixed equalization solution of the transmitting power is obtained, an optimization scheme of resource block association is provided by combining a multi-agent reinforcement learning model, and information of other micro base stations required by the equalization solution is provided by an independent prediction method. The overall flow of optimizing the resource allocation strategy is shown in figure (3).
In some embodiments, macro base station formulated cross-layer interference pricing
Figure BDA0003963158810000081
The macro base station benefit calculation formula is shown in formula (1):
Figure BDA0003963158810000082
in the formula (1), U MBS Representing macro base station revenues; u (U) UE Representing a set of users for all micro base stations; u (U) PRB Representing the total number of resource blocks; u (U) BS Representing a macro base station and a set of all micro base stations;
Figure BDA0003963158810000083
representing the cross-layer interference price of a user i using a resource block j at a micro base station b; />
Figure BDA0003963158810000084
Indicating cross-layer interference caused by the use of resource block j by user i at micro base station b.
Wherein cross-layer interference
Figure BDA0003963158810000085
The calculation formula of (2) is shown as the following formula:
Figure BDA0003963158810000086
in the formula (2),
Figure BDA0003963158810000087
representing cross-layer interference; u (U) s The type of slice is represented; />
Figure BDA0003963158810000088
Representing the association relationship between the user i and the slice s, the slice resource block j and the micro base station b; />
Figure BDA0003963158810000089
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b;
Figure BDA00039631588100000810
the channel gain of the macro base station resource block j multiplexed by the user i at the micro base station b is shown.
wherein ,
Figure BDA00039631588100000811
the association relationship between the user i and the slice s, the slice resource block j and the micro base station b is shown, which can be specifically understood as follows: if user i uses resource block j belonging to slice s in micro base station b +.>
Figure BDA00039631588100000812
If user i does not use resource block j,/belonging to slice s in micro base station b>
Figure BDA00039631588100000813
As shown in the formula (1), the macro base station obtains the sum of charges for multiplexing resource blocks by the users of all the micro base stations to transmit and generate interference to the macro base station. Each micro base station user has independent benefits, but multiplexing frequency spectrum transmission can cause interference and affect the benefits of other micro base station users and macro base station users, so that the macro base station is used as a leader, each micro base station is used as a follower, and the resource allocation problem of the micro base station is constructed into a non-cooperative game. Each micro base station user is considered a selfish, rational player, and the game strategy space consists of a transmit power allocation strategy space and a slice resource block allocation strategy space, and each micro base station attempts to maximize its utility.
In some embodiments, the micro base station benefit is calculated as shown in formula (3), and the constraint is shown in formulas (4) to (9):
Figure BDA0003963158810000091
s.t.
Figure BDA0003963158810000092
Figure BDA0003963158810000093
Figure BDA0003963158810000094
Figure BDA0003963158810000095
Figure BDA0003963158810000096
Figure BDA0003963158810000097
in the formulas (3) to (9), U b Representing the benefit of the micro base station; u (U) UE,b Representing a set of users of the micro base station b; u (U) s The type of slice is represented;
Figure BDA0003963158810000098
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b; b represents the fixed bandwidth length of the resource block; />
Figure BDA0003963158810000099
Indicating that the user i uses the resource block j to cause the signal to interference plus noise ratio at the micro base station b; />
Figure BDA00039631588100000910
The same-layer interference price of the resource block j used by the user i at the micro base station b is represented; />
Figure BDA00039631588100000911
The same-layer interference caused by the use of the resource block j by the user i at the micro base station b is shown; />
Figure BDA00039631588100000912
Representing the cross-layer interference price of a user i using a resource block j at a micro base station b; />
Figure BDA00039631588100000913
Indicating cross-layer interference caused by a user i using a resource block j at a micro base station b; />
Figure BDA00039631588100000914
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b; u (U) UE Representing a set of users for all micro base stations; u (U) BS Representing a macro base station and a set of all micro base stations; i max Representing the interference maximum; u (U) PRB Representing the total number of resource blocks; τ represents the total number of resource blocks.
In each constraint condition, the lower limit of the transmitting power of the micro base station user is indicated by the formula (4); equation (5) shows the maximum value of all users' co-layer interference; equation (6) shows the maximum value of cross-layer interference for all users; equation (7) is a limitation of the number of resource blocks of a slice, and the number of allocated resource blocks in all slices cannot exceed the total number of the resource blocks of the slice; equation (8) shows that the same user is associated with at most one base station; equation (9) is an isolation constraint for a slice, indicating that one resource block can only be allocated to the same user at a time.
In equation (3), co-layer interference
Figure BDA00039631588100000915
The calculation formula of (2) is shown in the formula (10):
Figure BDA00039631588100000916
in the formula (10), K b',b A binary variable indicating whether or not the micro base station b' overlaps or is adjacent to the micro base station b;
Figure BDA0003963158810000101
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b; />
Figure BDA0003963158810000102
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b; />
Figure BDA0003963158810000103
Indicating the channel gain of user i at resource block j of micro base station b'.
The method for solving the transmission power equalization solution and the cross-layer interference price equalization solution comprises the following steps:
fixing associated parameters
Figure BDA0003963158810000104
And (3) obtaining the maximum value of the micro base station gain, wherein the micro base station gain calculation formula, namely the formula (3) is simplified as follows:
Figure BDA0003963158810000105
constraint 1:
Figure BDA0003963158810000106
constraint 2:
Figure BDA0003963158810000107
to U b And (3) deriving to obtain:
Figure BDA0003963158810000108
obtaining a second derivative, and obtaining:
Figure BDA0003963158810000109
/>
thus U b On the domain is a convex function, where σ represents the white noise of the channel.
From the following components
Figure BDA00039631588100001010
Obtaining:
Figure BDA00039631588100001011
and obtaining the transmitting power equalization solution of each micro base station.
From the following components
Figure BDA00039631588100001012
The constraint of (2) can be obtained:
Figure BDA00039631588100001013
order the
Figure BDA00039631588100001014
Let us assume->
Figure BDA00039631588100001015
It is also necessary to ensure that the macro base station benefit is maximum, and the calculation formula of the macro base station benefit is expressed as:
Figure BDA0003963158810000111
constraint 1:
Figure BDA0003963158810000112
constraint 2:
Figure BDA0003963158810000113
macro base station benefit and cross-layer interference price
Figure BDA0003963158810000114
Regarding the value interval of (a), a discontinuous optimization problem is presented, so that an index function is introduced first:
Figure BDA0003963158810000115
Given a given
Figure BDA0003963158810000116
When U MBS Becomes a fully micro-functional. In the Stankleberg game, the default macro base station has the ability to collect channel information between the macro base station and each micro base station, i.e./the>
Figure BDA0003963158810000117
(user i multiplexes the channel gain of macro base station resource block j at micro base station b), the calculation of macro base station benefit can be rewritten as follows:
Figure BDA0003963158810000118
constraint 1:
Figure BDA0003963158810000119
constraint 2:
Figure BDA00039631588100001110
constraint 3:
Figure BDA00039631588100001111
and has been obtained above
Figure BDA00039631588100001112
/>
Order the
Figure BDA00039631588100001113
Figure BDA00039631588100001114
The KKT condition may be written as:
Figure BDA00039631588100001115
Figure BDA00039631588100001116
Figure BDA00039631588100001117
Figure BDA00039631588100001118
α,β,γ≥0;
from the KKT condition:
Figure BDA0003963158810000121
analysis resulted in α=β=0.
The KKT condition is a form of the Lagrangian multiplier method, and is mainly applied to an optimal solving mode under the condition that the optimization function has non-equivalent constraint.
In summary, the macro base station cross-layer interference price equalization solution can be obtained in the following form:
(1) When (when)
Figure BDA0003963158810000122
In the time-course of which the first and second contact surfaces,
Figure BDA0003963158810000123
(2) When (when)
Figure BDA0003963158810000124
Figure BDA0003963158810000125
In the time-course of which the first and second contact surfaces,
Figure BDA0003963158810000126
/>
Figure BDA0003963158810000127
.......
(N) when
Figure BDA0003963158810000128
Figure BDA0003963158810000129
In the time-course of which the first and second contact surfaces,
Figure BDA00039631588100001210
Figure BDA00039631588100001211
Figure BDA00039631588100001212
thus, a set of stonelberg betting solutions is obtained:
Figure BDA00039631588100001213
and />
Figure BDA00039631588100001214
Due to the +.>
Figure BDA00039631588100001215
and />
Figure BDA00039631588100001216
When (I)>
Figure BDA00039631588100001217
And
Figure BDA00039631588100001218
is unique and is therefore a nash equilibrium solution.
After a fixed equilibrium solution of the transmitting power is obtained, a state space of a strategy network of the multi-agent reinforcement learning model is set, a multi-agent reinforcement learning model is utilized to provide related parameters of resource blocks, information of other base stations required by the equilibrium solution is provided through an independent prediction method, and resource allocation optimization is carried out in combination with game results.
In some embodiments, the multi-agent reinforcement learning model adopts a madppg algorithm, madppg is an extension of DDPG in multi-agent tasks, and the basic idea is centralized learning and decentralized execution. MADDPG algorithm introduces critic capable of observing global information to guide actor training when model training, and only uses actor with local observation to take action when testing.
The multi-agent reinforcement learning model deploys a strategy network (strategy generated by an actor) and a value network (strategy generated by a critic evaluation actor) on each micro base station, wherein the actor can only acquire the information of the micro base station to which the actor belongs, and the critic can acquire the information of all the micro base stations. Because of the non-cooperative competition relationship among the micro base stations in the invention, the targets of the micro base stations are different, and each micro base station has a strategy network and a value network corresponding to the targets.
In some embodiments, each policy network constructs a state space with the transmission rate and total transmit power of each user in a corresponding single micro base station as state parameters designed as shown in equation (11):
Figure BDA0003963158810000131
in the formula (11), r N,j (t) represents the transmission rate of the nth user multiplexing resource block j at the time t;
Figure BDA0003963158810000132
The user i allocates the transmitting power of the resource block j in the micro base station b; u (U) j Representing the set of users multiplexing resource block j.
The first N items r 1,j (t),r 2,j (t),…,r N,j (t) represents the transmission rate of all user multiplexing resource blocks j at the moment t; the n+1th item represents the sum of the transmission powers allocated by the users of all the multiplexing resource blocks j.
wherein ,
Figure BDA0003963158810000133
for the transmit power equalization solution found above, specific: />
Figure BDA0003963158810000134
In some embodiments, the association parameters used for indicating whether the user in each micro base station multiplexes the resource blocks in the macro base station are obtained, and an action space is constructed by taking the association parameter set of each micro base station and the predicted transmission power set of other micro base stations as action parameters, wherein the action parameters are designed as shown in a formula (12):
a j (t)={W j ,P -j }; (12)
in the formula (11), W j Representation of
Figure BDA0003963158810000135
A set of associated parameters; p (P) -j Representing the predicted set of other micro base station transmit powers, as shown in equation (13) and equation (14):
Figure BDA0003963158810000136
Figure BDA0003963158810000137
wherein ,
Figure BDA0003963158810000141
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b; />
Figure BDA0003963158810000142
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b; u (U) j Representing the set of users multiplexing resource block j.
In some embodiments, the value network of each micro base station generates an estimated Q value according to the state parameters and the selected actions of the corresponding micro base station and the state parameters and actions of other micro base stations, and the state parameters of the value network are shown in formula (15):
s j '(t)=(s j (t),a j (t),s -j (t),a -j (t)); (15)
In the formula (15), s j (t) represents a state parameter of the micro base station at the time t; a, a j (t) represents an operation parameter of the micro base station at the time t; s is(s) -j (t) represents a state parameter set of other micro base stations at the moment t; a, a -j And (t) represents the action parameter set of other micro base stations at the moment t.
In the training process of the multi-agent reinforcement learning model, each micro base station (actor) randomly samples according to the state of the current moment, selects and executes corresponding actions, and correspondingly, a value network (critic) calculates an estimated Q value according to the state of the micro base station and the selected actions to serve as feedback for the actions of the micro base station. The strategy network updates the strategy according to critic feedback, and the value network builds a loss function according to the estimated Q value and the actual Q value for training. In the invention, critic can acquire global information, namely, the state and action of other micro base stations can be acquired, so as to obtain a more accurate estimated Q value.
In the multi-agent reinforcement learning model test process, each micro base station randomly samples according to the state at the current moment, selects and executes corresponding actions, at the moment, feedback of critic is not needed any more, and the state or action of other micro base stations is not needed to be relied on, so that decentralized execution is realized.
In the present invention, entering a state parameter results in a deterministic action and is therefore a deterministic strategy.
For deterministic policies, constructing a policy gradient for updating the policy network according to the estimated Q value of the value network, wherein the policy gradient is calculated as shown in formula (16):
Figure BDA0003963158810000143
in formula (16), θ represents a policy parameter; j (u) j ) Representing a cumulative estimated prize value; d represents an experience playback pool; u (u) j (a j |s j ) Representing an action strategy made by the micro base station according to the state;
Figure BDA0003963158810000144
representing a value network; s is(s) j Representing the state of the micro base station estimated by the value network; a, a j Representing the motion of the micro base station estimated by the value network; s is(s) other Representing the state of other micro base stations estimated by the value network; a, a other Representing the actions of other micro base stations estimated by the value network.
For the value network, the parameters need to be updated by constructing a loss function of the estimated Q value and the actual Q value of the model by taking the maximized rewarding value as an optimization target.
For setting the rewarding value, firstly, calculating the total transmission rate of multiplexing resource blocks of each user of the micro base station, wherein the calculation formula is shown in a formula (17):
Figure BDA0003963158810000151
in the formula (17), U j Representing a set of users multiplexing resource blocks j; r is (r) i,j Representing the transmission rate of user i using resource block j; u (U) s The type of slice is represented;
Figure BDA0003963158810000152
Representing the association relationship between the user i and the slice s, the slice resource block j and the micro base station b; b represents the fixed bandwidth length of the resource block; />
Figure BDA0003963158810000153
Indicating that user i is using resource block j at micro base station b to create a signal to interference plus noise ratio.
wherein ,
Figure BDA0003963158810000154
Figure BDA0003963158810000155
and (5) performing equalization solution on the transmission power obtained above.
Figure BDA0003963158810000156
Indicating the transmitting power of the user i allocated to the slice resource block j at the micro base station b; />
Figure BDA0003963158810000157
The channel gain of the resource block j of the micro base station b of the channel gain user i is shown; />
Figure BDA0003963158810000158
Representing the transmission power allocated to the resource block j by the user i at the micro base station b'; />
Figure BDA0003963158810000159
Indicating the channel gain of user i at resource block j of micro base station b'.
On this basis, in order to represent the competition and cooperation relationship between the micro base station and other micro base stations, the prize value calculation formula is designed as shown in formula (18):
Figure BDA00039631588100001510
in the formula (18), u j Representing adaptive weight parameters; r is (r) j Representing the total transmission rate of each user multiplexing resource block j of the micro base station; n represents the total number of users; r is (r) -j Representing the transmission rate of other micro base stations;
Figure BDA00039631588100001511
representing the interference price of the same layer; />
Figure BDA00039631588100001512
Representing co-layer interference; />
Figure BDA00039631588100001513
Representing cross-layer interference prices; />
Figure BDA00039631588100001514
Representing cross-layer interference; u (U) j Representing the set of users multiplexing resource block j.
Specifically, the adaptive weight parameter is learned according to the state of the global environment, when u j When=1, the prize value is only related to the transmission rate of the micro base station, and is zero and game; when 0 is<u j <When 1, the prize value is related to the transmission rate of the micro base station, and also related to the transmission rates of other micro base stations, so as to form the hybrid game.
The loss function constructed according to the estimated Q value and the actual Q value of the model is shown in a formula (19):
Figure BDA00039631588100001515
in the formula (19), θ represents a policy parameter; u (u) j Representing adaptive weight parameters; r represents the updated result of the action;
Figure BDA00039631588100001516
representing a value network; s is(s) j Representing the state of the micro base station estimated by the value network; a, a j Representing the motion of the micro base station estimated by the value network; s is(s) other Representing the state of other micro base stations estimated by the value network; a, a other Representing the actions of other micro base stations estimated by the value network; />
Figure BDA00039631588100001517
Representing the actual Q value.
Figure BDA00039631588100001518
The updated calculation formula is shown in formula (20):
Figure BDA00039631588100001519
in the formula (20), r j Representing the total transmission rate of each user multiplexing resource block j of the micro base station;
Figure BDA0003963158810000161
representing an actual value network; s' j Representing the actual state of the micro base station; a' j Representing the actual actions of the micro base station; s' other Representing the actual state of other micro base stations; a' other Representing the actual actions of other micro base stations; u's' j Representing actual adaptive weight parameters; o (o) j Representing the local information observed by the micro base station.
Updating the parameters of the value network by taking the maximized rewarding value as an optimization target; the strategy network builds a strategy gradient according to the estimated Q value generated by the value network, and updates the strategy gradient to enable the generated action strategy to be more accurate until the initial reinforcement learning model reaches the preset performance requirement, so that the multi-agent reinforcement learning model required by the invention is obtained.
In step S102, the state parameters of each micro base station are input into the trained multi-agent reinforcement learning model, and corresponding action strategies are generated to realize multi-service slice resource allocation.
The invention is described in detail below with reference to one example:
according to the ultra-dense networking multi-service slice resource allocation method provided by the invention, simulation experiments are carried out.
The method is provided with a macro base station and five micro base stations, and the positions of the macro base station and the micro base stations are fixed. The user accesses the base station with the maximum channel gain according to the channel gain of the user. For the micro base station, the number of resource blocks k=24, the number of users is 16, the bandwidth b=10 MHz on each subcarrier, and the transmitting power on each subchannel does not exceed 1W. In the initial reinforcement learning model, the learning rate of actor is 0.02 and the learning rate of critic is 0.01.
In the simulation experiment, the situation of user benefit is explained by changing the number of resource blocks. Specifically, ten step sizes are taken as one round, and the average value of the ten step sizes is taken as an experimental result. As shown in fig. 4 (a) to 4 (c), the first 50 rounds are pre-training phases, and it can be seen that all base stations try to change their own strategy to get higher benefits by themselves until all base stations converge. And as the number of resource blocks increases, the benefit of all micro base stations is increased, but the rate of convergence is slowed down. Wherein the number of resource blocks k=20 of fig. 4 (a), and the number of resource blocks k=24 of fig. 4 (b); the number of resource blocks k=28 of fig. 4 (c).
As shown in fig. 5, the case of the average benefit of the user is explained by changing the cross-layer interference price. According to game theory result, under the condition of unchanged other parameters, the cross-layer interference price and the micro base station income are available
Figure BDA0003963158810000162
While other resources are allocated at the same time, the average benefit of the micro base station gradually decreases with the increase of the cross-layer interference price, but the decrease amplitude gradually decreases. Meanwhile, the result of the MADDPG algorithm is compared with the theoretical value of the Stankleberg game, and is close to the theoretical value of the game and slightly higher than the theoretical value of the game due to the income self-adaptive adjustment mechanism.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of a method.
Accordingly, the present invention also provides an apparatus comprising a computer apparatus including a processor and a memory, the memory having stored therein computer instructions for executing the computer instructions stored in the memory, the apparatus implementing the steps of the method as described above when the computer instructions are executed by the processor.
The embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the edge computing server deployment method described above. The computer readable storage medium may be a tangible storage medium such as Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disk, a removable memory disk, a CD-ROM, or any other form of storage medium known in the art.
In summary, the invention provides a method and a device for distributing resources of ultra-dense networking multi-service slices, which are characterized in that a multi-agent reinforcement learning model obtained through pre-training is obtained, and state parameters of each micro base station are input into the multi-agent reinforcement learning model, so that action strategies are correspondingly generated, and the competition and interference of the micro base stations are reduced on the premise of ensuring the communication quality and the communication requirement of users, thereby optimizing the radio resource distribution and relieving the frequency spectrum tension.
In multi-agent reinforcement learning model training, constructing a macro base station income calculation formula and a micro base station income calculation formula in advance, modeling a resource allocation problem as a non-cooperative game, and obtaining a transmitting power balance solution and a cross-layer interference price balance solution; and then, using the transmission power equalization solution to update the state space of each strategy network in the multi-agent reinforcement learning model, guiding the model to update and optimize to a preset direction, and simplifying the calculation amount of the model. Meanwhile, the multi-agent reinforcement learning model deploys a strategy network and a value network on each micro base station, and the value network can acquire global information and generate more accurate estimated Q values, so that the strategy network generates more optimal action strategies.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. The particular implementation is hardware or software dependent on the specific application of the solution and the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The ultra-dense networking multi-service slice resource allocation method is characterized in that the ultra-dense networking comprises at least one macro base station, and each macro base station is also connected with a plurality of micro base stations for service; the users of the micro base stations multiplex slice resources of the Ying Hong base stations, and the method is used for carrying out multi-service slice resource allocation based on cross-layer interference generated between the micro base stations and the macro base stations and same-layer interference generated between adjacent micro base stations; the method comprises the following steps:
acquiring a multi-agent reinforcement learning model, wherein the multi-agent reinforcement learning model deploys a strategy network and a value network on each micro base station; each strategy network constructs a state space by taking the transmission rate and the total transmitting power of each user in a corresponding single micro base station as state parameters; acquiring associated parameters used for indicating whether users in all micro base stations multiplex resource blocks in macro base stations or not, and constructing an action space by taking an associated parameter set of each micro base station and a predicted transmitting power set of other micro base stations as action parameters; each micro base station obtains the state parameters of the micro base station, selects corresponding actions according to the strategy network, generates estimated Q values according to the state parameters of the corresponding micro base station, the selected actions and the state parameters and actions of other micro base stations by the value network of each micro base station, and is used for carrying out parameter updating on the strategy network of the corresponding micro base station; constructing a loss function of the estimated Q value and the actual Q value of the model by taking the maximized rewarding value as an optimization target, and carrying out parameter updating on the value network; until reaching the preset performance requirement;
In the state updating process, the macro base station constructs a macro base station income calculation formula according to the cross-layer interference price and the cross-layer interference generated by multiplexing resource blocks in the micro base station by the user; the micro base station constructs a micro base station gain calculation formula according to the association parameter, the fixed bandwidth length of the resource block, the signal-to-interference-plus-noise ratio, the same-layer interference price, the same-layer interference, the cross-layer interference price and the cross-layer interference; taking the macro base station as a leader and taking each micro base station as a follower to construct a non-cooperative game; fixing the values of the associated parameters, and solving the micro base station profit calculation by adopting a reverse induction method to obtain the transmitting power equalization solution of each micro base station so as to update the state space of each strategy network; substituting the transmission power equalization solution into the macro base station income calculation to obtain the cross-layer interference price equalization solution;
and inputting the state parameters of each micro base station into the multi-agent reinforcement learning model to generate corresponding action strategies so as to realize multi-service slice resource allocation.
2. The method for distributing ultra-dense networking multi-service slice resources according to claim 1, wherein the macro base station constructs a macro base station profit calculation formula according to a cross-layer interference price and cross-layer interference generated by multiplexing resource blocks in the micro base station by users, and the macro base station profit calculation formula is as follows:
Figure FDA0003963158800000011
wherein ,UMBS Representing the macro base station benefit; u (U) UE Representing a set of users for all micro base stations; u (U) PRB Representing the total number of resource blocks; u (U) BS Representing the macro base station and a set of all micro base stations;
Figure FDA0003963158800000012
cross-layer interference indicating that user i uses resource block j at micro base station bDisturbing the price; />
Figure FDA0003963158800000021
Indicating cross-layer interference caused by the use of resource block j by user i at micro base station b.
3. The method for distributing resources of ultra-dense networking multi-service slices according to claim 1, wherein the micro base station constructs a micro base station profit calculation formula according to the association parameter, a resource block fixed bandwidth length, a signal-to-interference-plus-noise ratio, a same-layer interference price, same-layer interference, the cross-layer interference price and the cross-layer interference, and the micro base station profit calculation formula is:
Figure FDA0003963158800000022
s.t.
Figure FDA0003963158800000023
Figure FDA0003963158800000024
/>
Figure FDA0003963158800000025
Figure FDA0003963158800000026
Figure FDA0003963158800000027
Figure FDA0003963158800000028
wherein ,Ub Representing the micro base station benefit; u (U) UE,b Representing a set of users of the micro base station b; u (U) s The type of slice is represented;
Figure FDA0003963158800000029
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b; b represents the fixed bandwidth length of the resource block; />
Figure FDA00039631588000000210
Indicating that the user i uses the resource block j to cause the signal to interference plus noise ratio at the micro base station b; />
Figure FDA00039631588000000211
The same-layer interference price of the resource block j used by the user i at the micro base station b is represented; />
Figure FDA00039631588000000212
The same-layer interference caused by the use of the resource block j by the user i at the micro base station b is shown; / >
Figure FDA00039631588000000213
Representing the cross-layer interference price of a user i using a resource block j at a micro base station b; />
Figure FDA00039631588000000214
Indicating cross-layer interference caused by a user i using a resource block j at a micro base station b; />
Figure FDA00039631588000000215
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b; u (U) UE Representing a set of users for all micro base stations; u (U) BS Representing the macro base station and a set of all micro base stations; i max Representing the interference maximum; u (U) PRB Representing the total number of resource blocks; τ represents the total number of resource blocks。
4. The method for distributing resources of ultra-dense networking multi-service slices according to claim 1, wherein each policy network constructs a state space by taking a transmission rate of each user in a corresponding single micro base station and a total transmission power as state parameters, wherein the total transmission power uses the transmission power equalization solution, and the state parameters are expressed as:
Figure FDA00039631588000000216
wherein ,sj (t) represents a state parameter of the micro base station at the time t; r is (r) N,j (t) represents the transmission rate of the nth user multiplexing resource block j at the time t;
Figure FDA00039631588000000217
the user i allocates the transmitting power of the resource block j in the micro base station b; u (U) j Representing a set of users multiplexing resource blocks j;
the value network of each micro base station generates a predicted Q value according to the state parameters and the selected actions of the corresponding micro base station and the state parameters and actions of other micro base stations, and the state parameters of the value network are expressed as follows:
s j '(t)=(s j (t),a j (t),s -j (t),a -j (t));
wherein ,sj (t) represents a state parameter of the micro base station at the time t; a, a j (t) represents an operation parameter of the micro base station at the time t; s is(s) -j (t) represents a state parameter set of other micro base stations at the moment t; a, a -j And (t) represents the action parameter set of other micro base stations at the moment t.
5. The method for allocating resources to ultra-dense networking multi-service slices according to claim 1, wherein the method is characterized in that the method comprises the steps of obtaining association parameters for indicating whether users in each micro base station multiplex resource blocks in macro base stations, and constructing an action space by taking an association parameter set of each micro base station and a predicted transmission power set of each other micro base station as action parameters, wherein the action parameters are expressed as:
a j (t)={W j ,P -j };
wherein ,
Figure FDA0003963158800000031
Figure FDA0003963158800000032
/>
wherein ,aj (t) represents an operation parameter of the micro base station at the time t; w (W) j Representing a set of associated parameters; p (P) -j Representing a set of predicted transmit powers of other micro base stations;
Figure FDA0003963158800000033
representing the association relationship between the user i and the slice s, the resource block j and the micro base station b; />
Figure FDA0003963158800000034
Representing the transmitting power allocated to the resource block j by the user i at the micro base station b; u (U) j Representing the set of users multiplexing resource block j.
6. The method for distributing resources of ultra-dense networking multi-service slices according to claim 1, wherein the value network of each micro base station generates an estimated Q value according to the state parameters and the selected actions of the corresponding micro base station and the state parameters and actions of other micro base stations, constructs a policy gradient according to the estimated Q value, and is used for parameter updating of the policy network of the corresponding micro base station, and the calculation formula of the policy gradient is as follows:
Figure FDA0003963158800000035
wherein ,
Figure FDA0003963158800000036
representing the policy gradient; θ represents a policy parameter; j (u) j ) Representing a cumulative estimated prize value; d represents an experience playback pool; u (u) j (a j |s j ) Representing an action strategy made by the micro base station according to the state; />
Figure FDA0003963158800000037
Representing the value network; s is(s) j Representing the state of the micro base station estimated by the value network; a, a j Representing the motion of the micro base station estimated by the value network; s is(s) other Representing the state of other micro base stations estimated by the value network; a, a other And representing the actions of other micro base stations estimated by the value network.
7. The method for distributing resources of ultra-dense networking multi-service slices according to claim 1, wherein a loss function of the estimated Q value and the actual Q value of the model is constructed by taking a maximized rewarding value as an optimization target, and parameters of the value network are updated, and the calculation formula of the loss function is as follows:
Figure FDA0003963158800000038
wherein ,
Figure FDA0003963158800000041
representing the loss function; θ represents a policy parameter; u (u) j Representing adaptive weight parameters; r represents the updated result of the action; />
Figure FDA0003963158800000042
Representing the value network; s is(s) j Representing the state of the micro base station estimated by the value network; a, a j Representing the motion of the micro base station estimated by the value network; s is(s) other Representing the priceThe state of other micro base stations estimated by the value network; a, a ohter Representing the actions of other micro base stations estimated by the value network; />
Figure FDA0003963158800000043
Representing the actual Q value.
8. The method for distributing resources of ultra-dense networking multi-service slices according to claim 7, wherein a loss function of the estimated Q value and the actual Q value of the model is constructed by taking a maximized rewarding value as an optimization target, and parameters of the value network are updated, wherein the calculating formula of the rewarding value is as follows:
Figure FDA0003963158800000044
wherein, reward j Representing the prize value; u (u) j Representing the adaptive weight parameters; r is (r) j Representing the total transmission rate of each user multiplexing resource block j of the micro base station; n represents the total number of users; r is (r) -j Representing the total transmission rate of other micro base stations;
Figure FDA0003963158800000045
representing the interference price of the same layer; />
Figure FDA0003963158800000046
Representing co-layer interference; />
Figure FDA0003963158800000047
Representing cross-layer interference prices; />
Figure FDA0003963158800000048
Representing cross-layer interference; u (U) j Representing the set of users multiplexing resource block j.
9. The method for distributing resources of ultra-dense networking multi-service slices according to claim 8, wherein the adaptive weight parameters are learned according to the state of a global environment;
when u is j When=1, the prize value is only related to the transmission rate of the micro base station, is zero and is played;
when 0 is<u j <1, the prize value is related to the transmission rate of the micro base station itself and the transmission rate of other micro base stations, so as to form a hybrid game.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
CN202211487474.4A 2022-11-25 2022-11-25 Ultra-dense networking multi-service slice resource allocation method and device Pending CN115996475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211487474.4A CN115996475A (en) 2022-11-25 2022-11-25 Ultra-dense networking multi-service slice resource allocation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211487474.4A CN115996475A (en) 2022-11-25 2022-11-25 Ultra-dense networking multi-service slice resource allocation method and device

Publications (1)

Publication Number Publication Date
CN115996475A true CN115996475A (en) 2023-04-21

Family

ID=85989626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211487474.4A Pending CN115996475A (en) 2022-11-25 2022-11-25 Ultra-dense networking multi-service slice resource allocation method and device

Country Status (1)

Country Link
CN (1) CN115996475A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562218A (en) * 2023-05-05 2023-08-08 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562218A (en) * 2023-05-05 2023-08-08 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning
CN116562218B (en) * 2023-05-05 2024-02-20 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN109639377B (en) Spectrum resource management method based on deep reinforcement learning
Huo et al. Stackelberg game-based energy-efficient resource allocation for 5G cellular networks
US20100097948A1 (en) Characterization of co-channel interference in a wireless communication system
CN113038616B (en) Frequency spectrum resource management and allocation method based on federal learning
JP2013506375A (en) User scheduling and transmission power control method and apparatus in communication system
Al-Zahrani et al. A game theory approach for inter-cell interference management in OFDM networks
CN109982434A (en) Wireless resource scheduling integrated intelligent control system and method, wireless communication system
Qi et al. Energy-efficient resource allocation for UAV-assisted vehicular networks with spectrum sharing
CN115996475A (en) Ultra-dense networking multi-service slice resource allocation method and device
LeAnh et al. Orchestrating resource management in LTE-unlicensed systems with backhaul link constraints
Asuquo et al. Optimized channel allocation in emerging mobile cellular networks
Tang et al. Nonconvex dynamic spectrum allocation for cognitive radio networks via particle swarm optimization and simulated annealing
CN113382414A (en) Non-orthogonal multiple access system resource allocation method and device based on network slice
Kim Femtocell network power control scheme based on the weighted voting game
CN112272410B (en) Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network
Corcoran et al. Reinforcement learning for automated energy efficient mobile network performance tuning
CN105228233B (en) A kind of Poewr control method based on monotonicity optimization and linear search in cognition wireless network
Ron et al. Learning-based joint optimization of mode selection and transmit power control for D2D communication underlaid cellular networks
US9913170B2 (en) Wireless communication methods and apparatus
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
KR100932918B1 (en) Method for controlling inter-cell interference of terminal in wireless communication system
Wang et al. Two‐level scheme to maximise the number of guaranteed users in downlink femtocell networks
Kelechi et al. D-GRACE: Discounted spectrum price game-based resource allocation in a competitive environment for TVWS networks
Gandhi et al. Coverage, Capacity and Cost Analysis of 4G-LTE and 5G Networks: A Case Study of Ahmedabad and Gandhinagar
George Optimal and Game-theoretic Resource Allocations for Multiuser Wireless Energy-Harvesting and Communications Systems.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination