CN108966325A - Optimal-decoding-order uplink transmission time optimization method for non-orthogonal access based on deep deterministic policy gradient - Google Patents
- Publication number
- CN108966325A (application CN201810668879.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04W52/0209: Power saving arrangements in terminal devices
- H04W52/0219: Power saving arrangements in terminal devices managed by the network (e.g. network or access point is master and terminal is slave), where the power saving management affects multiple terminals
- H04W52/18: TPC [Transmission Power Control] being performed according to specific parameters
- H04W16/14: Spectrum sharing arrangements between different networks
- Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
An optimal-decoding-order uplink transmission time optimization method for non-orthogonal access based on deep deterministic policy gradient, comprising the following steps: (1) for a given decoding order $\pi_m$, the optimization problem is formulated as a non-convex optimization problem (P1-m), which seeks the minimum overall radio resource consumption for the given upload volumes of the smart terminals; inspection of (P1-m) shows that its objective function has only one variable; (2) and (3) the optimal uplink transmission time, and with it the minimum overall radio resource consumption, is found by the deep deterministic policy gradient method; (4) the optimal decoding order is found by the algorithm OptOrder-Algorithm combined with the deep reinforcement learning algorithm, which finally outputs the global minimum overall radio resource consumption and the globally optimal uplink transmission time. The invention improves system transmission efficiency and obtains a better wireless network quality of experience at the minimum overall radio resource consumption.
Description
Technical field
The invention belongs to the field of communications and relates to an optimal-decoding-order uplink transmission time optimization method for non-orthogonal access based on deep deterministic policy gradient.
Background art
Supporting the massive connectivity required by Internet of Things (IoT) applications is regarded as an important goal of future 5G cellular systems. Non-orthogonal multiple access (NOMA) lets a group of smart terminals (STs) share the same spectrum channel and transmit simultaneously, which provides an effective way to achieve spectrally efficient data transmission. We consider uplink transmission in a wireless network in which smart terminals (for example, smart watches) use NOMA to send their data to an access hotspot. We aim to minimize the overall radio resource consumption, which comprises the uplink transmission time and the total uplink energy.
Summary of the invention
To overcome the long uplink transmission time and the high smart-terminal energy consumption of the prior art, the invention provides an optimal-decoding-order uplink transmission time optimization method for non-orthogonal access, based on deep deterministic policy gradient, that minimizes the uplink transmission time and the total energy consumption of all smart terminals. Addressing the difficulty of excessive uplink transmission time, the invention mainly considers transmitting data with non-orthogonal access.
The technical solution adopted by the invention to solve this technical problem is as follows:
An optimal-decoding-order uplink transmission time optimization method for non-orthogonal access based on deep deterministic policy gradient, comprising the following steps:
(1) A total of $I$ smart terminals lie within the coverage of the access hotspot and form the smart terminal set; a given group of $I$ smart terminals therefore has $I!$ possible decoding orders. The smart terminals use non-orthogonal access to send data to the access hotspot simultaneously, and each smart terminal $i$ has a required data volume to send.
Under the conditions that the data volumes of all smart terminals are sent completely and that a decoding order $\pi_m$ is given, where $m = 1, 2, \ldots, I!$, the problem of minimizing the uplink transmission time and the total energy consumption of all smart terminals is formulated as the optimization problem (P1-m):
$0 \le t_m \le T^{\max}$ (1-3)
Variables: $t_m$
The variables in the problem are explained as follows:
$\pi_m(i)$: the position of smart terminal $i$ in the decoding order $\pi_m$;
$\alpha$: the weight factor of the uplink transmission time;
$\beta$: the weight factor of the total uplink energy consumption;
$t_m$: the uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Minimum transmit power of smart terminal $i$: a function of $t_m$ that gives, under the $m$-th decoding order $\pi_m$, the minimum transmit power smart terminal $i$ needs to finish sending its required data volume within the given uplink transmission time $t_m$, in watts;
$W$: the channel bandwidth from the smart terminals to the access hotspot, in hertz;
$n_0$: the power spectral density of the channel background noise;
$g_{iA}$: the channel power gain from smart terminal $i$ to the access hotspot;
Required data volume of smart terminal $i$: the data volume smart terminal $i$ must send to the access hotspot, in megabits;
Maximum upload energy consumption of smart terminal $i$, in joules;
$T^{\max}$: the maximum uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Problem (P1-m) seeks the minimum overall radio resource consumption (comprising the uplink transmission time and the total energy consumption of all smart terminals) for the given upload volumes of the smart terminals. Inspection of (P1-m) shows that its objective function has only one variable, $t_m$;
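Equations (1-1) and (1-2) are not reproduced in this text, so the following Python sketch only illustrates how a single-variable objective of this kind could be evaluated. It assumes the standard successive-interference-cancellation rate model for uplink NOMA (a terminal is interfered with only by terminals decoded after it), an objective of the form weighted time plus weighted total energy, and an energy cap per terminal. The function names `min_powers`, `objective`, and `feasible` and all argument names are hypothetical, and all quantities are assumed to be in consistent units.

```python
def min_powers(t, order, s_req, gain, W, n0):
    """Minimum transmit power per terminal for transmission time t.

    Assumed model (not taken from the patent text): order[k] is the index of
    the terminal decoded k-th, and each terminal sees interference only from
    terminals decoded after it.
    """
    p = [0.0] * len(order)
    interference = 0.0  # received power of terminals decoded later
    for i in reversed(order):
        # SINR needed to deliver s_req[i] within t over bandwidth W
        sinr_needed = 2.0 ** (s_req[i] / (t * W)) - 1.0
        p[i] = sinr_needed * (W * n0 + interference) / gain[i]
        interference += p[i] * gain[i]
    return p


def objective(t, order, s_req, gain, W, n0, alpha, beta):
    """Assumed objective: alpha * time + beta * total transmit energy."""
    p = min_powers(t, order, s_req, gain, W, n0)
    return alpha * t + beta * sum(p_i * t for p_i in p)


def feasible(t, order, s_req, gain, W, n0, e_max, t_max):
    """Assumed constraints: 0 < t <= t_max and per-terminal energy caps."""
    if not (0.0 < t <= t_max):
        return False
    p = min_powers(t, order, s_req, gain, W, n0)
    return all(p_i * t <= e_max[i] for i, p_i in enumerate(p))
```

Because the decoding order is fixed, `objective` depends on `t` alone, which is the property that steps (2) and (3) exploit.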
(2) The optimal uplink transmission time, denoted $t^{*,m}$, is found by the deep deterministic policy gradient method, which consists of an execution unit, a scoring unit, and an environment. The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal are compiled into the state $x_T$ required by the execution unit. In the current state the execution unit takes an action $a$ that modifies the uplink transmission time $t_m$, enters the next state $x_{T+1}$, and receives the reward $r(x_T, a)$ returned by the environment. The scoring unit combines the state $x_T$, the action $a$, and the reward $r(x_T, a)$ to score the execution unit, that is, to indicate whether taking action $a$ in state $x_T$ was good or bad. The goal of the execution unit is to make the score given by the scoring unit as high as possible; the goal of the scoring unit is to make every score it gives close to the true value, adjusting itself through the reward $r(x_T, a)$. As the execution unit, the scoring unit, and the environment keep interacting and updating, $t_m$ is optimized until the minimum overall radio resource consumption is found. The scoring unit is updated as follows:
$S(x_T, a) = r(x_T, a) + \gamma S'(x_{T+1}, a')$ (2-1)
where the parameters are defined as follows:
$x_T$: the state of the system at time $T$;
$x_{T+1}$: the state of the system at time $T+1$;
$a$: the action taken by the execution unit in the current state;
$a'$: the action taken by the execution unit in the next state;
$S(x_T, a)$: the score obtained by the evaluation network in the execution unit for taking action $a$ in state $x_T$;
$S'(x_{T+1}, a')$: the score obtained by the target network in the execution unit for taking action $a'$ in state $x_{T+1}$;
$r(x_T, a)$: the reward obtained for taking action $a$ in state $x_T$;
$\gamma$: the reward discount factor;
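As a minimal numeric sketch of update rule (2-1): the right-hand side is the value that the score $S(x_T, a)$ is pulled towards, and the squared gap between the two is a common choice of training loss for the scoring unit. The squared-error loss and the names `td_target` and `scoring_loss` are assumptions for illustration; the text above does not specify a loss.

```python
def td_target(reward, gamma, next_score):
    """Right-hand side of (2-1): r(x_T, a) + gamma * S'(x_{T+1}, a')."""
    return reward + gamma * next_score


def scoring_loss(score, reward, gamma, next_score):
    """Assumed squared-error loss between S(x_T, a) and the target of (2-1)."""
    return (td_target(reward, gamma, next_score) - score) ** 2
```

For example, with reward 1.0, discount factor 0.5, and next score 2.0, the target is 2.0; a current score of 1.5 then gives a loss of 0.25.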
(3) The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal serve as the state $x_T$ of the deep deterministic policy gradient method, and the action $a$ is a change to the state $x_T$. After the change, the total cost of the system is compared with a preset reference value: if it is larger than the reference value, the current reward $r(x_T, a)$ is set to a negative value; otherwise it is set to a positive value. The system then enters the next state $x_{T+1}$;
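The reward rule of step (3) can be sketched as a single environment step. This is an illustration under assumptions: the action is taken to be an additive change to $t_m$, the result is clipped to $[0, T^{\max}]$ to respect constraint (1-3), and the rewards are fixed at -1 and +1, since the text only says "negative" and "positive". The names `env_step`, `total_cost`, and `reference_value` are hypothetical.

```python
def env_step(t_m, action, total_cost, reference_value, t_max):
    """Apply an action to t_m and return (new t_m, reward).

    total_cost is any callable mapping a transmission time to the system's
    total cost; it stands in for the objective of problem (P1-m).
    """
    # Assumed: the action shifts t_m, clipped to the range allowed by (1-3)
    new_t = min(max(t_m + action, 0.0), t_max)
    # Negative reward if the cost exceeds the reference value, else positive
    reward = -1.0 if total_cost(new_t) > reference_value else 1.0
    return new_t, reward
```

A graded reward (for example, proportional to the gap from the reference value) would also satisfy the sign rule; the fixed magnitudes here are only the simplest choice.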
The iterative process of the deep deterministic policy gradient method is:
Step 3.1: initialize the execution unit, the scoring unit, and the memory bank of the deep deterministic policy gradient method; the current system state is $x_T$, $T$ is initialized to 1, and the iteration counter $k$ is initialized to 1;
Step 3.2: while $k$ is less than or equal to the given number of iterations $K$, the execution unit predicts an action $a$ in state $x_T$;
Step 3.3: the action $a$ modifies the state $x_T$, turning it into the next state $x_{T+1}$, and the reward $r(x_T, a)$ fed back by the environment is obtained;
Step 3.4: the historical experience is stored in the memory bank in the format $(x_T, a, r(x_T, a), x_{T+1})$;
Step 3.5: the scoring unit receives the action $a$, the state $x_T$, and the reward $r(x_T, a)$, and gives the execution unit the score $S(x_T, a)$;
Step 3.6: the execution unit keeps updating its own parameters to maximize the score $S(x_T, a)$, so that it is as likely as possible to take a high-scoring action next time;
Step 3.7: the scoring unit draws historical experience from the memory bank, keeps learning, and updates its parameters so that its scores are as accurate as possible; at the same time $k = k + 1$, and the process returns to step 3.2;
Step 3.8: when $k$ is greater than the given number of iterations $K$, the learning process ends, yielding the optimal uplink transmission time $t^{*,m}$ and the optimal overall radio resource consumption;
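The control flow of steps 3.1 to 3.8 can be sketched as a skeleton with a bounded memory bank. This is not a deep deterministic policy gradient implementation: the execution unit, the scoring unit, and the environment are passed in as plain callables, and the names `Memory`, `train`, `act`, `env_step`, `update_scoring`, and `update_execution` are hypothetical. Sampling a random mini-batch and discarding the oldest experience when the bank is full are common choices that the text does not prescribe.

```python
import random
from collections import deque


class Memory:
    """Memory bank holding experiences (x_T, a, r, x_{T+1})."""

    def __init__(self, capacity):
        # Assumed policy: the oldest experience is dropped when full
        self._buf = deque(maxlen=capacity)

    def store(self, x, a, r, x_next):
        self._buf.append((x, a, r, x_next))

    def sample(self, batch_size):
        size = min(batch_size, len(self._buf))
        return random.sample(list(self._buf), size)

    def __len__(self):
        return len(self._buf)


def train(x0, K, act, env_step, update_scoring, update_execution,
          memory, batch_size=32):
    """Run K iterations of the loop in steps 3.2 to 3.7; return the last state."""
    x = x0
    for _ in range(K):                     # Step 3.2: repeat while k <= K
        a = act(x)                         # Step 3.2: predict an action
        x_next, r = env_step(x, a)         # Step 3.3: next state and reward
        memory.store(x, a, r, x_next)      # Step 3.4: store the experience
        batch = memory.sample(batch_size)  # Step 3.7: draw past experience
        update_scoring(batch)              # Steps 3.5 and 3.7: scoring unit
        update_execution(batch)            # Step 3.6: execution unit
        x = x_next
    return x                               # Step 3.8: learning has ended
```

In a full implementation, `update_scoring` would fit the scores towards the target of (2-1) and `update_execution` would adjust the execution unit to raise the score of its own actions.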
(4) After the optimal uplink transmission time for a given decoding order $\pi_m$ has been obtained, the algorithm OptOrder-Algorithm is proposed to find the optimal decoding order, that is, to find the globally optimal uplink transmission time that gives the global minimum overall radio resource consumption;
The solution procedure of OptOrder-Algorithm is as follows. Let the smart terminal set be $I_{all} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{all}|$ denotes the cardinality of $I_{all}$. Initialize the current candidate set $I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{cur}|$ denotes the cardinality of $I_{cur}$, the current best decoding order $CBS = \emptyset$, the current best value CBV to a sufficiently large number, and the current test set $I_{cur,test} = \emptyset$. In the first iteration, the elements of $I_{cur}$ are inserted one at a time into $I_{cur,test}$, and the algorithm P2-Algorithm is called to find the current best $I_{cur,test}$, that is, the $I_{cur,test}$ with the current minimum overall radio resource consumption. $I_{cur}$ is then updated to $I_{all}$ with the elements of $I_{cur,test}$ removed, and CBS is updated to the current best $I_{cur,test}$. In the second iteration, the elements of the current $I_{cur}$ are inserted one at a time into $I_{cur,test}$ (which now holds a single element, so the new element goes to its left or to its right), P2-Algorithm is called to find the current best $I_{cur,test}$, and $I_{cur}$ and CBS are updated in the same way. Each time an element of the current $I_{cur}$ is inserted into $I_{cur,test}$, the order of the elements already fixed in $I_{cur,test}$ must not change. Iteration continues in this way until the last iteration yields the globally optimal decoding order CBS, the global minimum overall radio resource consumption $\theta^*$, and the globally optimal uplink transmission time $t^*$;
Finally, the $\theta^*$ output by OptOrder-Algorithm is the global minimum overall radio resource consumption sought in problem (P1-m), and $t^*$ is the globally optimal uplink transmission time sought in problem (P1-m).
Further, in step (4), the solution procedure of OptOrder-Algorithm is as follows:
Step 4.1: set $I_{all} = I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$ and $CBS = \emptyset$;
Step 4.2: start the while loop;
Step 4.3: set CBV to a sufficiently large number;
Step 4.4: start the for loop $m = 1:1:|I_{cur}|$;
Step 4.5: start the for loop $h = 0:1:|CBS|$;
Step 4.6: set $I_{cur,test} = \emptyset$;
Step 4.7: if $h = 0$, set $I_{cur,test} = \{I_{cur}(m), CBS\}$;
Step 4.8: otherwise, if $h \ne 0$, set $I_{cur,test} = \{CBS(1:h), I_{cur}(m), CBS(h+1:|CBS|)\}$;
Step 4.9: once $I_{cur,test}$ is obtained, compute $\theta^{*,cur,test}$ and $t^{*,m}$ with the deep deterministic policy gradient method of steps (2) and (3);
Step 4.10: if $\theta^{*,cur,test} < CBV$, set $CBV = \theta^{*,cur,test}$ and $t^* = t^{*,m}$, and at the same time set $CBS = I_{cur,test}$;
Step 4.11: when $h = |CBS|$, end the for loop of step 4.5;
Step 4.12: when $m = |I_{cur}|$, end the for loop of step 4.4;
Step 4.13: set $I_{cur} = I_{all} \setminus CBS$;
Step 4.14: when $I_{cur} = \emptyset$, end the while loop of step 4.2;
Step 4.15: output $\theta^* = CBV$ and $t^*$.
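The greedy insertion search of steps 4.1 to 4.15 can be sketched in Python as follows. The names `opt_order` and `evaluate` are hypothetical: `evaluate(order)` stands in for solving problem (P1-m) for a partial decoding order by steps (2) and (3), and must return the pair (overall radio resource consumption, uplink transmission time). One deliberate difference from the listing is that the best candidate of each round is held aside and committed only after both for loops finish, so that the already fixed order is not altered while insertion positions are still being tried.

```python
import math


def opt_order(terminals, evaluate):
    """Greedy insertion search for a decoding order.

    Returns (best order CBS, best value CBV, time t*) after all terminals
    have been placed. Terminals are assumed to be distinct.
    """
    i_all = list(terminals)
    i_cur = list(terminals)          # Step 4.1: candidates not yet placed
    cbs = []                         # Step 4.1: current best order, empty
    cbv, t_star = math.inf, None
    while i_cur:                     # Steps 4.2 and 4.14
        cbv = math.inf               # Step 4.3
        best_order = None
        for cand in i_cur:           # Step 4.4
            for h in range(len(cbs) + 1):                 # Step 4.5
                test = cbs[:h] + [cand] + cbs[h:]         # Steps 4.7 and 4.8
                theta, t = evaluate(test)                 # Step 4.9
                if theta < cbv:                           # Step 4.10
                    cbv, t_star, best_order = theta, t, test
        cbs = best_order             # commit this round's best insertion
        i_cur = [x for x in i_all if x not in cbs]        # Step 4.13
    return cbs, cbv, t_star          # Step 4.15
```

With $I$ terminals this calls `evaluate` $\sum_{k=0}^{I-1} (I-k)(k+1)$ times, for example 10 times for $I = 3$, instead of once for each of the $I!$ complete orders. Like any greedy search, it is not guaranteed to examine the order that an exhaustive search would select.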
The technical concept of the invention is as follows. First, mobile users in a cellular radio network transmit data with non-orthogonal access so as to minimize the uplink transmission time and the total energy consumption of all mobile users, which yields economic benefit and quality of service. The premise considered here is that the upload energy consumption and the uplink transmission time of each mobile user are limited. Under the condition that the data volumes of all mobile users are sent completely, the overall radio resource consumption, which includes the total energy consumption of all smart terminals, is minimized. Then the algorithm OptOrder-Algorithm is proposed to find the optimal decoding order and to compute the globally optimal uplink transmission time and the global minimum overall radio resource consumption.
The beneficial effects of the invention are mainly the following: 1. for the uplink as a whole, non-orthogonal access greatly improves the system transmission efficiency; 2. for the uplink as a whole, non-orthogonal access gives a better wireless network quality of experience; 3. the deep deterministic policy gradient method yields the optimal uplink transmission time and therefore the optimal overall radio resource consumption (comprising the uplink transmission time and the total energy consumption of all smart terminals).
Brief description of the drawings
Fig. 1 is a schematic diagram of the uplink scenario with multiple smart terminals and an access hotspot in a wireless network;
Fig. 2 is a schematic diagram of all decoding orders of 3 STs;
Fig. 3 is a schematic diagram illustrating the algorithm OptOrder-Algorithm for 5 STs;
Fig. 4 is a flowchart of the method for finding the optimal uplink transmission time.
Specific embodiment
The invention is described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, Fig. 2, Fig. 3 and Fig. 4, the optimal-decoding-order uplink transmission time optimization method for non-orthogonal access based on deep deterministic policy gradient minimizes the uplink transmission time and the total energy consumption of all smart terminals while guaranteeing that the data of all smart terminals are sent completely, which improves the wireless network quality of experience of the whole system. The invention is applied to a wireless network in the scenario shown in Fig. 1. The optimization method designed for this problem follows steps (1) to (4) of the Summary of the invention: formulating problem (P1-m) for a given decoding order, finding $t^{*,m}$ with the deep deterministic policy gradient method, and searching for the optimal decoding order with OptOrder-Algorithm.
Claims (1)
- 1. An optimal-decoding-order uplink transmission time optimization method for non-orthogonal access based on deep deterministic policy gradient, characterized in that the method comprises the following steps:
(1) A total of $I$ smart terminals lie within the coverage of the access hotspot and form the smart terminal set; a given group of $I$ smart terminals therefore has $I!$ possible decoding orders. The smart terminals use non-orthogonal access to send data to the access hotspot simultaneously, and each smart terminal $i$ has a required data volume to send.
Under the conditions that the data volumes of all smart terminals are sent completely and that a decoding order $\pi_m$ is given, where $m = 1, 2, \ldots, I!$, the problem of minimizing the uplink transmission time and the total energy consumption of all smart terminals is formulated as the optimization problem (P1-m):
$0 \le t_m \le T^{\max}$ (1-3)
Variables: $t_m$
The variables in the problem are explained as follows:
$\pi_m(i)$: the position of smart terminal $i$ in the decoding order $\pi_m$;
$\alpha$: the weight factor of the uplink transmission time;
$\beta$: the weight factor of the total uplink energy consumption;
$t_m$: the uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Minimum transmit power of smart terminal $i$: a function of $t_m$ that gives, under the $m$-th decoding order $\pi_m$, the minimum transmit power smart terminal $i$ needs to finish sending its required data volume within the given uplink transmission time $t_m$, in watts;
$W$: the channel bandwidth from the smart terminals to the access hotspot, in hertz;
$n_0$: the power spectral density of the channel background noise;
$g_{iA}$: the channel power gain from smart terminal $i$ to the access hotspot;
Required data volume of smart terminal $i$: the data volume smart terminal $i$ must send to the access hotspot, in megabits;
Maximum upload energy consumption of smart terminal $i$, in joules;
$T^{\max}$: the maximum uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Problem (P1-m) seeks the minimum overall radio resource consumption (comprising the uplink transmission time and the total energy consumption of all smart terminals) for the given upload volumes of the smart terminals. Inspection of (P1-m) shows that its objective function has only one variable, $t_m$;
(2) The optimal uplink transmission time, denoted $t^{*,m}$, is found by the deep deterministic policy gradient method, which consists of an execution unit, a scoring unit, and an environment. The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal are compiled into the state $x_T$ required by the execution unit. In the current state the execution unit takes an action $a$ that modifies the uplink transmission time $t_m$, enters the next state $x_{T+1}$, and receives the reward $r(x_T, a)$ returned by the environment. The scoring unit combines the state $x_T$, the action $a$, and the reward $r(x_T, a)$ to score the execution unit, that is, to indicate whether taking action $a$ in state $x_T$ was good or bad. The goal of the execution unit is to make the score given by the scoring unit as high as possible; the goal of the scoring unit is to make every score it gives close to the true value, adjusting itself through the reward $r(x_T, a)$. As the execution unit, the scoring unit, and the environment keep interacting and updating, $t_m$ is optimized until the minimum overall radio resource consumption is found. The scoring unit is updated as follows:
$S(x_T, a) = r(x_T, a) + \gamma S'(x_{T+1}, a')$ (2-1)
where the parameters are defined as follows:
$x_T$: the state of the system at time $T$;
$x_{T+1}$: the state of the system at time $T+1$;
$a$: the action taken by the execution unit in the current state;
$a'$: the action taken by the execution unit in the next state;
$S(x_T, a)$: the score obtained by the evaluation network in the execution unit for taking action $a$ in state $x_T$;
$S'(x_{T+1}, a')$: the score obtained by the target network in the execution unit for taking action $a'$ in state $x_{T+1}$;
$r(x_T, a)$: the reward obtained for taking action $a$ in state $x_T$;
$\gamma$: the reward discount factor;
(3) The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal serve as the state $x_T$ of the deep deterministic policy gradient method, and the action $a$ is a change to the state $x_T$. After the change, the total cost of the system is compared with a preset reference value: if it is larger than the reference value, the current reward $r(x_T, a)$ is set to a negative value; otherwise it is set to a positive value. The system then enters the next state $x_{T+1}$;
The iterative process of the deep deterministic policy gradient method is:
Step 3.1: initialize the execution unit, the scoring unit, and the memory bank of the deep deterministic policy gradient method; the current system state is $x_T$, $T$ is initialized to 1, and the iteration counter $k$ is initialized to 1;
Step 3.2: while $k$ is less than or equal to the given number of iterations $K$, the execution unit predicts an action $a$ in state $x_T$;
Step 3.3: the action $a$ modifies the state $x_T$, turning it into the next state $x_{T+1}$, and the reward $r(x_T, a)$ fed back by the environment is obtained;
Step 3.4: the historical experience is stored in the memory bank in the format $(x_T, a, r(x_T, a), x_{T+1})$;
Step 3.5: the scoring unit receives the action $a$, the state $x_T$, and the reward $r(x_T, a)$, and gives the execution unit the score $S(x_T, a)$;
Step 3.6: the execution unit keeps updating its own parameters to maximize the score $S(x_T, a)$, so that it is as likely as possible to take a high-scoring action next time;
Step 3.7: the scoring unit draws historical experience from the memory bank, keeps learning, and updates its parameters so that its scores are as accurate as possible; at the same time $k = k + 1$, and the process returns to step 3.2;
Step 3.8: when $k$ is greater than the given number of iterations $K$, the learning process ends, yielding the optimal uplink transmission time $t^{*,m}$ and the optimal overall radio resource consumption;
(4) After the optimal uplink transmission time for a given decoding order $\pi_m$ has been obtained, the algorithm OptOrder-Algorithm is proposed to find the optimal decoding order, that is, to find the globally optimal uplink transmission time that gives the global minimum overall radio resource consumption;
The solution procedure of OptOrder-Algorithm is as follows. Let the smart terminal set be $I_{all} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{all}|$ denotes the cardinality of $I_{all}$. Initialize the current candidate set $I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{cur}|$ denotes the cardinality of $I_{cur}$, the current best decoding order $CBS = \emptyset$, the current best value CBV to a sufficiently large number, and the current test set $I_{cur,test} = \emptyset$. In the first iteration, the elements of $I_{cur}$ are inserted one at a time into $I_{cur,test}$, and the algorithm P2-Algorithm is called to find the current best $I_{cur,test}$, that is, the $I_{cur,test}$ with the current minimum overall radio resource consumption. $I_{cur}$ is then updated to $I_{all}$ with the elements of $I_{cur,test}$ removed, and CBS is updated to the current best $I_{cur,test}$. In the second iteration, the elements of the current $I_{cur}$ are inserted one at a time into $I_{cur,test}$ (which now holds a single element, so the new element goes to its left or to its right), P2-Algorithm is called to find the current best $I_{cur,test}$, and $I_{cur}$ and CBS are updated in the same way. Every time an element is selected from the current $I_{cur}$ and inserted into $I_{cur,test}$, fixed I cannot
be changedCur, testElement position arrangement in set, such iteration To the last an iteration finds decoding the sequence CBS, global minima entirety radio resource consumption θ of global optimum*, the overall situation is most Excellent uplink transmission time t*;The solution procedure of algorithm OptOrder-Algorithm is as follows:Step 4.1: settingStep 4.2: starting while circulationStep 4.3: setting CBV is a sufficiently large number;Step 4.4: starting for and recycle m=1:1:| Icur|;Step 4.5: starting for and recycle h=0:1:| CBS |;Step 4.6: settingStep 4.7: if h=0, setting ICur, test={ Icur(m), CBS }Step 4.8: else if h ≠ 0, sets ICur, test={ CBS (1:h), Icur(m), CBS (h+1:| CBS |) };Step 4.9: obtaining ICur, testAfterwards, joint (2) and (3) depth deterministic policy gradient method calculates θ*, cur, testAnd t*, m;Step 4.10: if θ*, cur, test< CBV sets CBV=θ*, cur, test, t*=t*, m, concurrently set CBS=ICur, test;Step 4.11: as h=| CBS | when, for circulation of end step 4.5;Step 4.12: working as m=| Icur| when, for circulation of end step 4.4;Step 4.13: setting Icur=Iall\CBS;Step 4.14: whenWhen, the while circulation of end step 4.2;Step 4.15: output θ*=CBV and t*;Finally, the θ of algorithm OptOrder-Algorithm output*Required global minima is integrally wireless in (P1-m) problem of representative Resource consumption, global optimum uplink transmission time t to be asked in (P1-m) problem*。
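The reward rule of (3), the scoring target of equation (2-1) and the experience format of Step 3.4 can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the reward magnitude of 1.0 and the memory capacity are hypothetical choices made only for illustration, and the neural networks of the execution and scoring units are omitted.

```python
from collections import deque
from typing import Deque, Tuple

# One stored experience: (x_T, a, r(x_T, a), x_{T+1}), as in Step 3.4.
Transition = Tuple[tuple, float, float, tuple]

# Memory bank; the capacity of 10,000 is an assumed value.
memory: Deque[Transition] = deque(maxlen=10_000)


def reward(total_cost: float, standard_value: float, magnitude: float = 1.0) -> float:
    """Negative reward if the total cost exceeds the standard value, else positive."""
    return -magnitude if total_cost > standard_value else magnitude


def score_target(r: float, gamma: float, next_score: float) -> float:
    """Right-hand side of (2-1): r(x_T, a) + gamma * S'(x_{T+1}, a')."""
    return r + gamma * next_score


def store(x_t: tuple, a: float, r: float, x_next: tuple) -> None:
    """Append one experience to the memory bank."""
    memory.append((x_t, a, r, x_next))
```

In a full training loop, the scoring unit would be fitted so that its output S(x_T, a) approaches `score_target(...)` on experiences sampled from `memory`.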
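The insertion search of Steps 4.1 to 4.15 can be sketched in Python as below. Several points are assumptions rather than statements from the source: CBS is taken to start empty, the while loop is taken to run until no candidate terminal remains, CBS is updated once per outer iteration (after all insertions have been compared) as the prose description suggests, and `evaluate` is a hypothetical stand-in for the deep deterministic policy gradient solver that returns the pair (θ, t) for a given decoding order. The `toy_evaluate` cost is invented purely to make the example runnable.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Order = List[str]


def opt_order(
    terminals: Sequence[str],
    evaluate: Callable[[Order], Tuple[float, float]],
) -> Tuple[Order, float, float]:
    """Greedy insertion search over decoding orders.

    Returns (best order, its resource consumption theta, its transmission time t).
    """
    cbs: Order = []                # current best (partial) decoding order, assumed empty at start
    i_cur = list(terminals)        # terminals not yet placed
    cbv, t_star = float("inf"), float("nan")
    while i_cur:                   # assumed loop condition: candidates remain
        cbv = float("inf")         # Step 4.3
        best: Optional[Order] = None
        for g in i_cur:            # Step 4.4: each remaining terminal
            for h in range(len(cbs) + 1):        # Step 4.5: each insertion position
                test = cbs[:h] + [g] + cbs[h:]   # Steps 4.7/4.8: keep existing order
                theta, t = evaluate(test)        # Step 4.9 (stand-in for the solver)
                if best is None or theta < cbv:  # Step 4.10
                    cbv, t_star, best = theta, t, test
        cbs = best                 # commit the best insertion of this round
        i_cur = [g for g in terminals if g not in cbs]  # Step 4.13
    return cbs, cbv, t_star


def toy_evaluate(order: Order) -> Tuple[float, float]:
    """Invented cost: later decoding positions weigh more; time fixed at 1.0."""
    weight = {"g1": 3, "g2": 1, "g3": 2}
    theta = sum((pos + 1) * weight[g] for pos, g in enumerate(order))
    return theta, 1.0
```

With I terminals this search calls `evaluate` on the order of I^3 times, far fewer than the I! calls needed to enumerate every decoding order, at the price of not being guaranteed to visit every ordering.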
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810668879.5A CN108966325B (en) | 2018-06-25 | 2018-06-25 | Non-orthogonal access optimal decoding sorting uplink transmission time optimization method based on depth certainty strategy gradient |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810668879.5A CN108966325B (en) | 2018-06-25 | 2018-06-25 | Non-orthogonal access optimal decoding sorting uplink transmission time optimization method based on depth certainty strategy gradient |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108966325A (en) | 2018-12-07 |
CN108966325B (en) | 2021-08-03 |
Family
ID=64486606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810668879.5A Active CN108966325B (en) | 2018-06-25 | 2018-06-25 | Non-orthogonal access optimal decoding sorting uplink transmission time optimization method based on depth certainty strategy gradient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108966325B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106385300A (en) * | 2016-08-31 | 2017-02-08 | 上海交通大学 | Uplink NOMA power distribution method based on dynamic decoding SIC receiver |
CN106788651A (en) * | 2017-01-22 | 2017-05-31 | 西安交通大学 | The information transferring method of many geographic area broadcast systems accessed based on non-orthogonal multiple |
US20180042021A1 (en) * | 2016-08-05 | 2018-02-08 | National Tsing Hua University | Method of power allocation and base station using the same |
Non-Patent Citations (1)
Title |
---|
ZHAOHUI YANG et al.: "On the Optimality of Power Allocation for NOMA Downlinks With Individual QoS Constraints", IEEE COMMUNICATIONS LETTERS * |
Also Published As
Publication number | Publication date |
---|---|
CN108966325B (en) | 2021-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Slow adaptive OFDMA systems through chance constrained programming | |
Kim et al. | Sum throughput maximization for multi-user MIMO cognitive wireless powered communication networks | |
Di et al. | Optimal resource allocation in wireless powered communication networks with user cooperation | |
Zeng et al. | Downlink CSI feedback algorithm with deep transfer learning for FDD massive MIMO systems | |
CN108924935A (en) | A kind of power distribution method in NOMA based on nitrification enhancement power domain | |
CN101730109B (en) | Orthogonal frequency division multiple access relay system resource allocation method based on game theory | |
CN104640220A (en) | Frequency and power distributing method based on NOMA (non-orthogonal multiple access) system | |
Vu et al. | Spectral and energy efficiency maximization for content-centric C-RANs with edge caching | |
Ye et al. | Relay selections for cooperative underlay CR systems with energy harvesting | |
CN108064077B (en) | The power distribution method of full duplex D2D in cellular network | |
CN106231665B (en) | Resource allocation methods based on the switching of RRH dynamic mode in number energy integrated network | |
CN105813189B (en) | A kind of D2D distributed power optimization method in Cellular Networks | |
Salari et al. | Joint EH time allocation and distributed beamforming in interference-limited two-way networks with EH-based relays | |
Liang et al. | Joint user-channel assignment and power allocation for non-orthogonal multiple access relaying networks | |
Jalali et al. | Optimal resource allocation for MC-NOMA in SWIPT-enabled networks | |
Meng et al. | Sum-rate maximization in star-ris assisted rsma networks: A ppo-based algorithm | |
Yuan et al. | Latency-critical downlink multiple access: A hybrid approach and reliability maximization | |
CN103369658B (en) | The lower collaborative OFDMA system Poewr control method of safety of physical layer constraint | |
Rajawat et al. | Cross-layer design of coded multicast for wireless random access networks | |
CN108966325A (en) | A kind of optimal decoding sequence uplink transmission time optimization method of nonopiate access based on depth deterministic policy gradient | |
Zhang et al. | Joint subcarrier assignment and downlink-uplink time-power allocation for wireless powered OFDM-NOMA systems | |
CN107182116A (en) | Interference control method based on power distribution in Full-duplex cellular GSM | |
CN110225533A (en) | NB-IoT wireless energy distribution method, device, computer equipment and storage medium | |
Muhammad et al. | Optimizing information freshness leveraging multi-RISs in NOMA-based IoT networks | |
Lyu et al. | Non-orthogonal multiple access in wireless powered communication networks with SIC constraints |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||