CN108966325A - Uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient - Google Patents

Uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient

Info

Publication number
CN108966325A
Authority
CN
China
Prior art keywords
cur
test
transmission time
uplink transmission
intelligent terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810668879.5A
Other languages
Chinese (zh)
Other versions
CN108966325B (en
Inventor
吴远
张�成
倪克杰
石佳俊
钱丽萍
黄亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201810668879.5A
Publication of CN108966325A
Application granted
Publication of CN108966325B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 Power saving arrangements
    • H04W52/0209 Power saving arrangements in terminal devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 Power saving arrangements
    • H04W52/0209 Power saving arrangements in terminal devices
    • H04W52/0212 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
    • H04W52/0219 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave where the power saving management affects multiple terminals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/18 TPC being performed according to specific parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/14 Spectrum sharing arrangements between different networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient, comprising the following steps: (1) for a given decoding order $\pi_m$, the optimization problem is formulated as a non-convex optimization problem (P1-m); problem (P1-m) seeks the minimum overall radio resource consumption for given smart terminal upload volumes, and inspection of (P1-m) shows that its objective function has only one variable; (2) and (3) the optimal uplink transmission time, and with it the minimum overall radio resource consumption, is found by the deep deterministic policy gradient method; (4) the optimal decoding order is found with the algorithm OptOrder-Algorithm combined with the deep reinforcement learning algorithm, and the global minimum overall radio resource consumption and the globally optimal uplink transmission time are output. The invention improves system transmission efficiency and achieves better wireless network quality of experience together with the minimum overall radio resource consumption.

Description

Uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient
Technical field
The invention belongs to the field of communications and relates to an uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient.
Background art
Supporting the massive connectivity of Internet of Things (IoT) applications is regarded as an important goal of future 5G cellular systems. Non-orthogonal multiple access (NOMA) lets a group of smart terminals (STs) share the same spectrum channel simultaneously, which provides an effective way to achieve spectrum-efficient data transmission. We consider uplink transmission in a wireless network in which smart terminals (for example, smartwatches) use NOMA to send their data to an access hotspot. We aim to minimize the overall radio resource consumption, which comprises the uplink transmission time and the total uplink energy.
Summary of the invention
To overcome the long uplink transmission time and the high smart terminal energy consumption of the prior art, the invention provides an uplink transmission time optimization method with optimal decoding order for non-orthogonal access, based on deep deterministic policy gradient, that minimizes the uplink transmission time and the total energy consumption of all smart terminals. Addressing the difficulty of excessive uplink transmission time, the invention chiefly considers transmitting data with non-orthogonal access technology.
The technical solution adopted by the invention to solve this technical problem is:
An uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient, comprising the following steps:
(1) There are I smart terminals in total within the coverage of the access hotspot, and they are represented as a set; a given group of I smart terminals therefore has $I!$ possible decoding orders. The smart terminals use non-orthogonal access technology to send data to the access hotspot simultaneously, and each smart terminal i has a given volume of data to send;
Provided that the data of all smart terminals is sent in full, and given a decoding order $\pi_m$, where $m = 1, 2, \ldots, I!$, the problem of minimizing the uplink transmission time and the total energy consumption of all smart terminals is described as the following optimization problem (P1-m):
$0 \le t_m \le T_{\max}$ (1-3)
Variables: $t_m$
The variables in the problem are explained as follows:
$\pi_m(i)$: the decoding position of smart terminal i under the given decoding order $\pi_m$;
$\alpha$: the weight factor of the uplink transmission time;
$\beta$: the weight factor of the total uplink energy consumption;
$t_m$: the uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Minimum transmit power of smart terminal i: a function of $t_m$ giving the minimum transmit power that smart terminal i needs, under the m-th decoding order $\pi_m$, to finish sending its data volume within the given uplink transmission time $t_m$, in watts;
$W$: the channel bandwidth from the smart terminals to the access hotspot, in hertz;
$n_0$: the power spectral density of the channel background noise;
$g_{iA}$: the channel power gain from smart terminal i to the access hotspot;
Data volume of smart terminal i: the amount of data that smart terminal i needs to send to the access hotspot, in megabits;
Maximum upload energy of smart terminal i: the maximum energy that smart terminal i may consume for uploading, in joules;
$T_{\max}$: the maximum uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Problem (P1-m) seeks the minimum overall radio resource consumption (comprising the uplink transmission time and the total energy consumption of all smart terminals) for the given smart terminal upload volumes; inspection of problem (P1-m) shows that its objective function has only one variable, $t_m$;
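The objective of (P1-m) is described in words as a weighted combination of the uplink transmission time and the total energy of all smart terminals. A minimal Python sketch of how such a cost could be evaluated for one decoding order is given here. It assumes the standard uplink NOMA rate model with successive interference cancellation, in which each terminal sees interference only from terminals decoded after it; that model is an assumption, not a formula quoted from this document. The function names, the convention that `order` lists terminals from first decoded to last decoded, and the use of bits and hertz as units are likewise illustrative.

```python
def min_powers(order, s, g, W, n0, t):
    """Minimum transmit power of each terminal for one decoding order.

    order: terminal indices, first decoded ... last decoded (assumed convention)
    s[i]:  data volume of terminal i in bits
    g[i]:  channel power gain of terminal i
    W: bandwidth in Hz, n0: noise power spectral density, t: time in seconds
    """
    powers = {}
    later_bits = 0.0  # total data volume of terminals decoded after the current one
    for i in reversed(order):
        own = 2.0 ** (s[i] / (t * W)) - 1.0
        # Noise plus residual interference grows by 2^(later_bits / (t W)).
        powers[i] = W * n0 / g[i] * own * 2.0 ** (later_bits / (t * W))
        later_bits += s[i]
    return powers


def overall_cost(order, s, g, W, n0, t, alpha, beta):
    """Weighted sum alpha * time + beta * total energy (energy = power * time)."""
    powers = min_powers(order, s, g, W, n0, t)
    return alpha * t + beta * t * sum(powers.values())
```

Under this model the required powers fall as `t` grows, so the weights `alpha` and `beta` set the trade-off between finishing quickly and spending less energy.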
(2) An optimal uplink transmission time, denoted $t^{*,m}$, is found by the deep deterministic policy gradient method. The method consists of an execution unit (the actor), a scoring unit (the critic) and the environment. The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal are encoded into the state $x_T$ required by the execution unit. In the current state the execution unit takes an action $a$ that modifies the uplink transmission time $t_m$, the system enters the next state $x_{T+1}$, and the environment returns a reward $r(x_T, a)$. The scoring unit combines the state $x_T$, the action $a$ and the reward $r(x_T, a)$ to score the execution unit, indicating how good or bad it was to take action $a$ in state $x_T$. The goal of the execution unit is to make the scoring unit's score as high as possible, while the goal of the scoring unit is to make every score it gives as close as possible to the true value, which it adjusts through the reward $r(x_T, a)$. Through the continued interaction and updating of the execution unit, the scoring unit and the environment, $t_m$ is optimized until the minimum overall radio resource consumption is found. The scoring unit is updated as follows:
$S(x_T, a) = r(x_T, a) + \gamma S'(x_{T+1}, a')$ (2-1)
where the parameters are defined as follows:
$x_T$: the state of the system at time T;
$x_{T+1}$: the state of the system at time T+1;
$a$: the action taken by the execution unit in the current state;
$a'$: the action taken by the execution unit in the next state;
$S(x_T, a)$: the score obtained from the evaluation network in the execution unit for taking action $a$ in state $x_T$;
$S'(x_{T+1}, a')$: the score obtained from the target network in the execution unit for taking action $a'$ in state $x_{T+1}$;
$r(x_T, a)$: the reward obtained for taking action $a$ in state $x_T$;
$\gamma$: the reward discount factor;
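Update (2-1) can be read as a regression target for the score. A minimal Python sketch follows, assuming the scoring unit is trained by minimizing a squared error against that target; the squared-error loss and the function names are illustrative assumptions and are not stated in the text.

```python
def td_target(reward, gamma, next_score):
    """Right-hand side of (2-1): r(x_T, a) + gamma * S'(x_{T+1}, a')."""
    return reward + gamma * next_score


def critic_loss(score, reward, gamma, next_score):
    """Squared gap between the current score S(x_T, a) and its target (assumed loss)."""
    return (score - td_target(reward, gamma, next_score)) ** 2
```

A loss of zero means the score already agrees with the reward plus the discounted score of the next state.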
(3) The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal form the state $x_T$ of the deep deterministic policy gradient method, and the action $a$ is a modification of the state $x_T$. After the modification, the total loss of the system is compared with a preset reference value: if it is greater than the reference value, the current reward $r(x_T, a)$ is set to a negative value; otherwise it is set to a positive value. The system then enters the next state $x_{T+1}$;
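The reward rule of step (3) fixes only the sign of the reward. A minimal Python sketch, assuming a symmetric magnitude of 1 (the magnitude and the parameter names are hypothetical; the text specifies only negative versus positive):

```python
def reward(total_loss, reference, magnitude=1.0):
    """Negative reward if the total loss exceeds the reference value, else positive."""
    return -magnitude if total_loss > reference else magnitude
```

A total loss exactly equal to the reference is treated as "not greater" and therefore earns the positive reward.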
The iterative process of the deep deterministic policy gradient method is:
Step 3.1: initialize the execution unit, the scoring unit and the memory bank (replay memory) of the deep deterministic policy gradient method; the current system state is $x_T$, T is initialized to 1, and the iteration counter k is initialized to 1;
Step 3.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action $a$ in state $x_T$;
Step 3.3: the action $a$ modifies the state $x_T$, turning it into the next state $x_{T+1}$, and the reward $r(x_T, a)$ fed back by the environment is obtained;
Step 3.4: the historical experience is stored in the memory bank in the format $(x_T, a, r(x_T, a), x_{T+1})$;
Step 3.5: the scoring unit receives the action $a$, the state $x_T$ and the reward $r(x_T, a)$, and gives the execution unit the score $S(x_T, a)$;
Step 3.6: the execution unit keeps updating its own parameters to maximize the score $S(x_T, a)$, so that it is as likely as possible to produce a high-scoring action next time;
Step 3.7: the scoring unit samples historical experience from the memory bank, keeps learning, and updates its parameters so that its scores become as accurate as possible; then set k = k + 1 and return to Step 3.2;
Step 3.8: when k is greater than the given number of iterations K, the learning process ends, yielding the optimal uplink transmission time $t^{*,m}$ and the optimal overall radio resource consumption;
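The memory bank of Steps 3.4 and 3.7 can be sketched as a small experience store. The Python sketch below assumes a fixed capacity with oldest-first eviction and uniform random sampling; both choices, and the class and method names, are illustrative assumptions rather than details given in the text.

```python
import random
from collections import deque


class MemoryBank:
    """Stores experiences (x_T, a, r, x_{T+1}) for later learning."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest experience when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the stored amount.
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)
```

Sampling past experiences at random, rather than always learning from the latest transition, is the usual reason for keeping such a store.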
(4) After the optimal uplink transmission time under a given decoding order $\pi_m$ has been obtained, the algorithm OptOrder-Algorithm is proposed to find the optimal decoding order, that is, the globally optimal uplink transmission time that yields the global minimum overall radio resource consumption;
The solution procedure of the algorithm OptOrder-Algorithm is as follows. Let the set of smart terminals be $I_{all} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{all}|$ denotes the cardinality of $I_{all}$. Initialize the current candidate set $I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{cur}|$ denotes the cardinality of $I_{cur}$; the current best decoding order CBS, initially empty; the current best value CBV, a sufficiently large number; and the current test set $I_{cur,test}$, initially empty. In the first iteration, elements of $I_{cur}$ are selected one at a time and placed into $I_{cur,test}$, and the algorithm P2-Algorithm is called to find the current best $I_{cur,test}$, that is, the $I_{cur,test}$ with the current minimum overall radio resource consumption. $I_{cur}$ is then updated to $I_{all}$ with the elements of $I_{cur,test}$ removed, and CBS is updated to the current best $I_{cur,test}$. In the second iteration, elements of the current $I_{cur}$ are again selected one at a time and inserted into $I_{cur,test}$ (at this point $I_{cur,test}$ holds only one element, so the new element is inserted to its left or to its right), the algorithm P2-Algorithm is called to find the current best $I_{cur,test}$, and $I_{cur}$ and CBS are updated in the same way. Each time an element of the current $I_{cur}$ is inserted into $I_{cur,test}$, the relative order of the elements already fixed in $I_{cur,test}$ must not be changed. The iteration continues in this way until the last iteration yields the globally optimal decoding order CBS, the global minimum overall radio resource consumption $\theta^{*}$ and the globally optimal uplink transmission time $t^{*}$;
Finally, the $\theta^{*}$ output by the algorithm OptOrder-Algorithm is the global minimum overall radio resource consumption sought in problem (P1-m), and $t^{*}$ is the globally optimal uplink transmission time sought in problem (P1-m).
Further, in step (4), the solution procedure of the algorithm OptOrder-Algorithm is as follows:
Step 4.1: set $I_{all} = I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$ and $CBS = \emptyset$;
Step 4.2: start the while loop;
Step 4.3: set CBV to a sufficiently large number;
Step 4.4: start a for loop over $m = 1:1:|I_{cur}|$;
Step 4.5: start a for loop over $h = 0:1:|CBS|$;
Step 4.6: set $I_{cur,test} = \emptyset$;
Step 4.7: if $h = 0$, set $I_{cur,test} = \{I_{cur}(m), CBS\}$;
Step 4.8: otherwise, if $h \neq 0$, set $I_{cur,test} = \{CBS(1:h), I_{cur}(m), CBS(h+1:|CBS|)\}$;
Step 4.9: after obtaining $I_{cur,test}$, use the deep deterministic policy gradient method of steps (2) and (3) to compute $\theta^{*,cur,test}$ and $t^{*,m}$;
Step 4.10: if $\theta^{*,cur,test} < CBV$, set $CBV = \theta^{*,cur,test}$ and $t^{*} = t^{*,m}$, and set $CBS = I_{cur,test}$;
Step 4.11: when $h = |CBS|$, end the for loop of Step 4.5;
Step 4.12: when $m = |I_{cur}|$, end the for loop of Step 4.4;
Step 4.13: set $I_{cur} = I_{all} \setminus CBS$;
Step 4.14: when $I_{cur} = \emptyset$, end the while loop of Step 4.2;
Step 4.15: output $\theta^{*} = CBV$ and $t^{*}$.
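Steps 4.1 to 4.15 amount to a greedy insertion search over decoding orders. A minimal Python sketch is shown here, with the per-order optimization abstracted into a caller-supplied function `evaluate(order)` that returns a pair (overall consumption, transmission time); in the method that role is played by the deep deterministic policy gradient procedure of steps (2) and (3). The names `opt_order` and `evaluate` are hypothetical, and the sketch keeps the best candidate of a round in a separate variable so that CBS stays fixed while the insertion positions of that round are tested, which is one reading of the requirement that fixed positions must not change.

```python
def opt_order(terminals, evaluate):
    """Greedy insertion search for a decoding order.

    terminals: list of distinct terminal labels
    evaluate:  callable(order) -> (theta, t), lower theta is better
    Returns (best order, its theta, its t).
    """
    remaining = list(terminals)   # I_cur
    cbs = []                      # current best order, CBS
    cbv, best_t = float("inf"), None
    while remaining:
        cbv, round_best = float("inf"), None   # CBV reset each round
        for cand in remaining:
            for h in range(len(cbs) + 1):
                # Insert the candidate at position h without reordering CBS.
                test = cbs[:h] + [cand] + cbs[h:]
                theta, t = evaluate(test)
                if theta < cbv:
                    cbv, best_t, round_best = theta, t, test
        if round_best is None:
            raise ValueError("evaluate returned no finite consumption")
        cbs = round_best
        remaining = [x for x in terminals if x not in cbs]
    return cbs, cbv, best_t
```

As a usage example with a toy cost, take weights a = 3, b = 1, c = 2 and let the cost of an order be the sum of position times weight; `opt_order(["a", "b", "c"], evaluate)` then returns the order `["a", "c", "b"]` with cost 10. Being greedy, the search is not guaranteed in general to match exhaustive enumeration.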
The technical concept of the invention is as follows. First, in a cellular wireless network, mobile users transmit data through non-orthogonal access technology so as to minimize the uplink transmission time and the total energy consumption of all mobile users, thereby obtaining economic benefit and quality of service. The constraints considered are the limits on each mobile user's upload energy consumption and on the uplink transmission time. Provided that the data of all mobile users is sent in full, the overall radio resource consumption, comprising the uplink transmission time and the total energy consumption of all smart terminals, is minimized. The algorithm OptOrder-Algorithm is then proposed to find the optimal decoding order and to compute the globally optimal uplink transmission time and the global minimum overall radio resource consumption.
The beneficial effects of the invention are mainly as follows: 1. for the uplink as a whole, non-orthogonal access technology significantly improves system transmission efficiency; 2. for the uplink as a whole, non-orthogonal access technology yields better wireless network quality of experience; 3. the deep deterministic policy gradient method yields the optimal uplink transmission time and thus the minimum overall radio resource consumption (comprising the uplink transmission time and the total energy consumption of all smart terminals).
Brief description of the drawings
Fig. 1 is a schematic diagram of the uplink scenario with multiple smart terminals and an access hotspot in a wireless network;
Fig. 2 is a schematic diagram of all decoding orders for 3 STs;
Fig. 3 is a schematic illustration of the algorithm OptOrder-Algorithm for 5 STs;
Fig. 4 is a flow chart of the method for finding the optimal uplink transmission time.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, Fig. 2, Fig. 3 and Fig. 4, an uplink transmission time optimization method with optimal decoding order for non-orthogonal access based on deep deterministic policy gradient is provided. The method minimizes the uplink transmission time and the total energy consumption of all smart terminals while guaranteeing that the data of all smart terminals is sent in full, improving the wireless network quality of experience of the whole system. The invention is applied to a wireless network in the scenario shown in Fig. 1. The optimization method designed for this problem comprises the following steps:
(1) There are I smart terminals in total within the coverage of the access hotspot, and they are represented as a set; a given group of I smart terminals therefore has $I!$ possible decoding orders. The smart terminals use non-orthogonal access technology to send data to the access hotspot simultaneously, and each smart terminal i has a given volume of data to send;
Provided that the data of all smart terminals is sent in full, and given a decoding order $\pi_m$, where $m = 1, 2, \ldots, I!$, the problem of minimizing the uplink transmission time and the total energy consumption of all smart terminals is described as the following optimization problem (P1-m):
$0 \le t_m \le T_{\max}$ (1-3)
Variables: $t_m$
The variables in the problem are explained as follows:
$\pi_m(i)$: the decoding position of smart terminal i under the given decoding order $\pi_m$;
$\alpha$: the weight factor of the uplink transmission time;
$\beta$: the weight factor of the total uplink energy consumption;
$t_m$: the uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Minimum transmit power of smart terminal i: a function of $t_m$ giving the minimum transmit power that smart terminal i needs, under the m-th decoding order $\pi_m$, to finish sending its data volume within the given uplink transmission time $t_m$, in watts;
$W$: the channel bandwidth from the smart terminals to the access hotspot, in hertz;
$n_0$: the power spectral density of the channel background noise;
$g_{iA}$: the channel power gain from smart terminal i to the access hotspot;
Data volume of smart terminal i: the amount of data that smart terminal i needs to send to the access hotspot, in megabits;
Maximum upload energy of smart terminal i: the maximum energy that smart terminal i may consume for uploading, in joules;
$T_{\max}$: the maximum uplink transmission time in which the smart terminals send data to the access hotspot, in seconds;
Problem (P1-m) seeks the minimum overall radio resource consumption (comprising the uplink transmission time and the total energy consumption of all smart terminals) for the given smart terminal upload volumes; inspection of problem (P1-m) shows that its objective function has only one variable, $t_m$;
(2) An optimal uplink transmission time, denoted $t^{*,m}$, is found by the deep deterministic policy gradient method. The method consists of an execution unit (the actor), a scoring unit (the critic) and the environment. The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal are encoded into the state $x_T$ required by the execution unit. In the current state the execution unit takes an action $a$ that modifies the uplink transmission time $t_m$, the system enters the next state $x_{T+1}$, and the environment returns a reward $r(x_T, a)$. The scoring unit combines the state $x_T$, the action $a$ and the reward $r(x_T, a)$ to score the execution unit, indicating how good or bad it was to take action $a$ in state $x_T$. The goal of the execution unit is to make the scoring unit's score as high as possible, while the goal of the scoring unit is to make every score it gives as close as possible to the true value, which it adjusts through the reward $r(x_T, a)$. Through the continued interaction and updating of the execution unit, the scoring unit and the environment, $t_m$ is optimized until the minimum overall radio resource consumption is found. The scoring unit is updated as follows:
$S(x_T, a) = r(x_T, a) + \gamma S'(x_{T+1}, a')$ (2-1)
where the parameters are defined as follows:
$x_T$: the state of the system at time T;
$x_{T+1}$: the state of the system at time T+1;
$a$: the action taken by the execution unit in the current state;
$a'$: the action taken by the execution unit in the next state;
$S(x_T, a)$: the score obtained from the evaluation network in the execution unit for taking action $a$ in state $x_T$;
$S'(x_{T+1}, a')$: the score obtained from the target network in the execution unit for taking action $a'$ in state $x_{T+1}$;
$r(x_T, a)$: the reward obtained for taking action $a$ in state $x_T$;
$\gamma$: the reward discount factor;
(3) The uplink transmission time $t_m$ of all smart terminals and the minimum transmit power of each smart terminal form the state $x_T$ of the deep deterministic policy gradient method, and the action $a$ is a modification of the state $x_T$. After the modification, the total loss of the system is compared with a preset reference value: if it is greater than the reference value, the current reward $r(x_T, a)$ is set to a negative value; otherwise it is set to a positive value. The system then enters the next state $x_{T+1}$;
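The state and action of step (3) can be sketched in a few lines of Python. The sketch assumes that the action is an additive adjustment of the transmission time and that the result is clipped to the range allowed by constraint (1-3); the additive form, the small positive floor `t_min` (used because a zero transmission time would leave no time to send any data) and the function names are illustrative assumptions.

```python
def apply_action(t, a, t_max, t_min=1e-6):
    """Shift the uplink transmission time by action a, kept within [t_min, t_max]."""
    return min(max(t + a, t_min), t_max)


def build_state(t, powers):
    """State x_T: the transmission time followed by each terminal's minimum power."""
    return [t] + list(powers)
```

After each action the minimum transmit powers would be recomputed for the new transmission time before the next state is assembled.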
The iterative process of the deep deterministic policy gradient method is:
Step 3.1: initialize the execution unit, the scoring unit and the memory bank (replay memory) of the deep deterministic policy gradient method; the current system state is $x_T$, T is initialized to 1, and the iteration counter k is initialized to 1;
Step 3.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action $a$ in state $x_T$;
Step 3.3: the action $a$ modifies the state $x_T$, turning it into the next state $x_{T+1}$, and the reward $r(x_T, a)$ fed back by the environment is obtained;
Step 3.4: the historical experience is stored in the memory bank in the format $(x_T, a, r(x_T, a), x_{T+1})$;
Step 3.5: the scoring unit receives the action $a$, the state $x_T$ and the reward $r(x_T, a)$, and gives the execution unit the score $S(x_T, a)$;
Step 3.6: the execution unit keeps updating its own parameters to maximize the score $S(x_T, a)$, so that it is as likely as possible to produce a high-scoring action next time;
Step 3.7: the scoring unit samples historical experience from the memory bank, keeps learning, and updates its parameters so that its scores become as accurate as possible; then set k = k + 1 and return to Step 3.2;
Step 3.8: when k is greater than the given number of iterations K, the learning process ends, yielding the optimal uplink transmission time $t^{*,m}$ and the optimal overall radio resource consumption;
(4) After the optimal uplink transmission time under a given decoding order $\pi_m$ has been obtained, the algorithm OptOrder-Algorithm is proposed to find the optimal decoding order, that is, the globally optimal uplink transmission time that yields the global minimum overall radio resource consumption;
The solution procedure of the algorithm OptOrder-Algorithm is as follows. Let the set of smart terminals be $I_{all} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{all}|$ denotes the cardinality of $I_{all}$. Initialize the current candidate set $I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$, where $|I_{cur}|$ denotes the cardinality of $I_{cur}$; the current best decoding order CBS, initially empty; the current best value CBV, a sufficiently large number; and the current test set $I_{cur,test}$, initially empty. In the first iteration, elements of $I_{cur}$ are selected one at a time and placed into $I_{cur,test}$, and the algorithm P2-Algorithm is called to find the current best $I_{cur,test}$, that is, the $I_{cur,test}$ with the current minimum overall radio resource consumption. $I_{cur}$ is then updated to $I_{all}$ with the elements of $I_{cur,test}$ removed, and CBS is updated to the current best $I_{cur,test}$. In the second iteration, elements of the current $I_{cur}$ are again selected one at a time and inserted into $I_{cur,test}$ (at this point $I_{cur,test}$ holds only one element, so the new element is inserted to its left or to its right), the algorithm P2-Algorithm is called to find the current best $I_{cur,test}$, and $I_{cur}$ and CBS are updated in the same way. Each time an element of the current $I_{cur}$ is inserted into $I_{cur,test}$, the relative order of the elements already fixed in $I_{cur,test}$ must not be changed. The iteration continues in this way until the last iteration yields the globally optimal decoding order CBS, the global minimum overall radio resource consumption $\theta^{*}$ and the globally optimal uplink transmission time $t^{*}$. The steps of the algorithm OptOrder-Algorithm are as follows:
Step 4.1: set $I_{all} = I_{cur} = \{g_{1A}, g_{2A}, \ldots, g_{IA}\}$ and $CBS = \emptyset$;
Step 4.2: start the while loop;
Step 4.3: set CBV to a sufficiently large number;
Step 4.4: start a for loop over $m = 1:1:|I_{cur}|$;
Step 4.5: start a for loop over $h = 0:1:|CBS|$;
Step 4.6: set $I_{cur,test} = \emptyset$;
Step 4.7: if $h = 0$, set $I_{cur,test} = \{I_{cur}(m), CBS\}$;
Step 4.8: otherwise, if $h \neq 0$, set $I_{cur,test} = \{CBS(1:h), I_{cur}(m), CBS(h+1:|CBS|)\}$;
Step 4.9: after obtaining $I_{cur,test}$, use the deep deterministic policy gradient method of steps (2) and (3) to compute $\theta^{*,cur,test}$ and $t^{*,m}$;
Step 4.10: if $\theta^{*,cur,test} < CBV$, set $CBV = \theta^{*,cur,test}$ and $t^{*} = t^{*,m}$, and set $CBS = I_{cur,test}$;
Step 4.11: when $h = |CBS|$, end the for loop of Step 4.5;
Step 4.12: when $m = |I_{cur}|$, end the for loop of Step 4.4;
Step 4.13: set $I_{cur} = I_{all} \setminus CBS$;
Step 4.14: when $I_{cur} = \emptyset$, end the while loop of Step 4.2;
Step 4.15: output $\theta^{*} = CBV$ and $t^{*}$.
Finally, the $\theta^{*}$ output by the algorithm OptOrder-Algorithm is the global minimum overall radio resource consumption sought in problem (P1-m), and $t^{*}$ is the globally optimal uplink transmission time sought in problem (P1-m).
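To get a feel for the search effort, the number of decoding orders scored by the insertion search can be compared with the $I!$ orders of exhaustive enumeration. The Python sketch below assumes that the two nested for loops of Steps 4.4 and 4.5 test every remaining terminal at every one of the $|CBS| + 1$ insertion positions, with $|CBS|$ taken at the start of each round; this reading and the function names are assumptions made for illustration.

```python
import math


def greedy_evaluations(num_terminals):
    """Orders scored by the insertion search: round k has
    (I - k + 1) remaining terminals and k insertion positions."""
    I = num_terminals
    return sum((I - k + 1) * k for k in range(1, I + 1))


def exhaustive_evaluations(num_terminals):
    """Orders scored by enumerating all I! decoding orders."""
    return math.factorial(num_terminals)
```

For 5 smart terminals this gives 35 evaluations against 5! = 120. For very small groups the picture reverses: with 3 terminals the insertion search scores 10 orders while enumeration needs only 6, so the saving appears only as the number of terminals grows.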

Claims (1)

  1. The uplink transmission time optimization method 1. a kind of optimal decoding of nonopiate access based on depth deterministic policy gradient is sorted, It is characterized in that, the described method comprises the following steps:
    (1) a total of I intelligent terminal under the coverage area of access hot spot, intelligent terminal setTable Show, that is to say, that give one group of intelligent terminalJust there is I!Kind decoding sequence, intelligent terminal using it is non-just Access technology is handed over to send data to access hot spot simultaneously, the data volume that wherein intelligent terminal i needs to send is usedIt indicates;
    Guaranteeing to be sent completely the data volume of all intelligent terminals and is giving a kind of decoding sequence πm, wherein m=1,2 ..., I! Under conditions of, what the optimization problem description of minimum uplink transmission time and all intelligent terminal total power consumptions was as follows Optimization problem (P1-m) problem:
    0≤tm≤Tmax (1-3)
    Variables:tmm
    Each variable in problem is done into an explanation below, as follows:
    πm(i): giving definite decoding sequence πmUnder conditions of, the decoding order of intelligent terminal i;
    α: the weight factor of uplink transmission time;
    β: the weight factor of uplink total power consumption;
    tm: intelligent terminal sends data to the uplink transmission time of access hot spot, and unit is the second;
    It is about tmFunction, indicate m kind decode sequence πmIn the case where, intelligent terminal i is passed in given uplink Defeated time tmInterior completion sends data volumeRequired minimum emissive power, unit are watts;
    W: for intelligent terminal to the channel width of access hot spot, unit is hertz;
    n0: the spectral power density of channel background noise;
    giA: channel power gain of the intelligent terminal i to access hot spot;
    Intelligent terminal i needs to be sent to the data volume of access hot spot, and unit is megabit;
    Intelligent terminal i maximum uploads energy consumption, and unit is joule;
    Tmax: intelligent terminal sends data to the maximum uplink transmission time of access hot spot, and unit is the second;
    (P1-m) problem is in given intelligent terminal upload amountIn the case where find the smallest whole radio resource and disappear Consumption (including uplink transmission time and all intelligent terminal total power consumptions), observation (P1-m) problem know its objective function Only one variable t*, m
    (2) an optimal uplink transmission time is found by depth deterministic policy gradient method is denoted as t*, m, the depth is true Qualitative Policy-Gradient method is made of execution unit, scoring unit and environment;The uplink transmission time t of all intelligent terminalsm With the minimum emissive power of each intelligent terminalIt is all compiled into state x needed for execution unitT, execute list Member takes movement a to uplink transmission time t under current statemIt is modified and enters next state xT+1, while obtaining ring Reward r (the x that border returnsT, a), score unit bonding state xT, act the reward r (x that a and environment returnT, a) executed list Member marking, that is, show execution unit in state xTUnder take movement a be it is bad, the target of execution unit be exactly allow scoring unit Make score the higher the better, and the target for the unit that scores is that oneself is allowed to get every time point all close to true, passes through reward r (xT, A) it adjusts;In execution unit, score under unit and the continuous interactive refreshing of environment, tmIt will be constantly optimised until finding whole nothing The minimum value of line resource consumption, the update mode for the unit that scores are as follows:
    S(x_T, a) = r(x_T, a) + γ S'(x_{T+1}, a')    (2-1)
    where the parameters are defined as follows:
    x_T: the state of the system at time T;
    x_{T+1}: the state of the system at time T+1;
    a: the action taken by the execution unit in the current state;
    a': the action taken by the execution unit in the next state;
    S(x_T, a): the score obtained by the evaluation network in the execution unit for taking action a in state x_T;
    S'(x_{T+1}, a'): the score obtained by the target network in the execution unit for taking action a' in state x_{T+1};
    r(x_T, a): the reward obtained for taking action a in state x_T;
    γ: the reward discount factor;
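    As a minimal sketch of update rule (2-1), the following Python computes the target score r(x_T, a) + γ S'(x_{T+1}, a') and measures how far the evaluation network's scores are from that target. The function names, the default γ = 0.9 and the mean-squared-error loss are illustrative assumptions; the text above only specifies the target in (2-1).

```python
from typing import Sequence


def target_score(reward: float, next_score: float, gamma: float = 0.9) -> float:
    """Right-hand side of (2-1): r(x_T, a) + gamma * S'(x_{T+1}, a')."""
    return reward + gamma * next_score


def scoring_loss(scores: Sequence[float],
                 rewards: Sequence[float],
                 next_scores: Sequence[float],
                 gamma: float = 0.9) -> float:
    """Mean squared gap between S(x_T, a) and the target of (2-1).

    The squared-error form is an assumption, not taken from the source.
    """
    targets = [target_score(r, s_next, gamma)
               for r, s_next in zip(rewards, next_scores)]
    return sum((y - s) ** 2 for y, s in zip(targets, scores)) / len(scores)
```

    Driving this loss toward zero is one common way of making the scoring unit's scores "as close to the true value as possible".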
    (3) The uplink transmission time t^m of all intelligent terminals and the minimum transmit power of each intelligent terminal serve as the state x_T of the deep deterministic policy gradient method, and the action a is a change to the state x_T. After the change, the total loss of the system is compared with a preset standard value: if it is greater than the standard value, the current reward r(x_T, a) is set to a negative value; otherwise it is set to a positive value. The system then enters the next state x_{T+1};
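    The state encoding and the reward rule of step (3) can be sketched as follows. This is an illustration under assumptions: the names `encode_state` and `reward`, the flat-list state layout and the reward magnitude of 1.0 are hypothetical, since the text only states that the reward is negative when the total loss exceeds the standard value and positive otherwise.

```python
from typing import List, Sequence


def encode_state(uplink_time: float, min_powers: Sequence[float]) -> List[float]:
    """Pack t^m and each terminal's minimum transmit power into one state x_T."""
    return [float(uplink_time)] + [float(p) for p in min_powers]


def reward(total_loss: float, standard_value: float, magnitude: float = 1.0) -> float:
    """Negative reward if the total loss exceeds the standard value, else positive.

    `magnitude` is an assumed constant; the source fixes only the sign.
    """
    if total_loss > standard_value:
        return -magnitude
    return magnitude
```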
    The iterative process of the deep deterministic policy gradient method is as follows:
    Step 3.1: initialize the execution unit, the scoring unit and the experience database of the deep deterministic policy gradient method; the current system state is x_T, T is initialized to 1, and the iteration counter k is initialized to 1;
    Step 3.2: while k is less than or equal to the given number of iterations K, the execution unit predicts an action a in state x_T;
    Step 3.3: action a modifies state x_T, turning it into the next state x_{T+1}, and the reward r(x_T, a) fed back by the environment is obtained;
    Step 3.4: the historical experience is stored in the experience database in the format (x_T, a, r(x_T, a), x_{T+1});
    Step 3.5: the scoring unit receives the action a, the state x_T and the reward r(x_T, a), and gives the execution unit the score S(x_T, a);
    Step 3.6: the execution unit keeps maximizing the score S(x_T, a) by updating its internal parameters, so that it is as likely as possible to take a high-scoring action next time;
    Step 3.7: the scoring unit draws historical experience from the experience database, keeps learning, and updates its parameters so that the scores it gives are as accurate as possible; at the same time k = k + 1, and the process returns to Step 3.2;
    Step 3.8: when k is greater than the given number of iterations K, the learning process ends, yielding the optimal uplink transmission time t^{*,m} and the optimal overall radio resource consumption;
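    The control flow of Steps 3.1 to 3.8 can be sketched in Python as below. This is only a skeleton under assumptions: the class `ExperienceMemory`, the function `run_iterations` and all callback parameters (`act`, `env_step`, `score`, `update_execution_unit`, `update_scoring_unit`) are hypothetical names, and the actual network updates are left to the caller because the text does not specify them.

```python
import random
from collections import deque


class ExperienceMemory:
    """Fixed-capacity store of (x_T, a, r, x_{T+1}) tuples (Step 3.4)."""

    def __init__(self, capacity=10_000):
        self._items = deque(maxlen=capacity)  # oldest entries are dropped first

    def store(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Draw without replacement; never ask for more than is stored.
        k = min(batch_size, len(self._items))
        return random.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


def run_iterations(K, initial_state, act, env_step, score,
                   update_execution_unit, update_scoring_unit,
                   memory, batch_size=32):
    """Run K interaction rounds and return the final state."""
    state = initial_state                                 # Step 3.1
    k = 1
    while k <= K:                                         # Steps 3.2 and 3.8
        action = act(state)                               # Step 3.2
        next_state, reward = env_step(state, action)      # Step 3.3
        memory.store(state, action, reward, next_state)   # Step 3.4
        s = score(state, action, reward)                  # Step 3.5
        update_execution_unit(state, action, s)           # Step 3.6
        update_scoring_unit(memory.sample(batch_size))    # Step 3.7
        state = next_state
        k += 1
    return state
```

    Sampling past experience at random, rather than learning only from the latest transition, is the usual reason for keeping such a store.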
    (4) After the optimal uplink transmission time under a given decoding order π^m has been obtained, the algorithm OptOrder-Algorithm is proposed to find the optimal decoding order, i.e. to find the globally optimal uplink transmission time, so that the globally minimum overall radio resource consumption is achieved;
    The idea of OptOrder-Algorithm is as follows. Let the set of intelligent terminals be I_all = {g_1A, g_2A, ..., g_IA}, where |I_all| denotes the cardinality of I_all. Initialize the current candidate set I_cur = {g_1A, g_2A, ..., g_IA}, where |I_cur| denotes the cardinality of I_cur, together with the current optimal decoding order CBS and the current test set I_{cur,test}; the current optimal value CBV is set to a sufficiently large number. In the first iteration, elements are selected from I_cur one at a time and inserted into I_{cur,test}, and algorithm P2-Algorithm is called to find the current optimal I_{cur,test}, i.e. the one with the current minimum overall radio resource consumption. I_cur is then updated to I_all with the elements of I_{cur,test} removed, and CBS is updated to the current optimal I_{cur,test}. In the second iteration, elements are again selected one at a time from the current I_cur and inserted into I_{cur,test} (which at this point contains only one element, so the new element is inserted to its left or to its right); P2-Algorithm is called to find the current optimal I_{cur,test}, after which I_cur and CBS are updated in the same way. Whenever an element from the current I_cur is inserted into I_{cur,test}, the arrangement of the elements already fixed in I_{cur,test} must not be changed. Iteration continues in this way until the last iteration yields the globally optimal decoding order CBS, the global minimum overall radio resource consumption θ*, and the globally optimal uplink transmission time t*. The solution procedure of OptOrder-Algorithm is as follows:
    Step 4.1: setting
    Step 4.2: begin the while loop;
    Step 4.3: set CBV to a sufficiently large number;
    Step 4.4: begin the for loop m = 1:1:|I_cur|;
    Step 4.5: begin the for loop h = 0:1:|CBS|;
    Step 4.6: setting
    Step 4.7: if h = 0, set I_{cur,test} = {I_cur(m), CBS};
    Step 4.8: otherwise, if h ≠ 0, set I_{cur,test} = {CBS(1:h), I_cur(m), CBS(h+1:|CBS|)};
    Step 4.9: after I_{cur,test} is obtained, compute θ^{*,cur,test} and t^{*,m} using the deep deterministic policy gradient method of (2) and (3);
    Step 4.10: if θ^{*,cur,test} < CBV, set CBV = θ^{*,cur,test} and t* = t^{*,m}, and at the same time set CBS = I_{cur,test};
    Step 4.11: when h = |CBS|, end the for loop of Step 4.5;
    Step 4.12: when m = |I_cur|, end the for loop of Step 4.4;
    Step 4.13: set I_cur = I_all \ CBS;
    Step 4.14: when I_cur = ∅, end the while loop of Step 4.2;
    Step 4.15: output θ* = CBV and t*.
    Finally, the θ* output by OptOrder-Algorithm is the global minimum overall radio resource consumption sought in problem (P1-m), and t* is the globally optimal uplink transmission time sought in problem (P1-m).
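    The insertion-based search of OptOrder-Algorithm can be sketched in Python as follows. This is one reading of the procedure, not the patent's reference implementation: in each round every remaining terminal is tried at every position of the order fixed so far, and only the best trial of the round is committed. The function name `opt_order` and the callback `evaluate` are hypothetical; `evaluate` stands in for the evaluation of Step 4.9 and is assumed to return a pair (overall radio resource consumption, uplink transmission time) for a given order. Terminals are assumed to be distinct.

```python
def opt_order(terminals, evaluate):
    """Greedy insertion search over decoding orders.

    Returns (CBS, CBV, t*): best order, its consumption, its uplink time.
    """
    remaining = list(terminals)        # I_cur
    best_order = []                    # CBS
    best_value = float("inf")          # CBV
    best_time = None                   # t*
    while remaining:                   # Steps 4.2 and 4.14
        best_value = float("inf")      # Step 4.3
        round_best = None
        for cand in remaining:                        # Step 4.4
            for h in range(len(best_order) + 1):      # Step 4.5
                # Steps 4.7-4.8: insert without reordering the fixed elements.
                trial = best_order[:h] + [cand] + best_order[h:]
                value, time = evaluate(trial)         # Step 4.9
                if value < best_value:                # Step 4.10
                    best_value, best_time, round_best = value, time, trial
        if round_best is None:
            raise ValueError("evaluate() returned no finite consumption")
        best_order = round_best
        # Step 4.13: I_cur = I_all \ CBS
        remaining = [g for g in terminals if g not in best_order]
    return best_order, best_value, best_time
```

    With I terminals this reading evaluates I(I+1)(I+2)/6 candidate orders in total, compared with I! for an exhaustive search over all decoding orders.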
CN201810668879.5A 2018-06-25 2018-06-25 Non-orthogonal access optimal decoding sorting uplink transmission time optimization method based on depth certainty strategy gradient Active CN108966325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810668879.5A CN108966325B (en) 2018-06-25 2018-06-25 Non-orthogonal access optimal decoding sorting uplink transmission time optimization method based on depth certainty strategy gradient

Publications (2)

Publication Number Publication Date
CN108966325A (en) 2018-12-07
CN108966325B (en) 2021-08-03

Family

ID=64486606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810668879.5A Active CN108966325B (en) 2018-06-25 2018-06-25 Non-orthogonal access optimal decoding sorting uplink transmission time optimization method based on depth certainty strategy gradient

Country Status (1)

Country Link
CN (1) CN108966325B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385300A (en) * 2016-08-31 2017-02-08 Shanghai Jiao Tong University Uplink NOMA power distribution method based on dynamic decoding SIC receiver
CN106788651A (en) * 2017-01-22 2017-05-31 Xi'an Jiaotong University Information transmission method for a multi-geographic-area broadcast system based on non-orthogonal multiple access
US20180042021A1 (en) * 2016-08-05 2018-02-08 National Tsing Hua University Method of power allocation and base station using the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAOHUI YANG et al.: "On the Optimality of Power Allocation for NOMA Downlinks With Individual QoS Constraints", IEEE Communications Letters *

Also Published As

Publication number Publication date
CN108966325B (en) 2021-08-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant