TW202406390A - Resource allocation method in downlink pattern division multiple access system based on artificial intelligence - Google Patents
- Publication number: TW202406390A
- Application number: TW111128417A
- Authority: TW (Taiwan)
- Prior art keywords: allocation, subcarrier, current, allocated, action
Landscapes
- Mobile Radio Communication Systems (AREA)
- Small-Scale Networks (AREA)
Abstract
Description
The present invention relates to a resource allocation method, and in particular to a resource allocation method for a downlink pattern division multiple access system based on an artificial-intelligence algorithm.

In existing orthogonal multiple access (OMA), each user can use only one specific resource block, such as a frequency band, a time slot, or an orthogonal spreading code. With the vigorous development of mobile communications, however, the demand for spectral efficiency grows by the day, and OMA is clearly no longer able to satisfy the needs of today's users.

To meet this demand for higher spectral efficiency, non-orthogonal multiple access (NOMA) technologies have been proposed, such as Multi-User Superposition Transmission (MUST) and Pattern Division Multiple Access (PDMA).

MUST is a single-carrier NOMA technology. It allows multiple users to reuse the same resource block through superposition in the power, code, or constellation domain, thereby increasing spectral efficiency and the number of users that can be admitted. When a MUST system transmits, superposition coding combines the signals of multiple users with different power allocations; at the receiving end, successive interference cancellation (SIC) separates the superimposed signals. The larger the power difference between user signals, the easier they are to distinguish and the better the error rate, so allocating signal power sensibly is particularly important for a MUST system.

Unlike MUST, PDMA is a multi-carrier NOMA technology. When a PDMA system transmits, superposition coding again combines the signals of multiple users with different power allocations, but in addition a pattern-matrix design maps each user's identical coded bits onto different subcarriers, thereby achieving both diversity and multiplexing. Sensible allocation of both signal power and subcarriers is therefore particularly important for a PDMA system.

However, existing PDMA systems cannot perform optimal power and subcarrier allocation in response to the system's dynamic scenarios.

The object of the present invention is therefore to provide a resource allocation method for a downlink pattern division multiple access system, based on an artificial-intelligence algorithm, that performs optimal power and subcarrier allocation according to the system's dynamic scenario.
Accordingly, the resource allocation method of the present invention for a downlink pattern division multiple access system based on an artificial-intelligence algorithm is implemented by a base station. The base station communicates with K user equipments (UEs) over a wireless channel and stores a plurality of subcarrier allocation actions, a plurality of power allocation actions, and channel state information comprising N × K channel strengths of the UEs on N subcarriers, where K > 1 and N > 1. The method comprises steps (A) through (M).

In step (A), the base station allocates the subcarriers to the UEs to obtain N × K current subcarrier allocation results indicating whether each UE is allocated each subcarrier.

In step (B), the base station obtains, from the current subcarrier allocation results and the channel state information, N × K current allocated powers respectively corresponding to the current subcarrier allocation results.

In step (C), the base station inputs the subcarrier allocation actions, the power allocation actions, the current subcarrier allocation results, and the current allocated powers to an action reinforcement learning network, which outputs a plurality of action values respectively corresponding to the power allocation actions and the subcarrier allocation actions.

In step (D), the base station determines whether the action values are all less than or equal to 0.

In step (E), when one of the action values is determined to be greater than 0, the base station selects a target allocation action from among the subcarrier allocation actions and the power allocation actions.

In step (F), the base station obtains, from the current subcarrier allocation results, the current allocated powers, and the target allocation action, a plurality of updated subcarrier allocation results respectively corresponding to the current subcarrier allocation results and a plurality of updated allocated powers respectively corresponding to the current allocated powers.

In step (G), the base station calculates a reward value from the current allocated powers and the updated allocated powers.

In step (H), the base station generates and stores a training datum comprising the current subcarrier allocation results, the current allocated powers, the target allocation action, the reward value, the updated subcarrier allocation results, and the updated allocated powers.

In step (I), the base station selects a plurality of target training data from the stored training data and trains at least one reinforcement learning network, including the action reinforcement learning network, on the target training data.

In step (J), the base station takes the updated subcarrier allocation results and the updated allocated powers as the current subcarrier allocation results and the current allocated powers, respectively, and repeats steps (C) through (I) until the action values are all less than or equal to 0.

In step (K), when the action values are all determined to be less than or equal to 0, the base station calculates a candidate spectral efficiency from the current allocated powers and stores the current subcarrier allocation results, the current allocated powers, and the candidate spectral efficiency.

In step (L), steps (A) through (K) are repeated P times to obtain P candidate spectral efficiencies, where P > 1.

In step (M), the base station obtains the highest target spectral efficiency from among the candidate spectral efficiencies.

The effect of the present invention is that the base station uses the action reinforcement learning network to record and learn across different scenarios so as to obtain the best allocation action, namely the one with the largest reward value; it then obtains the candidate spectral efficiencies and selects the highest as the target spectral efficiency, whose corresponding subcarrier allocation and power allocation are optimal.
Before the present invention is described in detail, it should be noted that in the following description similar elements are designated by the same reference numerals.
Referring to Fig. 1, an embodiment of the resource allocation method of the present invention for a downlink pattern division multiple access system based on an artificial-intelligence algorithm is executed by a base station 11 that supports the downlink power-domain pattern division multiple access technique. The base station 11 communicates with K UEs 12 over a wireless channel 100 and superimposes the signals of the UEs 12 onto N subcarriers by using a different power level for each UE 12, where K > 1 and N > 1. It is worth noting that in this embodiment the base station 11 is, for example, a single-antenna base station (BS) and the UEs 12 are, for example, smartphones, but the invention is not limited thereto.

The base station 11 stores a plurality of subcarrier allocation actions, a plurality of power allocation actions, and channel state information comprising K entries respectively corresponding to the channel strengths of the UEs 12, where the channel state information is estimated by the base station 11 from uplink pilots.

Figs. 1 and 2 illustrate this embodiment of the resource allocation method; each step of the embodiment shown in Fig. 2 is described in detail below.
In step 21, the base station 11 initializes a plurality of reinforcement learning networks.

It is worth noting that in this embodiment the reinforcement learning networks are, for example, Q-learning networks and are two in number: an update network and a target network. Each network comprises, for example, a fully connected layer with fifty nodes; the activation function is, for example, the rectified linear unit (ReLU); the learning algorithm is set, for example, to adaptive moment estimation (Adam); and the loss function is set, for example, to the mean-square error (MSE). In other embodiments the reinforcement learning networks may instead comprise a look-up table (Q table); the learning algorithm may be stochastic gradient descent (SGD), momentum gradient descent (Momentum), or Adagrad; and the loss function may be a squared loss or an absolute-value loss. Moreover, the type of reinforcement learning network is not limited to Q-learning networks, and the base station 11 may also initialize only a single reinforcement learning network, but the invention is not limited thereto.
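The update/target network pair described in this step can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the patent's implementation: the input dimension, the weight initialization, and the omission of the Adam optimizer are assumptions; only the fifty-node fully connected ReLU layer, the MSE loss, and the update/target split come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_qnet(n_in, n_hidden=50, n_out=1):
    # One fully connected hidden layer of fifty nodes, as in the embodiment.
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def qnet_forward(net, x):
    # ReLU activation on the hidden layer, linear output (one Q value per row).
    h = np.maximum(0.0, x @ net["W1"] + net["b1"])
    return h @ net["W2"] + net["b2"]

def mse_loss(pred, target):
    # Mean-square error, the loss function set in this embodiment.
    return float(np.mean((pred - target) ** 2))

def copy_to_target(update_net):
    # Periodic parameter copy from the update network to the target network.
    return {k: v.copy() for k, v in update_net.items()}

update_net = init_qnet(n_in=8)          # input width 8 is a placeholder
target_net = copy_to_target(update_net)
x = rng.normal(size=(4, 8))             # four (state, action) feature vectors
q = qnet_forward(update_net, x)
```

Copying the update network's parameters into the target network at intervals is the standard way to stabilize Q-learning targets.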
In step 22, the base station 11 determines whether P iterations have been completed. If not, the flow proceeds to step 23; if P iterations have been completed, the flow proceeds to step 34. It is worth noting that in this embodiment the base station counts the iterations with a loop counter (not shown) and P = 20000, but the invention is not limited thereto.

In step 23, the base station 11 allocates the subcarriers to the UEs 12 to obtain N × K current subcarrier allocation results indicating whether each UE 12 is allocated each subcarrier.
It is worth noting that the base station 11 allocates the subcarriers to the UEs 12 according to a characteristic pattern matrix $G_{\mathrm{PDMA}}$, and the current subcarrier allocation results $x_{n,k}(t)$ satisfy the following conditions:

$x_{n,k}(t) \in \{0, 1\}$, and

$\sum_{k=1}^{K} x_{n,k}(t) \le U$,

where $x_{n,k}(t)$ is the current subcarrier allocation result indicating whether the k-th UE 12 is allocated the n-th subcarrier at the current time t: $x_{n,k}(t) = 1$ means the k-th UE 12 is allocated the n-th subcarrier at time t, $x_{n,k}(t) = 0$ means it is not, and U is the maximum number of UEs on each subcarrier. The characteristic pattern matrix can be expressed as $G_{\mathrm{PDMA}} = [x_{n,k}] \in \{0,1\}^{N \times K}$.

It should further be noted that the number of subcarriers allocated to each UE 12 and the number of UEs on each subcarrier must take into account correct detection at the receiver and interference between UEs 12. Taking K = 6 and N = 4 as an example, each UE 12 is allocated L subcarriers and each subcarrier carries U UEs, with L and U determined by the characteristic pattern matrix.
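As an illustration of these constraints, the following sketch checks a hypothetical characteristic pattern matrix for K = 6 and N = 4. The matrix entries themselves are an assumption (the patent's actual matrix appears only in its figures), chosen so that every UE is allocated two subcarriers and every subcarrier carries three UEs, giving a 150% overload.

```python
import numpy as np

# Hypothetical 4x6 characteristic pattern matrix G_PDMA (rows: subcarriers n,
# columns: UEs k). The patent's exact matrix is not reproduced in the text.
G = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
])

L_per_ue = G.sum(axis=0)           # subcarriers allocated to each UE (L)
U_per_subcarrier = G.sum(axis=1)   # UEs superimposed on each subcarrier (U)
overload = G.shape[1] / G.shape[0] # K / N
```

Checking the column and row sums this way verifies that a candidate pattern matrix respects the per-UE and per-subcarrier limits before it is used for allocation.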
In step 24, the base station 11 obtains, from the current subcarrier allocation results and the channel state information, N × K current allocated powers respectively corresponding to the current subcarrier allocation results.

Referring also to Fig. 3, step 24 comprises sub-steps 241 and 242, described below.
In sub-step 241, for each subcarrier, the base station 11 sorts the UEs 12 in descending order of their channel strengths on that subcarrier according to the channel state information. The ordering satisfies

$|h_{n,k}(t)|^2 \ge |h_{n,k+1}(t)|^2$, $k \in \{1, \ldots, K-1\}$,

where $|h_{n,k}(t)|^2$ is the channel strength at the current time t of the UE 12 in the k-th position of the ordering on the n-th subcarrier, and $|h_{n,k+1}(t)|^2$ is that of the UE 12 in the (k+1)-th position.

It is worth noting that in this embodiment, because SIC requires the allocated powers of the UEs 12 to be inversely related to their channel strengths in order to maximize the signal-to-interference-plus-noise ratio (SINR), and because during decoding the base station 11 decodes the UEs 12 in descending order of their allocated-power coefficients, the UEs 12 are first sorted by channel strength from largest to smallest. This makes it convenient for the base station 11 subsequently to allocate power from smallest to largest along the ordering and then decode, but the invention is not limited thereto.

In sub-step 242, for each subcarrier, the base station 11 allocates power in sequence to the UEs 12 allocated to that subcarrier according to their ordering; a UE 12 not allocated to the subcarrier receives no power, i.e., zero power, thereby obtaining the current allocated powers.
The current allocated powers $p_{n,k}(t)$ satisfy the following conditions:

$p_{n,k}(t) = x_{n,k}(t)\,\beta_{n,k}(t)\,P_t$,

$\beta_{n,k}(t) \in [0, 1]$,

if $x_{n,k}(t) = 1$ then $\beta_{n,k}(t) > 0$, and

if $x_{n,k}(t) = 0$ then $\beta_{n,k}(t) = 0$,

where $x_{n,k}(t)$ is the subcarrier allocation result indicating whether the k-th UE 12 is allocated the n-th subcarrier at the current time t, $\beta_{n,k'}(t)$ is the coefficient of the current allocated power assigned at time t to the UE 12 in the k'-th position of the ordering on the n-th subcarrier, $\beta_{n,k}(t)$ is that of the UE 12 in the k-th position, and $P_t$ is the total power allocated by the base station 11.
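A minimal sketch of sub-steps 241 and 242 follows. The rule for turning the channel-strength ordering into concrete coefficients is an assumption for illustration (later positions in the ordering, i.e., weaker channels, receive larger coefficients, normalized to sum to one per subcarrier); only the descending sort, the SIC-friendly inverse relation, and zero power for unallocated UEs come from the text.

```python
import numpy as np

def allocate_powers(x, h2):
    """x:  N x K binary subcarrier allocation results.
    h2: N x K channel strengths |h|^2.
    Returns N x K power coefficients beta: on each subcarrier, the allocated
    UEs are sorted by channel strength (descending) and given coefficients
    that grow with their position in the ordering (weaker channel, more
    power), normalized to sum to one per subcarrier."""
    N, K = x.shape
    beta = np.zeros((N, K))
    for n in range(N):
        ues = np.flatnonzero(x[n])                # UEs allocated subcarrier n
        if ues.size == 0:
            continue
        order = ues[np.argsort(-h2[n, ues])]      # strongest channel first
        raw = np.arange(1, order.size + 1, dtype=float)  # ascending weights
        beta[n, order] = raw / raw.sum()          # normalize per subcarrier
    return beta
```

The normalization to a unit sum per subcarrier is one simple way to keep the coefficients within the total-power budget; the patent leaves the exact sequential allocation rule to its figures.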
In step 25, the base station 11 inputs the subcarrier allocation actions, the power allocation actions, the current subcarrier allocation results, and the current allocated powers into one of the reinforcement learning networks, namely the action reinforcement learning network, which outputs a plurality of action values respectively corresponding to the power allocation actions and the subcarrier allocation actions.

It is worth noting that in this embodiment the action reinforcement learning network is the update network and the action values are Q values. Each subcarrier allocation action adjusts only one subcarrier of one UE 12 at a time; a subcarrier allocation action takes a value $a^{\mathrm{sub}}_{n,k}(t) \in \{0, 1\}$, where $a^{\mathrm{sub}}_{n,k}(t) = 1$ means that at the current time t the n-th subcarrier is allocated to the k-th UE 12 (if the n-th subcarrier was already allocated to the k-th UE 12 at the previous time, the subcarrier allocation is kept unchanged), and $a^{\mathrm{sub}}_{n,k}(t) = 0$ means that at the current time t the n-th subcarrier is not allocated to the k-th UE 12 (likewise, if the n-th subcarrier was already not allocated to the k-th UE 12 at the previous time, the subcarrier allocation is kept unchanged). The number of subcarrier allocation actions is 2 × N × K. In addition, each power allocation action adjusts only one power coefficient at a time; a power allocation action takes a value $a^{\mathrm{pow}}_{n,k}(t) \in \{+\Delta, 0, -\Delta\}$, where $+\Delta$ increases the power coefficient $\beta_{n,k}(t)$ by $\Delta$, $0$ leaves it unchanged, and $-\Delta$ decreases it by $\Delta$. The number of power allocation actions is 3 × N × K, but the invention is not limited thereto.
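Counting the two action families can be sketched as follows. The tuple encodings `("sub", n, k, assign)` and `("pow", n, k, delta)` are assumptions for illustration; only the 2 × N × K and 3 × N × K counts come from the text.

```python
from itertools import product

def build_action_space(N, K):
    # Subcarrier actions: (n, k, assign) with assign in {0, 1} -> 2*N*K actions.
    sub_actions = [("sub", n, k, a)
                   for n, k, a in product(range(N), range(K), (0, 1))]
    # Power actions: (n, k, delta) with delta in {+1, 0, -1} units of Delta
    # -> 3*N*K actions.
    pow_actions = [("pow", n, k, d)
                   for n, k, d in product(range(N), range(K), (+1, 0, -1))]
    return sub_actions + pow_actions

actions = build_action_space(N=4, K=6)
```

For N = 4 and K = 6 this yields 48 subcarrier actions and 72 power actions, 120 in total, which is the action-space size the Q network must score.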
In step 26, the base station 11 determines whether the action values are all less than or equal to 0. If one of the action values is greater than 0, the flow proceeds to step 27; if they are all less than or equal to 0, the flow proceeds to step 33.

It should be noted in particular that, in step 26 of this embodiment, only the action values output by the update network applicable to the current overload ratio are observed; the outputs of the target network under the current overload ratio are not consulted. Accordingly, in step 25 the base station 11 inputs only the subcarrier allocation actions, the power allocation actions, and the current allocated powers to the update network.

It should further be noted that if the action values are all less than or equal to 0, taking any power allocation action in the current state is considered to lower the long-term expected reward. Since the reward should be as high as possible, the current power allocation is judged to be the optimal result, no further allocation action is taken, and the flow proceeds to step 33.

In step 27, the base station 11 selects a target allocation action from among the subcarrier allocation actions and the power allocation actions. The target allocation action is selected at random with probability $\varepsilon$ and is the action whose action value is the highest among the action values with probability $1-\varepsilon$, where $\varepsilon + (1-\varepsilon) = 1$. It is worth noting that in this embodiment $\varepsilon$ is 10% and $1-\varepsilon$ is 90%, but the invention is not limited thereto; in other embodiments the target allocation action may also be selected purely according to the current state, or simply as the action corresponding to the highest action value.
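The selection rule of step 27 is ordinary epsilon-greedy exploration. A minimal sketch (the seeded random generator is only for reproducibility):

```python
import random

def select_target_action(q_values, epsilon=0.1, rng=None):
    """Epsilon-greedy selection as in step 27: with probability epsilon pick
    a random action index, otherwise the index with the highest action value."""
    rng = rng or random.Random(0)
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

With epsilon = 0.1 the agent explores a random allocation action roughly one step in ten and otherwise exploits the highest Q value, matching the 10%/90% split of the embodiment.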
In step 28, the base station 11 obtains, from the current subcarrier allocation results, the current allocated powers, and the target allocation action, a plurality of updated subcarrier allocation results respectively corresponding to the current subcarrier allocation results and a plurality of updated allocated powers respectively corresponding to the current allocated powers.

Referring also to Fig. 4, step 28 comprises sub-steps 281 through 289, described below.

In sub-step 281, the base station 11 determines whether the target allocation action is a subcarrier allocation action. If so, the flow proceeds to sub-step 282; if not, meaning the target allocation action is a power allocation action, the flow proceeds to sub-step 286.

In sub-step 282, the base station 11 obtains, from the target allocation action, N × K replacement subcarrier allocation results respectively corresponding to the current subcarrier allocation results.

In sub-step 283, the base station 11 determines whether the replacement subcarrier allocation results satisfy a plurality of subcarrier allocation conditions. If any one of the conditions is not satisfied, the flow proceeds to sub-step 284; if all are satisfied, the flow proceeds to sub-step 285.
It is worth noting that the subcarrier allocation conditions include:

$x_{n,k}(t+1) \in \{0, 1\}$, and

$\sum_{k=1}^{K} x_{n,k}(t+1) \le U$,

where $x_{n,k}(t+1)$ is the replacement subcarrier allocation result of the k-th UE 12 on the n-th subcarrier at the next time t+1: $x_{n,k}(t+1) = 1$ means the k-th UE 12 is allocated the n-th subcarrier at time t+1, $x_{n,k}(t+1) = 0$ means it is not, and U is the maximum number of UEs on each subcarrier, but the invention is not limited thereto.

In sub-step 284, the base station 11 takes the current subcarrier allocation results and the current allocated powers as the updated subcarrier allocation results and the updated allocated powers, respectively; that is, the subcarrier allocation and the allocated powers remain unchanged.

In sub-step 285, the base station 11 takes the replacement subcarrier allocation results as the updated subcarrier allocation results and obtains the updated allocated powers from the updated subcarrier allocation results and the channel state information.

It should be noted in particular that the way the base station 11 obtains the updated allocated powers in sub-step 285 is similar to the way the current allocated powers are obtained in sub-steps 241 and 242, and is therefore not repeated here.
In sub-step 286, the base station 11 applies the target allocation action to the current allocated powers to obtain a plurality of replacement allocated powers respectively corresponding to the current allocated powers.

It should be noted in particular that if the target allocation action is a power allocation action, whether selected according to the current state or as the action corresponding to the highest action value, the power coefficient to be increased or decreased must correspond to a current subcarrier allocation result equal to 1; that is, power is adjusted only on a subcarrier that the UE 12 in question is currently allocated.

In sub-step 287, the base station 11 determines whether the replacement allocated powers satisfy a plurality of power allocation conditions. If any one of the conditions is not satisfied, the flow proceeds to sub-step 288; if all are satisfied, the flow proceeds to sub-step 289.

It is worth noting that the power allocation conditions include:

$p_{n,k}(t+1) = x_{n,k}(t+1)\,\beta_{n,k}(t+1)\,P_t$,

$\beta_{n,k}(t+1) \in [0, 1]$,

if $x_{n,k}(t+1) = 1$ then $\beta_{n,k}(t+1) > 0$, and

if $x_{n,k}(t+1) = 0$ then $\beta_{n,k}(t+1) = 0$,

where $\beta_{n,k'}(t+1)$ is the coefficient of the replacement allocated power assigned at the next time t+1 to the UE 12 in the k'-th position of the ordering on the n-th subcarrier, and $\beta_{n,k}(t+1)$ is that of the UE 12 in the k-th position.

In sub-step 288, the base station 11 takes the current subcarrier allocation results and the current allocated powers as the updated subcarrier allocation results and the updated allocated powers, respectively; that is, the subcarrier allocation and the allocated powers remain unchanged.

In sub-step 289, the base station 11 takes the current subcarrier allocation results and the replacement allocated powers as the updated subcarrier allocation results and the updated allocated powers, respectively.
In step 29, the base station 11 determines whether an overload ratio, relating the number of UEs 12 currently in communication with the base station 11 to the number of subcarriers onto which the UE 12 signals are superimposed, is K/N. If the overload ratio is K/N, the flow proceeds to step 30; if it is not, the flow repeats step 21.

It should be noted in particular that the overload ratio is the number of UEs 12 currently in communication with the base station 11 divided by the number of subcarriers onto which the UE 12 signals are superimposed. In this embodiment neither the number nor the positions of the UEs communicating with the base station 11 are fixed, and the base station 11 adjusts its resource configuration scheme according to the number of UEs 12. Therefore, when uplink pilot estimation reveals that the overload ratio is not K/N, that is, the overload ratio has changed (denote the changed overload ratio by K'/N, with K' > 1 and K' ≠ K), the base station 11 stores historical reinforcement-learning-network information comprising the reinforcement learning networks for overload ratio K/N, and determines whether target historical reinforcement-learning-network information for overload ratio K'/N is already stored. If such target historical information is stored, it loads that historical reinforcement-learning-network information and proceeds to step 22; otherwise the flow returns to step 21, and the base station 11 initializes the reinforcement learning networks to serve as networks applicable to overload ratio K'/N.

In step 30, the base station 11 calculates a reward value from the current subcarrier allocation results, the current allocated powers, the updated subcarrier allocation results, and the updated allocated powers.

Referring also to Fig. 5, step 30 comprises sub-steps 301 through 303, described below.
In sub-step 301, the base station 11 calculates a first spectral efficiency $SE_1$ from the current subcarrier allocation results and the current allocated powers. The first spectral efficiency is expressed as:

$SE_1 = \dfrac{1}{N B} \sum_{n=1}^{N} \sum_{k=1}^{K} C_{n,k}(t)$,

$C_{n,k}(t) = x_{n,k}(t)\, B \log_2\!\big(1 + \gamma_{n,k}(t)\big)$,

$\gamma_{n,k}(t) = \dfrac{\beta_{n,k}(t)\, P_t\, |h_{n,k}(t)|^2}{|h_{n,k}(t)|^2\, P_t \Big( \sum_{j=1}^{k-1} \beta_{n,j}(t)\, x_{n,j}(t) + \epsilon \sum_{j=k+1}^{K} \beta_{n,j}(t)\, x_{n,j}(t) \Big) + \sigma^2}$,

where the first spectral efficiency is evaluated on the set of the current subcarrier allocation results and the current allocated powers, $C_{n,k}(t)$ is the channel capacity of the k-th UE 12 on the n-th subcarrier at the current time t, $B$ is the bandwidth of the n-th subcarrier, $\gamma_{n,k}(t)$ is the signal-to-interference-plus-noise ratio of the k-th UE 12 on the n-th subcarrier at time t, $\beta_{n,j}(t)$ is the coefficient of the current allocated power assigned at time t to the UE 12 in the j-th position of the ordering on the n-th subcarrier, $x_{n,j}(t)$ is the current subcarrier allocation result indicating whether the j-th UE 12 is allocated the n-th subcarrier at time t, $P_t$ is the total power allocated by the base station 11, $\epsilon$ is the SIC residual coefficient, and $\sigma^2$ is the additive white Gaussian noise (AWGN) power.

It should further be noted that, because a UE 12 not allocated a subcarrier receives no power on it, in step 30 the base station 11 may in practice calculate the reward value from the current allocated powers and the updated allocated powers alone, and the SINR of the k-th UE 12 on the n-th subcarrier at time t may also be expressed as:

$\gamma_{n,k}(t) = \dfrac{\beta_{n,k}(t)\, P_t\, |h_{n,k}(t)|^2}{|h_{n,k}(t)|^2\, P_t \Big( \sum_{j=1}^{k-1} \beta_{n,j}(t) + \epsilon \sum_{j=k+1}^{K} \beta_{n,j}(t) \Big) + \sigma^2}$.
In sub-step 302, the base station 11 calculates a second spectral efficiency $SE_2$ from the updated subcarrier allocation results and the updated allocated powers. The formula for the second spectral efficiency is the same as that for the first spectral efficiency and is therefore not repeated here.

In sub-step 303, the base station 11 calculates a reward value $r(t)$ from the first spectral efficiency $SE_1$ and the second spectral efficiency $SE_2$, where the action selected at the current time t is the target allocation action.

It is worth noting that in this embodiment the reward value is the second spectral efficiency minus the first spectral efficiency, i.e., $r(t) = SE_2 - SE_1$, but the invention is not limited thereto.
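A minimal sketch of the spectral-efficiency and reward computation of sub-steps 301 through 303, assuming the standard NOMA/SIC form (a sum of per-subcarrier channel capacities normalized by the total bandwidth, with full interference from stronger-channel positions earlier in the ordering and residual interference `eps_sic` from the cancelled later positions). The exact expressions appear in the patent's figures, so this is an illustration rather than the authoritative formula.

```python
import math

def spectral_efficiency(x, beta, h2, P_total, sigma2, eps_sic, B=1.0):
    """x, beta, h2: N x K lists (allocation results, power coefficients,
    channel strengths), with UEs on each subcarrier indexed in descending
    channel order. Returns sum capacity over total bandwidth N*B."""
    N = len(x)
    total = 0.0
    for n in range(N):
        K = len(x[n])
        for k in range(K):
            if not x[n][k]:
                continue  # unallocated UEs contribute no capacity
            signal = beta[n][k] * P_total * h2[n][k]
            intra = sum(beta[n][j] for j in range(k))             # not cancelled
            resid = eps_sic * sum(beta[n][j] for j in range(k + 1, K))
            interf = h2[n][k] * P_total * (intra + resid)
            total += B * math.log2(1.0 + signal / (interf + sigma2))
    return total / (N * B)

def reward(se_before, se_after):
    # Sub-step 303: reward = second spectral efficiency - first spectral efficiency.
    return se_after - se_before
```

A positive reward therefore means the target allocation action improved the spectral efficiency, which is exactly the signal the Q-learning agent is trained on.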
In step 31, the base station 11 generates and stores a training datum comprising the current subcarrier allocation results, the current allocated powers, the target allocation action, the reward value, the updated subcarrier allocation results, and the updated allocated powers.

In step 32, the base station 11 selects a plurality of target training data from the stored training data, trains the reinforcement learning networks on the target training data, and repeats step 25.

It is worth mentioning that before repeating step 25, the base station 11 first takes the updated subcarrier allocation results and the updated allocated powers obtained in step 28 as the current subcarrier allocation results and the current allocated powers, respectively, and then repeats step 25.

Referring also to Fig. 6, step 32 comprises sub-steps 321 through 323, described below.

In sub-step 321, the base station 11 selects the target training data from the stored training data.

It is worth noting that in this embodiment the base station 11 selects, for example, 32 target training data at random; early in the loop, before enough training data have been stored, some of the 32 target training data will be empty, but the invention is not limited thereto.

In sub-step 322, the base station 11 inputs the current subcarrier allocation results, the current allocated powers, and the target allocation actions of the target training data into the action reinforcement learning network, which outputs a plurality of training action values respectively corresponding to the target training data.

In sub-step 323, the base station 11 adjusts the reinforcement learning networks according to the target training data and the training action values.
It is worth noting that in this embodiment the base station 11 uses the loss function to obtain a loss value from the reward values of the target training data and the training action values, and uses the learning algorithm to update the reinforcement learning networks according to the loss value, thereby adjusting them. That is, for each target training datum, the base station 11 inputs the current subcarrier allocation result, the current allocated power, and the target allocation action of the datum into the update network, which outputs $Q(s_i, a_i)$; it then inputs the reward value, the updated subcarrier allocation result, and the updated allocated power of the datum into the target network, which outputs $r_i + \gamma \max_{a} Q'(s'_i, a)$; and the mean-square error between the two is taken as the loss value, where $\gamma$ is the discount factor weighing the importance of immediate against subsequent rewards, $Q(s_i, a_i)$ is the training action value corresponding to the datum, and $\max_{a} Q'(s'_i, a)$ is the maximum action value obtainable for the datum's updated subcarrier allocation results and updated allocated-power set over all subcarrier allocation actions and power allocation actions. The parameters of the update network are then updated with the adaptive moment estimation method according to the loss values of the target training data. After a number of updates, for example 32, the parameters of the update network are copied to the target network to update the target network's parameters, but the invention is not limited thereto; in other embodiments that have only the update network, copying its parameters to a target network is unnecessary.
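The loss described here is the usual DQN target construction. A minimal batched sketch (the array shapes are an assumption for illustration):

```python
import numpy as np

def dqn_targets(rewards, next_max_q, gamma=0.9):
    # Target-network side of the update: r_i + gamma * max_a Q'(s'_i, a).
    return rewards + gamma * next_max_q

def dqn_loss(update_q, rewards, next_max_q, gamma=0.9):
    # Mean-square error between the update-network outputs Q(s_i, a_i)
    # and the targets produced by the target network.
    t = dqn_targets(rewards, next_max_q, gamma)
    return float(np.mean((update_q - t) ** 2))
```

The gradient of this loss with respect to the update network's parameters is what Adam would then step on, with the target network held fixed between parameter copies.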
It should be noted in particular that in other embodiments in which the reinforcement learning networks comprise, for example, the look-up table, the table holds a plurality of table action values, each corresponding to a subcarrier allocation result, an allocated-power result, and an allocation action. In step 32 the base station 11 updates the look-up table according to the target training data so as to train the reinforcement learning networks. In detail, the base station 11 updates the look-up table according to the following formula:

$Q_{m+1}(s_i, a_i) = Q_m(s_i, a_i) + \alpha \big( r_i + \gamma \max_{a} Q_m(s'_i, a) - Q_m(s_i, a_i) \big)$,

where $s_i$ denotes the subcarrier allocation set and allocated-power set of the i-th target training datum, $a_i$ denotes the target allocation action of the i-th target training datum, $r_i$ denotes the reward value of the i-th target training datum, $s'_i$ denotes its updated subcarrier allocation results and updated allocated-power set, $m$ denotes the number of updates, $Q_m(s_i, a_i)$ is the target table action value in the look-up table corresponding to the subcarrier allocation result, allocated power, and target allocation action of the i-th target training datum, $Q_{m+1}(s_i, a_i)$ is its updated value, $\alpha$ is the learning rate of the update, and $\max_{a} Q_m(s'_i, a)$ is the maximum table action value obtainable for the updated subcarrier allocation results and updated allocated-power set of the i-th target training datum over all subcarrier allocation actions and power allocation actions; $\max_{a} Q_m(s'_i, a)$ is computed by the target network among the reinforcement learning networks, and $Q_m(s_i, a_i)$ by the update network. Because PDMA involves many subcarrier allocation actions and power allocation actions, a Q table would require considerable storage space, so this embodiment uses a Q network containing one hidden layer to fit the Q table: the input of the Q network corresponds to the state matrix in the Q table, and the output of the Q network corresponds to the Q value of that state in the Q table. Since the number of parameters in the Q network is far smaller than the number of Q values in the Q table, storage space at the base station is saved.
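The tabular update in this variant can be sketched directly. A dictionary stands in for the Q table, and unseen entries defaulting to zero is an assumption about initialization.

```python
def q_table_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One update of the look-up-table variant:
    Q_{m+1}(s, a) = Q_m(s, a) + alpha * (r + gamma * max_a' Q_m(s', a') - Q_m(s, a)).
    Q is a dict keyed by (state, action); missing entries default to 0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Repeated updates on the same transition move the stored value toward the bootstrapped target, which is the behavior the Q network approximates when the table is too large to store.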
In step 33, the base station 11 calculates a candidate spectral efficiency from the current subcarrier allocation results and the current allocated powers, stores the current subcarrier allocation results, the current allocated powers, and the candidate spectral efficiency, and repeats step 22. It is worth noting that in this embodiment the loop counter is incremented by 1 each time step 33 is performed, but the invention is not limited thereto; in other embodiments the loop counter may instead be incremented in step 23 or step 24.

In step 34, the base station 11 obtains the highest target spectral efficiency from among the candidate spectral efficiencies, clears the loop counter, and repeats step 22. The subcarrier allocation results and allocated powers corresponding to the target spectral efficiency are the best subcarrier allocation results and the best allocated powers.

In summary, the present invention is a resource allocation method for a downlink pattern division multiple access system based on an artificial-intelligence algorithm. The base station 11 uses the reinforcement learning networks to record and learn across different scenarios so as to obtain the best allocation action, namely the one with the largest reward value; it further obtains the candidate spectral efficiencies and selects the highest among them as the target spectral efficiency, whose corresponding subcarrier allocation and power allocation are optimal. The object of the present invention is thus indeed achieved.

The foregoing, however, is merely an embodiment of the present invention and shall not limit the scope of its implementation; all simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by this patent.
11: base station
12: user equipment (UE)
100: wireless channel
21-34: steps
241, 242: sub-steps
281-289: sub-steps
301-303: sub-steps
321-323: sub-steps
Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which:

Fig. 1 is a block diagram illustrating a base station for implementing an embodiment of the resource allocation method of the present invention for a downlink pattern division multiple access system based on an artificial-intelligence algorithm;

Fig. 2 is a flow chart illustrating the embodiment of the method;

Fig. 3 is a flow chart illustrating the sub-steps of step 24 of Fig. 2;

Fig. 4 is a flow chart illustrating the sub-steps of step 28 of Fig. 2;

Fig. 5 is a flow chart illustrating the sub-steps of step 30 of Fig. 2; and

Fig. 6 is a flow chart illustrating the sub-steps of step 32 of Fig. 2.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW111128417A TWI812371B (en) | 2022-07-28 | 2022-07-28 | Resource allocation method in downlink pattern division multiple access system based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW111128417A TWI812371B (en) | 2022-07-28 | 2022-07-28 | Resource allocation method in downlink pattern division multiple access system based on artificial intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI812371B TWI812371B (en) | 2023-08-11 |
TW202406390A true TW202406390A (en) | 2024-02-01 |
Family
ID=88585914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW111128417A TWI812371B (en) | 2022-07-28 | 2022-07-28 | Resource allocation method in downlink pattern division multiple access system based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI812371B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109246754B (en) * | 2017-01-04 | 2019-10-22 | 华为技术有限公司 | A kind of communication means and its terminal device, the network equipment |
CN111200572B (en) * | 2018-11-19 | 2021-10-22 | 华为技术有限公司 | Data transmission method and device |
WO2020242898A1 (en) * | 2019-05-26 | 2020-12-03 | Genghiscomm Holdings, LLC | Non-orthogonal multiple access |
CN111212438B (en) * | 2020-02-24 | 2021-07-16 | 西北工业大学 | Resource allocation method of wireless energy-carrying communication technology |
CN113923767B (en) * | 2021-09-23 | 2023-10-13 | 怀化建南电子科技有限公司 | Energy efficiency maximization method for multi-carrier cooperation non-orthogonal multiple access system |
- 2022-07-28: Application TW111128417A filed; granted as TWI812371B (en), active
Also Published As
Publication number | Publication date |
---|---|
TWI812371B (en) | 2023-08-11 |