KR20230097696A

KR20230097696A - Rate-splitting multiple access method for the base station with limited energy supply to maximize the sum rate based on deep reinforcement learning

Info

Publication number: KR20230097696A
Application number: KR1020210187442A
Authority: KR
Inventors: 신원재; 성재협
Original assignee: 아주대학교산학협력단
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2023-07-03

Abstract

According to an embodiment of the present invention, a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate includes: determining a first transmit power based on information of the base station in a first state; splitting data to be transmitted to a plurality of terminals into common and private messages, and encoding the common and private messages into common and private streams, respectively; performing power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power; transmitting the data to the plurality of terminals and receiving a sum rate; and determining a second transmit power based on the received sum rate and information of the base station in a second state. Accordingly, in an embodiment of the present invention, the base station determines the transmit power using deep reinforcement learning, based on the sum rate received from the plurality of terminals and the base station state information, and then uses an optimization technique to perform power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power. As a result, a base station with a limited energy supply can use the supplied energy efficiently in a manner suited to changing external energy and communication channel conditions.

Description

Rate-splitting multiple access method for the base station with limited energy supply to maximize the sum rate based on deep reinforcement learning

The present invention relates to a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, and more specifically to a rate-splitting multiple access method for maximizing the sum rate in a communication environment with a limited energy supply, in which power allocation and beamforming design at the transmitter are optimized using deep reinforcement learning and an optimization technique.

Since the advent of LTE-based 4G mobile communication, multiple-input multiple-output (MIMO) technology using multiple antennas has become an essential core technology for mobile communication. MIMO has recently evolved into forms such as massive MIMO, which in theory considers even an unbounded number of antennas. In wireless LANs, MIMO has spread since 802.11n, and all recently released wireless LAN products adopt it by default. Multi-user MIMO (MU-MIMO), which combines beamforming with spatial multiplexing, has also emerged.

Multi-user MIMO lets a single base station use multiple antennas to serve multiple users simultaneously in the same frequency band, which increases the efficiency of the radio band. Despite this advantage, serving multiple users from one base station with multiple antennas gives rise to inter-user interference. Moreover, because the channel, the medium that carries the information, changes rapidly in wireless communication, it is practically impossible to obtain accurate channel information for every user, so signal interference between served users is unavoidable.

Rate-splitting multiple access (RSMA) is a non-orthogonal multi-user access technology that offers excellent performance and robustness in multi-user, multi-antenna communication environments and can support multiple users under a variety of channel conditions. Compared with existing multiple access technologies, RSMA has strengths in energy efficiency, spectral efficiency, and robustness to inaccurate channel state information. So far, however, RSMA has been developed to provide the maximum sum rate against changing channel state information in settings where the base station has no limit on transmit power and can serve users at a constant transmit power.

In 6G, the next-generation mobile communication technology that uses a high-frequency band (mmWave) with strong straight-line propagation, mobile base stations built on drone-type base station platforms, that is, unmanned aerial vehicle systems, are becoming increasingly important. Because the high-frequency band offers a wider bandwidth than existing bands, it is well suited to transmitting large volumes of data. However, its straight-line propagation makes it relatively vulnerable to effects such as radio shadowing. To address this, small cells, which increase the number of base stations and shorten the distance between them, are growing in importance. A mobile base station has limited transmit power and is therefore supplied with energy from external sources such as solar energy and RF signals, so it must use that energy efficiently and in a manner suited to the communication channel conditions in order to provide reliable service to users. It is difficult, however, to use the supplied energy efficiently as the external energy and channel conditions change. Accordingly, there is a need for a rate-splitting multiple access method, for a base station with a limited energy supply to maximize the sum rate, that can use the supplied energy efficiently in a manner suited to changing external energy and communication channel conditions.

The present invention is intended to solve the problems of the prior art described above, and provides a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate while using the supplied energy efficiently in a manner suited to changing external energy and communication channel conditions.

According to a first aspect of the present invention, a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate includes: determining a first transmit power based on information of the base station in a first state; splitting data to be transmitted to a plurality of terminals into common and private messages, and encoding the common and private messages into common and private streams, respectively; performing power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power; transmitting the data to the plurality of terminals and receiving a sum rate; and determining a second transmit power based on the received sum rate and information of the base station in a second state.

According to a second aspect of the present invention, a base station that uses rate-splitting multiple access to maximize the sum rate in a communication environment with a limited energy supply includes: a transceiver that transmits and receives signals to and from a plurality of terminals; a processor that is connected to the transceiver and controls the operation of the base station; and a memory. The processor performs the operations of: determining a first transmit power based on information of the base station in a first state; splitting data to be transmitted to the plurality of terminals into common and private messages, and encoding the common and private messages into common and private streams, respectively; performing power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power; transmitting the data to the plurality of terminals and receiving a sum rate; and determining a second transmit power based on the received sum rate and information of the base station in a second state.

According to a third aspect of the present invention, a recording medium readable by a digital processing device tangibly embodies a program of instructions executable by the digital processing device to provide rate-splitting multiple access for maximizing the sum rate in a communication environment with a limited energy supply, the recording medium having recorded thereon a program for causing a computer to execute the rate-splitting multiple access method according to the first aspect of the present invention.

The rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, according to an embodiment of the present invention, provides the following effects.

The base station determines the transmit power using deep reinforcement learning, based on the sum rate received from the plurality of terminals and the base station state information, and then uses an optimization technique to perform power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power. As a result, a base station with a limited energy supply can use the supplied energy efficiently in a manner suited to changing external energy and communication channel conditions.

This effect is an even greater advantage when the base station is a mobile base station whose energy supply is inevitably limited. Such a mobile base station links cells to one another, making network connectivity easier to secure, and can extend service to areas where communication has been limited because no base station exists.

By using rate-splitting multiple access, reliable service can be provided to users in the same frequency band in a multi-user MIMO environment even when the users' channel state information is not accurately known.

FIG. 1 is a flowchart of a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate based on deep reinforcement learning, according to an embodiment of the present invention.
FIG. 2 is a block diagram of a base station that uses rate-splitting multiple access to maximize the sum rate in a communication environment with a limited energy supply, according to an embodiment of the present invention.
FIG. 3 is a diagram schematically illustrating the sequence of operations in which the rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate is performed, according to an embodiment of the present invention.
FIG. 4 is a graph comparing the performance of multiple access technologies, including rate-splitting multiple access, according to an embodiment of the present invention.
FIG. 5 is a graph comparing the performance of multiple access technologies, including rate-splitting multiple access using deep reinforcement learning, in a communication environment with a limited energy supply, according to an embodiment of the present invention.

Hereinafter, the present invention is described in detail with reference to embodiments and drawings. The following description is not intended to limit the present invention to specific embodiments, and detailed descriptions of related known technologies are omitted where they are judged likely to obscure the gist of the present invention.

In embodiments of the present invention, a base station generally refers to a fixed station that communicates with wireless devices, but it may also be mobile, and it may be referred to by other terms such as evolved NodeB (eNB) or base transceiver system (BTS).

In addition, in embodiments of the present invention, a terminal may be fixed or mobile, and may be referred to by other terms such as user equipment (UE), mobile station (MS), subscriber station (SS), mobile subscriber station (MSS), mobile terminal, or advanced mobile station (AMS).

FIG. 1 is a flowchart of a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, according to an embodiment of the present invention, and FIG. 2 is a block diagram of a base station that uses rate-splitting multiple access to maximize the sum rate in a communication environment with a limited energy supply, according to an embodiment of the present invention.

Referring to FIG. 1, a rate-splitting multiple access method 100 by which a base station 200 with a limited energy supply maximizes the sum rate, according to an embodiment of the present invention, includes: determining a first transmit power based on information of the base station in a first state (110); splitting data to be transmitted to a plurality of terminals 300 into common and private messages, and encoding the common and private messages into common and private streams, respectively (120); performing power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power (130); transmitting the data to the plurality of terminals 300 and receiving a sum rate (140); and determining a second transmit power based on the received sum rate and information of the base station in a second state (150).

Referring to FIG. 2, a base station 200 that uses rate-splitting multiple access to maximize the sum rate based on deep reinforcement learning in a communication environment with a limited energy supply, according to an embodiment of the present invention, includes: a transceiver 210 that transmits and receives signals to and from a plurality of terminals 300; a processor 220 that is connected to the transceiver 210 and controls the operation of the base station 200; and a memory 230. The processor 220 performs: an operation 110 of determining a first transmit power based on information of the base station in a first state; an operation 120 of splitting data to be transmitted to the plurality of terminals 300 into common and private messages, and encoding the common and private messages into common and private streams, respectively; an operation 130 of performing power allocation and beamforming design for each of the encoded common and private streams within the determined transmit power; an operation 140 of transmitting the data to the plurality of terminals 300 and receiving a sum rate; and an operation 150 of determining a second transmit power based on the received sum rate and information of the base station in a second state.

The base station 200 determines the transmit power based on base station state information. In an embodiment of the present invention, the state information of the base station may include the amount of energy harvested by the base station, the amount of energy stored in the base station, and channel information fed back from the plurality of communicating terminals 300.

The transmit power is determined by the processor 220 of the base station 200, which receives the state information of the base station through the transceiver 210 and the memory 230 and executes a deep reinforcement learning algorithm.

In an embodiment of the present invention, the deep reinforcement learning algorithm is a machine-learning-based model and may be a reinforcement-learning-based model. A reinforcement-learning-based model can be updated continuously or periodically as it learns, without administrator intervention.

A reinforcement-learning-based model may be one in which an agent takes an action in an environment, receives a reward and a state in return, and is trained so that the reward is maximized. Depending on whether the agent has prior knowledge of the environment, reinforcement-learning-based models can be divided into models that use a model-based algorithm and models that use a model-free algorithm.

Deep reinforcement learning is an algorithm that uses a deep neural network so that reinforcement learning as described above can solve problems in continuous, complex state and action spaces such as those of real environments. In an embodiment of the present invention, the agent is the base station; the environment is the rate-splitting multiple access communication environment with a limited energy supply; the action is the transmission from the base station to the users; the reward is the sum rate; and the state consists of the amount of harvested energy, the amount of energy stored in the base station, and the channel information fed back from the plurality of communicating terminals.
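The agent, state, action, and reward mapping described above can be sketched as a single environment step. This is only an illustration under stated assumptions: the class `BaseStationState`, the function `step`, the unit-length time slot, the battery update rule, and the `capacity` parameter are all hypothetical and do not appear in the patent, which does not specify the energy dynamics.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class BaseStationState:
    """State observed by the agent (the base station)."""
    harvested_energy: float                 # amount of energy harvested in this step
    stored_energy: float                    # amount of energy stored in the base station
    channel_feedback: Tuple[complex, ...]   # channel information fed back by the terminals


def step(
    state: BaseStationState,
    requested_power: float,
    next_harvest: float,
    next_channel: Tuple[complex, ...],
    sum_rate_fn: Callable[[float, Tuple[complex, ...]], float],
    capacity: float = 10.0,
) -> Tuple[BaseStationState, float, float]:
    """Apply one action and return (next_state, reward, power_actually_used).

    Assumption (not from the source): each slot has unit length, so the
    energy spent equals the transmit power, and the battery is refilled
    by the energy harvested in the same slot, up to `capacity`.
    """
    # The base station cannot spend more energy than it has stored.
    power = min(max(requested_power, 0.0), state.stored_energy)
    # The reward is the sum rate reported for this transmission.
    reward = sum_rate_fn(power, state.channel_feedback)
    stored = min(state.stored_energy - power + state.harvested_energy, capacity)
    next_state = BaseStationState(next_harvest, stored, next_channel)
    return next_state, reward, power
```

Here `sum_rate_fn` stands in for the whole transmit-and-feedback chain (message splitting, precoding, transmission, and the sum rate returned by the terminals); any callable with that signature can be plugged in for experimentation.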

In an embodiment of the present invention, the deep reinforcement learning algorithm used to determine the transmit power may be the soft actor-critic (SAC) algorithm.

The base station 200 uses rate-splitting multiple access to split the data to be transmitted to the plurality of terminals 300 into two kinds of messages: a common message and private messages. The common message is encoded into a single common stream using a codebook shared by all users, so every user can decode it. Each private message is encoded into a private stream using a codebook held only by the specific user to whom the message belongs, and can be decoded only by that user.

Each user decodes the common stream and obtains the portion of the common message that belongs to it. The user then removes the common stream from the received signal by successive interference cancellation (SIC). SIC is used at the receiver to handle multiple signals that are received at the same time, and it exploits the differences in signal strength among them. That is, the receiver first decodes the stronger signal, subtracts it from the superimposed signal, and then decodes the weaker signal from what remains.

Once the common stream has been removed, the only interference a user still sees comes from the private streams intended for other users. The user treats this interference as noise and decodes its own private stream. By decoding the common stream and the private stream with SIC in this way, each user recovers the original message intended for it.
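The decoding order described above determines the rate each stream can carry. The sketch below computes a sum rate under the commonly used one-layer rate-splitting signal model, which is an assumption here because the patent text gives no explicit rate formulas: the common stream is decoded first with all private streams treated as noise, and after SIC each private stream sees only the other users' private streams as interference. The function names, the precoder arguments (beam direction already scaled by the square root of the allocated power), and `noise_power` are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[complex]


def inner(h: Vector, p: Vector) -> complex:
    """Hermitian inner product h^H p."""
    return sum(hc.conjugate() * pc for hc, pc in zip(h, p))


def rsma_sum_rate(
    channels: Sequence[Vector],
    common_precoder: Vector,
    private_precoders: Sequence[Vector],
    noise_power: float = 1.0,
) -> float:
    """Sum rate in bit/s/Hz: common-stream rate plus all private-stream rates."""
    common_rates = []
    private_rates = []
    for k, h in enumerate(channels):
        private_gains = [abs(inner(h, p)) ** 2 for p in private_precoders]
        common_gain = abs(inner(h, common_precoder)) ** 2

        # Step 1: decode the common stream, treating every private stream as noise.
        sinr_common = common_gain / (sum(private_gains) + noise_power)
        common_rates.append(math.log2(1.0 + sinr_common))

        # Step 2: after SIC removes the common stream, only the other users'
        # private streams remain as interference.
        interference = sum(private_gains) - private_gains[k]
        sinr_private = private_gains[k] / (interference + noise_power)
        private_rates.append(math.log2(1.0 + sinr_private))

    # Every user must be able to decode the common stream, so its rate is
    # limited by the user with the weakest common-stream SINR.
    return min(common_rates) + sum(private_rates)
```

With two users on orthogonal unit channels, unit-power private beams aligned to each user, and a common beam of `(1, 1)`, each private stream carries 1 bit/s/Hz and the common stream carries `log2(1.5)` bit/s/Hz.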

Therefore, when the base station does not accurately know the channel state information of the users to be served (for example, because of quantization error or changes in the channel state), it can use the common message to multicast part of the messages destined for the users, thereby minimizing signal interference between adjacent users.

To transmit the split messages to multiple users, the base station uses a precoder to perform power allocation and beamforming design for the single common stream, and likewise performs power allocation and beamforming design for each private stream. In this case, the power allocation and beamforming design of the common stream and the power allocation of each private stream constitute a non-convex problem.

A convex problem has a single global minimum (or a single global maximum), so that value is relatively easy to compute. A non-convex problem, such as one defined by a cubic expression, additionally has local minima and local maxima, which makes the global minimum or maximum relatively hard to compute. For example, if a non-convex problem is solved with a method such as gradient descent, a representative method for solving convex problems, the iteration may settle at a local minimum or maximum rather than converging to the global one, and thus fail to produce the minimum or maximum that was sought.
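The behaviour described above is easy to reproduce. The following sketch runs plain gradient descent on a non-convex quartic chosen purely for illustration (it is not the patent's objective): depending on the starting point, the iteration ends in either the global minimum or a shallower local minimum.

```python
def f(x: float) -> float:
    """Non-convex test function with two minima (illustrative only)."""
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x: float) -> float:
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_from_right = gradient_descent(2.0)    # settles in the local minimum with x > 0
x_from_left = gradient_descent(-2.0)    # settles in the global minimum with x < 0

# Both points are stationary, but only one is the global minimum.
print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

Both runs stop at a point where the gradient vanishes, yet the objective value reached from the right-hand start is worse, which is why a dedicated procedure is used for the non-convex power allocation and beamforming problem.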

Accordingly, in an embodiment of the present invention, the power allocation and beamforming design for the encoded common and private streams may proceed as follows: the beamforming design of the private streams is performed first using the zero-forcing technique, and the power allocation of the private streams, the power allocation of the common stream, and the beamforming design of the common stream are then performed using an optimization algorithm.
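As a minimal sketch of the zero-forcing step, the example below handles the smallest case of two users and two transmit antennas, where the zero-forcing beams are the normalized columns of the inverse channel matrix. The function name and the convention that row k of `H` is the effective channel seen by user k (received signal = row times transmit vector) are assumptions made for illustration. For more users or antennas, the pseudo-inverse of the channel matrix takes the place of the inverse.

```python
import math
from typing import List, Sequence


def zero_forcing_2x2(H: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Return unit-norm zero-forcing beams for a 2-user, 2-antenna channel.

    H[k] is the effective channel row of user k. The returned beams[j] is the
    beam for user j and satisfies sum_n H[k][n] * beams[j][n] == 0 for k != j.
    """
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is singular; zero-forcing is not possible")

    # Inverse of the 2x2 channel matrix; column j nulls the signal at the other user.
    inverse = [[d / det, -b / det], [-c / det, a / det]]

    beams = []
    for j in range(2):
        column = [inverse[0][j], inverse[1][j]]
        norm = math.sqrt(sum(abs(x) ** 2 for x in column))
        # Normalize so that the power is set separately by the power allocation.
        beams.append([x / norm for x in column])
    return beams
```

Keeping the beams at unit norm separates the direction of each private stream from its power, which matches the description that power allocation is handled afterwards by the optimization algorithm.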

In an embodiment of the present invention, the optimization algorithm used for the power allocation of the private streams, the power allocation of the common stream, and the beamforming design of the common stream may be the sequential least squares programming (SLSQP) algorithm.

In an embodiment of the present invention, the SLSQP algorithm simplifies the power allocation of the private streams, the power allocation of the common stream, and the beamforming design of the common stream by approximating the objective function with a convex quadratic, solving that approximation, predicting the next point, and applying the same procedure again. By repeating this sequence, it can optimally solve the non-convex, nonlinear problem.

In an embodiment of the present invention, for the power allocation of the private and common streams, the SLSQP algorithm operates under the constraint that the base station cannot allocate more power than the transmit power determined by deep reinforcement learning, and under the bound that the power allocation ratios of all messages to be transmitted must sum to at most 1. For the beamforming design of the common stream, it operates under the condition that the squared L2 norm of the normalized beamforming vector of the common stream must be at most 1.
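One way to write the constrained problem described above is given below. The notation is an assumption introduced for illustration and is not taken from the patent: $P_t$ is the transmit power chosen by deep reinforcement learning, $\alpha_c$ and $\alpha_k$ are the power allocation ratios of the common stream and of the private stream of user $k$, $\mathbf{w}_c$ is the normalized common beamforming vector, $\mathbf{w}_k$ are the fixed zero-forcing private beams, $\mathbf{h}_k$ is the fed-back channel of user $k$ (step index omitted), and $\sigma^2$ is the noise power. The objective uses the common one-layer rate-splitting sum-rate model, which the patent does not state explicitly.

```latex
\begin{aligned}
\max_{\alpha_c,\,\alpha_1,\dots,\alpha_K,\,\mathbf{w}_c}\quad
  & \min_{k}\,\log_2\!\left(1+\frac{\alpha_c P_t\,|\mathbf{h}_k^{H}\mathbf{w}_c|^2}
      {\sum_{j=1}^{K}\alpha_j P_t\,|\mathbf{h}_k^{H}\mathbf{w}_j|^2+\sigma^2}\right)
    +\sum_{k=1}^{K}\log_2\!\left(1+\frac{\alpha_k P_t\,|\mathbf{h}_k^{H}\mathbf{w}_k|^2}
      {\sum_{j\neq k}\alpha_j P_t\,|\mathbf{h}_k^{H}\mathbf{w}_j|^2+\sigma^2}\right)\\
\text{s.t.}\quad
  & \alpha_c P_t+\sum_{k=1}^{K}\alpha_k P_t\le P_t,\\
  & \alpha_c+\sum_{k=1}^{K}\alpha_k\le 1,\qquad 0\le\alpha_c,\ \alpha_k\le 1,\\
  & \|\mathbf{w}_c\|_2^2\le 1.
\end{aligned}
```

For $P_t > 0$ the first two constraints are equivalent. They are listed separately only to mirror the wording of the description, which states the transmit-power limit and the ratio bound as distinct conditions.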

FIG. 3 is a diagram schematically illustrating the sequence of operations in which the rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate is performed, according to an embodiment of the present invention.

Referring again to FIG. 2 together with FIG. 3, the base station 200 includes a transceiver 210, a processor 220, and a memory 230. The processor 220 may be configured to implement the procedures and/or methods proposed in the present invention. The memory 230 is connected to the processor 220 and stores various information related to the operation of the processor 220. The transceiver 210 is connected to the processor 220 and communicates with the plurality of terminals 300 to transmit and/or receive radio signals.

In FIG. 3, B_i is the amount of energy stored in the base station at the i-th step, E_i is the amount of energy harvested at the i-th step, h_i is the channel state information that the plurality of terminals feed back to the base station at the i-th step, H_i is the channel state information that the base station receives as feedback from the plurality of terminals at the i-th step, and e_i is the channel state information loss occurring at the i-th step.

In an embodiment of the present invention, the base station 200 may transmit data to the plurality of terminals 300 and receive the sum rate 10. The base station 200 may then determine the transmit power again based on the received sum rate 10 and the base station information 20 of the next state.

Referring to FIG. 3, the base station information 20 of the next state includes the amount of energy stored in the base station at the i-th step (B_i), the amount of energy harvested at the i-th step (E_i), and the channel state information received as feedback at the i-th step (H_i).

A deep reinforcement learning algorithm may be used to determine the transmit power, and in an embodiment of the present invention, the deep reinforcement learning algorithm may be the soft actor-critic (SAC) algorithm 30.

The base station 200 determines the transmit power to use based on the amount of energy stored in the base station, the amount of harvested energy, and the channel state information received as feedback. The base station 200 then receives the sum rate 10 from the communication environment (that is, the plurality of terminals 300) as a reward, which serves as an evaluation of the action it took, and at the same time receives the base station information 20 of the next state. Using these, the agent repeats the learning process in the direction that yields a larger reward, and can thereby determine the optimal transmit power.
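The interaction described above (state, transmit power as the action, sum rate as the reward, next state) can be sketched as a generic loop. This is a minimal sketch under stated assumptions: `BaseStationState`, `run_episode`, and the `policy` and `environment` callables are hypothetical names introduced here, and the loop stands in for any learning agent rather than reproducing the SAC algorithm itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BaseStationState:
    stored_energy: float             # B_i: energy stored in the base station
    harvested_energy: float          # E_i: energy harvested at this step
    csi_feedback: Tuple[float, ...]  # H_i: fed-back channel state (placeholder)


# policy: state -> transmit power (the action)
Policy = Callable[[BaseStationState], float]
# environment: (state, power) -> (sum rate as reward, next state)
Environment = Callable[[BaseStationState, float], Tuple[float, BaseStationState]]


def run_episode(
    policy: Policy,
    environment: Environment,
    initial_state: BaseStationState,
    num_steps: int,
) -> List[Tuple[BaseStationState, float, float, BaseStationState]]:
    """Collect (state, power, sum_rate, next_state) transitions."""
    state = initial_state
    transitions = []
    for _ in range(num_steps):
        power = policy(state)                             # choose transmit power
        sum_rate, next_state = environment(state, power)  # reward and next state
        transitions.append((state, power, sum_rate, next_state))
        state = next_state
    return transitions
```

A learning agent would use the collected transitions to update its policy so that later actions earn a larger sum rate.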

In this case, the amount of energy stored in the base station must satisfy the following condition, where b_1 and b_2 are the amounts of energy stored in the base station at two consecutive steps, b_max is the maximum amount of energy the base station can store, p is the transmit power, T is the transmission time, and E is the amount of harvested energy.

b_2 = m(b_1, p, E) = min(b_max, b_1 - pT + E)

Accordingly, in an embodiment of the present invention, the amount of energy stored in the base station in the second state may be determined as the smaller of (a) the maximum amount of energy the base station can hold, and (b) the amount of energy obtained by subtracting the product of the transmit power determined in the first state and the data transmission time from the amount of energy stored in the base station in the first state, and then adding the amount of energy harvested by the base station in the second state.
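The battery update rule above can be written as a one-line function. The function and parameter names are chosen here for illustration only; the arithmetic follows the rule as stated (the smaller of the capacity and the stored energy minus the energy spent plus the energy harvested).

```python
def next_stored_energy(
    stored: float,
    transmit_power: float,
    transmission_time: float,
    harvested: float,
    capacity: float,
) -> float:
    """Energy stored in the next state, capped at the battery capacity."""
    # Energy spent on transmission is power multiplied by time.
    spent = transmit_power * transmission_time
    return min(capacity, stored - spent + harvested)
```

For example, with 5 units stored, a transmit power of 2 for 1 time unit, 4 units harvested, and a capacity of 10, the next stored energy is 7; if 8 units are stored, 1 is spent, and 6 are harvested, the result is capped at 10.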

The transmit power 40 determined again through the soft actor-critic (SAC) algorithm 30, a deep reinforcement learning algorithm, is used at the i-th step for the power allocation and beamforming design for each of the encoded shared and private streams. The SLSQP (sequential least squares programming) algorithm 50, an optimization technique, is used to perform this design, and at the i-th step the data is transmitted to the plurality of terminals 300, that is, the multiple users, using the resulting optimal precoder.

The base station 200 then receives a new sum rate from the plurality of terminals 300 as a reward, which serves as an evaluation of the action it took, and the rate-splitting multiple access method for maximizing the sum rate may be performed repeatedly as described above.

In an embodiment of the present invention, the base station in a communication environment with limited energy supply may be a mobile base station. A drone-type base station platform acts as a link between cells, which makes it easier to secure network connectivity, and makes it possible to support service even in areas where communication is restricted because no base station exists. In addition, raising the altitude of the base station secures line of sight as far as possible, which compensates for the vulnerability of high-frequency band communication to shadowing and similar effects, and allows continuous service even when existing base stations have been destroyed, for example by a natural disaster.

Because the mobile base station has limited transmit power and is supplied with energy from external sources such as solar energy and RF signals, it must use that energy efficiently and in a manner suited to the communication channel state in order to provide reliable service to users. The rate-splitting multiple access method for maximizing the sum rate according to the present invention makes it possible to use the supplied energy efficiently in a manner suited to the changing external energy and communication channel state.

The following describes in detail the experimental procedure and results for 1) a performance comparison between rate-splitting multiple access and other multiple access techniques, and 2) a performance comparison between rate-splitting multiple access using deep reinforcement learning and other multiple access techniques in a communication environment with limited energy supply.

FIG. 4 is a graph comparing the performance of multiple access techniques, including rate-splitting multiple access, according to an embodiment of the present invention, and FIG. 5 is a graph comparing the performance of multiple access techniques, including rate-splitting multiple access using deep reinforcement learning, in a communication environment with limited energy supply according to an embodiment of the present invention.

The simulation environment for the experiments is a communication environment in which a mobile base station, through energy harvesting, is supplied with a probability of 50% with external energy equal to the chargeable capacity of the battery, and then provides service to multiple users. In this environment the users have perfect knowledge of the channel state information, but channel state information is lost in the process of feeding it back to the base station because of quantization errors, abrupt changes in the channel state, and the like. As a result, the base station receives imperfect channel state information as feedback.
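The harvesting model in this environment can be sketched as a two-outcome random draw. This is only an illustration: the function name `sample_harvest` is hypothetical, and the harvested amount is left as a parameter because the text does not spell out whether the chargeable capacity means the full battery capacity or the remaining headroom.

```python
import random


def sample_harvest(
    rng: random.Random,
    amount: float,
    probability: float = 0.5,
) -> float:
    """Return `amount` with the given probability, otherwise 0.0."""
    # rng.random() is uniform on [0, 1), so the comparison succeeds
    # with the requested probability.
    return amount if rng.random() < probability else 0.0
```

Passing a seeded `random.Random` instance keeps a simulated training or validation run reproducible.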

In the graphs of FIGS. 4 and 5, the x-axis is the number of learning iterations, where 100 steps were set as 1 iteration in the simulation. The y-axis is a relative measure of performance, obtained by evaluating the trained agent on a fixed validation environment every 1000 steps. Here, a step means one interaction between the agent and the environment.

The validation environment was configured identically to the training environment. As in training, the mobile base station is supplied through energy harvesting, with a probability of 50%, with external energy equal to the chargeable capacity of the battery. It then determines the optimal transmit power from a state composed, likewise, of the amount of harvested energy, the amount of energy remaining in the battery, and the channel information received as feedback, and provides service to multiple users. It then receives the reward, that is, the sum rate, from the environment. To make the performance comparison easier to see, the reward received from the environment was multiplied by 1/10000 and the resulting value was plotted on the y-axis of the graph, and the validation model was kept fixed to ensure the consistency of the evaluation.
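The axis conventions of FIGS. 4 and 5 (100 steps per iteration, an evaluation every 1000 steps, and rewards scaled by 1/10000) amount to a small conversion, sketched below. The constant and function names are hypothetical, chosen here for illustration.

```python
STEPS_PER_ITERATION = 100    # 100 steps are counted as 1 learning iteration
EVAL_INTERVAL_STEPS = 1000   # the agent is evaluated every 1000 steps
REWARD_DIVISOR = 10000       # rewards are multiplied by 1/10000 for plotting


def plot_point(step: int, raw_reward: float) -> tuple:
    """Map an evaluation step and its raw reward to (x, y) graph coordinates."""
    if step % EVAL_INTERVAL_STEPS != 0:
        raise ValueError("evaluations happen only every 1000 steps")
    x = step // STEPS_PER_ITERATION  # learning iteration on the x-axis
    y = raw_reward / REWARD_DIVISOR  # scaled reward on the y-axis
    return x, y
```

Under these conventions, consecutive evaluation points are 10 iterations apart on the x-axis.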

Referring again to FIG. 4, the following four cases were compared: Equal (equal power allocation precoder) Greedy; WF (water-filling precoder) Greedy; SLSQP Greedy (the base station is aware that the channel state information it received as feedback is imperfect); and SLSQP Greedy (no-info σ_e) (the base station regards the imperfect channel state information it received as feedback as perfect).

Greedy means that deep reinforcement learning is not applied. That is, the base station uses up all of the energy as soon as it is harvested, without considering the harvested energy, the amount of energy remaining in the battery, or the channel information received as feedback.
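One way to read the Greedy baseline is as a policy that spends all available energy within a single transmission slot. The sketch below assumes that reading; the function name and the notion of a fixed slot duration are illustrative assumptions, not details given in the text.

```python
def greedy_transmit_power(available_energy: float, slot_duration: float) -> float:
    """Transmit power that uses up all available energy in one slot."""
    if slot_duration <= 0:
        raise ValueError("slot_duration must be positive")
    # Energy = power * time, so spending everything means power = energy / time.
    return available_energy / slot_duration
```

With this policy nothing is carried over to the next step, which is the behaviour the learned policy is compared against.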

SLSQP Greedy and SLSQP Greedy (no-info σ_e) are results obtained with rate-splitting multiple access, which divides each message into a private message and a shared message and transmits them together, whereas Equal Greedy and WF Greedy are conventional multiple access techniques that do not divide messages into private and shared messages and transmit messages only by unicast.

The experimental results in FIG. 4 show that rate-splitting multiple access, which transmits each message divided into a private message and a shared message, outperforms the other multiple access techniques, which do not use a shared message, in terms of sum rate.

Referring again to FIG. 5, the following eight cases were compared: SAC+SLSQP, SAC+SLSQP (no-info σ_e), SAC+WF, SAC+Equal, Equal Greedy, WF Greedy, SLSQP Greedy, and SLSQP Greedy (no-info σ_e).

The experimental results in FIG. 5 confirm that, in the energy harvesting environment, each of the four cases that use deep reinforcement learning outperforms the corresponding case that does not use deep reinforcement learning in terms of sum rate. In addition, when the cases that use deep reinforcement learning and the Greedy cases are each compared under the same conditions, rate-splitting multiple access outperforms the other multiple access techniques, which do not use a shared message. It can therefore be seen that the deep reinforcement learning based rate-splitting multiple access method for maximizing the sum rate with energy harvesting taken into account, according to the present invention, outperforms existing multiple access techniques in a communication environment under the same conditions.

As described above, in the rate-splitting multiple access method for a base station with limited energy supply to maximize the sum rate according to an embodiment of the present invention, the base station determines the transmit power using deep reinforcement learning based on the sum rate received from the plurality of terminals and the base station state information, and performs power allocation and beamforming design for each of the encoded shared and private streams using an optimization technique within the determined transmit power. This allows a base station with limited energy supply to use the supplied energy efficiently in a manner suited to the changing external energy and communication channel state.

Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. In the case of a hardware implementation, an embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of a firmware or software implementation, an embodiment of the present invention may be implemented in the form of a module, procedure, function, or the like that performs the functions or operations described above. The software code may be stored in a memory unit and executed by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various known means.

Meanwhile, embodiments of the present invention may be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored.

Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementations in the form of a carrier wave (for example, transmission over the Internet). In addition, the computer-readable recording medium may be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention belongs.

Because various modifications may be made to the configurations and methods described and illustrated herein without departing from the scope of the present invention, all matter contained in the above detailed description or shown in the accompanying drawings is illustrative and is not intended to limit the present invention. Accordingly, the scope of the present invention is not limited by the exemplary embodiments described above, and should be defined only by the following claims and their equivalents.

10: sum rate of step i-1
20: base station state information of step i
30: SAC algorithm
40: determined transmit power of step i
50: SLSQP algorithm
200: base station
210: transceiver
220: processor
230: memory
300: user terminals

Claims

A rate-splitting multiple access method for a base station with limited energy supply to maximize the sum rate, the method comprising:
determining a first transmit power based on base station information in a first state;
dividing data to be transmitted to a plurality of terminals into shared and private messages, and encoding the shared and private messages into shared and private streams, respectively;
performing power allocation and beamforming design for each of the encoded shared and private streams within the determined transmit power;
transmitting the data to the plurality of terminals and receiving a sum rate; and
determining a second transmit power based on the received sum rate and the base station information in a second state.

The method of claim 1,
wherein, in the power allocation and beamforming design for each of the encoded shared and private streams, the beamforming design of the private streams is performed first using a zero-forcing technique, and the power allocation of the streams and the beamforming design of the shared stream are performed using a first optimization algorithm.

The method of claim 2,
wherein the first optimization algorithm is the SLSQP (sequential least squares programming) algorithm.

The method of claim 1,
wherein the base station information in the first and second states includes the amount of energy harvested by the base station, the amount of energy stored in the base station, and channel information fed back from the plurality of terminals, and
wherein the transmit power is determined using a second deep reinforcement learning algorithm.

The method of claim 4,
wherein the amount of energy stored in the base station in the second state is determined as the smaller of:
(a) the maximum amount of energy that the base station can hold; and
(b) the amount of energy obtained by subtracting the product of the transmit power determined in the first state and the time during which the data is transmitted from the amount of energy stored in the base station in the first state, and adding the amount of energy harvested by the base station in the second state.

The method of claim 4,
wherein the second deep reinforcement learning algorithm is the soft actor-critic (SAC) algorithm.

The method of claim 1,
wherein the base station is a mobile base station.

A base station using rate-splitting multiple access for maximizing the sum rate in a communication environment with limited energy supply, the base station comprising:
a transceiver for transmitting and receiving signals to and from a plurality of terminals;
a processor connected to the transceiver and controlling an operation of the base station; and
a memory,
wherein the processor performs:
determining a first transmit power based on base station information in a first state;
dividing data to be transmitted to the plurality of terminals into shared and private messages, and encoding the shared and private messages into shared and private streams, respectively;
performing power allocation and beamforming design for each of the encoded shared and private streams within the determined transmit power;
transmitting the data to the plurality of terminals and receiving a sum rate; and
determining a second transmit power based on the received sum rate and the base station information in a second state.

The base station of claim 8,
wherein, in the power allocation and beamforming design for each of the encoded shared and private streams, the beamforming design of the private streams is performed first using a zero-forcing technique, and the power allocation of the streams and the beamforming design of the shared stream are performed using a first optimization algorithm.

The base station of claim 9,
wherein the first optimization algorithm is the SLSQP (sequential least squares programming) algorithm.

The base station of claim 8,
wherein the base station information in the first and second states includes the amount of energy harvested by the base station, the amount of energy stored in the base station, and channel information fed back from the plurality of terminals, and
wherein the transmit power is determined using a second deep reinforcement learning algorithm.

The base station of claim 11,
wherein the amount of energy stored in the base station in the second state is determined as the smaller of:
(a) the maximum amount of energy that the base station can hold; and
(b) the amount of energy obtained by subtracting the product of the transmit power determined in the first state and the time during which the data is transmitted from the amount of energy stored in the base station in the first state, and adding the amount of energy harvested by the base station in the second state.

The base station of claim 11,
wherein the second deep reinforcement learning algorithm is the soft actor-critic (SAC) algorithm.

The base station of claim 8,
wherein the base station is a mobile base station.

A computer-readable recording medium that tangibly embodies a program of instructions executable by a digital processing device to provide rate-splitting multiple access for maximizing the sum rate in a communication environment with limited energy supply, and that is readable by the digital processing device,
wherein a program for executing, on a computer, the method of any one of claims 1 to 7 is recorded on the recording medium.