KR102664971B1

KR102664971B1 - Rate-splitting multiple access method for the base station with limited energy supply to maximize the sum rate based on deep reinforcement learning

Info

Publication number: KR102664971B1
Application number: KR1020210187442A
Authority: KR
Inventors: 신원재; 성재협
Original assignee: 고려대학교 산학협력단
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2024-05-10
Also published as: KR20230097696A

Abstract

A rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, according to an embodiment of the present invention, comprises: determining a first transmit power based on information about the base station in a first state; splitting data to be transmitted to a plurality of terminals into common and private messages, and encoding the common and private messages into common and private streams, respectively; designing power allocation and beamforming for each of the encoded common and private streams within the determined transmit power; transmitting the data to the plurality of terminals and receiving a sum rate; and determining a second transmit power based on the received sum rate and information about the base station in a second state. Accordingly, in an embodiment of the present invention, the base station uses deep reinforcement learning to determine the transmit power from the sum rate received from the plurality of terminals and from the base station state information, and uses an optimization technique to design power allocation and beamforming for each encoded common and private stream within that transmit power. A base station with a limited energy supply can thereby use the supplied energy efficiently, in a manner suited to changing external energy and communication channel conditions.

Description

Rate-splitting multiple access method for the base station with limited energy supply to maximize the sum rate based on deep reinforcement learning

The present invention relates to a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, and more specifically to a rate-splitting multiple access method for maximizing the sum rate in a communication environment with a limited energy supply, in which power allocation and beamforming design at the transmitter are optimized using deep reinforcement learning and optimization techniques.

Since the emergence of LTE-based 4G mobile communication, MIMO (Multiple Input Multiple Output) technology using multiple antennas has become an essential core technology in mobile communication. MIMO has recently evolved into Massive MIMO and related schemes, which in theory consider even an infinite number of antennas. MIMO has also spread through wireless LANs, starting with 802.11n, and all recently released wireless LANs adopt it as standard. Multi-User MIMO (MU-MIMO), which combines beamforming with spatial multiplexing, has emerged as well.

Multi-user MIMO lets a single base station use multiple antennas to serve multiple users simultaneously in the same frequency band, which increases the efficiency of the wireless band. Despite this advantage, serving multiple users from one base station with multiple antennas gives rise to inter-user interference. Moreover, because the channel, which is the medium that carries the information, changes rapidly in wireless communication, obtaining accurate channel information for every user is practically impossible, so signal interference between the served users cannot be avoided.

Rate-Splitting Multiple Access (RSMA) is a non-orthogonal multi-user access technology that offers excellent performance and robustness in multi-user, multi-antenna communication environments and can support many users under a variety of channel conditions. Compared with existing multiple access technologies, RSMA has strengths in energy efficiency, spectral efficiency, and robustness to inaccurate channel state information. However, RSMA has so far been developed to provide the maximum sum rate against changing channel state information in environments where the base station has no limit on transmit power and can therefore serve users at a constant transmit power.

In 6G, the next-generation mobile communication technology that uses highly directional high-frequency bands (mmWave), mobile base stations built on drone-type base station platforms, a form of unmanned aerial system, are becoming increasingly important. High-frequency bands offer wider bandwidth than existing frequency bands and are therefore well suited to transmitting large volumes of data, but their strong directionality makes them relatively vulnerable to effects such as radio shadowing. To address this, small cells, which increase the number of base stations and narrow the spacing between them, are growing in importance. Because a mobile base station has limited transmit power, it draws its energy from external sources such as solar energy and RF signals, and it must use that energy efficiently and in a manner suited to the communication channel conditions in order to provide reliable service to users. However, it is difficult to use the supplied energy efficiently when both the external energy and the communication channel conditions keep changing. Accordingly, for a base station with a limited energy supply that seeks to maximize the sum rate through rate-splitting multiple access, a multiple access method is needed that can use the supplied energy efficiently in a manner suited to changing external energy and communication channel conditions.

The present invention is intended to solve the problems of the prior art described above. It provides a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, enabling the base station to use the supplied energy efficiently in a manner suited to changing external energy and communication channel conditions.

A rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, according to a first feature of the present invention, comprises: determining a first transmit power based on information about the base station in a first state; splitting data to be transmitted to a plurality of terminals into common and private messages, and encoding the common and private messages into common and private streams, respectively; designing power allocation and beamforming for each of the encoded common and private streams within the determined transmit power; transmitting the data to the plurality of terminals and receiving a sum rate; and determining a second transmit power based on the received sum rate and information about the base station in a second state.

A base station that uses rate-splitting multiple access to maximize the sum rate in a communication environment with a limited energy supply, according to a second feature of the present invention, comprises: a transceiver that transmits signals to and receives signals from a plurality of terminals; a processor that is connected to the transceiver and controls the operation of the base station; and a memory. The processor performs: an operation of determining a first transmit power based on information about the base station in a first state; an operation of splitting data to be transmitted to the plurality of terminals into common and private messages, and encoding the common and private messages into common and private streams, respectively; an operation of designing power allocation and beamforming for each of the encoded common and private streams within the determined transmit power; an operation of transmitting the data to the plurality of terminals and receiving a sum rate; and an operation of determining a second transmit power based on the received sum rate and information about the base station in a second state.

A recording medium according to a third feature of the present invention is readable by a digital processing device and tangibly embodies a program of instructions executable by the digital processing device to provide rate-splitting multiple access for maximizing the sum rate in a communication environment with a limited energy supply. The medium has recorded on it a program for causing a computer to execute the rate-splitting multiple access method according to the first feature of the present invention.

The rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate, according to an embodiment of the present invention, provides the following effects.

The base station uses deep reinforcement learning to determine the transmit power from the sum rate received from the plurality of terminals and from the base station state information, and uses an optimization technique to design power allocation and beamforming for each encoded common and private stream within that transmit power. A base station with a limited energy supply can thereby use the supplied energy efficiently, in a manner suited to changing external energy and communication channel conditions.

This effect is an even greater advantage when the base station is a mobile base station, whose energy supply is unavoidably limited. Such a mobile base station can link cells to one another, which makes network connectivity easier to secure, and it can extend service to areas where communication has been limited because no base station exists there.

By using rate-splitting multiple access, reliable service can be provided to users over the same frequency band in a multi-user MIMO environment even when the users' channel state information is not known accurately.

FIG. 1 is a flowchart of a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate based on deep reinforcement learning, according to an embodiment of the present invention.
FIG. 2 is a block diagram of a base station that uses rate-splitting multiple access to maximize the sum rate in a communication environment with a limited energy supply, according to an embodiment of the present invention.
FIG. 3 schematically shows the sequence of processes in which the rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate is performed, according to an embodiment of the present invention.
FIG. 4 is a graph comparing the performance of multiple access technologies, including the rate-splitting multiple access technology according to an embodiment of the present invention.
FIG. 5 is a graph comparing the performance of multiple access technologies, including the rate-splitting multiple access technology using deep reinforcement learning according to an embodiment of the present invention, in a communication environment with a limited energy supply.

Hereinafter, the present invention is described in detail with reference to embodiments and drawings. The following description is not intended to limit the present invention to specific embodiments, and detailed descriptions of related known technology are omitted where they could obscure the gist of the present invention.

In embodiments of the present invention, a base station generally refers to a fixed station that communicates with wireless devices, but it may also be mobile, and it may be referred to by other terms such as eNB (evolved-NodeB) or BTS (base transceiver system).

In embodiments of the present invention, a terminal may be fixed or mobile, and may be referred to by other terms such as User Equipment (UE), Mobile Station (MS), Subscriber Station (SS), Mobile Subscriber Station (MSS), Mobile Terminal, or Advanced Mobile Station (AMS).

FIG. 1 is a flowchart of a rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate according to an embodiment of the present invention, and FIG. 2 is a block diagram of a base station that uses rate-splitting multiple access to maximize the sum rate in a communication environment with a limited energy supply according to an embodiment of the present invention.

Referring to FIG. 1, the rate-splitting multiple access method (100) by which the base station (200) with a limited energy supply maximizes the sum rate, according to an embodiment of the present invention, comprises: determining a first transmit power based on information about the base station in a first state (110); splitting data to be transmitted to a plurality of terminals (300) into common and private messages, and encoding the common and private messages into common and private streams, respectively (120); designing power allocation and beamforming for each of the encoded common and private streams within the determined transmit power (130); transmitting the data to the plurality of terminals (300) and receiving a sum rate (140); and determining a second transmit power based on the received sum rate and information about the base station in a second state (150).

Referring to FIG. 2, the base station (200) that uses rate-splitting multiple access to maximize the sum rate based on deep reinforcement learning in a communication environment with a limited energy supply, according to an embodiment of the present invention, comprises: a transceiver (210) that transmits signals to and receives signals from a plurality of terminals (300); a processor (220) that is connected to the transceiver (210) and controls the operation of the base station (200); and a memory (230). The processor (220) performs: an operation (110) of determining a first transmit power based on information about the base station in a first state; an operation (120) of splitting data to be transmitted to the plurality of terminals (300) into common and private messages, and encoding the common and private messages into common and private streams, respectively; an operation (130) of designing power allocation and beamforming for each of the encoded common and private streams within the determined transmit power; an operation (140) of transmitting the data to the plurality of terminals (300) and receiving a sum rate; and an operation (150) of determining a second transmit power based on the received sum rate and information about the base station in a second state.

The base station (200) determines the transmit power based on base station state information. In an embodiment of the present invention, the base station state information may include the amount of energy harvested by the base station, the amount of energy stored in the base station, and the channel information fed back from the plurality of terminals (300) with which it communicates.

The transmit power is determined by the processor (220) of the base station (200), which receives the base station state information through the transceiver (210) and the memory (230) and runs a deep reinforcement learning algorithm on it.

In an embodiment of the present invention, the deep reinforcement learning algorithm is a machine-learning model and may be a model based on reinforcement learning. A reinforcement-learning model can be updated continuously or periodically as it learns, without intervention by an administrator.

In a reinforcement-learning model, an agent takes an action in an environment, receives a reward and a new state in return, and is trained so that the reward is maximized. Reinforcement-learning models can be divided into those that use a model-based algorithm and those that use a model-free algorithm, depending on whether the agent has prior knowledge.

Deep reinforcement learning is an algorithm that applies a deep neural network to reinforcement learning as described above in order to solve problems in continuous and complex state and action spaces, such as those of real environments. In an embodiment of the present invention, the agent is the base station; the environment is the rate-splitting multiple access communication environment with a limited energy supply; the action is the transmission from the base station to the users; the reward is the sum rate; and the state consists of the amount of harvested energy, the amount of energy stored in the base station, and the channel information fed back from the plurality of terminals with which the base station communicates.
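The agent, state, action, and reward mapping above can be sketched as a small interaction loop. This is a minimal illustration rather than the patented implementation: the battery update rule, the random harvesting and channel models, the single-link placeholder reward, and the names `State`, `EnergyLimitedEnv`, `random_policy`, and `run_episode` are all assumptions introduced here, and a random policy stands in for the trained deep reinforcement learning agent.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class State:
    battery: float       # B_i: energy stored at the base station
    harvested: float     # E_i: energy harvested during step i
    channel_gain: float  # scalar stand-in for the fed-back channel information H_i


class EnergyLimitedEnv:
    """Toy environment; every dynamic below is an illustrative assumption."""

    def __init__(self, capacity=10.0, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.state = State(battery=capacity / 2, harvested=0.0, channel_gain=1.0)

    def step(self, power):
        # The base station cannot spend more energy than it has stored.
        power = max(0.0, min(power, self.state.battery))
        # Placeholder reward: a single-link rate instead of the RSMA sum rate.
        reward = math.log2(1.0 + power * self.state.channel_gain)
        # Assumed dynamics: uniform harvesting, capped battery, random channel gain.
        harvested = self.rng.uniform(0.0, 2.0)
        battery = min(self.capacity, self.state.battery - power + harvested)
        gain = self.rng.expovariate(1.0)
        self.state = State(battery, harvested, gain)
        return self.state, reward


def random_policy(state, rng):
    # Stand-in for the learned policy: pick any feasible transmit power.
    return rng.uniform(0.0, state.battery)


def run_episode(steps=20, seed=0):
    env = EnergyLimitedEnv(seed=seed)
    rng = random.Random(seed + 1)
    total_reward = 0.0
    for _ in range(steps):
        power = random_policy(env.state, rng)
        _, reward = env.step(power)
        total_reward += reward
    return total_reward
```

Calling `run_episode()` returns the accumulated placeholder reward over one episode; in a real system the policy would be replaced by the learned actor and the reward by the sum rate reported by the terminals.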

In an embodiment of the present invention, the deep reinforcement learning algorithm used to determine the transmit power may be the SAC (soft actor-critic) algorithm.

Using rate-splitting multiple access, the base station (200) splits the data to be transmitted to the plurality of terminals (300) into two kinds of message: a common message and private messages. The common message is encoded into a single common stream using a codebook shared by all users, so every user can decode it. Each private message is encoded into a private stream using a codebook held only by the specific user for whom the message is intended, and can be decoded only by that user.
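The splitting step can be illustrated with plain byte strings. This is only a sketch of the bookkeeping: the fixed split ratio and the function names `split_messages` and `recover` are assumptions made for illustration, and channel coding with the shared and per-user codebooks is not modeled.

```python
def split_messages(messages, common_fraction):
    """Split each user's message into a common part and a private part.

    The common parts of all users together form the common message, which
    would be encoded into one common stream; each private part would be
    encoded into that user's own private stream.
    """
    common_parts, private_parts = [], []
    for msg in messages:
        cut = int(len(msg) * common_fraction)
        common_parts.append(msg[:cut])
        private_parts.append(msg[cut:])
    return common_parts, private_parts


def recover(user, common_parts, private_parts):
    """Rebuild one user's message from its share of the common message
    and its private message."""
    return common_parts[user] + private_parts[user]


messages = [b"data for user 0", b"data for user 1"]
common_parts, private_parts = split_messages(messages, common_fraction=0.4)
common_message = b"".join(common_parts)  # decoded by every user
```

Each user keeps only its own portion of the decoded common message and combines it with its private message, as `recover` does.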

Each user decodes the common stream and extracts from the common message the part intended for that user. The user then removes the common stream from the received signal by successive interference cancellation (SIC). SIC is used at the receiver to process multiple superimposed received signals, exploiting the differences in signal strength among them. The receiver first decodes the stronger signal, subtracts it from the superimposed signal, and then decodes the weaker signal from what remains.

Once the common stream has been removed, a user sees interference only from the private streams of the other users, and decodes its own private stream while treating that interference as noise. By decoding the common and private streams with successive interference cancellation in this way, each user recovers its original message.
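The two decoding stages can be expressed as achievable rates using the standard one-layer rate-splitting expressions from the RSMA literature. The sketch below assumes perfect cancellation of the common stream and unit noise power by default; the function names `inner` and `rsma_rates` and the representation of vectors as Python lists of complex numbers are choices made here, not taken from the patent.

```python
import math


def inner(h, p):
    """Return h^H p for complex vectors given as Python lists."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))


def rsma_rates(channels, p_common, p_private, noise=1.0):
    """Achievable rates (bit/s/Hz) for one-layer rate splitting with SIC.

    channels[k] is user k's channel vector, p_common is the common-stream
    precoder, and p_private[k] is the precoder of user k's private stream
    (each precoder includes its allocated power).
    """
    common_rates, private_rates = [], []
    for k, h in enumerate(channels):
        gains = [abs(inner(h, p)) ** 2 for p in p_private]
        common_gain = abs(inner(h, p_common)) ** 2
        # Stage 1: decode the common stream, treating all private streams as noise.
        common_rates.append(math.log2(1 + common_gain / (sum(gains) + noise)))
        # Stage 2: after SIC removes the common stream, decode the user's own
        # private stream, treating the other private streams as noise.
        interference = sum(gains) - gains[k]
        private_rates.append(math.log2(1 + gains[k] / (interference + noise)))
    # Every user must decode the common stream, so its rate is set by the worst user.
    return min(common_rates), private_rates


channels = [[1 + 0j, 0j], [0j, 1 + 0j]]
common_rate, private_rates = rsma_rates(
    channels, p_common=[1, 1], p_private=[[1, 0], [0, 1]]
)
sum_rate = common_rate + sum(private_rates)
```

The sum rate used as the reward in the learning loop is the common rate plus the sum of the private rates.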

Therefore, when the base station does not know accurately the channel state information of the users it serves (for example, because of quantization error or changes in the channel state), it can use the common message to multicast part of the messages destined for the users, which minimizes signal interference between adjacent users.

To transmit the split messages to multiple users, the base station uses a precoder to design power allocation and beamforming for the single common stream, and likewise for each private stream. The power allocation and beamforming design of the common stream and the power allocation of each private stream constitute a non-convex problem.

A convex problem has a single global minimum (or a single global maximum), so it is relatively easy to compute. A non-convex problem, such as one involving a cubic function, has local minima and local maxima in addition to the global minimum or maximum, which makes the global optimum relatively hard to compute. For example, if a non-convex problem is attacked with a method such as gradient descent, a representative method for solving convex problems, the iteration may settle in a local minimum or maximum instead of converging to the global one, and so fail to produce the intended global minimum or maximum.
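The local-minimum trap can be seen on a one-dimensional toy function. The quartic below is chosen here purely for illustration (it has one global and one local minimum) and is unrelated to the actual RSMA objective; the step size and iteration count are likewise arbitrary choices.

```python
def f(x):
    # Non-convex toy objective with two minima of different depth.
    return x**4 - 3 * x**2 + x


def grad(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=2000):
    """Plain gradient descent from the starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_left = gradient_descent(-2.0)   # settles in the global minimum (x < 0)
x_right = gradient_descent(2.0)   # settles in the shallower local minimum (x > 0)
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop at points where the gradient vanishes, but only the run started on the left reaches the lower of the two minima, so the result depends on the starting point.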

Accordingly, in an embodiment of the present invention, the power allocation and beamforming design for the encoded common and private streams may proceed as follows: the beamforming of the private streams is designed first using the zero-forcing technique, and the power allocation of the private streams, the power allocation of the common stream, and the beamforming of the common stream are then designed using an optimization algorithm.
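Zero-forcing chooses each private beam so that it is orthogonal to the channels of all other users. The sketch below handles only the two-user, two-antenna case, where the zero-forcing directions are the normalized columns of the inverse channel matrix; this restriction, perfect channel knowledge, and the names `zero_forcing_2x2` and `effective_channel` are simplifying assumptions made here.

```python
import math


def zero_forcing_2x2(H):
    """Return unit-norm zero-forcing beams for 2 users and 2 antennas.

    H[k] is the row vector h_k^H of user k, so the signal that user k
    receives from beam w is the plain product H[k] . w.
    """
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is singular; zero-forcing is undefined")
    # Columns of H^{-1}: column k is orthogonal to the other user's channel row.
    columns = [[d / det, -c / det], [-b / det, a / det]]
    beams = []
    for col in columns:
        norm = math.sqrt(sum(abs(x) ** 2 for x in col))
        beams.append([x / norm for x in col])
    return beams


def effective_channel(H, beams):
    """Entry [k][j] is the response of user k to the beam of user j."""
    return [[sum(h * w for h, w in zip(row, beam)) for beam in beams] for row in H]


H = [[1 + 1j, 0.5 - 0.2j], [0.3 + 0.4j, 1 - 0.5j]]
beams = zero_forcing_2x2(H)
G = effective_channel(H, beams)  # off-diagonal entries are (numerically) zero
```

With the private beam directions fixed in this way, only the power split and the common-stream beam remain to be optimized.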

In an embodiment of the present invention, the optimization algorithm used for the power allocation of the private streams, the power allocation of the common stream, and the beamforming design of the common stream may be the SLSQP (sequential least squares programming) algorithm.

In an embodiment of the present invention, the SLSQP algorithm simplifies the design of the private-stream power allocation, the common-stream power allocation, and the common-stream beamforming by approximating the objective function with a convex quadratic, solving that approximation, predicting the next point, and applying the same procedure again. Repeating this sequence allows the non-convex, nonlinear problem to be solved optimally.

In an embodiment of the present invention, the SLSQP algorithm operates under the following conditions. For the power allocation of the private and common streams, it is subject to the constraint that the base station cannot allocate more power than the transmit power determined by deep reinforcement learning, and to the bound that the power allocation ratios of all messages to be transmitted must sum to at most 1. For the beamforming design of the common stream, it is subject to the condition that the squared L2 norm of the normalized beamforming vector of the common stream must be at most 1.
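Written out, the constrained problem described above might take the following form. This is a sketch under assumed notation that does not appear in the source: $P_i$ is the transmit power chosen by deep reinforcement learning at step $i$, $t_c$ and $t_k$ are the power allocation ratios of the common stream and of user $k$'s private stream, $\mathbf{w}_c$ is the common-stream beamforming vector, $\mathbf{w}_k$ are the private beams fixed beforehand by zero-forcing, $\mathbf{h}_k$ is user $k$'s channel, and $\sigma^2$ is the noise power. The rate expressions are the standard one-layer rate-splitting rates.

```latex
\begin{aligned}
\max_{t_c,\,t_1,\dots,t_K,\,\mathbf{w}_c}\quad & R_c + \sum_{k=1}^{K} R_k \\
\text{where}\quad
& R_c = \min_{k}\, \log_2\!\left(1 + \frac{t_c P_i\,\lvert \mathbf{h}_k^{H}\mathbf{w}_c \rvert^2}
      {\sum_{j=1}^{K} t_j P_i\,\lvert \mathbf{h}_k^{H}\mathbf{w}_j \rvert^2 + \sigma^2}\right), \\
& R_k = \log_2\!\left(1 + \frac{t_k P_i\,\lvert \mathbf{h}_k^{H}\mathbf{w}_k \rvert^2}
      {\sum_{j\neq k} t_j P_i\,\lvert \mathbf{h}_k^{H}\mathbf{w}_j \rvert^2 + \sigma^2}\right), \\
\text{s.t.}\quad
& t_c + \sum_{k=1}^{K} t_k \le 1, \qquad t_c \ge 0,\quad t_k \ge 0 \;\; (k = 1,\dots,K), \\
& \lVert \mathbf{w}_c \rVert_2^2 \le 1 .
\end{aligned}
```

Because the ratios sum to at most 1 and every stream's power is its ratio times $P_i$, the total allocated power can never exceed the power chosen by the learning agent.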

FIG. 3 schematically shows the sequence of processes in which the rate-splitting multiple access method by which a base station with a limited energy supply maximizes the sum rate is performed, according to an embodiment of the present invention.

Referring again to FIG. 2 together with FIG. 3, the base station 200 includes a transceiver 210, a processor 220, and a memory 230. The processor 220 may be configured to implement the procedures and/or methods proposed in the present invention. The memory 230 is connected to the processor 220 and stores various information related to the operation of the processor 220. The transceiver 210 is connected to the processor 220 and communicates with the plurality of terminals 300 to transmit and/or receive wireless signals.

In FIG. 3, B_i is the amount of energy stored in the base station at the i-th step, E_i is the amount of energy harvested at the i-th step, h_i is the channel state information that the plurality of terminals feed back to the base station at the i-th step, H_i is the channel state information that the base station receives as feedback from the plurality of terminals at the i-th step, and e_i is the channel state information loss that occurs at the i-th step.

In one embodiment of the present invention, the base station 200 may transmit data to the plurality of terminals 300 and receive the sum rate 10. It may then determine the transmission power again based on the received sum rate 10 and the base station information 20 of the next state.

Referring to FIG. 3, the base station information 20 of the next state includes the amount of energy stored in the base station at the i-th step (B_i), the amount of energy harvested at the i-th step (E_i), and the channel state information received as feedback at the i-th step (H_i).

A deep reinforcement learning algorithm may be used to determine the transmission power, and in one embodiment of the present invention, the deep reinforcement learning algorithm may be the soft actor-critic (SAC) algorithm 30.

The base station 200 determines the transmission power to use based on the amount of energy stored in the base station, the amount of harvested energy, and the channel state information received as feedback. The base station 200 then receives the sum rate 10 from the communication environment (that is, the plurality of terminals 300) as a reward, which evaluates the action it took, and at the same time receives the base station information 20 of the next state. Using this feedback, the agent repeats the learning process in the direction that yields a larger reward and thereby becomes able to determine the optimal transmission power.
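The state, action, reward, and next-state cycle described above can be sketched as a plain interaction loop. This is a minimal sketch under stated assumptions: the random policy stands in for the SAC actor, the scalar channel gain and the `log2(1 + power * gain)` reward are placeholders for the fed-back channel state information and the real sum rate, and the names `step`, `random_policy`, `B_MAX`, and `SLOT` are hypothetical. The 50% harvesting probability follows the simulation setup described later in this document, and the replay buffer is included only because off-policy algorithms such as SAC are normally trained from stored transitions.

```python
import math
import random
from collections import deque

B_MAX = 10.0   # battery capacity (assumed units)
SLOT = 1.0     # transmission time T per step (assumed)


def random_policy(state, rng):
    """Placeholder for the SAC actor: pick any power the battery can sustain."""
    battery, _harvested, _gain = state
    return rng.uniform(0.0, battery / SLOT)


def step(state, power, rng):
    """Toy environment transition; returns (reward, next_state)."""
    battery, _harvested, gain = state
    reward = math.log2(1.0 + power * gain)            # stand-in for the sum rate
    harvested = B_MAX if rng.random() < 0.5 else 0.0  # harvest with probability 0.5
    next_battery = min(B_MAX, battery - power * SLOT + harvested)
    next_gain = rng.expovariate(1.0)                  # stand-in for new CSI feedback
    return reward, (next_battery, harvested, next_gain)


rng = random.Random(0)
replay_buffer = deque(maxlen=10_000)
state = (B_MAX, 0.0, rng.expovariate(1.0))   # (stored energy, harvested energy, CSI)

for _ in range(100):
    power = random_policy(state, rng)               # action: transmission power
    reward, next_state = step(state, power, rng)    # reward and next state
    replay_buffer.append((state, power, reward, next_state))
    state = next_state

print(len(replay_buffer), "transitions collected")
```

A learning agent would replace `random_policy` and periodically update its networks from batches sampled out of `replay_buffer`.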

In this case, the amount of energy stored in the base station must satisfy a predetermined condition. Let b1 and b2 denote the amounts of energy stored in the base station at two successive steps, max b the maximum amount of energy the base station can store, p the transmission power, T the transmission time, and E the amount of harvested energy. The condition is then as follows.

b2 = m(b1, p, E) = min(max b, b1 - pT + E)

Accordingly, in one embodiment of the present invention, the amount of energy stored in the base station in the second state may be determined as the smaller of (a) the maximum amount of energy the base station can hold, and (b) the amount of energy obtained by subtracting, from the amount of energy stored in the base station in the first state, the product of the transmission power determined in the first state and the time for which data was transmitted, and then adding the amount of energy harvested by the base station in the second state.
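The battery update rule can be written as a one-line function. The following is a minimal sketch; the function name `next_battery`, the parameter name `b_max` (the quantity written "max b" in the text), and the numeric values are illustrative assumptions.

```python
def next_battery(b1: float, p: float, T: float, E: float, b_max: float) -> float:
    """Stored energy in the next state: min(b_max, b1 - p*T + E)."""
    return min(b_max, b1 - p * T + E)


# Energy spent (p*T = 2) and harvested (E = 3) leave the battery below capacity.
print(next_battery(b1=5.0, p=2.0, T=1.0, E=3.0, b_max=10.0))   # 6.0

# A large harvest is clipped at the battery capacity.
print(next_battery(b1=9.0, p=1.0, T=1.0, E=5.0, b_max=10.0))   # 10.0
```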

The transmission power 40 newly determined by the soft actor-critic (SAC) algorithm 30, a deep reinforcement learning algorithm, is used at the i-th step for the power allocation and beamforming design of each of the encoded shared and private streams. The sequential least squares programming (SLSQP) algorithm 50, an optimization technique, is used to perform this operation, and at the i-th step the data is transmitted to the plurality of terminals 300, that is, the multiple users, using the resulting optimal precoder.

The base station 200 then receives a new sum rate from the plurality of terminals 300 as a reward, which evaluates the action it took, and the rate-splitting multiple access method for maximizing the sum rate may be performed repeatedly as described above.

In one embodiment of the present invention, the base station in the communication environment with limited energy supply may be a mobile base station. A drone-type base station platform acts as a link between cells, which makes it easier to secure network connectivity and makes it possible to provide service even in areas where communication is limited because no base station exists. In addition, raising the altitude of the base station secures line of sight as far as possible, which compensates for the vulnerability of high-frequency-band communication to effects such as shadowing, and service can continue even when existing base stations have been destroyed by a natural disaster or other causes.

Because its transmission power is limited, a mobile base station is supplied with energy from external sources such as solar energy and RF signals, and it can provide reliable service to users only if it uses that energy efficiently in a way that suits the communication channel state. The rate-splitting multiple access method for maximizing the sum rate according to the present invention allows the supplied energy to be used efficiently in a way that suits the changing external energy and communication channel conditions.

The following describes in detail the experimental procedure and results for 1) a performance comparison between rate-splitting multiple access and other multiple access techniques, and 2) a performance comparison between rate-splitting multiple access using deep reinforcement learning and other multiple access techniques in a communication environment with limited energy supply.

FIG. 4 is a graph comparing the performance of multiple access techniques, including the rate-splitting multiple access technique according to an embodiment of the present invention. FIG. 5 is a graph comparing the performance of multiple access techniques, including the rate-splitting multiple access technique using deep reinforcement learning according to an embodiment of the present invention, in a communication environment with limited energy supply.

The simulation environment for the experiments is a communication environment in which a mobile base station, through energy harvesting, receives external energy equal to the chargeable capacity of its battery with a probability of 50% and then provides service to multiple users. Each user has perfect knowledge of the channel state information, but channel state information is lost while it is fed back to the base station because of quantization error, rapid changes in the channel state, and similar effects. As a result, the base station receives imperfect channel state information as feedback.

In the graphs of FIGS. 4 and 5, the x-axis is the number of learning iterations, where 100 steps were set as 1 iteration in the simulation. The y-axis is a relative measure of the performance of the trained agent, evaluated on a fixed validation environment every 1000 steps. Here, a step means one interaction between the agent and the environment.

The validation environment was configured identically to the training environment. The mobile base station likewise receives external energy equal to the chargeable capacity of its battery with a probability of 50% through energy harvesting, determines the optimal transmission power from a state composed in the same way of the amount of harvested energy, the amount of energy remaining in the battery, and the channel information received as feedback, and provides service to multiple users. It then receives the reward, that is, the sum rate, from the environment. To make the performance comparison easier to see, the reward received from the environment was multiplied by 1/10000 and the resulting value was plotted on the y-axis of the graph, and the validation model was held fixed to ensure a consistent evaluation process.
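The bookkeeping implied by these settings (100 steps per iteration, one evaluation every 1000 steps, rewards scaled by 1/10000 for plotting) can be made concrete with a short sketch. The helper names are hypothetical, and exactly which reward quantity is scaled (for example, a per-evaluation total) is an assumption, since the text does not specify it.

```python
STEPS_PER_ITERATION = 100   # 100 steps are counted as 1 learning iteration
EVAL_EVERY_STEPS = 1000     # the agent is evaluated every 1000 steps
REWARD_SCALE = 1 / 10000    # scaling applied before plotting on the y-axis


def iteration_of(step: int) -> int:
    """Learning iteration (x-axis value) that a given step count corresponds to."""
    return step // STEPS_PER_ITERATION


def is_eval_step(step: int) -> bool:
    """True when an evaluation on the validation environment is due."""
    return step > 0 and step % EVAL_EVERY_STEPS == 0


def plotted_value(evaluation_reward: float) -> float:
    """Value shown on the y-axis for a given evaluation reward."""
    return evaluation_reward * REWARD_SCALE


# Evaluations therefore land on every 10th iteration of the x-axis.
eval_iterations = [iteration_of(s) for s in range(1, 3001) if is_eval_step(s)]
print(eval_iterations)   # [10, 20, 30]
```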

Referring again to FIG. 4, the following four cases were compared: Equal (equal power allocation precoder) Greedy; WF (water-filling precoder) Greedy; SLSQP Greedy, in which the base station is aware that the imperfect channel state information it received as feedback is imperfect; and SLSQP Greedy (no-info σ_e), in which the base station believes that the imperfect channel state information it received as feedback is perfect.

Greedy means that deep reinforcement learning is not applied. In other words, the base station uses up all of the energy as soon as it is harvested, without considering the amount of harvested energy, the amount of energy remaining in the battery, or the channel information received as feedback.
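A toy calculation illustrates why spending everything immediately can be wasteful. Because the rate grows only logarithmically with power, spreading a fixed amount of energy over two slots yields more total rate than using it all in the first slot when no new energy arrives in between. This is an illustration under simplifying assumptions (single link, unit channel gain and noise, hypothetical function names), not the experiment reported in this document.

```python
import math


def rate(power: float) -> float:
    """Toy single-link rate in bit/s/Hz with unit channel gain and unit noise."""
    return math.log2(1.0 + power)


def greedy_power(stored_energy: float, slot: float) -> float:
    """Greedy baseline: spend all stored energy in the current slot."""
    return stored_energy / slot


energy, slot = 4.0, 1.0

# Greedy: everything in slot 1, nothing left for slot 2.
greedy_total = rate(greedy_power(energy, slot)) + rate(0.0)

# Alternative: split the same energy evenly across the two slots.
spread_total = 2 * rate(energy / (2 * slot))

print(f"greedy: {greedy_total:.2f}, spread: {spread_total:.2f}")
# greedy is about 2.32 and spread is about 3.17
```

A learned policy can additionally adapt the split to the battery level and the fed-back channel information, which a fixed rule like this one cannot do.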

SLSQP Greedy and SLSQP Greedy (no-info σ_e) are results obtained with rate-splitting multiple access, which splits messages into private and shared messages and transmits them together. Equal Greedy and WF Greedy are conventional multiple access techniques that do not split messages into private and shared messages and transmit them by unicast only.

The experimental results in FIG. 4 show that rate-splitting multiple access, which splits messages into private and shared messages for transmission, outperforms the other multiple access techniques, which do not use a shared message, in terms of sum rate.

Referring again to FIG. 5, the following eight cases were compared: SAC+SLSQP, SAC+SLSQP (no-info σ_e), SAC+WF, SAC+Equal, Equal Greedy, WF Greedy, SLSQP Greedy, and SLSQP Greedy (no-info σ_e).

The experimental results in FIG. 5 confirm that, in an energy harvesting environment, each of the four cases using deep reinforcement learning outperforms the corresponding case without deep reinforcement learning in terms of sum rate. In addition, comparing the cases under the same conditions, both among those using deep reinforcement learning and among the Greedy ones, confirms that rate-splitting multiple access outperforms the other multiple access techniques that do not use a shared message. It can therefore be seen that the rate-splitting multiple access method according to the present invention, which maximizes the sum rate based on deep reinforcement learning while taking energy harvesting into account, outperforms existing multiple access techniques in a communication environment under the same conditions.

As described so far, in the rate-splitting multiple access method by which a base station with limited energy supply maximizes the sum rate according to an embodiment of the present invention, the base station determines the transmission power using deep reinforcement learning based on the sum rate received from the plurality of terminals and the base station state information, and performs power allocation and beamforming design for each of the encoded shared and private streams using an optimization technique within the determined transmission power. This allows a base station with limited energy supply to use the supplied energy efficiently in a way that suits the changing external energy and communication channel conditions.

Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. In the case of implementation by hardware, an embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of implementation by firmware or software, an embodiment of the present invention may be implemented in the form of a module, procedure, function, or the like that performs the functions or operations described above. Software code may be stored in a memory unit and run by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various known means.

Meanwhile, embodiments of the present invention can be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all types of recording devices that store data readable by a computer system.

Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementations in the form of a carrier wave (for example, transmission over the Internet). In addition, the computer-readable recording medium may be distributed across computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention pertains.

Since various modifications can be made to the configurations and methods described and illustrated herein without departing from the scope of the present invention, all matter contained in the above detailed description or shown in the accompanying drawings is illustrative and is not intended to limit the present invention. Accordingly, the scope of the present invention is not limited by the exemplary embodiments described above and should be determined only by the following claims and their equivalents.

10: sum rate of step i-1  20: base station state information of step i
30: SAC algorithm  40: determined transmission power of step i
50: SLSQP algorithm  200: base station
210: transceiver  220: processor
230: memory  300: user terminals

Claims

A rate-splitting multiple access method by which a base station with limited energy supply maximizes a sum rate, the method comprising:
determining a first transmission power based on base station information in a first state;
splitting data to be transmitted to a plurality of terminals into shared and private messages, and encoding the shared and private messages into shared and private streams, respectively;
performing power allocation and beamforming design for each of the encoded shared and private streams within the determined transmission power;
transmitting the data to the plurality of terminals and receiving a sum rate; and
determining a second transmission power based on the received sum rate and the base station information in a second state,
wherein, in the power allocation and beamforming design for each of the encoded shared and private streams, the beamforming design of the private streams is performed first using a zero-forcing technique, and the power allocation of the private streams, the power allocation of the shared stream, and the beamforming design of the shared stream are performed using a first optimization algorithm,
wherein the base station information in the first and second states includes the amount of energy harvested by the base station, the amount of energy stored in the base station, and channel information fed back from the plurality of terminals, and
wherein the determination of the transmission power is performed using a second deep reinforcement learning algorithm.

(Deleted)

The rate-splitting multiple access method of claim 1,
wherein the first optimization algorithm is a sequential least squares programming (SLSQP) algorithm.

(Deleted)

The rate-splitting multiple access method of claim 1,
wherein the amount of energy stored in the base station in the second state is determined as the smaller of:
(a) the maximum amount of energy the base station can hold; and
(b) the amount of energy obtained by subtracting, from the amount of energy stored in the base station in the first state, the product of the transmission power determined in the first state and the time for which the data was transmitted, and adding the amount of energy harvested by the base station in the second state.

The rate-splitting multiple access method of claim 1,
wherein the second deep reinforcement learning algorithm is a soft actor-critic (SAC) algorithm.

The rate-splitting multiple access method of claim 1,
wherein the base station is a mobile base station.

A base station using rate-splitting multiple access to maximize a sum rate in a communication environment with limited energy supply, the base station comprising:
a transceiver that transmits signals to and receives signals from a plurality of terminals;
a processor connected to the transceiver and controlling the operation of the base station; and
a memory,
wherein the processor performs:
an operation of determining a first transmission power based on base station information in a first state;
an operation of splitting data to be transmitted to the plurality of terminals into shared and private messages, and encoding the shared and private messages into shared and private streams, respectively;
an operation of performing power allocation and beamforming design for each of the encoded shared and private streams within the determined transmission power;
an operation of transmitting the data to the plurality of terminals and receiving a sum rate; and
an operation of determining a second transmission power based on the received sum rate and the base station information in a second state,
wherein, in the power allocation and beamforming design for each of the encoded shared and private streams, the beamforming design of the private streams is performed first using a zero-forcing technique, and the power allocation of the private streams, the power allocation of the shared stream, and the beamforming design of the shared stream are performed using a first optimization algorithm,
wherein the base station information in the first and second states includes the amount of energy harvested by the base station, the amount of energy stored in the base station, and channel information fed back from the plurality of terminals, and
wherein the determination of the transmission power is performed using a second deep reinforcement learning algorithm.

(Deleted)

The base station of claim 8,
wherein the first optimization algorithm is a sequential least squares programming (SLSQP) algorithm.

(Deleted)

The base station of claim 8,
wherein the amount of energy stored in the base station in the second state is determined as the smaller of:
(a) the maximum amount of energy the base station can hold; and
(b) the amount of energy obtained by subtracting, from the amount of energy stored in the base station in the first state, the product of the transmission power determined in the first state and the time for which the data was transmitted, and adding the amount of energy harvested by the base station in the second state.

The base station of claim 8,
wherein the second deep reinforcement learning algorithm is a soft actor-critic (SAC) algorithm.

The base station of claim 8,
wherein the base station is a mobile base station.

A computer-readable recording medium that tangibly embodies a program of instructions executable by a digital processing device to provide rate-splitting multiple access for maximizing a sum rate in a communication environment with limited energy supply, and that is readable by the digital processing device,
the recording medium having recorded thereon a program for executing, on a computer, the method of any one of claims 1, 3, and 5 to 7.