KR102279231B1

KR102279231B1 - A method for determining of terminal mode in communication network including a plurality of terminals

Info

Publication number: KR102279231B1
Application number: KR1020200007237A
Authority: KR
Inventors: 신요안; 오선애
Original assignee: 숭실대학교산학협력단 (Soongsil University Industry-Academic Cooperation Foundation)
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2021-07-19

Abstract

The present invention provides a method by which a first terminal determines its communication mode, the method comprising the steps of: checking whether a channel for data transmission in a communication network including the first terminal and a second terminal is allocated to the second terminal; when the channel for data transmission is allocated to the second terminal, determining the communication mode of the first terminal to be an energy harvesting mode or a backscattering mode based on a reward and an energy state determined through reinforcement learning; and when the channel for data transmission is not allocated to the second terminal, determining the communication mode of the first terminal to be a data transmission mode based on the reward and the energy state determined through reinforcement learning.

Description

A method for determining a communication mode of a terminal in a communication network including a plurality of terminals {A METHOD FOR DETERMINING OF TERMINAL MODE IN COMMUNICATION NETWORK INCLUDING A PLURALITY OF TERMINALS}

The present invention provides a method for determining a communication mode of a terminal in a communication network including a plurality of terminals.

FIG. 1 is a diagram illustrating a conventional cognitive radio network (CRN) communication system. In FIG. 1, reference numeral 130 may denote a base station (access point), 120 a primary-user terminal, and 110 and 140 the transmitting terminal and the receiving terminal of a secondary user, respectively.

According to the prior art, unlike energy harvesting, which must absorb the RF signal, backscattering transmits data bits by controlling the impedance of the antenna so that a certain portion of the received energy is reflected. Therefore, in order to harvest energy efficiently, a terminal cannot perform energy harvesting and backscattering at the same time.

According to the prior art, the secondary-user transmitting terminal 110 can transmit data to the receiving terminal 140 only when the primary-user terminal 120 and the base station 130 are not communicating (that is, when the channel is not occupied). Therefore, before transmitting data, the secondary-user transmitting terminal 110 always had to check the state of the channel between the primary-user terminal 120 and the base station 130. Moreover, even when that channel is not occupied, the secondary-user transmitting terminal 110 may be unable to transmit data, depending on its energy state.

Accordingly, the present invention aims to provide a method for determining an operation mode optimized for the energy state of a terminal. More specifically, the present invention proposes determining the operation mode through reinforcement learning that maximizes the reward obtained according to the operation mode of the terminal.

The present invention provides a method by which a first terminal determines its communication mode, the method comprising: checking whether a channel for data transmission in a communication network including the first terminal and a second terminal is allocated to the second terminal; when the channel for data transmission is allocated to the second terminal, determining the communication mode of the first terminal to be an energy harvesting mode or a backscattering mode based on a reward and an energy state determined through reinforcement learning; and when the channel for data transmission is not allocated to the second terminal, determining the communication mode of the first terminal to be a data transmission mode based on a reward and an energy state determined through reinforcement learning.
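The claimed decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Mode` enum, `candidate_modes`, `choose_mode`, and the `q_values` table (a hypothetical mapping from (energy level, mode) pairs to values learned by reinforcement learning) are names introduced here for illustration only.

```python
from enum import Enum


class Mode(Enum):
    """Communication modes named in the description."""
    DATA_TRANSMISSION = "data transmission"
    ENERGY_HARVESTING = "energy harvesting"
    BACKSCATTERING = "backscattering"


def candidate_modes(channel_allocated_to_second_terminal: bool) -> tuple:
    # Channel allocated to the second terminal: harvest from, or backscatter, its signal.
    if channel_allocated_to_second_terminal:
        return (Mode.ENERGY_HARVESTING, Mode.BACKSCATTERING)
    # Channel not allocated: the first terminal may use it for data transmission.
    return (Mode.DATA_TRANSMISSION,)


def choose_mode(channel_allocated_to_second_terminal: bool,
                energy_level: int,
                q_values: dict) -> Mode:
    """Pick the candidate mode with the highest learned value at this energy level.

    q_values maps (energy_level, Mode) -> float; unseen pairs default to 0.0.
    """
    candidates = candidate_modes(channel_allocated_to_second_terminal)
    return max(candidates, key=lambda mode: q_values.get((energy_level, mode), 0.0))


if __name__ == "__main__":
    learned = {(3, Mode.ENERGY_HARVESTING): 0.2, (3, Mode.BACKSCATTERING): 0.7}
    print(choose_mode(True, 3, learned))
    print(choose_mode(False, 3, learned))
```

Keeping the candidate set separate from the value lookup mirrors the two-branch structure of the method: the channel check narrows the choice, and the learned values break the tie between harvesting and backscattering.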

According to an embodiment, the reinforcement learning may be performed based on information on the energy state of the first terminal, information on the operation modes of the first terminal, information on the probability of a change in the energy state of the first terminal, information on the reward according to the channel state, and information on the weight of the reward.

According to an embodiment, the information on the energy state of the first terminal may be determined based on the energy that the first terminal can harvest during a specific time period and the energy that the first terminal consumes to transmit data during the same time period.

According to an embodiment, the operation modes of the first terminal may include a data transmission mode, in which the energy state of the first terminal moves to a lower level; an energy harvesting mode, in which the energy state of the first terminal moves to a higher level; and a backscattering mode, in which the energy state of the first terminal is maintained.

According to an embodiment, when the channel for data transmission is allocated to the second terminal, the information on the reward according to the channel state includes a reward for transmission through backscattering or a reward for energy harvesting; when the channel for data transmission is not allocated to the second terminal, the information on the reward according to the channel state may include a reward for data transmission.

According to an embodiment, the information on the weight of the reward has a value between 0 and 1. The closer this value is to 0, the greater the weight given to the reward for transmission through backscattering or the reward for data transmission; the closer it is to 1, the greater the weight given to the reward for energy harvesting.

The present invention also provides a first terminal in a communication network including the first terminal and a second terminal, the first terminal comprising: a transceiver that transmits data to, or receives data from, a base station; and a controller that checks whether a channel for data transmission is allocated to the second terminal, determines the communication mode of the first terminal to be an energy harvesting mode or a backscattering mode based on a reward and an energy state determined through reinforcement learning when the channel for data transmission is allocated to the second terminal, and determines the communication mode of the first terminal to be a data transmission mode based on a reward and an energy state determined through reinforcement learning when the channel for data transmission is not allocated to the second terminal.

According to an embodiment disclosed herein, a terminal in any energy state can decide whether to transmit data using a conventional wireless transmission scheme, to harvest energy, or to operate in the backscattering mode to transmit data bits. In addition, according to an embodiment disclosed herein, the terminal performs the operation suited to its energy state so as to maximize the performance gain (reward) obtainable from the operation mode it selects through reinforcement learning, thereby minimizing packet loss. Furthermore, according to an embodiment disclosed herein, no separate sensing period is needed to check channel occupancy, so the terminal can perform its operation mode over all slots in a frame, transmitting more data bits and charging energy for longer.

FIG. 1 is a diagram illustrating a conventional cognitive radio network (CRN) communication system.
FIG. 2 is a diagram for explaining communication modes in a communication network according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the energy states of a first terminal according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the change in the energy state of the first terminal when the communication mode of the first terminal changes, according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a reinforcement learning structure according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of determining a communication mode of a terminal according to an embodiment of the present invention.
FIG. 7 is a block diagram of a terminal according to an embodiment of the present invention.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are described in detail with reference to the drawings. However, this is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and scope. Like reference numerals are used for like elements throughout the drawings.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to the other component, but it should be understood that intervening components may also be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this application, are not to be interpreted in an idealized or overly formal sense.

Throughout the specification and claims, when a part is said to include a certain component, this means that the part may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 2 is a diagram for explaining communication modes in a communication network according to an embodiment of the present invention.

According to an embodiment, the communication network may include a first terminal 210, a second terminal 220, a base station 230, and a third terminal 240. According to various embodiments, the third terminal 240 may be included in the base station.

According to an embodiment, in the communication network shown in FIG. 2, the first terminal 210 may sense the channel state and determine its communication mode. For example, when the sensing result indicates that the channel is already occupied by communication between the second terminal 220 and the base station 230 (cases (a) and (b) of FIG. 2), the first terminal 210 may operate in the backscattering mode (FIG. 2(a)) or in the energy harvesting mode (FIG. 2(b)).

According to an embodiment, when the channel is already occupied by communication between the second terminal 220 and the base station 230, as shown in FIG. 2(b), the first terminal 210 and the third terminal 240 may harvest wireless energy from the primary signal transmitted by the base station 230. According to various embodiments, the energy that the first terminal 210 or the third terminal 240 can harvest while the channel is allocated to the second terminal 220 may be limited, which can make it difficult to demodulate and transmit signals later, when the channel becomes idle. Therefore, as shown in FIG. 2(a), the first terminal 210 may instead send data to the third terminal 240, the intended receiver, by reflecting or not reflecting an ambient signal. For example, when the first terminal 210 reflects the ambient signal, it may transmit a data bit '1' to the third terminal 240; when it does not reflect the ambient signal, it may transmit a data bit '0' to the third terminal 240.
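The reflect/non-reflect signalling can be illustrated with a toy model. Everything here is an assumption made for illustration: the receiver is assumed to see a fixed ambient amplitude plus an extra component only in slots where the tag reflects, and the names `backscatter_states`, `received_amplitudes`, and `decode`, as well as the `ambient` and `reflection_gain` values, are hypothetical. A real ambient backscatter receiver must also cope with fading and a time-varying primary signal.

```python
def backscatter_states(bits: str) -> list:
    """Map data bits to antenna states: reflect for '1', do not reflect for '0'."""
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    return [b == "1" for b in bits]


def received_amplitudes(states, ambient=1.0, reflection_gain=0.3):
    # Toy channel (assumption): ambient signal, plus an extra reflected
    # component only in slots where the tag reflects.
    return [ambient + (reflection_gain if reflecting else 0.0) for reflecting in states]


def decode(amplitudes, ambient=1.0, reflection_gain=0.3):
    # Threshold halfway between the "absorb" and "reflect" amplitudes.
    threshold = ambient + reflection_gain / 2
    return "".join("1" if a > threshold else "0" for a in amplitudes)


if __name__ == "__main__":
    tx_bits = "1011"
    print(decode(received_amplitudes(backscatter_states(tx_bits))))
```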

According to an embodiment, when the channel is not occupied by communication between the second terminal 220 and the base station 230, as shown in FIG. 2(c), the first terminal 210 may transmit data to the third terminal 240 over the channel.

According to an embodiment, the first terminal 210 may perform reinforcement learning for determining its communication mode based on information on the energy state of the first terminal 210, information on the operation modes of the first terminal, information on the probability of a change in the energy state of the first terminal, information on the reward according to the channel state, and information on the weight of the reward. According to various embodiments, through this reinforcement learning, the first terminal 210 may determine a communication mode optimized for the energy it currently holds. A specific method of performing the reinforcement learning according to an embodiment of the present invention is described below with reference to FIGS. 3 to 5.

FIG. 3 is a diagram for explaining the energy states of a first terminal according to an embodiment of the present invention.

According to an embodiment, the reinforcement learning according to the present invention may be based on a Markov decision process (MDP). According to various embodiments, the Markov decision process is performed on the premise that the state at the next step is determined by the action taken in the current state.

According to an embodiment, the Markov decision process may be performed based on states, actions, state transition probabilities, rewards, and a discount factor. According to various embodiments, the states may be the set of states the agent can be in; the actions may be the set of actions the agent can take; the state transition probability may be the probability of moving to the next state when an action is taken in the current state; the reward may be the reward received when an action is taken in the current state; and the discount factor may indicate how much more important a reward obtained now is than a reward obtained in the future.
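To show how these five elements combine, here is a minimal value-iteration sketch for a generic finite MDP. Value iteration is used only as a standard way to illustrate the Bellman optimality backup; the description does not state that it is the method used, and the table format `P[s][a] = [(probability, next_state, reward), ...]` and the function names are assumptions.

```python
def value_iteration(P, gamma, tol=1e-10, max_iter=10_000):
    """Compute optimal state values of a finite MDP.

    P[s] is a dict mapping each action available in state s to a list of
    (probability, next_state, reward) tuples; gamma is the discount factor.
    """
    V = [0.0] * len(P)
    for _ in range(max_iter):
        # Bellman optimality backup: best expected reward-plus-discounted-value.
        new_V = [
            max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
            for s in range(len(P))
        ]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(P, V, gamma):
    """Action with the best one-step lookahead value in each state."""
    return [
        max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in range(len(P))
    ]


if __name__ == "__main__":
    # Toy two-state MDP: "wait" in state 0 moves to state 1, where "stay" pays 1 per step.
    P = [{"wait": [(1.0, 1, 0.0)]}, {"stay": [(1.0, 1, 1.0)]}]
    V = value_iteration(P, gamma=0.9)
    print([round(v, 3) for v in V], greedy_policy(P, V, 0.9))
```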

According to an embodiment, the first terminal 210 shown in FIG. 2 may correspond to the agent of the Markov decision process, and the energy state of the first terminal may correspond to the state of the Markov decision process. According to various embodiments, the first terminal 210 may use at least a certain transmit power in order to satisfy a target signal-to-noise ratio (SNR) at the third terminal 240, which receives data from the first terminal.

Meanwhile, energy harvesting by the first terminal 210 may be affected by the state of the energy transfer channel, the RF-to-electrical conversion efficiency, the time spent harvesting, and the like. Therefore, the present invention assumes that the energy that can be harvested during one time slot, $E_h$, and the energy consumed to transmit data during the same time slot, $E_c$, satisfy Equation 1 below.

[Equation 1]

$E_c = m\,E_h$, where $m$ is a positive integer

That is, Equation 1 shows that energy accumulates slowly through harvesting and is consumed quickly by data transmission. Meanwhile, considering a battery of limited capacity, the present invention assumes that a fully charged first terminal can transmit data L consecutive times; the energy states of the first terminal under this assumption are shown in FIG. 3.
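Under these assumptions the battery capacity can be written in units of $E_h$. As a sketch, assuming the counted energy is the usable energy above the outage floor (FIG. 3 is not reproduced here), and letting $p$ denote a hypothetical probability that harvesting succeeds in a slot, independently across slots:

$$
E_{\text{full}} = L\,E_c = L\,m\,E_h,
\qquad
\mathbb{E}\left[\text{slots needed to recover } E_c\right] = \frac{m}{p}.
$$

The second expression is the mean of a negative binomial count: collecting $m$ successful harvesting slots, each succeeding with probability $p$, takes $m/p$ slots on average.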

According to an embodiment, the energy outage state may be an energy state defined so that the residual battery energy of the first terminal stays above a certain level, ensuring continuous, stable operation of the circuit; in the energy outage state, data transmission and backscattering by the first terminal may be restricted. According to various embodiments, the energy deficiency level may refer to the set of all energy states in which data transmission is restricted, and the energy outage state may be the lowest state of the energy deficiency level. Meanwhile, at the energy sufficiency level, which includes the fully charged state, the first terminal may perform communication modes including wireless transmission.
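The level structure can be made concrete with a small helper. It assumes, hypothetically, that energy is counted in units of $E_h$ above the outage floor, so levels run from 0 (outage) to L*m (full charge), and that data transmission is restricted whenever less than one transmission's worth of energy ($E_c$ = m units) is stored. The function names are illustrative only.

```python
def energy_levels(m: int, L: int) -> range:
    """Levels 0 .. L*m, counted in units of E_h above the outage floor (assumption)."""
    if m < 1 or L < 1:
        raise ValueError("m and L must be positive integers")
    return range(L * m + 1)


def classify_level(level: int, m: int, L: int) -> str:
    """Return the most specific class of an energy level."""
    if level not in energy_levels(m, L):
        raise ValueError(f"level {level} is outside 0..{L * m}")
    if level == 0:
        return "outage"        # lowest state of the energy deficiency level
    if level < m:
        return "deficiency"    # less than E_c = m * E_h stored: no data transmission
    return "sufficiency"       # at least one transmission possible; includes full charge


if __name__ == "__main__":
    m, L = 3, 2
    print({lvl: classify_level(lvl, m, L) for lvl in energy_levels(m, L)})
```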

According to an embodiment, the first terminal 210 shown in FIG. 2 may correspond to the agent of the Markov decision process, and the communication modes in which the first terminal can operate may correspond to the actions of the Markov decision process. For example, the data transmission mode, the energy harvesting mode, and the backscattering mode may correspond to actions of the Markov decision process.

According to an embodiment, the communication modes that the first terminal can perform in each energy state are predefined, and the energy state of the first terminal may change according to the communication mode it selects. According to various embodiments, when the first terminal operates in the data transmission mode, its energy state may transition to a lower level; when it operates in the energy harvesting mode, its energy state may transition to a higher level; and when it operates in the backscattering mode, it may keep its current energy state.

FIG. 4 is a diagram illustrating the change in the energy state of the first terminal when the communication mode of the first terminal changes, according to an embodiment of the present invention.

According to an embodiment, the first terminal 210 shown in FIG. 2 may correspond to the agent of the Markov decision process, and the probability of a change in the energy state of the first terminal may correspond to the state transition probability of the Markov decision process. According to various embodiments, when the first terminal performs the data transmission mode, it consumes energy regardless of the channel state, whereas energy harvesting can be performed only when a radio signal usable for harvesting is actually present around the first terminal.

According to an embodiment, unlike the data transmission mode, which transitions deterministically to a lower energy level, the energy harvesting mode and the backscattering mode behave probabilistically according to the occupancy state of the channel: the energy harvesting mode may transition to a higher energy state, and the backscattering mode may keep the current energy state. According to various embodiments, because the energy consumed by the circuit is not considered in the backscattering mode, the first terminal may keep its current energy state after a backscattering operation regardless of the channel state.
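The transition rules above can be sketched as a per-slot distribution over next energy levels. This is an illustrative model under assumptions, not the patent's exact transition matrix: levels are counted in units of $E_h$ (so a transmission drops m levels), harvesting raises the level by one unit only when the channel is occupied, and `p_busy` is a hypothetical probability that the channel is occupied by the second terminal. Harvesting at full charge is allowed but gains nothing.

```python
TRANSMIT, HARVEST, BACKSCATTER = "transmit", "harvest", "backscatter"


def allowed_actions(level: int, m: int) -> list:
    """Modes assumed to be available at a given energy level (units of E_h)."""
    if level == 0:
        return [HARVEST]                 # outage: transmission and backscattering restricted
    actions = [HARVEST, BACKSCATTER]
    if level >= m:                       # enough energy for one transmission (E_c = m * E_h)
        actions.append(TRANSMIT)
    return actions


def transition(level: int, action: str, m: int, L: int, p_busy: float) -> dict:
    """Return {next_level: probability} for one time slot."""
    if action not in allowed_actions(level, m):
        raise ValueError(f"{action!r} is not allowed at level {level}")
    if action == TRANSMIT:
        return {level - m: 1.0}          # deterministic drop, whatever the channel state
    if action == BACKSCATTER:
        return {level: 1.0}              # circuit energy ignored, so the level is kept
    up = min(level + 1, L * m)           # harvesting adds one unit, capped at full charge
    if up == level:
        return {level: 1.0}
    return {up: p_busy, level: 1.0 - p_busy}  # gain only if a primary signal is present


if __name__ == "__main__":
    print(transition(2, HARVEST, m=3, L=2, p_busy=0.5))
```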

FIG. 5 is a diagram for explaining a reinforcement learning structure according to an embodiment of the present invention.

According to an embodiment, the first terminal 210 shown in FIG. 2 may correspond to the agent of the Markov decision process, and the reward according to the communication mode in which the first terminal operates may correspond to the reward of the Markov decision process. According to various embodiments, in the data transmission mode, the first terminal may obtain a reward through wireless data transmission.

According to an embodiment, the first terminal may determine, based on the channel state, whether to operate in the data transmission mode or in the backscattering mode (or the energy harvesting mode). That is, the reward component of the reinforcement learning may be determined by the channel state. According to various embodiments, when the channel is not occupied, the reward component may be the reward for wireless transmission by the first terminal; when the channel is occupied, the reward component may be the reward for backscattering by the first terminal (or the reward for energy harvesting). Meanwhile, the reward used in the Markov decision process may take into account not only the immediate reward but also rewards obtainable in the future.
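A per-slot reward consistent with this description can be sketched as below. The numeric reward values are placeholders, and the rule that a mode earns nothing when the channel state does not suit it (for example, transmitting while the second terminal occupies the channel) is an assumption; the description only states which reward component applies in each channel state.

```python
def slot_reward(action: str, channel_busy: bool,
                r_data: float = 1.0,
                r_backscatter: float = 0.3,
                r_harvest: float = 0.1) -> float:
    """Immediate reward for one slot; all reward magnitudes are placeholders."""
    if action == "transmit":
        # Conventional transmission pays off only on an idle channel (assumption).
        return 0.0 if channel_busy else r_data
    if action == "backscatter":
        # Backscattering needs the primary signal present on an occupied channel.
        return r_backscatter if channel_busy else 0.0
    if action == "harvest":
        # Harvesting likewise needs a primary signal to absorb.
        return r_harvest if channel_busy else 0.0
    raise ValueError(f"unknown action {action!r}")


if __name__ == "__main__":
    for busy in (False, True):
        print(busy, {a: slot_reward(a, busy) for a in ("transmit", "backscatter", "harvest")})
```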

According to an embodiment, the discount factor may have a value between 0 and 1. According to various embodiments, when the discount factor is close to 0, the short-sighted reward obtainable through immediate data transmission or backscattering may be prioritized; when the discount factor is close to 1, the forward-looking reward (for example, the reward obtained through energy harvesting) may be prioritized.
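The effect of the discount factor can be checked with a discounted return. The two reward sequences below are hypothetical: one spends energy on an immediate transmission and then earns nothing, while the other harvests for two slots and can then transmit twice. With a discount factor near 0 the immediate transmission scores higher; near 1 the harvest-first sequence does.

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma * r_1 + gamma**2 * r_2 + ... (computed back to front)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences over four slots.
transmit_now = [1.0, 0.0, 0.0, 0.0]    # transmit immediately, then run short of energy
harvest_first = [0.1, 0.1, 1.0, 1.0]   # harvest twice, then transmit twice

if __name__ == "__main__":
    for gamma in (0.1, 0.9):
        print(gamma, discounted_return(transmit_now, gamma), discounted_return(harvest_first, gamma))
```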

According to an embodiment disclosed herein, the probability of entering an energy depletion state, in which both data transmission and ambient backscattering are restricted, may be reduced, and packet loss may therefore be reduced. To minimize the probability of energy depletion, the first terminal should take actions that obtain the maximum reward in each energy state while keeping its energy from being depleted. FIG. 5 illustrates the interaction between the first terminal and the channel occupancy state that implements this packet-loss reduction scheme.

According to an embodiment, the first terminal may check its own energy state through observation and take an action (that is, determine its communication mode) based on that energy state; depending on the channel occupancy state, it then obtains information on the next energy state it transitions to and the reward for the action taken.
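
The interaction of FIG. 5 (observe the energy state, act, then receive the next energy state and a reward that depend on channel occupancy) can be simulated to see why energy-aware decisions reduce the time spent in the depletion state. This is a sketch under assumed conditions: a six-level battery, one level per transmission or harvest, an i.i.d. busy probability of 0.5, illustrative reward values, and two hypothetical policies (greedy_policy and energy_aware_policy) that are not taken from the source.

```python
import random

E_MAX = 5                         # hypothetical battery with levels 0..E_MAX
R_TX, R_BS, R_EH = 1.0, 0.4, 0.1  # hypothetical per-mode rewards


def step(energy: int, mode: str, channel_busy: bool) -> tuple:
    """One agent-environment interaction: returns (next_energy, reward)."""
    if channel_busy:
        if mode == "harvest":
            return min(energy + 1, E_MAX), R_EH
        if mode == "backscatter":  # energy level is maintained
            return energy, (R_BS if energy > 0 else 0.0)
    elif mode == "transmit":
        if energy > 0:
            return energy - 1, R_TX
        return energy, 0.0  # depleted: transmission fails
    raise ValueError(f"mode {mode!r} not allowed when channel_busy={channel_busy}")


def greedy_policy(energy: int, channel_busy: bool) -> str:
    # Hypothetical: always backscatter on a busy channel, never store energy.
    return "backscatter" if channel_busy else "transmit"


def energy_aware_policy(energy: int, channel_busy: bool, threshold: int = 2) -> str:
    # Hypothetical: harvest on a busy channel whenever the battery is low.
    if not channel_busy:
        return "transmit"
    return "harvest" if energy < threshold else "backscatter"


def depletion_fraction(policy, steps: int = 10_000, p_busy: float = 0.5, seed: int = 0) -> float:
    """Fraction of slots that end with an empty battery under the given policy."""
    rng = random.Random(seed)
    energy, depleted = E_MAX, 0
    for _ in range(steps):
        busy = rng.random() < p_busy
        energy, _ = step(energy, policy(energy, busy), busy)
        if energy == 0:
            depleted += 1
    return depleted / steps


print(depletion_fraction(greedy_policy), depletion_fraction(energy_aware_policy))
```

Under these assumptions the greedy policy never refills its battery and ends up depleted in almost every slot, whereas harvesting below a small threshold keeps the terminal out of the depletion state for most slots.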

FIG. 6 is a flowchart illustrating a method of determining a communication mode of a terminal according to an embodiment of the present invention.

According to an embodiment, in a communication network including a first terminal and a second terminal, the first terminal may check, in step S610, whether a channel for data transmission is allocated to the second terminal. According to various embodiments, step S610 may be performed only in the initial stage of reinforcement learning and may be omitted once the reinforcement learning is complete.

According to an embodiment, when the channel for data transmission is not allocated to the second terminal, the first terminal may, in step S620, determine its communication mode to be the data transmission mode based on the reward and the energy state determined through reinforcement learning. According to various embodiments, when the channel for data transmission is allocated to the second terminal, the first terminal may, in step S630, determine its communication mode to be the energy harvesting mode or the backscattering mode based on the reward and the energy state determined through reinforcement learning. Reinforcement learning and the individual communication modes (data transmission mode, backscattering mode, and energy harvesting mode) are as described above and are not repeated here.
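
Steps S610 to S630 can be sketched as a small decision function. The lookup table q of learned values indexed by (energy level, mode) and the mode strings are hypothetical; the description does not specify how the learned reward estimates are stored.

```python
from typing import Mapping, Tuple


def decide_mode(channel_allocated_to_second: bool, energy: int,
                q: Mapping[Tuple[int, str], float]) -> str:
    """Mirror of the FIG. 6 flow under assumed inputs (q is a hypothetical table)."""
    # S610: is the data channel allocated to the second (primary) terminal?
    if not channel_allocated_to_second:
        # S620: channel free, so use the data transmission mode.
        return "transmit"
    # S630: channel occupied, so choose harvesting or backscattering by learned value.
    candidates = ("harvest", "backscatter")
    return max(candidates, key=lambda m: q.get((energy, m), 0.0))


# Hypothetical learned values: harvest when nearly empty, backscatter when charged.
q_table = {(0, "harvest"): 0.8, (0, "backscatter"): 0.0,
           (4, "harvest"): 1.2, (4, "backscatter"): 2.0}
print(decide_mode(True, 0, q_table), decide_mode(True, 4, q_table))
```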

FIG. 7 is a block diagram of a terminal according to an embodiment of the present invention.

According to an embodiment, in a communication network including a first terminal and a second terminal, the first terminal 700 may include a transceiver 710 that transmits data to, or receives data from, a base station, and a control unit 720. The control unit 720 checks whether a channel for data transmission is allocated to the second terminal. When the channel is allocated to the second terminal, it determines the communication mode of the first terminal to be the energy harvesting mode or the backscattering mode based on the reward and the energy state determined through reinforcement learning; when the channel is not allocated to the second terminal, it determines the communication mode of the first terminal to be the data transmission mode based on the reward and the energy state determined through reinforcement learning. According to various embodiments, the reinforcement learning may be performed based on information on the energy state of the first terminal, information on the operation mode of the first terminal, information on the probability associated with changes in the energy state of the first terminal, information on the reward according to the channel state, and information on the weight of the reward.
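
One way the control unit 720 could combine the five kinds of information listed above (energy states, operation modes, transition probabilities, channel-dependent rewards, and the reward weight) is to treat them as a finite MDP and solve the Bellman optimality equation V(e, b) = max_m [R(e, m) + gamma * sum over b' of P(b') * V(e', b')] by value iteration. This is an assumed, model-based stand-in rather than the learning rule of the description: the six-level battery, the i.i.d. busy probability, the one-level energy steps, and the reward values are all hypothetical.

```python
E_MAX = 5                         # hypothetical battery with levels 0..E_MAX
P_BUSY = 0.5                      # hypothetical i.i.d. probability the channel is busy
R_TX, R_BS, R_EH = 1.0, 0.4, 0.1  # hypothetical per-mode rewards


def modes(busy: bool) -> tuple:
    """Action set: harvesting or backscattering when busy, transmission otherwise."""
    return ("harvest", "backscatter") if busy else ("transmit",)


def next_energy(e: int, mode: str) -> int:
    """Assumed energy dynamics: transmit -1, harvest +1, backscatter unchanged."""
    if mode == "transmit":
        return max(e - 1, 0)
    if mode == "harvest":
        return min(e + 1, E_MAX)
    return e


def reward(e: int, mode: str) -> float:
    """Immediate reward; an empty battery blocks transmission and backscattering."""
    if mode == "harvest":
        return R_EH
    if e == 0:
        return 0.0
    return R_TX if mode == "transmit" else R_BS


def value_iteration(gamma: float, tol: float = 1e-9):
    """Solve the assumed MDP over states (energy level, channel busy?)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("value iteration needs 0 <= gamma < 1")
    states = [(e, b) for e in range(E_MAX + 1) for b in (False, True)]
    V = {s: 0.0 for s in states}

    def q(e: int, b: bool, m: str) -> float:
        e2 = next_energy(e, m)
        # The next channel state is drawn independently with probability P_BUSY.
        expected = P_BUSY * V[(e2, True)] + (1 - P_BUSY) * V[(e2, False)]
        return reward(e, m) + gamma * expected

    while True:
        delta = 0.0
        for e, b in states:
            best = max(q(e, b, m) for m in modes(b))
            delta = max(delta, abs(best - V[(e, b)]))
            V[(e, b)] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = {(e, b): max(modes(b), key=lambda m: q(e, b, m)) for e, b in states}
    return V, policy


V, policy = value_iteration(gamma=0.9)
for e in range(E_MAX + 1):
    print(e, policy[(e, True)])
```

Because the transition probabilities are assumed known here, value iteration is sufficient; a model-free method such as Q-learning would instead estimate the same action values from observed transitions.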

According to an embodiment, the information on the energy state of the first terminal may be determined based on the energy the first terminal can harvest during a specific time period and the energy the first terminal consumes to transmit data during that time period. According to various embodiments, the operation modes of the first terminal may include a data transmission mode, in which the energy state of the first terminal moves to a lower level; an energy harvesting mode, in which the energy state moves to a higher level; and a backscattering mode, in which the energy state is maintained.
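
One possible way to derive discrete energy states from the per-slot harvestable energy and the per-slot transmission energy is to quantise the battery in units of the harvested amount. The EnergyModel class, its field names, and the rounding rules are assumptions made for this sketch, not details given in the description.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyModel:
    capacity: float   # battery capacity (hypothetical units)
    e_harvest: float  # energy harvestable during one time slot
    e_tx: float       # energy consumed by one slot of data transmission

    @property
    def levels(self) -> int:
        """Number of discrete energy states 0..N, quantised in units of e_harvest."""
        return math.floor(self.capacity / self.e_harvest) + 1

    @property
    def tx_cost(self) -> int:
        """Energy levels consumed by one transmission slot (rounded up)."""
        return math.ceil(self.e_tx / self.e_harvest)

    def next_level(self, level: int, mode: str) -> int:
        top = self.levels - 1
        if mode == "transmit":     # data transmission: energy state moves down
            if level < self.tx_cost:
                raise ValueError("insufficient energy for data transmission")
            return level - self.tx_cost
        if mode == "harvest":      # energy harvesting: energy state moves up
            return min(level + 1, top)
        if mode == "backscatter":  # backscattering: energy state is maintained
            return level
        raise ValueError(f"unknown mode {mode!r}")


model = EnergyModel(capacity=10.0, e_harvest=2.0, e_tx=3.0)
print(model.levels, model.tx_cost)
```

With capacity 10, harvest 2 and transmission 3 (arbitrary units), the battery has six levels and one transmission slot costs two levels.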

According to an embodiment, when the channel for data transmission is allocated to the second terminal, the information on the reward according to the channel state includes the reward for transmission through backscattering or the reward for energy harvesting, and when the channel for data transmission is not allocated to the second terminal, it may include the reward for data transmission. According to various embodiments, the information on the weight of the reward has a value between 0 and 1: the closer it is to 0, the larger the weight given to the reward for backscattering transmission or data transmission, and the closer it is to 1, the larger the weight given to the reward for energy harvesting.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the following claims, and all technical ideas within their range of equivalents should be construed as falling within the scope of the present invention.

Claims

A method for determining a communication mode of a first terminal in a communication network including a first terminal and a second terminal, the method comprising:
learning, through reinforcement learning based on a reward and an energy state received by the first terminal, the communication mode of the first terminal as an energy harvesting mode or a backscattering mode when a channel for data transmission is allocated to the second terminal; and
learning, through reinforcement learning based on the reward and the energy state received by the first terminal, the communication mode of the first terminal as a data transmission mode when the channel for data transmission is not allocated to the second terminal,
wherein the reinforcement learning is performed based on information on the energy state of the first terminal, information on the operation mode of the first terminal, information on the probability associated with a change in the energy state of the first terminal, information on the reward according to the channel state, and information on the weight of the reward,
A method of determining the communication mode of the first terminal.

delete

The method of claim 1,
wherein the information on the energy state of the first terminal is determined based on energy that the first terminal can harvest during a specific time period and energy consumed by the first terminal to transmit data during the specific time period,
A method of determining the communication mode of the first terminal.

The method of claim 1,
wherein the operation mode of the first terminal includes a data transmission mode in which the energy state of the first terminal moves to a lower level, an energy harvesting mode in which the energy state of the first terminal moves to a higher level, and a backscattering mode in which the energy state of the first terminal is maintained,
A method of determining the communication mode of the first terminal.

The method of claim 1,
wherein, when the channel for data transmission is allocated to the second terminal, the information on the reward according to the channel state includes a reward for transmission through backscattering or a reward for energy harvesting, and, when the channel for data transmission is not allocated to the second terminal, the information on the reward according to the channel state includes a reward for the data transmission,
A method of determining the communication mode of the first terminal.

6. The method of claim 5,
wherein the information on the weight of the reward has a value between 0 and 1, the weight for the reward for transmission through backscattering or for the reward for the data transmission increasing as the information on the weight of the reward approaches 0, and the weight for the reward for energy harvesting increasing as the information on the weight of the reward approaches 1,
A method of determining the communication mode of the first terminal.

A first terminal in a communication network including the first terminal and a second terminal, the first terminal comprising:
a transceiver that transmits data to or receives data from a base station; and
a control unit that, when a channel for data transmission is allocated to the second terminal, learns the communication mode of the first terminal as an energy harvesting mode or a backscattering mode through reinforcement learning based on a reward and an energy state received by the first terminal, and, when the channel for data transmission is not allocated to the second terminal, learns the communication mode of the first terminal as a data transmission mode through reinforcement learning based on the reward and the energy state received by the first terminal,
wherein the reinforcement learning is performed based on information on the energy state of the first terminal, information on the operation mode of the first terminal, information on the probability associated with a change in the energy state of the first terminal, information on the reward according to the channel state, and information on the weight of the reward,
first terminal.

delete

The first terminal of claim 7,
wherein the information on the energy state of the first terminal is determined based on energy that the first terminal can harvest during a specific time period and energy consumed by the first terminal to transmit data during the specific time period,
first terminal.

The first terminal of claim 7,
wherein the operation mode of the first terminal includes a data transmission mode in which the energy state of the first terminal moves to a lower level, an energy harvesting mode in which the energy state of the first terminal moves to a higher level, and a backscattering mode in which the energy state of the first terminal is maintained,
first terminal.

The first terminal of claim 7,
wherein, when the channel for data transmission is allocated to the second terminal, the information on the reward according to the channel state includes a reward for transmission through backscattering or a reward for energy harvesting, and, when the channel for data transmission is not allocated to the second terminal, the information on the reward according to the channel state includes a reward for the data transmission,
first terminal.

12. The first terminal of claim 11,
wherein the information on the weight of the reward has a value between 0 and 1, the weight for the reward for transmission through backscattering or for the reward for the data transmission increasing as the information on the weight of the reward approaches 0, and the weight for the reward for energy harvesting increasing as the information on the weight of the reward approaches 1,
first terminal.