KR20200080388A - Method and system for energy efficient LoRa enabled IoT device using reinforcement learning

Info

Publication number: KR20200080388A
Application number: KR1020180163807A
Authority: KR
Inventors: 최준균; 안재원
Original assignee: 한국과학기술원
Priority date: 2018-12-18
Filing date: 2018-12-18
Publication date: 2020-07-07
Also published as: KR102151317B1

Abstract

Disclosed are a method for optimizing the energy of an Internet of things (IoT) device using a reinforcement learning technique and a system thereof. According to the present invention, an IoT energy optimization method, as a method performed in a gateway communicating with an IoT device or a computer system connected to the gateway, may comprise the step of adjusting the transmission power of the IoT device in a situation in which a spreading factor (SF) value of the IoT device is set.

Description

METHOD AND SYSTEM FOR ENERGY EFFICIENT LORA ENABLED IOT DEVICE USING REINFORCEMENT LEARNING

The description below relates to energy optimization technology for ultra-small Internet of Things (IoT) devices.

With the emergence of the cyber-physical system (CPS), a step beyond the Internet of Things, ultra-small IoT devices are appearing.

An ultra-small IoT device is a device comprising a sensor capable of measuring a limited set of physical events, a communication module for transmitting information, and a microcontroller with the minimal computing power needed to recognize sensing information as digital information and parse the data to suit the communication module.

Because ultra-small IoT devices make it possible to detect, in cyberspace, a physical event occurring in a physical object that has no computing power, they can be regarded as the most important technical element for implementing a CPS.

Examples of the use of ultra-small IoT devices are as follows.

- To confirm that the other party has read a document one has sent, an ultra-small IoT device is attached to the document envelope and sends a signal when the envelope is opened, confirming receipt of the document.

- Ultra-small IoT devices are scattered at many locations on a mountain, and information such as winter forest fires is collected in real time through temperature or smoke detection sensors.

Such an ultra-small IoT device transmits the physical events it detects to a gateway over wireless communication, which requires a wireless communication protocol suited to ultra-small IoT devices.

For this purpose, LPWA (Low Power Wide Area) communication protocols such as Sigfox, LoRa, RPMA, LTE-M, and NB-IoT are in use.

Sigfox provides communication services for ultra-small IoT devices such as Admiral Ivory and Admiral Blue, and in LoRa as well, ultra-small IoT devices can communicate with a gateway using LoRa technology and the LoRaWAN protocol.

Communication for ultra-small IoT devices can be divided into cellular LPWA and non-cellular LPWA.

ABI Research forecasts that the use of non-cellular LPWA will grow rapidly; LoRa and Sigfox are the representative non-cellular LPWA technologies.

LoRa communication modules are slightly more expensive than Sigfox modules, but LoRa's PHY technology offers high interference resistance and a data rate up to 17 times higher, so LoRa is expected to be the more widely used of the non-cellular LPWA technologies.

Unlike conventional IoT devices, ultra-small IoT devices are powered by a battery rather than by recharging or an external power supply, so they need to use power efficiently.

A LoRa module consumes the most energy when transmitting information, and its consumption can more than double depending on the transmission power level.

Therefore, delivering sensing information successfully to the gateway at an appropriate transmission power level is important for increasing the energy efficiency of ultra-small IoT devices.

Doing so extends the lifetime of the ultra-small IoT device so that it can detect physical events reliably.

LoRa uses chirp spread spectrum (CSS) to modulate signals so that sensors sharing the same channel do not interfere with one another; signals modulated with different CSS settings can be received simultaneously at the gateway over a single channel.

The spreading factor (SF) is an index indicating which CSS setting was used to modulate each signal, and signals sent simultaneously with different SFs are orthogonal to each other.

In theory, the interference between two such signals is negligibly small; however, recent studies that measured the orthogonality between different SFs show that the interference between two signals with different SFs is in practice more severe than theory predicts.

For a signal using SF8 to be received successfully in the presence of a signal using SF7, the SIR (Signal-to-Interference Ratio) difference between the two signals must be at least 8 dB.

When a device using SF7 is close to the gateway and a device using SF8 or higher is far away, the transmission of the device using SF8 or higher may not be received successfully if the transmission power of the SF7 device exceeds a certain level.

In current LoRa networks, transmission energy is wasted because of data loss caused by interference within the network, and ultra-small IoT devices spend a large amount of energy when transmitting information.

Therefore, to increase the energy efficiency of ultra-small IoT devices, the loss of transmitted data within the LoRa network must be reduced.

Because the cause of transmitted-data loss in the LoRa network is interference between signals using different SFs, reducing this interference raises the data transmission success rate and reduces the energy spent on transmitting information.

Provided is a LoRa-enabled IoT energy optimization technology using a reinforcement learning technique.

A reinforcement learning algorithm for determining the transmission power of an ultra-small IoT device can run on the gateway or on an edge server connected to the gateway.

To recognize signals using different SFs, the RSSI values of signals with different SFs that the gateway has received successfully can be measured and fed into the reinforcement learning algorithm.

Provided is an IoT energy optimization method performed in a gateway communicating with an IoT device, or in a computer system connected to the gateway, the method comprising adjusting the transmission power of the IoT device in a situation in which a spreading factor (SF) value of the IoT device has been set.

According to one aspect, the adjusting may determine the transmission power using the RSSI value of the IoT device measured at the gateway.

According to another aspect, the adjusting may adjust the transmission power with the ADR (Adaptive Data Rate) algorithm of the LoRaWAN standard.

According to yet another aspect, the adjusting may include determining the transmission power using a Q-learning algorithm, which is a reinforcement learning technique.

According to yet another aspect, the determining may determine, through the Q-learning algorithm, an optimal transmission power, which is the action that obtains the maximum reward for a state indicating the degree of interference between the IoT device and devices belonging to other SFs.

According to yet another aspect, the determining may include: setting a state indicating the degree of interference between the IoT device and devices belonging to other SFs, using the RSSI values of the received signals for each SF; and determining, based on transmission power level setting information of the IoT device, an optimal transmission power, which is the action that obtains the maximum reward for the state.

According to yet another aspect, the gateway may transmit the transmission power determined through the Q-learning algorithm to the IoT device using a field of the LoRaWAN standard.

Provided is an IoT energy optimization method performed in a gateway communicating with an IoT device, or in a computer system connected to the gateway, the method comprising adjusting the transmission power of the IoT device by determining, through a Q-learning algorithm that is a reinforcement learning technique and in a situation in which the SF value of the IoT device has been set, an optimal transmission power that obtains the maximum reward for a state indicating the degree of interference between the IoT device and devices belonging to other SFs.

Provided is an IoT energy optimization system comprising at least one processor that is implemented on a gateway communicating with an IoT device, or on a computer system connected to the gateway, and that is configured to execute computer-readable instructions, wherein the at least one processor includes a transmission power determining unit that determines the transmission power of the IoT device in a situation in which a spreading factor (SF) value of the IoT device has been set.

According to embodiments of the present invention, in a situation in which the SF value of each device has been set by the ADR (Adaptive Data Rate) algorithm of the LoRaWAN standard, adjusting only the transmission power of each device allows the signals of more devices to be received without loss and reduces the number of retransmissions, thereby reducing the transmission energy of each device.

According to embodiments of the present invention, the transmission power level of each device obtained through Q-learning, a reinforcement learning technique, can be sent to each device using the LinkADRReq field of the LoRaWAN standard protocol, and each device can set its power to the received transmission power level.

FIG. 1 illustrates an example of per-device SF assignment through the ADR algorithm.
FIG. 2 is an exemplary diagram for explaining the performance of the technique of adjusting only the transmission power in a situation in which the SF value of each device has been set.
FIG. 3 is a block diagram illustrating an example of the internal configuration of a computer system according to an embodiment of the present invention.
FIG. 4 illustrates an example of a table of SIR_xy values for each LoRa module.
FIG. 5 is an exemplary diagram for explaining the definitions of the state, action, and reward of each sensor.
FIG. 6 illustrates an example of the Q-learning algorithm.
FIG. 7 illustrates the overall system structure for IoT energy optimization using the Q-learning algorithm.
FIG. 8 is a flowchart illustrating a method of determining the optimal transmission power level through the Q-learning algorithm.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to a LoRa-enabled IoT energy optimization technology using a reinforcement learning technique.

Using the Q-learning algorithm, a reinforcement learning technique, the invention recognizes signals using different SFs and, to optimize energy by reducing the interference between them, predicts the transmission power of each ultra-small IoT device and adjusts it to the predicted value.

Adjusting the transmission power of each device would ordinarily require running the Q-learning algorithm on each device, but an ultra-small IoT device does not have the computing power to do so.

In addition, for a device to detect signals with other SFs while it is transmitting, its Rx slot and Tx slot must be open together in full-duplex mode; some ultra-small IoT devices operate in half-duplex mode and therefore cannot receive other signals while transmitting.

Therefore, in the present invention, the Q-learning algorithm for determining the transmission power of the ultra-small IoT devices runs on the gateway or on an edge server connected to the gateway.

To recognize signals using different SFs, the RSSI values of signals with different SFs that the gateway has received successfully are measured and fed into the Q-learning algorithm.

According to embodiments of the present invention, in a situation in which the SF value of each device has been set by the ADR (Adaptive Data Rate) algorithm of the LoRaWAN standard, adjusting only the transmission power of each device allows the signals of more devices to be received without loss and reduces the number of retransmissions, thereby reducing the transmission energy of each device.

In addition, the transmission power level of each device obtained through Q-learning can be sent to each device using the LinkADRReq field of the LoRaWAN standard protocol, and each device can set its power to the received level. In other words, the technique can be used without violating the standard.
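As a rough illustration of how a gateway-side server might carry the chosen power level in a LinkADRReq MAC command, the sketch below packs the command using the layout of the LoRaWAN 1.0.x specification: the command identifier 0x03 followed by DataRate_TXPower (1 byte), ChMask (2 bytes), and Redundancy (1 byte). The function name and the example field values are hypothetical and do not come from the patent. TXPower is a region-specific power index rather than a dBm value, so the mapping from a learned power level to an index is assumed to happen elsewhere.

```python
import struct

LINK_ADR_REQ_CID = 0x03  # command identifier of LinkADRReq in LoRaWAN 1.0.x


def build_link_adr_req(data_rate: int, tx_power_index: int,
                       ch_mask: int = 0x0007, ch_mask_cntl: int = 0,
                       nb_trans: int = 1) -> bytes:
    """Pack a LinkADRReq MAC command (hypothetical helper, not from the patent)."""
    if not 0 <= data_rate <= 0xF or not 0 <= tx_power_index <= 0xF:
        raise ValueError("data_rate and tx_power_index are 4-bit fields")
    if not 0 <= ch_mask_cntl <= 0x7 or not 0 <= nb_trans <= 0xF:
        raise ValueError("ch_mask_cntl is 3 bits, nb_trans is 4 bits")
    # DataRate occupies the upper nibble, TXPower the lower nibble.
    dr_txpower = (data_rate << 4) | tx_power_index
    # Redundancy byte: RFU (1 bit), ChMaskCntl (3 bits), NbTrans (4 bits).
    redundancy = (ch_mask_cntl << 4) | nb_trans
    # LoRaWAN multi-octet fields are little-endian.
    return struct.pack("<BBHB", LINK_ADR_REQ_CID, dr_txpower, ch_mask, redundancy)


# Example: keep data rate index 5 and request power index 3 (values are illustrative).
print(build_link_adr_req(5, 3).hex())
```

Because only the TXPower nibble changes between successive commands while the data rate index stays the same, the SF assignment made by ADR is left untouched, which matches the patent's premise of adjusting transmission power only.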

Based on the "per-device SF assignment through the ADR algorithm" of FIG. 1, as specified in the current LoRaWAN standard, the performance before (FIG. 2(A)) and after (FIG. 2(B)) applying the present technique is compared.

Here, it is assumed that all devices communicate with the same gateway over the same channel and that the SF value of each device has already been determined.

In the case of FIG. 2(A), even if the device transmitting with SF9 sends at maximum power, path loss leaves the received signal strength (RSSI) at the gateway at only 1 dBm; this falls below the threshold relative to the RSSI of the SF7 device located close to the gateway, so the gateway cannot receive it.

In the case of FIG. 2(B), the present technique has the SF7 device lower its Tx power from 17 dBm to 5 dBm, so that the gateway can receive not only the information sent by the SF7 device but also the information from the SF9 device.

This reduces information loss in the LoRa network and the transmission energy spent on retransmissions, thereby increasing the energy efficiency of the devices.
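The FIG. 2 scenario can be sketched numerically under simplifying assumptions. The code below treats RSSI as transmission power minus path loss and considers a signal decodable when its signal-to-interference ratio against the other signal meets a single threshold. All numbers (path losses, candidate power levels, and the threshold) are hypothetical and were chosen only so that the result mirrors the 17 dBm to 5 dBm step described above; the patent itself refers to a per-SF-pair SIR_xy table (FIG. 4) rather than one fixed threshold, and receiver sensitivity is ignored here.

```python
def rssi_dbm(tx_power_dbm: float, path_loss_db: float) -> float:
    """Received signal strength at the gateway under a simple link budget."""
    return tx_power_dbm - path_loss_db


def decodable(rssi_signal: float, rssi_interferer: float,
              sir_threshold_db: float) -> bool:
    """True if the signal-to-interference ratio meets the threshold."""
    return rssi_signal - rssi_interferer >= sir_threshold_db


def highest_safe_power(candidates, near_loss, far_rssi, sir_threshold_db):
    """Highest near-device Tx power at which both signals remain decodable."""
    for power in sorted(candidates, reverse=True):
        near_rssi = rssi_dbm(power, near_loss)
        if (decodable(far_rssi, near_rssi, sir_threshold_db)
                and decodable(near_rssi, far_rssi, sir_threshold_db)):
            return power
    return None  # no candidate lets both devices through


# Hypothetical numbers: near SF7 device with 60 dB path loss, far SF9 device
# at maximum power (17 dBm) with 81 dB path loss, SIR threshold of -10 dB.
far_rssi = rssi_dbm(17, 81)
chosen = highest_safe_power([17, 14, 11, 8, 5, 2], 60, far_rssi, -10)
print(chosen)  # 5
```

An exhaustive sweep like this only works when the gateway already knows every path loss and threshold; the patent instead learns the power level from observed RSSI values, since those quantities vary with device placement.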

Hereinafter, a specific embodiment of the energy optimization technology for LoRa-enabled IoT devices using a reinforcement learning technique is described.

FIG. 3 is a block diagram illustrating an example of the internal configuration of a computer system according to an embodiment of the present invention.

The IoT energy optimization system according to embodiments of the present invention may be implemented through the computer system 300 of FIG. 3. As shown in FIG. 3, the computer system 300 may include, as components for executing the IoT energy optimization method using a reinforcement learning technique, a processor 310, a memory 320, a permanent storage device 330, a bus 340, an input/output interface 350, and a network interface 360.

The processor 310 may include, or be part of, any device capable of processing a sequence of instructions. The processor 310 may include, for example, a computer processor, a processor in a mobile device or other electronic device, and/or a digital processor. The processor 310 may be included in, for example, a server computing device, a server computer, a series of server computers, a server farm, a cloud computer, a content platform, a mobile computing device, a smartphone, a tablet, or a set-top box. The processor 310 may be connected to the memory 320 through the bus 340.

The memory 320 may include volatile memory, permanent memory, virtual memory, or other memory for storing information used by or output by the computer system 300. For example, the memory 320 may include random access memory (RAM) and/or dynamic RAM (DRAM). The memory 320 may be used to store arbitrary information such as state information of the computer system 300. The memory 320 may also be used to store instructions of the computer system 300, including, for example, instructions for controlling IoT energy optimization. The computer system 300 may include one or more processors 310 as needed or where appropriate.

The bus 340 may include a communication infrastructure that enables interaction among the various components of the computer system 300. The bus 340 may carry data between components of the computer system 300, for example between the processor 310 and the memory 320. The bus 340 may include wireless and/or wired communication media between components of the computer system 300, and may include parallel, serial, or other topology arrangements.

The permanent storage device 330 may include components such as memory or another permanent storage device used by the computer system 300 to store data for an extended period (for example, compared with the memory 320). The permanent storage device 330 may include non-volatile main memory as used by the processor 310 in the computer system 300. For example, the permanent storage device 330 may include flash memory, a hard disk, an optical disk, or another computer-readable medium.

The input/output interface 350 may include interfaces to a keyboard, a mouse, a microphone, a camera, a display, or other input or output devices. Configuration commands and/or input related to IoT energy optimization may be received through the input/output interface 350.

The network interface 360 may include one or more interfaces to networks such as a local area network or the Internet. The network interface 360 may include interfaces for wired or wireless connections. Configuration commands may be received through the network interface 360, and information related to IoT energy optimization may be received or transmitted through the network interface 360.

In other embodiments, the computer system 300 may include more components than those shown in FIG. 3. However, most conventional components need not be illustrated explicitly. For example, the computer system 300 may be implemented to include at least some of the input/output devices connected to the input/output interface 350 described above, or may further include other components such as a transceiver, a GPS (Global Positioning System) module, a camera, various sensors, and a database. As a more specific example, when the computer system 300 is implemented in the form of a mobile device such as a smartphone, it may further include various components that a mobile device generally contains, such as a camera, an acceleration sensor or gyro sensor, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration.

In this embodiment, it is assumed that one gateway serves N LoRa-based ultra-small IoT devices.

Hereafter, to simplify the description, an ultra-small IoT device is also referred to as a sensor. The sensor index is i, and a sensor is denoted s_i.

The present technique assumes that all sensors communicate with a single gateway over a single channel, and that the SF (spreading factor) has already been assigned to each sensor s_i.

All factors other than transmission power that affect the data rate are assumed to be fixed. For example, the coding rate is assumed to be 4/5 for every sensor, and the sensors' uplink bandwidth is assumed to be set to the same BW, such as 125 kHz.

The initial transmission power P_i of each sensor is set to a value designated by the gateway (so the gateway always knows each sensor's transmission power and can match the reception status and the RSSI_i value to P_i), and the gateway is assumed to be able to determine, at each time step, the RSSI value of every signal it has received successfully.
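The bookkeeping implied by these assumptions can be sketched as a small table kept at the gateway: for each sensor it stores the assigned SF, the last commanded P_i, and the RSSI_i of the most recent successful uplink, and it can group the latest RSSI values by SF. The class and method names are hypothetical, and the grouping by SF is only one plausible way to prepare the per-SF RSSI input the patent mentions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SensorRecord:
    sf: int                           # spreading factor assigned to the sensor
    tx_power_dbm: float               # P_i, last value the gateway commanded
    rssi_dbm: Optional[float] = None  # RSSI_i of the last successful uplink


class GatewayTable:
    """Gateway-side view of every sensor (illustrative, not from the patent)."""

    def __init__(self) -> None:
        self.records: Dict[str, SensorRecord] = {}

    def register(self, sensor_id: str, sf: int, tx_power_dbm: float) -> None:
        self.records[sensor_id] = SensorRecord(sf, tx_power_dbm)

    def on_uplink(self, sensor_id: str, rssi_dbm: float) -> None:
        # Only successfully received frames reach this point.
        self.records[sensor_id].rssi_dbm = rssi_dbm

    def rssi_by_sf(self) -> Dict[int, List[float]]:
        """Group the latest RSSI values by spreading factor."""
        grouped: Dict[int, List[float]] = {}
        for rec in self.records.values():
            if rec.rssi_dbm is not None:
                grouped.setdefault(rec.sf, []).append(rec.rssi_dbm)
        return grouped


table = GatewayTable()
table.register("s1", sf=7, tx_power_dbm=17)
table.register("s2", sf=9, tx_power_dbm=17)
table.on_uplink("s1", -43.0)
print(table.rssi_by_sf())  # {7: [-43.0]}
```

A sensor that has never been heard, such as s2 above, simply has no RSSI entry, which is itself a signal that its uplinks are being lost.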

Each sensor generates sensing data at a rate of d_i per unit time and transmits all of it to the gateway. If a transmission fails, the sensor is assumed to include the unsent sensing data in its transmission at the next transmission opportunity.

Each device is assumed to have a buffer large enough to hold the lost information. Consequently, when information is lost, each sensor must spend additional transmission energy to send the lost data.
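The carry-over behaviour described above can be illustrated with a short simulation. It assumes, for simplicity, that a sensor produces a fixed number of bits per transmission slot and that a failed slot loses the whole payload; the function name and the example numbers are hypothetical.

```python
def simulate_backlog(bits_per_slot: int, outcomes):
    """Return (bits sent on air per slot, total bits delivered).

    `outcomes` is a list of per-slot success flags. Data from a failed
    slot is buffered and sent again with the next slot's new data.
    """
    backlog = 0
    sent_per_slot = []
    delivered = 0
    for success in outcomes:
        payload = backlog + bits_per_slot  # new data plus anything unsent
        sent_per_slot.append(payload)
        if success:
            delivered += payload
            backlog = 0
        else:
            backlog = payload  # keep everything for the next opportunity
    return sent_per_slot, delivered


sent, delivered = simulate_backlog(100, [True, False, False, True])
print(sent, delivered)  # [100, 100, 200, 300] 400
```

In this example 700 bits go on air to deliver 400 bits, so the two failed slots cost 300 bits of extra airtime, which is the additional transmission energy the text refers to.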

In this embodiment, the problem is defined as follows.

Problem:

Here, E denotes the transmission energy efficiency of the N sensors and can be expressed in bit/Joule.

More precisely, E is defined as (total amount of data transmitted per given period)/(energy used to transmit that amount of data); it is a parameter that indicates how much data was transmitted with a limited amount of energy.

E can be defined as in Equation 1.

[Equation 1]

d_i denotes the data sensing rate of s_i in bit/s, and D denotes a fixed time period in seconds. β denotes (transmitted data size)/(payload size), and α denotes (1 - DER). DER is the data extraction rate, that is, the fraction of the data transmitted in a given period that the receiver actually receives, so α is the fraction of data lost in transmission.

P_i denotes the transmission power of s_i when it transmits data, and T_i denotes the time s_i takes to transmit one bit; T_i can be expressed as the reciprocal of the data rate of s_i.

In LoRa, the theoretical data rate DR_i of s_i is defined as in Equation 2.

[Equation 2]

SF_i denotes the spreading factor of s_i, and BW and CR denote the channel bandwidth and the coding rate, respectively. Substituting this, the function E takes the form of Equation 3.
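For orientation, the nominal LoRa bit rate commonly given in LoRa modem documentation is DR = SF × (BW / 2^SF) × CR, which uses the same symbols as the definitions above. The sketch below assumes that Equation 2 has this form and evaluates it for the bandwidth and coding rate used in this embodiment (125 kHz and 4/5); the function name is hypothetical.

```python
def lora_bit_rate(sf: int, bw_hz: float, cr: float) -> float:
    """Nominal LoRa bit rate in bit/s: SF * (BW / 2**SF) * CR."""
    return sf * (bw_hz / 2 ** sf) * cr


for sf in range(7, 13):
    rate = lora_bit_rate(sf, 125_000, 4 / 5)
    # T_i, the time to send one bit, is the reciprocal of the bit rate.
    print(f"SF{sf}: {rate:.1f} bit/s, T_i = {1 / rate * 1e3:.3f} ms per bit")
```

Under these settings SF7 gives about 5.47 kbit/s and SF12 about 293 bit/s, so a sensor assigned a higher SF spends far longer on air per bit at the same transmission power.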

[수학식 3][Equation 3]

Here, assuming that d_i, D, BW, CR, and β are constants and that an SF_i value has already been assigned to each sensor s_i, the total transmission energy of the sensors served by one gateway is determined by α and P_i.

That is, the overall communication energy efficiency of the sensors can be increased when the loss rate is small or when the transmission power P_i of each sensor s_i is low.
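The dependence on α and P_i can be illustrated with a small sketch. Since Equations 1 and 3 are not reproduced here, the form below is only one reading of the verbal definition: delivered bits (1 - α) · d_i · D divided by transmit energy P_i · T_i · β · d_i · D, summed over sensors. The `Sensor` class, the dBm-to-watt conversion, and the default BW and CR values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sensor:
    d_bps: float      # sensing rate d_i in bit/s
    p_tx_dbm: float   # transmission power P_i in dBm
    sf: int           # spreading factor SF_i


def dbm_to_watt(p_dbm: float) -> float:
    return 10 ** ((p_dbm - 30.0) / 10.0)


def energy_efficiency(sensors: Iterable[Sensor], period_s: float, beta: float,
                      alpha: float, bw_hz: float = 125_000.0, cr: float = 0.8) -> float:
    """Delivered bits per joule of transmit energy (assumed reading of E)."""
    delivered_bits = 0.0
    energy_j = 0.0
    for s in sensors:
        t_bit = 2 ** s.sf / (s.sf * bw_hz * cr)      # T_i = 1 / DR_i
        sensed_bits = s.d_bps * period_s             # d_i * D
        delivered_bits += (1.0 - alpha) * sensed_bits
        energy_j += dbm_to_watt(s.p_tx_dbm) * t_bit * beta * sensed_bits
    return delivered_bits / energy_j
```

Under this reading, lowering either α or P_i raises E, which matches the statement above. The sketch treats α as an independent input, whereas in practice it depends on the chosen P_i values.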

In wireless communication, raising the transmission power generally reduces the loss rate. In this embodiment, however, P_i and the loss rate must be considered together to achieve high energy efficiency, so the problem calls for a new approach.

In addition, for the sensors assumed in this embodiment, the loss rate varies with the relative transmission power of the many sensors and with the position of each sensor, which makes the problem difficult to solve through mathematical or statistical modeling.

Because the loss rate α of the sensors as a whole depends non-linearly on P_i, the present invention uses reinforcement learning to find the optimal P_i value that yields a small loss rate while consuming little transmission power.

The present invention thus aims to increase, through reinforcement learning, the communication energy efficiency of ultra-small IoT devices that communicate over LoRa.

In the present invention, the Q-learning algorithm, a type of reinforcement learning, is used to obtain the optimal transmission power P_i of each sensor. The Q-learning algorithm that finds the optimal P_i for each sensor s_i does not run on the ultra-small IoT devices but on the gateway. It therefore uses the RSSI_i value of each sensor s_i measured at the gateway to search for the P_i that lets s_i transmit information with high energy efficiency. Using the Q-learning algorithm requires defining a state, an action, and a reward.

In the present invention, the state, action, and reward are defined for each sensor s_i, and all of this information resides on the gateway.

State

The state S_i of each s_i can be defined as in Equation 4.

[Equation 4]

I_ij denotes the degree of interference between s_i and s_j. x denotes the SF value of s_i, and y denotes the SF value of s_j.

When the gateway receives the signal of s_i, it compares the signal of s_i, the signal of s_j, and the SIR_xy value. If the signal of s_j can be received regardless of s_i, I_ij is set to 0; if the signal of s_i is so strong that the signal of s_j cannot be received, it is set to -1; in the opposite case, it is set to 1.
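The three-way rule can be sketched as a comparison of RSSI differences against SIR thresholds. The orientation of the thresholds is an assumption: `sir_xy_db` is taken as the minimum signal-to-interference ratio (dB) a frame on SF x needs to survive a frame on SF y, and `sir_yx_db` as the reverse. The function name and the example threshold of -16 dB are hypothetical, not values from FIG. 4.

```python
def interference_indicator(rssi_i_dbm: float, rssi_j_dbm: float,
                           sir_xy_db: float, sir_yx_db: float) -> int:
    """Return I_ij in {-1, 0, 1} for sensors s_i (SF x) and s_j (SF y)."""
    # s_j is decodable if its margin over s_i meets its own threshold (assumption).
    s_j_survives = (rssi_j_dbm - rssi_i_dbm) >= sir_yx_db
    # s_i is decodable if its margin over s_j meets its own threshold (assumption).
    s_i_survives = (rssi_i_dbm - rssi_j_dbm) >= sir_xy_db
    if not s_j_survives:
        return -1   # s_i is too strong; s_j is lost
    if not s_i_survives:
        return 1    # the opposite case; s_i is lost
    return 0        # s_j is received regardless of s_i
```

With both thresholds at -16 dB, a pair at -80 dBm and -110 dBm yields -1 when s_i is the stronger one and 1 when it is the weaker one. The case where neither frame survives is not described in the text; this sketch reports it as -1.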

The SIR_xy values can be obtained by measurement for each LoRa module, as in the table shown in FIG. 4.

k denotes the index of an SF to which s_i does not belong, and can be expressed as in Equation 5.

[Equation 5]

j_k denotes a sensor, among the sensors s_j using SF_k, whose information the gateway received successfully when the signal of s_i reached the gateway.

ε_k denotes the weight factor for each SF. Because signal characteristics differ between SFs, this embodiment sets the weight factors to reflect those characteristics. For example, SF12 has a low data rate, so a transmission that fails because of interference must be repeated over a long airtime; interference on SF12 may therefore deserve more attention than interference on SF7. In that case, ε_12 may be larger than ε_7. The weight factors are chosen from the integers so that the state remains discrete.

In other words, S_i is a parameter that captures, from the standpoint of s_i, the state of interference with the devices belonging to other SFs.
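Equation 4 is not reproduced in this text. One form consistent with the definitions of I_ij, k, j_k, and ε_k is a weighted sum over the other SFs, S_i = Σ_k ε_k · Σ_{j_k} I_{i,j_k}; the sketch below assumes that form. The function name, the dictionary layout, and the example weights are illustrative assumptions.

```python
from typing import Dict, List


def state_value(indicators_by_sf: Dict[int, List[int]],
                weights: Dict[int, int], own_sf: int) -> int:
    """Assumed form of S_i: sum over SFs k != own_sf of eps_k * sum of I_ij."""
    total = 0
    for sf, indicators in indicators_by_sf.items():
        if sf == own_sf:
            continue  # k ranges only over SFs that s_i does not use
        total += weights[sf] * sum(indicators)
    return total


# Hypothetical integer weights: interference on SF12 counts three times as much.
example_weights = {7: 1, 8: 1, 9: 1, 10: 2, 11: 2, 12: 3}
example_state = state_value({8: [0, -1], 12: [1, 1]}, example_weights, own_sf=7)
```

Because both the indicators and the weights are integers, the result is an integer, which keeps the state space discrete as the text requires.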

Action

The action A_i of each s_i can be defined as in Equation 6.

[Equation 6]

Here, m denotes the number of transmission power levels available on sensor s_i. For example, if the sensor can set its transmission power to -10, 0, 10, or 20 dBm, then m is 4, and P_i^(m) denotes the transmission power at each level. In this example, P_i^(1) is -10 dBm.

Reward

The reward for taking an action in each state can be defined as follows.

A high S_i value means that s_i cannot transmit its information properly because of sensors using other SFs, so when S_i is high the reward should favor actions with high transmission power. Conversely, a negative S_i value means that the transmission power of s_i is preventing the information of other sensors from reaching the gateway properly, so the reward should be set to lower the transmission power of s_i.

When S_i is close to 0, the system is in equilibrium, so the reward should be set so that the transmission power changes as little as possible.

Ultimately, a high reward should be given when S_i and A_i do not differ, so the reward function is derived from the entropy formula. Because S_i and A_i may have different units, both are first normalized to values between 0 and 1 and then substituted into the reward expression (Equation 7).

Because S_i can take any integer value, it is normalized with a function such as sigmoid, tanh, or arctan. For A_i as well, a function is constructed so that the effect of executing A_i is properly reflected in the reward (see FIG. 5).

[Equation 7]

Here, S_i' denotes the normalized S_i value, and A_i' denotes the normalized A_i value. For example, S_i' and A_i' may be normalized as in Equation 8 and Equation 9.

[Equation 8]

[Equation 9]

Because P_i has m levels, the h-th power level can be normalized to h/m.

The smaller the difference between S_i and A_i, the higher the reward; the lowest reward is obtained when S_i and A_i differ the most.
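The normalization and reward can be sketched as follows. Equations 7 to 9 are not reproduced here, so two assumptions are made: S_i is normalized with a sigmoid (one of the functions the text names), and the "entropy formula" is taken to be the negative binary cross-entropy between S_i' and A_i', which is largest when the two are equal. The clipping constant `eps` is added only to avoid log(0) when h = m.

```python
import math


def normalize_state(s_i: int) -> float:
    """Sigmoid of S_i, written with tanh so large |S_i| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(s_i / 2.0))


def normalize_action(h: int, m: int) -> float:
    """The h-th of m power levels, normalized to h/m as in the text."""
    if not 1 <= h <= m:
        raise ValueError("h must be in 1..m")
    return h / m


def reward(s_norm: float, a_norm: float, eps: float = 1e-6) -> float:
    """Assumed reward: negative binary cross-entropy of A_i' against S_i'."""
    a = min(max(a_norm, eps), 1.0 - eps)
    return s_norm * math.log(a) + (1.0 - s_norm) * math.log(1.0 - a)


def best_level(s_i: int, m: int) -> int:
    """Power level index h in 1..m with the highest reward for state S_i."""
    s_norm = normalize_state(s_i)
    return max(range(1, m + 1), key=lambda h: reward(s_norm, normalize_action(h, m)))
```

With m = 4, this choice rewards the highest level for a strongly positive S_i, the lowest level for a strongly negative S_i, and a middle level near S_i = 0, which is the qualitative behavior described in the Reward section.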

Based on the S_i, A_i, and R_i defined above for each sensor s_i, a Q-learning algorithm can be run for each sensor (see FIG. 6).

γ denotes the learning rate and takes a value between 0 and 1.

ε denotes the ε-greedy parameter and takes a value between 0.01 and 0.05. To guard against the case where the optimal P_i value obtained through Q-learning is only a local optimum, the ε-greedy parameter causes the signal to be transmitted at a randomly chosen P_i value with probability ε.

λ denotes the discount factor and takes a value between 0 and 1.

Through the Q-learning algorithm, a policy is learned that predicts which A_i sensor s_i should execute in a given S_i, that is, the optimal P_i value.
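The per-sensor learner can be sketched as standard tabular Q-learning with ε-greedy exploration. FIG. 6 is not reproduced here, so the update rule below is the textbook one rather than a transcription of the figure. The symbols follow this document's notation (γ as learning rate, λ as discount factor); the class name and default values are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Optional


class TabularQAgent:
    """One agent per sensor s_i; states are S_i values, actions are power levels."""

    def __init__(self, n_actions: int, gamma: float = 0.1, lam: float = 0.9,
                 epsilon: float = 0.05, seed: Optional[int] = None) -> None:
        self.n_actions = n_actions
        self.gamma = gamma        # learning rate, in this document's notation
        self.lam = lam            # discount factor, in this document's notation
        self.epsilon = epsilon    # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def select_action(self, state: int) -> int:
        # With probability epsilon, try a random power level to escape local optima.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state: int, action: int, reward: float, next_state: int) -> None:
        # Q(s,a) <- Q(s,a) + gamma * (r + lam * max_a' Q(s',a') - Q(s,a))
        td_target = reward + self.lam * max(self.q[next_state])
        self.q[state][action] += self.gamma * (td_target - self.q[state][action])
```

A typical loop would call `select_action(S_i)`, apply the chosen power level for one period, observe the new state and reward, and then call `update`. Action indices here are 0-based, so index 0 corresponds to P_i^(1).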

FIG. 7 shows the overall system structure.

The LoRa gateway 710 may include an RSSI measurement unit 711, an RSSI DB 712, and a transmission power setting unit 713.

The RSSI measurement unit 711 measures the RSSI value of each signal successfully received on each SF and sends the measured information to the RSSI DB 712.

The RSSI DB 712 stores the RSSI values of the signals successfully received on each SF, indexed by time, in a format such as a list, tuple, or matrix.
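A minimal in-memory sketch of such a store is shown below, assuming a nested mapping from time index to SF to device to RSSI. The class name, method names, and the use of string device identifiers are hypothetical; the source only states that values are kept per SF and indexed by time.

```python
from collections import defaultdict
from typing import Dict


class RssiStore:
    """Stand-in for the RSSI DB: time index -> SF -> device id -> RSSI (dBm)."""

    def __init__(self) -> None:
        self._table: Dict[int, Dict[int, Dict[str, float]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def record(self, time_index: int, sf: int, device_id: str, rssi_dbm: float) -> None:
        """Store the RSSI of one successfully received frame."""
        self._table[time_index][sf][device_id] = rssi_dbm

    def snapshot(self, time_index: int) -> Dict[int, Dict[str, float]]:
        """Return a copy of all RSSI values for one period, grouped by SF."""
        per_sf = self._table.get(time_index, {})
        return {sf: dict(devices) for sf, devices in per_sf.items()}
```

Grouping by SF at read time matches how the state is built, since each sensor is compared against the sensors on every other SF within the same period.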

The transmission power setting unit 713 places the transmission power information determined for the device by the Q-learning module 730 in the LinkADRReq field of the LoRaWAN standard and transmits it to the LoRa device 720.

The LoRa device 720 may include a transmission power setting unit 721. When the transmission power setting unit 721 receives a control message from the LoRa gateway 710, it reads the transmission power (Tx power) value from the LinkADRReq field in the message and sets the transmission power accordingly.
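As a sketch of what carrying the power setting in LinkADRReq involves, the code below packs and unpacks the command as laid out in LoRaWAN 1.0.x: command identifier 0x03, then DataRate_TXPower (one byte, data rate in the high nibble and TX power index in the low nibble), ChMask (two bytes, little-endian), and Redundancy (one byte, ChMaskCntl in bits 6 to 4 and NbTrans in bits 3 to 0). The function names are hypothetical. Note that TXPower is a region-specific index rather than a dBm value; the mapping from a chosen P_i to that index is not covered here.

```python
import struct

LINK_ADR_REQ_CID = 0x03  # command identifier of LinkADRReq


def encode_link_adr_req(data_rate: int, tx_power_index: int, ch_mask: int,
                        ch_mask_cntl: int = 0, nb_trans: int = 1) -> bytes:
    """Pack a LinkADRReq MAC command (5 bytes including the CID)."""
    if not 0 <= data_rate <= 0x0F:
        raise ValueError("data_rate must fit in 4 bits")
    if not 0 <= tx_power_index <= 0x0F:
        raise ValueError("tx_power_index must fit in 4 bits")
    if not 0 <= ch_mask <= 0xFFFF:
        raise ValueError("ch_mask must fit in 16 bits")
    if not 0 <= ch_mask_cntl <= 0x07:
        raise ValueError("ch_mask_cntl must fit in 3 bits")
    if not 0 <= nb_trans <= 0x0F:
        raise ValueError("nb_trans must fit in 4 bits")
    dr_txpow = (data_rate << 4) | tx_power_index
    redundancy = (ch_mask_cntl << 4) | nb_trans
    return struct.pack("<BBHB", LINK_ADR_REQ_CID, dr_txpow, ch_mask, redundancy)


def decode_tx_power_index(command: bytes) -> int:
    """Device side: read the TX power index out of a LinkADRReq command."""
    cid, dr_txpow, _ch_mask, _redundancy = struct.unpack("<BBHB", command)
    if cid != LINK_ADR_REQ_CID:
        raise ValueError("not a LinkADRReq command")
    return dr_txpow & 0x0F
```

For instance, `encode_link_adr_req(5, 2, 0x0007)` produces the bytes `03 52 07 00 01`, and `decode_tx_power_index` recovers the index 2 from them.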

The Q-learning module 730 may reside physically on the LoRa gateway 710 or on a computer system connected to the LoRa gateway 710, and one Q-learning module 730 exists for each LoRa device 720. The computing environment in which the Q-learning module 730 resides corresponds to the computer system 300 described with reference to FIG. 3.

The Q-learning module 730 may include a reinforcement-learning-based transmission power determination unit 731 and a database 732.

The reinforcement-learning-based transmission power determination unit 731 may set the current state of the device by retrieving, from the RSSI DB 712 on the LoRa gateway 710, the RSSI values measured at the same time for each SF. For the actions, the transmission power level settings of each LoRa device 720 are obtained in advance and held by the unit. From the state and actions set in this way, the unit can determine the transmission power of the device, that is, the action with the highest reward. The Q values such as Q(s,a) and Q(s,a'), along with the parameters and hyperparameters, are all stored in the database 732.

FIG. 8 is a flowchart of how the Q-learning module 730 (one module per sensor s_i) determines the optimal transmission power.

Referring to FIG. 8, the Q-learning module 730 determines the initial transmission power (P_i0) of sensor s_i and obtains the A_i values, together with the SIR table, from the database (S801).

The Q-learning module 730 sends the determined transmission power value to the transmission power setting unit 713 of the LoRa gateway 710 (S802).

At fixed intervals, the Q-learning module 730 accesses the RSSI DB 712 on the LoRa gateway 710 and checks the RSSI values of all sensors for that interval (S803).

The Q-learning module 730 determines whether the RSSI_i value of sensor s_i is valid (S804). If it is valid, the RSSI value of the signal successfully received on each SF is set to the RSSI_i value measured at the LoRa gateway 710 (S805); if it is not valid, it is set to a default RSSI_i value (S806).
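Steps S804 to S806 amount to a validity check with a fallback. The text does not say what makes an RSSI_i value valid or what the default is, so the criteria below (a reading must exist, be a number, and lie in a plausible dBm range) and the default of -120 dBm are assumptions chosen only to make the sketch concrete.

```python
import math
from typing import Optional

DEFAULT_RSSI_DBM = -120.0  # hypothetical default used when no valid reading exists


def resolve_rssi(measured_dbm: Optional[float],
                 default_dbm: float = DEFAULT_RSSI_DBM,
                 min_dbm: float = -140.0, max_dbm: float = 0.0) -> float:
    """Return the measured RSSI_i if it looks valid (S805), else the default (S806)."""
    if measured_dbm is None or math.isnan(measured_dbm):
        return default_dbm          # no frame was received in this period
    if not min_dbm <= measured_dbm <= max_dbm:
        return default_dbm          # reading is outside the assumed plausible range
    return measured_dbm
```

Falling back to a weak default keeps the state computable for sensors whose frames were lost in the period, instead of dropping them from the comparison.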

The Q-learning module 730 sets S_i using the RSSI values and the information in the SIR table (S807), and sets A_i using the transmission power value at each transmission power level (S808).

The Q-learning module 730 can then predict A_i' (the optimal transmission power value) from S_i and A_i (S809). By recognizing signals that use different SFs and reducing the interference between them, the transmission power can be predicted and adjusted to the predicted value so as to optimize energy.

As described above, the IoT energy optimization technique according to the present invention can be used effectively when ultra-small IoT devices that communicate wirelessly over LoRa must be deployed at large scale. Because it reduces the interference that occurs in LoRa, it can raise the energy efficiency of LoRa devices; from the standpoint of an Internet provider, it also reduces the overall loss of the LoRa network and can therefore be expected to improve network quality. In addition, transmitting information consumes a large amount of energy in a device that communicates over LoRa, so reducing the energy spent on LoRa can extend the lifetime of the device. The devices targeted by the present invention for improved energy efficiency are ultra-small IoT devices. An ultra-small IoT device is essentially a disposable sensor powered by a battery; extending the lifetime of each sensor allows these disposable devices to be used longer, which saves resources and service costs.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may also be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

An IoT energy optimization method performed in a gateway communicating with an IoT device, or in a computer system connected to the gateway, the method comprising:
adjusting the transmission power of the IoT device in a situation in which a spreading factor (SF) value of the IoT device is set.

The IoT energy optimization method of claim 1,
wherein the adjusting comprises determining the transmission power using an RSSI value of the IoT device measured at the gateway.

The IoT energy optimization method of claim 1,
wherein the adjusting comprises adjusting the transmission power through an adaptive data rate (ADR) algorithm of the LoRaWAN standard.

The IoT energy optimization method of claim 1,
wherein the adjusting comprises determining the transmission power using a Q-learning algorithm, which is a reinforcement learning technique.

The IoT energy optimization method of claim 4,
wherein the determining comprises determining, through the Q-learning algorithm, an optimal transmission power that is the action obtaining the maximum reward for a state indicating the degree of interference between the IoT device and devices belonging to other SFs.

The IoT energy optimization method of claim 4,
wherein the determining comprises:
setting a state indicating the degree of interference between the IoT device and devices belonging to other SFs, using the RSSI value of the signal received on each SF; and
determining an optimal transmission power that is the action obtaining the maximum reward for the state, based on transmission power level setting information of the IoT device.

The IoT energy optimization method of claim 4,
wherein the gateway transmits the transmission power determined through the Q-learning algorithm to the IoT device using a field of the LoRaWAN standard.

An IoT energy optimization method performed in a gateway communicating with an IoT device, or in a computer system connected to the gateway, the method comprising:
adjusting the transmission power of the IoT device by determining, in a situation in which the SF value of the IoT device is set and through a Q-learning algorithm that is a reinforcement learning technique, an optimal transmission power that is the action obtaining the maximum reward for a state indicating the degree of interference between the IoT device and devices belonging to other SFs.

An IoT energy optimization system implemented on a gateway communicating with an IoT device, or on a computer system connected to the gateway, the system comprising:
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor comprises a transmission power determination unit that determines the transmission power of the IoT device in a situation in which a spreading factor (SF) value of the IoT device is set.

The IoT energy optimization system of claim 9,
wherein the transmission power determination unit determines the transmission power using an RSSI value of the IoT device measured at the gateway.

The IoT energy optimization system of claim 9,
wherein the transmission power determination unit adjusts the transmission power through an ADR algorithm of the LoRaWAN standard.

The IoT energy optimization system of claim 9,
wherein the transmission power determination unit determines the transmission power using a Q-learning algorithm, which is a reinforcement learning technique.

The IoT energy optimization system of claim 12,
wherein the transmission power determination unit determines, through the Q-learning algorithm, an optimal transmission power that is the action obtaining the maximum reward for a state indicating the degree of interference between the IoT device and devices belonging to other SFs.

The IoT energy optimization system of claim 12,
wherein the transmission power determination unit sets a state indicating the degree of interference between the IoT device and devices belonging to other SFs, using the RSSI value of the signal received on each SF, and determines an optimal transmission power that is the action obtaining the maximum reward for the state, based on transmission power level setting information of the IoT device.

The IoT energy optimization system of claim 12,
wherein the gateway transmits the transmission power determined through the Q-learning algorithm to the IoT device using a field of the LoRaWAN standard.