KR102420744B1

KR102420744B1 - IoT Packet Scheduling Control Method and Sink Node Device

Info

Publication number: KR102420744B1
Application number: KR1020210006151A
Authority: KR
Inventors: 김정근; 노유빈; 장윤경; 유인태
Original assignee: 한국전력공사; 경희대학교 산학협력단
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2022-07-15

Abstract

According to the present invention, an Internet of Things (IoT) packet scheduling control method comprises the steps of: transmitting, from a sensor node to a sink node, the state of the data packets currently waiting to be transmitted; learning the state of the sensor node's data packets at the sink node by using reinforcement learning; scheduling the transmission of data packets from a plurality of sensor nodes according to the learned result; and transmitting, to each sensor node, the frame size to be applied and the number of packets to be transmitted by that sensor node, as determined by the scheduling.

Description

IoT Packet Scheduling Control Method and Sink Node Device

The present invention relates to a packet scheduling and frame control method using reinforcement learning in an Internet of Things (IoT) environment that forms a wireless sensor network (WSN), and to a sink node device that performs the method.

A wireless sensor network (WSN) is a network in which a plurality of sensor nodes are deployed in a sensing area, organize themselves into a network, and remotely transmit the sensing information they acquire.

Wireless sensor networks are used in various fields, such as environmental control in intelligent buildings or factories, automatic control of production processes, logistics management, management of goods and information in hospitals, remote monitoring of patient conditions, and military control.

Sensor networks were initially used mainly for military operations. As advances in semiconductor technology have made processors smaller and more powerful and memory larger and cheaper, and as wireless communication and related technologies have developed, sensor networks are now on the verge of commercialization in the civilian sector as well. The applications currently proposed are truly diverse, including unmanned security, environmental monitoring that tracks conditions such as the temperature or pollution level of a given area or body of water, remote meter reading, and facility monitoring. Applications that operate in conjunction with home network systems and the Internet have also been proposed, and some have been implemented.

The basic structure of a sensor network is a set of network nodes, each with its own sensing and computing capability, interconnected by a communication network, with each node powered by its own local battery. However, because the battery that powers each node is relatively small in consideration of node mobility, energy use is severely constrained, and research on reducing power consumption has been carried out across all areas of networking to overcome this. The main thrust of this research has been to maximize network lifetime by reducing the number or volume of wireless communications between nodes, but no concrete method for using energy efficiently has been presented so far. Studies on network route discovery and data aggregation points have been published, but they have not achieved a substantial reduction in the amount of data transmitted.

In summary, because a wireless sensor network runs on batteries and cannot function once they are depleted, a transmission technique that maximizes energy efficiency is urgently needed. However, existing techniques are applicable only under specific operating conditions and adapt poorly when the statistical characteristics change over time.

Korean Registered Patent Publication No. 10-0627328

An object of the present invention is to provide an IoT packet scheduling control method and a sink node device that can maintain high energy efficiency even in an environment whose statistical characteristics change over time.

Specifically, for the case where a plurality of sensor nodes transmit to a central sink node, the present invention proposes an algorithm that uses reinforcement learning to determine the length of the frame in which the sensor nodes are active and the number of packets each sensor node transmits during the frame, with the aim of maximizing the lifetime of the sensor nodes, that is, the lifetime of the network.

An IoT packet scheduling control method according to one aspect of the present invention may include: transmitting, from a sensor node to a sink node, the state of the data packets currently waiting to be transmitted; learning the state of the sensor node's data packets at the sink node using reinforcement learning; scheduling the delivery of data packets from a plurality of sensor nodes according to the learned result; and transmitting, to each sensor node, the frame size to be applied and the number of packets to be transmitted by that sensor node, as determined by the scheduling.

Here, the method may further include feeding back into the reinforcement learning the result of the transmissions that each sensor node performed using the determined frame size and number of packets.

Here, the method may further include designing a reward function for the data transmission of the sensor nodes.

Here, in the step of designing the reward function, a reward function for maximizing network lifetime and transmission rate may be designed using a Q-learning technique.

Here, the reinforcement learning may be a Q-learning technique that uses, as the state, the delay of the data packets to be transmitted from the sensor node to the sink node; as the action, the frame length and the number of packets for the data packets transmitted from the sensor node to the sink node; and, as the reward, the quality of data packet delivery from the sensor node to the sink node.

Here, the state may be set according to the following equation.

state = (d_1, d_2, ..., d_k)

(d_k is the delay of the k-th packet in the queue)

Alternatively, the state may be set according to the following equation.

state = (r_1, r_2, ..., r_k)

(r_k is the remaining delay time of the k-th packet in the queue)

Here, the action may be set according to the following equation.

action = (L, n_1, n_2, ..., n_N)

(L is the length of the frame, and n_k is the number of packets the k-th sensor node will send in this frame)

Here, the reward may be set according to the following equation.

reward = L × QoS

(L is the length of the frame, and QoS is an index indicating how effective the per-node packet counts specified by the action are)

Here, the scheduling step may include: determining a frame size according to the learned result; and determining, according to the learned result, the number of packets each sensor node is to transmit in each frame.

Here, in the scheduling step, reinforcement learning for the plurality of sensor nodes is performed individually for each sensor node, and when the learned results yield conflicting actions, priority may be given to the action derived from exploitation.

Here, in the scheduling step, when a new sensor node connects, the value table of the existing sensor node whose learning is most active may be copied.
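One way to picture this warm start is sketched below. It is only an illustration under stated assumptions: the text does not say how "most active" is measured, so the sketch assumes it means the node whose table has received the most updates, and the names `bootstrap_new_node`, `q_tables`, and `update_counts` are hypothetical.

```python
import copy
from typing import Dict, Tuple

# Hypothetical layout: (state, action) -> learned value
QTable = Dict[Tuple[tuple, tuple], float]


def bootstrap_new_node(
    q_tables: Dict[int, QTable],
    update_counts: Dict[int, int],
    new_node_id: int,
) -> None:
    """Seed a new node's value table with a copy of the most-updated existing table."""
    if not q_tables:
        # No existing node to copy from: start with an empty table.
        q_tables[new_node_id] = {}
        update_counts[new_node_id] = 0
        return
    # Assumption: "most active learning" = largest number of table updates so far.
    donor = max(q_tables, key=lambda node: update_counts.get(node, 0))
    # Deep copy so later updates to the new node do not alter the donor's table.
    q_tables[new_node_id] = copy.deepcopy(q_tables[donor])
    update_counts[new_node_id] = 0
```

Copying rather than sharing the table lets the new node continue learning from the donor's values without the two nodes overwriting each other.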

A sink node device according to another aspect of the present invention may include: a sensor node receiver that receives packets containing sensing data from each sensor node; a storage unit that records the reward function designed for data transmission at each sensor node and the state information of each sensor node; a sensor node transmitter that transmits the frame size and number of packets determined for each sensor node; a reinforcement learning unit that uses reinforcement learning to learn the frame size and number of packets for each sensor node state; and a scheduler that determines the frame size and number of packets for each sensor node according to the results learned by the reinforcement learning unit.

Here, the reinforcement learning unit may continuously perform reinforcement learning for each of the sensor nodes served by the sink node device, and may store the reinforcement learning results obtained so far for each sensor node in the area of the storage unit allocated to that sensor node.

Implementing the IoT packet scheduling control method and/or sink node device according to the present invention as configured above has the advantage that high energy efficiency of IoT devices can be maintained even in an environment whose statistical characteristics change over time.

The IoT packet scheduling control method and/or sink node device of the present invention also has the advantage of maximizing sensor lifetime by improving energy efficiency while still satisfying QoS expressed in terms of delay.

FIG. 1 is a conceptual diagram of a wireless sensor network communication environment according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of the communication scheme between a sink node and a plurality of sensor nodes according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram of Q-learning-based frame length and packet scheduling according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an embodiment of the IoT packet scheduling control method according to the present invention.
FIG. 5 is a block diagram illustrating an embodiment of a sink node device capable of performing the IoT packet scheduling control method according to the present invention.

In describing the present invention, terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is said to be connected or coupled to another component, it may be directly connected or coupled to that other component, but intervening components may also be present.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

The present invention uses Q-learning, a learning algorithm from the reinforcement learning family of artificial intelligence techniques, to determine the key parameters that govern energy efficiency: the length of a frame and the number of packets to transmit per frame.

Q-learning intelligently combines learning and execution, which improves adaptability to environmental changes and enables more effective control.

First, a wireless sensor network to which the present invention can be applied is described.

FIG. 1 is a conceptual diagram of a wireless sensor network communication environment according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram of the communication scheme between a sink node and a plurality of sensor nodes according to an embodiment of the present invention.

As shown in the figure, communication between the sink and the sensor nodes takes place in units of time called frames. The sink sends a packet announcing the start of a frame, and that packet carries the length of the current frame and the number of packets each sensor node is to transmit. After transmitting its packets, each node stays in sleep mode until the next frame starts, thereby saving energy.
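The frame exchange described above can be sketched roughly as follows. This is a minimal sketch, not the patent's packet format: `FrameBeacon`, `packets_for`, and `sleep_time` are hypothetical names, and the beacon is assumed to list one packet count per node ID.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FrameBeacon:
    """Hypothetical frame-start packet broadcast by the sink."""
    frame_length_s: float             # length of the current frame, in seconds
    packets_per_node: Dict[int, int]  # node ID -> packets to send in this frame

    def packets_for(self, node_id: int) -> int:
        # Assumption: a node that is not listed sends nothing in this frame.
        return self.packets_per_node.get(node_id, 0)


def sleep_time(beacon: FrameBeacon, tx_time_s: float) -> float:
    """Time a node can sleep after its transmissions end, until the next frame starts."""
    return max(0.0, beacon.frame_length_s - tx_time_s)


beacon = FrameBeacon(frame_length_s=2.0, packets_per_node={1: 3, 2: 0})
```

In this sketch a longer frame with the same transmission time means a longer sleep interval, which is the lever the sink uses to save energy.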

FIG. 3 is a conceptual diagram of Q-learning-based frame length and packet scheduling according to an embodiment of the present invention.

The technique proposed in the present invention uses Q-learning to control the data transmission volume of the sensor nodes and the frame length in an Internet of Things (IoT) environment that operates a large number of sensor nodes.

A sensor node transmits the sensing data it generates to the sink node that collects it, and, depending on its characteristics, the sensing data has a delay requirement: it must be delivered within a certain time.

The sink node must schedule the transmission volume of the sensor nodes appropriately with this requirement in mind. In addition, to save energy, every sensor node sleeps after transmitting (being active) within a frame, and frames of this form repeat for as long as the sensor node operates.

The sink node saves energy by controlling the frame size in every frame. According to the present invention, the sink node uses Q-learning to learn with the goal of maximizing the lifetime and transmission rate of the sensor nodes and, on that basis, controls the transmission volume (number of packets) of each sensor node and the frame size in every frame.

FIG. 4 is a flowchart illustrating an embodiment of the IoT packet scheduling control method according to the present invention.

The IoT packet scheduling control method according to the illustrated flowchart includes: designing a reward function for sensor node data transmission (S20); defining the state of a sensor node (S40); transmitting the state of the packets currently waiting for transmission from the sensor node to the sink (S120); learning the node state and packet state at the sink using Q-learning (S140); determining, according to the Q-learning, the size of the frame, a frame being a kind of collection of time slots (S150); determining, according to the Q-learning, the number of packets each sensor node is to transmit in the current frame (S160); transmitting the frame size and the number of packets to be transmitted to the sensor nodes (S170); and performing Q-learning updates that take the result of the transmission into account (S180).

The reward function for sensor node data transmission can be designed as follows.

The reward function can be designed to suit whichever reinforcement learning method is applied. The embodiment below is described specifically for the case of Q-learning, which is known to learn effectively.

We first look at the process (S20) of designing the reward function when Q-learning is applied to a wireless sensor network.

In the step of designing the reward function (S20), the reward function may be designed for a specific type of wireless sensor network system (which may be defined by the service provider or the equipment specifications) or for the specific site where each wireless sensor network system is installed.

Designing the reward function generally includes defining the state of the sensor node used by the Q-learning algorithm. In the drawing, however, the step of defining the state of the sensor node (S40) is shown separately to emphasize that the state is the main information exchanged between the sensor node and the sink node.

Note that, from the standpoint of operating the wireless sensor network system, the step of designing the reward function (S20) is carried out when the system is built and may therefore be omitted. Likewise, from the operational standpoint, the step of defining the state of the sensor node (S40) may also be omitted.

Various reinforcement learning methods can be applied to the present invention. The description below focuses on the Q-learning algorithm, which is easy to apply and learns effectively.

To set up the Q-learning algorithm, the state, action, and reward can be defined as follows.

First, the state. Each sensor node generates packets to transmit its sensing data, and the generated packets wait their turn in the sensor node's queue. Each packet must be delivered within a certain time, called the maximum latency; that is, each packet must be transmitted to the sink node before its maximum latency elapses. A packet that has not been transmitted by the time its maximum latency elapses is removed from the sensor node's queue and discarded. If a sensor node has k packets in its queue, it sends the delay of each of these packets to the sink node in a packet.
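The queue behaviour described above can be sketched as follows. This is an illustrative sketch only: the class `SensorQueue` and its method names are hypothetical, time is assumed to be supplied by the caller in seconds, and a packet is treated as expired once its delay exceeds the maximum latency.

```python
from collections import deque
from typing import Deque, List, Tuple


class SensorQueue:
    """Hypothetical FIFO of (packet_id, creation_time) pairs held by a sensor node."""

    def __init__(self, max_latency_s: float) -> None:
        self.max_latency_s = max_latency_s
        self._queue: Deque[Tuple[int, float]] = deque()

    def enqueue(self, packet_id: int, now_s: float) -> None:
        self._queue.append((packet_id, now_s))

    def drop_expired(self, now_s: float) -> int:
        """Discard packets whose delay exceeds the maximum latency; return how many."""
        before = len(self._queue)
        self._queue = deque(
            (pid, t) for pid, t in self._queue if now_s - t <= self.max_latency_s
        )
        return before - len(self._queue)

    def delays(self, now_s: float, k: int) -> List[float]:
        """Delays (d_1, ..., d_k) of the first k queued packets, oldest first."""
        return [now_s - t for _, t in list(self._queue)[:k]]
```

For example, with a 4-second maximum latency, a packet created at t = 0 is dropped at t = 5, while a packet created at t = 3 is kept and reported with a delay of 2 seconds.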

The state of a sensor node, which reflects the delay of each packet, may be defined, for example, according to Equation 1 below.

[Equation 1]

state = (d_1, d_2, ..., d_k)

In the above equation, d_k is the delay of the k-th packet in the queue. A larger k provides more information, but the size of the state space S grows exponentially, so a method for reducing the state space, described later, may be needed.
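As a rough illustration of this growth, suppose (as an assumption made here, borrowing the 20-level quantization used in a later example) that each delay d_i takes one of B discrete levels and that exactly k delays are reported. The number of distinct states is then:

```latex
|S| = B^{k}, \qquad \text{e.g. } B = 20:\quad
20^{1} = 20,\quad 20^{2} = 400,\quad 20^{3} = 8\,000,\quad 20^{4} = 160\,000 .
```

Under that assumption each additional reported packet multiplies the number of states by B, so a value table indexed by such states grows quickly with k.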

When state information consisting of delays is sent in a packet, it is effective to divide the delay range into intervals, number the intervals, and send the integer interval numbers (0, 1, 2, ...).

Second, the action. The sink node determines the best action by running the Q-learning algorithm on the states and rewards sent by the sensor nodes.

An action may consist, for example, of two elements, as in Equation 2 below.

[Equation 2]

action = (L, n_1, n_2, ..., n_N)

In the above equation, L is the (temporal) length of the frame and n_k is the number of packets the k-th sensor node will send in this frame. The action that the sink node determines through Q-learning is carried in a packet and broadcast to all sensor nodes. Each sensor node that receives the action information checks the frame length and, once its own transmission is finished, sleeps until the next frame starts. It also transmits to the sink node the number of packets specified in the action.
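To make the shape of the action concrete, the sketch below enumerates candidate action tuples (L, n_1, ..., n_N). It assumes a tabular learner, which needs a finite action set, and therefore assumes a hypothetical discrete list of frame lengths and a hypothetical per-node cap `max_packets`; neither value comes from the text.

```python
from itertools import product
from typing import List, Tuple


def candidate_actions(
    frame_lengths: List[float], max_packets: int, num_nodes: int
) -> List[Tuple[float, ...]]:
    """All tuples (L, n_1, ..., n_N) with L in frame_lengths and 0 <= n_k <= max_packets."""
    counts = range(max_packets + 1)
    return [
        (length, *ns)
        for length in frame_lengths
        for ns in product(counts, repeat=num_nodes)
    ]


# Two frame lengths, up to 2 packets per node, 2 nodes: 2 * 3**2 = 18 actions.
actions = candidate_actions(frame_lengths=[0.5, 1.0], max_packets=2, num_nodes=2)
```

The count grows as (number of frame lengths) x (max_packets + 1)^N, so a joint action over many nodes becomes large quickly.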

Third, the reward.

The purpose of scheduling the operating parameters is to extend network lifetime while meeting the delay constraints. The reward function must therefore be designed to steer the learning process toward that goal. For example, in this framework the reward function may consist of two parts: network lifetime and a QoS metric expressed in terms of the delay limit.

In other words, the reward must capture the engineering objective we want to achieve through Q-learning. The objective of the method and device proposed in the present invention is to maximize the lifetime of the sensor network while delivering packets within the maximum latency. With this in mind, the reward function in the present invention may be set as in Equation 3 below.

[Equation 3]

reward = L × QoS

In the above equation, L is the length of the frame, and QoS is an index indicating how effective the per-node packet counts specified by the action are.

Various forms of QoS can be used depending on the application environment or purpose.

For example, to simplify the computation, the QoS of Equation 4 below may be used.

[Equation 4]

QoS = 1, if no packet is lost (due to delay expiry) in this frame;

QoS = 0, if a packet is lost (due to delay expiry) in this frame.

The QoS of Equation 4 is designed to minimize packet loss by assigning a value of 0 whenever a packet is dropped because its delay limit has expired.

As another example, unlike the QoS of Equation 4, a QoS of a form that prefers to place absolute value on reducing packet loss may follow Equation 5 below.

[Equation 5]

QoS = n_t / (n_t + \bar{n}_t + \hat{n}_t)

In the above equation, n_t is the number of packets the sensor node transmits in the current frame, \bar{n}_t is the number of packets discarded because they were not scheduled in the current frame, and \hat{n}_t is the number of packets discarded because they arrived after the start of the frame and were therefore not reported in the state.
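The two QoS forms and the reward of Equation 3 can be written out as a short sketch. The function and argument names are hypothetical, and the handling of an idle frame (nothing sent and nothing discarded, where Equation 5 would divide by zero) is an assumption made here, not something the text specifies.

```python
def qos_binary(dropped: int) -> int:
    """Equation 4: 1 if no packet was lost to delay expiry in this frame, else 0."""
    return 1 if dropped == 0 else 0


def qos_ratio(n_sent: int, n_unscheduled: int, n_late: int) -> float:
    """Equation 5: transmitted packets over transmitted plus both kinds of discarded packets."""
    total = n_sent + n_unscheduled + n_late
    # Assumption: an idle frame (nothing sent, nothing discarded) counts as QoS = 1.
    return 1.0 if total == 0 else n_sent / total


def reward(frame_length: float, qos: float) -> float:
    """Equation 3: reward = L x QoS."""
    return frame_length * qos
```

For instance, a frame of length 2.0 in which 3 packets are sent and 1 unscheduled packet is discarded gives QoS = 0.75 and reward = 1.5 under the ratio form, but reward = 0 under the binary form.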

In the step of designing the reward function (S20), the above equations for the state, action, and reward that make up the Q-learning algorithm may be determined or selected.

In this case, the parameters of Equation 1 may be set in the subsequent step of defining the state of the sensor node (S40).

In addition, when a new sensor node is added to an already deployed sensor network, the step of defining the state of the sensor node (S40) may be performed for the added node.

Next, in the step of transmitting the state of the packets currently waiting for transmission from the sensor node to the sink (S120), each sensor node may send, in a packet, the delay information of all or some of the packets waiting in its queue, in the format of Equation 6 below.

[Equation 6]

state = (d_1, d_2, ..., d_k)

In the above equation, d_k is the delay of the k-th packet in the queue. It is effective to divide the delay range into intervals, number the intervals, and send the integer interval numbers (0, 1, 2, ...). For example, when a maximum of 4 seconds is divided into 20 equal intervals, the state values may be assigned as shown in Table 1 below.
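The quantization in this example could look like the sketch below. Because the table itself is not reproduced here, the exact interval boundaries are an assumption: equal-width intervals of 0.2 s numbered from 0, with delays at or beyond the maximum clamped into the last interval. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def delay_to_level(delay_s: float, max_delay_s: float = 4.0, levels: int = 20) -> int:
    """Map a delay to an integer interval number in 0..levels-1 (equal-width intervals)."""
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    index = int(delay_s * levels / max_delay_s)
    # Assumption: delays at or beyond the maximum fall into the last interval.
    return min(index, levels - 1)


def encode_state(
    delays_s: Iterable[float], max_delay_s: float = 4.0, levels: int = 20
) -> Tuple[int, ...]:
    """Quantize (d_1, ..., d_k) into a tuple of interval numbers, usable as a table key."""
    return tuple(delay_to_level(d, max_delay_s, levels) for d in delays_s)
```

Sending small integers instead of raw delays keeps the state report compact and gives the learner a finite set of states to index.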

Alternatively, the state may be built from the remaining time instead of the delay, according to Equation 7 below.

[Equation 7]

state = (r_1, r_2, ... , r_k)

In the above equation, r_k is the remaining time of the k-th packet in the queue. The remaining time is the maximum delivery delay allowed for the packet minus its current delay; the packet must be transmitted within this time to avoid being discarded. For example, if the maximum delay is 4 seconds, the states may be configured as shown in Table 2 below.
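As a small worked example of the remaining-time definition, the sketch below converts a queue of current delays into remaining times for a 4-second maximum delay. The function name `remaining_times` is hypothetical, and clamping at zero for packets that have already exceeded the maximum is an assumption made only for this illustration.

```python
def remaining_times(delays_s, max_delay_s=4.0):
    """Return (r_1, ..., r_k), where r_i = max_delay_s - d_i."""
    # Assumption: a packet past its deadline is reported with remaining time 0.
    return tuple(max(max_delay_s - d, 0.0) for d in delays_s)
```

For delays of (3.0, 1.5, 0.5) seconds this yields remaining times of (1.0, 2.5, 3.5) seconds: the packet that has waited longest has the least time left.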

Next, the reward information used for reinforcement learning and its transmission are described.

The reward value is essential information for running Q-learning, and the sensor node may calculate it according to Equation 8 below.

[Equation 8]

Reward = L x QoS

The sensor node may calculate the reward value according to the above equation and send it to the sink node in a packet, or, since the sink node already knows L (the frame length), the sensor node may send only the QoS value.

Next, the step (S140) of learning using the Q-learning technique is described.

With the current state denoted s, Q-learning proceeds according to the following procedure.

1. Initialization: current state s, action a

2. Parameters: epsilon, ALPHA, GAMMA

3. if unif(0,1) < epsilon: # explore when a random number between 0 and 1 is less than epsilon

4.     a = random_action() # choose a random action

5. else: # exploit

6.     a = optimal_action(s) # choose the optimal action a, i.e. the action with the highest value in the current state s; a = (frame length, number of packets to be transmitted by each sensor node)

7. new_s, r = step(a) # perform action a and receive the resulting new state new_s and reward r from the sensor node; one step corresponds to one frame, so a new action is issued every frame

8. value[(s, a)] = (1-ALPHA) * value[(s, a)] + ALPHA * (r + GAMMA * best_v(new_s)) # best_v(new_s) is the highest value among the actions that can be taken in the new state new_s

9. s = new_s; goto 3 # repeat from step 3 while the sensor node is alive
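The procedure above can be sketched in runnable form as follows. This is an illustrative sketch, not the patented implementation: it uses the standard tabular Q-learning update, and the function `q_learning`, the callback `env_step` (a stand-in for issuing an action and receiving the new state and reward from a sensor node), and the default parameter values are all assumptions introduced for the example.

```python
import random
from collections import defaultdict


def q_learning(env_step, actions, s, frames,
               epsilon=0.1, alpha=0.5, gamma=0.9, rng=None):
    """Run epsilon-greedy tabular Q-learning for a fixed number of frames.

    env_step(s, a) is a hypothetical callback returning (new_s, r).
    """
    rng = rng or random.Random()
    value = defaultdict(float)  # value[(state, action)], 0.0 for unseen pairs

    def best_v(state):
        # Highest value among the actions available in this state.
        return max(value[(state, a)] for a in actions)

    def optimal_action(state):
        # Action with the highest value in this state.
        return max(actions, key=lambda a: value[(state, a)])

    for _ in range(frames):          # one step corresponds to one frame
        if rng.random() < epsilon:   # exploration
            a = rng.choice(actions)
        else:                        # exploitation
            a = optimal_action(s)
        new_s, r = env_step(s, a)
        value[(s, a)] = ((1 - alpha) * value[(s, a)]
                         + alpha * (r + gamma * best_v(new_s)))
        s = new_s
    return value
```

In a sink node, `actions` would hold (frame length, packet count) choices and `env_step` would wrap one frame of communication with the sensor node; in a test it can be any function that returns a state and a reward.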

Next, one way of reducing the state space is described.

When the state is expressed as state = (d_1, d_2, ... , d_k) according to Equation 6, the relationship in Equation 9 below holds.

[Equation 9]

d_1 ≥ d_2 ≥ d_3 ... ≥ d_k

Likewise, when the state is expressed in terms of remaining time as state = (r_1, r_2, ... , r_k) according to Equation 7, the relationship in Equation 10 below holds.

[Equation 10]

r_1 ≤ r_2 ≤ r_3 ... ≤ r_k

Running the Q-learning algorithm requires generating the full set of states, and using the property above to remove impossible states saves memory and speeds up execution. An impossible state is, for example with k=5 and states expressed as delays, one such as (3,4,1,2,5): a packet that arrived later cannot have a longer delay than one that arrived earlier, so such a state cannot occur.
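To give a sense of how much the ordering constraint shrinks the state space, the sketch below enumerates only the non-increasing delay tuples. The helper names `feasible_delay_states` and `is_feasible` are hypothetical, and the use of 20 delay levels with k=5 simply combines the two examples given in the text.

```python
from itertools import combinations_with_replacement
from math import comb


def is_feasible(state):
    """True if d_1 >= d_2 >= ... >= d_k (Equation 9)."""
    return all(a >= b for a, b in zip(state, state[1:]))


def feasible_delay_states(num_levels, k):
    """List every delay state of length k that satisfies Equation 9."""
    # combinations_with_replacement yields non-decreasing tuples;
    # reversing each one gives a non-increasing tuple.
    return [tuple(reversed(c))
            for c in combinations_with_replacement(range(num_levels), k)]
```

With 20 levels and k=5 there are 20**5 = 3,200,000 unconstrained tuples but only comb(24, 5) = 42,504 feasible ones, so a value table restricted to feasible states is roughly 75 times smaller.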

Next, scheduling methods that may optionally be applied in combination are described.

In the first scheduling method, Q-learning for multiple sensor nodes is performed separately for each sensor node. That is, if there are N sensor nodes, the sink node holds N separate Q-learning modules. Given the mobility of sensor nodes and how frequently they are created and removed, this has an advantage over learning over all nodes jointly.

The second scheduling method handles the case in which the actions resulting from the separate Q-learning of the N sensor nodes disagree. In that case, priority goes to actions derived from exploitation rather than exploration.

For example, if Q-learning yields the following action for each sensor node, the common frame length is determined by considering only sensor nodes 1, 2, and 3.

Sensor node 1: (L_1, n_1) <- exploitation

Sensor node 2: (L_2, n_2) <- exploitation

Sensor node 3: (L_3, n_3) <- exploitation

Sensor node 4: (L_4, n_4) <- exploration

Sensor node 5: (L_5, n_5) <- exploration

In the above example, the common frame length may be set either to ① the minimum of L_1, L_2, and L_3, or to ② the value of L that minimizes the mean-square error.
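The two rules for the common frame length can be sketched as below. The function `common_frame_length` and the (L, n, mode) tuple layout are hypothetical. For rule ②, the sketch assumes L may take any real value, in which case the value minimizing the mean-square error against the exploitation-derived lengths is their arithmetic mean; the specification does not state how the error is defined, so this reading is an assumption.

```python
def common_frame_length(actions, rule="min"):
    """Pick one frame length from per-node actions (L_i, n_i, mode).

    Only actions whose mode is "exploitation" are considered.
    """
    exploited = [L for (L, _n, mode) in actions if mode == "exploitation"]
    if not exploited:
        raise ValueError("no exploitation-derived action available")
    if rule == "min":
        return min(exploited)
    if rule == "mse":
        # The arithmetic mean minimizes the sum of squared deviations.
        return sum(exploited) / len(exploited)
    raise ValueError(f"unknown rule: {rule}")
```

The minimum is the conservative choice, since no exploiting node gets a longer frame than it asked for, while the mean trades that guarantee for a value closer to all requests.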

If the frame length L determined above cannot accommodate the total number of packets to be transmitted (n_1 + n_2 + ... + n_5), either ① each n_i is reduced proportionally and rounded down (for example, floor(0.9*n_i) when a 10% reduction is needed), or ② the n values are reduced starting with the nodes whose remaining time is relatively large.
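Option ① can be illustrated with the short sketch below. The function `scale_down` is hypothetical, and `capacity` (the number of packets that fit in a frame of length L) is an assumed input, since the specification does not say how frame length maps to packet count.

```python
def scale_down(counts, capacity):
    """Proportionally reduce per-node packet counts to fit the capacity."""
    total = sum(counts)
    if total <= capacity:
        return list(counts)
    # Integer form of floor(n_i * capacity / total), avoiding float rounding.
    return [n * capacity // total for n in counts]
```

For requests of [10, 20, 30, 20, 20] packets and room for 90, a 10% reduction is needed and the result is [9, 18, 27, 18, 18]. Because every count is rounded down, the reduced total never exceeds the capacity, although some slots may be left unused.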

In the third scheduling method, when a new sensor node connects, Q-learning is not started from an untrained state; instead, the value table of the existing node whose learning is most active is copied, which speeds up learning.
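A possible shape for this warm start is sketched below. The class `NodeLearner`, the function `bootstrap_new_node`, and the use of an update counter as the measure of "most active learning" are all assumptions; the specification does not define how activity is measured.

```python
import copy


class NodeLearner:
    """Per-sensor-node Q-learning state kept at the sink node."""

    def __init__(self):
        self.value = {}    # value[(state, action)] -> float
        self.updates = 0   # assumed activity measure: number of updates


def bootstrap_new_node(learners, new_id):
    """Register a new node, seeding it from the most active learner."""
    new = NodeLearner()
    if learners:
        donor = max(learners.values(), key=lambda l: l.updates)
        # Deep copy so later updates to one table do not affect the other.
        new.value = copy.deepcopy(donor.value)
    learners[new_id] = new
    return new
```

The copy is independent of the donor's table, so the new node continues learning from the inherited values without disturbing the node it was copied from.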

In the step (S170) of transmitting the frame size and the number of packets to be transmitted to the sensor nodes, the sink node may specify and transmit to each sensor node its frame size and number of packets to be transmitted, as the scheduling information generated by the scheduling methods above.

For example, as a way for the sink node to convey the action, the sink node may begin each frame with a packet announcing the action information (frame length, number of packets to be transmitted). The number of packets to be scheduled may be announced in this initial packet, or the sink node may schedule sensor nodes by naming them individually, in the manner of Bluetooth polling.

Next, the operation of a sensor node that has received the scheduling information is described.

Each sensor node transmits packets carrying sensing data in accordance with the frame size and number of packets it received, and after transmitting, it stays in sleep mode until the next frame starts, thereby saving energy.

Next, the next-state information of the sensor nodes for reinforcement learning is collected, reflecting the results of the sensing packet transmissions made under the received frame size and number of packets, and the sink node feeds the collected next-state information into Q-learning as reinforcement learning (S180).

Meanwhile, as a way of handling the zero-packet case at each sensor node, a sensor node that has no data to send may convey its state through a short control packet that carries no data.

FIG. 5 illustrates an embodiment of a sink node device capable of performing the IoT packet scheduling control method according to the present invention.

The illustrated sink node device may include: a sensor node receiver 120 that receives packets containing sensing data from each sensor node; a storage unit 180 that records the reward function designed for data transmission at each sensor node and the state information of each sensor node; a sensor node transmitter 160 that transmits, to the corresponding sensor node, the state of the packets currently waiting to be transmitted from each sensor node, and transmits the frame size and number of packets determined for each sensor node; a reinforcement learning unit 140 that learns the frame size and number of packets for each sensor node state using a reinforcement learning (Q-learning) technique; and a scheduler 150 that determines the frame size and number of packets for each sensor node according to the result learned by the reinforcement learning unit.

The storage unit 180 may be implemented as a storage medium (e.g., flash memory) built into the sink node device.

The reward function for data transmission at each sensor node, which is needed to perform the IoT packet scheduling control method according to the present invention, may be designed on an external management server or the like and recorded in the storage unit 180. After the reward function has been designed, the state information of each sensor node used by that reward function may also be recorded in the storage unit 180.

The sensor node transmitter 160 may broadcast a frame notification.

The sensor node receiver 120 and the sensor node transmitter 160 may be implemented as a single hardware communication module, but because their functions differ considerably, they are shown as separate, independent components.

Depending on the implementation, the reinforcement learning unit 140 and the scheduler 150 may be software blocks running on a single CPU, or may be implemented in different hardware.

In the latter case, the scheduler 150 may be a CPU that performs general control of the entire sink node device, and the reinforcement learning unit 140 may be dedicated hardware or software for reinforcement learning. For example, the reinforcement learning unit 140 may be a dedicated chip for reinforcement learning, or a communication module that requests reinforcement learning from an external online reinforcement learning server and receives the result.

The reinforcement learning unit 140 may perform reinforcement learning for each of the sensor nodes served by the sink node device. Since reinforcement learning runs continuously, the results learned so far for each sensor node may be stored in the area of the storage unit 180 allocated to that sensor node.

Because the present invention can be embodied in other specific forms without changing its technical spirit or essential features, those skilled in the art to which the present invention pertains should understand that the embodiments described above are illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

120: sensor node receiver
140: reinforcement learning unit
150: scheduler
160: sensor node transmitter
180: storage unit

Claims

1. A method for controlling IoT packet scheduling performed in a wireless sensor network system in which a plurality of sensor nodes transmit data to a central sink node, the method comprising:
transmitting, from a sensor node to the sink node, the state of the data packets currently waiting to be transmitted;
learning the state of the data packets of the sensor node using reinforcement learning at the sink node;
scheduling data packet transmission from the plurality of sensor nodes according to the learned result; and
transmitting, to each sensor node, the frame size to be applied and the number of packets to be transmitted by that sensor node, as determined by the scheduling,
wherein, in the scheduling step, when a new sensor node is connected, the value table of the sensor node whose learning is most active is copied.

2. The method of claim 1, further comprising:
applying, to the reinforcement learning, the result of the transmission that each sensor node performed using the determined frame size and number of packets to be transmitted.

3. The method of claim 1, further comprising:
designing a reward function for data transmission of the sensor node.

4. The method of claim 3,
wherein the step of designing the reward function uses a Q-learning technique to design a reward function that maximizes network lifetime and transmission rate.

5. The method of claim 1,
wherein the reinforcement learning is a Q-learning technique that:
applies, as the state, the delay of the data packets transmitted from the sensor node to the sink node;
applies, as the action, the frame length and the number of packets for the data packets transmitted from the sensor node to the sink node; and
applies, as the reward, the quality of data packet delivery from the sensor node to the sink node.

6. The method of claim 5,
wherein the state is set according to the following equation.
state = (d_1, d_2, ... , d_k)
(d_k is the delay of the k-th packet in the queue)

7. The method of claim 5,
wherein the state is set according to the following equation.
state = (r_1, r_2, ... , r_k)
(r_k is the remaining time of the k-th packet in the queue)

8. The method of claim 5,
wherein the action is set according to the following equation.
action = (L, n_1, n_2, ... n_N)
(L is the length of the frame, and n_k is the number of packets that the k-th sensor node will send in this frame)

9. The method of claim 5,
wherein the reward is set according to the following equation.
Reward = L x QoS
(L is the length of the frame, and QoS is an index indicating how effective the number of packets to transmit, specified for each sensor node through the action, was)

10. The method of claim 1,
wherein the scheduling step comprises:
determining a frame size according to the learned result; and
determining, for each sensor node, the number of packets to be transmitted in each frame according to the learned result.

11. The method of claim 1,
wherein, in the scheduling step,
reinforcement learning for the plurality of sensor nodes is performed individually for each sensor node, and when the actions resulting from learning disagree, priority is given to actions derived from exploitation.

12. (deleted)

13. A sink node device comprising:
a sensor node receiver that receives packets containing sensing data from each sensor node;
a storage unit that records a reward function designed for data transmission at each sensor node and state information of each sensor node;
a sensor node transmitter that transmits the determined frame size and number of packets to each sensor node;
a reinforcement learning unit that learns the frame size and the number of packets for each sensor node state using reinforcement learning; and
a scheduler that determines the frame size and the number of packets for each sensor node according to the result learned by the reinforcement learning unit,
wherein the reinforcement learning unit continuously performs reinforcement learning for each of the sensor nodes served by the sink node device, and stores the reinforcement learning results obtained so far for each sensor node in the area of the storage unit allocated to that sensor node.

14. The sink node device of claim 13,
wherein the reinforcement learning is a Q-learning technique that:
applies, as the state, the delay of the data packets transmitted from the sensor node to the sink node;
applies, as the action, the frame length and the number of packets for the data packets transmitted from the sensor node to the sink node; and
applies, as the reward, the quality of data packet delivery from the sensor node to the sink node.

15. (deleted)