KR102127652B1

KR102127652B1 - Publish and subscribe communication system and method thereof

Info

Publication number: KR102127652B1
Application number: KR1020180131541A
Authority: KR
Inventors: 김재훈; 홍서희
Original assignee: 아주대학교산학협력단
Priority date: 2018-10-31
Filing date: 2018-10-31
Publication date: 2020-06-29
Also published as: KR20200048907A

Abstract

A method of operating a publish and subscribe communication system according to one aspect of the technical idea of the present invention may include: receiving a state vector describing the state of a subscriber; generating, based on the input state vector (s_i) and a reward function for adjusting the subscriber's waiting cycle, a waiting cycle adjustment value for adjusting the waiting cycle of the subscriber's loop operation; and delivering the generated waiting cycle adjustment value to the subscriber.

Description

PUBLISH AND SUBSCRIBE COMMUNICATION SYSTEM AND METHOD THEREOF

The technical idea of the present invention relates to a publish and subscribe communication system and a method of operating the same, and more particularly, to a publish and subscribe communication system that adjusts the subscription cycle and a method of operating the same.

The publish-subscribe model is an asynchronous messaging paradigm.

In the publish-subscribe model, a publisher's message is not addressed to any specific subscriber.

A published message is delivered, according to predetermined categories, to the subscribers who have subscribed to each category (topic).

A subscriber can receive only the messages it wants without any knowledge of the publisher.

This decoupling of publishers and subscribers allows a more dynamic network topology and high scalability.

Unlike network environments with a fixed or small number of clients, future network environments such as edge networks are dynamic environments in which a large number of clients exist.

The communication structure based on the publish-subscribe model is recognized as an efficient protocol for Internet of Things (IoT) environments, but the processing load increases as the number of subscribers maintaining a connection to the broker grows.

In addition, while a subscriber keeps its connection alive by running a loop operation, resources are consumed unnecessarily if no message for the subscribed topic is published to the broker.

An object of the publish and subscribe communication system and its operating method according to the technical idea of the present invention is to optimally control the waiting time of a subscriber.

Another object of the present invention is to adjust the waiting cycle of the subscriber using a deep neural network.

Another object of the present invention is to define the subscriber's environment as a state vector and to adjust the waiting time by applying a reward function.

A further object of the present invention is to improve the operating efficiency and energy efficiency of the subscriber by optimally controlling its waiting time.

The technical problems to be solved by the publish and subscribe communication system and its operating method according to the technical idea of the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to an exemplary embodiment, the state vector (s_i) may include the number of messages (m_i) on the topic designated by the subscriber, the number of high-importance messages (f_i) among the messages on that topic, and the waiting time (t_i) of the important messages on that topic.

According to an exemplary embodiment, the reward function may be given by an equation derived from the state vector (the equation is not reproduced in this text).

According to an exemplary embodiment, generating the waiting cycle adjustment value may include inputting the input state vector into a deep neural network, determining the subscriber's waiting cycle adjustment action by applying the reward function, and calculating the waiting cycle value using the reward function.

According to an exemplary embodiment, at least one reinforcement learning method among a Q-learning method and a policy gradient method may be applied to the deep neural network.

According to an exemplary embodiment, delivering the generated waiting cycle adjustment value to the subscriber may include delivering the value during the subscriber's subscribe operation so that it is set for the subscriber's next cycle.

A broker included in a publish and subscribe communication system according to another aspect of the technical idea of the present invention includes at least one processor and a memory electrically connected to the processor. The memory may store instructions that, when executed, cause the processor to receive a state vector describing the state of a subscriber, to generate, based on the input state vector (s_i) and a reward function for adjusting the subscriber's waiting cycle, a waiting cycle adjustment value for adjusting the waiting cycle of the subscriber's loop operation, and to deliver the generated waiting cycle adjustment value to the subscriber.

According to an exemplary embodiment, the state vector (s_i) may include the number of messages (m_i) on the topic designated by the subscriber, the number of high-importance messages (f_i) among the messages on that topic, and the waiting time (t_i) of the important messages on that topic.

According to an exemplary embodiment, the reward function may be given by an equation derived from the state vector (the equation is not reproduced in this text).

According to an exemplary embodiment, the memory may store instructions that, when executed, cause the processor to input the input state vector into a deep neural network, to determine the subscriber's waiting cycle adjustment action by applying the reward function, and to calculate the waiting cycle value using the reward function.

According to an exemplary embodiment, the memory may store instructions that, when executed, cause the processor to apply at least one reinforcement learning method among a Q-learning method and a policy gradient method to the deep neural network.

According to an exemplary embodiment, the memory may store instructions that, when executed, cause the processor to deliver the waiting cycle adjustment value during the subscriber's subscribe operation so that it is set for the subscriber's next cycle.

The publish and subscribe communication system and its operating method according to embodiments of the technical idea of the present invention can optimally control the waiting time of a subscriber.

The present invention can adjust the waiting cycle of the subscriber using a deep neural network.

In addition, the present invention can adjust the waiting time by defining the subscriber's environment as a state vector and applying a reward function.

In addition, the present invention can improve the operating efficiency and energy efficiency of the subscriber by optimally controlling its waiting time.

A brief description of each drawing is provided so that the drawings cited in this specification can be understood more fully.
FIG. 1 is a conceptual diagram of a publish and subscribe communication system according to various embodiments of the present invention.
FIG. 2 is a conceptual diagram of a publish and subscribe communication system including a reinforcement learning agent according to various embodiments of the present invention.
FIG. 3 is a flowchart of a waiting cycle adjustment determination operation according to various embodiments of the present invention.
FIG. 4 is a flowchart of a waiting cycle adjustment operation according to various embodiments of the present invention.
FIG. 5 is an exemplary diagram of an adjusted cycle according to various embodiments of the present invention.
FIG. 6 is a graph of experimental results according to an embodiment of the present invention.

The technical idea of the present invention may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the technical idea of the present invention to specific embodiments, and it should be understood to cover all modifications, equivalents, and substitutes falling within the scope of the technical idea of the present invention.

In describing the technical idea of the present invention, detailed descriptions of related known technologies are omitted when they are judged to unnecessarily obscure the gist of the technical idea. In addition, ordinal numbers used in this specification (for example, first, second) are merely identifiers for distinguishing one component from another.

In addition, when one component is described in this specification as being "connected" or "coupled" to another component, the one component may be directly connected or coupled to the other component, but, unless stated otherwise, it should be understood that it may also be connected or coupled through another component in between.

In addition, terms such as "~unit" (~부), "~er" (~기), "~or" (~자), and "~module" (~모듈) used in this specification denote a unit that processes at least one function or operation. Such a unit may be implemented in hardware, such as a processor, a microprocessor, an application processor, a microcontroller, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an APU (Accelerated Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array), in software, or in a combination of hardware and software.

It should also be made clear that the components in this specification are distinguished only by the main function each is responsible for. That is, two or more components described below may be combined into a single component, or one component may be divided into two or more components with more finely divided functions. Each component described below may also perform some or all of the functions of other components in addition to its own main function, and some of the main functions of a component may be carried out entirely by another component.

For the publish/subscribe communication structure of the MQTT protocol, the present invention proposes a reinforcement-learning-based publish/subscribe structure optimized for environments in which traffic is generated by events (event-driven traffic generation).

By applying reinforcement learning to the existing loop structure, the present invention ensures efficient resource utilization by subscribers in Internet of Things environments.

The present invention targets a configuration in which a subscriber registers a topic, the broker stores the topic, and the subscriber then connects to the broker at regular time intervals and receives all messages on the topics it is interested in at once. The aim is to optimally control the subscriber's waiting time in this configuration.

To this end, the present invention may define a state vector for the subscriber and input the state into a deep neural network to adjust the reference value of the waiting time. The present invention may apply a reward function to the waiting time adjustment to determine the direction of the adjustment.

The present invention may also be applied to various communication protocols of the publish-subscribe model other than MQTT. For ease of explanation, however, the description below uses the MQTT protocol as an example.

The MQTT protocol uses a publish/subscribe communication structure to enable efficient communication in machine-to-machine (M2M) and Internet of Things environments.

The MQTT protocol can communicate even in low-power, low-bandwidth environments.

The MQTT protocol includes procedures for connecting and disconnecting an MQTT client and an MQTT broker. Once a connection is established, messages are exchanged through the basic subscribe, unsubscribe, and publish operations.

Hereinafter, embodiments according to the technical idea of the present invention are described in detail in order.

FIG. 1 is a conceptual diagram of a publish and subscribe communication system according to various embodiments of the present invention.

Referring to FIG. 1, the publish and subscribe communication system may include a publisher 100, a broker 200, and a subscriber 300. For example, the publish and subscribe communication system may consist of one or more publishers 100, one or more subscribers 300, and a broker 200.

The publisher 100 may send a particular message, and the subscriber 300 may receive that message. For example, the subscriber 300 may receive messages on a topic to which it has subscribed.

In one embodiment, the publisher 100 may be a sensor and the subscriber 300 may be an actuator.

The subscriber 300 may continuously run an internal loop operation to maintain its connection, and may subscribe to the broker 200 with the name of the topic it wants to receive.

The broker 200 may store the published messages sent by at least one publisher 100, organized by the topic name written in each message.

Through the loop operation, the subscriber 300 may repeatedly search the topic names of the published messages stored in the broker 200. When it finds a published message whose topic name matches the topic it wants to receive, it may request the broker 200 to publish that message to the subscriber 300.

At the request of the subscriber 300, the broker 200 may publish the message of the publisher 100 to the subscriber 300 that has the same topic.
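The store-by-topic and poll-on-request behaviour described above can be sketched with a small in-memory broker. This is only an illustration under stated assumptions: the class name `InMemoryBroker`, its method names, and the topic strings are hypothetical, and no real MQTT library or network transport is involved.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker: keeps published messages per topic until a subscriber polls."""

    def __init__(self):
        self._registry = set()             # topics registered by subscribers
        self._stored = defaultdict(deque)  # topic name -> pending messages

    def register(self, topic):
        # The subscriber registers the topic it wants to receive.
        self._registry.add(topic)

    def publish(self, topic, payload):
        # The broker stores the message under the topic name written in it.
        self._stored[topic].append(payload)

    def poll(self, topic):
        # On request, hand over every pending message for a registered topic.
        if topic not in self._registry:
            return []
        batch = list(self._stored[topic])
        self._stored[topic].clear()
        return batch


broker = InMemoryBroker()
broker.register("sensor/temp")
broker.publish("sensor/temp", 21.5)
broker.publish("sensor/temp", 21.7)
broker.publish("sensor/humidity", 40)  # nobody registered this topic
print(broker.poll("sensor/temp"))      # [21.5, 21.7]
print(broker.poll("sensor/temp"))      # []
```

Each call to `poll` stands in for one pass of the subscriber's loop operation. A poll that returns an empty list corresponds to the wasted work the description attributes to polling when nothing has been published.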

In an Internet of Things environment where a fixed number of clients exist and messages are sent and received for a limited purpose, such a publish and subscribe communication system can recognize, send, and receive messages using topics, which are small identifiers that represent a message. It can therefore serve as a more lightweight protocol than others.

To adjust the waiting cycle of the subscriber 300 according to the present invention, a reinforcement learning agent 400 may be included.

The reinforcement learning agent 400 is described with reference to FIG. 2.

FIG. 2 is a conceptual diagram of a publish and subscribe communication system including a reinforcement learning agent according to various embodiments of the present invention.

Referring to FIG. 2, the present invention may include a reinforcement learning agent 400 for applying reinforcement learning, and the reinforcement learning agent 400 may be located on the broker 200.

The reinforcement learning agent 400 may 1) carry a pre-trained deep neural network and 2) learn in real time from the messages generated in real time by the subscriber 300.

The reinforcement learning agent 400 may be connected to a waiting cycle adjustment module in the broker 200, input the current state of the subscriber 300 into the deep neural network, and deliver the waiting cycle adjusted according to that input.

Meanwhile, the present invention may determine whether the waiting cycle of the subscriber 300 needs to be adjusted and, if so, apply the waiting cycle adjustment value described later.

The waiting cycle adjustment determination according to various embodiments of the present invention, and the resulting operation, are described with reference to FIG. 3.

FIG. 3 is a flowchart of a waiting cycle adjustment determination operation according to various embodiments of the present invention.

The subscriber 300 may register a topic to which it wants to subscribe (S110). The broker 200 may store the topic registered by the subscriber in a registry (S130). The broker 200 may compare the subscription waiting time with the waiting cycle of the subscriber 300 (S150). If the comparison shows that the waiting cycle is shorter than the subscription waiting time, the subscriber 300 may wait for the subscription (S160). The subscriber 300 may then receive published messages on the registered topic according to the waiting cycle. When messages are published to it, the subscriber 300 may perform a waiting cycle update that reflects the waiting cycle adjustment value (S170). For example, the subscriber 300 may receive the waiting cycle adjustment value from the broker 200 together with the messages on the registered topic, and may change its waiting cycle according to the received value. The subscriber 300 can then operate with the adjusted waiting cycle in the next cycle.

Meanwhile, if the subscription waiting time is longer than the waiting cycle in step S150, the subscriber 300 may connect to the broker 200 according to the current waiting cycle and receive published messages on the registered topic (S190).

In this way, the present invention compares the subscription waiting time, which is related to the interval at which the publisher 100 sends messages on a topic to the broker 200, with the waiting cycle of the subscriber 300 to determine whether a waiting cycle update is needed, and updates the waiting cycle of the subscriber 300 when it is.
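The decision in FIG. 3 can be sketched as a single function. The sketch rests on several assumptions that are not stated in the text: the adjustment value is treated as a signed number of seconds added to the current cycle, the S190 branch is taken to be the case in which the S150 condition (waiting cycle shorter than subscription waiting time) does not hold, and the lower bound `MIN_CYCLE` is a hypothetical safeguard.

```python
MIN_CYCLE = 0.1  # hypothetical lower bound (seconds) so the cycle never reaches zero


def update_waiting_cycle(waiting_cycle, subscription_wait, adjustment):
    """Return the waiting cycle (seconds) to use for the subscriber's next cycle.

    waiting_cycle:     the subscriber's current cycle
    subscription_wait: wait time tied to how often the publisher sends on the topic
    adjustment:        signed adjustment value delivered by the broker (assumed additive)
    """
    if waiting_cycle < subscription_wait:
        # S160/S170: an update is needed, so apply the delivered adjustment value.
        return max(waiting_cycle + adjustment, MIN_CYCLE)
    # S190 (assumed complementary branch): keep polling with the current cycle.
    return waiting_cycle


print(update_waiting_cycle(1.0, 5.0, 0.5))   # 1.5
print(update_waiting_cycle(10.0, 5.0, 0.5))  # 10.0
```

Keeping the rule in one pure function makes it easy to swap in a different update, for example a multiplicative one, without touching the subscriber loop that calls it.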

The following describes how the waiting cycle adjustment value used to update the waiting cycle of the subscriber 300 is calculated according to various embodiments of the present invention.

FIG. 4 is a flowchart of a waiting cycle adjustment operation according to various embodiments of the present invention.

Referring to FIG. 4, the reinforcement learning agent 400 may receive a state vector as input (S210).

For example, the reinforcement learning agent 400 on the broker 200 may receive a state vector related to the state of the subscriber 300.

In one embodiment, the reinforcement learning agent 400 may receive a state vector that includes the number of messages (m_i) on the topic designated by the subscriber 300, the number of high-importance messages (f_i) among the messages on that topic, and the waiting time (t_i) of the important messages on that topic. This state vector may be obtained from the state information that the broker 200 receives from the subscriber 300.

The state vector described above can be expressed as follows.

s_i = [m_i, f_i, t_i]

The reinforcement learning agent 400 may input the received state vector into a deep neural network (S230).

The reward function for adjusting the waiting cycle of the subscriber 300 may be derived from the state vector.

The following criteria may be applied in designing the reward function.

1) The more messages received at once, the better.

2) Missing an important message incurs a loss.

3) A delayed important message incurs a loss.
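The three design criteria can be made concrete with a toy reward. The patent's own reward equation is not reproduced in this text, so the function below is not that equation. It is a hypothetical linear form, with made-up weights and a made-up `missed_important` input, chosen only so that each criterion moves the reward in the stated direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriberState:
    m: int    # m_i: number of messages on the subscribed topic
    f: int    # f_i: number of high-importance messages among them
    t: float  # t_i: waiting time of the important messages (seconds)


def illustrative_reward(state, missed_important=0,
                        w_batch=1.0, w_miss=5.0, w_delay=0.1):
    """Hypothetical reward: larger batches help, misses and delays hurt."""
    batch_gain = w_batch * state.m               # criterion 1
    miss_penalty = w_miss * missed_important     # criterion 2
    delay_penalty = w_delay * state.f * state.t  # criterion 3
    return batch_gain - miss_penalty - delay_penalty


base = SubscriberState(m=10, f=2, t=3.0)
slower = SubscriberState(m=10, f=2, t=9.0)
print(illustrative_reward(base) > illustrative_reward(slower))              # True
print(illustrative_reward(base) > illustrative_reward(base, missed_important=1))  # True
```

The weights express how much a missed or delayed important message costs relative to the benefit of batching. Any real deployment would have to choose them, or a different functional form, from its own requirements.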

Accordingly, the present invention may apply a reward function based on these criteria (the equation is not reproduced in this text).

In the present invention, for example, the reinforcement learning agent 400 may run the deep neural network based on the state vector and reward function designed above.

For example, the present invention may apply at least one reinforcement learning method among a Q-learning method and a policy gradient method. The present invention may therefore also apply both the Q-learning method and the policy gradient method.

In one embodiment, when the Q-learning method is used, the deep neural network may be operated in a defined sequence of steps (the steps are not reproduced in this text).

In another embodiment, when the policy gradient method is used, the deep neural network may be operated in a defined sequence of steps (the steps are not reproduced in this text).

As a result of applying the deep neural network, the reinforcement learning agent 400 may determine a waiting cycle adjustment action (S250).

For example, the reinforcement learning agent 400 may decide whether to shorten or lengthen the waiting cycle of the subscriber 300.

Determining the waiting cycle adjustment action in the present invention may therefore mean determining the direction of the waiting time adjustment.

The reinforcement learning agent 400 may calculate the waiting cycle adjustment value using the reward function (S270).

The reinforcement learning agent 400 may calculate a reward using the reward function and may derive the waiting cycle adjustment value from the calculated reward.

The broker 200 may send the calculated waiting cycle adjustment value to the subscriber 300, and the value may be applied to the next cycle of the subscriber 300.

Accordingly, as shown in FIG. 5, the waiting cycle of the subscriber 300 may be adjusted, and the subscriber 300 may connect to the broker 200 with the adjusted cycle.
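The flow from S210 to S270 can be sketched with textbook tabular Q-learning. Several things here are assumptions rather than the patent's procedure: a lookup table replaces the deep neural network, the state is a discretized tuple (m_i, f_i, t_i), the adjustment value is a fixed step `STEP` in the chosen direction rather than a value derived from the reward, and all names are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # -1: shorten the waiting cycle, +1: lengthen it
STEP = 0.5          # hypothetical step size in seconds


class TabularCycleAgent:
    """Stand-in for the reinforcement learning agent, using a Q-table."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        # S250: epsilon-greedy choice of the adjustment direction.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update toward reward + discounted best next value.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def adjustment(self, state):
        # S270 (simplified): turn the chosen direction into a signed adjustment value.
        return self.choose(state) * STEP


agent = TabularCycleAgent(epsilon=0.0)  # no exploration, for a repeatable demo
s = (3, 1, 2)                           # discretized (m_i, f_i, t_i)
agent.learn(s, +1, reward=1.0, next_state=s)
print(agent.adjustment(s))              # 0.5
```

After one rewarded experience of lengthening the cycle in state `s`, the greedy choice in that state is to lengthen it again, so the agent returns a positive adjustment. A deep Q-network would replace the table with a function approximator but keep the same update target.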

한편, 상술한 강화 학습 에이전트(400)는 브로커(200) 상에 위치할 수 있고, 브로커(200)에 포함되는 구성일 수도 있다.Meanwhile, the above-described reinforcement learning agent 400 may be located on the broker 200 or may be a component included in the broker 200.

본 발명에 따른 발행/구독 구조를 적용한 경우, 수신자(300)는 대기 시간 주기를, 정의된 상태 벡터와 보상 함수를 적용한 심층 신경망을 통해 유동적으로 변경할 수 있다. 그리고 본 발명에 따라 대기 시간 주기 조정이 최적으로 제어되는 것을 증명하기 위해, 수신자의 배터리 소모를 측정한 실험 결과를 도 6을 참조하여 설명한다.When the publish/subscribe structure according to the present invention is applied, the receiver 300 can flexibly change the waiting time period through a deep neural network to which a defined state vector and a reward function are applied. And in order to prove that the standby time period adjustment is optimally controlled according to the present invention, the experimental results of measuring the battery consumption of the receiver will be described with reference to FIG. 6.

FIG. 6 is a graph of experimental results according to an embodiment of the present invention.

Referring to FIG. 6, ordinary MQTT operation consumed an average of 1058.05 mW per unit time, whereas the publish/subscribe structure according to the present invention consumed 1052.8 mW with Q-learning and 1054.2 mW with Policy Gradient. These figures correspond to savings of 5.25 mW and 3.85 mW, respectively, which may appear small. However, the device corresponding to the subscriber 300 consumes 1049.84 mW in the idle state. Taking this into account, the power consumed by MQTT communication itself is 8.21 mW for ordinary MQTT, 2.96 mW with Q-learning, and 4.36 mW with Policy Gradient, so it can be concluded that communication energy savings of 63.9% and 47.0%, respectively, are achievable.
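
The arithmetic behind these percentages can be retraced with a short script. It uses only the power figures quoted in the paragraph above; the variable and function names are illustrative.

```python
IDLE_MW = 1049.84  # idle power of the subscriber device (mW)

# Average power per unit time reported for each configuration (mW).
measured_mw = {
    "MQTT": 1058.05,
    "Q-learning": 1052.8,
    "Policy Gradient": 1054.2,
}

# Power attributable to communication = measured power - idle power.
comm_mw = {name: round(p - IDLE_MW, 2) for name, p in measured_mw.items()}


def saving_percent(name):
    """Communication-power saving relative to ordinary MQTT, in percent."""
    baseline = comm_mw["MQTT"]
    return 100.0 * (baseline - comm_mw[name]) / baseline


if __name__ == "__main__":
    for name in ("Q-learning", "Policy Gradient"):
        print(f"{name}: {comm_mw[name]} mW, saving {saving_percent(name):.1f}%")
```

Recomputing from the rounded figures gives about 63.9% for Q-learning and about 46.9% for Policy Gradient. The latter is marginally below the stated 47.0%, which may reflect rounding in the reported measurements.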

Therefore, as the experimental results show, the present invention enables the subscriber 300 to use its resources efficiently, improving both the operating efficiency and the energy efficiency of the subscriber 300.

In this way, the present invention can optimally control the waiting time of a subscriber in a publish and subscribe communication system.

Meanwhile, the present invention described above can be applied to the publish/subscribe communication structure of the MQTT protocol, and can also be applied to various other protocols that follow a publish/subscribe communication structure.

The technical idea of the present invention has been described in detail above with reference to various embodiments. However, the technical idea of the present invention is not limited to these embodiments, and various modifications and changes may be made by those of ordinary skill in the art within the scope of the technical idea of the present invention.

Claims

A method of operating a publish and subscribe communication system, the method comprising:
receiving a state vector representing a state of a subscriber;
generating, based on the received state vector s_i and a reward function for adjusting a waiting period of the subscriber, a waiting period adjustment value for adjusting a waiting cycle of a loop operation of the subscriber; and
transmitting the generated waiting period adjustment value to the subscriber.

The method of claim 1,
wherein the state vector s_i includes:
the number of messages (m_i) corresponding to a topic designated by the subscriber;
the number of messages of high importance (f_i) among the messages corresponding to the topic; and
the waiting time (t_i) of the high-importance messages corresponding to the topic.

The method of claim 2,
wherein the reward function is given by the following equation derived from the state vector:

[Equation not reproduced in the source text]

The method of claim 1,
wherein generating the waiting period adjustment value comprises:
inputting the received state vector into a deep neural network;
determining a waiting period adjustment operation for the subscriber by applying the reward function; and
calculating the waiting period adjustment value using the reward function.

The method of claim 4,
wherein at least one reinforcement learning method selected from Q-learning and Policy Gradient is applied to the deep neural network.

The method of claim 1,
wherein transmitting the generated waiting period adjustment value to the subscriber comprises
transmitting the value during a subscribe operation of the subscriber so that it is set for the next cycle of the subscriber.

A broker included in a publish and subscribe communication system, the broker comprising:
at least one processor; and
a memory electrically connected to the processor,
wherein the memory stores instructions that, when executed, cause the processor to:
receive a state vector representing a state of a subscriber;
generate, based on the received state vector s_i and a reward function for adjusting a waiting period of the subscriber, a waiting period adjustment value for adjusting a waiting cycle of a loop operation of the subscriber; and
transmit the generated waiting period adjustment value to the subscriber.

The broker of claim 7,
wherein the state vector s_i includes:
the number of messages (m_i) corresponding to a topic designated by the subscriber;
the number of messages of high importance (f_i) among the messages corresponding to the topic; and
the waiting time (t_i) of the high-importance messages corresponding to the topic.

The broker of claim 8,
wherein the reward function is given by the following equation derived from the state vector:

[Equation not reproduced in the source text]

The broker of claim 7,
wherein the memory stores instructions that, when executed, cause the processor to:
input the received state vector into a deep neural network;
determine a waiting period adjustment operation for the subscriber by applying the reward function; and
calculate the waiting period adjustment value using the reward function.

The broker of claim 10,
wherein the memory stores instructions that, when executed, cause the processor to
apply at least one reinforcement learning method selected from Q-learning and Policy Gradient to the deep neural network.

The broker of claim 7,
wherein the memory stores instructions that, when executed, cause the processor to
transmit the waiting period adjustment value during a subscribe operation of the subscriber so that it is set for the next cycle of the subscriber.