KR20220009308A

KR20220009308A - Packet Transmission Decision Apparatus And Packet Transmission Schedule Decision Method

Info

Publication number: KR20220009308A
Application number: KR1020200169987A
Authority: KR
Inventors: 전상운; 정종진; 소화이브무하마드
Original assignee: 한양대학교 에리카산학협력단
Priority date: 2020-07-15
Filing date: 2020-12-08
Publication date: 2022-01-24
Also published as: KR102428507B1

Abstract

An apparatus for determining packet transmission includes: a pre-processing unit which generates state information at each decision time based on behavior information describing the packets transmitted and feedback information received in response to those transmissions; a data management unit which generates redefined state information from the state information and accumulated state information; a value calculating unit which calculates a present value and a future value of packet transmission from the redefined state information; a Q-value calculating unit which calculates a Q-value for packet transmission by summing the present and future values; and a decision unit which determines the packet transmission schedule for the next decision time based on the Q-value.

Description

Packet Transmission Decision Apparatus And Packet Transmission Schedule Decision Method

The present invention relates to an apparatus for determining packet transmission and a method for determining a packet transmission schedule, and more particularly to an apparatus and a method that improve fairness among user terminals by calculating a Q-Value from state information resulting from packet transmission.

Conventional machine-learning-based multi-channel access is a centralized radio resource management scheme in which a single agent collects information from all users and allocates channels. Machine-learning-based multi-channel access schemes applicable to distributed environments have recently been proposed, but they assume a fixed set of users, which limits their usefulness for radio resource management.

In addition, conventional random access techniques belong to a family of techniques based on optimization and decision algorithms. They are therefore limited in adapting to a communication environment that changes in real time, and the optimization itself is limited by computational complexity.

Furthermore, in conventional machine-learning-based radio resource management, a centralized agent collects training data and then, through learning, allocates radio resources to many users. This makes it difficult to support the Massive Connectivity required by 5G and Beyond 5G cellular communication. The recently proposed distributed machine-learning-based radio resource management techniques also assume a time-invariant set of users and require large amounts of training data and long training times, which limits their ability to adapt to an Internet of Things environment that changes in real time.

Republic of Korea Patent Publication No. 10-2019-0127480 (Method and apparatus for determining transmission point and link adaptation scheme in a communication system, Korea Electronics and Telecommunications Research Institute, 2019.11.13)

The technical problem addressed by the present invention is to overcome these problems and limitations. To that end, the present invention provides an apparatus for determining packet transmission and a method for determining a packet transmission schedule that improve fairness among user terminals by calculating a Q-Value from state information resulting from packet transmission.

The apparatus for determining packet transmission of the present invention may include: a pre-processing unit that generates state information at each decision time based on behavior information describing the packets transmitted and feedback information received in response to packet transmission; a data management unit that generates redefined state information from the state information and accumulated state information; a value calculation unit that calculates a present value and a future value of packet transmission from the redefined state information; a Q-Value calculation unit that calculates a Q-Value for packet transmission by summing the present value and the future value; and a decision unit that determines the packet transmission schedule for the next decision time based on the Q-Value.

The value calculation unit may calculate the present value and the future value by inputting the redefined state information into a Deep Q-Network prepared in advance.

The value calculation unit may include a present value calculation unit composed of a plurality of Deep Q-Networks corresponding to the maximum allowed number of user terminals.

The present value calculation unit may calculate the present value using only as many of the plurality of Deep Q-Networks as correspond to the size of the decision time.

The present value calculation unit may normalize the present values calculated by the plurality of Deep Q-Networks.

The value calculation unit may include a future value calculation unit that uses the redefined state information to calculate the future value, that is, the value of packet transmission after the decision time for which the present value is determined.

The decision unit may select, as the packet transmission schedule for the next decision time, the transmission channel corresponding to the timeslot within the decision time for which the highest Q-Value was calculated.

The decision unit may apply zero-padding to the timeslots not selected for the packet transmission schedule, so that the size of the packet transmission schedule matches the size of the decision time.
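
As an illustration of the decision rule described above, the following Python sketch sums a present value and a future value into a Q-Value for each (timeslot, channel) pair, selects the pair with the highest Q-Value, and zero-pads the remaining timeslots. It is only one possible reading of the rule. The function name `decide_schedule`, the dictionary layout, and the numeric values are assumptions made for this example; in the invention the values would come from the Deep Q-Networks.

```python
def decide_schedule(present, future, decision_size):
    """Return (schedule, q_values) for one decision time.

    present, future: dicts mapping (timeslot, channel) to a value,
    with 1-based timeslot and channel numbers (hypothetical layout).
    """
    # Q-Value = present value + future value, per (timeslot, channel).
    q_values = {key: present[key] + future[key] for key in present}
    # Pick the (timeslot, channel) pair with the highest Q-Value.
    best_slot, best_channel = max(q_values, key=q_values.get)
    # Zero-pad every timeslot that was not selected.
    schedule = [0] * decision_size
    schedule[best_slot - 1] = best_channel
    return schedule, q_values


# Illustrative values only (3 timeslots, 2 channels).
present = {(1, 1): 0.25, (1, 2): 0.5, (2, 1): 0.125,
           (2, 2): 0.25, (3, 1): 0.0, (3, 2): 0.125}
future = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.5,
          (2, 2): 0.0, (3, 1): 0.125, (3, 2): 0.125}
schedule, q_values = decide_schedule(present, future, 3)
print(schedule)  # [2, 0, 0]: channel 2 in timeslot 1, nothing elsewhere
```

The schedule has one entry per timeslot, so its length always equals the size of the decision time regardless of which timeslot is selected.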

The data management unit may use a Long Short-Term Memory (LSTM) model to learn and aggregate the state information and the redefined state information as a time series as decision times elapse.

The method for determining a packet transmission schedule of the present invention may include: a pre-processing step in which the pre-processing unit generates state information at each decision time based on behavior information describing the packets transmitted and feedback information received in response to packet transmission; a data management step in which the data management unit generates redefined state information from the state information and accumulated state information; a value calculation step in which the value calculation unit calculates a present value and a future value of packet transmission from the redefined state information; a Q-Value calculation step in which the Q-Value calculation unit calculates a Q-Value for packet transmission by summing the present value and the future value; and a decision step in which the decision unit determines the packet transmission schedule for the next decision time based on the Q-Value.

The apparatus for determining packet transmission and the method for determining a packet transmission schedule of the present invention can guarantee both the data rate of each user terminal and fairness among user terminals.

In addition, the apparatus and the method of the present invention can reduce computational complexity compared with existing centralized and distributed machine learning approaches.

In addition, the apparatus and the method of the present invention can be used as an access technology in various Internet of Things fields and in 5G and Beyond 5G cellular communication systems.

FIG. 1 is a diagram illustrating a random access communication system according to an embodiment.
FIG. 2 is a diagram illustrating an example operation of a random access communication system according to an embodiment.
FIG. 3 is a block diagram illustrating the configuration of a user terminal according to an embodiment.
FIG. 4 is a diagram illustrating a packet transmission schedule according to an embodiment.
FIG. 5 is a block diagram illustrating the configuration of an apparatus for determining packet transmission according to an embodiment.
FIG. 6 is a block diagram illustrating the configuration of a present value calculation unit according to an embodiment.
FIG. 7 is a diagram illustrating the process by which an apparatus for determining packet transmission determines a packet transmission schedule according to an embodiment.
FIG. 8 is a flowchart illustrating a method of determining a packet transmission schedule using the apparatus for determining packet transmission according to an embodiment.
FIG. 9A is a diagram illustrating packet transmission fairness among active user terminals over time according to an embodiment.
FIG. 9B is a diagram illustrating the data rate of each active user terminal and the target data rate according to an embodiment.

Hereinafter, the invention is described in detail through embodiments, with reference to the accompanying drawings, so that those skilled in the art can readily understand and reproduce it. However, where a detailed description of a related known function or configuration is judged likely to obscure the gist of the embodiments unnecessarily, that description is omitted.

The terms used below were chosen in consideration of their functions in the embodiments, and their meanings may vary with the intention or custom of a user or operator. Accordingly, a term used in the embodiments below follows its definition where one is specifically given, and otherwise should be interpreted in the sense generally recognized by those skilled in the art. In addition, the same reference numerals or symbols in the drawings denote parts or components that perform substantially the same function.

Hereinafter, an apparatus for determining packet transmission and a method for determining a packet transmission schedule according to an embodiment of the present invention are described in detail with reference to the accompanying drawings and their contents.

FIG. 1 is a diagram illustrating a random access communication system according to an embodiment of the present invention.

Referring to FIG. 1, a random access communication system 1 according to an embodiment of the present invention may include at least one user terminal 10 and a repeater 20.

When the user terminal 10 has a packet to transmit, it may connect to the repeater 20, transmit the packet in a random timeslot, and release the connection to the repeater 20 once the transmission is complete.

Because the plurality of user terminals 10 repeatedly connect to and disconnect from the repeater 20 in each timeslot, the number of active user terminals connected to the repeater 20 may vary from timeslot to timeslot.

The repeater 20 may generally be an access point (AP) that communicates with the user terminal 10, and can therefore receive the packets the user terminal 10 transmits. In each timeslot, the repeater 20 may receive the packets transmitted by the user terminals 10 and broadcast feedback information reflecting the reception result.

Specifically, when only one packet is received in a timeslot, the repeater 20 may decode the received packet and then broadcast an ACK signal to the user terminals 10.

When a plurality of packets are received in a single timeslot, however, the repeater 20 may broadcast a NAK signal to the user terminals 10 without decoding those packets.

FIG. 2 is a diagram illustrating an example operation of a random access communication system according to an embodiment of the present invention.

FIG. 2 illustrates an example operation of the random access communication system with five active user terminals and two transmission channels. Here, a user terminal that accesses the repeater to transmit packets at each decision time (i = 1, 2, …, n) is defined as an active user terminal; the decision time is described later.

Following the packet transmissions over time, active user terminal 1 may transmit a packet to the repeater over transmission channel 2 in timeslot 1. Because the repeater receives only the packet of active user terminal 1 on transmission channel 2 in timeslot 1, it may decode that packet and then broadcast an ACK signal to active user terminal 1.

Active user terminal 2 may transmit a packet to the repeater over transmission channel 1 in timeslot 1. Because the repeater receives only the packet of active user terminal 2 on transmission channel 1 in timeslot 1, it may decode that packet and then broadcast an ACK signal to active user terminal 2.

No active user terminal transmits a packet in timeslot 2, but the repeater may still broadcast a NAK signal to the active user terminals. The NAK signal broadcast in timeslot 2 indicates that no user terminal accessed the repeater.

Active user terminals 2 and 4 may both transmit packets to the repeater over transmission channel 2 in timeslot 3. Because the repeater receives a plurality of packets on transmission channel 2 in timeslot 3, it may broadcast a NAK signal to active user terminals 2 and 4 without decoding those packets. The NAK signal broadcast in timeslot 3 indicates that a collision between packets occurred.

The same operations are performed in the subsequent decision times and timeslots, so their description is omitted.
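
The feedback behavior walked through above can be sketched in Python as follows. This is a minimal illustration, assuming that the repeater evaluates each transmission channel separately within a timeslot; the function name `repeater_feedback` and the labels distinguishing an idle NAK from a collision NAK are hypothetical and are not taken from the specification.

```python
from collections import Counter


def repeater_feedback(transmissions, num_channels):
    """Feedback per channel for one timeslot.

    transmissions: list of (terminal_id, channel) pairs sent in the timeslot.
    Returns a dict mapping channel number to a feedback label.
    """
    # Count how many packets arrived on each channel.
    load = Counter(channel for _, channel in transmissions)
    feedback = {}
    for channel in range(1, num_channels + 1):
        if load[channel] == 1:
            feedback[channel] = "ACK"              # single packet: decoded
        elif load[channel] == 0:
            feedback[channel] = "NAK (idle)"       # nobody transmitted
        else:
            feedback[channel] = "NAK (collision)"  # several packets: not decoded
    return feedback


# Timeslots 1 to 3 of the FIG. 2 example (two transmission channels).
slot1 = repeater_feedback([(1, 2), (2, 1)], 2)
slot2 = repeater_feedback([], 2)
slot3 = repeater_feedback([(2, 2), (4, 2)], 2)
print(slot1)  # {1: 'ACK', 2: 'ACK'}
```

In timeslot 3 this sketch reports a collision on channel 2 and an idle channel 1, since the example places both transmitting terminals on channel 2.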

FIG. 3 is a block diagram illustrating the configuration of a user terminal according to an embodiment of the present invention.

Referring to FIG. 3, the user terminal 10 may include a communication unit 110, a process 120, and a storage unit 130.

The communication unit 110 may transmit packets to the repeater and receive feedback information indicating whether the repeater received them.

The process 120 may generate the packets to be transmitted to the repeater and control the communication unit 110 to transmit them according to the packet transmission schedule determined by the packet transmission determining apparatus 121.

Specifically, the user terminal 10 may include a packet transmission determining apparatus 121 that determines the timeslot and transmission channel in which to transmit a packet in the random access communication system. The packet transmission determining apparatus 121 may be provided as separate hardware in the user terminal 10, but is not limited thereto; it may also be implemented as software run by the process 120 of the user terminal 10. That is, the user terminal 10 and the packet transmission determining apparatus 121 may be configured as a single device.

The packet transmission determining apparatus 121 may determine, at each decision time, the packet transmission schedule according to which packets are transmitted.

Here, the packet transmission schedule may consist of the timeslots within the decision time and the transmission channels to be used for packet transmission, and it may be managed in matrix form.

Referring to FIGS. 2 and 4, in a random access system with a decision time of size 5 and two transmission channels, as in FIG. 2, the packet transmission schedule 30 may be a matrix whose size corresponds to the decision time size of 5, as in FIG. 4, storing for each timeslot the number of the transmission channel to be used for packet transmission.

A timeslot for which no transmission channel number is stored may be zero-padded, but is not limited thereto; it may instead hold any special value indicating that no packet is transmitted.
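
To make the schedule layout concrete, the sketch below stores one channel number per timeslot and uses 0 as the zero-padding value for timeslots without a transmission. The helper names `build_schedule` and `planned_transmissions` are hypothetical; the example values follow the FIG. 4 schedule discussed in this description (channel 2 in timeslot 1, channel 1 in timeslot 2, no transmission in timeslots 3 to 5).

```python
NO_TRANSMISSION = 0  # zero-padding value; another sentinel could be used


def build_schedule(decision_size, assignments):
    """Build a schedule with one entry per timeslot of the decision time.

    assignments: dict mapping a 1-based timeslot to a channel number.
    """
    schedule = [NO_TRANSMISSION] * decision_size
    for timeslot, channel in assignments.items():
        if timeslot not in range(1, decision_size + 1):
            raise ValueError("timeslot outside the decision time")
        schedule[timeslot - 1] = channel
    return schedule


def planned_transmissions(schedule):
    """List the (timeslot, channel) pairs in which a packet is sent."""
    return [(slot, channel)
            for slot, channel in enumerate(schedule, start=1)
            if channel != NO_TRANSMISSION]


fig4_schedule = build_schedule(5, {1: 2, 2: 1})
print(fig4_schedule)                         # [2, 1, 0, 0, 0]
print(planned_transmissions(fig4_schedule))  # [(1, 2), (2, 1)]
```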

In one embodiment, when there are n active user terminals, the packet transmission determining apparatus 121 may set the size of the decision time to n timeslots and determine the packet transmission schedule 30 every n timeslots. When the decision time is set according to the number of active user terminals, the user terminal 10 may obtain the number of active user terminals from the repeater at preset timeslot intervals, or may estimate it from the ACK and NAK signals broadcast by the repeater.

The storage unit 130 may, for example, store packets to be transmitted to the repeater or packets generated by the process 120.

Hereinafter, the configuration and operation of a user terminal that includes the packet transmission determining apparatus according to an embodiment of the present invention are described in detail with reference to the drawings.

FIG. 5 is a block diagram illustrating the configuration of an apparatus for determining packet transmission according to an embodiment of the present invention.

The packet transmission determining apparatus 121 may calculate a Q-Value for each timeslot within the decision time based on the behavior information, reward information, and state information of the user terminal, and determine the packet transmission schedule for the next decision time according to the priority of the calculated Q-Values.

Here, the behavior information, reward information, and state information of the user terminal are the information used in the process of determining the packet transmission schedule and by each component of the packet transmission determining apparatus 121, both described below. These three terms are therefore defined first.

Behavior information is information about the actions the user terminal takes to transmit packets. It may include, for each timeslot, whether a packet was transmitted and which transmission channel was used.

According to an embodiment of the present invention, the behavior information may consist of the set of actions an active user terminal takes to transmit packets over one decision time. Its expression is written in terms of each active user terminal, the decision time, the transmission channels N, and the number of active user terminals at decision time i.

In other words, the behavior information describes how the active user terminal transmits packets during the decision time, and may represent the packet transmission schedule of the timeslots belonging to the current decision time.

For example, given a packet transmission schedule containing the behavior information shown in FIG. 4, the active user terminal transmits a packet over transmission channel 2 in timeslot 1, transmits a packet over transmission channel 1 in timeslot 2, and transmits no packet in timeslots 3 to 5.

Reward information may be obtained from the feedback information the repeater sends after the active user terminal transmits packets according to the behavior information. The feedback information may include, but is not limited to, an ACK signal indicating that a packet was transmitted successfully and a NAK signal indicating a packet collision.

The packet transmission determining apparatus 121 according to an embodiment of the present invention may set the reward information to 1 when an ACK signal is received from the repeater in response to a packet transmission, to -1 when a NAK signal indicating a collision is received, and to 0 otherwise (that is, when no packet was transmitted). The reward information of an active user terminal for each timeslot can accordingly be expressed as Equation 1, which is indexed by the active user terminal and the timeslot.

In the present invention, however, the reward information may be organized per decision time, as the set of rewards determined by the feedback information the active user terminal obtains through its actions during that decision time.

By assigning different reward values to successful transmission, failed transmission, and the idle state in this way, the Q-Value described below can be calculated more accurately.
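
The reward rule described above (1 for an ACK, -1 for a collision NAK, 0 when no packet was sent) can be sketched as follows. The function names `slot_reward` and `decision_time_rewards`, the string labels for the feedback, and the example feedback sequence are assumptions made for illustration.

```python
def slot_reward(transmitted, feedback):
    """Reward of one active user terminal for one timeslot."""
    if not transmitted:
        return 0   # idle: the terminal sent nothing in this timeslot
    if feedback == "ACK":
        return 1   # packet received and decoded
    if feedback == "NAK":
        return -1  # packet collided with another transmission
    raise ValueError("transmitting terminal needs ACK or NAK feedback")


def decision_time_rewards(schedule, feedbacks):
    """Collect the per-timeslot rewards of one decision time.

    schedule: channel number per timeslot, 0 meaning no transmission.
    feedbacks: feedback observed in each timeslot.
    """
    return [slot_reward(channel != 0, feedback)
            for channel, feedback in zip(schedule, feedbacks)]


# A terminal following the schedule [2, 1, 0, 0, 0]: success, then collision.
rewards = decision_time_rewards([2, 1, 0, 0, 0],
                                ["ACK", "NAK", "NAK", None, None])
print(rewards)  # [1, -1, 0, 0, 0]
```

Note that the NAK observed in the third timeslot yields a reward of 0, because the terminal did not transmit in that timeslot.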

State information may be information comprising the behavior information and the reward information of the previous decision time. According to an embodiment of the present invention, because the behavior information and the reward information are organized as matrices whose size corresponds to the size of the decision time, the state information can be expressed in terms of these two matrices.

The apparatus for determining packet transmission is described in detail below on the basis of the behavior information, reward information, and state information defined above.

The packet transmission determining apparatus 121 may include a pre-processing unit 122, a data management unit 123, a value calculation unit 124, a Q-Value calculation unit 127, and a decision unit 128.

The pre-processing unit 122 may generate the state information for the current decision time and preprocess it so that it is easier to handle with artificial intelligence, producing the redefined state information.

Specifically, to generate the state information, the pre-processing unit 122 may generate the corresponding behavior information and reward information.

Here, the behavior information may be generated by the communication unit of the user terminal monitoring whether packets are transmitted, may be received by the pre-processing unit 122 from the communication unit, or may be taken by the pre-processing unit 122 from the packet transmission schedule of the previous transmission time, but is not limited to these.

The pre-processing unit 122 may apply One-Hot encoding to each element of the input behavior information to derive the encoded behavior information. The quantities involved are the same as before: each active user terminal, the decision time, the transmission channels N, and the number of active user terminals at decision time i.
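
The One-Hot encoding step can be sketched as below. The sketch assumes that each element of the behavior information is a channel number between 0 and N, with 0 meaning no transmission, so that each element becomes a vector of length N + 1; this layout and the function name `one_hot_behavior` are assumptions, since the exact encoding is not reproduced in this text.

```python
def one_hot_behavior(schedule, num_channels):
    """One-Hot encode the channel choice of every timeslot.

    schedule: channel number per timeslot, 0 meaning no transmission.
    Returns one row per timeslot; index 0 of a row marks 'no transmission'
    and index c marks 'transmit on channel c'.
    """
    width = num_channels + 1
    encoded = []
    for channel in schedule:
        if channel not in range(width):
            raise ValueError("unknown channel number")
        row = [0] * width
        row[channel] = 1  # exactly one position is set per timeslot
        encoded.append(row)
    return encoded


encoded = one_hot_behavior([2, 1, 0], num_channels=2)
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```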

When ACK and NAK signals are received through the communication unit, the pre-processing unit 122 may generate the reward information by mapping each case to its value as in Equation 1.

The pre-processing unit 122 may then generate the state information based on the behavior information and the reward information.

Based on the state information, the data management unit 123 may generate redefined state information according to the number of active user terminals and the maximum allowed number of user terminals.

Specifically, when the number of active user terminals is equal to the maximum allowed number of user terminals, the data management unit 123 may redefine the state information as in Equation 2 below.

The terms of Equation 2 are the redefined state information, the active user terminal, the decision time, the number of transmission channels, the maximum allowed number of user terminals, the one-hot-encoded behavior information, and the reward information.

In addition, when the number of active user terminals is less than the maximum allowed number of user terminals, the data management unit 123 may redefine the state information as in Equation 3 below.

The terms of Equation 3 are the redefined state information, the active user terminal, the decision time, the number of transmission channels, the maximum allowed number of user terminals, the number of timeslots, the one-hot-encoded behavior information, and the reward information.

However, when the number of active user terminals is less than the maximum allowed number of user terminals, the state information is correspondingly smaller. The data management unit 123 may therefore apply zero-padding, inserting additional zeros so that the sizes match as in the padded terms of Equation 3, and thereby generate the redefined state information.
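The zero-padding step can be sketched as follows. The sketch assumes the state is stored as a list of equal-width rows whose count tracks the number of active user terminals; the row layout and the name `pad_state` are illustrative assumptions rather than the layout of Equation 3.

```python
from typing import List


def pad_state(rows: List[List[float]], max_rows: int) -> List[List[float]]:
    """Append all-zero rows until the state has exactly max_rows rows.

    rows: one row per entry that scales with the number of active user
          terminals (assumed layout).
    max_rows: the row count reached when the maximum allowed number of
              user terminals is active.
    """
    if not rows:
        raise ValueError("at least one row is needed to know the row width")
    if len(rows) > max_rows:
        raise ValueError("state already exceeds the maximum size")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    padding = [[0.0] * width for _ in range(max_rows - len(rows))]
    return [list(row) for row in rows] + padding


# Two active rows padded to a fixed size of four rows.
print(pad_state([[1.0, 0.0], [0.0, 1.0]], max_rows=4))
```

Because the output always has `max_rows` rows, a network with a fixed input size can consume it regardless of how many terminals are currently active.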

In addition, the data management unit 123 may accumulate and store the state information or the redefined state information. That is, the data management unit 123 may continuously accumulate and store the state information generated at every decision time.

If the data management unit 123 keeps accumulating the state information and the redefined state information at every decision time, however, the volume of data becomes very large, which may cause problems such as data loss and learning errors.

Accordingly, the data management unit 123 may be configured as a memory in the form of a Long Short-Term Memory (LSTM) block. It can process and learn, as time-series data, the LSTM data in which the state information accumulated over successive decision times and the state information generated by the pre-processing unit 122 are aggregated, and it can efficiently regulate the output of that LSTM data.

By introducing the LSTM model and aggregating the state information over successive decision times, the data management unit 123 can improve the efficiency and accuracy with which the packet transmission determining apparatus 121 estimates the current state information from past state information. The LSTM data aggregated in the data management unit 123 may then be used as the input of the value calculation unit 124.

The value calculation unit 124 may include a present value calculation unit 125 and a future value calculation unit 126.

The present value calculation unit 125 may calculate the present value of packet transmission by inputting the redefined state information into a pre-prepared artificial intelligence model.

The present value calculation unit 125 may consist of at least one Deep Q-Network (DQN) and may calculate the present value of packet transmission by inputting the redefined state information into the pre-prepared Deep Q-Network. Here, the present value may mean the Q-Value of each timeslot and transmission channel when only the current decision time is considered.

Specifically, as shown in FIG. 6, the present value calculation unit 125 may consist of a plurality of Deep Q-Networks (125-1 onward), and the number of Deep Q-Networks may be equal to the maximum allowed number of user terminals.

In addition, the present value calculation unit 125 may adjust the number of Deep Q-Networks used to calculate the present value according to the decision time. Because the size of the decision time may vary with the number of active user terminals, it is difficult to calculate the present value of packet transmission with an existing Q-Value calculation method.

Accordingly, the present value calculation unit 125 consists of at least as many Deep Q-Networks as the maximum allowed number of user terminals. Calculating the present value with the number of active user terminals taken into account improves the accuracy of the calculation, and applying artificial intelligence only for the number of currently active user terminals can be expected to improve the efficiency of the packet transmission determining apparatus 121.

Specifically, the plurality of Deep Q-Networks included in the present value calculation unit 125 may be defined over the range of the number of active user terminals, and each Deep Q-Network may consist of an advantage action branch, which calculates the present value of packet transmission, and a normalization action branch, which normalizes the calculated present value.

The advantage action branch of the j-th present value calculation unit (that is, the j-th Deep Q-Network) may receive the redefined state information and calculate the present value of the j-th behavior information.

The normalization action branch of the j-th present value calculation unit may then normalize the present values calculated for each candidate of the j-th behavior information with respect to the j-th behavior information, and finally calculate the normalized present value of packet transmission.
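A minimal sketch of the normalization step is shown below. The text does not state the normalization rule, so mean subtraction, as commonly applied to the advantage stream of dueling Deep Q-Networks, is assumed here; the function name `normalize_present_values` is hypothetical.

```python
from typing import List


def normalize_present_values(values: List[float]) -> List[float]:
    """Center the per-action present values of one branch on zero.

    Assumption: normalization means subtracting the mean over the
    candidate actions of this branch, so the result sums to zero.
    """
    if not values:
        raise ValueError("at least one present value is required")
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Present values for three candidate actions of one branch.
print(normalize_present_values([1.0, 2.0, 3.0]))
```

Centering keeps the ranking of the candidate actions unchanged while removing any common offset, which leaves that offset to be carried by the separately computed future value.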

The future value calculation unit 126 may calculate the future value of packet transmission by inputting the redefined state information into a pre-prepared artificial intelligence model. Here, the future value is the potential value obtainable through packet transmissions that follow the one valued by the present value calculation unit 125; that is, it may mean the value of packet transmission after the decision time for which the present value was calculated.

To describe the future value in more detail with reference to FIG. 2, if the present value calculation unit 125 calculates the present value of packet transmission after decision time i=2 (more precisely, in the interval between decision time i=2 and decision time i=3), the future value is the value of packet transmission after decision time i=3.

Even when the present value of the current packet transmission is high, the future value of packet transmission may be low; the weight on the future value of subsequent packet transmissions may therefore be lowered so that a more stable Q-Value is obtained.

Accordingly, the value calculation unit 124 may calculate the final Q-Value by considering both the present value and the future value of packet transmission.

Specifically, the future value calculation unit 126 may calculate the future value of packet transmission separately from the present value calculated by the present value calculation unit 125. However, as mentioned above, the size of the state information differs at each decision time i, so the future value calculation unit 126 may calculate the future value from the redefined state information, which has a fixed size and replaces the state information.

The Q-Value calculation unit 127 may calculate a Q-Value for each timeslot and transmission channel within the decision time by adding the present value of packet transmission and the future value of packet transmission.

Specifically, the Q-Value calculation unit 127 may calculate a Q-Value expressed as a combination of the present value of packet transmission calculated by the present value calculation unit 125 and the future value of packet transmission calculated by the future value calculation unit 126.
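The summation can be sketched as follows. The sketch assumes the simplest reading, in which a single scalar future value is added to every per-channel, per-timeslot present value; the matrix layout `present[n][t]` and the name `q_values` are illustrative assumptions.

```python
from typing import List


def q_values(present: List[List[float]], future: float) -> List[List[float]]:
    """Add a scalar future value to each present value.

    present[n][t]: present value of channel n + 1 in timeslot t + 1
                   (assumed N x T layout).
    future: future value of the current state (assumed scalar).
    """
    return [[value + future for value in row] for row in present]


# N = 2 channels, T = 2 timeslots, future value 0.5.
print(q_values([[1.0, -0.5], [0.0, 2.0]], future=0.5))
```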

In this way, the Q-Value calculation unit 127 may calculate a Q-Value for each timeslot and transmission channel within the decision time; an example is shown in (a) of FIG. 7.

Based on the Q-Values calculated by the Q-Value calculation unit 127 for each timeslot and transmission channel within the decision time, the decision unit 128 may determine the transmission channel to be used in each timeslot of the next decision time.

From the N x T matrix of Q-Values at decision time i, formed by the T timeslots (1 to T) and the N transmission channels (1 to N), the decision unit 128 may select the largest Q-Value and determine the transmission channel of the timeslot containing that Q-Value as the transmission channel for the same timeslot in the next decision time.

Once the largest Q-Value in the N x T matrix has been assigned, the decision unit 128 may select the largest Q-Value again from the N x (T-1) matrix that remains after excluding the timeslot containing the selected Q-Value, and determine the transmission channel of the timeslot containing the newly selected Q-Value as the transmission channel for that timeslot in the next decision time.

The decision unit 128 may repeat this process, in which the transmission channel of the timeslot containing the largest Q-Value is determined as a transmission channel of the next decision time, as many times as the number of transmission channels N.

Specifically, given the number N of transmission channels on which the active user terminals at decision time i may attempt packet transmission, each active user terminal can transmit packets with a corresponding probability, so the number of transmission channels determined for the next decision time may be equal to the given number of transmission channels N.

In addition, the decision unit 128 may apply zero-padding to the remaining timeslots for which no Q-Value was selected, setting the transmission channel of those timeslots to 0, so that the size of the transmission channel set for the next decision time matches the size of the decision time.

Finally, the decision unit 128 may determine a packet transmission schedule in the form of a 1 x T matrix in which the transmission channel for each timeslot of the next decision time is fixed (that is, the transmission channel set for the next decision time mentioned above).

Once the decision unit 128 has determined the packet transmission schedule for the next decision time, the active user terminals transmit packets according to that schedule in the next decision time. When the transmission is complete, the decision unit 128 repeats the process to determine, and thereby update, the packet transmission schedule for the decision time after that.

The operation of the decision unit is described once more with reference to FIG. 7, which shows an embodiment of the packet transmission schedule determination process of the present invention.

As shown in (a) of FIG. 7, from the 2 x 5 matrix of Q-Values at a decision time with T=5 timeslots and N=2 transmission channels, the decision unit may select the largest Q-Value (5.7) and determine transmission channel 1 of timeslot 1, which contains that Q-Value, as the transmission channel of timeslot 1 in the next decision time.

As shown in (b) of FIG. 7, once the largest Q-Value (5.7) in the N x T matrix has been assigned, the decision unit may select the largest Q-Value (4.3) from the 2 x 4 matrix that remains after excluding the timeslot of entry (1,1), and determine transmission channel 2 of timeslot 4, which contains that Q-Value, as the transmission channel of timeslot 4 in the next decision time.

Then, as shown in (c) of FIG. 7, the decision unit may apply zero-padding to the remaining timeslots (2, 3, and 5) for which no Q-Value was selected, setting their transmission channel to 0, so that the size of the transmission channel set for the next decision time matches the size of the decision time.

Finally, as shown in (c) of FIG. 7, the decision unit may determine a packet transmission schedule in the form of a 1 x 5 matrix in which the transmission channel for each timeslot of the next decision time is fixed (that is, the transmission channel set for the next decision time mentioned above).

In other words, in the next decision time the active user terminal may transmit a packet over transmission channel 1 in timeslot 1 and over transmission channel 2 in timeslot 4, and does not transmit in timeslots 2, 3, and 5.
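The selection procedure walked through for FIG. 7 can be sketched as a greedy loop. In the example matrix only the values 5.7 (channel 1, timeslot 1) and 4.3 (channel 2, timeslot 4) come from the description; every other entry is made up for illustration, and the function name `greedy_schedule` is hypothetical.

```python
from typing import List


def greedy_schedule(q: List[List[float]]) -> List[int]:
    """Build a 1 x T schedule from an N x T matrix of Q-Values.

    q[n][t] is the Q-Value of channel n + 1 in timeslot t + 1. The result
    holds, per timeslot, the 1-based channel to use, or 0 for no
    transmission (the zero-padded timeslots).
    """
    num_channels = len(q)
    num_slots = len(q[0])
    schedule = [0] * num_slots
    free_slots = list(range(num_slots))
    # Repeat the selection as many times as there are channels (N).
    for _ in range(min(num_channels, num_slots)):
        best_n, best_t = 0, free_slots[0]
        for n in range(num_channels):
            for t in free_slots:
                if q[n][t] > q[best_n][best_t]:
                    best_n, best_t = n, t
        schedule[best_t] = best_n + 1
        free_slots.remove(best_t)  # exclude this timeslot from later rounds
    return schedule


# Only 5.7 and 4.3 are taken from the description; the rest are illustrative.
q_example = [
    [5.7, 1.2, 0.8, 2.1, 3.0],  # channel 1
    [5.0, 2.4, 1.5, 4.3, 0.9],  # channel 2
]
print(greedy_schedule(q_example))  # [1, 0, 0, 2, 0]
```

The value 5.0 in timeslot 1 of channel 2 is larger than 4.3 but is skipped in the second round because its timeslot has already been assigned, which mirrors the reduction from a 2 x 5 to a 2 x 4 matrix.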

FIG. 8 is a flowchart illustrating a method of determining a packet transmission schedule using the packet transmission determining apparatus according to an embodiment of the present invention.

Referring to FIG. 8, the method of determining a packet transmission schedule using the packet transmission determining apparatus may include a pre-processing step (S100) of generating state information, a data management step (S200) of generating redefined state information, a value calculation step (S300) of calculating a present value and a future value, a Q-Value calculation step (S400) of calculating a Q-Value, and a decision step (S500) of determining a packet transmission schedule.

In the pre-processing step (S100), the pre-processing unit may generate state information at each decision time based on behavior information on packet transmission and feedback information corresponding to the packet transmission.

In the data management step (S200), the data management unit may generate redefined state information using the state information and the accumulated state information.

In the value calculation step (S300), the value calculation unit may calculate the present value and the future value of packet transmission using the redefined state information.

In the Q-Value calculation step (S400), the Q-Value calculation unit may calculate a Q-Value for packet transmission by adding the present value and the future value of packet transmission.

In the decision step (S500), the decision unit may determine the packet transmission schedule for the next decision time based on the Q-Value.

Because the packet transmission determining apparatus has been described in detail above with reference to FIGS. 1 to 7, the method of determining a packet transmission schedule using the apparatus is described only briefly here.

FIG. 9A is a diagram illustrating packet transmission fairness among active user terminals over time according to an embodiment of the present invention.

FIG. 9A shows the packet transmission fairness obtained by running the present invention over time for a given maximum allowed number of user terminals and a given number of transmission channels.

Under these conditions, in the ideal case each active user terminal transmits packets with probability 2/5; if every packet is transmitted successfully, each terminal obtains a data rate of 2/5, so all active user terminals transmit packets fairly.
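The target rate can be reproduced with a short calculation. The sketch assumes that N transmission channels are shared equally among K active terminals, giving a target of N/K per terminal, capped at 1; N = 2 and K = 5 are assumed values chosen because they reproduce the stated 2/5, and the name `target_rate` is hypothetical.

```python
from fractions import Fraction


def target_rate(num_channels: int, num_active_users: int) -> Fraction:
    """Fair per-terminal data rate, assuming an equal share of the channels.

    The cap at 1 is an added assumption: a terminal sends at most one
    packet per timeslot.
    """
    if num_channels < 0 or num_active_users <= 0:
        raise ValueError("need num_channels >= 0 and num_active_users > 0")
    return min(Fraction(1), Fraction(num_channels, num_active_users))


rate = target_rate(num_channels=2, num_active_users=5)
print(rate, float(rate))  # 2/5 0.4
```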

When the present invention is run over time, the data rates of the active users are uneven and fairness is low until about timeslot 50. With continued operation beyond timeslot 50, the data rates of all active users settle at the target data rate of 2/5 (0.4), so the fairness of packet transmission is maintained.

FIG. 9B is a diagram illustrating the data rate of each active user terminal and the target data rate according to an embodiment of the present invention.

Panel (a) of FIG. 9B shows the data rate of each active user terminal and the target data rate for its setting of the maximum allowed number of user terminals and the number of transmission channels.

Panel (b) of FIG. 9B shows the data rate of each active user terminal and the target data rate for its setting of the maximum allowed number of user terminals and the number of transmission channels.

Referring to (a) and (b) of FIG. 9B, the number of active users connected at any moment changes over time, so the target data rate for each active user differs.

Nevertheless, even in a random access communication environment in which the number of connected active users changes, the present invention makes it possible to obtain data rates close to the target data rate for each active user.

The above description merely illustrates the technical idea by way of example, and those of ordinary skill in the art to which the present invention pertains may make various modifications, changes, and substitutions without departing from its essential characteristics. Accordingly, the embodiments disclosed above and the accompanying drawings are intended to describe the technical idea, not to limit it, and the scope of the technical idea is not limited by these embodiments and drawings. The scope of protection should be construed according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights.

1: random access communication system
10: user terminal
20: repeater
30: packet transmission schedule
110: communication unit
120: process
121: packet transmission determining apparatus
122: pre-processing unit
123: data management unit
124: value calculation unit
125: present value calculation unit
126: future value calculation unit
127: Q-Value calculation unit
128: decision unit
130: storage unit

Claims

1. A packet transmission determining apparatus comprising:
a pre-processing unit for generating state information at each decision time based on behavior information on packet transmission and feedback information on the packet transmission;
a data management unit for generating redefined state information by using the state information and accumulated state information;
a value calculation unit for calculating a present value of the packet transmission and a future value of the packet transmission by using the redefined state information;
a Q-Value calculation unit for calculating a Q-Value for the packet transmission by adding the present value and the future value; and
a decision unit for determining a packet transmission schedule at a next decision time based on the Q-Value.

2. The apparatus of claim 1,
wherein the value calculation unit calculates the present value and the future value by inputting the redefined state information into a pre-prepared Deep Q-Network.

3. The apparatus of claim 2,
wherein the value calculation unit comprises a present value calculation unit including a plurality of the Deep Q-Networks corresponding to a maximum allowed number of user terminals.

4. The apparatus of claim 3,
wherein the present value calculation unit calculates the present value by using, among the plurality of Deep Q-Networks, only the number of Deep Q-Networks corresponding to the size of the decision time.

5. The apparatus of claim 4,
wherein the present value calculation unit normalizes the present values calculated by the plurality of Deep Q-Networks.

6. The apparatus of claim 1,
wherein the value calculation unit comprises a future value calculation unit for calculating, by using the redefined state information, the future value, which is a value of packet transmission after the decision time for which the present value is calculated.

7. The apparatus of claim 1,
wherein the decision unit determines, as the packet transmission schedule at the next decision time, the transmission channel corresponding to the timeslot within the decision time for which the highest Q-Value among the Q-Values is calculated.

8. The apparatus of claim 7,
wherein the decision unit applies zero-padding to timeslots not determined by the packet transmission schedule so that the size of the packet transmission schedule matches the size of the decision time.

9. The apparatus of claim 1,
wherein the data management unit uses a Long Short-Term Memory (LSTM) model to learn, as a time series, and aggregate the state information and the redefined state information over the elapse of the decision time.

10. A packet transmission schedule determination method comprising:
a pre-processing step of generating, by a pre-processing unit, state information at each decision time based on behavior information on packet transmission and feedback information received through the packet transmission;
a data management step of generating, by a data management unit, redefined state information by using the state information and accumulated state information;
a value calculation step of calculating, by a value calculation unit, a present value of the packet transmission and a future value of the packet transmission by using the redefined state information;
a Q-Value calculation step of calculating, by a Q-Value calculation unit, a Q-Value for the packet transmission by adding the present value and the future value; and
a decision step of determining, by a decision unit, a packet transmission schedule at a next decision time based on the Q-Value.

11. The method of claim 10,
wherein the value calculation step calculates the present value and the future value by inputting the redefined state information into a pre-prepared Deep Q-Network.

12. The method of claim 11,
wherein the value calculation step comprises a present value calculation step of configuring a present value calculation unit with a plurality of the Deep Q-Networks corresponding to a maximum allowed number of user terminals.

13. The method of claim 12,
wherein the present value calculation step calculates the present value by using, among the plurality of Deep Q-Networks, only the number of Deep Q-Networks corresponding to the size of the decision time.

14. The method of claim 13,
wherein the present value calculation step normalizes the present values calculated by the plurality of Deep Q-Networks.

15. The method of claim 10,
wherein the value calculation step comprises a future value calculation step of calculating, by using the redefined state information, the future value, which is a value of packet transmission after the decision time for which the present value is calculated.

16. The method of claim 10,
wherein the decision step determines, as the packet transmission schedule at the next decision time, the transmission channel corresponding to the timeslot within the decision time for which the highest Q-Value among the Q-Values is calculated.

17. The method of claim 16,
wherein the decision step applies zero-padding to timeslots not determined by the packet transmission schedule so that the size of the packet transmission schedule matches the size of the decision time.

18. The method of claim 10,
wherein the data management step uses a Long Short-Term Memory (LSTM) model to learn, as a time series, and aggregate the state information and the redefined state information over the elapse of the decision time.