KR20220158916A

KR20220158916A - Method of radio resource scheduling for beyond-5g network using artificial intelligence, recording medium and device for performing the method

Info

Publication number: KR20220158916A
Application number: KR1020210066436A
Authority: KR
Inventors: 김성원 (Kim Sung-won); 나우만알리 (Nauman Ali)
Original assignee: Yeungnam University Industry-Academic Cooperation Foundation (영남대학교 산학협력단)
Priority date: 2021-05-24
Filing date: 2021-05-24
Publication date: 2022-12-02

Abstract

A wireless communication resource allocation method using artificial intelligence comprises the steps of: determining the scheduling priority of users between the heterogeneous eMBB and URLLC services using selectively input learning data; selecting the subcarrier spacing (SCS) numerology and the number of orthogonal frequency division multiplexing (OFDM) symbols for a user of the URLLC service, using the determined scheduling priority as an input to an intelligent scheduler; calculating the reward value of a state corresponding to the combination of SCS numerology and number of OFDM symbols that minimizes the normalized load (NL) in each slot of each user; and updating the action value of the state and the learning data by reflecting the calculated reward value. Accordingly, resources can be allocated efficiently to the heterogeneous eMBB and URLLC services in a B5G environment.

Description

Method for allocating resources for wireless communication using artificial intelligence, recording medium and device for performing the same

The present invention relates to a wireless communication resource allocation method using artificial intelligence, and to a recording medium and an apparatus for performing the method. More particularly, it relates to an intelligent next-generation NodeB (gNB) that models the selection of the optimal number of OFDM symbols and numerology for mini-slot TTI (Transmission Time Interval) scheduling, to support the coexistence of eMBB and URLLC in a B5G environment, and that solves this problem using a Q-learning (QL) algorithm.

Ultra-Reliable and Low Latency Communication (URLLC) and enhanced Mobile Broadband (eMBB) are two main categories classified by the International Telecommunication Union (ITU) for beyond 5G (B5G).

These two categories differ in required data rate, data packet size, throughput, latency, and reliability. The coexistence and scheduling of eMBB and URLLC services is therefore one of the major challenges of next-generation networks.

The 3rd Generation Partnership Project (3GPP) has standardized New Radio (NR) as the radio interface for B5G networks. NR provides a scalable subcarrier spacing (SCS) numerology of $2^n \times 15$ kHz. In addition, because URLLC traffic is sporadic and its data size is small compared with eMBB, NR allows scalable mini-slots with a variable number of OFDM symbols for URLLC transmission. In NR, the transmission time interval (TTI) can be shortened either by increasing the SCS or by reducing the number of OFDM symbols in the mini-slot.

The optimal selection of the SCS and the number of OFDM symbols in mini-slots is critical for the coexistence of eMBB and URLLC. 3GPP has standardized a puncturing mechanism to meet URLLC latency and reliability requirements. The puncturing mechanism interrupts ongoing eMBB traffic to transmit URLLC traffic in a mini-slot without notifying the eMBB user. However, the puncturing mechanism degrades the quality of service (QoS) of eMBB users.

Prior-art documents: KR 10-2020-0015381 A; KR 10-2017-0128143 A; WO 2018/204344 A1

The present invention was conceived in view of these problems, and an object of the present invention is to provide a wireless communication resource allocation method using artificial intelligence.

Another object of the present invention is to provide a recording medium on which a computer program for performing the wireless communication resource allocation method using artificial intelligence is recorded.

Yet another object of the present invention is to provide an apparatus for performing the wireless communication resource allocation method using artificial intelligence.

According to an embodiment for achieving the object of the present invention, a wireless communication resource allocation method using artificial intelligence includes: determining the scheduling priority of users between the heterogeneous eMBB and URLLC services using selectively input learning data; selecting the subcarrier spacing (SCS) numerology and the number of orthogonal frequency division multiplexing (OFDM) symbols for a user of the URLLC service, using the determined scheduling priority as an input to an intelligent scheduler; calculating the reward value of a state corresponding to the combination of SCS numerology and number of OFDM symbols that minimizes the normalized load (NL) in each slot of each user; and updating the action value of the state and the learning data by reflecting the calculated reward value.
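The final step, which updates the action value of a state using the reward, matches a Q-learning update. The sketch below applies the standard tabular Q-learning rule. Several details are illustrative assumptions and do not come from the patent: the learning rate `alpha`, the discount factor `gamma`, the hypothetical action space of numerology n ∈ {0, ..., 4} with m ∈ {2, 4, 7} symbols, and the use of the negative normalized load as the reward.

```python
from collections import defaultdict

# Hypothetical action space: (SCS numerology n, OFDM symbols per mini-slot m).
ACTIONS = [(n, m) for n in range(5) for m in (2, 4, 7)]


def q_update(q, state, action, reward, next_state, actions=ACTIONS, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to q[(state, action)] and return the new value."""
    # Value of the best action available from the next state (unseen pairs default to 0.0).
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate toward the temporal-difference target.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)
# Assumed reward: negative normalized load observed after the action (lower NL -> higher reward).
value = q_update(q_table, state=0, action=(2, 4), reward=-0.5, next_state=1)
print(value)  # -0.05
```

Using a `defaultdict` means that state-action pairs the scheduler has never visited start at a value of 0.0.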

In an embodiment of the present invention, determining the scheduling priority of users includes: inputting, for each slot, the packet arrival rate (packets/s) of users of each service type, the packet length, and each user's queue weight and channel state information (CSI); calculating each user's priority according to the priority function for the eMBB and URLLC service types; and ranking the eMBB and URLLC users in descending order of priority.
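The ranking step can be sketched as follows. The priority functions themselves (Equations 3 and 4) are not given in closed form here, so the priority values below are placeholders. The `User` class and the `rank_by_service` helper are hypothetical names. Because the highest-priority eMBB user receives the time slot, the sketch keeps a separate descending list for each service type.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    service: str     # "eMBB" or "URLLC"
    priority: float  # output of the service-specific priority function (placeholder values here)


def rank_by_service(users):
    """Group user IDs by service type, each list in descending order of priority."""
    ranking = {}
    # sorted(..., reverse=True) is stable, so equal priorities keep their input order.
    for user in sorted(users, key=lambda u: u.priority, reverse=True):
        ranking.setdefault(user.service, []).append(user.user_id)
    return ranking


users = [User(1, "eMBB", 0.4), User(2, "URLLC", 0.9), User(3, "eMBB", 0.7)]
print(rank_by_service(users))  # {'URLLC': [2], 'eMBB': [3, 1]}
```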

In an embodiment of the present invention, the priority function may be based on at least one of the arrival rate, the data size, and the queue length.

In an embodiment of the present invention, selecting the SCS numerology and the number of OFDM symbols may include: allocating, for each time slot, the time slot to the eMBB user with the highest priority; generating a random number between 0 and 1 for each mini-slot within the time slot used for URLLC users; and learning with the action selected by comparing the generated random number with a preset threshold.

In an embodiment of the present invention, in the step of learning with the action selected by comparing the generated random number with the preset threshold, if the random number is less than or equal to the threshold, an action with a randomly selected SCS numerology and number of OFDM symbols is explored (exploration). If the random number is greater than the threshold, the action with the highest Q-value in the current state is exploited (exploitation).
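This threshold comparison is an epsilon-greedy policy, in which the preset threshold plays the role of epsilon. The sketch below makes a few assumptions: it uses the same hypothetical (n, m) action space as above, stores the Q-table as a plain dictionary keyed by (state, action), and treats unseen pairs as having a value of 0.0.

```python
import random

# Hypothetical action space: (SCS numerology n, OFDM symbols per mini-slot m).
ACTIONS = [(n, m) for n in range(5) for m in (2, 4, 7)]


def select_action(q, state, threshold, rng=random):
    """Epsilon-greedy selection of (n, m) for one URLLC mini-slot.

    A draw in [0, 1) at or below `threshold` explores a random (n, m);
    otherwise the action with the highest Q-value for `state` is exploited.
    """
    if rng.random() <= threshold:
        return rng.choice(ACTIONS)  # exploration
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploitation


q = {(0, (3, 2)): 1.0}  # toy Q-table: only one action has a learned value
print(select_action(q, state=0, threshold=0.0, rng=random.Random(0)))  # (3, 2)
```

Passing an explicit `random.Random` instance makes the choice reproducible, which helps when comparing scheduler runs.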

In an embodiment of the present invention, calculating the reward value of the state may further include calculating the normalized load of each user of the URLLC and eMBB service types in each time slot.

In an embodiment of the present invention, the slot length may be the sum of the length of the mini-slot for the URLLC service and the length of the slot for the eMBB service.

According to an embodiment for achieving another object of the present invention, a computer-readable storage medium stores a computer program for performing the wireless communication resource allocation method using artificial intelligence.

According to an embodiment for achieving yet another object of the present invention, a wireless communication resource allocation apparatus using artificial intelligence includes: a priority calculation unit that determines the scheduling priority of users between the heterogeneous eMBB and URLLC services using selectively input learning data; an optimal state selection unit that selects the subcarrier spacing (SCS) numerology and the number of orthogonal frequency division multiplexing (OFDM) symbols for a user of the URLLC service, using the determined scheduling priority as input; a reward value calculation unit that calculates the reward value of a state corresponding to the combination of SCS numerology and number of OFDM symbols that minimizes the normalized load (NL) in each slot of each user; and an optimal value return unit that updates the action value of the state and the learning data by reflecting the calculated reward value.

In an embodiment of the present invention, the priority calculation unit may take as input, for each slot, the packet arrival rate (packets/s) of users of each service type, the packet length, and each user's queue weight and CSI. It may then calculate each user's priority according to the priority function for the eMBB and URLLC service types and rank the users in descending order of priority.

In an embodiment of the present invention, the optimal state selection unit may allocate, for each time slot, the time slot to the eMBB user with the highest priority. It may then generate a random number between 0 and 1 for each mini-slot within the time slot used for URLLC users, and learn with the action selected by comparing the generated random number with a preset threshold.

In an embodiment of the present invention, if the generated random number is less than or equal to the preset threshold, the optimal state selection unit may explore an action with a randomly selected SCS numerology and number of OFDM symbols. If the generated random number is greater than the threshold, it may exploit the action with the highest Q-value in the current state.

According to this wireless communication resource allocation method using artificial intelligence, the optimal combination of SCS and number of OFDM symbols is selected for the mini-slots that serve URLLC applications. This enables efficient resource allocation to the heterogeneous eMBB and URLLC services in a B5G environment.

FIG. 1 is a block diagram of a wireless communication resource allocation apparatus using artificial intelligence according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of a downlink single-cell cellular network providing heterogeneous services (URLLC and eMBB).
FIG. 3 shows the frame structure of New Radio (NR), with the SCS numerologies possible for various spectrum bands and deployments.
FIG. 4 shows a queue model describing PHY/MAC layer operation using a puncturing mechanism in a gNB for 5G applications.
FIG. 5 shows a state space diagram (rate adaptation scheme) of the present invention.
FIG. 6 shows the agent-environment interaction of the intelligent gNB proposed in the present invention.
FIG. 7 shows a general QL algorithm.
FIG. 8 is a flowchart of a priority calculation process according to an embodiment of the present invention.
FIG. 9 is a flowchart of a wireless communication resource allocation method using artificial intelligence according to an embodiment of the present invention.

The following detailed description of the present invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present invention. The various embodiments of the present invention differ from one another but are not necessarily mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. Likewise, the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense. The scope of the present invention, if properly described, is limited only by the appended claims, together with the full range of equivalents to which those claims are entitled. Like reference numerals in the drawings denote the same or similar functions throughout the various aspects.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a block diagram of a wireless communication resource allocation apparatus using artificial intelligence according to an embodiment of the present invention.

The wireless communication resource allocation apparatus 10 using artificial intelligence according to the present invention (hereinafter, the apparatus) relates to an intelligent next-generation NodeB (gNB). The gNB models the selection of the optimal number of OFDM symbols and numerology for mini-slot TTI (Transmission Time Interval) scheduling, to support the coexistence of eMBB and URLLC in a B5G environment, and solves this problem using a Q-learning (QL) algorithm.

Referring to FIG. 1, the apparatus 10 according to the present invention includes a priority calculation unit 110, an optimal state selection unit 130, a reward value calculation unit 150, and an optimal value return unit 170.

Software (an application) for performing wireless communication resource allocation using artificial intelligence may be installed and executed on the apparatus 10. The priority calculation unit 110, the optimal state selection unit 130, the reward value calculation unit 150, and the optimal value return unit 170 may be controlled by this software running on the apparatus 10.

The apparatus 10 may be a separate terminal or a module of a terminal. The priority calculation unit 110, the optimal state selection unit 130, the reward value calculation unit 150, and the optimal value return unit 170 may be formed as one integrated module or as one or more modules. Alternatively, each component may be implemented as a separate module.

The apparatus 10 may be mobile or stationary. It may take the form of a server or an engine, and may be referred to by other terms such as device, apparatus, terminal, user equipment (UE), mobile station (MS), wireless device, or handheld device.

The apparatus 10 may run or develop various software based on an operating system (OS). The operating system is a system program that allows software to use the hardware of the apparatus. It may be a mobile operating system such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, or BlackBerry OS, or a computer operating system such as the Windows, Linux, or Unix families, Mac OS, AIX, or HP-UX.

The International Telecommunication Union (ITU) classifies heterogeneous traffic in fifth-generation (5G) cellular communications into three categories: enhanced Mobile Broadband (eMBB), Ultra-Reliable and Low Latency Communication (URLLC), and massive Machine-Type Communication (mMTC).

eMBB includes bandwidth-hungry applications such as high-volume video streaming and augmented/virtual reality (AR/VR). mMTC covers sensing, measurement, monitoring, and calibration applications that support large-scale Internet of Things (IoT) deployments. URLLC supports latency- and reliability-sensitive applications such as autonomous vehicles and drones.

The coexistence of eMBB and URLLC applications is critical in 5G networks. FIG. 2 illustrates a downlink transmission scenario for a single-cell next-generation NodeB (gNB). The most stringent URLLC requirements defined by the 3rd Generation Partnership Project (3GPP) are as follows.

1. Low latency: end-to-end latency as low as 1 ms, with an air-interface latency of 0.5 ms.

2. Reliability as high as 99.9999999%, corresponding to a packet error rate of $10^{-9}$. In other words, a reliability failure is declared if even one packet in $10^{9}$ is not delivered within 1 ms.

The minimum requirements for eMBB are defined as follows.

1. Peak data rate: 20 Gbit/s in the downlink

2. Latency: up to 4 ms

3. Spectral efficiency: 30 bit/s/Hz

3GPP has standardized NR (New Radio) as the radio interface for B5G networks. As shown in FIG. 3, NR follows the same frame structure and orthogonal frequency division multiplexing (OFDM) transmission as LTE-A (Long Term Evolution-Advanced). The subcarrier spacing (SCS) numerology in NR scales as $2^n \times 15$ kHz, where $n \in \{0, 1, 2, 3, \ldots\}$. So far, values up to n = 4 have been included in the standard.

However, values of n > 4 are expected to be included in next-generation standards. An NR radio frame is 10 ms long and consists of 1 ms subframes, each containing $2^n$ slots. A slot is the basic frame structure of 14 OFDM symbols, within which physical signals are transmitted and repeated in a Transmission Time Interval (TTI).

At 15 kHz SCS, an NR slot of 14 OFDM symbols lasts 1 ms and therefore coincides with a subframe, as shown in FIG. 3. Other SCS numerologies give different slot lengths, ranging from 1 ms at 15 kHz down to 125 μs at 120 kHz, which enables shorter TTIs. NR also introduces non-slot-based transmission, realized through mini-slots.
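The relationship between n, SCS, and slot length described above can be tabulated with a short helper. The sketch assumes the $2^n \times 15$ kHz scaling and $2^n$ fourteen-symbol slots per 1 ms subframe stated in the text. The function name `numerology` is hypothetical.

```python
def numerology(n):
    """Return (SCS in kHz, slots per 1 ms subframe, 14-symbol slot length in microseconds)."""
    scs_khz = 15 * 2 ** n          # SCS = 2^n x 15 kHz
    slots_per_subframe = 2 ** n    # 2^n slots in each 1 ms subframe
    slot_us = 1000 / slots_per_subframe
    return scs_khz, slots_per_subframe, slot_us


for n in range(5):
    scs, slots, slot_us = numerology(n)
    print(f"n={n}: SCS={scs} kHz, {slots} slot(s)/subframe, slot={slot_us} us")
```

For n = 0 the slot is 1000 μs, and for n = 3 (120 kHz) it is 125 μs, which matches the range given above.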

A mini-slot can start at any OFDM symbol and can carry a variable number of OFDM symbols (for example, 2, 4, or 7). Mini-slots with few OFDM symbols enable fast transmission and thus provide a viable solution for low-latency applications such as URLLC, regardless of the SCS numerology.

To meet 5G requirements, 3GPP Release 15 standardizes a reduction of the TTI from 1 ms to a few symbols, which lowers physical (PHY) and medium access control (MAC) layer latency. A reduced TTI enables faster user scheduling in both the uplink and the downlink and shortens the Hybrid Automatic Repeat Request (HARQ) timeline, which increases network capacity and reduces latency. It also allows more retransmissions within the latency threshold, which improves the packet error rate and thus reliability.
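The last point, that more retransmissions fit within the latency threshold, can be illustrated numerically. The sketch below rests on two simplifying assumptions that are not part of the patent. First, transmissions are spaced one HARQ round-trip time apart. Second, each attempt fails independently with the same block error rate (BLER). All timing values are hypothetical.

```python
import math


def harq_attempts(budget_ms, first_tx_ms, rtt_ms):
    """Count transmissions (initial + HARQ retransmissions) that finish within the latency budget."""
    if first_tx_ms > budget_ms:
        return 0
    return 1 + math.floor((budget_ms - first_tx_ms) / rtt_ms)


def residual_error(bler, attempts):
    """Packet error rate after all attempts, assuming each attempt fails independently with `bler`."""
    return bler ** attempts


# Hypothetical timings: a long TTI versus a TTI shortened to a few symbols.
long_tti = harq_attempts(budget_ms=1.0, first_tx_ms=0.5, rtt_ms=1.0)    # 1 attempt
short_tti = harq_attempts(budget_ms=1.0, first_tx_ms=0.25, rtt_ms=0.25)  # 4 attempts
print(long_tti, short_tti, residual_error(0.1, long_tti), residual_error(0.1, short_tti))
```

With a per-attempt BLER of 0.1, the shorter TTI lowers the residual error from about $10^{-1}$ to about $10^{-4}$ within the same 1 ms budget.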

Increasing the SCS also increases the usable bandwidth, as shown in FIG. 3. The maximum usable bandwidth grows from 50 MHz at 15 kHz SCS to 400 MHz at 120 kHz SCS. In NR, a physical resource block (PRB) always contains 12 subcarriers (SCs). Because the number of subcarriers in a given bandwidth is determined by the SCS, the total number of PRBs is also determined by the SCS. The PRB bandwidth likewise depends on the SCS: it is 180 kHz at n = 0 and 2.88 MHz at n = 4, as shown in FIG. 3. The present invention addresses the selection of the SCS and the number of OFDM symbols for the mini-slots that serve URLLC applications.
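The PRB arithmetic above follows directly from the 12 subcarriers per PRB. The sketch below also computes a simple upper bound on the number of PRBs in a channel. That bound ignores guard bands, so actual NR configurations use somewhat fewer PRBs. Both helper names are hypothetical.

```python
def prb_bandwidth_khz(n):
    """Bandwidth of one PRB: 12 subcarriers x (2^n x 15 kHz)."""
    return 12 * 15 * 2 ** n


def prb_upper_bound(channel_mhz, n):
    """Upper bound on PRBs in a channel; ignores guard bands, so real NR configurations use fewer."""
    return int(channel_mhz * 1000 // prb_bandwidth_khz(n))


print(prb_bandwidth_khz(0), prb_bandwidth_khz(4))        # 180 2880
print(prb_upper_bound(50, 0), prb_upper_bound(400, 3))   # 277 277
```

The result shows that a wider SCS does not increase the PRB count. A channel eight times wider at eight times the SCS holds roughly the same number of PRBs, each of which is eight times wider.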

In downlink transmission, the end-to-end delay of a successful transmission between MAC layers comprises scheduling delay, queuing delay, transmission delay, processing delay, decoding delay, and HARQ round-trip time (RTT). Queuing delay arises from the statistical multiplexing of multiple users' data. Because traffic patterns vary widely among users in B5G heterogeneous networks, the users' data flows can be bursty and sporadic. Sufficient HARQ retransmissions are needed to achieve high reliability.

Queuing delay increases as user data grows and as the scheduler maximizes spectral efficiency. The queuing-delay problem must therefore be addressed when designing 5G networks.

In downlink transmission, the gNB schedules each arriving user packet, buffers it in that user's first-transmission queue, and waits to schedule the first HARQ transmission. If the first HARQ transmission fails, the packet becomes available for a second transmission after one RTT. Whenever a packet buffered at the gNB misses its deadline, the packet is discarded, which reduces reliability. In addition, a data packet that the receiver cannot decode after n HARQ transmissions is declared failed by the gNB, which also reduces reliability.

FIG. 4 shows the operation of the queue model with heterogeneous traffic in a B5G network. At every scheduling instance, the gNB allocates frequency and time resources to new transmissions and to retransmissions of buffered packets. However, the buffer is finite, and a packet is dropped at the gNB if its queuing delay exceeds the delay requirement.
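The deadline-based drop can be illustrated with a small first-in, first-out (FIFO) buffer. The sketch makes two simplifying assumptions: the gNB serves one packet per scheduling opportunity, and all packets share one fixed deadline. The function name and the timing values are hypothetical.

```python
from collections import deque


def schedule_with_deadline(arrivals_ms, opportunities_ms, deadline_ms):
    """FIFO gNB buffer that serves one packet per scheduling opportunity.

    Head-of-line packets whose queuing delay exceeds `deadline_ms` are dropped
    before serving. Returns (served, dropped) lists of packet indices; packets
    still buffered after the last opportunity appear in neither list.
    """
    pending = deque(sorted(enumerate(arrivals_ms), key=lambda p: p[1]))
    buffer = deque()
    served, dropped = [], []
    for t in sorted(opportunities_ms):
        # Admit every packet that has arrived by time t.
        while pending and pending[0][1] <= t:
            buffer.append(pending.popleft())
        # Discard packets that have already waited longer than the delay requirement.
        while buffer and t - buffer[0][1] > deadline_ms:
            dropped.append(buffer.popleft()[0])
        if buffer:
            served.append(buffer.popleft()[0])
    return served, dropped


print(schedule_with_deadline([0.0, 0.1, 0.2], [0.5, 1.5], deadline_ms=1.0))  # ([0], [1, 2])
```

Because the buffer is FIFO, the head-of-line packet is always the oldest. Checking only the head is therefore enough to remove every expired packet.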

To meet the stringent latency and reliability requirements of URLLC traffic, 3GPP has standardized a puncturing mechanism in NR. This mechanism interrupts ongoing eMBB traffic to transmit URLLC traffic in mini-slots without notifying the eMBB user equipment (UE).

The puncturing mechanism degrades the quality of service (QoS) of eMBB users. In some cases, the QoS degradation experienced by eMBB UEs outweighs the gain obtained by transmitting the URLLC traffic. FIG. 4 illustrates the puncturing mechanism. Efficient scheduling is therefore required to meet the QoS requirements of B5G network services.

The priority calculation unit 110 determines the scheduling priority of users between the heterogeneous eMBB and URLLC services using selectively input learning data.

The optimal state selection unit 130 selects the subcarrier spacing (SCS) numerology and the number of orthogonal frequency division multiplexing (OFDM) symbols for a user of the URLLC service, using the determined scheduling priority as input.

The system model is shown in FIG. 4. The present invention considers downlink dynamic scheduling in an eMBB-URLLC coexistence system. In this system, users of the eMBB service generate continuous traffic (that is, full-buffer traffic) with infinite packet size, while users of the URLLC service generate small bursts.

Packets of the two service types, URLLC and eMBB, are denoted by $d_r$ and $d_e$, respectively, and arrive according to a Poisson point process (PPP) with arrival rate λ (packets/s). As shown in FIG. 4, the average arrival rates of URLLC and eMBB data packets are $\lambda_r$ and $\lambda_e$ (packets/s), following a Poisson distribution in the M/G/Δ queuing model. The packet length $L_d$ is expressed in bits/packet. The number of OFDM symbols in a mini-slot is denoted by $m \in \{2, 4, 7\}$. The length of each slot in an NR subframe is denoted by б and is calculated by Equation 1 below. The SCS numerology is denoted by $n \in \{0, 1, 2, 3, \ldots\}$.

[Equation 1]
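The sketch below illustrates the timing and traffic quantities just introduced. Equation 1 is not reproduced in this text, so the slot length is assumed to follow the standard 5G NR relation б = 1 ms / 2^n (subcarrier spacing 15·2^n kHz, 14 OFDM symbols per slot). The number of packets arriving in one slot is then Poisson with mean λ·б. The function names are hypothetical, and Knuth's sampling method is used only to keep the example dependency-free.

```python
import math
import random


def scs_khz(n: int) -> float:
    # Assumed NR relation: subcarrier spacing is 15 kHz * 2**n.
    return 15.0 * 2 ** n


def slot_length_s(n: int) -> float:
    # Assumed NR relation: a 14-symbol slot lasts 1 ms / 2**n (plays the role of б).
    return 1e-3 / 2 ** n


def poisson_sample(mean: float, rng: random.Random) -> int:
    # Knuth's method: multiply uniforms until the product falls to e**-mean or below.
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def arrivals_in_slot(rate_pps: float, n: int, rng: random.Random) -> int:
    # Packets arriving during one slot of numerology n: Poisson with mean rate * б.
    return poisson_sample(rate_pps * slot_length_s(n), rng)


rng = random.Random(7)
for n in range(4):
    print(n, scs_khz(n), slot_length_s(n), arrivals_in_slot(20_000.0, n, rng))
```

Higher numerologies shorten the slot, so for the same λ fewer packets arrive per slot on average.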

The queue length, in bits, of a user d of service type t is denoted Q_d and is used to determine the queue weight ω according to Equation 2. The priority function for eMBB and URLLC users is denoted φ_d. The priorities for eMBB and URLLC are defined by Equations 3 and 4 below, respectively.

[Equation 2]

[Equation 3]

[Equation 4]

Here, τ is the scheduling delay for URLLC users; in the 3GPP standard it spans two minislots, after which URLLC packets are dropped. The OFDM symbol period is given by Equation 5 below.

[Equation 5]

The minislot length ζ for the URLLC service is given by Equation 6 below.

[Equation 6]

The slot length ρ for the eMBB service is given by Equation 7 below.

[Equation 7]
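One reading of Equations 5 to 7 is sketched below under assumptions, since the equations themselves are not reproduced: the OFDM symbol period is the slot length divided by 14, the URLLC minislot is ζ = m symbol periods, and the eMBB portion is ρ = (14 − m) symbol periods. Under these assumptions ζ + ρ equals the slot length, consistent with claim 7. The drop rule models τ as two minislots, as described above. All function names are hypothetical.

```python
SYMBOLS_PER_SLOT = 14  # OFDM symbols per NR slot (normal cyclic prefix)


def slot_length_s(n: int) -> float:
    # Assumed NR slot length for numerology n: 1 ms / 2**n.
    return 1e-3 / 2 ** n


def symbol_period_s(n: int) -> float:
    # Assumed form of Equation 5: the slot divided evenly into 14 symbols.
    return slot_length_s(n) / SYMBOLS_PER_SLOT


def minislot_length_s(n: int, m: int) -> float:
    # Assumed form of Equation 6: URLLC minislot zeta spans m symbols, m in {2, 4, 7}.
    if m not in (2, 4, 7):
        raise ValueError("a minislot uses 2, 4 or 7 OFDM symbols")
    return m * symbol_period_s(n)


def embb_slot_length_s(n: int, m: int) -> float:
    # Assumed form of Equation 7: eMBB part rho spans the remaining o = 14 - m symbols.
    return (SYMBOLS_PER_SLOT - m) * symbol_period_s(n)


def urllc_dropped(wait_s: float, n: int, m: int, max_minislots: int = 2) -> bool:
    # tau: a URLLC packet waiting longer than two minislots is dropped.
    return wait_s > max_minislots * minislot_length_s(n, m)


for m in (2, 4, 7):
    print(m, minislot_length_s(1, m), embb_slot_length_s(1, m))
```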

The goal of the present invention is to select the SCS numerology n and the number of OFDM symbols m of the URLLC minislot so as to minimize the normalized load (NL) of all users d. The priority functions are defined in Equations 3 and 4. To allocate resources fairly and to balance data rate, latency, and reliability, the characteristics of user data are taken into account through the arrival rate, data size, and queue length.

In the present invention, each combination of n and m is defined as a rate adaptation scheme k, as shown in FIG. 5. The number of OFDM symbols in a slot for the eMBB service is o = 14 − m. The NLs for eMBB and URLLC are given by Equations 8 and 9 below.

[Equation 8]

[Equation 9]
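The set of rate adaptation schemes k can be enumerated directly as (n, m) pairs. This sketch assumes n ∈ {0, 1, 2, 3} and m ∈ {2, 4, 7}; the actual table in FIG. 5 may contain different combinations. It also derives o = 14 − m and the corresponding subcarrier spacing (assumed to be 15·2^n kHz). The class and function names are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class RateScheme(NamedTuple):
    """One rate adaptation scheme k: an (n, m) pair."""
    n: int  # SCS numerology
    m: int  # OFDM symbols in the URLLC minislot

    @property
    def o(self) -> int:
        # OFDM symbols left for eMBB in the 14-symbol slot.
        return 14 - self.m

    @property
    def scs_khz(self) -> float:
        # Assumed NR subcarrier spacing for numerology n.
        return 15.0 * 2 ** self.n


def build_state_space(numerologies=range(4), minislot_sizes=(2, 4, 7)) -> list:
    # Every combination of n and m is one candidate state k.
    return [RateScheme(n, m) for n, m in product(numerologies, minislot_sizes)]


for k, s in enumerate(build_state_space()):
    print(f"k={k:2d}  n={s.n}  SCS={s.scs_khz:5.0f} kHz  m={s.m}  o={s.o}")
```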

The compensation value calculation unit 150 calculates the compensation value (reward) of a state, i.e., of the combination of SCS numerology and number of OFDM symbols that minimizes the normalized load (NL) in each slot of each user.

The optimal value return unit 170 updates the action value and the learning data of the state by reflecting the calculated compensation value.

Reinforcement learning (RL) is a type of machine learning (ML) in which the learner (agent) has no prior knowledge of which actions to take to maximize a numerical reward (i.e., to move toward its main goal). Instead, the agent must discover, by trial and error, the actions that yield the maximum reward. RL has three main components: the agent, the environment, and the reward.

The Q-learning (QL) algorithm is an effective RL method for real-time scenarios: it converges quickly on stationary, independent, and randomly distributed traffic, and it is an off-policy temporal-difference (TD) RL method. In QL, the agent interacts with an unknown environment to learn and optimize system performance. Off-policy means that the action value Q is optimized directly, independently of the policy the agent follows when choosing its actions. This approach simplifies the algorithm and enables fast convergence.

The problem can be modeled as a Markov decision process (MDP) represented by the tuple (K, A, P, R), where K is the finite state space and A is the agent's finite action space (the set of possible actions). P is the transition probability matrix that gives the probability of moving from the current state k_t to the next state k_(t+1), and R is the reward function that determines the agent's reward for moving from one state to another.

The dynamics (transition probabilities) of the real-time environment are unknown. QL is one of the effective RL algorithms for solving the Bellman optimality equation. Under policy π, QL uses the Bellman equation to determine and update, at each iteration, the action values of the visited state-action pairs, which are stored in a lookup table.

γ is the discount factor, with 0 ≤ γ ≤ 1; it determines the present value of future rewards. When γ is set to 0, the agent considers only the immediate reward r_k(t). As γ approaches 1, the agent gives increasing weight to long-term (future) rewards. α is the step size, also called the learning rate, and takes a value in (0, 1].

From Equations 10 and 11, it can be seen that when the learning rate α is set to 0 the agent does not learn, whereas with a high value such as 0.9 the agent learns quickly.

[Equation 10]

[Equation 11]
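Equations 10 and 11 are not reproduced here. The sketch assumes they express the textbook tabular Q-learning update, Q(k, a) ← Q(k, a) + α[r + γ·max_a′ Q(k′, a′) − Q(k, a)], with values stored in a lookup table keyed by (state, action). The function name and dictionary layout are illustrative, not taken from the source.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning (off-policy TD) update and return the new Q(state, action)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("step size alpha must lie in (0, 1]")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1]")
    # Off-policy target: greedy value of the next state, regardless of the action taken next.
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q_table = defaultdict(float)  # lookup table; unseen pairs start at 0
print(q_update(q_table, "k0", "k1", 1.0, "k1", ["k0", "k1"], alpha=0.5, gamma=0.9))  # prints 0.5
```

Setting γ = 0 reduces the target to the immediate reward, while α controls how far each update moves Q toward that target.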

In the present invention, the agent is the gNB scheduler, and the state space K of the environment consists of the combinations of the SCS numerology n and the number m of OFDM symbols used during a TTI (see FIG. 5). In every slot, the agent performs an action a to select the state k (rate adaptation scheme) that minimizes the NL of each user d. FIG. 6 shows the agent-environment interaction of the intelligent gNB proposed in the present invention.

The reward is a quantitative performance indicator of an action taken in a particular state. In the present invention, the reward r_k(t) is obtained by selecting the state k, i.e., the SCS numerology n and the number of symbols per TTI (slot), that minimizes the NL in each slot of each user d.

Users are prioritized for scheduling, and the NL is minimized so that the latency and reliability requirements are met at each time step t. Two reward values, x and y, are considered, where x is always positive and x > y; for example, x = 1 and y = 0. The reward is defined by Equation 12 below.

[Equation 12]
$$r_k(t) = \begin{cases} x, & \text{if } \eta < 0 \\ y, & \text{otherwise} \end{cases}$$

Here, η represents the difference between the NL in the current state and the NL in the previous state.
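Equation 12 reduces to a two-valued rule: reward x when the NL fell relative to the previous state (η < 0) and y otherwise, with x > 0 and x > y. A minimal sketch using the example values x = 1 and y = 0 from the text; the function names are hypothetical.

```python
def normalized_load_change(nl_current: float, nl_previous: float) -> float:
    # eta: difference between the NL of the current state and of the previous state.
    return nl_current - nl_previous


def reward(nl_current: float, nl_previous: float, x: float = 1.0, y: float = 0.0) -> float:
    # Two-valued reward: x when the NL decreased (eta < 0), otherwise y.
    if not (x > 0 and x > y):
        raise ValueError("the reward values must satisfy x > 0 and x > y")
    return x if normalized_load_change(nl_current, nl_previous) < 0 else y


print(reward(0.42, 0.55), reward(0.60, 0.55))
```

An unchanged NL (η = 0) earns y, so the agent is rewarded only for actually lowering the load.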

To maximize reward, the agent prefers actions learned in the past that have yielded high rewards; this is called exploitation. When the agent randomly tries other actions in search of a better one, this is called exploration. One of the most widely used ways to balance exploration and exploitation is the ε-greedy strategy.

The agent explores with probability ε and exploits with probability 1 − ε. The ε-greedy strategy prevents premature convergence of the system and increases the chance of selecting actions that have not yet been explored. During exploitation, the agent uses the Q-values of Equation 10 to select an action according to Equation 13 below.

[Equation 13]
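The exploration/exploitation split can be sketched as a standard ε-greedy selector that draws β uniformly and explores when β ≤ ε, mirroring steps S12 to S15 described later. This assumes the Q-values are held in a dictionary keyed by candidate scheme k = (n, m); the function name and data layout are hypothetical.

```python
import random


def epsilon_greedy(q_values: dict, epsilon: float, rng: random.Random):
    """Pick a candidate state k from {k: Q-value}; return (k, explored)."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    beta = rng.random()  # uniform random number in [0, 1)
    if beta <= epsilon:
        # Exploration: any candidate, chosen uniformly at random.
        return rng.choice(list(q_values)), True
    # Exploitation: the candidate with the highest Q-value (first one on ties).
    return max(q_values, key=q_values.get), False


rng = random.Random(3)
q = {(0, 2): 0.1, (1, 4): 0.7, (2, 7): 0.4}
picks = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
print(sum(explored for _, explored in picks) / len(picks))
```

With ε = 0.1 the printed exploration fraction should land near 0.1; all other picks return the scheme (1, 4), which holds the highest Q-value.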

The general QL algorithm is shown in FIG. 7.

FIG. 8 is a flowchart of the priority calculation process according to an embodiment of the present invention, and FIG. 9 is a flowchart of the wireless communication resource allocation method using artificial intelligence according to an embodiment of the present invention.

The wireless communication resource allocation method using artificial intelligence according to this embodiment may be performed in substantially the same configuration as the apparatus 10 of FIG. 1. Accordingly, components identical to those of the apparatus 10 of FIG. 1 are given the same reference numerals, and repeated descriptions are omitted.

In addition, the wireless communication resource allocation method using artificial intelligence according to this embodiment may be executed by software (an application) for performing wireless communication resource allocation using artificial intelligence.

Referring to FIG. 8, in the priority calculation process of the method according to this embodiment, the following are input to the priority calculator (step S100): the arrival rate (packets/s) of each user of each service type, the packet length L_d of user d (bits/packet), the queue weight ω of each user, and the CSI.

Once all inputs have been entered for each time slot t (step S200), the priority φ_d of each user d is calculated using Equations 3 and 4 (step S300). The users of service types eMBB and URLLC are then ranked in descending order of priority (step S400).
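Steps S100 to S400 can be sketched as scoring each user and sorting per service type. Equations 2 to 4 are not reproduced in this text, so the priority function below is a HYPOTHETICAL placeholder (offered load × queue weight × CSI); only the descending-order ranking reflects the description. The dataclass fields and their units are assumptions, and a real implementation would pass its own priority function in place of the placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class User:
    uid: int
    service: str          # "eMBB" or "URLLC"
    arrival_rate: float   # packets/s
    packet_len: float     # L_d, bits/packet
    queue_weight: float   # omega (Equation 2, not reproduced here)
    csi: float            # channel-quality score, higher is better (assumed scale)


def placeholder_priority(u: User) -> float:
    # HYPOTHETICAL stand-in for Equations 3 and 4: offered load x queue weight x CSI.
    return u.arrival_rate * u.packet_len * u.queue_weight * u.csi


def rank_users(users: List[User],
               priority: Callable[[User], float] = placeholder_priority) -> Dict[str, List[User]]:
    """Steps S300-S400: score every user, then rank each service type in descending order."""
    ranked: Dict[str, List[User]] = {"eMBB": [], "URLLC": []}
    for u in sorted(users, key=priority, reverse=True):
        ranked[u.service].append(u)
    return ranked


users = [
    User(1, "eMBB", 100.0, 12_000.0, 1.0, 0.5),
    User(2, "eMBB", 100.0, 12_000.0, 1.0, 0.9),
    User(3, "URLLC", 1_000.0, 256.0, 2.0, 0.7),
    User(4, "URLLC", 500.0, 256.0, 1.0, 0.7),
]
ranked = rank_users(users)
print([u.uid for u in ranked["eMBB"]], [u.uid for u in ranked["URLLC"]])
```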

Referring to FIG. 9, in the wireless communication resource allocation method using artificial intelligence according to this embodiment, the following are input to the intelligent scheduler (step S10): the arrival rate (packets/s) of each user of each service type, the packet length L_d of user d (bits/packet), the step size α, the discount factor γ, the exploration probability ε, and the priority-ranked sets of users of service types eMBB and URLLC.

For each time slot, the time slot is allocated to the eMBB user with the highest rank in the priority setting (step S11).

For each minislot in which a time slot is available for a user of service type URLLC, a random number β between 0 and 1 is generated (step S12). It is then checked whether β ≤ ε (step S13).

If so, the intelligent gNB explores by randomly selecting a state k (rate adaptation scheme: a combination of SCS numerology and number of OFDM symbols) and taking the corresponding action a (step S14). Otherwise, the intelligent gNB exploits by selecting the state k with the highest Q-value according to Equation 13 and taking the corresponding action a (step S15).

Using Equations 8 and 9, the normalized load of each user of service types eMBB and URLLC in time slot t is calculated (step S16).

It is checked whether η < 0 (step S17), where η is the difference between the NL in the current state and the NL in the previous state. If so, the reward r_k(t) is updated to x; otherwise, it is updated to y.

The Q-value Q(k, a) of the action a taken by selecting state k is updated (step S18). The Q-values of all states in K are updated (step S19), and the action-reward matrix of all actions in A is updated (step S20).

The process then returns to step S11, and the above steps are repeated for all time slots.
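The loop of steps S12 to S18 can be sketched end to end as below. This is a simplified reading, not the patented implementation: the action is the choice of scheme k = (n, m), a single Q-table indexed by k is kept, the greedy value of the whole table stands in for the next-state estimate, and steps S11, S19, and S20 are not modeled separately. Equations 8 and 9 are not reproduced, so the normalized load comes from a caller-supplied function; toy_nl is an invented load model used only to exercise the code.

```python
import random
from itertools import product


def run_scheduler(num_slots, nl_fn, alpha=0.1, gamma=0.9, epsilon=0.1, x=1.0, y=0.0, seed=0):
    """Sketch of steps S12-S18 for URLLC minislots, one scheme decision per slot.

    nl_fn(k, t) returns the normalized load observed when scheme k = (n, m) is used
    in slot t; it stands in for Equations 8 and 9, which are not reproduced here.
    """
    rng = random.Random(seed)
    schemes = list(product(range(4), (2, 4, 7)))  # assumed FIG. 5 table of (n, m)
    q = {k: 0.0 for k in schemes}                  # Q-value lookup table
    prev_nl = None
    history = []
    for t in range(num_slots):
        # Step S11 (eMBB slot allocation to the top-ranked user) is not modeled here.
        beta = rng.random()                        # S12: random number in [0, 1)
        if beta <= epsilon:                        # S13
            k = rng.choice(schemes)                # S14: explore
        else:
            k = max(schemes, key=q.get)            # S15: exploit
        nl = nl_fn(k, t)                           # S16: normalized load
        eta = 0.0 if prev_nl is None else nl - prev_nl
        r = x if eta < 0 else y                    # S17: two-valued reward
        # S18: Q-learning update; the table-wide maximum stands in for the next state.
        q[k] += alpha * (r + gamma * max(q.values()) - q[k])
        prev_nl = nl
        history.append((t, k, nl, r))
    return q, history


def toy_nl(k, t):
    # INVENTED load model for demonstration only: load falls with n and m, plus a slot ripple.
    n, m = k
    return (1.0 + 0.1 * (t % 3)) / (2 ** n * m)


q_table, trace = run_scheduler(500, toy_nl, epsilon=0.2)
print(max(q_table, key=q_table.get))
```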

The present invention provides an intelligent next-generation NodeB (gNB) that models the selection of the optimal number of OFDM symbols and numerology for minislot TTI (Transmission Time Interval) scheduling for the coexistence of eMBB and URLLC in a B5G environment, and that solves this problem using a Q-learning (QL) algorithm.

Such a wireless communication resource allocation method using artificial intelligence may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to embodiments, those skilled in the art will understand that it can be variously modified and changed without departing from the spirit and scope of the present invention as set forth in the claims below.

The present invention provides a resource allocation method for the coexistence of eMBB and URLLC in a B5G environment and learns dynamically using a Q-learning (QL) algorithm, so it can be usefully applied to next-generation wireless communication.

10: wireless communication resource allocation device using artificial intelligence
110: priority calculation unit
130: optimal state selection unit
150: compensation value calculation unit
170: optimal value return unit

Claims

A method of allocating resources for wireless communication using artificial intelligence, the method comprising:
determining a user's scheduling priority between the heterogeneous eMBB and URLLC services using selectively input learning data;
selecting a SubCarrier Spacing (SCS) numerology and a number of Orthogonal Frequency Division Multiplexing (OFDM) symbols for a user of the URLLC service, using the determined scheduling priority as an input of an intelligent scheduler;
calculating a compensation value of a state according to the combination of SCS numerology and number of OFDM symbols that minimizes a normalized load (NL) in each slot of each user; and
updating an action value and learning data of the state by reflecting the calculated compensation value.

The method of claim 1, wherein determining the scheduling priority of the user comprises:
inputting, for each slot, the arrival rate in packets/second, the packet length, the queue weight, and the CSI of each user of each service type;
calculating the priority of each user according to the priority functions for the service types eMBB and URLLC; and
ranking the users of the service types eMBB and URLLC in descending order of priority.

The method of claim 2, wherein the priority function is based on at least one of an arrival rate, a data size, and a queue length.

The method of claim 1, wherein selecting the SCS numerology and the number of OFDM symbols comprises:
allocating, for each time slot, the time slot to the eMBB user with the highest priority in the priority setting;
generating a random number between 0 and 1 for each minislot having a time slot for a URLLC user; and
comparing the generated random number with a preset threshold and learning according to the selected action.

The method of claim 4, wherein comparing the generated random number with the preset threshold and learning according to the selected action comprises:
when the generated random number is less than or equal to the preset threshold, exploring by taking the action of a randomly selected state (a combination of SCS numerology and number of OFDM symbols); and when the generated random number is greater than the preset threshold, exploiting by taking the action of the state with the highest Q-value.

The method of claim 1, wherein calculating the compensation value of the state further comprises calculating the normalized load of each user of the service types URLLC and eMBB in each time slot.

The method of claim 1, wherein the length of the slot is the sum of the length of the minislot for the URLLC service and the length of the slot for the eMBB service.

A computer-readable storage medium on which a computer program for performing the wireless communication resource allocation method using artificial intelligence according to any one of claims 1 to 7 is recorded.

An apparatus for allocating resources for wireless communication using artificial intelligence, the apparatus comprising:
a priority calculation unit that determines a user's scheduling priority between the heterogeneous eMBB and URLLC services using selectively input learning data;
an optimal state selection unit that selects a SubCarrier Spacing (SCS) numerology and a number of Orthogonal Frequency Division Multiplexing (OFDM) symbols for a user of the URLLC service, using the determined scheduling priority as an input;
a compensation value calculation unit that calculates a compensation value of a state according to the combination of SCS numerology and number of OFDM symbols that minimizes a normalized load (NL) in each slot of each user; and
an optimal value return unit that updates the action value and learning data of the state by reflecting the calculated compensation value.

The apparatus of claim 9, wherein the priority calculation unit, for each slot, takes as inputs the arrival rate in packets/second, the packet length, the queue weight, and the CSI of each user of each service type, calculates the priority of each user according to the priority functions for the service types eMBB and URLLC, and ranks the users in descending order of priority.

The apparatus of claim 9, wherein the optimal state selection unit, for each time slot, allocates the time slot to the eMBB user with the highest priority in the priority setting, generates a random number between 0 and 1 for each minislot having a time slot for a URLLC user, compares the generated random number with a preset threshold, and learns according to the selected action.

The apparatus of claim 11, wherein the optimal state selection unit, when the generated random number is less than or equal to the preset threshold, explores by taking the action of a randomly selected state (a combination of SCS numerology and number of OFDM symbols), and, when the generated random number is greater than the preset threshold, exploits by taking the action of the state with the highest Q-value.