KR20220112890A

KR20220112890A - Device and method for allocating resource in vehicle to everything communication based on non-orthogonal multiple access

Info

Publication number: KR20220112890A
Application number: KR1020210016125A
Authority: KR
Inventors: 조성현; 이솔; 안세영
Original assignee: 한양대학교 에리카산학협력단 (Industry-University Cooperation Foundation, Hanyang University ERICA Campus)
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2022-08-12
Also published as: KR102555696B1

Abstract

Disclosed are a device and a method for allocating resources in vehicle communication based on non-orthogonal multiple access. In accordance with an embodiment of the present invention, the device for allocating resources includes: a reinforcement learning part performing reinforcement learning for modifying a policy such that the total data throughput between at least two vehicles and at least one base station can rise, while changing channels allocated to a plurality of partitions and positions of the vehicles; and a resource allocation part generating a resource allocation table in which one of the channels is allocated to each of the partitions according to a policy determined by the reinforcement learning part, and transmitting the resource allocation table to the vehicles and the base station. Therefore, the present invention is capable of efficiently allocating resources without causing an excessive delay.

Description

DEVICE AND METHOD FOR ALLOCATING RESOURCE IN VEHICLE TO EVERYTHING COMMUNICATION BASED ON NON-ORTHOGONAL MULTIPLE ACCESS

The present invention relates to an apparatus and method for allocating resources in vehicular communication based on non-orthogonal multiple access and, in particular, to such an apparatus and method that apply deep reinforcement learning to allocate resources efficiently without excessive delay.

Technologies such as Vehicle-to-Everything (V2X) communication and Non-Orthogonal Multiple Access (NOMA) are currently being commercialized.

V2X communication is a technology in which a moving vehicle communicates with a base station or with other vehicles. It is important for V2X to maintain stable communication despite signal attenuation, variation, and interference, such as multipath fading caused by wireless propagation and Doppler spread caused by vehicle movement.

NOMA is a technique for improving resource efficiency by transmitting signals to two or more receiving terminals simultaneously over the same channel (frequency). The transmitter controls the transmit power of the simultaneously transmitted signals, and each receiver uses successive interference cancellation (SIC) to remove the other signals and decode only its own.
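To make the SIC step concrete, the following sketch computes the achievable rates of a textbook two-user, power-domain downlink NOMA pair. It is an illustration only: the function name, the power split `a_far`, the channel gains, and the Shannon-rate model (bit/s/Hz, perfect SIC, near user able to decode the far user's signal) are assumptions, not details taken from this patent.

```python
import math

def noma_two_user_rates(p_total, a_far, g_near, g_far, noise):
    """Achievable rates (bit/s/Hz) of a two-user downlink NOMA pair.

    p_total: total transmit power; a_far: fraction of power given to the far
    user (normally > 0.5, since the far user has the weaker channel);
    g_near, g_far: channel power gains; noise: noise power.
    """
    a_near = 1.0 - a_far
    # Far user: decodes its own signal directly, treating the near user's
    # signal on the same frequency as interference.
    sinr_far = a_far * p_total * g_far / (a_near * p_total * g_far + noise)
    # Near user: SIC first decodes and subtracts the far user's (stronger)
    # signal, then decodes its own signal without that interference.
    sinr_near = a_near * p_total * g_near / noise
    return math.log2(1 + sinr_near), math.log2(1 + sinr_far)
```

For example, `noma_two_user_rates(1.0, 0.8, 1.0, 0.1, 0.01)` gives roughly 4.39 bit/s/Hz for the near user and 1.87 bit/s/Hz for the far user, both on the same frequency.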

In recent years, traffic in vehicular communication has been increasing rapidly due to advances in autonomous driving and in-vehicle infotainment. In particular, in environments with many vehicles, such as urban areas, conventional communication schemes cannot provide the required data throughput. Accordingly, applying NOMA to vehicular communication is being considered as a way to improve data throughput.

In this context, wireless resource allocation is critical: it assigns channels to communication devices and, in NOMA in particular, assigns the same channel to multiple devices and determines their transmit power levels.

When radio resource allocation is performed properly, interference between devices in the system can be minimized, and using the channels with the highest transmission efficiency maximizes the data throughput of the entire system.

Deep Reinforcement Learning (DRL) is a type of machine learning in the field of artificial intelligence that applies an artificial neural network to conventional reinforcement learning algorithms. It is well suited to non-convex problems with so many possible cases that an optimization algorithm or a mathematical approach is generally difficult to derive.

DRL uses the concepts of agent, environment, state, action, and reward. The agent observes the environment to determine the current state, learns to predict how much reward each action taken in that state will yield, and thereby learns an optimal policy that takes the reward-maximizing action in each state. Once trained, a DRL model can follow its policy and take the action that returns the best reward in the current state without further training, which is advantageous when a real-time system must produce an answer immediately. Moreover, because the same model can be shared among the users of the system in a distributed manner, the delay incurred when the users agree on their actions can also be reduced.
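The agent/state/action/reward loop described above can be illustrated with a deliberately tiny, tabular stand-in. Instead of a deep neural network, a table `q[s][a]` learns to predict the reward of each action in each state (a one-step, contextual-bandit simplification), and the resulting greedy policy then acts without further training. The function names, the toy reward table `REWARD`, the epsilon-greedy exploration, and the learning rate are all hypothetical and do not describe the patent's model.

```python
import random

def learn_reward_table(reward_fn, n_states, n_actions, steps=5000,
                       epsilon=0.2, lr=0.1, seed=0):
    """Learn q[s][a], a running estimate of the reward of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(steps):
        s = rng.randrange(n_states)                   # observe the current state
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)              # explore a random action
        else:
            a = max(range(n_actions), key=q[s].__getitem__)  # exploit best guess
        r = reward_fn(s, a)                           # environment returns a reward
        q[s][a] += lr * (r - q[s][a])                 # move prediction toward reward
    return q

def greedy_policy(q):
    """After training, pick the highest-predicted-reward action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]

# Toy environment: deterministic reward for each (state, action) pair.
REWARD = [[1.0, 3.0, 2.0],
          [4.0, 0.5, 1.0]]
```

Calling `greedy_policy(learn_reward_table(lambda s, a: REWARD[s][a], 2, 3))` should select action 1 in state 0 and action 0 in state 1; the point is only that the trained table answers instantly, with no further learning at decision time.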

Because vehicular communication involves highly variable conditions such as vehicle movement, introducing NOMA faces various difficulties, especially in resource allocation. In particular, as the number of users (vehicles) in an area grows, it becomes difficult to solve the resource allocation problem within a short time and to meet the transmission efficiency requirement. Therefore, an efficient resource allocation technique is needed that satisfies the required data throughput of the system without causing delay, even when the many variables of vehicular communication are taken into account.

An object of the present invention is to provide an apparatus and method for allocating resources in NOMA-based vehicular communication that apply deep reinforcement learning to allocate resources efficiently without excessive delay.

A resource allocation apparatus for NOMA-based vehicular communication according to an embodiment of the present invention may include: a reinforcement learning unit that performs reinforcement learning to modify a policy so that the total data throughput between at least two vehicles and at least one base station increases, while changing the channels allocated to a plurality of partitions and the positions of the vehicles; and a resource allocation unit that generates a resource allocation table in which one of the channels is allocated to each of the partitions according to the policy determined by the reinforcement learning unit, and transmits the resource allocation table to the vehicles and the base station.
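As a rough illustration of what the resource allocation unit produces, the sketch below turns per-partition scores into a table that maps each partition to exactly one channel and serializes it for delivery to the vehicles and the base station. The scoring matrix, the independent per-partition argmax, and the JSON encoding are assumptions for illustration; the patent does not specify the table format or how the policy output is read.

```python
import json

def build_allocation_table(scores):
    """scores[h][c]: hypothetical policy score for giving channel c to partition h.

    Returns {partition index: channel index}, one channel per partition.
    """
    return {h: max(range(len(row)), key=row.__getitem__)
            for h, row in enumerate(scores)}

def encode_table(table):
    """Serialize the table for transmission (JSON here; a real system would
    use its own signalling format)."""
    return json.dumps({str(h): c for h, c in table.items()}, sort_keys=True)
```

Choosing each partition's channel independently is the simplest possible reading; a policy that scores whole tables jointly would be equally consistent with the text.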

According to an embodiment, the channels may be distinguished by at least one of frequency and signal strength.

According to an embodiment, the reinforcement learning may include: (a) repeating, n times (n being a natural number of 2 or more), a learning process of taking an action that changes the channels allocated to the partitions, measuring the total data throughput as the reward, and modifying the policy so that the total data throughput increases; (b) repeating step (a) m times (m being a natural number of 2 or more) while moving the vehicles according to the driving information set for each vehicle; and (c) repeating step (b) p times (p being a natural number of 2 or more) while changing the initial positions of the vehicles.
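The three nested repetitions in steps (a) to (c) map naturally onto three loops. The skeleton below is a hypothetical sketch: the callbacks `policy_update`, `move_vehicles`, and `reset_positions` stand in for the learning step, the vehicle movement, and the re-initialization described in the text, and the code shows only the control flow and how often each is invoked.

```python
def train(policy_update, move_vehicles, reset_positions, n, m, p):
    """Control flow of steps (a)-(c); the callbacks are hypothetical hooks.

    Returns the total number of policy updates performed (n * m * p).
    """
    updates = 0
    for _ in range(p):              # (c): p runs, each from new initial positions
        reset_positions()
        for _ in range(m):          # (b): m repetitions while the vehicles move
            for _ in range(n):      # (a): n learning iterations at fixed positions
                policy_update()     # act, measure the throughput reward, update
                updates += 1
            move_vehicles()         # advance vehicles by their driving information
    return updates
```

A full run therefore performs n × m × p policy updates, m × p vehicle moves, and p re-initializations.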

According to an embodiment, step (a) may include: allocating the channels to the partitions based on a resource allocation table generated according to the policy; observing communication between the vehicles and the base station to measure the total data throughput; and updating the weights of a neural network so that the loss, defined as the difference between the output of a predetermined objective function and the reward, is minimized.
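Step (a) ends with a weight update that minimizes the gap between a predicted value and the measured reward. The sketch below shows that idea with the smallest possible model: a linear predictor trained by stochastic gradient descent on a squared loss. The linear model, the squared loss, the learning rate, and the function names are assumptions standing in for the unspecified neural network and objective function.

```python
def sgd_step(weights, features, reward, lr):
    """One gradient-descent step on loss = (prediction - reward) ** 2.

    Returns the updated weights and the loss measured before the update.
    """
    pred = sum(w * x for w, x in zip(weights, features))
    err = pred - reward
    # d(loss)/d(w_k) = 2 * err * x_k
    new_weights = [w - lr * 2 * err * x for w, x in zip(weights, features)]
    return new_weights, err ** 2

def fit(samples, n_features, lr=0.1, passes=200):
    """Repeatedly update the weights on (features, measured reward) samples."""
    weights = [0.0] * n_features
    loss = None
    for _ in range(passes):
        for features, reward in samples:
            weights, loss = sgd_step(weights, features, reward, lr)
    return weights, loss
```

In a DRL setting the features would encode the observed state and chosen action, and the network would be nonlinear, but the update direction (reduce the prediction-reward gap) is the same.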

According to an embodiment, the driving information may include the travel direction and speed of each vehicle.
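Step (b) moves each vehicle according to its driving information. Assuming that information is a heading angle and a constant speed, one simulation time step could be sketched as follows; the function name and the coordinate convention (heading measured counter-clockwise from the +X axis, metres and seconds) are assumptions.

```python
import math

def advance(x, y, heading_deg, speed_mps, dt_s):
    """Return the position after dt_s seconds at constant speed and heading.

    heading_deg is measured counter-clockwise from the +X axis (assumed).
    """
    rad = math.radians(heading_deg)
    step = speed_mps * dt_s
    return x + step * math.cos(rad), y + step * math.sin(rad)
```

For example, a vehicle at the origin heading along +X at 20 m/s moves to (10.0, 0.0) after 0.5 s.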

According to an embodiment, the total data throughput may include the data throughput on the uplinks from each vehicle to another vehicle or to the base station and the data throughput on the downlinks from the base station to the vehicles.

According to an embodiment, the total data throughput is calculated by the following equation:

[Equation not reproduced]

where N is the set of channels, V is the set of vehicles, B is the set of base stations, and the throughput terms are as follows.

The first data throughput is the throughput on the uplink from the i-th vehicle to another vehicle over the n-th channel.

The second data throughput is the throughput on the uplink from the i-th vehicle to the base station over the n-th channel.

The third data throughput is the throughput on the downlink from the j-th base station to a first vehicle over the n-th channel.

The fourth data throughput is the throughput on the downlink from the j-th base station to a second vehicle over the n-th channel.
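The equations themselves are not reproduced in this text. One plausible reading that is consistent with the definitions given here is sketched below; it assumes Shannon-type rates per unit bandwidth, binary allocation indicators, and summation over every channel, vehicle, and base station. The symbols x, y, γ, and R are our own labels, not the patent's, and the fourth term reusing the base-station indicator y is likewise an assumption.

```latex
R_{\mathrm{total}} = \sum_{n \in N} \left[ \sum_{i \in V} \left( R^{\mathrm{V2V}}_{i,n} + R^{\mathrm{V2I}}_{i,n} \right)
                   + \sum_{j \in B} \left( R^{\mathrm{DL1}}_{j,n} + R^{\mathrm{DL2}}_{j,n} \right) \right]

R^{\mathrm{V2V}}_{i,n} = x_{i,n} \log_2\!\left(1 + \gamma^{\mathrm{V2V}}_{i,n}\right), \qquad
R^{\mathrm{V2I}}_{i,n} = x_{i,n} \log_2\!\left(1 + \gamma^{\mathrm{V2I}}_{i,n}\right)

R^{\mathrm{DL1}}_{j,n} = y_{j,n} \log_2\!\left(1 + \gamma^{\mathrm{DL1}}_{j,n}\right), \qquad
R^{\mathrm{DL2}}_{j,n} = y_{j,n} \log_2\!\left(1 + \gamma^{\mathrm{DL2}}_{j,n}\right)

x_{i,n},\; y_{j,n} \in \{0, 1\}
```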

According to an embodiment, the first data throughput is calculated by the following equation:

[Equation not reproduced]

where the terms are a value indicating whether the n-th channel is allocated to the i-th vehicle, and the first signal-to-interference-plus-noise ratio (SINR) of the n-th channel between the i-th vehicle and the other vehicle.

According to an embodiment, the second data throughput is calculated by the following equation:

[Equation not reproduced]

where the SINR term is the second SINR value of the n-th channel between the i-th vehicle and the base station.

The third data throughput is calculated by the following equation:

[Equation not reproduced]

where the terms are a value indicating whether the n-th channel is allocated to the j-th base station, and the third SINR value on the downlink from the j-th base station to the first vehicle over the n-th channel.

According to an embodiment, the fourth data throughput is calculated by the following equation:

[Equation not reproduced]

where the SINR term is the fourth SINR value on the downlink from the j-th base station to the second vehicle over the n-th channel.

A resource allocation method for NOMA-based vehicular communication according to an embodiment of the present invention may include: (a) repeating, n times (n being a natural number of 2 or more), a learning process of taking an action that changes the channels allocated to a plurality of partitions, measuring the total data throughput between at least two vehicles and at least one base station as the reward, and modifying the policy so that the total data throughput increases; (b) repeating step (a) m times (m being a natural number of 2 or more) while moving the vehicles according to the driving information set for each vehicle; (c) repeating step (b) p times (p being a natural number of 2 or more) while changing the initial positions of the vehicles; and (d) allocating the channels to the partitions according to the policy determined through steps (a) to (c).


The resource allocation apparatus and method for NOMA-based vehicular communication according to an embodiment of the present invention apply deep reinforcement learning and can therefore allocate resources efficiently without excessive delay.

To help the reader understand the drawings referred to in the detailed description, a brief description of each drawing is provided.
FIG. 1 is a conceptual diagram illustrating a wireless communication system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a resource allocation apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating the reinforcement learning process of the reinforcement learning unit shown in FIG. 2.
FIG. 4 is a flowchart illustrating step S200 of FIG. 3 in more detail.
FIG. 5 is a diagram illustrating a virtual system for simulating the reinforcement learning process shown in FIG. 3.
FIG. 6 is a graph illustrating the change in loss at an early stage of learning while reinforcement learning is performed.
FIG. 7 is a graph illustrating the change in loss at a late stage of learning while reinforcement learning is performed.
FIG. 8 is a graph illustrating the change in discounted reward while reinforcement learning is performed.

Specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are provided only to explain those embodiments; the embodiments may be implemented in various forms and are not limited to the embodiments described herein.

Because the embodiments according to the concept of the present invention may be modified in various ways and take various forms, embodiments are illustrated in the drawings and described in detail herein. However, this is not intended to limit the embodiments to the specific forms disclosed; they include all modifications, equivalents, and substitutes within the spirit and scope of the present invention.

Terms such as first and second may be used to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one element from another; for example, without departing from the scope of rights according to the concept of the present invention, a first element may be called a second element and, similarly, a second element may be called a first element.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to that element, but intervening elements may also be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, no intervening elements are present. Other expressions describing relationships between elements, such as "between" versus "directly between" or "adjacent to" versus "directly adjacent to", should be interpreted in the same way.

The terms used herein are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" designate the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or overly formal sense.

Hereinafter, the present invention will be described in detail by explaining preferred embodiments with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram illustrating a wireless communication system according to an embodiment of the present invention.

Referring to FIG. 1, the wireless communication system 10 includes a base station 100 and vehicles 200-1 to 200-4 (hereinafter collectively referred to as 200). Although FIG. 1 shows one base station 100 and four vehicles 200-1 to 200-4, the technical spirit of the present invention is not limited thereto; the wireless communication system 10 may include at least one base station and at least two vehicles.

The base station 100 and the vehicles 200-1 to 200-4 may communicate with one another using non-orthogonal multiple access.

For example, the base station 100 may transmit signals on the same frequency, with different signal strengths, to the first vehicle 200-1 and the second vehicle 200-2 (downlink channel NOMA1). In this case, the base station 100 may adjust the strength of the signal sent to each of the first vehicle 200-1 and the second vehicle 200-2 according to that vehicle's distance from the base station 100. It may be desirable to select the first vehicle 200-1 and the second vehicle 200-2 such that the difference between their distances from the base station 100 is large. The base station 100 may determine the communication range of the downlink channel NOMA1 based on whichever of the first vehicle 200-1 and the second vehicle 200-2 is farther from the base station 100.
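The pairing rule in this paragraph (choose two vehicles whose distances from the base station differ as much as possible, and size the NOMA coverage by the farther one) can be sketched as follows. The function name and data layout are hypothetical; among a set of candidates, the nearest and the farthest vehicle are simply the pair with the largest distance gap.

```python
import math

def pick_noma_pair(bs, vehicles):
    """Choose a downlink NOMA pair by largest distance gap from the base station.

    bs: (x, y) of the base station; vehicles: dict of vehicle id -> (x, y).
    Returns (near_id, far_id, coverage_radius); the radius follows the far vehicle.
    """
    if len(vehicles) < 2:
        raise ValueError("need at least two vehicles to form a NOMA pair")
    dist = {vid: math.dist(bs, pos) for vid, pos in vehicles.items()}
    ordered = sorted(dist, key=dist.get)   # nearest first
    near, far = ordered[0], ordered[-1]    # this pair has the largest distance gap
    return near, far, dist[far]
```

In power-domain NOMA the far vehicle of the pair would then typically receive the larger share of the transmit power, as in the two-user rate model of SIC.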

Similarly, the third vehicle 200-3 may transmit signals on the same frequency, with different signal strengths, to the base station 100 and the fourth vehicle 200-4 (uplink channel NOMA2). In this case, the third vehicle 200-3 may adjust the strength of the signal sent to each of the base station 100 and the fourth vehicle 200-4 according to its distance from the third vehicle 200-3. It may be desirable to select the fourth vehicle 200-4 such that the difference between the distance from the third vehicle 200-3 to the base station 100 and the distance from the third vehicle 200-3 to the fourth vehicle 200-4 is large. The third vehicle 200-3 may determine the communication range of the uplink channel NOMA2 based on whichever of the base station 100 and the fourth vehicle 200-4 is farther from the third vehicle 200-3.

In the present invention, the area corresponding to the road is divided into a plurality of partitions HF. The partitions HF may be defined in a preset shape; most simply, as rectangles.

In FIG. 1, the road is shown as a two-lane, two-way road, and the partitions HF are defined to span the full width of both lanes, but the technical spirit of the present invention is not limited thereto. For example, each partition HF may be set to cover only one lane.

In addition, a wireless communication resource, such as a frequency, is allocated to each partition HF. Because it is preferable that only one communication device, for example one vehicle 200, be present in each partition HF (if several vehicles shared a partition, they would be assigned the same frequency and a frequency collision could occur), the partition size may be chosen with this in mind. That is, it may be preferable to set the size of each partition HF smaller than the size of a vehicle 200.
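One simple way to realize rectangular partitions HF is a uniform grid over the road area, so that each vehicle's reference point maps to a partition index and two vehicles in the same cell reveal a potential channel collision. The grid layout (row-major, origin at (0, 0)), the cell sizes, and the function names are assumptions for illustration.

```python
def partition_of(x, y, cell_w, cell_h, cols):
    """Index of the rectangular partition containing point (x, y).

    Partitions are numbered row by row, cols partitions per row, origin at (0, 0).
    """
    return int(y // cell_h) * cols + int(x // cell_w)

def occupancy(positions, cell_w, cell_h, cols):
    """Map partition index -> list of vehicle ids whose reference point lies in it.

    A list with more than one id means those vehicles would share a channel.
    """
    cells = {}
    for vid, (x, y) in positions.items():
        cells.setdefault(partition_of(x, y, cell_w, cell_h, cols), []).append(vid)
    return cells
```

Shrinking `cell_w` and `cell_h` below the vehicle footprint, as the text recommends, makes it impossible for two physical vehicles to share a cell's reference area.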

When, as shown in FIG. 1, the wireless communication system 10 contains only a few base stations 100 and vehicles 200, channels can be allocated to them in various ways without significant delay. However, when there are many base stations 100 and vehicles 200, a predetermined algorithm cannot allocate the channels appropriately.

The resource allocation apparatus 300 according to an embodiment of the present invention allocates the resources to be used by the base station 100 and the vehicles 200 in the wireless communication system 10 according to a policy determined through deep reinforcement learning.

The resource allocation apparatus 300 is a kind of computing system and may perform the resource allocation method according to an embodiment of the present invention. That is, the resource allocation method of the present invention may be performed by a computer.

The resource allocation apparatus 300 may be provided in a computing system connected to the base station 100. For example, its functions may be implemented in the base station 100 itself or in a server connected to the base station 100.

In this specification, a channel may refer to a wireless communication resource distinguished by at least one of frequency and signal strength.

The structure and operation of the resource allocation apparatus 300 are described in more detail with reference to FIGS. 2 and 3.

FIG. 2 is a block diagram illustrating a resource allocation apparatus according to an embodiment of the present invention, and FIG. 3 is a flowchart illustrating the reinforcement learning process of the reinforcement learning unit shown in FIG. 2.

Referring to FIGS. 2 and 3, the resource allocation apparatus 300 may include a reinforcement learning unit 310 and a resource allocation unit 330.

The reinforcement learning unit 310 may perform reinforcement learning that modifies the policy in the direction that increases the total data throughput between the vehicles 200 and the base station 100, while changing the channels allocated to the partitions HF and the positions of the vehicles 200.

In this reinforcement learning, the environment may be defined by the X- and Y-axis dimensions of the wireless communication system 10, the length, width, and position (coordinates) of the road, the properties of the wireless links (uplink and downlink), the moving speeds of the vehicles 200, and so on. The action may be defined as allocating a channel to each partition HF; the reward, as the total data throughput of the wireless communication system 10; the policy, as the criterion (a neural network) for performing optimal resource allocation in a given state; and the loss, as the difference between the reward expected when the agent takes a given action in a given state and the reward actually measured when that action is taken.

The reinforcement learning unit 310 may perform reinforcement learning by repeating the minimum-unit learning process at three nested levels.

In this specification, for convenience of description, the repetition units are defined, from smallest to largest, as an epoch, an era, and an episode. Each higher level repeats the level below it a set number of times while changing a condition. Specifically, an epoch is the minimum-unit learning process, one era consists of n repetitions of the epoch (n being a natural number of 2 or more), and one episode consists of m repetitions of the era (m being a natural number of 2 or more). The episode is repeated p times (p being a natural number of 2 or more).

As preprocessing for the deep reinforcement learning, the reinforcement learning unit 310 may define the partitions HF, define the number of repetitions of each learning level (that is, of the epochs, eras, and episodes), and/or set the positions of the base station 100 and the vehicles 200.

Steps S100 to S500 of FIG. 3 may be performed by the reinforcement learning unit 310, specifically by an agent of the reinforcement learning unit. For convenience of description, the agent is referred to herein simply as the reinforcement learning unit 310.

As the first step of the deep reinforcement learning, the reinforcement learning unit 310 may observe the state of the wireless communication system 10 (S100). For example, the reinforcement learning unit 310 may observe the positions of the base station 100 and the vehicles 200 in the wireless communication system 10.

As the minimum-unit learning process, the reinforcement learning unit 310 may modify the policy, based on the observed state, in the direction that increases the total data throughput between the base station 100 and the vehicles 200 while changing the channels allocated to the partitions HF (S200). In other words, for a predetermined number of learning iterations, the reinforcement learning unit 310 takes actions that change the channels allocated to the partitions HF based on the observed state, measures the total data throughput as the reward, and updates the policy so as to maximize the reward and minimize the loss.

The process by which the reinforcement learning unit 310 performs the minimum-unit learning process is described in more detail with reference to FIG. 4.

FIG. 4 is a flowchart illustrating step S200 of FIG. 3 in more detail.

Referring to FIG. 4, the reinforcement learning unit 310 may generate a resource allocation table according to the policy and allocate channels to the partitions HF based on the generated table (S201).

The resource allocation table may be defined as a two-dimensional array in which the partition numbers serve as indices and the channel number to be used in each partition HF is the corresponding element.

From the standpoint of reducing system complexity, allocating exactly one channel to each partition may be most preferable. When actual wireless communication is performed after the reinforcement learning, one channel per partition allows each vehicle 200 to determine its channel from its own location and communicate without any additional confirmation or verification process.
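The table of S201 and the location-based channel choice described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes, purely for the example, that the partitions HF form a uniform square grid numbered row by row, with a hypothetical area size `AREA_M` and cell size `CELL_M` (the patent does not specify the partition geometry). The table is stored as rows of (partition number, channel number), and because each partition carries exactly one channel, a vehicle needs only its own coordinates to find its channel.

```python
from typing import List, Tuple

# Hypothetical grid geometry: the patent does not define how partitions HF are laid out.
AREA_M = 2000.0               # side length of the square service area in metres (assumed)
CELL_M = 500.0                # side length of one partition in metres (assumed)
COLS = int(AREA_M // CELL_M)  # partitions per row (4 here)

def partition_index(x: float, y: float) -> int:
    """Map a position in metres to a row-major partition number, clamped to the area."""
    col = min(int(x // CELL_M), COLS - 1)
    row = min(int(y // CELL_M), COLS - 1)
    return row * COLS + col

def build_table(channels: List[int]) -> List[Tuple[int, int]]:
    """Resource allocation table: one (partition number, channel number) row per partition."""
    return [(p, ch) for p, ch in enumerate(channels)]

def channel_for_vehicle(table: List[Tuple[int, int]], x: float, y: float) -> int:
    """A vehicle selects its channel from its own position, with no extra signalling."""
    lookup = dict(table)
    return lookup[partition_index(x, y)]

# Illustrative table: 16 partitions reusing 4 channels in a repeating pattern.
table = build_table([p % 4 for p in range(COLS * COLS)])
print(channel_for_vehicle(table, 1250.0, 300.0))  # prints 2
```

In a real deployment the channel list would come from the trained policy rather than the fixed `p % 4` pattern used here.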

The reinforcement learning unit 310 may measure the total data throughput in the wireless communication system 10 by observing communication between the vehicles 200 and the base station 100 (S203).

The total data throughput may include the data throughput on the uplink from any one of the vehicles 200 to another vehicle 200 or to the base station 100, and the data throughput on the downlink from the base station 100 to the vehicles 200.

The total data throughput may be calculated by Equation 1 below, which may serve as the objective function of the deep reinforcement learning.

[Equation 1]

In Equation 1, N denotes the set of channels, V denotes the set of vehicles 200, and B denotes the set of base stations 100. Equation 1 is expressed in terms of the following four throughputs.

The first data throughput is the throughput on the uplink from the i-th vehicle among the vehicles 200 to another vehicle among the vehicles 200 through the n-th channel among the channels.

The second data throughput is the throughput on the uplink from the i-th vehicle to the corresponding base station among the plurality of base stations through the n-th channel.

The third data throughput is the throughput on the downlink from the j-th base station among the plurality of base stations to a first vehicle among the vehicles 200 through the n-th channel.

The fourth data throughput is the throughput on the downlink from the j-th base station to a second vehicle among the vehicles 200 through the n-th channel.

The first data throughput may be calculated according to Equation 2 below.

[Equation 2]

In Equation 2, an allocation indicator represents whether the n-th channel is allocated to the i-th vehicle: a value of '1' indicates that the n-th channel is allocated to the i-th vehicle, and a value of '0' indicates that it is not.

Equation 2 also uses the first SINR (signal-to-interference-plus-noise ratio) value of the n-th channel between the i-th vehicle and the other vehicle.

The second data throughput may be calculated according to Equation 3 below.

[Equation 3]

In Equation 3, the allocation indicator again represents whether the n-th channel is allocated to the i-th vehicle.

Equation 3 also uses the second SINR value of the n-th channel between the i-th vehicle and the corresponding base station among the plurality of base stations.

The third data throughput may be calculated according to Equation 5 below.

[Equation 5]

In Equation 5, the allocation indicator represents whether the n-th channel is allocated to the j-th base station.

Equation 5 also uses the third SINR value on the downlink from the j-th base station to the first vehicle through the n-th channel.

The fourth data throughput may be calculated according to Equation 6 below.

[Equation 6]

In Equation 6, the SINR term is the fourth SINR value on the downlink from the j-th base station to the second vehicle through the n-th channel.

The first SINR value may be calculated according to Equation 7 below.

[Equation 7]

In Equation 7, one term denotes the strength (power level) of the uplink signal transmitted from the i-th vehicle to the other vehicle.

The index s denotes the s-th vehicle among the vehicles 200. For that vehicle, Equation 7 uses a value indicating whether the n-th channel is allocated to the s-th vehicle, the strength of the signal transmitted from the s-th vehicle to the specific vehicle with which it communicates using NOMA, and the strength of the signal transmitted from the s-th vehicle to the base station with which it communicates using NOMA.

The index t denotes the t-th base station among the plurality of base stations 100. For that base station, Equation 7 uses a value indicating whether the n-th channel is allocated to the t-th base station, the strength of the signal transmitted from the t-th base station to the nearer of the vehicles with which it communicates using NOMA, and the strength of the signal transmitted from the t-th base station to the farther of those vehicles.

The remaining term is a noise power constant.

H denotes the channel gain between a transmitter and a receiver. In particular, the channel gain between the l-th transmitter and the m-th receiver over the k-th channel is defined by Equation 8 below.

[Equation 8]

In Equation 8, the fast-fading component between the l-th transmitter and the m-th receiver on the k-th channel follows a complex Gaussian distribution.

The log-normal shadowing random variable between the l-th transmitter and the m-th receiver has a given standard deviation.

The remaining factors are the path-loss constant between the l-th transmitter and the m-th receiver, the distance between the l-th transmitter and the m-th receiver, and the path-loss exponent of the l-th transmitter.

The second SINR value may be calculated according to Equation 9 below.

[Equation 9]

Equation 9 is substantially the same as Equation 7 except that three of the symbols of Equation 7 are replaced, so a redundant description is omitted. In Equation 9, the uplink signal strength term denotes the strength of the uplink signal transmitted by the i-th vehicle to the base station.

The third SINR value may be calculated according to Equation 10 below.

[Equation 10]

Equation 10 is obtained from Equation 7 by replacing three of its symbols. In Equation 10, the signal strength term denotes the strength of the downlink signal transmitted by the j-th base station to the first vehicle.

The fourth SINR value may be calculated according to Equation 11 below.

[Equation 11]

Equation 11 is obtained from Equation 7 by replacing three of its symbols. In Equation 11, the signal strength term denotes the strength of the downlink signal transmitted by the j-th base station to the second vehicle.

The reinforcement learning unit 310 may update the weights of the neural network so that the loss is minimized (S205).

Specifically, the reinforcement learning unit 310 may change the weights of the neural network so as to minimize the loss, that is, the difference between the value calculated by the objective function and the actually simulated or measured value.

Steps S201 to S205 shown in FIG. 4 constitute the minimum-unit learning process of the present invention.

Referring again to FIGS. 2 and 3, the reinforcement learning unit 310 counts the repetitions of the epoch (S200), which is the minimum-unit learning process, and repeats it n times (n being a natural number of 2 or more) (NO branch of S210).

When the epoch has been repeated n times and one era ends (YES branch of S210), the reinforcement learning unit 310 may move the vehicles 200 according to the driving information set for each vehicle 200 (S300).

According to an embodiment, the driving information may include the traveling direction and speed of each vehicle 200.

After moving the vehicles 200, the reinforcement learning unit 310 performs the reinforcement learning of S200 to S210 again, that is, it starts a new era. The reinforcement learning unit 310 counts the repetitions of the era and repeats it m times (m being a natural number of 2 or more) (NO branch of S310).

When the era has been repeated m times and one episode ends (YES branch of S310), the reinforcement learning unit 310 initializes the positions of the vehicles 200. Here, initialization means randomly determining the positions of the vehicles 200, and may include adding or removing vehicles 200.
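Vehicle movement at the end of each era (S300) and random re-initialization at the end of each episode (S400) can be sketched as below. The `Vehicle` record, the representation of the traveling direction as a heading angle in radians, the time step `dt`, the restriction to four axis-aligned road directions, and the 10 to 30 m/s speed range are all hypothetical choices for illustration; the patent states only that the driving information includes direction and speed and that re-initialization randomizes positions and may change the number of vehicles.

```python
import math
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Vehicle:
    x: float          # position in metres
    y: float
    heading: float    # traveling direction in radians (assumed representation)
    speed: float      # metres per second

def move_vehicles(vehicles: List[Vehicle], dt: float) -> None:
    """S300: advance each vehicle along its own direction at its own speed for dt seconds."""
    for v in vehicles:
        v.x += v.speed * dt * math.cos(v.heading)
        v.y += v.speed * dt * math.sin(v.heading)

def reinitialize(count: int, area_m: float, rng: random.Random) -> List[Vehicle]:
    """S400: draw fresh random vehicles; choosing a different count models adding/removing vehicles."""
    headings = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]   # assumed grid-aligned roads
    return [Vehicle(rng.uniform(0.0, area_m), rng.uniform(0.0, area_m),
                    rng.choice(headings), rng.uniform(10.0, 30.0))
            for _ in range(count)]
```

For example, a vehicle at the origin heading along the X-axis at 20 m/s ends up at x = 10 m after `move_vehicles(..., dt=0.5)`.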

After initializing the positions of the vehicles 200, the reinforcement learning unit 310 performs the reinforcement learning of S200 to S310 again, that is, it starts a new episode. The reinforcement learning unit 310 counts the repetitions of the episode and repeats it p times (p being a natural number of 2 or more) (NO branch of S410).

After the episode has been repeated p times (YES branch of S410), the reinforcement learning unit 310 derives a resource allocation model according to the final version of the policy (S500). For example, the reinforcement learning unit 310 may generate a resource allocation table according to the policy and output the generated table to the resource allocation unit 330.
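The nested epoch/era/episode schedule of FIG. 3 can be summarized as three loops. In this minimal sketch the callbacks `learn_epoch`, `move_vehicles`, `reset_vehicles`, and `derive_table` are hypothetical placeholders for S200 (S201 to S205), S300, S400, and S500; only the loop structure and the order of the steps follow the description above.

```python
from typing import Callable, List

def train(n: int, m: int, p: int,
          learn_epoch: Callable[[], None],      # S200: allocate (S201), measure (S203), update (S205)
          move_vehicles: Callable[[], None],    # S300: move vehicles by their driving information
          reset_vehicles: Callable[[], None],   # S400: random re-initialization of vehicle positions
          derive_table: Callable[[], List[int]]) -> List[int]:
    """Epoch/era/episode schedule of FIG. 3 (sketch; all callbacks are hypothetical)."""
    if min(n, m, p) < 2:
        raise ValueError("n, m and p must be natural numbers of 2 or more")
    for _episode in range(p):          # repeated until the YES branch of S410
        for _era in range(m):          # repeated until the YES branch of S310
            for _epoch in range(n):    # repeated until the YES branch of S210
                learn_epoch()
            move_vehicles()            # S300 after every era
        reset_vehicles()               # S400 after every episode
    return derive_table()              # S500: table from the final policy
```

With n = 3, m = 4, and p = 2, the minimum-unit learning process runs 3 × 4 × 2 = 24 times, the vehicles move 8 times, and the positions are re-initialized twice before the final table is derived.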

The resource allocation unit 330 may transmit the resource allocation table received from the reinforcement learning unit 310 to the base station 100 and the vehicles 200. The resource allocation unit 330 may receive not only the final resource allocation table from the reinforcement learning unit 310 but also the resource allocation tables generated during the reinforcement learning.

Based on the resource allocation table, the base station 100 and the vehicles 200 may each select, on their own, the channel to use for wireless communication and transmit data on it.

Because the resource allocation table is determined through this deep reinforcement learning process and then used in the wireless communication system 10, the base station 100 and the vehicles 200 can communicate without large delays, such as the time needed to search for an available channel, and the deep reinforcement learning allows efficient, collision-free channel allocation.

FIGS. 5 to 8 show the process and results of simulating the technical idea of the present invention.

FIG. 5 illustrates a virtual system for simulating the reinforcement learning process shown in FIG. 3. FIG. 6 is a graph of the change in loss in the early stage of learning during reinforcement learning, FIG. 7 is a graph of the change in loss in a later stage of learning, and FIG. 8 is a graph of the change in the discounted reward during reinforcement learning.

Referring to FIGS. 5 to 8, and as shown in FIG. 5, the simulation assumed nine base stations (MBS) and a plurality of vehicles (VUE) in the wireless communication system 10 over an area of at least 2,000 m along both the X-axis and the Y-axis. Under this setting, the epoch was repeated thousands of times, the era hundreds of times, and the episode dozens of times.

As shown in FIG. 6, in the early stage of learning (first episode, first era, first epoch), the loss, that is, the difference between the objective-function result and the actual simulation result, was relatively large and even diverged.

However, as shown in FIG. 7, in the middle of learning (first episode, 40th era, first epoch), the loss quickly converged to '0'.

In addition, as shown in FIG. 8, in the late stage of learning (after several episodes had been run), the discounted reward took high values and kept rising in any given era.

These simulation results indicate that the deep reinforcement learning according to the present invention can efficiently allocate resources in vehicle communication based on non-orthogonal multiple access.

Although the present invention has been described with reference to one embodiment shown in the drawings, this is merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

10; wireless communication system
100; base station
200; vehicle
300; resource allocation device
310; reinforcement learning unit
330; resource allocation unit

Claims

A resource allocation apparatus in non-orthogonal multiple access based vehicle communication, comprising:
a reinforcement learning unit configured to perform reinforcement learning that modifies a policy so as to increase the total data throughput between at least two vehicles and at least one base station while changing the positions of the vehicles and the channels allocated to a plurality of partitions; and
a resource allocation unit configured to generate a resource allocation table in which one of the channels is allocated to each of the partitions according to the policy determined by the reinforcement learning unit, and to transmit the resource allocation table to the vehicles and the base station.

According to claim 1,
wherein the channels are distinguished by at least one of a frequency and a signal strength.

According to claim 1,
wherein the reinforcement learning comprises:
(a) repeating, n times (n being a natural number of 2 or more), a learning process of performing an action that changes the channels allocated to the partitions, measuring the total data throughput as a reward, and modifying the policy so that the total data throughput increases;
(b) repeating step (a) m times (m being a natural number of 2 or more) while moving the positions of the vehicles according to the driving information set for each vehicle; and
(c) repeating step (b) p times (p being a natural number of 2 or more) while changing the initial positions of the vehicles.

According to claim 3,
wherein step (a) comprises:
allocating the channels to the partitions based on a resource allocation table generated according to the policy;
measuring the total data throughput by observing communication between the vehicles and the base station; and
updating weights of a neural network so that a loss, which is the difference between a result value of a predetermined objective function and the reward, is minimized.

According to claim 3,
wherein the driving information includes the traveling direction and speed of each of the vehicles.

According to claim 1,
wherein the total data throughput includes the data throughput on the uplink from each of the vehicles to another vehicle or to the base station, and the data throughput on the downlink from the base station to the vehicles.

According to claim 1,
The total data throughput is calculated by Equation 1 below,
[Equation 1]

where N is the set of channels, V is the set of vehicles, and B is the set of base stations,

is a resource allocation apparatus in non-orthogonal multiple access scheme-based vehicle communication indicating a fourth data throughput in a downlink from the j-th base station to a second vehicle through the n-th channel.

According to claim 7,
The first data throughput is calculated according to Equation 2 below,
[Equation 2]

here,

is a resource allocation apparatus in non-orthogonal multiple access scheme-based vehicle communication indicating a first signal to interference and noise ratio (SINR) value of the n-th channel between the i-th vehicle and the other vehicle.

According to claim 7,
The second data throughput is calculated according to the following Equation 3,
[Equation 3]

here,

is a resource allocation apparatus in non-orthogonal multiple access scheme-based vehicle communication indicating a second SINR value of the n-th channel between the i-th vehicle and the base station.

According to claim 7,
The third data throughput is calculated according to the following Equation 5,
[Equation 4]

here,

is a resource allocation apparatus in non-orthogonal multiple access scheme-based vehicle communication indicating a third SINR value in a downlink from the j-th base station to the first vehicle through the n-th channel.

According to claim 7,
The fourth data throughput is calculated according to Equation 6 below,
[Equation 6]

here,

is a resource allocation apparatus in non-orthogonal multiple access scheme-based vehicle communication indicating a fourth SINR value in a downlink from the j-th base station to the second vehicle through the n-th channel.

(a) repeating, n times (n being a natural number of 2 or more), a learning process of performing an action that changes the channels allocated to a plurality of partitions, measuring the total data throughput between at least two vehicles and at least one base station as a reward, and modifying a policy so that the total data throughput increases;
(b) repeating step (a) m times (m being a natural number of 2 or more) while moving the positions of the vehicles according to the driving information set for each vehicle;
(c) repeating step (b) p times (p being a natural number of 2 or more) while changing the initial positions of the vehicles; and
(d) allocating the channels to the partitions according to the policy determined through steps (a) to (c).

According to claim 12,
wherein step (a) comprises:
allocating the channels to the partitions based on a resource allocation table generated according to the policy;
measuring the total data throughput by observing communication between the vehicles and the base station; and
updating weights of a neural network so that a loss, which is the difference between a result value of a predetermined objective function and the reward, is minimized.

According to claim 12,
wherein the channels are distinguished by at least one of a frequency and a signal strength.

According to claim 12,
wherein the driving information includes the traveling direction and speed of each of the vehicles.

According to claim 12,
wherein the total data throughput includes the data throughput on the uplink from each of the vehicles to another vehicle or to the base station, and the data throughput on the downlink from the base station to the vehicles.

According to claim 12,
The total data throughput is calculated by Equation 1 below,
[Equation 1]

represents a fourth data throughput in a downlink from the j-th base station to a second vehicle through the n-th channel. A resource allocation method in non-orthogonal multiple access scheme-based vehicle communication.

According to claim 17,
The first data throughput is calculated according to Equation 2 below,
[Equation 2]

here,

is a resource allocation method in non-orthogonal multiple access scheme-based vehicle communication indicating a first signal to interference and noise ratio (SINR) value of the n-th channel between the i-th vehicle and the other vehicle.

According to claim 17,
The second data throughput is calculated according to the following Equation 3,
[Equation 3]

here,

is a resource allocation method in non-orthogonal multiple access scheme-based vehicle communication indicating the second SINR value of the n-th channel between the i-th vehicle and the base station.

According to claim 17,
The third data throughput is calculated according to the following Equation 5,
[Equation 4]

here,

is a method for allocating resources in non-orthogonal multiple access scheme-based vehicle communication indicating a third SINR value in a downlink from the j-th base station to the first vehicle through the n-th channel.

According to claim 17,
The fourth data throughput is calculated according to Equation 6 below,
[Equation 5]

here,

is a method for allocating resources in non-orthogonal multiple access scheme-based vehicle communication indicating a fourth SINR value in a downlink from the j-th base station to the second vehicle through the n-th channel.