KR102555696B1

KR102555696B1 - Device and method for allocating resource in vehicle to everything communication based on non-orthogonal multiple access

Info

Publication number: KR102555696B1
Application number: KR1020210016125A
Authority: KR
Inventors: 조성현; 이솔; 안세영
Original assignee: 한양대학교 에리카산학협력단
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2023-07-14
Also published as: KR20220112890A

Abstract

An apparatus and method for allocating resources in vehicle communication based on non-orthogonal multiple access are disclosed. The apparatus according to an embodiment of the present invention may include a reinforcement learning unit and a resource allocation unit. The reinforcement learning unit performs reinforcement learning in which a policy is modified so that the total data throughput between at least two vehicles and at least one base station increases, while the channels allocated to a plurality of sections and the positions of the vehicles are varied. The resource allocation unit generates, according to the policy determined by the reinforcement learning unit, a resource allocation table in which one of the channels is allocated to each of the sections, and transmits the resource allocation table to the vehicles and the base station.

Description

Device and method for allocating resources in vehicle communication based on non-orthogonal multiple access scheme

The present invention relates to an apparatus and method for allocating resources in vehicle communication based on non-orthogonal multiple access, and more particularly to such an apparatus and method that apply deep reinforcement learning to allocate resources efficiently without excessive delay.

Technologies such as vehicle-to-everything (V2X) communication and non-orthogonal multiple access (NOMA) are currently being commercialized.

V2X is a technology in which a moving vehicle communicates with a base station or another vehicle. It is important for V2X to maintain stable communication in the face of signal attenuation, variation, and interference, such as the multipath fading inherent in wireless communication and the Doppler spread caused by vehicle movement.

NOMA is a technology for improving resource efficiency by transmitting signals to two or more receiving terminals simultaneously over the same channel (frequency). The transmitter controls the transmit power of each of the simultaneously transmitted signals, and each receiver removes the other signals through successive interference cancellation (SIC) and decodes only its own signal.
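As an illustration of the power control and SIC described above, the following Python sketch computes achievable rates for a two-receiver downlink. The function name, the channel gains, the noise power, and the use of the Shannon rate log2(1 + SINR) are assumptions made for illustration and are not taken from this document.

```python
import math

def noma_downlink_rates(p_far, p_near, g_far, g_near, noise):
    """Two-receiver downlink NOMA rates in bit/s/Hz (illustrative only).

    p_far, p_near: transmit powers of the far and near receivers' signals.
    g_far, g_near: channel power gains (g_near > g_far is assumed).
    """
    # The far receiver decodes its own signal and treats the near signal as noise.
    sinr_far = p_far * g_far / (p_near * g_far + noise)
    # The near receiver first decodes the far signal so that it can cancel it (SIC).
    sinr_far_at_near = p_far * g_near / (p_near * g_near + noise)
    sic_ok = sinr_far_at_near >= sinr_far
    # After cancellation the near receiver sees only noise.
    sinr_near = p_near * g_near / noise
    return math.log2(1 + sinr_far), math.log2(1 + sinr_near), sic_ok

r_far, r_near, sic_ok = noma_downlink_rates(0.8, 0.2, 0.1, 1.0, 0.01)
```

Giving the weaker (far) receiver the larger share of power is what lets the near receiver decode and cancel the far signal before decoding its own.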

Recently, advances in autonomous driving and in-vehicle infotainment have caused communication traffic in vehicle communication to increase rapidly. In particular, in environments with many vehicles, such as urban areas, existing communication schemes cannot secure the required data throughput. Applying NOMA to vehicle communication is therefore being considered as a way to improve data throughput.

In this context, wireless resource allocation is very important. It assigns channels to communication devices and, in NOMA in particular, assigns the same channel to several devices and determines their transmit power.

Only when wireless resources are allocated properly can interference between devices in the system be minimized and the data throughput of the entire system be maximized by using the channels with the highest transmission efficiency.

Meanwhile, deep reinforcement learning (DRL) is a type of machine learning in the field of artificial intelligence that applies an artificial neural network to a conventional reinforcement learning algorithm. It is well suited to non-convex problems for which, because of the large number of possible cases, an optimization algorithm or mathematical approach is generally difficult to derive.

Deep reinforcement learning uses the concepts of agent, environment, state, action, and reward. The agent observes the environment to determine the current state, learns to predict how much reward a given action will yield in that state, and thereby learns an optimal policy that takes the reward-maximizing action in each state. A trained deep reinforcement learning model can, by following its policy, take the action that returns the best reward in the current state without any additional learning. This is advantageous when a system operating in real time must produce an immediate answer. In addition, because the same model can be shared among the users of the system in a distributed manner, the delay incurred when users negotiate their actions with one another can also be reduced.

Because vehicle communication is highly variable, for example due to vehicle movement, introducing NOMA poses various difficulties, above all in resource allocation. In particular, as the number of users (or vehicles) in an area increases, the resource allocation problem becomes hard to compute within a short time and the transmission efficiency requirements become hard to meet. An efficient resource allocation technique is therefore needed that satisfies the required data throughput of the system without incurring delay, even when the many variables of vehicle communication are taken into account.

The technical problem to be solved by the present invention is to provide an apparatus and method for allocating resources in NOMA-based vehicle communication that apply deep reinforcement learning to allocate resources efficiently without excessive delay.

An apparatus for allocating resources in NOMA-based vehicle communication according to an embodiment of the present invention may include a reinforcement learning unit and a resource allocation unit. The reinforcement learning unit performs reinforcement learning in which a policy is modified so that the total data throughput between at least two vehicles and at least one base station increases, while the channels allocated to a plurality of sections and the positions of the vehicles are varied. The resource allocation unit generates, according to the policy determined by the reinforcement learning unit, a resource allocation table in which one of the channels is allocated to each of the sections, and transmits the resource allocation table to the vehicles and the base station.
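A resource allocation table of this kind can be pictured as a mapping from section index to channel index. The sketch below is a minimal illustration; the `policy` callable, the function names, and the dictionary representation are hypothetical and merely stand in for the trained policy, whose actual form this document does not specify here.

```python
from typing import Callable, Dict, Sequence

def build_allocation_table(
    occupancy: Sequence[int],
    num_channels: int,
    policy: Callable[[Sequence[int], int], int],
) -> Dict[int, int]:
    """Assign exactly one channel index to every section.

    occupancy[s] is 1 if section s currently holds a vehicle, else 0.
    policy(occupancy, s) is a hypothetical stand-in for the learned policy.
    """
    table = {}
    for section in range(len(occupancy)):
        channel = policy(occupancy, section)
        if not 0 <= channel < num_channels:
            raise ValueError(f"policy returned invalid channel {channel}")
        table[section] = channel
    return table

# Placeholder policy (an assumption): cycle through the channels in section order.
def round_robin(occupancy, section, num_channels=3):
    return section % num_channels

table = build_allocation_table([1, 0, 1, 1, 0], 3, round_robin)
```

The resulting table is the object that would be sent to the vehicles and the base station; each vehicle then looks up the channel of the section it currently occupies.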

According to an embodiment, the channels may be distinguished by at least one of frequency and signal strength.

According to an embodiment, the reinforcement learning may include: step (a) of repeating, n times (n being a natural number of 2 or more), a learning process of taking an action that changes the channels allocated to the sections, measuring the total data throughput as a reward, and modifying the policy so that the total data throughput increases; step (b) of repeating step (a) m times (m being a natural number of 2 or more) while moving the positions of the vehicles according to travel information set for each vehicle; and step (c) of repeating step (b) p times (p being a natural number of 2 or more) while changing the initial positions of the vehicles.

According to an embodiment, step (a) may include: allocating the channels to the sections based on a resource allocation table generated according to the policy; observing communication between the vehicles and the base station to measure the total data throughput; and updating the weights of a neural network so as to minimize a loss, which is the difference between the output value of a predetermined objective function and the reward.
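The weight update in step (a) can be sketched as gradient descent on the gap between a predicted value and the measured reward. The example below is only a sketch under strong assumptions: a single linear layer stands in for the neural network, the loss is taken to be the squared difference, and the feature encoding and learning rate are hypothetical.

```python
def sgd_step(weights, features, reward, lr=0.01):
    """One gradient step on the squared loss (prediction - reward)^2.

    A single linear layer stands in for the neural network here.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - reward
    loss = error ** 2
    # d(loss)/d(w_k) = 2 * error * x_k
    new_weights = [w - lr * 2 * error * x for w, x in zip(weights, features)]
    return new_weights, loss

weights = [0.0, 0.0]
features = [1.0, 2.0]      # hypothetical encoding of a state and an action
losses = []
for _ in range(50):
    weights, loss = sgd_step(weights, features, reward=3.0, lr=0.05)
    losses.append(loss)
```

With these particular numbers the prediction error halves at every step, so the recorded loss shrinks monotonically toward zero.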

According to an embodiment, the travel information may include the travel direction and speed of each of the vehicles.

According to an embodiment, the total data throughput may include the data throughput in the uplink from each of the vehicles to another vehicle or to the base station and the data throughput in the downlink from the base station to the vehicles.

According to an embodiment, the total data throughput is calculated by the following equation:

[Equation]

where N is the set of channels, V is the set of vehicles, and B is the set of base stations. The terms of the equation are:

the first data throughput, in the uplink from the i-th vehicle to another vehicle over the n-th channel;

the second data throughput, in the uplink from the i-th vehicle to the base station over the n-th channel;

the third data throughput, in the downlink from the j-th base station to a first vehicle over the n-th channel; and

the fourth data throughput, in the downlink from the j-th base station to a second vehicle over the n-th channel.

According to an embodiment, the first data throughput is calculated by the following equation:

[Equation]

where the terms are a value indicating whether the n-th channel is allocated to the i-th vehicle, and the first signal-to-interference-plus-noise ratio (SINR) of the n-th channel between the i-th vehicle and the other vehicle.

According to an embodiment, the second data throughput is calculated by the following equation:

[Equation]

where the SINR term is the second SINR of the n-th channel between the i-th vehicle and the base station.

The third data throughput is calculated by the following equation:

[Equation]

where the terms are a value indicating whether the n-th channel is allocated to the j-th base station, and the third SINR in the downlink from the j-th base station to the first vehicle over the n-th channel.

According to an embodiment, the fourth data throughput is calculated by the following equation:

[Equation]

where the SINR term is the fourth SINR in the downlink from the j-th base station to the second vehicle over the n-th channel.

A method for allocating resources in NOMA-based vehicle communication according to an embodiment of the present invention may include: step (a) of repeating, n times (n being a natural number of 2 or more), a learning process of taking an action that changes the channels allocated to a plurality of sections, measuring the total data throughput between at least two vehicles and at least one base station as a reward, and modifying a policy so that the total data throughput increases; step (b) of repeating step (a) m times (m being a natural number of 2 or more) while moving the positions of the vehicles according to travel information set for each vehicle; step (c) of repeating step (b) p times (p being a natural number of 2 or more) while changing the initial positions of the vehicles; and step (d) of allocating the channels to the sections according to the policy determined through steps (a) to (c).

The apparatus and method for allocating resources in NOMA-based vehicle communication according to an embodiment of the present invention can allocate resources efficiently without excessive delay by applying deep reinforcement learning.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention can be more fully understood.
FIG. 1 is a conceptual diagram illustrating a wireless communication system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a resource allocation apparatus according to an embodiment of the present invention.
FIG. 3 is a flow chart showing the reinforcement learning process of the reinforcement learning unit shown in FIG. 2.
FIG. 4 is a flow chart showing step S200 of FIG. 3 in more detail.
FIG. 5 is a diagram illustrating a virtual system for simulating the reinforcement learning process shown in FIG. 3.
FIG. 6 is a graph showing the change in loss at an early stage of learning while reinforcement learning is performed.
FIG. 7 is a graph showing the change in loss at a late stage of learning while reinforcement learning is performed.
FIG. 8 is a graph showing the change in discounted reward while reinforcement learning is performed.

The specific structural and functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of describing those embodiments. Embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described in this specification.

Because embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended, however, to limit the embodiments to the specific forms disclosed; the embodiments include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening component is present. Other expressions describing relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" and "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, the present invention is described in detail by describing preferred embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram illustrating a wireless communication system according to an embodiment of the present invention.

Referring to FIG. 1, the wireless communication system 10 includes a base station 100 and vehicles 200-1 to 200-4 (hereinafter collectively denoted 200). Although FIG. 1 shows one base station 100 and four vehicles 200-1 to 200-4, the technical idea of the present invention is not limited thereto. That is, the wireless communication system 10 may include at least one base station and at least two vehicles.

The base station 100 and the vehicles 200-1 to 200-4 may communicate with one another using non-orthogonal multiple access.

For example, the base station 100 may transmit signals to the first vehicle 200-1 and the second vehicle 200-2 at the same frequency but with different signal strengths (downlink channel NOMA1). In doing so, the base station 100 may adjust the strength of the signals transmitted to the first vehicle 200-1 and the second vehicle 200-2 according to the distance from the base station 100 to each of them. It may be desirable to select the first vehicle 200-1 and the second vehicle 200-2 so that the difference between the distance from the base station 100 to the first vehicle 200-1 and the distance from the base station 100 to the second vehicle 200-2 is large. The base station 100 may determine the communication range of the downlink channel NOMA1 based on whichever of the first vehicle 200-1 and the second vehicle 200-2 is farther from the base station 100.
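The pairing preference described above, choosing two vehicles whose distances from the base station differ as much as possible, can be sketched as follows. The function names, the coordinates, and the fixed 80/20 power split are assumptions for illustration only; the document does not state how the signal strengths are actually chosen.

```python
import math
from itertools import combinations

def pick_noma_pair(base_station, vehicles):
    """Return the two vehicle ids whose distances to the base station differ most."""
    dist = {vid: math.dist(base_station, pos) for vid, pos in vehicles.items()}
    return max(combinations(sorted(dist), 2),
               key=lambda pair: abs(dist[pair[0]] - dist[pair[1]]))

def split_power(total_power, d_a, d_b, far_share=0.8):
    """Give the larger share of power to the more distant receiver.

    The fixed 80/20 split is an arbitrary illustrative choice.
    """
    far, near = total_power * far_share, total_power * (1 - far_share)
    return (far, near) if d_a >= d_b else (near, far)

vehicles = {"200-1": (10.0, 0.0), "200-2": (100.0, 0.0), "200-3": (40.0, 0.0)}
pair = pick_noma_pair((0.0, 0.0), vehicles)
powers = split_power(1.0, 10.0, 100.0)
```

A large distance gap tends to produce a large gap in received power, which is what makes the two superimposed signals separable at the receivers.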

Likewise, the third vehicle 200-3 may transmit signals to the base station 100 and the fourth vehicle 200-4 at the same frequency but with different signal strengths (uplink channel NOMA2). In doing so, the third vehicle 200-3 may adjust the strength of the signals transmitted to the base station 100 and the fourth vehicle 200-4 according to the distance from the third vehicle 200-3 to each of them. It may be desirable to select the fourth vehicle 200-4 so that the difference between the distance from the third vehicle 200-3 to the base station 100 and the distance from the third vehicle 200-3 to the fourth vehicle 200-4 is large. The third vehicle 200-3 may determine the communication range of the uplink channel NOMA2 based on whichever of the base station 100 and the fourth vehicle 200-4 is farther from the third vehicle 200-3.

In the present invention, the area corresponding to the road is divided into a plurality of sections HF. The sections HF may be defined in a preset shape, most simply as rectangles.

In FIG. 1, the road is shown as a two-lane road with one lane in each direction, and the sections HF are defined so as to span the full width of both lanes, but the technical idea of the present invention is not limited thereto. For example, each of the sections HF may be set to cover only one lane.

In addition, a resource for wireless communication, for example a frequency, is allocated to each of the sections HF. It is desirable for each section HF to contain only one communication device, for example one vehicle 200, because if several vehicles were present in a section they would be allocated the same frequency and a frequency collision could occur. The size of the sections HF may be defined with this in mind. That is, it may be desirable to set the size of each section HF smaller than the size of a vehicle 200.
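Assuming rectangular sections laid out on a regular grid, the mapping from a vehicle position to its section, and a check for sections holding more than one vehicle, might look like the following sketch. The grid layout, the function names, and the dimensions used in the example are hypothetical.

```python
from collections import Counter

def section_index(x, y, section_len, section_wid, cols):
    """Map a road coordinate to the index of the rectangular section containing it."""
    col = int(x // section_len)
    row = int(y // section_wid)
    return row * cols + col

def crowded_sections(positions, section_len, section_wid, cols):
    """Return sections holding more than one vehicle, where a channel would be shared."""
    counts = Counter(section_index(x, y, section_len, section_wid, cols)
                     for x, y in positions)
    return sorted(s for s, c in counts.items() if c > 1)

# Three vehicles on a grid with 10 sections per row; two of them share section 0.
shared = crowded_sections([(1.0, 1.0), (2.0, 1.0), (12.0, 1.0)], 5.0, 3.5, 10)
```

Shrinking `section_len` and `section_wid` makes such shared sections less likely, which is the motivation for keeping the sections small.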

When the wireless communication system 10 contains only a small number of base stations 100 and vehicles 200, as shown in FIG. 1, channels can be allocated to the base stations 100 and the vehicles 200 in various ways without significant delay. When there are many base stations 100 and vehicles 200, however, channels cannot be allocated appropriately by a predetermined algorithm.

The resource allocation apparatus 300 according to an embodiment of the present invention allocates the resources to be used by the base station 100 and the vehicles 200 in the wireless communication system 10 according to a policy determined through deep reinforcement learning.

The resource allocation apparatus 300 is a kind of computing system and may perform the resource allocation method according to an embodiment of the present invention. That is, the resource allocation method of the present invention may be performed by a computer.

The resource allocation apparatus 300 may be provided in a computing system connected to the base station 100. For example, the functions of the resource allocation apparatus 300 may be implemented in the base station 100 itself or in a server or the like connected to the base station 100.

In this specification, a channel may mean a resource for wireless communication that is distinguished by at least one of frequency and signal strength.

The structure and operation of the resource allocation apparatus 300 are described in more detail with reference to FIGS. 2 and 3.

FIG. 2 is a block diagram illustrating a resource allocation apparatus according to an embodiment of the present invention, and FIG. 3 is a flow chart showing the reinforcement learning process of the reinforcement learning unit shown in FIG. 2.

Referring to FIGS. 2 and 3, the resource allocation apparatus 300 may include a reinforcement learning unit 310 and a resource allocation unit 330.

The reinforcement learning unit 310 may perform reinforcement learning in which the policy is modified in the direction that increases the total data throughput between the vehicles 200 and the base station 100, while the channels allocated to the sections HF and the positions of the vehicles 200 are varied.

In the reinforcement learning, the environment may be defined by the X-axis and Y-axis lengths of the wireless communication system 10, the length, width, and position (coordinates) of the road, the attributes of the wireless links (uplink and downlink), the travel speed of the vehicles 200, and the like. The action may be defined as allocating a channel to each of the sections HF. The reward may be defined as the total data throughput of the wireless communication system 10. The policy may be defined as the criterion (a neural network) for performing optimal resource allocation in a given state. The loss may be defined as the difference between the reward expected when the agent takes a given action in a given state and the reward measured when that action is actually taken.

The reinforcement learning unit 310 may perform reinforcement learning by repeating a minimum-unit learning process at three nested levels.

본 명세서에는, 설명의 편의를 위해, 반복 단위를 작은 순서대로 에포크(epoch), 에라(ear) 및 에피소드(episode)라고 정의한다. 상위 단계는 조건을 변경하면서 하위 단계를 설정된 횟수만큼 반복하는 것을 의미할 수 있다. 예를 들어, 에포크는 상기 최소 단위의 학습 과정을 의미하고, 1 에라는 상기 에포크가 n번(n은 2 이상의 자연수) 반복되는 것을 의미하고, 1 에피소드는 에라가 m번(m은 2 이상의 자연수) 반복되는 것을 의미한다. 에피소드는 p번(p는 2 이상의 자연수) 반복된다.In this specification, for convenience of description, repetition units are defined as epoch, ear, and episode in a small order. The upper step may mean repeating the lower step a set number of times while changing the condition. For example, epoch means the learning process of the minimum unit, 1 epoch means that the epoch is repeated n times (n is a natural number of 2 or more), and 1 episode means erra is m times (m is a natural number of 2 or more). ) means repeating. The episode is repeated p times (p is a natural number greater than or equal to 2).

As preprocessing for the deep reinforcement learning, the reinforcement learning unit 310 may define the segments HF, define the number of repetitions of each learning level (that is, of the epoch, the era, and the episode), or set the positions of the base station 100 and the vehicles 200.

Steps S100 to S500 of FIG. 3 may be performed by the reinforcement learning unit 310, specifically by an agent of the reinforcement learning unit. In this specification, for convenience of description, the agent is referred to simply as the reinforcement learning unit 310.

As the first step of the deep reinforcement learning, the reinforcement learning unit 310 may observe the state of the wireless communication system 10 (S100). For example, the reinforcement learning unit 310 may observe the positions of the base station 100 and the vehicles 200 in the wireless communication system 10.

As the minimum-unit learning process, the reinforcement learning unit 310 may modify the policy so as to increase the total data throughput between the base station 100 and the vehicles 200 while changing the channels allocated to the segments HF on the basis of the observed state (S200). In other words, based on the observed state, the reinforcement learning unit 310 takes the action of changing the channels allocated to the segments HF over a predetermined number of learning iterations, measures the total data throughput as the reward, and updates the policy so as to maximize the reward and minimize the loss.

The process by which the reinforcement learning unit 310 performs the minimum-unit learning process is described in more detail with reference to FIG. 4.

FIG. 4 is a flowchart showing step S200 of FIG. 3 in more detail.

Referring to FIG. 4, the reinforcement learning unit 310 may generate a resource allocation table according to the policy and allocate channels to the segments HF based on the generated table (S201).

The resource allocation table may be defined as a two-dimensional array whose indices are the numbers of the segments HF and whose elements are the channel numbers to be used in the respective segments HF.

Allocating exactly one channel to each segment may be most preferable in terms of reducing system complexity. When actual wireless communication is performed after the reinforcement learning, a vehicle 200 can determine its channel from its own position and communicate without any additional confirmation or verification process only if one channel is allocated to each segment.
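As a minimal sketch of how such a table could be consulted, assume the area is divided into a uniform grid of square segments. The segment size, the table contents, and the function name `channel_for_position` are hypothetical illustrations and are not taken from the specification. A vehicle maps its own coordinates to a segment and reads the channel number directly from the table.

```python
# Hypothetical illustration: the grid geometry and all names are assumptions.
SEGMENT_SIZE_M = 100.0  # assumed side length of one square segment, in meters

# Two-dimensional resource allocation table: table[row][col] holds the
# channel number assigned to the segment at that grid position.
allocation_table = [
    [0, 1, 2],
    [1, 2, 0],
    [2, 0, 1],
]


def channel_for_position(x_m, y_m, table=allocation_table):
    """Return the channel of the segment that contains the point (x_m, y_m)."""
    col = int(x_m // SEGMENT_SIZE_M)
    row = int(y_m // SEGMENT_SIZE_M)
    if not (0 <= row < len(table) and 0 <= col < len(table[0])):
        raise ValueError("position lies outside the segmented area")
    return table[row][col]
```

Because each segment holds exactly one channel, the lookup needs no negotiation step: the position alone determines the channel.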

The reinforcement learning unit 310 may measure the total data throughput in the wireless communication system 10 by observing the communication between the vehicles 200 and the base station 100 (S203).

The total data throughput may include the uplink data throughput from any one of the vehicles 200 to another of the vehicles 200 or to the base station 100, and the downlink data throughput from the base station 100 to the vehicles 200.

The total data throughput may be calculated by Equation 1 below, which may serve as the objective function in the deep reinforcement learning.

[Equation 1]

In Equation 1, N denotes the set of channels, V denotes the set of vehicles 200, and B denotes the set of base stations 100. The four throughput terms of Equation 1 are as follows.

The first data throughput is the uplink throughput from the i-th vehicle among the vehicles 200 to another vehicle among the vehicles 200 through the n-th channel among the channels.

The second data throughput is the uplink throughput from the i-th vehicle to the corresponding base station among a plurality of base stations through the n-th channel.

The third data throughput is the downlink throughput from the j-th base station among the plurality of base stations to a first vehicle among the vehicles 200 through the n-th channel.

The fourth data throughput is the downlink throughput from the j-th base station to a second vehicle among the vehicles 200 through the n-th channel.
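The aggregation described for Equation 1 can be sketched as follows. The exact form of Equation 1 is not reproduced in this text, so the sketch assumes a plain sum of the four throughput terms over channels, vehicles, and base stations. The function name and the nested-list layout (`r1[n][i]`, `r3[n][j]`) are hypothetical.

```python
def total_throughput(r1, r2, r3, r4):
    """Sum the four throughput terms (assumed form of the objective).

    r1[n][i], r2[n][i]: uplink throughput of vehicle i on channel n
                        (vehicle-to-vehicle and vehicle-to-base-station).
    r3[n][j], r4[n][j]: downlink throughput of base station j on channel n
                        (to its first and second vehicle).
    """
    uplink = sum(
        r1[n][i] + r2[n][i]
        for n in range(len(r1))
        for i in range(len(r1[n]))
    )
    downlink = sum(
        r3[n][j] + r4[n][j]
        for n in range(len(r3))
        for j in range(len(r3[n]))
    )
    return uplink + downlink
```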

The first data throughput may be calculated according to Equation 2 below.

[Equation 2]

Equation 2 uses an allocation indicator that specifies whether the n-th channel is allocated to the i-th vehicle. A value of '1' indicates that the n-th channel is allocated to the i-th vehicle, and a value of '0' indicates that it is not. Equation 2 also uses the first SINR (signal to interference and noise ratio) value of the n-th channel between the i-th vehicle and the other vehicle.
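The role of the allocation indicator can be illustrated with a short sketch. The form of Equation 2 is not reproduced in this text; the code assumes a Shannon-type rate with normalized bandwidth, log2(1 + SINR), gated by the 0/1 indicator. That rate formula and the name `link_throughput` are assumptions, not the specification's definition.

```python
import math


def link_throughput(allocated, sinr):
    """Per-link throughput under an assumed Shannon-type rate.

    allocated: 1 if the channel is allocated to the transmitter, else 0.
    sinr: linear (not dB) signal to interference and noise ratio.
    """
    if allocated not in (0, 1):
        raise ValueError("allocation indicator must be 0 or 1")
    if sinr < 0:
        raise ValueError("SINR must be non-negative")
    # An unallocated channel contributes nothing to the throughput.
    return allocated * math.log2(1.0 + sinr)
```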

The second data throughput may be calculated according to Equation 3 below.

[Equation 3]

In Equation 3, the allocation indicator specifies whether the n-th channel is allocated to the i-th vehicle, and the second SINR value is the SINR of the n-th channel between the i-th vehicle and the corresponding base station among the plurality of base stations.

The third data throughput may be calculated according to Equation 5 below.

[Equation 5]

In Equation 5, the allocation indicator specifies whether the n-th channel is allocated to the j-th base station, and the third SINR value is the SINR on the downlink from the j-th base station to the first vehicle through the n-th channel.

The fourth data throughput may be calculated according to Equation 6 below.

[Equation 6]

In Equation 6, the fourth SINR value is the SINR on the downlink from the j-th base station to the second vehicle through the n-th channel.

The first SINR value may be calculated according to Equation 7 below.

[Equation 7]

The terms of Equation 7 are as follows. The signal term is the strength (power) of the uplink signal transmitted from the i-th vehicle to the other vehicle. The index s denotes the s-th vehicle among the vehicles 200, and the associated allocation indicator specifies whether the n-th channel is allocated to the s-th vehicle. Two power terms for the s-th vehicle give the strength of the signal that the s-th vehicle transmits to the specific vehicle with which it communicates by NOMA, and the strength of the signal that it transmits to the base station with which it communicates by NOMA. The index t denotes the t-th base station among the plurality of base stations 100, and the associated allocation indicator specifies whether the n-th channel is allocated to the t-th base station. Two power terms for the t-th base station give the strength of the signal that it transmits to one (the nearer) of the vehicles with which it communicates by NOMA, and the strength of the signal that it transmits to the other (the farther) of those vehicles. The remaining term is the noise power constant.
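A generic SINR computation built from the quantities listed for Equation 7 might look like the sketch below. Equation 7 itself is not reproduced in this text, so the code assumes the usual ratio of received signal power to the sum of co-channel interference and noise. The function name and the tuple layout of `interferers` are hypothetical.

```python
def sinr(signal_power, signal_gain, interferers, noise_power):
    """Signal to interference and noise ratio on one channel (assumed form).

    signal_power: transmit power of the desired signal.
    signal_gain: channel gain between the desired transmitter and receiver.
    interferers: iterable of (allocated, tx_power, gain) tuples, one per
        other vehicle or base station; `allocated` is the 0/1 indicator of
        whether that transmitter uses the same channel.
    noise_power: noise power constant, must be positive.
    """
    if noise_power <= 0:
        raise ValueError("noise power must be positive")
    # Only transmitters allocated to this channel contribute interference.
    interference = sum(a * p * g for a, p, g in interferers)
    return signal_power * signal_gain / (interference + noise_power)
```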

H denotes the channel gain between a transmitter and a receiver. With channel index k, transmitter index l, and receiver index m, H denotes the channel gain between the l-th transmitter and the m-th receiver through the k-th channel, and is defined by Equation 8 below.

[Equation 8]

The terms of Equation 8 are as follows. The fast fading component between the l-th transmitter and the m-th receiver on the k-th channel follows a complex Gaussian distribution. The log-normal shadowing random variable between the l-th transmitter and the m-th receiver has a given standard deviation. The remaining terms are the path-loss constant between the l-th transmitter and the m-th receiver, the distance between the l-th transmitter and the m-th receiver, and the path-loss exponent of the l-th transmitter.
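The ingredients of Equation 8 can be combined as in the following sketch. The equation itself is not reproduced in this text, so the product form below (squared fading magnitude, times shadowing converted from dB, times path-loss constant, times distance raised to the negative exponent) is an assumption based on common channel models. The unit-variance fading, the dB shadowing convention, and all function names are likewise assumptions.

```python
import random


def channel_gain(fading, shadowing_db, path_loss_const, distance_m, exponent):
    """Channel gain under an assumed multiplicative model."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    shadowing = 10.0 ** (shadowing_db / 10.0)  # dB to linear scale
    return abs(fading) ** 2 * shadowing * path_loss_const * distance_m ** (-exponent)


def sample_channel_gain(rng, shadowing_std_db, path_loss_const, distance_m, exponent):
    """Draw one random gain; rng is a random.Random instance."""
    # Assumed CN(0, 1) fading: real and imaginary parts each have variance 1/2.
    std = 0.5 ** 0.5
    fading = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
    # Log-normal shadowing: Gaussian in dB with the given standard deviation.
    shadowing_db = rng.gauss(0.0, shadowing_std_db)
    return channel_gain(fading, shadowing_db, path_loss_const, distance_m, exponent)
```

Separating the deterministic formula from the random draw keeps the model easy to check: with fading and shadowing fixed, the gain falls monotonically with distance.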

The second SINR value may be calculated according to Equation 9 below.

[Equation 9]

Equation 9 is substantially the same as Equation 7 except that three of its symbols are replaced, so the overlapping description is omitted. The signal term of Equation 9 is the strength of the uplink signal that the i-th vehicle transmits to the base station.

The third SINR value may be calculated according to Equation 10 below.

[Equation 10]

Equation 10 likewise differs from Equation 7 only in the replacement of three symbols. The signal term of Equation 10 is the strength of the downlink signal that the j-th base station transmits to the first vehicle.

The fourth SINR value may be calculated according to Equation 11 below.

[Equation 11]

Equation 11 likewise differs from Equation 7 only in the replacement of three symbols. The signal term of Equation 11 is the strength of the downlink signal that the j-th base station transmits to the second vehicle.

The reinforcement learning unit 310 may update the weights of the neural network so that the loss is minimized (S205).

Specifically, the reinforcement learning unit 310 may change the weights of the neural network so as to minimize the loss, which is the difference between the value calculated by the objective function and the value actually simulated or measured.
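The weight update can be illustrated with a deliberately small stand-in. The specification uses a neural network and describes the loss only as a difference; the sketch below substitutes a linear predictor and a squared difference so that one gradient step fits in a few lines. The model, the squared form, the learning rate, and the function names are all assumptions.

```python
def predict(weights, features):
    """Linear stand-in for the network's predicted reward."""
    return sum(w * x for w, x in zip(weights, features))


def squared_loss(predicted, measured):
    """Squared difference between predicted and measured reward."""
    return (predicted - measured) ** 2


def gradient_step(weights, features, measured, lr=0.1):
    """Return weights after one gradient-descent step on the squared loss."""
    error = predict(weights, features) - measured
    # d(loss)/d(w_k) = 2 * error * x_k for the linear predictor.
    return [w - lr * 2.0 * error * x for w, x in zip(weights, features)]
```

Repeating such steps drives the predicted reward toward the measured throughput, which is the sense in which the loss is minimized.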

Steps S201 to S205 shown in FIG. 4 constitute the minimum-unit learning process of the present invention.

Referring again to FIGS. 2 and 3, the reinforcement learning unit 310 counts the repetitions of the epoch (S200), which is the minimum-unit learning process, and repeats it n times (n is a natural number of 2 or more) (NO branch of S210).

When the epoch has been repeated n times and one era ends (YES branch of S210), the reinforcement learning unit 310 may move the positions of the vehicles 200 according to the driving information set for each of the vehicles 200 (S300).

Depending on the embodiment, the driving information may include the traveling direction and speed of each of the vehicles 200.
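Moving a vehicle according to its driving information can be sketched as a simple kinematic update. The `Vehicle` fields, the use of radians for the traveling direction, and the time step `dt_s` are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    x: float            # position along the X axis, in meters
    y: float            # position along the Y axis, in meters
    heading_rad: float  # traveling direction, in radians from the X axis
    speed_mps: float    # speed, in meters per second


def move(vehicle, dt_s):
    """Return the vehicle advanced by dt_s seconds along its heading."""
    return Vehicle(
        x=vehicle.x + vehicle.speed_mps * math.cos(vehicle.heading_rad) * dt_s,
        y=vehicle.y + vehicle.speed_mps * math.sin(vehicle.heading_rad) * dt_s,
        heading_rad=vehicle.heading_rad,
        speed_mps=vehicle.speed_mps,
    )
```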

After moving the vehicles 200, the reinforcement learning unit 310 starts the reinforcement learning of S200 to S210 again, that is, a new era. The reinforcement learning unit 310 counts the repetitions of the era and repeats it m times (m is a natural number of 2 or more) (NO branch of S310).

When the era has been repeated m times and one episode ends (YES branch of S310), the reinforcement learning unit 310 initializes the positions of the vehicles 200. Here, initialization means determining the positions of the vehicles 200 at random, and may include adding or removing vehicles 200.

After initializing the positions of the vehicles 200, the reinforcement learning unit 310 starts the reinforcement learning of S200 to S310 again, that is, a new episode. The reinforcement learning unit 310 counts the repetitions of the episode and repeats it p times (p is a natural number of 2 or more) (NO branch of S410).

After the episode has been repeated p times (YES branch of S410), the reinforcement learning unit 310 derives a resource allocation model according to the final version of the policy (S500). For example, the reinforcement learning unit 310 may generate a resource allocation table according to the policy and output the generated table to the resource allocation unit 330.
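The three nested repetition levels can be summarized in a short control-flow sketch. The three callbacks are hypothetical placeholders for the minimum-unit learning process (S200), the vehicle movement (S300), and the random re-initialization of vehicle positions; only the loop structure is taken from the description.

```python
def run_training(n, m, p, learn_step, move_vehicles, reset_vehicles):
    """Run p episodes of m eras of n epochs; return the number of epochs run."""
    for _episode in range(p):
        for _era in range(m):
            for _epoch in range(n):
                learn_step()       # one minimum-unit learning process
            move_vehicles()        # an era has ended: advance the vehicles
        reset_vehicles()           # an episode has ended: randomize positions
    return n * m * p
```

For example, with n = 3, m = 2, and p = 2 the learning step runs 12 times, the vehicles are moved 4 times, and their positions are re-initialized twice.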

The resource allocation unit 330 may transmit the resource allocation table received from the reinforcement learning unit 310 to the base station 100 and the vehicles 200. The resource allocation unit 330 may receive not only the final resource allocation table from the reinforcement learning unit 310 but also the resource allocation tables produced during the reinforcement learning.

The base station 100 and the vehicles 200 may each select, based on the resource allocation table, the channel to use for wireless communication and transmit data on it.

Because the resource allocation table is determined through this deep reinforcement learning process and then used in the wireless communication system 10, the base station 100 and the vehicles 200 can communicate wirelessly without large delays such as the time spent searching for an available channel, and, owing to the characteristics of deep reinforcement learning, highly efficient and collision-free channel allocation is possible.

FIGS. 5 to 8 show the process and results of simulating the technical idea of the present invention.

FIG. 5 is a diagram showing a virtual system for simulating the reinforcement learning process shown in FIG. 3, FIG. 6 is a graph showing the change in loss at an early stage of learning while the reinforcement learning is performed, FIG. 7 is a graph showing the change in loss at a later stage of learning, and FIG. 8 is a graph showing the change in the discounted reward while the reinforcement learning is performed.

Referring to FIGS. 5 to 8, and as shown in FIG. 5, the wireless communication system 10 was set up with nine base stations (MBS) and a plurality of vehicles (VUE) in an area extending 2,000 m or more along both the X axis and the Y axis. Under this setup, the simulation repeated the epoch thousands of times, the era hundreds of times, and the episode tens of times.

As shown in FIG. 6, at the beginning of learning (first episode, first era, first epoch), the loss, which is the difference between the result of the objective function and the actual simulation result, was relatively large and even diverged.

However, as shown in FIG. 7, midway through learning (first episode, 40th era, first epoch), the loss converged quickly to '0'.

In addition, as shown in FIG. 8, in the latter part of learning (after several episodes), the discounted reward had a high value and rose continuously in any given era.
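The specification does not define the discounted reward plotted in FIG. 8. The sketch below uses the standard reinforcement-learning definition, in which the reward received k steps in the future is weighted by the k-th power of a discount factor. The discount factor value and the function name are assumptions.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last reward backward so each step is one multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```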

These simulation results show that resource allocation in vehicle communication based on a non-orthogonal multiple access scheme can be performed efficiently through the deep reinforcement learning according to the present invention.

The present invention has been described with reference to an embodiment shown in the drawings, but this is merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent other embodiments are possible therefrom. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

10; wireless communication system
100; base station
200; vehicle
300; resource allocation device
310; reinforcement learning unit
330; resource allocation unit

Claims

An apparatus for allocating resources in vehicle communication based on a non-orthogonal multiple access scheme, comprising:
a reinforcement learning unit that performs reinforcement learning for modifying a policy so as to increase the total data throughput between at least two vehicles and at least one base station while changing channels allocated to a plurality of segments and positions of the vehicles; and
a resource allocation unit that generates, according to the policy determined by the reinforcement learning unit, a resource allocation table in which one of the channels is allocated to each of the segments, and transmits the resource allocation table to the vehicles and the base station,
wherein the reinforcement learning comprises:
(a) repeating n times (n is a natural number of 2 or more) a learning process of performing an action of changing the channels allocated to the segments, measuring the total data throughput as a reward, and modifying the policy so that the total data throughput increases;
(b) repeating step (a) m times (m is a natural number of 2 or more) while moving the positions of the vehicles according to driving information set for each vehicle; and
(c) repeating step (b) p times (p is a natural number of 2 or more) while changing the initial positions of the vehicles,
wherein the driving information includes the traveling direction and speed of each of the vehicles.

delete

The apparatus of claim 1,
wherein step (a) comprises:
allocating the channels to the segments based on a resource allocation table generated according to the policy;
measuring the total data throughput by observing communication between the vehicles and the base station; and
updating weights of a neural network so that a loss, which is the difference between a result value of a predetermined objective function and the reward, is minimized.

delete

The apparatus of claim 1,
wherein the total data throughput includes the uplink data throughput from each of the vehicles to another vehicle or to the base station and the downlink data throughput from the base station to the vehicles.

According to claim 1,
The total data throughput is calculated by Equation 1 below,
[Equation 1]

Here, N is the set of channels, V is the set of vehicles, B is the set of base stations,

represents a fourth data throughput in downlink from the j-th base station to the second vehicle through the n-th channel.

According to claim 7,
The first data throughput is calculated according to Equation 2 below,
[Equation 2]

here,

According to claim 7,
The second data throughput is calculated according to Equation 3 below,
[Equation 3]

here,

Is a second SINR value of the n-th channel between the i-th vehicle and the base station.

According to claim 7,
The third data throughput is calculated according to Equation 5 below,
[Equation 4]

here,

represents a third SINR value in downlink from the j-th base station to the first vehicle through the n-th channel.

According to claim 7,
The fourth data throughput is calculated according to Equation 6 below,
[Equation 6]

here,

represents a fourth SINR value in a downlink from the j-th base station to the second vehicle through the n-th channel.

A method of allocating resources in vehicle communication based on a non-orthogonal multiple access scheme, comprising:
(a) repeating n times (n is a natural number of 2 or more) a learning process of performing an action of changing channels allocated to a plurality of segments, measuring the total data throughput between at least two vehicles and at least one base station as a reward, and modifying a policy so that the total data throughput increases;
(b) repeating step (a) m times (m is a natural number of 2 or more) while moving the positions of the vehicles according to driving information set for each vehicle;
(c) repeating step (b) p times (p is a natural number of 2 or more) while changing the initial positions of the vehicles; and
(d) allocating the channels to the segments according to the policy determined through steps (a) to (c),
wherein the driving information includes the traveling direction and speed of each of the vehicles.

The method of claim 12,
wherein step (a) comprises:
allocating the channels to the segments based on a resource allocation table generated according to the policy;
measuring the total data throughput by observing communication between the vehicles and the base station; and
updating weights of a neural network so that a loss, which is the difference between a result value of a predetermined objective function and the reward, is minimized.

delete

The method of claim 12,
wherein the total data throughput includes the uplink data throughput from each of the vehicles to another vehicle or to the base station and the downlink data throughput from the base station to the vehicles.

According to claim 12,
The total data throughput is calculated by Equation 1 below,
[Equation 1]

represents a fourth data throughput in a downlink from the j-th base station to the second vehicle through the n-th channel.

According to claim 17,
The first data throughput is calculated according to Equation 2 below,
[Equation 2]

here,

the SINR term represents a first signal to interference and noise ratio (SINR) value of the n-th channel between the i-th vehicle and the other vehicle, in the method of allocating resources in vehicle communication based on a non-orthogonal multiple access scheme.

According to claim 17,
The second data throughput is calculated according to Equation 3 below,
[Equation 3]

here,

the SINR term represents a second SINR value of the n-th channel between the i-th vehicle and the base station, in the method of allocating resources in vehicle communication based on a non-orthogonal multiple access scheme.

According to claim 17,
The third data throughput is calculated according to Equation 5 below,
[Equation 4]

here,

represents a third SINR value in a downlink from the j-th base station to the first vehicle through the n-th channel.

According to claim 17,
The fourth data throughput is calculated according to Equation 6 below,
[Equation 5]

here,

represents a fourth SINR value in downlink from the j-th base station to the second vehicle through the n-th channel.