KR20220087717A

KR20220087717A - Method and device for controlling resource reallocation in vehicle network

Info

Publication number: KR20220087717A
Application number: KR1020200178041A
Authority: KR
Inventors: 김성륜; 오승은; 이지훈
Original assignee: 연세대학교 산학협력단
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2022-06-27
Also published as: KR102433577B1

Abstract

The disclosed technology relates to a method and apparatus for controlling resource reallocation in a vehicle network. The method comprises: receiving, by a device, location information, allocated resource information, and a reward for a plurality of vehicles through a vehicle network; calculating, by the device, a first probability of resource reallocation for the vehicles based on the location information and the reward, and calculating a second probability of the resource reallocation by inputting the resource information and the reward into a reinforcement learning model; and determining, by the device, a policy of the vehicle network based on the calculated first probability and second probability.

Description

Method and device for controlling resource reallocation of vehicle network {METHOD AND DEVICE FOR CONTROLLING RESOURCE REALLOCATION IN VEHICLE NETWORK}

The disclosed technology relates to a method and apparatus for controlling resource reallocation in a vehicle network.

Representative standards for vehicle networks using the conventional sub-6 GHz band include C-V2X and IEEE 802.11p. The C-V2X standard includes C-V2X mode 4, which is based on autonomous scheduling, and C-V2X mode 3, in which the network controls scheduling.

In C-V2X mode 4, once a vehicle has transmitted a certain number of packets, it decides, based on sensing information, whether to keep using its existing resource block or to switch to a different resource block for communication. C-V2X mode 3 is a technique in which the base station directly allocates a resource block to each vehicle based on the information it has sensed.

Meanwhile, as the autonomous driving level of vehicles rises and vehicle-to-vehicle applications diversify, the need for MAC research that can satisfy higher data rates is growing. Accordingly, research on directional transmission in the above-6 GHz band, that is, using millimeter waves, is being actively pursued.

C-V2X mode 4 does not use the existing infrastructure, so its resource selection is less efficient than that of infrastructure-based techniques, but it has the advantage of incurring no additional latency from passing through the infrastructure. C-V2X mode 3, by contrast, incurs additional latency but gains data throughput through efficient resource selection.

As operation has recently moved from the sub-6 GHz band to the above-6 GHz band, C-V2X mode 3 allocates resources based on information sensed by the base station, so the data rate can drop when all vehicles use directional antennas. C-V2X mode 4, on the other hand, shows no significant difference from the conventional sub-6 GHz band even when millimeter waves are used.

Korean Patent Publication No. 10-2020-0096096

The disclosed technology aims to provide a method and apparatus for controlling resource reallocation in a vehicle network.

To achieve the above technical object, a first aspect of the disclosed technology provides a method for controlling resource reallocation in a vehicle network, the method comprising: receiving, by a device, location information, allocated resource information, and a reward for a plurality of vehicles through a vehicle network; calculating, by the device, a first probability of resource reallocation for the vehicles based on the location information and the reward, and calculating a second probability of the resource reallocation by inputting the resource information and the reward into a reinforcement learning model; and determining, by the device, a policy of the vehicle network based on the calculated first probability and second probability.

To achieve the above technical object, a second aspect of the disclosed technology provides an apparatus for controlling resource reallocation in a vehicle network, the apparatus comprising: an antenna that receives location information, allocated resource information, and a reward for a plurality of vehicles; a memory that stores a plurality of reinforcement learning models for determining a policy for the plurality of vehicles; and a processor that calculates a first probability of resource reallocation for the plurality of vehicles by inputting the location information and the reward into a first reinforcement learning model among the plurality of reinforcement learning models, calculates a second probability of the resource reallocation by inputting the resource information and the reward into a second reinforcement learning model among the plurality of reinforcement learning models, and determines a policy of the vehicle network based on the calculated first probability and second probability.

Embodiments of the disclosed technology may have effects including the following advantages. However, this does not mean that every embodiment must include all of them, so the scope of the disclosed technology should not be understood as being limited by them.

According to an embodiment of the disclosed technology, the method and apparatus for controlling resource reallocation in a vehicle network calculate the reallocation possibility and probability of the resource blocks allocated to vehicles through a plurality of reinforcement learning models, and thereby determine the vehicle network policy with high reliability.

In addition, a high data rate is maintained by taking the communication environment of the vehicle network into account.

In addition, using critic models stored in a distributed manner reduces the amount of data transmitted.

FIG. 1 is a diagram illustrating a resource reallocation control process of a vehicle network according to an embodiment of the disclosed technology.
FIG. 2 is a flowchart of a method for controlling resource reallocation of a vehicle network according to an embodiment of the disclosed technology.
FIG. 3 is a block diagram of an apparatus for controlling resource reallocation of a vehicle network according to an embodiment of the disclosed technology.
FIG. 4 is a diagram illustrating the use of a reinforcement learning model of a device according to an embodiment of the disclosed technology.
FIG. 5 is a diagram illustrating the use of a reinforcement learning model distributed over a plurality of vehicles according to an embodiment of the disclosed technology.
FIG. 6 is a diagram illustrating the determination of a resource block candidate group.

The present invention can be modified in various ways and can have various embodiments, so specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms, which are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" covers a combination of a plurality of related listed items or any one of the plurality of related listed items.

As used herein, a singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "comprise" indicate that a stated feature, number, step, operation, component, part, or combination thereof is present, and should be understood as not excluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the components in this specification are divided merely according to the main function each component is responsible for. That is, two or more of the components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function.

In addition, each component described below may, besides its own main function, additionally perform some or all of the functions handled by other components, and some of the main functions of a component may of course be carried out exclusively by another component. Accordingly, the presence or absence of each component described in this specification should be interpreted functionally.

FIG. 1 is a diagram illustrating a resource reallocation control process of a vehicle network according to an embodiment of the disclosed technology. Referring to FIG. 1, the nodes of the vehicle network include a plurality of vehicles and a base station. For example, vehicle A, vehicle B, and vehicle C can exchange sensing data with one another over the network, and they can transmit sensing data to the base station as well as to other vehicles. The basic network structure may take the form of mode 3 of Cellular-V2X (C-V2X), while the data framework processed internally may be based on C-V2X mode 4. In other words, the network may be a hybrid of C-V2X mode 3 and mode 4. The base station is equipped with a device for reallocating resources to each vehicle. The device 110 stores a reinforcement learning model that calculates the resource reallocation probability.

Meanwhile, the device that controls resource reallocation may use not only a terrestrial base station but also a roadside unit (RSU) installed in roadside infrastructure. The device receives location information, allocated resource information, and a reward for a plurality of vehicles through the vehicle network. The location information of a vehicle refers to a value sensed by an antenna or sensor mounted in the vehicle. For example, when a plurality of vehicles pass through a specific section, the sensing results collected within the detection radius of the sensor mounted on each vehicle body can be used as location information. The resource information refers to information on the resource block currently allocated to the vehicle and the size of that resource block. The reward refers to the reward for the environment of the vehicle network. In other words, the location information, allocated resource information, and reward for the plurality of vehicles can be used as parameters for determining the policy of the vehicle network.
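As a minimal sketch of the data involved, the three network parameters described above could be grouped per vehicle as shown below. The class name, field names, and field types are assumptions made for illustration; the source does not define a message format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NetworkParameters:
    """Hypothetical per-vehicle report; names and types are illustrative only."""
    vehicle_id: str
    location: Tuple[float, float]   # sensed position, e.g. (x, y) in metres (assumed)
    resource_block: int             # index of the currently allocated resource block
    resource_block_size: int        # size of that resource block (unit assumed)
    reward: float                   # reward observed from the network environment


def first_probability_inputs(p: NetworkParameters):
    # The first probability is computed from location information and reward.
    return p.location, p.reward


def second_probability_inputs(p: NetworkParameters):
    # The second probability is computed from resource information and reward.
    return p.resource_block, p.resource_block_size, p.reward
```

Splitting the inputs this way mirrors the text: location and reward feed the first calculation, while resource information and reward feed the reinforcement learning model.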

Meanwhile, the device can use the received network parameters to calculate a resource reallocation probability for each vehicle. Here, the resource of a vehicle means a resource block used for communication. The device can calculate whether to keep the resource blocks previously allocated to the vehicles or to allocate new resource blocks. In doing so, it does not simply calculate the reallocation probability with a reinforcement learning model; it can also use the result of estimating the likelihood of resource reallocation from the vehicle location information. In one embodiment, a first probability of resource reallocation for a vehicle can be calculated based on the location information and reward of the vehicle. A second probability of resource reallocation can be calculated by inputting the resource information and reward into a reinforcement learning model. The policy of the vehicle network can then be determined based on the calculated first and second probabilities.

Meanwhile, the device includes a reinforcement learning model for calculating the resource reallocation probability. The reinforcement learning model may use an actor-critic network. Depending on the communication state of the vehicle network, the device may use its own critic network or the critic networks stored in the vehicles. That is, the critic network can be stored in a distributed manner across the vehicles and the device. For example, if the communication environment of the vehicle network is good, the critic network stored in the device can be used, and if the communication environment is poor, the device can receive the output values of the critic networks mounted in the vehicles. The former is a model centralized in the device, and the latter is a model decentralized across the vehicles and the device. Whether the communication environment of the vehicle network is good or poor can be judged by the likelihood of data interference caused by many vehicles clustering around a specific vehicle. For example, if the interference in the communication environment is at or above a threshold, the critic networks stored in the vehicles are used, and if the interference is below the threshold, the critic network stored in the device is used. The accuracy of the policy decision can therefore be maintained by adjusting the data rate according to the communication environment.
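The threshold rule above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the function name, the scalar interference measure, and the representation of critics as plain callables are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def critic_values(
    interference: float,
    threshold: float,
    states: Sequence[Sequence[float]],
    device_critic: Callable[[Sequence[float]], float],
    vehicle_reported_values: Sequence[float],
) -> Tuple[str, List[float]]:
    """Choose where critic values come from, following the threshold rule.

    interference >= threshold: decentralized, use values computed by the
    critic networks on the vehicles and reported to the device.
    interference <  threshold: centralized, evaluate the device's own critic.
    """
    if interference >= threshold:
        return "decentralized", list(vehicle_reported_values)
    return "centralized", [device_critic(s) for s in states]
```

In the decentralized branch only one scalar per vehicle needs to be sent, which is consistent with the stated goal of reducing the amount of data transmitted when the channel is congested.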

Meanwhile, as described above, the plurality of vehicles and the device use C-V2X (Cellular Vehicle-to-Everything) based communication. The vehicles can transmit location information, allocated resource information, and rewards to the device as network parameters over the C-V2X uplink, and the device can transmit the policy over the C-V2X downlink.

Meanwhile, based on the calculated resource reallocation probabilities, if the device determines that the reallocation probability of the current resource block is high, it maintains the current allocation; if it determines that the probability is low, it selects one resource block at random from a plurality of resource block candidates as the new resource block for the vehicle. The former case corresponds to a prediction that the data rate will not degrade, so the current state can be kept as is. When the next network parameters are received, the calculation described above is performed again, and if the resource reallocation probability has dropped by then, the resource block can be updated.
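The keep-or-reselect decision can be sketched as below, following the convention stated in the text that a high probability keeps the current block and a low probability triggers a random pick. The numeric cut-off `keep_threshold` is an assumption; the source only distinguishes "high" from "low".

```python
import random
from typing import Optional, Sequence


def next_resource_block(
    current_block: int,
    reallocation_probability: float,
    candidates: Sequence[int],
    keep_threshold: float = 0.5,          # assumed cut-off between "high" and "low"
    rng: Optional[random.Random] = None,
) -> int:
    """Return the block a vehicle should use for the next period."""
    rng = rng or random.Random()
    # High probability (or nothing to choose from): keep the current allocation.
    if reallocation_probability >= keep_threshold or not candidates:
        return current_block
    # Low probability: pick one candidate uniformly at random.
    return rng.choice(list(candidates))
```

Passing a seeded `random.Random` makes the random pick reproducible in tests.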

Meanwhile, when the device calculates that the resource reallocation probability of a vehicle is low, it can determine a new resource block based on the RSSI values measured over the preceding 1000 ms of the sensing window. For example, the RSSI values can be averaged per resource block, and the resource blocks whose averages fall in the lowest 20% can be taken as the resource block candidate group. One resource block chosen at random from this group becomes the resource block that is reallocated to the vehicle.
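The candidate selection described above can be sketched as follows. The sample layout `(timestamp_ms, resource_block, rssi_dbm)`, the function names, and the rounding rule for the 20% cut (round up, at least one block) are assumptions for illustration.

```python
import math
import random
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[int, int, float]  # (timestamp_ms, resource_block, rssi_dbm), assumed layout


def candidate_blocks(samples: Iterable[Sample], now_ms: int,
                     window_ms: int = 1000, fraction: float = 0.2) -> List[int]:
    """Blocks whose mean RSSI over the last window_ms is in the lowest fraction."""
    per_block = defaultdict(list)
    for timestamp_ms, block, rssi in samples:
        if now_ms - window_ms <= timestamp_ms <= now_ms:  # drop stale samples
            per_block[block].append(rssi)
    averages = {block: mean(values) for block, values in per_block.items()}
    ranked = sorted(averages, key=lambda block: averages[block])  # lowest RSSI first
    if not ranked:
        return []
    count = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:count]


def pick_new_block(samples: Iterable[Sample], now_ms: int,
                   rng: Optional[random.Random] = None) -> Optional[int]:
    """Pick one candidate uniformly at random, or None if nothing was measured."""
    candidates = candidate_blocks(samples, now_ms)
    if not candidates:
        return None
    return (rng or random.Random()).choice(candidates)
```

A low average RSSI on a block suggests that few nearby transmitters are using it, which is why the lowest-ranked blocks form the candidate group.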

Meanwhile, as described above, the device determines the policy based on the calculated first and second probabilities of resource reallocation for the vehicle. In one embodiment, each of the first-probability result and the second-probability result is multiplied by a weight, and the average of the weighted values is taken as the policy of the vehicle network. While the reinforcement learning model has not yet been sufficiently trained, the reliability of the result can be raised by adjusting the weights applied to the calculation results. For example, early in training the weight on the first-probability result can be set high, and late in training the weight on the second-probability result can be set high. Thus, network reliability can be improved by adjusting the amount of data transmitted according to the communication state of the vehicle network, and the unstable outputs of reinforcement learning can be compensated for by adjusting the weights.

FIG. 2 is a flowchart of a method for controlling resource reallocation of a vehicle network according to an embodiment of the disclosed technology. Referring to FIG. 2, the method 200 for controlling resource reallocation of a vehicle network includes receiving vehicle network parameters (210), calculating a first probability and a second probability of resource reallocation for the vehicles (220), and determining a policy of the vehicle network (230). The steps of the method 200 can be performed sequentially by the device.

In step 210, the device receives network parameters for a plurality of vehicles through the vehicle network. The network parameters include the location information of each vehicle, the resource information allocated to the vehicle, and the reward.

In step 220, the device calculates a first probability of resource reallocation for a vehicle based on the vehicle location information and reward included in the network parameters. It also calculates a second probability of resource reallocation by inputting the resource information and reward into a reinforcement learning model. Calculating the first probability means estimating whether resources should be reallocated, taking into account where the vehicle will move next based on its sensing data. Calculating the second probability means computing the probability of continuing to use the currently allocated resource block versus allocating a different one. The device can calculate the second probability using the reinforcement learning model, and a reinforcement learning model can of course also be used to calculate the first probability.

In step 230, the device determines the policy of the vehicle network based on the calculated first and second probabilities. The policy can be determined as the weighted average of the two results. For example, the first-probability result is multiplied by one weight, the second-probability result is multiplied by a different weight, and the average of the two products gives the policy. The relative size of the weights can vary with how far the reinforcement learning model has been trained. For example, early in training the weight on the first-probability result can be set high, and late in training the weight on the second-probability result can be set high.
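Step 230 can be sketched as below. The source states only that the weight shifts from the first probability toward the second as training progresses; the linear schedule, the normalization of the two weights to sum to 1, and the function names are assumptions made for this illustration.

```python
def first_probability_weight(step: int, total_steps: int) -> float:
    """Weight on the first probability: 1.0 at the start of training, 0.0 at the end.

    A linear decay is assumed here; the source does not specify the schedule.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress


def combine(p_first: float, p_second: float, w_first: float) -> float:
    """Weighted average of the two probabilities, with weights summing to 1."""
    if not 0.0 <= w_first <= 1.0:
        raise ValueError("w_first must lie in [0, 1]")
    return w_first * p_first + (1.0 - w_first) * p_second
```

For instance, with `p_first = 0.8` and `p_second = 0.2`, the combined value starts at 0.8 when training begins and moves toward 0.2 as the reinforcement learning model's output is trusted more.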

FIG. 3 is a block diagram of an apparatus for controlling resource reallocation of a vehicle network according to an embodiment of the disclosed technology. Referring to FIG. 3, the apparatus 300 for controlling resource reallocation of a vehicle network includes an antenna 310, a memory 320, and a processor 330.

The antenna 310 receives location information, allocated resource information, and a reward for a plurality of vehicles. The antenna is a device capable of communicating with the directional antennas mounted on the vehicles. The antenna 310 may be an antenna mounted on a base station, or a frequency sensor may be used.

The memory 320 stores a plurality of reinforcement learning models for determining a policy for the plurality of vehicles. The memory 320 is implemented as a device with a given amount of storage space, such as a hard disk, and has enough capacity to store the reinforcement learning models and the network parameters.

The processor 330 calculates a first probability of resource reallocation for the plurality of vehicles by inputting the location information and reward into a first reinforcement learning model among the plurality of reinforcement learning models. It calculates a second probability of resource reallocation by inputting the resource information and reward into a second reinforcement learning model among the plurality of reinforcement learning models. Different reinforcement learning models can be used for the two calculations. For example, a model such as a DQN (Deep Q-Network) can be used to calculate the first probability, although an actor-critic network can also be used, as with the second reinforcement learning model. Once the two reinforcement learning models have produced their outputs, the processor 330 calculates their weighted average and determines it as the policy of the network.

Meanwhile, the resource reallocation control apparatus 300 described above may be implemented as a program (or application) containing an executable algorithm that can run on a device such as a computer. The program may be provided stored in a transitory or non-transitory computer-readable medium.

A non-transitory readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a machine. Specifically, the various applications or programs described above may be provided stored in a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.

A transitory readable medium refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

FIG. 4 illustrates the use of a reinforcement learning model held by the device according to an embodiment of the disclosed technology. Referring to FIG. 4, a vehicle network may be formed from a plurality of vehicles and a roadside base station, and the data framework of this vehicle network may follow the C-V2X mode 4 structure. The network parameters transmitted over the uplink may vary with the communication state of the vehicle network. In FIG. 4 the communication state of the vehicle network is good, so the location information, resource information, and reward can be transmitted over the uplink as the network parameters required to run the A2C algorithm.

FIG. 5 illustrates the use of a reinforcement learning model distributed across a plurality of vehicles according to an embodiment of the disclosed technology. FIG. 4 assumed a good communication state in the vehicle network, whereas FIG. 5 assumes an uneven one. In this case, the critic network is distributed to the vehicles, and the value output by each vehicle's critic network is received as a network parameter. That is, depending on the communication state, the reinforcement learning model may be concentrated in the device or distributed between the device and the vehicles.

In addition, in FIG. 5 each vehicle has its own critic model. On the premise that the device's estimation technique is not used, the vehicles therefore do not exchange location information or rewards. Instead, each vehicle uploads the value (Value) output by its critic network together with the resource information (Rsc Info), which reduces the communication load. With this distributed A2C technique, the amount of information uploaded can vary with communication conditions such as the channel environment of each vehicle. The estimation technique may still be applied to those vehicles whose communication environment is good.
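The difference between the uplink payloads of FIG. 4 and FIG. 5 can be sketched as follows. This is an illustrative sketch only: the `UplinkReport` type, its field names, and the boolean `link_is_good` flag are hypothetical, and the source does not say how the communication state is assessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UplinkReport:
    """Hypothetical container for the network parameters a vehicle uploads."""
    rsc_info: int                                   # allocated resource information
    position: Optional[Tuple[float, float]] = None  # location information
    reward: Optional[float] = None
    critic_value: Optional[float] = None            # output of the on-vehicle critic


def build_report(link_is_good: bool, rsc_info: int,
                 position: Tuple[float, float], reward: float,
                 critic_value: float) -> UplinkReport:
    """Choose the uplink payload according to the communication state."""
    if link_is_good:
        # Centralized case (FIG. 4): location, resource info, and reward.
        return UplinkReport(rsc_info=rsc_info, position=position, reward=reward)
    # Distributed case (FIG. 5): only the critic value and resource info.
    return UplinkReport(rsc_info=rsc_info, critic_value=critic_value)


if __name__ == "__main__":
    print(build_report(True, 3, (12.0, 4.5), 0.7, 0.9))
    print(build_report(False, 3, (12.0, 4.5), 0.7, 0.9))
```

In the distributed case the report carries two fields instead of three, which corresponds to the reduced communication load described above.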

Meanwhile, the two techniques can be combined into a single expression, as in Equation 1 below.

[Equation 1]

For example, by changing more frequently the resource blocks of vehicles that have many other vehicles nearby, and are therefore more likely to cause or suffer interference, relatively high performance can be achieved even in the early stage of learning.
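One way to read this location-based estimation is as a mapping from local vehicle density to a reallocation probability. The sketch below is a hypothetical heuristic, not the disclosed estimator: the neighbor radius, the linear mapping, and the function names are all assumptions. The bounds 0.2 and 1.0 are chosen to match a reallocation probability of 1-P with P in [0, 0.8].

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float]


def neighbor_counts(positions: Sequence[Position], radius: float) -> List[int]:
    """Count, for each vehicle, the other vehicles within `radius`."""
    counts = []
    for i, (xi, yi) in enumerate(positions):
        n = sum(
            1
            for j, (xj, yj) in enumerate(positions)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        )
        counts.append(n)
    return counts


def estimated_realloc_probability(positions: Sequence[Position], radius: float,
                                  p_min: float = 0.2,
                                  p_max: float = 1.0) -> List[float]:
    """Assumed linear mapping: more neighbors, higher reallocation probability."""
    counts = neighbor_counts(positions, radius)
    busiest = max(counts, default=0)
    if busiest == 0:
        # No vehicle has a neighbor, so all get the minimum probability.
        return [p_min] * len(positions)
    return [p_min + (p_max - p_min) * c / busiest for c in counts]


if __name__ == "__main__":
    cars = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (100.0, 0.0)]
    print(estimated_realloc_probability(cars, radius=1.5))
```

The isolated vehicle receives the minimum probability, while the vehicle with the most neighbors is reallocated most often.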

FIG. 6 illustrates how the resource block candidate group is determined. Referring to FIG. 6, the basic data framework of the vehicle network is based on the C-V2X mode 4 framework. While sensing and transmitting data, each vehicle enters the selection window once it has transmitted a certain number of packets. According to the resource reallocation probability (1-P, with P in [0, 0.8]), it then decides whether to keep the resource block it has been using or to be reallocated a block chosen at random from the new candidate group. At this point, the received signal strength indication (RSSI) values measured in the sensing window, going back up to 1000 ms, are averaged per resource block, and the blocks in the lowest 20% are designated as the new resource block candidate group. One of these, chosen at random, becomes the new resource block.
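The selection step of FIG. 6 can be sketched as follows, assuming the RSSI samples from the sensing window are already grouped by resource block. The function names, the dictionary layout of `rssi_history`, and the treatment of P as a keep probability are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def candidate_blocks(rssi_history: Dict[int, Sequence[float]],
                     fraction: float = 0.2) -> List[int]:
    """Return the resource blocks whose average RSSI is in the lowest `fraction`."""
    averages = {rb: mean(s) for rb, s in rssi_history.items() if s}
    ranked = sorted(averages, key=averages.get)  # lowest average RSSI first
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def select_block(current_rb: int, rssi_history: Dict[int, Sequence[float]],
                 keep_prob: float, rng=random) -> int:
    """Keep the current block with probability P, else pick a random candidate."""
    if not 0.0 <= keep_prob <= 0.8:
        raise ValueError("P must lie in [0, 0.8]")
    if rng.random() < keep_prob:
        return current_rb
    candidates = candidate_blocks(rssi_history)
    # With no sensing data there is nothing to reselect from (assumed fallback).
    return rng.choice(candidates) if candidates else current_rb


if __name__ == "__main__":
    # Ten blocks; block i has an average RSSI of (-100 + i) dBm.
    history = {rb: [-100.0 + rb, -100.0 + rb] for rb in range(10)}
    print(candidate_blocks(history))
    print(select_block(5, history, keep_prob=0.4))
```

A low average RSSI on a resource block indicates little observed interference, which is why the lowest 20% are treated as candidates.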

The method and apparatus for controlling resource reallocation in a vehicle network according to an embodiment of the disclosed technology have been described with reference to the embodiments shown in the drawings to aid understanding. These embodiments are merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. Accordingly, the true technical scope of protection of the disclosed technology should be defined by the appended claims.

Claims

A method for controlling resource reallocation in a vehicle network, the method comprising:
receiving, by a device, location information, allocated resource information, and a reward for a plurality of vehicles through a vehicle network;
calculating, by the device, a first probability for resource reallocation of the vehicles based on the location information and the reward, and calculating a second probability for the resource reallocation by inputting the resource information and the reward to a reinforcement learning model; and
determining, by the device, a policy of the vehicle network based on the calculation results for the first probability and the second probability.

The method of claim 1,
wherein the reinforcement learning model includes an actor network and a critic network.

The method of claim 1,
wherein the reinforcement learning model includes an actor network and operates by receiving a value output from a critic network of the vehicle.

The method of claim 2,
wherein the critic network is stored in each of the plurality of vehicles and the device, the critic network stored in the plurality of vehicles is used when the interference of the communication environment is greater than or equal to a threshold value, and the critic network stored in the device is used when the interference is below the threshold value.

The method of claim 1,
wherein the plurality of vehicles and the device use C-V2X (Cellular Vehicle-to-Everything) based communication,
the plurality of vehicles transmit the location information, the allocated resource information, and the reward to the device as network parameters through the uplink of the C-V2X, and
the device transmits the policy through the downlink of the C-V2X.

The method of claim 1,
wherein, when the plurality of vehicles pass through a specific section, a sensing result collected within the detection radius of a sensor mounted on the vehicle body is transmitted to the device as the location information.

The method of claim 1,
wherein the device is a road side unit (RSU) installed in a base station on the ground or in infrastructure on the road.

The method of claim 1, wherein calculating the reallocation probability comprises:
maintaining, by the device, the currently allocated state when the reallocation probability of the current resource block is calculated to be high, and determining a randomly selected one of a plurality of resource block candidate groups as a new resource block for the vehicle when the reallocation probability is calculated to be low.

The method of claim 8,
wherein the device calculates, for each resource block, an average of the RSSI values measured within a sensing window going back up to 1000 ms, and determines the resource blocks in the lowest 20% as the plurality of resource block candidate groups.

The method of claim 1,
wherein the device determines the policy by calculating a weighted average of the estimation result and the calculation result, and
the weight for the estimation result is set to a high value in the early stage of learning and the weight for the calculation result is set to a high value in the later stage of learning.

An apparatus for controlling resource reallocation in a vehicle network, the apparatus comprising:
an antenna for receiving location information, allocated resource information, and a reward for a plurality of vehicles;
a memory for storing a plurality of reinforcement learning models used to determine a policy for the plurality of vehicles; and
a processor configured to calculate a first probability for resource reallocation of the plurality of vehicles by inputting the location information and the reward to a first reinforcement learning model among the plurality of reinforcement learning models, to calculate a second probability for the resource reallocation by inputting the resource information and the reward to a second reinforcement learning model among the plurality of reinforcement learning models, and to determine a policy of the vehicle network based on the calculation results for the first probability and the second probability.

The apparatus of claim 11,
wherein the second reinforcement learning model includes an actor network and a critic network.

The apparatus of claim 11,
wherein the second reinforcement learning model includes an actor network and operates by receiving a value output from a critic network of the plurality of vehicles.

The apparatus of claim 12,
wherein the critic network is stored in each of the plurality of vehicles and the memory, the critic network stored in the plurality of vehicles is used when the interference of the communication environment is greater than or equal to a threshold value, and the critic network stored in the memory is used when the interference is below the threshold value.

The apparatus of claim 11,
wherein the processor maintains the currently allocated resource block when the resource reallocation probability is calculated to be high, and, when the resource reallocation probability is calculated to be low, determines a randomly selected one of a plurality of resource block candidate groups as a new resource block for the plurality of vehicles.

The apparatus of claim 15,
wherein the processor calculates, for each resource block, an average of the RSSI values measured within a sensing window going back up to 1000 ms, and determines the resource blocks in the lowest 20% as the plurality of resource block candidate groups.