KR102257536B1

KR102257536B1 - A Method and Apparatus for Distributed Congestion Control of VANET using Reinforcement Learning Based on Neural Network Model

Info

Publication number: KR102257536B1
Application number: KR1020190174166A
Authority: KR
Inventors: 문철
Original assignee: 한국교통대학교산학협력단
Priority date: 2019-12-02
Filing date: 2019-12-24
Publication date: 2021-05-31

Abstract

The present invention relates to a method and an apparatus for distributed congestion control of a VANET using a reinforcement learning-based neural network model. The distributed congestion control method according to an embodiment of the present invention comprises the steps of: (a) calculating at least one of a channel busy ratio (CBR) of a wireless channel for a V2X communication service, and a channel occupancy ratio (CR), a moving speed, and a transmission power of a vehicle; and (b) applying at least one of the CBR of the wireless channel for the V2X communication service, and the CR, the moving speed, and the transmission power of the vehicle to the reinforcement learning-based neural network model to determine the maximum CR and the maximum transmission power of the vehicle. The present invention enables a V2X terminal to optimally adapt its own radio resource allocation.

Description

{A Method and Apparatus for Distributed Congestion Control of VANET using Reinforcement Learning Based on Neural Network Model}

The present invention relates to a vehicular ad-hoc network (VANET), and more particularly, to a method and apparatus for distributed congestion control of a VANET using a reinforcement learning-based neural network model.

To support inter-vehicle collision avoidance (CA) safety services and autonomous driving, the on-board unit (OBU), which is the V2X (Vehicle-to-Everything) communication terminal of each vehicle, broadcasts messages containing various information to many surrounding objects (vehicles, pedestrians, roadside units (RSUs), and so on) at the same time.

To transmit messages, each vehicle performs contention-based channel access. As the number of contending vehicles increases, the probability that transmitted messages collide rises, and CA safety services or autonomous driving support services become more likely to fail.

Therefore, when channel access contention intensifies as the number of vehicles grows, each vehicle can lower its own message transmit rate, which reduces the total number of messages generated per second and thus the probability of collisions between messages. Alternatively, it can lower its transmission power, which shrinks the spatial range over which its messages are received and thus the number of contending vehicles. This technique is called Distributed Congestion Control (DCC). It has been standardized for V2X communication based on the IEEE 802.11p physical layer and MAC (Medium Access Control) layer, and standardization for 3GPP C-V2X communication is in progress.

In conventional DCC, each vehicle measures the channel busy ratio (CBR), a parameter representing the channel load (CL), that is, the total amount of messages generated by the vehicles contending for channel access. Each vehicle then adjusts its own channel occupancy ratio (CR), the share of channel capacity it uses, according to its measured CBR, so that the total channel load generated by the contending vehicles is controlled.

As shown in FIG. 1, the prior art maps the interval to which the CBR measured in each vehicle belongs to the maximum amount of radio resources that the vehicle may use (CR_limit). The mapping table between CBR values and the corresponding CR_limit values in FIG. 1 is derived by the following process.

First, the V2X receiver of each vehicle measures the CBR as the fraction of all radio resources available in the frequency and time dimensions whose RSSI (Received Signal Strength Indicator), which indicates received signal strength, exceeds a predetermined threshold.
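The CBR measurement described above can be illustrated with a minimal sketch. It assumes the receiver exposes RSSI samples as a time-by-frequency grid in dBm; the function name `channel_busy_ratio`, the grid values, and the -94 dBm threshold are hypothetical choices for illustration and do not come from the patent.

```python
def channel_busy_ratio(rssi_dbm, threshold_dbm=-94.0):
    """Fraction of observed time/frequency resources whose RSSI exceeds the threshold.

    rssi_dbm: list of rows (time slots), each a list of per-subchannel RSSI values.
    threshold_dbm: hypothetical busy threshold; the patent only says "predetermined".
    """
    samples = [r for row in rssi_dbm for r in row]
    if not samples:
        raise ValueError("no RSSI samples")
    busy = sum(1 for r in samples if r > threshold_dbm)
    return busy / len(samples)


# Hypothetical measurement window: 2 time slots x 4 subchannels.
grid = [
    [-100.0, -90.0, -85.0, -101.0],
    [-99.0, -97.0, -80.0, -96.0],
]
cbr = channel_busy_ratio(grid)  # 3 of the 8 resources exceed -94 dBm
```

A flat ratio over the whole window is the simplest reading of the text; a real receiver would typically compute it over a sliding window.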

Using the measured CBR, the number of contending vehicles, N_sta, is estimated as in <Equation 1> below.

N_sta = f(CBR)    <Equation 1>

Because the channel usage indicated by the CBR is proportional to the number of vehicles performing V2X communication over the radio resources, N_sta can be expressed as a function f() that takes the CBR as its input.

The maximum channel capacity that each vehicle may use, CR_limit, is obtained by dividing CR_total_limit, the maximum total amount of radio resources, by N_sta, as in <Equation 2> below.

CR_limit = CR_total_limit / N_sta    <Equation 2>
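As a rough illustration of the two-step derivation (estimate N_sta from the CBR, then divide the total budget by it), the sketch below uses a hypothetical linear f() in which every vehicle is assumed to contribute the same share of the CBR. The patent does not specify f(); the names `estimate_n_sta`, `cr_limit`, `load_per_vehicle`, and the default values are assumptions made for this example only.

```python
def estimate_n_sta(cbr, load_per_vehicle=0.01):
    """Hypothetical f(): assume each contending vehicle adds a fixed share to the CBR."""
    return max(1, round(cbr / load_per_vehicle))


def cr_limit(cbr, cr_total_limit=0.6, load_per_vehicle=0.01):
    """Per-vehicle limit: the total budget divided by the estimated number of vehicles."""
    return cr_total_limit / estimate_n_sta(cbr, load_per_vehicle)


# With a measured CBR of 0.5 and the assumed 1% load per vehicle,
# about 50 vehicles are estimated to share the total budget of 0.6.
n_sta = estimate_n_sta(0.5)
limit = cr_limit(0.5)
```

The sketch makes the weakness discussed in the text concrete: the result depends entirely on the single assumed `load_per_vehicle`, which in practice changes with road geometry, speed, and service QoS.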

Therefore, the step that most strongly determines the accuracy of the prior art in deriving CR_limit, the maximum channel capacity each vehicle may use, from the measured CBR is the determination of N_sta from the measured CBR in <Equation 1>.

However, the function f() that takes the CBR and determines N_sta varies with the road geometry (intersection, straight section, road width), the vehicle speed, the amount of radio resources required by the QoS demanded by the V2X service, and so on.

Because the prior art defines only a single f(), it cannot determine a CR_limit that suits diverse road environments, vehicle driving states, and the differing QoS of V2X services. Moreover, adapting the prior art to diverse road environments, driving states, and so on would require defining countless functions f() and corresponding mapping tables for each environment, which makes practical application infeasible.

[Patent Document 1] Korean Patent Registration No. 10-2002807

The present invention has been made to solve the problems described above, and its object is to provide a method and apparatus for distributed congestion control of a VANET using a reinforcement learning-based neural network model.

Another object of the present invention is to provide a distributed congestion control method and apparatus in which a vehicle V2X terminal, using a reinforcement learning-based neural network, optimally adapts its own radio resource allocation in a VANET in consideration of diverse road environments, vehicle driving states, and the differing QoS of various V2X communication services.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood from the description below.

To achieve the above objects, a method for distributed congestion control of a VANET using a reinforcement learning-based neural network model may include: (a) calculating at least one of a channel busy ratio (CBR) of a wireless channel for a V2X communication service, and a channel occupancy ratio (CR), a moving speed, and a transmission power of a vehicle; and (b) applying at least one of the CBR of the wireless channel for the V2X communication service, and the CR, the moving speed, and the transmission power of the vehicle to the reinforcement learning-based neural network model to determine a maximum channel occupancy ratio and a maximum transmission power of the vehicle.

In an embodiment, step (b) may include calculating an output value of the reinforcement learning-based neural network model by applying at least one of the CBR of the wireless channel for the V2X communication service, and the CR, the moving speed, and the transmission power of the vehicle to the model.

In an embodiment, step (b) may include determining the maximum channel occupancy ratio and the maximum transmission power of the vehicle according to the output value of the reinforcement learning-based neural network model.

In an embodiment, step (b) may include: determining, as an immediate reward value, how close the CBR obtained under the vehicle's maximum channel occupancy ratio and maximum transmission power is to a target CBR; and determining, as a future reward value, the degree to which the target values of throughput and latency are achieved.

In an embodiment, step (b) may include training the reinforcement learning-based neural network model using a current total reward value that combines the immediate reward value and the future reward value.

In an embodiment, step (b) may include: determining whether the difference between the current total reward value and the immediately preceding total reward value is greater than a threshold; and, when the difference is greater than the threshold, transmitting a message for the V2X communication service according to the maximum channel occupancy ratio and the maximum transmission power of the vehicle.

In an embodiment, an apparatus for distributed congestion control of a VANET using a reinforcement learning-based neural network model may include a control unit that determines at least one of a channel busy ratio (CBR) of a wireless channel for a V2X communication service, and a channel occupancy ratio (CR), a moving speed, and a transmission power of a vehicle, and that applies at least one of the CBR, the CR, the moving speed, and the transmission power to the reinforcement learning-based neural network model to determine a maximum channel occupancy ratio and a maximum transmission power of the vehicle.

In an embodiment, the control unit may calculate an output value of the reinforcement learning-based neural network model by applying at least one of the CBR of the wireless channel for the V2X communication service, and the CR, the moving speed, and the transmission power of the vehicle to the model.

In an embodiment, the control unit may determine the maximum channel occupancy ratio and the maximum transmission power of the vehicle according to the output value of the reinforcement learning-based neural network model.

In an embodiment, the control unit may determine, as an immediate reward value, how close the CBR is to a target CBR, and may determine, as a future reward value, the degree to which the target values of throughput and latency are achieved.

In an embodiment, the control unit may train the reinforcement learning-based neural network model using a current total reward value that combines the immediate reward value and the future reward value.

In an embodiment, the control unit may determine whether the difference between the current total reward value and the immediately preceding total reward value is greater than a threshold, and the apparatus may further include a communication unit that, when the difference is greater than the threshold, transmits a message for the V2X communication service according to the maximum channel occupancy ratio and the maximum transmission power of the vehicle.

A reinforcement learning-based distributed congestion control method for a VANET according to an embodiment of the present invention may estimate the channel busy ratio CBR at the vehicle V2X terminal; estimate the vehicle's current radio resource occupancy CR, moving speed, transmission power, and V2X service QoS; and, using a reinforcement learning-based neural network that takes the CBR, CR, moving speed, transmission power, and V2X service QoS as inputs, determine the maximum radio resource occupancy CR_limit and the maximum transmission power P_limit that the vehicle may use.

A VANET distributed congestion control apparatus using a reinforcement learning-based neural network according to an embodiment of the present invention may include: a storage device that stores the parameters of the neural network and, for later training, the actions taken and the resulting rewards; a data collection device that estimates the CBR of the current channel and the ego vehicle's current radio resource occupancy CR, transmission power P, moving speed v, and V2X service QoS; and a control device that, using a reinforcement learning-based neural network taking the CBR, CR, moving speed, transmission power, and V2X service QoS as inputs, determines the maximum radio resource occupancy CR_limit and the maximum transmission power P_limit that the vehicle may use.

Specific details for achieving the above objects will become clear with reference to the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be configured in various different forms. The embodiments are provided so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the invention pertains (hereinafter, "a person skilled in the art") is fully informed of the scope of the invention.

According to an embodiment of the present invention, a vehicle V2X terminal in a VANET can optimally adapt its own radio resource allocation in consideration of diverse road environments, vehicle driving states, and the differing QoS of various V2X communication services.

The effects of the present invention are not limited to those described above, and potential effects expected from the technical features of the present invention will be clearly understood from the description below.

FIG. 1 is a diagram showing a conventional mapping table between the CBR and the corresponding CR_limit.
FIG. 2 is a diagram illustrating a distributed congestion control apparatus for a VANET using a reinforcement learning-based neural network model according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a method of operating a distributed congestion control apparatus for a VANET using a reinforcement learning-based neural network model according to an embodiment of the present invention.
FIG. 4 is a diagram showing a functional configuration of the control unit of a distributed congestion control apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a method of operating the control unit of a distributed congestion control apparatus to train a reinforcement learning-based neural network model according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a detailed distributed congestion control operation of a distributed congestion control apparatus for a VANET using a pre-trained reinforcement learning-based neural network model according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

The various features of the invention disclosed in the claims will be better understood in view of the drawings and the detailed description. The apparatuses, methods, manufacturing processes, and various embodiments disclosed in the specification are provided for illustration. The disclosed structural and functional features are intended to enable a person skilled in the art to practice the various embodiments, not to limit the scope of the invention. The terms and sentences used are intended to explain the various features of the disclosed invention clearly, not to limit the scope of the invention.

In describing the present invention, detailed descriptions of related known technology are omitted where they are judged likely to obscure the gist of the invention unnecessarily.

Hereinafter, a method and apparatus for distributed congestion control of a VANET using a reinforcement learning-based neural network model according to an embodiment of the present invention are described.

An object of the present invention is to provide a distributed congestion control method and apparatus in which a vehicle V2X terminal, using a reinforcement learning-based neural network, optimally adapts its own radio resource allocation in a VANET in consideration of diverse road environments, vehicle driving states, and the differing QoS of various V2X communication services.

FIG. 2 is a diagram illustrating a distributed congestion control apparatus 100 for a VANET (vehicular ad-hoc network) using a reinforcement learning-based neural network model according to an embodiment of the present invention.

Referring to FIG. 2, the distributed congestion control apparatus 100 may be integrated with a vehicle that provides a V2X communication service to form a single entity, or may be separate from the vehicle and form its own entity. For example, the distributed congestion control apparatus 100 may be located inside the V2X (Vehicle-to-Everything) communication terminal of each vehicle.

The distributed congestion control apparatus 100 may include a control unit 101, a communication unit 102, a storage unit 103, and a vehicle network interface unit 104. The communication unit 102 may include a V2X receiving unit 105 and a V2X transmitting unit 106.

Using information from the V2X receiving unit 105, the control unit 101 may estimate the channel busy ratio (CBR) of the current wireless channel, and the current channel occupancy ratio (CR) and transmission power P of the ego vehicle.

In an embodiment, the channel busy ratio may also be referred to as the channel congestion level or by another term with an equivalent technical meaning. Likewise, the channel occupancy ratio may be referred to as the radio resource occupancy or by another term with an equivalent technical meaning.

Through the vehicle network interface unit 104, the control unit 101 may estimate the moving speed v of the ego vehicle and the QoS (quality of service) of the V2X service.

Using a reinforcement learning-based neural network model that takes as inputs the CBR of the wireless channel, the vehicle's CR, moving speed v, and transmission power P, and the QoS of the V2X service, the control unit 101 may determine the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit that each vehicle may use.
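One way to picture the mapping from the five inputs to the pair (CR_limit, P_limit) is a small network that scores a discrete set of candidate limits and picks the best one. The sketch below is only an illustration: the patent does not disclose the network architecture, so the DQN-style discrete action set `ACTIONS`, the class `TinyQNetwork`, the normalization constants, and the untrained random weights are all assumptions.

```python
import math
import random

# Hypothetical discrete actions: (CR_limit, P_limit in dBm) pairs.
ACTIONS = [(cr, p) for cr in (0.002, 0.005, 0.01, 0.02) for p in (10.0, 17.0, 23.0)]


class TinyQNetwork:
    """One-hidden-layer network mapping a 5-element state to one score per action."""

    def __init__(self, n_inputs=5, n_hidden=8, n_actions=len(ACTIONS), seed=0):
        rng = random.Random(seed)  # untrained random weights, for illustration only
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]

    def forward(self, state):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def select_limits(net, cbr, cr, speed_mps, tx_power_dbm, qos_class):
    """Build the state vector and return the highest-scoring (CR_limit, P_limit)."""
    # Normalization constants are illustrative assumptions.
    state = [cbr, cr, speed_mps / 40.0, tx_power_dbm / 23.0, qos_class / 8.0]
    scores = net.forward(state)
    best = max(range(len(scores)), key=scores.__getitem__)
    return ACTIONS[best]


net = TinyQNetwork()
cr_limit, p_limit = select_limits(
    net, cbr=0.45, cr=0.008, speed_mps=20.0, tx_power_dbm=20.0, qos_class=3
)
```

A discrete action set keeps the example short; a continuous-output policy network would fit the text equally well.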

In an embodiment, the control unit 101 may include at least one processor or microprocessor, or may be part of a processor. The control unit 101 may also be referred to as a communication processor (CP). The control unit 101 may control the operation of the distributed congestion control apparatus 100 according to various embodiments of the present invention.

The storage unit 103 may store the parameters of the reinforcement learning-based neural network model and, for later training, the actions (determining and applying the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit) and the resulting reward values (the degree to which the target performance is achieved).
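The stored actions and reward values can be pictured as an experience buffer of the kind commonly used in reinforcement learning. The following sketch is an assumption about how such storage might be organized; the names `Experience` and `ExperienceStore` and the fixed capacity are hypothetical and are not described in the patent.

```python
import random
from collections import deque, namedtuple

# One stored record: the observed state, the chosen (CR_limit, P_limit),
# the resulting reward, and the state observed afterwards.
Experience = namedtuple("Experience", "state action reward next_state")


class ExperienceStore:
    """Bounded buffer that keeps the most recent experiences for later training."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped automatically

    def add(self, state, action, reward, next_state):
        self.buffer.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Return up to batch_size experiences chosen uniformly at random."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


store = ExperienceStore(capacity=2)
store.add([0.4, 0.01], (0.01, 17.0), 0.7, [0.5, 0.01])
store.add([0.5, 0.01], (0.005, 17.0), 0.8, [0.45, 0.005])
store.add([0.45, 0.005], (0.005, 10.0), 0.9, [0.4, 0.005])  # evicts the first record
```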

In an embodiment, the storage unit 103 may consist of volatile memory, nonvolatile memory, or a combination of the two. The storage unit 103 may provide stored data at the request of the control unit 101.

The communication unit 102 may transmit V2X service messages using the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit determined by performing distributed congestion control.

In an embodiment, the communication unit 102 may include at least one of a wired communication module and a wireless communication module. All or part of the communication unit 102 may be referred to as a 'transmitter', a 'receiver', or a 'transceiver'.

The V2X service messages transmitted by each distributed congestion control apparatus 100 may be carried over the VANET wireless channel. In this case, the channel busy ratio CBR may be determined by the channel load (CL), that is, the total amount of messages generated by the V2X transmitting units 106 of the vehicles.

For the next message transmission, the CBR may be estimated by the V2X receiving unit 105 and the control unit 101 of each distributed congestion control apparatus 100.

Referring to FIG. 2, the distributed congestion control apparatus 100 may include a control unit 101, a communication unit 102, a storage unit 103, and a vehicle network interface unit 104. The components described in FIG. 2 are not essential; in various embodiments of the present invention, the distributed congestion control apparatus 100 may be implemented with more or fewer components than those described in FIG. 2.

도 3은 본 발명의 일 실시예에 따른 강화학습 기반 신경망 모델을 이용한 VANET의 분산혼잡제어 장치(100)의 동작 방법을 도시한 도면이다.3 is a diagram illustrating a method of operating a distributed congestion control apparatus 100 for VANET using a neural network model based on reinforcement learning according to an embodiment of the present invention.

도 3을 참고하면, S201 단계는, V2X 통신 서비스를 위한 무선 채널의 현재 채널혼잡 비율, 차량의 현재 채널점유 비율, 차량의 현재 이동속도 및 송신전력 중 적어도 하나를 산출하는 단계이다. Referring to FIG. 3, step S201 is a step of calculating at least one of a current channel congestion rate of a wireless channel for a V2X communication service, a current channel occupancy rate of a vehicle, a current moving speed of the vehicle, and a transmission power.

S203 단계는, V2X 통신 서비스를 위한 무선 채널의 현재 채널혼잡 비율, 차량의 채널점유 비율, 이동속도, 및 송신전력 중 적어도 하나를 강화학습 기반 신경망 모델에 적용하여, 다음 메시지 전송을 위한 차량의 최대 채널점유 비율 및 최대 송신전력을 결정하는 단계이다. Step S203, by applying at least one of the current channel congestion rate of the wireless channel for the V2X communication service, the channel occupancy rate of the vehicle, the movement speed, and the transmission power to the reinforcement learning-based neural network model, the maximum of the vehicle for the next message transmission. This is the step of determining the channel occupancy ratio and the maximum transmission power.

일 실시예에서, V2X 통신을 위한 무선채널의 현재 채널혼잡 비율, 차량의 현재 채널점유 비율, 이동속도, 및 송신전력 중 적어도 하나를 강화학습 기반 신경망 모델에 적용하여 강화학습 기반 신경망 모델의 결과값을 산출할 수 있다. In one embodiment, a result of a reinforcement learning-based neural network model by applying at least one of a current channel congestion rate of a wireless channel for V2X communication, a current channel occupancy rate of a vehicle, a moving speed, and a transmission power to a reinforcement learning-based neural network model. Can be calculated.

In one embodiment, the maximum channel occupancy ratio and the maximum transmission power of the vehicle may be determined according to the result value of the reinforcement learning-based neural network model.

In one embodiment, the degree to which the channel busy ratio CBR resulting from the vehicle's maximum channel occupancy ratio and maximum transmission power approaches the target channel busy ratio CBR_target may be determined as the immediate reward value.
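The immediate reward described above can be sketched as follows. This is a minimal illustration, assuming that "closeness" is one minus the normalized absolute gap between the measured CBR and CBR_target; the function name `immediate_reward` and the linear form are assumptions and are not taken from the specification.

```python
def immediate_reward(cbr: float, cbr_target: float) -> float:
    """Return a reward in [0, 1] that grows as CBR approaches CBR_target.

    Both arguments are ratios in [0, 1]. The linear shape is an assumption;
    the specification only says the reward reflects closeness to the target.
    """
    if not (0.0 <= cbr <= 1.0 and 0.0 <= cbr_target <= 1.0):
        raise ValueError("CBR values must lie in [0, 1]")
    # Largest gap that is possible for this target, used to normalize to [0, 1].
    max_gap = max(cbr_target, 1.0 - cbr_target)
    return 1.0 - abs(cbr - cbr_target) / max_gap


# Illustrative values only: a target of 0.6 and three measured CBRs.
for measured in (0.6, 0.5, 0.9):
    print(measured, round(immediate_reward(measured, 0.6), 3))
```

Under this assumed shape, a channel that is either underused or overloaded relative to the target receives a lower reward than one sitting at the target.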

In one embodiment, whether the required throughput and latency are achieved under the vehicle's maximum channel occupancy ratio and maximum transmission power, or the degree to which the current values achieve their target values, may be determined as the future reward value.

In one embodiment, the reinforcement learning-based neural network model may be trained using a total reward value that combines the current immediate reward value and the future reward value.
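The specification does not state how the two reward values are combined. One simple possibility, shown here purely as an assumption, is a convex combination controlled by a hypothetical weight `future_weight`:

```python
def total_reward(immediate: float, future: float, future_weight: float = 0.5) -> float:
    """Combine the immediate and future reward values into one total reward.

    The weighted sum and the default weight are assumptions for illustration;
    the specification only says that the two values are combined.
    """
    if not 0.0 <= future_weight <= 1.0:
        raise ValueError("future_weight must lie in [0, 1]")
    return (1.0 - future_weight) * immediate + future_weight * future


print(total_reward(0.8, 0.4, future_weight=0.25))
```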

In one embodiment, the pre-trained reinforcement learning-based neural network model may be used to determine the maximum channel occupancy ratio and the maximum transmission power of the vehicle according to the congestion level of the VANET. When the difference between the current total reward value and the immediately preceding total reward value is greater than a threshold, a message for the V2X communication service may be transmitted according to the determined maximum channel occupancy ratio and maximum transmission power.

That is, according to the present invention, each vehicle individually uses the reinforcement learning-based neural network model to determine, according to the congestion level of the VANET, the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit that it may use. In the short term this brings the channel busy ratio close to its target value, and in the long term it allows the vehicle to achieve the target performance in throughput and latency.

FIG. 4 is a diagram illustrating the functional configuration 300 of the control unit 101 of the distributed congestion control apparatus 100 according to an embodiment of the present invention.

Referring to FIG. 4, the functional configuration 300 of the control unit 101 of the distributed congestion control apparatus 100 may include a state input unit 310, a value network 320, an action determination unit 330, and a reward granting unit 340.

In the state input unit 310, the control unit 101, in cooperation with the communication unit 102 and the vehicle network interface unit 104, calculates the channel busy ratio CBR of the V2X wireless channel and the current channel occupancy ratio CR, moving speed v, and transmission power P of the vehicle, and inputs them to the value network 320.

In the value network 320, the control unit 101 may apply the channel busy ratio CBR of the V2X wireless channel and the current channel occupancy ratio CR, moving speed v, and transmission power P of the vehicle as inputs to the reinforcement learning-based neural network model.

For example, the reinforcement learning-based neural network model may include a deep Q-network (DQN). A DQN may refer to an algorithm that adds an artificial neural network to Q-learning so that reinforcement learning can be carried out over a larger state space.

Conventional Q-learning, as in FIG. 1, can learn effectively when the state space is limited by storing a Q value that represents the value of each state. However, when the state space grows to include not only the CBR of the wireless channel but also the CR, moving speed, and transmission power of the vehicle, Q values must be stored for a very large number of combinations, which is inefficient. Here, the Q value may refer to the result value of the neural network model.
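A rough count shows how quickly a stored Q table grows once the state covers CBR, CR, speed, and power. The quantization levels below are hypothetical and chosen only for illustration; the specification does not give any.

```python
from math import prod

# Hypothetical quantization levels per state variable (illustrative only).
state_levels = {"CBR": 100, "CR": 100, "v": 50, "P": 24}
# Hypothetical action grid: candidate CR_limit values x candidate P_limit values.
action_levels = {"CR_limit": 20, "P_limit": 24}

num_states = prod(state_levels.values())
num_actions = prod(action_levels.values())
q_table_entries = num_states * num_actions

print(f"states: {num_states:,}")
print(f"actions: {num_actions:,}")
print(f"Q table entries: {q_table_entries:,}")
```

With these assumed levels the table would need several billion entries, which is the inefficiency the paragraph above refers to.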

In contrast, the DQN according to the present invention does not store individual Q values. Instead, it approximates the function that determines the Q value with an artificial neural network: the current state is given as input to the value network approximated by the artificial neural network, and the Q value Q_t is obtained as the result value.

In the action determination unit 330, the control unit 101 may determine, according to the result value Q_t of the value network 320 approximated by the artificial neural network, an action a_t that sets the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit that each vehicle may use.
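The path from state to action can be sketched with a very small network written in plain Python. Everything concrete here is an assumption made for illustration: the discrete grid of candidate (CR_limit, P_limit) pairs, the single hidden layer of 16 units, the input scaling, and the greedy selection rule. The weights are random and untrained, so only the data flow is meaningful.

```python
import math
import random
from itertools import product

# Hypothetical discrete action grid; the specification does not list values.
CR_LIMITS = [0.01, 0.02, 0.05]      # candidate maximum channel occupancy ratios
P_LIMITS_DBM = [10.0, 17.0, 23.0]   # candidate maximum transmission powers
ACTIONS = list(product(CR_LIMITS, P_LIMITS_DBM))


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, relu):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def q_values(network, state):
    """Map a state (CBR, CR, v, P) to one Q value per candidate action."""
    hidden, output = network
    return dense(output, dense(hidden, state, relu=True), relu=False)


def select_action(network, state):
    """Greedy choice: the (CR_limit, P_limit) pair with the largest Q value."""
    q = q_values(network, state)
    best = max(range(len(q)), key=q.__getitem__)
    return ACTIONS[best], q[best]


rng = random.Random(0)
network = (init_layer(4, 16, rng), init_layer(16, len(ACTIONS), rng))
# State scaled to comparable ranges: CBR, CR, speed / 50 m/s, power / 23 dBm.
state = [0.72, 0.03, 25.0 / 50.0, 20.0 / 23.0]
(cr_limit, p_limit), q_t = select_action(network, state)
print(cr_limit, p_limit, q_t)
```

The network stores a fixed number of weights regardless of how finely the state is resolved, which is the reason for approximating the Q function instead of tabulating it.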

In the reward granting unit 340, the control unit 101 may observe two rewards (for example, an immediate reward and a future reward) received from the VANET environment for the action a_t. Depending on whether the reward value increases or decreases, the action a_t may or may not be carried out. The reward value may be stored in the storage unit 103 together with the action a_t for use in learning.

Here, the immediate reward is the reward that arises immediately for the action determined by the action determination unit 330, and the future reward may refer to the reward for the future environment that results from the action.

In one embodiment, the action determination process of the action determination unit 330 may be expressed as Equation 3 below.

Here, the discount factor takes a value between 0 and 1; the closer it is to 0, the more the present reward is emphasized, and the closer it is to 1, the more the future reward is emphasized.

The learning rate takes a value between 0 and 1 and determines how quickly the Q value is learned.
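The roles given to the discount factor and the learning rate match those in the standard Q-learning update. As a sketch under that assumption only (the symbols \gamma for the discount factor and \alpha for the learning rate, and the use of the reward R_t, are assumptions, not a transcription of Equation 3), the update can be written as:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

In this form, a \gamma near 0 makes the bracketed target depend almost entirely on R_t, while a \gamma near 1 lets the estimated value of the next state contribute almost fully; \alpha scales how far Q(s_t, a_t) moves toward that target in one update.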

In the reward granting unit 340, the channel busy ratio CBR of the wireless channel for V2X communication may be considered as the immediate reward value for the action determined by the action determination unit 330. That is, the degree to which the CBR resulting from the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit determined by the action determination unit 330 approaches the target channel busy ratio CBR_target may be given as the immediate reward value.

In addition, in the reward granting unit 340, the degree of QoS achievement of the V2X service used by the host vehicle may be considered as the future reward value for the action determined by the action determination unit 330. That is, whether the throughput and latency required for the V2X service used by the host vehicle to succeed are being achieved, or the degree to which the current values achieve their target values, may be given as the reward value.

The throughput may be measured as the packet delivery rate (PDR), the rate of successfully received packets. The time delay of packet transmission may be measured as the inter-packet gap (IPG), the time interval between successfully received packets. Because the PDR and IPG are performance indicators that each vehicle cannot measure instantaneously, they may be used as the future reward value.
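The two indicators above can be computed from simple counters and reception timestamps. The sketch below is illustrative: the function names, the use of the mean gap as the IPG statistic, and the way the two achievements are averaged into one future reward are all assumptions.

```python
from typing import Sequence


def packet_delivery_rate(sent: int, received: int) -> float:
    """Fraction of transmitted packets that were successfully received."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("require sent > 0 and 0 <= received <= sent")
    return received / sent


def mean_inter_packet_gap(rx_times_s: Sequence[float]) -> float:
    """Mean time between consecutive successful receptions, in seconds."""
    if len(rx_times_s) < 2:
        raise ValueError("need at least two reception timestamps")
    times = sorted(rx_times_s)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def future_reward(pdr: float, ipg_s: float, pdr_target: float, ipg_target_s: float) -> float:
    """Average achievement of the PDR and IPG targets, each capped at 1.0."""
    if pdr_target <= 0 or ipg_target_s <= 0:
        raise ValueError("targets must be positive")
    pdr_score = min(1.0, pdr / pdr_target)
    # A smaller gap is better, so the ratio is inverted relative to PDR.
    ipg_score = 1.0 if ipg_s <= ipg_target_s else ipg_target_s / ipg_s
    return 0.5 * (pdr_score + ipg_score)


pdr = packet_delivery_rate(sent=10, received=8)
ipg = mean_inter_packet_gap([0.0, 0.1, 0.3, 0.4])
print(pdr, ipg, future_reward(pdr, ipg, pdr_target=0.9, ipg_target_s=0.1))
```

Both quantities need a window of past transmissions before they can be computed, which is consistent with treating them as a delayed reward.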

In one embodiment, the reinforcement learning-based neural network model in the value network 320 may be trained according to the basic DQN training procedure of FIG. 5, described below. The control unit 101 of the distributed congestion control apparatus 100 may train the model at each regular training period using the actions and rewards stored in the storage unit 103, and update the DQN.

In addition, the control unit 101 may perform learning while carrying out distributed congestion control as the vehicle is driven in a real environment. The control unit 101 may additionally use virtual-environment simulation data or data collected in a real environment to learn various VANET environments in advance.

That is, the control unit 101 according to the present invention uses the reinforcement learning-based neural network model to perform a series of actions (determining the channel occupancy ratio and transmission power) according to the state of the environment (the congestion level of the VANET). The immediate reward drives the instantaneous channel busy ratio toward its target value, and, over the long term, throughput and latency are driven toward their target performance.

FIG. 5 is a diagram illustrating a method of operating the control unit 101 for training the reinforcement learning-based neural network model according to an embodiment of the present invention.

Referring to FIG. 5, step S401 is a step in which the state input unit 310 determines the current state s_t(CBR, CR, v, P). That is, the channel busy ratio CBR and the vehicle's channel occupancy ratio CR, moving speed, and transmission power may be calculated and determined.

Step S403 is a step in which the value network 320 determines the result value Q_t using the reinforcement learning-based neural network model. For example, the value Q_t may be determined using a DQN.

Step S405 is a step in which the action determination unit 330 determines, according to the value Q_t, an action a_t(CR_limit, P_limit) that sets the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit. That is, the current maximum channel occupancy ratio CR_limit and maximum transmission power P_limit may be determined.

Step S407 is a step in which the reward granting unit 340 determines the reward R_t for the action a_t(CR_limit, P_limit).

Step S409 is a step of determining whether the training process of the reinforcement learning-based neural network model has ended.

Step S411 is a step of storing the current state s_t, the action a_t, and the resulting reward R_t in the storage unit 103 when training of the reinforcement learning-based neural network model has not ended.

Step S413 is a step of advancing to the next training iteration (t is advanced to t+1) and returning to step S401; this process is repeated until training ends.
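The S401 to S413 loop can be sketched as an experience-collection routine. The environment, the Q function, and the stopping rule are passed in as callables because the specification does not define them; the step budget `max_steps`, the greedy action choice, and the random stand-ins used in the demonstration are all assumptions.

```python
import random
from collections import deque
from typing import Callable, Deque, Sequence, Tuple

State = Tuple[float, float, float, float]   # (CBR, CR, v, P)
Action = Tuple[float, float]                # (CR_limit, P_limit)
Transition = Tuple[State, Action, float]    # (s_t, a_t, R_t)


def collect_experience(
    observe_state: Callable[[], State],
    q_values: Callable[[State], Sequence[float]],
    actions: Sequence[Action],
    apply_and_reward: Callable[[Action], float],
    max_steps: int,
    storage: Deque[Transition],
) -> int:
    """Run the S401-S413 loop and return the number of stored transitions."""
    t = 0
    while True:
        s_t = observe_state()                                  # S401
        q_t = q_values(s_t)                                    # S403
        best = max(range(len(q_t)), key=q_t.__getitem__)
        a_t = actions[best]                                    # S405
        r_t = apply_and_reward(a_t)                            # S407
        if t >= max_steps:                                     # S409: training ended?
            break
        storage.append((s_t, a_t, r_t))                        # S411
        t += 1                                                 # S413
    return len(storage)


# Demonstration with random placeholders for the VANET and the DQN.
rng = random.Random(1)
ACTIONS = [(0.02, 10.0), (0.02, 23.0), (0.05, 10.0), (0.05, 23.0)]
storage: Deque[Transition] = deque(maxlen=1000)
stored = collect_experience(
    observe_state=lambda: (rng.random(), 0.03, 20.0, 20.0),
    q_values=lambda s: [rng.random() for _ in ACTIONS],
    actions=ACTIONS,
    apply_and_reward=lambda a: rng.random(),
    max_steps=50,
    storage=storage,
)
print(stored)
```

Because the end-of-training check (S409) precedes the store (S411), the transition observed in the final pass is not stored, mirroring the order of the steps in the text.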

In this way, the distributed congestion control apparatus 100 can prepare a DQN for determining the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit of each vehicle through distributed congestion control.

The distributed congestion control apparatus 100 may perform learning while carrying out distributed congestion control in a real environment, and may also learn in advance using sample data generated in a real environment or through virtual-environment simulation.

FIG. 6 is a diagram illustrating a detailed distributed congestion control operation of the distributed congestion control apparatus 100 for a VANET using a reinforcement learning-based neural network model pre-trained through the learning process of FIG. 5, according to an embodiment of the present invention.

Referring to FIG. 6, step S501 is a step in which the state input unit 310 determines the current state s_t(CBR, CR, v, P). That is, the channel busy ratio CBR and the vehicle's channel occupancy ratio CR, moving speed, and transmission power may be determined.

Step S503 is a step in which the value network 320 determines the result value Q_t using the pre-trained reinforcement learning-based neural network model. For example, the value Q_t may be determined using a DQN whose training has been completed.

Step S505 is a step in which the action determination unit 330 determines, according to the value Q_t, an action a_t(CR_limit, P_limit) that sets the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit. That is, the current maximum channel occupancy ratio CR_limit and maximum transmission power P_limit may be determined.

Step S507 is a step in which the reward granting unit 340 observes the reward R_t for the action a_t(CR_limit, P_limit).

Step S509 is a step of determining whether the difference between the current reward R_t for the current action and the immediately preceding reward R_t-1 is greater than a threshold. In one embodiment, it may be determined whether the difference between the V2X wireless channel congestion CBR and the target CBR_target is decreasing. This makes it possible to judge whether the CBR is continuing to approach CBR_target under distributed congestion control.

Step S511 is a step of changing the maximum channel occupancy ratio CR_limit and the maximum transmission power P_limit of each vehicle according to the action a_t(CR_limit, P_limit) when the difference between the current reward and the immediately preceding reward is greater than the threshold.

Step S513 proceeds to the next step (t is advanced to t+1), and this process is repeated until distributed congestion control ends.

Step S515 is a step of determining whether distributed congestion control (DCC) has ended when the difference between the current reward and the immediately preceding reward is not greater than the threshold.

Step S517 proceeds to the next step (t is advanced to t+1) when distributed congestion control has not ended, and this process is repeated until distributed congestion control ends.
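The threshold test of S509 and the limit change of S511 can be sketched as a small gating function. Two points are assumptions: the difference is taken as the signed improvement R_t minus R_t-1, and the limits are left unchanged on the first pass, when no previous reward exists. The name `gate_limits` is hypothetical.

```python
from typing import Optional, Tuple

Limits = Tuple[float, float]  # (CR_limit, P_limit)


def gate_limits(
    current: Limits,
    proposed: Limits,
    reward_now: float,
    reward_prev: Optional[float],
    threshold: float,
) -> Limits:
    """Adopt the proposed limits only when the reward rose by more than threshold."""
    if reward_prev is None:
        return current  # no previous reward to compare against yet (assumption)
    if reward_now - reward_prev > threshold:   # S509
        return proposed                        # S511: change CR_limit and P_limit
    return current                             # otherwise keep the limits in force


limits = (0.03, 20.0)
limits = gate_limits(limits, (0.02, 17.0), reward_now=0.8, reward_prev=0.6, threshold=0.1)
print(limits)
```

A caller would invoke this once per iteration of the FIG. 6 loop, passing the action chosen in S505 as `proposed` and carrying `reward_now` forward as the next `reward_prev`.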

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art will be able to make various changes and modifications without departing from the essential characteristics of the present invention.

Accordingly, the embodiments disclosed in this specification are intended to describe, not to limit, the technical idea of the present invention, and the scope of the present invention is not limited by these embodiments.

The scope of protection of the present invention should be interpreted according to the claims, and all technical ideas within a scope equivalent thereto should be understood as falling within the scope of the present invention.

100: distributed congestion control apparatus
101: control unit
102: communication unit
103: storage unit
104: vehicle network interface unit
105: V2X receiver
106: V2X transmitter
300: control unit (functional configuration)
310: state input unit
320: value network
330: action determination unit
340: reward granting unit

Claims

A distributed congestion control method for a VANET using a reinforcement learning-based neural network model, the method comprising:
calculating at least one of a channel busy ratio (CBR) of a wireless channel for a V2X communication service and a channel occupancy ratio (CR), a moving speed, and a transmission power of a vehicle;
calculating a result value of the reinforcement learning-based neural network model by applying at least one of the channel busy ratio of the wireless channel and the channel occupancy ratio, the moving speed, and the transmission power of the vehicle to the reinforcement learning-based neural network model;
determining a maximum channel occupancy ratio and a maximum transmission power of the vehicle according to the result value of the reinforcement learning-based neural network model;
determining, as an immediate reward value, the degree to which the channel busy ratio resulting from the maximum channel occupancy ratio and the maximum transmission power of the vehicle approaches a target channel busy ratio; and
determining, as a future reward value, the degree to which target values of a throughput and a time delay of the wireless channel are achieved.

(Deleted)

The method of claim 1, further comprising, after the calculating of the at least one:
training the reinforcement learning-based neural network model using a current total reward value that combines the immediate reward value and the future reward value.

The method of claim 5, further comprising, after the calculating of the at least one:
determining whether a difference between the current total reward value and an immediately preceding total reward value is greater than a threshold; and
transmitting a message for the V2X communication service according to the maximum channel occupancy ratio and the maximum transmission power of the vehicle when the difference between the current total reward value and the immediately preceding total reward value is greater than the threshold.

A distributed congestion control apparatus for a VANET using a reinforcement learning-based neural network model, the apparatus comprising a control unit configured to:
calculate at least one of a channel busy ratio (CBR) of a wireless channel for a V2X communication service and a channel occupancy ratio (CR), a moving speed, and a transmission power of a vehicle;
calculate a result value of the reinforcement learning-based neural network model by applying at least one of the channel busy ratio of the wireless channel and the channel occupancy ratio, the moving speed, and the transmission power of the vehicle to the reinforcement learning-based neural network model;
determine a maximum channel occupancy ratio and a maximum transmission power of the vehicle according to the result value of the reinforcement learning-based neural network model;
determine, as an immediate reward value, the degree to which the channel busy ratio resulting from the maximum channel occupancy ratio and the maximum transmission power of the vehicle approaches a target channel busy ratio; and
determine, as a future reward value, the degree to which target values of a throughput and a time delay of the wireless channel are achieved.

(Deleted)

The apparatus of claim 7, wherein the control unit trains the reinforcement learning-based neural network model using a current total reward value that combines the immediate reward value and the future reward value.

The apparatus of claim 11, wherein the control unit determines whether a difference between the current total reward value and an immediately preceding total reward value is greater than a threshold, the apparatus further comprising:
a communication unit configured to transmit a message for the V2X communication service according to the maximum channel occupancy ratio and the maximum transmission power of the vehicle when the difference between the current total reward value and the immediately preceding total reward value is greater than the threshold.