KR102538330B1

KR102538330B1 - Signal detection method and apparatus based on reinforcement learning for vehicular MIMO communication

Info

Publication number: KR102538330B1
Application number: KR1020210054255A
Authority: KR
Inventors: 이충용; 임채훈
Original assignee: 현대모비스 주식회사
Priority date: 2020-11-27
Filing date: 2021-04-27
Publication date: 2023-06-01
Also published as: KR20220074687A

Abstract

The present embodiments provide a reinforcement learning-based signal detection apparatus and method for vehicular MIMO communication. Reinforcement learning is applied to the MIMO signal detection process to improve the trade-off between detection performance and complexity, and the number of reinforcement learning episodes is controlled according to the speed of the vehicle so that performance and complexity can be adjusted flexibly.

Description

Signal detection apparatus and method based on reinforcement learning for vehicle MIMO communication

The present invention relates to a signal detection apparatus and method for vehicular MIMO communication.

The contents described in this section merely provide background information on the present embodiment and do not constitute prior art.

A vehicular MIMO (Multiple Input Multiple Output) communication system consists of a transmitter (Tx) with N antennas and a receiver (Rx) with M antennas located within the designated coverage of the transmitter. Depending on the V2X (Vehicle-to-Everything) application, the transmitter and receiver may be vehicles, pedestrians, base station infrastructure, or other elements.

Conventionally, to reduce complexity, methods such as ZF (Zero-Forcing) and MMSE (Minimum Mean Square Error), which detect the signal by multiplying the received signal by a linear filter, have been considered.

Linear filter-based signal detection has low complexity but poor detection performance. Because conventional MIMO signal detection techniques exhibit a steep trade-off between performance and complexity, they are not suitable for vehicular communication systems that require ultra-reliable low-latency communication. In addition, because the performance and complexity of each conventional technique are fixed, these techniques are not suitable for vehicular communication systems whose latency and reliability requirements vary with the V2X application.

US 8000416 (2011.08.16.); KR 10-1048976 (2011.07.06.); KR 10-1571103 (2015.11.17.); KR 10-1752491 (2017.06.23.)

The main object of the embodiments of the present invention is to apply reinforcement learning to the MIMO signal detection process, improve the trade-off between the performance and complexity of MIMO signal detection, and flexibly adjust performance and complexity by controlling the number of reinforcement learning episodes according to the speed of the vehicle.

Other objects of the present invention that are not stated explicitly may additionally be considered within the scope that can be easily inferred from the following detailed description and its effects.

According to one aspect of the present embodiment, there is provided a signal detection apparatus for vehicular MIMO (Multiple Input Multiple Output) communication, comprising: an antenna that receives a received signal over a radio frequency; and a signal processing unit that detects the transmitted signal through reinforcement learning, based on a transmission/reception relationship in which the received signal is expressed in terms of a channel matrix, the transmitted signal, and reception noise.

The signal processing unit may decompose the channel matrix into a unitary matrix and a lower triangular matrix, and reconstruct the transmission/reception relationship using the unitary matrix.

The signal processing unit may calculate a detection criterion for the transmitted signal from the reconstructed transmission/reception relationship.

The signal processing unit may apply a Markov decision process to the reconstructed transmission/reception relationship and define a state, an action, and a reward according to the Markov decision process in order to detect the transmitted signal.

The signal processing unit may define an action set indexed by the number of elements in the real-part set of the constellation of the transmitted signal.

The reward uses the detection criterion for the transmitted signal calculated from the reconstructed transmission/reception relationship, and a minus sign may be applied so that the reward increases as the signal detection error decreases.

The signal processing unit may control the complexity and performance of signal detection by adjusting the number of learning episodes of the reinforcement learning according to the speed of the vehicle.

According to another aspect of the present embodiment, there is provided a signal detection method for vehicular MIMO (Multiple Input Multiple Output) communication, comprising: reconstructing a transmission/reception relationship in which the received signal is expressed in terms of a channel matrix, the transmitted signal, and reception noise; applying a Markov decision process to the reconstructed transmission/reception relationship; learning a state-action value function defined according to the Markov decision process; and detecting the transmitted signal through the state-action value function.

The step of applying the Markov decision process may define an action set indexed by the number of elements in the real-part set of the constellation of the transmitted signal.

The step of learning the state-action value function may control the complexity and performance of signal detection by adjusting the number of learning episodes of the state-action value function according to the speed of the vehicle.

As described above, according to the embodiments of the present invention, reinforcement learning is applied to the MIMO signal detection process, the trade-off between the performance and complexity of MIMO signal detection is improved, and performance and complexity can be adjusted flexibly by controlling the number of reinforcement learning episodes according to the speed of the vehicle.

Even effects not explicitly mentioned here are treated as if described in this specification, provided they are effects, including provisional effects, that are described in the following specification and expected from the technical features of the present invention.

FIG. 1 is a diagram illustrating various V2X applications and scenarios.
FIG. 2 is a block diagram illustrating a signal detection apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the decision tree handled by a signal detection apparatus according to an embodiment of the present invention when 4-QAM modulation is applied to a MIMO communication system with N transmit antennas.
FIG. 4 is a diagram illustrating the learning operation of the state-action value function processed by a signal detection apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a signal detection method according to another embodiment of the present invention.
FIG. 6 is a diagram illustrating simulated bit error rate performance of embodiments of the present invention as a function of the bit energy-to-noise ratio (E_b/N₀).
FIG. 7 is a diagram illustrating simulated bit error rate performance of embodiments of the present invention as a function of the number of learning episodes.

Hereinafter, in describing the present invention, detailed descriptions of related known functions are omitted where they are obvious to those skilled in the art and would unnecessarily obscure the subject matter of the invention, and some embodiments of the present invention are described in detail with reference to exemplary drawings.

FIG. 1 is a diagram illustrating various V2X applications and scenarios.

As an example, FIG. 1 shows a vehicular MIMO communication system consisting of a transmitter (Tx) with N antennas and a receiver (Rx) with M antennas located within the designated coverage of the transmitter. Depending on the V2X application, the transmitter and receiver may be vehicles, pedestrians, base station infrastructure, or other elements. The wireless communication used in the vehicular MIMO communication system of the present invention may follow various communication specifications and standards for cellular mobile communication (LTE, NR, etc.) and wireless LAN (IEEE802.11p, IEEE802.11bd, etc.). For example, next-generation mobile communications such as 5G and 6G may be applied.

In a MIMO communication system with N transmit antennas and M receive antennas, the transmission/reception relationship is expressed as Equation 1.

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \qquad (1)$$

Here, $\mathbf{y} \in \mathbb{C}^{M}$ is the received signal vector, $\mathbf{x} \in \mathbb{C}^{N}$ is the transmitted signal vector, $\mathbf{H} \in \mathbb{C}^{M \times N}$ is the channel matrix, and $\mathbf{n} \in \mathbb{C}^{M}$ is the reception noise vector. In Equation 1, the element $h_{i,j}$ in the i-th row and j-th column of the channel matrix $\mathbf{H}$ represents the channel distortion between the j-th transmit antenna and the i-th receive antenna. The i-th element $n_i$ of the reception noise vector $\mathbf{n}$ represents the noise signal received at the i-th receive antenna. Separating the real and imaginary parts of the complex-valued Equation 1 reconstructs it as the real-valued transmission/reception relationship of Equation 2.

$$\begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix} \qquad (2)$$

Here, $\Re(\cdot)$ is the operator that takes the real part and $\Im(\cdot)$ is the operator that takes the imaginary part.
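The real-valued stacking of Equation 2 can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `to_real`, the antenna counts, and the noise-free test signal are illustrative choices and are not taken from the patent.

```python
import numpy as np

def to_real(H, y):
    """Stack real and imaginary parts so the complex model becomes a real one."""
    H_r = np.block([[H.real, -H.imag],
                    [H.imag,  H.real]])
    y_r = np.concatenate([y.real, y.imag])
    return H_r, y_r

rng = np.random.default_rng(0)
M, N = 4, 4  # hypothetical numbers of receive and transmit antennas
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
# 4-QAM symbols: real and imaginary parts each drawn from {-1, +1}
x = rng.choice([-1.0, 1.0], size=N) + 1j * rng.choice([-1.0, 1.0], size=N)
y = H @ x  # noise omitted so the identity can be checked exactly

H_r, y_r = to_real(H, y)
x_r = np.concatenate([x.real, x.imag])
```

With the noise term omitted, `H_r @ x_r` reproduces `y_r`, which is the content of Equation 2: N complex unknowns become 2N real unknowns.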

For Equation 2, the optimal transmitted-signal detection criterion based on maximum likelihood estimation is given by Equation 3.

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in X^{2N}} \lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert^{2} \qquad (3)$$

Here, $\hat{\mathbf{x}}$ is the detected signal, $X$ is the set of real parts in the constellation of the transmitted signal, and $\mathbf{y}$, $\mathbf{H}$, and $\mathbf{x}$ denote the real-valued quantities of Equation 2. The detection criterion of Equation 3 achieves optimal detection performance, but it requires an exhaustive search over every possible transmitted signal and therefore has very high complexity. For this reason, methods such as ZF (Zero-Forcing) and MMSE (Minimum Mean Square Error), which detect the signal by multiplying the received signal by a linear filter, have conventionally been considered to reduce complexity.
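To make the complexity gap concrete, the sketch below contrasts an exhaustive maximum likelihood search with a zero-forcing detector on a real-valued model. It is an illustration under stated assumptions, not the patent's implementation: the function names, the real-part set {-1, +1} (4-QAM), and the matrix sizes are hypothetical.

```python
import itertools
import numpy as np

def ml_detect(H_r, y_r, levels=(-1.0, 1.0)):
    """Exhaustive search: tries len(levels) ** n_unknowns candidates."""
    best, best_cost = None, np.inf
    for cand in itertools.product(levels, repeat=H_r.shape[1]):
        x = np.array(cand)
        cost = np.sum((y_r - H_r @ x) ** 2)
        if cost < best_cost:
            best, best_cost = x, cost
    return best

def zf_detect(H_r, y_r, levels=(-1.0, 1.0)):
    """Zero-forcing: apply the pseudo-inverse, then round to the nearest level."""
    soft = np.linalg.pinv(H_r) @ y_r
    lv = np.array(levels)
    return lv[np.argmin(np.abs(soft[:, None] - lv[None, :]), axis=1)]

rng = np.random.default_rng(1)
H_r = rng.standard_normal((6, 4))          # hypothetical real-valued channel
x_true = rng.choice([-1.0, 1.0], size=4)
y_r = H_r @ x_true                         # noise-free, for a sanity check only
```

The search in `ml_detect` grows exponentially with the number of unknowns, while `zf_detect` needs a single matrix product; under noise the linear detector gives up accuracy in exchange.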

However, linear filter-based signal detection has low complexity but poor detection performance. Because conventional MIMO signal detection techniques exhibit such a steep trade-off between performance and complexity, they are not suitable for vehicular communication systems that require ultra-reliable low-latency communication.

Because the performance and complexity of each conventional MIMO signal detection technique are fixed, these techniques are not suitable for vehicular communication systems whose latency and reliability requirements vary with the V2X application.

The present invention solves these problems by recasting the MIMO problem as an MDP (Markov Decision Process) problem. The present invention proposes a reinforcement learning-based signal detection apparatus and method for vehicular MIMO communication systems that improves the trade-off between the performance and complexity of MIMO signal detection and allows performance and complexity to be adjusted flexibly.

FIG. 2 is a block diagram illustrating a signal detection apparatus according to an embodiment of the present invention.

The signal detection apparatus (100) for vehicular MIMO (Multiple Input Multiple Output) communication includes an antenna (110) and a signal processing unit (120). The signal detection apparatus may be implemented as a transmitter, a receiver, or a system in which a transmitter and a receiver are combined.

The antenna (110) receives a received signal over a radio frequency.

The signal processing unit (120) detects the transmitted signal through reinforcement learning, based on a transmission/reception relationship in which the received signal is expressed in terms of the channel matrix, the transmitted signal, and the reception noise.

To detect a signal by applying reinforcement learning in a MIMO communication system, it must be possible to interpret the transmission/reception relationship of Equation 2 as a Markov decision process.

The signal processing unit decomposes the channel matrix into a unitary matrix and a lower triangular matrix, and reconstructs the transmission/reception relationship using the unitary matrix. QL decomposition is applied to the channel matrix H as in Equation 5.

$$\mathbf{H} = \mathbf{Q}\mathbf{L} \qquad (5)$$

Here, $\mathbf{Q}$ is a unitary matrix and $\mathbf{L}$ is a lower triangular matrix. Multiplying both sides of Equation 2 by the transpose $\mathbf{Q}^{T}$ of the unitary matrix Q obtained through QL decomposition reconstructs the MIMO transmission/reception relationship as Equation 6.

$$\mathbf{z} = \mathbf{Q}^{T}\mathbf{y} = \mathbf{L}\mathbf{x} + \mathbf{Q}^{T}\mathbf{n} \qquad (6)$$
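NumPy offers QR but no QL routine, so one way to sketch the decomposition step is to run QR on the column-reversed matrix and flip the factors back. This is an assumption about how one might compute it, not the method stated in the patent; `ql_decompose` and the matrix sizes are hypothetical.

```python
import numpy as np

def ql_decompose(H):
    """Return Q (orthonormal columns) and lower-triangular L with H = Q @ L."""
    # QR of the column-reversed matrix: H[:, ::-1] = Q_rev @ R_rev
    Q_rev, R_rev = np.linalg.qr(H[:, ::-1])
    Q = Q_rev[:, ::-1]          # reverse the columns of Q
    L = R_rev[::-1, ::-1]       # reversing rows and columns turns upper into lower
    return Q, L

rng = np.random.default_rng(3)
H = rng.standard_normal((6, 4))            # hypothetical real-valued channel
x = rng.choice([-1.0, 1.0], size=4)
y = H @ x                                  # noise-free, for checking only

Q, L = ql_decompose(H)
z = Q.T @ y                                # reconstructed observation
```

Because L is lower triangular, the first element of `z` depends only on the first symbol, the second on the first two, and so on, which is what allows the symbols to be decided one layer at a time. For a tall H the reduced factor Q has orthonormal columns rather than being square.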

The signal processing unit calculates the detection criterion for the transmitted signal from the reconstructed transmission/reception relationship. For the transmission/reception relationship of Equation 6, the optimal transmitted-signal detection criterion is given by Equation 7.

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in X^{2N}} \sum_{n=1}^{2N} \Bigl( z_n - \sum_{l=1}^{n} L_{n,l}\, x_l \Bigr)^{2} \qquad (7)$$

Here, $z_n$ is the n-th element of $\mathbf{z}$, and $L_{n,l}$ is the element in the n-th row and l-th column of $\mathbf{L}$. In the original transmission/reception relationship of Equation 2, all transmitted signals interfere with one another, whereas in Equation 6, obtained through QL decomposition, only the transmitted signals of adjacent antennas act as interference, so the relationship can be interpreted as a Markov decision process. That is, only the symbols of preceding antenna indices are treated as affecting the next antenna.
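The layer-by-layer structure of the detection criterion can be illustrated with a short sketch. It assumes a lower-triangular matrix and computes, for each layer n, the squared residual between z_n and the contribution of symbols 1 through n; the helper name `layer_costs` and the random test data are hypothetical.

```python
import numpy as np

def layer_costs(z, L, x):
    """Squared residual of each layer n: (z[n] - sum_{l<=n} L[n, l] * x[l]) ** 2."""
    costs = np.empty(len(z))
    for n in range(len(z)):
        # Only symbols up to and including layer n contribute to z[n].
        contribution = L[n, : n + 1] @ x[: n + 1]
        costs[n] = (z[n] - contribution) ** 2
    return costs

rng = np.random.default_rng(2)
L = np.tril(rng.standard_normal((4, 4)))   # hypothetical lower-triangular factor
x = rng.choice([-1.0, 1.0], size=4)
z = L @ x + 0.1 * rng.standard_normal(4)   # small noise added for illustration

total = layer_costs(z, L, x).sum()
```

The per-layer costs add up to the full squared distance between `z` and `L @ x`, so a candidate can be scored incrementally as each symbol is chosen.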

The signal processing unit applies a Markov decision process to the reconstructed transmission/reception relationship and defines a state, an action, and a reward according to the Markov decision process in order to detect the transmitted signal.

Expressing the signal detection process of Equation 6 as a decision tree according to the Markov decision process gives the tree shown in FIG. 3.

FIG. 3 is a diagram illustrating the decision tree handled by a signal detection apparatus according to an embodiment of the present invention when 4-QAM modulation is applied to a MIMO communication system with N transmit antennas.

In the decision tree of FIG. 3, the reinforcement learning environment is built by defining the state, action, and reward as follows. A state set $S = \{ s_{i,j} \}$ made up of the nodes of the decision tree is defined, where $s_{i,j}$ denotes the state corresponding to the j-th node at the i-th level of the decision tree. An action set $A$ indexed by the number of elements in the real-part set X of the signal constellation is defined. For example, if action a(t) ∈ A is taken in the current state s(t) at level t to move to the next state s(t+1), collecting the actions taken in sequence yields the detected transmitted signal $\hat{\mathbf{x}}$. The policy function π(s(t)) that determines the optimal action a(t) in the current state s(t) is given by Equation 8.

$$\pi(s(t)) = \arg\max_{a(t) \in A} Q^{*}(s(t), a(t)) \qquad (8)$$
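A greedy policy over a decision tree can be sketched as follows. The representation is an assumption made for illustration: a state is the tuple of action indices chosen so far, the value table is a plain dictionary, and the toy values in `q_table` are invented.

```python
import numpy as np

LEVELS = (-1.0, 1.0)  # real-part set for 4-QAM; action index i selects LEVELS[i]

def greedy_policy(q_table, state):
    """Pick the action index with the largest value in the given state."""
    return int(np.argmax(q_table[state]))

def rollout(q_table, depth):
    """Follow the greedy policy from the root and collect the chosen symbols."""
    state = ()
    for _ in range(depth):
        state = state + (greedy_policy(q_table, state),)
    return np.array([LEVELS[a] for a in state])

# Toy value table for a two-level tree (values are made up).
q_table = {
    (): np.array([0.0, 1.0]),
    (0,): np.array([0.0, 0.0]),
    (1,): np.array([2.0, -1.0]),
}
detected = rollout(q_table, depth=2)
```

Here the root prefers action index 1 and the node reached from it prefers index 0, so the collected actions map to the symbol sequence (+1, -1).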

Here, $Q^{*}(s(t), a(t))$ is the optimal state-action value function for state s(t) and action a(t). To learn the optimal state-action value function, it is updated iteratively from the reward, based on the Bellman equation, as in Equation 9.

$$Q(s(t), a(t)) \leftarrow Q(s(t), a(t)) + \alpha \Bigl[ r(t+1) + \gamma \max_{a \in A} Q(s(t+1), a) - Q(s(t), a(t)) \Bigr] \qquad (9)$$

Here, $\alpha$ is the learning rate and $\gamma$ is the discount factor. The reward r(t+1) received after taking action a(t) is defined by Equation 10, following Equation 7.

$$r(t+1) = -\Bigl( z_t - \sum_{l=1}^{t} L_{t,l}\, x_l \Bigr)^{2} \qquad (10)$$

In Equation 10, a minus sign is applied so that a smaller signal detection error earns a larger reward. Finally, the process of learning the optimal policy function for MIMO signal detection according to an embodiment of the present invention is summarized in FIG. 4.
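The learning loop described above can be sketched as tabular Q-learning over the decision tree. This is a minimal illustration under assumptions that the text does not specify: epsilon-greedy exploration, the hyperparameter values, a dictionary-based value table, and the function name `rl_detect` are all hypothetical, and the procedure of FIG. 4 may differ.

```python
from collections import defaultdict
import numpy as np

def rl_detect(z, L, levels=(-1.0, 1.0), episodes=300,
              alpha=0.5, gamma=1.0, eps=0.2, seed=0):
    """Tabular Q-learning over the detection tree; returns the greedy symbol path."""
    rng = np.random.default_rng(seed)
    n_layers, n_actions = len(z), len(levels)
    q = defaultdict(lambda: np.zeros(n_actions))  # state = tuple of past actions
    for _ in range(episodes):
        state = ()
        for t in range(n_layers):
            # Epsilon-greedy action selection (an assumed exploration rule).
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q[state]))
            nxt = state + (a,)
            x_part = np.array([levels[i] for i in nxt])
            # Reward: negative squared residual of layer t.
            reward = -(z[t] - L[t, : t + 1] @ x_part) ** 2
            future = 0.0 if t == n_layers - 1 else np.max(q[nxt])
            q[state][a] += alpha * (reward + gamma * future - q[state][a])
            state = nxt
    # Detection: follow the learned greedy policy from the root.
    state = ()
    for _ in range(n_layers):
        state = state + (int(np.argmax(q[state])),)
    return np.array([levels[i] for i in state])

rng = np.random.default_rng(4)
L = np.tril(rng.standard_normal((4, 4)))
np.fill_diagonal(L, 1.5)                    # keep the diagonal well away from zero
x_true = rng.choice([-1.0, 1.0], size=4)
z = L @ x_true                              # noise-free sanity check
x_hat = rl_detect(z, L)
```

The `episodes` argument is the knob the description refers to: fewer episodes mean less computation and a less refined value table, more episodes the opposite. In the noise-free check above the correct path earns zero reward at every layer, so the greedy rollout recovers `x_true`.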

FIG. 4 is a diagram illustrating the learning operation of the state-action value function processed by a signal detection apparatus according to an embodiment of the present invention.

In general, the faster the vehicle travels, the more severe the distortion caused by the wireless channel, so raising the reliability of signal detection becomes more important. By specifying in advance an appropriate number of learning episodes for each vehicle speed, storing the mapping in a table, and adaptively applying the number of learning episodes that matches the speed of the moving vehicle, the reliability of signal detection can be guaranteed while taking the current situation of the vehicle into account. By specifying the number of learning episodes in advance for each V2X application, reliability and complexity can be adjusted flexibly across various V2X scenarios.

FIG. 5 is a flowchart illustrating a signal detection method according to another embodiment of the present invention. The signal detection method for vehicular MIMO communication may be performed by the signal detection apparatus for vehicular MIMO communication.

In step S10, the transmission/reception relationship, in which the received signal is expressed in terms of the channel matrix, the transmitted signal, and the reception noise, is reconstructed.

In step S20, a Markov decision process is applied to the reconstructed transmission/reception relationship. The step of applying the Markov decision process (S20) defines an action set indexed by the number of elements in the real-part set of the constellation of the transmitted signal.

In step S30, the state-action value function defined according to the Markov decision process is learned. The step of learning the state-action value function (S30) controls the complexity and performance of signal detection by adjusting the number of learning episodes of the state-action value function according to the speed of the vehicle.

In step S40, the transmitted signal is detected through the state-action value function.
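The speed-to-episode table could be realized as a simple threshold lookup. The sketch below is only an illustration: the speed breakpoints and episode counts are invented placeholders, and the text does not give the actual table values.

```python
from bisect import bisect_right

# Hypothetical table: speeds below 30 km/h use 50 episodes, 30 to 60 use 150, etc.
SPEED_BREAKS_KMH = (30.0, 60.0, 100.0)
EPISODES = (50, 150, 300, 500)

def episodes_for_speed(speed_kmh):
    """Look up the number of learning episodes for the current vehicle speed."""
    return EPISODES[bisect_right(SPEED_BREAKS_KMH, speed_kmh)]
```

For example, `episodes_for_speed(10.0)` falls in the first band and `episodes_for_speed(120.0)` in the last, so faster vehicles are given more episodes and therefore higher detection reliability at higher cost.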

Simulations were performed to compare the MIMO signal detection performance of the prior art and of the present invention. The variables used in the simulation are shown in Table 1.

FIG. 6 shows the bit error rate (BER) performance as a function of the bit energy-to-noise ratio (E_b/N₀) for the ZF, MMSE, and MLD (Maximum Likelihood Detection) signal detection techniques and for the RLD (Reinforcement Learning-based Detection) technique of the present invention. FIG. 7 shows the bit error rate performance as the number of learning episodes (L) increases when E_b/N₀ is 16 dB.

The transmitted-signal detection performance improves as the number of learning episodes (L) increases. In particular, with 50 or more learning episodes the method outperforms ZF and MMSE, and at around 500 episodes it reaches the performance of the optimal MLD. For example, the number of episodes may be initialized to about 300 and then gradually decreased toward 50 or gradually increased toward 500 depending on the situation. In other words, the complexity and performance of MIMO signal detection can be adjusted flexibly by adjusting the number of learning episodes, and the optimal MLD performance is reached at relatively low complexity.

Therefore, the present invention can be regarded as a signal detection method suitable for vehicular MIMO communication systems that require ultra-reliable low-latency communication.
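The gradual adjustment between 50 and 500 episodes mentioned above might look like the following sketch. The step size, the boolean trigger, and the function name are assumptions; the text only gives the starting point of about 300 and the two bounds.

```python
def adjust_episodes(current, need_more_reliability, step=50, lo=50, hi=500):
    """Move the episode count one step up or down, clamped to [lo, hi]."""
    nxt = current + step if need_more_reliability else current - step
    return max(lo, min(hi, nxt))

episodes = 300  # initial value suggested in the description
episodes = adjust_episodes(episodes, need_more_reliability=True)
```

Clamping keeps the count inside the range in which the simulation reports a benefit: below 50 the method no longer beats the linear detectors, and beyond roughly 500 it has already reached MLD performance.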

The signal detection apparatus may include at least one processor, a computer-readable storage medium, and a communication bus.

The processor may control the apparatus so that it operates as a signal detection apparatus. For example, the processor may execute one or more programs stored in the computer-readable storage medium. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor, may be configured to cause the signal detection apparatus to perform operations according to the exemplary embodiments.

The computer-readable storage medium is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program stored in the computer-readable storage medium includes a set of instructions executable by the processor. In one embodiment, the computer-readable storage medium may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the signal detection apparatus and can store the desired information, or a suitable combination thereof.

The communication bus interconnects the various components of the signal detection apparatus, including the processor and the computer-readable storage medium.

The signal detection apparatus may also include one or more input/output interfaces, which provide interfaces for one or more input/output devices, and one or more communication interfaces. The input/output interfaces and communication interfaces are connected to the communication bus. An input/output device may be connected to the other components of the signal detection apparatus through an input/output interface.

The signal detection apparatus may be implemented in logic circuitry by hardware, firmware, software, or a combination thereof, and may also be implemented using a general-purpose or special-purpose computer. The apparatus may be implemented using a hardwired device, a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), or the like. The apparatus may also be implemented as a System on Chip (SoC) including one or more processors and controllers.

The signal detection apparatus may be mounted, in the form of software, hardware, or a combination thereof, on a computing device or server provided with hardware elements. The computing device or server may refer to various devices that include all or some of the following: a communication device such as a communication modem for communicating with various devices or wired/wireless networks, memory for storing data for executing programs, and a microprocessor for executing programs to perform computation and issue commands.

Although FIG. 5 describes the processes as being executed sequentially, this is merely an example. A person skilled in the art may apply various modifications and variations, such as changing the order described in FIG. 5, executing one or more processes in parallel, or adding other processes, without departing from the essential characteristics of the embodiments of the present invention.

Operations according to the present embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. A computer-readable medium refers to any medium that participates in providing instructions to a processor for execution. The computer-readable medium may include program instructions, data files, data structures, or a combination thereof; examples include magnetic media, optical recording media, and memory. The computer program may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can be readily inferred by programmers skilled in the art to which the embodiments belong.

The present embodiments are intended to explain the technical idea of the embodiments, and the scope of that technical idea is not limited by them. The scope of protection should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiments.

Claims

A signal detection device for vehicular multiple input multiple output (MIMO) communication, comprising:
an antenna that receives a received signal using a radio frequency; and
a signal processing unit that detects a transmission signal from the received signal through reinforcement learning,
wherein the signal processing unit controls the complexity and performance of signal detection by adjusting the number of learning episodes of the reinforcement learning according to the speed of the vehicle.
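
As an illustration of the speed-dependent episode control recited in the claim above, the following minimal Python sketch maps a vehicle speed to an episode budget. The breakpoints, the budgets, the function name `episodes_for_speed`, and the direction of the mapping (fewer episodes at higher speed) are all assumptions made for illustration; this section of the source does not state them.

```python
from bisect import bisect_right

# Hypothetical speed breakpoints (km/h) and episode budgets; not from the source.
SPEED_BREAKPOINTS_KMH = [30.0, 60.0, 100.0]
# One budget per speed band, so one more entry than there are breakpoints.
EPISODE_BUDGETS = [400, 200, 100, 50]


def episodes_for_speed(speed_kmh: float) -> int:
    """Return the number of learning episodes for a given vehicle speed."""
    if speed_kmh < 0:
        raise ValueError("speed must be non-negative")
    # bisect_right returns the index of the speed band the value falls into.
    band = bisect_right(SPEED_BREAKPOINTS_KMH, speed_kmh)
    return EPISODE_BUDGETS[band]
```

A table lookup keeps the trade-off explicit: each band pairs a speed range with a fixed complexity budget, and a different policy only requires changing the two lists.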

The signal detection device of claim 1, wherein
the received signal is expressed as a transmission/reception relationship among a channel matrix, the transmission signal, and reception noise,
the signal processing unit decomposes the channel matrix into a unitary matrix and a lower triangular matrix, and
the signal processing unit reconstructs the transmission/reception relationship using the unitary matrix.

The signal detection device of claim 2, wherein
the signal processing unit calculates a detection criterion for the transmission signal from the reconstructed transmission/reception relationship.
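
The decomposition and reconstruction recited in the two claims above can be written out as a short derivation. This is a sketch under stated assumptions: the symbols y (received signal), H (channel matrix), x (transmission signal), n (reception noise), Q (unitary matrix), and L (lower triangular matrix) are notation chosen here, a square channel (M = N) is assumed for simplicity, and the squared Euclidean distance is assumed as the detection criterion.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad \mathbf{H} = \mathbf{Q}\mathbf{L},\\
\tilde{\mathbf{y}} &= \mathbf{Q}^{H}\mathbf{y} = \mathbf{L}\mathbf{x} + \mathbf{Q}^{H}\mathbf{n},\\
\hat{\mathbf{x}} &= \arg\min_{\mathbf{x}} \lVert \tilde{\mathbf{y}} - \mathbf{L}\mathbf{x} \rVert^{2}
= \arg\min_{\mathbf{x}} \sum_{i=1}^{N} \Bigl| \tilde{y}_{i} - \sum_{j=1}^{i} l_{ij} x_{j} \Bigr|^{2}.
\end{aligned}
```

Because Q is unitary, multiplying by its conjugate transpose leaves the noise statistics and the distance unchanged, and because L is lower triangular, the i-th term of the sum depends only on the first i symbols, which is what allows the symbols to be decided one at a time.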

The signal detection device of claim 2, wherein
the signal processing unit applies a Markov decision process to the reconstructed transmission/reception relationship, defines a state, an action, and a reward according to the Markov decision process, and detects the transmission signal.

The signal detection device of claim 4, wherein
the signal processing unit defines an action having, as its index, the number of real-part sets in the constellation of the transmission signal.

The signal detection device of claim 4, wherein
the reward uses the detection criterion of the transmission signal calculated from the reconstructed transmission/reception relationship, with a minus sign applied so that the reward increases as the signal detection error decreases.
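
The sign convention of the reward in the claim above can be illustrated with a minimal Python sketch. It assumes that the detection criterion is the squared Euclidean distance between the rotated received signal and the candidate passed through the lower triangular matrix; the function names and argument layout are hypothetical and do not come from the source.

```python
from typing import Sequence


def detection_metric(
    y_tilde: Sequence[complex],
    lower: Sequence[Sequence[complex]],
    x: Sequence[complex],
) -> float:
    """Squared distance between y_tilde and lower @ x (assumed criterion)."""
    total = 0.0
    for i, row in enumerate(lower):
        # Only columns j <= i contribute because the matrix is lower triangular.
        predicted = sum(row[j] * x[j] for j in range(i + 1))
        total += abs(y_tilde[i] - predicted) ** 2
    return total


def reward(
    y_tilde: Sequence[complex],
    lower: Sequence[Sequence[complex]],
    x: Sequence[complex],
) -> float:
    """Negate the metric so that a smaller detection error gives a larger reward."""
    return -detection_metric(y_tilde, lower, x)
```

With this convention the best possible reward is zero, reached when the candidate reproduces the rotated received signal exactly, and every other candidate receives a negative reward.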

(Claim 7 deleted)

A signal detection method for vehicular multiple input multiple output (MIMO) communication, comprising:
receiving a received signal using a radio frequency; and
detecting a transmission signal from the received signal through reinforcement learning,
wherein detecting the transmission signal comprises controlling the complexity and performance of signal detection by adjusting the number of learning episodes of the reinforcement learning according to the speed of the vehicle.

The signal detection method of claim 8, wherein
detecting the transmission signal comprises applying a Markov decision process in which an action is defined as having, as its index, the number of real-part sets in the constellation of the transmission signal.

(Claim 10 deleted)