KR102027674B1

KR102027674B1 - Method and system for estimating i/q imbalance parameter of transceiver based on reinforcement learning

Info

Publication number: KR102027674B1
Application number: KR1020180004290A
Authority: KR
Inventors: 박현철; 지정주; 권기림
Original assignee: 한국과학기술원
Priority date: 2018-01-12
Filing date: 2018-01-12
Publication date: 2019-10-02
Also published as: KR20190086133A

Abstract

A method and system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning are presented. According to an embodiment, the method includes, in a multi-antenna channel in which I/Q imbalance occurs at one or both of a transmitting end and a receiving end: arbitrarily setting an initial value for each parameter; measuring an effective channel that includes the I/Q imbalance; calculating an estimate of the I/Q imbalance value by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step; calculating an I/Q-imbalance-compensated effective channel using the calculated estimate; measuring, from the compensated effective channel, a cost indicating the degree to which the I/Q imbalance has been compensated; calculating a reward in proportion to the cost; and setting the next step size based on the reward.

Description

Method and system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning {METHOD AND SYSTEM FOR ESTIMATING I/Q IMBALANCE PARAMETER OF TRANSCEIVER BASED ON REINFORCEMENT LEARNING}

The following embodiments relate to a method and system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning, and more particularly to a method and system that use a reinforcement learning technique to estimate the I/Q imbalance parameters in a multi-antenna system in which I/Q imbalance exists at the transmitting and receiving ends.

Massive multiple-input multiple-output (massive MIMO) technology, which uses a large number of base station antennas, has recently attracted attention as a technology that eliminates signal interference between users and achieves high spectral efficiency.

Using a large number of antennas requires low-cost components. The in-phase and quadrature-phase imbalance (I/Q imbalance) that arises in such RF components causes a mismatch in amplitude and phase between the in-phase and quadrature components of the carrier, which degrades their orthogonality, introduces interference, and significantly reduces performance.
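To illustrate how an amplitude and phase mismatch produces interference, the following sketch applies a commonly used baseband model of receiver I/Q imbalance, y = K1·x + K2·conj(x), to a complex tone. The model, the function names `imbalance_coefficients` and `apply_iq_imbalance`, and the numeric values are assumptions chosen for illustration and are not taken from this document.

```python
import numpy as np

def imbalance_coefficients(gain, phase):
    """Textbook receiver I/Q imbalance coefficients (assumed model).

    gain  : amplitude ratio between the Q and I branches (1.0 = balanced)
    phase : phase error between the branches in radians (0.0 = balanced)
    """
    k1 = (1 + gain * np.exp(-1j * phase)) / 2  # weight of the desired signal
    k2 = (1 - gain * np.exp(1j * phase)) / 2   # weight of the mirror image
    return k1, k2

def apply_iq_imbalance(x, gain, phase):
    """Distort complex baseband samples x with the assumed imbalance model."""
    k1, k2 = imbalance_coefficients(gain, phase)
    return k1 * x + k2 * np.conj(x)

n = np.arange(64)
tone = np.exp(2j * np.pi * 4 * n / 64)  # complex tone in FFT bin 4
distorted = apply_iq_imbalance(tone, gain=1.1, phase=np.deg2rad(5))
spectrum = np.abs(np.fft.fft(distorted)) / len(n)

# Bin 4 holds the desired tone; bin 60 (the mirror of bin 4) holds the
# image created by the imbalance, which acts as interference.
print(spectrum[4], spectrum[60])
```

With `gain=1.0` and `phase=0.0` the image coefficient is zero and the tone passes through unchanged, which is the balanced case.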

Techniques for compensating the effects of I/Q imbalance have been proposed, but they require an estimate of the I/Q imbalance parameters.

Korean Patent Publication No. 10-2009-0089531 relates to an apparatus and method for estimating I/Q imbalance parameters in an orthogonal frequency division multiplexing receiver, and describes technology relating to a method and apparatus for efficiently managing files stored in the external memory of a portable terminal.

Korean Patent Publication No. 10-2009-0089531

The embodiments describe a method and system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning and, more specifically, provide a technique that uses reinforcement learning to estimate the I/Q imbalance parameters in a multi-antenna system in which I/Q imbalance exists at the transmitting and receiving ends.

The embodiments aim to provide a reinforcement-learning-based method and system for estimating the I/Q imbalance parameters of a multi-antenna transceiver in which the reward is computed from the measured degree of I/Q imbalance compensation, so that the estimates converge to the correct values over time and remain robust to parameter changes.

According to an embodiment, a method for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning may include, in a multi-antenna channel in which I/Q imbalance occurs at one or both of a transmitting end and a receiving end: arbitrarily setting an initial value for each parameter; measuring an effective channel that includes the I/Q imbalance; calculating an estimate of the I/Q imbalance value by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step; calculating an I/Q-imbalance-compensated effective channel using the calculated estimate; measuring, from the compensated effective channel, a cost indicating the degree to which the I/Q imbalance has been compensated; calculating a reward in proportion to the cost; and setting the next step size based on the reward.

The method may further include selecting, among the rewards obtained for the transmitting end or the receiving end, the action that yields the higher reward; and updating the I/Q imbalance estimate based on the selected action. These steps may be repeated for each parameter.

Calculating the I/Q-imbalance-compensated effective channel using the estimate of the I/Q imbalance value may include calculating the compensated effective channel in a compensator using the estimate and then passing it to a reward generator.

Measuring the cost indicating the degree to which the I/Q imbalance has been compensated may include receiving the I/Q-imbalance-compensated effective channel matrix at the reward generator, which measures the cost. The cost may be calculated by comparing the extent of the portion corresponding to the real part with that of the portion corresponding to the imaginary part in augmented channel notation.

Setting the next step size based on the reward may include setting the step size in proportion to the cost: when the reward is large, the estimate is close to the actual value, so the step size is reduced; when the reward is small, the estimate differs greatly from the actual value, so the step size is increased.

The transmitting end may be a user terminal and the receiving end a base station, and the I/Q imbalance values may be a base station amplitude (gain) imbalance value, a base station phase imbalance value, a user terminal amplitude imbalance value, and a user terminal phase imbalance value.

According to another embodiment, a system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning may include a receiving end that, in a multi-antenna channel in which I/Q imbalance occurs at one or both of a transmitting end and the receiving end, arbitrarily sets an initial value for each parameter, measures an effective channel that includes the I/Q imbalance, calculates an estimate of the I/Q imbalance value by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step, calculates the I/Q-imbalance-compensated effective channel, measures a cost indicating the degree to which the I/Q imbalance has been compensated, and calculates a reward in proportion to the cost.

The receiving end may select, among the rewards obtained, the action that yields the higher reward and update the I/Q imbalance estimate based on the selected action.

According to yet another embodiment, a system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning includes a compensator that receives a signal from a transmitting end and estimates an effective channel; and a reward generator that receives the I/Q-imbalance-compensated effective channel matrix from the compensator, measures a cost indicating the degree to which the I/Q imbalance has been compensated, and calculates a reward in proportion to the cost. In a multi-antenna channel in which I/Q imbalance occurs at one or both of the transmitting end and a receiving end, the compensator may arbitrarily set an initial value for each parameter, measure an effective channel that includes the I/Q imbalance, calculate an estimate of the I/Q imbalance value by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step, and calculate the I/Q-imbalance-compensated effective channel.

The receiving end may select, among the rewards obtained, the action that yields the higher reward, and the compensator may update the I/Q imbalance estimate based on the selected action.

The reward generator may receive the I/Q-imbalance-compensated effective channel matrix and measure a cost indicating the degree to which the I/Q imbalance has been compensated. The cost may be calculated by comparing the extent of the portion corresponding to the real part with that of the portion corresponding to the imaginary part in augmented channel notation.

The compensator may set the next step size based on the reward, setting the step size in proportion to the cost: when the reward is large, the estimate is close to the actual value, so the step size is reduced; when the reward is small, the estimate differs greatly from the actual value, so the step size is increased.

Based on reinforcement learning, the I/Q imbalance parameters are estimated iteratively by arbitrarily estimating them from the measured effective channel and compensating for them. A reward is calculated according to how well the I/Q imbalance has been estimated, and the size of the action is adjusted adaptively, so that the convergence speed and convergence accuracy can be controlled.

According to the embodiments, a reinforcement-learning-based method and system for estimating the I/Q imbalance parameters of a multi-antenna transceiver can be provided in which the reward is computed from the measured degree of I/Q imbalance compensation, so that the estimates converge to the correct values over time and remain robust to parameter changes.

In addition, according to the embodiments, because the size of the next action is determined from the previous reward value, convergence is fast in the early stage and the accuracy of convergence is high. The technique can be used to resolve the I/Q imbalance that arises in communication systems using a large number of antennas.

FIG. 1 is a block diagram illustrating a system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment.
FIG. 2 is a flowchart illustrating a method for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment.
FIG. 3 is a diagram showing the MSE of amplitude imbalance parameter estimation according to an embodiment.
FIG. 4 is a diagram showing the MSE of phase imbalance parameter estimation according to an embodiment.

Hereinafter, embodiments are described with reference to the accompanying drawings. However, the described embodiments may be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more fully to those of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

In this specification, embodiments of the present invention are described with a focus on the data transmission and reception relationship between a base station and a terminal. Here, the base station is a terminal node of the network that communicates directly with the terminal. Specific operations described below as being performed by the base station may, in some cases, be performed by an upper node of the base station.

That is, in a network consisting of multiple network nodes including a base station, the various operations performed for communication with a terminal can evidently be performed by the base station or by network nodes other than the base station. 'Base station (BS)' may be replaced by terms such as fixed station, Node B, eNode B (eNB), and access point (AP). In addition, the term base station may be used below as a concept that includes a cell or a sector. A relay may be replaced by terms such as Relay Node (RN) and Relay Station (RS). 'Terminal' may be replaced by terms such as User Equipment (UE), Mobile Station (MS), Mobile Subscriber Station (MSS), and Subscriber Station (SS).

Specific terms used in the following description are provided to aid understanding of the present invention, and they may be replaced with other forms without departing from the technical spirit of the present invention.

In some instances, to avoid obscuring the concepts of the present invention, well-known structures and devices may be omitted or shown in block diagram form centered on their core functions. The same reference numerals are used for the same components throughout this specification.

This embodiment considers a multi-antenna communication system in which I/Q imbalance occurs at both the transmitting end and the receiving end. Techniques for compensating the effects of I/Q imbalance have been proposed, but they require an estimate of the I/Q imbalance parameters. Moreover, when I/Q imbalance occurs at both the receiving end and the transmitting end, the number of unknowns exceeds the number of equations, so direct calculation is impossible.

In a massive MIMO system, where the number of antennas grows, the number of I/Q imbalance parameters to be estimated increases in proportion to the number of antennas. Supervised learning, which must provide a label for every situation, therefore requires too much training data to be suitable.

Reinforcement learning, by contrast, requires no training data: an agent learns by taking actions in the direction that increases the reward, which makes it well suited to this problem.

The embodiments below propose an algorithm that uses reinforcement learning to estimate I/Q imbalance parameters in a multi-antenna system in which I/Q imbalance exists at the transmitting and receiving ends. The proposed algorithm is designed so that the system adaptively reduces the parameter estimation error over time, and it is robust to changes in the system.

The I/Q imbalance that arises in the RF components of multiple antennas causes interference between the in-phase and quadrature signals and significantly reduces the data rate. Compensating for it requires accurate estimation of the I/Q imbalance parameters; because this embodiment enables such estimation, wireless communication at higher data rates becomes possible.

FIG. 1 is a block diagram illustrating a system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment.

Referring to FIG. 1, the system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment can use reinforcement learning to estimate the I/Q imbalance at the base station (BS) and the user terminal (UT) of a system with multiple antennas.

The system 100 for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment may comprise a compensator 120 and a reward generator 130 between a user terminal (UT) 110 and a base station (BS) 140.

The user terminal 110 may transmit a pilot signal so that the base station 140 can estimate the effective channel.

Starting from the previously estimated I/Q imbalance parameters, the base station 140 increases or decreases the I/Q imbalance value by the previous step size, calculates the estimate of the I/Q imbalance matrix for the resulting value, and may use it to compensate the received channel information for the imbalance at both the base station 140 and the user terminal 110.

The compensated channel information is passed as input to the reward generator 130, which measures from the input matrix the degree to which the I/Q imbalance has been compensated. This is calculated by comparing the extent of the portion corresponding to the real part with that of the portion corresponding to the imaginary part in augmented channel notation.
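One way to picture this measurement is through the real-valued augmented form of a complex channel matrix H, which has the block pattern [[Re H, -Im H], [Im H, Re H]] when no imbalance is present. The sketch below measures how far a matrix deviates from that pattern. The exact cost formula is not given in this passage, so `structure_residual`, the block comparison it performs, and the toy distortion are assumptions made for illustration only.

```python
import numpy as np

def to_augmented(h):
    """Real-valued augmented form [[Re, -Im], [Im, Re]] of a complex matrix."""
    return np.block([[h.real, -h.imag], [h.imag, h.real]])

def structure_residual(h_aug):
    """Assumed cost: squared deviation from the [[A, -B], [B, A]] pattern."""
    rows, cols = h_aug.shape[0] // 2, h_aug.shape[1] // 2
    a11, a12 = h_aug[:rows, :cols], h_aug[:rows, cols:]
    a21, a22 = h_aug[rows:, :cols], h_aug[rows:, cols:]
    return float(np.sum((a11 - a22) ** 2) + np.sum((a12 + a21) ** 2))

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
ideal = to_augmented(h)

# Hypothetical receiver imbalance: the Q branch of every antenna is scaled by 1.1.
distorted = ideal.copy()
distorted[4:, :] *= 1.1

print(structure_residual(ideal))      # pattern holds, residual is 0.0
print(structure_residual(distorted))  # pattern is broken, residual is positive
```

Under this reading, a perfectly compensated channel gives a zero residual, and the residual grows as the real-part and imaginary-part blocks drift apart.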

The reward is calculated in proportion to the degree of compensation.

The base station 140 selects, among the rewards obtained, the action that yields the higher reward. Here, the action is taken in whichever direction, increasing or decreasing the parameter, yields the larger reward.

The next step size is set based on the reward received. If the previous reward was large, the estimate is close to the actual value, so the step size is reduced; if the reward was small, the estimate differs greatly from the actual value, so the step size is increased. This process is repeated for all parameters.
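The action selection and step-size rule can be sketched for a single scalar parameter. This is a minimal sketch under stated assumptions: `update_parameter`, `cost_fn`, and `scale` are hypothetical names, the cost is read as a non-negative residual that shrinks as compensation improves, and the reward is taken as its negative so that a large reward corresponds to a small next step.

```python
def update_parameter(estimate, step, cost_fn, scale=0.5):
    """One update of a single imbalance parameter (illustrative only).

    cost_fn(value) is assumed to return a non-negative residual that is
    zero when the imbalance is fully compensated.
    """
    up, down = estimate + step, estimate - step
    # Assumption: reward = -residual, so a higher reward means better compensation.
    reward_up, reward_down = -cost_fn(up), -cost_fn(down)
    if reward_up >= reward_down:
        chosen, reward = up, reward_up
    else:
        chosen, reward = down, reward_down
    # Step proportional to the residual: large reward -> small next step.
    next_step = scale * (-reward)
    return chosen, next_step

TRUE_GAIN = 1.08  # hypothetical true amplitude imbalance

def toy_cost(gain_estimate):
    """Toy residual: distance between the estimate and the true value."""
    return abs(gain_estimate - TRUE_GAIN)

estimate, step = 1.0, 0.05
for _ in range(30):
    estimate, step = update_parameter(estimate, step, toy_cost)
print(estimate, step)
```

In this toy setting each update halves the remaining error, so the step is large while the estimate is far from the true value and shrinks as it approaches.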

The system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment is described in more detail below by way of an example.

The system 100 for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning according to an embodiment may include a compensator 120 and a reward generator 130. The transmitting end may be the user terminal 110 and the receiving end the base station 140, and the I/Q imbalance values may be the amplitude (gain) imbalance value of the base station 140, the phase imbalance value of the base station 140, the amplitude imbalance value of the user terminal 110, and the phase imbalance value of the user terminal 110.

The compensator 120 may receive a signal from the transmitting end and estimate the effective channel. In a multi-antenna channel in which I/Q imbalance occurs at one or both of the transmitting end and the receiving end, the compensator 120 may also arbitrarily set an initial value for each parameter, measure an effective channel that includes the I/Q imbalance, calculate an estimate of the I/Q imbalance value by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step, and calculate the I/Q-imbalance-compensated effective channel.

Meanwhile, the receiving end selects, among the rewards obtained, the action that yields the higher reward, and the compensator 120 may update the I/Q imbalance estimate based on the action selected by the receiving end.

The compensator 120 may set the next step size based on the reward, setting the step size in proportion to the cost: when the reward is large, the estimate is close to the actual value, so the step size is reduced; when the reward is small, the estimate differs greatly from the actual value, so the step size is increased.

The reward generator 130 may receive the I/Q-imbalance-compensated effective channel matrix from the compensator 120, measure a cost indicating the degree to which the I/Q imbalance has been compensated, and calculate a reward in proportion to the cost.

In addition, the reward generator 130 may calculate the cost from the I/Q-imbalance-compensated effective channel matrix by comparing the extent of the portion corresponding to the real part with that of the portion corresponding to the imaginary part in augmented channel notation.

Based on reinforcement learning, the I/Q imbalance parameters are estimated iteratively by arbitrarily estimating them from the measured effective channel and compensating for them. A reward is calculated according to how well the I/Q imbalance has been estimated, and the size of the action is adjusted adaptively, so that the convergence speed and convergence accuracy can be controlled.

According to the embodiments, the reward is computed from the measured degree of I/Q imbalance compensation, so that the estimates converge to the correct values over time and remain robust to parameter changes. Because the size of the next action is determined from the previous reward value, convergence is fast in the early stage and the accuracy of convergence is high. The technique can also be used to resolve the I/Q imbalance that arises in communication systems using a large number of antennas.

According to another embodiment, a system for estimating I/Q imbalance parameters of a multi-antenna transceiver based on reinforcement learning may include a receiving end that, in a multi-antenna channel in which I/Q imbalance occurs at one or both of a transmitting end and the receiving end, arbitrarily sets an initial value for each parameter, measures an effective channel that includes the I/Q imbalance, calculates an estimate of the I/Q imbalance value by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step, calculates the I/Q-imbalance-compensated effective channel, measures a cost indicating the degree to which the I/Q imbalance has been compensated, and calculates a reward in proportion to the cost.

The receiving end may select, among the rewards obtained, the action that yields the higher reward and update the I/Q imbalance estimate based on the selected action.

예를 들어, 수신단은 보상기 및 보상 생성기를 포함할 수 있으며, 이러한 특징을 앞에서 설명한 일 실시예에 따른 강화 학습 기반 다중 안테나 송수신단의 I/Q 불균형 파라미터 추정 시스템의 설명과 중복되어 생략한다. For example, the receiving end may include a compensator and a compensation generator, and this feature is omitted in the overlapping description of the I / Q imbalance parameter estimation system of the enhanced learning-based multi-antenna transceiver according to the above-described embodiment.

한편, 송신단은 사용자 단말이고 수신단은 기지국이며, I/Q 불균형 값은 기지국 진폭(gain) 불균형 값, 기지국 위상(phase) 불균형 값, 사용자 단말 진폭 불균형 값 및 사용자 단말 위상 불균형 값이 될 수 있다.
Meanwhile, the transmitting end is a user terminal and the receiving end is a base station, and the I / Q imbalance value may be a base station gain imbalance value, a base station phase imbalance value, a user terminal amplitude imbalance value, and a user terminal phase imbalance value.

도 2는 일 실시예에 따른 일 실시예에 따른 강화 학습 기반 다중 안테나 송수신단의 I/Q 불균형 파라미터 추정 방법을 나타내는 흐름도이다. 2 is a flowchart illustrating a method for estimating an I / Q imbalance parameter of a reinforcement learning based multi-antenna transceiver according to an embodiment.

Referring to FIG. 2, the method for estimating I/Q imbalance parameters of a reinforcement learning based multi-antenna transceiver according to an embodiment may include: arbitrarily setting an initial value of each parameter in a multi-antenna channel in which I/Q imbalance occurs at one or both of the transmitting end and the receiving end (210); measuring an effective channel that includes the I/Q imbalance values (220); computing estimates of the I/Q imbalance values by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step (230); computing an effective channel compensated for I/Q imbalance using the computed estimates (240); measuring, from the compensated effective channel, a cost indicating the degree to which the I/Q imbalance has been compensated (250); calculating a reward in proportion to the cost (260); and setting the next step size based on the reward (270).

The method may further include selecting, among the rewards computed at the transmitting end or the receiving end, the action that yields the higher reward (280), and updating the I/Q imbalance based on the selected action (290). These steps may be repeated for each parameter.

According to the embodiments, the I/Q imbalance parameters are estimated iteratively, based on reinforcement learning, by arbitrarily estimating and compensating them from the measured channel. A reward is calculated according to how accurate the current I/Q imbalance estimate is, and the size of the action is adjusted adaptively, so that the convergence speed and the convergence accuracy can be controlled.

Hereinafter, the method for estimating I/Q imbalance parameters of a reinforcement learning based multi-antenna transceiver according to an embodiment is described in more detail.

The method may be described, as one example, using the I/Q imbalance parameter estimation system of the reinforcement learning based multi-antenna transceiver according to the embodiment described with reference to FIG. 1. Here, the system according to an embodiment may consist of a compensator and a reward generator placed between the user terminal (UT) and the base station (BS).

In step 210, an initial value of each parameter may be set arbitrarily in a multi-antenna channel in which I/Q imbalance occurs at one or both of the transmitting end and the receiving end. The transmitting end may be a user terminal and the receiving end may be a base station, and the I/Q imbalance values may be a base station gain (amplitude) imbalance value, a base station phase imbalance value, a user terminal gain imbalance value, and a user terminal phase imbalance value.

In step 220, an effective channel that includes the I/Q imbalance values may be measured.

In step 230, estimates of the I/Q imbalance values may be computed by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step.

In step 240, an effective channel compensated for I/Q imbalance may be computed using the computed estimates of the I/Q imbalance values. The compensator computes the compensated effective channel from the estimates and then passes it to the reward generator.

In step 250, a cost indicating the degree to which the I/Q imbalance has been compensated may be measured using the compensated effective channel. The reward generator receives the compensated effective channel matrix and measures the cost, which may be calculated by comparing the magnitudes of the part corresponding to the real part and the part corresponding to the imaginary part in augmented channel notation.
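As a minimal sketch of the idea in step 250, the following Python example measures how far a real-valued augmented matrix deviates from the block structure that an ideal complex channel produces. The cost function `structure_cost`, the helper `augment_matrix`, and the stacked ordering of real and imaginary parts are assumptions made for illustration; the cost actually used in the specification (Equation 15) is not reproduced in this text and may differ.

```python
import numpy as np

def augment_matrix(H):
    """Real-valued representation [[Re, -Im], [Im, Re]] of a complex matrix."""
    return np.block([[H.real, -H.imag],
                     [H.imag,  H.real]])

def structure_cost(A):
    """Hypothetical cost: squared deviation of A from the [[R, -I], [I, R]] form."""
    rows, cols = A.shape
    K, N = rows // 2, cols // 2
    top_left, top_right = A[:K, :N], A[:K, N:]
    bottom_left, bottom_right = A[K:, :N], A[K:, N:]
    real_mismatch = np.linalg.norm(top_left - bottom_right) ** 2
    imag_mismatch = np.linalg.norm(top_right + bottom_left) ** 2
    return real_mismatch + imag_mismatch

rng = np.random.default_rng(0)
K, N = 2, 4
H = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))

ideal = augment_matrix(H)
distorted = ideal.copy()
distorted[K:, :] *= 1.1  # mimic a gain error on the quadrature rows

ideal_cost = structure_cost(ideal)
distorted_cost = structure_cost(distorted)
```

In this sketch the cost is zero for the undistorted matrix and grows with the distortion, which is the property the reward generator relies on.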

In step 260, a reward may be calculated in proportion to the cost.

In step 270, the next step size may be set based on the reward. Here, the step size is set in proportion to the cost: when the reward is large, the estimate is close to the true value, so the step size is decreased; when the reward is small, the estimate is far from the true value, so the step size is increased.
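The step size rule of step 270 can be sketched as follows. The proportionality constant `scale` and the clipping bounds `min_step` and `max_step` are hypothetical parameters introduced for illustration; the specification only states that the step size is proportional to the cost.

```python
def next_step_size(cost, scale=0.5, min_step=1e-4, max_step=0.1):
    """Step size proportional to the cost, clipped to a safe range (assumed bounds)."""
    return min(max(scale * cost, min_step), max_step)

# A large cost (estimate far from the true value) gives a large step,
# a small cost (estimate close to the true value) gives a small step.
far_step = next_step_size(0.1)
near_step = next_step_size(0.001)
```

Clipping is a design choice of this sketch: it keeps a single noisy cost measurement from producing a step that is either zero or unreasonably large.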

Further, in step 280, the action that yields the higher reward may be selected among the rewards computed at the transmitting end or the receiving end.

In step 290, the I/Q imbalance may be updated based on the selected action. Alternatively, step 280 may be omitted as a separate step, and the I/Q imbalance may be updated directly based on the action that yields the higher reward among the rewards computed at the transmitting end or the receiving end.

Hereinafter, the method and system for estimating I/Q imbalance parameters of a reinforcement learning based multi-antenna transceiver according to an embodiment are described in more detail using one example.

First, the multi-antenna received signal model with I/Q imbalance at the transmitting and receiving ends is described.

In a multi-antenna system with a base station having N antennas and K single-antenna users, when I/Q imbalance occurs at both the transmitting and receiving ends, the downlink received signal can be expressed as follows. Here, N is the number of base station (BS) antennas and K is the number of user terminals (UTs).
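The signal model is written in augmented (real-valued) notation, in which the in-phase and quadrature components are kept as separate real entries. The following sketch shows the standard real-valued representation of a complex linear model and checks that it reproduces the complex product. The function names and the stacked ordering (all real parts first, then all imaginary parts) are assumptions for illustration; the ordering used in the specification may differ.

```python
import numpy as np

def augment_vector(x):
    """Stack real and imaginary parts: C^n -> R^(2n)."""
    return np.concatenate([x.real, x.imag])

def augment_matrix(H):
    """Real-valued 2K x 2N representation of a complex K x N matrix."""
    return np.block([[H.real, -H.imag],
                     [H.imag,  H.real]])

rng = np.random.default_rng(0)
K, N = 2, 4  # K single-antenna users, N base station antennas
H = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

y_complex = H @ x
y_augmented = augment_matrix(H) @ augment_vector(x)
```

Working in this notation lets gain and phase errors, which act differently on the I and Q branches, be written as ordinary real matrices.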

[Equation 1]

Here, the quantities in Equation 1 are the augmented received symbol vector, the augmented white noise (AWGN) vector, the augmented transmitted symbol vector, the augmented channel matrix, the precoder matrix, the UT I/Q imbalance matrix, the BS I/Q imbalance matrix, and the noise I/Q imbalance matrix.

The BS I/Q imbalance matrix can be modeled as follows.

[Equation 2]

Here, Equation 2 involves a permutation matrix, a normalization constant, and the trace operator. The base station permutation matrix has the following components.

[Equation 3]

Here, the permutation matrix reorders, RF chain by RF chain, the augmented vector in which the I and Q components are stored separately. The matrix composed of the per-chain I/Q imbalance components can be defined as follows.

[Equation 4]

Each block in Equation 4 is the I/Q imbalance component matrix of the n-th RF chain and can be expressed as follows.

[Equation 5]

Here, the two parameters in Equation 5 are the gain imbalance and the phase imbalance of the n-th BS RF chain.
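To make the per-chain structure concrete, the following sketch builds a 2x2 real imbalance matrix from a gain and a phase and assembles the blocks for several RF chains into a block-diagonal matrix. The parameterization in `chain_imbalance` (ideal I branch, Q branch with gain error and phase error) is a common textbook model assumed here for illustration; it is not taken from Equation 5, which is not reproduced in this text. The helper `block_diagonal` and the example values are likewise hypothetical.

```python
import numpy as np

def chain_imbalance(gain, phase):
    """2x2 real I/Q imbalance matrix of one RF chain (assumed Q-branch error model)."""
    return np.array([[1.0, 0.0],
                     [-gain * np.sin(phase), gain * np.cos(phase)]])

def block_diagonal(blocks):
    """Place 2x2 blocks on the diagonal, one block per RF chain."""
    n = len(blocks)
    out = np.zeros((2 * n, 2 * n))
    for i, block in enumerate(blocks):
        out[2 * i:2 * i + 2, 2 * i:2 * i + 2] = block
    return out

gains = [1.05, 0.97, 1.00]               # example gain imbalances
phases = np.deg2rad([3.0, -2.0, 0.0])    # example phase imbalances in radians
Psi = block_diagonal([chain_imbalance(g, p) for g, p in zip(gains, phases)])
```

With unit gain and zero phase the block is the identity, so an ideal chain leaves its I and Q components untouched. The block-diagonal form uses a per-chain ordering (I and Q of chain 1, then chain 2, and so on), which is why a permutation is needed to move between it and a stacked ordering.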

The I/Q imbalance matrix arising at the user terminals is modeled with a structure similar to that of the base station I/Q imbalance matrix and can be expressed as follows.

[Equation 6]

Here, Equation 6 includes a normalization constant, and the user terminal permutation matrix may have the following components.

[Equation 7]

The matrix composed of the per-chain I/Q imbalance components can be defined as follows.

[Equation 8]

Each block in Equation 8 is the I/Q imbalance component matrix of the k-th RF chain and can be expressed as follows.

[Equation 9]

Here, the two parameters in Equation 9 are the gain imbalance and the phase imbalance of the k-th UT.

I/Q imbalance in the user terminal RF chain also affects the noise. In an ideal RF chain the carriers are orthogonal to each other, so the resulting noise components are uncorrelated. When I/Q imbalance occurs, however, the orthogonality of the carriers is broken, and the in-phase noise and the quadrature noise become correlated. The matrix representing this effect can be modeled as follows.

[Equation 10]

Here, Equation 10 includes a normalization constant, and the matrix composed of the per-chain components can be defined as follows.

[Equation 11]

Each component matrix is expressed as follows.

[Equation 12]
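The correlation between in-phase and quadrature noise can be checked numerically. The sketch below applies an assumed 2x2 imbalance matrix (the same textbook Q-branch model as a stand-in, not the matrix of Equation 12) to white noise with covariance sigma^2 I and inspects the off-diagonal entry of the resulting covariance.

```python
import numpy as np

def chain_imbalance(gain, phase):
    """2x2 real I/Q imbalance matrix of one RF chain (assumed Q-branch error model)."""
    return np.array([[1.0, 0.0],
                     [-gain * np.sin(phase), gain * np.cos(phase)]])

sigma2 = 1.0                                   # noise variance per real dimension
M = chain_imbalance(1.05, np.deg2rad(5.0))     # example gain and phase imbalance

ideal_cov = sigma2 * np.eye(2)                 # uncorrelated I and Q noise
distorted_cov = sigma2 * (M @ M.T)             # covariance after the imbalance
iq_correlation = distorted_cov[0, 1]           # nonzero once the phase is nonzero
```

Under this model the off-diagonal term equals -gain * sin(phase), so it vanishes only when the phase imbalance is zero.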

Next, the reinforcement learning based I/Q imbalance estimation algorithm is described.

In a multi-antenna channel in which I/Q imbalance occurs at both the base station and the user terminal, the effective channel matrix, which includes the I/Q imbalance matrices, can be defined as follows.

[Equation 13]

Once the I/Q imbalance matrices of the base station and the user terminal have been estimated, the imbalance can be compensated by multiplying the effective channel on both sides by the inverses of the estimated matrices. The compensated effective channel can be defined as follows.

[Equation 14]

Here, the quantities in Equation 14 are the effective channel matrix compensated for I/Q imbalance, the estimated UT I/Q imbalance matrix, and the estimated BS I/Q imbalance matrix.

The products of each inverse estimate with the corresponding true imbalance matrix are the compensated I/Q imbalance matrices of the user terminal and the base station, respectively. When the I/Q imbalance is estimated perfectly, the compensated I/Q imbalance matrices reduce to identity matrices, and a corresponding relation holds for the compensated effective channel. The expressions use the identity matrix and the conjugate transpose.
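The compensation by inverse multiplication can be sketched as follows. The ordering `Psi_ut @ H @ Psi_bs` for the effective channel, the random stand-in matrices, and the function name `compensate` are assumptions for illustration; Equations 13 and 14 are not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 2, 4
H = rng.standard_normal((2 * K, 2 * N))  # stand-in for the augmented channel

# Stand-ins for the true imbalance matrices: small perturbations of the identity.
Psi_ut = np.eye(2 * K) + 0.05 * rng.standard_normal((2 * K, 2 * K))
Psi_bs = np.eye(2 * N) + 0.05 * rng.standard_normal((2 * N, 2 * N))

H_eff = Psi_ut @ H @ Psi_bs  # effective channel including I/Q imbalance (assumed ordering)

def compensate(H_eff, Psi_ut_hat, Psi_bs_hat):
    """Multiply the effective channel on both sides by the inverse estimates."""
    return np.linalg.inv(Psi_ut_hat) @ H_eff @ np.linalg.inv(Psi_bs_hat)

H_perfect = compensate(H_eff, Psi_ut, Psi_bs)                     # exact estimates
H_uncompensated = compensate(H_eff, np.eye(2 * K), np.eye(2 * N)) # no compensation
```

With exact estimates the compensated channel equals the original channel; any estimation error leaves a residual that a cost function can measure.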

From this, a cost indicating the degree to which each I/Q imbalance parameter has been compensated can be defined as follows.

[Equation 15]

Here, U(a, b) denotes the uniform probability distribution with minimum value a and maximum value b.

Each cost indicates how well the corresponding parameter has been compensated; in the ideal case with no I/Q imbalance, the cost takes its ideal value.

Applying this property to reinforcement learning makes it possible to estimate the I/Q imbalance.

Here, the agent is the communication system and the environment is the channel. The agent's action is an increase or a decrease of the parameter being estimated. The reward, the measure by which the agent decides its next action, is set in inverse proportion to the cost, so that the agent receives a higher reward as the estimate approaches the correct value.

To improve convergence and convergence speed, the step size is also set in proportion to the cost: early on, when the error is large, a large step size speeds up convergence, and as the estimate becomes more accurate, a smaller step size lets it converge precisely.

Based on the above, the following algorithm is proposed. It represents the method and system for estimating I/Q imbalance parameters of a reinforcement learning based multi-antenna transceiver according to an embodiment.

In a multi-antenna channel in which I/Q imbalance occurs at both the base station and the user terminal, the initial value of each parameter is set arbitrarily.

The effective channel, which includes the I/Q imbalance matrices, is then measured; it can be expressed as follows.

[Equation 16]

Here, Equation 16 gives the effective channel matrix.

Base station (BS) gain imbalance update

The following is repeated for every base station antenna index i.

The I/Q imbalance estimation matrices are computed for the cases in which the base station gain imbalance estimated in the previous step is increased or decreased, as in the following equation. The gain and phase imbalances of the other indices are kept at their previously estimated values.

[Equation 17]

Here, the step parameter in Equation 17 is the gain imbalance estimation step size.

The reward for each case is computed with the reward function defined above, as in the following equation. The user terminal gain and phase imbalances are kept at their previously estimated values.

[Equation 18]

The next step size is set from the cost, as in the following equation.

[Equation 19]

The case that yields the largest reward is selected, as in the following equation.

[Equation 20]

The base station gain imbalance is updated with the corresponding value, as in the following equation.

[Equation 21]
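One update of a single parameter, as described in the steps above, can be sketched as follows. The function `update_parameter`, the reward `1 / (cost + eps)`, the constant `scale`, and the toy cost function are assumptions for illustration. In particular, the toy cost uses the true values directly, which a real system cannot do; it stands in for a cost measured from the compensated effective channel. The sketch also assumes the next step size is taken from the cost of the selected action.

```python
def update_parameter(theta, i, step, cost_fn, scale=0.25, eps=1e-12):
    """Try theta[i] + step and theta[i] - step, keep the higher-reward case."""
    candidates = []
    for delta in (+step, -step):
        trial = list(theta)            # other parameters keep their previous estimates
        trial[i] += delta
        cost = cost_fn(trial)
        reward = 1.0 / (cost + eps)    # reward inversely proportional to the cost
        candidates.append((reward, cost, trial))
    reward, cost, best = max(candidates, key=lambda c: c[0])
    next_step = scale * cost           # step size proportional to the cost
    return best, next_step

# Toy stand-in for a measured cost (assumption): distance to the true values.
true_values = [1.05, 0.95]
def toy_cost(theta):
    return sum(abs(t - v) for t, v in zip(theta, true_values))

theta = [1.0, 1.0]
new_theta, next_step = update_parameter(theta, i=0, step=0.02, cost_fn=toy_cost)
```

Here the increase is selected because it moves the first parameter toward its true value and therefore yields the lower cost and the higher reward. Repeating the call over every index, and then over the user terminal and phase parameters, gives the coordinate-wise loop the text describes.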

User terminal (UT) gain imbalance update

The same procedure as described above is repeated with the base station replaced by the user terminal.

Base station (BS) phase imbalance update

The following is repeated for every base station antenna index i.

The I/Q imbalance estimation matrices are computed for the cases in which the base station phase imbalance estimated in the previous step is increased or decreased, as in the following equation. The gain and phase imbalances of the other indices are kept at their previously estimated values.

[Equation 22]

Here, the step parameter in Equation 22 is the phase imbalance estimation step size.

The reward for each case is computed with the reward function defined above, as in the following equation. The gain and phase imbalances of the user terminal (UT) are kept at their previously estimated values.

[Equation 23]

The next step size is set from the cost, as in the following equation.

[Equation 24]

The case that yields the largest reward is selected, as in the following equation.

[Equation 25]

The base station phase imbalance is updated with the corresponding value, as in the following equation.

[Equation 26]

User terminal (UT) phase imbalance update

The same procedure as described above is repeated with the base station replaced by the user terminal. The algorithm then returns to the effective channel measurement step.

The simulation environment assumes a MIMO system in which I/Q imbalance occurs at both the base station and the user terminal. The channel is a Rayleigh fading channel, and the noise is generated as white noise.

The initial estimates are set as follows.

[Equation 27]

The initial step sizes are set as follows.

[Equation 28]

The pilot SNR for channel estimation is varied from 0 dB to 15 dB in the experiments.
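A simulation setup of this kind could be sketched as below: a Rayleigh fading channel with independent unit-variance complex Gaussian entries, and a channel measurement corrupted by white noise whose variance follows from the pilot SNR. The additive-noise measurement model, the unit signal power, the 5 dB SNR grid, and all function names are assumptions for illustration and are not taken from the specification.

```python
import numpy as np

def rayleigh_channel(K, N, rng):
    """K x N channel with i.i.d. CN(0, 1) entries (Rayleigh-distributed magnitudes)."""
    return (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

def noise_variance(snr_db, signal_power=1.0):
    """Noise variance that gives the requested SNR in dB."""
    return signal_power / (10.0 ** (snr_db / 10.0))

def noisy_estimate(H, snr_db, rng):
    """Channel measurement with additive complex white noise (assumed model)."""
    var = noise_variance(snr_db)
    noise = np.sqrt(var / 2) * (rng.standard_normal(H.shape)
                                + 1j * rng.standard_normal(H.shape))
    return H + noise

rng = np.random.default_rng(2)
H = rayleigh_channel(2, 4, rng)
# Sweep the pilot SNR from 0 dB to 15 dB in 5 dB steps.
estimates = {snr_db: noisy_estimate(H, snr_db, rng) for snr_db in range(0, 16, 5)}
```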

FIG. 3 shows the MSE of gain imbalance parameter estimation according to an embodiment.

FIG. 4 shows the MSE of phase imbalance parameter estimation according to an embodiment.

Referring to FIGS. 3 and 4, the performance of the method and system for estimating I/Q imbalance parameters of a reinforcement learning based multi-antenna transceiver according to an embodiment can be confirmed. Performance was measured as a function of pilot SNR, using the mean squared error (MSE) between the estimated I/Q imbalance parameter values and the true parameter values.

The MSE converges to a low value over time and remains stable, without significant change, as further iterations proceed. The method and system according to an embodiment estimate accurate values through iteration and can therefore be used to estimate and compensate I/Q imbalance in systems where it occurs.
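The performance metric can be computed as in the following sketch. The example parameter values are made up for illustration and are not results from the specification.

```python
import numpy as np

def mse(estimates, true_values):
    """Mean squared error between estimated and true parameter values."""
    e = np.asarray(estimates, dtype=float)
    t = np.asarray(true_values, dtype=float)
    return float(np.mean((e - t) ** 2))

true_gain = [1.05, 0.97]   # hypothetical true gain imbalances
est_gain = [1.04, 0.99]    # hypothetical estimates after some iterations

gain_mse = mse(est_gain, true_gain)
gain_mse_db = 10 * np.log10(gain_mse)  # MSE on a dB scale for plotting against pilot SNR
```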

According to the embodiments, the method addresses the I/Q imbalance problem that readily arises from the large number of antennas in massive multi-antenna array technology, which is regarded as a core technology of future wireless communication, and it is expected to be applicable to various fields such as wireless communication and future vehicular communication using multiple antennas. In particular, the embodiments can resolve a problem that arises in the RF components adopted to address the high device cost commonly encountered in massive antenna array technology.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited number of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method for estimating I/Q imbalance parameters of a reinforcement learning based multi-antenna transceiver, the method comprising:
arbitrarily setting an initial value of each parameter in a multi-antenna channel in which I/Q imbalance occurs at one or both of a transmitting end and a receiving end;
measuring an effective channel that includes the I/Q imbalance values;
computing estimates of the I/Q imbalance values by increasing or decreasing the transmitting-end or receiving-end I/Q imbalance value estimated in the previous step;
computing an effective channel compensated for I/Q imbalance using the computed estimates of the I/Q imbalance values;
measuring, using the effective channel compensated for I/Q imbalance, a cost indicating the degree to which the I/Q imbalance has been compensated;
calculating a reward in proportion to the cost;
setting a next step size based on the reward;
selecting, among the rewards computed at the transmitting end or the receiving end, the action that yields the higher reward; and
updating the I/Q imbalance based on the selected action,
wherein, in setting the next step size based on the reward, the step size is set in proportion to the cost, such that when the reward is large, the estimate is close to the true value and the step size is decreased, and when the reward is small, the estimate is far from the true value and the step size is increased, and
wherein the above steps are repeated for each parameter, the I/Q imbalance parameters are estimated iteratively, based on reinforcement learning, by arbitrarily estimating and compensating them from the measured effective channel, and the convergence speed and convergence accuracy are controlled by calculating a reward according to the current degree of estimation of the I/Q imbalance and adaptively adjusting the size of the action.

(Deleted)

The method of claim 1,
Computing an effective channel compensated for I / Q imbalance using the estimated value of the I / Q imbalance value,
Computing an effective channel compensated for I / Q imbalance using the estimated value of the I / Q imbalance value through a compensator and passing it to a compensation generator
A method for estimating I / Q imbalance parameters of a reinforcement learning based multiple antenna transceiver.

The method of claim 1,
Wherein the measuring of a cost indicating the degree to which the I / Q imbalance is compensated using the effective channel compensated for the I / Q imbalance comprises
Inputting the effective channel matrix compensated for the I / Q imbalance to a reward generator to measure the cost indicating the degree to which the I / Q imbalance is compensated, and calculating the degree of comparison between the part corresponding to the real part and the part corresponding to the imaginary part in an augmented channel notation
A method for estimating I / Q imbalance parameters of a reinforcement learning based multiple antenna transceiver.
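
One possible reading of the comparison in claim 4 is sketched below in Python; this cost is an assumption, not a formula stated in the claims. It relies on a standard fact: a complex matrix with real part A and imaginary part B has the real augmented form [[A, -B], [B, A]], and I / Q imbalance breaks that block structure. The hypothetical `structure_cost` measures how far the two real-part blocks and the two imaginary-part blocks are from matching; `augmented` is likewise an illustration helper.

```python
import math

def augmented(h):
    """Real augmented form [[Re, -Im], [Im, Re]] of a complex matrix (list of rows)."""
    re = [[z.real for z in row] for row in h]
    im = [[z.imag for z in row] for row in h]
    top = [r + [-v for v in i] for r, i in zip(re, im)]
    bottom = [i + r for r, i in zip(re, im)]
    return top + bottom

def structure_cost(h_aug):
    """Frobenius-norm mismatch between paired blocks of an augmented matrix."""
    n = len(h_aug) // 2
    m = len(h_aug[0]) // 2
    total = 0.0
    for r in range(n):
        for c in range(m):
            # The two real-part blocks should be equal.
            d_re = h_aug[r][c] - h_aug[n + r][m + c]
            # The two imaginary-part blocks should be negatives of each other.
            d_im = h_aug[r][m + c] + h_aug[n + r][c]
            total += d_re ** 2 + d_im ** 2
    return math.sqrt(total)

# A channel without imbalance keeps the block structure, so the cost is zero.
h = [[1 + 2j, 0.5 - 1j], [-0.3j, 2 + 0j]]
h_aug = augmented(h)
balanced_cost = structure_cost(h_aug)

# Perturbing one block entry (a stand-in for residual imbalance) raises the cost.
h_aug[0][0] += 0.1
perturbed_cost = structure_cost(h_aug)
```

Under this reading, a perfectly compensated effective channel gives a cost of zero, and any residual imbalance shows up as a positive cost.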

delete

The method of claim 1,
The transmitting end is a user terminal and the receiving end is a base station,
The I / Q imbalance value comprises
A base station gain imbalance value, a base station phase imbalance value, a user terminal amplitude imbalance value, and a user terminal phase imbalance value
A method for estimating I / Q imbalance parameters of a reinforcement learning based multiple antenna transceiver.

A receiving end that, in a multi-antenna channel in which I / Q imbalance occurs at at least one of a transmitting end and the receiving end, arbitrarily sets an initial value of each parameter, measures an effective channel including the I / Q imbalance value, calculates an estimated value of the I / Q imbalance value by increasing or decreasing the I / Q imbalance value of the transmitting end or the receiving end estimated in the previous step, calculates an effective channel compensated for the I / Q imbalance, measures a cost indicating the degree to which the I / Q imbalance is compensated, and calculates a reward in proportion to the cost
Including,
Wherein the receiving end
Selects the action that yields the higher reward from among the calculated rewards, updates the I / Q imbalance based on the selected higher-reward action, and sets a next step size based on the reward, the step size being set in proportion to the cost such that when the reward is large the step size is reduced because the estimated value is close to the actual value, and when the reward is small the step size is set large because the estimated value differs greatly from the actual value,
And iteratively estimates and compensates the I / Q imbalance parameters from the measured effective channel by random estimation based on reinforcement learning, adjusting convergence speed and convergence accuracy by calculating the reward according to how well the I / Q imbalance is estimated and adaptively adjusting the magnitude of the action
I / Q imbalance parameter estimation system of the reinforcement learning-based multiple antenna transceiver.

delete

The system of claim 7, wherein
The transmitting end is a user terminal and the receiving end is a base station,
The I / Q imbalance value comprises
A base station gain imbalance value, a base station phase imbalance value, a user terminal amplitude imbalance value, and a user terminal phase imbalance value
I / Q imbalance parameter estimation system of the reinforcement learning-based multiple antenna transceiver.

A compensator that receives a signal from a transmitting end and estimates an effective channel; And
A reward generator that receives from the compensator the effective channel matrix compensated for I / Q imbalance, measures a cost indicating the degree to which the I / Q imbalance is compensated, and calculates a reward in proportion to the cost
Including,
Wherein the compensator,
In a multi-antenna channel in which I / Q imbalance occurs at at least one of the transmitting end and the receiving end, arbitrarily sets an initial value of each parameter, measures the effective channel including the I / Q imbalance value, calculates an estimated value of the I / Q imbalance value by increasing or decreasing the I / Q imbalance value of the transmitting end or the receiving end estimated in the previous step, and calculates the effective channel compensated for the I / Q imbalance,
Wherein the receiving end
Selects the action that yields the higher reward from among the calculated rewards,
Wherein the compensator
Updates the I / Q imbalance based on the selected higher-reward action and sets a next step size based on the reward, the step size being set in proportion to the cost such that when the reward is large the step size is reduced because the estimated value is close to the actual value, and when the reward is small the step size is set large because the estimated value differs greatly from the actual value,
And iteratively estimates and compensates the I / Q imbalance parameters from the measured effective channel by random estimation based on reinforcement learning, adjusting convergence speed and convergence accuracy by calculating the reward according to how well the I / Q imbalance is estimated and adaptively adjusting the magnitude of the action
I / Q imbalance parameter estimation system of the reinforcement learning-based multiple antenna transceiver.

delete

The system of claim 10,
Wherein the reward generator
Receives the effective channel matrix compensated for the I / Q imbalance to measure the cost indicating the degree to which the I / Q imbalance is compensated, and calculates the degree of comparison between the part corresponding to the real part and the part corresponding to the imaginary part in an augmented channel notation
I / Q imbalance parameter estimation system of the reinforcement learning-based multiple antenna transceiver.

delete

The system of claim 10,
The transmitting end is a user terminal and the receiving end is a base station,
The I / Q imbalance value comprises
A base station gain imbalance value, a base station phase imbalance value, a user terminal amplitude imbalance value, and a user terminal phase imbalance value
I / Q imbalance parameter estimation system of the reinforcement learning-based multiple antenna transceiver.

delete