KR102508071B1 - Beamforming training method and SNR prediction method for beamforming training in WLAN system

Info

Publication number: KR102508071B1
Application number: KR1020220101909A
Authority: KR
Inventors: 김문석
Original assignee: 세종대학교산학협력단
Priority date: 2022-08-16
Filing date: 2022-08-16
Publication date: 2023-03-08

Abstract

Disclosed are a reinforcement learning-based beamforming training method and apparatus for a wireless LAN system. The disclosed method comprises: receiving SISO feedback information from a terminal; and determining, by using a pre-trained reinforcement learning model, a current transmission sector set used for transmitting an action frame. The state set of the reinforcement learning model includes the SISO feedback information and a previous transmission sector set used for transmitting the action frame in a previous MIMO step. The action set of the reinforcement learning model includes participation information of the terminal for a current MIMO step and update information for the current transmission sector set. As a result, the time required for beamforming training can be reduced.

Description

Reinforcement learning-based beamforming training method and apparatus

The present invention relates to a beamforming training method and apparatus in a wireless LAN system, and more particularly, to a reinforcement learning-based beamforming training method and apparatus.

According to the IEEE 802.11 standard, MU-MIMO beamforming training is performed in a SISO step and a MIMO step.

In the SISO step, the access point transmits a short sector sweep frame for each transmission sector and receives SISO feedback information from the terminal. The SISO feedback information includes the SNR values measured by the terminal for the short sector sweep frames transmitted for each transmission sector.

The MIMO step consists of a BF setup sub-step, a BF selection sub-step, a BF training sub-step, and a BF feedback sub-step. In each sub-step, the access point transmits an action frame for beamforming training to the terminal.

Related prior art includes Republic of Korea Patent Registration Nos. 10-2288198, 10-2154481, and 10-2153923, and Republic of Korea Patent Publication No. 2022-0036494.

An object of the present invention is to provide a beamforming training method that allocates transmission sectors to phased array antennas so that action frames for beamforming training can be transmitted efficiently.

According to one embodiment of the present invention for achieving the above object, there is provided a reinforcement learning-based beamforming training method comprising: receiving SISO feedback information from a terminal; and determining, by using a pre-trained reinforcement learning model, a current transmission sector set used for transmission of an action frame, wherein a state set of the reinforcement learning model includes the SISO feedback information and a previous transmission sector set used for transmission of an action frame in a previous MIMO step, and an action set of the reinforcement learning model includes participation information of the terminal for a current MIMO step and update information for the current transmission sector set.

According to another embodiment of the present invention for achieving the above object, there is provided an access point comprising: a phased array antenna that receives SISO feedback information from a terminal and transmits an action frame; a memory; and at least one processor electrically connected to the memory, wherein the processor determines, by using a pre-trained reinforcement learning model, a current transmission sector set used for transmission of the action frame, a state set of the reinforcement learning model includes the SISO feedback information and a previous transmission sector set used for transmission of an action frame in a previous MIMO step, and an action set of the reinforcement learning model includes participation information of the terminal for a current MIMO step and update information for the current transmission sector set.

According to an embodiment of the present invention, using a reinforcement learning model reduces unnecessary transmission of action frames for beamforming training and can shorten the beamforming training time.

In addition, according to an embodiment of the present invention, as many terminals as possible can participate in beamforming training.

FIG. 1 is a diagram for explaining a general beamforming training method.
FIG. 2 is a diagram for explaining a beamforming training method in a WLAN system according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining an SNR prediction method for beamforming training in a WLAN system according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining an artificial neural network according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining a general beamforming training method in a WLAN system. FIG. 1 illustrates an example with four transmission sectors (TS1 to TS4) and two phased array antennas at the access point, in which the first and second transmission sectors (TS1, TS2) are allocated to the first phased array antenna 110 and the third and fourth transmission sectors (TS3, TS4) are allocated to the second phased array antenna 120.

In the SISO step, the access point transmits a short sector sweep frame for each transmission sector. The first phased array antenna 110 transmits short sector sweep frames in the first and second transmission sectors (TS1, TS2), and the second phased array antenna 120 transmits short sector sweep frames in the third and fourth transmission sectors (TS3, TS4). A terminal connected to the access point receives the short sector sweep frames, measures the SNR, and transmits SISO feedback information including the measured SNR values to the access point. That is, one wireless terminal transmits to the access point SISO feedback information including four SNR values, one measured for each of the first to fourth transmission sectors (TS1 to TS4). For example, the wireless terminal receives the short sector sweep frame transmitted in the first transmission sector (TS1) and generates an SNR value for TS1, and receives the short sector sweep frame transmitted in the second transmission sector (TS2) and generates an SNR value for TS2.

The access point performs beamforming training in the MIMO step using the measured SNR values. The access point performs beamforming training by transmitting action frames in each of the BF setup sub-step, BF selection sub-step, BF training sub-step, and BF feedback sub-step.

In the MIMO step, the access point transmits the action frame so that all terminals connected to the access point can receive it. To this end, the access point may select one of the transmission sector combinations, each of which includes some of the plurality of transmission sectors, and transmit the action frame. A transmission sector combination may be determined by selecting one transmission sector from those allocated to each phased array antenna. In the example of FIG. 1, four transmission sector combinations can be formed, namely (TS1, TS3), (TS1, TS4), (TS2, TS3), and (TS2, TS4), and the access point may select one of these combinations and transmit the action frame in the transmission sectors included in the selected combination.
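The enumeration of transmission sector combinations described above can be sketched as a Cartesian product with one sector taken from each antenna. This is a minimal illustration, not the patent's implementation; the dictionary keys and the function name `sector_combinations` are hypothetical.

```python
from itertools import product

# Hypothetical allocation mirroring the FIG. 1 example: each phased array
# antenna is mapped to the transmission sectors allocated to it.
sectors_per_antenna = {
    "antenna_110": ["TS1", "TS2"],
    "antenna_120": ["TS3", "TS4"],
}


def sector_combinations(allocation):
    """Return every combination that picks exactly one sector per antenna."""
    return list(product(*allocation.values()))


combos = sector_combinations(sectors_per_antenna)
print(combos)
```

With two antennas holding two sectors each, the sketch yields the four combinations listed in the text; in general the count is the product of the per-antenna sector counts, which is why transmitting in every combination quickly becomes expensive.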

Transmitting the action frame with every transmission sector combination increases the beamforming training time and is inefficient. It is therefore necessary to determine an efficient transmission sector combination that minimizes the number of action frame transmissions while still allowing every terminal to receive the action frame. Accordingly, the present invention proposes a method of determining an optimal transmission sector combination using reinforcement learning and performing beamforming training with it.

A beamforming training method according to an embodiment of the present invention is performed in an access point of a wireless LAN system.

FIG. 2 is a diagram for explaining a beamforming training method in a WLAN system according to an embodiment of the present invention, and FIG. 3 is a diagram showing a reinforcement learning model according to an embodiment of the present invention.

Referring to FIG. 2, the access point according to an embodiment of the present invention receives SISO feedback information from a terminal (S210) and, using a pre-trained reinforcement learning model, determines the current transmission sector set used for transmission of an action frame (S220). Here, a transmission sector set means a set containing the transmission sector combinations, formed from the transmission sectors allocated to each phased array antenna, that are used for transmission of the action frame.

As one embodiment, the reinforcement learning model may be the actor-critic model of FIG. 3. In the reinforcement learning model, the agent (RL Agent) corresponds to the access point, and the environment corresponds to beamforming training. The agent may consist of an actor and a critic. The agent, that is, the access point, observes the state of the current beamforming environment and performs the action that maximizes the reward in the observed state. Here, the action corresponds to determining the current transmission sector set, and the actor and the critic may be updated through a TD (Temporal Difference) error.

The state set of the reinforcement learning model includes the SISO feedback information and the previous transmission sector set used for transmission of an action frame in the previous MIMO step, and the action set of the reinforcement learning model includes participation information of the terminal for the current MIMO step and update information for the current transmission sector set. Here, the previous transmission sector set means the transmission sector set used for transmission of an action frame before the current transmission sector set is determined in step S220. The participation information is determined according to the SNR values included in the SISO feedback information: a terminal that has transmitted an SNR value equal to or greater than a preset threshold participates in the MIMO step.

Hereinafter, the method of determining the current transmission sector set is described in detail for each of the BF setup sub-step, BF selection sub-step, BF training sub-step, and BF feedback sub-step that constitute the MIMO step.

BF setup sub-step and BF selection sub-step

In the BF setup sub-step and the BF selection sub-step, the access point determines the current transmission sector set using the reinforcement learning model described above.

Let N be the number of terminals participating in beamforming training, and let T_N and F_N denote the previous transmission sector set and the SISO feedback information included in the state set, respectively. The previous transmission sector set can then be expressed as in [Equation 1].

Here, the i-th element of T_N represents the transmission sector combination used when transmitting the i-th action frame. Since N terminals participate in beamforming training, the action frame is transmitted at most N times, and the maximum number of transmission sector combinations is N.

When the number of phased array antennas is M, the transmission sector combination used for transmission of the i-th action frame can be expressed as in [Equation 2], where the j-th element denotes the transmission sector allocated to the j-th phased array antenna.

The SISO feedback information can be expressed as in [Equation 3], where the k-th element of F_N represents the SISO feedback information received from the k-th terminal.

The feedback from the k-th terminal can in turn be expressed as in [Equation 4], where, with L denoting the total number of transmission sectors (TS), the l-th element represents the SNR value for the short sector sweep frame transmitted in the l-th transmission sector in the SISO step. That is, each terminal transmits to the access point SISO feedback information including an SNR value for every transmission sector.
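The equation images for [Equation 1] to [Equation 4] are not present in this text. As a reading aid only, the definitions above are consistent with notation of the following shape. Only T_N, F_N, N, M, and L come from the text; the element symbols c_i, s_{i,j}, f_k, and \mathrm{SNR}_{k,l} are assumptions introduced here and may differ from the patent's own notation.

```latex
\begin{align}
T_N &= \{\, c_1, c_2, \dots, c_N \,\}
  && \text{(previous transmission sector set)} \\
c_i &= (\, s_{i,1}, s_{i,2}, \dots, s_{i,M} \,)
  && \text{(combination used for the $i$-th action frame)} \\
F_N &= \{\, f_1, f_2, \dots, f_N \,\}
  && \text{(SISO feedback information)} \\
f_k &= (\, \mathrm{SNR}_{k,1}, \mathrm{SNR}_{k,2}, \dots, \mathrm{SNR}_{k,L} \,)
  && \text{(feedback from the $k$-th terminal)}
\end{align}
```

Under this reading, s_{i,j} is the transmission sector allocated to the j-th phased array antenna in the i-th combination, and \mathrm{SNR}_{k,l} is the SNR the k-th terminal measured for the l-th transmission sector.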

Let P_N and e denote the participation information and the update information included in the action set, respectively. The participation information can be expressed as in [Equation 5], where p_k may be set to 1 if the k-th terminal participates in the current MIMO step and to 0 otherwise.

If at least one of the SNR values transmitted by a terminal is equal to or greater than a preset first threshold, the terminal is determined to participate in the current MIMO step.
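The participation rule above can be sketched as follows. The function name `participation_vector`, the list-of-lists layout of the feedback, and the numeric values are illustrative assumptions.

```python
def participation_vector(snr_feedback, first_threshold):
    """Return p_k for every terminal k.

    snr_feedback[k] holds the SNR values terminal k reported, one per
    transmission sector. p_k is 1 when at least one reported value is
    greater than or equal to the first threshold, and 0 otherwise.
    """
    return [
        1 if any(snr >= first_threshold for snr in snrs) else 0
        for snrs in snr_feedback
    ]


# Hypothetical feedback from three terminals over four transmission sectors.
feedback = [
    [12.0, 3.5, 1.0, 0.2],
    [2.0, 1.5, 0.7, 0.1],
    [0.5, 9.0, 8.0, 4.0],
]
p = participation_vector(feedback, first_threshold=5.0)
print(p)
```

In this example the second terminal reports no SNR value at or above the threshold, so it is the only one excluded from the current MIMO step.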

As one embodiment, the update information may include first to fourth information, and the access point determines the current transmission sector set according to the update information. The first information (e=1) adds at least one new transmission sector combination to the previous transmission sector set, so the number of action frame transmissions increases. In this case, the access point determines the current transmission sector set by adding to the previous transmission sector set at least one combination, chosen from the combinations not already included in it, in which the action frame will additionally be transmitted.

The access point determines the combination in which to additionally transmit the action frame from among the transmission sectors containing terminals that did not receive the action frame in the BF setup sub-step and BF selection sub-step of the previous beamforming training, and adds it to the previous transmission sector set. Whether a terminal has received the action frame can be confirmed from the ACK information transmitted by the terminal. The access point determines the additional combination so that the number of SNR values equal to or greater than a preset second threshold, among the SNR values transmitted by the terminals that did not receive the action frame, is maximized.

In other words, among the terminals that participate in the current MIMO step and did not receive the action frame in the previous MIMO step, the access point adds to the previous transmission sector set the transmission sector combination for which the number of terminals whose SISO feedback SNR value is equal to or greater than the second threshold is largest, thereby determining the current transmission sector set.

If no SNR value equal to or greater than the second threshold exists, the access point determines the additional combination so that it includes the transmission sector containing the terminal that transmitted the maximum SNR value. That is, if none of the terminals that participate in the current MIMO step and did not receive the action frame in the previous MIMO step has an SNR value equal to or greater than the second threshold, the access point adds to the previous transmission sector set the transmission sector combination containing the terminal corresponding to the maximum SNR value, thereby determining the current transmission sector set.
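One way to read the selection rule for the first information (e=1) is sketched below. It assumes a terminal counts toward a combination when its reported SNR for any sector of that combination meets the second threshold; this interpretation, the function name, and the data layout (a dictionary of per-sector SNR values per terminal) are all assumptions made for illustration.

```python
def choose_combination_to_add(candidates, snr_by_terminal, missed_terminals,
                              second_threshold):
    """Pick the candidate combination to add to the previous set.

    candidates: combinations not already in the previous set.
    missed_terminals: participating terminals that did not receive the
    action frame in the previous MIMO step.
    """
    if not candidates or not missed_terminals:
        return None

    def covered(combo):
        # Number of missed terminals with an SNR >= threshold in this combo.
        return sum(
            1 for t in missed_terminals
            if any(snr_by_terminal[t][s] >= second_threshold for s in combo)
        )

    best = max(candidates, key=covered)
    if covered(best) > 0:
        return best

    # Fallback: no SNR reaches the threshold, so prefer the combination
    # that contains the sector with the largest reported SNR.
    def best_snr(combo):
        return max(snr_by_terminal[t][s] for t in missed_terminals for s in combo)

    return max(candidates, key=best_snr)


# Hypothetical per-sector SNR feedback for two terminals that missed the frame.
snr = {
    "STA1": {"TS1": 2, "TS2": 9, "TS3": 1, "TS4": 3},
    "STA2": {"TS1": 1, "TS2": 2, "TS3": 1, "TS4": 8},
}
candidates = [("TS1", "TS4"), ("TS2", "TS3"), ("TS2", "TS4")]
chosen = choose_combination_to_add(candidates, snr, ["STA1", "STA2"], 6)
print(chosen)
```

Here (TS2, TS4) is the only candidate that reaches both terminals at the threshold, so it is the one selected. When the threshold is set above every reported value, the fallback branch selects a combination containing TS2, the sector with the highest SNR.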

The second information (e=2) removes at least one transmission sector combination from those included in the previous transmission sector set, so the number of action frame transmissions decreases. The access point may remove from the previous transmission sector set the combination for which the number of terminals that received the action frame was smallest when the action frame was transmitted with each of the combinations in the previous set.
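The removal rule for the second information (e=2) can be sketched as a minimum over reception counts. The mapping `receivers_by_combo`, which records which terminals acknowledged the action frame for each combination, is a hypothetical data structure.

```python
def choose_combination_to_remove(previous_set, receivers_by_combo):
    """Return the combination whose action frame reached the fewest terminals."""
    return min(
        previous_set,
        key=lambda combo: len(receivers_by_combo.get(combo, ())),
    )


# Hypothetical reception record from the previous MIMO step.
previous_set = [("TS1", "TS3"), ("TS2", "TS4")]
receivers_by_combo = {
    ("TS1", "TS3"): {"STA1", "STA2"},
    ("TS2", "TS4"): {"STA3"},
}
removed = choose_combination_to_remove(previous_set, receivers_by_combo)
print(removed)
```

A combination with no recorded receivers is treated as having reached zero terminals, which makes it the first candidate for removal.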

The third information (e=3) changes at least one of the transmission sector combinations included in the previous transmission sector set to a new transmission sector combination. In this case, therefore, the number of combinations in the previous transmission sector set equals the number of combinations in the current transmission sector set. The access point removes a combination using the method for determining the combination to be removed when the update information is the second information, and adds a combination using the method for determining the combination to be added when the update information is the first information.

The fourth information (e=4) uses the previous transmission sector set as the current transmission sector set. That is, in this case the access point determines the previous transmission sector set to be the current transmission sector set and transmits the action frame.
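The four kinds of update information can be summarized as a small dispatcher. This sketch assumes the combination to add and the combination to remove have already been chosen by the rules described for e=1 and e=2; the function name and argument names are hypothetical.

```python
def apply_update(previous_set, e, to_add=None, to_remove=None):
    """Derive the current transmission sector set from the previous one.

    e=1 adds a combination, e=2 removes one, e=3 replaces one with another,
    and e=4 keeps the previous set unchanged.
    """
    current = list(previous_set)  # copy so the previous set is not modified
    if e == 1:
        current.append(to_add)
    elif e == 2:
        current.remove(to_remove)
    elif e == 3:
        current[current.index(to_remove)] = to_add
    elif e == 4:
        pass
    else:
        raise ValueError("update information e must be 1, 2, 3, or 4")
    return current


previous = [("TS1", "TS3"), ("TS2", "TS4")]
added = apply_update(previous, 1, to_add=("TS1", "TS4"))
replaced = apply_update(previous, 3, to_add=("TS1", "TS4"), to_remove=("TS2", "TS4"))
print(len(previous), len(added), len(replaced))
```

The set grows by one for e=1, shrinks by one for e=2, and keeps its size for e=3 and e=4, matching the effect on the number of action frame transmissions described above.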

As described above, the access point may determine one piece of the update information such that the reward is maximized, and determine the current transmission sector set according to the determined update information. The reward can be determined as in [Equation 6].

Here, N_S represents the number of terminals that received at least one action frame in the BF setup sub-step, and N_t represents the number of action frame transmissions in the BF setup sub-step.

BF training sub-step

In the BF training sub-step, the access point identifies, among the terminals included in the transmission sector combinations of the current transmission sector set, the terminals capable of receiving the action frame, and determines the transmission sector combination.

The access point determines that terminals included in the transmission sector combinations of the previous transmission sector set are capable of receiving the action frame. For a terminal included in a combination newly added to the current transmission sector set, the access point determines that the terminal is capable of receiving the action frame if at least one of the SNR values transmitted by that terminal meets the preset second threshold.
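The receivability check above can be sketched as follows. Two points are assumptions: a terminal in a newly added combination is tested only against the SNR values it reported for the sectors of that combination, and "meets the second threshold" is read as greater than or equal to. The mapping `terminals_by_combo` and the function name are hypothetical.

```python
def receivable_terminals(terminals_by_combo, previous_set, current_set,
                         snr_by_terminal, second_threshold):
    """Return the terminals judged able to receive the action frame."""
    receivable = set()
    for combo in current_set:
        for terminal in terminals_by_combo.get(combo, ()):
            if combo in previous_set:
                # Combinations carried over from the previous set are trusted.
                receivable.add(terminal)
            elif any(snr_by_terminal[terminal][s] >= second_threshold
                     for s in combo):
                # Newly added combination: require a sufficient SNR.
                receivable.add(terminal)
    return receivable


terminals_by_combo = {
    ("TS1", "TS3"): ["STA1"],
    ("TS2", "TS4"): ["STA2", "STA3"],
}
snr = {
    "STA1": {"TS1": 1, "TS2": 1, "TS3": 1, "TS4": 1},
    "STA2": {"TS1": 0, "TS2": 7, "TS3": 0, "TS4": 2},
    "STA3": {"TS1": 0, "TS2": 3, "TS3": 0, "TS4": 4},
}
result = receivable_terminals(
    terminals_by_combo,
    previous_set=[("TS1", "TS3")],
    current_set=[("TS1", "TS3"), ("TS2", "TS4")],
    snr_by_terminal=snr,
    second_threshold=6,
)
print(sorted(result))
```

STA1 is accepted because its combination was already in the previous set, STA2 because it reports a sufficient SNR in the newly added combination, and STA3 is left out.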

The access point may update the current transmission sector set so that it includes the transmission sector combinations containing those terminals that, among the terminals capable of receiving the action frame, require beamforming training in the BF training sub-step.

BF feedback sub-step

In the BF feedback sub-step, the access point transmits the action frame in the transmission sector containing the terminal that transmitted the maximum SNR value, regardless of the determined current transmission sector set.

FIG. 4 is a diagram for explaining a method for updating a reinforcement learning model according to an embodiment of the present invention.

An access point according to an embodiment of the present invention updates the value-function and policy-function deep learning networks of the reinforcement learning model. That is, while determining the current transmission sector set using the reinforcement learning model, the access point trains the model on the collected data and thereby updates the deep learning networks.

In the reinforcement learning model according to an embodiment of the present invention, the actor of the agent uses the policy function to perform the action, that is, to determine the current transmission sector set. The critic of the agent uses the value function to evaluate whether the actor's action, that is, the determination of the current transmission sector set, is appropriate. The policy function and the value function may each be implemented as a deep learning network, and the evaluation result of the value function is used to update the parameters (weights) of the policy-function deep learning network. The parameters of the value-function deep learning network are updated according to the result of performing the action, that is, the reward.

Referring to FIG. 4, the access point obtains the output values of the policy function and the value function using the deep learning networks for the policy function and the value function (S410).

The policy function is a function for calculating the probability that a specific action (A) is determined in a specific state (S), and it is parameterized by the weights of the policy-function deep learning network. The policy-function deep learning network is trained to receive state information and output a mean and a standard deviation for the probability of determining a specific action for the input state information. The access point may generate a normal distribution from the output values of the policy-function deep learning network and calculate the probability of determining a specific action. For example, the access point may determine the current transmission sector set according to the piece of update information that corresponds to the highest probability.
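The step from the network's mean and standard deviation to a choice of update information can be sketched as below. How the patent maps the normal distribution onto the four update values is not stated in this text; evaluating the normal density at e = 1, 2, 3, 4 and taking the largest is an assumption made for illustration, as are the function names and the numeric inputs.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std (> 0)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def most_likely_update(mean, std, choices=(1, 2, 3, 4)):
    """Return the update information value with the highest density."""
    return max(choices, key=lambda e: normal_pdf(e, mean, std))


# Hypothetical policy-network outputs for one observed state.
e = most_likely_update(mean=2.7, std=0.8)
print(e)
```

Because the normal density decreases with distance from the mean, this picks the update value closest to the predicted mean; a sampling-based choice would be an equally plausible reading during training.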

The value function is a function for calculating the expected reward for an input state, and it is parameterized by the weights of the value-function deep learning network. The value-function deep learning network is trained to receive state information and output the expected reward for the input state information.

The access point applies the output values of the policy-function and value-function deep learning networks to a loss function and updates the weights of both networks so that the loss value is minimized (S420).

액세스 포인트는 일실시예로서, 시간차 학습(Temporal Difference Learning)을 이용해, 정책 함수 및 가치 함수 딥러닝 네트워크의 가중치를 업데이트할 수 있다. 시간 t 시점에서 상태를 S_t 라 하고 해당 시점에서의 보상을

라고 할 때,

의 실제 값에 대한 추정치 TD 타겟(Temporal Difference target,

)은 [수학식 7]을 통해 계산될 수 있다.As an example, the access point may update the weights of the policy function and the value function deep learning network using temporal difference learning. The state at time t is S _t and the reward at that time is

When you say

An estimate of the actual value of TD target (Temporal Difference target,

) can be calculated through [Equation 7].

여기서,

은 감가율(Discount Factor)을 나타낸다.here,

represents the discount factor.

The value function deep learning network can be updated so that the TD error value, calculated as in [Equation 8], is minimized.
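The value update can be illustrated with a minimal Python sketch. It assumes the conventional TD error (the TD target minus the current value estimate) and substitutes a lookup table for the value function deep learning network. The function names, the learning rate `lr`, and the table `values` are hypothetical and are not taken from [Equation 8].

```python
def td_target(reward, next_value, gamma, terminal=False):
    """One-step TD target: reward plus the discounted value of the next state."""
    if terminal:
        return reward
    return reward + gamma * next_value


def td_error(reward, value, next_value, gamma, terminal=False):
    """TD error: TD target minus the current value estimate."""
    return td_target(reward, next_value, gamma, terminal) - value


def update_value(values, state, reward, next_state, gamma=0.9, lr=0.5,
                 terminal=False):
    """Move the stored value of `state` toward the TD target.

    This is a gradient step on the squared TD error with the target held
    fixed; a deep network would apply the same step to its weights.
    """
    current = values.get(state, 0.0)
    delta = td_error(reward, current, values.get(next_state, 0.0), gamma,
                     terminal)
    values[state] = current + lr * delta
    return delta


values = {}
first_delta = update_value(values, "s0", 1.0, "s1")
second_delta = update_value(values, "s0", 1.0, "s1")
```

Repeating the same transition shrinks the TD error from 1.0 to 0.5, which is the behavior the update is meant to produce.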

The policy function deep learning network can then be updated so that the loss value of the loss function expressed in [Equation 9] is minimized.
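A common actor-critic choice for such a policy loss is the negative log-probability of the selected action weighted by the TD error. The sketch below assumes that form, which may differ from [Equation 9], and a Gaussian policy with a fixed standard deviation. All names are hypothetical.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of `action` under the normal distribution N(mean, std^2)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def policy_loss(action, mean, std, td_error):
    """Assumed policy loss: -log pi(action | state) * TD error."""
    return -gaussian_log_prob(action, mean, std) * td_error


def policy_loss_grad_mean(action, mean, std, td_error):
    """Derivative of the assumed policy loss with respect to the mean."""
    return -(action - mean) / (std * std) * td_error


# One gradient-descent step on the mean (learning rate 0.1 is arbitrary).
mean, std, action, delta = 0.0, 1.0, 1.0, 2.0
loss_before = policy_loss(action, mean, std, delta)
mean -= 0.1 * policy_loss_grad_mean(action, mean, std, delta)
loss_after = policy_loss(action, mean, std, delta)
```

With a positive TD error the step moves the mean toward the selected action, so that action becomes more probable and the loss decreases.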

FIG. 5 is a diagram for explaining a reinforcement learning-based beamforming training apparatus according to an embodiment of the present invention. In FIG. 5, an access point is shown as one embodiment.

Referring to FIG. 5, an access point according to an embodiment of the present invention includes a phased array antenna 510, a memory 520, and at least one processor 530.

The phased array antenna 510 receives SISO feedback information from the terminal and transmits an action frame.

The processor 530 is electrically connected to the memory 520 and determines the current transmission sector set used for transmission of the action frame by using a pre-trained reinforcement learning model. The reinforcement learning model is the one described above: its state set includes the SISO feedback information and the previous transmission sector set used for transmission of the action frame in the previous MIMO step, and its action set includes participation information of the terminal for the current MIMO step and update information for the current transmission sector set.
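The state set and action set handled by the processor can be pictured with the following Python sketch. The class names, field names, and the encoding of a transmission sector combination as a tuple with one sector index per phased array antenna are assumptions made for illustration. Only the four kinds of update information (add, remove, change, keep) come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Optional, Tuple

# Assumed encoding: one sector index per phased array antenna.
SectorCombination = Tuple[int, ...]


class UpdateType(Enum):
    ADD = 1     # first information: add a new combination
    REMOVE = 2  # second information: remove a combination
    CHANGE = 3  # third information: replace a combination
    KEEP = 4    # fourth information: reuse the previous set


@dataclass
class State:
    siso_snr: Dict[int, Dict[int, float]]  # terminal id -> sector id -> SNR
    previous_sector_set: FrozenSet[SectorCombination]


@dataclass
class Action:
    participating_terminals: FrozenSet[int]  # participation information
    update_type: UpdateType
    new_combination: Optional[SectorCombination] = None
    old_combination: Optional[SectorCombination] = None


def _require(value, name):
    if value is None:
        raise ValueError(f"{name} is required for this update type")
    return value


def apply_update(previous, action):
    """Derive the current transmission sector set from the previous one."""
    kind = action.update_type
    if kind is UpdateType.KEEP:
        return frozenset(previous)
    if kind is UpdateType.ADD:
        new = _require(action.new_combination, "new_combination")
        return frozenset(previous | {new})
    if kind is UpdateType.REMOVE:
        old = _require(action.old_combination, "old_combination")
        return frozenset(previous - {old})
    # UpdateType.CHANGE: swap one combination for another.
    old = _require(action.old_combination, "old_combination")
    new = _require(action.new_combination, "new_combination")
    return frozenset((previous - {old}) | {new})


state = State(siso_snr={1: {3: 12.5}}, previous_sector_set=frozenset({(1, 2), (3, 4)}))
action = Action(frozenset({1}), UpdateType.CHANGE, new_combination=(7, 8), old_combination=(3, 4))
current = apply_update(state.previous_sector_set, action)
```

Keeping the sector set immutable means the previous set remains available as part of the next state after the update is applied.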

The technical content described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

As described above, the present invention has been described with reference to specific details, such as specific components, and to limited embodiments and drawings. These are provided only to aid a more general understanding of the present invention, and the present invention is not limited to the above embodiments. Those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Therefore, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set forth below but also everything equivalent to those claims falls within the scope of the spirit of the present invention.

Claims

1. A reinforcement learning-based beamforming training method comprising:
receiving SISO feedback information from a terminal; and
determining a current transmission sector set used for transmission of an action frame by using a pre-trained reinforcement learning model,
wherein a state set of the reinforcement learning model includes the SISO feedback information and a previous transmission sector set used for transmission of an action frame in a previous MIMO step,
wherein an action set of the reinforcement learning model includes participation information of the terminal for a current MIMO step and update information for the current transmission sector set,
wherein each of the current and previous transmission sector sets includes combinations of transmission sectors used for transmission of the action frame from among the transmission sectors assigned to each phased array antenna, and
wherein the update information is update information for the BF setup substep and the BF selection substep and includes at least one of:
first information for adding a new transmission sector combination to the previous transmission sector set;
second information for removing at least one transmission sector combination from among the transmission sector combinations included in the previous transmission sector set;
third information for changing at least one of the transmission sector combinations included in the previous transmission sector set to a new transmission sector combination; and
fourth information for using the previous transmission sector set as the current transmission sector set.

2. The method of claim 1, wherein the participation information is information determined according to the SNR value included in the SISO feedback information.

3-4. (Deleted)

5. The method of claim 1, wherein determining the current transmission sector set comprises determining the current transmission sector set by adding, to the previous transmission sector set, the transmission sector combination for which the number of terminals whose SNR value in the SISO feedback information is equal to or greater than a threshold is the maximum, counted among the terminals that participate in the current MIMO step and did not receive the action frame in the previous MIMO step.

6. The method of claim 5, wherein determining the current transmission sector set comprises, when none of the terminals that participate in the current MIMO step and did not receive the action frame in the previous MIMO step has an SNR value equal to or greater than the threshold, determining the current transmission sector set by adding, to the previous transmission sector set, the transmission sector combination that includes the terminal corresponding to the maximum SNR value.

7. The method of claim 1, wherein determining the current transmission sector set comprises removing, from the previous transmission sector set, the transmission sector combination for which the number of terminals receiving the action frame is the minimum when the action frame is transmitted for the previous transmission sector set.

8. The method of claim 1, wherein determining the current transmission sector set comprises:
determining the transmission sector combination for which the number of terminals receiving the action frame is the minimum when the action frame is transmitted for the previous transmission sector set; and
changing the determined transmission sector combination to the transmission sector combination for which the number of terminals whose SNR value in the SISO feedback information is equal to or greater than a threshold is the maximum, counted among the terminals that participate in the current MIMO step and did not receive the action frame in the previous MIMO step.

9. The method of claim 1, further comprising determining a transmission sector combination for a BF training substep by identifying the terminals that can receive the action frame from among the terminals included in the transmission sector combinations of the current transmission sector set.

10. The method of claim 1, further comprising updating a deep learning network for the value function and the policy function of the reinforcement learning model.

11. The method of claim 10, wherein updating the deep learning network comprises:
obtaining output values of the value function and the policy function by using the deep learning network; and
updating weights of the deep learning network by applying the output values to a loss function so that the loss value of the loss function is minimized.

12. An access point comprising:
a phased array antenna for receiving SISO feedback information from a terminal and transmitting an action frame;
a memory; and
at least one processor electrically connected to the memory,
wherein the processor determines a current transmission sector set used for transmission of the action frame by using a pre-trained reinforcement learning model,
wherein a state set of the reinforcement learning model includes the SISO feedback information and a previous transmission sector set used for transmission of an action frame in a previous MIMO step,
wherein an action set of the reinforcement learning model includes participation information of the terminal for a current MIMO step and update information for the current transmission sector set,
wherein each of the current and previous transmission sector sets includes combinations of transmission sectors used for transmission of the action frame from among the transmission sectors assigned to each phased array antenna, and
wherein the update information is update information for the BF setup substep and the BF selection substep and includes at least one of:
first information for adding a new transmission sector combination to the previous transmission sector set;
second information for removing at least one transmission sector combination from among the transmission sector combinations included in the previous transmission sector set;
third information for changing at least one of the transmission sector combinations included in the previous transmission sector set to a new transmission sector combination; and
fourth information for using the previous transmission sector set as the current transmission sector set.