KR101845398B1

KR101845398B1 - Method and apparatus for setting barring factor for controlling access of user equipment based on machine learning

Info

Publication number: KR101845398B1
Application number: KR1020170026212A
Authority: KR
Inventors: 문지훈; 임유진
Original assignee: 숙명여자대학교산학협력단
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2018-04-04

Abstract

The present invention provides a method and an apparatus for setting a barring factor based on machine learning in an Access Class Barring (ACB) access control scheme. According to an embodiment of the present invention, learning is performed with a Q-learning algorithm in which the access success probability is the state; increasing, holding, and decreasing the barring factor are the selectable actions; and the collision probability and the average access delay determine the reward. When the barring factor is actually set, the optimal action for the current state, as derived from the training result, is performed. According to an embodiment of the present invention, a barring factor suited to the environment of the base station and the terminals can be set through machine learning, thereby reducing overload and congestion in the communication network.

Description

Method and apparatus for setting a barring factor for machine-learning-based terminal access control

The present invention relates to a method and apparatus for setting a barring factor for machine-learning-based terminal access control, and more particularly to a method and apparatus for setting the barring factor based on machine learning in an Access Class Barring access control scheme in a wireless communication network.

FIG. 1 is a block diagram of a mobile communication system. The mobile communication system includes a core network (CN) 106, a base station 102, and terminals 104a to 104n. The base station 102 is a fixed station that communicates with the terminals 104a to 104n and may also be called an evolved NodeB (eNB), a base transceiver system (BTS), an access point, or the like. One base station 102 may serve one or more cells. A cell is configured with one of several bandwidths, such as 1.25, 2.5, 5, 10, or 20 MHz, and provides downlink or uplink transmission services to multiple terminals. Different cells may be configured with different bandwidths. The terminals 104a to 104n may be fixed or mobile and may also be called a mobile station (MS), a user terminal (UT), a subscriber station (SS), a mobile terminal (MT), a wireless device, or the like.

Machine-Type Communication (hereinafter 'MTC') refers to communication between small devices without human intervention, and can be carried out over existing mobile communication networks. Hereinafter, a terminal is referred to as an MTC device and a base station as an eNB.

MTC devices send and receive small amounts of data over an LTE-A network, but because a large number of devices attempt access simultaneously over the same channel, the Random Access Channel (RACH), they can cause overload and congestion in the network. The random access procedure is described with reference to FIG. 2, which is a data flow diagram showing a simplified random access procedure.

The RACH on which an MTC device (UE) attempts access uses 64 preambles. Each MTC device selects one of the 64 preambles at random and attempts RACH access, thereby signaling its intent to transmit to the eNB (S21). The eNB checks whether different devices used the same preamble. Devices that used the same preamble collide and fail RACH access; only devices that did not collide are granted access to the network and can transmit their data to the eNB. If no collision occurred, the eNB sends a random access response to the MTC device using the same preamble (S22). The MTC device then sends its ID and requests a Radio Resource Control (RRC) connection (S23), and the eNB performs contention resolution (S24).
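The preamble contention in step S21 can be sketched as follows. This is a minimal illustration, not part of the patent: the function name `contend` and the use of a seeded `random.Random` are assumptions, and only the figure of 64 preambles comes from the text above.

```python
import random
from collections import Counter

NUM_PREAMBLES = 64  # number of RACH preambles stated in the description


def contend(num_devices, rng):
    """Simulate one RACH slot; return (successes, collisions).

    Each device picks one preamble uniformly at random. A device succeeds
    only if no other device picked the same preamble.
    """
    picks = [rng.randrange(NUM_PREAMBLES) for _ in range(num_devices)]
    counts = Counter(picks)
    successes = sum(1 for p in picks if counts[p] == 1)
    return successes, num_devices - successes


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 50, 200):
        print(n, contend(n, rng))
```

As the number of contending devices grows relative to the 64 preambles, the share of devices that collide rises, which is the motivation for access control.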

As the number of devices attempting access grows, collisions become more frequent and data transmission efficiency falls. One of the methods for preventing collisions during random access is Access Class Barring (hereinafter 'ACB'). In ACB, the eNB periodically broadcasts ACB-related parameters in a System Information Block (SIB); these parameters include a barring factor (a value between 0 and 1) and a barring duration. The ACB scheme is activated when the network congestion level, defined as congestion = 1 - (number of terminals that successfully connected during a given period / number of terminals that requested access during that period), rises and the network becomes overloaded. The ACB scheme is disclosed in, for example, Korean Laid-Open Patent Publication No. 10-2017-0008665.

When the ACB scheme is activated, each device that wants to access the network generates a random number between 0 and 1. If this number is smaller than the barring factor, the device proceeds with random access; if it is larger, the device refrains from random access for the barring duration and then attempts random access again. How the barring factor is set therefore has a large effect on reducing overload and congestion in the network.
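The device-side check described above can be sketched as follows. The function names `acb_check` and `next_attempt_time` are hypothetical, and the sketch assumes a fixed barring duration exactly as described in the text.

```python
import random


def acb_check(barring_factor, rng):
    """Return True if the device may proceed with random access.

    The device draws a number in [0, 1) and proceeds only if the draw is
    smaller than the barring factor broadcast by the eNB.
    """
    return rng.random() < barring_factor


def next_attempt_time(now, barring_factor, barring_duration, rng):
    """Return the time at which the device next attempts random access."""
    if acb_check(barring_factor, rng):
        return now  # allowed: attempt immediately
    return now + barring_duration  # barred: wait out the barring duration


if __name__ == "__main__":
    rng = random.Random(42)
    passed = sum(acb_check(0.3, rng) for _ in range(10_000))
    print(f"fraction allowed with barring factor 0.3: {passed / 10_000:.3f}")
```

A smaller barring factor bars more devices per slot, which lowers contention at the cost of longer access delay.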

One object of an embodiment of the present invention is to provide a method for obtaining an optimal barring factor value through machine learning in an Access Class Barring access control scheme in a wireless communication network.

Another object of an embodiment of the present invention is to reduce overload and congestion in the communication network by setting an optimal barring factor value in an Access Class Barring access control scheme in a wireless communication network.

A method according to an embodiment of the present invention is a method for setting a barring factor in an Access Class Barring access control scheme in a wireless communication network. The method comprises a training step and a barring factor setting step. In the training step, learning is performed according to a Q-learning algorithm with the access success probability as the state; increasing, holding, and decreasing the barring factor as the selectable actions; and the collision probability and the average access delay as the reward. The optimal action to select in each state is obtained and stored as the training result. In the barring factor setting step, the optimal action derived from the training result for the current state is performed.

An apparatus according to an embodiment of the present invention includes a processor, a memory, and a wireless communication unit for wireless communication with terminals. The processor sets the barring factor for terminal access control by performing a training step and a barring factor setting step. In the training step, learning is performed according to a Q-learning algorithm at least a predetermined number of times, with the access success probability as the state; increasing, holding, and decreasing the barring factor as the selectable actions; and the collision probability and the average access delay as the reward. The optimal action to select in each state is obtained and stored in the memory. In the barring factor setting step, the optimal action stored for the current state is performed.

According to an embodiment of the present invention, a barring factor suited to the environment of the base station and the terminals can be set through machine learning, thereby reducing overload and congestion in the communication network.

FIG. 1 is a block diagram showing the configuration of a mobile communication system to which machine-type communication is applied.
FIG. 2 is a data flow diagram showing a simplified random access procedure.
FIG. 3 is a flowchart showing the operation of a method for setting a barring factor for terminal access control according to an embodiment of the present invention.
FIG. 4 is a graph showing the performance of an embodiment to which the barring factor setting method of the present invention is applied.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described here in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention is limited only by the appended claims, together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals denote the same or similar functions throughout.

In an embodiment of the present invention, a congestion coefficient is used to enable or disable an overload control method such as Access Class Barring (ACB). The congestion coefficient is the ratio of the number of collided preambles to the total number of preamble transmissions. When the congestion coefficient exceeds a predetermined threshold, the overload control method is enabled; when it is at or below the threshold, the overload control method is disabled.
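The enable/disable rule can be sketched as below. The function names are hypothetical, and returning 0.0 when no preambles were transmitted is an assumption made here to avoid division by zero; the text does not address that case.

```python
def congestion_coefficient(collided_preambles, total_preambles):
    """Ratio of collided preambles to all preamble transmissions."""
    if total_preambles == 0:
        return 0.0  # assumption: no transmissions means no congestion
    return collided_preambles / total_preambles


def overload_control_enabled(collided_preambles, total_preambles, threshold):
    """Enable overload control (e.g. ACB) only above the threshold."""
    return congestion_coefficient(collided_preambles, total_preambles) > threshold


if __name__ == "__main__":
    print(overload_control_enabled(30, 60, threshold=0.4))
    print(overload_control_enabled(10, 60, threshold=0.4))
```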

The present invention uses a Q-learning algorithm to set the barring factor efficiently. Q-learning is a form of temporal difference (TD) learning, which learns from differences in estimated value over time; it is a learning method that, through trial and error in a dynamic environment, seeks to maximize the reward given by the environment. An agent acting in the Q-learning environment chooses one of the actions available in a given state, performs it, and moves to another state. In doing so it receives from the environment a reward corresponding to the action. The goal of Q-learning is to maximize the sum of these rewards; in other words, at each moment the action is chosen so that the reward is maximized.

The terminals (MTC devices) contend for the RACH in units of time slots. While the overload control method is enabled, the eNB takes the action that maximizes the expected reward in the current state.

Let S be the set of possible states of the environment and A the set of available actions. In time slot t the state of the environment is s_t, which belongs to the state set S (s_t = s ∈ S). In time slot t the eNB observes the state s_t and, based on that state and on past experience, takes the action a_t that maximizes the expected reward. The action a_t belongs to the action set A (a_t = a ∈ A).

When action a_t is performed and the environment changes to state s_(t+1), the system receives a reward r_t. The goal is to find, for each state s, the optimal policy π^*(s) that yields the maximum reward. The optimal policy can be defined as in Equation 1.
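Equation 1 does not appear in this text. A form consistent with the surrounding definitions, and standard for Q-learning, would select in each state the action with the largest optimal Q value; it is given here as an assumption rather than as the patent's exact expression:

```latex
\pi^{*}(s) = \arg\max_{a \in A} Q^{*}(s, a) \tag{1}
```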

Here Q^*(s, a) denotes the optimal Q value. The Q value is defined as in Equation 2.
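Equation 2 does not appear in this text either. The standard tabular Q-learning update, which matches the learning rate and discount factor described next, is sketched below. The symbol γ for the discount factor is an assumption, and the patent's exact expression may differ:

```latex
Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) \right] \tag{2}
```

With α = 0 the right-hand side reduces to Q(s_t, a_t), so the Q value is left unchanged.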

In this equation, α is the learning rate, in the range 0 ≤ α ≤ 1, and γ is the discount factor, in the range 0 ≤ γ ≤ 1. If the learning rate is 0, the Q value is not updated. A higher learning rate makes learning proceed faster. A higher discount factor gives more weight to future rewards relative to the immediate reward.

The eNB applies the Q-learning algorithm to set the barring factor so as to maximize system performance. To this end, the present invention defines the state, the action, and the reward as follows.

The state is defined as the access success probability. The access success probability in time slot t is the ratio of the number of devices that successfully accessed the RACH to the number of devices contending for the RACH in time slot t, and lies in the range [0, 1]. The range between the minimum value (0) and the maximum value (1) is divided equally into a finite number (N) of states. For example, with four states (N = 4), the access success probability is assigned to one of four states according to which interval it falls in: [0.75, 1.0], [0.5, 0.75), [0.25, 0.5), or [0, 0.25). The eNB can therefore tell which state it is currently in from the current access success probability.
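The mapping from access success probability to a discrete state can be sketched as follows. The function name `state_index` and the convention of numbering states from the lowest interval upward are assumptions for illustration.

```python
def state_index(success_probability, num_states=4):
    """Map a success probability in [0, 1] to one of num_states equal bins.

    With num_states=4 the bins are [0, 0.25), [0.25, 0.5), [0.5, 0.75)
    and [0.75, 1.0]; the top bin is closed so that 1.0 is included.
    """
    if not 0.0 <= success_probability <= 1.0:
        raise ValueError("success probability must lie in [0, 1]")
    return min(int(success_probability * num_states), num_states - 1)


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.5, 0.8, 1.0):
        print(p, state_index(p))
```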

Three actions are available: 1) increase the barring factor p, 2) decrease the barring factor p, and 3) hold the barring factor p at its previous value. The increase or decrease of p is by a step δ_i, where δ_i ∈ {0.1, 0.2, 0.3, 0.4, 0.5}.
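Applying an action to the barring factor can be sketched as below. Clamping the result to [0, 1] is an assumption made here, motivated by the earlier statement that the barring factor takes a value between 0 and 1; the action labels are hypothetical.

```python
ACTIONS = ("decrease", "keep", "increase")


def apply_action(p, action, delta=0.2):
    """Return the barring factor after applying one of the three actions."""
    if action == "increase":
        p += delta
    elif action == "decrease":
        p -= delta
    elif action != "keep":
        raise ValueError(f"unknown action: {action}")
    # Assumption: keep the barring factor inside its valid range [0, 1].
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    p = 0.5
    for a in ("increase", "increase", "increase", "decrease"):
        p = apply_action(p, a)
        print(a, round(p, 2))
```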

One way to select actions in the training step is the epsilon-greedy (ε-greedy) algorithm. Under this algorithm, with probability ε an action is chosen at random from those available in the current state. That is, in most cases the action with the largest Q value in the current state is selected, but with probability ε the action is chosen at random.
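A minimal sketch of ε-greedy selection over one row of a Q table follows. The function name and the representation of a Q-table row as a list of floats indexed by action are assumptions for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick an action index from one Q-table row.

    With probability epsilon, explore by picking a uniformly random action;
    otherwise exploit by picking the action with the largest Q value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    row = [0.1, 0.9, 0.3]
    picks = [epsilon_greedy(row, 0.1, rng) for _ in range(1000)]
    print("share of greedy picks:", picks.count(1) / len(picks))
```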

For the eNB to select the action a with the largest Q value in the current state s, the reward must be defined. In one embodiment, the reward is defined so as to maximize system performance in terms of collision probability and average access delay. That is, the reward of the eNB for state s and action a in time slot t is given by Equation 3.

The collision probability in time slot t is the ratio of the number of devices that experienced a collision on the RACH to the number of devices contending for the RACH in time slot t, and lies in the range [0, 1]. The average access delay delay_t is, for the devices that succeeded in RACH access in time slot t, the average of the delays from the time each device first attempted random access to the time it finally succeeded. delay_max is the maximum access delay allowed by the system, and β is a smoothing factor (0 ≤ β ≤ 1).
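Equation 3 does not appear in this text, so its exact form is unknown here. Purely as an illustration of how the quantities defined above could be combined, the sketch below assumes a weighted sum in which β balances a collision term against a normalized delay term. This assumed form is not taken from the patent.

```python
def reward(collision_probability, avg_delay, max_delay, beta=0.5):
    """Illustrative reward; the weighted-sum form is an assumption.

    Higher reward for a lower collision probability and for a lower
    average access delay relative to the maximum allowed delay.
    """
    # Normalize the delay into [0, 1]; delays above max_delay are capped.
    delay_term = 1.0 - min(avg_delay, max_delay) / max_delay
    return beta * (1.0 - collision_probability) + (1.0 - beta) * delay_term


if __name__ == "__main__":
    print(reward(collision_probability=0.2, avg_delay=5.0, max_delay=10.0))
```

Under this assumed form the reward lies in [0, 1], reaching 1 when there are no collisions and no delay.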

Next, a method for setting a barring factor for machine-learning-based terminal access control according to an embodiment of the present invention is described with reference to FIG. 3.

The method is performed in the eNB (base station). The eNB includes a processor, a memory, a wireless communication unit for wireless communication with terminals, and a core network connection unit for connecting to the core network.

As described above, the processor divides the access success probability into a finite number of states; takes increasing, holding, and decreasing the barring factor as the selectable actions; takes the collision probability and the average access delay as the reward; performs learning according to the Q-learning algorithm; and stores the training result in the memory (step S310).

That is, with delay_max denoting the maximum access delay allowed by the system and β a smoothing factor between 0 and 1, the processor calculates the reward r_t(s, a) given by Equation 3 for every action a (a ∈ A) selectable in the current state s, then selects and performs the action that yields the largest of the calculated rewards. This is repeated at least a predetermined number of times to obtain the optimal action to select in each state, which is stored in the memory.
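A training loop in the spirit of step S310 can be sketched with generic tabular Q-learning and ε-greedy exploration. This is not necessarily the exact procedure of the patent. The callback `env_step`, which takes a barring factor and returns an observed access success probability and a reward, is a hypothetical stand-in for the real network; the initial barring factor of 0.5 and all hyperparameter defaults are likewise assumptions.

```python
import random


def train(env_step, num_states=4, deltas=(-0.2, 0.0, 0.2), steps=2000,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Return, for each state, the index into deltas of the best action."""
    rng = random.Random(seed)
    q = [[0.0] * len(deltas) for _ in range(num_states)]

    def to_state(success_prob):
        return min(int(success_prob * num_states), num_states - 1)

    p = 0.5  # assumed initial barring factor
    success_prob, _ = env_step(p)
    state = to_state(success_prob)
    for _ in range(steps):
        # Epsilon-greedy choice among decrease / keep / increase.
        if rng.random() < epsilon:
            action = rng.randrange(len(deltas))
        else:
            action = max(range(len(deltas)), key=q[state].__getitem__)
        p = min(1.0, max(0.0, p + deltas[action]))
        success_prob, r = env_step(p)
        nxt = to_state(success_prob)
        # Standard Q-learning update.
        q[state][action] = ((1 - alpha) * q[state][action]
                            + alpha * (r + gamma * max(q[nxt])))
        state = nxt
    return [max(range(len(deltas)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    # Toy environment (assumption): reward peaks when p is near 0.5.
    print(train(lambda p: (p, 1.0 - abs(p - 0.5))))
```

The returned list plays the role of the stored training result: one preferred action per state.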

When training ends, the processor checks whether the congestion coefficient, defined as the ratio of the number of collided preambles to the total number of preamble transmissions, exceeds a predetermined threshold (step S320).

If the congestion coefficient exceeds the predetermined threshold, the optimal action stored in the memory for the current state is read and performed (step S330). The optimal action is one of increasing, decreasing, or holding the barring factor, and the step size δ_i of the increase or decrease may be chosen from, for example, {0.1, 0.2, 0.3, 0.4, 0.5}. Depending on the embodiment, the results of performing step S330 may also be fed back continuously into the training result.
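Steps S320 and S330 can be sketched as a single control step. The `policy` list stands in for the stored training result and is hypothetical, as are the function name and the action labels. Leaving the barring factor unchanged when the threshold is not exceeded, and clamping it to [0, 1], are assumptions.

```python
def control_step(p, success_prob, congestion, threshold, policy, delta=0.2):
    """Return the barring factor for the next slot.

    policy maps a state index to 'decrease', 'keep' or 'increase'.
    """
    if congestion <= threshold:
        return p  # S320: overload control not triggered
    num_states = len(policy)
    state = min(int(success_prob * num_states), num_states - 1)
    step = {"decrease": -delta, "keep": 0.0, "increase": delta}[policy[state]]
    # S330: apply the stored action, keeping p within [0, 1].
    return min(1.0, max(0.0, p + step))


if __name__ == "__main__":
    policy = ["decrease", "decrease", "keep", "increase"]
    print(control_step(0.5, success_prob=0.1, congestion=0.8,
                       threshold=0.5, policy=policy))
```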

FIG. 4 shows the performance when the state set S consists of four states, with the access success probability in [0.75, 1.0], [0.5, 0.75), [0.25, 0.5), or [0, 0.25), and the action set A consists of three actions with δ_i = 0.2: decrease by 0.2, hold, and increase by 0.2.

For the reward, β was set to 0.5 and delay_max to the time slot size multiplied by the maximum number of preamble transmissions. In FIG. 4, the success probability is the ratio of the number of devices that succeeded in RACH access to the number of devices that attempted RACH access during the total service time T. The failure probability is the ratio of the number of devices that ultimately failed RACH access to the number of devices that attempted RACH access during the total service time. Training was repeated five times, and performance improved with each repetition.

The features, structures, effects, and the like described in the embodiments above are included in at least one embodiment of the present invention and are not necessarily limited to a single embodiment. Furthermore, the features, structures, effects, and the like illustrated in each embodiment may be combined or modified for other embodiments by a person of ordinary skill in the field to which the embodiments belong. Such combinations and modifications should therefore be construed as falling within the scope of the present invention.

Although the description above has focused on embodiments, these are merely examples and do not limit the present invention. A person of ordinary skill in the field to which the invention belongs will recognize that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the embodiments. For example, each component specifically shown in the embodiments may be modified in practice. Differences arising from such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

102: base station
104a to 104n: terminals
106: core network

Claims

1. (Deleted)

2. A method for setting a barring factor in an Access Class Barring access control scheme in a wireless communication network, the method comprising:
a training step of performing learning according to a Q-learning algorithm, with the access success probability as the state; an increase, a hold, and a decrease of the barring factor as the selectable actions; and the collision probability and the average access delay as the reward, to obtain the optimal action to select in each state and store it as a training result; and
a barring factor setting step of performing the optimal action derived from the training result according to the current state,
wherein, in the training step,
the access success probability in time slot t is the ratio of the number of devices that successfully accessed the RACH to the number of devices contending for the RACH in time slot t,
the collision probability in time slot t is the ratio of the number of devices that experienced a collision on the RACH to the number of devices contending for the RACH in time slot t,
the average access delay delay_t in time slot t is, for the devices that succeeded in RACH access in time slot t, the average of the delays from the time each device first attempted random access to the time it finally succeeded, and
the access success probability is divided between a minimum value and a maximum value and assigned to a finite number of states,
wherein the training step comprises:
calculating, for every action a (a ∈ A) selectable in the current state s, the reward r_t(s, a) (Equation 3), where delay_max is the maximum access delay allowed by the system and β is a smoothing factor with a value between 0 and 1; and
selecting and performing the action that yields the largest of the calculated rewards,
these being carried out at least a predetermined number of times to obtain and store the optimal action to select in each state, and
wherein the barring factor setting step performs the optimal action stored for the current state.

3. The method of claim 2,
wherein the access success probability is divided equally between the minimum value and the maximum value and assigned to a finite number of states.

4. (Deleted)

5. The method of claim 3,
wherein the step size of the increase and of the decrease of the barring factor is one of 0.1, 0.2, 0.3, 0.4, and 0.5.

6. The method of any one of claims 2, 3, and 5,
wherein, in the training step, an action selectable in the current state is chosen at random with probability ε.

7. The method of any one of claims 2, 3, and 5,
further comprising reflecting the result of performing the barring factor setting step in the training result.

8. The method of any one of claims 2, 3, and 5,
wherein, with the congestion coefficient defined as the ratio of the number of collided preambles to the total number of preamble transmissions, the barring factor setting step is performed only when the congestion coefficient exceeds a predetermined threshold.

9. An apparatus comprising:
a processor;
a memory; and
a wireless communication unit for wireless communication with terminals,
wherein the processor sets a barring factor for terminal access control by performing:
a training step of performing learning according to a Q-learning algorithm at least a predetermined number of times, with the access success probability as the state; an increase, a hold, and a decrease of the barring factor as the selectable actions; and the collision probability and the average access delay as the reward; and
a barring factor setting step of performing the optimal action stored for the current state,
wherein, in the training step,
the access success probability in time slot t

selecting and performing the action that yields the largest of the calculated rewards
is carried out at least a predetermined number of times to obtain and store the optimal action to select in each state, and
wherein the barring factor setting step performs the optimal action stored for the current state.