WO2018151344A1 - Method for clustering AP devices using DQN, and cooperative communication device using DQN - Google Patents

Method for clustering AP devices using DQN, and cooperative communication device using DQN

Info

Publication number
WO2018151344A1
Authority
WO
WIPO (PCT)
Prior art keywords
dqn
terminal
clustering
cooperative communication
cell
Prior art date
Application number
PCT/KR2017/001683
Other languages
English (en)
Korean (ko)
Inventor
조동호
이혁준
지동진
정배렬
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술원 filed Critical 한국과학기술원
Priority to PCT/KR2017/001683 priority Critical patent/WO2018151344A1/fr
Publication of WO2018151344A1 publication Critical patent/WO2018151344A1/fr

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W 40/20 Communication route or path selection, e.g. power-based or shortest path routing based on geographic position or location
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/24 Connectivity information management, e.g. connectivity discovery or connectivity update
    • H04W 40/32 Connectivity information management, e.g. connectivity discovery or connectivity update for defining a routing cluster membership

Definitions

  • the technology described below relates to cooperative communication of a mobile communication AP device.
  • a service may be provided in cooperation with an adjacent AP device for a terminal located in a boundary area of a cell served by an AP device such as a base station. That is, a plurality of AP devices cooperate to provide a communication service for one terminal.
  • an AP device such as a base station.
  • CoMP (Coordinated Multi-Point)
  • clustering techniques are divided into dynamic clustering and static clustering. Dynamic clustering performs clustering in real time according to the locations of the terminals, whereas static clustering uses a predetermined pattern.
  • dynamic clustering reflects the locations of terminals in real time, which results in a large system overhead, while static clustering has difficulty providing a stable service when terminals move out of the expected locations or traffic increases rapidly.
  • the technology described below is intended to provide clustering between AP devices using deep Q-network (DQN).
  • DQN (deep Q-network)
  • the cooperative communication device using the DQN includes: a storage device for storing DQN variables and the locations of neighboring AP devices; an antenna for receiving channel state information from terminals in the cell; and a control circuit that determines, based on the distribution of terminals in the cell identified using the channel state information, at least one candidate AP device capable of serving a specific area of the cell among the neighboring AP devices, and determines at least one target AP device among the candidate AP devices by inputting the positions of the candidate AP devices, the distribution of terminals, and the channel state information of each terminal to the DQN.
  • the technology described below provides high quality service to a terminal located in a boundary region of a cell by providing optimal clustering for a situation through reinforcement learning using DQN.
  • FIG. 1 is an example of a communication environment for cooperative communication.
  • FIG. 5 is an example of clustering through reinforcement learning.
  • FIG. 10 is another example of clustering using DQN.
  • terms such as first, second, A, B, etc. may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, the first component may be referred to as the second component, and similarly the second component may be referred to as the first component, without departing from the scope of the technology described below.
  • each process constituting the method may occur in a different order from the stated order unless the context clearly indicates a specific order. That is, each process may be performed in the specified order, substantially simultaneously, or in the reverse order.
  • the technology described below relates to a cooperative communication technique of a plurality of AP devices.
  • the technology described below relates to clustering between a plurality of AP devices for cooperative communication.
  • the AP device may be a mobile communication AP (base station), a small cell AP, a WiFi AP, or the like.
  • the cooperative communication may be performed between APs of the same type or may be performed in heterogeneous networks such as macro cells and small cells.
  • the AP devices may use the same communication method or, in some cases, different communication methods. For convenience of explanation, it is assumed below that cooperative communication is performed between AP devices such as mobile communication base stations.
  • the AP devices are arranged over a certain area. FIG. 1 illustrates an example of dividing the entire region into an n × n grid. For convenience of explanation, one AP device is shown for each rectangle. Terminals are randomly distributed over the entire area and may change position with time.
  • the AP device 10A may provide a communication service for the terminals 5A and 5B.
  • the terminal 5B located in the boundary region of the cell may also be served by another AP device 10B. Therefore, the AP device 10A and the AP device 10B may interfere with each other. In this case, if cooperative communication is performed, the AP device 10A and the AP device 10B can provide a communication service to the terminal 5B without such mutual interference degrading it.
  • a process of determining an AP device to be clustered for cooperative communication will be described.
  • DQN is an algorithm that enables reinforcement learning over a wider state space by adding a value network to the Q-learning technique.
  • the Q value is a measure of the value function for a state. For example, the positions and distribution of terminals can change without limit, and since the number of possible combinations is enormous, it is not efficient to store a Q value for every situation.
  • DQN solves this problem by estimating a function that determines the Q value rather than storing individual Q values. Whereas the conventional Q-learning technique stores each state in a table and obtains the Q value through a lookup, DQN inputs the current state to the value network and obtains the Q value as its output.
  • the DQN can approximate a function of determining the Q value using a value network of three or more layers.
  • Q-learning is basically a reinforcement learning algorithm that consists of environment, agent, state, action, and reward.
  • by taking an action, the agent can move to a new state.
  • an immediate reward is the reward obtained immediately for an action taken by the agent;
  • a future reward is the reward for the future environment resulting from that action.
  • the agent's final goal is to update the Q value to get the maximum of both rewards. This can be expressed as Equation 1 below.
  • γ is a discount factor between 0 and 1: the closer it is to 0, the more weight is given to the present reward, and the closer it is to 1, the more weight is given to the future reward. In the present invention, it is set to 0.5 to consider the present and future rewards equally.
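  • Equation 1 is not reproduced in this text; a standard Q-learning update consistent with the description above (an immediate reward plus a future reward weighted by the discount factor γ, here 0.5) would take the following form, where α denotes an assumed learning rate:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \Big[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \Big]
```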
  • here, the action is clustering,
  • and the reward is the throughput achieved according to the clustering.
  • the agent corresponds to a subject that performs clustering and may be an AP device. Alternatively, a separate control device located in the mobile communication core network may be an agent.
  • the terminals are randomly distributed over the entire region and may move continuously over time. It is assumed that the location of a terminal can be determined using its channel state information (CSI), which represents the channel state of the terminal.
  • CSI (channel state information)
  • the state of the Q-learning is defined as in Equation 2 below:
  • s = (C, UE, CI)
  • C represents a base station identification number, C ∈ {1, 2, 3, ..., N}.
  • UE is a terminal identification number, UE ∈ {user_1, user_2, user_3, ..., user_M}.
  • CI represents CSI information, CI ∈ {CSI_1, CSI_2, CSI_3, ..., CSI_M}.
  • for example, a state may be generated as follows:
  • Q_t(s) = [(1, 2, 3), (user_1, {user_2, user_3}, {user_4, user_5}), (CSI_1, {CSI_2, CSI_3}, {CSI_4, CSI_5})]
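  • purely as an illustrative sketch (not part of the patent text), the example state above could be held as a nested tuple; the user_k and CSI_k names are placeholders:

```python
# Illustrative only: s = (C, UE, CI) for the example above, where AP 1 serves
# user_1, AP 2 serves {user_2, user_3}, and AP 3 serves {user_4, user_5}.
state = (
    (1, 2, 3),                                                   # C: AP identifiers
    (("user_1",), ("user_2", "user_3"), ("user_4", "user_5")),   # UE: terminals per AP
    (("CSI_1",), ("CSI_2", "CSI_3"), ("CSI_4", "CSI_5")),        # CI: CSI per terminal
)
```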
  • the action of the agent depends on the surrounding environment of the AP device. For example, the action may vary depending on whether the environment is a road on which cars travel or a pedestrian zone.
  • a cluster may be formed according to a road shape (expected movement direction) by determining the number of AP devices forming a cluster.
  • FIG. 2 is an example of clustering for cooperative communication.
  • FIG. 2(a) is an example of clustering of AP devices disposed in a road area.
  • for example, the cluster size K = 2.
  • a cluster may be formed by selecting two adjacent AP devices along the road.
  • FIG. 2(b) is an example of clustering for a pedestrian zone.
  • clusters may be formed by the following steps: (1) find the zone with the most terminals in the boundary area using CSI information; (2) as shown in FIG. 2(b), select one adjacent AP device based on the AP device of that zone to form a cluster.
  • the criterion for selecting a neighboring AP device is to select the neighboring AP device that can maximize interference cancellation.
  • if the interference magnitudes are the same, all such neighboring AP devices are clustered.
  • FIG. 2(b) shows an example of selecting the neighboring AP device that can maximally cancel the interference experienced by a terminal located in the boundary area; here the AP device 20A selects the AP device 20B located below it.
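  • the selection rule above could be sketched as follows; estimate_interference_reduction is a hypothetical helper (not defined in the patent) standing in for whatever interference metric is available:

```python
def select_cluster_members(serving_ap, neighbor_aps, boundary_terminals,
                           estimate_interference_reduction):
    """Pick the neighbor(s) whose clustering cancels the most interference for
    boundary-area terminals; if several neighbors tie, cluster all of them."""
    gains = {ap: estimate_interference_reduction(ap, boundary_terminals)
             for ap in neighbor_aps}
    best = max(gains.values())
    return [serving_ap] + [ap for ap, gain in gains.items() if gain == best]
```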
  • the AP device includes a storage device 21 for storing DQN variables and other information, a control circuit 22 that performs DQN learning and determines the clustering type, and an antenna 23 for communicating with terminals.
  • the antenna 23 may receive channel state information from the terminal in the cell.
  • the control circuit 22 determines at least one candidate AP device capable of serving a specific area of the cell among the neighboring AP devices, based on the distribution of terminals in the cell identified using the channel state information,
  • and determines at least one target AP device among the candidate AP devices by inputting the positions of the candidate AP devices, the distribution of terminals, and the channel state information of each terminal to the DQN. Thereafter, the AP device and the target AP device perform clustering and carry out cooperative communication.
  • the storage device 21 may store actions and rewards for later learning.
  • the reward may use the performance or throughput of the terminal as a reward value for the action taken by the agent.
  • the reward may be set as in Equations 3 and 4 below.
  • T_lb is the lower 5% of the overall performance,
  • T_avg is the average of the overall performance.
  • 5% is one example.
  • the sigmoid function of FIG. 3 has the characteristic that its derivative is small in the domain near 0 and 1, and becomes larger as the input approaches 0.5.
  • if the lower 5% performance is close to the average performance, a small penalty is imposed; if the lower 5% performance falls by more than a certain degree, a large penalty is applied to guarantee the capacity of the cell-edge terminals.
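  • Equations 3 and 4 are not reproduced in this text; one plausible reading, assuming the reward passes the ratio of the lower-5% throughput T_lb to the average throughput T_avg through a sigmoid, is sketched below (k and offset are assumed shaping parameters, not values from the patent):

```python
import math

def clustering_reward(t_lb, t_avg, k=10.0, offset=0.5):
    # Small penalty when the lower-5% throughput is close to the average,
    # sharply larger penalty once it falls well below the average.
    ratio = t_lb / t_avg if t_avg > 0 else 0.0
    return 1.0 / (1.0 + math.exp(-k * (ratio - offset)))  # sigmoid in (0, 1)
```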
  • the agent checks the current state s (C, UE, CSI) (110).
  • the agent obtains a Q value using the DQN (120).
  • the agent selects an action that determines the clustering type according to the Q value (130).
  • the agent observes the reward according to the action (140). If the learning is not finished, the agent stores its action and the corresponding reward (150). This process is repeated until the learning ends.
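  • as a rough sketch only, one iteration of the learning loop above (steps 110 to 150) might look as follows; env, dqn, and memory are assumed objects, and the ε-greedy exploration is an assumption rather than something stated in the patent:

```python
import random

def learning_step(env, dqn, memory, epsilon=0.1):
    state = env.current_state()                       # (110) check s = (C, UE, CSI)
    q_values = dqn.predict(state)                     # (120) obtain Q values from the DQN
    if random.random() < epsilon:                     # occasional exploration (assumed)
        action = random.randrange(len(q_values))
    else:                                             # (130) pick the clustering action
        action = max(range(len(q_values)), key=q_values.__getitem__)
    reward = env.apply_clustering(action)             # (140) observe the resulting throughput
    memory.append((state, action, reward))            # (150) store for later training
```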
  • the agent prepares a DQN for determining clustering. Agents can learn while clustering in a live environment. Agents can also use certain sample data to train in advance.
  • the agent may be any one AP device as described above. Alternatively, it may be another control device that receives information from the AP device. For example, the agent may be a control device located in the core network of the mobile communication.
  • FIG. 5 is an example of a process 200 of clustering through reinforcement learning.
  • FIG. 5 assumes a situation in which the DQN learned according to FIG. 4 is provided.
  • the agent checks the current state s (C, UE, CSI) (210).
  • the agent obtains a Q value using the learned DQN (220).
  • the agent selects an action that determines the clustering type according to the Q value (230).
  • the agent observes the reward for the behavior (240).
  • the agent determines whether the reward according to the current behavior is greater than the reward immediately before (250).
  • the agent may determine that the reward has increased only when the current reward exceeds the previous reward by more than a certain threshold. That is, the agent determines whether the performance of the terminals is consistently improved by the clustering.
  • if the current reward is greater than the previous reward, the agent changes the cluster according to the action (260); otherwise, the agent does not change the cluster. The agent checks whether the communication is terminated (270), and repeats the whole process until the communication ends.
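  • the operating loop of FIG. 5 (steps 210 to 270) could be sketched as below; the object names and the improvement threshold are assumptions, not definitions from the patent:

```python
def run_clustering(env, dqn, threshold=0.0):
    prev_reward = float("-inf")
    while not env.communication_terminated():                         # (270)
        state = env.current_state()                                   # (210)
        q_values = dqn.predict(state)                                 # (220)
        action = max(range(len(q_values)), key=q_values.__getitem__)  # (230)
        reward = env.evaluate_clustering(action)                      # (240)
        if reward > prev_reward + threshold:                          # (250) improved enough?
            env.change_cluster(action)                                # (260) apply new clustering
            prev_reward = reward
        # otherwise the current cluster is kept unchanged
```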
  • the current clustering environment is composed of a two-dimensional structure of an AP device and terminals.
  • AP devices with many terminals in the border region can increase capacity by removing interference through clustering. If the terminals are mostly near the AP device and there is little movement, it is efficient to operate the AP devices individually. Therefore, using an artificial neural network that can reflect the two-dimensional structure as a value network helps to improve performance.
  • CNN (Convolutional Neural Network)
  • CNN is an artificial neural network structure that can best understand the above two-dimensional structure.
  • CNN consists of several convolutional layers and several fully connected layers.
  • a convolutional layer extracts two-dimensional structure from the observed state through a convolution mask (kernel) with shared weights.
  • by stacking convolutional layers, more complex features can be found.
  • one of the most frequently used techniques in CNNs is max pooling, which extracts only the largest value in the region covered by the mask; it reduces complexity and provides translational invariance.
  • the first convolutional layer receives the location of the current AP device, the distribution of terminals, and the CSI of each terminal as input. This layer uses a 5×5 convolution mask to find low-level features. Low-level features are simple features such as the terminal distribution and density between any two AP devices. The next two layers use 3×3 convolution masks to find high-level features. The high-level features are inferred from the low-level features found above and indicate the spatial distribution of pairs of AP devices with many terminals and the movement pattern of terminals over time.
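  • a minimal PyTorch sketch of such a value network is given below; the input channel layout (AP positions, terminal distribution, per-terminal CSI rasterized onto the grid), the grid size, and the number of clustering actions are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

class ClusteringValueNetwork(nn.Module):
    def __init__(self, grid_size=16, num_actions=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),   # 5x5 mask: low-level features
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 3x3 mask: high-level features
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # max pooling over 2x2 regions
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (grid_size // 2) ** 2, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),                  # one Q value per clustering action
        )

    def forward(self, x):
        return self.head(self.features(x))

# example: q_values = ClusteringValueNetwork()(torch.zeros(1, 3, 16, 16))
```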
  • the value network will be trained according to the procedures for training basic DQNs.
  • an action is taken in the communication environment, that is, the clustering configuration is changed, and the reward according to the action is observed.
  • the agent stores the observed action and reward pairs in the storage device. The agent trains the value network at regular training intervals: during each training period it uses the stored actions and rewards to update the DQN.
  • the agent checks the action and reward information stored in the storage device (310).
  • the agent retrieves the variables of the DQN (320).
  • the DQN variables are stored in advance in the storage device.
  • the agent trains the DQN using the actions and rewards stored in the storage device (330).
  • the agent then uses the trained DQN to observe the reward according to the action (340).
  • the agent repeats the learning process until training is completed over all the sample data (actions and rewards) stored in the storage device (350).
  • the agent assigns the variables of the newly trained DQN (360).
  • the newly designated variable may be stored in the storage device.
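  • the periodic training procedure above (steps 310 to 360) might be sketched as follows; storage and target_fn (which builds the training target, e.g. the reward plus a discounted future value) are assumed interfaces, not APIs from the patent:

```python
def periodic_training(dqn, storage, target_fn):
    samples = storage.load_samples()                 # (310) stored actions and rewards
    dqn.set_variables(storage.load_dqn_variables())  # (320) previously stored DQN variables
    for state, action, reward in samples:            # (330)-(350) learn over all samples
        dqn.update(state, action, target_fn(reward, state))
    storage.save_dqn_variables(dqn.get_variables())  # (360) store the newly learned variables
```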
  • FIG. 8 is an example of clustering using DQN. FIG. 8 shows a performance venue during a performance. In the venue, many terminals attempt to communicate at the same time. Since performances take place at unspecified times, a very heavy load is placed at unpredictable times on the AP devices around the venue. With an existing static clustering structure, it is difficult to guarantee QoS because only one of the two environments (with or without a performance) can be considered. If reinforcement-learning clustering is performed using DQN according to the above-described method, capacity can be increased by changing the clustering form of the AP devices according to the changing terminal density.
  • an AP device around the venue and an AP device with a low terminal density may be clustered so that the communication resources of the low-density AP device are utilized. Accordingly, the total capacity of the terminals in the venue may be increased by sharing the resources not used by the low-density AP device with the AP devices around the venue.
  • FIG. 9 is another example of clustering using DQN.
  • FIG. 9 is an example of a downtown situation. In the downtown area, terminal traffic increases sharply during certain time periods (commuting hours) and subsides after the commute.
  • network capacity is degraded when clustering cannot be performed in response to the terminal traffic that changes over the day.
  • reinforcement-learning clustering using DQN forms clusters only in areas where terminal traffic increases, in order to maximize the reward value determined by throughput; this eliminates interference between AP devices and increases network capacity.
  • the clusters may be formed according to the road shape to reduce the number of handovers of the terminal, thereby providing stable network capacity.
  • with dynamic clustering, it is difficult to apply this to an actual network model because the state changes of many terminals must be reflected in real time, which greatly increases the system overhead.
  • FIG. 10 is another example of clustering using DQN.
  • FIG. 10 is an example of a situation in which a disaster has occurred.
  • in such a situation, some neighboring AP devices are destroyed while rescue personnel increase, causing a surge in data traffic that the remaining operable AP devices must temporarily handle.
  • with static clustering, it is difficult to provide stable network capacity in a disaster situation because a fixed cluster pattern is applied without recognizing the change in the situation.
  • if reinforcement-learning clustering is performed using DQN, the system recognizes that the reward value has dropped sharply and, as shown in FIG. 10, forms clusters among the AP devices that can still be operated so that resources are shared and the reward increases. Therefore, network resources can be concentrated on the AP devices that need support, providing the network capacity required for rescue operations.
  • dynamic clustering requires high computational power, and it cannot provide stable network capacity when most of the operable AP devices are lost in a disaster.

Abstract

Disclosed is a method for clustering AP devices using a deep Q-network (DQN), comprising the steps of: identifying a distribution of terminals in a cell of a first AP device by using channel state information; determining one or more candidate AP devices capable of serving a specific area of the cell together with the first AP device, on the basis of the terminal distribution; determining at least one second AP device among the candidate AP devices by means of a DQN that receives, as inputs, a position of the first AP device, the positions of the candidate AP devices, the terminal distribution, and the channel state information of each terminal; and clustering the first AP device and the at least one second AP device.
PCT/KR2017/001683 2017-02-16 2017-02-16 Method for clustering AP devices using DQN, and cooperative communication device using DQN WO2018151344A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2017/001683 WO2018151344A1 (fr) 2017-02-16 2017-02-16 Method for clustering AP devices using DQN, and cooperative communication device using DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2017/001683 WO2018151344A1 (fr) 2017-02-16 2017-02-16 Method for clustering AP devices using DQN, and cooperative communication device using DQN

Publications (1)

Publication Number Publication Date
WO2018151344A1 true WO2018151344A1 (fr) 2018-08-23

Family

ID=63169940

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2017/001683 WO2018151344A1 (fr) 2017-02-16 2017-02-16 Method for clustering AP devices using DQN, and cooperative communication device using DQN

Country Status (1)

Country Link
WO (1) WO2018151344A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130242748A1 (en) * 2012-03-16 2013-09-19 Nokia Siemens Networks Oy Hierarchical network and interference management
KR20140063334A (ko) * 2012-11-16 2014-05-27 삼성전자주식회사 Apparatus and method for connecting short-range communication in a portable terminal
KR101574882B1 (ko) * 2014-03-18 2015-12-04 한국과학기술원 Method and apparatus for controlling transmission power of a cognitive radio terminal based on a distributed network
KR20160134626A (ko) * 2016-11-15 2016-11-23 주식회사 케이티 Access point connection method and system utilizing network status information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIQIANG CHEN, ET AL.: "Power-efficient access-point selection for indoor location estimation", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, vol. 18, no. 7, 30 May 2006 (2006-05-30), pages 877 - 888, XP055114614 *

Similar Documents

Publication Publication Date Title
KR101877243B1 (ko) AP device clustering method using a reinforcement learning-based neural network, and cooperative communication device using a reinforcement learning-based neural network
Anandakumar et al. Supervised machine learning techniques in cognitive radio networks during cooperative spectrum handovers
Labriji et al. Mobility aware and dynamic migration of MEC services for the Internet of Vehicles
Wickramasuriya et al. Base station prediction and proactive mobility management in virtual cells using recurrent neural networks
CN103891389B (zh) 用于对等网络中的wan辅助的争用检测和解决的方法和装置
CN112351503A (zh) 基于任务预测的多无人机辅助边缘计算资源分配方法
CN114189888A (zh) 基于数字孪生的5g融合网架构下多模终端接入系统及方法
Huang et al. An overview of intelligent wireless communications using deep reinforcement learning
WO2022045700A1 (fr) Method and apparatus for autoscaling of containers in a cloud-native core network
WO2020152389A1 (fr) Machine learning for a communication network
Ali et al. Reinforcement-learning-enabled massive internet of things for 6G wireless communications
Alghamdi et al. On the optimality of task offloading in mobile edge computing environments
Krishnan et al. Optimizing throughput performance in distributed MIMO Wi-Fi networks using deep reinforcement learning
Krishna et al. An efficient approach for distributed dynamic channel allocation with queues for real-time and non-real-time traffic in cellular networks
Huang et al. Collaborative machine learning for energy-efficient edge networks in 6G
Wu et al. Artificial intelligence for smart resource management in multi-user mobile heterogeneous RF-light networks
WO2022030713A1 (fr) Resource configuration in a self-organizing network
Sun et al. Harmonizing artificial intelligence with radio access networks: Advances, case study, and open issues
WO2018151344A1 (fr) Method for clustering AP devices using DQN, and cooperative communication device using DQN
WO2021086140A1 (fr) Method for MDAS server-assisted handover optimization in a wireless network
WO2019066464A1 (fr) Method and system for providing application benefits, and non-transitory computer-readable recording medium
Peng et al. Ultra-dense heterogeneous relay networks: A non-uniform traffic hotspot case
Wu et al. Characterizing user association patterns for optimizing small-cell edge system performance
WO2016137101A1 (fr) Method and system for controlling temporal and spatial dispersion of traffic volume by means of bidirectional communication
WO2021230649A1 (fr) Method and system for multi-channel resource allocation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17896502

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17896502

Country of ref document: EP

Kind code of ref document: A1