WO2019080771A1 - Electronic device and method for wireless communication - Google Patents

Electronic device and method for wireless communication

Info

Publication number
WO2019080771A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior
user
electronic device
state
communication quality
Prior art date
Application number
PCT/CN2018/110964
Other languages
English (en)
French (fr)
Inventor
赵友平
王静云
孙晨
郭欣
Original Assignee
索尼公司
赵友平
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 索尼公司, 赵友平
Priority to CN201880044183.3A (CN110809893A)
Priority to EP18869999.5A (EP3700247A4)
Priority to US16/634,887 (US11140561B2)
Publication of WO2019080771A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18 Network planning tools
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10 Dynamic resource partitioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 Traffic simulation tools or models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/10 Scheduling measurement reports; Arrangements for measurement reports
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W48/00 Access restriction; Network selection; Access point selection
    • H04W48/20 Selecting an access point
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/541 Allocation or scheduling criteria for wireless resources based on quality criteria using the level of interference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/542 Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/08 Access point devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/18 Service support devices; Network management devices

Definitions

  • Embodiments of the present invention generally relate to the field of wireless communications, in particular to resource management in a User Centric Network (UCN), and more particularly to electronic devices and methods for wireless communication.
  • UCN User Centric Network
  • Ultra-dense networks, which deploy miniaturized small base stations, have become an effective technical means of meeting the growing demand for mobile data rates.
  • the dense and flexible configuration of small base stations enables the implementation of a user-centric network (UCN) that supports efficient communication for massive numbers of mobile users and devices.
  • the UCN allows each user to jointly select multiple access points, such as base stations, for cooperative transmission to meet the quality of service requirements of all users with the greatest probability. Therefore, User-centric Ultra-Dense Networks (UUDN) will become the main trend of the future network.
  • an electronic device for wireless communication, comprising processing circuitry configured to: determine, taking the wireless network topology of a wireless network as a state, a set of cooperative access points for a user within a predetermined range; and re-determine the set of cooperative access points for the user in response to a change in the wireless network topology, wherein the wireless network topology includes the distribution of users and the distribution of access points.
  • a method for wireless communication, comprising: determining, taking the wireless network topology of a wireless network as a state, a set of cooperative access points for a user within a predetermined range; and re-determining the set of cooperative access points for the user in response to a change in the wireless network topology, wherein the wireless network topology includes the distribution of users and the distribution of access points.
  • the electronic device and method according to the present application can implement a dynamic selection of an Access Point Group (APG) to better meet the communication requirements of all users.
  • APG Access Point Group
  • Figure 1 shows a schematic diagram of a UUDN scenario
  • FIG. 2 shows a functional block diagram of an electronic device for wireless communication in accordance with one embodiment of the present application
  • Figure 3 shows a graph of an example of a utility function
  • FIG. 4 shows a functional block diagram of an electronic device for wireless communication in accordance with one embodiment of the present application
  • FIG. 5 illustrates a functional block diagram of an electronic device for wireless communication in accordance with another embodiment of the present application
  • FIG. 6 shows a functional block diagram of an electronic device for wireless communication in accordance with another embodiment of the present application.
  • FIG. 7 illustrates a functional block diagram of an electronic device for wireless communication in accordance with another embodiment of the present application.
  • Figure 8 is a diagram showing the flow of information between a user, an access point, and a spectrum management device
  • Figure 9 shows a schematic diagram of a simulation scenario of a simulation example
  • FIG. 10 is a diagram showing an example of a behavior matrix and a Q-value matrix
  • Figure 11 shows a schematic diagram of the results of performing the determined behavior
  • FIG. 12 is a diagram showing another example of a behavior matrix and a Q-value matrix
  • Figure 13 shows a schematic diagram of the results of performing the determined behavior
  • Figure 14 shows a schematic diagram of a simulation scenario 1 of another simulation example
  • Figure 15 shows a schematic diagram of a simulation scenario 2 of another simulation example
  • FIG. 16 shows a comparison diagram of a cumulative distribution function (CDF) of the user satisfaction rate obtained based on the simulation scenario 1;
  • CDF cumulative distribution function
  • Figure 17 is a graph showing the proportion of users moving along a rectangular trajectory in simulation scenario 2 to meet user communication quality requirements at different laps;
  • Figure 18 shows a flow chart of a method for wireless communication in accordance with one embodiment of the present application
  • FIG. 19 is a block diagram showing an example of a schematic configuration of a server 700 to which the technology of the present disclosure can be applied;
  • FIG. 20 is a block diagram of an exemplary structure of a general purpose personal computer in which methods and/or apparatus and/or systems in accordance with embodiments of the present invention may be implemented.
  • FIG. 1 shows a schematic diagram of a scene of a UUDN.
  • APs access points
  • UE User Equipment
  • each AP is communicably connected to a spectrum management device, such as a spectrum coordinator (SC). The SC determines a cooperative APG for each UE within its management scope; the cooperative APG is the set of APs that have a cooperative relationship with the corresponding UE, that is, the collection of APs that provide communication access services to that UE.
  • SC spectrum coordinator
  • the local SC and the neighboring SC can also communicate appropriately to exchange information. It can be seen that compared with the traditional cellular network architecture, the network architecture of FIG. 1 is characterized by a large number of APs, even more than the number of UEs.
  • the AP described herein may be any node that provides network communication services, such as a base station, a small base station, and the like.
  • a base station can be implemented as any type of evolved Node B (eNB), such as a macro eNB and a small eNB.
  • the small eNB may be an eNB covering a cell smaller than the macro cell, such as a pico eNB, a micro eNB, and a home (femto) eNB.
  • the base station can be implemented as any other type of base station, such as a NodeB and a base transceiver station (BTS).
  • BTS base transceiver station
  • the base station can include: a body (also referred to as a base station device) configured to control wireless communication; and one or more remote radio heads (RRHs) disposed at a different location than the body.
  • a body also referred to as a base station device
  • RRHs remote radio heads
  • various types of terminals can operate as base stations by performing base station functions temporarily or semi-persistently.
  • the UE or user may be any terminal device or a wireless communication device that provides the service.
  • the terminal device may be implemented as a mobile terminal (such as a smartphone, a tablet personal computer (PC), a notebook PC, a portable game terminal, a portable/dongle-type mobile router, and a digital camera device) or an in-vehicle terminal (such as a car navigation device).
  • the terminal device can also be implemented as a terminal (also referred to as a machine type communication (MTC) terminal) that performs machine-to-machine (M2M) communication.
  • MTC machine type communication
  • M2M machine-to-machine
  • the terminal device may be a wireless communication module (such as an integrated circuit module including a single die) installed on each of the above terminals.
  • the SC shown in FIG. 1 is only one example of a spectrum management device; other forms of spectrum management device, such as a Spectrum Access System (SAS), may also be used, and the example is not limiting.
  • SAS Spectrum Access System
  • the present embodiment provides an electronic device 100 for wireless communication.
  • the electronic device 100 includes: a determining unit 101 configured to determine, taking the wireless network topology of a wireless network as a state, a cooperative access point group (APG) for a user within a predetermined range; and an updating unit 102 configured to re-determine the cooperative APG for the user in response to a change in the wireless network topology.
  • APG cooperative access point group
  • the determining unit 101 and the updating unit 102 can be implemented by one or more processing circuits, which can be implemented, for example, as a chip.
  • the electronic device 100 may be, for example, located on a spectrum management device (such as an SC or SAS) as shown in FIG. 1, or communicably connected to a spectrum management device.
  • the electronic device 100 may employ a reinforcement learning algorithm to determine a collaborative APG for users within a predetermined range.
  • the predetermined range may be, for example, at least a part of a management range of the spectrum management device in which the electronic device is located.
  • the reinforcement learning algorithm regards learning as a process of trial and evaluation: it learns the mapping from environmental state to behavior so that the selected behavior obtains the greatest reward from the environment, that is, so that the external environment's evaluation of the learning system (or the operating performance of the whole system) is optimal in a certain sense.
  • the reinforcement learning algorithm used herein may include, for example, a Q-learning algorithm, a temporal-difference learning algorithm, etc., in which the wireless network topology may be taken as the state.
  • the wireless network topology includes the distribution of users and the distribution of access points.
  • the wireless network topology changes as the user and/or access point moves, or when the switch state of a particular user and/or access point changes.
  • the wireless network topology changes over time, corresponding, for example, to the states $S_t$, $S_{t+1}$, and $S_{t+2}$ shown in the figure.
  • the cooperative APG determined for the user in the previous state may no longer be applicable in the new state, for example because the communication requirement of the user can no longer be satisfied. The updating unit 102 therefore re-determines the cooperative APG for the user in response to the change, to provide users with a stable, continuous communication service.
  • the change in the wireless network topology includes a change in the location of the user. The change is detected by the user, who reports it to the electronic device 100 upon detection and requests the electronic device 100 to re-determine the cooperative APG for that user.
  • the change in the wireless network topology also includes a change in the location of an access point. The access point likewise reports a change in its location to the electronic device 100, and the electronic device 100 can accordingly re-determine the user's cooperative APG based on that change.
  • the determining unit 101 can regard the cooperative relationship between the user and the access point as a behavior in the reinforcement learning algorithm and, for each behavior, calculate the evaluation of the behavior based on the degree of satisfaction of the communication quality requirement of the user when the behavior is taken and on the network overhead caused.
  • users have specific requirements for their communication quality.
  • the degree of satisfaction of the user's communication quality requirements indicates an aspect of the evaluation of the behavior.
  • the user's communication quality requirements can be expressed, for example, by the quality of service required by the user.
  • the signal to interference and noise ratio (SINR) threshold can be used.
  • SINR signal to interference and noise ratio
  • when the state changes from the previous state to the current state, the behavior changes accordingly, for example from the behavior determined in the previous state to some other behavior. A change in behavior means a change in the cooperative APG of the UE, so AP switching operations occur and cause network overhead. As far as the evaluation of behavior is concerned, the network overhead should be as small as possible; the network overhead therefore indicates another aspect of the evaluation of behavior.
  • the determining unit 101 determines the cooperative APG of the user in the current state based on the behavior with the highest evaluation. In other words, the determining unit 101 determines the behavior with the highest evaluation as the behavior to be implemented, thereby determining the cooperative APG of each user. For example, the behavior with the highest evaluation is the one for which, compared with other behaviors, the users' communication quality requirements are best satisfied and the network overhead is smallest.
  • the cooperative relationship between the user and the access point, that is, the behavior in the reinforcement learning algorithm (also referred to as an individual), can be represented by the following matrix.
  • when $a_{n,m}$ is 1, there is a cooperative relationship between the nth user and the mth AP.
  • when $a_{n,m}$ is 0, there is no cooperative relationship between the nth user and the mth AP.
  • $A_i = [a_{11}\ a_{12}\ \cdots\ a_{1M}\ a_{21}\ a_{22}\ \cdots\ a_{2M}\ \cdots\ a_{N1}\ a_{N2}\ \cdots\ a_{NM}]_{1 \times NM}$ (2)
  • each behavior is treated as a row, which can constitute a behavior matrix.
  • a plurality of behaviors may be initially generated, that is, a plurality of A i having different values are generated.
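As a minimal sketch of this encoding, the following Python snippet builds random 0/1 cooperation matrices for N users and M APs, flattens each into a 1 × NM row, and stacks the rows into a behavior matrix. The function names, the sizes, and the uniform random initialisation are illustrative assumptions and are not taken from the application.

```python
import random

def random_behavior(num_users, num_aps, rng):
    """Return an N x M 0/1 matrix; entry [n][m] == 1 means user n cooperates with AP m."""
    return [[rng.randint(0, 1) for _ in range(num_aps)] for _ in range(num_users)]

def flatten(behavior):
    """Row-major flattening of an N x M matrix into a 1 x (N*M) row vector."""
    return [bit for row in behavior for bit in row]

def unflatten(vector, num_aps):
    """Inverse of flatten: cut the row vector back into rows of length M."""
    return [vector[i:i + num_aps] for i in range(0, len(vector), num_aps)]

# Hypothetical sizes: 3 users, 4 APs, 5 candidate behaviors.
rng = random.Random(0)
candidates = [random_behavior(3, 4, rng) for _ in range(5)]
behavior_matrix = [flatten(b) for b in candidates]  # one behavior per row
```

Each row of `behavior_matrix` is one candidate behavior; `unflatten` recovers the per-user view when the cooperative APG of a given user is needed.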
  • a predetermined condition may also be set to constrain the generated behaviors. The predetermined condition includes, for example, one or more of the following: the generated behavior should be such that the communication quality of each user satisfies its communication quality requirement; and the network overhead relative to the behavior determined in the previous state does not exceed a predetermined overhead threshold.
  • communication quality requirements can be expressed in terms of SINR thresholds.
  • the degree of satisfaction of the communication quality requirements of each user when each behavior is implemented in this state, together with the network overhead caused, is taken as the evaluation of the behavior.
  • the evaluation of behavior is represented by Q-value, and the evaluation of each behavior constitutes a Q-value matrix.
  • the determining unit 101 can calculate the degree of satisfaction of the communication quality requirement of the user by using the SINR threshold of each user and the estimated SINR of the corresponding user; the higher the estimated SINR relative to the SINR threshold, the higher the degree of satisfaction of the communication quality requirement of the user.
  • the determining unit 101 can comprehensively consider the degree of satisfaction of the communication quality requirements of the respective users.
  • the degree of satisfaction of the users' communication quality requirements includes the utility values of all users and the cost values of the users whose SINR requirement is not satisfied. The utility value of a user is calculated by a utility function, which is a non-linear function of the ratio of the user's estimated SINR to the user's SINR threshold; the cost value depends on the difference between the SINR threshold of the corresponding user and the estimated SINR.
  • the utility value is used to indicate the satisfaction degree of the SINR of the corresponding user with respect to the SINR threshold
  • the cost value is used to indicate the degree of deficiency of the SINR of the corresponding user with respect to the SINR threshold.
  • $U_n$ is the utility value of the nth user, obtained from the user's utility function, for example by the following formula (4);
  • the remaining quantities are the cost factor, the SINR threshold of the nth user, and $SINR_n$, the estimated SINR of the nth user.
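The formula for the degree of satisfaction is not reproduced in the text, so the following Python sketch shows only one plausible reading of the description: the sum of all utility values minus a cost factor times the total SINR shortfall of the users whose estimated SINR falls below their threshold. The function name, the argument names, and this exact combination are assumptions.

```python
def satisfaction_degree(utilities, sinr_estimates, sinr_thresholds, cost_factor):
    """Assumed form: total utility minus cost_factor times the total SINR shortfall.

    Only users whose estimated SINR is below their threshold contribute a cost.
    """
    shortfall = sum(
        max(threshold - estimate, 0.0)
        for estimate, threshold in zip(sinr_estimates, sinr_thresholds)
    )
    return sum(utilities) - cost_factor * shortfall

# Hypothetical numbers: user 0 meets its threshold, user 1 falls short by 2.
reward = satisfaction_degree([0.9, 0.5], [10.0, 4.0], [8.0, 6.0], cost_factor=0.1)
```

With these hypothetical inputs only the second user is penalised, so the reward is the total utility 1.4 reduced by 0.1 × 2.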
  • in formula (4), tanh() is the hyperbolic tangent function, and its two parameters are a spreading factor (for example, 3.5834) and a symmetric center (for example, 0.8064).
  • Figure 3 shows a curve of the utility function as an example. As shown in Figure 3, once the user's SINR exceeds the SINR threshold, the utility function curve changes relatively slowly and approaches 1, which prevents an excessively high SINR of a single user from making the R value too large. It should be understood that the utility function is not limited to the form shown in formula (4) and may be modified as appropriate.
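Formula (4) itself is not reproduced in the text. The sketch below assumes a common tanh-based sigmoid, 0.5 · (tanh(spread · (ratio − center)) + 1), which saturates towards 1 as the description requires; this closed form, the function name, and the parameter names are assumptions, and only the two example constants come from the text.

```python
import math

SPREAD = 3.5834   # example spreading factor quoted in the text
CENTER = 0.8064   # example symmetric center quoted in the text

def utility(sinr_estimate, sinr_threshold, spread=SPREAD, center=CENTER):
    """Assumed utility: a tanh sigmoid of the ratio SINR / threshold, mapped into (0, 1)."""
    ratio = sinr_estimate / sinr_threshold
    return 0.5 * (math.tanh(spread * (ratio - center)) + 1.0)

# The curve rises steeply around the symmetric center and flattens above the threshold.
samples = [utility(r, 1.0) for r in (0.5, 1.0, 2.0)]
```

Under this assumed form, a user exactly at the symmetric center gets a utility of 0.5, and pushing the SINR far above the threshold adds very little, which is the saturation behaviour described for Figure 3.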
  • SINR n can be estimated using various communication system models.
  • for example, $SINR_n$ can be calculated as follows:
  • $p_j$ and $p_k$ are the transmit powers of the jth AP and the kth AP, respectively; $d_{nj}$ and $d_{nk}$ are the distances from the nth user to the jth AP and the kth AP, respectively; $\alpha$ is the path loss factor; $\Phi_c(n)$ is the cooperative APG of the nth user; $\Phi_I(n)$ is the interference APG of the nth user; and $n_0$ is the noise power at the user receiver. The interference APG refers to the APs that provide communication for other users.
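The quantities listed above suggest a distance-based path-loss model. As a hedged sketch, the following Python function assumes that each AP contributes a received power of p · d^(−α), that the cooperative APs add up as useful signal, and that the interfering APs plus receiver noise form the denominator. The function and argument names are hypothetical, and the formula is an assumed standard form rather than a quotation of the application.

```python
import math

def estimate_sinr(user_pos, ap_positions, ap_powers, cooperative, interfering,
                  path_loss_exponent, noise_power):
    """Assumed SINR: sum of cooperative received powers over (interference + noise)."""
    def received_power(ap_index):
        distance = math.dist(user_pos, ap_positions[ap_index])
        return ap_powers[ap_index] * distance ** (-path_loss_exponent)

    signal = sum(received_power(j) for j in cooperative)
    interference = sum(received_power(k) for k in interfering)
    return signal / (interference + noise_power)

# Hypothetical layout: AP 0 at distance 1 cooperates, AP 1 at distance 2 interferes.
sinr = estimate_sinr(
    user_pos=(0.0, 0.0),
    ap_positions=[(1.0, 0.0), (2.0, 0.0)],
    ap_powers=[1.0, 1.0],
    cooperative=[0],
    interfering=[1],
    path_loss_exponent=2.0,
    noise_power=0.25,
)
```

The result is a linear ratio, not a value in dB; whether thresholds are compared in linear units or in dB is left open here.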
  • the determining unit 101 calculates the degree of satisfaction of the communication quality requirement of the user as described above; in the Q-learning algorithm, this degree of satisfaction is equivalent to the reward value.
  • this calculation uses the location information of the user, the location information and the transmission power of the access point, and the communication quality requirement of the user, such as the SINR threshold.
  • the determining unit 101 can also be configured to take, for each behavior, the difference between that behavior and the behavior determined in the previous state as the network overhead caused by the behavior. For example, when the determining unit 101 determines the behavior with the highest evaluation as the behavior to be implemented, the behavior determined in the previous state is the behavior with the highest evaluation in the previous state. When the current state is the initial state, that is, when there is no previous state, the network overhead can be set to zero.
  • for example, the determining unit 101 may take, as the network overhead caused by a behavior, the amount of network switching operations that would have to be performed to implement that behavior, compared with the behavior determined in the previous state.
  • the behavior can be represented by a binarized matrix of collaborative relationships.
  • the network overhead can be expressed by the Hamming distance between the behaviors, as shown in the following equation (6).
  • the physical meaning of the Hamming distance between the behaviors is the number of cooperative AP handovers between the two APG selection schemes.
  • in the Q-learning algorithm, the network overhead is equivalent to the cost value.
  • in formula (6), the coefficient is the cost factor and $D_{ham}()$ denotes the Hamming distance calculation.
  • in another example, the network overhead may be taken into account only when the network overhead incurred in making the behavior exceeds a predetermined overhead threshold. In that case, the network overhead can be calculated using the following formula (7):
  • $T_d$ is a predetermined network overhead threshold, that is, a predetermined Hamming distance threshold.
  • a predetermined network overhead threshold is used, which may be provided by the AP.
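A minimal Python sketch of the Hamming-distance overhead follows. It assumes the overhead is the cost factor times the Hamming distance between the flattened 0/1 behavior vectors, that it is counted only when the distance exceeds the threshold, and that it is zero when there is no previous state. The exact thresholded form and all names are assumptions, since formulas (6) and (7) are not reproduced in the text.

```python
def hamming_distance(behavior_a, behavior_b):
    """Number of positions at which two equal-length 0/1 behavior vectors differ."""
    if len(behavior_a) != len(behavior_b):
        raise ValueError("behaviors must have the same length")
    return sum(x != y for x, y in zip(behavior_a, behavior_b))

def network_overhead(behavior, previous_behavior, cost_factor, threshold=0):
    """Assumed overhead: cost_factor * distance, counted only above the threshold."""
    if previous_behavior is None:      # initial state: no switching has occurred
        return 0.0
    distance = hamming_distance(behavior, previous_behavior)
    return cost_factor * distance if distance > threshold else 0.0

# Two AP switches between the previous and the candidate behavior.
previous = [1, 1, 0, 0]
candidate = [1, 0, 1, 0]
overhead = network_overhead(candidate, previous, cost_factor=0.5, threshold=1)
```

Setting `threshold=0` corresponds to always counting the overhead, while a larger threshold ignores small numbers of AP switches.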
  • combining formulas (3) and (7), the evaluation of each behavior is calculated so as to obtain the Q-value matrix $Q(S_t)$ in the state $S_t$; each element of $Q(S_t)$ is calculated as follows.
  • the Q-value matrix $Q(S_t)$ is a matrix of dimension $T \times 1$, where T is the number of behaviors. Based on the obtained Q-value matrix $Q(S_t)$, the behavior corresponding to the largest Q-value, that is, the behavior with the highest evaluation, can for example be selected as the APG selection result in the state $S_t$. In this way, the communication quality requirements of each user can be satisfied as far as possible while the network overhead caused by AP handover is reduced.
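To make the selection step concrete, the sketch below assumes each Q-value is the degree of satisfaction minus the network overhead of the same behavior, and picks the behavior with the largest Q-value. The subtraction is an assumption consistent with treating satisfaction as a reward and overhead as a cost; the names and numbers are illustrative only.

```python
def q_values(rewards, overheads):
    """Assumed evaluation: one Q-value per behavior, reward minus overhead."""
    if len(rewards) != len(overheads):
        raise ValueError("one reward and one overhead are needed per behavior")
    return [reward - overhead for reward, overhead in zip(rewards, overheads)]

def best_behavior_index(q_column):
    """Index of the largest Q-value; ties resolve to the first such behavior."""
    return max(range(len(q_column)), key=q_column.__getitem__)

# Hypothetical evaluations of three candidate behaviors in one state.
q_column = q_values(rewards=[1.2, 1.5, 0.9], overheads=[0.0, 0.5, 0.25])
chosen = best_behavior_index(q_column)
```

Here the second behavior has the highest reward but also the highest switching cost, so the first behavior is selected.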
  • the APG selection calculation process may be performed online in real time, offline, or as a combination of the two.
  • the electronic device 100 may further include a storage unit 103 configured to store, for each state, each behavior in that state in association with the evaluation calculated for the behavior, as an evaluation matrix.
  • the storage unit 103 can be implemented by using various memories.
  • the evaluation may include, for example, the degree of satisfaction of the aforementioned user's communication quality requirements (e.g., $R(S_t, A_i)$) and the network overhead (e.g., $PH(S_t, A_i)$) caused by executing the behavior.
  • the updating unit 102 can be configured to, when the state changes and an evaluation matrix exists for the changed state, determine the behavior to be taken in the changed state based on the content of that evaluation matrix.
  • specifically, an appropriate behavior in the state, such as the behavior with the highest evaluation, can be selected according to the current state. After the behavior is selected, the cooperative relationship between the UE and the AP is determined. In this way, the calculation load can be reduced, the processing speed can be improved, and fast and stable APG switching while the user is moving can be realized.
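The lookup described above can be sketched as a small in-memory store keyed by state. The class name, the use of a hashable state key, and the dictionary layout are hypothetical choices for illustration; the application does not specify how the storage unit is organised.

```python
class EvaluationStore:
    """Keeps, per state, the candidate behaviors together with their evaluations."""

    def __init__(self):
        self._by_state = {}

    def save(self, state, behaviors, evaluations):
        """Store one (behavior, evaluation) pair per candidate for the given state."""
        if len(behaviors) != len(evaluations):
            raise ValueError("one evaluation is needed per behavior")
        self._by_state[state] = [(tuple(b), q) for b, q in zip(behaviors, evaluations)]

    def best_behavior(self, state):
        """Return the stored behavior with the highest evaluation, or None if unseen."""
        rows = self._by_state.get(state)
        if not rows:
            return None
        return max(rows, key=lambda row: row[1])[0]

# Hypothetical state key: a tuple of user and AP positions (any hashable value works).
store = EvaluationStore()
state = (("ue", 0.0, 0.0), ("ap", 1.0, 0.0), ("ap", 2.0, 0.0))
store.save(state, behaviors=[[1, 0], [0, 1]], evaluations=[0.2, 0.9])
reused = store.best_behavior(state)
```

When `best_behavior` returns `None`, the state has not been seen before and the evaluation has to be computed from scratch.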
  • the updating unit 102 may be further configured to, when the state changes, update the evaluation stored in the storage unit 103 for the behavior performed in the previous state, using information on the actual communication quality that the user experienced when the determined behavior was performed in the previous state. The actual communication quality of the user is obtained by the user's measurement.
  • for example, the updating unit 102 can replace the stored degree of satisfaction of the estimated communication quality requirements with the degree of satisfaction of the communication quality requirements calculated based on the actual communication quality of the user.
  • the updating unit 102 can replace the stored $R(S_t, A_i)$, for example, by the following formula (9):
  • in another example, the updating unit 102 is configured to replace the part of the evaluation of the behavior performed in the previous state that concerns the degree of satisfaction of the user's communication quality requirements with a value calculated as the weighted sum of the actual degree of satisfaction of the user's communication quality requirements in the previous state and the highest degree of satisfaction of the user's communication quality requirements estimated in the current state.
  • in this case, the updating unit 102 can replace the stored $R(S_t, A_i)$ by, for example, the following formula (10):
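The weighted-sum update can be illustrated as follows. The weights are not given in the text, so the two weight parameters, their names, and the example values are assumptions; the structure (actual satisfaction in the previous state combined with the best estimated satisfaction in the current state) follows the description.

```python
def updated_satisfaction(actual_previous, estimated_current, weight_actual, weight_future):
    """Weighted sum of the measured satisfaction in the previous state and the
    highest satisfaction estimated over all behaviors in the current state."""
    if not estimated_current:
        raise ValueError("at least one estimate is needed for the current state")
    return weight_actual * actual_previous + weight_future * max(estimated_current)

# Hypothetical values: measured satisfaction 1.0, estimates 0.5 and 2.0 for the new state.
new_value = updated_satisfaction(1.0, [0.5, 2.0], weight_actual=0.75, weight_future=0.25)
```

This has the same shape as the look-ahead term in Q-learning, where the value of an action is blended with the best value reachable from the next state.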
  • the wireless network topology taken as a state may also include other variable parameters, such as one or more of the following: the communication quality requirements of the UE, the maximum transmit power of the AP, the predetermined network overhead threshold of the AP, etc.
  • changes in these parameters may also cause the update unit 102 to re-determine the APG, or update the stored evaluation of the behavior performed in the previous state.
  • the electronic device 100 determines the cooperative APG for different states by using the reinforcement learning algorithm, thereby enabling dynamic APG selection to better meet the communication quality requirements of all users.
  • although the reinforcement learning algorithm has been described above as an example, the present application is not limited thereto; other algorithms may be used to determine the cooperative APG.
  • FIG. 5 illustrates a functional block diagram of an electronic device 200 for wireless communication according to another embodiment of the present application.
  • the electronic device 200 further includes a grouping unit 201 configured to, in each state, group the access points on a user-centric basis; the behavior is then obtained by selecting a cooperative APG for each user from within that user's group.
  • the grouping unit 201 can be implemented by one or more processing circuits, which can be implemented, for example, as a chip. Further, although not shown in FIG. 5, the electronic device 200 may also include the storage unit 103 described above.
  • the grouping unit 201 can group according to the Euclidean distance between the user and the access point.
  • the following equation (11) shows the membership parameter value of an access point with respect to a user, calculated from the Euclidean distance:
  • $u_j$ represents the jth UE
  • $x_i$ represents the ith AP.
  • because the APs and the UEs are at different locations in the wireless network, the membership parameter values also differ. The smaller the Euclidean distance from the AP to the UE, the larger the membership parameter value. Each AP is allocated to the UE for which its membership parameter value is largest. In this way, the group of each UE is established.
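Because the membership parameter grows as the Euclidean distance shrinks, allocating each AP to the UE with the largest membership value amounts to allocating it to its nearest UE. The sketch below uses that equivalence directly instead of a specific membership formula, which the text does not reproduce; the function name and the coordinates are illustrative assumptions.

```python
import math

def group_access_points(ue_positions, ap_positions):
    """Allocate each AP to its nearest UE (largest membership value under the
    assumption that membership decreases monotonically with distance)."""
    groups = {ue_index: [] for ue_index in range(len(ue_positions))}
    for ap_index, ap_pos in enumerate(ap_positions):
        nearest_ue = min(
            range(len(ue_positions)),
            key=lambda ue_index: math.dist(ap_pos, ue_positions[ue_index]),
        )
        groups[nearest_ue].append(ap_index)
    return groups

# Hypothetical layout: two UEs ten units apart, four APs scattered between them.
ue_positions = [(0.0, 0.0), (10.0, 0.0)]
ap_positions = [(1.0, 0.0), (9.0, 0.0), (2.0, 1.0), (8.0, -1.0)]
groups = group_access_points(ue_positions, ap_positions)
```

In this hard assignment each AP belongs to exactly one group; a soft membership (for example, fuzzy clustering) would need the actual formula (11).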
  • the determining unit 101 randomly selects a cooperative access point set for the corresponding user from within that user's group and takes a cooperative relationship between users and access points that satisfies a predetermined condition as a behavior.
  • the predetermined condition may include one or more of the following: the communication quality of each user satisfies its communication quality requirement; and, when the cooperative relationship is employed, the network overhead relative to the behavior determined in the previous state does not exceed the predetermined overhead threshold.
  • bits corresponding to APs other than the grouping of the UE are set to values having no cooperative relationship (for example, 0).
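As an illustrative sketch of group-restricted behavior generation, the function below draws random cooperation bits only for the APs inside each user's group and leaves every other bit at 0. The group dictionary format and all names are assumptions, and checking the predetermined condition (SINR and overhead limits) is deliberately left out.

```python
import random

def random_grouped_behavior(groups, num_aps, rng):
    """Build a flattened 0/1 behavior in which a user may only cooperate with
    APs from its own group; bits for all other APs stay 0."""
    vector = []
    for user in sorted(groups):
        row = [0] * num_aps
        for ap_index in groups[user]:
            row[ap_index] = rng.randint(0, 1)
        vector.extend(row)
    return vector

# Hypothetical grouping: user 0 owns APs 0 and 2, user 1 owns APs 1 and 3.
example_groups = {0: [0, 2], 1: [1, 3]}
grouped_behavior = random_grouped_behavior(example_groups, num_aps=4, rng=random.Random(1))
```

Restricting the random draw in this way shrinks the search space from 2^(NM) candidate vectors to 2^M when every AP belongs to exactly one group.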
  • the electronic device 200 can narrow down the selection range of the user's cooperative AP by including the grouping unit 201, thereby facilitating more reasonable behavior, improving selection accuracy, and reducing computational load.
  • the electronic device 300 further includes an estimating unit 301 configured to estimate, for each state, new behaviors based on the initially obtained behaviors.
  • the estimation unit 301 can be implemented by one or more processing circuits, which can be implemented, for example, as a chip.
  • the electronic device 300 may include the storage unit 103 shown in FIG. 4, the grouping unit 201 described with reference to FIG. 5, and the like.
  • the behavior is initially generated by randomly selecting the AP for the user.
  • new behaviors may be further estimated based on the initially obtained behavior.
  • the estimation unit 301 can use a Genetic Algorithm (GA) to estimate new behavior.
  • GA Genetic Algorithm
  • the estimating unit 301 can select N p behaviors having a better R value from the initially obtained behaviors to constitute an initial population of genetic algorithms.
  • the network fitness matrix of the initial population is calculated, and the network fitness matrix of the population is obtained according to the Q-value of each behavior, as shown in the following formula (12).
  • a selection operation is then performed. For example, using a roulette selection method, the probability of each individual appearing in the offspring is calculated from the network fitness value of that individual in the initial population, and $N_p$ individuals are randomly selected according to these probabilities to form the offspring population.
  • the probability p i is as shown in the following equation (13):
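Roulette selection is commonly implemented by making each individual's selection probability proportional to its fitness. The sketch below assumes that standard form, p_i = f_i / Σ f_j, and non-negative fitness values; how the fitness is derived from the Q-values is not reproduced in the text, so the fitness numbers and names here are placeholders.

```python
import random

def selection_probabilities(fitness):
    """Standard roulette form: each probability is fitness divided by total fitness."""
    total = sum(fitness)
    if total <= 0 or any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative with a positive sum")
    return [f / total for f in fitness]

def roulette_select(population, fitness, count, rng):
    """Draw `count` individuals with replacement, weighted by fitness."""
    return rng.choices(population, weights=fitness, k=count)

# Hypothetical population of three flattened behaviors with placeholder fitness.
population = [(1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
fitness = [1, 1, 2]
probabilities = selection_probabilities(fitness)
offspring = roulette_select(population, fitness, count=3, rng=random.Random(7))
```

Selection is with replacement, so a fit individual can appear several times in the offspring population.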
  • a crossover operation is then performed: two individuals $A_m$ and $A_n$ are randomly selected from the constructed offspring population, and multiple points are randomly selected for multi-point crossover to generate new individuals or a new population.
  • the crossover operation of the mth individual $A_m$ and the nth individual $A_n$ at position i is as shown in the following formula (14):
  • a mutation operation is then performed: an individual is randomly selected from the population obtained after the crossover operation, and a point in that individual is randomly selected for mutation in order to produce a better individual. Since each gene of an individual's chromosome is 0 or 1, the mutation flips a 0 to 1 or a 1 to 0. In this way, new individuals, that is, new behaviors, are obtained.
  • the estimating unit 301 can repeatedly perform the selecting operation, the performing operation, and the mutating operation to generate a plurality of new behaviors. For example, the number of repeated operations can be set in advance.
  • the estimating unit 301 is further configured to treat the behavior as a new behavior only when the behavior estimated by the genetic algorithm satisfies a predetermined condition.
  • the predetermined condition may include, for example, one or more of the following: each user's communication quality satisfies its communication quality requirement; when the behavior is adopted, the network overhead relative to the behavior determined in the previous state does not exceed the predetermined overhead threshold.
  • the new behaviors obtained above are added to the initially obtained behaviors to constitute a new behavior set, and the determining unit 101 uses the reinforcement learning algorithm to determine the evaluation of each behavior (for example, the Q-value described in the first embodiment), so that the highest-rated behavior is selected as the behavior to be implemented in the current state, thereby determining the cooperative APG for each user.
  • the electronic device 300 obtains a new behavior by an estimation method such as a genetic algorithm, thereby expanding the behavior set, so that the optimal cooperative APG can be determined more accurately.
  • FIG. 7 illustrates a functional block diagram of an electronic device 400 for wireless communication according to another embodiment of the present application.
  • the electronic device 400 further includes a transceiver unit 401 configured to receive one or more of the location information and the communication quality requirement of the user, and one or more of the location information, the maximum transmit power information, and the predetermined network overhead threshold of the access point, and to send information of the determined cooperative access point set to the access point.
  • the transceiver unit 401 can be implemented, for example, through a communication interface.
  • the communication interface includes, for example, a network interface, or an antenna, a transceiver circuit, and the like.
  • the electronic device 400 may further include the storage unit 103 shown in FIG. 4, the grouping unit 201 described with reference to FIG. 5, the estimating unit 301 described with reference to FIG. 6, and the like.
  • the above information received by the transceiver unit 401 is used for the determination and update of the user's collaborative APG. For example, when the wireless network topology as a state changes, the transceiver unit 401 will reacquire the above various information.
  • the transceiving unit 401 is further configured to receive information of the actual communication quality of the user. For example, when the state changes, the user reports to the electronic device 400 the actual communication quality obtained by performing the determined behavior in the state before the change, such as the actual SINR and the utility value.
  • the location information and the communication quality requirement of the user may be provided to the transceiver unit 401 via the access point, or may be directly provided to the transceiver unit 401.
  • FIG. 8 shows a schematic diagram of information flow between a user (UE), an access point (AP), and a spectrum management device when the electronic device 400 is disposed on a spectrum management device (e.g., SC or SAS).
  • the UE requests cooperative APs from the spectrum management apparatus and reports information such as its location information and its communication quality requirement, for example the SINR threshold.
  • the AP reports its location information, maximum transmit power information, predetermined network overhead threshold, and the like to the spectrum management device.
  • the AP can report its location information only when the system is initialized.
  • the UE may report the related information directly to the spectrum management device, or report the related information through the AP. In the latter case, the information reported by the AP also includes information about the location information and communication quality requirements of the user.
  • after acquiring the various kinds of information described above, the spectrum management apparatus selects the cooperative APG of each user. Specifically, the spectrum management apparatus may use the Q-learning reinforcement learning algorithm described in the first embodiment to select the behavior with the largest Q-value, thereby determining the cooperative APG of each user. It should be noted that, where evaluation matrices for a plurality of states are stored in the spectrum management apparatus, if the current state is among the stored states, the stored evaluation matrix may be used to select the behavior without repeating the reinforcement learning algorithm.
  • the spectrum management device transmits the determined information of the cooperative APG to the AP to enable the AP to cooperate with the UE based on the information.
  • the UE periodically determines whether its location has changed.
  • if the location has changed, or the degree of change reaches a certain level, the wireless network topology has changed and the UE needs to request the cooperative APG again.
  • the UE provides the changed position information to the spectrum management apparatus.
  • the UE also provides the spectrum management apparatus with the actual utility value and the SINR value when it performs the determined behavior in the state before the change.
  • the spectrum management device updates the Q-value of the behavior determined in the previous state based on the actual utility value and the SINR value provided by the UE.
  • the spectrum management apparatus also reselects the behavior to be performed in the current state based on the current location information of the UE, for example, by performing a Q-learning reinforcement learning algorithm as described above.
  • the behavior to be performed can be selected by looking up the evaluation matrix.
  • the spectrum management device transmits the determined information of the cooperative APG to the AP to enable the AP to cooperate with the UE based on the information.
  • Fig. 9 is a diagram showing a simulation scenario of the simulation example, in which a triangle represents a UE, a square represents an AP, and a broken line and an arrow indicate a motion trajectory of one of the UEs.
  • Figure 9 shows four different locations of the UE, representing the states $S_1$, $S_2$, $S_3$, and $S_4$.
  • the parameters used in the simulation are as follows: operating frequency, 3.5 GHz; channel bandwidth, 10 MHz; number of UEs, 3; transmit power, 0 dBm; number of APs, 16; SINR threshold of the UE, 7 dB; UE receiver noise figure, 5 dB; number of population evolutions in the genetic algorithm, 10; crossover ratio, 0.7; mutation ratio, 0.1; number of individuals, 10; Hamming distance threshold, 5.
  • each row represents a behavior, a collaborative relationship between an AP and a UE, with a total of 18 behaviors.
  • the spectrum management apparatus uses the aforementioned Q-learning algorithm to generate the corresponding Q-value matrix, as shown on the right side of Figure 10.
  • the Q-value matrix is obtained by calculation using the above equations (3) to (5) and (7) to (8).
  • since state $S_1$ is the initial state, the network overhead term is a zero matrix.
  • the spectrum management device selects the behavior corresponding to the maximum value in the Q-value matrix, such as behavior 15, and informs the AP to cooperate with the UE based on the behavior.
  • FIG. 11 shows a schematic diagram of the result of performing behavior 15. In the figure, a UE and an AP circled with the same line type have a cooperative relationship.
  • the UE uploads its new position information, together with the actual SINR and utility value obtained by performing behavior 15 in state $S_1$, to the spectrum management apparatus.
  • the spectrum management apparatus uses formula (9) to calculate, from this information, the actual degree of communication quality satisfaction obtained by performing behavior 15, and updates the stored value as shown in formula (10).
  • is set to 0.
  • the spectrum management device uses a genetic algorithm to update the behavior matrix under state $S_1$ to obtain the behavior matrix under state $S_2$, as shown on the left side of Figure 12.
  • the spectrum management apparatus uses the aforementioned Q-learning algorithm to generate the corresponding Q-value matrix, as shown on the right side of Figure 12.
  • the Q-value matrix is obtained by calculation using the above equations (3) to (5) and (7) to (8).
  • the spectrum management device selects the behavior corresponding to the maximum value in the Q-value matrix, such as behavior 11, and informs the AP to cooperate with the UE based on the behavior.
  • FIG. 13 shows a schematic diagram of the result of performing behavior 11. In the figure, a UE and an AP circled with the same line type have a cooperative relationship.
  • FIGS. 14 and 15 show two simulation scenarios of another simulation example, in which dashed lines and arrows show the motion trajectory of one of the UEs.
  • in simulation scenario 1 shown in FIG. 14, the UE moves back and forth along the dashed line, so the state changes as $S_1 \to S_9 \to S_1$.
  • in simulation scenario 2 shown in FIG. 15, the UE moves cyclically along a rectangle formed by the dashed line.
  • the parameters used in the simulation are as follows: operating frequency, 28 GHz; channel bandwidth, 10 MHz; number of UEs, 6; transmit power, 0 dBm; number of APs, 60; SINR threshold of the UE, 7 dB; UE receiver noise figure, 5 dB; number of population evolutions in the genetic algorithm, 10; crossover ratio, 0.7; mutation ratio, 0.1; number of individuals, 10; beam width, π/4; Hamming distance threshold, 5 in simulation scenario 1 and 10 in simulation scenario 2.
  • FIG. 16 shows a comparison diagram of a cumulative distribution function (CDF) of the user satisfaction rate obtained based on the simulation scenario 1.
  • the solid line is the CDF curve corresponding to the reinforcement learning algorithm.
  • the other two curves, from top to bottom, are the CDF curves of the comparison algorithm with Hamming distance thresholds of 5 and 20, respectively. It can be seen that the algorithm based on reinforcement learning outperforms the comparison algorithm.
  • Figure 17 shows the proportion of UEs moving along a rectangular trajectory in simulation scenario 2 to meet user communication quality requirements, such as QoS requirements, at different laps. It can be seen that as the number of laps increases, the satisfaction rate of the UE also increases, that is, the effect of the reinforcement learning algorithm becomes more and more significant over time.
  • FIG. 18 shows a flowchart of a method for wireless communication according to an embodiment of the present application.
  • the method includes: determining a cooperative APG for a user within a predetermined range, using the wireless network topology of a wireless network as a state (S12); and re-determining the cooperative APG for the user in response to a change in the wireless network topology (S17), wherein the wireless network topology may include the distribution of users and the distribution of access points.
  • a reinforcement learning algorithm can be used in step S12 to determine a collaborative APG.
  • in step S12, the cooperative relationship between users and access points is taken as a behavior in the reinforcement learning algorithm, and for each behavior, the evaluation of the behavior is calculated based on the degree to which the users' communication quality requirements are satisfied when the behavior is taken and on the network overhead it incurs.
  • the collaborative APG of the user in the current state is determined based on the highest rated behavior.
  • the highest-rated behavior can be the behavior that, compared with other behaviors, satisfies the users' communication quality requirements to the highest degree and incurs the smallest network overhead.
  • each user's signal to interference and noise ratio threshold and the estimated user's signal to interference and noise ratio are used to calculate the degree of satisfaction of the user's communication quality requirements.
  • the degree of satisfaction of the users' communication quality requirements may include the utility values of all users and a cost value for failing to satisfy users' signal to interference and noise ratio, wherein a user's utility value is calculated by a utility function, the utility function is a non-linear function of the ratio of the user's estimated signal to interference and noise ratio to that user's signal to interference and noise ratio threshold, and the cost value depends on the difference between the user's signal to interference and noise ratio threshold and the user's estimated signal to interference and noise ratio.
  • the difference between the behavior and the behavior determined in the previous state can be used as the network overhead caused by the behavior.
  • Behavior can be represented by a binarized matrix of collaborative relationships, and network overhead can be represented by Hamming distances between behaviors. This network overhead can be taken into account only when the network overhead incurred in making this behavior exceeds a predetermined overhead threshold.
  • the above method may further include the steps of: receiving one or more of the location information and the communication quality requirement of the user, and one or more of the location information, the maximum transmit power information, and the predetermined network overhead threshold of the access point (S11); and transmitting information of the determined cooperative access point set to the access point (S13).
  • the information received in step S11 is used for the calculation in step S12.
  • the above method may further include step S14: for each state, each behavior in the state is stored in association with the evaluation calculated for that behavior, as an evaluation matrix. In this way, when the state changes and an evaluation matrix exists for the changed state, the behavior to be adopted in the changed state is determined based on the content of that evaluation matrix.
  • the above method further includes step S15: receiving information on the actual communication quality of the user when the state changes. The above method further includes step S16: updating the stored evaluation of the behavior performed in the previous state, that is, updating the content of the evaluation matrix, by using the information on the user's actual communication quality when the determined behavior was performed in the previous state.
  • the value calculated as follows may be used in place of the degree of satisfaction of the user's communication quality requirement in the evaluation of the behavior performed in the previous state: the actual satisfaction degree of the user's communication quality requirement in the previous state and the current state A weighted sum of the estimated maximum satisfaction levels of the user's communication quality requirements.
  • the above method may further include the steps of: in each state, grouping the access points in a user-centric manner and selecting a cooperative APG for the corresponding user within that user's group.
  • grouping can be based on the Euclidean distance between the user and the access point.
  • the cooperative access point set is randomly selected for the corresponding user within the user's group, and the cooperative relationship between users and access points that satisfies the predetermined condition is taken as a behavior.
  • the predetermined condition may include, for example, one or more of the following: each user's communication quality satisfies its communication quality requirement; when the collaboration relationship is employed, the network overhead relative to the determined behavior in the previous state does not exceed the predetermined overhead threshold.
  • new behaviors can be estimated based on the initially obtained behavior for each state. For example, genetic algorithms are used to estimate new behavior. It is also possible to treat the behavior as a new behavior only when the behavior estimated by the genetic algorithm satisfies the above predetermined condition.
  • the electronic devices 100 to 400 can be implemented as any type of server, such as a tower server, a rack server, and a blade server.
  • the electronic devices 100 to 400 may be control modules mounted on a server (such as an integrated circuit module including a single wafer, and a card or blade inserted into a slot of the blade server).
  • Server 700 includes a processor 701, a memory 702, a storage device 703, a network interface 704, and a bus 706.
  • the processor 701 can be, for example, a central processing unit (CPU) or a digital signal processor (DSP) and controls the functionality of the server 700.
  • the memory 702 includes random access memory (RAM) and read only memory (ROM), and stores data and programs executed by the processor 701.
  • the storage device 703 may include a storage medium such as a semiconductor memory and a hard disk.
  • Network interface 704 is a communication interface for connecting server 700 to communication network 705.
  • Communication network 705 can be a core network such as an Evolved Packet Core Network (EPC) or a packet data network (PDN) such as the Internet.
  • the bus 706 connects the processor 701, the memory 702, the storage device 703, and the network interface 704 to each other.
  • Bus 706 can include two or more buses (such as a high speed bus and a low speed bus) each having a different speed.
  • the determining unit 101, the updating unit 102, the grouping unit 201, the estimating unit 301, and the like described with reference to FIGS. 2, 5, and 6 may be implemented by the processor 701.
  • the storage unit 103 described with reference to FIG. 4 may be implemented, for example, by the memory 702 or the storage device 703.
  • the transceiving unit 401 described with reference to FIG. 7 may be implemented, for example, by the network interface 704, and a part of its functions may also be implemented by the processor 701.
  • the processor 701 can perform selection and update of the cooperative APG by performing functions of the determining unit 101, the updating unit 102, and the like.
  • the present invention also proposes a program product for storing an instruction code readable by a machine.
  • when the instruction code is read and executed by a machine, the above-described method according to an embodiment of the present invention can be performed.
  • a storage medium carrying the above-described program product in which machine-readable instruction code is stored is also included in the disclosure of the present invention.
  • the storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and the like.
  • a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 2000 shown in FIG. 20), and the computer, when installed with various programs, is capable of performing various functions and the like.
  • a central processing unit (CPU) 2001 executes various processes in accordance with a program stored in a read only memory (ROM) 2002 or a program loaded from a storage portion 2008 to a random access memory (RAM) 2003.
  • in the RAM 2003, data required when the CPU 2001 executes various processes and the like is also stored as needed.
  • the CPU 2001, the ROM 2002, and the RAM 2003 are connected to each other via a bus 2004.
  • Input/output interface 2005 is also connected to bus 2004.
  • the following components are connected to the input/output interface 2005: an input portion 2006 (including a keyboard, a mouse, etc.), an output portion 2007 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), a storage portion 2008 (including a hard disk or the like), and a communication portion 2009 (including a network interface card such as a LAN card, a modem, etc.).
  • the communication section 2009 performs communication processing via a network such as the Internet.
  • the drive 2010 can also be connected to the input/output interface 2005 as needed.
  • a removable medium 2011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 2010 as needed, so that the computer program read therefrom is installed into the storage portion 2008 as needed.
  • a program constituting the software is installed from a network such as the Internet or a storage medium such as the removable medium 2011.
  • such a storage medium is not limited to the removable medium 2011 shown in FIG. 20 in which a program is stored and distributed separately from the device to provide a program to the user.
  • examples of the removable medium 2011 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disk read only memory (CD-ROM) and a digital versatile disk (DVD)), a magneto-optical disk (including a mini disk (MD) (registered trademark)), and a semiconductor memory.
  • the storage medium may be a ROM 2002, a hard disk included in the storage section 2008, or the like, in which programs are stored, and distributed to the user together with the device containing them.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

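The present disclosure provides an electronic device and a method for wireless communication. The electronic device includes processing circuitry configured to: determine a cooperative access point set for a user within a predetermined range, using the wireless network topology of a wireless network as a state; and re-determine the cooperative access point set for the user in response to a change in the wireless network topology, wherein the wireless network topology includes the distribution of users and the distribution of access points.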

Description

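Electronic Device and Method for Wireless Communication
This application claims priority to Chinese Patent Application No. 201711009075.6, filed with the Chinese Patent Office on October 25, 2017 and entitled "Electronic Device and Method for Wireless Communication", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate generally to the field of wireless communication, in particular to resource management in a User Centric Network (UCN), and more particularly to an electronic device and a method for wireless communication.
Background
With the rapid development of communication networks, users' demand for mobile data rates is growing exponentially; in addition, device mobility and flexible configuration also pose challenges for future wireless networks. Ultra-Dense Networks (UDN), which deploy miniaturized small base stations, have become an effective technique for meeting the growing demand for mobile data rates. The density and flexible configuration of small base stations make it possible to realize a user-centric network (UCN) that supports efficient communication for massive numbers of mobile users and devices. A UCN allows each user to jointly select multiple access points, such as base stations, for cooperative transmission, so as to satisfy the quality-of-service requirements of all users with the greatest probability. Therefore, User-centric Ultra-Dense Networks (UUDN) will become the main trend of future networks.
In addition, with the rise of artificial intelligence and the Internet of Things, artificial intelligence methods such as machine learning have become a recent research hotspot; a wireless network that imitates human thinking can make resource management and the like more intelligent.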
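Summary
A brief summary of the present application is given below in order to provide a basic understanding of certain aspects of the application. It should be understood that this summary is not an exhaustive overview of the application. It is not intended to identify key or critical parts of the application, nor to limit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to one aspect of the present application, there is provided an electronic device for wireless communication, including processing circuitry configured to: determine a cooperative access point set for a user within a predetermined range, using the wireless network topology of a wireless network as a state; and re-determine the cooperative access point set for the user in response to a change in the wireless network topology, wherein the wireless network topology includes the distribution of users and the distribution of access points.
According to another aspect of the present application, there is provided a method for wireless communication, including: determining a cooperative access point set for a user within a predetermined range, using the wireless network topology of a wireless network as a state; and re-determining the cooperative access point set for the user in response to a change in the wireless network topology, wherein the wireless network topology includes the distribution of users and the distribution of access points.
According to other aspects of the present application, there are also provided computer program code and a computer program product for implementing the above method, as well as a computer-readable storage medium on which the computer program code for implementing the above method is recorded.
The electronic device and method according to the present application enable dynamic selection of the cooperative access point group (Access Point Group, APG), so as to better satisfy the communication requirements of all users.
These and other advantages of the present application will become more apparent from the following detailed description of preferred embodiments of the application in conjunction with the accompanying drawings.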
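Brief Description of the Drawings
To further set forth the above and other advantages and features of the present invention, specific embodiments of the invention are described in further detail below with reference to the accompanying drawings. The drawings, together with the detailed description below, are incorporated in and form a part of this specification. Elements having the same function and structure are denoted by the same reference numerals. It should be understood that these drawings depict only typical examples of the invention and should not be regarded as limiting its scope. In the drawings:
FIG. 1 shows a schematic diagram of a UUDN scenario;
FIG. 2 shows a functional block diagram of an electronic device for wireless communication according to an embodiment of the present application;
FIG. 3 shows a graph of an example of the utility function;
FIG. 4 shows a functional block diagram of an electronic device for wireless communication according to an embodiment of the present application;
FIG. 5 shows a functional block diagram of an electronic device for wireless communication according to another embodiment of the present application;
FIG. 6 shows a functional block diagram of an electronic device for wireless communication according to another embodiment of the present application;
FIG. 7 shows a functional block diagram of an electronic device for wireless communication according to another embodiment of the present application;
FIG. 8 shows a schematic diagram of the information flow between a user, an access point, and a spectrum management apparatus;
FIG. 9 shows a schematic diagram of the simulation scenario of one simulation example;
FIG. 10 shows an example of a behavior matrix and a Q-value matrix;
FIG. 11 shows a schematic diagram of the result of performing the determined behavior;
FIG. 12 shows another example of a behavior matrix and a Q-value matrix;
FIG. 13 shows a schematic diagram of the result of performing the determined behavior;
FIG. 14 shows a schematic diagram of simulation scenario 1 of another simulation example;
FIG. 15 shows a schematic diagram of simulation scenario 2 of another simulation example;
FIG. 16 shows a comparison of the cumulative distribution function (CDF) of the user satisfaction rate obtained in simulation scenario 1;
FIG. 17 shows a graph of the proportion of users whose communication quality requirements are satisfied at different numbers of laps, as users move along a rectangular trajectory in simulation scenario 2;
FIG. 18 shows a flowchart of a method for wireless communication according to an embodiment of the present application;
FIG. 19 is a block diagram showing an example of a schematic configuration of a server 700 to which the technology of the present disclosure can be applied; and
FIG. 20 is a block diagram of an exemplary structure of a general-purpose personal computer in which the method and/or apparatus and/or system according to embodiments of the present invention can be implemented.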
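Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. For the sake of clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that many implementation-specific decisions must be made in developing any such actual embodiment in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. Moreover, it should also be understood that although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, in order to avoid obscuring the invention with unnecessary detail, only the device structures and/or processing steps closely related to the solution according to the invention are shown in the drawings, while other details of little relevance to the invention are omitted.
<First Embodiment>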
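FIG. 1 shows a schematic diagram of a UUDN scenario. A plurality of access points (AP) exist around a user equipment (UE, hereinafter also referred to as a user), and the UE performs cooperative transmission by using different APs. Each AP is communicably connected to a spectrum management apparatus such as a Spectrum Coordinator (SC). The SC determines a cooperative APG for each UE within its management range; the cooperative APG is the set of APs that have a cooperative relationship with the corresponding UE, that is, that provide communication access service for that UE. In addition, the local SC may communicate with neighboring SCs as appropriate to exchange information. It can be seen that, compared with a traditional cellular network architecture, the network architecture of FIG. 1 is characterized by a large number of APs, which may even exceed the number of UEs.
The AP described herein may be any node that provides a network communication service, such as a base station or a small base station. The base station may be implemented as any type of evolved Node B (eNB), such as a macro eNB or a small eNB. A small eNB may be an eNB that covers a cell smaller than a macro cell, such as a pico eNB, a micro eNB, or a home (femto) eNB. Alternatively, the base station may be implemented as any other type of base station, such as a NodeB or a base transceiver station (BTS). The base station may include a main body (also referred to as a base station device) configured to control wireless communication, and one or more remote radio heads (RRH) disposed at a location different from the main body. In addition, various types of terminals may operate as a base station by temporarily or semi-persistently performing base station functions.
The UE or user may be any terminal device or any wireless communication device that provides a service. For example, the terminal device may be implemented as a mobile terminal (such as a smartphone, a tablet personal computer (PC), a notebook PC, a portable game terminal, a portable/dongle-type mobile router, or a digital camera) or an in-vehicle terminal (such as a car navigation device). The terminal device may also be implemented as a terminal that performs machine-to-machine (M2M) communication (also referred to as a machine type communication (MTC) terminal). Furthermore, the terminal device may be a wireless communication module mounted on each of the above terminals (such as an integrated circuit module including a single die).
In addition, the SC shown in FIG. 1 is only one example of a spectrum management apparatus; other forms may also be used, such as a Spectrum Access System (SAS), none of which is limiting.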
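In the scenario shown in FIG. 1, users and even access points may be mobile, so dynamic selection of a user's cooperative APG helps maintain stable, high-quality communication. To this end, this embodiment provides an electronic device 100 for wireless communication. As shown in FIG. 2, the electronic device 100 includes: a determining unit 101 configured to determine a cooperative access point group (APG) for a user within a predetermined range, using the wireless network topology of the wireless network as a state; and an updating unit 102 configured to re-determine the cooperative access point group for the user in response to a change in the wireless network topology.
The determining unit 101 and the updating unit 102 may be implemented by one or more processing circuits, which may be implemented, for example, as a chip. The electronic device 100 may, for example, be located on the spectrum management apparatus shown in FIG. 1 (such as an SC or SAS), or be communicably connected to the spectrum management apparatus.
In this embodiment, the electronic device 100 may use a reinforcement learning algorithm to determine the cooperative APG for users within the predetermined range. The predetermined range may be, for example, at least part of the management range of the spectrum management apparatus on which the electronic device is located.
A reinforcement learning algorithm treats learning as a process of trial and evaluation: it learns a mapping from environment states to behaviors such that the selected behavior obtains the maximum reward from the environment, that is, such that the external environment's evaluation of the learning system in some sense (or the operating performance of the whole system) is optimal. The reinforcement learning algorithm used herein may include, for example, the Q-learning algorithm, a difference learning algorithm, and the like, with the wireless network topology used as the state.
In one example, the wireless network topology includes the distribution of users and the distribution of access points. In other words, the wireless network topology changes when users and/or access points move, or when the on/off state of particular users and/or access points changes. As shown in FIG. 1, when the UE moves from bottom to top in the direction of the black dashed arrow, the wireless network topology changes, corresponding for example to the states $S_t$, $S_{t+1}$, and $S_{t+2}$ shown in the figure. In this case, the cooperative APG determined for the user in the previous state may no longer be suitable in the new state, for example because it cannot satisfy the user's communication requirements. The updating unit 102 therefore re-determines the cooperative APG for the user in response to such a change, thereby providing the user with stable, continuous communication service.
In one example, the change in the wireless network topology includes a change in the user's location, which is detected by the user. On detecting the change, the user reports it to the electronic device 100 and requests the electronic device 100 to re-determine its cooperative APG. In other examples, the change in network topology also includes a change in the location of an access point; the access point then also reports its location change to the electronic device 100, and the electronic device 100 may accordingly re-determine the user's cooperative APG based on that change.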
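For example, the determining unit 101 may take the cooperative relationship between users and access points as a behavior in the reinforcement learning algorithm and, for each behavior, calculate an evaluation of the behavior based on the degree to which the users' communication quality requirements are satisfied when the behavior is taken and on the network overhead it incurs. Typically, a user has specific requirements on its communication quality, and the degree to which these requirements are satisfied when a behavior is implemented indicates one aspect of the evaluation of that behavior. The user's communication quality requirement may be expressed, for example, by the Quality of Service (QoS) required by the user; specifically, as described below, it may be expressed by a signal to interference and noise ratio (SINR) threshold. It should be understood, however, that this is only an example and is not limiting.
In addition, when the state changes from the previous state to the current state, the behavior changes accordingly, for example from the behavior determined in the previous state to some other behavior. A change of behavior means a change in the UE's cooperative APG, so AP handover operations occur and cause network overhead. For the evaluation of a behavior, this network overhead should be as small as possible; the network overhead therefore indicates another aspect of the evaluation.
In one example, the determining unit 101 determines the users' cooperative APGs in the current state based on the highest-rated behavior. In other words, the determining unit 101 determines the highest-rated behavior as the behavior to be implemented, thereby determining the cooperative APG of each user. For example, the highest-rated behavior is the behavior that, compared with other behaviors, satisfies the users' communication quality requirements to the highest degree and incurs the smallest network overhead.
In the following, for ease of understanding, aspects of the embodiment are described taking the Q-learning algorithm as an example. It should be understood, however, that this is not limiting, and other reinforcement learning algorithms are also applicable to the present application.
Assuming there are N users and M APs within the predetermined range, the cooperative relationship between users and access points, that is, a behavior in the reinforcement learning algorithm (which may also be called an individual), can be represented by the following matrix.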
Figure PCTCN2018110964-appb-000001
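where $a_{n,m}$ ($n = 1, \dots, N$; $m = 1, \dots, M$) denotes the cooperative relationship between the n-th user and the m-th AP: for example, $a_{n,m} = 1$ indicates that the n-th user and the m-th AP have a cooperative relationship, and $a_{n,m} = 0$ indicates that they do not.
For ease of operation, formula (1) may also be rewritten in the vector form shown in formula (2).
$$A_i = [a_{11}\ a_{12}\ \dots\ a_{1M}\ a_{21}\ a_{22}\ \dots\ a_{2M}\ \dots\ a_{N1}\ a_{N2}\ \dots\ a_{NM}]_{1 \times NM} \tag{2}$$
That is, the rows of formula (1) are rearranged into a single row. When there are multiple behaviors, each behavior is taken as one row to form a behavior matrix.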
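The row-flattening of formula (2) can be sketched as follows. This is a minimal illustration only: the function names `flatten_behavior` and `cooperative_apg` are hypothetical and do not appear in the application, and users and APs are indexed from 0 rather than from 1.

```python
from typing import List


def flatten_behavior(matrix: List[List[int]]) -> List[int]:
    """Concatenate the rows of an N x M 0/1 cooperation matrix into one 1 x NM row."""
    return [bit for row in matrix for bit in row]


def cooperative_apg(vector: List[int], n_users: int, n_aps: int) -> List[List[int]]:
    """For each user, list the indices of the APs that have a cooperative relationship with it."""
    if len(vector) != n_users * n_aps:
        raise ValueError("vector length must equal N * M")
    return [
        [m for m in range(n_aps) if vector[n * n_aps + m] == 1]
        for n in range(n_users)
    ]


# Example with N = 2 users and M = 3 APs.
behavior = [[1, 0, 1],
            [0, 1, 0]]
row = flatten_behavior(behavior)
apgs = cooperative_apg(row, n_users=2, n_aps=3)
```

Stacking several such rows gives the behavior matrix mentioned above, one behavior per row.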
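First, for a given state such as state $S_t$, multiple behaviors may be generated initially, that is, multiple $A_i$ with different values. A predetermined condition may also be set to restrict the generated behaviors, including for example one or more of the following: the generated behavior should make each user's communication quality satisfy its communication quality requirement; the network overhead relative to the behavior determined in the previous state does not exceed a predetermined overhead threshold. For example, the communication quality requirement may be expressed by an SINR threshold.
As described above, the degree to which each user's communication quality requirement is satisfied when each behavior is implemented in the state, together with the network overhead incurred, serves as the evaluation of that behavior. In the Q-learning algorithm, the evaluation of a behavior is expressed by a Q-value, and the evaluations of the behaviors form a Q-value matrix.
Illustratively, the determining unit 101 may use each user's SINR threshold and the estimated SINR of that user to calculate the degree to which the user's communication quality requirement is satisfied. The closer the estimated SINR of a user is to that user's SINR threshold, the higher the degree of satisfaction. For example, the determining unit 101 may consider the degrees of satisfaction of all users together.
In one example, the degree of satisfaction of the users' communication quality requirements includes the utility values of all users and a cost value for failing to satisfy users' SINR. A user's utility value is calculated by a utility function, which is a nonlinear function of the ratio of the user's estimated SINR to that user's SINR threshold; the cost value depends on the difference between the user's SINR threshold and the estimated SINR. The utility value expresses how well the user's SINR satisfies the SINR threshold, and the cost value expresses how far the user's SINR falls short of it.
For example, when behavior $A_i$ is implemented in state $S_t$, the degree of satisfaction $R(S_t, A_i)$ of the users' communication quality requirements can be calculated by the following formula (3):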
Figure PCTCN2018110964-appb-000002
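where $U_n$ is the utility value of the n-th user, obtained from the user's utility function, for example by the following formula (4); σ is the cost factor;
Figure PCTCN2018110964-appb-000003
is the SINR threshold of the n-th user; and $SINR_n$ is the estimated SINR of the n-th user.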
Figure PCTCN2018110964-appb-000004
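where tanh() is the hyperbolic tangent function, ξ is the expansion factor (for example, 3.5834), and η is the center of symmetry (for example, 0.8064). FIG. 3 shows the curve of this utility function as an example. As shown in FIG. 3, once the user's SINR exceeds the SINR threshold, the utility function curve changes relatively slowly and approaches 1, which prevents an excessively high SINR of one user from making the R value too large. It should be understood that the utility function is not limited to the form shown in formula (4) and may be modified as appropriate.
In the above calculation,
Figure PCTCN2018110964-appb-000005
may, for example, be provided by the user, while $SINR_n$ may be estimated using various communication system models. As one example, $SINR_n$ can be calculated as follows: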
Figure PCTCN2018110964-appb-000006
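where $p_j$ and $p_k$ are the powers of the j-th AP and the k-th AP, respectively; $d_{nj}$ and $d_{nk}$ are the distances from the n-th user to the j-th AP and the k-th AP, respectively; α is the path loss factor; $\Phi_c(n)$ is the cooperative APG of the n-th user; $\Phi_I(n)$ is the interfering APG of the n-th user; and $n_0$ is the noise power at the user receiver. The interfering APG is the set of APs that provide communication access service to other users and thereby cause interference to the n-th user under consideration.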
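The quantities defined above can be combined into a small SINR estimate. Formula (5) itself is only available as an image here, so the sketch below assumes the usual distance-based path-loss form: received power $p \cdot d^{-\alpha}$, summed over the cooperative APG for the signal and over the interfering APG for the interference, with the noise power $n_0$ added to the denominator, all in linear units. The function names and the dictionary-based inputs are illustrative assumptions, not part of the application.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def received_power(tx_power: float, ue: Point, ap: Point, alpha: float) -> float:
    """Assumed path-loss model: received power = p * d^(-alpha), linear units."""
    distance = math.dist(ue, ap)
    return tx_power * distance ** (-alpha)


def estimate_sinr(
    ue: Point,
    ap_positions: Dict[int, Point],
    ap_powers: Dict[int, float],
    cooperative: Iterable[int],   # AP indices in the cooperative APG of this user
    interfering: Iterable[int],   # AP indices in the interfering APG of this user
    alpha: float,
    noise_power: float,
) -> float:
    """Ratio of summed cooperative power to summed interference plus noise."""
    signal = sum(
        received_power(ap_powers[j], ue, ap_positions[j], alpha) for j in cooperative
    )
    interference = sum(
        received_power(ap_powers[k], ue, ap_positions[k], alpha) for k in interfering
    )
    return signal / (interference + noise_power)


# One cooperative AP at distance 1 and one interfering AP at distance 2.
positions = {0: (1.0, 0.0), 1: (2.0, 0.0)}
powers = {0: 1.0, 1: 1.0}
sinr = estimate_sinr((0.0, 0.0), positions, powers, [0], [1], alpha=2.0, noise_power=0.25)
```

In this toy case the signal power is 1, the interference is 0.25 and the noise is 0.25, so the estimate is 2 in linear units.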
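As shown in formulas (3) to (5) above, the determining unit 101 calculates the degree of satisfaction of the users' communication quality requirements; in the Q-learning algorithm, this degree of satisfaction corresponds to the reward value. The calculation uses the location information of the users, the location information and transmit power of the access points, and the users' communication quality requirements such as the SINR threshold.
In addition, the determining unit 101 may be configured to use, for each behavior, the difference between that behavior and the behavior determined in the previous state as the network overhead incurred by the behavior. For example, when the determining unit 101 determines the highest-rated behavior as the behavior to be implemented, the behavior determined in the previous state is the highest-rated behavior in the previous state. When the current state is the initial state, that is, when there is no previous state, the network overhead may be set to 0.
In one example, the determining unit 101 may take, as the network overhead incurred by a behavior, the amount of network handover operations that would have to be performed if that behavior were implemented, compared with the behavior determined in the previous state.
As described above, a behavior can be represented by a binarized matrix of cooperative relationships. In this case, the network overhead can be expressed by the Hamming distance between behaviors, as shown in the following formula (6). In fact, when behaviors consist of 0s and 1s, the physical meaning of the Hamming distance between behaviors is the number of cooperative AP handovers between two APG selection schemes. In the Q-learning algorithm, this network overhead corresponds to the cost value.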
Figure PCTCN2018110964-appb-000007
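where
Figure PCTCN2018110964-appb-000008
is the behavior determined to be implemented in the previous state $S_{t-1}$, σ is the cost factor, and $D_{ham}()$ denotes the Hamming distance calculation. As described above, when state $S_t$ is the initial state, $PH(S_t, A_i)$ may be set to 0.
In another example, the network overhead may be taken into account only when the network overhead incurred by taking the behavior exceeds a predetermined overhead threshold. In this case, the network overhead may be calculated using the following formula (7):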
Figure PCTCN2018110964-appb-000009
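where $T_d$ is the predetermined network overhead threshold, that is, the predetermined Hamming distance threshold. As shown in formula (7), the network overhead is calculated only when the Hamming distance between $A_i$ and
Figure PCTCN2018110964-appb-000010
is greater than $T_d$; otherwise the network overhead is treated as 0. This calculation uses the predetermined network overhead threshold, which may be provided by the AP.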
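The Hamming-distance overhead with a threshold can be sketched as follows. Formulas (6) and (7) are only available as images here, so the exact scaling is an assumption: the sketch returns $-\sigma \cdot D_{ham}$ as a penalty (negative because formula (8) adds PH to R and the text calls it a cost), and 0 when there is no previous state or the distance does not exceed the threshold. The function names are hypothetical.

```python
from typing import Optional, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length 0/1 behavior vectors differ."""
    if len(a) != len(b):
        raise ValueError("behaviors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def network_overhead(
    behavior: Sequence[int],
    previous: Optional[Sequence[int]],
    sigma: float,
    threshold: int = 0,
) -> float:
    """Assumed penalty term PH for adopting `behavior` after `previous`.

    Returns 0 for the initial state (previous is None) and whenever the
    Hamming distance does not exceed `threshold`; otherwise -sigma * distance.
    """
    if previous is None:
        return 0.0
    distance = hamming_distance(behavior, previous)
    if distance <= threshold:
        return 0.0
    return -sigma * distance


prev = [1, 0, 1, 0]
cand = [1, 1, 0, 0]
switches = hamming_distance(cand, prev)                       # AP handovers between the two schemes
ignored = network_overhead(cand, prev, sigma=0.5, threshold=5)  # below threshold, treated as 0
counted = network_overhead(cand, prev, sigma=0.5, threshold=1)  # above threshold, penalised
```

With `threshold=0` the function reduces to the unthresholded case described for formula (6), under the same sign assumption.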
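Combining formulas (3) and (7) above, the evaluation of a behavior can be calculated as follows to obtain the Q-value matrix $Q(S_t)$ in state $S_t$; each element of $Q(S_t)$ is calculated as follows.
$$Q(S_t, A_i) = R(S_t, A_i) + PH(S_t, A_i) \tag{8}$$
where the Q-value matrix $Q(S_t)$ is a $T \times 1$ matrix and T is the number of behaviors. From the obtained Q-value matrix $Q(S_t)$, the behavior corresponding to the largest Q-value, that is, the highest-rated behavior, may for example be selected as the APG selection result in state $S_t$. In this way, the communication quality requirements of the users can be satisfied as far as possible while the network overhead caused by AP handover is reduced.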
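Formula (8) and the selection of the highest-rated behavior amount to an element-wise sum followed by an argmax. The sketch below assumes the R and PH values have already been computed for each of the T behaviors and that PH values are zero or negative penalties; the function names and the sample numbers are illustrative only.

```python
from typing import List, Sequence


def q_values(rewards: Sequence[float], overheads: Sequence[float]) -> List[float]:
    """Q(S_t, A_i) = R(S_t, A_i) + PH(S_t, A_i) for every behavior i."""
    if len(rewards) != len(overheads):
        raise ValueError("one reward and one overhead value are needed per behavior")
    return [r + ph for r, ph in zip(rewards, overheads)]


def best_behavior(q: Sequence[float]) -> int:
    """Index of the behavior with the largest Q-value (first one wins on ties)."""
    return max(range(len(q)), key=q.__getitem__)


rewards = [2.0, 2.5, 1.0]      # illustrative R values for T = 3 behaviors
overheads = [0.0, -1.0, 0.0]   # illustrative PH values (assumed non-positive)
q = q_values(rewards, overheads)
chosen = best_behavior(q)
```

Here the second behavior has the largest reward, but its handover penalty makes the first behavior the highest-rated one, which is the trade-off the paragraph above describes.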
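It should be understood that the above APG selection calculation may be performed online in real time, offline, or by a combination of the two.
As shown in FIG. 4, the electronic device 100 may further include a storage unit 103 configured to store, for each state, each behavior in that state in association with the evaluation calculated for that behavior, as an evaluation matrix.
The storage unit 103 may be implemented using various memories. The evaluation may include, for example, both the aforementioned degree of satisfaction of the users' communication quality requirements (for example, $R(S_t, A_i)$) and the network overhead incurred by performing the behavior (for example, $PH(S_t, A_i)$).
It can be understood that, once such evaluation matrices have been established, the updating unit 102 may be configured to determine, when the state changes and an evaluation matrix exists for the changed state, the behavior to be adopted in the changed state based on the content of that evaluation matrix. Specifically, an appropriate behavior for the current state, such as the highest-rated behavior, can be selected according to the current state. Once the behavior is selected, the cooperative relationships between UEs and APs are determined accordingly. This reduces the computational load, increases processing speed, and enables fast, stable APG handover while users are moving.
On the other hand, if no evaluation matrix exists for the changed state, an evaluation matrix is established for the changed state as described above.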
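The look-up-or-build behavior of the stored evaluation matrices can be sketched as a small cache keyed by state. This is an illustration under stated assumptions: the class name `EvaluationStore`, the use of a tuple of positions as the state key, and the `build` callback standing in for the full evaluation are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List


class EvaluationStore:
    """Keeps, per state, the list of evaluations (one entry per behavior)."""

    def __init__(self) -> None:
        self._matrices: Dict[Hashable, List[float]] = {}

    def get_or_build(
        self, state: Hashable, build: Callable[[Hashable], List[float]]
    ) -> List[float]:
        """Reuse the stored evaluations for a known state; compute them only for a new state."""
        if state not in self._matrices:
            self._matrices[state] = list(build(state))
        return self._matrices[state]

    def update(self, state: Hashable, behavior_index: int, value: float) -> None:
        """Overwrite the stored evaluation of one behavior, e.g. after real measurements."""
        self._matrices[state][behavior_index] = value


calls = []


def expensive_evaluation(state):
    """Stand-in for the full evaluation; records how often it is invoked."""
    calls.append(state)
    return [0.4, 0.9, 0.1]


store = EvaluationStore()
state = ((0.0, 0.0), (5.0, 5.0))   # hypothetical state key: a tuple of node positions
first = store.get_or_build(state, expensive_evaluation)
second = store.get_or_build(state, expensive_evaluation)   # served from the store
```

The second call does not invoke the evaluation again, which is the source of the reduced computational load mentioned above.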
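In addition, the updating unit 102 may be configured to update, when the state changes, the evaluation stored in the storage unit 103 for the behavior performed in the previous state, using information on the users' actual communication quality when the determined behavior was performed in the previous state. The users' actual communication quality is obtained through measurement by the users.
For example, the updating unit 102 may replace the stored, estimated degree of satisfaction of the communication quality requirements with a degree of satisfaction calculated from the users' actual communication quality. When the state changes from $S_t$ to $S_{t+1}$ and the behavior determined in state $S_t$ is $A_i$, the updating unit 102 may, for example, replace the stored $R(S_t, A_i)$ with the following formula (9):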
Figure PCTCN2018110964-appb-000011
其中,
Figure PCTCN2018110964-appb-000012
为第n个用户的实际SINR,且在计算式(9)的U n时也使用
Figure PCTCN2018110964-appb-000013
so that, for example, when U_n is calculated with equation (4), the numerator inside the tanh function is
Figure PCTCN2018110964-appb-000014
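By updating the evaluation matrix with actual communication quality information, an action whose actual communication quality turned out to be poor in a given state will not be selected again when the system later returns to that state, which helps improve communication quality.
In another example, the correlation between the changed state (the current state) and the previous state may also be considered when updating the evaluation matrix. For example, the updating unit 102 is configured to replace the part of the previous-state action's evaluation that concerns the degree of satisfaction of the users' communication quality requirements with the following value: a weighted sum of the actual degree of satisfaction of the users' communication quality requirements in the previous state and the highest estimated degree of satisfaction of the users' communication quality requirements in the current state.
For example, when the state changes from S_t to S_{t+1} and the action determined in state S_t is A_i, the updating unit 102 may replace the stored R(S_t, A_i) with equation (10) below: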
Figure PCTCN2018110964-appb-000015
where R_{t+1} is as given in equation (9), and
Figure PCTCN2018110964-appb-000016
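denotes finding, in state S_{t+1}, the action A for which R(S_{t+1}, A) is the largest among the R values of all actions; γ is the discount factor, which represents the degree of correlation between the previous state and the current state; γ = 0 means that the R value depends only on the R value of the previous state.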
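The weighted-sum update can be illustrated as follows. This sketch assumes the weights are 1 for the measured reward and γ for the best estimated reward in the new state, which is consistent with the statement that γ = 0 leaves only the previous state's value; the exact form of equation (10) is not reproduced here, and the function name is hypothetical.

```python
def updated_reward(actual_reward, next_state_rewards, gamma=0.0):
    """Replacement value for the stored R(S_t, A_i).

    actual_reward: satisfaction computed from the measured SINR (R_{t+1}).
    next_state_rewards: estimated R(S_{t+1}, A) for every candidate action A.
    gamma: discount factor in [0, 1].
    Assumption: result = actual_reward + gamma * max(next_state_rewards).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    best_next = max(next_state_rewards) if next_state_rewards else 0.0
    return actual_reward + gamma * best_next
```

With γ = 0 the stored value is simply replaced by the measured satisfaction.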
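More generally, the wireless network topology that serves as the state may also include other variable parameters, such as one or more of the following: the UEs' communication quality requirements, the APs' maximum transmit power, and the APs' predetermined network overhead threshold. In other words, a change in any of these parameters may also cause the updating unit 102 to re-determine the APG or to update the stored evaluation of the action performed in the previous state.
In summary, the electronic device 100 according to this embodiment determines cooperative APGs for different states by using a reinforcement learning algorithm, so that dynamic APG selection can be achieved and the communication quality requirements of all users can be better satisfied. Although a reinforcement learning algorithm has been described above as an example, the present application is not limited to it; other algorithms may also be used to determine the cooperative APG.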
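<Second Embodiment>
Fig. 5 shows a functional block diagram of an electronic device 200 for wireless communication according to another embodiment of the present application. In addition to the units shown in Fig. 2, the electronic device 200 includes a grouping unit 201 configured to obtain the actions in each state by grouping the access points in a user-centric manner and selecting, within each user's group, a cooperative APG for that user.
Similarly, the grouping unit 201 may be implemented by one or more processing circuits, which may be implemented, for example, as a chip. Although not shown in Fig. 5, the electronic device 200 may also include the storage unit 103 described with reference to Fig. 4.
For example, the grouping unit 201 may perform the grouping according to the Euclidean distance between users and access points. Equation (11) below gives the membership parameter value of an access point with respect to a user, calculated from the Euclidean distance: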
Figure PCTCN2018110964-appb-000017
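where u_j denotes the j-th UE and x_i denotes the i-th AP. In different states the APs and UEs in the wireless network are at different positions, so the membership parameter values differ as well. The smaller the Euclidean distance from an AP to a UE, the larger the membership parameter value. Each AP is assigned to the UE for which its membership parameter value is largest. In this way the group of each UE is established.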
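Because the membership value grows as the Euclidean distance shrinks, assigning each AP to the UE with the largest membership value is equivalent to assigning it to its nearest UE. The sketch below uses that equivalence instead of evaluating equation (11) itself, whose exact form is not reproduced here; the function name is hypothetical.

```python
import math


def group_aps_by_user(ue_positions, ap_positions):
    """Return, for each UE, the list of AP indices assigned to it.

    Assumption: an AP belongs to the UE nearest to it in Euclidean
    distance, which is where its membership value would be largest.
    """
    groups = [[] for _ in ue_positions]
    for ap_index, ap_pos in enumerate(ap_positions):
        nearest_ue = min(
            range(len(ue_positions)),
            key=lambda j: math.dist(ap_pos, ue_positions[j]),
        )
        groups[nearest_ue].append(ap_index)
    return groups
```

The grouping has to be recomputed whenever a UE or AP moves, since the state, and with it the distances, has changed.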
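The determining unit 101 randomly selects, within each user's group, a cooperative access point set for that user, and takes as actions those user-AP cooperation relationships that satisfy a predetermined condition. As in the first embodiment, the predetermined condition may include one or more of the following: the communication quality of every user satisfies that user's communication quality requirement; and, when the cooperation relationship is adopted, the network overhead relative to the action determined in the previous state does not exceed a predetermined overhead threshold.
This embodiment differs from the first embodiment in how the actions are generated. For example, when a binarized matrix is used to represent an action, in this embodiment the bits corresponding to APs outside a UE's group are all set to the value indicating no cooperation relationship (e.g., 0).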
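Generating a candidate action inside the groups can be sketched as follows. The flat bit layout (user n occupies bits n·M to n·M+M−1 for M APs, zero-based) mirrors the layout used in the simulation example of this document; the selection probability `pick_prob` and the function name are assumptions for illustration, and the predetermined-condition check is left to the caller.

```python
import random


def random_grouped_action(groups, num_aps, rng, pick_prob=0.5):
    """Random 0/1 action in which a user may only cooperate with APs of its group.

    groups[n] lists the AP indices (0-based) in the group of user n.
    Bits for APs outside a user's group stay 0 (no cooperation).
    """
    action = [0] * (len(groups) * num_aps)
    for user_index, group in enumerate(groups):
        for ap_index in group:
            if rng.random() < pick_prob:
                action[user_index * num_aps + ap_index] = 1
    return action
```

A caller would typically draw several such candidates with `random.Random()` and keep only those that satisfy the predetermined condition.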
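Thus, by including the grouping unit 201, the electronic device 200 according to this embodiment can narrow the range from which a user's cooperative APs are selected, making it easier to obtain more reasonable actions, improving selection accuracy and reducing the computational load.
<Third Embodiment>
Fig. 6 shows a functional block diagram of an electronic device 300 for wireless communication according to another embodiment of the present application. In addition to the units shown in Fig. 2, the electronic device 300 includes an estimation unit 301 configured to estimate, for each state, new actions based on the initially obtained actions.
Similarly, the estimation unit 301 may be implemented by one or more processing circuits, which may be implemented, for example, as a chip. Although not shown in Fig. 6, the electronic device 300 may also include the storage unit 103 described with reference to Fig. 4, the grouping unit 201 described with reference to Fig. 5, and so on.
In the first and second embodiments, actions are initially generated by randomly selecting APs for the users. In this embodiment, to improve efficiency, new actions can additionally be estimated based on the initially obtained actions.
For example, the estimation unit 301 may use a genetic algorithm (GA) to estimate new actions.
Specifically, the estimation unit 301 may select, from the initially obtained actions, N_p actions with better R values to form the initial population of the genetic algorithm. The network fitness matrix of the initial population is then calculated; the network fitness matrix of a population is obtained from the Q-value of each action, as shown in equation (12) below.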
Figure PCTCN2018110964-appb-000018
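where P_i is the i-th individual in the population, i.e., the i-th action; Δ is a value approaching 0; and Q(S_t, P_i) is the Q-value corresponding to P_i in state S_t.
Next, a selection operation is performed, for example by roulette-wheel selection: the probability of each individual appearing in the offspring is calculated from the network fitness values of the individuals in the initial population, and N_p individuals are randomly selected according to these probabilities to form the offspring population, where the probability p_i is given by equation (13) below: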
Figure PCTCN2018110964-appb-000019
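Roulette-wheel selection conventionally draws each individual with probability equal to its share of the total fitness. The sketch below assumes that conventional form for equation (13) and takes the fitness values as given non-negative numbers; how they are derived from the Q-values in equation (12) is not reproduced here. The function names are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """p_i = f_i / sum(f) for non-negative fitness values."""
    total = sum(fitness)
    if total <= 0 or any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative with a positive sum")
    return [f / total for f in fitness]


def roulette_select(population, fitness, count, rng):
    """Draw `count` individuals with replacement, proportionally to fitness."""
    probabilities = selection_probabilities(fitness)
    return rng.choices(population, weights=probabilities, k=count)
```

Passing a seeded `random.Random` instance as `rng` makes a selection run repeatable.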
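Then a crossover operation is performed: two individuals A_m and A_n are randomly selected from the offspring population, and multiple points are randomly chosen for multi-point crossover, producing new individuals or a new population. For example, the crossover of the m-th individual A_m and the n-th individual A_n at bit i is shown in equation (14) below: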
Figure PCTCN2018110964-appb-000020
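It should be understood that the individuals in equation (14) are merely illustrative and do not limit the present application.
Next, a mutation operation is performed: one individual is randomly selected from the population obtained after crossover, and one randomly chosen point of that individual is mutated in order to produce a fitter individual. Because each chromosome of an individual is 0 or 1, mutation changes a 0 to 1 or a 1 to 0. In this way a new individual, i.e., a new action, is obtained.
The estimation unit 301 may repeat the selection, crossover and mutation operations to produce multiple new actions. For example, the number of repetitions may be set in advance.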
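The crossover and mutation steps on 0/1 sequences can be sketched as below. This is one common way to realize multi-point crossover (alternate segments between the cut points are exchanged) and single-bit mutation; the number of cut points and the function names are assumptions, and the example of equation (14) is not reproduced.

```python
import random


def multipoint_crossover(parent_a, parent_b, rng, points=2):
    """Exchange alternate segments of two equal-length 0/1 sequences."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    cuts = sorted(rng.sample(range(1, len(parent_a)), points))
    child_a, child_b = list(parent_a), list(parent_b)
    swap, start = False, 0
    for cut in cuts + [len(parent_a)]:
        if swap:
            child_a[start:cut] = parent_b[start:cut]
            child_b[start:cut] = parent_a[start:cut]
        swap, start = not swap, cut
    return child_a, child_b


def mutate(individual, rng):
    """Flip one randomly chosen bit (0 becomes 1, 1 becomes 0)."""
    child = list(individual)
    position = rng.randrange(len(child))
    child[position] = 1 - child[position]
    return child
```

In a full loop these two functions would be applied after each selection round, for the preset number of generations, with the crossover and mutation rates deciding how often they are invoked.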
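In one example, the estimation unit 301 is further configured to accept an action estimated by the genetic algorithm as a new action only if it satisfies a predetermined condition. As before, the predetermined condition may include, for example, one or more of the following: the communication quality of every user satisfies that user's communication quality requirement; and, when the action is adopted, the network overhead relative to the action determined in the previous state does not exceed a predetermined overhead threshold.
The new actions obtained above are added to the initially obtained actions to form a new action set. The determining unit 101 uses the reinforcement learning algorithm to determine the evaluations of the actions (e.g., the Q-values described in the first embodiment) and selects the action with the highest evaluation as the action to be performed in the current state, thereby determining the cooperative APG of each user.
By obtaining new actions through an estimation method such as a genetic algorithm, the electronic device 300 according to this embodiment expands the action set, so that the optimal cooperative APG can be determined more accurately.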
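<Fourth Embodiment>
Fig. 7 shows a functional block diagram of an electronic device 400 for wireless communication according to another embodiment of the present application. In addition to the units shown in Fig. 2, the electronic device 400 includes a transceiving unit 401 configured to receive one or more of the users' location information and communication quality requirements, to receive one or more of the access points' location information, maximum transmit power information and predetermined network overhead threshold, and to send information on the determined cooperative access point sets to the access points.
The transceiving unit 401 may be implemented, for example, by a communication interface, such as a network interface or an antenna and transceiver circuitry. Although not shown in Fig. 7, the electronic device 400 may also include the storage unit 103 described with reference to Fig. 4, the grouping unit 201 described with reference to Fig. 5, the estimation unit 301 described with reference to Fig. 6, and so on.
The information received by the transceiving unit 401 is used to determine and update the users' cooperative APGs. For example, when the wireless network topology serving as the state changes, the transceiving unit 401 reacquires the above information.
The transceiving unit 401 is further configured to receive information on the users' actual communication quality. For example, when the state changes, a user reports to the electronic device 400 the actual communication quality, such as the actual SINR and utility value, obtained by performing the determined action in the state before the change.
The users' location information and communication quality requirements may be provided to the transceiving unit 401 via the access points or directly.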
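To aid understanding, Fig. 8 is a schematic diagram of the information flow among a user (UE), an access point (AP) and a spectrum management apparatus when the electronic device 400 is provided on the spectrum management apparatus (e.g., an SC or an SAS).
First, the UE requests cooperating APs from the spectrum management apparatus and reports its location information and communication quality requirement information, such as an SINR threshold. The AP reports its location information, maximum transmit power information, predetermined network overhead threshold and so on to the spectrum management apparatus. If the AP's position is fixed, the AP may report its location information only at system initialization. As noted above, the UE may report the relevant information to the spectrum management apparatus directly or via the AP; in the latter case, the information reported by the AP also includes the users' location information and communication quality requirement information.
After acquiring this information, the spectrum management apparatus selects the users' cooperative APGs. Specifically, it may use the Q-learning reinforcement learning algorithm described in detail in the first embodiment and select the action with the largest Q-value, thereby determining the cooperative APG of each user. Note that when the spectrum management apparatus stores evaluation matrices for multiple states and the current state is among the stored states, the stored evaluation matrix can be used to select the action without running the reinforcement learning algorithm again.
Next, the spectrum management apparatus sends information on the determined cooperative APGs to the APs, so that the APs can cooperate in serving the UEs based on this information.
In the example of Fig. 8, it is assumed that only the positions of the UEs change. The UE therefore determines, for example periodically, whether its position has changed. A change of position, or a change that reaches a certain extent, means that the wireless network topology has changed and the UE needs to request a cooperative APG again. The UE then provides its new location information to the spectrum management apparatus, together with the actual utility value and SINR value it obtained while the determined action was performed in the state before the change. The spectrum management apparatus updates the Q-value of the action determined in the previous state based on the actual utility value and SINR value provided by the UE. It also reselects the action to be performed in the current state based on the UE's current location information, for example by running the Q-learning reinforcement learning algorithm as described above. Alternatively, if the spectrum management apparatus stores an evaluation matrix for the current state, the action to be performed can be selected by looking it up in that evaluation matrix. As before, the spectrum management apparatus sends information on the determined cooperative APGs to the APs, so that the APs can cooperate in serving the UEs based on this information.
It should be understood that the information flow shown in Fig. 8 is merely illustrative and not limiting.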
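To further illustrate the details and effects of the technology of the present application, two simulation examples applying the technology are given below. The first simulation example is described with reference to Figs. 9 to 13. Fig. 9 is a schematic diagram of the simulation scenario of this example, in which triangles represent UEs, squares represent APs, and the dashed line and arrows indicate the trajectory of one of the UEs. Fig. 9 shows four different positions of that UE, representing the states S_1, S_2, S_3 and S_4 respectively.
The parameters used in the simulation are as follows: operating frequency, 3.5 GHz; channel bandwidth, 10 MHz; number of UEs, 3; transmit power, 0 dBm; number of APs, 16; UE SINR threshold, 7 dB; noise figure at the UE receiver, 5 dB; number of population generations in the genetic algorithm, 10; crossover rate, 0.7; mutation rate, 0.1; number of individuals, 10; Hamming distance threshold, 5.
In state S_1, the UEs upload their location information and communication quality requirement information to the spectrum management apparatus, and the APs upload their location information, maximum transmit power information and Hamming distance threshold to the spectrum management apparatus. The spectrum management apparatus generates some initial actions and, based on them, uses the genetic algorithm to generate new actions; the initial actions and the new actions together form the action matrix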
Figure PCTCN2018110964-appb-000021
The left side of Fig. 10 shows an example, for state S_1, of the action matrix
Figure PCTCN2018110964-appb-000022
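in which each row represents one action, i.e., one cooperation relationship between the APs and the UEs, giving 18 actions in total. Each action is represented by a 48-bit binary sequence: with M APs (M = 16 in this example), bits 1 to M represent the cooperation relationship between user 1 and the APs, bits M+1 to 2M represent that between user 2 and the APs, and so on.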
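The bit layout just described (M consecutive bits per user) can be decoded back into per-user cooperative AP lists as in the following sketch. The function name and the use of a string of '0'/'1' characters are assumptions for illustration; AP numbers are reported 1-based to match the bit numbering in the text.

```python
def decode_action(bits, num_aps):
    """Split a 0/1 string into one list of cooperating AP numbers per user.

    Bits 1..M belong to user 1, bits M+1..2M to user 2, and so on,
    where M = num_aps. Returned AP numbers are 1-based.
    """
    if len(bits) % num_aps != 0:
        raise ValueError("length must be a multiple of the number of APs")
    num_users = len(bits) // num_aps
    return [
        [ap + 1 for ap in range(num_aps) if bits[user * num_aps + ap] == "1"]
        for user in range(num_users)
    ]
```

With 3 UEs and 16 APs as in this simulation, the input would be a 48-character string and the result a list of three AP lists.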
The spectrum management apparatus uses the aforementioned Q-learning algorithm to generate, for
Figure PCTCN2018110964-appb-000023
the corresponding Q-value matrix
Figure PCTCN2018110964-appb-000024
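as shown on the right side of Fig. 10. The Q-value matrix is calculated using equations (3) to (5) and (7) to (8) above. In this simulation example, state S_1 is the initial state, so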
Figure PCTCN2018110964-appb-000025
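is a zero matrix.
The spectrum management apparatus selects the action corresponding to the maximum value in the Q-value matrix, for example action 15, and informs the APs to cooperate in serving the UEs based on that action. Fig. 11 is a schematic diagram of the result of performing action 15, in which a UE and the APs circled with the same line style have a cooperation relationship.
Next, the movement of the UE causes a transition to state S_2. The UE uploads its new location information, together with the actual SINR and utility value obtained by performing action 15 in state S_1, to the spectrum management apparatus. From this information the spectrum management apparatus uses equation (9) to calculate the actual degree of communication quality satisfaction obtained by performing action 15 and then, as in equation (10), updates the entry R_15 of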
Figure PCTCN2018110964-appb-000026
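with that value. In equation (10), γ is set to 0.
The spectrum management apparatus uses the genetic algorithm to update the action matrix of state S_1, obtaining the action matrix of state S_2,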
Figure PCTCN2018110964-appb-000027
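as shown on the left side of Fig. 12. Similarly, the spectrum management apparatus uses the aforementioned Q-learning algorithm to generate, for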
Figure PCTCN2018110964-appb-000028
the corresponding Q-value matrix
Figure PCTCN2018110964-appb-000029
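as shown on the right side of Fig. 12. The Q-value matrix is calculated using equations (3) to (5) and (7) to (8) above.
The spectrum management apparatus selects the action corresponding to the maximum value in the Q-value matrix, for example action 11, and informs the APs to cooperate in serving the UEs based on that action. Fig. 13 is a schematic diagram of the result of performing action 11, in which a UE and the APs circled with the same line style have a cooperation relationship.
When the state changes further to S_3 and S_4, the spectrum management apparatus performs operations similar to those in state S_2, which are not repeated here.
The second simulation example is described below with reference to Figs. 14 to 17. Figs. 14 and 15 show the two simulation scenarios of this example, in which the dashed lines and arrows show the trajectory of one of the UEs. In simulation scenario 1 shown in Fig. 14, the UE moves back and forth along the dashed line, so the state goes S_1 → S_9 → S_1. In simulation scenario 2 shown in Fig. 15, the UE moves cyclically around the rectangle formed by the dashed line. In both scenarios, the positions of the remaining UEs and of the APs are assumed to be fixed. The initial state is S_1 at t = 0, and the other states follow in order according to the positions to which the UE moves.
The parameters used in the simulation are as follows: operating frequency, 28 GHz; channel bandwidth, 10 MHz; number of UEs, 6; transmit power, 0 dBm; number of APs, 60; UE SINR threshold, 7 dB; noise figure at the UE receiver, 5 dB; number of population generations in the genetic algorithm, 10; crossover rate, 0.7; mutation rate, 0.1; number of individuals, 10; beamwidth, π/4; Hamming distance threshold, 5 in simulation scenario 1 and 10 in simulation scenario 2.
Besides the APG selection based on the reinforcement learning algorithm proposed in the present application, APG selection with the following comparison algorithm was also simulated for simulation scenario 1: a genetic algorithm is used to obtain new actions, but the action is determined solely on the basis of the switching threshold for APG reselection, i.e., the Hamming distance threshold T_d. Fig. 16 compares the cumulative distribution functions (CDFs) of the user satisfaction rate obtained for simulation scenario 1. The solid line is the CDF curve of the reinforcement learning algorithm; the other two curves are, from top to bottom, the CDF curve of the comparison algorithm with a Hamming distance threshold of 5 and the CDF curve of the comparison algorithm with a Hamming distance threshold of 20. It can be seen that the reinforcement-learning-based algorithm outperforms the comparison algorithm.
Fig. 17 shows, for the UE moving along the rectangular trajectory in simulation scenario 2, the proportion of users whose communication quality requirements, such as QoS requirements, are satisfied after different numbers of laps. It can be seen that the UE satisfaction rate increases with the number of laps, i.e., the effect of the reinforcement learning algorithm becomes more pronounced over time.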
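<Fifth Embodiment>
In describing the electronic devices for wireless communication in the embodiments above, certain processes or methods have evidently also been disclosed. An outline of these methods is given below without repeating details already discussed. Note that although these methods were disclosed in the course of describing the electronic devices for wireless communication, they do not necessarily employ the components described, nor are they necessarily performed by those components. For example, embodiments of the electronic device for wireless communication may be implemented partly or entirely in hardware and/or firmware, whereas the methods for wireless communication discussed below may be implemented entirely by a computer-executable program, although the methods may also employ the hardware and/or firmware of the electronic device for wireless communication.
Fig. 18 shows a flowchart of a method for wireless communication according to an embodiment of the present application. As shown in Fig. 18, the method includes: determining a cooperative APG for users within a predetermined range, with the wireless network topology of the wireless network as the state (S12); and re-determining the cooperative APG for the users in response to a change in the wireless network topology (S17), where the wireless network topology may include the distribution of the users and the distribution of the access points. By way of example, a reinforcement learning algorithm may be used in step S12 to determine the cooperative APG.
In step S12, the cooperation relationship between users and access points is taken as an action in the reinforcement learning algorithm, and for each action an evaluation is calculated based on the degree to which the users' communication quality requirements are satisfied when the action is taken and on the network overhead it incurs. For example, the users' cooperative APG in the current state is determined based on the action with the highest evaluation. The action with the highest evaluation may be the action that, compared with the other actions, yields the highest degree of satisfaction of the users' communication quality requirements and the lowest network overhead.
In one example, the degree of satisfaction of the users' communication quality requirements is calculated from each user's SINR threshold and the estimated SINR of that user. The degree of satisfaction may include the utility values of all users and a cost value for failing to meet users' SINR requirements, where a user's utility value is calculated by a utility function that is a nonlinear function of the ratio of the user's estimated SINR to that user's SINR threshold, and the cost value depends on the difference between the user's SINR threshold and the user's estimated SINR.
In addition, for each action, the difference between that action and the action determined in the previous state may be used as the network overhead incurred by the action. An action can be represented by a binarized matrix of cooperation relationships, and the network overhead can be expressed by the Hamming distance between actions. The network overhead may be taken into account only when the overhead incurred by taking the action exceeds a predetermined overhead threshold.
In addition, as shown by the dashed boxes in Fig. 18, the method may further include the following steps: receiving one or more of the users' location information and communication quality requirements, and one or more of the access points' location information, maximum transmit power information and predetermined network overhead threshold (S11); and sending information on the determined cooperative access point sets to the access points (S13). The information received in step S11 is used in the calculation of step S12.
The method may further include step S14: for each state, storing each action in that state in association with the evaluation calculated for that action, as an evaluation matrix. Then, when the state changes and an evaluation matrix exists for the changed state, the action to be adopted in the changed state is determined from the contents of that evaluation matrix.
The method further includes step S15: receiving information on the users' actual communication quality when the state changes. The method further includes step S16: updating the stored evaluation of the action performed in the previous state, i.e., updating the contents of the evaluation matrix, using information on the users' actual communication quality while the determined action was being performed in the previous state.
For example, the part of the previous-state action's evaluation that concerns the degree of satisfaction of the users' communication quality requirements may be replaced with the following value: a weighted sum of the actual degree of satisfaction of the users' communication quality requirements in the previous state and the highest estimated degree of satisfaction of the users' communication quality requirements in the current state.
In addition, although not shown in Fig. 18, the method may further include the following step: in each state, obtaining the actions by grouping the access points in a user-centric manner and selecting, within each user's group, a cooperative access point set for that user. For example, the grouping may be performed according to the Euclidean distance between users and access points. In this case, a cooperative access point set is randomly selected for each user within that user's group, and the user-AP cooperation relationships that satisfy a predetermined condition are taken as actions. The predetermined condition may include, for example, one or more of the following: the communication quality of every user satisfies that user's communication quality requirement; and, when the cooperation relationship is adopted, the network overhead relative to the action determined in the previous state does not exceed a predetermined overhead threshold.
In addition, when obtaining actions, new actions may also be estimated for each state based on the initially obtained actions, for example by using a genetic algorithm. An action estimated by the genetic algorithm may be accepted as a new action only if it satisfies the predetermined condition above.
Note that the details of the method have been described at length in the first to fourth embodiments and are not repeated here.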
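The technology of the present disclosure can be applied to various products. For example, the electronic devices 100 to 400 may be implemented as any type of server, such as a tower server, a rack server or a blade server. The electronic devices 100 to 400 may be a control module mounted on a server (such as an integrated circuit module including a single die, or a card or blade inserted into a slot of a blade server).
[Application Example Regarding a Server]
Fig. 19 is a block diagram showing an example of a schematic configuration of a server 700 to which the technology of the present disclosure can be applied. The server 700 includes a processor 701, a memory 702, a storage device 703, a network interface 704 and a bus 706.
The processor 701 may be, for example, a central processing unit (CPU) or a digital signal processor (DSP), and controls the functions of the server 700. The memory 702 includes a random access memory (RAM) and a read-only memory (ROM), and stores data and programs executed by the processor 701. The storage device 703 may include a storage medium such as a semiconductor memory or a hard disk.
The network interface 704 is a communication interface for connecting the server 700 to a communication network 705. The communication network 705 may be a core network such as an Evolved Packet Core (EPC) or a packet data network (PDN) such as the Internet.
The bus 706 connects the processor 701, the memory 702, the storage device 703 and the network interface 704 to one another. The bus 706 may include two or more buses with different speeds (such as a high-speed bus and a low-speed bus).
In the server 700 shown in Fig. 19, the determining unit 101, the updating unit 102, the grouping unit 201, the estimation unit 301 and so on described with reference to Figs. 2, 5 and 6 may be implemented by the processor 701. The storage unit 103 described with reference to Fig. 4 may be implemented, for example, by the memory 702 or the storage device 703. The transceiving unit 401 described with reference to Fig. 7 may be implemented, for example, by the network interface 704, and part of its functions may also be implemented by the processor 701. For example, the processor 701 may perform the selection and updating of cooperative APGs by executing the functions of the determining unit 101, the updating unit 102 and so on.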
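The basic principles of the present invention have been described above in connection with specific embodiments. It should be noted, however, that those skilled in the art will understand that all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software or a combination thereof, in any computing device (including processors, storage media and the like) or network of computing devices, and that those skilled in the art can achieve this with their basic circuit design knowledge or basic programming skills after reading the description of the present invention.
Moreover, the present invention also proposes a program product storing machine-readable instruction code. When read and executed by a machine, the instruction code can carry out the method according to the embodiments of the present invention described above.
Accordingly, a storage medium carrying the above program product storing machine-readable instruction code is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disc, a memory card, a memory stick and the like.
When the present invention is implemented by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 2000 shown in Fig. 20); when various programs are installed, the computer is capable of performing various functions.
In Fig. 20, a central processing unit (CPU) 2001 performs various kinds of processing according to a program stored in a read-only memory (ROM) 2002 or a program loaded from a storage section 2008 into a random access memory (RAM) 2003. Data required when the CPU 2001 performs the various kinds of processing is also stored in the RAM 2003 as needed. The CPU 2001, the ROM 2002 and the RAM 2003 are connected to one another via a bus 2004. An input/output interface 2005 is also connected to the bus 2004.
The following components are connected to the input/output interface 2005: an input section 2006 (including a keyboard, a mouse and the like), an output section 2007 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like), the storage section 2008 (including a hard disk and the like), and a communication section 2009 (including a network interface card such as a LAN card, a modem and the like). The communication section 2009 performs communication processing via a network such as the Internet. A drive 2010 may also be connected to the input/output interface 2005 as needed. A removable medium 2011 such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory is mounted on the drive 2010 as needed, so that a computer program read from it is installed into the storage section 2008 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 2011.
Those skilled in the art should understand that such a storage medium is not limited to the removable medium 2011 shown in Fig. 20, which stores the program and is distributed separately from the device to provide the program to the user. Examples of the removable medium 2011 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 2002, a hard disk contained in the storage section 2008, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
It should also be noted that in the apparatus, method and system of the present invention, the components or steps can be decomposed and/or recombined. Such decomposition and/or recombination should be regarded as equivalents of the present invention. Furthermore, the steps of the above series of processing can naturally be performed chronologically in the order described, but need not necessarily be performed chronologically. Some steps may be performed in parallel or independently of one another.
Finally, it should be noted that the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Moreover, without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
Although the embodiments of the present invention have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above are only intended to illustrate the present invention and do not limit it. Those skilled in the art can make various modifications and changes to the above embodiments without departing from the spirit and scope of the present invention. The scope of the present invention is therefore defined only by the appended claims and their equivalents.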

Claims (21)

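  1. An electronic device for wireless communication, comprising:
    processing circuitry configured to:
    determine a cooperative access point set for users within a predetermined range, with a wireless network topology of a wireless network as a state; and
    re-determine the cooperative access point set for the users in response to a change in the wireless network topology,
    wherein the wireless network topology comprises a distribution of the users and a distribution of access points.
  2. The electronic device according to claim 1, wherein the processing circuitry is configured to take a cooperation relationship between the users and the access points as an action and, for each action, to calculate an evaluation of the action based on a degree of satisfaction of the users' communication quality requirements when the action is taken and on the network overhead incurred, and the processing circuitry is configured to determine the users' cooperative access point set in a current state based on the action with the highest evaluation.
  3. The electronic device according to claim 2, wherein the action with the highest evaluation is the action that, compared with other actions, yields the highest degree of satisfaction of the users' communication quality requirements and the lowest network overhead when taken.
  4. The electronic device according to claim 2, wherein the processing circuitry is configured to determine the degree of satisfaction of the users' communication quality requirements using a signal-to-interference-plus-noise ratio threshold of each user and an estimated signal-to-interference-plus-noise ratio of that user.
  5. The electronic device according to claim 4, wherein the degree of satisfaction of the users' communication quality requirements comprises utility values of all users and a cost value for failing to meet users' signal-to-interference-plus-noise ratios, wherein a user's utility value is calculated by a utility function, the utility function being a nonlinear function of the ratio of the estimated signal-to-interference-plus-noise ratio to the signal-to-interference-plus-noise ratio threshold, and the cost value depends on the difference between the signal-to-interference-plus-noise ratio threshold of the respective user and the estimated signal-to-interference-plus-noise ratio of that user.
  6. The electronic device according to claim 2, wherein the processing circuitry is configured to use, for each action, the difference between that action and the action determined in a previous state as the network overhead incurred by the action, the network overhead being expressed by the Hamming distance between actions.
  7. The electronic device according to claim 2, wherein the network overhead is taken into account when the network overhead of taking the action exceeds a predetermined overhead threshold.
  8. The electronic device according to claim 2, further comprising a memory configured to store, for a state, each action in that state in association with the evaluation calculated for that action, as an evaluation matrix.
  9. The electronic device according to claim 8, wherein the processing circuitry is further configured to, when the state changes and an evaluation matrix exists for the changed state, determine the action to be adopted in the changed state based on the contents of the evaluation matrix.
  10. The electronic device according to claim 8, wherein the processing circuitry is configured to, when the state changes, update the evaluation stored in the memory for the action performed in the previous state, using information on the users' actual communication quality while the determined action was being performed in the previous state.
  11. The electronic device according to claim 10, wherein the processing circuitry is configured to replace the part of the evaluation of the action performed in the previous state that concerns the degree of satisfaction of the users' communication quality requirements with the following value: a weighted sum of the actual degree of satisfaction of the users' communication quality requirements in the previous state and the highest estimated degree of satisfaction of the users' communication quality requirements in the current state.
  12. The electronic device according to claim 2, wherein the processing circuitry is configured to obtain the actions, in a state, by grouping the access points in a user-centric manner and selecting, within each user's group, a cooperative access point set for that user.
  13. The electronic device according to claim 12, wherein the processing circuitry is configured to perform the grouping according to the Euclidean distance between users and access points.
  14. The electronic device according to claim 12, wherein the processing circuitry is configured to randomly select, within each user's group, a cooperative access point set for that user and to take as the actions the cooperation relationships between users and access points that satisfy a predetermined condition.
  15. The electronic device according to claim 14, wherein the predetermined condition comprises one or more of the following: the communication quality of every user satisfies that user's communication quality requirement; and, when the cooperation relationship is adopted, the network overhead relative to the action determined in the previous state does not exceed a predetermined overhead threshold.
  16. The electronic device according to claim 2, wherein the processing circuitry is further configured to estimate, for a state, new actions based on initially obtained actions.
  17. The electronic device according to claim 16, wherein the processing circuitry is configured to use a genetic algorithm to estimate the new actions, and the processing circuitry is configured to take an action estimated by the genetic algorithm as a new action only when that action satisfies a predetermined condition.
  18. The electronic device according to claim 1, further comprising:
    a transceiving unit configured to receive one or more of the users' location information and communication quality requirements, and one or more of the access points' location information, maximum transmit power information and predetermined network overhead threshold, and to send information on the determined cooperative access point set to the access points.
  19. The electronic device according to claim 18, wherein the transceiving unit is further configured to receive information on the users' actual communication quality.
  20. A method for wireless communication, comprising:
    determining a cooperative access point set for users within a predetermined range, with a wireless network topology of a wireless network as a state; and
    re-determining the cooperative access point set for the users in response to a change in the wireless network topology,
    wherein the wireless network topology comprises a distribution of the users and a distribution of access points.
  21. A computer-readable storage medium comprising computer-executable instructions which, when executed by a processor, cause the method according to claim 20 to be performed.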
PCT/CN2018/110964 2017-10-25 2018-10-19 Electronic device and method for wireless communication WO2019080771A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880044183.3A CN110809893A (zh) 2017-10-25 2018-10-19 Electronic device and method for wireless communication
EP18869999.5A EP3700247A4 (en) 2017-10-25 2018-10-19 ELECTRONIC DEVICE AND METHOD FOR WIRELESS COMMUNICATION
US16/634,887 US11140561B2 (en) 2017-10-25 2018-10-19 Electronic device and method for wireless communication

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711009075.6A CN109714772A (zh) 2017-10-25 2017-10-25 Electronic device and method for wireless communication
CN201711009075.6 2017-10-25

Publications (1)

Publication Number Publication Date
WO2019080771A1 true WO2019080771A1 (zh) 2019-05-02

Family

ID=66246287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110964 WO2019080771A1 (zh) 2017-10-25 2018-10-19 Electronic device and method for wireless communication

Country Status (5)

Country Link
US (1) US11140561B2 (zh)
EP (1) EP3700247A4 (zh)
CN (2) CN109714772A (zh)
TW (1) TWI756438B (zh)
WO (1) WO2019080771A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6805193B2 (ja) * 2018-02-13 2020-12-23 日本電信電話株式会社 無線通信システム、無線通信方法、基地局及び端末
WO2020068127A1 (en) * 2018-09-28 2020-04-02 Ravikumar Balakrishnan System and method using collaborative learning of interference environment and network topology for autonomous spectrum sharing
CN111124332B (zh) * 2019-11-18 2024-03-01 北京小米移动软件有限公司 设备呈现内容的控制方法、控制装置及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105451250A (zh) * 2015-09-01 2016-03-30 电信科学技术研究院 一种网络接入点动态组网方法及设备
WO2016208955A1 (en) * 2015-06-22 2016-12-29 Samsung Electronics Co., Ltd. Method and apparatus for operation in coexistence environment of cellular, non-cellular, macro and micro networks
CN106788646A (zh) * 2015-11-24 2017-05-31 上海贝尔股份有限公司 用于利用虚拟小区进行通信的方法和装置以及通信系统

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101571729B1 (ko) * 2009-01-30 2015-11-25 엘지전자 주식회사 CoMP 집합 단위 핸드오프 수행 방법
US10250364B2 (en) * 2011-12-09 2019-04-02 Nokia Corporation Channel measurements supporting coordinated multi-point operation
JP5962670B2 (ja) * 2012-01-26 2016-08-03 ソニー株式会社 無線通信装置及び無線通信方法、並びに無線通信システム
US8953478B2 (en) * 2012-01-27 2015-02-10 Intel Corporation Evolved node B and method for coherent coordinated multipoint transmission with per CSI-RS feedback
WO2013167161A1 (en) * 2012-05-07 2013-11-14 Nokia Siemens Networks Oy JOINT ASSIGNMENT AND SCHEDULING FOR OVERLAPPING CoMP CLUSTERS
US20140235266A1 (en) * 2013-02-16 2014-08-21 Qualcomm Incorporated Focused assistance data for WiFi access points and femtocells
US9184998B2 (en) * 2013-03-14 2015-11-10 Qualcomm Incorporated Distributed path update in hybrid networks
CN105338650B (zh) * 2014-08-08 2019-02-22 电信科学技术研究院 一种异构网络中的接入方法和装置
WO2017015802A1 (zh) * 2015-07-25 2017-02-02 华为技术有限公司 一种分配接入回程资源的方法及装置
CN110506432B (zh) * 2017-03-31 2021-06-22 华为技术有限公司 一种协作小区确定方法及网络设备
US20190019082A1 (en) * 2017-07-12 2019-01-17 International Business Machines Corporation Cooperative neural network reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016208955A1 (en) * 2015-06-22 2016-12-29 Samsung Electronics Co., Ltd. Method and apparatus for operation in coexistence environment of cellular, non-cellular, macro and micro networks
CN105451250A (zh) * 2015-09-01 2016-03-30 电信科学技术研究院 一种网络接入点动态组网方法及设备
CN106788646A (zh) * 2015-11-24 2017-05-31 上海贝尔股份有限公司 用于利用虚拟小区进行通信的方法和装置以及通信系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3700247A4

Also Published As

Publication number Publication date
TWI756438B (zh) 2022-03-01
US11140561B2 (en) 2021-10-05
TW201918109A (zh) 2019-05-01
EP3700247A4 (en) 2020-11-11
CN110809893A (zh) 2020-02-18
EP3700247A1 (en) 2020-08-26
CN109714772A (zh) 2019-05-03
US20200236560A1 (en) 2020-07-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18869999

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018869999

Country of ref document: EP

Effective date: 20200520