EP2647239A1 - Method and apparatus for communication - Google Patents

Method and apparatus for communication

Info

Publication number
EP2647239A1
Authority
EP
European Patent Office
Prior art keywords
learning
network
state
parameters
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP11794062.7A
Other languages
German (de)
English (en)
Inventor
George Koudouridis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Sweden AB
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP2647239A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 - Configuration management of networks or network elements
    • H04L 41/0803 - Configuration setting
    • H04L 41/0823 - Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/16 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 - Supervisory, monitoring or testing arrangements
    • H04W 24/02 - Arrangements for optimising operational condition
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 - Network analysis or design
    • H04L 41/142 - Network analysis or design using statistical or mathematical methods

Definitions

  • the technical field of multi-user communications provides relevant prior art for this specification. This may also be the case for the technical field of short-range radio communications or surface-covering wireless communications, or for operations, management or configuration of wireless communication networks. It is particularly the case for the technical field of local awareness and local or distributed control of communication networks.
  • Wireless communications provide a means of communicating across a distance by means of electromagnetic signals.
  • With communications networks being wireless to an ever increasing extent, some of the challenges of surface-covering wireless communications, such as resource sensing and allocation, interference prediction, and decision making, have to be approached in the art in order to provide for increased automation of network maintenance and administration.
  • 3GPP TR 36.902 V9.2.0 Technical Report; 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Self-configuring and self-optimizing network (SON) use cases and solutions (Release 9), France, June 2010, discusses e.g. automated configuration of Physical Cell Identity, Mobility Robustness and setting of HO (handover) parameters, Mobility Load Balancing, RACH (Random Access Channel) configuration, maintaining and building neighbor relationships, and inter-cell interference coordination. In particular, it concludes that reduction of operational efforts and complexity improves system operability in a multi-vendor environment.
  • United States Patent No. US6829491 provides a communication network subject to dynamic optimization using network operation metrics, such as may be acquired from a network controller such as a mobile switching center. Implementation of the parameter adjustments is modeled to determine whether further or different operational parameter adjustments should be determined.
  • the document mentions that a network may be load-unbalanced due to unusually dense subscriber populations (e.g. sports arenas during sports events). It concludes that it would be advantageous to have a method and system for dynamically monitoring network communication metrics, including metrics associated with communications provided through a plurality of network resources. Accordingly, operation parameters may be redistributed dynamically as a result of modeling and estimation of network system parameters as a function of network performance information.

Summary
  • a method of controlling a telecommunications network comprising at least one device arranged for interaction as regards network configuration parameters is disclosed. Examples of learning systems and network optimization during run-time are provided, facilitating adaptation to a system state.
  • FIG. 1 illustrates a typical Cognitive Engine, CE, in accordance with the invention.
  • Figure 2 shows a system architecture illustrating functionality in two example independent CSONE entities.
  • Figure 3 schematically illustrates a system as preferably described in terms of a model.
  • Figure 4 schematically illustrates determining a best action.
  • Figure 5 illustrates some example key enabling technologies and solutions in three different dimensions of cooperative operation.
  • Figure 6 illustrates schematically sensing monitoring interfaces.
  • Figure 7 illustrates communication interfaces of a configuration/decision making and/or execution module
  • FIG. 8 illustrates schematically and in accordance with the invention two example Communication/Cooperation/Execution Modules.
  • Figure 9 illustrates schematically the interfaces of an optimization module (91) and various entities.
  • Figure 10 illustrates a learning example
  • Figure 11 illustrates another learning example.
  • Figure 12 illustrates a cognitive SON centralized architecture.
  • Figure 13 illustrates a cognitive SON distributed architecture with example autonomous cognitive engine.
  • Figure 14 illustrates a cognitive SON hybrid network architecture.
  • Figure 15 shows SON functionality of cognitive SON.
  • Figure 16 illustrates schematically the interactions between two independent processes running in two separate autonomous nodes.
  • Figure 17 illustrates a system relating to the invention.
  • Figure 18 illustrates three levels of the operation relating to the invention.
  • Figure 19 illustrates dimensions of cooperative decision and control relating to the invention.
  • Figure 20 illustrates a system relating to the invention.
  • Figure 21 illustrates cognitive SON optimisation process.
  • Fig.22 illustrates the interactions between two independent processes running in two separate autonomous nodes.
  • Figure 23 illustrates the procedure of optimization according to the invention.
  • Figure 24 illustrates functionality in two independent CSONE entities according to the invention.
  • Figure 25 illustrates a system according to the invention.
  • Figure 26 illustrates a system according to the invention.
  • Figure 27 illustrates a system according to the invention.
  • Figure 28 illustrates optimization functional unit according to the invention.
  • Figure 29 illustrates procedure of learning task.
  • Figure 30 illustrates an example of learning according to the invention.
  • Figure 31 illustrates a cognitive SON centralised architecture.
  • Figure 32 illustrates a deployment of the architecture consisting only of CSONE entities.
  • Figure 33 illustrates a cognitive SON distributed architecture.
  • Figure 34 illustrates a deployment of the above architecture consisting only of CSONE entities.
  • Figure 35 illustrates a CSONE hybrid Architecture of central coordination.
  • Figure 36 illustrates a CSONE hybrid Architecture of distributed coordination.
  • Figure 37 illustrates a deployment of the architecture consisting only of CSONE entities.
  • In order to make such a level of cognition possible, cognitive nodes efficiently represent and store environmental and operational information, since a distinctive characteristic of cognitive radios and cognitive networks is the capability of making decisions and adaptations based on past experience, on current operational conditions, and also possibly on future behavior predictions. A model of the underlying environment in each node provides only partial knowledge. Nodes may therefore cooperate in order to jointly acquire a global or wide-range knowledge of the environment, enabling distributed operations.
  • FIG 1 illustrates a typical Cognitive Engine, CE, in accordance with the invention as will be further described in detail below.
  • a cognitive node can maintain a model of the local environment that in turn allows for educated communications decision based on the impact of its actions.
  • a cognitive node determines or selects decision variables in order to maximize a performance metric, e.g., determining or selecting a power setting value that will lead to (close to) maximum utilization of network resources.
  • a cognitive node acts autonomously, as the CE provides the ability to learn and adapt to a changing environment.
  • a cognitive engine preferably is adapted to: accurately model dynamics and one or more states of its environment by means of: performance metrics and environment dynamics (physical environment - radio resources)
  • a system may change state at any point in time.
  • a system's state typically may change many times during its life-time.
  • some system states are useful for mapping into an action decision variable while others are not.
  • some system states are targeted while others are not. Performing control over the processes aims at steering system transitions to targeted system states e.g., states where the system performs favorably.
  • Figure 2 shows a system architecture illustrating functionality in two example independent CSONE entities. Operation in the cognitive SON engine CSONE is supported and realized by means of knowledge stored in a knowledge base. More specifically, each node of the various units or modules as described above preferably maintain a knowledge base (111) comprising facts and rules.
  • the knowledge base may be distributed or centralized. Facts are represented by parameter value pairs that build up a model of the environment and itself, i.e. the owner of the facts and the knowledge base. Facts are used to represent information about e.g.
  • radio environment inclusive of load and interference level
  • configuration settings such as transmitted power settings.
  • Rules are preferably represented by parameter-value implications of premise-implies-conclusion type (if <premise> then <conclusion>).
  • a premise is e.g. a rule or a (conjunction of) fact(s), typically of monitoring types.
  • a conclusion correspondingly is, e.g., a rule or a (conjunction of) fact(s), typically of configuration type.
  • rules apply for all values of parameters or a subset of values as defined by numerical operators.
  • Rules may imply rules or facts.
  • the set of facts and rules represents a model of the environment in which the knowledge possessing entity interacts.
  • the set represents a model of the entity itself including its capabilities, objectives, roles, functions and actions.
  • the set of facts and rules represents a model of the environment in which the knowledge possessing entity interacts in and a model of the entity itself including its capabilities, objectives, roles, functions and actions.
  • knowledge K consists of facts and rules.
  • Facts reflect a priori knowledge of the environment and the entity itself. They include, among others, the system state set S, the action set A available to the entity itself, and the function set F.
  • Facts and Rules are stored in a Knowledge Base, preferably accessible by all functional units partially or in its entirety.
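  • As an illustration of the fact/rule structure just described, the following Python sketch (all parameter names, values and the single rule shown are hypothetical, not taken from this specification) stores facts as parameter-value pairs and rules as premise-implies-conclusion entries whose conclusions are configuration-type facts:

```python
# Hypothetical sketch of a knowledge base of facts and rules as described above.
# Facts are parameter-value pairs; rules are "if <premise> then <conclusion>" entries.

facts = {
    "cell_load": 0.85,          # monitoring-type fact (radio environment)
    "interference_dBm": -92.0,  # monitoring-type fact
    "tx_power_dBm": 43.0,       # configuration-type fact
}

# Each rule: (premise predicate over the facts, conclusion as a fact update).
rules = [
    (lambda f: f["cell_load"] > 0.8 and f["interference_dBm"] > -95.0,
     {"tx_power_dBm": 40.0}),   # conclusion of configuration type
]

def apply_rules(facts, rules):
    """Fire every rule whose premise holds and merge its conclusion into the facts."""
    for premise, conclusion in rules:
        if premise(facts):
            facts.update(conclusion)
    return facts

print(apply_rules(dict(facts), rules))  # tx_power_dBm is lowered to 40.0 by the rule
```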
  • a model of the external environment and the rules the environment obeys can be described and stored in the knowledge base.
  • An observation and transition model of the environment can be also described in terms of environment states and transitions between states due to events caused by external entities or due to actions taken by the CE itself.
  • the environment model is based on a-priori and/or learned knowledge and presented by parameters or parameter functions.
  • A cognitive SON engine CSONE is said to learn from experience E with respect to some class of tasks T and performance measure/metrics P, if its performance at tasks in T, as measured by P, improves with experience E.
  • a radio node that learns configuration of mobility optimization might improve its mobility performance as measured by its ability to optimally configure mobility parameters through experience obtained by configuring mobility with its neighbors.
  • a well-defined learning problem requires a well-defined task, performance metric and training experience.
  • Designing a learning approach involves a number of design choices, including choosing the type of training experience, the target (or objective) function to be learned, a representation for this function, and an algorithm for learning the target function from training examples.
  • learning involves searching through a space of possible hypotheses H to find the hypothesis h that best fits the available training examples D and other prior constraints or knowledge.
  • In terms of SON functionality, at any one time t, h would correspond to a state s, and D to the current set of observations θ.
  • Much of the above optimisation and control functionality is performed by learning methods that search different hypothesis spaces (e.g., numerical functions, decision trees, neural networks, policies, rules) based on different conditions under which these search methods converge toward an optimal hypothesis.
  • Operation of optimization control is performed by learning new facts and rules or by modifying existing rules to improve performance.
  • Figure 3 schematically illustrates a system as preferably described in terms of a model.
  • a model should ideally represent all entities of the system, their states and procedures not hiding any information from being derived in order to correctly represent the system.
  • a system state may typically be specified based on a multiplicity of quantifying parameters of the system model.
  • a preferred set of parameters, S provides all the parameters necessary to differentiate between any two system states. Parameters can be monitored, calculated, estimated and/or derived from statistical observations.
  • system parameters include the system state S = (KPI_1, ..., KPI_N), where N is a positive integer and the KPIs in a radio network may include cell load, number of users, radio resource utilization, throughput, spectral efficiency, QoS, etc.
  • a system responds as determined appropriate to a particular system state by means of actions as equipped.
  • An action refers to execution of one or multiple instructions during operations of a system.
  • an action corresponds to configuration of network nodes that controls operations of the network.
  • the actions are arranged to maintain the system in a target state or bring it towards a target state.
  • a system operating entity is equipped by a set of actions A which are performed as needed to drive the system towards the goal/target state.
  • a goal/target state is, e.g., radio resource allocation for desired throughput performance where the actual allocation corresponds to the action.
  • control π_A(s) refers to the process of identifying an action for any state of the system that is a target state. More specifically, control π_A(s): S → A (eq. 2) maps a state s ∈ S into an action a ∈ A.
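  • As a minimal illustration of such a control mapping, the following Python sketch (the KPI choices, value ranges and configuration parameters are assumptions for illustration only) represents a state as a tuple of KPIs, an action as a setting of configuration parameters, and a control π_A(s): S → A as an explicit lookup:

```python
from typing import NamedTuple

# Hypothetical types; names are illustrative, not from the patent.
class State(NamedTuple):          # S = (KPI_1, ..., KPI_N)
    cell_load: float              # e.g. fraction of resources in use
    throughput_mbps: float

class Action(NamedTuple):         # a = setting of one or more configuration parameters
    tx_power_dBm: float
    antenna_tilt_deg: float

# A control pi_A(s): S -> A, here as an explicit lookup over a few discrete states.
policy = {
    State(cell_load=0.9, throughput_mbps=20.0): Action(tx_power_dBm=40.0, antenna_tilt_deg=6.0),
    State(cell_load=0.3, throughput_mbps=80.0): Action(tx_power_dBm=43.0, antenna_tilt_deg=4.0),
}

def control(s: State) -> Action:
    """pi_A(s): S -> A (eq. 2) -- map the observed state to the configured action."""
    return policy[s]

print(control(State(0.9, 20.0)))
```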
  • An example process of determining a control in accordance with an embodiment of the invention is as follows: for each state of the system, find the best action a among a given set of authorized actions A. In accordance with an embodiment of the invention, determining the best action is schematically illustrated in figure 4. The determining involves configuring a controller (42) as depicted in the figure. Three levels of operation are preferably considered:
  • execution level system function such as any RRM function (43) of a radio network.
  • the optimization entity (41) determines a control process 7i(s), preferably optimized in relation to one or more objectives.
  • optimizing a control process is understood as determining a policy, i.e. determining relevant control for a preferred set of state variables in order to maximize a utility objective, or minimize a cost/penalty objective, considering the various states.
  • Objectives are preferably expressed by means of utility functions (objective functions).
  • a utility function is a function that maps from system states preferably to real numbers. As indicated in figure 4, entities of the various operation levels cooperate. Implementing cooperation requires deployment of communication enabling functionality.
  • cooperation enablers refer to those mechanisms, procedures and/or protocols which make the collaboration between network nodes and system entities possible. Collaboration spans between information exchange, action coordination and decision making; moreover, such aspects are expected to be exploited in different dimensions thus featuring different communication layers and capabilities.
  • Three of the key aspects corresponding to dimensions for decision and control that are based on cooperation or potentially can benefit from it, are:
  • two or more nodes may cooperate by simply exchanging information with each other and deciding independently.
  • two or more nodes may cooperate by deciding on a coordinated plan of configurations shared between the nodes.
  • cooperation may be performed by each layer separately or in a cross-layer fashion.
  • a set of enablers for cooperative decision and control is preferably associated.
  • the following associations are envisaged: in information/context exchange dimension (or collaboration dimension): - sensing data, configuration settings,
  • the first aspect captures the balance between a node's individual objectives and the objectives of the network it belongs to.
  • a network objective can be defined as the sum of all nodes' objectives (a "social welfare" objective).
  • a node may refrain from taking actions that maximize its individual objectives for the benefit of the maximization of network objectives. For instance, a cell may select a lower transmission power setting where this maximizes the overall network throughput (e.g. by causing less interference) at the expense of its local cell throughput performance.
  • the opposite would be a node selecting a higher power setting to increase its local cell throughput, thereby possibly causing more interference to neighboring cells, reducing the overall network throughput.
  • the second aspect refers to identification of optimal configurations for network operation (e.g. with the least energy consumption) that balance the benefits offered by a fully cooperative (coordinated decisions) and a simpler autonomic (independent decisions) approach. For example this should take into account on one hand that the complete picture can be made available to various parts of a cooperative system (e.g. utilizing information sharing) but this additional fine-grained information and flexibility comes with a cost in signaling to be justified by the expected gains. Also the processing associated with the second aspect (coordinated actions-independent actions) is preferably balanced.
  • the invention identifies a number of events causing problems with state parameters or the associated mapping:
  • the set of parameters comprises a great number of parameters making the system state description complex.
  • the parameter values are noisy, e.g., due to traffic and radio channel being stochastic.
  • the environment is stochastic i.e., the transition between system states is not deterministic.
  • Sensing/Monitoring Unit (102) deals with observation and state transition modeling.
  • Configuration/Decision Making Unit (103) deals with the action-state modeling for action selection and valuation.
  • Interaction Functional Unit (104) deals with interaction modeling for negotiation and communication of decisions and execution/effectuation of selected actions.
  • Knowledge base consists of facts and rules describing the models required for the realization of the cognitive SON engine.
  • Knowledge base can be a functional unit of its own or maintained and communicated between functional units as depicted above
  • the various units communicate over interfaces (105), (106), (107), (108), (109), (110).
  • operation in the cognitive engine (also referred to as policy engine) is supported and realized by means of knowledge in terms of fact and rules stored in a data base, a knowledge base (111).
  • rules are the various controls of a policy, which is a mapping of S on A.
  • FIG. 6 illustrates schematically sensing monitoring interfaces.
  • the role of a sensing/monitoring module is, e.g., collection of KPIs, KPI statistical processing, and control of KPI monitoring.
  • the sensing module has a monitoring to communication interface (106), (61), MCi. E.g. monitoring parameters and monitoring time-interval are communicated over the MCi.
  • the sensing module also has a monitoring to decision interface, MDi, (109), (62).
  • System state information is communicated over the MDi.
  • monitoring interfaces that are device dependent, such as an interface to RRM (Radio Resource Management) at a base station, or a device interface between a gateway and the sensing module.
  • RRM Radio Resource Management
  • measurement monitoring interfaces (63), (64), e.g. for monitoring or communication of parameter values or time intervals, such as with a RAT/RAN RRM (Radio Access Technology/Radio Access Network Radio Resource Management) entity.
  • Figure 7 illustrates communication interfaces of a configuration/decision making and/or execution module.
  • Configuration/decision making and/or execution module functions comprise e.g., making configuration decisions based on environment state information, radio resource configuration of control, and power and spectrum allocation.
  • An example interface (107), (71) facilitates exchange of configuration parameters or information between the configuration/decision making and/or
  • the monitoring to decision interface (109), (62), (72) has been explained in relation to figure 6.
  • An example interface between the configuration/decision making and/or execution module (73), (74) provides a device dependent interface for RRM at a base station or for a gateway.
  • the interface may comprise two parts, a decision part for exchange of configuration control parameters (to be set) or configuration information (to be collected), and an execution part for exchange of messages configuring a device such as an RRM or a gateway.
  • Figure 8 illustrates schematically and in accordance with the invention two example Communication/Cooperation/Execution Modules (81), (82), e.g., comprising functionality for providing information exchange, such as:
  • configuration information e.g. power, spectrum, interference cancellation, neighbor information
  • the two modules communicate with each other over a Ci (Cooperation/Communication Interface) (83) and with other entities such as RRM at a base station (84); or
  • the execution part (87), (88) comprises e.g.
  • the Communication /Execution /Cooperation modules interface an RRM entity/function and a sensor/actuator element/gateway across a Ci/Xi (87), (88) interface (communication/cooperation / execution interface).
  • Figure 9 illustrates schematically the interfaces of an optimization module (91) and various entities (92), (93), (94) that the optimization module interfaces (95), (96), (97) in accordance with the invention.
  • the optimization module (91) classifies one or more states of the environment based on the parameters, for single or multiple objectives.
  • the optimization module preferably has a plurality of interfaces (95), (96), (97). There are three different interfaces illustrated. One is intended for monitoring (92), another for decision making (97). A third interface, between the optimization module and a user of a communication/cooperation module (96), is destined for execution.
  • the optimization module is preferably adapted for learning a policy that maps any state of the system to a set of actions that operate favorably according to objectives of an adaptation process of the optimization module, regardless of whether policies are maintained and executed centrally or in a distributed manner, and whether they are distributed over numerous nodes or functionally.
  • the optimization module is adapted to learn, identify and/or provide distinguishable states of the system and the differentiating parameters, an accurate model of environment and the rules governing it for future predictions,
  • a set of rules that provides efficient and stable operation and fast convergence as the system state changes.
  • the set of states is recursively refined by learning; the actions onto which the states are mapped are correspondingly adaptively refined, as are the mapping rules and network parameter settings.
  • a set of parameters are preferably identified for a given objective or set of objectives capable of differentiating between any two states of the system.
  • Bayesian learning, e.g., applied to identify the conditioning and the correlations between parameters indicative of a system state.
  • Inductive learning (learning rules out of observable facts), e.g., applied for learning a state.
  • Neural network learning (learning a function from known examples), e.g. applied for learning a state.
  • Instance-based learning (learning state functions from similarities and differences between instances), e.g. applied for learning a state.
  • An example output is a concise description of system states where organization patterns and operation patterns are uniquely identified, preferably with none or just a few non-explaining states remaining to be considered for the mapping, control or policy. At best there is a solution where each state is described by a minimum number of one or more parameter values or parameter-value pairs.
  • Time is also an aspect, as the output needs to provide an accurate result over time. To capture dynamics over time, state transitions are considered.
  • Another aspect of the invention is action-state control.
  • Methods applicable as such to action-state mapping, control or policy are known as such in the art.
  • Non-exclusive examples of such methods are
  • Reinforcement learning differs from standard supervised learning in that correct input/output pairs are not required.
  • RL is a form of learning that conforms to:
  • actions typically corresponding to value settings of one or more configuration parameters/variables.
  • Q-learning is a particular implementation of RL, where an expected payoff/reward associated with various actions is estimated.
  • a controller makes such an estimate.
  • Q-learning estimates Q-values recursively.
  • a Q-value, Q(s,a) is a value function that provides a numerical estimate of the value of performing an individual action at a given state s of the environment.
  • the controller updates its estimate Q(s,a) based on a sample (a, r): Q(s,a) ← Q(s,a) + η(r - Q(s,a)) (eq. 4)
  • the sample (a, r) is the experience obtained by the base station: action a was performed resulting in payoff/reward r.
  • η is the learning rate (0 < η < 1), governing to what extent the new sample replaces the current estimate. Assuming an infinite number of iterations, the algorithm converges to the true Q-value. A minimal sketch of this update is given below.
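```python
# Minimal sketch of the Q-value update of eq. 4 (states, actions and rewards are
# hypothetical): Q(s,a) <- Q(s,a) + eta * (r - Q(s,a)), with learning rate 0 < eta < 1.

from collections import defaultdict

Q = defaultdict(float)      # Q-values, initialised to 0 for every (state, action) pair
eta = 0.1                   # learning rate

def update(state, action, reward):
    """Blend the new sample reward into the running estimate Q(state, action)."""
    Q[(state, action)] += eta * (reward - Q[(state, action)])

# Experience samples (action a performed in state s yielded payoff r):
for s, a, r in [("high_load", "tilt_down", 0.7),
                ("high_load", "tilt_down", 0.9),
                ("high_load", "tilt_up",   0.2)]:
    update(s, a, r)

print(dict(Q))
```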
  • A learning example is illustrated in figure 10: the task is to find a policy π(s): S → A that maximizes the sum of future cumulative rewards, expressed as a utility function. For a non-deterministic environment this is the expected discounted return, max E[ Σ_k γ^k r_{t+k+1} ], with 0 ≤ γ < 1.
  • the Q-algorithm is as follows for a starting state and action (s_t, a_t):
  • a learning rate coefficient η is preferably added
  • Exploration-Exploitation is a probabilistic approach to select actions
  • k > 0 is preferably a constant that determines how strongly the selection favors actions with high Q-values. Larger k-values will assign higher probabilities to actions with above-average Q, causing an optimizer to exploit what it has learned and seek actions as instructed to maximize its reward. Smaller values will assign higher probabilities to other actions with below-average Q, causing the optimizer to explore actions that do not currently have high Q-values. Parameter k may vary with the number of iterations so that the optimizer favors exploration in the early stages of learning, to gradually shift towards more exploitation. A sketch of such a selection is given below.
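```python
import random

# Sketch of the probabilistic action selection described above. Assumption: the
# probability of picking an action is proportional to k raised to its Q-value,
# which is a common form for this kind of selection; actions and values are illustrative.
def select_action(q_values, k):
    """q_values: {action: Q(s, action)}; larger k favours exploitation."""
    weights = {a: k ** q for a, q in q_values.items()}
    total = sum(weights.values())
    r, acc = random.random() * total, 0.0
    for action, w in weights.items():
        acc += w
        if r <= acc:
            return action
    return action  # numerical safety net

q = {"tilt_up": 0.2, "tilt_down": 0.9, "keep": 0.5}
print(select_action(q, k=1.2))   # k near 1: near-uniform choice (exploration)
print(select_action(q, k=50.0))  # large k: almost always the highest-Q action (exploitation)
```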
  • Figure 11 illustrates another learning example, where transmit power, p, and antenna tilt, a, are configured and determined according to the traffic of a cell's area. Illustrated as a non-limiting single-cell example, transmit power is assumed constant and the Q-values for different antenna tilt angles are learned, until a favorable action a_4 is found with a resulting Q-value of
  • utilities are applied to guide the determination of an action by providing a maximum utility.
  • a utility function evaluates a state of an environment. It maps the state to a scalar value indicating how good the state is. By comparing the scalar to other one or more values, e.g. of other states, it is possible to compare how good different states are.
  • Reward functions in reinforcement learning optimization should be expressed as utility functions on a multiplicity of KPIs.
  • a negotiation strategy is preferably applied.
  • a typical negotiation strategy comprises a sequence of actions taken in a negotiation process e.g. consisting of offers, counter-offers, accept or quit.
  • Learning in negotiation in principle provides learning the negotiation strategy of other negotiating entities, their types, utilities and models.
  • Bayesian belief networks can be used as efficient updating mechanisms.
  • Given the domain knowledge in the form of conditional statements, the recipient preferably uses a standard Bayesian updating rule to revise the desirable outcome of the offerer.
  • Example classes of learning that can be applied in a multi-cell, multi-objective system are:
  • a learning network provides a great many benefits as compared to preconfigured networks. It is not always known from the first deployment how traffic in an area will behave or develop, what the load will be, what typical user mobility is, or how the area should be classified according to kind. In brief, the best configuration may not be known at the time of commissioning or deployment, while a learning network is capable of adapting to it. According to preferred embodiments, the learning facilities provide dynamic discovery of optimal solutions at run-time. The learning process allows base stations to reconfigure themselves if they are moved to a new area or if the traffic behavior changes, such as when a new residential area is established. The learning process for a communications network should be arranged as a long-term process for convergence to a preferred solution over time.
  • the cognitive engine and learning is preferably applied to a cellular network for various optimization objectives.
  • the utility function f[K_1, ..., K_n] corresponds to a policy set by the operator and facilitates comparison between different sets of KPIs corresponding to different states. There is a mapping from the decision parameters (configuration parameters) to KPI values. By learning, the system can understand this mapping and how to change configuration parameters to quickly get to the optimum system state. A sketch of such a comparison is given below.
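```python
# Hypothetical operator policy expressed as a utility function f[K_1, ..., K_n]
# over KPIs, used to compare candidate system states (KPI names and weights are
# illustrative assumptions, not taken from the patent).

def utility(kpis, weights=None):
    """Weighted sum of normalised KPIs; higher is better."""
    weights = weights or {"throughput": 0.5, "coverage": 0.3, "energy_saving": 0.2}
    return sum(weights[name] * value for name, value in kpis.items())

state_a = {"throughput": 0.7, "coverage": 0.9, "energy_saving": 0.4}
state_b = {"throughput": 0.8, "coverage": 0.7, "energy_saving": 0.6}

better = max((state_a, state_b), key=utility)
print(utility(state_a), utility(state_b), better)  # state_b scores higher here
```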
  • Figures 12-14 schematically illustrate deployment of cognitive SON functionality in wireless networks of various physical architectures.
  • π(θ_0, a_0, θ_1, a_1, ..., θ_t) = a_t (2.1) is called the policy of the node and maps the complete history of observation-action pairs up to time t to an optimal action a_t.
  • In a simpler form, the policy ignores all its observed history except for the last observation θ_t, resulting in the form π(θ_t) = a_t (2.2), which is a mapping from the current observation of the entity to an action a_t.
  • the collective information that is contained in the world at any time step t, and that is relative to performance measure, is called a state of the world and is denoted by s t .
  • the observation θ_t of the entity provides only partial information about the actual state s_t.
  • the stochastic coupling between s_t and θ_t may alternatively be defined by an observation model in the form p(θ_t | s_t).
  • the Markov property is assumed for the world model, where the current state of the world at time t summarizes all relevant information for the state at time t+1. More specifically, an entity can perceive a set S of distinct states and has a set A of actions it can perform. At each time step t the entity senses the current state s_t, chooses an action a_t and performs it, with a change of the environment and world state as a result.
  • the transition function corresponds to a transition model that specifies the mapping from a state-action pair (s_t, a_t) to a new state s_{t+1}, with probability one if the environment is deterministic and with probability p(s_{t+1} | s_t, a_t) if the environment is stochastic; s_{t+1} is then a stochastic variable that can take all possible values in S, each with corresponding probability p(s_{t+1} | s_t, a_t).
  • Each entity selects among the actions that achieve the objectives of the tasks/operations it has been aimed for.
  • a way to formalize the notion of objective is to define them as goal states of the world that would correspond to the optimal states that the environment would be if the tasks were optimally performed.
  • an autonomous entity searches through the state space for an optimal sequence of actions to a goal state.
  • Clearly, not all states are of equal preference and not all goal states are equally optimal.
  • a formalization of the notion of preference and optimality is to assign to each state s a real number U(s) that is called the utility of state s for that particular task and entity; the larger the utility of the state U(s), the better the state s.
  • Such a function U evaluating each state of the world can be used by an entity for its decision making. Assuming a stochastic environment, utility-based decision making is based on the premise that the optimal action a_t* of the entity at state s_t should maximize expected utility, that is,
  • Given a reward function r: S × A → R, i.e., the entity receives reward r(s, a) when it takes action a at state s, the entity is to maximize a function of accumulated reward over its planning operation time.
  • a standard such function is the discounted future reward r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ² r(s_{t+2}, a_{t+2}) + ..., where γ ∈ [0, 1) is a discount rate ensuring that the sum remains finite for infinite operation time.
  • different policies will produce different discounted future rewards, since each policy will take the entity through different sequences of states.
  • the optimal value of a state s following some policy is defined as the maximum discounted future reward the entity would receive by starting at state s by:
  • a policy π*(s) that achieves the maximum in (2.8) or (2.9) is an optimal policy: π*(s) ∈ arg max_a Q*(s, a) (2.10). Note that there can be many optimal policies in a given task, but they all share a unique U* and Q*.
  • Q-learning is a method for estimating the optimal Q* (and from that an optimal policy) that does not require knowledge of the transition model.
  • the entity repeatedly interacts with the environment and tries to estimate Q* by trial-and-error.
  • the entity initializes a function Q(s,a) for each state-action pair, and then it begins exploring the environment.
  • the entity can choose an exploration action a in state s according to a Boltzmann distribution, where a temperature parameter controls the smoothness of the distribution (and thus the randomness of the choice) and is decreasing with time.
  • each entity i receives an observation θ_i ∈ Θ_i that provides information about s.
  • the profile of the individual observations of all entities (θ_i) defines the joint observation θ.
  • each observation is a deterministic function of the state: the observation of each entity at each state is fully determined by the setup of the problem.
  • more general observation models can be defined in which the coupling between states and observations is stochastic.
  • an observation model could define a joint probability distribution p(s, θ) over states and joint observations, from which various other quantities can be computed, like p(θ) or p(s | θ).
  • the profile of individual policies (π_i) defines the joint policy π.
  • Multi-entity decision making also requires defining an explicit payoff function Qi for each entity.
  • This function can take several forms; for instance, it can be a function Q_i(s, a) over states and joint actions; or a function Q_i(θ, a) over joint observations and joint actions; or a function Q_i(θ_i, a) over individual observations and joint actions. Note that often one form can be derived from the other; for instance, when an inverse observation model p(s | θ) is available, we can write Q_i(θ, a) = Σ_{s∈S} p(s | θ) Q_i(s, a).
  • a joint policy π* = (π_i*) is a Nash equilibrium if no entity has an incentive to unilaterally change its policy; that is, no entity i would like to take at state s an action a_i ≠ π_i*(s) assuming that all other entities stick with their equilibrium policies π_{-i}*(s).
  • the policy can be negotiated among the entities as necessary. Negotiations are performed by means of interaction rounds with offers and counter-offers ending with accept or quit.
  • the offers and counter-offers refer to suggestions for joint actions whose value Q(s, a) of the joint action a is within the thresholds of offer acceptability of the involved entities; a minimal sketch of this acceptability test follows.
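```python
# Sketch of the acceptance test described above: an offered joint action is acceptable
# if every involved entity values it above that entity's own threshold. Entity names,
# payoff functions and thresholds are hypothetical, for illustration only.

def acceptable(joint_action, state, q_funcs, thresholds):
    """True if every involved entity values the joint action above its own threshold."""
    return all(q_funcs[i](state, joint_action) >= thresholds[i] for i in q_funcs)

q_funcs = {
    "cell_1": lambda s, a: 0.8 if a == ("power_down", "power_up") else 0.3,
    "cell_2": lambda s, a: 0.6 if a == ("power_down", "power_up") else 0.5,
}
thresholds = {"cell_1": 0.5, "cell_2": 0.5}

offer = ("power_down", "power_up")            # proposed joint action
print(acceptable(offer, "congested", q_funcs, thresholds))  # True -> accept, else counter-offer
```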
  • - Ai is the set of available actions of entity i.
  • - Θ_i is the set of private information θ_i ∈ Θ_i that defines the types of entities and which is not revealed to the other entities
  • Including payment functions is essential because we need to motivate the entity to participate in the mechanism; participation of an entity is not a priori the case.
  • a mechanism in which no entity is worse off by participating, that is, Q_i(θ_i, a) ≥ 0 for all i, θ_i, and a, is called individually rational.
  • Figure 12 illustrates a cognitive SON centralized architecture.
  • a central node with a cognitive engine configures node functions. This includes functions referring to control and information functions, e.g. RRM functions. The functions are preferably dedicated and abstracted.
  • Figure 13 illustrates a cognitive SON distributed architecture with an example autonomous cognitive engine.
  • Figure 14 illustrates a cognitive SON hybrid network architecture with a plurality of options such as central coordination, distributed coordination, hierarchical structures, or a structure with central and distributed coordination at each level of the hierarchy.
  • a communication node or simply node
  • a communication node is generally assumed to observe its environment, deliberate, decide what actions to take, actuate its decisions and finally adapt to its environment. It is desirable that in due course the node learns the optimal decision given a set of environment conditions and possibly some feedback.
  • An autonomous node is any device where decisions can be made.
  • communications nodes will be exemplified by radio/wireless nodes, which in cellular (mobile) networks refer to infrastructure nodes such as eNBs (enhanced Node B) and BSs (Base Stations) and mobile nodes such as UE (User Equipment) and mobile terminals.
  • eNBs enhanced Node B
  • BSs Base Stations
  • UE User Equipment
  • Figure 15 shows SON functionality of cognitive SON as follows: Observation: monitors the environment for observations θ in order to derive the current state s (in its simplest form it monitors parameters and may or may not derive statistics from observed parameters).
  • Figure 16 illustrates schematically the interactions between two independent processes running in two separate autonomous nodes.
  • Cognition is a multi-disciplinary concept targeting systems with a wide range of capabilities such as resource sensing, interpretation, inference, prediction, decision making, learning, and cooperation.
  • self-management encompasses self-capabilities, such as, self-awareness, self-configuration, self-optimization and self-healing.
  • the need for cognitive adaptation spans various time-scales due to the different time-scales of the changes in the radio or networking environment. For example, short time-scale changes in the radio environment are caused by fading and shadowing, and adaptation requires fast reaction. Medium time-scale changes are caused by the changing set of communicating devices or traffic flows. Finally, long-term changes happen due to changing traffic load or due to network failures.
  • the basis for cognitive, autonomous and self-managing networks is a high level of local node awareness about the local physical and network environment, as well as some notion of the corresponding global network status.
  • cognitive nodes In order to make such a level of cognition possible, cognitive nodes must efficiently represent and store environmental and operational information, since a distinctive characteristic of cognitive radios and cognitive networks is the capability of making decisions and adaptations based on past experience, on current operational conditions, and also possibly on future behaviour predictions. It is therefore imperative to obtain a functional understanding of the underlying environments, such that operational models of each system layer can be constructed and subsequently combined to an integrated model where the relation between the parameters of the physical and network environment and its correlations are exposed.
  • the models of the environment in each node provide only partial knowledge. Nodes may therefore cooperate in order to jointly acquire a more global knowledge of the environment, enabling distributed optimization.
  • the cognitive capabilities of a network node are enabled by a Cognitive Engine (CE), as depicted in architecture later on.
  • a cognitive node can maintain a model of the local environment that in turn allows for educated communications decision based on the impact of its actions.
  • a cognitive node can further make rational decisions in order to maximize its performance metrics, e.g., a cognitive node selects a power setting value that will lead to optimal utilization of network resources.
  • a cognitive node can act autonomously since the CE provides the ability to learn and adapt to a changing environment.
  • a cognitive engine should be able to: accurately model the dynamics and the state of its environment by means of performance metrics and environment dynamics (physical environment - radio resources) and model-deduced knowledge/information exchange between the cognitive nodes (network environment - neighboring nodes); make rational decisions in terms of action selections, where the goal for a rational node is to maximize the expected utility of its actions given the state of its physical and network environment; and learn from past actions, events, impact and (delayed) feedback.
  • An architecture suited to dynamic future mobile network environments is herewith suggested to cope with the emerging concept of cognitive, autonomous, cooperative, self-X and self-organised networks.
  • a system may be in different states at any one time.
  • a system's state may change many times throughout its life-time.
  • Such processes cause system state transitions.
  • some system states are desirable while others are not.
  • some system states are a system's target while others are not.
  • Performing control over the processes aims at steering system transitions to targeted system states e.g., states where the system performs optimally.
  • Describing a system is done by means of a model.
  • a model of any system consists of all the entities in the system, their states and procedures, not excluding any information derived to understand and evaluate the system.
  • a system state is typically represented, described or characterised based on a multiplicity of quantifying parameters of the system model.
  • This set of parameters, S, provides all the parameters necessary to differentiate between any two system states.
  • System state S, S = (KPI_1, ..., KPI_N), where the KPIs in a radio network may incl. cell load, number of users, radio resource utilisation, throughput, spectral efficiency, QoS, etc.
  • the system may respond by means of actions it is equipped with.
  • the goal is to act so as the system remains in or moves towards a target state.
  • Acting refers to the execution of one or multiple instructions on the operation of the system.
  • an action corresponds to the configuration of network nodes that controls its operation.
  • a system operating entity is equipped with a set of actions A which are performed as needed to drive the system towards a goal/target state, e.g., radio resource allocation for optimal throughput performance, where the actual allocation corresponds to the action and optimal throughput performance to the target state. More specifically, we define
  • Action A, A = (a_1, ..., a_M), where a is an action which in a radio network corresponds to the setting of one or more configuration parameters incl. transmitted power, antenna tilt, antenna mode, beam-forming, mobility offset, admission threshold, etc.
  • Figure 17 illustrates a system according to the invention.
  • control refers to the process of identifying an action for any state of the system that is a target state. More specifically,
  • Control, π(s): S → A, maps a state s ∈ S into an action a ∈ A, and
  • Policy: the control process function π(s) defined over all states in S.
  • the objective of control optimisation is to find the most optimal (or an optimal) policy.
  • Objectives are expressed by means of utility functions ( objective functions) that describes how close to the targeted optimum a system state is.
  • a utility function is a function that maps from system states to real numbers.
  • cooperation enablers refer to those mechanisms, procedures and/or protocols which make the collaboration between network nodes and system entities possible. Collaboration spans between information exchange, actions coordination and decision making; moreover, such aspects are expected to be exploited in different dimensions thus featuring different communication layers and capabilities.
  • any cooperative and/or autonomous solution can be mapped to this space which can present numerous kinds of solution arrangements for cooperation.
  • two nodes may cooperate by simply exchanging information with each other and deciding independently.
  • two nodes may cooperate by deciding on a coordinated plan of configurations divided between them.
  • cooperation may be performed by each layer separately or in a cross-layer fashion.
  • Figure 19 illustrates dimensions of cooperative decision and control according to the invention.
  • Information/Context exchanging axis or collaboration axis: sensing data, configuration settings, fused/processed information, knowledge presentation, etc.
  • Decision coordination and control axis or coordination axis: routing/relaying control, negotiation protocol, coordination planning, synchronisation, distributed decision making, knowledge reasoning, conflict resolution, etc.
  • Layer mechanisms axis: routing/relaying at the L3 layer, MAC protocols and/or relaying at the L2 layer, cooperative multi-point transmission at the L1 (PHY) layer, network coding and cross-layer mechanisms, etc.
  • a network objective can be defined as the sum of all nodes' objectives (as in social welfare).
  • a node may refrain from taking actions that maximise its individual objectives for the benefit of the maximisation of the network objectives. For instance, a cell may select a lower power setting that maximises the overall network throughput (e.g., causing less interference) at the expense of its cell throughput performance.
  • a node may select a higher power setting to increase its own cell throughput causing more interference to all neighbouring cells and thus reducing the overall network throughput.
  • the second direction focuses on the trade-offs and the benefits offered by a fully cooperative (coordinated decisions) and a simpler autonomic (independent decisions) approach. For example extensive information exchange would increase signalling while the absence of any information would lead to non-optimal decisions.
  • the set of parameters is large and the system state description becomes complex.
  • the parameters are noisy, e.g., due to traffic and radio channel being stochastic, and/or
  • the list of actions is incomplete to achieve the targeted objective.
  • the utility function guiding the action selection diverges from target system state or converges unacceptably slowly. - ...
  • Signalling/coordination/information exchange cost e.g., overhead and energy.
  • In the observations of a node are embedded the (physical, real or artificial) environment it perceives and acts in, and the world consisting of all nodes perceiving and acting in this environment.
  • the observation θ_t of the entity provides only partial information about the actual state s_t.
  • the stochastic coupling between s_t and θ_t may alternatively be defined by an observation model in the form p(θ_t | s_t).
  • the Markov property is assumed for the world model, where the current state of the world at time t summarizes all relevant information for the state at time t+1. More specifically, an entity can perceive a set S of distinct states and has a set A of actions it can perform. At each time step t the entity senses the current state s_t, chooses an action a_t and performs it, with a change of the environment and world state as a result. In other words, upon action execution the environment responds by producing the succeeding state s_{t+1}.
  • the transition function corresponds to a transition model that specifies the mapping from a state-action pair (s_t, a_t) to a new state s_{t+1}, with probability one if the environment is deterministic and with probability p(s_{t+1} | s_t, a_t) if the environment is stochastic.
  • Each entity selects among the actions that achieve the objectives of the tasks/operations it has been aimed for.
  • a way to formalize the notion of objective is to define them as goal states of the world that would correspond to the optimal states that the environment would be if the tasks were optimally performed.
  • an autonomous entity searches through the state space for an optimal sequence of actions to a goal state.
  • Not all states are of equal preference and not all goal states are equally optimal.
  • a formalization of the notion of preference and optimality is to assign to each state s a real number U(s) that is called the utility of state s for that particular task and entity; the larger the utility U(s), the better the state s.
  • Such a function U evaluating each state of the world can be used by an entity for its decision making. Assuming a stochastic environment, utility-based decision making is based on the premise that the optimal action a_t* of the entity at state s_t should maximize expected utility, that is,
  • a standard such function is the discounted future reward r(s_t, a_t) + γ r(s_{t+1}, a_{t+1}) + γ² r(s_{t+2}, a_{t+2}) + ..., where γ ∈ [0, 1) is a discount rate ensuring that the sum remains finite for infinite operation time.
  • the optimal Q-value of a state s and action a of the entity is the maximum discounted future reward the entity can receive after taking action a in state s:
  • a policy π*(s) that achieves the maximum in (2.8) or (2.9) is an optimal policy:
  • Q*(s, a) = R(s, a) + γ Σ_{s'} p(s' | s, a) max_{a'} Q*(s', a') (2.11)
  • This is a set of nonlinear equations, one for each state, the solution of which defines the optimal Q*.
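  • When the transition model p(s' | s, a) is known, eq. (2.11) can be solved by repeated sweeps over the state-action space. The following toy Python sketch (states, actions, rewards and transition probabilities are illustrative assumptions) iterates the equation to a fixed point and then extracts an optimal policy via eq. (2.10):

```python
# Sketch: solving eq. (2.11) by fixed-point iteration when the transition model is
# known (Q-learning below needs no such model). Toy two-state, two-action problem.

gamma = 0.9
states, actions = ["s0", "s1"], ["a0", "a1"]
R = {("s0", "a0"): 1.0, ("s0", "a1"): 0.0, ("s1", "a0"): 0.0, ("s1", "a1"): 2.0}
# p[(s, a)] = {s': probability of landing in s'}
p = {("s0", "a0"): {"s0": 0.8, "s1": 0.2}, ("s0", "a1"): {"s1": 1.0},
     ("s1", "a0"): {"s0": 1.0},            ("s1", "a1"): {"s1": 1.0}}

Q = {(s, a): 0.0 for s in states for a in actions}
for _ in range(200):                         # repeated sweeps over eq. (2.11)
    Q = {(s, a): R[(s, a)] + gamma * sum(prob * max(Q[(s2, a2)] for a2 in actions)
                                         for s2, prob in p[(s, a)].items())
         for s in states for a in actions}

policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}  # eq. (2.10)
print(Q, policy)
```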
  • the transition model is unavailable .
  • Q-learning is a method for estimating the optimal Q* (and from that an optimal policy) that does not require knowledge of the transition model.
  • the entity repeatedly interacts with the environment and tries to estimate Q* by trial-and-error. The entity initializes a function Q(s,a) for each state-action pair, and then it begins exploring the environment.
  • the learning rate takes values in (0, 1) and regulates convergence. If all state-action pairs are visited infinitely often and the learning rate decreases slowly with time, Q-learning converges to the optimal Q* [Watkins 1992].
  • the entity can choose exploration action a in state s according to a Boltzmann distribution
  • a more general observation models can be defined in which the coupling between states and observations is stochastic.
  • an observation model could define a joint probability distribution p(s, θ) over states and joint observations, from which various other quantities can be computed, like p(θ) or p(s | θ).
  • the profile of individual policies (π_i) defines the joint policy π.
  • Multi-entity decision making also requires defining an explicit payoff function Qi for each entity.
  • This function can take several forms; for instance, it can be a function Q_i(s, a) over states and joint actions; or a function Q_i(θ, a) over joint observations and joint actions; or a function Q_i(θ_i, a) over individual observations and joint actions. Note that often one form can be derived from the other; for instance, when an inverse observation model p(s | θ) is available, we can write Q_i(θ, a) = Σ_{s∈S} p(s | θ) Q_i(s, a).
  • a joint policy π* = (π_i*) is a Nash equilibrium if no entity has an incentive to unilaterally change its policy; that is, no entity i would like to take at state s an action a_i ≠ π_i*(s) assuming that all other entities stick with their equilibrium policies π_{-i}*(s).
  • the policy can be negotiated among the entities as necessary. Negotiations are performed by means of interaction rounds with offers and counter-offers ending with accept or quit.
  • the offers and counter-offers refer to suggestions for joint actions whose value Q(s, a) of the joint action a is within the thresholds of offer acceptability of the involved entities.
  • Ai is the set of available actions of entity i.
  • Q_i(θ_i, a) is the payoff function of entity i that is defined as
  • Including payment functions is essential because we need to motivate the entity to participate in the mechanism; participation of an entity is not a priori the case.
  • a mechanism in which no entity is worse off by participating, that is, Q_i(θ_i, a) ≥ 0 for all i, θ_i, and a, is called individually rational.
  • Figure 21 illustrates cognitive SON optimisation process
  • a communication node (or simply node) is generally assumed to observe its environment, deliberate, decide what actions to take, actuate its decisions and finally adapt to its environment. It is desirable that in due course the node learns the optimal decision given a set of environment conditions and possibly some feedback.
  • An autonomous node is any device where decisions can be made.
  • the term communication nodes will be exemplified by radio/wireless nodes, which in cellular (mobile) networks refer to infrastructure nodes such as eNBs and BSs, and mobile nodes such as UEs and mobile terminals.
  • a node implementing the steps depicted in Figure 21 implements cognitive SON.
  • Observation monitors the environment for observations o in order to derive the current state s (in its simplest form it monitors parameters and may or may not derive statistics from observed parameters).
  • The actuator executes actions, or cooperates with other entities to collaborate, i.e., exchange observations, or to coordinate, i.e., synchronise actions.
  • Fig. 22 visualises the interactions between two independent processes running in two separate autonomous nodes.
  • Sensing/Monitoring Functional Unit deals with the observation and state transition modelling.
  • Configuration/Decision Making Functional Unit deals with the action-state modelling for action selection and valuation.
  • Optimisation Functional Unit deals with the optimisation of all models, functional units and optimal control of policies
  • Interaction Functional Unit deals with interaction modelling for negotiation and communication of decisions and execution/effectuation of selected actions.
  • Knowledge base consists of facts and rules describing the models required for the realisation of the cognitive SON engine.
  • Knowledge base can be a Functional Unit of its own or maintained and communicated between functional units as depicted above.
  • each node implementing the above identified functional units maintains a knowledge base consisting of facts and rules.
  • the implementation of such a knowledge base can be part of the above modules or a separate functional entity updating and providing access to information.
  • Facts are represented by parameter-value pairs that build up a model of the environment and the self, i.e., the owner of the facts and the knowledge base. Facts are used to represent information about:
    o Monitoring parameters, e.g., the radio environment (incl. load, interference, etc.) and KPIs, i.e., performance metrics
    o Discovery parameters, e.g., neighbouring nodes and neighbouring node capabilities, state, etc.
    o Configuration parameters, e.g., configuration settings such as transmitted power settings, etc.
  • Rules are represented by parameter-value implications of premise-implies-conclusion (If <premise> then <conclusion>) type.
  • a premise may be a rule or a (conjunction of) fact(s), typically of monitoring types.
  • a conclusion can be a rule or a (conjunction of) fact(s), typically of configuration type.
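To make the facts-and-rules representation concrete, the following sketch (hypothetical; parameter names such as load, interference and tx_power are illustrative and not taken from the patent) stores facts as parameter-value pairs and rules as premise-implies-conclusion entries, and forward-chains the rules to derive configuration facts:

    # Facts: parameter -> value pairs describing the environment and the node itself
    facts = {"load": "high", "interference": "low"}

    # Rules: (premise facts) implies (conclusion facts), i.e. "if <premise> then <conclusion>"
    rules = [
        ({"load": "high"}, {"tx_power": "increase"}),
        ({"load": "low", "interference": "low"}, {"tx_power": "decrease"}),
    ]

    def apply_rules(facts, rules):
        # Forward-chain: add conclusion facts whose premises are satisfied by the current facts
        derived = dict(facts)
        changed = True
        while changed:
            changed = False
            for premise, conclusion in rules:
                if all(derived.get(p) == v for p, v in premise.items()):
                    for p, v in conclusion.items():
                        if derived.get(p) != v:
                            derived[p] = v
                            changed = True
        return derived

    print(apply_rules(facts, rules))  # {'load': 'high', 'interference': 'low', 'tx_power': 'increase'}

A fuller knowledge base would, as noted above, also allow rules to appear as premises or conclusions of other rules.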
  • the set of facts and rules represents a model of the environment in which the knowledge-possessing entity interacts, and a model of the entity itself including its capabilities, objectives, roles, functions and actions.
  • Knowledge K consists of facts and rules
  • Facts reflect a priori knowledge of the environment and the entity itself. They include, among others, the system state set S, the action set A of the entity itself, and the function set F.
  • a model of the external environment and the rules the environment obeys can be described and stored in the knowledge base.
  • An observation and transition model of the environment can also be described in terms of environment states and transitions between states, due to events caused by external entities or due to actions taken by the CE itself.
  • the environment model is based on a priori and/or learned knowledge and is represented by parameters or parameter functions.
  • Figure 25 illustrates a system according to the invention.
  • Sensing/monitoring: two of the main roles of the sensing/monitoring functional unit are to sense and monitor observable parameters and collect short-term and long-term statistics on parameter values and performance measurements (information observing operation), and to better describe the environment states, i.e., to uniquely identify the state of the environment and define it accurately and concisely (information processing operation).
  • the task of the information observing operation is to update the environment state description p so that it reflects the actual environment at any one time.
  • the information processing operation aims to learn the different states of the environment. This can be done in numerous ways, including classifying the parameter-value pair <p, x(p)> combinations observed in the system by means of, e.g., decision trees. Decision trees classify instances of p by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some parameter of p, and each branch descending from the node corresponds to one of the possible values for this parameter.
  • an instance of p is classified by starting at the root node of the tree, testing the parameter specified by this node, then moving down the tree branch corresponding to the value of the parameter. This process is repeated for the subtree rooted at the new node.
  • decision trees represent a disjunction of conjunctions on the parameter values of instances. Each path from the tree root to a leaf corresponds to a conjunction of parameter tests, and the tree itself to a disjunction of these conjunctions. The goal of a decision tree is to select the parameter that is most useful in classifying states. Parameter tests based on the measure of entropy can be used to characterise the (im)purity of an arbitrary collection of parameter p instances, as in the sketch below. The decision tree is only one example of a method for classifying states.
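The entropy-based parameter test mentioned above can be sketched as follows; the observation records and the state labels (congested, normal) are illustrative assumptions, and information gain is one common way such a test could be scored:

    import math
    from collections import Counter

    def entropy(labels):
        # Impurity of a collection of state labels
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(records, parameter):
        # Reduction in entropy of the state label when splitting on one parameter
        labels = [r["state"] for r in records]
        base = entropy(labels)
        remainder = 0.0
        for v in {r[parameter] for r in records}:
            subset = [r["state"] for r in records if r[parameter] == v]
            remainder += (len(subset) / len(records)) * entropy(subset)
        return base - remainder

    # Illustrative observations: each record is a set of <parameter, value> pairs plus the known state
    records = [
        {"load": "high", "interference": "low", "state": "congested"},
        {"load": "high", "interference": "high", "state": "congested"},
        {"load": "low", "interference": "low", "state": "normal"},
        {"load": "low", "interference": "high", "state": "normal"},
    ]
    # 'load' perfectly separates the two states here, so it gets the larger gain (1.0 versus 0.0)
    print(information_gain(records, "load"), information_gain(records, "interference"))

A decision-tree builder would place the highest-gain parameter at the root and recurse on each branch.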
  • Sensing/Monitoring FU contributes directly to the observation model and the transition model.
  • MCi monitoring-to-communication interface
  • MDi monitoring-to-decision interface
  • Figure 26 illustrates a system according to the invention. Configuration/Decision Making functions include:
  • Device dependent: o RRM at base station o Sensor element/gateway
  • Decision part (Di): o Config control parameters (set) o Configuration info (get)
  • Execution part (Xi): o Configuration of device
  • 3.5 Interaction Functional Unit
  • Figure 27 illustrates a system according to the invention.
  • Ci/Xi Cooperation-Communication / Execution interface
  • Figure 28 illustrates optimization functional unit.
  • Optimisation Functional Unit deals with an analysis part and a learning part.
  • the analysis/reasoning unit elaborates on the identification of relevant statistics, correlations and conditional probabilities between states, observations, actions and any combination thereof.
  • the learning unit tries to learn, from experience, patterns in the world model that can assist in predictions and optimal operation.
  • a cognitive SON engine CSONE is said to learn from experience E with respect to some class of tasks T and performance measure/metrics P, if its performance at tasks in T, as measured by P, improves with experience E.
  • a radio node that learns the configuration for mobility optimisation might improve its mobility performance, as measured by its ability to optimally configure mobility parameters, through experience obtained by configuring mobility with its neighbours.
  • a well-defined learning problem requires a well-defined task, performance metric and training experience.
  • Designing a learning approach involves a number of design choices, including choosing the type of training experience, the target (or objective) function to be learned, a representation for this function, and an algorithm for learning the target function from training examples.
  • learning involves searching through a space of possible hypotheses H to find the hypothesis h that best fits the available training examples D and other prior constraints or knowledge.
  • In terms of SON functionality, at any one time t, h would correspond to a state s, and D to the current set of observations o.
  • Much of the above optimisation and control functionality is performed by learning methods that search different hypothesis spaces (e.g., numerical functions, decision trees, neural networks, policies, rules) based on different conditions under which these search methods converge toward an optimal hypothesis.
  • Operation of optimisation control is performed by learning new facts and rules or by modifying existing rules to improve performance.
  • optimisation methods aim at learning a policy that maps any state of the system to an optimal set of actions according to the objectives of the optimising entity/function(s).
  • the optimising entity is able to efficiently learn:
    o all distinguishable states of the system and the differentiating parameters
    o an accurate model of the environment and the rules governing it for future predictions
    o all transitions between different system states
    o an optimal course of sequential and/or joint parallel actions to achieve control and operation optimisation
    o a set of rules that guarantees efficient and stable operation and fast convergence as the system state changes
  • the goal of the state optimisation is to identify the set of parameters that for a given objective (or set of objectives) concisely differentiates between any two states of the system.
  • Bayesian learning can be applied to identify the conditioning and the correlations between parameters indicative of a system state.
  • the output of the state optimisation is concise descriptions of system states where organisation patterns and operation patterns are uniquely identified.
  • An optimised solution is a solution where each state is described by a minimum number of parameter-value pairs.
  • Another objective of the state optimisation is that the facts and rules, i.e., the model, accurately render the environment at any one time. Updating the facts to reflect the state of the environment optimally requires
  • Radio Learning refers to the ability of radio nodes to learn from their environment and their interactions with other radio nodes.
  • Learning aims at identifying an optimal set of actions for which the radio node and the overall network perform best.
  • An action typically corresponds to value settings of configuration parameters/variables.
  • the performance of the system is evaluated by means of an objective function which corresponds to the total reward or payoff or utility.
  • the learning is performed by means of sophisticated trial and error searching among all possible parameter value combinations.
  • Q-Learning
  • RL can be used by a controller to estimate, based on past experience, the expected payoff/reward associated with its actions.
  • One particular implementation of RL is Q-learning.
  • Q-value, Q(s,a) is a value function that provides a numerical estimate of the value of performing an individual action a at a given state s of the environment.
  • the controller updates its estimate Q(s,a), based on sample (a, r) as follows:
  • the sample (a, r) is the experience obtained by the base station: action a was performed resulting in payoff/reward r.
  • The learning rate, taking values in (0, 1), governs to what extent the new sample replaces the current estimate. Assuming an infinite number of iterations, the algorithm converges to the optimal Q(a).
  • k may vary with the number of iterations so that the optimiser favours exploration in the early stages of learning, gradually shifting towards more exploitation.
  • transmit power (p) and antenna tilt (a) will be configured and optimised according to the traffic of a cell's area.
  • transmit power is assumed constant and the Q-values for different antenna tilt angles are learned.
  • Figure 30 illustrates an example of learning according to the invention.
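A minimal sketch of this antenna-tilt example, assuming a hypothetical KPI measurement measure_cell_reward(tilt); the candidate tilt values, reward shape and constants are placeholders rather than values from the patent:

    import math
    import random

    TILTS = [0, 2, 4, 6, 8]        # candidate antenna tilt angles (degrees); transmit power held constant
    ALPHA = 0.2                    # learning rate in (0, 1)
    Q = {t: 0.0 for t in TILTS}    # one Q-value per tilt action

    def measure_cell_reward(tilt):
        # Stand-in for a real KPI measurement; here a noisy peak around 4 degrees
        return -abs(tilt - 4) + random.uniform(-0.2, 0.2)

    def choose_tilt(k):
        # Boltzmann selection; a large temperature k favours exploration
        prefs = [math.exp(Q[t] / k) for t in TILTS]
        total = sum(prefs)
        r, acc = random.random() * total, 0.0
        for t, p in zip(TILTS, prefs):
            acc += p
            if r <= acc:
                return t
        return TILTS[-1]

    for iteration in range(1, 501):
        k = max(0.05, 2.0 / iteration)   # temperature decays: explore early, exploit later
        a = choose_tilt(k)
        r = measure_cell_reward(a)       # sample (a, r)
        Q[a] += ALPHA * (r - Q[a])       # move Q(a) towards the observed payoff

    print(max(Q, key=Q.get))             # expected to settle near the best tilt (4 in this toy model)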
  • Utilities are used to guide the selection of the optimal action, as described by the utility optimisation next.
  • a utility function evaluates the state of the environment. It maps a state to a scalar value indicating how good the state is. By comparing the scalars, we can compare how good different states are.
  • A non-aggregating function that is non-Pareto based, e.g., a user-defined ordering where the objectives are ranked according to the order of importance defined by the designer.
  • a negotiation strategy is a sequence of actions taken in a negotiation process consisting of offers, counter-offers, accept or quit.
  • Bayesian belief networks can be used as efficient updating mechanisms. Given domain knowledge in the form of conditional statements and a signal e in the form of offers, the offer recipient can use the standard Bayesian updating rule to revise its estimate of the desired outcome of the offerer, as in the sketch below.
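A minimal sketch of this Bayesian updating step, assuming hypothetical prior beliefs over the offerer's desired outcomes and an illustrative likelihood of offers given outcomes (none of these names or numbers come from the original text):

    def bayes_update(prior, likelihood, offer):
        # Posterior over the offerer's desired outcomes after observing one offer e
        unnorm = {h: prior[h] * likelihood[(offer, h)] for h in prior}
        z = sum(unnorm.values())
        return {h: v / z for h, v in unnorm.items()}

    # Prior belief of the offer recipient about what the offerer really wants
    prior = {"coverage": 0.5, "capacity": 0.5}
    # Likelihood of seeing a given offer if the offerer's true goal is each outcome
    likelihood = {
        ("raise_power", "coverage"): 0.8, ("raise_power", "capacity"): 0.3,
        ("add_carrier", "coverage"): 0.2, ("add_carrier", "capacity"): 0.7,
    }
    print(bayes_update(prior, likelihood, "raise_power"))
    # Posterior shifts towards 'coverage': roughly {'coverage': 0.73, 'capacity': 0.27}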
  • N cells implementing control with full information sharing and simultaneous actions.
  • KPI Key Performance Indicators
  • the utility function enables the comparison of different sets of KPIs (different states), as in the sketch below.
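As an illustration, the sketch below uses an assumed weighted-sum utility over invented KPI names (throughput_mbps, coverage_ratio, drop_rate); this is just one possible aggregating form, not a form prescribed by the patent:

    # Weighted-sum utility: weights encode the operator's objectives (illustrative values)
    WEIGHTS = {"throughput_mbps": 0.5, "coverage_ratio": 0.3, "drop_rate": -0.2}

    def utility(kpis):
        # Map a KPI set (a state description) to a scalar; higher is better
        return sum(WEIGHTS[k] * kpis[k] for k in WEIGHTS)

    state_a = {"throughput_mbps": 40.0, "coverage_ratio": 0.95, "drop_rate": 2.0}
    state_b = {"throughput_mbps": 55.0, "coverage_ratio": 0.90, "drop_rate": 5.0}
    # The utility enables a direct comparison of the two KPI sets (states)
    print(utility(state_a), utility(state_b), utility(state_b) > utility(state_a))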
  • All CSONE-equipped nodes communicate via their interaction units. Interactions with non-CSONE nodes are performed in their entirety by means of the execution unit. Interactions between CSONE nodes can be performed by means of the communication/cooperation unit.
  • Functions refer only to control and information, e.g., RRM functions, etc.
  • The CSONE centralised architecture facilitates centralised control performed by a central entity, e.g., O&M, that may operate in the following way:
  • the model maintained by a central entity as envisaged above induces full knowledge of the world and of the nodes the central entity monitors, controls, interacts with and optimises.
  • A deployment of the above architecture consisting only of CSONE entities is illustrated in Figure 32.
  • The CSONE distributed architecture facilitates distributed control performed by CSONE nodes, each one:
  • the model maintained by each entity implies partial knowledge of the world pertinent to the local environment of the entity, i.e., the entity itself and the neighbours within reach.
  • Working towards full knowledge requires information exchange by means of observations, state descriptions and statistics, action selection and evaluation and interactions.
  • Figure Hybrid 1 (Fig. 35): CSONE hybrid architecture with central coordination
  • Figure Hybrid 2 (Fig. 36): CSONE hybrid architecture with distributed coordination
  • The cognitive SON hybrid architecture (as illustrated in the figures above) allows many possible options, e.g., central coordination (Fig. Hybrid 1) or distributed coordination (Fig. Hybrid 2).
  • Hierarchical structures of central and distributed coordination at each level of the hierarchy facilitate a hierarchical structure of control that combines centralised control or distributed control at any level of the hierarchy and in any order.
  • With central control at the root of the hierarchy, the architecture is said to perform central coordination control as in Fig. Hybrid 1; in the case of distributed control, it is said to perform distributed coordination control as in Fig. Hybrid 2.
  • Models at higher levels of the hierarchy are closer to the management operation, and models maintained at lower levels of abstraction are closer to the functional operation of networks or node functions.
  • a deployment of the above architecture consisting only of CSONE entities is illustrated in figure 37.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a method for controlling a telecommunication network, the network comprising at least one device arranged for interaction concerning network configuration parameters. The invention also relates to examples of systems for network learning and optimisation during runtime, facilitating adaptation to a system state.
EP11794062.7A 2010-12-03 2011-11-22 Procédé et appareil de communication Withdrawn EP2647239A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
SE2010000285 2010-12-03
SE2010000287 2010-12-06
PCT/EP2011/070631 WO2012072445A1 (fr) 2010-12-03 2011-11-22 Procédé et appareil de communication

Publications (1)

Publication Number Publication Date
EP2647239A1 true EP2647239A1 (fr) 2013-10-09

Family

ID=45315737

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11794062.7A Withdrawn EP2647239A1 (fr) 2010-12-03 2011-11-22 Procédé et appareil de communication

Country Status (3)

Country Link
EP (1) EP2647239A1 (fr)
CN (1) CN103548375A (fr)
WO (1) WO2012072445A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11800398B2 (en) 2021-10-27 2023-10-24 T-Mobile Usa, Inc. Predicting an attribute of an immature wireless telecommunication network, such as a 5G network

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110090820A1 (en) 2009-10-16 2011-04-21 Osama Hussein Self-optimizing wireless network
EP2754271B1 (fr) 2011-09-09 2019-11-13 Reverb Networks Inc. Procédés et appareil pour mettre en oeuvre un gestionnaire de réseaux à optimisation-organisation automatique
US9258719B2 (en) 2011-11-08 2016-02-09 Viavi Solutions Inc. Methods and apparatus for partitioning wireless network cells into time-based clusters
WO2013123162A1 (fr) * 2012-02-17 2013-08-22 ReVerb Networks, Inc. Procédés et appareil de coordination dans des réseaux à plusieurs modes
US20140120930A1 (en) * 2012-10-31 2014-05-01 Nokia Siemens Networks Oy Method, Apparatus, Computer Program Product and System for Communicating Predictions
CN103068058B (zh) * 2012-12-24 2015-08-26 中国人民解放军总参谋部第六十一研究所 一种基于双层环路模型的无线资源调度方法
EP2750432A1 (fr) * 2012-12-28 2014-07-02 Telefónica, S.A. Procédé et système permettant de prédire l'utilisation d'un canal
EP3001887B1 (fr) 2013-03-25 2023-03-08 DZS Inc. Procédé et appareil pour mettre en ouvre une découverte et une commande de système sans fil utilisant un espace d'états
EP2986048B1 (fr) * 2013-05-02 2020-02-26 Huawei Technologies Co., Ltd. Appareil, dispositif et procédé d'optimisation de réseau
US10412601B2 (en) 2013-06-13 2019-09-10 Nokia Solutions And Networks Oy Coordination in self-organizing networks
CN103442368B (zh) * 2013-09-09 2016-03-30 哈尔滨工业大学 认知无线系统中基于潜在博弈的频谱分配方法
GB2524583B (en) * 2014-03-28 2017-08-09 Kaizen Reaux-Savonte Corey System, architecture and methods for an intelligent, self-aware and context-aware digital organism-based telecommunication system
CN105532031B (zh) * 2014-06-05 2019-12-17 华为技术有限公司 资源优化的方法和装置
WO2016026509A1 (fr) * 2014-08-18 2016-02-25 Telefonaktiebolaget L M Ericsson (Publ) Technique pour la gestion de regles pour le fonctionnement d'un reseau a auto-organisation
US9456362B2 (en) 2015-01-19 2016-09-27 Viavi Solutions Uk Limited Techniques for dynamic network optimization using geolocation and network modeling
US9113353B1 (en) 2015-02-27 2015-08-18 ReVerb Networks, Inc. Methods and apparatus for improving coverage and capacity in a wireless network
US9392471B1 (en) * 2015-07-24 2016-07-12 Viavi Solutions Uk Limited Self-optimizing network (SON) system for mobile networks
CN105391490B (zh) * 2015-10-20 2019-02-05 中国人民解放军理工大学 一种基于认知的卫星通信网络选择算法
US20170255863A1 (en) * 2016-03-04 2017-09-07 Supported Intelligence, LLC System and method of network optimization
WO2018068857A1 (fr) 2016-10-13 2018-04-19 Huawei Technologies Co., Ltd. Procédé et unité de gestion de ressources radio utilisant un apprentissage de renforcement
CN107425997B (zh) * 2017-03-27 2019-08-06 烽火通信科技股份有限公司 类人网的网络架构及实现方法
CN110770761B (zh) * 2017-07-06 2022-07-22 华为技术有限公司 深度学习系统和方法以及使用深度学习的无线网络优化
US10375585B2 (en) 2017-07-06 2019-08-06 Futurwei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning
CN109308246A (zh) * 2017-07-27 2019-02-05 阿里巴巴集团控股有限公司 系统参数的优化方法、装置及设备、可读介质
CN107948984B (zh) * 2017-11-13 2021-07-09 中国电子科技集团公司第三十研究所 一种适用于自组织网络的基于主被动感知结合的认知系统
CN111050330B (zh) * 2018-10-12 2023-04-28 中兴通讯股份有限公司 移动网络自优化方法、系统、终端及计算机可读存储介质
CN111835545B (zh) * 2019-04-22 2023-03-24 中兴通讯股份有限公司 一种网络的自适应配置方法和装置
CN112188505B (zh) * 2019-07-02 2024-05-10 中兴通讯股份有限公司 一种网络优化方法和装置
WO2021190772A1 (fr) 2020-03-27 2021-09-30 Telefonaktiebolaget Lm Ericsson (Publ) Politique d'optimisation de paramètres de cellule
WO2021213644A1 (fr) * 2020-04-22 2021-10-28 Nokia Technologies Oy Mécanisme de coordination et de commande pour résolution de conflit pour des fonctions d'automatisation de réseau
WO2021244765A1 (fr) 2020-06-03 2021-12-09 Telefonaktiebolaget Lm Ericsson (Publ) Amélioration de l'utilisation d'un réseau de communication
EP3944562A3 (fr) * 2020-07-24 2022-03-23 Nokia Technologies Oy Procédés et appareils pour déterminer la configuration optimale dans des réseaux autonomes cognitifs
FI20205781A1 (en) 2020-08-04 2022-02-05 Nokia Technologies Oy MACHINE LEARNING BASED ANTENNA PANEL WIRING
CN112039767B (zh) * 2020-08-11 2021-08-31 山东大学 基于强化学习的多数据中心节能路由方法及系统
EP4252470A1 (fr) * 2020-11-24 2023-10-04 Telefonaktiebolaget LM Ericsson (publ) Paramètre de réseau pour réseau cellulaire basé sur la sécurité
WO2022123292A1 (fr) * 2020-12-09 2022-06-16 Telefonaktiebolaget Lm Ericsson (Publ) Apprentissage de renforcement coordonné décentralisé pour optimiser des réseaux d'accès radio
US20240119300A1 (en) * 2021-02-05 2024-04-11 Telefonaktiebolaget Lm Ericsson (Publ) Configuring a reinforcement learning agent based on relative feature contribution
WO2023022679A1 (fr) * 2021-08-14 2023-02-23 Telefonaktiebolaget Lm Ericsson (Publ) Assurance de qualité de service 5g industrielle par mise en correspondance de processus de décision markovien
WO2023031098A1 (fr) * 2021-09-02 2023-03-09 Nokia Solutions And Networks Oy Dispositifs et procédés de génération d'a priori
WO2023138776A1 (fr) * 2022-01-21 2023-07-27 Huawei Technologies Co., Ltd. Appareil et procédé d'apprentissage distribué pour réseaux de communication
FR3140729A1 (fr) * 2022-10-11 2024-04-12 Commissariat A L'energie Atomique Et Aux Energies Alternatives Méthode de gestion de ressources radio dans un réseau cellulaire au moyen d’une cartographie hybride de caractéristiques radio

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829491B1 (en) 2001-08-15 2004-12-07 Kathrein-Werke Kg Dynamic and self-optimizing smart network
US7466652B2 (en) * 2003-08-14 2008-12-16 Telcordia Technologies, Inc. Auto-IP traffic optimization in mobile telecommunications systems
JP5029025B2 (ja) * 2007-01-18 2012-09-19 日本電気株式会社 無線基地局装置および無線リソース管理方法
CN101488880B (zh) * 2008-01-16 2012-03-14 北京航空航天大学 一种提高服务组合可信性的自适应维护方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BATTITI R: "USING MUTUAL INFORMATION FOR SELECTING FEATURES IN SUPERVISED NEURAL NET LEARNING", IEEE TRANSACTIONS ON NEURAL NETWORKS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 5, no. 4, 1 July 1994 (1994-07-01), pages 537 - 550, XP000460492, ISSN: 1045-9227, DOI: 10.1109/72.298224 *
See also references of WO2012072445A1 *

Also Published As

Publication number Publication date
CN103548375A (zh) 2014-01-29
WO2012072445A1 (fr) 2012-06-07

Similar Documents

Publication Publication Date Title
EP2647239A1 (fr) Procédé et appareil de communication
Morocho-Cayamcela et al. Machine learning for 5G/B5G mobile and wireless communications: Potential, limitations, and future directions
Szott et al. Wi-Fi meets ML: A survey on improving IEEE 802.11 performance with machine learning
Wang et al. Artificial intelligence-based techniques for emerging heterogeneous network: State of the arts, opportunities, and challenges
Pasandi et al. Challenges and limitations in automating the design of mac protocols using machine-learning
Fourati et al. Comprehensive survey on self-organizing cellular network approaches applied to 5G networks
Kaloxylos et al. AI and ML–Enablers for beyond 5G Networks
Matinmikko et al. Fuzzy-logic based framework for spectrum availability assessment in cognitive radio systems
Karunaratne et al. An overview of machine learning approaches in wireless mesh networks
Ashtari et al. Knowledge-defined networking: Applications, challenges and future work
Abbasi et al. Deep Reinforcement Learning for QoS provisioning at the MAC layer: A Survey
Cheng et al. Deep learning for wireless networking: The next frontier
Caso et al. User-centric radio access technology selection: A survey of game theory models and multi-agent learning algorithms
Rojas et al. A scalable SON coordination framework for 5G
Zheng et al. An adaptive backoff selection scheme based on Q-learning for CSMA/CA
Meshkova et al. Designing a self-optimization system for cognitive wireless home networks
Flushing et al. Relay node placement for performance enhancement with uncertain demand: A robust optimization approach
Arnous et al. ILFCS: an intelligent learning fuzzy-based channel selection framework for cognitive radio networks
Burgueño et al. Distributed deep reinforcement learning resource allocation scheme for industry 4.0 device-to-device scenarios
Pandey Adaptive Learning For Mobile Network Management
Höyhtyä et al. Cognitive engine: design aspects for mobile clouds
Galindo-Serrano et al. Managing femto to macro interference without X2 interface support through POMDP
Sanusi Radio resource management techniques for industrial IoT in 5G-and beyond networks
Jia et al. Digital Twin Enabled Intelligent Network Orchestration for 6G: A Dual-Layered Approach
Wang et al. Cognitive networks and its layered cognitive architecture

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20130530

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: HUAWEI TECHNOLOGIES SWEDEN AB

17Q First examination report despatched

Effective date: 20170503

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20170914