WO2022182272A1 - Method for communications network analysis using trained machine learning models and network topography information - Google Patents


Info

Publication number
WO2022182272A1
Authority
WO
WIPO (PCT)
Application number
PCT/SE2021/050158
Other languages
French (fr)
Inventor
Pedro BATISTA
Marios DAOUTIS
Konstantinos Vandikas
Alessandro Previti
Yifei JIN
Aneta VULGARAKIS FELJAN
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/04 Arrangements for maintaining operational condition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition

Definitions

  • the target ML model can then be used to predict values of one or more KPIs of the target communications network, as shown in step S103.
  • This step requires knowledge of the load values input into the target network, but does not require measurements from any components within the network.
  • the inputs into the communication network (load values) are entered into the target ML model, and the model processes these inputs to predict one or more KPI values (latency, throughput, and so on; the KPIs of interest are typically dependent on the function of the target network).
  • the one or more KPIs may be predicted in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23.
  • the one or more KPIs may be predicted by the predictor 26.
  • A further example of a target ML model is illustrated in Figure 4; this target ML model is slightly more complex than the target ML model shown in Figure 3.
  • the predicted output o_3 at router 3 is obtained by propagating the input load values l_1 and l_2, received as input at router 1 and router 2 respectively, through the target ML model.
  • the component ML models of the routers are obtained as discussed above, while the connections between the inputs and outputs of the routers are obtained from the network topography.
  • the network topography also indicates that the latency resulting from the link between the routers is 5ms.
  • the input load values l_1 and l_2 received as input at router 1 and router 2 are 300Mbps and 200Mbps respectively.
  • the input load l_3 at router 3 is 500Mbps (given by o_1 + o_2), and the latency added by router 3 is 2ms.
  • the target ML model therefore predicts a total end-to-end latency of 7ms (5ms link latency + 2ms added by router 3), as reproduced in the sketch below.
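  • As an illustration of how the combined model propagates loads and accumulates latency, the following Python sketch reproduces the Figure 4 numbers. The router_model function is a hypothetical stand-in for a trained component ML model R (assumed, loosely following the Table 1 pattern, to add 2ms of latency once the load reaches 400Mbps and to add none below that); a real implementation would call the trained model instead.

```python
def router_model(load_mbps: float) -> tuple[float, float]:
    """Hypothetical stand-in for a trained component model R.

    Returns (output load in Mbps, added latency in ms); assumes no packets are
    dropped and that 2 ms of latency is added once the load reaches 400 Mbps.
    """
    return load_mbps, (2.0 if load_mbps >= 400 else 0.0)


def predict_end_to_end_latency(l1: float, l2: float, link_latency_ms: float = 5.0) -> float:
    """Wire the component models together according to the Figure 4 topography."""
    o1, a1 = router_model(l1)      # router 1
    o2, a2 = router_model(l2)      # router 2
    l3 = o1 + o2                   # topography: both outputs feed router 3
    o3, a3 = router_model(l3)      # router 3
    return a1 + a2 + link_latency_ms + a3


print(predict_end_to_end_latency(300, 200))  # 7.0 ms, matching the example above
```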
  • the KPI values at network endpoints are then monitored, as shown in step S104. Again, only measurements at the endpoints of the target network are required; no measurements are required on a component by component level.
  • the frequency with which measurements are obtained is determined by the requirements of the communication network, which may be determined, for example, by QoS guarantees to users of the network.
  • the one or more KPIs may be monitored at network endpoints in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23.
  • the one or more KPIs may be monitored at network endpoints by the monitor 27.
  • the measurements are received from the endpoints at the network analyser via the communications network, although another transmission means may also be used.
  • the one or more monitored KPI values may then be compared with the corresponding predicted KPI values from the target ML network (that is, the predicted KPI values for the same KPI as the monitored KPI values), so that deviations of one or more of the monitored KPI values from the corresponding predicted KPI values can be detected (as shown in step S105).
  • a deviation may be detected where predicted and measured KPI values differ by more than a predetermined difference threshold; as will be understood by those skilled in the art the thresholds are determined on a KPI and network specific basis. Where no deviations between monitored and predicted KPI values are detected, the communications network is then understood to be operating correctly.
  • the monitoring of KPI values continues, as does the prediction of KPI values, so that any issues with the network that arise can be detected.
  • candidate components for the cause of the deviation are then identified (as shown in step S106).
  • the deviation of one or more KPIs may be detected in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23.
  • the deviation of one or more KPIs may be detected by the detector 28.
  • the monitored KPI value (here, latency) measured at router 3 may then be 8ms.
  • the predicted KPI value from the target ML model should be 7ms.
  • a deviation in this KPI value may be detected.
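  • A minimal sketch of this comparison step, with illustrative values and an assumed 0.5ms threshold (as noted above, thresholds are determined on a KPI- and network-specific basis):

```python
def deviates(monitored_ms: float, predicted_ms: float, threshold_ms: float = 0.5) -> bool:
    """Flag a deviation when monitored and predicted KPI values differ by more than the threshold."""
    return abs(monitored_ms - predicted_ms) > threshold_ms


print(deviates(monitored_ms=8.0, predicted_ms=7.0))  # True -> trigger candidate identification
```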
  • candidates may be identified using Model Based Diagnostics (MBD): the target ML model is encoded into a logical representation, which can then be used as the subject of a formal analysis of the network behaviour.
  • the logical representation may take the form of a constraint satisfaction problem.
  • f(v_i, f(v_{i-1})) is a function of the current component v_i and the previous components in the ordered set; the function is recursive, so f(v_{i-1}) is itself a function of f(v_{i-2}), and so on.
  • ab(·) is a logical predicate which intuitively models abnormal behaviour; that is, ¬ab(v_i) is true whenever v_i is not behaving abnormally.
  • in this example, the KPI under consideration is latency.
  • f(v_i, f(v_{i-1})) is a function describing the expected output from component v_i; that is, given an input, it gives the corresponding expected output.
  • the input to a component is typically a function of the outputs of more than one other component (as shown in Figure 4, where the input into router 3 is a function of the outputs of router 1 and router 2); this information is provided by the network topography as encapsulated in the target ML model.
  • a relationship between a component v_i and a set of components U is formally defined as a function f(v_i, f(U)), where the output of the components in U is the input of v_i. Accordingly, the following logical equation holds when neither component v_i nor any of the components in U is behaving abnormally.
  • the logical representation can be used to identify a set of candidates (or candidate components) that may explain the inconsistency.
  • Candidate components are those for which, if ab(v_i) is set to true, consistency is restored. The fact that they are behaving abnormally is then consistent with the observation. An explanation of the deviation in the monitored KPI is therefore provided by minimizing the number of ab(·) predicates assumed to be true. More formally, given a set N of predicates ab(v_i) initially set to false, we seek a minimal set of predicates M ⊆ N such that, once the predicates in M are set to true, the set of constraints encoding the network becomes feasible.
  • Finding M can be done using any suitable diagnostic technique, as will be familiar to those skilled in the art; an example of a suitable technique is HS-tree modelling, as also discussed in “A Theory of Diagnosis from First Principles” (as cited above). A simplified search is sketched below.
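  • The following sketch shows, under simplifying assumptions, how the encoding above can be searched for minimal diagnoses. It uses a brute-force enumeration of candidate sets rather than an HS-tree, and assumes that an abnormal component may add any non-negative extra latency; the nominal contributions and the observed value follow the Figure 4 example.

```python
from itertools import combinations

# Nominal per-component added latency, as predicted by the component ML models
# for the Figure 4 loads (routers 1 and 2 add none, router 3 adds 2 ms).
NOMINAL_ADDED_LATENCY = {"router1": 0.0, "router2": 0.0, "router3": 2.0}
LINK_LATENCY_MS = 5.0


def consistent(abnormal: frozenset, observed_ms: float) -> bool:
    """Check whether the constraints are satisfiable under the given ab(.) assumptions.

    Components assumed normal contribute their nominal latency; components
    assumed abnormal may (by assumption) contribute any non-negative latency.
    """
    expected = LINK_LATENCY_MS + sum(
        latency for name, latency in NOMINAL_ADDED_LATENCY.items() if name not in abnormal
    )
    residual = observed_ms - expected
    if not abnormal:
        return abs(residual) < 1e-9   # everything normal: prediction must match exactly
    return residual >= -1e-9          # abnormal components can absorb the excess


def minimal_diagnoses(observed_ms: float):
    """Brute-force search for the smallest sets of ab(.) predicates restoring consistency."""
    components = list(NOMINAL_ADDED_LATENCY)
    for size in range(len(components) + 1):
        candidates = [
            frozenset(subset)
            for subset in combinations(components, size)
            if consistent(frozenset(subset), observed_ms)
        ]
        if candidates:
            return candidates
    return []


print(minimal_diagnoses(8.0))  # three single-component diagnoses
```
  • Running the sketch on the 8ms observation returns the three single-component candidate sets, since any one router behaving abnormally is sufficient to explain the extra 1ms.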
  • the identification of candidate components may be performed in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23.
  • the identification of candidate components may be performed by the identifier 29.
  • candidate components may be outputted.
  • the outputted candidate components may then be reviewed for maintenance, replaced, and so on. In this way, the cause of deviation between monitored and expected KPIs may be addressed.
  • candidate components having a likelihood above a predetermined likelihood threshold of being the cause of deviation of the monitored KPI value from the corresponding predicted KPI value are output.
  • the predetermined likelihood may take into account the past maintenance history of components, whether the candidate component is present in several different component combinations (as discussed above), and so on. Any useful available information may be taken into account when determining the likelihood of a given candidate component being responsible for the deviation.
  • a predetermined number of component candidates may be output, in order of likelihood starting with the most likely candidate for the cause of the deviation. Outputting a predetermined number of candidates may be of particular use when seeking to perform maintenance on a communications network based on the output candidates.
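  • A small sketch of this selection step (the likelihood scores are illustrative placeholders; in practice they might be derived from maintenance history or from how often a component appears across candidate sets):

```python
def select_candidates(scores, threshold=None, top_k=None):
    """Rank candidate components by likelihood, then filter by threshold and/or top-k."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, score) for name, score in ranked if score >= threshold]
    return ranked[:top_k] if top_k is not None else ranked


print(select_candidates({"router1": 0.2, "router2": 0.1, "router3": 0.7}, threshold=0.5))
```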
  • an alternative method may be used to identify components within a communications network that are candidates for the cause of deviation in a monitored KPI value.
  • a Graph Neural Network (GNN) is used as the target ML model, and to identify the candidates.
  • a GNN uses deep learning methods to make predictions based on a graph that consists of a number of nodes connected by a number of edges.
  • the component ML models (obtained as discussed above) are used as the nodes of the GNN, with the edges of the GNN indicating the connections between the components, as obtained from the network topography.
  • the target ML model is therefore a unified model that is created by combining the trained ML models of the network components, connected according to the network topography.
  • examples of suitable GNN architectures include Graph Convolution Networks (GCN), Simple Graph Convolution (SGC) networks, and Graph Sampling and Aggregation (SAGE) networks.
  • the unified model shown in Figure 5 comprises five network nodes, identified as nodes m_1 to m_5.
  • Each of the nodes also has a classification Y, which indicates whether the node is operating correctly (that is, behaving normally). The possible values of Y are true (behaving normally, shown on Figure 5 as “OK”) and false (not behaving normally, shown on Figure 5 as “NOT OK”).
  • the unified model is trained using supervised learning.
  • the dataset used for training comprises examples of components represented by their features (the input and output of each component ML model), each labelled to indicate whether or not the component is faulty.
  • Binary Cross Entropy (BCE) may be used to provide the binary component classification (that is, faulty or not faulty).
  • the unified model may process measurements of data input to and output from the network at network endpoints, together with the network topography, to predict the status of the components within the network (faulty or not faulty), and thereby identify which components are likely candidates for the cause of a deviation in KPI values, that is, a deviation of monitored KPI values from expected KPI values. Once identified, all or a selection of the candidates may then be output as discussed above (see the sketch below).
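  • A sketch of such a unified model, assuming PyTorch and PyTorch Geometric are available, is given below. The five-node graph, its edges, the node features and the labels are illustrative placeholders; in practice the node features would be the inputs and outputs associated with the component ML models and the edges would follow the network topography.

```python
import torch
from torch_geometric.nn import GCNConv


class ComponentStatusGNN(torch.nn.Module):
    def __init__(self, in_features: int, hidden: int = 16):
        super().__init__()
        self.conv1 = GCNConv(in_features, hidden)
        self.conv2 = GCNConv(hidden, 1)          # one logit per node: faulty or not

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index).squeeze(-1)


# Toy graph: 5 components (m1..m5); edges and features are assumed placeholders.
x = torch.randn(5, 4)                                    # per-node features (e.g. loads)
edge_index = torch.tensor([[0, 1, 2, 3], [2, 2, 3, 4]])  # m1->m3, m2->m3, m3->m4, m4->m5
y = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])              # labels: 1 = "NOT OK"

model = ComponentStatusGNN(in_features=4)
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = torch.nn.BCEWithLogitsLoss()                   # binary cross entropy on the logits
for _ in range(100):
    optimiser.zero_grad()
    loss = loss_fn(model(x, edge_index), y)
    loss.backward()
    optimiser.step()

print(torch.sigmoid(model(x, edge_index)))  # per-component probability of being faulty
```
  • The design choice here is standard supervised node classification: one logit per component trained with binary cross entropy, so that at inference time the sigmoid of each logit can be read as the probability that the corresponding component is faulty.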
  • Figure 6A and Figure 6B are two halves of a scheduling diagram showing an overview of steps that may be performed in accordance with an aspect of an embodiment.
  • Some or all of the modules shown in Figure 6, that is, the network orchestrator, learning module, modelling module and reasoning module may form part of a network analyser, for example, network analyser 20A or network analyser 20B.
  • the component ML models are obtained using data from a training network.
  • Figure 6 divides the steps taken into three groups: learning, modelling and reasoning. Steps 1 and 2 (see Figure 6A) are in the learning group, steps 3 to 5 (see Figure 6A) are in the modelling group and steps 6 to 8 (see Figure 6B) are in the reasoning group.
  • in step 1, measurements for each of the types of component in the target network are taken from the training network; these measurements are used in step 2 to obtain the component ML models.
  • in step 3, the network topography of the target network is obtained, and in step 4 the component ML models of the components in the target network are obtained.
  • a target ML model is then generated (in step 5) using this information.
  • predictions for expected KPI values are computed in step 6, and through collecting measurements at network endpoints the corresponding observed KPI values are obtained from the target network in step 7. It is then determined if the monitored KPI values deviate from the predicted KPI values; if there is no deviation, then the monitoring/predicting may continue. By contrast, where deviation is detected (as discussed above), logical reasoning is performed in step 8 to determine the components within the network that are candidates for the cause of the deviation.
  • the network analysis methods and apparatuses in accordance with aspects of embodiments may identify unexpected behaviour of a component in a target communications network without requiring measurements of performance on an individual component level, through the use of ML modelling of both network components and subsequently of the target communications network. Candidates for the cause of anomalous network behaviour may be identified using model based diagnosis methods. Accordingly, the load on networks that may be imposed by component and/or slice level measurements throughout the network may be reduced, and the human labour required to supervise network operations may also be reduced.
  • the network analysis methods and apparatuses are applicable to a broad range of networks, but are particularly well suited to telecommunications and IoT networks; telecommunications and IoT networks tend to be complex and involve a large number of components, so the efficiencies provided by methods and analysis in accordance with aspects of embodiments can be particularly beneficial.
  • examples of the present disclosure may be virtualised, such that the methods and processes described herein may be run in a cloud environment.
  • the methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein.
  • a computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
  • the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto. While various aspects of the exemplary embodiments of this disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the exemplary embodiments of the disclosure may be practiced in various components such as integrated circuit chips and modules. It should thus be appreciated that the exemplary embodiments of this disclosure may be realized in an apparatus that is embodied as an integrated circuit, where the integrated circuit may comprise circuitry (as well as possibly firmware) for embodying at least one or more of a data processor, a digital signal processor, baseband circuitry and radio frequency circuitry that are configurable so as to operate in accordance with the exemplary embodiments of this disclosure.
  • exemplary embodiments of the disclosure may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device.
  • the computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc.
  • the function of the program modules may be combined or distributed as desired in various embodiments.
  • the function may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like.

Abstract

A method for communications network analysis comprises obtaining one or more trained machine learning (ML) models of one or more types of components of the communications network, and using the obtained trained ML models and network topography information to generate a target ML model that represents a network topography of the communications network. Values of one or more Key Performance Indicators (KPIs) of the communications network are predicted using the generated target ML model, and values of one or more KPIs of the communications network are monitored at network endpoints. The predicted and monitored KPI values are used to detect when one or more of the monitored KPI values deviate from the corresponding predicted KPI values. When it is detected that a monitored KPI value from among the one or more monitored KPI values deviates from the corresponding predicted KPI value, components within the communications network that are candidates for the cause of the deviation of the monitored KPI value are identified.

Description

Method for communications network analysis using trained machine learning models and network topography information
Technical Field
Embodiments described herein relate to methods and apparatus for communications network analysis, in particular methods and apparatus applying machine learning and machine reasoning techniques to communications network analysis.
Background
In order to oversee the operation of a complex communications network, such as a mobile telecommunications network that may be found in a metropolitan area or an Internet of Things (IoT) network comprising a large number of IoT devices (such as a remote sensor network), it is necessary to analyse the operation of thousands of nodes. Existing systems for monitoring such a network typically rely on low-sampled (samples every few seconds), high-level aggregated data (for example, port-level packet counters) of nodes. The low-sampled, high-level aggregated data may be transferred to a central location for analysis. The collection and analysis of data in this way can be resource intensive; however, given sufficient analysis, the data typically allows for some level of anomaly detection and root cause analysis of key performance indicator (KPI) degradation.
Existing analysis systems (for example, see “Adaptive fault detection approaches for dynamic mobile networks” by Liu, D. and Payton, J., 2011 IEEE CCNC, available at https://ieeexplore.ieee.org/document/5766588 as of 18 February 2021) are not compatible with the increasing need for network automation in 3rd Generation Partnership Project (3GPP) 5th Generation (5G) technology, particularly where network slicing is implemented. Network slices are a category of logical network, which provide specific network capabilities and network characteristics. Logical networks are a form of Software Defined Networks (SDN) and may also be referred to as virtual networks. SDNs essentially decouple the network control functions (the control plane) from the data forwarding functions (the data plane), introducing a degree of separation between control of the physical components forming the network infrastructure (nodes, cables, etc.) and the overall network control. In SDN, data transfer services can be used to provide a user with a data connection between two points, without requiring the user to have detailed knowledge of exactly which components of the network are responsible for providing the connection. As such, a data transfer service can be used to satisfy the data traffic requirements of a user, such as transferring a given volume of data traffic between two points at a given rate, with a given reliability, and so on. Using network slicing allows different applications with different requirements to co-exist while keeping their respective Quality of Service (QoS) guarantees; the QoS guarantees may contain threshold standards relating to latency, throughput, reliability, security, and so on, where penalties may be imposed upon an infrastructure provider that fails to satisfy an agreed QoS guarantee with a Mobile Virtual Network Operator, MVNO. Different QoS guarantees may be in place with different operators, as agreed in Service Level Agreements (SLAs) with the operators. Network slicing technology creates multiple logical networks on top of the traditionally monitored nodes.
Existing network analysis systems typically rely on statistical analysis to compare samples of performance measurements of a given node behaving well against measurements of interest (that is, live data from the given node); in this way, anomalous performance of the given node may be detected. An intrinsic problem with existing network analysis systems is the ongoing requirement for direct measurements of nodes, which is expensive even for networks without network slicing. Where network slicing is in use the problem is compounded, as measurements are typically required for each node and each slice to perform a full analysis. Even where measurements on a per node and per slice basis are available, increasing the amount of monitoring data consumes expensive communication resources that could be used to deliver services to users. Further, allowing issues to be swiftly detected and corrected, so as to ensure high network availability and zero-touch operation (in which a network operator specifies an intent for the network, then leaves the network under computer control with no further network operator control/supervision), requires acquiring node and slice data at a higher sampling rate than may have been used in previous systems. This higher sampling rate may allow network controllers to react quickly or proactively to avoid KPI degradation, but has the unavoidable consequence of further increasing the consumption of communication resources.
Summary
It is an object of the present disclosure to provide a method, apparatus and computer readable medium which at least partially address one or more of the challenges discussed above. In particular, it is an object of the present disclosure to provide network analysis methods and apparatuses that may identify unexpected behaviour of a component in a communications network without requiring measurements of performance on an individual component level. According to an aspect of an embodiment there is provided a method for network analysis of a communications network. The method comprises obtaining one or more trained machine learning (ML) models of one or more types of components of the communications network, and generating a target ML model that represents a network topography of the communications network, wherein the generation of the target ML model uses network topography information and the trained ML models of types of components. The method further comprises predicting values of one or more Key Performance Indicators, KPIs, of the communications network using the generated target ML model, monitoring values of one or more KPIs of the communications network at network endpoints, and detecting when one or more of the monitored KPI values deviate from the corresponding predicted KPI values. When it is detected that a monitored KPI value from among the one or more monitored KPI values deviates from the corresponding predicted KPI value, the method further comprises identifying components within the communications network that are candidates for the cause of the deviation of the monitored KPI value, the identification being made using the generated target ML model and the monitored values of one or more KPIs at the network endpoints. Through creation of a target ML model using trained ML models of components and the monitoring of KPI values at network endpoints, the method allows for the identification of potentially defective components without requiring measurements to be taken on a per component level.
The target ML model may be an arrangement of interconnected ML models of components connected to replicate the network topography, and the identification of components within the communications network that are candidates for the cause of the deviation of the monitored KPI values from the predicted KPI values may comprise using model based diagnostics, wherein the target model is encoded as a constraint satisfaction problem. In this way components that may be the cause of deviation may be effectively identified using logical reasoning techniques.
One or more component candidates for the cause of deviation of the monitored KPI value from the corresponding predicted KPI value may be output, wherein the candidates selected for output may have a likelihood above a predetermined likelihood threshold of being the cause of the deviation, and/or may be among a predetermined number of candidates selected for output in order of likelihood of being the cause of the deviation. Outputting the candidates in this way may be of particular use where it is desired to undertake maintenance on the network to rectify any potential problems. The communication network may be a telecommunications network or an Internet of Things, IoT, network. Aspects of embodiments may be particularly well suited to use with communications networks of these types.
According to a further aspect of an embodiment there is provided a network analyser for performing network analysis of a communications network, the network analyser comprising processing circuitry and a memory containing instructions executable by the processing circuitry. The network analyser is operable to obtain one or more trained machine learning, ML, models of one or more types of components of the communications network, and generate a target ML model that represents a network topography of the communications network, wherein the generation of the target ML model uses network topography information and the trained ML models of types of components. The network analyser is further operable to predict values of one or more Key Performance Indicators, KPIs, of the communications network using the target ML model, monitor values of one or more KPIs of the communications network at network endpoints, and detect when one or more of the monitored KPI values deviate from the corresponding predicted KPI values. The network analyser is further operable, when it is detected that a monitored KPI value from among the one or more monitored KPI values deviates from the corresponding predicted KPI value, to identify components within the communications network that are candidates for the cause of the deviation of the monitored KPI value, the identification being made using the target ML model and the monitored values of one or more KPIs at the network endpoints. The network analyser may provide some or all of the benefits discussed above in the context of methods in accordance with aspects of embodiments.
Certain embodiments may provide one or more technical advantages, such as identifying unexpected behaviour of a component in a target communications network without requiring measurements of performance on an individual component level, through the use of ML modelling of both network components and subsequently of the target communications network. Candidates for the cause of anomalous network behaviour may be identified using model-based diagnosis methods. Accordingly, the load on networks that may be imposed by component and/or slice level measurements throughout the network may be reduced, and the human labour required to supervise network operations may also be reduced. The network analysis methods and apparatuses are applicable to a broad range of networks, but are particularly well suited to telecommunications and IoT networks; telecommunications and IoT networks tend to be complex and involve a large number of components, so the efficiencies provided by methods and analysis in accordance with aspects of embodiments can be particularly beneficial.
Brief Description of Drawings
The present disclosure is described, by way of example only, with reference to the following figures, in which:-
Figure 1 is a flowchart of a method in accordance with aspects of embodiments;
Figures 2A and 2B are schematic diagrams of network analysers in accordance with aspects of embodiments;
Figure 3 is a schematic diagram of a target ML model in accordance with an aspect of an embodiment;
Figure 4 is a schematic diagram of a further target ML model in accordance with an aspect of an embodiment;
Figure 5 is a visualisation of an example of a unified model in accordance with an aspect of an embodiment; and
Figures 6A and 6B are a scheduling diagram showing an overview of steps that may be performed in accordance with an aspect of an embodiment.
Detailed Description
For the purpose of explanation, details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed. It will be apparent, however, to those skilled in the art that the embodiments may be implemented without these specific details or with an equivalent arrangement.
As discussed above, existing systems for network analysis typically rely on the ongoing measurement of each component in a network in order to detect when a component is malfunctioning. An alternative option for network analysis relies on creating a model of a given network that describes the expected impact each component has on a given end-to-end measurement (KPI) of interest, wherein the model is created and maintained by human experts. Although this type of modelling may be suitable for smaller networks, the creation and maintenance of such models tends to be extremely time consuming and creating/maintaining models for more complex networks is often not feasible. Accordingly, it is desirable to provide an alternative means for modelling communications networks, thereby facilitating analysis of said networks. Aspects of embodiments provide network analysis methods and apparatus, which implement a three-stage process to identify components which may be responsible for deviations in monitored KPI values from expected KPI values. Aspects of embodiments use verification tools (such as model-based diagnosis) to identify unexpected component behaviour without examining performance measurements of individual components (only end-to-end KPIs are used). In the first of the three stages ML models of individual components are obtained (this may include the creation of the ML models, if suitable models are not already available). In the second stage, given a description of a communications network (that is, a network topography) the previously obtained component ML models are combined to produce a target ML model that represents the communications network of interest. In the third stage, model-based diagnosis tools are used to identify components that might be the cause of any deviations in monitored KPI values from expected KPI values. Accordingly, aspects of embodiments may be used to identify misbehaving components of a network without requiring ongoing component measurements, and without requiring human experts to create and maintain a network model.
A method in accordance with aspects of embodiments is illustrated in the flowchart of Figure 1. The method may be executed by any suitable apparatus. Examples of apparatuses 20A, 20B in accordance with aspects of embodiments that are suitable for executing the method are shown schematically in Figure 2A and Figure 2B. One or more of the apparatuses 20A, 20B shown in Figures 2A and 2B may be incorporated into a system; for example, where the system is all or part of a telecommunications network, one or more of the apparatuses 20A, 20B used to execute the method may be incorporated into a base station, core network node or other network component.
As shown in step S101 of Figure 1, the method comprises obtaining one or more trained machine learning (ML) models of one or more types of components of a communications network. The trained ML models each model the internal behaviour of a particular component, for example, a particular type of router, base station, network node, network switch, firewall, and so on. The exact components used in a network vary for different types of networks; IoT communications networks utilise different components to mobile telecommunication networks, for example. The number of different trained ML models of components required is determined by the number of different types of components in the communications network to be modelled; each type of component may be modelled separately. The trained ML models may be obtained from a database of component models, if a suitable database containing the ML models required is available. Alternatively, the models may be obtained using measurements of individual component inputs and outputs, wherein the measurements of component inputs and outputs may be used to generate the required ML models. The measurements of component inputs and outputs may be obtained from a database of component measurements and/or through measurements on components (for example, in a training network used for that purpose). A network used to obtain the component measurements may be a physical network, or may be a simulation of a physical network. Any suitable emulator, as will be familiar to those skilled in the art, may be used to provide a simulation of a physical network. An example of a suitable emulator is ns3, a discussion of which is available at https://www.nsnam.org/ as of 12 February 2021.
Once created, the ML models of components may be used as required, and may be retained in a database for use in modelling subsequent networks (such as the database referred to above). Where a network analyser 20A in accordance with the aspect of an embodiment shown in Figure 2A is used, the trained ML models may be obtained (from a database or using measurements) in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23. Alternatively, where a network analyser 20B in accordance with the aspect of an embodiment shown in Figure 2B is used, the trained ML models may be obtained (from a database or using measurements) by the obtainer 24.
In order to obtain the models of components from data, it is necessary to consider how a component operates. Using the example of measurement data collected from a training network (either a physical network or network simulation, as discussed above), the training network may comprise a number of components (m) forming a component set M = {m_0, ..., m_M}. The training network may support a number of slices S = {s_0, ..., s_S}, and may be monitored using KPIs A = {a_0, ..., a_A}. The slice-level input load metrics may be L = {l_0, ..., l_L}, and the slice-level output load metrics may be O = {o_0, ..., o_O}. The component attributes may be P = {p_0, ..., p_P} and the slice-level attributes may be Q = {q_0, ..., q_Q}.
By taking measurements of the training network, it is possible to determine a_m(s), the contribution of component m to KPI a of slice s, for each combination of component, KPI and slice. Examples of KPIs that may be monitored include latency, throughput, error rates, and so on. It is also possible to determine l_m(s), the slice-level load metric l imposed at component m by slice s. An example of l is the number of bytes received by the component at the slice virtual link within a time interval. The related metric o_m(s), which is the slice-level load metric o outputted by component m for slice s and which reflects how the metric l was affected by the component, may also be determined. As an example, o may be the number of bytes transmitted at the slice virtual link, wherein the component may have dropped/queued some of the l received packets, so the load input and load output for a component are not necessarily the same. The component attributes p_m for component m may also be determined; examples of attributes p include configuration information such as how component resources are prioritized among slices, link capacities (in bits per second), supported protocols, and so on. Similarly, q(s), the slice-level attributes q for slice s, may also be determined; examples of attributes q include slice priority level, service type (for example, autonomous driving, security footage, and so on), connection type (for example, peer to peer, P2P), and so on.
Using the measurements discussed above, an ML model R for component m can be developed, wherein:
R(l_m, p_m, q) = (o_m, a_m)
Accordingly, using model R and given the input load, component attributes and slice attributes, the output load and KPIs may be obtained. Data from the training network may be used to model all components m, each type of component being modelled using a different ML component model. As the measurements include information relating to all the slices (that is, l_m is multidimensional and represents all the input for the component, from all the slices in S and for all metrics in L), the ML model can learn, for example, how resources are shared among slices in case of congestion (given, for example, its configuration p_m and the slice priorities q). Any suitable ML algorithm may be used to produce the ML models, as will be familiar to those skilled in the art; an example of a suitable algorithm type is a neural network.
Table 1 below is an example of measurement data of the type that may be obtained from a training network and used to produce ML models. The data indicated may be obtained from more than one example of a particular type of component; that is, if the training network contains three routers of a given type, data from all three of these routers may be used to develop a ML model for that type of router. For simplicity, no component attributes are considered in the measurement data shown in Table 1 (so p = 0). Lower slice priority (q) values indicate higher priority. The input and output loads are given in megabits per second (Mbps), and the latency value (a) is given in milliseconds (ms).
TABLE 1
[Table 1 is reproduced as images in the published application; as described below, it contains sample measurements of slice input load (l), slice priority (q), output load (o) and added latency (a) for a router component.]
The data sample contained in Table 1 indicates how the latency is affected by the load and the priority level. The values of a represent the amount of latency that is added by the router. A ML model may use this information to learn to model the relationship between latency, load and priority (and further information; the data in Table 1 is a sample of a larger data set that may be used to train a ML model). In the Table 1 example, the router adds an additional latency of 2ms when the load is greater than or equal to 400Mbps, and there is also a dependency on the priority level. The ML model learns to represent the expected behaviour of the router using the measured data.
Once the one or more trained ML models of components have been obtained, a target ML model of the communications network to be analysed is generated, as shown in step S102. The target ML model is generated by combining the individual component models, in a way that replicates the network topography of the communications network. The network topography information may indicate: connections between components in the target network; if logical networks are used, which components support which logical networks; physical separations between components (which may be relevant for latency calculations); hardware specific details (such as when components were installed, maintenance history, software history); and so on.
In some aspects of embodiments, the target ML model may be an arrangement of interconnected ML models of components, replicating the arrangement of components in the communications network to be analysed. The ML models of components may be interconnected such that, for example, the output of a given first ML model of a component is used as the input of a given second ML model of a component. Where a network analyser 20A in accordance with the aspect of an embodiment shown in Figure 2A is used, the target ML model may be generated in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23. Alternatively, where a network analyser 20B in accordance with the aspect of an embodiment shown in Figure 2B is used, the target ML models may be generated by the model generator 25.
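As a minimal sketch of how component ML models may be interconnected according to the network topography, the following Python function propagates entry loads through a directed graph of component models. The dictionary-based topology description and the (input load) -> (output load, added latency) interface of each component model are illustrative assumptions rather than part of the claimed method.

    from typing import Callable, Dict, List, Tuple

    # A component model maps an input load to (output load, added latency in ms)
    ComponentModel = Callable[[float], Tuple[float, float]]

    def run_target_model(models: Dict[str, ComponentModel],
                         upstream: Dict[str, List[str]],
                         link_latency: Dict[Tuple[str, str], float],
                         entry_loads: Dict[str, float],
                         order: List[str]) -> Dict[str, Tuple[float, float]]:
        """Propagate entry loads through component models connected per the topography."""
        results: Dict[str, Tuple[float, float]] = {}
        for name in order:                      # components in topological order
            feeders = upstream.get(name, [])
            if not feeders:                     # network entry point
                load_in, latency_in = entry_loads[name], 0.0
            else:                               # input is a function of the feeders' outputs
                load_in = sum(results[f][0] for f in feeders)
                latency_in = max(results[f][1] + link_latency.get((f, name), 0.0)
                                 for f in feeders)
            load_out, added = models[name](load_in)
            results[name] = (load_out, latency_in + added)
        return results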
Figure 3 is a schematic diagram of a target ML model that illustrates how a target ML model may be used to estimate end-to-end latency for a slice. In this example three instances of component model R are used, as each of the three components in the model is of the same type; therefore the same model may be used for each component. The only measurement required is the input load imposed by the slice, l0(s); using this information the model can generate the load that is passed from one node to the other, as well as the impact each node has on KPIs (in this example, latency is calculated). The target ML model can then compute the expected slice end-to-end KPI a for slice s, if everything is working properly, as
a(s) = Σi ai(s), that is, the sum of the per-component KPI contributions along the slice path.
When the target ML model has been generated, the target ML model can then be used to predict values of one or more KPIs of the target communications network, as shown in step S103. This step requires knowledge of the load values input into the target network, but does not require measurements from any components within the network. Essentially, the inputs into the communication network (load values) are entered into the target ML model, and the model processes these inputs to predict one or more KPI values (latency, throughput, and so on; the KPIs of interest are typically dependent on the function of the target network). Where a network analyser 20A in accordance with the aspect of an embodiment shown in Figure 2A is used, the one or more KPIs may be predicted in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23. Alternatively, where a network analyser 20B in accordance with the aspect of an embodiment shown in Figure 2B is used, the one or more KPIs may be predicted by the predictor 26.
A further example of a target ML model is illustrated in Figure 4; this target ML model is slightly more complex than the target ML model shown in Figure 3. In the Figure 4 example, again three instances of the same component (here a router) are included in the target network. The predicted output o3 at Rrouter3 is obtained by propagating the input load values l1, l2 received as input at Rrouter1 and Rrouter2 respectively through the target ML model. The component ML models of the routers are obtained as discussed above, while the connections between the inputs and outputs of the routers are obtained from the network topography. The network topography also indicates that the latency resulting from the link between the routers is 5ms. As shown in Figure 4, the input load values l1, l2 received at Rrouter1 and Rrouter2 are 300Mbps and 200Mbps respectively. Given these input loads, the latency added by each of Rrouter1 and Rrouter2 is 0. The input load l3 at Rrouter3 is 500Mbps (given by o1 + o2), and the latency added by Rrouter3 is 2ms. Accordingly, for the data input into Rrouter1 and output from Rrouter3, the target ML model predicts a total latency of 7ms (5ms link latency + 2ms). Both Figures 3 and 4 are simplified examples; in a target ML model based on a typical real-world communications network, the output of Rrouter3 may pass to another component ML model, and so on.
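For concreteness, the Figure 4 numbers can be reproduced with the short sketch below, in which a hand-written threshold rule stands in for the learned router model R; the rule is an assumption mirroring the Table 1 sample, not the actual trained model.

    def router_model(load_mbps: float):
        """Stand-in for the learned router model R: returns (output load, added latency in ms)."""
        return load_mbps, (2.0 if load_mbps >= 400.0 else 0.0)

    l1, l2 = 300.0, 200.0              # entry loads at router1 and router2 (Mbps)
    o1, a1 = router_model(l1)          # router1 adds 0 ms
    o2, a2 = router_model(l2)          # router2 adds 0 ms
    link_latency_ms = 5.0              # link latency from the network topography
    o3, a3 = router_model(o1 + o2)     # router3 sees 500 Mbps and adds 2 ms

    predicted_latency_ms = max(a1, a2) + link_latency_ms + a3
    print(predicted_latency_ms)        # 7.0 ms, the predicted end-to-end latency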
The KPI values at network endpoints are then monitored, as shown in step S104. Again, only measurements at the endpoints of the target network are required; no measurements are required on a component-by-component level. The frequency with which measurements are obtained is determined by the requirements of the communications network, which may be determined, for example, by QoS guarantees to users of the network. Where a network analyser 20A in accordance with the aspect of an embodiment shown in Figure 2A is used, the one or more KPIs may be monitored at network endpoints in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23. Alternatively, where a network analyser 20B in accordance with the aspect of an embodiment shown in Figure 2B is used, the one or more KPIs may be monitored at network endpoints by the monitor 27. Typically, the measurements are received from the endpoints at the network analyser via the communications network, although another transmission means may also be used.
Once obtained, the one or more monitored KPI values may then be compared with the corresponding predicted KPI values from the target ML model (that is, the predicted KPI values for the same KPI as the monitored KPI values), so that deviations of one or more of the monitored KPI values from the corresponding predicted KPI values can be detected (as shown in step S105). A deviation may be detected where predicted and measured KPI values differ by more than a predetermined difference threshold; as will be understood by those skilled in the art, the thresholds are determined on a KPI- and network-specific basis. Where no deviations between monitored and predicted KPI values are detected, the communications network is then understood to be operating correctly. If the network is understood to be operating correctly, the monitoring of KPI values continues, as does the prediction of KPI values, so that any issues with the network that arise can be detected. By contrast, when deviation of one or more of the monitored KPI values from the corresponding predicted KPI value(s) is detected, candidate components for the cause of the deviation are then identified (as shown in step S106). Where a network analyser 20A in accordance with the aspect of an embodiment shown in Figure 2A is used, the deviation of one or more KPIs may be detected in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23. Alternatively, where a network analyser 20B in accordance with the aspect of an embodiment shown in Figure 2B is used, the deviation of one or more KPIs may be detected by the detector 28.
Returning to the example illustrated in Figure 4, the communications network corresponding to the target ML model shown in Figure 4 may receive as inputs the same loads l1 = 300Mbps and l2 = 200Mbps to the physical routers router1 and router2 (corresponding to the component ML models Rrouter1 and Rrouter2 as shown in Figure 4). The monitored KPI value (here, latency) measured at router3 may then be 8ms. As discussed above, the predicted KPI value from the target ML model should be 7ms. Accordingly, if the conditions for determining a deviation are satisfied by a difference of 1ms between a predicted latency of 7ms and a monitored latency of 8ms (for example, if 1ms is larger than a predetermined difference threshold for this KPI), then a deviation in this KPI value may be detected.
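A minimal sketch of this deviation check is given below; the threshold value used is an illustrative assumption, since thresholds are KPI- and network-specific as noted above.

    def deviates(monitored: float, predicted: float, threshold: float) -> bool:
        """Step S105: flag a deviation when the monitored KPI value differs from
        the predicted KPI value by more than the KPI-specific threshold."""
        return abs(monitored - predicted) > threshold

    predicted_latency_ms = 7.0     # from the target ML model (Figure 4 example)
    monitored_latency_ms = 8.0     # measured at the network endpoint (router3)
    print(deviates(monitored_latency_ms, predicted_latency_ms, threshold=0.5))  # True -> step S106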
As mentioned above, when a deviation in one or more KPI values is detected, then components that are candidates for the cause of the deviation are identified (see step S106). In order to identify the candidates, Model Based Diagnostics (MBD) may be used. Additional information on MBD can be found in “A Theory of Diagnosis from First Principles” by Reiter, R., Artif. Intell. 32(1), pp. 57-95 (1987), available at https://www.sciencedirect.com/science/article/abs/pii/0004370287900634?via%3Dihub as of 12 February 2021.
Candidates are identified by encoding the target ML model into a logical representation, which can then be used as the subject of a formal analysis of the network behaviour. The logical representation may take the form of a constraint satisfaction problem. Where the components used for a given path through the target network (as may be utilised by a logical network or slice) form the ordered set (from entry into the network to end point in the network) Vs = <v0, ..., vV>, wherein each vi points to component mvi, then an example of a constraint specifying the normal behaviour of a node can be formalized as:
¬ab(vi) → f(vi, f(vi-1))
In the above logical equation, f(vi, f(vi-1)) is a function of the current component vi and the previous components in the ordered set; the function is recursive, so f(vi-1) is itself a function of f(vi-2), and so on. Further, ab(.) is a logical predicate which intuitively models abnormal behaviour, that is, ¬ab(vi) is true whenever vi is not behaving abnormally. In the example shown in Figure 4, wherein the KPI under consideration is latency:
¬ab(vi) → latency(vi) = latency(vi-1) + ai
Using the KPI measurement latency(vV) (that is, the latency of the endpoint component), the logical representation, and the input into the network, it is possible to compute the expected latency for each internal component, without requiring specific measurement of the latencies at the components. As will be appreciated, the same technique can be applied for other KPIs or combinations of KPIs. f(vi, f(vi-1)) is a function describing the expected output from component mvi, that is, given an input what is the corresponding expected output. As suggested by the formula, the input to a component is typically a function of more than one further component (as shown in Figure 4, where the input into router3 is a function of the outputs of router1 and router2); this information is provided by the network topography as encapsulated in the target ML model. A relationship among a component vi and a set of components U is formally defined as a function f(vi, f(U)), where the output of the components in U is the input of vi. Accordingly, the following logical equation is true where neither component vi nor any component in the set U is behaving abnormally.
¬ab(vi) → f(vi, f(U))
If a monitored KPI value deviates from the corresponding predicted KPI value, the logical representation can be used to identify a set of candidates (or candidate components) that may explain the inconsistency.
Candidate components are those which, if ab(vi) is true, restore consistency. The fact that they are behaving abnormally is then consistent with the observation. An explanation of the deviation in the monitored KPI is therefore provided by minimizing the number of ab(.) predicates assumed to be true. More formally, given a set N of predicates ab(vi) set to false, we seek a minimal set of predicates M ⊂ N such that once the predicates in M are set to true, the set of constraints encoding the network becomes feasible. The computation of M can be done using any suitable diagnostic technique as will be familiar to those skilled in the art; an example of a suitable technique is HS-tree modelling, as also discussed in “A Theory of Diagnosis from First Principles” (as cited above). Once obtained, M provides a set of component combinations (wherein a component combination may comprise a single component, and the set itself may consist of one combination) that, if behaving abnormally, explain the unexpected measurement. These component combinations are the candidates for the cause of the deviation between the monitored and predicted KPI values. Where a network analyser 20A in accordance with the aspect of an embodiment shown in Figure 2A is used, the identification of candidate components may be performed in accordance with a computer program stored in a memory 22, executed by a processor 21 in conjunction with one or more interfaces 23. Alternatively, where a network analyser 20B in accordance with the aspect of an embodiment shown in Figure 2B is used, the identification of candidate components may be performed by the identifier 29.
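The sketch below illustrates the idea of minimal diagnosis on the Figure 4 example. A brute-force subset search and a simplified consistency test stand in for HS-tree or any other dedicated diagnosis algorithm, and the numerical details are assumptions for illustration only.

    from itertools import combinations
    from typing import Callable, FrozenSet, List, Sequence

    def diagnose(components: Sequence[str],
                 consistent: Callable[[FrozenSet[str]], bool]) -> List[FrozenSet[str]]:
        """Return all smallest sets of components which, if assumed abnormal,
        make the observation consistent with the encoded constraints."""
        for size in range(len(components) + 1):
            candidates = [frozenset(c) for c in combinations(components, size)
                          if consistent(frozenset(c))]
            if candidates:
                return candidates
        return []

    def consistent(abnormal: FrozenSet[str]) -> bool:
        """Simplified consistency test for the Figure 4 example (observed latency 8 ms):
        normal components contribute their expected latency; abnormal components are
        unconstrained and may account for any extra latency."""
        expected = {"router1": 0.0, "router2": 0.0, "router3": 2.0}
        observed = 8.0
        baseline = 5.0 + sum(a for name, a in expected.items() if name not in abnormal)
        if not abnormal:
            return baseline == observed      # everything normal must match exactly
        return observed >= baseline          # abnormal components can explain the rest

    print(diagnose(["router1", "router2", "router3"], consistent))
    # each single router is a minimal candidate explanation for the 1 ms deviation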
Once the one or more candidates for the cause of the deviation between monitored and predicted KPI value(s) have been identified, some or all of the candidate components may be outputted. The outputted candidate components may then be reviewed for maintenance, replaced, and so on. In this way, the cause of deviation between monitored and expected KPIs may be addressed. In some aspects of embodiments, candidate components having a likelihood above a predetermined likelihood threshold of being the cause of deviation of the monitored KPI value from the corresponding predicted KPI value are output. The predetermined likelihood may take into account past maintenance history of components, whether the candidate component is present in several different component combinations (as discussed above), and so on. Any useful available information may be taken into account when determining the likelihood of a given candidate component being responsible for the deviation. Additionally or alternatively, a predetermined number of component candidates may be output, in order of likelihood starting with the most likely candidate for the cause of the deviation. Outputting a predetermined number of candidates may be of particular use when seeking to perform maintenance on a communications network based on the output candidates.
In some aspects of embodiments, an alternative method may be used to identify components within a communications network that are candidates for the cause of deviation in a monitored KPI value. In these aspects of embodiments a Graph Neural Network (GNN) is used as the target ML model, and to identify the candidates. A GNN uses deep learning methods to make predictions based on a graph that consists of a number of nodes, connected by a number of edges. The component ML models (obtained as discussed above) are used as the nodes of the GNN, with the edges of the GNN indicating the connections between the components, as obtained from the network topography. The target ML model is therefore a unified model that is created by combining the trained ML models of the network components, connected according to the network topography. Various different GNN architectures may be used. Examples of suitable architectures include: Graph Convolution Networks (GCN), which are graph neural networks that have a graph convolution layer which is used in order to aggregate features from neighbouring (connected) nodes; Simple Graph Convolution (SGC) networks, which are based on GCN but avoid unnecessary operations; and Graph Sampling and Aggregation networks (SAGE or GraphSAGE), which are an inductive approach for collecting and aggregating data in graph neural networks that are effective at operating with new graphs that the original model has not seen.
A visualisation of an example of a unified model is shown in Figure 5. In Figure 5, the unified model comprises 5 network nodes, identified as nodes m1 to m5. Each of the nodes m has a set of features X, where X = [{am(s), lm(s), om(s), pm}], and a, l, o, p and s are as discussed above. Each of the nodes also has a classification Y, which indicates whether the node is operating correctly (that is, behaving normally). The possible values of Y are true (behaving normally, shown on Figure 5 as “OK”) and false (not behaving normally, shown on Figure 5 as “NOT OK”).
The unified model is trained using supervised learning. The dataset used for training comprises examples of components represented by their features (the input and output of each component ML model), each labelled to indicate whether or not the component is faulty. Binary Cross Entropy (BCE) may be used to provide the binary component classification (that is, faulty or not faulty). Once trained, the unified model may process measurements of data input and output at network endpoints, together with the network topography, to predict the status of the components within the network (faulty or not faulty), and thereby identify which components within the network are likely candidates for a deviation in KPI values, that is, a deviation in monitored KPI values from expected KPI values. Once identified, all or a selection of the candidates may then be output as discussed above.
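As a minimal sketch of such a unified model, the snippet below builds a two-layer graph convolutional network over the component nodes and trains it with binary cross entropy. PyTorch Geometric, the toy features, edges and labels are illustrative assumptions; any of the GNN architectures mentioned above (GCN, SGC, GraphSAGE) could be substituted.

    import torch
    from torch import nn
    from torch_geometric.nn import GCNConv

    class UnifiedModel(nn.Module):
        def __init__(self, num_features: int, hidden: int = 16):
            super().__init__()
            self.conv1 = GCNConv(num_features, hidden)
            self.conv2 = GCNConv(hidden, 1)          # one logit per node: faulty / not faulty

        def forward(self, x, edge_index):
            h = torch.relu(self.conv1(x, edge_index))
            return self.conv2(h, edge_index).squeeze(-1)

    x = torch.randn(5, 4)                            # placeholder node features [a, l, o, p]
    edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 4],
                               [1, 0, 2, 1, 3, 2, 4, 3]])   # topography edges (both directions)
    y = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])      # 1.0 = labelled "NOT OK" in training data

    model = UnifiedModel(num_features=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()                 # binary cross entropy on node labels

    for _ in range(200):                             # supervised training loop
        optimizer.zero_grad()
        loss = loss_fn(model(x, edge_index), y)
        loss.backward()
        optimizer.step()

    faulty = torch.sigmoid(model(x, edge_index)) > 0.5   # predicted per-node status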
Figure 6A and Figure 6B (collectively referred to as Figure 6) are two halves of a scheduling diagram showing an overview of steps that may be performed in accordance with an aspect of an embodiment. Some or all of the modules shown in Figure 6, that is, the network orchestrator, learning module, modelling module and reasoning module may form part of a network analyser, for example, network analyser 20A or network analyser 20B.
In the aspect of an embodiment to which Figure 6 relates, the component ML models are obtained using data from a training network. For simplicity, Figure 6 divides the steps taken into three groups: learning, modelling and reasoning. Steps 1 and 2 (see Figure 6A) are in the learning group, steps 3 to 5 (see Figure 6A) are in the modelling group and steps 6 to 8 (see Figure 6B) are in the reasoning group. In step 1, measurements for each of the types of component in the target network are taken from the training network. These measurements are used in step 2 to obtain the component ML models. In step 3 the network topography of the target network is obtained, and in step 4 the component ML models of the components in the target network are obtained. A target ML model is then generated (in step 5) using this information. Using the target ML model, predictions for expected KPI values are computed in step 6, and through collecting measurements at network endpoints the corresponding observed KPI values are obtained from the target network in step 7. It is then determined whether the monitored KPI values deviate from the predicted KPI values; if there is no deviation, then the monitoring/predicting may continue. By contrast, where deviation is detected (as discussed above), logical reasoning is performed in step 8 to determine the components within the network that are candidates for the cause of the deviation.
The network analysis methods and apparatuses in accordance with aspects of embodiments may identify unexpected behaviour of a component in a target communications network without requiring measurements of performance on an individual component level, through the use of ML modelling of both network components and subsequently of the target communications network. Candidates for the cause of anomalous network behaviour may be identified using model based diagnosis methods. Accordingly, the load on networks that may be imposed by component and/or slice level measurements throughout the network may be reduced, and the human labour required to supervise network operations may also be reduced. The network analysis methods and apparatuses are applicable to a broad range of networks, but are particularly well suited to telecommunications and IoT networks; telecommunications and IoT networks tend to be complex and involve a large number of components, so the efficiencies provided by methods and apparatuses in accordance with aspects of embodiments can be particularly beneficial.
It will be appreciated that examples of the present disclosure may be virtualised, such that the methods and processes described herein may be run in a cloud environment.
The methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form. In general, the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto. While various aspects of the exemplary embodiments of this disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
As such, it should be appreciated that at least some aspects of the exemplary embodiments of the disclosure may be practiced in various components such as integrated circuit chips and modules. It should thus be appreciated that the exemplary embodiments of this disclosure may be realized in an apparatus that is embodied as an integrated circuit, where the integrated circuit may comprise circuitry (as well as possibly firmware) for embodying at least one or more of a data processor, a digital signal processor, baseband circuitry and radio frequency circuitry that are configurable so as to operate in accordance with the exemplary embodiments of this disclosure.
It should be appreciated that at least some aspects of the exemplary embodiments of the disclosure may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. The computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. As will be appreciated by one of skill in the art, the function of the program modules may be combined or distributed as desired in various embodiments. In addition, the function may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like.
References in the present disclosure to “one embodiment”, “an embodiment” and so on, indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
It should be understood that, although the terms “first”, “second” and so on may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of the disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed terms.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “has”, “having”, “includes” and/or “including”, when used herein, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components and/ or combinations thereof. The terms “connect”, “connects”, “connecting” and/or “connected” used herein cover the direct and/or indirect connection between two elements.
The present disclosure includes any novel feature or combination of features disclosed herein either explicitly or any generalization thereof. Various modifications and adaptations to the foregoing exemplary embodiments of this disclosure may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings. However, any and all modifications will still fall within the scope of the non-limiting and exemplary embodiments of this disclosure. For the avoidance of doubt, the scope of the disclosure is defined by the claims.

Claims
1. A method for network analysis of a communications network, the method comprising: obtaining one or more trained machine learning, ML, models of one or more types of components of the communications network; generating a target ML model that represents a network topography of the communications network, wherein the generation of the target ML model uses network topography information and the trained ML models of types of components; predicting values of one or more Key Performance Indicators, KPIs, of the communications network using the generated target ML model; monitoring values of one or more KPIs of the communications network at network endpoints; and detecting when one or more of the monitored KPI values deviate from the corresponding predicted KPI values; wherein the method further comprises, when it is detected that a monitored KPI value from among the one or more monitored KPI values deviates from the corresponding predicted KPI value, identifying components within the communications network that are candidates for the cause of the deviation of the monitored KPI value, the identification being made using the generated target ML model and the monitored values of one or more KPIs at the network endpoints.
2. The method of claim 1, wherein the step of obtaining the trained ML models of one or more components comprises generating ML models of the one or more components using measurements of individual component inputs and outputs.
3. The method of claim 2, wherein the measurements of individual component inputs and outputs are obtained from a database of component measurements, and/or wherein the measurements of individual component inputs and outputs are obtained from one or more training networks.
4. The method of claim 3, wherein the one or more training networks comprise a physical communications network and/or a simulation of a physical communications network.
5. The method of any preceding claim, wherein the target ML model is an arrangement of interconnected ML models of components connected to replicate the network topography.
6. The method of claim 5, wherein the connections between the interconnected ML models allow the output of a first given ML model from among the interconnected ML models to be taken as an input for a second given ML model from among the interconnected ML models.
7. The method of any of claims 5 and 6, wherein the identification of components within the communications network that are candidates for the cause of the deviation of the monitored KPI values from the predicted KPI values comprises using model based diagnostics, wherein the target model is encoded as a constraint satisfaction problem.
8. The method of claim 7, wherein the constraint satisfaction problem is used to identify as candidates for the cause of the deviation combinations of components within the communications network that if behaving abnormally could be responsible for the deviation in monitored KPI value.
9. The method of any preceding claim, wherein the one or more components of the communications network comprise one or more of: a base station; a router; a network node; network switches; and firewalls.
10. The method of any preceding claim, wherein the network topography information of the communications network comprises one or more of: connections between components; if logical networks are used on the communications network, which components support which logical networks; physical separations between components; and hardware specific details.
11. The method of any of claims 1 to 4, wherein the target ML model is a unified model of the communications network created by combining the trained ML models of one or more components.
12. The method of claim 11, wherein the unified model is trained using supervised learning to be able to identify components within the communications network that are candidates for the cause of the deviation of the monitored KPI value from the corresponding predicted KPI value.
13. The method of any preceding claim, further comprising outputting one or more component candidates for the cause of deviation of the monitored KPI value from the corresponding predicted KPI value.
14. The method of claim 13, wherein component candidates having a likelihood above a predetermined likelihood threshold of being the cause of deviation of the monitored KPI value from the corresponding predicted KPI value are output.
15. The method of any of claims 13 and 14, wherein a predetermined number of component candidates are output, the component candidates being selected for output in order of likelihood of being the cause of deviation of the monitored KPI value from the corresponding predicted KPI value.
16. The method of any of claims 13 to 15, further comprising modifying the communications network based on the output component candidates.
17. The method of any preceding claim, wherein one or more monitored KPI values are detected as deviating from the corresponding predicted KPI values when the difference between the monitored KPI value and corresponding predicted KPI value is greater than a predetermined difference threshold.
18. The method of any preceding claim, wherein the communication network is a telecommunications network or wherein the communications network is an Internet of Things, IoT, network.
19. A network analyser for performing network analysis of a communications network, the network analyser comprising processing circuitry and a memory containing instructions executable by the processing circuitry, whereby the network analyser is operable to: obtain one or more trained machine learning, ML, models of one or more types of components of the communications network; generate a target ML model that represents a network topography of the communications network, wherein the generation of the target ML model uses network topography information and the trained ML models of types of components; predict values of one or more Key Performance Indicators, KPIs, of the communications network using the target ML model; monitor values of one or more KPIs of the communications network at network endpoints; and detect when one or more of the monitored KPI values deviate from the corresponding predicted KPI values; wherein, when it is detected that a monitored KPI value from among the one or more monitored KPI values deviates from the corresponding predicted KPI value, the network analyser is further configured to identify components within the communications network that are candidates for the cause of the deviation of the monitored KPI value, the identification being made using the target ML model and the monitored values of one or more KPIs at the network endpoints.
20. The network analyser of claim 19, further configured to obtain the trained ML models of one or more components by generating ML models of the one or more components of the communications network using measurements of individual component inputs and outputs.
21. The network analyser of claim 20, further configured to obtain the measurements of individual component inputs and outputs from a database of component measurements, and/or from one or more training networks.
22. The network analyser of claim 21, wherein the one or more training networks comprise a physical communications network and/or a simulation of a physical communications network.
23. The network analyser of any of claims 19 to 22, wherein the target ML model is an arrangement of interconnected ML models of components connected to replicate the network topography.
24. The network analyser of claim 23, wherein the connections between the interconnected ML models allow the output of a first given ML model from among the interconnected ML models to be taken as an input for a second given ML model from among the interconnected ML models.
25. The network analyser of any of claims 23 and 24, further configured to identify components within the communications network that are candidates for the cause of the deviation of the monitored KPI values from the predicted KPI values using model based diagnostics, wherein the target model is encoded as a constraint satisfaction problem.
26. The network analyser of claim 25, configured to use the constraint satisfaction problem to identify as candidates for the cause of the deviation combinations of components within the communications network that if behaving abnormally could be responsible for the deviation in monitored KPI value.
27. The network analyser of any of claims 19 to 26, wherein the one or more components of the communications network comprise one or more of: a base station; a router; a network node; network switches; and firewalls.
28. The network analyser of any of claims 19 to 27, wherein the network topography information of the communications network comprises one or more of: connections between components; if logical networks are used on the communications network, which components support which logical networks; physical separations between components; and hardware specific details.
29. The network analyser of any of claims 19 to 22, wherein the target ML model is a unified model of the communications network created by combining the trained ML models of one or more components.
30. The network analyser of claim 29, configured to train the unified model using supervised learning to be able to identify components within the communications network that are candidates for the cause of the deviation of the monitored KPI value from the corresponding predicted KPI value.
31. The network analyser of any of claims 19 to 30, further configured to output one or more component candidates for the cause of deviation of the monitored KPI value from the corresponding predicted KPI value.
32. The network analyser of claim 31, configured to output component candidates having a likelihood above a predetermined likelihood threshold of being the cause of deviation of the monitored KPI value from the corresponding predicted KPI value.
33. The network analyser of any of claims 31 and 32, configured to output a predetermined number of component candidates, the component candidates being selected for output in order of likelihood of being the cause of deviation of the monitored KPI value from the corresponding predicted KPI value.
34. The network analyser of any of claims 19 to 33, configured to detect one or more monitored KPI values as deviating from the corresponding predicted KPI values when the difference between the monitored KPI value and corresponding predicted KPI value is greater than a predetermined difference threshold.
35. A communications network comprising the network analyser of any of claims 19 to 34.
36. The communications network of claim 35, wherein the communication network is a telecommunications network or wherein the communications network is an Internet of Things, IoT, network.
37. A computer-readable medium comprising instructions which, when executed on a computer, cause the computer to perform a method in accordance with any of claims 1 to 18.
38. A network analyser for performing network analysis of a communications network, the network analyser comprising: an obtainer configured to obtain one or more trained machine learning, ML, models of one or more types of components of the communications network; a model generator configured to generate a target ML model that represents a network topography of the communications network, wherein the generation of the target ML model uses network topography information and the trained ML models of types of components; a predictor configured to predict values of one or more Key Performance Indicators, KPIs, of the communications network using the target ML model; a monitor configured to monitor values of one or more KPIs of the communications network at network endpoints; and a detector configured to detect when one or more of the monitored KPI values deviate from the corresponding predicted KPI values; wherein the network analyser further comprises an identifier configured, when it is detected that a monitored KPI value from among the one or more monitored KPI values deviates from the corresponding predicted KPI value, to identify components within the communications network that are candidates for the cause of the deviation of the monitored KPI value, the identification being made using the target ML model and the monitored values of one or more KPIs at the network endpoints.
PCT/SE2021/050158 2021-02-25 2021-02-25 Method for communications network analysis using trained machine learning models and network topography information WO2022182272A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2021/050158 WO2022182272A1 (en) 2021-02-25 2021-02-25 Method for communications network analysis using trained machine learning models and network topography information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2021/050158 WO2022182272A1 (en) 2021-02-25 2021-02-25 Method for communications network analysis using trained machine learning models and network topography information

Publications (1)

Publication Number Publication Date
WO2022182272A1 true WO2022182272A1 (en) 2022-09-01

Family

ID=83048404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2021/050158 WO2022182272A1 (en) 2021-02-25 2021-02-25 Method for communications network analysis using trained machine learning models and network topography information

Country Status (1)

Country Link
WO (1) WO2022182272A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180019910A1 (en) * 2016-07-13 2018-01-18 Incelligent P.C. Early warning and recommendation system for the proactive management of wireless broadband networks
WO2018157951A1 (en) * 2017-03-01 2018-09-07 Telefonaktiebolaget Lm Ericsson (Publ) A method and apparatus for key performance indicator forecasting using artificial life
US20200112489A1 (en) * 2018-10-03 2020-04-09 Centurylink Intellectual Property Llc Intelligent Network Equipment Failure Prediction System
EP3751787A1 (en) * 2019-06-14 2020-12-16 SubCom, LLC Techniques to generate network simulation scenarios
US20210028973A1 (en) * 2019-07-26 2021-01-28 Ciena Corporation Identifying and locating a root cause of issues in a network having a known topology


Similar Documents

Publication Publication Date Title
US10805185B2 (en) Detecting bug patterns across evolving network software versions
US10680889B2 (en) Network configuration change analysis using machine learning
US11277420B2 (en) Systems and methods to detect abnormal behavior in networks
US11171853B2 (en) Constraint-based event-driven telemetry
US20190138938A1 (en) Training a classifier used to detect network anomalies with supervised learning
US11489732B2 (en) Classification and relationship correlation learning engine for the automated management of complex and distributed networks
US10574512B1 (en) Deep learning architecture for collaborative anomaly detection and explanation
US11080619B2 (en) Predicting wireless access point radio failures using machine learning
US8638680B2 (en) Applying policies to a sensor network
US20150195154A1 (en) Creating a Knowledge Base for Alarm Management in a Communications Network
US11348023B2 (en) Identifying locations and causes of network faults
US20180365581A1 (en) Resource-aware call quality evaluation and prediction
US10467087B2 (en) Plato anomaly detection
US20210218641A1 (en) FORECASTING NETWORK KPIs
US7275017B2 (en) Method and apparatus for generating diagnoses of network problems
US10944661B2 (en) Wireless throughput issue detection using coarsely sampled application activity
US20210304061A1 (en) Model interpretability using proxy features
US11409516B2 (en) Predicting the impact of network software upgrades on machine learning model performance
US11425009B2 (en) Negotiating machine learning model input features based on cost in constrained networks
US20210359899A1 (en) Managing Event Data in a Network
KR20200126766A (en) Operation management apparatus and method in ict infrastructure
WO2022182272A1 (en) Method for communications network analysis using trained machine learning models and network topography information
Muñoz et al. A method for identifying faulty cells using a classification tree-based UE diagnosis in LTE
KR20200063343A (en) System and method for managing operaiton in trust reality viewpointing networking infrastucture
Touloupou et al. Intra: Introducing adaptation in 5G monitoring frameworks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21928266

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21928266

Country of ref document: EP

Kind code of ref document: A1