CN119856472A - Operational anomaly detection and isolation in a multi-domain communication network

Info

Publication number: CN119856472A
Application number: CN202280099897.0A
Authority: CN
Inventors: 阿提拉·米奇扬科夫; 亚历山大·比罗; 博通德·瓦尔加; 威尔玛·奥尔戈万伊
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2022-09-14
Filing date: 2022-09-14
Publication date: 2025-04-18
Also published as: WO2024057063A1; EP4588222A1

Abstract

Embodiments include computer-implemented methods for detecting operational anomalies in a multi-domain communication network. Such a method includes obtaining a plurality of time series of performance data from a plurality of domains of a communication network and determining one or more models of non-anomalous network behavior based on the plurality of time series. The method also includes classifying each time series into one of a plurality of types based on the presence or absence of at least two types of components in that time series. The method further includes detecting operational anomalies, in the plurality of time series or in additional performance data obtained from the plurality of domains of the communication network, based on the one or more models and the classified types. Other embodiments include a network analysis system configured to perform such methods.

Description

Operational anomaly detection and isolation in a multi-domain communication network

Technical Field

The present disclosure relates generally to communication networks, and more particularly to techniques for detecting operational anomalies (e.g., faults, etc.) that appear across multiple domains of a communication network.

Background

Fifth generation ("5G") cellular systems, also known as New Radio (NR), were initially standardized in 3GPP Rel-15 and continue to evolve in later releases. NR was developed for maximum flexibility to support a variety of use cases, including enhanced mobile broadband (eMBB), machine type communication (MTC), ultra-reliable low-latency communication (URLLC), sidelink device-to-device (D2D) communication, and several others. 5G/NR technology has many similarities to fourth-generation LTE.

At a high level, the 5G system (5GS) is composed of an Access Network (AN) and a Core Network (CN). The AN provides UE connectivity to the CN, e.g., via a base station such as a gNB or NG-eNB. As described in more detail below, the CN includes various Network Functions (NFs) that provide a range of different functions such as session management, connection management, charging, authentication, and the like.

The increasing complexity of communication networks, including 5G networks, has driven the development of analytical systems that support the operation, optimization and planning of these networks. This includes detecting and accounting for abrupt, undesirable changes (e.g., faults) in network operation and/or performance. These analysis systems in turn require the collection and processing of large amounts of data, particularly time series data.

Generally, a time series is a sequence of data or information values, each having an associated instance of time (e.g., the time at which the value was generated and/or collected). The data or information may be any measurable quantity that depends in some way on time, such as a price, humidity, or a number of people. An important characteristic of a time series is its frequency, which is the rate at which the data values of the data set are recorded. The frequency is inversely proportional to the period (or duration) between successive data values.

Time series analysis includes techniques that attempt to understand or contextualize time series data, such as making forecasts or predictions of future data (or events) using models constructed from past time series data. To best facilitate such analysis, it is preferred that a time series consists of data values measured and/or recorded at a constant frequency or period.
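As an illustration of the constant-period preference above, the following is a minimal sketch of how irregularly timed samples could be averaged into fixed-width bins before analysis. The function name `resample_fixed_period`, the averaging rule, and the use of `None` for empty bins are assumptions made for this example and are not taken from the disclosure.

```python
from collections import defaultdict


def resample_fixed_period(samples, period_s):
    """Average (timestamp_s, value) samples into consecutive bins of period_s seconds.

    Bins that received no sample are reported as None so that gaps stay visible.
    """
    if not samples:
        return []
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // period_s)].append(value)
    first, last = min(buckets), max(buckets)
    series = []
    for idx in range(first, last + 1):
        vals = buckets.get(idx)  # .get() avoids creating empty bins
        mean = sum(vals) / len(vals) if vals else None
        series.append((idx * period_s, mean))
    return series


# Hypothetical counter readings taken at uneven times (seconds).
raw = [(0, 1.0), (10, 3.0), (65, 5.0), (190, 7.0)]
print(resample_fixed_period(raw, 60))
```

Whether an empty bin should be left as a gap, interpolated, or treated as zero depends on the counter being collected; the sketch leaves that decision to the caller.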

The time series data sets may be collected from a geographic location, such as from nodes of a communication network located in one or more geographic areas (e.g., country, region, province, city, etc.). For example, the values of Performance Measurement (PM) counters may be collected from various network nodes at specific time intervals. The time series data collected in this manner may be used to analyze, predict, and/or understand user behavior patterns and network performance trends.

Disclosure of Invention

However, even with the large amount of available time series data, it can be very difficult to detect and account for sudden, undesirable changes in network operation and/or performance (e.g., faults or anomalies).

For example, advanced communication networks are robust and distributed, so that a fault often affects only a limited subset of users, sessions, and/or network elements, which makes it more difficult to detect. Furthermore, normal network behavior varies with time of day, day of week, month, and/or season. The presence or absence of these trends needs to be considered when detecting abnormal network behavior. Furthermore, each available time series of data is typically one-dimensional, in that it is collected from a single network node and is not correlated with other data sources. This makes it more difficult to detect faults that appear in multiple network nodes.

Embodiments of the present disclosure address these and other problems, issues, and/or difficulties by providing techniques for detecting and isolating communication network operational anomalies based on related data sources from various network domains, and corresponding network analysis systems that perform the techniques.

Some embodiments include a method (e.g., procedure) for detecting operational anomalies in a multi-domain communication network.

These exemplary methods may include obtaining a plurality of time series of performance data from a plurality of domains of a communication network. The exemplary methods may also include determining one or more models of non-anomalous network behavior based on the plurality of time series. The exemplary methods may further include classifying each time series into one of a plurality of types based on the presence or absence of at least two types of components in that time series. The exemplary methods may also include detecting operational anomalies, in the plurality of time series or in additional performance data obtained from the plurality of domains of the communication network, based on the one or more models and the classified types.

In some embodiments, the exemplary methods may further include determining, based on detecting a plurality of operational anomalies in the additional performance data, an order of importance of the detected operational anomalies based on deviations of each from corresponding non-anomalous network behavior. In some of these embodiments, the example methods may further include initiating one or more corrective actions in a plurality of domains of the communication network in response to the one or more detected anomalies determined to be most significant. In some of these embodiments, the example methods may further include refraining from initiating one or more further corrective actions in one or more domains of the communication network in response to the one or more detected anomalies determined to be of minor importance.
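The ordering and selective-action steps described above can be sketched as follows. This is only an illustration under stated assumptions: the `Anomaly` record, the normalization of the deviation by a per-KPI `spread`, and the `top_n` and `min_deviation` parameters are hypothetical choices, not details given in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    kpi: str
    observed: float
    expected: float  # value predicted by the model of non-anomalous behavior
    spread: float    # typical variation of the KPI under normal behavior, must be > 0

    @property
    def deviation(self) -> float:
        # Deviation from expected behavior, in units of normal variation.
        return abs(self.observed - self.expected) / self.spread


def rank_anomalies(anomalies):
    """Order anomalies from most to least significant."""
    return sorted(anomalies, key=lambda a: a.deviation, reverse=True)


def select_for_action(anomalies, top_n=1, min_deviation=3.0):
    """Split anomalies into those to act on and those to leave alone."""
    ranked = rank_anomalies(anomalies)
    act = [a for a in ranked[:top_n] if a.deviation >= min_deviation]
    skip = [a for a in ranked if a not in act]
    return act, skip


detected = [
    Anomaly("drop_rate", observed=0.08, expected=0.02, spread=0.01),
    Anomaly("setup_time_ms", observed=130.0, expected=120.0, spread=5.0),
    Anomaly("throughput_mbps", observed=40.0, expected=70.0, spread=6.0),
]
act, skip = select_for_action(detected, top_n=1)
print([a.kpi for a in act], [a.kpi for a in skip])
```

Corrective actions would be initiated only for the anomalies in `act`; those in `skip` are recorded but trigger nothing.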

In some embodiments, classifying each time series based on the presence or absence of at least two types of components includes the operations of:

Detecting whether each of the time series includes a seasonal component and/or a non-constant trend component;

Classifying the time series as a first type when the time series includes seasonal components;

Classifying the time series into a second type when the time series includes non-constant trend components but does not include seasonal components, and

Classifying the time series into a third type when the time series includes neither a non-constant trend component nor a seasonal component.
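The three-way classification listed above can be sketched as follows. The detection rules used here are assumptions made for illustration: seasonality is tested with the autocorrelation of the linearly detrended series at one candidate period, and a non-constant trend is tested by comparing the fitted drift over the window with the scatter around the fitted line. The disclosure does not prescribe these particular tests, and the function names and thresholds are hypothetical.

```python
import math


def _linear_fit(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (slope, intercept)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def _autocorr(y, lag):
    """Sample autocorrelation of y at the given lag (0.0 for a constant series)."""
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y)
    if var == 0:
        return 0.0
    cov = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(len(y) - lag))
    return cov / var


def classify_series(y, period, season_threshold=0.5, trend_threshold=2.0):
    """Return 1 (seasonal), 2 (trend without seasonality) or 3 (neither).

    `period` is the candidate seasonal period in samples and must be < len(y).
    """
    slope, intercept = _linear_fit(y)
    detrended = [v - (intercept + slope * i) for i, v in enumerate(y)]
    seasonal = _autocorr(detrended, period) >= season_threshold
    drift = abs(slope) * (len(y) - 1)  # total change of the fitted line
    scatter = math.sqrt(sum(d * d for d in detrended) / len(y))
    trending = drift > trend_threshold * scatter if scatter > 0 else drift > 0
    if seasonal:
        return 1
    if trending:
        return 2
    return 3


# Hypothetical hourly KPI with a daily cycle, observed for four days.
daily = [10 + 5 * math.sin(2 * math.pi * i / 24) for i in range(96)]
print(classify_series(daily, period=24))
```

Checking seasonality first mirrors the list above, in which a series with a seasonal component belongs to the first type whether or not it also has a trend.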

Disclosed herein are various examples of obtained performance data, determined models of non-anomalous behavior, and detection of operational anomalies.

Other embodiments include a network analysis system (e.g., NWDAF, SMO node, NM node, cloud system, etc.) configured to perform operations corresponding to any of the example methods described herein. Other embodiments include a non-transitory, computer-readable medium storing program instructions that, when executed by a processing circuit, configure the network analysis system to perform operations corresponding to any one of the example methods described herein.

These and other embodiments described herein may provide a wide range of possibilities for investigating various known network faults, as well as fast, automatic detection of network faults that are still unknown. In this way, embodiments may capture new anomalies early, while they are still developing, thereby minimizing their impact on user experience and network performance. Furthermore, anomaly detection based on learning normal network behavior has significant advantages over conventional threshold-based alarm systems, because many Key Performance Indicators (KPIs) depend on factors such as time of day, day of week, network load, etc. Further, by monitoring and correlating network-wide KPIs, embodiments may isolate UEs, data sessions, etc. that are affected by unrecognized failures or interworking problems. In addition to the more visible network failures that are typically identified by conventional techniques, embodiments may identify latent failures and interworking problems that are typically missed by conventional techniques.

These and other objects, features and advantages of the embodiments of the present disclosure will become apparent upon reading the following detailed description with reference to the accompanying drawings, which are briefly described below.

Drawings

FIG. 1 is a high-level block diagram of an exemplary 5G/NR network architecture.

Fig. 2 illustrates an exemplary 5G reference architecture with service-based interfaces and various 3GPP defined NFs.

Fig. 3 shows an exemplary multi-domain network comprising a RAN, a packet-based Core Network (CN) and an IP Multimedia Subsystem (IMS).

Figs. 4-7 illustrate various exemplary time series of network performance data collected over a period of about four (4) weeks.

Fig. 8 shows a functional diagram of a network analysis system according to an embodiment of the present disclosure.

Figs. 9-12 illustrate an exemplary time series of network performance data and three components extracted from the time series using embodiments of the present disclosure.

FIG. 13 illustrates an exemplary time series including trend components detected in accordance with an embodiment of the present disclosure.

Fig. 14 shows the time series of fig. 13 with the remaining components after removal of the trend components.

Figs. 15-17 illustrate exemplary arrangements of upper and lower bounds for anomaly detection for multiple composite time series and two separate time series according to embodiments of the present disclosure.

Fig. 18 illustrates an exemplary implementation of a network analysis system according to an embodiment of the present disclosure.

Fig. 19 shows a high-level diagram of an open RAN (O-RAN) architecture.

Figures 20-21 illustrate two implementation options for integrating embodiments of the present disclosure with an O-RAN architecture.

Fig. 22 illustrates an exemplary method (e.g., process) for detecting operational anomalies in a multi-domain communication network, according to various embodiments of the present disclosure.

Fig. 23 illustrates a communication system in accordance with various embodiments of the present disclosure.

Fig. 24 illustrates a network node according to various embodiments of the present disclosure.

FIG. 25 illustrates a host computing system according to various embodiments of the present disclosure.

FIG. 26 is a block diagram of a virtualized environment in which functions implemented by some embodiments of the disclosure may be virtualized.

Detailed Description

Some embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. However, other embodiments are included within the scope of the subject matter disclosed herein, which should not be construed as limited to only the embodiments set forth herein, but rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.

In general, all terms used herein will be interpreted according to their ordinary meaning in the relevant art unless clearly given and/or implied by the context in which they are used. All references to elements, devices, components, means, steps, etc. should be interpreted openly as referring to at least one instance of an element, device, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless one step is explicitly described as being followed or preceded by another step and/or implicitly one step must be followed or preceded by another step. Any feature of any embodiment disclosed herein may be applied to any other embodiment where appropriate. Likewise, any advantages of any embodiment may apply to any other embodiment and vice versa. Other objects, features and advantages of the attached embodiments will be apparent from the following description.

Note that the description herein focuses on 3GPP cellular communication systems, and thus 3GPP terminology or terminology similar to 3GPP terminology is often used. However, the concepts disclosed herein are not limited to 3GPP systems. Furthermore, although the term "cell" is used herein, it should be understood that (particularly with respect to 5G NR) beams may be used instead of cells. As such, the concepts described herein apply equally to both cells and beams.

Fig. 1 shows a high-level view of an exemplary 5G network 100, including a Next Generation RAN (NG-RAN) 199 and a 5G core (5GC) 198. NG-RAN 199 can include a set of gNodeBs (gNBs) connected to the 5GC via one or more NG interfaces, such as gNBs 100, 150 connected via interfaces 102, 152, respectively. Further, the gNBs can be connected to each other via one or more Xn interfaces, such as Xn interface 140 between gNBs 100 and 150. With respect to the NR interface to UEs, each of the gNBs may support Frequency Division Duplexing (FDD), Time Division Duplexing (TDD), or a combination thereof. Each of the gNBs may serve a geographic coverage area including one or more cells and, in some cases, may also provide coverage in the respective cells using various directional beams.

NG-RAN 199 is layered into a Radio Network Layer (RNL) and a Transport Network Layer (TNL). The NG-RAN architecture, i.e., the NG-RAN logical nodes and the interfaces between them, is defined as part of the RNL. For each NG-RAN interface (NG, Xn, F1), the related TNL protocol and functionality are specified. The TNL provides services for user plane transport and signaling transport.

The NG-RAN logical nodes shown in Fig. 1 include a Central Unit (CU or gNB-CU) and one or more Distributed Units (DU or gNB-DU). For example, gNB 100 includes gNB-CU 120 and gNB-DUs 120 and 130. A CU (e.g., gNB-CU 120) is a logical node that hosts higher-layer protocols and performs various gNB functions such as controlling the operation of DUs. DUs (e.g., gNB-DUs 120, 130) are decentralized logical nodes that host lower-layer protocols and may include various subsets of the gNB functions, depending on the functional split option. The gNB-CU is connected to one or more gNB-DUs through respective F1 logical interfaces (e.g., 122 and 132).

One change in 5G networks (e.g., in the 5GC) is that the traditional peer-to-peer interfaces and protocols found in earlier-generation networks are modified and/or replaced by a Service Based Architecture (SBA), in which Network Functions (NFs) provide one or more services to one or more service consumers. This can be done, for example, through Hypertext Transfer Protocol/Representational State Transfer (HTTP/REST) application programming interfaces (APIs). In general, the various services are self-contained functionalities that can be changed and modified in an isolated manner without affecting other services.

In addition, services are composed of various "service operations", which are finer-grained divisions of the overall service functionality. The interaction between service consumers and producers may be of the "request/response" or "subscribe/notify" type. In the 5G SBA, a Network Repository Function (NRF) allows each network function to discover the services offered by other network functions, and a Data Storage Function (DSF) allows each network function to store its context. The 5G SBA model is based on principles including modularity, reusability, and self-containment of NFs, which may enable network deployments to take advantage of the latest virtualization and software technologies.

Fig. 2 illustrates an exemplary non-roaming architecture of a 5G network (200) with a service-based interface and various 3GPP defined NFs. These include the following NFs, with additional details provided for those most relevant to the present disclosure:

Application Function (AF, with Naf interface) interacts with the 5GC to provide information to the network operator and to subscribe to specific events that occur in the operator network. The AF provides applications for which service is delivered in a layer (i.e., the transport layer) different from the layer in which the service was requested (i.e., the signaling layer), with control of flow resources according to what has been negotiated with the network. The AF conveys dynamic session information to the PCF (via the N5 interface), including a description of the media to be delivered by the transport layer.

Policy Control Function (PCF, with Npcf interface) supports a unified policy framework to govern network behavior by providing PCC rules (e.g., for the handling of each service data flow under PCC control) to the SMF via the N7 reference point. The PCF provides policy control decisions and flow-based charging control, including service data flow detection, gating, QoS, and flow-based charging (except credit management), to the SMF. The PCF receives session and media related information from the AF and informs the AF of traffic (or user) plane events.

User Plane Function (UPF) supports the handling of user plane traffic, including packet inspection and different enforcement actions (e.g., event detection and reporting), based on rules received from the SMF. The UPF communicates with the RAN (e.g., NG-RAN) via the N3 reference point, with the SMF (discussed below) via the N4 reference point, and with an external Packet Data Network (PDN) via the N6 reference point. The N9 reference point is used for communication between two UPFs.

Session Management Function (SMF, with Nsmf interface) interacts with the decoupled traffic (or user) plane, including creating, updating, and removing Protocol Data Unit (PDU) sessions and managing session context with the User Plane Function (UPF), e.g., for event reporting. For example, the SMF performs data flow detection (based on filter definitions included in PCC rules), online and offline charging interactions, and policy enforcement.

Charging Function (CHF, with Nchf interface) is responsible for converged online and offline charging functionality. It provides quota management (for online charging), re-authorization triggers, rating conditions, etc., and is notified of usage reports by the SMF. Quota management involves granting a specific number of units (e.g., bytes, seconds) for a service. The CHF also interacts with billing systems.

Access and Mobility Management Function (AMF, with Namf interface) terminates the RAN control plane (CP) interface and handles all mobility and connection management for UEs (similar to the MME in the EPC). The AMF communicates with UEs via the N1 reference point, with the SMF via the N11 reference point, and with the RAN (e.g., NG-RAN) via the N2 reference point.

Network Exposure Function (NEF, with Nnef interface) acts as the entry point into the operator network by securely exposing to AFs the network capabilities and events provided by 3GPP NFs, and by providing a way for AFs to securely provide information to the 3GPP network. For example, the NEF provides a service that allows an AF to provision specific subscription data (e.g., expected UE behavior) for various UEs. In general, the services provided by the NEF are similar to those provided by the SCEF in the EPC.

Network Repository Function (NRF, with Nnrf interface) provides service registration and discovery, enabling NFs to identify appropriate services available from other NFs.

Network Slice Selection Function (NSSF, with Nnssf interface): a "network slice" is a logical partition of a 5G network that provides specific network capabilities and characteristics, e.g., in support of a particular service. A network slice instance is a set of NF instances and the required network resources (e.g., compute, storage, communication) that provide the capabilities and characteristics of the network slice. The NSSF enables other NFs (e.g., the AMF) to identify a network slice instance that is appropriate for a UE's desired service.

Authentication Server Function (AUSF, with Nausf interface) is based in a user's home network (HPLMN), where it performs user authentication and computes security key material for various purposes.

Network Data Analytics Function (NWDAF, with Nnwdaf interface) interacts with other NFs to collect relevant data and provides network analytics information (e.g., statistics on past events and/or predictions) to other NFs.

Location Management Function (LMF, with Nlmf interface) supports various functions related to the determination of UE locations, including location determination for a UE and obtaining any of the following: DL location measurements or a location estimate from the UE; UL location measurements from the NG-RAN; and non-UE-associated assistance data from the NG-RAN.

Unified Data Management (UDM) function supports the generation of 3GPP authentication credentials, user identification handling, access authorization based on subscription data, and other subscriber-related functions. To provide this functionality, the UDM uses subscription data (including authentication data) stored in the 5GC Unified Data Repository (UDR). In addition to serving the UDM, the UDR supports storage and retrieval of policy data by the PCF and storage and retrieval of application data by the NEF. The terms "UDM" and "UDM function" are used interchangeably herein.

The IP Multimedia Subsystem (IMS) is an architectural framework for delivering multimedia services to wireless devices based on Internet-centric protocols. IMS was originally specified by the 3rd Generation Partnership Project (3GPP) in Release 5 (Rel-5) as a technology for evolving mobile networks beyond GSM, e.g., for delivering Internet services over GPRS. IMS has evolved in subsequent releases to support other access networks as well as a wide range of services and applications.

At a high level, the functionality of an IMS network can be subdivided into two types: control and media functions, and application enablers. The control functions include a Call Session Control Function (CSCF) and a Home Subscriber Server (HSS). The CSCF is used for session control of devices and applications that use the IMS network. Session control includes secure routing of Session Initiation Protocol (SIP) messages, subsequent monitoring of SIP sessions, and communication with the policy framework to support media authorization. CSCF functionality can be further divided into the Proxy CSCF (P-CSCF), Serving CSCF (S-CSCF), and Interrogating CSCF (I-CSCF).

The CSCF also interacts with the HSS, which is the master database containing user and subscriber information, to support network entities handling calls and sessions. For example, the HSS provides functions such as identification handling, access authorization, authentication, mobility management (e.g., which session control entity is serving the user), session establishment support, service provisioning support, and service authorization support.

The Media Resource Function (MRF) may provide media services in the user's home network and may manage and process media streams such as voice, video, voice-to-text, and real-time transcoding of multimedia data. In general, WebRTC gateways allow both native and browser-based devices to securely access services in the network.

As mentioned briefly above, the increasing complexity of communication networks, including 5G networks, has driven the evolution of analytical systems that support the operation, optimization and planning of these networks. This includes detecting and accounting for abrupt, undesirable changes (e.g., faults) in network operation and/or performance. Advanced analysis systems need to collect and associate basic network events from different network domains such as CN, RAN and transport network. Such an analysis system calculates user-level and session-level E2E quality of service metrics (S-KPIs) and radio and network resource metrics (R-KPIs) characterizing the user-level and session-level radio environment or network operation.

Fig. 3 shows an exemplary multi-domain network (300) including a UE, a RAN, a packet-based CN, and an IMS. As shown in Fig. 3, the RAN includes an eNB providing an LTE-Uu radio interface to the UE and a gNB providing an NR-Uu interface. The CN includes the SMF, AMF, and UPF of the 5GC discussed above, as well as a Mobility Management Entity (MME), Serving Gateway (SGW), and Packet Gateway (PGW) that are part of an Evolved Packet Core (EPC) associated with the LTE network. The UPF is connected to the IMS via an N6 interface; as such, the IMS in Fig. 3 is an example of the PDN shown in Fig. 2.

Fig. 3 also shows various "sampling points" at which data may be collected from the three domains of the network. For example, node events (e.g., PM counters) may be collected from the eNB, gNB, AMF, SMF, UPF, MME, and PGW. Similarly, interface events may be collected from the S5-U (user), S5-C (control), S1-U and S5-U interfaces in the CN and from the Mw interface between the P-CSCF and the I/S-CSCF in the IMS. In addition to detecting events and/or conditions at individual nodes and/or interfaces, some more advanced analysis systems combine information collected from multiple domains to determine "user experience" analytics that represent the performance experienced by an end user for a particular service.

Time series data sets may be collected from various nodes and various interfaces in multiple domains of the communication network. The time series data collected in this manner may be used to analyze, predict, and/or understand user behavior patterns and network performance trends. However, even with the large amount of available time series data, it can be very difficult to detect and account for sudden, undesirable changes in network operation and/or performance (e.g., faults or anomalies). For example, advanced communication networks (such as the exemplary network shown in Fig. 3) are robust and distributed, so that a fault often affects only a limited subset of users, sessions, and/or network elements, making it more difficult to detect.

One conventional approach to fault detection and troubleshooting involves manually searching for faulty elements using the multiple types of filtering options provided by network monitoring and analysis tools. In general, these advanced tools provide the ability to investigate various network problems. However, it is almost impossible to find an unknown problem by relying only on random searches through the available data.

Another approach is to set fixed alarm thresholds for various network KPIs or metrics. This may be used to flag problematic situations and/or to avoid manual searching. However, there is a tradeoff between sensitivity and false alarms. If the threshold is set too low, the system is flooded with a large number of alarms; if it is set too high, only very serious problems are detected, and typically later than desired.

Another common approach is anomaly detection, which sets alarms based on the observed distribution of network KPIs or metrics. In this way, events that are outliers (in some statistical sense) with respect to typical or normal values will be detected.
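One simple instance of such distribution-based detection is a robust z-score, sketched below. The choice of the median and the median absolute deviation (MAD) as the reference distribution, the function name `robust_outliers`, and the threshold of 3.5 are assumptions made for illustration and are not taken from the disclosure.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Return indices of values that lie far from the median, measured in MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no measurable spread, so nothing can be called an outlier
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for normal data
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]


# Hypothetical KPI samples with one abnormal reading at index 6.
kpi = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 5.0, 1.1]
print(robust_outliers(kpi))
```

Because the reference statistics are computed over all samples regardless of when they were taken, a detector of this kind has no notion of time of day or day of week.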

Even so, these techniques are not always successful. Network behavior (based on KPIs or metrics) that is considered "normal" varies with time of day, day of week, month and/or season, as well as with network load and many other variables. The presence or absence of these trends needs to be considered in detecting abnormal network behavior. Furthermore, different KPIs and metrics may have different variability or dependencies on these factors.

Furthermore, each time series of data collected from a multi-domain network (e.g., as shown in fig. 3) is typically one-dimensional, such that it is collected from a single network element (e.g., node, interface, etc.) and is uncorrelated with other data sources. While this supports the detection of faults that have a measurable effect on a single network element, it is difficult to detect faults that appear in multiple network elements.

Other prior art techniques suffer from similar drawbacks. U.S. Patent 8200193 describes a UE-based technique for identifying abnormal traffic generated by an individual UE, but it does not detect network-level problems. U.S. patent publications 2021/0058424 and 2020/0106795 disclose techniques for anomaly detection in a communication network that focus on performance metrics of individual elements (e.g., microservices or nodes), without regard to the multi-dimensional network structure or to behavioral differences (e.g., periodicity, trends, etc.) between time series. U.S. Patent 7460498 describes a technique for detecting problems with fixed telecommunication lines based on measurements of individual network elements, and likewise does not consider the multi-dimensional network structure.

Embodiments of the present disclosure address these and other problems, challenges, and/or difficulties by novel, flexible, and efficient techniques for detecting and isolating communication network operational anomalies based on related time-series data sources from various network domains, and corresponding network analysis systems that perform such techniques. Some aspects include:

Robust learning of time series behavior, in which errors and anomalies in the training data are corrected using the relationships among the various time series that describe marginal distributions of KPIs from a complex network;

Classifying the time series with respect to seasonality and the existence and/or change points of a trend;

Targeted anomaly detection applied to the classified time series; and

Isolating network problems and/or anomalies within the multi-dimensional space represented by the time series by finding the filters that account for the greatest impact on the monitored KPIs.

Correlation of data from multiple sources for each user session enables filtering through multiple dimensions and combinations thereof. For example, embodiments may support calculating the drop rate of UEs from provider A on cells from RAN provider B, or the video quality experienced by users of service provider C in area D.
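
As an illustration of filtering by combinations of dimensions, the following minimal sketch computes a drop rate per dimension combination from per-session records. The record layout and the names `sessions`, `ue_provider`, `ran_provider` and `drop_rate_by` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical per-session records, assumed to be already correlated across domains.
sessions = [
    {"ue_provider": "A", "ran_provider": "B", "dropped": True},
    {"ue_provider": "A", "ran_provider": "B", "dropped": False},
    {"ue_provider": "A", "ran_provider": "C", "dropped": False},
    {"ue_provider": "D", "ran_provider": "B", "dropped": False},
]


def drop_rate_by(records, dimensions):
    """Return {tuple of dimension values: drop rate} for the given dimensions."""
    totals = defaultdict(int)
    drops = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += 1
        drops[key] += int(rec["dropped"])
    return {key: drops[key] / totals[key] for key in totals}


# Drop rate of UEs from provider A on cells from RAN provider B.
rates = drop_rate_by(sessions, ("ue_provider", "ran_provider"))
print(rates[("A", "B")])
```

Evaluating the same function per time bucket would yield one time series per filter, i.e., one marginal distribution in the sense used here.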

The time series of each collected data may be considered as a marginal distribution of network performance or user experience within a particular dimension, where the complete network performance is represented by a multi-dimensional set of time series with unknown relationships between them (i.e., between the marginal distributions).

Embodiments apply anomaly detection to the multi-dimensional set of time series to automatically detect problems during network operation. The monitored network performance metrics and user experience KPI time series are first classified according to the existence of seasonal and trend components, and anomaly detection begins by learning normal network behavior.

The relationships between the time series of the multidimensional system are used to ensure robustness when learning the normal behavior of the network in an unsupervised system. In some embodiments, an under-fitted machine learning (ML) model trained with L1 regularization may be used to suppress the effects of anomalies in the training data. This approach provides intelligent noise filtering and allows the ML model to learn the periodicity of normal behavior without capturing minor anomalies that are present only in a subset of the other relevant time series.

Network problems or abnormal behavior correspond to detected anomalies. Embodiments may apply filtering and ordering to these network anomalies to distinguish between, for example, abnormal network operations and abnormal network loads. The observed metrics and KPI dependence on the underlying traffic can also be considered.

In some embodiments, the marginal distributions of KPIs for multiple dimensions (and combinations of dimensions) are used to isolate problematic network elements on an end-to-end data path by identifying their contribution to the observed performance degradation. Network failures typically affect multiple identifiable subscriber groups (i.e., marginal distributions of certain KPIs) as "side effects" beyond the actual trigger or root cause of the problem. For example, a serious YouTube service outage (root cause) may affect the video QoE metrics of all Apple terminals (side effect).

Embodiments may provide various benefits and/or advantages. For example, embodiments provide almost unlimited possibilities to investigate various known network faults, but also provide fast, automatic anomaly detection of network faults that are still unknown. In this way, embodiments may capture new anomalies early while they are still developing, thereby minimizing their impact on user experience and network performance.

Furthermore, anomaly detection based on learning normal network behavior has significant advantages over conventional threshold-based alarm systems, because many KPIs depend on factors such as time of day, day of week, network load, etc. Having thresholds that adapt to these factors significantly increases the reliability of fault detection. Further, embodiments utilize a learning system that reduces and/or eliminates the impact of training-time errors on fault detection during operation.

Furthermore, by monitoring and correlating network-wide QoS/QoE KPIs, embodiments can accurately isolate UEs, data sessions, etc. that are affected by unrecognized failures or interworking problems. In addition to the more visible network element failures that are typically identified by conventional FM/PM techniques, embodiments may also identify more latent failures and interworking problems that these conventional techniques typically miss.

The time series data collected in the communication network (e.g., as shown in fig. 3) may have various formats, characteristics, and/or patterns. Figs. 4-7 illustrate various exemplary time series collected over a period of about four (4) weeks. The time series in fig. 4 has a daily pattern with peak hours and weekly minimum points, while the time series in fig. 5 has a more random pattern but includes a single event represented by a peak. The time series in fig. 6 also has a random pattern, but additionally includes a non-constant trend component. Finally, the time series in fig. 7 has a daily pattern similar to that of fig. 4, but also includes a non-constant trend component similar to that of fig. 6.

Embodiments of the present disclosure may detect anomalies in time series data in these and other formats, features, and/or patterns. Fig. 8 shows a functional diagram of a network analysis system according to an embodiment of the present disclosure. This exemplary system includes various modules or functions that filter anomalies (representing network problems) and rank them based on their importance.

An input time series generator function (810) communicates formatted time series data. This module may associate and/or aggregate data of any given granularity (e.g., minutes, hours, days, etc.) and any dimension (e.g., UE vendor, network area, user subscription type, carrier frequency, etc.), and combinations thereof (vendor-model-operating system-IMEI software version number, function-service provider, tracking area-service provider, etc.). Each correlated and/or aggregated time series resulting from this function may be considered as a "marginal distribution" of the behavior of the multidimensional system in one or more dimensions. The output of this function is the data source for the rest of the system.

A time series behavior module (820) classifies each time series output by the input time series generator function according to behavior. The system then performs a different process for each time series based on the classification.

For example, the time series may be classified into four categories based on the presence of non-constant trend components and/or seasonal components. Distinguishing between seasonal and non-seasonal data is necessary because treating seasonal data as non-seasonal may leave anomalies at less busy times undetected. In the case of seasonal data, the presence or absence of a trend component does not affect subsequent analysis, because the detector can handle trend and seasonality together. However, in the case of non-seasonal data, the presence or absence of a trend component results in different processing, as described below.

The classification performed in this module may be implemented in a variety of ways. Statistical tests for seasonality include the Welch test (a two-sample location test for the hypothesis that two populations have equal means) and the QS test (a variant of the Ljung-Box test computed on seasonal lags, considering only positive autocorrelation). Statistical tests for trend components include stationarity tests and the Kolmogorov-Smirnov test. ML techniques such as autoencoders, which are artificial neural networks (NNs) that can learn patterns in data in an unsupervised manner, may also be used.
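
The classification step can be sketched as follows. This is a simplified illustration only: it substitutes a lag autocorrelation check and a linear-fit R² check for the Welch, QS, stationarity and Kolmogorov-Smirnov tests named above. The function names, the 0.5 threshold and the synthetic data are assumptions.

```python
import math
import random
import statistics


def linear_fit(y):
    """Least-squares line through (t, y[t]); returns (slope, intercept)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    mean = statistics.fmean(x)
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(len(x) - lag))
    return num / denom


def classify(y, season_lag, threshold=0.5):
    """Classify a non-constant series by presence of seasonality and trend."""
    slope, intercept = linear_fit(y)
    detrended = [v - (slope * t + intercept) for t, v in enumerate(y)]
    # Seasonality is checked on the detrended series so a trend is not mistaken for it.
    if autocorr(detrended, season_lag) > threshold:
        return "seasonal"
    y_mean = statistics.fmean(y)
    total = sum((v - y_mean) ** 2 for v in y)
    r_squared = 1.0 - sum(d * d for d in detrended) / total
    if r_squared > threshold:
        return "non-seasonal with trend"
    return "non-seasonal without trend"


# Synthetic hourly samples over two weeks (assumed data, fixed seed).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(336)]
daily = [3.0 * math.sin(2 * math.pi * t / 24) + e for t, e in enumerate(noise)]
ramp = [0.05 * t + e for t, e in enumerate(noise)]
print(classify(daily, 24), classify(ramp, 24), classify(noise, 24), sep=" | ")
```

A single "seasonal" label covers series with and without a trend, because the seasonal decomposer described below handles both together.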

The robust filtering module (830) applies an unsupervised learning technique to suppress abnormal behavior in various time series during training. This facilitates accurate prediction of "normal" behavior even if the training data includes anomalies.

One major weakness of (unsupervised) anomaly detection techniques is their sensitivity to anomalies in the training data. In such cases, conventional analysis systems cannot learn "normal" network behavior, and training-time anomalies bias the predictions.

Embodiments of the present disclosure provide robustness against training time errors by exploiting the fact that the system is not fed by a set of independent time sequences, but by time sequences that are different marginal distributions (e.g., of KPIs) of a complex multidimensional system. Since the actual relationship between these marginal distributions is not known in advance, embodiments apply the ML model to learn the "normal behavior" of all time-series data in the context of a large multidimensional system.

In some embodiments, this may be accomplished by using an intentionally under-fitted ML model (or system) based on L1 regularization of the weights of the NNs that make up the ML model. L1 regularization minimizes a combined loss function of the NN weights and a norm of the NN weights, which drives some of the weights to an optimal value of zero and thereby promotes sparsity. This may be considered intelligent "noise filtering", in which predictions are made from the "typical" values of the time series for its main features, which represent the "normal" (i.e., non-abnormal) behavior of the network. Such features include dimensions, combinations of dimensions, and temporal features such as time of day, day of week, and the like.
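
The sparsity effect of L1 regularization can be shown with a small sketch. As a simplifying assumption, a linear model stands in for the NN and is fitted by proximal gradient descent (soft-thresholding). The data, `fit_l1` and the value of `lam` are illustrative only.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1(X, y, lam, lr=0.5, steps=200):
    """Minimize (1/2n) * ||X w - y||^2 + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        resid = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, resid)) / n for j in range(d)]
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Three orthogonal features: the first carries the main behavior, the second
# a minor effect (standing in for a small anomaly), the third nothing.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.0 * row[0] + 0.1 * row[1] for row in X]

w = fit_l1(X, y, lam=0.3)
print(w)  # the weights of the minor and irrelevant features are exactly zero
```

The dominant weight is shrunk but kept, while the weight of the minor effect is set to exactly zero. This is the deliberate under-fitting described above.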

If the time series was previously classified as seasonal (with or without a trend), it is input to a seasonal time series decomposer module (840) that identifies any seasonal behavior and trend it contains, removes these effects from the time series, and makes predictions. Once seasonal behavior and trends are removed, the prediction errors are independent and identically distributed (i.i.d.).

The behavior of the trend component may change over time, and there may be complex seasonal patterns in the data. Thus, it is necessary to identify trend change points and multiple seasonal patterns, such as daily and weekly seasonalities. Some embodiments may apply Bayesian inference methods to extract this information from such complex structures.

One exemplary Bayesian inference method is the Facebook Prophet algorithm, which uses a Markov chain Monte Carlo (MCMC) sampling algorithm to fit and predict time series data. In this method, the time series is expressed as y(t) = s(t) + g(t) + h(t) + ε(t), where t represents time, s(t) represents a seasonal component on a daily/weekly basis (modeled as a Fourier series), g(t) represents a trend component (assumed to be piecewise linear), h(t) represents a holiday component, and ε(t) represents an error component. The model parameters are assumed to follow predefined distributions. In a variant, the trend component may be multiplicative instead of additive, and the time series may be expressed as y(t) = g(t)(1 + s(t) + h(t)) + ε(t).

Prophet uses a Bayesian model to find the best parameters for the data (e.g., intercept, initial slope, deltas between slopes). The process starts with a "prior" representing assumed values of the parameters before the data is seen. Given this prior and the data, the Bayesian model returns a "posterior", i.e., an updated belief about the parameter values (e.g., the values with maximum probability). More specifically, Prophet uses a Laplace distribution as the prior. The maximum a posteriori (MAP) estimate of a Bayesian model with a Laplace prior is known to be equivalent to linear regression with L1 regularization.
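
The equivalence between a Laplace prior and L1 regularization can be outlined as follows. The derivation assumes a linear model y = Xβ + ε with i.i.d. Gaussian noise of variance σ² and an i.i.d. Laplace prior of scale b on the parameters. The symbols X, β, σ, b and λ are introduced here for illustration and do not appear in the disclosure.

```latex
\begin{aligned}
\hat{\beta}_{\mathrm{MAP}}
  &= \arg\max_{\beta}\; \bigl[\log p(y \mid \beta) + \log p(\beta)\bigr] \\
\log p(y \mid \beta)
  &= -\frac{1}{2\sigma^{2}} \lVert y - X\beta \rVert_2^{2} + \text{const} \\
\log p(\beta)
  &= \sum_{j} \log\!\Bigl(\tfrac{1}{2b}\, e^{-\lvert \beta_j \rvert / b}\Bigr)
   = -\frac{1}{b} \lVert \beta \rVert_1 + \text{const} \\
\hat{\beta}_{\mathrm{MAP}}
  &= \arg\min_{\beta}\; \tfrac{1}{2} \lVert y - X\beta \rVert_2^{2}
     + \lambda \lVert \beta \rVert_1,
  \qquad \lambda = \frac{\sigma^{2}}{b}
\end{aligned}
```

Under these assumptions, a narrower prior (smaller b) corresponds to a larger penalty λ and therefore to a sparser estimate.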

Fig. 9 shows an exemplary time series collected over a period of about four (4) weeks. Figs. 10-12 respectively show the trend component g(t), the seasonal component s(t), and the error component ε(t) extracted from the time series shown in fig. 9, according to an embodiment of the present disclosure.
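
The decomposition into trend, seasonal and error components can be illustrated with a deliberately simple sketch. It is not the Bayesian fit described above: it assumes a single linear trend (no change points), one known period, no holiday component, and a series that covers whole periods. The function name `decompose` and the test data are hypothetical.

```python
import statistics


def decompose(y, period):
    """Split y into trend g(t), seasonal s(t) and error e(t) with y = g + s + e."""
    n = len(y)
    # For a linear trend plus a periodic pattern, y[t + period] - y[t] = slope * period.
    slope = statistics.fmean([y[t + period] - y[t] for t in range(n - period)]) / period
    detrended = [v - slope * t for t, v in enumerate(y)]
    # The mean of each phase of the period gives level plus seasonal offset.
    phase_mean = [statistics.fmean(detrended[p::period]) for p in range(period)]
    level = statistics.fmean(phase_mean)
    trend = [level + slope * t for t in range(n)]
    seasonal = [phase_mean[t % period] - level for t in range(n)]
    error = [v - g - s for v, g, s in zip(y, trend, seasonal)]
    return trend, seasonal, error


# Eight weeks of daily samples: a linear trend plus a weekly pattern (assumed data).
pattern = [0.0, 2.0, -1.0, 3.0, -2.0, 1.0, -3.0]
series = [5.0 + 0.1 * t + pattern[t % 7] for t in range(56)]
series[30] += 5.0  # inject a single anomalous sample

trend, seasonal, error = decompose(series, period=7)
worst = max(range(len(error)), key=lambda t: abs(error[t]))
print(worst)  # index of the sample with the largest error component
```

Once trend and seasonality are removed, the injected sample stands out in the error component, which is the input to the anomaly detectors described below.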

If the time series was previously classified as non-seasonal with a trend, it is input to a detrending module (850) that identifies and removes the included trend to obtain the error term of the time series.

These time series do not have any type of seasonal behavior, or at least any such behavior is too complex to learn with computationally efficient statistical or ML techniques. Thus, these time series cannot be processed with seasonal models such as those mentioned above; otherwise, noise would be treated as the missing seasonal component s(t), which would result in an incorrect decomposition of the error component ε(t). On the other hand, treating these time series as pure noise and learning only their population-wide behavior would prevent anomaly detection for any particular period. For example, rescaling in the presence of a trend would compress the entire time series, bypassing time windows around the median or average value of a given period.

Embodiments of the present disclosure overcome these difficulties by decomposing each non-seasonal time series with a trend into trend and error components. Embodiments model the trend component as piecewise linear to accommodate trend changes over time. Fig. 13 illustrates an exemplary time series on which the trend (detected according to embodiments of the present disclosure) is superimposed as a piecewise linear function.

Some embodiments may apply Bayesian inference methods to extract trend information from the time series. For example, the time series may be expressed as y(t) = g(t) + h(t) + ε(t) or y(t) = g(t)(1 + h(t)) + ε(t), where t represents time, g(t) represents a trend component (which is assumed to be piecewise linear), h(t) represents a holiday component, and ε(t) represents an error component.

In these embodiments, the detrending module uses Bayesian inference to estimate the trend and holiday portions of the time series and removes them, e.g., by subtracting the estimates from y(t) in the additive case.

The resulting residual components are estimates of the error function ε(t), which is typically noise (e.g., white noise) with some statistical distribution. Fig. 14 shows the residual component of the time series in fig. 13 after removal of the trend component also shown in fig. 13.

These residual time series may be processed by anomaly detector module 2 (860). If a time series was previously classified as non-seasonal without a trend, it is input directly to this module.

Anomaly detector module 1 (870) and anomaly detector module 2 (860) may operate in parallel on their respective time series inputs from the other modules described above. These modules learn the normal behavior in the respective time series and detect anomalies by comparing the deviation of any time series with the actual predictability of other similar time series. The output of the two anomaly detector modules is a trigger or flag for any anomalous time periods in the respective time series.

The anomaly detector module 1 (870) utilizes various metrics for each of the input time series, such as rescaled error or trend components. To make anomaly detection more robust, the module operates on all input time sequences at the same time, rather than analyzing them individually. The particular anomaly detector algorithm used depends on the metric selected.

For example, when the error component is the selected metric, any detector that assumes a white noise process may be used. As a more specific example, a method based on ML or on extreme value theory may be used. The latter is based on the Fisher-Tippett-Gnedenko theorem, which states that the suitably normalized maxima of i.i.d. random variables converge to the same type of ("extreme value") distribution, independent of the distribution of the original random variables. Furthermore, the original data may be used to estimate the parameters of the distribution of the maxima. Since the maxima correspond to extreme events, the method facilitates estimating the distribution of extreme events. Note that the Fisher-Tippett-Gnedenko theorem plays a role for maxima similar to that of the central limit theorem for sums of i.i.d. random variables.
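
The extreme value approach can be sketched as below. As a simplifying assumption, the block maxima of the error component are fitted with a Gumbel distribution (one of the three extreme value types, suited to light-tailed noise) using the method of moments. The block size, the 0.999 quantile and all function names are illustrative.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def block_maxima(errors, block_size):
    """Maximum of each consecutive, non-overlapping block of error samples."""
    last_start = len(errors) - block_size
    return [max(errors[i:i + block_size]) for i in range(0, last_start + 1, block_size)]


def fit_gumbel(maxima):
    """Method-of-moments estimate of the Gumbel location mu and scale beta."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.fmean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_quantile(mu, beta, p):
    """Value that a block maximum stays below with probability p."""
    return mu - beta * math.log(-math.log(p))


# Synthetic white-noise error component (assumed data, fixed seed).
rng = random.Random(0)
errors = [rng.gauss(0.0, 1.0) for _ in range(5000)]

maxima = block_maxima(errors, block_size=50)
mu, beta = fit_gumbel(maxima)
upper_bound = gumbel_quantile(mu, beta, p=0.999)


def is_anomaly(value):
    """Flag error samples above the estimated extreme-event bound."""
    return value > upper_bound
```

A lower bound can be obtained in the same way by applying the procedure to the negated error samples.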

Fig. 15 shows an exemplary arrangement in which anomaly detector module 1 has created upper and lower bounds for multiple time series of data being analyzed simultaneously, while figs. 16-17 show the upper and lower bounds created for two separate time series, together with the actual values of those individual time series. Fig. 16 shows one data point in the time series that may be detected as an anomaly.

Anomaly detector module 2 (860) processes time series data that has a white-noise character, without trend or seasonality, and whose distribution is approximately normal. The module groups the time series by KPI and filters each KPI in multiple dimensions (i.e., marginal distributions). Clustering algorithms are used to identify outliers.

For example, the module may be implemented using density-based spatial clustering of applications with noise (DBSCAN), which is a density-based, non-parametric clustering algorithm. Given a set of points in some space, DBSCAN groups together points that have many nearby neighbors, while marking as outliers the points that lie in low-density regions and whose nearest neighbors are too far away according to some distance measure.

Clustering in two dimensions (KPI value, timestamp) is feasible because there is no significant autocorrelation in the data, so the samples can be treated as independent data points. The KPI value and timestamp dimensions may be rescaled to approximately the same scale before being input to the clustering algorithm. Different scaling techniques may be used, including a Z-score, i.e., the distance from the data mean divided by the standard deviation of the data.

While a clustering algorithm such as DBSCAN may identify any number of clusters, the goal of the module is only to distinguish outliers from non-outliers. To this end, the module may identify a central "primary" cluster (e.g., near the average) and one or more other clusters that are farther from the average. The other clusters are post-processed to determine whether they are outliers or belong to the primary cluster. The module may be implemented as a streaming algorithm so that new data points can be marked as outliers or non-outliers shortly after they arrive.
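
The clustering-based detection can be illustrated with a compact, batch-mode DBSCAN written with the Python standard library only (a streaming variant is not shown). The data, `eps=0.6` and `min_pts=4` are assumed values. Points labeled -1 are noise, i.e., outlier candidates.

```python
import math
import statistics


def zscore(values):
    """Rescale values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index >= 0, or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # core point: keep expanding the cluster
    return labels


# Forty KPI samples around 10 with one outlying value (assumed data).
base = [10.0, 10.2, 9.8, 10.1, 9.9]
kpi = [base[i % 5] for i in range(40)]
kpi[20] = 15.0
timestamps = list(range(40))

points = list(zip(zscore(timestamps), zscore(kpi)))
labels = dbscan(points, eps=0.6, min_pts=4)
outliers = [i for i, label in enumerate(labels) if label == -1]
print(outliers)
```

In this example all regular samples fall into a single primary cluster, so no post-processing of secondary clusters is needed. With real data, that step would decide whether smaller clusters are outliers.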

Subsequently, an anomaly ranking module (880) ranks any detected anomalies based on their exposure, frequency, and importance in the network. In other words, anomalies are ordered and filtered by attributes such as their deviation from normal, duration of the deviation, and/or the impact of the deviation on the network (e.g., number of affected subscribers, affected traffic, value of affected services, etc.). In this way, the module attempts to identify the most relevant and/or most significant anomalies, including possible "root cause" anomalies and possible side effects. For example, among the anomalies detected at the same time, the anomaly with the highest deviation from normal behavior is typically the anomaly with the greatest impact on the session and/or subscriber. These anomalies are indicated as root causes, while other detected anomalies are indicative of side effects of these root causes.

In other words, the anomaly ordering module identifies and orders the most salient anomalies in the most specific filtering dimension. This corresponds to the marginal distribution with the highest deviation from its respective normal behavior. The ranking may be utilized by a user interface (UI 890) to filter and/or categorize anomalies to focus on the most relevant network problems.
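
A minimal sketch of such a ranking is shown below. The `Anomaly` fields, the sort key and the rule that the largest deviation among concurrent anomalies marks the likely root cause are simplifying assumptions based on the description above. The example values are invented.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    filter_name: str           # dimension filter (marginal distribution) where it was seen
    deviation: float           # distance from the learned normal behavior, rescaled
    duration_min: int          # how long the deviation lasted
    affected_subscribers: int  # one possible measure of network impact


def rank(anomalies):
    """Order anomalies by deviation, then impact, then duration (largest first)."""
    return sorted(
        anomalies,
        key=lambda a: (a.deviation, a.affected_subscribers, a.duration_min),
        reverse=True,
    )


def split_root_cause(concurrent):
    """Among anomalies detected at the same time, separate the likely root cause."""
    ranked = rank(concurrent)
    return ranked[0], ranked[1:]


detected = [
    Anomaly("terminal_vendor=Apple", deviation=3.1, duration_min=40, affected_subscribers=9000),
    Anomaly("service=YouTube", deviation=7.4, duration_min=45, affected_subscribers=20000),
    Anomaly("area=D", deviation=2.2, duration_min=15, affected_subscribers=1200),
]

root_cause, side_effects = split_root_cause(detected)
print(root_cause.filter_name, [a.filter_name for a in side_effects])
```

A user interface could present the ranked list directly, or collapse the side effects under their root cause.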

Fig. 18 illustrates an exemplary implementation of a network analysis system (1800) according to an embodiment of the present disclosure. In particular, this embodiment is directed to a cloud computing (or more simply "cloud") environment.

The cloud embodiment shown in fig. 18 includes a time series generator module (1810), a time series classification module (1820), a robust filtering module (1830), one or more anomaly detector modules (1840), and an anomaly ordering module (1850). These modules may perform similar functions/operations as the corresponding modules in fig. 8, but are implemented with interfaces and parallel processes that may be tuned for the cloud environment. The training and invoking method is described below for this embodiment.

The system receives input data through a streaming service module and collects the input data for a given period of time. The time series generator module may trigger streaming aggregation to generate M different single- or multi-dimensional time series. These are input to a time series behavior detector module, which classifies each time series according to behavior and sends K time series and classification metadata to a persistent database.

When the time series behavior detector module completes, it triggers a robust filtering module that identifies frequent patterns and commonalities in the M time series and removes undesired behavior. This may be done, for example, based on unsupervised learning techniques.

These M more robust time series are used for regression and training of different model types, which may, for example, be predefined in the system. If there are M time series, there can be at most M model types, but the arrangement in fig. 18 assumes K < M model types. The number of model types also depends on the available resources of the cloud computing platform. Regression and training of the K different model types may be performed in parallel.

After regression and training, the M time series and their learned behavior (i.e., K model types) are handed over to the anomaly detection modules, shown here as one for each model type. These modules detect anomalies based on the K models in any of the ways described above.

The scoring agent encapsulates the K models into service models that can be used for batch-based and stream-based predictive processing via a REST API. The scoring agent REST API also writes the detected anomalies into a persistent database, where they are associated with the raw time-series data collected from the network. The persistent database is then queried by the anomaly ordering module, which triggers the UI to present the ranking to the end user. All results at this point may be transferred to the persistent database for use by the final UI.

The Open RAN (O-RAN) Alliance is a community of mobile operators and RAN providers working toward an open, intelligent, virtualized, operationally efficient, and fully interoperable RAN. To achieve these goals, the community defines an O-RAN architecture with key functions and interfaces. The O-RAN working groups (WGs) issue various specifications. For example, O-RAN WG1 addresses use cases and the overall architecture. One general principle is that the O-RAN architecture and interface specifications should be as consistent as possible with the 3GPP architecture and interface specifications.

Fig. 19 shows a high-level O-RAN architecture and four key interfaces: A1, O1, open fronthaul M-plane, and O2. These interfaces connect the service management and orchestration (SMO) framework to the O-RAN network functions (NFs) and the open cloud (O-Cloud). In addition, there is an interface between the SMO and external systems to provide enrichment data. The NG interface between the O-RAN NFs and the NG-Core is also shown, which is consistent with the NG interface of the 5GC shown in fig. 1.

The O-RAN architecture description defines three control loops with respective delays:

Real-time (RT) control loop (< 10 ms);

Near-RT RIC control loop (10-1000 ms); and

Non-RT RIC control loop (> 1000 ms).

The use cases of non-RT RIC and near RT RIC control loops are entirely defined by the O-RAN, but for RT control loops (which perform radio scheduling, HARQ, beamforming, etc.) the O-RAN only defines the relevant interactions of other O-RAN nodes or functions.

The non-RT RIC provides an A1 interface to the near-RT RIC. One task of the non-RT RIC is to provide policy-based guidance, machine learning (ML) model management, and enrichment information to support intelligent RAN optimization (e.g., for radio resource management, RRM) in the near-RT RIC. The non-RT RIC may also perform intelligent RRM over longer non-RT intervals (e.g., greater than 1 second).

The non-RT RIC may use data analysis and artificial intelligence (AI)/ML training and inference to determine RAN optimizations, for which it may utilize SMO services, e.g., collecting data from and providing data to O-RAN nodes. These actions are performed by non-RT RIC applications (rApps). The non-RT RIC also includes a non-RT RIC framework that logically terminates the A1 interface inside the SMO framework and exposes all required functions and services to the rApps.

As currently specified, the O-RAN architecture does not include any components and/or interfaces that enable incoming data streams from existing data collection components to be used for cross-domain correlation. For example, the SMO non-RT RIC components do not have any data interfaces towards domains outside the RAN. More generally, incoming data from non-RAN domains (e.g., CN, applications, etc.) is outside the scope of O-RAN. Even so, the following describes different possible implementation options for integrating embodiments of the present disclosure into an O-RAN architecture.

Fig. 20 illustrates a first implementation option for integrating embodiments of the present disclosure in a multi-domain network (2000) including an O-RAN architecture. In this option, an anomaly detector (2010) with cross-domain data correlation runs on an AI server outside of the SMO (e.g., in a public or private cloud computing environment) and has an external interface to the SMO. The anomaly detector also has an external interface that facilitates collection of data from other domains such as the CN (e.g., 5GC), IMS, etc.

Fig. 21 illustrates a second implementation option for integrating embodiments of the present disclosure in a multi-domain network (2100) that includes an O-RAN architecture. In this option, an anomaly detector (2110) with cross-domain data correlation operates in a non-RT RIC, and optionally partially within a near-RT RIC. For example, "training" components such as time series behavior detection, decomposition, and detrending may run in the non-RT RIC, while the anomaly detection logic runs in the non-RT RIC or the near-RT RIC, depending on latency requirements.

Both of these implementation options apply to the case where data collection and association is performed only within the RAN, as well as to the case where data collection and association is performed across other network domains (e.g., CN and IMS) as well.

The various features of the embodiments described above correspond to the various operations illustrated in fig. 22 (including parts A and B), which depicts an exemplary method (e.g., procedure) for detecting operational anomalies in a multi-domain communication network, according to various embodiments of the present disclosure. In other words, the various features of the operations described below correspond to the various embodiments described above. While fig. 22 shows particular blocks in a particular order, the operations of the exemplary method can be performed in a different order than shown and can be combined and/or divided into blocks having different functions than shown. Optional blocks or operations are indicated by dashed lines.

The following description is based on an exemplary method performed by a network analysis system associated with a communication network. For example, the network analysis system may be implemented in (or as) a Service Management and Orchestration (SMO) system for the RAN, an analysis-related CN node such as NWDAF, a network management node in an OAM system, or an application running in a host computing system outside the network (e.g., a public or private cloud environment).

An exemplary method may include operations of block 2210, where a network analysis system may obtain a plurality of time sequences of performance data from a plurality of domains of a communication network. The example method may also include operations of block 2220, where the network analysis system may determine one or more models of non-abnormal network behavior based on the plurality of time sequences. The example method may also include an operation of block 2230, wherein the network analysis system may classify each time series into a plurality of types based on the presence or absence of at least two types of components in each time series. The example method may also include an operation of block 2240, wherein the network analysis system may detect operational anomalies in the plurality of time sequences or in additional performance data obtained from the plurality of domains of the communication network based on the one or more models and the classified types.

In some embodiments, the example method may further include operations of block 2250, wherein based on detecting a plurality of operational anomalies in the additional performance data, the network analysis system may determine an order of importance of the detected operational anomalies based on deviations of each from corresponding non-anomalous network behavior. In some of these embodiments, the exemplary method may further include the operations of block 2260, wherein the network analysis system may initiate one or more corrective actions in a plurality of domains of the communication network in response to the one or more detected anomalies determined to be most important. In some of these embodiments, the exemplary method may further include the operations of block 2270, wherein the network analysis system may refrain from initiating one or more further corrective actions in one or more domains of the communication network in response to the one or more detected anomalies determined to be of minor importance.

In some embodiments, one or more of the following applies:

Each time series comprises data samples from one of a network element, or an interface between network elements, in a single domain; and

The operational anomalies are detected in a plurality of time series collected from a plurality of domains.

In some embodiments, classifying each time series based on the presence or absence of at least two types of components in block 2230 includes the following operations, labeled with corresponding sub-block numbers:

(2231) Detecting whether each of the time series includes a seasonal component and/or a non-constant trend component;

(2232) Classifying the time series as a first type when the time series includes a seasonal component;

(2233) Classifying the time series as a second type when the time series includes a non-constant trend component but does not include a seasonal component; and

(2234) Classifying the time series as a third type when the time series includes neither a non-constant trend component nor a seasonal component.

In some of these embodiments, one or more of the following applies:

Detecting whether each of the time series includes a seasonal component in sub-block 2231 is based on one of a Welch test or a QS test; and

Detecting whether each of the time series includes a non-constant trend component in sub-block 2231 is based on one of a stationarity test, a Kolmogorov-Smirnov test, or a neural network autoencoder.

In some of these embodiments, detecting an operational anomaly in the plurality of time series in block 2240 includes the following operations, labeled with corresponding sub-block numbers:

(2241) Decomposing each time series classified as the first type into a seasonal component, a non-constant trend component, and a noise component;

(2242) Calculating upper and lower bounds applicable to all time series classified as the first type; and

(2243) Detecting an operational anomaly in each time series classified as the first type based on comparing one of the respective non-constant trend component and the respective noise component with the upper and lower bounds.

In some of these embodiments, each time series classified as the third type includes a noise component. In such embodiments, detecting an operational anomaly in the plurality of time series in block 2240 includes the following operations, marked with corresponding sub-block numbers:

(2244) decomposing each time series classified as the second type into a non-constant trend component and a noise component; and

(2245) detecting an operational anomaly in each time series classified as the second type or the third type based on the respective noise component.
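A minimal sketch of sub-blocks 2244 and 2245 follows. It assumes a least-squares straight line as the trend model and a median-absolute-deviation rule for judging the noise component; neither choice is stated in this disclosure, and the function names, the string labels `"second"` and `"third"`, and the factor `k` are hypothetical.

```python
from statistics import mean, median


def linear_detrend(times, values):
    """Split a series into a least-squares linear trend and residual noise."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx if sxx else 0.0
    trend = [v_bar + slope * (t - t_bar) for t in times]
    noise = [v - tr for v, tr in zip(values, trend)]
    return trend, noise


def noise_component(times, values, series_type):
    """Second type: remove the trend. Third type: the series is treated
    as its own noise component."""
    if series_type == "second":
        return linear_detrend(times, values)[1]
    return list(values)


def flag_noise_outliers(noise, k=5.0):
    """Indices whose distance from the median exceeds k times the MAD."""
    med = median(noise)
    mad = median(abs(x - med) for x in noise)
    if mad == 0:
        # Degenerate spread: anything that differs from the median stands out.
        return [i for i, x in enumerate(noise) if x != med]
    return [i for i, x in enumerate(noise) if abs(x - med) > k * mad]
```

Median-based statistics are used here because a single large deviation barely moves them, so the anomaly does not mask itself.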

In some variations, each noise component includes a series of tuples, where each tuple includes a data value and a corresponding time instant. In such variations, detecting an operational anomaly in each time series classified as either the second type or the third type in sub-block 2245 includes the following operations, marked with corresponding sub-block numbers:

(2245a) rescaling the data values and/or the time instants of the tuples comprising the noise component; and

(2245b) detecting an operational anomaly based on arranging the tuples into a plurality of clusters, the plurality of clusters including a non-outlier cluster and at least one outlier cluster.
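The rescale-then-cluster operations of sub-blocks 2245a and 2245b can be sketched as follows. The clustering algorithm is not named in this passage; the sketch assumes min-max rescaling and a small density-based clustering (DBSCAN-style) written in plain Python, and it treats the largest cluster as the non-outlier cluster. The function names and the parameters `eps`, `min_pts`, and `time_weight` are hypothetical.

```python
import math
from collections import Counter, deque


def rescale(tuples, time_weight=1.0):
    """Min-max rescale (value, time) tuples; time_weight stretches or
    shrinks the time axis relative to the value axis."""
    def unit(xs):
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0
        return [(x - lo) / span for x in xs]
    values = unit([v for v, _ in tuples])
    times = unit([t for _, t in tuples])
    return [(v, time_weight * t) for v, t in zip(values, times)]


def dbscan(points, eps, min_pts):
    """Density-based clustering; label -1 marks points in no cluster."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # core point: keep expanding
    return labels


def detect_outlier_tuples(tuples, eps=0.2, min_pts=3, time_weight=1.0):
    """Indices of tuples outside the largest (non-outlier) cluster."""
    labels = dbscan(rescale(tuples, time_weight), eps, min_pts)
    sizes = Counter(label for label in labels if label >= 0)
    if not sizes:
        return []  # no cluster formed, so no non-outlier reference exists
    main = sizes.most_common(1)[0][0]
    return [i for i, label in enumerate(labels) if label != main]
```

Rescaling matters because the distance used for clustering mixes data values and time instants; without it, whichever axis has the larger numeric range would dominate the grouping.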

In some embodiments, determining one or more models of non-abnormal network behavior based on the plurality of time series in block 2220 includes the operations of sub-block 2221, wherein the network analysis system may train one or more machine learning (ML) models based on the plurality of time series using L1 regularization. For example, each ML model may include a neural network (NN) having a plurality of weights, and training the one or more ML models in sub-block 2221 using L1 regularization includes the operations of sub-block 2221a, wherein the network analysis system may minimize, for each ML model, a loss function of the weights of the NN and of a norm of the weights of the NN.
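To make the regularized objective concrete, the following sketch evaluates a loss of the form "data loss plus a multiple of the L1 norm of the weights". The mean squared error as the data loss, the weighting factor `lam`, and both function names are assumptions of this sketch. The soft-thresholding helper shows one common way an L1 penalty is applied during a training step; it is not asserted to be the training procedure of this disclosure.

```python
def l1_regularized_loss(predictions, targets, weights, lam):
    """Mean squared error plus lam times the L1 norm of all NN weights.

    `weights` is a list of per-layer weight lists.
    """
    data_loss = sum((p - y) ** 2
                    for p, y in zip(predictions, targets)) / len(targets)
    l1_norm = sum(abs(w) for layer in weights for w in layer)
    return data_loss + lam * l1_norm


def soft_threshold(w, step):
    """Proximal step for the L1 penalty: shrink w toward zero by `step`
    and clamp small weights to exactly zero."""
    if w > step:
        return w - step
    if w < -step:
        return w + step
    return 0.0
```

Because the penalty grows with the absolute value of every weight, minimizing this loss tends to drive uninformative weights to zero, which yields sparser models.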

In some of these embodiments, detecting operational anomalies based on the one or more models in block 2240 includes the operations of sub-block 2246, wherein, using the one or more trained ML models, the network analysis system may predict non-anomalous network behavior in one or more of:

a second portion of the plurality of time series, different from a first portion used for training the one or more ML models; and

additional performance data obtained from the plurality of domains of the communication network.

For example, detecting operational anomalies in block 2240 is based on the non-anomalous network behavior predicted in sub-block 2246 using the one or more trained ML models.
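One way to use predicted non-anomalous behavior for detection is to compare each observation with the model's prediction and flag large residuals, as in the sketch below. The threshold rule (mean plus `k` standard deviations of the absolute residuals on the training portion) is an assumption, as are the function names; `model` stands for any trained predictor exposed as a callable.

```python
from statistics import mean, pstdev


def residual_threshold(model, train_inputs, train_targets, k=3.0):
    """Derive a threshold from residuals on the portion used for training."""
    residuals = [abs(y - model(x))
                 for x, y in zip(train_inputs, train_targets)]
    return mean(residuals) + k * pstdev(residuals)


def detect_with_model(model, inputs, observed, threshold):
    """Indices where the observation departs from the predicted
    non-anomalous behavior by more than the threshold."""
    return [i for i, (x, y) in enumerate(zip(inputs, observed))
            if abs(y - model(x)) > threshold]
```

The same two functions apply whether `inputs` and `observed` come from the held-out portion of the original time series or from additional performance data collected later.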

In some embodiments, the number of models of non-abnormal network behavior (e.g., determined in block 2220) is less than the number of time series. In some embodiments, the plurality of time series represent a plurality of marginal distributions of performance of the corresponding multi-domain communication system.

In some embodiments, the plurality of domains includes at least two of a User Equipment (UE) domain, a Radio Access Network (RAN) domain, a Core Network (CN) domain, and an IP Multimedia Subsystem (IMS) domain. In such embodiments, the plurality of time series includes at least one time series obtained from each of the at least two domains.

In some of these embodiments, the RAN domain includes an open RAN (O-RAN) architecture. In such embodiments, the obtaining, determining, and classifying operations of blocks 2210-2230 are performed by an O-RAN non-real-time RAN intelligent controller (non-RT RIC), while the detecting operations of block 2240 are performed by the O-RAN non-RT RIC or by an O-RAN near-real-time RIC (near-RT RIC).

In some of these embodiments, the plurality of time series includes at least two of:

a time series of one or more of the following RAN domain quality of service (QoS) metrics: RAN resources used, serving cell load, mobility events between serving cells, and serving and neighbor cell radio measurements;

a time series of one or more of the following CN domain QoS metrics: packet delay, packet delay jitter, packet loss, and priority level;

a time series of trace data for a respective cell provided by a RAN node;

a time series of performance management (PM) counter values associated with a RAN node;

a time series of user plane (UP) event information associated with the CN domain; and

a time series of control plane (CP) event information associated with the CN domain.

Although various embodiments have been described herein above in terms of methods, apparatus, devices, computer readable media and receivers, those of ordinary skill will readily appreciate that such methods may be implemented by various combinations of hardware and software in various systems, communications devices, computing devices, control devices, apparatus, non-transitory computer readable media, and the like.

Fig. 23 illustrates an example of a communication system 2300 according to some embodiments. In this example, the communication system 2300 includes a telecommunications network 2302 that includes an access network 2304, such as a Radio Access Network (RAN), and a core network 2306 that includes one or more core network nodes 2308. In some embodiments, the telecommunications network 2302 may also include one or more network management (NM) nodes 2318, which may be part of an Operations Support System (OSS), a Business Support System (BSS), and/or an OAM system. The NM node may monitor and/or control the operation of other nodes of the access network 2304 and the core network 2306. Although not shown in fig. 23, NM node 2318 is configured to communicate with other nodes in access network 2304 and core network 2306 for these purposes.

The access network 2304 includes one or more access network nodes, such as network nodes 2310a and 2310b (one or more of which may be referred to generally as network node 2310), or any other similar 3GPP access node or non-3GPP access point. The network node 2310 facilitates direct or indirect connection of UEs, such as by connecting UEs 2312a, 2312b, 2312c, and 2312d (one or more of which may be referred to generally as a UE 2312) to the core network 2306 over one or more wireless connections.

Example wireless communications through wireless connections include transmitting and/or receiving wireless signals using electromagnetic waves, radio waves, infrared waves, and/or other types of signals suitable for conveying information without the use of wires, cables, or other material conductors. Further, in different embodiments, communication system 2300 may include any number of wired or wireless networks, network nodes, UEs, and/or any other components or systems that may facilitate or participate in the communication of data and/or signals, whether via wired or wireless connections. The communication system 2300 may include and/or interface with any type of communication, telecommunications, data, cellular, radio network, and/or other similar type of system.

The UE 2312 may be any of a wide variety of communication devices, including wireless devices arranged, configured, and/or operable to wirelessly communicate with the network node 2310 and other communication devices. Similarly, the network node 2310 is arranged, capable, configured, and/or operable to communicate directly or indirectly with the UE 2312 and/or with other network nodes or devices in the telecommunications network 2302 to enable and/or provide network access, such as wireless network access, and/or to perform other functions, such as management in the telecommunications network 2302.

In the depicted example, core network 2306 connects network node 2310 to one or more hosts, such as host 2316. These connections may be direct or indirect via one or more intermediary networks or devices. In other examples, the network node may be directly coupled to the host. The core network 2306 includes one or more core network nodes (e.g., core network node 2308) that are composed of hardware and software components. The features of these components may be substantially similar to those described with respect to the UE, network node, and/or host, such that the description thereof applies generally to the corresponding components of core network node 2308. Exemplary core network nodes include functions of one or more of a Mobile Switching Center (MSC), a Mobility Management Entity (MME), a Home Subscriber Server (HSS), an Access and Mobility Management Function (AMF), a Session Management Function (SMF), an Authentication Server Function (AUSF), a Subscription Identifier De-concealing Function (SIDF), a Unified Data Management (UDM), a Security Edge Protection Proxy (SEPP), a Network Exposure Function (NEF), and/or a User Plane Function (UPF).

The host 2316 may be under ownership or control of, and may be operated by, or on behalf of, a service provider other than the operator or provider of the access network 2304 and/or the telecommunications network 2302. Host 2316 can host a variety of applications to provide one or more services. Examples of such applications include live and pre-recorded audio/video content, data collection services such as retrieving and compiling data regarding various environmental conditions detected by multiple UEs, analytics functionality, social media, functionality for controlling or otherwise interacting with remote devices, functionality for alerting and monitoring centers, or any other such functionality performed by a server.

In some embodiments, the access network 2304 may include a Service Management and Orchestration (SMO) system or node 2320 that may monitor and/or control the operation of the access network node 2310. Such an arrangement may be used, for example, when the access network 2304 utilizes an open RAN (O-RAN) architecture. The SMO system 2320 may be configured to communicate with the core network 2306 and/or the host 2316, as shown in fig. 23.

In some embodiments, one or more of host 2316, network management node 2318, and SMO system 2320 may be configured to perform various operations of an exemplary method (e.g., process) for detecting operational anomalies in a multi-domain communication network, such as described above with respect to fig. 22.

As a whole, the communication system 2300 of fig. 23 enables connectivity between UEs, network nodes, and hosts. In this sense, the communication system may be configured to operate according to predefined rules or procedures, such as certain standards including, but not limited to, Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Long Term Evolution (LTE), and/or other suitable 2G, 3G, 4G, 5G standards, or any applicable future generation standards (e.g., 6G); Wireless Local Area Network (WLAN) standards, such as the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (WiFi); and/or any other suitable wireless communication standards, such as Worldwide Interoperability for Microwave Access (WiMax), Bluetooth, Z-Wave, Near Field Communication (NFC), ZigBee, LiFi, and/or any Low Power Wide Area Network (LPWAN) standards, such as LoRa and Sigfox.

In some examples, the telecommunications network 2302 is a cellular network implementing 3GPP standardized features. Accordingly, the telecommunications network 2302 can support network slicing to provide different logical networks to different devices connected to the telecommunications network 2302. For example, the telecommunications network 2302 may provide ultra-reliable low-latency communication (URLLC) services to some UEs, enhanced mobile broadband (eMBB) services to other UEs, and/or massive machine-type communication (mMTC)/massive IoT services to yet other UEs.

In some examples, the UE 2312 is configured to send and/or receive information without direct human interaction. For example, the UE may be designed to send information to the access network 2304 on a predetermined schedule, when triggered by an internal or external event, or in response to a request from the access network 2304. Further, the UE may be configured to operate in a single-RAT or multi-standard mode. For example, the UE may operate with any one or a combination of Wi-Fi, NR (New Radio), and LTE, i.e., be configured for multi-radio dual connectivity (MR-DC), such as E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) New Radio dual connectivity (EN-DC).

In an example, the hub 2314 communicates with the access network 2304 to facilitate indirect communication between one or more UEs (e.g., UEs 2312c and/or 2312d) and a network node (e.g., network node 2310b). In some examples, hub 2314 may be a controller, a router, a content source and analytics device, or any other communication device described herein with respect to a UE. For example, the hub 2314 may be a broadband router that enables UEs to access the core network 2306. As another example, the hub 2314 may be a controller that sends commands or instructions to one or more actuators in the UE. The commands or instructions may be received from the UE, from the network node 2310, or through executable code, scripts, procedures, or other instructions in the hub 2314. As another example, the hub 2314 may be a data collector that serves as temporary storage for UE data and, in some embodiments, may perform analysis or other processing of the data. As another example, hub 2314 may be a content source. For example, for a UE that is a VR headset, display, speaker, or other media delivery device, hub 2314 may retrieve VR assets, video, audio, or other media or data related to sensory information via a network node, which hub 2314 then provides to the UE either directly, after performing local processing, and/or after adding additional local content. In yet another example, the hub 2314 acts as a proxy server or orchestrator for the UEs, particularly if one or more of the UEs are low-energy IoT devices.

The hub 2314 may have a constant/persistent or intermittent connection to the network node 2310b. The hub 2314 may also allow for different communication schemes and/or schedules between the hub 2314 and UEs (e.g., UEs 2312c and/or 2312d) and between the hub 2314 and the core network 2306. In other examples, the hub 2314 is connected to the core network 2306 and/or one or more UEs via a wired connection. Further, the hub 2314 may be configured to connect to an M2M service provider through the access network 2304 and/or to connect to another UE through a direct connection. In some scenarios, the UE may establish a wireless connection with the network node 2310 while still being connected via the hub 2314 over a wired or wireless connection. In some embodiments, the hub 2314 may be a dedicated hub, i.e., a hub whose primary function is to route communications between the UEs and the network node 2310b. In other embodiments, the hub 2314 may be a non-dedicated hub, i.e., a device operable to route communications between the UE and the network node 2310b, but also operable as a communication start and/or end point for certain data channels.

Fig. 24 illustrates a network node 2400 according to some embodiments. As used herein, a network node refers to a device capable of, configured, arranged and/or operable to communicate directly or indirectly with UEs and/or with other network nodes or devices in a telecommunications network. Examples of network nodes include, but are not limited to, access points (APs) (e.g., radio access points) and base stations (BSs) (e.g., radio base stations, Node Bs, evolved Node Bs (eNBs), and NR Node Bs (gNBs)).

Base stations may be classified based on the amount of coverage provided by the base station (or in other words, the transmit power level of the base station), and thus, depending on the amount of coverage provided, a base station may be referred to as a femto base station, pico base station, micro base station, or macro base station. A base station may be a relay node or a relay donor node controlling a relay. The network node may also include one or more (or all) portions of a distributed radio base station, such as a centralized digital unit and/or a Remote Radio Unit (RRU), sometimes referred to as a Remote Radio Head (RRH). Such remote radio units may or may not be integrated with an antenna as an antenna integrated radio. Portions of a distributed radio base station may also be referred to as nodes in a Distributed Antenna System (DAS).

Other examples of network nodes include multiple transmission point (multi-TRP) 5G access nodes, multi-standard radio (MSR) devices such as MSR BSs, network controllers such as Radio Network Controllers (RNCs) or Base Station Controllers (BSCs), Base Transceiver Stations (BTSs), transmission points, transmission nodes, Multi-cell/Multicast Coordination Entities (MCEs), Operation and Maintenance (O&M) nodes, Operations Support System (OSS) nodes, Self-Organizing Network (SON) nodes, positioning nodes (e.g., Evolved Serving Mobile Location Centers (E-SMLCs)), and/or Minimization of Drive Tests (MDTs).

In some embodiments, network node 2400 can be configured to perform various operations of an exemplary method (e.g., procedure) for detecting operational anomalies in a multi-domain communication network, such as described above with respect to fig. 22.

Network node 2400 includes processing circuitry 2402, memory 2404, communication interface 2406, and power source 2408. The network node 2400 may be composed of a plurality of physically separate components (e.g., a NodeB component and an RNC component, or a BTS component and a BSC component, etc.), each of which may have its own components. In certain scenarios where network node 2400 includes multiple separate components (e.g., BTS and BSC components), one or more of the separate components may be shared among several network nodes. For example, a single RNC may control multiple NodeBs. In this scenario, each unique NodeB and RNC pair may be considered as a single separate network node in some cases. In some embodiments, network node 2400 may be configured to support multiple Radio Access Technologies (RATs). In such embodiments, some components may be duplicated (e.g., separate memory 2404 for different RATs), and some components may be reused (e.g., the same antenna 2410 may be shared by different RATs). Network node 2400 may also include multiple sets of the various illustrated components for different wireless technologies integrated into network node 2400, such as GSM, WCDMA, LTE, NR, WiFi, Zigbee, Z-wave, LoRaWAN, Radio Frequency Identification (RFID), or Bluetooth wireless technologies. These wireless technologies may be integrated into the same or different chips or chipsets and other components within network node 2400.

The processing circuit 2402 may include a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software, and/or encoded logic operable to provide the functionality of the network node 2400, alone or in combination with other network node 2400 components such as memory 2404.

In some embodiments, processing circuit 2402 includes a system on a chip (SOC). In some embodiments, the processing circuitry 2402 includes one or more Radio Frequency (RF) transceiver circuits 2412 and baseband processing circuits 2414. In some embodiments, the Radio Frequency (RF) transceiver circuitry 2412 and baseband processing circuitry 2414 may be on separate chips (or groups of chips), boards, or units, such as radio units and digital units. In alternative embodiments, some or all of the RF transceiver circuitry 2412 and baseband processing circuitry 2414 may be on the same chip or chipset, board, or unit.

Memory 2404 may include any form of volatile or non-volatile computer-readable memory including, but not limited to, persistent memory, solid state memory, remotely mounted memory, magnetic media, optical media, Random Access Memory (RAM), Read-Only Memory (ROM), mass storage media (e.g., hard disk), removable storage media (e.g., flash drive, Compact Disk (CD), or Digital Video Disk (DVD)), and/or any other volatile or non-volatile, non-transitory device-readable and/or computer-executable memory device that stores information, data, and/or instructions that may be used by processing circuit 2402. The memory 2404 may store any suitable instructions, data, or information, including computer programs, software, and applications including one or more of logic, rules, code, tables, and/or other instructions capable of being executed by the processing circuit 2402 and utilized by the network node 2400 (collectively, computer program product 2404a). Memory 2404 may be used to store any calculations made by processing circuit 2402 and/or any data received via communication interface 2406. In some embodiments, processing circuit 2402 and memory 2404 are integrated.

The communication interface 2406 is used for wired or wireless communication of signaling and/or data between network nodes, access networks, and/or UEs. As shown, the communication interface 2406 includes port(s)/terminal(s) 2416 to transmit data to and receive data from a network, e.g., through a wired connection. Communication interface 2406 also includes radio front-end circuitry 2418 that may be coupled to antenna 2410 or, in some embodiments, to a portion of antenna 2410. Radio front-end circuitry 2418 includes a filter 2420 and an amplifier 2422. Radio front-end circuitry 2418 may be connected to antenna 2410 and processing circuitry 2402. The radio front-end circuitry may be configured to condition signals communicated between the antenna 2410 and the processing circuitry 2402. The radio front-end circuitry 2418 may receive digital data to be sent out to other network nodes or UEs via a wireless connection. Radio front-end circuitry 2418 may use a combination of filters 2420 and/or amplifiers 2422 to convert the digital data into a radio signal having the appropriate channel and bandwidth parameters. The radio signal may then be transmitted via the antenna 2410. Similarly, when receiving data, the antenna 2410 may collect radio signals, which are then converted to digital data by the radio front-end circuitry 2418. The digital data may be passed to processing circuit 2402. In other embodiments, the communication interface may include different components and/or different combinations of components.

In certain alternative embodiments, network node 2400 does not include separate radio front-end circuitry 2418; instead, processing circuit 2402 includes radio front-end circuitry and is connected to antenna 2410. Similarly, in some embodiments, all or some of RF transceiver circuitry 2412 is part of communication interface 2406. In other embodiments, the communication interface 2406 includes one or more ports or terminals 2416, radio front-end circuitry 2418, and RF transceiver circuitry 2412 as part of a radio unit (not shown), and the communication interface 2406 communicates with baseband processing circuitry 2414, which is part of a digital unit (not shown).

The antenna 2410 may include one or more antennas or antenna arrays configured to transmit and/or receive wireless signals. The antenna 2410 may be coupled to the radio front-end circuitry 2418 and may be any type of antenna capable of wirelessly transmitting and receiving data and/or signals. In certain embodiments, the antenna 2410 is separate from the network node 2400 and may be connected to the network node 2400 through an interface or port.

The antenna 2410, communication interface 2406, and/or processing circuit 2402 may be configured to perform any receiving operations and/or certain obtaining operations described herein as being performed by a network node. Any information, data and/or signals may be received from the UE, another network node and/or any other network device. Similarly, the antenna 2410, communication interface 2406, and/or processing circuit 2402 may be configured to perform any of the transmit operations described herein as being performed by a network node. Any information, data and/or signals may be transmitted to the UE, another network node and/or any other network device.

The power source 2408 provides power to the various components of the network node 2400 in a form suitable for the respective components (e.g., at the voltage and current levels required for each respective component). The power source 2408 may further include or be coupled to a power management circuit to supply power to the components of the network node 2400 for performing the functions described herein. For example, network node 2400 may be connectable to an external power source (e.g., a power grid, a power outlet) via an input circuit or interface, such as a cable, whereby the external power source supplies power to the power circuit of power source 2408. As another example, power source 2408 may include a power source in the form of a battery or battery pack that is connected to or integrated in a power circuit. The battery may provide backup power if the external power source fails.

Embodiments of network node 2400 may include additional components to those shown in fig. 24 for providing certain aspects of network node functionality, including any functionality described herein and/or any functionality necessary to support the subject matter described herein. For example, network node 2400 may include a user interface device to allow information to be input into network node 2400 and to allow information to be output from network node 2400. This may allow a user to perform diagnostic, maintenance, repair, and other management functions on network node 2400.

Fig. 25 is a block diagram of a host 2500, which can be an embodiment of the host 2316 of fig. 23, in accordance with various aspects described herein. As used herein, the host 2500 may be or include various combinations of hardware and/or software, including stand-alone servers, blade servers, cloud-implemented servers, distributed servers, virtual machines, containers, or processing resources in a server farm. Host 2500 can provide one or more services to one or more UEs.

The host 2500 includes processing circuitry 2502 that is operably coupled to an input/output interface 2506, a network interface 2508, a power supply 2510, and a memory 2512 via a bus 2504. Other components may be included in other embodiments. The features of these components may be substantially similar to those described with respect to the devices of the previous figures, such that the description thereof applies generally to the corresponding components of the host 2500.

Memory 2512 may include one or more computer programs including one or more host applications 2514 and data 2516, which may include user data, e.g., data generated by a UE for host 2500 or data generated by host 2500 for a UE. Embodiments of the host 2500 may utilize only a subset or all of the components shown. The host application 2514 may be implemented in a container-based architecture and may provide support for video codecs (e.g., Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC), Advanced Video Coding (AVC), MPEG, VP9) and audio codecs (e.g., FLAC, Advanced Audio Coding (AAC), MPEG, G.711), including transcoding for a plurality of different categories, types, or implementations of UEs (e.g., handsets, desktop computers, wearable display systems, heads-up display systems). The host application 2514 may also provide user authentication and permission checks and may periodically report health, routing, and content availability to a central node, such as a device in or on the edge of the core network. Thus, host 2500 can select and/or indicate a different host for over-the-top services for a UE. The host application 2514 may support various protocols such as the HTTP Live Streaming (HLS) protocol, the Real-Time Messaging Protocol (RTMP), the Real-Time Streaming Protocol (RTSP), Dynamic Adaptive Streaming over HTTP (MPEG-DASH), and the like.

In some embodiments, host 2500 may be configured to perform various operations of an exemplary method (e.g., procedure) for detecting operational anomalies in a multi-domain communication network, such as described above with respect to fig. 22.

Fig. 26 is a block diagram illustrating a virtualization environment 2600 in which functionality implemented by some embodiments can be virtualized. Virtualization in this context means creating a virtual version of an apparatus or device, which may include virtualized hardware platforms, storage devices, and networking resources. As used herein, virtualization may apply to any device or component thereof described herein, and relates to an embodiment in which at least a portion of the functionality is implemented as one or more virtual components. Some or all of the functionality described herein may be implemented as virtual components executed by one or more Virtual Machines (VMs) implemented in one or more virtual environments 2600 hosted by one or more of the hardware nodes, such as a hardware computing device operating as a network node, UE, core network node, or host. Furthermore, in embodiments in which the virtual node does not require radio connectivity (e.g., core network node or host), the node may be fully virtualized.

An application 2602 (which may alternatively be referred to as a software instance, virtual device, network function, virtual node, virtual network function, etc.) runs in the virtualized environment 2600 to implement some features, functions, and/or benefits of some embodiments disclosed herein. In some embodiments, one or more applications 2602 may be configured to perform various operations of an exemplary method (e.g., process) for detecting operational anomalies in a multi-domain communication network, such as described above with respect to fig. 22.

The hardware 2604 includes processing circuitry, memory storing software and/or instructions executable by the hardware processing circuitry (collectively, computer program product 2604a), and/or other hardware devices as described herein, such as network interfaces, input/output interfaces, and the like. The software may be executed by the processing circuitry to instantiate one or more virtualization layers 2606 (also referred to as a hypervisor or Virtual Machine Monitor (VMM)), provide VMs 2608a and 2608b (one or more of which may be generally referred to as VM 2608), and/or perform any of the functions, features, and/or benefits described in connection with some embodiments described herein. The virtualization layer 2606 may present a virtual operating platform to the VM 2608 that appears to be networking hardware.

VM 2608 includes virtual processing, virtual memory, virtual networking or interfaces, and virtual storage, and may be run by a corresponding virtualization layer 2606. Different embodiments of instances of virtual device 2602 may be implemented on one or more of VMs 2608 and may be implemented in different ways. Virtualization of hardware is referred to in some contexts as Network Function Virtualization (NFV). NFV can be used to consolidate many network device types onto industry standard high capacity server hardware, physical switches, and physical storage, which can be located in data centers as well as customer premises equipment.

In the context of NFV, VM 2608 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs 2608, together with the portion of the hardware 2604 that executes that VM, whether hardware dedicated to the VM and/or hardware shared by the VM with other VMs, forms a separate virtual network element. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs 2608 on top of hardware 2604 and that correspond to applications 2602.

The hardware 2604 may be implemented in a stand-alone network node with general or specific components. The hardware 2604 may implement some functions via virtualization. Alternatively, the hardware 2604 may be part of a larger hardware cluster (e.g., in a data center or CPE), where many hardware nodes work together and are managed via management and orchestration 2610, which, among other things, oversees lifecycle management of the application 2602. In some embodiments, hardware 2604 is coupled to one or more radio units, each radio unit including one or more transmitters and one or more receivers that may be coupled to one or more antennas. The radio unit may communicate directly with other hardware nodes via one or more suitable network interfaces and may be used in combination with virtual components to provide radio capabilities to virtual nodes, such as radio access nodes or base stations. In some embodiments, some signaling may be provided through the use of a control system 2612, which may alternatively be used for communication between hardware nodes and radio units.

The foregoing merely illustrates the principles of the disclosure. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems, arrangements and procedures which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope of the present disclosure. Those of ordinary skill in the art will appreciate that the various embodiments may be used with each other or interchangeably.

As used herein, the term "unit" may have its conventional meaning in the field of electronics, electrical devices, and/or electronic devices, and may include, for example, electrical and/or electronic circuitry, devices, modules, processors, memories, logic solid-state and/or discrete devices, and computer programs or instructions for carrying out respective tasks, procedures, computations, outputs, and/or display functions, such as those described herein.

Any suitable step, method, feature, function, or benefit disclosed herein may be performed by one or more functional units or modules of one or more virtual devices. Each virtual device may include a plurality of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include a Digital Signal Processor (DSP), dedicated digital logic, or the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or more types of memory such as Read Only Memory (ROM), Random Access Memory (RAM), cache memory, flash memory devices, optical storage devices, and the like. The program code stored in the memory includes program instructions for performing one or more telecommunications and/or data communication protocols, as well as instructions for performing one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional units to perform corresponding functions in accordance with one or more embodiments of the present disclosure.

As described herein, a device and/or apparatus may be represented by a semiconductor chip, a chipset, or a (hardware) module comprising such a chip or chipset. This does not exclude the possibility that a functionality of the device or apparatus, instead of being implemented in hardware, is implemented as a software module, such as a computer program or a computer program product comprising executable software code portions for execution or running on a processor. Furthermore, the functionality of a device or apparatus may be implemented by any combination of hardware and software. A device or apparatus may also be regarded as an assembly of multiple devices and/or apparatuses, whether functionally in cooperation with or independent of each other. Moreover, devices and apparatuses may be implemented in a distributed fashion throughout a system, so long as the functionality of the device or apparatus is preserved. These and similar principles are considered to be known to the skilled person.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Moreover, certain terms used in this disclosure (including the description and figures) may be used synonymously in certain instances (e.g., "data" and "information"). It will be understood that although these terms (and/or other terms that may be synonymous with each other) may be used synonymously herein, there may be instances where these terms are intended not to be used synonymously.

Claims

1. A computer-implemented method for detecting an operational anomaly in a multi-domain communication network, the method comprising:

obtaining (2210) a plurality of time series of performance data from a plurality of domains of a communication network;

determining (2220) one or more models of non-anomalous network behavior based on the plurality of time series;

classifying (2230) each time series into a plurality of types based on the presence or absence of at least two types of components in each time series; and

detecting (2240) operational anomalies in the plurality of time series, or in additional performance data obtained from the plurality of domains of the communication network, based on the one or more models and the classified types.

2. The method of claim 1 further comprising: based on detecting a plurality of operational anomalies in the additional performance data, determining (2250) an order of importance of the detected operational anomalies based on respective deviations from corresponding non-anomalous network behavior.
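
By way of non-limiting illustration, the ordering by importance recited in claim 2 might be sketched as follows. The `Anomaly` record, its field names, the example series identifiers, and the normalization of each deviation by a per-series scale are assumptions made for this example only and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    series_id: str   # hypothetical identifier of the affected time series
    observed: float  # value seen in the performance data
    expected: float  # value given by the model of non-anomalous behavior
    scale: float     # typical spread of the series (assumed > 0)

    @property
    def deviation(self) -> float:
        # Normalized distance from the non-anomalous behavior.
        return abs(self.observed - self.expected) / self.scale


def rank_by_importance(anomalies):
    """Return the anomalies ordered from most to least significant."""
    return sorted(anomalies, key=lambda a: a.deviation, reverse=True)


detected = [
    Anomaly("ran/cell_load", observed=0.95, expected=0.60, scale=0.10),
    Anomaly("cn/packet_delay_ms", observed=48.0, expected=12.0, scale=4.0),
    Anomaly("ims/setup_time_ms", observed=310.0, expected=300.0, scale=20.0),
]
ranked = rank_by_importance(detected)
```

Normalizing by a per-series scale is one possible way to make deviations comparable across domains whose metrics have different units; other measures of deviation could equally be used.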

3. The method according to claim 2, further comprising one or more of the following:

initiating (2260) one or more corrective actions in a plurality of domains of the communication network in response to the one or more detected anomalies determined to be most significant; and

in response to the one or more detected anomalies being determined to be less significant, refraining from initiating (2270) one or more further corrective actions in one or more domains of the communication network.

4. The method according to any one of claims 1 to 3, wherein one or more of the following apply:

each time series includes data samples from one of the following in a single domain: a network element or an interface between network elements; and

the operational anomalies are detected in a plurality of time series collected from a plurality of domains.

5. The method according to any one of claims 1 to 4, wherein classifying (2230) the respective time series based on the presence or absence of at least two types of components comprises:

detecting (2231) whether each of the time series includes a seasonal component and/or a non-constant trend component;

when the time series includes a seasonal component, classifying (2232) the time series as a first type;

when the time series includes a non-constant trend component but does not include a seasonal component, classifying (2233) the time series as a second type; and

when the time series includes neither a non-constant trend component nor a seasonal component, classifying (2234) the time series as a third type.
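
By way of non-limiting illustration, the three-way classification of claim 5 might be sketched as follows. The presence checks used here (lag autocorrelation of the detrended series at an assumed known `period`, and correlation of the series with time) are simple stand-ins chosen for this example; they are not the statistical tests of the disclosure, and the names and the 0.5 thresholds are hypothetical.

```python
import math
from enum import Enum


class SeriesType(Enum):
    FIRST = "seasonal"
    SECOND = "trend only"
    THIRD = "noise only"


def _centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def _detrend(values):
    """Remove a least-squares line; return (residuals, |correlation with time|)."""
    t = _centered(list(range(len(values))))
    v = _centered(values)
    sxx = sum(a * a for a in t)
    svv = sum(b * b for b in v)
    sxy = sum(a * b for a, b in zip(t, v))
    if svv == 0.0:  # constant series: no trend, nothing to remove
        return v, 0.0
    slope = sxy / sxx
    return [b - slope * a for a, b in zip(t, v)], abs(sxy) / math.sqrt(sxx * svv)


def has_seasonal(values, period, threshold=0.5):
    """Stand-in check: autocorrelation of the detrended series at lag `period`."""
    resid, _ = _detrend(values)
    total = sum(v * v for v in _centered(values))
    den = sum(r * r for r in resid)
    if den <= 1e-12 * total:  # nothing left after detrending
        return False
    num = sum(resid[i] * resid[i + period] for i in range(len(resid) - period))
    return num / den > threshold


def has_trend(values, threshold=0.5):
    """Stand-in check: strength of the linear relation between value and time."""
    return _detrend(values)[1] > threshold


def classify(values, period):
    if has_seasonal(values, period):
        return SeriesType.FIRST   # seasonal component present
    if has_trend(values):
        return SeriesType.SECOND  # non-constant trend, no seasonality
    return SeriesType.THIRD       # neither component present


seasonal = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trending = [0.5 * t for t in range(48)]
flat = [5.0] * 48
```

Note that the seasonal check takes precedence, so a series with both a seasonal and a trend component is still classified as the first type.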

6. The method of claim 5, wherein one or more of the following applies:

detecting (2231) whether each of the time series includes a seasonal component is based on one of the following statistical tests: Welch's test or QS test; and

detecting (2231) whether each of the time series includes a non-constant trend component is based on one of the following: a stationarity test, a Kolmogorov-Smirnov test, or a neural network autoencoder.

7. The method according to any one of claims 5-6, wherein detecting (2240) an operational anomaly in the plurality of time series comprises:

decomposing (2241) each time series classified as the first type into a seasonal component, a non-constant trend component, and a noise component;

calculating (2242) upper and lower bounds applicable to all time series classified as the first type; and

detecting (2243) an operational anomaly in each time series classified as the first type based on comparing one of the following to the upper and lower bounds: the respective non-constant trend component or the respective noise component.
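
By way of non-limiting illustration, the decomposition, shared bounds, and comparison of claim 7 might be sketched as follows. The classical additive decomposition (centered moving-average trend, per-phase mean seasonal component), the bounds taken as the pooled mean plus or minus `k` standard deviations of the noise, and all function names are assumptions made for this example and are not taken from the disclosure. The sketch compares the noise component to the bounds and assumes each series spans at least two full periods.

```python
from statistics import mean, pstdev


def decompose(values, period):
    """Additive decomposition into (seasonal, trend, noise).

    Trend and noise are None at the edges where the centered window
    does not fit.
    """
    n, half = len(values), period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end samples carry half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    # Seasonal component: mean detrended value per phase, centered on zero.
    by_phase = [[] for _ in range(period)]
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            by_phase[i % period].append(v - t)
    phase_mean = [mean(p) for p in by_phase]
    offset = mean(phase_mean)
    seasonal = [phase_mean[i % period] - offset for i in range(n)]
    noise = [None if t is None else v - t - s
             for v, t, s in zip(values, trend, seasonal)]
    return seasonal, trend, noise


def shared_bounds(noise_components, k=3.0):
    """One (lower, upper) pair applicable to all first-type series."""
    pooled = [x for comp in noise_components for x in comp if x is not None]
    m, s = mean(pooled), pstdev(pooled)
    return m - k * s, m + k * s


def detect(noise, lower, upper):
    """Indices whose noise value falls outside the shared bounds."""
    return [i for i, x in enumerate(noise)
            if x is not None and not (lower <= x <= upper)]


clean = [10.0 + p for p in [0.0, 2.0, 0.0, -2.0] * 6]
spiky = list(clean)
spiky[10] += 12.0  # injected anomaly

noise_clean = decompose(clean, period=4)[2]
noise_spiky = decompose(spiky, period=4)[2]
lower, upper = shared_bounds([noise_clean, noise_spiky])
```

Pooling the noise of every first-type series before computing the bounds is what makes a single pair of bounds applicable to all of them; a per-series threshold would be a different design.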

8. The method according to any one of claims 5 to 7, wherein:

each time series classified as the third type includes a noise component; and

detecting (2240) operational anomalies in the plurality of time series comprises:

decomposing (2244) each time series classified as the second type into a non-constant trend component and a noise component; and

detecting (2245) an operational anomaly in each time series classified as the second type or the third type based on the respective noise components.

9. The method according to claim 8, wherein:

each noise component comprises a series of tuples, each tuple comprising a data value and a corresponding time instant; and

detecting (2245) an operational anomaly in each time series classified as the second type or the third type comprises:

rescaling (2245a) the data values and/or the time instants of the tuples comprising the noise component; and

detecting (2245b) operational anomalies based on arranging the tuples into a plurality of clusters, the plurality of clusters including non-outlier clusters and at least one outlier cluster.
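
By way of non-limiting illustration, the rescaling and clustering of claim 9 might be sketched as follows. The claim does not name a clustering algorithm; a minimal density-based clustering in the style of DBSCAN is used here as an assumed stand-in, with points labeled -1 forming the outlier cluster. The min-max rescaling, the `eps` and `min_pts` values, and the function names are likewise assumptions made for this example.

```python
import math


def rescale(tuples):
    """Min-max rescale time instants and data values to [0, 1]."""
    def unit(xs):
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0  # avoid division by zero on constant input
        return [(x - lo) / span for x in xs]

    times = unit([t for t, _ in tuples])
    values = unit([v for _, v in tuples])
    return list(zip(times, values))


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: cluster ids >= 0, and -1 for the outlier cluster."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally an outlier
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # core point: keep expanding
                queue.extend(nb_j)
    return labels


# Noise component as (time instant, data value) tuples, one sample per minute.
noise = [(60.0 * i, 0.1 if i % 2 == 0 else -0.1) for i in range(20)]
noise[7] = (420.0, 5.0)  # injected anomaly

labels = dbscan(rescale(noise), eps=0.15, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == -1]
```

Rescaling both axes to a common range keeps the time spacing and the value spread comparable, so that a single distance threshold can be applied to the tuples.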

10. The method according to any one of claims 1-9, wherein determining (2220) one or more models of non-anomalous network behavior based on the plurality of time series comprises training (2221) one or more machine learning (ML) models based on the plurality of time series using L1 regularization.

11. The method according to claim 10, wherein:

each ML model includes a neural network (NN) having a plurality of weights; and

training (2221) the one or more ML models using L1 regularization comprises:

for each ML model, minimizing (2221a) a loss function of the weights of the NN and of a norm of the weights of the NN.
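
By way of non-limiting illustration, a loss that depends both on the weights of the NN (through its predictions) and on a norm of those weights might be sketched as follows, using L1 regularization in its usual sense, that is, a penalty proportional to the sum of the absolute weight values. The mean-squared-error data term, the regularization strength `lam`, and the function name are assumptions made for this example and are not taken from the disclosure.

```python
def l1_regularized_loss(predictions, targets, weights, lam):
    """Data loss plus lam times the L1 norm of all weights.

    `weights` is a list of per-layer weight lists; `lam` is a hypothetical
    regularization strength chosen by the operator.
    """
    # Data term: mean squared error between predictions and targets.
    data_loss = sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)
    # Penalty term: sum of absolute values over every weight in every layer.
    l1_norm = sum(abs(w) for layer in weights for w in layer)
    return data_loss + lam * l1_norm


loss = l1_regularized_loss(
    predictions=[1.0, 2.0],
    targets=[1.0, 4.0],
    weights=[[0.5, -1.5], [0.0]],
    lam=0.1,
)
```

Because the penalty grows with the absolute value of every weight, minimizing this loss tends to drive weights of little use to zero, which may limit how strongly isolated anomalous samples in the training data shape the learned model.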

12. The method according to any one of claims 10-11, wherein detecting (2240) an operational anomaly based on the one or more models comprises: using the one or more trained ML models, predicting (2246) a non-anomalous network behavior in one or more of:

a second portion of the plurality of time series that is different from the first portion used to train the one or more ML models; and

the additional performance data obtained from the plurality of domains of the communication network.

13. The method of claim 12, wherein detecting (2240) an operational anomaly is based on non-anomalous network behavior predicted using the one or more trained ML models.

14. The method according to any one of claims 1-13, wherein the plurality of time series includes one or more multidimensional time series, and obtaining (2210) the plurality of time series comprises aggregating (2211) at least two obtained single-dimensional time series to form each multidimensional time series.
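
By way of non-limiting illustration, the aggregation of claim 14 might be sketched as follows. Aligning the single-dimensional series on their common time instants, the dictionary-based representation, and the metric names are assumptions made for this example and are not taken from the disclosure.

```python
def aggregate(series_by_name):
    """Combine single-dimensional series into one multidimensional series.

    `series_by_name` maps a metric name to a {time instant: value} mapping.
    Only time instants present in every input series are kept.
    """
    if len(series_by_name) < 2:
        raise ValueError("at least two single-dimensional series are required")
    names = sorted(series_by_name)
    common = set.intersection(*(set(series_by_name[n]) for n in names))
    rows = [(t, tuple(series_by_name[n][t] for n in names)) for t in sorted(common)]
    return names, rows


names, rows = aggregate({
    "cell_load": {0: 0.4, 60: 0.5, 120: 0.7},
    "packet_delay_ms": {60: 12.0, 120: 15.0, 180: 11.0},
})
```

Each resulting row pairs a time instant with one value per input metric, in the order given by `names`.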

15. The method according to any one of claims 1-14, wherein the number of models of non-anomalous network behavior is less than the number of time series.

16. The method according to any one of claims 1-15, wherein the plurality of time series represent a plurality of marginal distributions of performance of a corresponding multi-domain communication system.

17. The method according to any one of claims 1 to 16, wherein:

the plurality of domains includes at least two of the following domains: a User Equipment (UE) domain; a Radio Access Network (RAN) domain; a Core Network (CN) domain; and an IP Multimedia Subsystem (IMS) domain; and

the plurality of time series includes at least one time series obtained from each of the at least two domains.

18. The method of claim 17, wherein:

the RAN domain includes an open RAN (O-RAN) architecture;

the obtaining, determining, and classifying operations are performed by an O-RAN non-real-time RAN intelligent controller (non-RT RIC); and

the detecting operation is performed by the O-RAN non-RT RIC or by an O-RAN near-real-time RAN intelligent controller (near-RT RIC).

19. The method according to any one of claims 17-18, wherein the plurality of time series include at least two of the following:

time series of one or more of the following RAN domain Quality of Service (QoS) metrics: RAN resources used, serving cell load, mobility events between serving cells, and serving and neighbor cell radio measurements;

time series of one or more of the following CN domain QoS metrics: packet delay, packet delay jitter, packet loss, and priority level;

time series of per-cell trace data provided by a RAN node;

time series of performance management (PM) counter values associated with a RAN node;

time series of user plane (UP) event information associated with the CN domain; and

time series of control plane (CP) event information associated with the CN domain.

20. A network analysis system (800, 1800, 2010, 2110, 2316, 2318, 2320, 2400, 2500, 2600) configured to detect operational anomalies in a multi-domain communication network (198, 199, 200, 300, 2000, 2100, 2302), the network analysis system comprising:

communication interface circuitry (2406, 2508, 2604) configured to communicate with a plurality of domains of a communication network; and

a processing circuit (2402, 2502, 2604) operably coupled to the communication interface circuit, whereby the processing circuit and the communication interface circuit are configured to:

obtain a plurality of time series of performance data from the plurality of domains of the communication network;

determine one or more models of non-anomalous network behavior based on the plurality of time series;

classify each time series into a plurality of types based on the presence or absence of at least two types of components in each time series; and

detect operational anomalies in the plurality of time series, or in additional performance data obtained from the plurality of domains of the communication network, based on the one or more models and the classified types.

21. The network analysis system of claim 20, wherein the processing circuit and the communication interface circuit are further configured to perform operations corresponding to any one of claims 2-19.

22. A network analysis system (800, 1800, 2010, 2110, 2316, 2318, 2320, 2400, 2500, 2600) configured to detect operational anomalies in a multi-domain communication network (198, 199, 200, 300, 2000, 2100, 2302), the network analysis system comprising:

a time series generator module (810, 1810) configured to obtain a plurality of time series of performance data from a plurality of domains of a communication network;

a robust filtering module (830, 1830) configured to determine one or more models of non-anomalous network behavior based on the plurality of time series;

a time series classification module (820, 1820) configured to classify each time series into a plurality of types based on the presence or absence of at least two types of components in each time series; and

one or more anomaly detection modules (860, 870, 1840) configured to detect operational anomalies in the plurality of time series, or in additional performance data obtained from the plurality of domains of the communication network, based on the one or more models and the classified types.

23. The network analysis system of claim 22, further comprising an anomaly ranking module (880, 1850) configured to determine an order of importance of multiple operational anomalies detected by the one or more anomaly detection modules in additional performance data based on respective deviations of the detected multiple operational anomalies from corresponding non-anomalous network behaviors.

24. The network analysis system according to claim 22, further configured to perform operations corresponding to any one of the methods according to claims 3-19.

25. A non-transitory computer-readable medium (2404, 2604) storing computer-executable instructions that, when executed by a processing circuit (2402, 2502, 2604), configure a network analysis system (800, 1800, 2010, 2110, 2316, 2318, 2320, 2400, 2500, 2600) to detect operational anomalies in a multi-domain communication network (198, 199, 200, 300, 2000, 2100, 2302) based on performing operations corresponding to any of the methods of claims 1-19.

26. A computer program product (2404a, 2604a) comprising computer executable instructions which, when executed by a processing circuit (2402, 2502, 2604), configure a network analysis system (800, 1800, 2010, 2110, 2316, 2318, 2320, 2400, 2500, 2600) to detect operational anomalies in a multi-domain communication network (198, 199, 200, 300, 2000, 2100, 2302) based on performing operations corresponding to any of the methods of claims 1-19.