WO2024018257A1

WO2024018257A1 - Early detection of irregular patterns in mobile networks

Info

Publication number: WO2024018257A1
Application number: PCT/IB2022/056639
Authority: WO
Inventors: Attila BÁDER; András HÓCZA; Gábor MAGYAR
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2022-07-19
Filing date: 2022-07-19
Publication date: 2024-01-25

Abstract

Methods for evaluating performance of a communication network using natural language modeling. An example method comprises generating (810), from a plurality of data session records, a sequence of tokens, each token in the sequence corresponding to at least one key performance indicator, KPI, for a corresponding data session, where each of the one or more data session records is a collection of performance-related metrics for a data session. The example method further comprises evaluating (820) the sequence, using a KPI token language model generated from training data obtained from the communication network, to detect whether the sequence is improbable, indicating an irregularity in network performance.

Description

EARLY DETECTION OF IRREGULAR PATTERNS IN MOBILE NETWORKS

TECHNICAL FIELD

This disclosure is generally related to communications networks and is more particularly related to techniques for using natural language processing to detect performance irregularities in such networks.

BACKGROUND

Wireless telecommunication networks are configured to run with best performance by using correct configuration management parameters, which may be set and/or tuned to address varied cell sizes, deployment topographies, traffic load patterns, etc. Different configurations induce different behavior in the network, ideally to optimize network throughput, minimize interference and dropped/interrupted calls or data sessions, and otherwise optimize user experience. Normally, the parameters are configured based on guidelines provided by the respective network vendors for running the networks effectively.

Modern telecommunication networks - especially mobile networks - are extremely complex and this complexity is continuously increasing. Performance perceived by end users can be impacted by configuration, load and interactions of thousands of different networks elements. To keep operations cost low and end-user perceived performance high, it is critical to automate network operations and network optimization as much as possible.

Network Analytics systems, which are part of the Network Management domain in communications systems such as the 4G and 5G wireless communication systems specified by members of the 3rd-Generation Partnership Project (3GPP), monitor and analyze service and network quality at session level in mobile networks. Network Analytics systems are increasingly used for automatic network operation as well, improving the network, or eliminating service or network issues.

Basic network Key Performance Indicators (KPIs) are continuously monitored by Network Analytics systems. The KPIs are based on node and network events and counters. KPIs are aggregated in time and often for node or other dimensions (e.g., device type, service provider, or network dimensions, like cell, network node, etc.). KPIs can indicate node or network failures but usually are not detailed enough for troubleshooting purposes, and they are typically not suitable for identifying end-to-end (e2e), user-perceived service quality issues.

In addition to calculating and monitoring KPIs, Network Analytics systems also generate “incidents.” Incidents are either based on explicit network failure indications (e.g., failure cause codes in signaling events) or based on KPI thresholds. Anomaly detection algorithm and methods are used to identify unexpected changes of KPI values, or outlying values for different node, network function, subscriber, terminal groups. Incidents are used for generating alarms in Fault Management (FM) systems.

Advanced, event-based analytics systems collect and correlate elementary network events as well as e2e service quality metrics and computing user level e2e service quality KPIs and incidents. Per-subscriber and per-session and network-level KPIs and incidents are aggregated for different network and time dimensions. These types of solutions are suitable for session-based troubleshooting and analysis of network issues. The monitored network domains include both the radio and core networks.

Problems remain with current network analytics techniques, however. For example, explicit network failure indications, node alarms, usually are generated when the issue is already occurred, which may mean that issues result in temporary or permanent service quality degradation. There are often no indications in advance before the issue occurs. Further, KPI threshold-based fault indications or anomaly detection methods are often based on time-aggregated KPIs values. If the issue is characterized by a relatively short time period within the aggregation period, the issue may escape detection.

For the same reason, if the KPI is aggregated for a parameter, such as node, network function, subscriber group, terminal types, etc., and the issue is related to only a few parameters, the issue may not be detected. Monitoring the KPIs separately for different parameter values is challenging, or not possible, due to the large number of parameters and possible parameter values or value ranges.

Another problem is that if an issue is related to two or more specific parameter combinations, due to the large number of combinations it is very difficult to detect issues with legacy methods.

Service-level indication (SLI) computation systems provide overall information about the network performance. Realtime monitoring of an SLI, e.g., to detect a decreasing SLI over a period, may indicate issues. (See Zlatniczki, et a!., International Patent Application Publication No. W02021171063, “Adaptive method for measuring service level consistency,” hereinafter, “Zlatniczki.”) Many KPI values on a wide time window can be aggregated to create a consistency model, against which monitored SLIs can be evaluated, but this gives back high-level (network-level and service-level) metrics about the service quality. This SLI value does not provide insight into the reasons for issues, on a low level.

Threshold-based anomaly detection systems can simultaneously monitor multiple low-level indicators in the network and indicate defects when one of these metrics is out from its valid range. An example system is described in Octavian et a!., International Patent Application Publication No. WO2017059904, “Anomaly detection in a data packet access network.” This anomaly detection method handles low-level network parameters about the operational status of ports of network elements and evaluates the collected samples in statistical way in a time window. An anomaly is identified when a single outlier value has been found that exceeds the thresholds, which are based on observations.

Previously, machine learning methods have been used for anomaly detection in network traffic. Supervised methods are not easily applicable for this task, because a target function is needed for them to recognize what is the anomaly. A lot of manual work is required to set up all important targets, and with a task which has a large feature space there is no chance to enumerate all possible combinations. Unsupervised methods are more powerful to find outliers using N-dimensional Euclidean distance, clustering, nearest neighbors, or statistical techniques. In one method, described in Sebastien et al., U.S. Patent Application Publication No. 2018/013776, “Specializing unsupervised anomaly detection systems using genetic programming,” scores of unsupervised machine learningbased anomaly detector are improved by specialization with genetic programming. Data session records - as described below - have large number of feature space with sparse filled content, where each item has unused fields that are empty. There are many variations of what fields are used in same data records, and there may also be many overlapping parts among the observable structures. This characteristic makes the data session record comparison hard, whether using Euclidean distance or clustering to reveal outliers.

Several problems with existing techniques for improving network analytics may be addressed by applying techniques based on natural language processing, using language models (LMs). LM applications for analyzing network traffic have been described previously. For example, in Cohavi et a/., U.S. Patent Application Publication No. 2020/0159998, “Method and apparatus for detecting anomalies in mission critical environments using statistical language modeling,” network messages contain a unigram or bigram of transactions. An anomaly detection rule is created when one or more details of a transaction be generalized as a template, for example when two IP addresses are always the same in similar transactions. In Hocza et al., International Patent Application Publication WO2017012654, “Predictive procedure handling,” the network messages are transformed to tokens and a LM is created from the various sequences of successful call flow steps. With the help of the built model, the unsuccessful call setups can be predicted before the alert of failure and throwing them may preserve network resources.

SUMMARY

Previously described LM applications based on network messages can be used when the observed network transactions are executing in a non-deterministic way. Every possible order of messages that finish the flow with successful result should be learned. The characteristics of data session records are different from this, however, because these data session records contain aggregated KPI values instead of execution order, and the question becomes how to extract the frequently occurring KPI value collocations (bigrams) or longer coexistences (N-grams) to build a LM that can be used reveal unusual KPI value combinations. Previously disclosed LM methods do not provide a solution for this kind of usage. Techniques to address this problem are described herein.

These techniques include methods for detecting irregular network behavior, service quality or traffic patterns, subscriber behavior or any deviation from typical operation of a 3GPP mobile data network. The irregular pattern detection is based on Natural Language Processing (NLP) techniques and by using Language Models (LM) applied to the mobile data network domain. The techniques detailed herein transform network events, network performance metrics, and traffic metrics into tokens, to enable the pattern detection by the NLP techniques and LMs.

An example method according to the disclosed techniques comprises generating, from a plurality of data session records, a sequence of tokens, each token in the sequence corresponding to at least one key performance indicator (KPI) for a corresponding data session, where each of the one or more data session records is a collection of performance- related metrics for a data session. The example method further comprises evaluating the sequence, using a KPI token language model generated from training data obtained from the communication network, to detect whether the sequence is improbable, indicating an irregularity in network performance.

Another example method comprises generating, from a plurality of data session records, a plurality of sequences of tokens, each token in each sequence corresponding to at least one key performance indicator (KPI) for a corresponding data session, again where each of the one or more data session records is a collection of performance-related metrics for a data session. This method further comprises training a KPI token language model, using the generated sequences, and using the trained KPI token language model to evaluate subsequently generated sequences of tokens generated from data session records, to detect irregularities in network performance.

Corresponding apparatuses and systems are also described herein.

The techniques and systems described herein may be used to detect problems in a communication network that are hidden, e.g., problems that a human cannot recognize, that are not possible to capture by monitoring a single or a few parameter values, using deterministic numerical thresholds. The techniques detect problems that cannot be described by currently applied rule parameters or crossing numerical thresholds.

The techniques and systems may be used to provide a set of new, unique type of signals of irregular network behavior, detecting irregularities well before possible real incidents or anomalies occur. The techniques and systems may serve as a filtering and/or early warning mechanism, reducing the number of possible real problems that occur.

Further details and advantages of the techniques and systems are described in detail below.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 illustrates an example system architecture.

Figure 2 illustrates components of an example LM detection system.

Figure 3 illustrates a histogram of not-empty fields in a sample of 5-minute data session records.

Figure 4A is a process flow diagram illustrating an example training-data selection process.

Figure 4B is a process flow diagram illustrating an example process for KPI token model training.

Figure 4C illustrates an example process for aggregated KPI token model training.

Figure 4D shows an example process flow for an early detection process.

Figure 5 shows example results for a random KPI token sequence generation method. Figure 6 is a chart showing maximum aggregated token repetition occurrences for an example data set.

Figure 7 shows an example of selecting an unlikely part in a token sequence, using an LM.

Figure 8 is a process flow diagram illustrating an example method for evaluating network performance, according to some embodiments.

Figure 9 is a process flow diagram illustrating another example method for evaluating network performance, according to some embodiments.

Figure 10 is a block diagram showing an example processing node for carrying out one or more of the presently disclosed techniques.

Figure 11 illustrates a virtualization environment, in which parts of or all of any of the techniques disclosed herein may be implemented.

DETAILED DESCRIPTION

Advantageous techniques for use in improving network performance are described herein. The network referred to herein can be a fifth generation (5G) wireless network, or any other communication network. In various embodiments, the network referred to herein may be a radio access network (RAN), or any other type of network. The techniques described may be implemented on or by one or more nodes, where those one or more nodes are part of or coupled to the network.

The term “machine learning model” is well understood by those familiar with machine learning technology. Nevertheless, for the purposes of this document, the term should be understood to mean a combination of a model data stored in machine memory and a machine-implemented predictive algorithm configured to infer one or more parameters, or “labels,” from one or more input data parameters, or “features.” Thus, a “machine learning model” is not an abstract concept, but an instantiation of a data structure comprising the model data coupled with an instantiation of the predictive algorithm.

Language Model (LM) theory is an intensively researched field for natural languages and there are thousands of publications about various variants, solutions, and their successful applications in Natural Language Processing (NLP). (See, for example, "Speech and Language Processing: An introduction to Natural Language Processing, Computational Linguistics, and Speech Processing", Daniel Jurafsky & James H. Martin, published 2000 by Prentice-Hall, ISBN 0-13-095069-6.) In LMs, the validity, or “goodness,” of a sentence is estimated by aggregating the sequence of N-gram probabilities, where N-grams are N-length subsequences of words or, more generically, “tokens,” in a given sentence or sequence. The N-grams probability values are collected from a prebuilt LM or are estimated from the probabilities of shorter existing N-grams in LM by backoff mechanism. During the model creation, the N-gram counting algorithm assigns probabilities for each existing N-grams in the training corpus, where the probability estimation is based on the occurrence counts.

In the LM process, not only the sentence probability can be estimated, but the word transition probabilities can also be retrieved. An unlikely word transition may explain why the whole sentence is categorized as incorrect for being part of the language that the applied LM was made on. This may be a kind of anomaly, and this makes the LM theory applicable for more general problems where the task is finding contextual deviations in a statistical way, in a mass of correct or incorrect sequences.

LMs can also be used for language generation by using the perplexity measurement. The perplexity can be understood as a measure of uncertainty. Lower perplexity means that the evaluated sequence is more likely part of the language than a sequence with a higher measured perplexity. In language generation, the task is continuing the started sequence of words by predicting the next word in each step. Using LM, the most likely continuation can be listed, and the next word can be estimated with random selection. The generated text will have the same characteristic as the used training corpus was in model building. The outcome of this process can be a simulated chatbot response, a fake scientific paper, or even a realistic Shakespeare-style sonnet.

Many similarities can be found between human conversation and machine-to-machine communication, where the message order (syntax) and message content (semantics) are driven by an API protocol. In this analogy, the words are the messages or tokens, the affixes are the details of message body, the sentences are the network operations, and the language is the extracted token sequences from network traffic.

In the context of network analytics, when a network operation ends with an unsuccessful result, this is only a symptom. It is important to reveal the earlier occurred root cause(s) of the given failure to be able to improve the quality, but this is hidden somewhere in the message details and is not easy to find. In complex communication systems there are many endpoints and possible configurations, and messages for the same transaction can arrive in different orders and timings, for different occurrences of the transaction, making the whole flow nondeterministic. Therefore, valid network message sequences can only be modeled in a statistical way. LM theory provides all necessary apparatus for modeling network traffic as a language - the only thing that should be solved is the appropriate tokenization of network messages.

In a related art, anomaly detection in mission-critical environments, natural language modeling (NLM) is used to generate anomaly detection rules based on N-gram analysis to increase the system’s detection rate and to reduce its false alarm rate. (See Cohavi et a/., U.S. Patent Application Publication No. 2020/0159998, “Method and apparatus for detecting anomalies in mission critical environments using statistical language modeling.”) In another application, NLM is used to detect early anomalies during real-time monitoring of LTE and VoLTE call flows that will later cause dialog ends with error message. Rejecting such dialog immediately will save resources and time in the IMS network. (See Hocza et al., International Patent Application Publication WO2017012654, “Predictive procedure handling.”)

These two applications above are examples for applying LMs for modelling network traffic when the message flow is driven by a communication application programming interface (API) protocol. In an expert analytics system, correlated metrics and events are collected from network nodes, probes, devices, and other sources. The exact communication flow cannot be deduced from this data, although the collected network data from a given mobile identifier (such as a wireless device’s International Mobile Subscriber Identity) contains a huge number of horizontal and vertical metric transitions that opens the possibility for applying NLM to detect unusual coexistences. Here, “horizontal relation” refers to the changing of a particular metric value on the timeline, and “vertical relation” refers to the mutual interaction of multiple metrics in a moment.

Service-level indication (SLI) computation systems provide overall information about the network performance. Realtime monitoring of an SLI, e.g., to detect a decreasing SLI over a period, may indicate issues. (See Zlatniczki, et a/., International Patent Application Publication No. W02021171063, “Adaptive method for measuring service level consistency.”) Many KPI values on a wide time window can be aggregated to create a consistency model, against which monitored SLIs can be evaluated, but this gives back high-level (network- 1 eve I and service-level) metrics about the service quality. This SLI value does not provide insight into the reasons for issues, on a low level. According to the techniques described herein, on the other hand, an LM based system utilizes sequences of KPI tokens and is able to show up unlikely KPI values or unlikely combinations of valid KPI values that may be used as a possible explanation of a network defect. Threshold-based anomaly detection systems can simultaneously monitor multiple low-level indicators in the network and indicate defects when one of these metrics is out from its valid range. An example system is described in Octavian et a!., International Patent Application Publication No. WO2017059904, “Anomaly detection in a data packet access network.” This anomaly detection method handles low-level network parameters about the operational status of ports of network elements and evaluates the collected samples in statistical way in a time window. An anomaly is identified when a single outlier value has been found that exceeds the thresholds, which are based on observations. The LM method described herein handles outliers more flexible way, such that a too low or high KPI value can be acceptable if it is occurring multiple times in the network normal operational level period where the training data was selected for model building.

As discussed above, machine learning methods are often used for anomaly detection in network traffic. But, supervised methods are not easily applicable for this task, because a target function is needed for them to recognize what is the anomaly. A lot of manual work is required to setup all important targets and a task which has a large feature space there is no chance to enumerate all possible combinations. Unsupervised methods are more powerful to find outliers using N-dimensional Euclidean distance, clustering, nearest neighbors, or statistical techniques. In one method, described in Sebastien et al., U.S. Patent Application Publication No. 2018/013776, “Specializing unsupervised anomaly detection systems using genetic programming,” scores of unsupervised machine learningbased anomaly detector are improved by specialization with genetic programming.

However, data session records - as described below - have large number of feature space with sparse filled content, where each item has unused fields that are empty. There are many variations of what fields are used in same data records, and there may also be many overlapping parts among the observable structures. This characteristic makes the data session record comparison hard, whether using Euclidean distance or clustering to reveal outliers. Data session records can be more efficiently analyzed by LMs because these are not depending on the number of feature dimensions. It just counts the overlapping similarities (N-grams) in the tokenized sequences, and in evaluation phase it gives back statistical metrics about the confidence of how the given token sequence is part of the trained network language.

Previously described LM applications can be used when the observed network transactions are executing in a non-deterministic way. In this case, every possible order of messages that finishes the flow with a successful result should be learned. The characteristics of data session records are different from this, however, because these data session records contain aggregated KPI values instead of execution order, and the question becomes how to extract the frequently occurring KPI value collocations (bigrams) or longer coexistences (N-grams) to build a LM that can be used reveal unusual KPI value combinations. Previously disclosed LM methods do not provide a solution for this kind of usage.

The systems and techniques described herein can be used to detect irregular network behavior, irregular QoE patterns, irregular traffic patterns, irregular subscriber behavior, shortly irregularities, signs of deviation from normal operation of a 3GPP mobile data network. The irregular pattern detection is performed by Natural Language Processing (NLP) techniques and by using Language Models (LM) that are specially applied to the mobile data network domain.

According to the several embodiments, the presently disclosed techniques transform the network events, network performance metrics, traffic metrics into tokens, to enable the pattern detection by the NLP techniques and LMs.

Irregular pattern detection can be done in two phases. First, normal network behavior is learned by a special initial method where it is ensured, and retested, that the learning period contains normal behavior. Second, in an operational phase, irregularities in the network behavior are continuously reported by the system.

The detected first-level irregularities are further analyzed and compared with certain network performance metrics to filter out false signals and to rank the findings. Finally, the irregularities are aggregated periodically. The detected irregularities serve as input to an early warning system. Like anomaly detection based on alarms or incidents, the early warning system aggregates the detected and analyzed (postprocessed) irregularities by the available dimensions and helps to avoid real problems.

In a preparation process for a long-term model, the discretized name-value tokens from five-minute samples, for example, are concatenated into one long aggregated KPI token in a predefined order, where one KPI record will be in one aggregated token. The token sequences are put together from aggregated KPI tokens on a daily basis, where one token sequence contains one subscriber’s data for a day in time series order, in 5-minute pieces. The generated model is the long-term aggregated KPI token LM. The aggregated KPI tokens for a single subscriber act like a user profile - this must contain a limited set of tokens and their transitions in long term observations. In evaluation, an irregularity will be an unlikely token transition that does not fit the modelled subscriber’s profile or other subscriber’s profile that are using the network services in the similar way.

Figure 1 illustrates an example system architecture showing the involved nodes/functions, in the context of a 3GPP wireless communications network, and the place of the new functionality, in the Analytics System 110 that forms part of the Operations Support System (OSS) 100.

The Analytics System 110 collects real-time events from nodes and network functions from different network domains: the radio access network (RAN) 115, the core network 120, and the IP Multimedia Subsystem (IMS) 125. The most relevant data sources are 4G and 5G base stations, shown in the figure as eNBs 130 and gNBs 135, the session management function (SMF) 140 and access and mobility management function (AMF) 145, which are core network control functions, as well as the core network user plane function (UPF) 150 and the Call Session Control Functions (CSCFs) 155 from the IMS core domain.

In the Analytics System 110, the Event Correlator function 160 correlates these events per session into a per-session correlated records, which are streamed to a KPI calculation module 165 and a LM Detection Module 170. The main new functions are implemented in the LM Detection Module 170, which is described in more detail below.

The Rule Engine 175 contains expert rules, explicit logics to identify network or service quality issues.

The Aggregator 180 aggregates KPIs, Incidents for different time periods and parameters. Based on single incident, or aggregated incidents, the Alarm/lncident Generator 185 sends alarms to a Fault Manager (FM) 190, or to other Alarm consumers.

The new LM detection system 170 extends the rule-based alarms with early warnings, using the techniques described herein. Figure 2 illustrates components of an example LM detection system 170. LM detection system 170 includes a training subsystem 210, as well as an evaluation subsystem 220. The training subsystem 210 includes a training data selection module 225, a KPI token model training module 230, and an aggregated KPI token model training module 235. The functions and operations of these modules will be described below. The evaluation subsystem 220 includes an early detection module 240 - as will be detailed below, evaluating KPI token systems in real time using a language model trained by training data selection module 225 facilitates early detection of anomalies, which in turn allows corrective actions to be taken sooner than would otherwise be possible. The input data to an LM detection system 170 like that shown in Figure 2 may be collected by an underlying existing network management system. The input (called “data session records”) contains records of each individual data session of each subscriber. One data session describes one service usage transaction with all the details. Here, by “service” is meant a particular service offered to subscriber stations (e.g., user equipments, or “UEs,”) by the mobile data network, such as video streaming, web browsing, etc.

The input record has a fixed (flat) structure and contains all the possible fields that can be relevant in any of the possible service usage transactions. There are fields that can be common for multiple service types, e.g., a service provider field is a valid field both for web browsing and video streaming. There may also be fields that are specific to a certain service type, e.g., a video resolution field is only relevant in case of video service. In the latter case, technically, the non-relevant fields of the input record may be kept as “null”.

An example input record, which might also be referred to as a service usage transaction record or a session record, may have the following content:

• User ID (IMSI)

• User Equipment ID (IMEI-TAC)

• Start timestamp of activity (epoch)

• Duration of activity (seconds)

• Geographical location(s) of the user during the activity (cell-ID, RAT)

• Enrichments of the preceding items (described in further detail below)

• Activity type / Service type (described in further detail below)

• Service quality KPI values (described in further detail below)

• Service specific extra attribute values (described in further detail below)

• Service types

Subscribers may use a variety of services provided by their devices and the mobile network operator. These may include, but are not limited to, any of the following set of services:

• Video streaming

• Video conferencing

• Video chat / video call

• Voice call

• Messaging

• E-mail

• Web browsing • File download

• File upload

• Location services

• Presence

• Software update

• Social networking

Each of the services listed above, as well as any other services, can be measured by several KPIs (Key Performance Indicators and/ or Metrics of Quality of Experience (QoE), where these are collectively or separately referred to herein as simply KPIs. Service-level KPIs indicate the subscriber perceived service quality. Some of these KPIs can be common to several service types, e.g., downlink throughput KPI, can be relevant for Web browsing, video streaming, file transfer, and several other service types. In many cases, though, the KPI is unique for the given service. For example, the KPI “video stall ratio” is relevant to and can be computed for video streaming service only.

KPIs calculated by the network analysis system might include, but are not limited to, any of the following metrics:

• Downlink / uplink throughput of data traffic

• Video quality

• Video stall time ratio

• Video initial buffering time

• Web page access time

• Web page download time

• Web page download success ratio

• Downloaded-uploaded bytes

• Voice quality

• Call setup time

• Call setup success ratio

• Call drop ratio

• Reference Signal Received Power (RSRP): signal strength, characterizing coverage

• Reference signal Received Quality (RSRQ): interference

• Signal-to-lnterference-plus-Noise Ratio (SINR): Signal to noise ratio

• Handover success or failure ratio

• Handover execution time • Session setup success or failure ratio

• Session setup time

• Registration success or failure ratio

• Session failure ratio (drop)

• Bitrate

• Packet loss ratio

• Packet delay or round-trip time

• Jitter

In some embodiments, any of the following fields are added as enrichment, to enhance the first several fields of the input record for the atomic service usage transaction.

• User-ID based enrichment: a. Subscription type (plan type, plan name) b. Subscriber category (VIP, enterprise, group, private, etc.)

• Device based enrichment: a. Device vendor b. Device model c. Device type (smartphone, tablet, stick, etc.) d. Device capability (HS-cat, LTE-cat, etc.) e. Screen size (inches)

• Time-specific enrichment of start timestamp: a. Day of week b. Weekend / weekday c. Hour of day d. Month e. Morning / Afternoon / Evening / Night

• Location-based enrichment: a. Cell area type (rural, downtown, shopping mall, suburb, event hall, office area, etc.) b. Cell location type (indoor/outdoor) c. RAT (2G, 3G, 4G, 5G) d. Cell vendor e. Carrier

The service types can each have further attributes in the session record as well. For example, each service type can have a set of attributes that describe important details related to the service usage. These attributes can be common for multiple service types, e.g., “content provider” as an attribute can be relevant both for web browsing and video streaming, however, in many cases the attributes are service-specific.

The set of attributes may include, but are not limited to, the following, in various embodiments:

• Content provider / service provider

• Encryption type

• Video resolution

• Video bitrate

• Radio conditions (signal strengths)

• Congestion conditions (cell load)

These attribute types may have nominal values (e.g., content provider is a string, having a few hundred possible values, e.g., YouTube, Netflix, Google, Facebook, etc.). Some dimension types such as radio conditions, for example (RSRP, RSRQ, CQI) can have numeric values, and can be transformed to nominal scale if necessary.

Below is an example input data record collected and assembled by the existing network analytics system:

[user-id=1234567890 (prepaid user, VIP user, unlimited plan), device-id=453786 (Apple, Iphone 7, smartphone, ...), start-ts=1618901475 (Tuesday, April, morning, 6h, weekday), duration=5.2, location=cell-12345 (rural, Ericsson, 4G, outdoor, ...), Web browsing (content provider = CNN, encryption=none, radio conditions=good, cell load=moderate, ...), Web page download time = 4sec, downlink throughput = 4Mbps, ...]

Only the non-null fields are shown in the above example, i.e. , those that have relevance for the given service usage transaction.

Input data session records contain sparse filled data rows. Figure 3 illustrates a histogram of not-empty fields in a sample of 5-minute data session records (-4.5M rows, -600 columns). As shown in this example, the average rate of not empty fields might be about 10% per row in a certain system. The variance of differently filled records are high, as there are many different structures mixed in a high-dimensional feature space. This characteristic makes the comparison of any two data session records difficult, because there can be filled or unfilled dimensions in both records, in addition to any overlapping parts. The weighting of similarities or differences are also not clear because the scales of numeric fields may be different. The multiplicity of non-empty values are also different at a field-by-field. All of these differences make direct comparison difficult.

With respect to data selection, in NLP tasks the model is made on a good quality corpus using its sentences as etalon. The created LM then contains positive cases of subsequences that are most likely valid parts of the given language. Live sentences that are evaluated with LM may contain unseen (maybe improper) sub-sequences that will be computed with low probability values.

Following the NLP’s training data selection pattern for application in a communications network environment, positive examples are needed from the captured input data, for building the NLM. Adding supervised extra knowledge about the subject to annotate data records would be impractical, so an automatic method is needed to measure the data quality. The input data described above contains several KPIs that alone cannot provide overall picture about the quality of the provided service - rather, these just give insight into the performance of sub-services like throughput, delay, setup time, call drops, voice quality video quality, etc. An aggregated metric is needed to get the overall quality. The Service Level Indicator (SLI) is a metric for service providers about the subscriber satisfaction, and it is in close relation with the service quality.

The SLI computing method described in the Zlatniczki reference mentioned above provides a consistency score for both the subscriber and service provider. It also computes scores for individual services, where the aggregates of these are the overall consistency score that indicates the overall service quality in the given period. At the same time the subscribers can be split into groups by their SLI value - for example they may feel the service quality poor, average, good or excellent.

Figure 4A illustrates an example training-data selection process, e.g., as might be carried out by the training data selection module 225 shown in Figure 2. As shown at block 402, the process is supplied with streamed input data, e.g., comprising data records like those described above. As shown at block 402, a Subscriber SLI module takes the input data and extends it with sli_s subscriber SLI value and slip service provider SLI value. As shown at block 404, certain input records are selected for the training set 406. According to an example implementation, an input record is selected into the training set when slis - L_SH_S and slip > L_su_p where the L_slis and L_slip limit values are predefined system parameters. These parameters should be set experientially. With too low limits the NLM will accept sequences of poor- quality data as normal behavior and thus not highlight deviations; with too high limits the NLM will be too sensitive, detecting any difference from excellent-quality sequences.

The Subscriber SLI module is only needed for training data selection. Generally, using the SLI for training data collection can be a good approach for lots of machine-learning (ML) tasks in a network environment.

Feature selection is also important, because unnecessary fields may overload the input data and model generation, which may slow down the evaluation process and decrease the quality of generated models. IDs, locations, IP addresses, timestamps should generally not be selected. Most enrichment fields should also be excluded, except for those that provide contextual information for one or more KPI values, for example cell frequency may affect some KPIs performance.

The vocabulary of an LM is the set of observed words/tokens. This is an open set, which means the training cannot be selected large enough to avoid unseen elements. Unseen tokens do not have a pre-determined probability, but the LM is able to compute an aggregated probability for the whole sequence even if it contains unseen parts. If the training corpus is large enough, it is not a problem if the vocabulary size is huge (for example 1 -10 million, in the case of conjugative languages) but it cannot be infinite because it reduces the chance that same N-grams are occurring more than one times. The solution is KPI discretization, which is used for the multiple data flows of Figures 4A, 4B, and 4C.

KPI values can be discrete (Boolean, Enum, String) or continuous (Integer, Float). Boolean can have two values (True or False). Enums and Strings act like words in LM, but the size of different values should be checked is it too numerous. For example, if it contains IDs, it should be excluded in the data selection, because its explicit value does not affect the sequence validity. Another possibility is the grouping if exact rule exits to replace similar values with the same label.

Continuous values can be more problematic, because these values may provide almost infinite vocabulary, with low chance of occurring same sequences and high chance to get unseen values during the live evaluation. Especially in the case of floating-point numbers, between any two seen values almost infinite number of unseen values may occur. Rounding may solve the problem in most of the cases when a few decimals shorter value means the same as the original value. However, in some cases when the main peak of the occurrence distribution function is between two close bounding values, a uniform rounding would take away too much from the characteristics of distribution, after the conversion each original value would be represented with the same value.

A proposed method is the occurrence distribution balanced bounding, where intervals are distinct and contain predictably equal numbers of values. Formally, let N the desired number of including intervals between the minimum and maximum value of given KPI, let f(x) the approximated occurrence distribution function, the bounding values b_{l t} b₂, ... b_N+1 should be chosen to fulfill the equations:

After grouping and discretization of KPIs an input record is transformed to a space separated KPI sequence:

{KPI record} -> t t₂ ... t_n , where one token has the following syntax: tj = {KPI labeli){KPI valuei)

The KPI labels is the unique name of the KPli or its abbreviation, and the KPI valuei is the unique value KPli in discrete case or the interval number in continuous case.

The occurrence together of two or more tokens multiple times in different KPI records is referred to as a coexistence pattern. Statistically, frequent KPI token combinations have a higher probability to occur again than rare combinations. The following discussion describes how to apply LMs to model probabilities of coexistence patterns.

KPI sequences can be generated in many combinations from an input KPI record. The idea is the application of an ordering heuristic by putting the elements of coexistence patterns one after another as N-grams. Such generated KPI sequences are used as training data for building LM. With this method, higher probabilities will be assigned in LM for the earlier identified coexistence pattern N-grams.

Figure 4B shows the generation of a KPI token model, e.g., as might be carried out by the KPI token model training model 230 shown in Figure 2, using an iterative algorithm. At the initial step, sequences of KPI tokens, after discretization shown at block 412, are enumerated in random order. If there are token pairs that are occurring together in multiple input records, there is a high chance that these will also be bigrams in the random KPI sequences. These sequences, generated as shown at block 414, will be the training set for the first LM. In the next iteration the KPI sequences are created by random perplexitybased language generation from the input record (see the Shakespeare example above) using the latest LM_n at the nth iteration:

where N is the count of KPI tokens in the input record and p_LMnO function is the token’s conditional probability with the already selected tokens.

Continuing the iteration, the average perplexity of actual KPI sequences will be lower and lower. The iteration converges, and will end when the improvement on average perplexity will be lower than an s predefined threshold:

P/.Mn-, ( /.Mn-,) - PLM_n(T_LMn) < E where P_LMk(T_LMi) is the average perplexity of all generated KPI sequences evaluating with LM_n.

This random KPI token sequence generation method has been verified on a small training dataset. The perplexity was lower and lower after each iteration on the same test pieces of KPI records (the low perplexity means high confidence that the evaluated sequence is part of the modelled language). The results of this verification are shown in Figure 5, which shows that the random shuffle method results lower and lower overall perplexity, for an example training process using 15K training and 20 x 10 test sequences with 120 discretized KPI attributes on 5-minutes sample data, with N = 10 discrete values per KPI.

Using the final LM_£, the generated KPI sequence is the maximum-likelihood ordering from all possible combinations. In the model evaluation flow shown in Figure 4D, the next token is selected in a deterministic way with the highest probability in the KPI Token Sequence Generation step shown at block 432:

The determinism in sequence generation is important because this guarantees that the same order is used for the same input record at training and evaluation.

Selecting one subscriber’s data records, it can be observed how the KPIs are changing in time. One period’s KPI data can be similar to KPI data taken from another period, or these can be significantly different, depending on the service usage. Different subscribers may show similar characteristics in KPI data changes if they left the same digital fingerprint in the network. If there are millions of subscribers in the network and their data is available from a long observation period, there is a high chance that a lots of the same or similar data records can be found, where each KPI values are on the same level, especially when the KPI values are discretized.

The simplest way to compare two KPI value sets is if the discretized KPI token elements are serialized in the same order into one big aggregated KPI token. In this case, the token set comparison is equivalent to the equality check of these two aggregated strings. The size of the aggregated KPI token may be more than one hundred characters, but this can be significantly shorter by using its MD5 hash, which is a 16-character-long hexadecimal number. Formally, let the nth data record that contains N_n nonempty discretized KPI tokens and these are t_nit„₂ - tjv n a predefined order, the aggregated KPI token will be the following hash of string concatenation: t_n = M£>5(t_nit„₂ ... t_Nn)

Always the same token aggregation method and ordering should be used for each subscriber data records, as this guarantees that the aggregated KPI tokens will be comparable to each other. Although even with KPI discretization the number of possible KPI token combinations is relatively high, it is expected that there will be aggregated KPI token repetitions in the long-term time series data of a same subscriber, because a given subscriber is generally using the same set of services on the mobile network and this activity is (normally) measured with the same level of the correlated metrics and events that are collected from the network by the network analytics system. The statistics shown in Figure 6 verifies this statement, because in more than 66% of the cases for this example set of data there were two or more aggregated KPI token repetition in the same subscriber’s data one-hour data. More particularly, Figure 6 illustrates the maximum number of aggregated KPI token repetitions in same subscriber’s data in different 5-minute samples, for a sample data set comprising 2000 different subscribers, 12 x 5-minute sample data, where ethe tokens were made by aggregating 120 discretized KPI attributes, with N = 10 discrete value level per KPI. If there are repetitions in the token sequences in a short period, collocations are also likely to occur, which means there will be an apropriate number of N-grams in the training data for building LM on a daily basis, with the entire subscribers set. According to the techniques described herein, the sentences of “language” for the monitored communication network should be the chronological series of aggregated KPI token sequences, e.g., from each 5 minutes of the same subscriber on daily basis. The final model should be built on a large dataset, with multiple days of all subscriber’s data from good quality service periods from the same network. The model building process of the aggregated KPI token LM is shown in Figure 4C - these operations might be carried out by an aggregated KPI token model training module 235, as shown in Figure 2.

The LM approach is able to handle contextual data validation in a statistical way. A low probability part in an evaluated sequence during the operational phase may indicate a quality degradation earlier than when the declension becomes permanent and detectable with single KPI threshold-based indicators.

The combined process flow shown in Figure 4D applies the built KPI token LMs for evaluating the live network data with two-level analysis. The Aggregated KPI Token Analysis step is responsible to reveal time series deviations on a daily basis. The outcome of this step is a list of rarely or never occurring aggregated KPI token. An infrequent aggregated KPI token contains unusual KPI token combinations that must be further analyzed with the single KPI token LM - this is the second level.

Figure 7 contains an example for LM evaluation of a KPI token sequence that was taken by turning on the debug level for a test sequence. The overall perplexity of this sequence is relatively high, and the reason of this bad measurement can be found at the highlighted part in Figure 7, which illustrates where the token transition probability is low. The LM’s vocabulary contained unigram (1-gram) patterns only to cover this part.

Irregularity detection as described above can be applied on language models generated based on different input sizes, i.e., with different timespans. One approach is to use an immediate irregularity detection that uses a LM that was created on 5-minute samples. Five minutes of data is long enough to collect enough measurements, but short enough to include effects due to daily traffic change.

A second approach is to detect irregularities based on a much larger input size, e.g., data session records belonging to a 1 -day time window. The practical evaluation of the irregularity detection can still be executed online, using a sliding window, so one does not need to wait in this latter case a whole day to find any irregularity. Note that 1-day data includes and, therefore, takes into account the daily periodic change of traffic pattern. The time window that is chosen for creating the language model in the system is a parameter.

The collected unusual KPI token combinations (the 1 st-level irregularities) may contain false positive cases that must be filtered out from final findings. These cases may be caused by network service improvements, however these changes can be handled by keeping models up to date by rebuilding them periodically on the latest network data. Other types of false positive cases can be identified by additional KPI analysis. All KPIs have a range of validity to categorize their change direction, for example in case of download/upload bytes/speed the higher value is better, but KPI values like call drop/failure rate have reverse goodness direction.

Remaining potential issues can be labelled and categorized by KPI or service names. For example, if a service degradation is caused by problems with a single cell, there can be many affected subscribers and the same type KPI token deviation will be collected many times in LM analysis. Using the categorization, the revealed issues can be counted and ordered by relevance.

With regards to potential uses for the techniques described herein, the detected irregularities are valuable as these are a new type of network misbehavior signal that can reveal hidden problems early. Further, these irregularity signals can be processed further by the traditional methods that are applied to normal network incidents: i.e. , aggregation, localization, alarm generation, etc. These latter methods are shortly discussed below.

In the Analytics System, the KPIs and the detected irregularities may be aggregated for different time periods and for different parameters. Although most of the enrichment fields are excluded in irregularity detection, these fields can be used for presenting the findings, e.g., comparing the occurrences of identified issues. For example, a given type of collected unusual KPI token combination can be aggregated by cells, areas, service providers, device types, etc. Therefore, during the KPI token sequence evaluation the connection with the original data record must be preserved, and the found irregularities must be stored with the enrichment context.

The aggregation technique can be used to either localize the issue and/or to identify the possible root cause of the issue or irregularity. Finally, if there is any meaningful output of the aggregation (statistically relevant overrepresentation of irregularities on certain elements/attributes), alarm generation can take place. If the aggregation of irregularity signals reveals that there is a statistically relevant concentration of irregularities on a (some) given object(s) of a given type (e.g., on 3 cells out of 1000 cells), this information can be used to localize the problem and also to provide hints on the possible root cause, namely, that there is something wrong with those 3 cells. Instead of cells, all types of fields present in the original input data session records are possible subject of aggregation, typically service provider, device type, device radio and capability details, cell antenna radio specific details, etc., can be used for possible root cause detection and/or localization.

When an irregular pattern is identified for a given session, a subscriber-related irregularity signal can be generated. Subscriber related irregularities, similarly to the KPIs, are aggregated for different services. Once the number of irregularity signals or the ratio of the irregularity signals (number of irregularity signals per number of total service usage) exceeds a threshold, a service quality warning alarm is generated. These alarms are sent to the Fault Management systems, where these alarms appear as warnings in the same way as network or node generated explicate alarms. These warnings are important since network and node alarms usually do not cover or indicate service quality issues, and these alarms appear usually before real service degradation occur, therefore, there is time to fix the network issue or modify network parameters to avoid real service degradation.

Alarms can be generated also when the irregularities are concentrated on a given cell, service provider, device type, etc. as seen in the localization section above. These are also similar to network or node alarm, but they can cover entities, e.g. end device, which do not generate explicit alarms.

Regarding detection time frames, short-term detection is based on identifying an unexpected single correlated record. This is effectively real-time detection, identifying a single wrong, unexpected sentence. This short-term detection can be used for generating, e.g., service quality related subscriber micro-incidents in advance, even if the KPI explicitly indicating the service quality is still good. Since this detection is relatively fast, this trigger can be used for any real-time or near real-time closed loop action, e.g., NWDAF or O-RAN which improves radio, or prioritize the gives session, to improve service quality or prevent any service quality degradation, e.g., dropping the session.

Long-term tendencies can be detected as well. These amount to language changes over time. As described above, the service quality incidents can be aggregated and generate an alarm related to multiple incidents. If there are more alarms related to a network or node parameter or device, the network can be fixed automatically or manually to avoid real service quality degradation. It is also possible to detect hidden issues due to long term traffic increase, introducing new services, terminals, etc.

Figure 8 is a process flow diagram illustrating an example method according to some of the techniques described herein. It should be understood that this method is intended to be a generalization of, and therefore encompass, many of the techniques described above. Thus, where terms used to describe this method differ from those used above, the terms used here should generally be interpreted to at least encompass similar or clearly related terms used in the description of examples above.

The method shown in Figure 8, which may be regarded as a method for evaluating performance of a communication network, is focused on the application of a KPI token language model, as described above, to the evaluation of a token sequence generated according to the techniques described herein. The method comprises, as shown at block 810, the step of generating, from a plurality of data session records, a sequence of tokens, where each token in the sequence corresponds to at least one key performance indicator (KPI) for a corresponding data session, and where each of the one or more data session records is a collection of performance-related metrics for a data session. As discussed in further detail below, each token may correspond to a single KPI, in some embodiments, while in others each token may comprise an aggregation of several KPIs. Note that the term “KPI,” as used here and elsewhere in this document, should be understood to refer to any parameter, numerical or otherwise, that is indicative of performance of the network or data session. Likewise, the KPIs or performance metrics may comprise service-quality KPIs or metrics, and/or radio performance metrics or KPIs, and/or radio environment metrics or KPIs, and/or core network performance metrics or KPIs.

The method further comprises, as shown at block 820, the step of evaluating the sequence, using a KPI token language model generated from training data obtained from the communication network, to detect whether the sequence is improbable. This detecting can be used to detect irregularities in network performance, as discussed in detail above.

In some embodiments, the method further comprises discretizing one or more KPIs in the data session records, where the sequence of tokens is based on the discretized KPIs. Examples of this discretization process were discussed above, and include rounding one or more numerical KPIs and/or categorizing one or more numerical KPIs into intervals according to occurrence distribution balanced bounding. The method illustrated generally in Figure 8 may involve the use of a KPI token language model for use in detecting short-term anomalies in KPI coexistence or for use in long-term, time-series-based, anomaly detection. The first approach might be referred to as a KIP token model approach, which involves enumerating the fields of a data session record into a statistically most likely order with language generation and evaluating the sequence with the language model to find anomalies, e.g., as indicated by the presence in the sequence of one or more token transitions with low probability. The second approach might be referred to as an aggregated KPI token model approach, which involves transforming each data session record into one aggregated token, and where the sequence of tokens comprises a time-ordered sequence of such aggregated KPIs, e.g., for a given user. Again, the sequence is evaluated with the corresponding language model to find anomalies, e.g., as indicated by a token transition with low probability. These approaches might be combined, in some implementations or embodiments.

Thus, in some embodiments or instances of the method, each token in the sequence corresponds to a KPI for a single data session record, and generating the sequence of tokens comprises enumerating the tokens into a statistically most likely order. Then, evaluating the sequence comprises using the KPI token language model to detect low- probability transitions in the sequence. For example, the probabilities may be estimated for each individual token in the test sequence from left to right. This is a conditional probability P(... t_n-3 tn-2 t_n-i | t_n) where t_n is the current token and the others are the preceding tokens. For example if the LM was created with n = 4 maximum length of n-grams, this model contains all the 4, 3, 2,1 -grams that could be collected from the training sequences, and these n-grams are stored with their occurrence based probabilities. If the current and preceding tokens can be covered with an existing 4-gram from the LM, its stored value will be the transition’s probability of the current token. Otherwise the probability is estimated from a lower level covering 3,2,1 -grams by a backoff mechanism. An overall probability is also assigned for the whole sequence by aggregating the individual transition probabilities. With this approach an anomaly corresponds to a low-token transition probability.

In other embodiments or instances, generating the sequence of tokens may comprise, for each of a plurality of data session records for a single user of the communication network, aggregating multiple KPIs corresponding to the single user into a respective token comprising the multiple KPIs, the sequence of tokens comprising a time-series of the respective tokens. Again, evaluating the sequence comprises using the KPI token language model to detect low-probability transitions in the sequence. In some of these embodiments or instances, the sequence of tokens comprises tokens corresponding to data session records generated at a predetermined regular interval. The predetermined regular interval might be in the range of 1 to 5 minutes, for example.

In some embodiments or instances, evaluating the sequence comprises evaluating probabilities of n-grams in the sequence, based on the KPI token language model, to detect an irregularity in network performance. In some of these embodiments or instances, the method may comprise repeating the generating and evaluating steps shown at blocks 810 and 820 for sequences generated at different times, and aggregating detected irregularities. The method may comprise generating an alarm responsive to the aggregated detected irregularities exceeding a threshold. In some embodiments or instances, data session information associated with the KPIs may be used to localize a cause of irregularities. This data session information may comprise any one or more of any of a cell identifier, an area identifier, a service provider identifier, and a device type, for example.

Figure 9 is a process flow diagram illustrating another example method, according to some implementations or embodiments of the presently disclosed techniques. Again, this should be understood as a generalization of several of the techniques described above, for evaluating communication network performance. Thus, inconsistencies in terminology should be resolved in favor of interpreting the terms used to describe this method so as to at least encompass similar or related terms used to describe the detailed examples above.

As shown at block 910, the method comprises generating, from a plurality of data session records, a plurality of sequences of tokens, where tokens in each sequence each correspond to one or more KPIs for corresponding data sessions. The method further comprises, as shown at block 920, training a KPI token language model, using the generated sequences. The method further comprises using the trained KPI token language model to evaluate subsequently generated sequences of tokens generated from data session records, to detect irregularities in network performance.

In some embodiments, the method still further comprises updating the KPI token language model using training sequences periodically obtained from data session records.

Any of the variations of the techniques described above are applicable to either or both of the methods illustrated in Figures 8 and 9.

Figure 10 illustrates an example processing node 1000 in which all or parts of any of the techniques described above might be implemented. Processing node 1000 may comprise various combinations of hardware and/or software, including a standalone server, a blade server, a cloud-implemented server, a distributed server, a virtual machine, container, or processing resources in a server farm. Processing node 1000 may communicate with one or more radio access network (RAN) and/or core network nodes, in the context of a communications network, e.g., for collection of network performance data and/or for the monitoring and adjusting of network configuration parameters.

Processing node 1000 includes processing circuitry 1002 that is operatively coupled via a bus 1004 to an input/output interface 1006, a network interface 1008, a power source 1010, and a memory 1012. Other components may be included in other embodiments.

Memory 1012 may include one or more computer programs including one or more application programs 1014 and data 1016. Embodiments of the processing node 1000 may utilize only a subset or all of the components shown. The application programs 1014 may be implemented in a container-based architecture.

It will be appreciated that multiple processing nodes may be utilized to carry out any of the techniques described herein, e.g., by allocating different functions to different nodes. Figure 9 illustrates several example cloud implementations. In a first example, for instance, various functions of the CFR algorithms described herein are implemented as Function-as-a-Service (FaaS) functions deployed in a serverless FaaS system. This option of deployment can be for both cloud and near edge platforms, where functions are built with CFR as additional functionalities are available with them. In a second example, the CFR implementation is available as a side-car container with application. This option of deployment can be for both cloud and near edge platform applications. Applications that prefer to do the life cycle management of CFR like it does for itself prefer this architecture. In a third option, CFR is available as pod with its own scaling and security. This option is the only option for edge devices to get CFR functionalities, as they are resource constrained. Also, this option is available for near edge and cloud as alternative architecture where applications and functions prefer to use a common pod rather than having a side car container.

Figure 11 is a block diagram illustrating a virtualization environment 1100 in which functions implemented by some embodiments may be virtualized. In the present context, virtualizing means creating virtual versions of apparatuses or devices which may include virtualizing hardware platforms, storage devices and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 1100 hosted by one or more of hardware nodes, such as a hardware computing device that operates as a network node, UE, core network node, or host. Further, in embodiments in which the virtual node does not require radio connectivity (e.g., a core network node or host), then the node may be entirely virtualized.

Applications 1102 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.

Hardware 1104 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 1106 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 1108a and 1108b (one or more of which may be generally referred to as VMs 1108), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein. The virtualization layer 1106 may present a virtual operating platform that appears like networking hardware to the VMs 1108.

The VMs 1108 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 1106. Different embodiments of the instance of a virtual appliance 1102 may be implemented on one or more of VMs 1108, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers, and customer premise equipment.

In the context of NFV, a VM 1108 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs 1108, and that part of hardware 1104 that executes that VM, be it hardware dedicated to that VM and/or hardware shared by that VM with others of the VMs, forms separate virtual network elements. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs 1108 on top of the hardware 1104 and corresponds to the application 1102. Hardware 1104 may be implemented in a standalone network node with generic or specific components. Hardware 1104 may implement some functions via virtualization. Alternatively, hardware 1104 may be part of a larger cluster of hardware (e.g., such as in a data center or CPE) where many hardware nodes work together and are managed via management and orchestration 1110, which, among others, oversees lifecycle management of applications 1102.

The foregoing merely illustrates the principles of the disclosure. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems, arrangements, and procedures that, although not explicitly shown or described herein, embody the principles of the disclosure and can be thus within the spirit and scope of the disclosure. Various embodiments can be used together with one another, as well as interchangeably therewith, as should be understood by those having ordinary skill in the art.

The term unit or module, as used herein, can have conventional meaning in the field of electronics, electrical devices and/or electronic devices and can include, for example, electrical and/or electronic circuitry, devices, modules, processors, memories, logic solid state and/or discrete devices, computer programs or instructions for carrying out respective tasks, procedures, computations, outputs, and/or displaying functions, and so on, as such as those that are described herein.

Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessor or microcontrollers, as well as other digital hardware, which may include Digital Signal Processor (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as Read Only Memory (ROM), Random Access Memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according one or more embodiments of the present disclosure. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

In addition, certain terms used in the present disclosure, including the specification and drawings, can be used synonymously in certain instances (e.g., “data” and “information”). It should be understood, that although these terms (and/or other terms that can be synonymous to one another) can be used synonymously herein, there can be instances when such words can be intended to not be used synonymously. All publications referenced are incorporated herein by reference in their entireties.

Claims

CLAIMS What is claimed is:

1 . A method for evaluating performance of a communication network, the method comprising: generating (810), from one or more data session records, a sequence of tokens, each token in the sequence corresponding to at least one key performance indicator, KPI, for a corresponding data session, wherein each of the one or more data session records is a collection of performance-related metrics for a data session; and evaluating (820) the sequence, using a KPI token language model generated from training data obtained from the communication network, to detect whether the sequence is improbable, indicating an irregularity in network performance.

2. The method of claim 1 , wherein the method further comprises discretizing one or more KPIs in the data session records, wherein the sequence of tokens is based on the discretized KPIs.

3. The method of claim 2, wherein said discretizing comprises rounding one or more numerical KPIs and/or categorizing one or more numerical KPIs into intervals according to occurrence distribution balanced bounding.

4. The method of any of claims 1-3, wherein the performance-related metrics comprise any one or more of the following: one or more service-quality metrics or KPIs; one or more radio performance metrics or KPIs, or one or more radio environment metrics or KPIs; and one or more core network performance metrics or KPIs.

5. The method of any of claims 1-4, wherein each token in the sequence corresponds to a KPI for a single data session record, wherein generating (810) the sequence of tokens comprises enumerating the tokens into a statistically most likely order, and wherein evaluating (820) the sequence comprises using the KPI token language model to detect low- probability transitions in the sequence.

6. The method of any of claims 1-4, wherein generating (810) the sequence of tokens comprises, for each of a plurality of data session records for a single user of the communication network, aggregating multiple KPIs corresponding to the single user into a respective token comprising the multiple KPIs , the sequence of tokens comprising a timeseries of the respective tokens, and wherein evaluating (820) the sequence comprises using the KPI token language model to detect low-probability transitions in the sequence.

7. The method of claim 6, wherein the sequence comprises tokens corresponding to data session records generated at a predetermined regular interval.

8. The method of claim 7, wherein the predetermined regular interval is in the range of 1 to 5 minutes.

9. The method of any of claims 1-8, wherein evaluating (820) the sequence comprises evaluating probabilities of n-grams in the sequence, based on the KPI token language model, to detect an irregularity in network performance.

10. The method of claim 9, wherein the method comprises: repeating said generating (810) and evaluating (820) for sequences generated at different times; aggregating detected irregularities; and generating an alarm responsive to the aggregated detected irregularities exceeding a threshold.

11 . The method of any of claims 1 -10, wherein the method further comprises using data session information associated with the KPIs to localize a cause of irregularities, wherein the data session information comprises any one or more of any of: a cell identifier; an area identifier; a service provider identifier; a device type.

12. A method for evaluating performance of a communication network, the method comprising: generating (910), from a plurality of data session records, a plurality of sequences of tokens, each token in each sequence corresponding to at least one key performance indicator, KPI, for a corresponding data session, wherein each of the one or more data session records is a collection of performance-related metrics for a data session; training (920) a KPI token language model, using the generated sequences; and using (930) the trained KPI token language model to evaluate subsequently generated sequences of tokens generated from data session records, to detect irregularities in network performance.

13. The method of claim 12, wherein each token in the sequences corresponds to a KPI for a single data session record, wherein generating (910) the plurality of sequences comprises enumerating the tokens of each sequence into a statistically most likely order.

14. The method of claim 12, wherein generating (910) the plurality of sequences comprises generating each of the sequences by, for each of a plurality of data session records for a user of the communication network, aggregating multiple KPIs corresponding to the user into a respective token comprising the multiple KPIs such that the sequence of tokens for the user comprises a time-series of the respective tokens.

15. The method of claim 14, wherein each sequence comprises tokens corresponding to data session records generated at a predetermined regular interval.

16. The method of claim 15, wherein the predetermined regular interval is in the range of 1 to 5 minutes.

17. The method of any one of claims 12-16, further comprising updating (940) the KPI token language model using training sequences periodically obtained from data session records.

18. One or more processing nodes (1000) for use in or in association with a communication network, each of the one or more processing nodes (1000) comprising processing circuitry (1002) and a memory (1012) operatively coupled to the processing circuitry and comprising program instructions for execution by the processing circuitry (1002), whereby the processing nodes (1000) are configured to: generate, from a plurality of data session records, a sequence of tokens, each token in the sequence corresponding to at least one key performance indicator, KPI, for a corresponding data session, wherein each of the one or more data session records is a collection of performance-related metrics for a data session; and evaluate the sequence, using a KPI token language model generated from training data obtained from the communication network, to detect whether the sequence is improbable, indicating an irregularity in network performance.

19. The one or more processing nodes (1000) of claim 18, wherein the processing nodes (1000) are configured to carry out a method according to any one of claims 2-17.

20. One or more processing nodes (1000) for use in or in association with a communication network, the processing nodes (1000) being adapted to carry out a method according to any one of claims 1-17.

21 . A computer program product comprising program instructions for execution by one or more processing circuits in one or more processing nodes operating in or in association with a communication network, wherein the program instructions are configured to cause the processing nodes to carry out a method according to any one of claims 1 -17.

22. A computer-readable medium comprising, stored thereupon, a computer program product according to claim 21.