WO2023160778A1 - Initialization of k-means clustering technique for anomaly detection in communication network monitoring data - Google Patents

Initialization of k-means clustering technique for anomaly detection in communication network monitoring data

Info

Publication number
WO2023160778A1
WO2023160778A1 (PCT/EP2022/054523)
Authority
WO
WIPO (PCT)
Prior art keywords
values
list
median value
data points
sorted
Prior art date
Application number
PCT/EP2022/054523
Other languages
French (fr)
Inventor
Oscar Novo Diaz
Edgar Ramos
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2022/054523 priority Critical patent/WO2023160778A1/en
Publication of WO2023160778A1 publication Critical patent/WO2023160778A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/04Processing captured monitoring data, e.g. for logfile generation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0604Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0609Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on severity or priority
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003Managing SLA; Interaction between SLA and QoS
    • H04L41/5009Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]

Definitions

  • the present disclosure relates to systems and methods for performing K-means clustering.
  • in particular, the present disclosure relates to systems and methods for initializing K-means clustering for the detection of anomalies in communication network monitoring data.
  • K-means clustering is an unsupervised machine learning technique that analyzes unlabeled source data and separates or organizes the data into distinct groups (i.e., clusters) so that the data can be better understood.
  • the source data may include extremely large datasets of multi-dimensional data. That is, each data point in the dataset may be characterized by multiple features.
  • K-means clustering groups data points that exhibit certain similarities into K groups, where K is a number specified in advance by a user.
  • the goal of K-means clustering is to use artificial intelligence to discover hidden structures or groupings within the data.
  • unsupervised grouping techniques, such as K-means clustering, can be particularly useful for analyzing large datasets.
  • choosing initial conditions for K-means clustering can be challenging, as a poor choice of initial conditions for K-means clustering can lead to suboptimal results.
  • Some embodiments provide a method of initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique.
  • the method includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub-lists of sorted values, storing a median value of each sub-list in a median value list, and selecting K centroid points by combining median values from the median value list.
  • Dividing each sorted list may include dividing each sorted list into ⌈K^(1/N)⌉ sub-lists, where ⌈x⌉ denotes a ceiling function that maps x to the least integer greater than or equal to x.
  • selecting K centroid points by combining values from the median value list includes generating permutations of values from the median value list and selecting K of the permutations.
  • the median value list includes a list of tuples, and each of the tuples includes ⌈K^(1/N)⌉ median values of sub-lists generated from values of a corresponding one of the features.
  • the method may further include performing K-means clustering using the selected centroids as initial centroid values.
  • the method may further include detecting anomalous data points in the dataset using clusters obtained by performing the K-means clustering.
  • the dataset includes network monitoring data obtained by monitoring communications in a communications network.
  • the network monitoring data includes service protocol metrics in the communications network.
  • the method may further include determining whether or not to generate an alarm based on the detection of the anomalous data points.
  • in some embodiments, dividing each sorted list of values into ⌈K^(1/N)⌉ sub-lists of sorted values defines a plurality of quadrants of the features in an N-dimensional feature space.
  • generating K centroid points by combining values from the median value list includes generating a plurality of permutations of values in the median value list, and selecting K of the permutations as centroids.
  • Some embodiments provide a method of detecting anomalous data points in a dataset of communications monitoring data obtained from a communications network.
  • the method includes initializing a number of centroids, K, of the dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique.
  • Initializing the centroids includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub-lists of sorted values, storing a median value of each sub-list in a median value list, and selecting K centroid points by combining median values from the median value list.
  • the method further includes performing K-means clustering using the selected centroid points as initial centroid points to obtain K clusters of data points, and identifying anomalous data points based on the clusters.
  • Some embodiments provide a computer program including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform operations described herein. Some embodiments provide a computer program product including a non-transitory storage medium including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform operations described herein.
  • An anomaly detection system includes a processor, a communication interface coupled to the processor, and a memory coupled to the processor.
  • the memory includes computer readable instructions that when executed by the processor cause the system to perform operations including initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique.
  • Initializing the centroids includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub-lists of sorted values, storing a median value of each sub-list in a median value list, and selecting K centroid points by combining median values from the median value list.
  • the method further includes performing K-means clustering using the selected centroid points as initial centroid points to obtain K clusters of data points, and identifying anomalous data points based on the clusters.
  • Figure 1 illustrates an example of a network management system including a service monitoring dashboard.
  • Figure 2 illustrates an example workflow of operations of a service monitoring dashboard in a network management system.
  • Figures 3-6 illustrate conventional methods of initializing centroids of a dataset for processing using a K-means clustering technique.
  • Figures 7-10 illustrate systems/methods of initializing centroids of a dataset for processing using a K-means clustering technique according to some embodiments.
  • Figures 11-16 illustrate example results of different initialization methods of datasets for processing using a K-means clustering technique according to some embodiments.
  • Figure 17A is a block diagram that illustrates elements of a K-means clustering system according to some embodiments.
  • Figure 17B illustrates various functional modules that may be stored in the memory of a K-means clustering system according to some embodiments.
  • Figure 18 is a flowchart that illustrates operations of a K-means clustering system according to some embodiments.
  • Network management tools may collect key performance indicator (KPI) metrics from Service Level Monitoring (SLM) agents and use such information to visualize and report the performance of systems and services.
  • KPI: key performance indicator
  • SLM: Service Level Monitoring
  • Agents are software modules that can parse and collect KPI metrics from different nodes within the system and translate them into a format that can be further stored or processed by a network management system.
  • Some network management systems have the capability of combining different metrics obtained from service level metric agents, and using the metrics to determine failure rates and notification criteria.
  • Network management systems may further integrate with connectivity management and device management systems that manage devices, such as IoT devices, within the network.
  • a metrics collector 16 collects metrics from monitoring applications 10 that monitor various service protocols in the communication system, such as message queuing telemetry transport protocol (MQTT), constrained application protocol (CoAP), hypertext transport protocol (HTTP), lightweight machine to machine protocol (LwM2M), etc., which are used by an IoT Automation system 18 for IoT automation.
  • Metrics generated by the monitoring applications 10 and the IoT automation system 18 are collected by the metrics collector 16 and provided to a service monitoring dashboard 14.
  • the service monitoring dashboard 14 tracks errors generated by the supported protocols.
  • the data generated by the service monitoring dashboard 14 may be provided to a network automation/management system 12, for example, through a representational state transfer (REST) interface.
  • REST: representational state transfer
  • the service monitoring dashboard 14 may have a predefined set of operations that probe the system and record results of the tests.
  • the service monitoring dashboard 14 collects these records and computes failure rates (among other metrics) based on the results. After calculating various service failure rates and failure severity, the service monitoring dashboard 14 may notify the customer of the result and provide an estimate of the number of impacted devices.
  • FIG. 2 shows an example workflow of the generation of notifications by a service monitoring dashboard 14.
  • metrics are collected at block 24 by the service monitoring dashboard 14 at regular intervals based on input from a timer 22.
  • a failure rate is determined for each service under test (block 32).
  • a percentage of device failures is calculated for each service (block 26), and the number of impacted devices is recorded (block 28). From this information, a severity of the failure is assigned at block 30. The method then determines, on a per-service basis, whether the failure rate and severity differ from a previous determination (block 34), and if not, the procedure ends.
  • the method determines if an alarm has already been raised for the service (block 36), and if so, clears the previous alarm (block 38). If no alarm has been raised, the system raises an alarm indicating the current time, the service in question, the failure rate, severity, and number of impacted devices (block 40).
  • the severity may be determined based on factors such as the total failure rate (per service group), the estimated number of impacted devices, and the time when an incident that triggered the notification was detected.
  • the service monitoring dashboard 14 may also provide an interface for monitoring the services to allow a human operator to visualize the performance of the services as well as the critical anomalies.
  • K-means clustering is an attractive technique for organizing large groups of data points that may have hidden underlying structures into meaningful groupings. Because learning systems such as K-means clustering are unsupervised, they are not trained using labeled training data. Thus, one peculiarity of K-means clustering techniques is their initialization. Before a K-means clustering technique can start to analyze data to find patterns in it, the technique needs to be initialized with initial label assignments.
  • in random partition initialization, labels are randomly assigned to each data point such that each data point is randomly assigned to a group.
  • in random points initialization, centroids of the K groups are initialized by selecting K data points uniformly at random, where a "centroid" is the mean of all the points in a given cluster.
  • conceptually, K-means clustering treats data as composed of a number of roughly circular distributions where the centroid is the center point of that circular distribution. The remaining data points are then assigned to groups based on their proximity to the randomly selected centroids.
  • in K-means++ initialization, the centroids are initialized by selecting K data points sequentially following a probability distribution determined by previously selected centroids, and initial labels are assigned to the data points based on the centroid locations.
  • Figure 3 illustrates an example distribution of data points having two dimensions (shown on the X and Y axis, respectively).
  • Figure 4 illustrates random partition initialization
  • Figure 5 illustrates random points initialization
  • Figure 6 illustrates K-means++ initialization.
  • an example dataset includes a plurality of data points having two features. The features are either numeric or are converted to numeric representations, and the data are plotted in two dimensions with each dimension representing a feature. In the example of Figure 3, the data falls within four distinct groupings 32. In practice, when the data are multi-dimensional, grouping relationships may be more difficult to visualize and less distinct. Before running the K-means grouping technique, the groupings of data points are not known.
  • the centroids 42A-42D of the initially assigned clusters are calculated where, as noted above, the centroid of a cluster represents the mean of all data points in the cluster. Because the data points are initially assigned randomly to clusters, the centroids 42A-42D are relatively close together near the center of the data points.
  • through the K-means clustering technique, the data points are then re-assigned to clusters based on the centroids. As seen in Figure 4(c), this assignment can result in overlapping of some of the clusters.
  • FIG. 5 illustrates random points initialization.
  • the centroids are initialized first rather than the data labels. This is done by choosing K data points at random to be the initial centroids. Accordingly, referring to Figure 5(a), four centroids 52A-52D are randomly chosen from the data points. Then, referring to Figure 5(b), each point in the data is assigned to a cluster according to its proximity to the centroids 52A-52D (i.e., each point is assigned to the cluster corresponding to the closest centroid). Referring to Figure 5(c), centroids 52A-52D of the clusters are then re-calculated based on the cluster assignments.
  • FIG. 6 illustrates K-means++ assignment.
  • the K-means++ technique initializes the centroids by selecting K data points sequentially, following a probability distribution determined by the centroids that have already been selected. The probability that a point will be chosen as an initial centroid is proportional to the square of its distance to the existing initial centroids.
  • Figure 6 shows how the centroids are chosen according to their distance from each other.
  • Figure 6(a) shows a distribution of data points.
  • a first centroid 62A is randomly chosen.
  • a density is determined for the second centroid, and in Figure 6(d) the second centroid 62B is chosen based on a probability determined by the square of the distance to the first centroid 62A. This process is repeated until all four centroids 62A-62D have been selected.
  • PCA: Principal Component Analysis
  • another problem of K-means clustering techniques is related to initialization. As described above, the most common initialization methods for K-means clustering techniques have a degree of randomness. K-means clustering techniques are very sensitive to the initial conditions. Because of the randomness associated with conventional initialization methods, it is often necessary to run a K-means clustering technique multiple times to achieve a desired level of accuracy in its final results.
  • Some embodiments described herein provide systems and/or methods for initializing data for processing in a K-means clustering technique that reduces or removes randomness from the initialization procedure.
  • a system/method according to some embodiments divides an input dataset into a plurality of quadrants equal to the number K of centroids and chooses a centroid point for each quadrant.
  • a system and/or method as described herein may provide certain advantages.
  • the use of a system/method according to some embodiments to initialize the centroids of a K-means clustering technique may significantly reduce the time to analyze high dimensionality data, such as network monitoring data collected by a network management system.
  • the use of a system/method according to some embodiments to initialize the centroids of a K-means clustering technique may reduce the time of the analysis of high dimensionality data. This may allow a network management system to provide desired metrics much faster.
  • the use of a system/method according to some embodiments to initialize the centroids of a K-means clustering technique may also result in improved accuracy.
  • a system/method for initializing the centroids of a dataset for performing a K-means clustering technique may increase the effectiveness of the K-means clustering technique, thereby providing better results on the final clustering configuration. Accordingly, it may not be necessary to perform multiple iterations of the K-means clustering technique as is typically required with conventional approaches.
  • a system/method for initializing the centroids of a dataset for performing a K-means clustering technique may be feature-agnostic and computationally inexpensive.
  • a method for initializing the centroids of a dataset for performing K-means clustering is as follows. For each of N features in the dataset, all of the values for the feature are placed into a list, and the list is sorted. The method divides each sorted list by the N-th root of the number of centroids to be initialized, where N is the number of dimensions (or features) of the data. That is, for a system with K centroids and data with N dimensions (or features), the sorted list of values for a feature is divided into ⌈K^(1/N)⌉ sub-lists.
  • for example, if the number of centroids is 4 and the number of features is 2, each sorted list is divided into 2 sub-lists, since ⌈4^(1/2)⌉ = 2.
  • a median value is selected from each sub-list, and the selected median value is stored in a median value list L.
  • the method selects K tuples containing permutations of the numbers stored in the median value list L. Each such tuple represents the coordinates (dimensions) of an initial centroid in the data.
  • FIG. 7 shows the distribution of the data in a 2-dimensional graph.
  • a first feature is represented on the x-axis of Figure 7, and a second feature is represented on the y-axis of Figure 7.
  • the values of the first feature are inserted in a list.
  • the median number from each sub-list is chosen. This procedure is illustrated in Figure 8. In the example data of Figure 7, the median numbers for the first feature (on the x-axis) are 3 and 10. These numbers are placed into the list L as a tuple.
  • the median value list is constructed as a list of tuples, wherein each of the tuples comprises ⌈K^(1/N)⌉ median values of sub-lists generated from values of a corresponding one of the features.
  • the median value list L contains the tuples [(3, 10), (4, 11)].
  • the last step is to find K points containing permutations of the numbers in the list [(3, 10), (4, 11)], where each point has a first coordinate selected from the first tuple and a second coordinate selected from the second tuple.
  • the K points are [(3,4), (3,11), (10,4), (10,11)].
  • Each of the K points is selected as an initial centroid in the dataset.
  • the K-means clustering technique will use those centroids as a starting point for performing the technique; a code sketch of this initialization procedure is given below.
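  • the following is such a sketch in Python, assuming the dataset is a NumPy array of shape (number of data points, number of features); the function name init_centroids and the NumPy-based implementation are illustrative assumptions, not taken from the disclosure.

```python
import itertools
import math

import numpy as np


def init_centroids(data: np.ndarray, k: int) -> np.ndarray:
    """Median-based initialization sketched from the description above."""
    n_features = data.shape[1]
    # Each sorted feature list is divided into ceil(K^(1/N)) sub-lists.
    n_sublists = math.ceil(k ** (1.0 / n_features))
    medians_per_feature = []
    for f in range(n_features):
        sorted_values = np.sort(data[:, f])                   # sorted list of feature values
        sub_lists = np.array_split(sorted_values, n_sublists)
        # The median of each sub-list goes into the median value list as a tuple.
        medians_per_feature.append(tuple(np.median(s) for s in sub_lists))
    # Combine the per-feature medians into candidate points and keep K of them.
    candidates = list(itertools.product(*medians_per_feature))
    return np.array(candidates[:k])
```

  • with the example above (per-feature medians (3, 10) and (4, 11) and K = 4), the permutation step yields exactly the four points (3, 4), (3, 11), (10, 4) and (10, 11).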
  • Figure 10 illustrates how the method divides the dataset in quadrants in two dimensions.
  • the area of each quadrant is proportional to the data included in it: the largest quadrant (top-right) contains more data points than the smallest quadrant (bottom-left).
  • the centroids are chosen according to the quadrants. For each quadrant, the centroids are roughly in the center of the distribution of the data of the quadrant.
  • a method as described herein has been evaluated against the K-means++ and random initialization methods.
  • the evaluation use case is related to anomaly detection, where the K-means clustering technique detects anomalous data in a dataset and its performance (latency) is evaluated. That evaluation is illustrated in Figures 13, 15 and 16.
  • the K-means clustering technique identifies the data points that deviate from a dataset's normal behavior. Those data points are called outliers and in the context of network performance may indicate a problem of some kind in the network.
  • a first test evaluated the performance of the K-means++ initialization methods, the random initialization methods, and the initialization method described herein to find the anomalies in the data.
  • the data was varied from 50 data points (low degree of data density) up to 100 million data points (high degree of data density).
  • the K-means clustering technique also divided the data into 4 clusters.
  • the data generated by the K-means clustering technique using different initialization methods is depicted in Figure 11.
  • Figure 11 depicts four graphs.
  • the graph in the upper-left corner ( Figure 11(a)) shows the original raw data without any modification.
  • the graph from the upper-right corner ( Figure 11(b)) shows the result data after performing the K-means clustering technique with the K-means++ initialization method.
  • the graph in the lower-left corner ( Figure 11(c)) shows the results of the K-means clustering technique using a random initialization method.
  • the graph in the lower-right corner ( Figure 11(d)) shows the K-means clustering technique using a method as described herein.
  • the clusters are shown in different colors. All the tests were run on the same data.
  • the data set in Figure 11 has a low degree of data density, around 300 data points.
  • Figure 12 shows the same techniques running on a larger number of data points, around 1000. In this example the clusters appear clearer due to the larger amount of data.
  • Figure 13 shows the performance evaluation of all the tests.
  • the graphs show the performance of the K-means++ initialization method (in blue), the random initialization method (in red) and the enhanced initialization method (yellow, green, and brown).
  • the results of the method described herein have been split into three to show the performance of the method more accurately.
  • the yellow line shows the time it takes to create the matrix used for initialization.
  • the line in green shows the time the K-means clustering technique takes to give a result using a method as described herein.
  • the brown line shows the total time including the time to create the enhanced initialization matrix and the time it takes for K-means to give the results.
  • Figure 13 shows that a method as described herein outperforms the K-means++ and random initialization methods. Even taking into consideration the time it takes to create the enhanced initialization matrix, the method described herein provides less latency than the other two methods. The data was tested from 50 data points to 100 million data points.
  • K-means clustering performs the division of objects into clusters that share similarities and are dissimilar to the objects belonging to other clusters.
  • Figure 14 shows the visual results of the K-means clustering technique using different initialization methods, where the data is partitioned into 16 different clusters.
  • the clusters are colored differently to differentiate them.
  • the outliers are circled in red to differentiate them from the data belonging to the clusters.
  • Figure 15 shows the performance of the K-means clustering technique with three different initialization methods.
  • the method described herein outperformed K-means++ and the random initialization methods.
  • the tests were done with 1000 data points in each execution, and the number of clusters varied between 4 and 25.
  • the data shows that the higher the number of clusters, the better the performance of the method described herein relative to the K-means++ and random initialization methods.
  • Dimensionality refers to the attributes or features of each dataset.
  • Figures 11, 12 and 14 showed the results of the K-means clustering technique using 2 dimensions.
  • each cluster has two features that differentiate it from the other clusters.
  • the example with two features is illustrated because two dimensions are easy to represent graphically.
  • One feature is mapped to the X-axis and the other feature to the Y-axis.
  • clusters can be represented using multiple features.
  • the K-means++ and random initialization methods both have a degree of randomness.
  • the data generated by the K-means clustering technique using those methods is not entirely accurate because the results of K-means depend heavily on its initialization.
  • K-means can be significantly improved by repeating the initialization procedure several times. All of the tests we performed used only a single iteration of the K-means++ and random initialization methods, which provides low accuracy in the cluster results. Typically, when the K-means clustering technique is deployed with K-means++ or random initialization methods, the technique needs to be executed several times and the results of all these executions are combined together. This procedure is called Ensemble Clustering.
  • running the K-means clustering technique several times means that its overall latency grows with the number of iterations. For example, if the initialization procedure is repeated 10 times, the latency will also increase by a factor of 10.
  • a method according to embodiments described herein only requires one iteration because this method does not use any randomness to assign the initialization values; the sketch below illustrates the difference in the number of runs.
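  • the contrast can be illustrated with scikit-learn, whose n_init parameter re-runs a randomized initialization several times and keeps the best run by inertia, a simpler variant of the ensemble procedure described above; the sketch below is illustrative only, and my_centroids merely stands in for centroids produced by the initialization described herein.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.random.default_rng(0).normal(size=(1000, 2))   # toy data, purely illustrative
K = 4

# Randomized initialization: several re-runs are typically needed, so latency
# grows roughly in proportion to n_init.
randomized = KMeans(n_clusters=K, init="k-means++", n_init=10).fit(data)

# Deterministic initialization: a single run suffices (n_init=1).
my_centroids = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
deterministic = KMeans(n_clusters=K, init=my_centroids, n_init=1).fit(data)
```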
  • FIG 17A is a block diagram of an anomaly detection system 100 according to some embodiments.
  • the anomaly detection system 100 may be incorporated into the functionality of an element of a network management system, such as a service monitoring dashboard 14, network automation/management system 12 or metrics collector 16 as shown in Figure 1.
  • the anomaly detection system 100 may be a standalone system or a cloud-based system that provides services to one or more elements of a network management system, which act as clients of the anomaly detection system 100. It will be apparent, however, that the embodiments of the disclosure are not limited to anomaly detection. Other applications of the method and system according to the embodiments are possible, such as data partitioning, event detection, object detection, pattern detection, etc.
  • the anomaly detection system 100 includes a processor circuit 134, a communication interface 118 coupled to the processor circuit 134, and a memory 136 coupled to the processor circuit 134.
  • the processor circuit 134 may be a single processor or may comprise a multi-processor system. In some embodiments, processing may be performed by multiple different systems that share processing power, such as in a distributed or cloud computing system.
  • the memory 136 includes machine-readable computer program instructions that, when executed by the processor circuit, cause the processor circuit to perform some of the operations and/or implement the functions described herein.
  • an anomaly detection system 100 includes a communication interface 118 (also referred to as a network interface) configured to provide communications with other devices.
  • the anomaly detection system 100 also includes a processor circuit 134 (also referred to as a processor) and a memory circuit 136 (also referred to as memory) coupled to the processor circuit 134.
  • processor circuit 134 may be defined to include memory so that a separate memory circuit is not required.
  • operations of the anomaly detection system 100 may be performed by processing circuit 134 and/or communication interface 118.
  • the processing circuit 134 may control the communication interface 118 to transmit communications through the communication interface 118 to one or more other devices and/or to receive communications through network interface from one or more other devices.
  • modules may be stored in memory 136, and these modules may provide instructions so that when instructions of a module are executed by processing circuit 134, processing circuit 134 performs respective operations (e.g., operations discussed herein with respect to example embodiments).
  • FIG. 17B illustrates various functional modules that may be stored in the memory 136 of the anomaly detection system 100.
  • the modules may include an initialization module 122 that initializes centroids of a dataset as described above and a K-means clustering module 124 that executes a K-means clustering technique using the initialized centroids.
  • FIG 18 is a flowchart that illustrates operations of an anomaly detection system 100 according to some embodiments.
  • a dataset of network data containing data points is provided (block 201).
  • Each data point in the dataset has a value for each of N features.
  • the data points may include network data collected by a metrics collector of a network monitoring system as illustrated in Figure 1, such as network metrics that are collected by a service monitoring dashboard 14 shown in Figure 1.
  • the data points may include large amounts of multi-dimensional network monitoring data collected by a metrics collector 16 and relating to data traffic carried in the network using one or more communication protocols, such as HTTP, MQTT, CoAP, LwM2M, etc.
  • Other data types are possible, such as radio access network data (e.g., key performance indicator data, radio channel related metrics, etc.), image data, environmental sensory data (radar, LIDAR, camera, etc).
  • the method then selects a feature (block 204) and generates a list of values of the feature from the data points (block 206).
  • the list of values is then sorted (block 208), and the sorted list is then divided into ⌈K^(1/N)⌉ sub-lists.
  • a median value of each sub-list is determined and stored in a median value list (block 210).
  • the method determines if there are any other features to be considered (block 212). If so, the operations return to block 202, and another feature is selected and analyzed at blocks 204-210 to identify the median values of the sub-lists of values associated with the selected feature.
  • if it is determined at block 212 that there are no more features to consider, the method generates N-dimensional points from permutations of the median values at block 214. The method then selects K of the N-dimensional points as initial centroids for the K-means clustering technique (block 216), and executes the K-means clustering technique using the initial centroids (block 218).
  • the system detects anomalous data points (e.g., outlier data points) in the data set based on the clusters identified by the K-means clustering technique.
  • the system may handle the anomalous data points in a number of ways. For example, the system may ignore the anomalous data points when generating failure rates, determining whether to raise an alarm based on the data, determining a severity of a failure, computing a number of affected devices, etc. In other cases, the system may generate an alarm in response to the presence of anomalous data points. A sketch of this clustering and outlier-detection flow is given below.
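  • the end-to-end flow of Figure 18 can be sketched as follows; this is a minimal illustration assuming NumPy and scikit-learn, with a compact restatement of the median-based initialization sketched earlier, and a distance-percentile outlier criterion that is an assumption of this sketch rather than something prescribed by the disclosure.

```python
import itertools
import math

import numpy as np
from sklearn.cluster import KMeans


def median_init(data: np.ndarray, k: int) -> np.ndarray:
    """Compact restatement of the median-based initialization sketched earlier."""
    n_sublists = math.ceil(k ** (1.0 / data.shape[1]))
    medians = [tuple(np.median(s) for s in np.array_split(np.sort(data[:, f]), n_sublists))
               for f in range(data.shape[1])]
    return np.array(list(itertools.product(*medians))[:k])


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))          # stand-in for collected network monitoring data
K = 4

# Blocks 214-218: build initial centroids from the median permutations and run
# K-means once, seeded with them.
model = KMeans(n_clusters=K, init=median_init(data, K), n_init=1).fit(data)

# Outlier detection based on the resulting clusters: points unusually far from
# their assigned centroid are flagged (the 99th-percentile cut-off is an
# assumption, not taken from the disclosure).
distances = np.linalg.norm(data - model.cluster_centers_[model.labels_], axis=1)
anomalous = distances > np.quantile(distances, 0.99)
print(f"{int(anomalous.sum())} anomalous data points flagged out of {len(data)}")
```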
  • Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).

Abstract

A method of initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K‐means clustering technique includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub‐lists of sorted values, storing a median value of each sub‐list in a median value list, and selecting K centroid points by combining median values from the median value list.

Description

INITIALIZATION OF K-MEANS CLUSTERING TECHNIQUE FOR ANOMALY DETECTION IN COMMUNICATION NETWORK MONITORING DATA
TECHNICAL FIELD
[0001] The present disclosure relates to systems and methods for performing K-means clustering. In particular, the present disclosure relates to systems and methods for initializing K-means clustering for the detection of anomalies in communication network monitoring data.
BACKGROUND
[0002] K-means clustering is an unsupervised machine learning technique that analyzes unlabeled source data and separates or organizes the data into distinct groups (i.e., clusters) so that the data can be better understood. The source data may include extremely large datasets of multi-dimensional data. That is, each data point in the dataset may be characterized by multiple features. In particular, K-means clustering groups data points that exhibit certain similarities into K groups, where K is a number specified in advance by a user. The goal of K-means clustering is to use artificial intelligence to discover hidden structures or groupings within the data.
[0003] Unsupervised grouping techniques, such as K-means clustering, can be particularly useful for analyzing large datasets. However, choosing initial conditions for K-means clustering can be challenging, as a poor choice of initial conditions for K-means clustering can lead to suboptimal results.
[0004] Several different approaches to initialization have been proposed, such as the methods described in M. Celebi et al., A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm, arXiv:1209.1960 [cs.LG] (2012). However, there still exist challenges for initializing data for processing using K-means clustering techniques.
SUMMARY
[0005] Some embodiments provide a method of initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique. The method includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub-lists of sorted values, storing a median value of each sub-list in a median value list, and selecting K centroid points by combining median values from the median value list.
[0006] Dividing each sorted list may include dividing each sorted list into ⌈K^(1/N)⌉ sub-lists, where ⌈x⌉ denotes a ceiling function that maps x to the least integer greater than or equal to x.
[0007] In some embodiments, selecting K centroid points by combining values from the median value list includes generating permutations of values from the median value list and selecting K of the permutations.
[0008] In some embodiments, the median value list includes a list of tuples, and each of the tuples includes ⌈K^(1/N)⌉ median values of sub-lists generated from values of a corresponding one of the features.
[0009] The method may further include performing K-means clustering using the selected centroids as initial centroid values.
[0010] The method may further include detecting anomalous data points in the dataset using clusters obtained by performing the K-means clustering.
[0011] The dataset includes network monitoring data obtained by monitoring communications in a communications network. The network monitoring data includes service protocol metrics in the communications network. The method may further include determining whether or not to generate an alarm based on the detection of the anomalous data points.
[0012] In some embodiments, dividing each sorted list of values into ⌈K^(1/N)⌉ sub-lists of sorted values defines a plurality of quadrants of the features in an N-dimensional feature space.
[0013] In some embodiments, generating K centroid points by combining values from the median value list includes generating a plurality of permutations of values in the median value list, and selecting K of the permutations as centroids.
[0014] Some embodiments provide a method of detecting anomalous data points in a dataset of communications monitoring data obtained from a communications network. The method includes initializing a number of centroids, K, of the dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique. Initializing the centroids includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub-lists of sorted values, storing a median value of each sub-list in a median value list, and selecting K centroid points by combining median values from the median value list. The method further includes performing K-means clustering using the selected centroid points as initial centroid points to obtain K clusters of data points, and identifying anomalous data points based on the clusters.
[0015] Some embodiments provide a computer program including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform operations described herein. Some embodiments provide a computer program product including a non-transitory storage medium including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform operations described herein.
[0016] An anomaly detection system according to some embodiments includes a processor, a communication interface coupled to the processor, and a memory coupled to the processor. The memory includes computer readable instructions that when executed by the processor cause the system to perform operations including initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique. Initializing the centroids includes, for at least one feature, obtaining a list of values of the feature from the data points, for the at least one feature, sorting the list of values to obtain a sorted list, dividing each sorted list of values into a plurality of sub-lists of sorted values, storing a median value of each sub-list in a median value list, and selecting K centroid points by combining median values from the median value list. The method further includes performing K-means clustering using the selected centroid points as initial centroid points to obtain K clusters of data points, and identifying anomalous data points based on the clusters.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] Figure 1 illustrates an example of a network management system including a service monitoring dashboard.
[0018] Figure 2 illustrates an example workflow of operations of a service monitoring dashboard in a network management system.
[0019] Figures 3-6 illustrate conventional methods of initializing centroids of a dataset for processing using a K-means clustering technique.
[0020] Figures 7-10 illustrate systems/methods of initializing centroids of a dataset for processing using a K-means clustering technique according to some embodiments.
[0021] Figures 11-16 illustrate example results of different initialization methods of datasets for processing using a K-means clustering technique according to some embodiments.
[0022] Figure 17A is a block diagram that illustrates elements of a K-means clustering system according to some embodiments.
[0023] Figure 17B illustrates various functional modules that may be stored in the memory of a K-means clustering system according to some embodiments.
[0024] Figure 18 is a flowchart that illustrates operations of a K-means clustering system according to some embodiments.
DETAILED DESCRIPTION OF EMBODIMENTS
[0025] Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
[0026] The following description presents various embodiments of the disclosed subject matter. These embodiments are presented as teaching examples and are not to be construed as limiting the scope of the disclosed subject matter. For example, certain details of the described embodiments may be modified, omitted, or expanded upon without departing from the scope of the described subject matter.
[0027] Tools for managing communications networks have been developed with the goal of monitoring the performance of the network, detecting anomalies in the network and generating alerts when anomalies occur. Network monitoring data can be voluminous and multi-dimensional. As such, the task of identifying anomalies lends itself naturally to the use of clustering techniques, such as K-means clustering, as the purpose of anomaly detection is to identify data points that deviate from normal or expected behavior. Anomaly detection is a method frequently used to analyze Internet of Things (IoT) traffic to discover critical incidents or potential opportunities.
[0028] Network management tools may collect key performance indicator (KPI) metrics from Service Level Monitoring (SLM) agents and use such information to visualize and report the performance of systems and services. Agents are software modules that can parse and collect KPI metrics from different nodes within the system and translate them into a format that can be further stored or processed by a network management system.
[0029] Some network management systems have the capability of combining different metrics obtained from service level metric agents, and using the metrics to determine failure rates and notification criteria. Network management systems may further integrate with connectivity management and device management systems that manage devices, such as IoT devices, within the network.
[0030] An example architecture of a network management system is shown in Figure 1. In the system shown in Figure 1, a metrics collector 16 collects metrics from monitoring applications 10 that monitor various service protocols in the communication system, such as message queuing telemetry transport protocol (MQTT), constrained application protocol (CoAP), hypertext transport protocol (HTTP), lightweight machine to machine protocol (LwM2M), etc., which are used by an IoT Automation system 18 for IoT automation. Metrics generated by the monitoring applications 10 and the IoT automation system 18 are collected by the metrics collector 16 and provided to a service monitoring dashboard 14. The service monitoring dashboard 14 tracks errors generated by the supported protocols. The data generated by the service monitoring dashboard 14 may be provided to a network automation/management system 12, for example, through a representational state transfer (REST) interface.
[0031] In particular, the service monitoring dashboard 14 may have a predefined set of operations that probe the system and record results of the tests. The service monitoring dashboard 14 collects these records and computes failure rates (among other metrics) based on the results. After calculating various service failure rates and failure severity, the service monitoring dashboard 14 may notify the customer of the result and provide an estimate of the number of impacted devices.
[0032] Figure 2 shows an example workflow of the generation of notifications by a service monitoring dashboard 14. As shown in Figure 2, metrics are collected at block 24 by the service monitoring dashboard 14 at regular intervals based on input from a timer 22. When metrics are collected, a failure rate is determined for each service under test (block 32). In parallel, a percentage of device failures is calculated for each service (block 26), and the number of impacted devices is recorded (block 28). From this information, a severity of the failure is assigned at block 30. The method then determines, on a per-service basis, whether the failure rate and severity differ from a previous determination (block 34), and if not, the procedure ends. If the failure rate and severity for a service have changed from previous determinations, the method determines if an alarm has already been raised for the service (block 36), and if so, clears the previous alarm (block 38). If no alarm has been raised, the system raises an alarm indicating the current time, the service in question, the failure rate, severity, and number of impacted devices (block 40).
[0033] The severity may be determined based on factors such as the total failure rate (per service group), the estimated number of impacted devices, and the time when an incident that triggered the notification was detected.
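The per-service decision flow of Figure 2 can be summarized in code form. The following is a minimal sketch of the alarm handling described above; all names (ServiceState, process_service, and the printed alarm format) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ServiceState:
    """Hypothetical per-service bookkeeping; field names are illustrative only."""
    failure_rate: float
    severity: str
    alarm_raised: bool = False


def process_service(name: str, state: ServiceState, new_rate: float,
                    new_severity: str, impacted_devices: int) -> None:
    # Block 34: if failure rate and severity are unchanged, the procedure ends.
    if (new_rate, new_severity) == (state.failure_rate, state.severity):
        return
    state.failure_rate, state.severity = new_rate, new_severity
    if state.alarm_raised:
        # Blocks 36/38: an alarm was already raised for this service, so clear it.
        print(f"clearing previous alarm for {name}")
        state.alarm_raised = False
    else:
        # Block 40: raise an alarm with the current time, service, failure rate,
        # severity and number of impacted devices.
        print(f"{datetime.now(timezone.utc).isoformat()} ALARM {name}: "
              f"failure_rate={new_rate:.2%} severity={new_severity} "
              f"impacted_devices={impacted_devices}")
        state.alarm_raised = True
```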
[0034] The service monitoring dashboard 14 may also provide an interface for monitoring the services to allow a human operator to visualize the performance of the services as well as the critical anomalies.
[0035] Currently, one of the main problems with network management systems is their lack of flexibility in detecting anomalies in network monitoring data. Current methods for detecting anomalies in network monitoring include the use of predefined scripts that are written in advance by trained technicians. Generating and testing such scripts is a time-consuming and complicated task that requires sophisticated knowledge of the systems being analyzed.
[0036] Recently, artificial intelligence and machine learning have emerged as potential alternatives to solve this problem by reducing the required tuning time and offering the possibility to explore more complex anomaly detection use cases. While some AI tools have been investigated, they still suffer from certain limitations. Improvements of the AI techniques used for unsupervised clustering are needed to help make the anomaly detection processes more accurate and faster.
[0037] As noted above, K-means clustering is an attractive technique for organizing large groups of data points that may have hidden underlying structures into meaningful groupings. Because learning systems such as K-means clustering are unsupervised, they are not trained using labeled training data. Thus, one peculiarity of K-means clustering techniques is their initialization. Before a K-means clustering technique can start to analyze data to find patterns in it, the technique needs to be initialized with initial label assignments.
[0038] In general, there are currently three most common initialization methods used with K-means clustering techniques, each of which uses some degree of randomness, referred to as random partition initialization, random points initialization, and K-means++ initialization. In random partition initialization, labels are randomly assigned to each data point such that each data point is randomly assigned to a group. In random points initialization, centroids of the K groups are initialized by selecting K data points uniformly at random, where a "centroid" is the mean of all the points in a given cluster. Conceptually, K-means clustering treats data as composed of a number of roughly circular distributions where the centroid is the center point of that circular distribution. The remaining data points are then assigned to groups based on their proximity to the randomly selected centroids.
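For concreteness, the two randomized schemes just described can be sketched as follows; the toy dataset and the NumPy-based implementation are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 2))      # toy 2-D dataset, purely illustrative
K = 4

# Random partition initialization: every data point is randomly assigned a
# cluster label, and each initial centroid is the mean of its random group.
labels = rng.integers(K, size=len(data))
partition_centroids = np.array([data[labels == k].mean(axis=0) for k in range(K)])

# Random points initialization: K data points chosen uniformly at random
# become the initial centroids.
points_centroids = data[rng.choice(len(data), size=K, replace=False)]
```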
[0039] In K-means++ initialization, the centroids are initialized by selecting K data points sequentially following a probability distribution determined by previously selected centroids, and initial labels are assigned to the data points based on the centroid locations.
[0040] Conventional methods of initialization for K-means clustering are illustrated in Figures 3 to 6. Figure 3 illustrates an example distribution of data points having two dimensions (shown on the X and Y axis, respectively). Figure 4 illustrates random partition initialization, Figure 5 illustrates random points initialization, and Figure 6 illustrates K-means++ initialization. Referring to Figure 3, an example dataset includes a plurality of data points having two features. The features are either numeric or are converted to numeric representations, and the data are plotted in two dimensions with each dimension representing a feature. In the example of Figure 3, the data falls within four distinct groupings 32. In practice, when the data are multi-dimensional, grouping relationships may be more difficult to visualize and less distinct. Before running the K-means grouping technique, the groupings of data points are not known.
[0041] Referring to Figure 4, a random partition initialization technique is illustrated for a K-means clustering technique where the total number of clusters, K, is 4. In Figure 4(a), each data point is randomly assigned to one of K = 4 clusters represented by the circle, square, horizontal diamond and vertical diamond shapes.
[0042] In Figure 4(b), the centroids 42A-42D of the initially assigned clusters are calculated where, as noted above, the centroid of a cluster represents the mean of all data points in the cluster. Because the data points are initially assigned randomly to clusters, the centroids 42A-42D are relatively close together near the center of the data points.
[0043] Through the K-means clustering technique, the data points are then re-assigned to clusters based on the centroids. As seen in Figure 4(c), this assignment can result in overlapping of some of the clusters.
[0044] Figure 5 illustrates random points initialization. In the random points initialization method, the centroids are initialized first rather than the data labels. This is done by choosing K data points at random to be the initial centroids. Accordingly, referring to Figure 5(a), four centroids 52A-52D are randomly chosen from the data points. Then, referring to Figure 5(b), each point in the data is assigned to a cluster according to its proximity to the centroids 52A-52D (i.e., each point is assigned to the cluster corresponding to the closest centroid). Referring to Figure 5(c), centroids 52A-52D of the clusters are then re-calculated based on the cluster assignments.
[0045] Figure 6 illustrates K-means++ initialization. The K-means++ technique initializes the centroids by selecting K data points sequentially, following a probability distribution determined by the centroids that have already been selected. The probability that a point will be chosen as an initial centroid is proportional to the square of its distance to the nearest existing initial centroid.
[0046] Figure 6 shows how the centroids are chosen according to their distance from each other. In particular, Figure 6(a) shows a distribution of data points. In Figure 6(b) a first centroid 62A is randomly chosen. In Figure 6(c) a probability density for selecting the second centroid is determined, and in Figure 6(d) the second centroid 62B is chosen with a probability determined by the square of its distance to the first centroid 62A. This process is repeated until all four centroids 62A-62D have been selected.
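An illustrative sketch of the K-means++ seeding rule described above is shown below (a simplified NumPy version; the function name and parameters are assumptions and not part of the application):

    import numpy as np

    def kmeanspp_seed(X, k, seed=0):
        # Pick k initial centroids: the first uniformly at random, each later
        # one drawn with probability proportional to its squared distance to
        # the nearest centroid already chosen.
        rng = np.random.default_rng(seed)
        centroids = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1).min(axis=1)
            centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        return np.array(centroids)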
[0047] Because the data points of the example depicted in Figures 3 to 6 have only two features (represented on the X- and Y-axes), they can be visualized using a 2-dimensional (2-D) graph, which readily illustrates the functionality of the K-means clustering technique. However, most IoT data collected by a network management system are high dimensional, meaning that each data point will have multiple attributes or features (multivariate features). Multivariate features are also common in other network monitoring metrics.

[0048] The K-means clustering technique does not work well with high dimensionality, because K-means clustering uses distance-based metrics to identify similar points in the clusters. As the number of dimensions increases, the distances between different data points tend to become similar, making it difficult for the technique to identify the clusters.
[0049] There are methods to address this high-dimensionality issue in K-means clustering. One of the most common is Principal Component Analysis (PCA), which reduces the number of features while keeping as much of the variation in the data as possible. However, reducing the number of features is not practical in many network management scenarios, since dimensionality reduction may cause some loss of information.
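As an aside, a brief sketch of the PCA workaround mentioned above is shown below, using scikit-learn; the library choice and parameters are assumptions and are not taken from this document.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X = np.random.default_rng(0).normal(size=(1000, 10))         # 10-feature toy data
    X_2d = PCA(n_components=2).fit_transform(X)                  # project onto 2 principal components
    labels = KMeans(n_clusters=4, n_init=10).fit_predict(X_2d)   # cluster in the reduced space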
[0050] Another problem of K-means clustering techniques is related to initialization. As described above, the most common initialization methods for K-means clustering techniques have a degree of randomness. K-means clustering techniques are very sensitive to the initial conditions. Because of the randomness associated with conventional initialization methods, it is often necessary to run a K-means clustering technique multiple times to achieve a desired level of accuracy in its final results.
[0051] Some embodiments described herein provide systems and/or methods for initializing data for processing in a K-means clustering technique that reduces or removes randomness from the initialization procedure. A system/method according to some embodiments divides an input dataset into a plurality of quadrants equal to the number K of centroids and chooses a centroid point for each quadrant.
[0052] A system and/or method as described herein may provide certain advantages. For example, the use of a system/method according to some embodiments to initialize the centroids of a K-means clustering technique may significantly reduce the time needed to analyze high dimensionality data, such as network monitoring data collected by a network management system. This may allow a network management system to provide desired metrics much faster.

[0053] The use of a system/method according to some embodiments to initialize the centroids of a K-means clustering technique may also result in improved accuracy. For example, such initialization may increase the effectiveness of the K-means clustering technique, thereby providing better results in the final clustering configuration. Accordingly, it may not be necessary to perform multiple iterations of the K-means clustering technique, as is typically required with conventional approaches. In addition, a system/method for initializing the centroids of a dataset for performing a K-means clustering technique according to some embodiments may be feature-agnostic and computationally inexpensive.
[0054] A method for initializing the centroids of a dataset for performing K-means clustering is as follows. For each of the N features in the dataset, all of the values for the feature are placed into a list, and the list is sorted. The method then divides the sorted list into a number of sub-lists equal to the Nth root of the number of centroids to be initialized, rounded down, where N is the number of dimensions (or features) of the data. That is, for a system with K centroids and data with N dimensions (or features), the number of sub-lists for each feature is calculated as ⌊K^(1/N)⌋. For example, if the number of centroids is 4 and the number of features is 2, the sorted list is divided into 2 sub-lists (⌊4^(1/2)⌋ = 2). If the number of centroids is 5 and the number of features is 2, the sorted list is still divided into 2 sub-lists, since the result is rounded down to the nearest integer: ⌊5^(1/2)⌋ = ⌊2.24⌋ = 2. In that particular case, an extra centroid is added into one of the quadrants. The symbol ⌊x⌋ denotes the floor function that maps x to the greatest integer less than or equal to x.
[0055] Next, for each sub-list, a median number is selected from the sub-list, and the selected median number is stored in a median value list L. The method then selects K tuples containing permutations of the numbers stored in the median value list L. Each such tuple represents the coordinates (dimensions) of an initial centroid in the data.
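A minimal Python sketch of this initialization procedure is given below. It sorts each feature, splits the sorted values into ⌊K^(1/N)⌋ sub-lists, collects the sub-list medians, and forms candidate centroids from their combinations. The function name, the floating-point tolerance, and the handling of any extra centroid are assumptions, since the description above does not specify them; the sketch is illustrative rather than the application's own implementation.

    import itertools
    import math

    import numpy as np

    def median_quadrant_centroids(data, k):
        # data: array of shape (n_points, n_features); k: number of centroids wanted.
        data = np.asarray(data, dtype=float)
        n_features = data.shape[1]
        # Number of sub-lists per feature: the N-th root of K, rounded down.
        # The small epsilon guards against floating-point error (e.g. 8 ** (1/3) == 1.999...).
        n_sublists = max(1, math.floor(k ** (1.0 / n_features) + 1e-9))

        per_feature_medians = []
        for j in range(n_features):
            sorted_values = np.sort(data[:, j])                  # sorted list of feature values
            sublists = np.array_split(sorted_values, n_sublists)
            per_feature_medians.append([float(np.median(s)) for s in sublists])

        # Candidate centroids: one median per feature, in every combination
        # (the "permutations" of the median value list).
        candidates = [list(p) for p in itertools.product(*per_feature_medians)]
        centroids = candidates[:k]
        while len(centroids) < k:
            # Assumption: the text only says an extra centroid is added into one
            # of the quadrants; here a candidate is reused with a tiny offset.
            extra = candidates[len(centroids) % len(candidates)]
            centroids.append([v + 1e-6 for v in extra])
        return np.array(centroids)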
[0056] An example with two features will now be described with reference to Figures 7-10. The example has only two features (N = 2) for easy visualization of the data in two dimensions. Each feature is represented on an axis of the graph (the X- and Y-axes). However, the method can be used with as many features as needed. Figure 7 shows the distribution of the data in a 2-dimensional graph. For this example, the dataset will be divided into four clusters (K = 4). A first feature is represented on the x-axis of Figure 7, and a second feature is represented on the y-axis of Figure 7.
[0057] To assign initial centroids, the values of the first feature are first inserted into a list. The list is then sorted and divided into two sub-lists (⌊4^(1/2)⌋ = 2). That is, since the number of centroids (clusters) is 4 and the number of features is 2, the sorted list is divided into 2 sub-lists. The median number from each sub-list is chosen. This procedure is illustrated in Figure 8. In the example data of Figure 7, the median numbers for the first feature (on the x-axis) are 3 and 10. These numbers are placed into the list L as a tuple.
[0058] The same procedure is done for the second feature (on the y-axis), as shown in Figure 9. The median numbers 4 and 11 are chosen for the second feature. It will be appreciated that datasets will typically yield different median numbers for the first and second features. The median value list is constructed as a list of tuples, wherein each of the tuples comprises ⌊K^(1/N)⌋ median values of sub-lists generated from values of a corresponding one of the features. Thus, the median value list L contains the tuples [(3, 10), (4, 11)]. The last step is to find K points containing permutations of the numbers in the list [(3, 10), (4, 11)], where each point has a first coordinate selected from the first tuple and a second coordinate selected from the second tuple. In the 2D example, the K points are [(3,4), (3,11), (10,4), (10,11)].
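The last step of this worked example can be reproduced directly with itertools.product, treating each per-feature median tuple as one axis of the combination (a usage illustration only):

    import itertools

    median_value_list = [(3, 10), (4, 11)]     # medians for the x- and y-axis features
    initial_centroids = list(itertools.product(*median_value_list))
    print(initial_centroids)                   # [(3, 4), (3, 11), (10, 4), (10, 11)]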
[0059] Each of the K points is selected as an initial centroid in the dataset. The K-means clustering technique will use those centroids as its starting point.
[0060] Figure 10 illustrates how the method divides the dataset into quadrants in two dimensions. The area of each quadrant is proportional to the data included in it. Thus, in the 2D example, the largest quadrant (top-right) contains the most dispersed data, while the smallest quadrant (bottom-left) contains the highest density of data points in the dataset.
[0061] The centroids are chosen according to the quadrants. For each quadrant, the centroid lies roughly at the center of the distribution of the data in that quadrant.
[0062] For three-dimensional data (N = 3) where K = 8, there will be K permutations of numbers in the list L. For example, if the data in the dataset had a third feature whose median values were 25 and 55, the list L would include the tuples [(3, 10), (4, 11), (25, 55)] and the permutations would include the following points:
(3, 4, 25), (3, 4, 55), (3, 11, 25), (3, 11, 55), (10, 4, 25), (10, 4, 55), (10, 11, 25), (10, 11, 55)
[0063] The K centroids will be the 8 N-tuples constructed from all of the permutations of numbers in the list. If the number of centroids is 9 and the number of features is still 3, the list L is still divided into two sub-lists, since the final result is rounded down to the nearest integer: ⌊9^(1/3)⌋ = ⌊2.08⌋ = 2. In that particular case, an extra centroid is added into one of the quadrants.
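The same combination step extends to the three-feature example. For K = 9, a ninth centroid must be added beyond the eight candidates; since the description does not specify where within a quadrant it is placed, any placement rule used in code is an assumption.

    import itertools

    median_value_list = [(3, 10), (4, 11), (25, 55)]
    candidates = list(itertools.product(*median_value_list))   # the 8 points listed above
    # For K = 9, one more centroid is needed than there are candidates;
    # for example, one of the 8 points may be reused with a small offset (assumed choice).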
[0064] A method as described herein has been evaluated against the K-means++ and random initialization methods. The evaluation use case relates to anomaly detection, in which the K-means clustering technique detects anomalous data in a dataset and its performance (latency) is evaluated. That evaluation is illustrated in Figures 13, 15 and 16. The K-means clustering technique identifies the data points that deviate from a dataset's normal behavior. Those data points are called outliers and, in the context of network performance, may indicate a problem of some kind in the network.
[0065] To evaluate the initialization method described above, different variables were tested, such as the number of data points (from 50 to 100 million data points), the number of clusters (from 4 to 25), and the number of dimensions (from 2 to 5). In statistics, dimensionality is the number of attributes or features a dataset has. In all the cases evaluated and illustrated in Figures 13, 15, and 16, the method described herein outperforms the existing initialization methods. In general, the method described herein is faster, and its results are highly accurate.
[0066] Number of Data Points
[0067] A first test evaluated the performance of the K-means++ initialization method, the random initialization method, and the initialization method described herein in finding the anomalies in the data.
[0068] The data was varied from 50 data points (low degree of data density) up to 100 million data points (high degree of data density). The K-means clustering technique also divided the data into 4 clusters.

[0069] The data generated by the K-means clustering technique using different initialization methods is depicted in Figure 11.
[0070] Figure 11 depicts four graphs. The graph in the upper-left corner (Figure 11(a)) shows the original raw data without any modification. The graph in the upper-right corner (Figure 11(b)) shows the resulting data after performing the K-means clustering technique with the K-means++ initialization method. The graph in the lower-left corner (Figure 11(c)) shows the results of the K-means clustering technique using a random initialization method. The graph in the lower-right corner (Figure 11(d)) shows the K-means clustering technique using a method as described herein.
[0071] The clusters are shown in different colors. All the tests used the same data. The data set in Figure 11 has a low degree of data density, around 300 data points.
[0072] Figure 12 shows the same techniques run on a larger number of data points, around 1000. In this example the clusters appear more clearly defined due to the larger amount of data.
[0073] Figure 13 shows the performance evaluation of all the tests. The graphs show the performance of the K-means++ initialization method (in blue), the random initialization method (in red) and the enhanced initialization method (in yellow, green, and brown).
[0074] The results of the method described herein have been split into three to show the performance of the method more accurately. The yellow line shows the time it takes to create the matrix used for initialization. The green line shows the time the K-means clustering technique takes to give a result using a method as described herein. The brown line shows the total time, including the time to create the enhanced initialization matrix and the time it takes for K-means to give the results.
[0075] Figure 13 shows that a method as described herein outperforms the K-means++ and random initialization methods. Even taking into consideration the time it takes to create the enhanced initialization matrix, the method described herein provides lower latency than the other two methods. The data was tested from 50 data points to 100 million data points.
[0076] Number of Clusters

[0077] K-means clustering divides objects into clusters whose members share similarities with each other and are dissimilar to objects belonging to other clusters.
[0078] The performance of the K-means clustering technique was tested with different numbers of clusters. The data was divided into different numbers of clusters, and the initialization methods were evaluated according to the latency required to provide the results.
[0079] Figure 14 shows the visual results of the K-means clustering technique using different initialization methods, where the data is partitioned into 16 different clusters. The clusters are colored differently to differentiate them. The outliers are circled in red to differentiate them from the data belonging to the clusters.
[0080] Figure 15 shows the performance of the K-means clustering technique with three different initialization methods. The method described herein outperformed the K-means++ and random initialization methods. The tests were done with 1000 data points in each execution, and the number of clusters varied from 4 to 25. The data shows that the higher the number of clusters, the better the performance of the method described herein relative to the K-means++ and random initialization methods.
[0081] Number of Dimensions
[0082] Dimensionality refers to the number of attributes or features of each dataset. Figures 11, 12 and 14 showed the results of the K-means clustering technique using 2 dimensions. Thus, each cluster has two features that differentiate it from the other clusters. The example with two features is illustrated because two dimensions are easy to represent graphically: one feature is mapped to the X-axis and the other feature to the Y-axis. However, clusters can be represented using multiple features.
[0083] This last test evaluated the initialization methods using several features per cluster (called multivariate observations). The number of features varied from 2 to 5, with the data divided into 4 clusters. The results are shown in Figure 16.
[0084] Number of Iterations
[0085] The K-means++ and random initialization methods both have a degree of randomness. Hence, the data generated by the K-means clustering technique using those methods is not entirely accurate, because the results of K-means depend heavily on its initialization.
[0086] K-means can be significantly improved by repeating the initialization procedure several times. All of the tests described herein used only a single iteration of the K-means++ and random initialization methods, which provides lower accuracy of the cluster results. Typically, when the K-means clustering technique is deployed with the K-means++ or random initialization methods, the technique needs to be executed several times and the results of all these executions are combined. This procedure is called ensemble clustering.
[0087] Running the K-means clustering technique several times means that its overall latency increases in proportion to the number of iterations. For example, if the initialization procedure is repeated 10 times, the latency will also increase by a factor of 10.
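As an illustration of this cost, the sketch below times a single run against ten restarts using scikit-learn's n_init parameter, which re-runs the whole clustering; the data and parameters are hypothetical.

    import time

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.default_rng(1).normal(size=(200_000, 2))

    t0 = time.perf_counter()
    KMeans(n_clusters=4, init="k-means++", n_init=1).fit(X)
    t1 = time.perf_counter()
    KMeans(n_clusters=4, init="k-means++", n_init=10).fit(X)   # ten restarts, roughly ten times the work
    t2 = time.perf_counter()
    print(f"single run: {t1 - t0:.2f} s, ten runs: {t2 - t1:.2f} s")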
[0088] A method according to embodiments described herein, on the other hand, only requires one iteration because this method does not use any randomness to assign the initialization values.
[0089] Therefore, another advantage of the method described herein is that it may only require one iteration, while K-means++ and random methods require more iterations because their initializations always use some random initial step.
[0090] Figure 17A is a block diagram of an anomaly detection system 100 according to some embodiments. The anomaly detection system 100 may be incorporated into the functionality of an element of a network management system, such as a service monitoring dashboard 14, network automation/management system 12 or metrics collector 16 as shown in Figure 1. In some embodiments, the anomaly detection system 100 may be a standalone system or a cloud-based system that provides services to one or more elements of a network management system, which act as clients of the anomaly detection system 100. It will be apparent, however, that the embodiments of the disclosure are not limited to anomaly detection. Other applications of the method and system according to the embodiments are possible, such as data partitioning, event detection, object detection, pattern detection, etc.
[0091] Various embodiments provide an anomaly detection system 100 that includes a processor circuit 134, a communication interface 118 coupled to the processor circuit 134, and a memory 136 coupled to the processor circuit 134. The processor circuit 134 may be a single processor or may comprise a multi-processor system. In some embodiments, processing may be performed by multiple different systems that share processing power, such as in a distributed or cloud computing system. The memory 136 includes machine-readable computer program instructions that, when executed by the processor circuit, cause the processor circuit to perform some of the operations and/or implement the functions described herein.
[0092] As shown, an anomaly detection system 100 includes a communication interface 118 (also referred to as a network interface) configured to provide communications with other devices. The anomaly detection system 100 also includes a processor circuit 134 (also referred to as a processor) and a memory circuit 136 (also referred to as memory) coupled to the processor circuit 134. According to other embodiments, processor circuit 134 may be defined to include memory so that a separate memory circuit is not required.
[0093] As discussed herein, operations of the anomaly detection system 100 may be performed by the processing circuit 134 and/or the communication interface 118. For example, the processing circuit 134 may control the communication interface 118 to transmit communications through the communication interface 118 to one or more other devices and/or to receive communications through the communication interface 118 from one or more other devices. Moreover, modules may be stored in the memory 136, and these modules may provide instructions so that when the instructions of a module are executed by the processing circuit 134, the processing circuit 134 performs respective operations (e.g., operations discussed herein with respect to example embodiments).
[0094] Figure 17B illustrates various functional modules that may be stored in the memory 136 of the anomaly detection system 100. The modules may include an initialization module 122 that initializes centroids of a dataset as described above and a K-means clustering module 124 that executes a K-means clustering technique using the initialized centroids.
[0095] Figure 18 is a flowchart that illustrates operations of an anomaly detection system 100 according to some embodiments. In particular, referring to Figure 18, a dataset of network data containing data points is provided (block 201). Each data point in the dataset has a value for each of N features. The data points may include network data collected by a metrics collector of a network monitoring system as illustrated in Figure 1, such as network metrics that are collected by a service monitoring dashboard 14 shown in Figure 1. The data points may include large amounts of multi-dimensional network monitoring data collected by a metrics collector 16 and relating to data traffic carried in the network using one or more communication protocols, such as HTTP, MQTT, CoAP, LwM2M, etc. Other data types are possible, such as radio access network data (e.g., key performance indicator data, radio channel related metrics, etc.), image data, and environmental sensor data (radar, LIDAR, camera, etc.).
[0096] The method then selects a feature (block 202) and generates a list of values of the feature from the data points (block 204). The list of values is then sorted (block 206), and the sorted list is divided into ⌊K^(1/N)⌋ sub-lists (block 208), where K is the number of centroids being initialized for use in the K-means clustering technique and N is the dimensionality of the data points. A median value of each sub-list is determined and stored in a median value list (block 210).
[0097] The method then determines whether there are any other features to be considered (block 212). If so, the operations return to block 202, and another feature is selected and analyzed at blocks 204-210 to identify the median values of the sub-lists of values associated with the selected feature.
[0098] If it is determined at block 212 that there are no more features to consider, the method generates N-dimensional points from permutations of the median values at block 214. The method then selects K of the N-dimensional points as initial centroids for the K-means clustering technique (block 216), and executes the K-means clustering technique using the initial centroids (block 218).
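A hypothetical end-to-end sketch of these operations is shown below, reusing the median_quadrant_centroids helper sketched earlier together with scikit-learn's KMeans. The distance-based outlier rule at the end anticipates block 220, described in the next paragraph; it is one common choice and an assumption, since the flowchart does not prescribe how anomalies are derived from the clusters.

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.default_rng(2).normal(size=(10_000, 3))     # stand-in for network monitoring data

    init = median_quadrant_centroids(X, k=8)                  # blocks 202-216: initial centroids
    model = KMeans(n_clusters=8, init=init, n_init=1).fit(X)  # block 218: a single K-means run

    # Block 220 (assumed criterion): flag points unusually far from their centroid.
    dists = np.linalg.norm(X - model.cluster_centers_[model.labels_], axis=1)
    anomalies = X[dists > dists.mean() + 3 * dists.std()]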
[0099] Finally, at block 220, the system detects anomalous data points (e.g., outlier data points) in the data set based on the clusters identified by the K-means clustering technique. The system may handle the anomalous data points in a number of ways. For example, the system may ignore the anomalous data points when generating failure rates, determining whether to raise an alarm based on the data, determining a severity of a failure, computing a number of affected devices, etc. In other cases, the system may generate an alarm in response to the presence of anomalous data points.

[0100] In the above description of various embodiments of present inventive concepts, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art.
[0101] As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or" includes any and all combinations of one or more of the associated listed items.
[0102] It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus, a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.
[0103] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended and include one or more stated features, integers, elements, steps, components, or functions, but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions, or groups thereof.
[0104] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
[0105] These computer program instructions may also be stored in a tangible computer- readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
[0106] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
[0107] Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts are to be determined by the broadest permissible interpretation of the present disclosure including the examples of embodiments and their equivalents, and shall not be restricted or limited by the foregoing detailed description.

Claims

CLAIMS:
1. A computer-implemented method of initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique, the method comprising: for at least one feature, obtaining (204) a list of values of the feature from the data points; for the at least one feature, sorting (206) the list of values to obtain a sorted list; dividing (208) each sorted list of values into a plurality of sub-lists of sorted values; storing (210) a median value of each sub-list in a median value list; and selecting (216) K centroid points by combining median values from the median value list.
2. The method of Claim 1, wherein dividing each sorted list comprises dividing each sorted list into ⌈K^(1/N)⌉ sub-lists, where the symbol ⌈x⌉ denotes a ceiling function that maps x to a least integer greater than or equal to x.
3. The method of Claim 1 or 2, wherein selecting K centroid points by combining values from the median value list comprises generating permutations of values from the median value list and selecting K of the permutations.
4. The method of Claim 2 or 3, wherein the median value list comprises a list of tuples, wherein each of the tuples comprises ⌈K^(1/N)⌉ median values of sub-lists generated from values of a corresponding one of the features.
5. The method of any of Claims 1 to 4, further comprising: performing (218) K-means clustering using the selected centroids as initial centroid values.
6. The method of Claim 5, further comprising detecting (220) anomalous data points in the dataset using clusters obtained by performing the K-means clustering.
7. The method of any of Claims 1 to 6, wherein the dataset comprises network monitoring data obtained by monitoring communications in a communications network.
8. The method of Claim 7, wherein the network monitoring data comprises service protocol metrics in the communications network.
9. The method of Claim 6, further comprising: determining whether or not to generate an alarm based on the detection of the anomalous data points.
10. The method of Claim 2, wherein dividing each sorted list of values into ⌈K^(1/N)⌉ sub-lists of sorted values defines a plurality of quadrants of the features in an N-dimensional feature space.
11. The method of Claim 2, wherein generating K centroid points by combining values from the median value list comprises: generating a plurality of permutations of values in the median value list; and selecting K of the permutations as centroids.
12. An anomaly detection system (100), comprising: a processor (134); a communication interface (118) coupled to the processor; and a memory (136) coupled to the processor, wherein the memory comprises computer readable instructions that when executed by the processor cause the system to perform operations comprising: initializing a number of centroids, K, of a dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique, wherein initializing the centroids comprises: for at least one feature, obtaining (204) a list of values of the feature from the data points; for the at least one feature, sorting (206) the list of values to obtain a sorted list; dividing (208) each sorted list of values into a plurality of sub-lists of sorted values; storing (210) a median value of each sub-list in a median value list; and selecting (216) K centroid points by combining median values from the median value list; performing (218) K-means clustering using the selected centroid points as initial centroid points to obtain K clusters of data points; and identifying (220) anomalous data points based on the clusters.
13. The anomaly detection system of Claim 12, wherein dividing each sorted list comprises dividing each sorted list into ⌈K^(1/N)⌉ sub-lists, where the symbol ⌈x⌉ denotes a ceiling function that maps x to a least integer greater than or equal to x.
14. The anomaly detection system of Claim 12 or 13, wherein selecting K centroid points by combining values from the median value list comprises generating permutations of values from the median value list and selecting K of the permutations.
15. The anomaly detection system of Claim 13 or 14, wherein the median value list comprises a list of tuples, wherein each of the tuples comprises ⌈K^(1/N)⌉ median values of sub-lists generated from values of a corresponding one of the features.
16. The anomaly detection system of any of Claims 12 to 15, wherein the dataset comprises network monitoring data obtained by monitoring communications in a communications network.
17. The anomaly detection system of Claim 16, wherein the network monitoring data comprises service protocol metrics in the communications network.
18. The anomaly detection system of Claim 17, wherein the anomaly detection system determines whether or not to generate an alarm based on the detection of the anomalous data points.
19. The anomaly detection system of Claim 12, wherein dividing each sorted list of values into sub-lists of sorted values defines a plurality of quadrants of the features in an N- dimensional feature space.
20. The anomaly detection system of Claim 12, wherein generating K centroid points by combining values from the median value list comprises: generating a plurality of permutations of values in the median value list; and selecting K of the permutations as centroids.
21. A computer program comprising program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform operations according to any of Claims 1 to 11.
22. A computer program product comprising a non-transitory storage medium including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform operations according to any of Claims 1 to 11.
23. A computer-implemented method of detecting anomalous data points in a dataset of communications monitoring data obtained from a communications network, the method comprising: initializing a number of centroids, K, of the dataset including a plurality of data points having a number of features, N, for processing with a K-means clustering technique, wherein initializing the centroids comprises: for at least one feature, obtaining (204) a list of values of the feature from the data points; for the at least one feature, sorting (206) the list of values to obtain a sorted list; dividing (208) each sorted list of values into a plurality of sub-lists of sorted values; storing (210) a median value of each sub-list in a median value list; and selecting (216) K centroid points by combining median values from the median value list; performing (218) K-means clustering using the selected centroid points as initial centroid points to obtain K clusters of data points; and identifying (220) anomalous data points based on the clusters.
PCT/EP2022/054523 2022-02-23 2022-02-23 Initialization of k-means clustering technique for anomaly detection in communication network monitoring data WO2023160778A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/054523 WO2023160778A1 (en) 2022-02-23 2022-02-23 Initialization of k-means clustering technique for anomaly detection in communication network monitoring data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/054523 WO2023160778A1 (en) 2022-02-23 2022-02-23 Initialization of k-means clustering technique for anomaly detection in communication network monitoring data

Publications (1)

Publication Number Publication Date
WO2023160778A1 true WO2023160778A1 (en) 2023-08-31

Family

ID=80937304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/054523 WO2023160778A1 (en) 2022-02-23 2022-02-23 Initialization of k-means clustering technique for anomaly detection in communication network monitoring data

Country Status (1)

Country Link
WO (1) WO2023160778A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160275169A1 (en) * 2015-03-17 2016-09-22 Infoutopia Co., Ltd. System and method of generating initial cluster centroids
US20190303387A1 (en) * 2018-03-30 2019-10-03 AVAST Software s.r.o. Efficiently initializing distributed clustering on large data sets
CN109472300A (en) * 2018-10-24 2019-03-15 南京邮电大学 A kind of mass center and mass center number initial method towards K mean cluster algorithm
US20200364607A1 (en) * 2019-05-13 2020-11-19 Oracle International Corporation Systems and methods for unsupervised anomaly detection using non-parametric tolerance intervals over a sliding window of t-digests

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FRÄNTI PASI ET AL: "How much can k-means be improved by using better initialization and repeats?", PATTERN RECOGNITION, vol. 93, 15 April 2019 (2019-04-15), pages 95 - 112, XP085703112, ISSN: 0031-3203, DOI: 10.1016/J.PATCOG.2019.04.014 *
M. CELEBI ET AL.: "A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm", ARXIV:1209.1960, 2012
MD SOHRAB MAHMUD ET AL: "Improvement of K-means clustering algorithm with better initial centroids based on weighted average", ELECTRICAL&COMPUTER ENGINEERING (ICECE), 2012 7TH INTERNATIONAL CONFERENCE ON, IEEE, 20 December 2012 (2012-12-20), pages 647 - 650, XP032338023, ISBN: 978-1-4673-1434-3, DOI: 10.1109/ICECE.2012.6471633 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171604A (en) * 2023-11-03 2023-12-05 城资泰诺(山东)新材料科技有限公司 Sensor-based insulation board production line abnormality monitoring system
CN117171604B (en) * 2023-11-03 2024-01-19 城资泰诺(山东)新材料科技有限公司 Sensor-based insulation board production line abnormality monitoring system
CN117714215A (en) * 2024-02-06 2024-03-15 江苏开博科技有限公司 Real-time network threat detection method and functional equipment
CN117714215B (en) * 2024-02-06 2024-04-23 江苏开博科技有限公司 Real-time network threat detection method and functional equipment

Similar Documents

Publication Publication Date Title
WO2023160778A1 (en) Initialization of k-means clustering technique for anomaly detection in communication network monitoring data
US10002144B2 (en) Identification of distinguishing compound features extracted from real time data streams
CN109194707B (en) Distributed graph embedding method and device
US10394821B2 (en) Providing reconstructed data based on stored aggregate data in response to queries for unavailable data
Yang et al. A feature-metric-based affinity propagation technique for feature selection in hyperspectral image classification
CN111476270A (en) Course information determining method, device, equipment and storage medium based on K-means algorithm
US20160217164A1 (en) Sparse distributed representation of spatial-temporal data
WO2023086798A1 (en) Anomaly detection with local outlier factor
CN110768856A (en) Network flow measuring method, network measuring equipment and control plane equipment
CN114116829A (en) Abnormal data analysis method, abnormal data analysis system, and storage medium
Graham et al. Finding and visualizing graph clusters using pagerank optimization
US11792081B2 (en) Managing telecommunication network event data
Torres-Tramón et al. Topic detection in Twitter using topology data analysis
Van Goethem et al. Grouping time-varying data for interactive exploration
KR102085161B1 (en) System and method for visualization of graph data and computer program for the same
Rodrigues et al. Clustering distributed sensor data streams
Chandio et al. Towards adaptable and tunable cloud-based map-matching strategy for GPS trajectories
Yan et al. Federated clustering with GAN-based data synthesis
WO2023009275A1 (en) Method and system for evaluating peer groups for comparative anomaly
US11308384B1 (en) Method and framework for pattern of life analysis
Wu et al. Neist: a neural-enhanced index for spatio-temporal queries
CN111080351A (en) Clustering method and system for multi-dimensional data set
Mythili et al. Research Analysis on Clustering Techniques in Wireless Sensor Networks
Wang et al. Mining common spatial-temporal periodic patterns of animal movement
Borges et al. Spatial-time motifs discovery

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22712510

Country of ref document: EP

Kind code of ref document: A1