WO2022117911A1 - Anomaly detection - Google Patents

Anomaly detection

Info

Publication number
WO2022117911A1
Authority
WO
WIPO (PCT)
Prior art keywords
measurement results
anomaly detection
component
time series
target system
Application number
PCT/FI2021/050783
Other languages
French (fr)
Inventor
Qi Yu
Viivi UURTIO
Original Assignee
Elisa Oyj
Application filed by Elisa Oyj
Publication of WO2022117911A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0221 Preprocessing measurements, e.g. data collection rate adjustment; Standardization of measurements; Time series or signal analysis, e.g. frequency analysis or wavelets; Trustworthiness of measurements; Indexes therefor; Measurements using easily measured parameters to estimate parameters difficult to measure; Virtual sensor creation; De-noising; Sensor fusion; Unconventional preprocessing inherently present in specific fault detection methods like PCA-based methods
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0259 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterized by the response to fault detection
    • G05B23/0267 Fault communication, e.g. human machine interface [HMI]
    • G05B23/0272 Presentation of monitored results, e.g. selection of status reports to be displayed; Filtering information to the user
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • Different known anomaly detection models can be applied to data samples of measurement data, such as sensor data or performance metrics, of a target system in order to identify anomalous situations that may require the attention of maintenance personnel.
  • the challenge is that there may be a huge number of detected anomalies, and not all of them are necessarily relevant. Further, the highest anomaly detection scores do not necessarily indicate the most relevant anomalies.
  • for example, analysing performance metrics of a telecommunication network may result in detection of anomalies at night, when the amount of traffic and the number of users in the network are low.
  • the reason for this is that when the volume of performance metrics is small, even a small degradation in performance may stand out as an anomaly, although the users of the network may not even notice any problem. Putting effort into avoiding such anomalies may be a waste of resources. Therefore, there is a need to further process anomaly detection scores obtained from an anomaly detection model in order to identify significant anomalies that should be looked at by maintenance personnel or resolved by some automated solution.
  • Various embodiments of the present disclosure provide mechanisms to analyze time series of anomaly detection scores, especially with the aim of identifying the most relevant anomalies or ignoring less significant anomalies. Seasonal and trend adjustment of the time series is performed for this purpose.
  • the data samples in the context of the present disclosure may in general relate to measurement results from a target system, such as an industrial process or a telecommunication network.
  • in the case of an industrial process, the variables of the measurement result samples may involve sensor data and/or performance metrics such as pressure, temperature, manufacturing time, yield of a production phase, etc.
  • in the case of a telecommunication network, the variables of the measurement result samples may involve sensor data and/or performance metrics such as key performance indicator values, signal level, number of users, number of dropped connections, etc.
  • Example embodiments are well suited to analyzing data samples that are inherently noisy. For example, performance indicator data or probe data from telecommunication networks often comprise seasonal or trend components that cause noise.
  • Fig. 1 shows an example scenario according to an embodiment.
  • the scenario shows a controllable target system 101 and an automation system 111 configured to implement analysis of measurement results according to example embodiments.
  • the target system 101 may be a telecommunication network 104 comprising a plurality of physical network sites comprising base stations and other network devices, or the target system 101 may be an industrial process 105.
  • the automation system 111 is configured to implement at least some example embodiments of the present disclosure.
  • the scenario of Fig. 1 operates as follows:
  • the automation system 111 receives data samples from the target system 101.
  • the data samples concern measurement results from the target system 101.
  • the automation system 111 analyzes the data samples, and in phase 13, the automation system 111 outputs the results of the analysis. This output may then be used for manually or automatically controlling the target system 101.
  • the process in the automation system 111 may be manually or automatically triggered.
  • Fig. 2 shows a block diagram of an apparatus 20 according to an embodiment.
  • the apparatus 20 is for example a general-purpose computer or server or some other electronic data processing apparatus.
  • the apparatus 20 can be used for implementing at least some embodiments of the invention. That is, with suitable configuration the apparatus 20 is suited for operating for example as the automation system 111 of foregoing disclosure.
  • the apparatus 20 comprises a communication interface 25; a processor 21; a user interface 24; and a memory 22.
  • the apparatus 20 further comprises software 23 stored in the memory 22 and operable to be loaded into and executed in the processor 21.
  • the software 23 may comprise one or more software modules and can be in the form of a computer program product.
  • the processor 21 may comprise a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a graphics processing unit, or the like.
  • Fig. 2 shows one processor 21, but the apparatus 20 may comprise a plurality of processors.
  • the user interface 24 is configured for providing interaction with a user of the apparatus. Additionally or alternatively, the user interaction may be implemented through the communication interface 25.
  • the user interface 24 may comprise circuitry for receiving input from a user of the apparatus 20, e.g., via a keyboard, graphical user interface shown on the display of the apparatus 20, speech recognition circuitry, or an accessory device, such as a headset, and for providing output to the user via, e.g., a graphical user interface or a loudspeaker.
  • the memory 22 may comprise for example a non-volatile or a volatile memory, such as a read-only memory (ROM), a programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), a random-access memory (RAM), a flash memory, a data disk, an optical storage, a magnetic storage, a smart card, or the like.
  • the apparatus 20 may comprise a plurality of memories.
  • the memory 22 may serve the sole purpose of storing data, or be constructed as a part of an apparatus 20 serving other purposes, such as processing data.
  • the communication interface 25 may comprise communication modules that implement data transmission to and from the apparatus 20.
  • the communication modules may comprise a wireless or a wired interface module(s) or both.
  • the wireless interface may comprise, for example, a WLAN, Bluetooth, infrared (IR), radio frequency identification (RFID), GSM/GPRS, CDMA, WCDMA, LTE (Long Term Evolution) or 5G radio module.
  • the wired interface may comprise, for example, Ethernet or universal serial bus (USB).
  • the communication interface 25 may support one or more different communication technologies.
  • the apparatus 20 may additionally or alternatively comprise more than one of the communication interfaces 25.
  • the apparatus 20 may comprise other elements, such as displays, as well as additional circuitry such as memory chips, application-specific integrated circuits (ASIC), other processing circuitry for specific purposes and the like. Further, it is noted that only one apparatus is shown in Fig. 2, but the embodiments of the invention may equally be implemented in a cluster of such apparatuses.
  • ASIC application-specific integrated circuits
  • Fig. 3 shows a flow diagram illustrating example methods according to certain embodiments.
  • the methods may be implemented in the automation system 111 of Fig. 1 and/or in the apparatus 20 of Fig. 2.
  • the methods are implemented in a computer and do not require human interaction unless otherwise expressly stated. It is to be noted that the methods may however provide output that may be further processed by humans and/or the methods may require user input to start. Different phases shown in the flow diagrams may be combined with each other and the order of phases may be changed except where otherwise explicitly defined. Furthermore, it is to be noted that performing all phases of the flow diagram is not mandatory.
  • the method of Fig. 3 provides analyzing measurement results of a target system.
  • the method of Fig. 3 comprises the following phases:
  • the data samples comprise measurement results of the target system.
  • the measurement results comprise for example performance metrics, such as key performance indicator (KPI) data and/or probe data, from a telecommunication network.
  • KPI key performance indicator
  • the data samples may be received e.g. from a network optimization or network management tool of a telecommunication network.
  • the data samples may be multidimensional.
  • the performance metrics of the telecommunication network may comprise any numerical parameter values that may be available in the network. By way of example, one or more of the following may be included: assignment parameters, circuit switched fall back parameters, handover parameters, sms parameters, call parameters, paging parameters, throughput, signal level, number of users, signal quality indicators, timing advance.
  • Measurement results of the data samples are optionally grouped into subcategories. The analysis may be performed for data samples comprising measurement results of selected subcategories. Alternatively, an aggregate measurement result may be determined for each subcategory, and the data samples that are analysed can be formed of the aggregated measurement results.
  • the subcategories may be selected from the group consisting of: measurement results relating to a certain cell, measurement results relating to a certain cell type, measurement results relating to base stations, measurement results relating to a mobile switching center, measurement results relating to a roaming network, measurement results relating to a subscriber category, measurement results relating to a traffic category, measurement results relating to a connection release reason.
  • An anomaly detection model is applied to the data samples to form an initial time series of anomaly detection scores.
  • an isolation forest model can be used, but other anomaly detection models, such as k nearest neighbors (kNN), local outlier factor (LOF), principal component analysis (PCA), kernel principal component analysis, independent component analysis (ICA), autoencoder, or angle-based outlier detection (ABOD), may likewise be applied.
  • kNN k nearest neighbors
  • LOF local outlier factor
  • PCA principal component analysis
  • ICA independent component analysis
  • ABOD angle-based outlier detection
  • Each data point in the initial time series of the anomaly detection scores may comprise an individual anomaly detection score, or aggregated anomaly detection scores may be used.
  • each data point may comprise the mean of multiple anomaly detection scores.
  • the aggregated anomaly detection scores may be an aggregate of anomaly detection scores from multiple cells or from certain predefined cells (e.g. cells of a certain type, or cells in a certain location), or an aggregate of anomaly detection scores related to certain users (e.g. certain user types) or related to a certain technology.
  • the initial time series of anomaly detection scores is decomposed into a trend component, a seasonal component, and a remainder component.
  • estimates of the trend component and the seasonal component are determined from the initial time series.
  • the remainder component may be determined by deducting the trend component and the seasonal component from the initial time series.
  • the remainder component provides anomaly detection scores of interest although some further processing of the remainder component may be optionally performed.
  • the remainder component is optionally filtered by filtering out data points that do not reach a predefined threshold.
  • the threshold may be set to 0.5. This is only one possible example, and other values can also be used. Further, it is to be noted that the threshold may be different depending on the anomaly detection model that is used. The threshold may be a pre-selected value, or the threshold may be fine-tuned according to the expected proportion of anomalies in training data samples.
  • At least the remainder component of the anomaly detection scores or the filtered remainder component is output for the purpose of evaluating performance of the target system. Maintenance personnel may then take appropriate measures based on anomalies included in this output.
  • Fig. 4 shows graphs according to an example embodiment.
  • Fig. 4 shows an example of an initial time series 401 of anomaly detection scores, a trend component 402 determined based on the time series 401, a seasonal component 403 determined based on the time series 401, and a remainder component 404 determined based on the time series 401, the trend component 402, and the seasonal component 403.
  • the initial time series 401 represents anomaly detection scores at different time points. There may be a time point every 15 minutes, for example. Each time point may comprise an individual anomaly detection score or an aggregated value based on multiple anomaly detection scores. For example, the mean value may be used.
  • the trend component represents long term tendencies in the time series.
  • the seasonal component represents seasonal changes in the time series.
  • the anomaly detection scores that remain in the remainder component represent the most relevant anomaly detection scores. That is, the less relevant anomaly detection scores have been ignored by the deduction of the trend component and the seasonal component.
  • the anomaly detection scores that remain in the remainder component may be filtered to further reduce the amount of remaining anomalies.
  • the decomposition of the initial time series of anomaly detection scores is performed by one of the following methods: X11 decomposition, SEATS (Seasonal Extraction in ARIMA Time Series).
  • the decomposition of the initial time series into the trend, seasonal and remainder components employs LOESS (locally estimated scatterplot smoothing) to extract smooth estimates of the three components.
  • LOESS locally estimated scatterplot smoothing
  • the initial time series of anomaly detection scores is decomposed into the trend component, the seasonal component, and the remainder component through two processing loops including an inner loop and an outer loop.
  • the outer loop assigns robustness weights to each data point of the time series depending on the size of the remainder component. This allows reducing or eliminating the effects of outliers in the final remainder component.
  • the inner loop iteratively updates the trend and seasonal components using LOESS smoothing.
  • the inner loop may perform the following steps:
  • Another technical effect of one or more of the example embodiments disclosed herein is that the analysis is well suited to measurement results from telecommunication networks, as such measurement results tend to include results that can be ignored, e.g. on the basis of their timing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Algebra (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Analyzing measurement results of a target system. Data samples are analysed using an anomaly detection model to form an initial time series (401) of anomaly detection scores, wherein the data samples comprise measurement results of the target system. The initial time series (401) of anomaly detection scores is decomposed into a trend component (402), a seasonal component (403), and a remainder component (404), wherein the remainder component is determined by deducting the trend component and the seasonal component from the initial time series. At least the remainder component of the anomaly detection scores is output for the purpose of evaluating performance of the target system.

Description

ANOMALY DETECTION
TECHNICAL FIELD
The present application generally relates to analyzing measurement results and more particularly to anomaly detection in analysis of measurement results.
BACKGROUND
This section illustrates useful background information without admission that any technique described herein is representative of the state of the art.
Anomaly detection refers to identification of data points, items, observations, events or other variables that do not conform to an expected pattern of a given data sample or data vector. Anomaly detection models can be trained to learn the structure of normal data samples. The models output an anomaly score for an analysed sample, and the sample is classified as an anomaly, if the anomaly score exceeds some predefined threshold. There are various unsupervised and semi-supervised learning models that can be used in anomaly detection. Such models include for example k nearest neighbors (kNN), local outlier factor (LOF), principal component analysis (PCA), kernel principal component analysis, independent component analysis (ICA), isolation forest, autoencoder, angle-based outlier detection (ABOD), and others. Different models represent different hypotheses about how anomalous points stand out from the rest of the data.
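The score-and-threshold scheme described above can be illustrated with one of the listed models, k nearest neighbours. The following Python sketch is illustrative only: it assumes the anomaly score is the mean Euclidean distance from a sample to its k nearest other samples, and the function names, sample values and threshold are hypothetical.

```python
import math

def knn_scores(samples, k=2):
    """Score each sample by its mean distance to its k nearest other samples."""
    scores = []
    for i, a in enumerate(samples):
        dists = sorted(math.dist(a, b) for j, b in enumerate(samples) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def classify(scores, threshold):
    """Flag a sample as an anomaly if its score exceeds the threshold."""
    return [s > threshold for s in scores]

# Hypothetical two-metric samples (e.g. normalised throughput and signal level).
samples = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (4.0, 4.0)]
scores = knn_scores(samples, k=2)
print(classify(scores, threshold=1.0))  # [False, False, False, False, True]
```

Other models in the list would replace `knn_scores` with their own scoring function; the thresholding step stays the same.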
Now a new approach is provided for anomaly detection.
SUMMARY
The appended claims define the scope of protection. Any examples and technical descriptions of apparatuses, products and/or methods in the description and/or drawings not covered by the claims are presented not as embodiments of the invention but as background art or examples useful for understanding the invention. According to a first example aspect there is provided a computer implemented method for analyzing measurement results of a target system. The method comprises analysing data samples using an anomaly detection model to form an initial time series of anomaly detection scores, wherein the data samples comprise measurement results of the target system; decomposing the initial time series of anomaly detection scores into a trend component, a seasonal component, and a remainder component (404), wherein the remainder component is determined by deducting the trend component and the seasonal component from the initial time series; and outputting at least the remainder component of the anomaly detection scores for the purpose of evaluating performance of the target system.
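A minimal Python sketch of the decomposition step follows, assuming an additive model and hourly scores with a daily period of 24 points. It deliberately swaps in a classical moving-average decomposition instead of the LOESS-based procedure of the embodiments, only to show how the remainder is obtained by deducting the trend and seasonal components; the function name and the synthetic score series are hypothetical.

```python
import math

def classical_decompose(series, period):
    """Additive decomposition: series = trend + seasonal + remainder."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined where the centred window runs off the ends
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # 2 x m moving average: the two end points get half weight
            total = sum(window[1:-1]) + 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[i] = total / period
    # seasonal index: mean detrended value at each position within the cycle
    sums, counts = [0.0] * period, [0] * period
    for i, t in enumerate(trend):
        if t is not None:
            sums[i % period] += series[i] - t
            counts[i % period] += 1
    indices = [s / c for s, c in zip(sums, counts)]  # needs about two full cycles
    offset = sum(indices) / period  # centre the indices so they sum to zero
    seasonal = [indices[i % period] - offset for i in range(n)]
    # remainder = series - trend - seasonal
    remainder = [None if t is None else y - t - s
                 for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic scores: slow drift + daily cycle + one genuine spike at index 100.
series = [0.01 * i + math.sin(2 * math.pi * i / 24) + (5.0 if i == 100 else 0.0)
          for i in range(24 * 8)]
trend, seasonal, remainder = classical_decompose(series, period=24)
defined = [i for i, r in enumerate(remainder) if r is not None]
print(max(defined, key=lambda i: abs(remainder[i])))  # 100
```

The drift and the daily cycle end up in the trend and seasonal components, so the largest remainder is the spike rather than the peak of the daily cycle.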
In some example embodiments, the initial time series of anomaly detection scores is decomposed into the trend component, the seasonal component, and the remainder component through two processing loops, wherein an outer loop assigns robustness weights to each data point of the time series depending on the size of the remainder component, and an inner loop iteratively updates the trend and seasonal components using LOESS smoothing.
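The two-loop procedure resembles STL (seasonal-trend decomposition using LOESS). The sketch below is a simplified, assumed reading of it and not the applicant's implementation: the inner loop smooths each cycle-subseries and the seasonally adjusted series with local linear LOESS, and the outer loop recomputes bisquare robustness weights from the size of the remainder. The span defaults, loop counts, the bisquare weight with a 6 x median scale, and the one-period moving average used to move slow drift out of the seasonal part are all assumptions.

```python
import math
import statistics

def tricube(u):
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def bisquare(u):
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0

def loess(y, span, robustness=None):
    """Local linear LOESS fit at every position 0..len(y)-1."""
    n = len(y)
    rw = robustness if robustness is not None else [1.0] * n
    q = min(span, n)
    fitted = []
    for i in range(n):
        # bandwidth just past the q-th nearest position, so q neighbours get weight
        h = sorted(abs(j - i) for j in range(n))[q - 1] + 1.0
        for use_rw in (True, False):
            w = [tricube((j - i) / h) * (rw[j] if use_rw else 1.0) for j in range(n)]
            sw = sum(w)
            if sw > 0:
                break  # fall back to plain tricube weights if all were zeroed
        sx = sum(wj * j for j, wj in enumerate(w))
        sy = sum(wj * yj for wj, yj in zip(w, y))
        sxx = sum(wj * j * j for j, wj in enumerate(w))
        sxy = sum(wj * j * yj for j, (wj, yj) in enumerate(zip(w, y)))
        denom = sw * sxx - sx * sx
        if denom <= 1e-12 * sw * sw:
            fitted.append(sy / sw)  # a single effective position: weighted mean
        else:
            slope = (sw * sxy - sx * sy) / denom
            fitted.append((sy - slope * sx) / sw + slope * i)
    return fitted

def lowpass(values, period):
    """One-period moving average, window clamped at the ends (len >= period)."""
    n = len(values)
    out = []
    for i in range(n):
        start = min(max(i - period // 2, 0), n - period)
        out.append(sum(values[start:start + period]) / period)
    return out

def robust_decompose(series, period, seasonal_span=7, trend_span=None,
                     n_inner=2, n_outer=3):
    n = len(series)
    if trend_span is None:
        trend_span = int(1.5 * period) | 1  # assumed default: odd, about 1.5 cycles
    trend, rw = [0.0] * n, [1.0] * n
    for _ in range(n_outer):
        # inner loop: alternately update seasonal and trend with LOESS
        for _ in range(n_inner):
            detrended = [y - t for y, t in zip(series, trend)]
            cycle = [0.0] * n
            for phase in range(period):
                idx = list(range(phase, n, period))
                smooth = loess([detrended[i] for i in idx], seasonal_span,
                               [rw[i] for i in idx])
                for i, value in zip(idx, smooth):
                    cycle[i] = value
            # strip slow drift so it ends up in the trend, not the seasonal part
            seasonal = [c - l for c, l in zip(cycle, lowpass(cycle, period))]
            trend = loess([y - s for y, s in zip(series, seasonal)], trend_span, rw)
        remainder = [y - t - s for y, t, s in zip(series, trend, seasonal)]
        # outer loop: robustness weights shrink towards 0 for large remainders
        h = 6.0 * statistics.median(abs(r) for r in remainder)
        rw = [bisquare(r / h) if h > 0 else 1.0 for r in remainder]
    return trend, seasonal, remainder

series = [0.01 * i + math.sin(2 * math.pi * i / 24) + 0.1 * math.sin(2.3 * i)
          + (5.0 if i == 100 else 0.0) for i in range(24 * 8)]
trend, seasonal, remainder = robust_decompose(series, period=24)
print(max(range(len(remainder)), key=lambda i: abs(remainder[i])))
```

Because the outer loop gives the spike a robustness weight near zero, later passes fit the trend and seasonal components largely without it, which keeps the spike in the remainder.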
In some example embodiments, the method further comprises filtering the remainder component of the anomaly detection scores by filtering out data points that do not reach a predefined threshold, and providing the filtered remainder component as the output for the purpose of evaluating performance of the target system.
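A sketch of the optional filtering step. It assumes that "reaching" the threshold means a remainder value greater than or equal to it, so negative remainders (scores below the expected level) are dropped as well. The default of 0.5 is the example threshold given in the description; the function name, timestamps and values are hypothetical.

```python
def filter_remainder(timestamps, remainder, threshold=0.5):
    """Keep only (timestamp, value) pairs whose remainder reaches the threshold."""
    return [(t, r) for t, r in zip(timestamps, remainder)
            if r is not None and r >= threshold]

times = ["00:00", "00:15", "00:30", "00:45"]
rem = [0.1, 0.7, -0.9, 0.5]
print(filter_remainder(times, rem))  # [('00:15', 0.7), ('00:45', 0.5)]
```

If large negative deviations should also be reported, comparing `abs(r)` with the threshold would be one alternative.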
In some example embodiments, the anomaly detection model is an isolation forest model.
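For illustration, a compact isolation forest based on the commonly published algorithm (random axis-aligned splits, average path length normalised by c(n), score 2^(-E[h]/c(n))) is sketched below. It is a hypothetical, unoptimised stand-in for a library implementation; the class name, tree count, subsample size, random seed and sample data are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree search."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, height_limit, rng):
    """Split on a random feature at a random value until points are isolated."""
    if depth >= height_limit or len(points) <= 1:
        return ("leaf", len(points))
    spread = [d for d in range(len(points[0]))
              if min(p[d] for p in points) < max(p[d] for p in points)]
    if not spread:  # all remaining points identical: nothing left to split
        return ("leaf", len(points))
    d = rng.choice(spread)
    lo = min(p[d] for p in points)
    hi = max(p[d] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    return ("node", d, split,
            build_tree(left, depth + 1, height_limit, rng),
            build_tree(right, depth + 1, height_limit, rng))

def path_length(x, node, depth=0):
    if node[0] == "leaf":
        # points left together in a leaf: add their expected remaining depth
        return depth + avg_path_length(node[1])
    _, d, split, left, right = node
    return path_length(x, left if x[d] < split else right, depth + 1)

class IsolationForestSketch:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, data):
        """Build the trees; data needs at least two samples."""
        self.psi = min(self.sample_size, len(data))
        limit = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(self.rng.sample(data, self.psi), 0, limit, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 indicate anomalies."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / self.n_trees
        return 2.0 ** (-mean_depth / avg_path_length(self.psi))

inliers = [(0.1 * (i % 7), 0.1 * (i // 7)) for i in range(35)]
data = inliers + [(3.0, 3.0)]
forest = IsolationForestSketch(n_trees=100, seed=0).fit(data)
scores = [forest.score(p) for p in data]
print(max(range(len(data)), key=scores.__getitem__))
```

The isolated point is separated by the first random split most of the time, so its average path is short and its score is the highest.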
In some example embodiments, the initial time series of anomaly detection scores is formed of aggregated anomaly detection scores.
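Aggregated scores can be formed, for example, by averaging the scores of several cells within each time point. The sketch below assumes 15-minute time points (the spacing mentioned as an example in the description) and the mean as the aggregate; the record layout, function name and cell identifiers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_scores(records, bin_minutes=15):
    """Average anomaly scores from several cells into one value per time bin.

    records: iterable of (timestamp, cell_id, score); bin_minutes should divide 60.
    """
    bins = defaultdict(list)
    for ts, _cell_id, score in records:
        # floor the timestamp to the start of its bin
        start = ts - timedelta(minutes=ts.minute % bin_minutes,
                               seconds=ts.second, microseconds=ts.microsecond)
        bins[start].append(score)
    return [(start, sum(s) / len(s)) for start, s in sorted(bins.items())]

records = [
    (datetime(2021, 1, 1, 0, 3), "cell-A", 0.2),
    (datetime(2021, 1, 1, 0, 7), "cell-B", 0.4),
    (datetime(2021, 1, 1, 0, 16), "cell-A", 0.9),
]
for start, score in aggregate_scores(records):
    print(start.strftime("%H:%M"), round(score, 2))
```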
In some example embodiments, the measurement results of the data samples are grouped into subcategories, and an aggregate measurement result is determined for each subcategory, and the data samples that are analysed are formed of the aggregated measurement results.
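A sketch of grouping measurement results into subcategories and aggregating them before the anomaly detection model is applied. The row layout, the field names (`time`, `cell`, `cell_type`, `throughput`, `users`), the function name and the use of the mean as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_subcategory(rows, subcategory, metrics):
    """Average each metric over all rows sharing a (time, subcategory) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["time"], row[subcategory])].append(row)
    return {key: {m: mean(r[m] for r in group) for m in metrics}
            for key, group in sorted(groups.items())}

rows = [
    {"time": "00:00", "cell": "c1", "cell_type": "macro", "throughput": 10.0, "users": 4},
    {"time": "00:00", "cell": "c2", "cell_type": "macro", "throughput": 14.0, "users": 6},
    {"time": "00:00", "cell": "c3", "cell_type": "small", "throughput": 3.0, "users": 1},
]
samples = aggregate_by_subcategory(rows, "cell_type", ["throughput", "users"])
print(samples[("00:00", "macro")])
```

Each aggregated entry then forms one data sample for the model, so the number of analysed samples scales with the number of subcategories rather than the number of cells.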
In some example embodiments, the measurement results of the data samples are grouped into subcategories, and the data samples to be analysed are set to comprise measurement results of selected subcategories.
In some example embodiments, the target system is a telecommunication network and the measurement results comprise performance metrics from the telecommunication network.
In some example embodiments, the performance metrics comprise key performance indicator data and/or probe data from the telecommunication network.
In some example embodiments, the performance metrics are selected from the group consisting of: assignment parameters, circuit switched fall back parameters, handover parameters, sms parameters, call parameters, paging parameters, throughput, signal level, number of users, signal quality indicators, timing advance.
In some example embodiments, the target system is a telecommunication network and the subcategories are selected from the group consisting of: measurement results relating to a certain cell, measurement results relating to a certain cell type, measurement results relating to base stations, measurement results relating to a mobile switching center, measurement results relating to a roaming network, measurement results relating to a subscriber category, measurement results relating to a traffic category, measurement results relating to a connection release reason.
In some example embodiments, the target system is an industrial process and the measurement results comprise sensor data from the industrial process.
According to a second example aspect of the present invention, there is provided an apparatus comprising a processor and a memory including computer program code; the memory and the computer program code configured to, with the processor, cause the apparatus to perform the method of the first aspect or any related embodiment.
According to a third example aspect of the present invention, there is provided a computer program comprising computer executable program code which when executed by a processor causes an apparatus to perform the method of the first aspect or any related embodiment.
According to a fourth example aspect there is provided a computer program product comprising a non-transitory computer readable medium having the computer program of the third example aspect stored thereon.
According to a fifth example aspect there is provided an apparatus comprising means for performing the method of the first aspect or any related embodiment.
Any foregoing memory medium may comprise a digital data storage such as a data disc or diskette, optical storage, magnetic storage, holographic storage, opto-magnetic storage, phase-change memory, resistive random access memory, magnetic random access memory, solid-electrolyte memory, ferroelectric random access memory, organic memory or polymer memory. The memory medium may be formed into a device without other substantial functions than storing data, or it may be formed as part of a device with other functions, including but not limited to a memory of a computer, a chip set, and a subassembly of an electronic device.
Different non-binding example aspects and embodiments have been illustrated in the foregoing. The embodiments in the foregoing are used merely to explain selected aspects or steps that may be utilized in different implementations. Some embodiments may be presented only with reference to certain example aspects. It should be appreciated that corresponding embodiments may apply to other example aspects as well.
BRIEF DESCRIPTION OF THE FIGURES
Some example embodiments will be described with reference to the accompanying figures, in which:
Fig. 1 schematically shows an example scenario according to an example embodiment;
Fig. 2 shows a block diagram of an apparatus according to an example embodiment;
Fig. 3 shows a flow diagram illustrating example methods according to certain embodiments; and
Fig. 4 shows graphs according to an example embodiment.
DETAILED DESCRIPTION
In the following description, like reference signs denote like elements or steps.
Different known anomaly detection models can be applied to data samples of measurement data of a target system, such as sensor data or performance metrics, in order to identify anomalous situations that may require the attention of maintenance personnel. The challenge is that there may be a huge number of detected anomalies, and not all of them are necessarily relevant. Further, the highest anomaly detection scores do not necessarily indicate the most relevant anomalies.
For example, performance metrics of a telecommunication network may result in detection of anomalies during nighttime, when the amount of traffic and users in the network is low. The reason is that with a small amount of performance metric data, even a small degradation in performance may stand out as an anomaly, although the users of the network may not even notice any problem. Making an effort to avoid such anomalies may be a waste of resources. Therefore, there is a need to further process anomaly detection scores obtained from an anomaly detection model in order to identify significant anomalies that should be looked at by maintenance personnel or resolved by some automated solution.
Having noticed that the highest anomaly detection scores do not necessarily indicate the most relevant anomalies, the inventors of the present disclosure concluded that further analysis of the anomaly scores is needed. As a solution, they arrange the anomaly scores in a time series and analyze that time series of anomaly scores.
Various embodiments of the present disclosure provide mechanisms to analyze time series of anomaly detection scores, especially with the aim of identifying the most relevant anomalies or ignoring less significant ones. For this purpose, seasonal and trend adjustment of the time series is performed.
The data samples in the context of the present disclosure may in general relate to measurement results from a target system, such as an industrial process or a telecommunication network. In the context of industrial processes, the variables of the measurement result samples may involve sensor data and/or performance metrics such as pressure, temperature, manufacturing time, yield of a production phase, etc. In the context of telecommunication networks, the variables of the measurement result samples may involve sensor data and/or performance metrics such as key performance indicator values, signal level, number of users, number of dropped connections, etc.
Example embodiments are well suited for analyzing data samples that are inherently noisy. For example, performance indicator data or probe data from telecommunication networks often comprise seasonal or trend components that cause noise.
Fig. 1 shows an example scenario according to an embodiment. The scenario shows a controllable target system 101 and an automation system 111 configured to implement analysis of measurement results according to example embodiments. The target system 101 may be a telecommunication network 104 comprising a plurality of physical network sites comprising base stations and other network devices, or the target system 101 may be an industrial process 105. The automation system 111 is configured to implement at least some example embodiments of the present disclosure.
In an embodiment of the invention, the scenario of Fig. 1 operates as follows: In phase 11, the automation system 111 receives data samples from the target system 101. In general, the data samples concern measurement results from the target system 101.
In phase 12, the automation system 111 analyzes the data samples, and in phase 13, the automation system 111 outputs the results of the analysis. This output may then be used for manually or automatically controlling the target system 101.
The process in the automation system 111 may be manually or automatically triggered.
Fig. 2 shows a block diagram of an apparatus 20 according to an embodiment. The apparatus 20 is for example a general-purpose computer or server or some other electronic data processing apparatus. The apparatus 20 can be used for implementing at least some embodiments of the invention. That is, with suitable configuration, the apparatus 20 is suited for operating, for example, as the automation system 111 of the foregoing disclosure.
The apparatus 20 comprises a communication interface 25; a processor 21; a user interface 24; and a memory 22. The apparatus 20 further comprises software 23 stored in the memory 22 and operable to be loaded into and executed in the processor 21. The software 23 may comprise one or more software modules and can be in the form of a computer program product.
The processor 21 may comprise a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a graphics processing unit, or the like. Fig. 2 shows one processor 21, but the apparatus 20 may comprise a plurality of processors.
The user interface 24 is configured for providing interaction with a user of the apparatus. Additionally or alternatively, the user interaction may be implemented through the communication interface 25. The user interface 24 may comprise circuitry for receiving input from a user of the apparatus 20, e.g., via a keyboard, a graphical user interface shown on the display of the apparatus 20, speech recognition circuitry, or an accessory device, such as a headset, and for providing output to the user via, e.g., a graphical user interface or a loudspeaker.
The memory 22 may comprise for example a non-volatile or a volatile memory, such as a read-only memory (ROM), a programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), a random-access memory (RAM), a flash memory, a data disk, an optical storage, a magnetic storage, a smart card, or the like. The apparatus 20 may comprise a plurality of memories. The memory 22 may serve the sole purpose of storing data, or be constructed as a part of an apparatus 20 serving other purposes, such as processing data.
The communication interface 25 may comprise communication modules that implement data transmission to and from the apparatus 20. The communication modules may comprise a wireless interface module, a wired interface module, or both. The wireless interface may comprise, for example, a WLAN, Bluetooth, infrared (IR), radio frequency identification (RFID), GSM/GPRS, CDMA, WCDMA, LTE (Long Term Evolution) or 5G radio module. The wired interface may comprise, for example, an Ethernet or universal serial bus (USB) interface. The communication interface 25 may support one or more different communication technologies. The apparatus 20 may additionally or alternatively comprise more than one of the communication interfaces 25.
A skilled person appreciates that in addition to the elements shown in Fig. 2, the apparatus 20 may comprise other elements, such as displays, as well as additional circuitry such as memory chips, application-specific integrated circuits (ASIC), other processing circuitry for specific purposes and the like. Further, it is noted that only one apparatus is shown in Fig. 2, but the embodiments of the invention may equally be implemented in a cluster of shown apparatuses.
Fig. 3 shows a flow diagram illustrating example methods according to certain embodiments. The methods may be implemented in the automation system 111 of Fig. 1 and/or in the apparatus 20 of Fig. 2. The methods are implemented in a computer and do not require human interaction unless otherwise expressly stated. It is to be noted that the methods may however provide output that may be further processed by humans and/or the methods may require user input to start. Different phases shown in the flow diagrams may be combined with each other and the order of phases may be changed except where otherwise explicitly defined. Furthermore, it is to be noted that performing all phases of the flow diagram is not mandatory.
The method of Fig. 3 provides for analyzing measurement results of a target system and comprises the following phases:
301: Data samples are received. The data samples comprise measurement results of the target system. The measurement results comprise, for example, performance metrics such as key performance indicator (KPI) data and/or probe data from a telecommunication network. The data samples may be received, e.g., from a network optimization or network management tool of a telecommunication network. The data samples may be multidimensional.
The performance metrics of the telecommunication network may comprise any numerical parameter values that may be available in the network. By way of example, one or more of the following may be included: assignment parameters, circuit-switched fallback parameters, handover parameters, SMS parameters, call parameters, paging parameters, throughput, signal level, number of users, signal quality indicators, timing advance.
302: Measurement results of the data samples are optionally grouped into subcategories. The analysis may be performed on data samples comprising measurement results of selected subcategories. Alternatively, an aggregate measurement result may be determined for each subcategory, and the data samples that are analysed can be formed of the aggregated measurement results.
In telecommunication network embodiments, the subcategories may be selected from the group consisting of: measurement results relating to a certain cell, measurement results relating to a certain cell type, measurement results relating to base stations, measurement results relating to a mobile switching center, measurement results relating to a roaming network, measurement results relating to a subscriber category, measurement results relating to a traffic category, measurement results relating to a connection release reason.
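The two options of phase 302 (analysing selected subcategories, or one aggregate per subcategory) can be sketched in Python as follows. This is a minimal illustration under assumed data shapes: each measurement result is taken to be a dict, the subcategory is derived by a caller-supplied key function, and the mean is assumed as the aggregate. The function names and field names (time, cell_type, throughput, users) are hypothetical and not part of the disclosure.

```python
from collections import defaultdict
from statistics import mean

# Illustrative sketch only; function and field names are hypothetical.

def group_measurements(records, key):
    """Group measurement records (dicts) into subcategories via a key function."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)

def select_subcategories(groups, selected):
    """Option 1: keep only the records belonging to the selected subcategories."""
    return [record for name in selected for record in groups.get(name, [])]

def aggregate_subcategories(groups, metrics):
    """Option 2: one aggregate (assumed here: the mean) per subcategory."""
    return {
        name: {metric: mean(r[metric] for r in records) for metric in metrics}
        for name, records in groups.items()
    }

# Hypothetical KPI records for one time point.
records = [
    {"time": 0, "cell_type": "macro", "throughput": 10.0, "users": 4},
    {"time": 0, "cell_type": "macro", "throughput": 20.0, "users": 6},
    {"time": 0, "cell_type": "small", "throughput": 5.0, "users": 1},
]
by_time_and_type = group_measurements(records, key=lambda r: (r["time"], r["cell_type"]))
samples = aggregate_subcategories(by_time_and_type, metrics=["throughput", "users"])
```

Grouping by (time, subcategory) rather than by subcategory alone keeps one aggregated data sample per time point, which is what a later time series of anomaly scores would need.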
303: An anomaly detection model is applied to the data samples to form an initial time series of anomaly detection scores. For example, an isolation forest model can be used, but other anomaly detection models, such as k nearest neighbors (kNN), local outlier factor (LOF), principal component analysis (PCA), kernel principal component analysis, independent component analysis (ICA), an autoencoder, or angle-based outlier detection (ABOD), may likewise be applied.
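As an illustration of phase 303, the following is a compact, self-contained sketch of isolation forest scoring in the spirit of Liu et al.: random trees are grown on subsamples of size psi, and a point's score is s(x) = 2^(-E[h(x)] / c(psi)), where E[h(x)] is its mean path length over the trees and c(psi) the average path length for psi points. Scores close to 1 suggest anomalies, while scores around 0.5 or below suggest normal points, which is presumably the scale that the example threshold of 0.5 in phase 305 assumes. The function names and defaults are assumptions for illustration; in practice a library implementation such as scikit-learn's IsolationForest would more likely be used (note that its score_samples method returns the negated score).

```python
import math
import random

# Illustrative sketch of isolation forest scoring; names and defaults are hypothetical.
EULER_GAMMA = 0.5772156649015329

def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value between its min and max."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop when the chosen feature is constant here
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node):
    """Depth at which x reaches a leaf, plus c(size) for the unbuilt subtree."""
    depth = 0
    while node[0] == "node":
        _, dim, split, left, right = node
        node = left if x[dim] < split else right
        depth += 1
    return depth + avg_path_length(node[1])

def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] for each sample; needs at least two samples."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
            for x in data]
```

A point far from the rest is usually separated by the first random split, so its mean path length is short and its score approaches 1.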
Each data point in the initial time series of the anomaly detection scores may comprise an individual anomaly detection score, or aggregated anomaly detection scores may be used. For example, each data point may comprise the mean of multiple anomaly detection scores. As an example, the aggregated anomaly detection scores may be an aggregate of anomaly detection scores from multiple cells or from certain predefined cells (e.g. cells of a certain type, or cells in a certain location), or an aggregate of anomaly detection scores related to certain users (e.g. certain user types) or related to a certain technology.
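Forming the initial time series from per-cell scores, with the mean as the aggregate, might look like the sketch below. The (timestamp, cell_id, score) row layout and the include predicate, which stands in for selecting predefined cells such as cells of a certain type, are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

def score_time_series(scored_rows, include=None):
    """Hypothetical helper: one mean anomaly score per time point.

    scored_rows: iterable of (timestamp, cell_id, score) tuples.
    include: optional predicate on cell_id, e.g. cells of a certain type.
    """
    buckets = defaultdict(list)
    for timestamp, cell_id, score in scored_rows:
        if include is None or include(cell_id):
            buckets[timestamp].append(score)
    # Sort by timestamp so the result is an ordered time series.
    return [(t, mean(scores)) for t, scores in sorted(buckets.items())]

rows = [(0, "cell-A", 0.40), (0, "cell-B", 0.60), (1, "cell-A", 0.80), (1, "cell-B", 0.70)]
series = score_time_series(rows)  # mean over all cells per time point
only_a = score_time_series(rows, include=lambda cell: cell == "cell-A")
```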
304: The initial time series of anomaly detection scores is decomposed into a trend component, a seasonal component, and a remainder component. In general, estimates of the trend component and the seasonal component are determined from the initial time series. The remainder component may be determined by deducting the trend component and the seasonal component from the initial time series. The remainder component provides anomaly detection scores of interest although some further processing of the remainder component may be optionally performed.
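To make phase 304 concrete, the sketch below uses a classical additive decomposition. This is a simpler technique than the X11, SEATS or LOESS-based methods named later in this description and is shown only to illustrate the relation remainder = initial series - trend - seasonal. The trend is a centred moving average over one period, the seasonal component is the mean detrended value of each phase of the cycle (centred to sum to zero), and the remainder is what is left. The function names, the period and the synthetic data are assumptions.

```python
import math

# Illustrative classical decomposition; not the decomposition method of the disclosure.

def centered_moving_average(x, period):
    """Trend estimate: centred moving average over one period (a 2 x m average
    when the period is even). None where the window would run off the series."""
    half = period // 2
    if period % 2:
        weights = [1.0 / period] * period
    else:
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend

def classical_decompose(x, period):
    """Additive decomposition x = trend + seasonal + remainder.
    Assumes the series covers at least two full periods."""
    trend = centered_moving_average(x, period)
    sums, counts = [0.0] * period, [0] * period
    for t, (value, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            sums[t % period] += value - tr
            counts[t % period] += 1
    phase_means = [s / c for s, c in zip(sums, counts)]
    offset = sum(phase_means) / period  # centre the seasonal pattern around zero
    seasonal = [phase_means[t % period] - offset for t in range(len(x))]
    remainder = [None if tr is None else value - tr - s
                 for value, tr, s in zip(x, trend, seasonal)]
    return trend, seasonal, remainder

# Hypothetical scores: slow trend + an 8-point cycle + one spike at t = 30.
scores = [0.3 + 0.002 * t + 0.1 * math.sin(2 * math.pi * t / 8) for t in range(64)]
scores[30] += 0.5
trend, seasonal, remainder = classical_decompose(scores, period=8)
```

With one time point every 15 minutes, a daily seasonal pattern would correspond to period = 24 * 60 // 15 = 96 points. The remainder is undefined for the first and last half period, where the moving-average window runs off the series.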
Decomposition of the initial time series is discussed in more detail later in this document.
305: The remainder component is optionally filtered by filtering out data points that do not reach a predefined threshold. In an example where isolation forest is used as the anomaly detection model, the threshold may be set to 0.5. This is only one possible example; other values can also be used. Further, it is to be noted that the threshold may differ depending on the anomaly detection model that is used. The threshold may be a pre-selected value, or it may be fine-tuned according to the expected proportion of anomalies in training data samples.
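Phase 305 could be sketched as below. The fixed default of 0.5 follows the isolation forest example given for phase 305; the rule for tuning the threshold to an expected proportion of anomalies (taking the score that roughly that share of training scores reach) is one possible reading of the text, not a method stated in the disclosure, and both function names are hypothetical.

```python
def threshold_from_proportion(training_scores, expected_proportion):
    """Hypothetical tuning rule: choose the threshold so that roughly
    `expected_proportion` of the training scores reach it."""
    ordered = sorted(training_scores)
    k = max(1, round(expected_proportion * len(ordered)))
    return ordered[len(ordered) - k]

def filter_remainder(remainder, threshold=0.5):
    """Keep (index, value) pairs whose remainder reaches the threshold;
    None marks points where no remainder was computed."""
    return [(t, r) for t, r in enumerate(remainder)
            if r is not None and r >= threshold]
```

Returning the index alongside the value keeps the link back to the time point, which is what maintenance personnel would need to locate the anomaly.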
By performing the filtering in this phase, instead of initially filtering out anomaly detection scores that are below a threshold, more accurate estimates of the trend and seasonal components can be obtained, and thereby the analysis may be improved.
306: At least the remainder component of the anomaly detection scores or the filtered remainder component is output for the purpose of evaluating performance of the target system. Maintenance personnel may then take appropriate measures based on anomalies included in this output.
Fig. 4 shows graphs according to an example embodiment. Fig. 4 shows an example of an initial time series 401 of anomaly detection scores, a trend component 402 determined based on the time series 401, a seasonal component 403 determined based on the time series 401, and a remainder component 404 determined based on the time series 401, the trend component 402, and the seasonal component 403.
The initial time series 401 represents anomaly detection scores at different time points. There may be a time point every 15 minutes, for example. Each time point may comprise an individual anomaly detection score or an aggregated value based on multiple anomaly detection scores, for example a mean value.
The trend component represents long-term tendencies in the time series. The seasonal component represents seasonal changes in the time series. By deducting the trend and seasonal components from the time series of anomaly detection scores, one obtains a remainder component in which seasonal and trend effects have been eliminated. This enables identifying the most relevant anomaly detection scores in the time series or ignoring the less significant ones. The anomaly detection scores that remain in the remainder component represent the most relevant anomaly detection scores. That is, the less relevant anomaly detection scores have been ignored by the deduction of the trend component and the seasonal component.
It is to be noted that the anomaly detection scores that remain in the remainder component may be filtered to further reduce the amount of remaining anomalies.
In some embodiments, the decomposition of the initial time series of anomaly detection scores is performed by one of the following methods: X11 decomposition, SEATS (Seasonal Extraction in ARIMA Time Series).
In some embodiments, the decomposition of the initial time series into the trend, seasonal and remainder components employs LOESS (locally estimated scatterplot smoothing) to extract smooth estimates of the three components.
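A minimal LOESS smoother of the kind such a decomposition relies on is sketched below: for each point, a straight line is fitted by weighted least squares to its q nearest neighbours, using tricube weights (1 - u^3)^3 on the scaled distance u, optionally multiplied by robustness weights. This is a generic local linear LOESS written for illustration; the function name and its parameters are assumptions, and production code would more likely rely on an existing implementation such as the STL class in statsmodels.

```python
def loess(xs, ys, q, robustness=None):
    """Illustrative local linear LOESS (hypothetical helper).

    For each x0, fit a weighted straight line to the q nearest points with
    tricube weights (1 - u**3)**3, optionally times robustness weights.
    """
    n = len(xs)
    q = max(2, min(q, n))
    robustness = [1.0] * n if robustness is None else robustness
    fitted = []
    for i, x0 in enumerate(xs):
        # Bandwidth: distance from x0 to its q-th nearest neighbour.
        h = sorted(abs(x - x0) for x in xs)[q - 1] or 1.0
        w = [r * (1.0 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
             for x, r in zip(xs, robustness)]
        sw = sum(w)
        if sw == 0.0:  # every neighbour was down-weighted to zero
            fitted.append(ys[i])
            continue
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

Because each local fit is a straight line, a series that is exactly linear passes through unchanged, while rapid oscillations are smoothed away.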
In some embodiments, the initial time series of anomaly detection scores is decomposed into the trend component, the seasonal component, and the remainder component through two processing loops including an inner loop and an outer loop. The outer loop assigns robustness weights to each data point of the time series depending on the size of the remainder component. This allows reducing or eliminating the effects of outliers in the final remainder component.
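The robustness weights of the outer loop can be illustrated with the bisquare weighting used in the STL procedure of Cleveland et al.: with h = 6 * median(|R|), a point with remainder R gets weight (1 - (|R|/h)^2)^2 if |R| < h, and 0 otherwise, so that large outliers do not distort the next round of trend and seasonal estimates. The function name is hypothetical, and using exactly this weighting is an assumption borrowed from STL rather than a detail stated here.

```python
from statistics import median

def robustness_weights(remainder):
    """Hypothetical helper: STL-style bisquare robustness weights.

    h = 6 * median(|R|); weight = (1 - (|R| / h) ** 2) ** 2 if |R| < h, else 0.
    """
    h = 6.0 * median(abs(r) for r in remainder)
    if h == 0.0:  # all remainders zero: nothing to down-weight
        return [1.0] * len(remainder)
    return [(1.0 - (abs(r) / h) ** 2) ** 2 if abs(r) < h else 0.0
            for r in remainder]
```

In a full outer loop these weights would be passed to every LOESS fit of the next inner-loop pass.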
The inner loop iteratively updates the trend and seasonal components using LOESS smoothing. The inner loop may perform the following steps:
- deducting the current estimate of the trend component from the time series to obtain a time series without trend,
- partitioning the time series without trend into cycle-subseries, and LOESS smoothing the cycle-subseries,
- performing low-pass filtering on the smoothed cycle-subseries and deducting the filtered cycle-subseries from the smoothed cycle-subseries to obtain an estimate of the seasonal component,
- deducting the estimate of the seasonal component from the time series to obtain a time series without seasonality,
- LOESS smoothing the time series without seasonality to obtain an estimate of the trend component, and
- repeating the process to improve accuracy of the estimate of the trend component and the estimate of the seasonal component.
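The inner-loop steps listed above might be sketched as follows. This is a simplified, illustrative version under stated assumptions: the cycle-subseries are not extended at their ends, the low-pass filter is reduced to a single centred moving average over one period (STL proper chains three moving averages and a LOESS pass), and the robustness weights from the outer loop are simply passed in as rw. The function names, parameter names and default spans (q_seasonal, q_trend) are hypothetical.

```python
import math

# Simplified STL-style inner loop; names and defaults are hypothetical.

def loess(xs, ys, q, rw):
    """Local linear LOESS with tricube weights times robustness weights rw."""
    n = len(xs)
    q = max(2, min(q, n))
    fitted = []
    for i, x0 in enumerate(xs):
        h = sorted(abs(x - x0) for x in xs)[q - 1] or 1.0
        w = [r * (1.0 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
             for x, r in zip(xs, rw)]
        sw = sum(w)
        if sw == 0.0:
            fitted.append(ys[i])
            continue
        xb = sum(a * x for a, x in zip(w, xs)) / sw
        yb = sum(a * y for a, y in zip(w, ys)) / sw
        sxx = sum(a * (x - xb) ** 2 for a, x in zip(w, xs))
        sxy = sum(a * (x - xb) * (y - yb) for a, x, y in zip(w, xs, ys))
        fitted.append(yb + (sxy / sxx if sxx > 0 else 0.0) * (x0 - xb))
    return fitted

def low_pass(values, period):
    """Centred moving average over one period (half weights at the ends when the
    period is even), truncated and renormalised at the edges of the series."""
    half = period // 2
    out = []
    for t in range(len(values)):
        num = den = 0.0
        for k in range(-half, half + 1):
            if 0 <= t + k < len(values):
                w = 0.5 if period % 2 == 0 and abs(k) == half else 1.0
                num += w * values[t + k]
                den += w
        out.append(num / den)
    return out

def stl_inner_loop(y, period, iterations=2, q_seasonal=7, q_trend=None, rw=None):
    n = len(y)
    rw = [1.0] * n if rw is None else rw
    q_trend = q_trend or 2 * period + 1
    trend, seasonal = [0.0] * n, [0.0] * n
    for _ in range(iterations):
        # Deduct the current trend estimate.
        detrended = [a - b for a, b in zip(y, trend)]
        # LOESS-smooth each cycle-subseries (all points sharing the same phase).
        cycle = [0.0] * n
        for phase in range(period):
            idx = list(range(phase, n, period))
            smooth = loess([float(i) for i in idx], [detrended[i] for i in idx],
                           q_seasonal, [rw[i] for i in idx])
            for i, v in zip(idx, smooth):
                cycle[i] = v
        # Low-pass filter the smoothed subseries and deduct it: seasonal estimate.
        low = low_pass(cycle, period)
        seasonal = [c - l for c, l in zip(cycle, low)]
        # Deduct seasonality and LOESS-smooth the result: trend estimate.
        deseasonalized = [a - s for a, s in zip(y, seasonal)]
        trend = loess([float(t) for t in range(n)], deseasonalized, q_trend, rw)
    remainder = [a - t - s for a, t, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic example: linear trend + a cycle of 8 points + one spike at t = 30.
y = [0.05 * t + math.sin(2 * math.pi * t / 8) for t in range(64)]
y[30] += 5.0
trend, seasonal, remainder = stl_inner_loop(y, period=8)
```

In this example the spike dominates the remainder, while the linear trend and the periodic pattern are largely absorbed by the trend and seasonal estimates; near the ends of the series the truncated low-pass filter makes the estimates less accurate.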
Without in any way limiting the scope, interpretation, or application of the claims appearing below, a technical effect of one or more of the example embodiments disclosed herein is improved analysis of measurement results. More particularly, improved analysis of results from an anomaly detection model is provided.
Another technical effect of one or more of the example embodiments disclosed herein is an analysis that is well suited to measurement results from telecommunication networks, as such measurement results tend to include results that can be ignored, e.g., on the basis of their timing.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the before-described functions may be optional or may be combined.
Various embodiments have been presented. It should be appreciated that in this document, words comprise, include and contain are each used as open-ended expressions with no intended exclusivity.
The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments a full and informative description of the best mode presently contemplated by the inventors for carrying out the invention. It is however clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented in the foregoing, but that it can be implemented in other embodiments using equivalent means or in different combinations of embodiments without deviating from the characteristics of the invention.
Furthermore, some of the features of the afore-disclosed example embodiments may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the principles of the present invention, and not in limitation thereof. Hence, the scope of the invention is only restricted by the appended patent claims.

Claims

1. A computer implemented method for analyzing measurement results of a target system (101), the method comprising analysing (303) data samples using an anomaly detection model to form an initial time series (401) of anomaly detection scores, wherein the data samples comprise measurement results of the target system; decomposing (304) the initial time series (401) of anomaly detection scores into a trend component (402), a seasonal component (403), and a remainder component (404), wherein the remainder component is determined by deducting the trend component and the seasonal component from the initial time series; and outputting (306) at least the remainder component of the anomaly detection scores for the purpose of evaluating performance of the target system.
2. The method of claim 1, wherein the initial time series of anomaly detection scores is decomposed into the trend component, the seasonal component, and the remainder component through two processing loops, wherein an outer loop assigns robustness weights to each data point of the time series depending on the size of the remainder component, and an inner loop iteratively updates the trend and seasonal components using LOESS smoothing.
3. The method of any preceding claim, further comprising filtering (305) the remainder component of the anomaly detection scores by filtering out data points that do not reach a predefined threshold, and providing the filtered remainder component as the output for the purpose of evaluating performance of the target system.
4. The method of any preceding claim, wherein the anomaly detection model is an isolation forest model.
5. The method of any preceding claim, wherein the initial time series of anomaly detection scores is formed of aggregated anomaly detection scores.
6. The method of any preceding claim, wherein the measurement results of the data samples are grouped (302) into subcategories, and an aggregate measurement result is determined for each subcategory, and the data samples that are analysed are formed of the aggregated measurement results.
7. The method of any preceding claim, wherein the measurement results of the data samples are grouped (302) into subcategories, and the data samples to be analysed are set to comprise measurement results of selected subcategories.
8. The method of any preceding claim, wherein the target system (101) is a telecommunication network and the measurement results comprise performance metrics from the telecommunication network.
9. The method of claim 8, wherein the performance metrics comprise key performance indicator data and/or probe data from the telecommunication network.
10. The method of claim 8 or 9, wherein the performance metrics are selected from the group consisting of: assignment parameters, circuit-switched fallback parameters, handover parameters, SMS parameters, call parameters, paging parameters, throughput, signal level, number of users, signal quality indicators, timing advance.
11. The method of any one of claims 6-10, wherein the target system (101) is a telecommunication network and the subcategories are selected from the group consisting of: measurement results relating to certain cell, measurement results relating to certain cell type, measurement results relating to base stations, measurement results relating to mobile switching center, measurement results relating to roaming network, measurement results relating to subscriber category, measurement results relating to traffic category, measurement results relating to connection release reason.
12. The method of any one of claims 1-7, wherein the target system (101) is an industrial process and the measurement results comprise sensor data from the industrial process.
13. An apparatus (20, 111) comprising a processor (21), and a memory (22) including computer program code; the memory and the computer program code configured to, with the processor, cause the apparatus to perform the method of any one of claims 1-12.
14. A computer program comprising computer executable program code (23) which when executed by a processor causes an apparatus to perform the method of any one of claims 1-12.
PCT/FI2021/050783 2020-12-04 2021-11-18 Anomaly detection WO2022117911A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20206252A FI129553B (en) 2020-12-04 2020-12-04 Anomaly detection
FI20206252 2020-12-04

Publications (1)

Publication Number Publication Date
WO2022117911A1 true WO2022117911A1 (en) 2022-06-09

Family

ID=78822242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2021/050783 WO2022117911A1 (en) 2020-12-04 2021-11-18 Anomaly detection

Country Status (2)

Country Link
FI (1) FI129553B (en)
WO (1) WO2022117911A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925753A (en) * 2022-04-28 2022-08-19 南通东升灯饰有限公司 Use abnormity alarm system of LED floor lamp
CN116627707A (en) * 2023-07-20 2023-08-22 中孚安全技术有限公司 Detection method and system for abnormal operation behavior of user

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160342910A1 (en) * 2015-05-18 2016-11-24 International Business Machines Corporation Automatic time series exploration for business intelligence analytics

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DINH PHUC TRINH ET AL: "ECSD: Enhanced Compromised Switch Detection in an SDN-Based Cloud Through Multivariate Time-Series Analysis", IEEE ACCESS, IEEE, USA, vol. 8, 20 June 2020 (2020-06-20), pages 119346 - 119360, XP011797595, DOI: 10.1109/ACCESS.2020.3004258 *
DUAN QI ET AL: "Base Station Traffic Prediction based on STL-LSTM Networks", 2018 24TH ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS (APCC), IEEE, 12 November 2018 (2018-11-12), pages 407 - 412, XP033512916, DOI: 10.1109/APCC.2018.8633565 *
JING GAO ET AL: "Converting Output Scores from Outlier Detection Algorithms into Probability Estimates", PROCEEDINGS / SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2006 ;HONG KONG, CHINA; 18 - 22 DECEMBER, 2006, IEEE COMPUTER SOCIETY, PISCATAWAY, NJ, USA, 18 December 2006 (2006-12-18), pages 212 - 221, XP031003032, ISBN: 978-0-7695-2701-7, DOI: 10.1109/ICDM.2006.43 *
NEDIYANCHATH ANISH ET AL: "Anomaly Detection in Mobile Networks", 2020 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE WORKSHOPS (WCNCW), IEEE, 6 April 2020 (2020-04-06), pages 1 - 5, XP033784494, DOI: 10.1109/WCNCW48565.2020.9124843 *

Also Published As

Publication number Publication date
FI20206252A1 (en) 2022-04-14
FI129553B (en) 2022-04-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21820294

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21820294

Country of ref document: EP

Kind code of ref document: A1