WO2022188966A1 - Technique for controlling network traffic monitoring - Google Patents

Technique for controlling network traffic monitoring

Info

Publication number
WO2022188966A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
traffic
type
attribute
metric
Prior art date
Application number
PCT/EP2021/056054
Other languages
French (fr)
Inventor
Gergely BÓNÉ
Attila BÁDER
Ferenc SZÁSZ
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to EP21712081.5A priority Critical patent/EP4305821A1/en
Priority to PCT/EP2021/056054 priority patent/WO2022188966A1/en
Publication of WO2022188966A1 publication Critical patent/WO2022188966A1/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/0806 Configuration setting for initial configuration or provisioning, e.g. plug-and-play
    • H04L41/0645 Management of faults, events, alarms or notifications using root cause analysis, by additionally acting on or stimulating the network after receiving notifications
    • H04L41/0816 Configuration setting characterised by the conditions triggering a change of settings, the condition being an adaptation, e.g. in response to network events
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/026 Capturing of monitoring data using flow identification
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Definitions

  • the present disclosure generally relates to the monitoring of network traffic.
  • a technique for dynamically monitoring network traffic of different types in a communication network is presented.
  • the technique may be implemented as a method, a computer program product, an apparatus or a system.
  • Network management is an important feature of modern wired and wireless communication networks. Network management in particular allows "troubleshooting" when quality of service issues or other network performance degradations are detected.
  • Proper network management decisions require a continuous collection and analysis of a plethora of network-related events occurring locally within the managed network and reported by that network to a network management domain.
  • the network events are often reported on a subscriber level to achieve a sufficiently high resolution for network analysis.
  • the network events are typically processed in the form of data sets, and the data sets can include network event information in a possibly aggregated (e.g., averaged) form.
  • a value pertaining to a certain traffic metric such as packet loss, video stall time or bitrate may be associated with a value of a network attribute indicative of one or more network entities for which the traffic metric value has been obtained.
  • Different attribute values may be defined per network attribute dimension (e.g., "network cell" or "terminal device") such that the attribute values are mutually exclusive to allow a "drill down" for troubleshooting.
  • the traffic metric value may have been obtained by aggregating individual traffic metric values across a certain population of subscribers or subscriber sessions all associated with the attribute value in the data set.
  • Traditional network event collection is based on passive probing of, or pre-configured event reporting by, different network functions of a communication network. In the case of certain wireless communication networks, those network functions stretch over different network domains, such as a radio access network domain and a core network domain.
  • Short reaction times in network management are desirable and require real-time analytics solutions, which in turn consume considerable processing and storage resources.
  • event collection by user plane probing in a 5G network will, per core network site, easily result in several terabits of user plane traffic that needs to be processed and evaluated in real time.
  • a similar situation will arise in the radio access network domain as a result of the increasing numbers of terminal devices and network cells.
  • significant server capacities and also significant electric power will be consumed in this regard.
  • a method of controlling monitoring of network traffic in a communication network wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions.
  • Monitoring of the first type of network traffic yields first data sets, with each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value of one of the one or more network attribute dimensions.
  • the method comprises analyzing the first data sets to detect at least one first traffic metric value indicative of a network performance degradation, identifying the network attribute value associated with the detected first traffic metric value, and controlling monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.
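As a rough illustration, the analyzing, identifying and controlling steps of the method can be sketched as a simple control loop. The data set layout (metric value, attribute value pairs), the threshold decision and the control callback below are illustrative assumptions, not part of the claims:

```python
# Hypothetical sketch of the claimed control loop: analyze first data sets,
# identify the attribute value behind a degradation, and request increased
# monitoring of the second traffic type. Names and thresholds are assumptions.

def control_monitoring(first_data_sets, degradation_threshold, increase_monitoring):
    """Each first data set is modeled as a (traffic_metric_value, attribute_value) pair.

    A metric value above the threshold (e.g., packet loss) is treated as
    indicative of a network performance degradation.
    """
    identified = []
    for metric_value, attribute_value in first_data_sets:
        if metric_value > degradation_threshold:      # analyze / detect
            identified.append(attribute_value)        # identify attribute value
            increase_monitoring(attribute_value)      # control second-type monitoring
    return identified

# Example: average packet loss (%) per Tracking Area Code (TAC)
issued = []
control_monitoring(
    [(0.2, "TAC_ID1"), (4.8, "TAC_ID2"), (0.1, "TAC_ID3")],
    degradation_threshold=2.0,
    increase_monitoring=issued.append,
)
# only TAC_ID2 exceeds the threshold, so monitoring is increased for it alone
```

In a real deployment the callback would emit a monitoring control command towards the communication network domain rather than append to a list.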
  • the computer program product comprises program code portions for performing the steps of the method presented herein when the computer program product is executed on one or more processors.
  • the computer program product may be stored on a computer- readable recording medium.
  • an apparatus for controlling monitoring of network traffic in a communication network wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions.
  • Monitoring of the first type of network traffic yields first data sets, each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value of one of the one or more network attribute dimensions.
  • the apparatus is configured to analyze the first data sets to detect at least one first traffic metric value indicative of a network performance degradation, to identify the network attribute value associated with the detected first traffic metric value, and to control monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.
  • Fig. 1 is a diagram illustrating a system embodiment of the present disclosure;
  • Fig. 2 is a block diagram illustrating an embodiment of a monitoring control apparatus in accordance with the present disclosure;
  • Fig. 3 is a flow diagram of a method embodiment of the present disclosure;
  • Fig. 4 is a schematic diagram of a collection of data sets in accordance with the present disclosure;
  • Figs. 5 and 6 are signalling diagrams according to embodiments of the present disclosure;
  • Fig. 7 is a flow diagram illustrating a further method embodiment of the present disclosure; and
  • Figs. 8A - 12 are schematic diagrams illustrative of monitoring results.
  • the present disclosure is not limited in this regard.
  • the present disclosure could also be implemented in other wired or wireless communication networks (e.g., according to 4G specifications).
  • FIG. 1 illustrates an embodiment of a system 10 in which the present disclosure can be implemented.
  • the system 10 comprises a communication network domain 100 configured to monitor network traffic and a network management (NM) domain 200 configured to control network traffic monitoring in the communication network domain 100 and to analyze the monitoring results.
  • the communication network to be monitored is configured as a wireless cellular communication network.
  • the communication network domain 100 comprises one or more wireless terminal devices 110, a radio access network (RAN) domain 120 and a core network (CN) domain 130, as generally known in the art.
  • the RAN domain 120 and the CN domain 130 each comprises a large number of network functions (NFs).
  • a particular NF may be a software entity (e.g., implemented using cloud computing resources), a stand-alone hardware entity (e.g., in the form of a network node), or a combination thereof.
  • the NFs may conform to the definitions of "network functions" as standardized by 3GPP in its 5G specifications, but in other variants (e.g., in 4G implementations) this may not be the case.
  • the NM domain 200 comprises an event collector 210 configured to receive and, optionally, store and pre-process network event information resulting from network monitoring.
  • the NM domain 200 further comprises a monitoring control apparatus 220 configured to analyse the (pre-processed) event information to arrive at monitoring control decisions.
  • network events are to be construed broadly. Network events generally characterize what is happening in the communication network domain 100, such as session initiation or termination, the status of an ongoing session, transmission of a certain amount of data and so on. So-called Key Performance Indicators (KPIs), usually numeric values, can be reported as events as such or as characteristic parameters of one or more events, such as session initiation time, ratio of unsuccessful session initiations, the amount of transmitted bytes over a given amount of time and so on.
  • An event can be reported when it is locally detected at a dedicated monitoring site (e.g., a dedicated NF) or in response to probing.
  • the network events can be standardized (e.g., 4G or 5G) signalling events or vendor-specific events (of, e.g., a network node acting as NF).
  • Event probing may be performed in the communication network domain 100 to capture the events at a network interface, or to capture user plane traffic, sample it and generate user plane traffic metrics that are to be reported as one or more events.
  • KPIs can be calculated from or attributed to one or multiple events.
  • a handover failure can be reported in an event.
  • Exemplary KPIs calculated from this or these events, either locally in the communication network domain 100 or centrally in the NM domain 200, are the number of handover failures or the ratio of handover failures to total handovers in a certain period of time.
  • an NF user plane probe may report a throughput event every 5 s in a dedicated event report.
  • An average throughput KPI can be calculated locally or centrally as the average of these throughputs for 1 min, and a maximum throughput KPI can be calculated locally or centrally as the maximum of the reported throughputs in 1 min.
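Assuming one throughput event every 5 s, twelve events cover 1 min, so the local or central KPI calculation can be illustrated as follows (the throughput values are invented for the example):

```python
# Illustrative KPI calculation from periodic throughput events: one event
# every 5 s, aggregated over 1 min as described above. Values are made up.

throughput_events_mbps = [40, 42, 38, 45, 50, 47, 41, 39, 44, 46, 43, 45]  # 12 x 5 s = 1 min

average_throughput_kpi = sum(throughput_events_mbps) / len(throughput_events_mbps)
maximum_throughput_kpi = max(throughput_events_mbps)

print(average_throughput_kpi, maximum_throughput_kpi)
```

The same arithmetic applies whether the aggregation runs locally at the reporting NF or centrally in the NM domain; only the reporting volume differs.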
  • An embodiment of the monitoring control apparatus 220 of Fig. 1 will be described with reference to Fig. 2, and operational details of the monitoring control apparatus 220 will be described with reference to a method embodiment as illustrated in flow diagram 300 of Fig. 3.
  • the monitoring control apparatus 220 comprises a processor 222 and a memory 224 coupled to the processor 222.
  • the memory 224 stores program code (e.g., in the form of a set of instructions) that controls operation of the processor 222 so that the monitoring control apparatus 220 is operative to perform any of the method aspects presented herein (see Fig. 3).
  • a processor, such as processor 222, may be implemented using any processing circuitry and is not limited to, for example, a single processing core, but may also have a distributed topology (e.g., using cloud computing resources).
  • the monitoring control apparatus 220 further comprises an input interface 226 and an output interface 228.
  • the two interfaces 226, 228 are configured for communication with the event collector 210 on the one hand and the communication network domain 100 (e.g., individual NFs therein) on the other hand.
  • operation of the monitoring control apparatus 220 comprises processing of data sets that include (possibly pre- processed, such as aggregated) event information obtained from monitoring network traffic in the communication network domain 100.
  • the flow diagram 300 illustrates a step 302 of analyzing, by the monitoring control apparatus 220, a collection of first data sets to detect at least one first traffic metric value indicative of a network performance degradation.
  • the first data sets may be stored on the event collector 210 and accessed by the monitoring control apparatus 220 via its input interface 226.
  • the network traffic in the communication network domain 100 comprises network traffic of a first type and network traffic of a second type, wherein the first data sets analyzed in step 302 have been obtained for the first network traffic type.
  • the network traffic of the first type is, or includes, at least one of real-time traffic, voice traffic and uplink traffic.
  • Uplink traffic refers to traffic originating at the terminal devices 110.
  • the network traffic of the first type may be governed by at least one of a connectionless communication protocol and the Real-time Transport Protocol (RTP).
  • the network traffic of the first type may have a lower traffic volume per predefined period of time than the network traffic of the second type.
  • voice traffic, as an exemplary network traffic of the first type, is real-time traffic that is only around 1-5 % of the total traffic volume, the latter being dominated by MBB traffic.
  • the network traffic of the second type is, or includes, at least one of non-real-time traffic, service traffic (in particular multimedia streaming traffic or Internet traffic), Mobile Broad Band (MBB) traffic and uplink traffic.
  • the network traffic of the second type may have a higher traffic volume per predefined period of time than the network traffic of the first type.
  • the network traffic can further be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions (abbreviated as "attribute values" and “attribute dimensions” hereinafter).
  • Each attribute dimension may define a set of possible sources of the network performance degradation.
  • an attribute dimension can also be viewed as defining a set of network entities that each may individually degrade network performance due to, for example, a malfunction.
  • the attribute values spanning a given attribute dimension may define mutually exclusive sub-sets of one or more such network entities (e.g., to allow a proper "drill down" for troubleshooting purposes in case network performance degradations are detected in step 302).
  • the one or more attribute dimensions may, for example, comprise one or more of:
    a) at least one network subscription-related dimension for a subscription-based communication network (e.g., subscription type, roaming status, etc.);
    b) at least one terminal device-related dimension for a communication network comprising individual terminal devices (e.g., terminal type, terminal model, terminal vendor, terminal capabilities, etc.);
    c) at least one network hierarchy-related dimension for a communication network split in multiple hierarchy levels (e.g., RAN node vs. CN node, network slice, etc.); and
    d) at least one network geography-related dimension for a communication network split in dedicated geographical regions (e.g., network cell, routing area, tracking area, registration area, etc.).
  • Each of those attribute dimensions comprises a set of mutually exclusive (numerical or non-numerical) attribute values, or simply attributes.
  • the attribute values of the dimension “network cell” can be cell identifiers
  • the attribute values of the dimension “terminal type” can be "smartphone", “dongle”, “IoT device”, and similarly for other dimensions.
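A minimal sketch of how attribute dimensions and their mutually exclusive attribute values might be represented; the dimensions and identifiers below are only the examples named in the text:

```python
# Example attribute dimensions and their mutually exclusive attribute values,
# following the examples in the description (identifiers are illustrative).

attribute_dimensions = {
    "network cell": ["CELL_ID1", "CELL_ID2", "CELL_ID3"],
    "terminal type": ["smartphone", "dongle", "IoT device"],
    "tracking area": ["TAC_ID1", "TAC_ID2"],
}

# Mutual exclusivity within a dimension: a monitored session matches exactly
# one attribute value per dimension, which is what enables a "drill down".
for dimension, values in attribute_dimensions.items():
    assert len(values) == len(set(values)), f"duplicate value in {dimension}"
```

Any mapping with distinct values per dimension would serve; the point is that the values partition the monitored entities.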
  • Monitoring of the first type of network traffic yields first data sets (see step 302 in Fig. 3), with each first data set being indicative of a dedicated value of a (possibly aggregated) first traffic metric and an associated attribute value of one of the one or more attribute dimensions.
  • monitoring of the second type of network traffic may in some variants yield corresponding second data sets, with each second data set being indicative of a dedicated value of a (possibly aggregated) second traffic metric and an associated attribute value of one of the one or more attribute dimensions.
  • the second traffic metric may be different from or identical with the first traffic metric.
  • the network traffic of the first and/or second type may be packet-based, and the first and/or second traffic metric may be a packet-based traffic metric.
  • the network traffic of the second type may relate to multimedia streaming, and the second traffic metric may be a multimedia streaming-related traffic metric (e.g., a video-related KPI, such as video stall time).
  • the network traffic of the second type may relate to an Internet service, and the second traffic metric may be an Internet service-related traffic metric.
  • the content of the first and second data sets is at least partially derived from event information that has been obtained (e.g., measured) in the communication network domain 100 for the associated traffic type before being communicated to the NM domain 200 (see the two arrows in the center of Fig. 1).
  • the event information may enter a particular data set in aggregated form, for example aggregated across subscribers or subscriber- sessions associated with the attribute value in the data set and/or across a certain period of time (and possibly averaged).
  • an aggregated traffic metric value can be obtained by aggregating non-aggregated subscriber-related or subscriber session-related traffic metric values across those monitored subscribers or subscriber sessions that comply with the attribute value that is associated with the traffic metric value in a given data set. Aggregation may occur in one or both of the communication network domain 100 and the NM domain 200 (e.g., by the event collector 210). Further optionally, the event information may be "enriched" (e.g., by the event collector 210 or by a local monitoring site, such as a dedicated NF, in the communication network domain 100) with further information, such as attribute-related information. Such further information may be obtained from an information source different from a local monitoring site in the communication network domain 100.
  • a given data set thus associates a value pertaining to a certain traffic metric (such as packet loss, video stall time or bitrate) with a value of an attribute indicative of an attribute dimension for which the traffic metric value has been obtained.
  • Different attribute values (e.g., different Tracking Area Codes, TACs) may be defined per attribute dimension.
  • a particular attribute value (e.g., TAC ID1) is associated, in a data set, with a value of a given traffic metric (e.g., average packet loss or any video-related KPI such as video stall time).
  • Fig. 4 illustrates a data storage (e.g., a database) in the event collector 210 and the individual data sets collected therein.
  • An exemplary first subset of those data sets associates different TAC IDs with corresponding average packet losses per TAC, and an exemplary second subset associates the same TAC IDs with corresponding video KPIs.
  • the traffic metric value included therein may have been obtained based on aggregating (e.g., averaging) individual traffic metric values across a period of time and across a certain population of subscribers or subscriber sessions all associated with a particular attribute value, such as a given TAC ID.
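The described aggregation of individual traffic metric values into one data set per attribute value can be sketched as follows; the per-session packet-loss samples and TAC identifiers are hypothetical:

```python
# Hypothetical aggregation of per-subscriber-session packet-loss samples into
# one (attribute value, averaged traffic metric value) data set per TAC.

def aggregate_per_attribute(samples):
    """samples: iterable of (attribute_value, metric_value), one per monitored session."""
    totals = {}
    for attribute_value, metric_value in samples:
        count, total = totals.get(attribute_value, (0, 0.0))
        totals[attribute_value] = (count + 1, total + metric_value)
    # average across all sessions that share the attribute value
    return {attr: total / count for attr, (count, total) in totals.items()}

samples = [("TAC_ID1", 0.1), ("TAC_ID1", 0.3), ("TAC_ID2", 2.0), ("TAC_ID2", 4.0)]
data_sets = aggregate_per_attribute(samples)
# data_sets maps each TAC to its average packet loss, e.g. TAC_ID2 -> 3.0
```

Time-window aggregation works the same way, with the window boundaries as an additional grouping key.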
  • the two subsets of data sets illustrated in Fig. 4 may all pertain to the first type of network traffic.
  • one of the subsets may pertain to the first type of network traffic and the other of the subsets may pertain to the second type of network traffic.
  • different traffic metric types may be available for the different types of network traffic, and not all traffic metric types may be available for all network traffic types.
  • the data sets of Fig. 4 could be stored in any format, for example as a table, list, etc. It will further be appreciated that more than two traffic types may be defined. Similarly, more than two different subsets of data sets may be provided by the event collector 210 for analysis by the monitoring control apparatus 220.
  • the method further comprises a step 304 of identifying the attribute value associated with the particular traffic metric value that was detected (e.g., using a threshold decision) in step 302 to be indicative of a network performance degradation.
  • the attribute value may be read from the data set in which the particular traffic metric value was detected.
  • the method continues with controlling, in step 306 of Fig. 3, monitoring of the second type of network traffic to increase in volume for the identified attribute value, or for an attribute value having the potential of correlating with the identified attribute value.
  • the identified attribute value and the attribute value having the potential of correlating with the identified attribute value may relate to the same possible source of network performance degradation.
  • monitoring of the second type of network traffic may increase in volume for an attribute dimension specifically related to the network traffic of the second type.
  • the attribute dimension related to the network traffic of the second type may not be available for the network traffic of the first type (but may, in some variants, have the potential of correlating therewith).
  • the non-availability may be due to inherent differences between the two types of network traffic.
  • if, for example, the first type of network traffic is real-time (e.g., voice) traffic and the attribute dimension is related to real-time traffic, such an attribute dimension will not be available if the second type of network traffic is MBB traffic (e.g., video streaming).
  • a monitoring control command may be transmitted by the monitoring control apparatus 220 to the communication network domain 100. Transmission of such a control command is illustrated by an arrow on the right-hand side of Fig. 1.
  • Controlling monitoring of the second type of network traffic to increase in volume may comprise at least one of (i) increasing a traffic sampling rate at a given traffic monitoring site (e.g., a given NF) in the communication network domain 100 and (ii) suitably adjusting a traffic filter at a given monitoring site.
  • the monitoring control command sent in step 306 may thus be indicative of an increased sampling rate to be applied to the second type of network traffic in regard to subscribers or subscriber sessions.
  • the monitoring control command may be indicative of a traffic filter setting to be adjusted so that more (e.g., all) of the network traffic of the second type is monitored.
  • the traffic filter setting may define a set of subscribers for which subscriber sessions are to be monitored for event reporting purposes.
  • the traffic filter setting may comprise a white list of subscribers to be monitored or a black list of subscribers not to be monitored.
  • the corresponding list may be defined using Subscription Permanent Identifiers (SUPIs), International Mobile Subscriber Identifiers (IMSIs) or any other identifier type.
  • the list may, for example, include or exclude certain subscribers based on consent or subscription type. Therefore, controlling monitoring of the second type of network traffic to increase in volume may comprise increasing a percentage of network subscribers or network subscriber sessions for which network traffic of the second type is detected at a given monitoring site.
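A white-list-based filter adjustment along these lines might look as follows; the SUPI placeholders, the deterministic selection and the percentages are illustrative assumptions, not the claimed mechanism:

```python
# Illustrative traffic filter setting: a white list of subscribers whose
# sessions of the second traffic type are monitored. Increasing the monitored
# percentage enlarges the list; the SUPI values are placeholders.

def build_white_list(subscribers, percentage):
    """Deterministically pick the first N% of the consenting subscribers."""
    n = max(1, int(len(subscribers) * percentage / 100))
    return subscribers[:n]

consenting = [f"SUPI_{i:04d}" for i in range(1000)]

before = build_white_list(consenting, percentage=5)    # normal operation
after = build_white_list(consenting, percentage=100)   # after step 306

print(len(before), len(after))
```

A real filter would more likely sample randomly or hash on the subscriber identifier, but the effect on monitored volume is the same.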
  • the second type of network traffic is not monitored at all prior to step 306.
  • the second type of network traffic is monitored to yield the second data sets that are each indicative of a dedicated value of the second traffic metric and associated with the identified attribute value, or the attribute value having the potential of correlating therewith.
  • the second type of network traffic, prior to step 306, is already monitored to yield a certain number of the second data sets over a predefined period of time. Then, after controlling monitoring of the second type of network traffic to increase in volume, the second type of network traffic is monitored to yield a higher number of second data sets than before controlling monitoring of the second type of network traffic to increase in volume in step 306.
  • a possible source of the network performance degradation may be identified based at least on the second data sets yielded after controlling monitoring of the second type of network traffic to increase in volume.
  • differences in the second traffic metric values of second data sets yielded before and after controlling monitoring of the second type of network traffic to increase in volume may be evaluated. If no (or no substantial) differences are found, monitoring of the second type of network traffic may be controlled to decrease in volume again for the identified attribute value, or for the attribute value having the potential of correlating with the identified attribute value.
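The before/after evaluation and scale-back decision can be sketched as a comparison of averaged second traffic metric values; the relative tolerance is an assumed tuning parameter, not taken from the disclosure:

```python
# Hypothetical follow-up to step 306: if the second traffic metric did not
# change substantially after increasing the monitoring volume, scale the
# monitoring back down. The relative tolerance is an assumed parameter.

def keep_increased_monitoring(before_values, after_values, rel_tolerance=0.1):
    avg_before = sum(before_values) / len(before_values)
    avg_after = sum(after_values) / len(after_values)
    # keep the increased volume only if a substantial difference is observed
    return abs(avg_after - avg_before) > rel_tolerance * avg_before

# Video stall time (s) per data set, before vs. after the volume increase:
assert keep_increased_monitoring([1.0, 1.2, 1.1], [2.5, 2.8, 2.6])        # degraded: keep
assert not keep_increased_monitoring([1.0, 1.2, 1.1], [1.05, 1.15, 1.1])  # unchanged: scale back
```

Comparing full distributions (e.g., percentiles) instead of averages would be a natural refinement of the same idea.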
  • the first data sets are analyzed in step 302 in regard to a first attribute dimension ("primary" attribute dimension) to detect the at least one first traffic metric value indicative of a network performance degradation.
  • the increased volume of the monitored second type of network traffic may then be analyzed in regard to a second attribute dimension ("secondary" attribute dimension) different from the first attribute dimension so as to localize a possible source of the network performance degradation.
  • the increased volume of the monitored second type of network traffic may also be analyzed in regard to the first attribute dimension.
  • the NM domain 200 comprises an event collector 210 and a monitoring control apparatus 220.
  • the monitoring control apparatus 220 comprises one or more network analytics components 220A configured to perform at least steps 302 and 304 of Fig. 3 and a monitoring controller 220B configured to perform at least step 306 of Fig. 3.
  • the monitoring controller 220B is provided as an extra control layer between the one or more analytics components 220A on the one hand and the RAN and CN domains 120, 130 on the other hand.
  • the analytics components 220A may be configured as customer experience management (CEM) systems or subscriber analytics systems (such as Ericsson Experts Analytics, EEA, systems).
  • the analytics components 220A may be comprised by one or more of network operation centres (NOCs), service operation centres (SOC) and network optimization engineering (NOE) systems.
  • the analytics components 220A are configured to monitor and analyse service quality and network quality on a subscriber level.
  • the analytics components 220A may be software entities implemented, for example, using cloud computing resources, hardware entities, or combinations thereof.
  • the analytics components 220A are each configured to send network analytics requests to the event collector 210 that receives these requests via a dedicated interface 210A.
  • the event collector 210 comprises a further dedicated interface 210B towards the RAN domain 120 and the CN domain 130 to receive network event information.
  • the RAN and CN domains 120, 130 comprise a plethora of NFs 122, 132, respectively.
  • Each NF 122, 132 comprises a bi-directional communication link to the NM domain 200 for receiving monitoring control commands from the NM domain 200 on the one hand and reporting network information resulting from the monitoring to the NM domain 200 on the other hand.
  • the exemplary NFs 122, 132 of Fig. 5 belong to a 4G/5G wireless communication network as standardized by the 3rd Generation Partnership Project (3GPP).
  • the CN domain 130 comprises, inter alia, multiple User Plane Functions (UPFs), a Session Management Function (SMF) and an Access and Mobility Management Function (AMF).
  • While not shown in Fig. 5, the CN domain 130 may, for example, additionally comprise a Mobility Management Entity (MME) and gateways, such as a Serving Gateway (SGW) and a Packet Data Network Gateway (PGW); see also Fig. 6.
  • the RAN domain 120 comprises multiple base stations in the form of so-called 4G eNodeBs (eNBs) and 5G gNodeBs (gNBs).
  • the network scenario of Fig. 6 illustrates further aspects of a 4G/5G communication network with dedicated communication interfaces between the various NFs and the terminal device (also called User Equipment, UE, 110).
  • such communication network types comprise a user plane on which network traffic is routed as well as a control plane that is, inter alia, used to control network traffic routing.
  • Fig. 6 illustrates that the SGW connects a 4G Evolved Packet Core (EPC) part of the CN domain 130 towards the RAN domain 120, while the PGW connects the EPC to an IP network, such as an IP Multimedia Subsystem (IMS) 134.
  • IMS 134 provides control and media functions for real-time voice services (such as Voice over LTE, VoLTE, or Voice over NR, VoNR) and other real-time services.
  • event information pertaining to VoLTE- or VoNR-related (or other real-time) user plane traffic - as an exemplary first type of network traffic - can be obtained from the IMS 134 and/or various 4G/5G NFs 132 in the CN domain 130, such as the UPF, SGW and/or PGW (see thick arrows in Fig. 6).
  • Network event monitoring at those event capture points can be performed using, for example, physical probes, software probes or node logs.
  • network event reporting can be performed in parallel for a second type of network traffic, for example MBB traffic.
  • Prior to reporting the event information resulting from monitoring of the user plane in the communication network domain 100, this information can be enriched with one or more attribute values of one or more attribute dimensions locally available at user plane event capture points or received in reports from control plane-related NFs.
  • This enrichment can be based on correlating information from the user plane and the control plane, using for example one or both of Fully Qualified Tunnel Endpoint IDs (FTEIDs) and Fully Qualified Session Endpoint IDs (FSEIDs) in case of NFs 132 in the CN domain 130 (as one of these IDs will always be available on both the user plane and the control plane).
  • the correlation and enrichment with attribute values can additionally or alternatively be done using Internet Protocol (IP) addresses related to voice sessions in case of the IMS 134 and/or using Border Gateway Function (BGF) and Session Border Gateway (SBG) data.
  • BGF and SBG are two NFs within the IMS 134.
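As an illustrative, non-limiting sketch of the enrichment step, a user-plane event carrying an FTEID could be joined with control-plane session data as follows. The field names (fteid, subscription_type, tracking_area) are assumptions made for illustration only; they are not standardized event fields.

```python
def enrich_events(user_plane_events, control_plane_sessions):
    """Attach control-plane attribute values to user-plane events, joined on
    the Fully Qualified Tunnel Endpoint ID (FTEID) present on both planes."""
    sessions_by_fteid = {s["fteid"]: s for s in control_plane_sessions}
    enriched = []
    for event in user_plane_events:
        session = sessions_by_fteid.get(event["fteid"])
        if session is not None:
            event = dict(event)  # copy: do not mutate the original event
            event["subscription_type"] = session["subscription_type"]
            event["tracking_area"] = session["tracking_area"]
        enriched.append(event)
    return enriched

# Minimal illustrative data
events = [{"fteid": "A1", "packet_loss": 0.02}]
sessions = [{"fteid": "A1", "subscription_type": "premium", "tracking_area": 13816}]
```

The same join pattern would apply when correlating via FSEIDs or, in the IMS case, via voice-session IP addresses.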
  • the method illustrated in Fig. 7 includes two dedicated phases, namely a "normal operation” phase that is followed by a “troubleshooting” phase in case a network performance degradation has been detected. From the “troubleshooting” phase, the method may loop back to the "normal operation” phase.
  • the two phases essentially differ from each other in that monitoring of the second type of network traffic increases in volume in the "troubleshooting" phase compared to the "normal operation” phase.
  • This also means that the hardware and software resources consumed by the network monitoring in the communication network domain 100 can be reduced in the "normal operation" phase, while - in the exemplary embodiment of Fig. 7 - the first type of network traffic is fully monitored so as to increase the likelihood of detecting a network performance degradation.
  • real-time traffic, which constitutes the first type of network traffic in the scenario of Fig. 7, is particularly sensitive to any network performance degradation and can thus be considered as an "early indicator" of any issues that may also affect other traffic types.
  • MBB traffic: continuously monitor all services for a small percentage (such as 10%) of subscribers (e.g., subscriber sessions) using random sampling (see step 702 in Fig. 7) to continuously collect associated traffic metrics.
  • this percentage can be reduced to zero.
  • RTP-based voice traffic continuously monitor only uplink traffic, but for all subscribers (e.g., all subscriber sessions; see step 704) to continuously collect associated traffic metrics. This 100% monitoring can be reduced to a high percentage (e.g., above 50%).
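The two "normal operation" monitoring rules above can be sketched as a per-subscriber sampling decision. The hash-based choice below is an illustrative assumption, not part of the described method; it merely ensures that the sampled ~10% of MBB subscribers stays stable across reporting periods.

```python
import hashlib

def monitor(imsi: str, traffic_type: str, mbb_rate: float = 0.10) -> bool:
    """Decide whether a subscriber's traffic is monitored in "normal operation"."""
    if traffic_type == "voice":
        return True  # first traffic type: (close to) full subscriber coverage
    # Stable pseudo-random sampling: a given IMSI is always in or always out.
    bucket = int(hashlib.sha256(imsi.encode()).hexdigest(), 16) % 1000
    return bucket < int(mbb_rate * 1000)
```

Any other random but subscriber-stable sampling scheme would serve equally well.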
  • KPIs may be calculated by aggregating subscriber-related or subscriber session-related metric values derived by network traffic monitoring. For calculating KPIs for the time dimension and one or more attribute dimensions with a certain precision (confidence interval associated with a confidence level), a well-defined number of samples (e.g., of monitored events) is needed. Monitoring of the MBB traffic with a random sampling of 10% of all subscribers has turned out to be sufficient in this regard, and accordingly shrinks the resource consumption footprint to around 10% as well. Monitoring of the voice traffic leads to a small resource consumption footprint anyhow, even when covering all the subscribers. This means that larger network performance degradations (e.g., in regard to quality of service) can easily be recognized without full subscriber coverage. Even smaller degradations can also be identified, making troubleshooting feasible.
  • the event collector 210 or the analytics components 220A of Fig. 5, or any of the NFs in Fig. 5 or Fig. 6, is configured to correlate information from multiple data sources so as to enrich the network event information obtained by probing, reporting or otherwise (see step 706).
  • the network event information can be enriched with parameters which are not available in the events as such, such as subscription types, subscriber groups, physical coordinates, terminal vendors, etc.
  • the main goal of enrichment is to add, or increase, the number of attribute dimensions common for both types of network traffic, or to identify hidden correlations between attribute dimensions or attribute values (e.g., due to common user behavior). As such, multiple data sets are obtained for each type of network traffic, see Fig. 4.
  • the same set of traffic metric values can be aggregated across subscribers or subscriber sessions for different attribute dimensions, so that different subsets of data sets can be derived for the same set of traffic metric values.
  • a drilldown per "primary" attribute dimension may be performed in step 706.
  • KPIs may be filtered for attribute values. If there is a degradation, such as a specific KPI issue affecting only a limited number of subscribers, it may not be detected if one monitors the KPIs for all attribute values of a given attribute dimension in aggregated form. If, however, an issue causing network performance degradation is directly related to a specific attribute value of a given attribute dimension, it can be detected by comparing the KPI values of the different attribute values.
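The per-attribute-value comparison just described can be sketched as follows. The deviation factor is an illustrative threshold chosen for this example, not a value prescribed by the method, and the sketch assumes a metric where higher values are worse (e.g., packet loss).

```python
from collections import defaultdict

def flag_degraded(samples, deviation_factor=2.0):
    """samples: iterable of (attribute_value, metric_value) pairs.
    Returns the attribute values whose per-value KPI (mean of the metric)
    deviates strongly from the mean over all attribute values."""
    sums, counts = defaultdict(float), defaultdict(int)
    for attr, value in samples:
        sums[attr] += value
        counts[attr] += 1
    kpis = {attr: sums[attr] / counts[attr] for attr in sums}
    overall = sum(kpis.values()) / len(kpis)
    return {attr for attr, kpi in kpis.items() if kpi > deviation_factor * overall}
```

For instance, a single tracking area with a packet loss ratio far above its peers is flagged while the others are not.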
  • Geographical area / network hierarchy (i.e., what is the location of the subscriber, which network elements serve the communication) - to identify if a certain issue has network-wide or geographically/hierarchically limited impact:
    i. cell (4G cell, 5G cell, including dual-connectivity cases)
    ii. radio node (e.g., eNB, gNB, including dual-connectivity cases)
    iii. core node (e.g., MME, SGW, PGW, AMF, SMF, UPF)
    iv. routing area, tracking area, registration area
    v. network slice
  • Subscriber (i.e., what kind of subscriber(s) are affected by a certain issue):
    i. subscription type
    ii. roaming / home
  • Terminal device (i.e., what kind of device(s) are affected by a certain issue):
    i. terminal type (e.g., mobile, dongle, etc.)
    ii. terminal vendor, terminal model
    iii. terminal capabilities
  Note that other attribute dimensions, which are not explicitly applicable or available to MBB traffic, can be defined as well. It is enough that there is an (often hidden) correlation between the traffic services connected to the "primary" attribute dimension.
  • In step 708, the data sets thus obtained for the voice traffic, in particular the traffic metrics information such as KPIs contained in the data sets, are analyzed (as explained above with reference to step 302 of Fig. 3). It has been found that real-time network traffic such as voice traffic is particularly sensitive to network issues that lead to network performance degradations. As an example, the following RTP metrics are indicative of whether there is any service quality degradation:
  • RTP stream gaps (i.e., consecutively lost packets)
  • RTP packet sequence anomalies (e.g., forward and backward jumps)
  • RTP jitter (i.e., delay variation)
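A minimal sketch of how the first two of these RTP metrics could be derived from received RTP sequence numbers is given below. RTP jitter additionally needs timestamps and is computed per RFC 3550; sequence-number wrap-around is ignored here for brevity.

```python
def rtp_sequence_metrics(seq_numbers):
    """Derive simplified RTP stream-gap and sequence-anomaly counts from a
    list of received RTP sequence numbers (wrap-around not handled)."""
    gaps = jumps_fwd = jumps_bwd = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        delta = cur - prev
        if delta == 1:
            continue  # in-order packet, nothing to record
        if delta > 1:
            jumps_fwd += 1
            gaps += delta - 1   # consecutively lost packets in the gap
        else:
            jumps_bwd += 1      # reordered or duplicated packet
    return {"lost": gaps, "forward_jumps": jumps_fwd, "backward_jumps": jumps_bwd}
```

A production implementation would track the extended sequence number as defined in RFC 3550 to survive the 16-bit wrap.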
  • In step 710, the data sets that may have been obtained for MBB traffic are analyzed in a generic manner.
  • Generic service degradation can be detected by pre-set thresholds, and dynamic anomaly detection functions can indicate if one or more KPIs deteriorate for a certain dimension.
  • the reason to collect a limited amount of MBB traffic besides the RTP traffic is to obtain a high-level view of, and normal-operation values for, MBB-related KPIs.
  • Analysis may be based on a graph showing MBB KPIs in relation to primary dimensions. This is what is meant by "generic" analysis.
  • the non-generic analysis will be the drilldown for the increased traffic volume in regard to primary and, possibly, secondary dimensions for troubleshooting (see step 718).
  • In step 712, a decision is made based on the traffic metric value analysis as to whether or not there exists a network performance degradation (using, e.g., one or more thresholding decisions), see also step 302 of Fig. 3. If there is no degradation, the "normal operation" phase continues with steps 702, 704 and the cycle is repeated. Otherwise, i.e., if a network performance degradation is detected in step 712, the method continues with step 714 and enters the "troubleshooting" phase, see also steps 304 and 306 of Fig. 3.
  • In step 714, the attribute dimension and attribute value associated with the traffic metric value indicative of the service performance degradation are determined, as generally explained above with reference to step 304 in Fig. 3. Also in step 714, the monitoring of the MBB traffic is increased in volume, as generally explained above with reference to step 306 in Fig. 3.
  • a detected voice traffic degradation for a certain attribute dimension and a certain attribute value gives an indication where to shift the full-coverage monitoring for the MBB traffic to improve troubleshooting. If a degradation in one of the attribute dimensions is detected for the monitored voice traffic, due to the common background, there is an increased probability that other traffic types, which are only partially monitored in the "normal operation" phase, are also degraded.
  • the MBB-based troubleshooting requires more data than collected during the "normal operation" phase, but there is no need to increase data collection for the entire communication network and for all the subscribers.
  • the increase of the data collection can be well directed, or focused, to the identified dimensions only.
  • representative sampling may be used at the NFs.
  • Representative sampling is done by combining the filtering and sampling capabilities of the NFs. For example, assume that in steps 708 and 712 a particular registration area is identified in which the RTP metrics are degraded.
  • the UPFs support filtering of event information for the attribute dimension "registration area".
  • 10% of MBB traffic is monitored at the UPFs for each individual registration area, using random IMSI sampling.
  • the MBB traffic monitoring for that specific registration area is increased, for example to 50%, still using random IMSI sampling.
  • Another option is to increase the MBB traffic monitoring to 100%. In this case, no sampling is needed in relation to the problematic registration area.
  • a subscriber group is identified in steps 708 and 712 for which RTP traffic metrics are degraded (e.g., subscribers having a particular subscription type).
  • one of the analytics components 220A generates an IMSI white list, which includes the subscribers belonging to the identified subscriber group. This white list is configured at the UPFs. The UPFs will then only send events related to the subscribers in the white list in addition to the random 10% of subscribers.
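A UPF-side event filter combining such a white list with per-registration-area sampling rates could look as follows. The function and parameter names are illustrative assumptions, not an actual UPF configuration API; the hash-based sampling again just keeps the subscriber selection stable.

```python
import hashlib

def forward_event(imsi: str, registration_area: str, rates: dict,
                  white_list: frozenset = frozenset(),
                  default_rate: float = 0.10) -> bool:
    """Return True if the event for this subscriber should be reported."""
    if imsi in white_list:
        return True  # white-listed subscriber group: always report
    # Per-area override configured during troubleshooting,
    # e.g. {"RA-13816": 0.5} to raise monitoring in a degraded area.
    rate = rates.get(registration_area, default_rate)
    bucket = int(hashlib.sha256(imsi.encode()).hexdigest(), 16) % 1000
    return bucket < int(rate * 1000)
```

Setting an area's rate to 1.0 corresponds to the "100% monitoring, no sampling" option described above.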
  • the increased volume of monitored MBB traffic may be analyzed further based on the "primary" (see step 706) and/or a "secondary" attribute dimension, see step 716.
  • the following MBB data service KPIs are examples of what can be analyzed in relation to both the "primary" and "secondary" dimensions.
  • Some traffic metrics are applicable for any traffic type (e.g., throughput) while some others (e.g., stall time ratio) are specific to certain traffic types or services (e.g., video):
    - throughput, bitrate
    - packet loss ratio, packet retransmission ratio, round trip time
    - video stall time ratio, video resolution, video MOS
    - web page access time, web page download success ratio
  • secondary attribute dimensions are data service specific, hence they can be analyzed during detailed MBB-based troubleshooting in step 716 (although not applicable as "primary" attribute dimension for voice traffic).
  • these attribute dimensions are analyzed to set the right scope of the very detailed data collection.
  • identification of a certain problematic attribute dimension can show the root cause itself or can guide the troubleshooting process to find the root cause of the service quality degradation.
  • Exemplary "secondary" attribute dimensions include:
    - data network
    - service provider
    - service functionality (e.g., video, gaming, etc.)
    - other traffic classification type attributes
    - client application
    - radio quality parameters, such as Reference Signal Received Power/Quality (RSRP/RSRQ), etc.
  • the required sample size is calculated based on the required target precision.
  • the mean value follows a normal distribution.
  • the width of the confidence interval of the mean is 2*Z*s/sqrt(n), where Z is the value of the standard normal (Z) distribution at the chosen confidence level (e.g., 95%), s is the standard deviation of the population, and n is the sample size. Based on this formula, the required number of samples for a target confidence interval can be determined. Reference is now made to step 718.
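Solving the relation above for n gives the required sample size for a target confidence-interval width W: n = (2*Z*s/W)^2. A brief sketch (Z = 1.96 corresponding to a 95% confidence level):

```python
import math

def required_samples(std_dev: float, target_width: float, z: float = 1.96) -> int:
    """Minimum sample size so that the confidence interval of the mean,
    2*z*std_dev/sqrt(n), is no wider than target_width."""
    return math.ceil((2 * z * std_dev / target_width) ** 2)
```

For example, estimating a throughput KPI with population standard deviation 2.0 Mbit/s to within a 0.5 Mbit/s wide 95% interval requires 246 samples; halving the target width roughly quadruples the sample count.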
  • the MBB-related traffic metric values are calculated for these one or more attribute values and are compared with the ones for other attribute values of the same attribute dimension.
  • If it is found in step 720 that they are not different at the chosen confidence level (e.g., the confidence intervals of these values overlap), the sampling rate for these dimensions is restored in step 722 to the basic level (e.g., 10%). Additionally, or in the alternative, the MBB-related traffic metric values calculated for the "problematic" one or more attribute values as derived at the lower sampling rate are compared with those traffic metric values that have been calculated at the higher sampling rate; again, if they are not different at the chosen confidence level, the sampling rate is restored in step 722 to the basic level. The method then enters the "normal operation" phase again.
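The overlap check used in step 720 could be sketched as follows, again assuming normally distributed sample means and Z = 1.96 for a 95% confidence level:

```python
import math

def intervals_overlap(mean_a, std_a, n_a, mean_b, std_b, n_b, z=1.96):
    """True if the confidence intervals of two sample means overlap, i.e.
    the two metric values are not different at the chosen confidence level."""
    half_a = z * std_a / math.sqrt(n_a)  # half-width of interval A
    half_b = z * std_b / math.sqrt(n_b)  # half-width of interval B
    return abs(mean_a - mean_b) <= half_a + half_b
```

When this returns True for the "problematic" attribute value, the increased sampling rate can be set back to the basic level.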
  • If it turns out in step 720 that the issue giving rise to the network performance degradation could not be fixed, or that no correlation has been found, a more detailed and possibly manual troubleshooting is performed in step 724.
  • a troubleshooting example will be described with reference to the schematic diagrams of Figs. 8A to 12 and in the context of Fig. 7.
  • Assume that a network performance degradation is detected for the first type of network traffic (i.e., voice traffic or other real-time traffic).
  • two RTP traffic metric values are found to be problematic (e.g., above a given threshold) for the attribute dimension "Tracking Areas", concretely for the attribute values "TAC ID 13816" and possibly "TAC ID 11456". This situation is illustrated in Fig. 8A (RTP packet loss) and Fig. 8B (RTP forward jumps).
  • In step 714, the monitoring of the MBB traffic is increased in volume for the "worst" TAC ID 13816 and possibly the "second worst" TAC ID 11456 as well.
  • "full” monitoring is focused on one or two attribute values of a given attribute dimension only.
  • the result of the increased monitoring is illustrated in Fig. 10B, which shows that the video quality issues actually happen in relation to one dedicated service provider (here: TikTok). Accordingly, troubleshooting can be focused on a limited number of tracking areas and a particular service provider. As an example, it may be guessed that routing issues or issues with server settings may exist in hardware installed by that service provider in that particular tracking area.
  • the diagrams of Figs. 11A and 11B show a comparison of the MBB- related traffic metric "averaged downlink TCP session throughput for classified traffic" for the attribute dimensions "tracking area” (here: TAC 13816) and "service provider” (here: “Facebook”, "TikTok” and “Netflix”) at a low sampling rate (Fig. 11A) and an increased sampling rate (Fig. 11B).
  • the diagram of Fig. 12 illustrates that not every "primary" attribute dimension that is identified as problematic from an RTP- or voice-related point of view is problematic from an MBB-related point of view as well.
  • the "worst" TAC 13816 was indeed problematic for MBB-related traffic for one service provider, as explained above, but the "second worst" TAC 11456 was not problematic for any service provider. As such, increased monitoring can immediately be set back for TAC 11456 as no correlation could be found (see steps 718, 720 and 722 in Fig. 7).
  • the increased monitoring can be continued until the issue has been fixed (optionally in cooperation with the service provider) before it is also set back after having checked that the issue has indeed been fixed.
  • the resource footprint of network traffic monitoring can be kept at an optimally low level.
  • the technique presented herein reduces the overall volume of network traffic that has to be monitored (e.g., because random sampling is applied to "heavy volume"-type traffic such as MBB traffic), while still allowing a reliable detection of network performance degradations (e.g., because all or a significant part of a less voluminous type of real-time traffic is monitored). Upon detecting such a degradation, the technique allows focusing monitoring to a possibly problematic portion of the network traffic.

Abstract

A technique for controlling monitoring of network traffic in a communication network is presented, wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions. Monitoring of the first type of network traffic yields first data sets, with each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value (e.g., a particular cell identifier) of one of the one or more network attribute dimensions (e.g., all cells). A method aspect comprises the steps of analyzing the first data sets to detect at least one first traffic metric value indicative of a network performance degradation, identifying the network attribute value associated with the detected first traffic metric value, and controlling monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.

Description

Technique for controlling network traffic monitoring

Technical Field
The present disclosure generally relates to the monitoring of network traffic. In particular, a technique for dynamically monitoring network traffic of different types in a communication network is presented. The technique may be implemented as a method, a computer program product, an apparatus or a system.
Background

Network management is an important feature of modern wired and wireless communication networks. Network management in particular allows "troubleshooting" when quality of service issues or other network performance degradations are detected.
Proper network management decisions require a continuous collection and analysis of a plethora of network-related events occurring locally within the managed network and reported by that network to a network management domain. In subscription-based communication networks, the network events are often reported on a subscriber level to achieve a sufficiently high resolution for network analysis. For network management purposes, the network events are typically processed in the form of data sets, and the data sets can include network event information in a possibly aggregated (e.g., averaged) form. In a given data set, a value pertaining to a certain traffic metric such as packet loss, video stall time or bitrate may be associated with a value of a network attribute indicative of one or more network entities for which the traffic metric value has been obtained. Different attribute values (e.g., different cell identifiers or terminal device types) may be defined per network attribute dimension (e.g., "network cell" or "terminal device") such that the attribute values are mutually exclusive to allow a "drill down" for troubleshooting. In a given data set, the traffic metric value may have been obtained by aggregating individual traffic metric values across a certain population of subscribers or subscriber sessions all associated with the attribute value in the data set. Traditional network event collection is based on passive probing of, or pre-configured event reporting by, different network functions of a communication network. In the case of certain wireless communication networks, those network functions stretch over different network domains, such as a radio access network domain and a core network domain.
While the volume of reported network events is already significant in wireless communication networks of the 4th Generation (4G), the event reporting volume is expected to drastically increase with the ongoing deployment of 5th Generation (5G) networks (also called New Radio, NR, networks). This increase is partly due to higher numbers of terminal devices of new kinds, including Internet of Things (IoT) devices, and partly the result of new service types that will become available in 5G networks.
Short reaction times in network management are desirable and require real-time analytics solutions, which in turn consume considerable processing and storage resources. As an example, it is expected that event collection by user plane probing in a 5G network will easily result, per core network site, in several terabits of user plane traffic that needs to be processed and evaluated in real time. A similar situation will arise in the radio access network domain as a result of the increasing numbers of terminal devices and network cells. Evidently, significant server capacities, and also significant electric power, will be consumed in this regard.
Attempts have been made to reduce the reported volume of network events. For example, it has been suggested to apply random event sampling techniques to reduce the amount of data that needs to be analyzed for network management purposes. However, such random sampling of network events has in some cases been found to reduce the efficiency of detecting network anomalies as it cannot be ensured that, for example, problematic communication sessions are not "filtered out" in view of the applied randomness. On the other hand, a continuous and full traffic coverage by network monitoring is - for the reasons set out above - likewise problematic in certain cases.
Summary
Accordingly, there is a need for a network monitoring control technique that is resource efficient while still enabling a reliable detection of a network performance degradation. According to a first aspect, a method of controlling monitoring of network traffic in a communication network is presented, wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions. Monitoring of the first type of network traffic yields first data sets, with each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value of one of the one or more network attribute dimensions. The method comprises analyzing the first data sets to detect at least one first traffic metric value indicative of a network performance degradation, identifying the network attribute value associated with the detected first traffic metric value, and controlling monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.
According to a second aspect, a computer program product is presented. The computer program product comprises program code portions for performing the steps of the method presented herein when the computer program product is executed on one or more processors. The computer program product may be stored on a computer-readable recording medium.
Also presented is an apparatus for controlling monitoring of network traffic in a communication network, wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions. Monitoring of the first type of network traffic yields first data sets, each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value of one of the one or more network attribute dimensions. The apparatus is configured to analyze the first data sets to detect at least one first traffic metric value indicative of a network performance degradation, to identify the network attribute value associated with the detected first traffic metric value, and to control monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.
Also presented is a network management system comprising the apparatus for controlling monitoring of network traffic.

Brief Description of the Drawings
Further aspects, details and advantages of the present disclosure will become apparent from the detailed description of exemplary embodiments below and from the drawings, wherein:
Fig. 1 is a diagram illustrating a system embodiment of the present disclosure;
Fig. 2 is a block diagram illustrating an embodiment of a monitoring control apparatus in accordance with the present disclosure;
Fig. 3 is a flow diagram of a method embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a collection of data sets in accordance with the present disclosure;
Figs. 5 & 6 are signalling diagrams according to embodiments of the present disclosure;
Fig. 7 is a flow diagram illustrating a further method embodiment of the present disclosure;
Figs. 8A - 12 are schematic diagrams illustrative of monitoring results.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details.
While, for example, some embodiments of the following description focus on an exemplary core network configuration in accordance with 5G specifications, the present disclosure is not limited in this regard. In particular, the present disclosure could also be implemented in other wired or wireless communication networks (e.g., according to 4G specifications). Those skilled in the art will further appreciate that the steps, services and functions explained herein may be implemented using individual hardware circuits, using software functioning in conjunction with a programmed microprocessor or general purpose computer, using one or more Application Specific Integrated Circuits (ASICs) and/or using one or more Digital Signal Processors (DSPs). It will also be appreciated that when the present disclosure is described in terms of a method, it may also be embodied in one or more processors and one or more memories coupled to the one or more processors, wherein the one or more memories store one or more computer programs that perform the steps, services and functions disclosed herein when executed by one or more processors.
In the following description of exemplary embodiments, the same reference numerals denote the same or similar components. Fig. 1 illustrates an embodiment of a system 10 in which the present disclosure can be implemented. The system 10 comprises a communication network domain 100 configured to monitor network traffic and a network management (NM) domain 200 configured to control network traffic monitoring in the communication network domain 100 and to analyze the monitoring results.
In the embodiment of Fig. 1, the communication network to be monitored is configured as a wireless cellular communication network. As such, the communication network domain 100 comprises one or more wireless terminal devices 110, a radio access network (RAN) domain 120 and a core network (CN) domain 130, as generally known in the art. The RAN domain 120 and the CN domain 130 each comprises a large number of network functions (NFs). A particular NF may be a software entity (e.g., implemented using cloud computing resources), a stand-alone hardware entity (e.g., in the form of a network node), or a combination thereof. In some variants, the NFs may conform to the definitions of "network functions" as standardized by 3GPP in its 5G specifications, but in other variants (e.g., in 4G implementations) this may not be the case.
The NM domain 200 comprises an event collector 210 configured to receive and, optionally, store and pre-process network event information resulting from network monitoring. The NM domain 200 further comprises a monitoring control apparatus 220 configured to analyze the (pre-processed) event information to arrive at monitoring control decisions. As understood herein, network events are to be construed broadly. Network events generally characterize what is happening in the communication network domain 100, such as session initiation or termination, the status of an ongoing session, transmission of a certain amount of data and so on. So-called Key Performance Indicators (KPIs), usually numeric values, can be reported as events as such or as characteristic parameters of one or more events, such as session initiation time, the ratio of unsuccessful session initiations, the amount of transmitted bytes over a given amount of time and so on. An event can be reported when it is locally detected at a dedicated monitoring site (e.g., a dedicated NF) or in response to probing. The network events can be standardized (e.g., 4G or 5G) signalling events or vendor-specific events (of, e.g., a network node acting as NF). Event probing may be performed in the communication network domain 100 to capture the events at a network interface, or to capture user plane traffic, sample it and generate user plane traffic metrics that are to be reported as one or more events.
KPIs can be calculated from, or attributed to, one or multiple events. As an example, a handover failure can be reported in an event. Exemplary KPIs calculated from such events, either locally in the communication network domain 100 or centrally in the NM domain 200, are the number of handover failures or the ratio of handover failures to total handovers in a certain period of time. As another example, an NF user plane probe may report a throughput event every 5 s in a dedicated event report. An average throughput KPI can be calculated locally or centrally as the average of these throughputs over 1 min, and a maximum throughput KPI can be calculated locally or centrally as the maximum of the reported throughputs in 1 min.
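The local or central KPI calculation described above can be sketched as follows. This is a purely illustrative, non-limiting Python sketch; the event representation (a list of (timestamp, throughput) tuples) and all names are assumptions made for illustration only:

```python
from collections import defaultdict

def throughput_kpis(events, window_s=60):
    """Aggregate per-5-second throughput events into per-window
    average and maximum throughput KPIs, keyed by window start time."""
    windows = defaultdict(list)
    for timestamp, throughput in events:
        windows[timestamp - (timestamp % window_s)].append(throughput)
    return {
        start: {"avg": sum(values) / len(values), "max": max(values)}
        for start, values in windows.items()
    }

# Twelve 5-second throughput reports spanning one minute
events = [(t, 10.0 + (t % 15)) for t in range(0, 60, 5)]
kpis = throughput_kpis(events)
```

The same aggregation can be performed either locally at the probing NF or centrally after event collection; only the placement of the computation differs.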
In the following, an embodiment of the monitoring control apparatus 220 of Fig. 1 will be described with reference to Fig. 2, and operational details of the monitoring control apparatus 220 will be described with reference to a method embodiment as illustrated in flow diagram 300 of Fig. 3.
In the apparatus embodiment illustrated in Fig. 2, the monitoring control apparatus 220 comprises a processor 222 and a memory 224 coupled to the processor 222. The memory 224 stores program code (e.g., in the form of a set of instructions) that controls operation of the processor 222 so that the monitoring control apparatus 220 is operative to perform any of the method aspects presented herein (see Fig. 3). As understood herein, a processor, such as the processor 222, may be implemented using any processing circuitry and is not limited to, for example, a single processing core, but may also have a distributed topology (e.g., using cloud computing resources). The monitoring control apparatus 220 further comprises an input interface 226 and an output interface 228. The two interfaces 226, 228 are configured for communication with the event collector 210 on the one hand and the communication network domain 100 (e.g., individual NFs therein) on the other hand.
Now referring to the flow diagram 300 of Fig. 3, operation of the monitoring control apparatus 220 comprises processing of data sets that include (possibly pre-processed, such as aggregated) event information obtained from monitoring network traffic in the communication network domain 100. In this regard, the flow diagram 300 illustrates a step 302 of analyzing, by the monitoring control apparatus 220, a collection of first data sets to detect at least one first traffic metric value indicative of a network performance degradation. The first data sets may be stored on the event collector 210 and accessed by the monitoring control apparatus 220 via its input interface 226.
The network traffic in the communication network domain 100 comprises network traffic of a first type and network traffic of a second type, wherein the first data sets analyzed in step 302 have been obtained for the first network traffic type. The network traffic of the first type is, or includes, at least one of real-time traffic, voice traffic and uplink traffic. Uplink traffic refers to traffic originating at the terminal devices 110. The network traffic of the first type may be governed by at least one of a connectionless communication protocol and the Real-time Transport Protocol (RTP).
It has been found that traffic governed by a connectionless communication protocol such as RTP is particularly sensitive to degradations due to transport or radio issues.
The network traffic of the first type may have a lower traffic volume per predefined period of time than the network traffic of the second type. As an example, voice traffic, as an exemplary network traffic of the first type, is real-time traffic that accounts for only around 1-5% of the total traffic volume, the latter being dominated by MBB traffic.
The network traffic of the second type is, or includes, at least one of non-real-time traffic, service traffic (in particular multimedia streaming traffic or Internet traffic), Mobile Broadband (MBB) traffic and downlink traffic. The network traffic of the second type may have a higher traffic volume per predefined period of time than the network traffic of the first type. The network traffic can further be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions (abbreviated as "attribute values" and "attribute dimensions" hereinafter). Each attribute dimension may define a set of possible sources of the network performance degradation. As such, an attribute dimension can also be viewed as defining a set of network entities that each may individually degrade network performance due to, for example, a malfunction. The attribute values spanning a given attribute dimension may define mutually exclusive sub-sets of one or more such network entities (e.g., to allow a proper "drill down" for troubleshooting purposes in case network performance degradations are detected in step 302).
The one or more attribute dimensions may, for example, comprise one or more of:
a) at least one network subscription-related dimension for a subscription-based communication network (e.g., subscription type, roaming status, etc.);
b) at least one terminal device-related dimension for a communication network comprising individual terminal devices (e.g., terminal type, terminal model, terminal vendor, terminal capabilities, etc.);
c) at least one network hierarchy-related dimension for a communication network split in multiple hierarchy levels (e.g., RAN node vs. CN node, network slice, etc.); and
d) at least one network geography-related dimension for a communication network split in dedicated geographical regions (e.g., network cell, routing area, tracking area, registration area, etc.).
Each of those attribute dimensions comprises a set of mutually exclusive (numerical or non-numerical) attribute values, or simply attributes. For example, the attribute values of the dimension "network cell" can be cell identifiers, the attribute values of the dimension "terminal type" can be "smartphone", "dongle" or "IoT device", and similarly for other dimensions.
Monitoring of the first type of network traffic yields first data sets (see step 302 in Fig. 3), with each first data set being indicative of a dedicated value of a (possibly aggregated) first traffic metric and an associated attribute value of one of the one or more attribute dimensions. Similarly, monitoring of the second type of network traffic may in some variants yield corresponding second data sets, with each second data set being indicative of a dedicated value of a (possibly aggregated) second traffic metric and an associated attribute value of one of the one or more attribute dimensions.
The second traffic metric may be different from or identical with the first traffic metric. The network traffic of the first and/or second type may be packet-based, and the first and/or second traffic metric may be a packet-based traffic metric. Alternatively, or in addition, the network traffic of the second type may relate to multimedia streaming, and the second traffic metric may be a multimedia streaming-related traffic metric (e.g., a video-related KPI, such as video stall time). In addition, or alternatively, the network traffic of the second type may relate to an Internet service, and the second traffic metric may be an Internet service-related traffic metric.
The content of the first and second data sets, in particular in regard to the traffic metric values, is at least partially derived from event information that has been obtained (e.g., measured) in the communication network domain 100 for the associated traffic type before being communicated to the NM domain 200 (see the two arrows in the center of Fig. 1). Optionally, the event information may enter a particular data set in aggregated form, for example aggregated across subscribers or subscriber sessions associated with the attribute value in the data set and/or across a certain period of time (and possibly averaged). As such, an aggregated traffic metric value can be obtained by aggregating non-aggregated subscriber-related or subscriber session-related traffic metric values across those monitored subscribers or subscriber sessions that comply with the attribute value that is associated with the traffic metric value in a given data set. Aggregation may occur in one or both of the communication network domain 100 and the NM domain 200 (e.g., by the event collector 210). Further optionally, the event information may be "enriched" (e.g., by the event collector 210 or by a local monitoring site, such as a dedicated NF, in the communication network domain 100) with further information, such as attribute-related information. Such further information may be obtained from an information source different from a local monitoring site in the communication network domain 100.
A given data set thus associates a value pertaining to a certain traffic metric (such as packet loss, video stall time or bitrate) with a value of an attribute indicative of an attribute dimension for which the traffic metric value has been obtained. Different attribute values (e.g., different Tracking Area Codes, TACs) are defined per attribute dimension (e.g., Tracking Area, TA). A particular attribute value (e.g., TAC ID1) is associated, in a data set, with a value of a given traffic metric (e.g., average packet loss or any video-related KPI such as video stall time). Fig. 4 illustrates a data storage (e.g., a database) in the event collector 210 and the individual data sets collected therein. An exemplary first subset of those data sets associates different TAC IDs with corresponding average packet losses per TAC, and an exemplary second subset associates the same TAC IDs with corresponding video KPIs. In a given data set, the traffic metric value included therein may have been obtained based on aggregating (e.g., averaging) individual traffic metric values across a period of time and across a certain population of subscribers or subscriber sessions all associated with a particular attribute value, such as a given TAC ID.
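The aggregation of per-session metric values into per-attribute-value data sets (e.g., average packet loss per TAC ID, as in Fig. 4) might, purely by way of illustration, look as follows in Python; the record layout and field names (`tac_id`, `packet_loss`) are hypothetical:

```python
from collections import defaultdict

def aggregate_by_attribute(session_records, dimension, metric):
    """Aggregate per-session metric values into one data set per
    attribute value of the given attribute dimension (mean value)."""
    groups = defaultdict(list)
    for record in session_records:
        groups[record[dimension]].append(record[metric])
    return [
        {dimension: value, metric: sum(vals) / len(vals)}
        for value, vals in groups.items()
    ]

# Hypothetical per-session packet loss values for two tracking areas
sessions = [
    {"tac_id": "TAC1", "packet_loss": 0.01},
    {"tac_id": "TAC1", "packet_loss": 0.03},
    {"tac_id": "TAC2", "packet_loss": 0.20},
]
data_sets = aggregate_by_attribute(sessions, "tac_id", "packet_loss")
```

The same session records could be aggregated along a different dimension (e.g., terminal vendor) to derive a different subset of data sets from the same metric values.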
The two subsets of data sets illustrated in Fig. 4 may all pertain to the first type of network traffic. Alternatively, one of the subsets may pertain to the first type of network traffic and the other of the subsets may pertain to the second type of network traffic. It is to be noted that different traffic metric types may be available for the different types of network traffic, and that not all traffic metric types may be available for all network traffic types.
It will be appreciated that the data sets illustrated in Fig. 4 could be stored in any format, for example as a table, list, etc. It will further be appreciated that more than two traffic types may be defined. Similarly, more than two different subsets of data sets may be provided by the event collector 210 for analysis by the monitoring control apparatus 220.
To avoid the drawbacks of randomly sampling individual subscribers or subscriber sessions, all (or at least over 50%) of the network subscribers or network subscriber sessions for which network traffic of the first type is detected at a given monitoring site (e.g., a given NF) may be monitored. As such, the data sets analyzed in step 302 may be (almost) complete in this regard.

Returning to Fig. 3, the method further comprises a step 304 of identifying the attribute value associated with the particular traffic metric value that was detected (e.g., using a threshold decision) in step 302 to be indicative of a network performance degradation. To this end, the attribute value may be read from the data set in which the particular traffic metric value was detected.
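Steps 302 and 304 - detecting a degraded first traffic metric value by a threshold decision and reading out the associated attribute value - can be sketched as follows; the data set layout and the threshold value are illustrative assumptions:

```python
def detect_degradations(first_data_sets, thresholds):
    """Flag data sets whose first-traffic-metric value breaches a
    per-metric threshold (step 302) and return the associated
    attribute values for subsequent drill-down (step 304)."""
    degraded = []
    for ds in first_data_sets:
        limit = thresholds.get(ds["metric"])
        if limit is not None and ds["value"] > limit:
            degraded.append((ds["dimension"], ds["attribute"]))
    return degraded

# Hypothetical first data sets for RTP packet loss per tracking area
data_sets = [
    {"dimension": "tac_id", "attribute": "TAC1",
     "metric": "rtp_packet_loss", "value": 0.01},
    {"dimension": "tac_id", "attribute": "TAC2",
     "metric": "rtp_packet_loss", "value": 0.20},
]
hits = detect_degradations(data_sets, {"rtp_packet_loss": 0.05})
```

Here only TAC2 breaches the assumed 5% packet loss threshold and would be identified in step 304.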
The method continues with controlling, in step 306 of Fig. 3, monitoring of the second type of network traffic to increase in volume for the identified attribute value, or for an attribute value having the potential of correlating with the identified attribute value. The identified attribute value and the attribute value having the potential of correlating with the identified attribute value may relate to the same possible source of network performance degradation. When a network performance degradation is detected, monitoring of the second type of network traffic may increase in volume for an attribute dimension specifically related to the network traffic of the second type. The attribute dimension related to the network traffic of the second type may not be available for the network traffic of the first type (but may, in some variants, have the potential of correlating therewith). The non-availability may be due to inherent differences between the two types of network traffic. As an example, if the first type of network traffic is real-time (e.g., voice) traffic and the attribute dimension is related to real-time traffic, such an attribute dimension will not be available if the second type of network traffic is MBB traffic (e.g., video streaming).
To effect the control in step 306, a monitoring control command may be transmitted by the monitoring control apparatus 220 to the communication network domain 100. Transmission of such a control command is illustrated by an arrow on the right-hand side of Fig. 1.
Controlling monitoring of the second type of network traffic to increase in volume may comprise at least one of (i) increasing a traffic sampling rate at a given traffic monitoring site (e.g., a given NF) in the communication network domain 100 and (ii) suitably adjusting a traffic filter at a given monitoring site. The monitoring control command sent in step 306 may thus be indicative of an increased sampling rate to be applied to the second type of network traffic in regard to subscribers or subscriber sessions. Alternatively, or in addition, the monitoring control command may be indicative of a traffic filter setting to be adjusted so that more (e.g., all) of the network traffic of the second type is monitored. In some cases, the traffic filter setting may define a set of subscribers for which subscriber sessions are to be monitored for event reporting purposes. In such a case, the traffic filter setting may comprise a white list of subscribers to be monitored or a black list of subscribers not to be monitored. The corresponding list may be defined using Subscription Permanent Identifiers (SUPIs), International Mobile Subscriber Identities (IMSIs) or any other identifier type. The list may, for example, include or exclude certain subscribers based on consent or subscription type. Therefore, controlling monitoring of the second type of network traffic to increase in volume may comprise increasing a percentage of network subscribers or network subscriber sessions for which network traffic of the second type is detected at a given monitoring site.
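A monitoring control command as discussed above might, as a purely illustrative sketch, be assembled as follows; the command fields (`action`, `sampling_rate`, `supi_whitelist`) are hypothetical and do not correspond to any standardized interface:

```python
def build_control_command(attribute_dimension, attribute_value,
                          sampling_rate, subscriber_whitelist=None):
    """Assemble a step-306 monitoring control command that raises the
    sampling rate for second-type (e.g., MBB) traffic matching the
    identified attribute value, optionally narrowed by a white list."""
    command = {
        "action": "increase_monitoring",
        "traffic_type": "second",        # e.g., MBB traffic
        "filter": {attribute_dimension: attribute_value},
        "sampling_rate": sampling_rate,  # fraction of sessions, 0..1
    }
    if subscriber_whitelist is not None:
        command["filter"]["supi_whitelist"] = list(subscriber_whitelist)
    return command

# Raise MBB monitoring to full coverage for the degraded tracking area
cmd = build_control_command("tac_id", "TAC2", sampling_rate=1.0)
```

Such a command would be transmitted by the monitoring control apparatus 220 via its output interface 228 towards the affected monitoring sites.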
In a first implementation, the second type of network traffic is not monitored at all prior to step 306. After controlling monitoring of the second type of network traffic to increase in volume in step 306, the second type of network traffic is monitored to yield the second data sets that are each indicative of a dedicated value of the second traffic metric and associated with the identified attribute value, or the attribute value having the potential of correlating therewith.
In a second implementation, prior to step 306, the second type of network traffic is already monitored to yield a certain number of the second data sets over a predefined period of time. Then, after controlling monitoring of the second type of network traffic to increase in volume, the second type of network traffic is monitored to yield a higher number of second data sets than before controlling monitoring of the second type of network traffic to increase in volume in step 306.
For troubleshooting, in both implementations a possible source of the network performance degradation may be identified based at least on the second data sets yielded after controlling monitoring of the second type of network traffic to increase in volume. In the second implementation, differences in the second traffic metric values of second data sets yielded before and after controlling monitoring of the second type of network traffic to increase in volume may be evaluated. If no (or no substantial) differences are found, monitoring of the second type of network traffic may be controlled to decrease in volume again for the identified attribute value, or for the attribute value having the potential of correlating with the identified attribute value.
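The decision to decrease monitoring volume again when no substantial differences are found might be sketched as follows; the relative-change criterion and its default threshold are illustrative assumptions:

```python
def should_revert(before_values, after_values, min_relative_change=0.1):
    """Return True if the second-metric values did not change
    substantially after increased monitoring, in which case the
    monitoring volume can be decreased again."""
    if not before_values or not after_values:
        return False
    before_mean = sum(before_values) / len(before_values)
    after_mean = sum(after_values) / len(after_values)
    if before_mean == 0:
        return after_mean == 0
    # "No substantial difference" modeled as a small relative change
    return abs(after_mean - before_mean) / abs(before_mean) < min_relative_change

revert = should_revert([1.0, 1.0], [1.02, 0.98])
```

Other difference criteria (e.g., statistical tests on the metric distributions) could equally be used here.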
In some variants, the first data sets are analyzed in step 302 in regard to a first attribute dimension ("primary" attribute dimension) to detect the at least one first traffic metric value indicative of a network performance degradation. After initiation of step 306, the increased volume of the monitored second type of network traffic may then be analyzed in regard to a second attribute dimension ("secondary" attribute dimension) different from the first attribute dimension so as to localize a possible source of the network performance degradation. Evidently, the increased volume of the monitored second type of network traffic may also be analyzed in regard to the first attribute dimension.
In the following, further aspects of the above embodiments will be described with specific reference to communication networks of the 4G and 5G types as illustrated in Figs. 5 and 6. It will be evident that many of those aspects are not specifically limited to a 4G or 5G implementation.
As illustrated in Fig. 5 and explained above with reference to Fig. 1, the NM domain 200 comprises an event collector 210 and a monitoring control apparatus 220. In the embodiment of Fig. 5, the monitoring control apparatus 220 comprises one or more network analytics components 220A configured to perform at least steps 302 and 304 of Fig. 3 and a monitoring controller 220B configured to perform at least step 306 of Fig. 3. The monitoring controller 220B is provided as an extra control layer between the one or more analytics components 220A on the one hand and the RAN and CN domains 120, 130 on the other hand.
The analytics components 220A may be configured as customer experience management (CEM) systems or subscriber analytics systems (such as Ericsson Expert Analytics, EEA, systems). The analytics components 220A may be comprised by one or more of network operation centres (NOCs), service operation centres (SOCs) and network optimization engineering (NOE) systems. In some implementations, the analytics components 220A are configured to monitor and analyze service quality and network quality on a subscriber level. The analytics components 220A may be software entities implemented, for example, using cloud computing resources, hardware entities, or combinations thereof. The analytics components 220A are each configured to send network analytics requests to the event collector 210, which receives these requests via a dedicated interface 210A. Moreover, the event collector 210 comprises a further dedicated interface 210B towards the RAN domain 120 and the CN domain 130 to receive network event information.
The RAN and CN domains 120, 130 comprise a plethora of NFs 122, 132, respectively. Each NF 122, 132 comprises a bi-directional communication link to the NM domain 200 for receiving monitoring control commands from the NM domain 200 on the one hand and reporting network information resulting from the monitoring to the NM domain 200 on the other hand. The exemplary NFs 122, 132 of Fig. 5 belong to a 4G/5G wireless communication network as standardized by the 3rd Generation Partnership Project (3GPP). In more detail, the CN domain 130 comprises, inter alia, multiple User Plane Functions (UPFs), a Session Management Function (SMF) and an Access and Mobility Management Function (AMF). While not shown in Fig. 5, the CN domain 130 may, for example, additionally comprise a Mobility Management Entity (MME) and gateways, such as a Serving Gateway (SGW) and a Packet Data Network Gateway (PGW), see also Fig. 6. The RAN domain 120 comprises multiple base stations in the form of so-called 4G eNodeBs (eNBs) and 5G gNodeBs (gNBs).
The network scenario of Fig. 6 illustrates further aspects of a 4G/5G communication network with dedicated communication interfaces between the various NFs and the terminal device (also called User Equipment, UE, 110). As is well known, such communication network types comprise a user plane on which network traffic is routed as well as a control plane that is, inter alia, used to control network traffic routing.
Fig. 6 illustrates that the SGW connects a 4G Evolved Packet Core (EPC) part of the CN domain 130 towards the RAN domain 120, while the PGW connects the EPC to an IP network, such as an IP Multimedia Subsystem (IMS) 134. In a 5G part of the CN domain 130, those connections are provided by the UPF.
IMS 134 provides control and media functions for real-time voice services (such as Voice over LTE, VoLTE, or Voice over NR, VoNR) and other real-time services. As such, event information pertaining to VoLTE- or VoNR-related (or other real-time) user plane traffic - as an exemplary first type of network traffic - can be obtained from the IMS 134 and/or various 4G/5G NFs 132 in the CN domain 130, such as the UPF, SGW and/or PGW (see thick arrows in Fig. 6). Network event monitoring at those event capture points can be performed using, for example, physical probes, software probes or node logs. In a similar manner, network event reporting can be performed in parallel for a second type of network traffic, for example MBB traffic.
Prior to reporting the event information resulting from monitoring of the user plane in the communication network domain 100, this information can be enriched with one or more attribute values of one or more attribute dimensions locally available at user plane event capture points or received in reports from control plane-related NFs. This enrichment can be based on correlating information from the user plane and the control plane, using for example one or both of Fully Qualified Tunnel Endpoint IDs (FTEIDs) and Fully Qualified Session Endpoint IDs (FSEIDs) in the case of NFs 132 in the CN domain 130 (as one of these IDs will always be available on both the user plane and the control plane). The correlation and enrichment with attribute values can additionally or alternatively be done using Internet Protocol (IP) addresses related to voice sessions in the case of the IMS 134 and/or using Border Gateway Function (BGF) and Session Border Gateway (SBG) data. The BGF and the SBG are two NFs within the IMS 134. The correlation of user plane and control plane data, which is explained above, can be done using IP addresses obtained from the signalling procedures between the BGF and the SBG.
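As a non-limiting illustration of this correlation-based enrichment, the following Python sketch joins user plane events with control plane records on a shared endpoint identifier; the field names (`fteid`, `tac_id`, `terminal_vendor`) are hypothetical:

```python
def enrich_events(user_plane_events, control_plane_records, key="fteid"):
    """Enrich user plane events with attribute values (e.g., tracking
    area code, terminal vendor) taken from control plane records that
    share the same tunnel/session endpoint identifier."""
    attributes = {rec[key]: rec for rec in control_plane_records}
    enriched = []
    for event in user_plane_events:
        extra = attributes.get(event[key], {})
        # Event fields take precedence on key conflicts
        enriched.append({**extra, **event})
    return enriched

events = [{"fteid": "F1", "rtp_packet_loss": 0.02}]
records = [{"fteid": "F1", "tac_id": "TAC2", "terminal_vendor": "A"}]
out = enrich_events(events, records)
```

The enriched events then carry the attribute dimensions needed for per-attribute aggregation and drill-down.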
In the following, a further method embodiment will be discussed with reference to the flow diagram 700 of Fig. 7. While this method embodiment will partially be described with reference to the network scenario of Fig. 6, it may likewise be implemented in the more general network contexts of Fig. 1 or 5. Moreover, while the method embodiment will exemplarily be described with RTP-controlled uplink voice traffic (e.g., VoLTE or VoNR traffic) as the first network traffic type and MBB traffic as the second network traffic type, the method embodiment may also be practiced in any other traffic scenarios.
The method illustrated in Fig. 7 includes two dedicated phases, namely a "normal operation" phase that is followed by a "troubleshooting" phase in case a network performance degradation has been detected. From the "troubleshooting" phase, the method may loop back to the "normal operation" phase.
As far as network traffic monitoring is concerned, the two phases essentially differ from each other in that monitoring of the second type of network traffic increases in volume in the "troubleshooting" phase compared to the "normal operation" phase. This also means that the hardware and software resources consumed by the network monitoring in the communication network domain 100 can be reduced in the "normal operation" phase, while - in the exemplary embodiment of Fig. 7 - the first type of network traffic is fully monitored so as to increase the likelihood of detecting a network performance degradation. It has been found that real-time traffic, which constitutes the first type of network traffic in the scenario of Fig. 7, is particularly sensitive to any network performance degradation and can thus be considered an "early indicator" of any issues that may also affect other traffic types.

Measurements have shown that the volume of MBB traffic (second network traffic type) strongly depends on the service type. For video or file download services this volume can easily be in the range of hundreds of megabytes per session. As a comparison, the volume of voice traffic (first network traffic type) is around one megabyte per session. Therefore, to reduce the resource impact of traffic monitoring in the communication network domain 100 while still ensuring a meaningful coverage of the monitoring, in the "normal operation" phase the following monitoring setup is used:

1) MBB traffic: continuously monitor all services for a small percentage (such as 10%) of subscribers (e.g., subscriber sessions) using random sampling (see step 702 in Fig. 7) to continuously collect associated traffic metrics. Optionally, this percentage can be reduced to zero.
2) RTP-based voice traffic: continuously monitor only uplink traffic, but for all subscribers (e.g., all subscriber sessions; see step 704) to continuously collect associated traffic metrics. This 100% monitoring can be reduced to a high percentage (e.g., above 50%).
KPIs may be calculated based on aggregating subscriber-related or subscriber session-related metric values derived by network traffic monitoring. For calculating KPIs for the time dimension and one or more attribute dimensions with a certain precision (i.e., a confidence interval associated with a confidence level), a well-defined number of samples (e.g., of monitored events) is needed. Monitoring of the MBB traffic with a random sampling of 10% of all subscribers has turned out to be sufficient in this regard, and accordingly results in only around 10% of the full resource consumption footprint. Monitoring of the voice traffic leads to a small resource consumption footprint anyhow, even when covering all the subscribers. This means that larger network performance degradations (e.g., in regard to quality of service) can be easily recognized without full subscriber coverage. Even smaller degradations can also be identified, making troubleshooting feasible.
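The well-defined number of samples mentioned above can, for a mean-value KPI, be estimated with the standard sample-size formula n = (z·σ/E)², where z is the z-score of the desired confidence level, σ the standard deviation of the metric and E the half-width of the confidence interval. The numeric values in the sketch below are illustrative assumptions:

```python
import math

def required_samples(z, stddev, margin):
    """Number of samples needed to estimate a mean KPI within
    +/- margin at the confidence level implied by z
    (e.g., z = 1.96 for a 95% confidence level)."""
    return math.ceil((z * stddev / margin) ** 2)

# Example: estimate mean throughput (assumed stddev 2.0 Mbit/s)
# within +/- 0.5 Mbit/s at 95% confidence
n = required_samples(1.96, 2.0, 0.5)
```

If 10% random sampling already yields at least this many monitored events per attribute value and time window, the reduced monitoring volume preserves the desired KPI precision.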
The event collector 210 or the analytics components 220A of Fig. 5, or any of the NFs in Fig. 5 or Fig. 6, is configured to correlate information from multiple data sources so as to enrich the network event information obtained by probing, reporting or otherwise (see step 706). In particular, the network event information can be enriched with parameters which are not available in the events as such, such as subscription types, subscriber groups, physical coordinates, terminal vendors, etc. The main goal of enrichment is to add, or increase, the number of attribute dimensions common to both types of network traffic, or to identify hidden correlations between attribute dimensions or attribute values (e.g., due to common user behavior). As such, multiple data sets are obtained for each type of network traffic, see Fig. 4. It will be appreciated that the same set of traffic metric values can be aggregated across subscribers or subscriber sessions for different attribute dimensions, so that different subsets of data sets can be derived for the same set of traffic metric values. Also, a drilldown per "primary" attribute dimension may be performed in step 706. As an example, KPIs may be filtered for attribute values. If there is a degradation, such as a specific KPI value issue, which affects only a limited number of subscribers, it may not be detected if one monitors the KPIs for all attribute values of a given attribute dimension in aggregated form. If there is an issue causing network performance degradation directly related to a specific attribute value of a given attribute dimension, it can be detected by comparing the KPI values of the different attribute values. If, for example, there is a technical issue with terminal devices of vendor A (as attribute value) out of the terminal devices of all vendors (as attribute dimension), it can be detected in this way.
If one looks at the KPIs aggregated across the terminal devices of all vendors, the issue may not be detectable (e.g., because it affects a relatively small number of all subscriber sessions). The drilldown may already belong to the analysis phase (see also step 302 of Fig. 3).
There exists a number of attribute dimensions that are common for voice and MBB traffic, including the following:
1) Geographical area / network hierarchy (i.e., what is the location of the subscriber and which network elements serve the communication - to identify if a certain issue has network-wide or geographically/hierarchically limited impact):
i. cell (4G cell, 5G cell, including dual-connectivity cases)
ii. radio node (e.g., eNB, gNB, including dual-connectivity cases)
iii. core node (e.g., MME, SGW, PGW, AMF, SMF, UPF)
iv. routing area, tracking area, registration area
v. network slice
2) Subscriber (i.e., what kind of subscriber(s) are affected by a certain issue):
i. subscription type
ii. roaming / home
3) Terminal device (i.e., what kind of device(s) are affected by a certain issue):
i. terminal type (e.g., mobile, dongle, etc.)
ii. terminal vendor, terminal model
iii. terminal capabilities

Note that other attribute dimensions, which are not explicitly applicable or available for MBB traffic, can be defined as well. It is sufficient that there is a correlation, often a hidden one, between the traffic services connected to the "primary" attribute dimension.
In step 708, the data sets thus obtained for the voice traffic, in particular the traffic metric information such as KPIs contained in the data sets, are analyzed (as explained above with reference to step 302 of Fig. 3). It has been found that real-time network traffic such as voice traffic is particularly sensitive to network issues that lead to network performance degradations. As an example, the following RTP metrics are indicative of whether there is any service quality degradation:
RTP packet loss
RTP stream gaps (i.e., consecutively lost packets)
RTP packet sequence anomalies (e.g., forward and backward jumps)
RTP delay
RTP jitter (i.e., delay variation)
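For illustration, several of these metrics can be derived from the RTP sequence numbers observed per stream. The following simplified sketch is an assumption, not the disclosed implementation: it ignores the 16-bit sequence-number wrap-around defined by RTP and treats any non-increasing step as a backward jump:

```python
def rtp_stream_metrics(seq_numbers):
    """Derive loss, gap and sequence-anomaly indicators from the RTP
    sequence numbers seen for one stream (simplified: no wrap-around)."""
    expected = seq_numbers[-1] - seq_numbers[0] + 1
    lost = expected - len(set(seq_numbers))
    forward_jumps = backward_jumps = 0
    max_gap = 0  # longest run of consecutively lost packets
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        delta = cur - prev
        if delta > 1:
            forward_jumps += 1
            max_gap = max(max_gap, delta - 1)
        elif delta <= 0:
            backward_jumps += 1
    return {
        "loss_ratio": lost / expected,
        "max_gap": max_gap,
        "forward_jumps": forward_jumps,
        "backward_jumps": backward_jumps,
    }

# Packets 4, 5, 8 and 9 missing: loss ratio 0.4, two forward jumps.
m = rtp_stream_metrics([1, 2, 3, 6, 7, 10])
```

RTP delay and jitter would additionally require the RTP timestamps and arrival times, which are omitted here for brevity.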
In step 710, the data sets that may have been obtained for MBB traffic are generally analyzed. Generic service degradation can be detected by pre-set thresholds, and dynamic anomaly detection functions can indicate if one or more KPIs deteriorate for a certain dimension. The reason to collect a limited amount of MBB traffic besides the RTP traffic is to obtain a high-level view and values for normal operation cases of MBB-related KPIs. Analysis may be based on a graph showing MBB KPIs in relation to primary dimensions. This is what is meant by "generic" analysis. The non-generic analysis will be the drilldown for the increased traffic volume in regard to primary and, possibly, secondary dimensions for troubleshooting (see step 718).
In step 712, a decision is made based on the traffic metric value analysis as to whether or not there exists a network performance degradation (using, e.g., one or more thresholding decisions), see also step 302 of Fig. 3. If there is no degradation, the "normal operation" phase continues with steps 702, 704 and the cycle is repeated. Otherwise, i.e., if a network performance degradation can be detected in step 712, the method continues with step 714 and enters the "troubleshooting" phase, see also steps 304 and 306 of Fig. 3.
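The thresholding decision of step 712 can be sketched as follows. The data layout, KPI names and threshold values are purely illustrative assumptions:

```python
# Hypothetical sketch of the step-712 decision: KPI values, aggregated per
# (attribute dimension, attribute value), are compared against pre-set
# thresholds; any breach indicates a network performance degradation.

def detect_degradation(kpi_values, thresholds):
    """Return (dimension, attribute value, KPI) triples that breach a threshold."""
    degraded = []
    for (dimension, attr_value), kpis in kpi_values.items():
        for kpi_name, measured in kpis.items():
            limit = thresholds.get(kpi_name)
            if limit is not None and measured > limit:
                degraded.append((dimension, attr_value, kpi_name))
    return degraded

# Illustrative values only:
kpi_values = {
    ("tracking_area", "TAC 13816"): {"rtp_packet_loss": 0.08},
    ("tracking_area", "TAC 20000"): {"rtp_packet_loss": 0.01},
}
hits = detect_degradation(kpi_values, {"rtp_packet_loss": 0.05})
```

Each hit directly names the attribute dimension and attribute value for step 714, i.e., where the MBB monitoring volume should be increased.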
In step 714, the attribute dimension and attribute value associated with the traffic metric value indicative of the service performance degradation are determined, as generally explained above with reference to step 304 in Fig. 3. Also in step 714, the monitoring of the MBB traffic is increased in volume, as generally explained above with reference to step 306 in Fig. 3.
A detected voice traffic degradation for a certain attribute dimension and a certain attribute value gives an indication where to shift the full-coverage monitoring for the MBB traffic to improve troubleshooting. If a degradation in one of the attribute dimensions is detected for the monitored voice traffic, due to the common background, there is an increased probability that other traffic types, which are only partially monitored in the "normal operation" phase, are also degraded.
The MBB-based troubleshooting requires more data than is collected during the "normal operation" phase, but there is no need to increase data collection for the entire communication network and for all the subscribers. When evaluating the above-described voice metrics and drilling them down according to the above-described "primary" attribute dimensions, the increase of the data collection can be well directed, or focused, to the identified dimensions only.
For collecting more events for the identified problematic attribute value(s), representative sampling may be used at the NFs. Representative sampling is done by combining filtering and sampling capabilities of the NFs. For example, assume that in steps 708 and 712 a particular registration area is identified in which the RTP metrics are degraded. The UPFs support filtering of event information for the attribute dimension "registration area". During normal operation, 10% of MBB traffic is monitored at the UPFs for each individual registration area, using random IMSI sampling. After detecting an RTP issue or other performance degradation in relation to a particular registration area, the MBB traffic monitoring for that specific registration area is increased, for example to 50%, still using random IMSI sampling. Another option is to increase the MBB traffic monitoring to 100%. In this case, no sampling is needed in relation to the problematic registration area.
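One way to realize stable random IMSI sampling with per-registration-area rates is to hash each IMSI into a bucket, as in the following sketch. The hash choice, names and rate values are assumptions; any uniform hash would serve:

```python
import zlib

def sample_imsi(imsi, registration_area, rates, default_rate=0.10):
    """Decide whether to monitor a subscriber's MBB traffic. Hashing the
    IMSI gives a stable pseudo-random sample; per-registration-area rates
    let the troubleshooting phase raise coverage only where needed."""
    rate = rates.get(registration_area, default_rate)
    bucket = zlib.crc32(imsi.encode()) % 100
    return bucket < rate * 100

# Normal operation: 10% everywhere; after detecting an RTP degradation in
# the hypothetical registration area "RA-7", raise only that area to 50%.
rates = {"RA-7": 0.50}
```

A convenient side effect of bucketing is that every subscriber sampled at the 10% base rate stays in the sample when the rate is raised to 50%, so KPI time series remain comparable across the two phases.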
In another example, a subscriber group is identified in steps 708 and 712 for which RTP traffic metrics are degraded (e.g., subscribers having a particular subscription type). In this case, one of the analytics components 220A generates an IMSI white list, which includes the subscribers belonging to the identified subscriber group. This white list is configured at the UPFs. The UPFs will then only send events related to the subscribers in the white list in addition to the random 10% of subscribers.

Optionally, the increased volume of monitored MBB traffic may be analyzed further based on the "primary" (see step 706) and/or a "secondary" attribute dimension, see step 716. The following MBB data service KPIs are examples of what can be analyzed in relation to both the "primary" and "secondary" dimensions. Note: some of the below traffic metrics are applicable for any traffic type (e.g., throughput) while some others (e.g., stall time ratio) are specific to certain traffic types or services (e.g., video):
throughput, bitrate
packet loss ratio, packet retransmission ratio, round trip time
video stall time ratio, video resolution, video MOS
web page access time, web page download success ratio
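A sketch of the resulting UPF-side filter, combining the white list with the random base-rate sampling, follows. The function name and the injectable rng hook (used here to make the behavior testable) are illustrative assumptions:

```python
import random

def should_report_event(imsi, white_list, base_rate=0.10, rng=random.random):
    """UPF-side filter sketch: always report events for white-listed
    subscribers (the identified subscriber group), plus a random
    base-rate share of all other subscribers."""
    if imsi in white_list:
        return True
    return rng() < base_rate

# Illustrative white list generated by an analytics component:
white_list = {"001010000000001"}
```

The white list gives full coverage for the suspect subscriber group while the unchanged base rate preserves the reference data needed for comparison.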
The following "secondary" attribute dimensions are data service specific, hence they can be analyzed during detailed MBB-based troubleshooting in step 716 (although not applicable as "primary" attribute dimensions for voice traffic). On the one hand, these attribute dimensions are analyzed to set the right scope of the very detailed data collection. On the other hand, identification of a certain problematic attribute dimension can show the root cause itself or can guide the troubleshooting process to find the root cause of the service quality degradation:
data network
service provider
service functionality (e.g., video, gaming, etc.) and other traffic classification type attributes
client application
radio quality parameters, Reference Signal Received Power/Quality (RSRP/RSRQ), etc.

The required sample size is calculated based on the required target precision. Based on the central limit theorem, even if the distribution of a given KPI is not normal, the mean value follows a normal distribution. The confidence interval of the mean, therefore, is 2*Z*s/sqrt(n), where Z is the value of the Z distribution at the chosen confidence level (e.g., 95%), s is the standard deviation of the population, and n is the sample size. Based on this formula, the required number of samples for a target confidence interval can be determined.

Reference is now made to step 718. When the monitoring and, thus, data collection is increased in volume for one or more attribute values of an attribute dimension, the MBB-related traffic metric values are calculated for these one or more attribute values and are compared with the ones for other attribute values of the same attribute dimension. If it is found in step 720 that they are not different at the chosen confidence level (e.g., the confidence intervals of these values overlap), the sampling rate for these dimensions is restored in step 722 to the basic level (e.g., 10%).
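The sample-size calculation described above can be written out directly by inverting the confidence-interval formula 2*Z*s/sqrt(n); the numeric example below is illustrative only:

```python
import math

def required_sample_size(std_dev, target_interval, z=1.96):
    """Smallest sample size n whose confidence-interval width
    2*Z*s/sqrt(n) does not exceed the target interval width
    (Z defaults to 1.96 for the 95% confidence level)."""
    return math.ceil((2 * z * std_dev / target_interval) ** 2)

# E.g., a KPI with population standard deviation 0.5 and a target
# confidence-interval width of 0.1 at 95% confidence:
n = required_sample_size(0.5, 0.1)
```

Note that the required n grows quadratically as the target interval shrinks, which is why the increased sampling rate is applied only to the identified attribute values rather than network-wide.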
Additionally, or in the alternative, the MBB-related traffic metric values calculated for the "problematic" one or more attribute values as derived at the lower sampling rate are compared with those traffic metric values that have been calculated at the higher sampling rate. If it is found in step 720 that they are not different at the chosen confidence level (e.g., the confidence intervals of these values overlap), the sampling rate for these dimensions is restored in step 722 to the basic level (e.g., 10%). The method then enters the "normal operation" phase again.
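The overlap test of steps 718 to 722 can be sketched with the same confidence-interval formula (half-width Z*s/sqrt(n)); the input values below are illustrative assumptions:

```python
import math

def intervals_overlap(mean_a, s_a, n_a, mean_b, s_b, n_b, z=1.96):
    """Sketch of the step-720 check: compare a KPI's confidence interval
    for the suspect attribute value against a reference interval. If they
    overlap, there is no significant difference and the sampling rate can
    be restored to the basic level (step 722)."""
    half_a = z * s_a / math.sqrt(n_a)
    half_b = z * s_b / math.sqrt(n_b)
    return abs(mean_a - mean_b) <= half_a + half_b

# Two KPI means measured at the higher sampling rate; their intervals
# overlap here, so monitoring would be set back to the basic level.
overlap = intervals_overlap(0.050, 0.02, 400, 0.052, 0.02, 400)
```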
If it turns out in step 720 that the issue giving rise to the network performance degradation could not be fixed, or that no correlation has been found, a more detailed and possibly manual troubleshooting is performed in step 724.

In the following, a troubleshooting example will be described with reference to the schematic diagrams of Figs. 8A to 12 and in the context of Fig. 7.
It is assumed here that in steps 708 and 712, a network performance degradation for the first type of network traffic (i.e., voice traffic or other real-time traffic) is identified. In more detail, two RTP traffic metric values (average RTP packet loss in the uplink and average RTP forward jumps in the uplink) are found to be problematic (e.g., above a given threshold) for the attribute dimension "Tracking Areas", concretely for the attribute values "TAC ID 13816" and possibly "TAC ID 11456". This situation is illustrated in Fig. 8A (RTP packet loss) and Fig. 8B (RTP forward jumps).
Even when drilling down the traffic metrics available for the less-monitored second type of network traffic (i.e., MBB traffic) for TAC ID 13816, i.e., in the same "primary" attribute dimension (steps 710 and 712), no significant degradation is visible yet. This situation is shown in Fig. 9A (averaged video quality of experience) and Fig. 9B (downlink TCP session throughput).

Even when additionally drilling down the traffic metrics available for the less-monitored second type of network traffic (i.e., MBB traffic) in regard to a "secondary" attribute dimension, no further explanations can be obtained, mainly because of the low sampling rate. This situation is illustrated in Fig. 10A for the KPI "averaged video quality of experience" and a combination of the "primary" attribute value TAC 13816 and different "secondary" attribute values ("TubiTV", "Netflix", "Akamai" and "TikTok") of the attribute dimension "service provider".
As a consequence, in step 714 the monitoring of the MBB traffic is increased in volume for the "worst" TAC ID 13816 and possibly the "second worst" TAC ID 11456 as well. As such, "full" monitoring is focused on one or two attribute values of a given attribute dimension only. The result of the increased monitoring is illustrated in Fig. 10B, which shows that the video quality issues actually happen in relation to one dedicated service provider (here: TikTok). Accordingly, troubleshooting can be focused on a limited number of tracking areas and a particular service provider. As an example, it may be guessed that routing issues or issues with server settings may exist in hardware installed by that service provider in that particular tracking area.
As a side note, the diagrams of Figs. 11A and 11B show a comparison of the MBB-related traffic metric "averaged downlink TCP session throughput for classified traffic" for the attribute dimensions "tracking area" (here: TAC 13816) and "service provider" (here: "Facebook", "TikTok" and "Netflix") at a low sampling rate (Fig. 11A) and an increased sampling rate (Fig. 11B). Evidently, the precision increases with the sampling rate. However, these diagrams also show that not necessarily all MBB-related traffic metrics correlate with RTP-related traffic metrics, as the throughput is not problematic for any service provider, not even the one for which video quality issues were detected (see Fig. 10B).
As a further side note, the diagram of Fig. 12 illustrates that not every "primary" attribute dimension that is identified as problematic from an RTP- or voice-related point of view is problematic from an MBB-related point of view as well. In the scenario of Fig. 12, the "worst" TAC 13816 was indeed problematic for MBB-related traffic for one service provider, as explained above, but the "second worst" TAC 11456 was not problematic for any service provider. As such, increased monitoring can immediately be set back for TAC 11456 as no correlation could be found (see steps 718, 720 and 722 in Fig. 7), while for TAC 13816 the increased monitoring can be continued until the issue has been fixed (optionally in cooperation with the service provider), before it is also set back after having checked that the issue has indeed been fixed. In sum, the resource footprint of network traffic monitoring can be kept at an optimally low level.
As has become apparent from the above description of exemplary embodiments, the technique presented herein reduces the overall volume of network traffic that has to be monitored (e.g., because random sampling is applied to "heavy volume"-type traffic such as MBB traffic), while still allowing a reliable detection of network performance degradations (e.g., because all or a significant part of a less voluminous type of real-time traffic is monitored). Upon detecting such a degradation, the technique allows focusing monitoring to a possibly problematic portion of the network traffic.
While the present disclosure has been described with reference to exemplary embodiments, it will be appreciated that the present disclosure can be modified in various ways without departing from the scope of the present disclosure as defined in the appended claims.

Claims

1. A method (300) of controlling monitoring of network traffic in a communication network (100), wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions, wherein monitoring of the first type of network traffic yields first data sets, each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value of one of the one or more network attribute dimensions, the method comprising: analyzing (302) the first data sets to detect at least one first traffic metric value indicative of a network performance degradation; identifying (304) the network attribute value associated with the detected first traffic metric value; and controlling (306) monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.

2. The method of claim 1, wherein each attribute dimension defines a set of possible sources of the network performance degradation, and wherein the associated attribute values define mutually exclusive sub-sets thereof.

3. The method of any of the preceding claims, wherein the first data sets are analyzed in regard to a first attribute dimension to detect the at least one first traffic metric value indicative of a network performance degradation; and wherein the increased volume of the monitored second type of network traffic is analyzed in regard to a second attribute dimension different from the first attribute dimension so as to localize a possible source of the network performance degradation.
4. The method of any of the preceding claims, wherein the one or more attribute dimensions comprise one or more of: i. at least one network subscription-related dimension for a subscription-based communication network; ii. at least one terminal device-related dimension for a communication network comprising individual terminal devices; iii. at least one network hierarchy-related dimension for a communication network split in multiple hierarchy levels; iv. at least one network geography-related dimension for a communication network split in dedicated geographical regions.
5. The method of any of the preceding claims, wherein the first data sets comprise a first subset of first data sets for a first attribute dimension and a second subset of first data sets for a second attribute dimension.
6. The method of any of the preceding claims, wherein the first traffic metric is not available for the network traffic of the second type.
7. The method of any of the preceding claims, wherein the communication network is subscription based, and wherein the first traffic metric value in each of the first data sets is based on an aggregated value that has been obtained by aggregating non-aggregated subscriber-related or subscriber session-related traffic metric values across those monitored subscribers or subscriber sessions that comply with the attribute value that is associated with the first traffic metric value.
8. The method of any of the preceding claims, wherein the network traffic of the first type is or includes at least one of i. real-time traffic; ii. voice traffic; iii. governed by a connectionless communication protocol; iv. governed by the Real-time Transport Protocol, RTP; v. uplink traffic; and vi. having a lower traffic volume per predefined period of time than the network traffic of the second type.
9. The method of any of the preceding claims, wherein the network traffic of the second type is or includes at least one of i. non-real-time traffic; ii. service traffic, in particular multimedia streaming traffic or Internet traffic; iii. Mobile Broad Band, MBB, traffic; iv. uplink traffic; and v. having a higher traffic volume per predefined period of time than the network traffic of the first type.
10. The method of any of the preceding claims, wherein the communication network is subscription based, and wherein the first data sets are indicative of all network subscribers or network subscriber sessions for which network traffic of the first type is detected at a given monitoring site.

11. The method of any of the preceding claims, wherein controlling monitoring of the second type of network traffic to increase in volume comprises at least one of i. increasing a traffic sampling rate at a given monitoring site; and ii. adjusting a traffic filter at a given monitoring site.
12. The method of any of the preceding claims, wherein the communication network is subscription based, and wherein controlling monitoring of the second type of network traffic to increase in volume comprises increasing a percentage of network subscribers or network subscriber sessions for which network traffic of the second type is detected at a given monitoring site.
13. The method of any of the preceding claims, wherein prior to controlling monitoring of the second type of network traffic to increase in volume, the second type of network traffic is not monitored at all.
14. The method of any of the preceding claims, wherein after controlling monitoring of the second type of network traffic to increase in volume, the second type of network traffic is monitored to yield second data sets each indicative of a dedicated value of a second traffic metric and associated with the identified attribute value, or the attribute value having the potential of correlating therewith.
15. The method of any of claims 1 to 12, wherein prior to controlling monitoring of the second type of network traffic to increase in volume, the second type of network traffic is monitored to yield second data sets each indicative of a dedicated value of a second traffic metric and associated with the identified attribute value, or the attribute value having the potential of correlating therewith.

16. The method of claim 15, wherein after controlling monitoring of the second type of network traffic to increase in volume, the second type of network traffic is monitored to yield a higher number of second data sets than before controlling monitoring of the second type of network traffic to increase in volume.
17. The method of claim 14, 15 or 16, wherein at least one of the following conditions apply for the second traffic metric: i. the second traffic metric is identical with the first traffic metric; ii. the second traffic metric is different from the first traffic metric; iii. the network traffic of the second type is packet-based, and the second traffic metric is a packet-based traffic metric; iv. the network traffic of the second type relates to multimedia streaming, and the second traffic metric is a multimedia streaming-related traffic metric; and v. the network traffic of the second type relates to an Internet service, and the second traffic metric is an Internet service-related traffic metric.
18. The method of claim 14 or 16, or claim 17 when depending on claim 14 or 16, comprising identifying a possible source of the network performance degradation based at least on the second data sets yielded after controlling monitoring of the second type of network traffic to increase in volume.
19. The method of claim 16, or claim 17 or 18 when depending on claim 16, comprising evaluating differences in the second traffic metric values of second data sets yielded before and after controlling monitoring of the second type of network traffic to increase in volume.

20. The method of claim 19, comprising controlling monitoring of the second type of network traffic to decrease in volume for the identified attribute value, or for the attribute value having the potential of correlating with the identified attribute value, if no or no substantial differences are found.
21. The method of any of the preceding claims, wherein the network traffic is packet-based, and wherein the first traffic metric is a packet-based traffic metric.

22. The method of any of the preceding claims, wherein the identified attribute value and the attribute value having the potential of correlating with the identified attribute value relate to the same possible source of the network performance degradation.

23. The method of any of the preceding claims, wherein detecting at least one first traffic metric value indicative of a network performance degradation comprises subjecting the first traffic metric values in the first data sets to a threshold decision.

24. The method of any of the preceding claims, comprising controlling, if a network performance degradation is detected, monitoring of the second type of network traffic to increase in volume for an attribute dimension related to the network traffic of the second type.

25. The method of claim 24, wherein the attribute dimension related to the network traffic of the second type is not available for the network traffic of the first type.
26. A computer program product comprising program code portions for performing the steps of any of the preceding claims when the computer program product is executed on one or more processors.
27. The computer program product of claim 26, stored on a computer-readable recording medium.
28. An apparatus (220) for controlling monitoring of network traffic in a communication network, wherein the network traffic comprises network traffic of a first and a second type that can be classified in accordance with mutually exclusive network attribute values of one or more network attribute dimensions, wherein monitoring of the first type of network traffic yields first data sets, each first data set being indicative of a dedicated value of a first traffic metric and an associated network attribute value of one of the one or more network attribute dimensions, the apparatus (220) being configured to: analyze (302) the first data sets to detect at least one first traffic metric value indicative of a network performance degradation; identify (304) the network attribute value associated with the detected first traffic metric value; and control (306) monitoring of the second type of network traffic to increase in volume for the identified network attribute value, or for a network attribute value having the potential of correlating with the identified network attribute value.
29. The apparatus of claim 28, configured to perform the steps of any of claims 2 to 24.
30. A network management system comprising the apparatus of claim 28 or 29.
PCT/EP2021/056054 2021-03-10 2021-03-10 Technique for controlling network traffic monitoring WO2022188966A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21712081.5A EP4305821A1 (en) 2021-03-10 2021-03-10 Technique for controlling network traffic monitoring
PCT/EP2021/056054 WO2022188966A1 (en) 2021-03-10 2021-03-10 Technique for controlling network traffic monitoring

Publications (1)

Publication Number Publication Date
WO2022188966A1

Family

ID=74874828

Country Status (2)

Country Link
EP (1) EP4305821A1 (en)
WO (1) WO2022188966A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150065121A1 (en) * 2013-08-30 2015-03-05 International Business Machines Corporation Adaptive monitoring for cellular networks
US20150333992A1 (en) * 2014-05-13 2015-11-19 Cisco Technology, Inc. Dynamic collection of network metrics for predictive analytics
US10411978B1 (en) * 2018-08-09 2019-09-10 Extrahop Networks, Inc. Correlating causes and effects associated with network activity

Also Published As

Publication number Publication date
EP4305821A1 (en) 2024-01-17


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2021712081

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021712081

Country of ref document: EP

Effective date: 20231010