WO2021160270A1 - Technique for determining an impact of a failure in a mobile communication network

Publication number
WO2021160270A1
Authority
WO
WIPO (PCT)
Prior art keywords
subscribers
failure
time
data
subscriber
Application number
PCT/EP2020/053789
Other languages
English (en)
Inventor
Attila BÁDER
László KOVÁCS
Gábor MAGYAR
Norbert PURGER
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2020/053789
Publication of WO2021160270A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/10 Scheduling measurement reports; Arrangements for measurement reports

Definitions

  • the present disclosure generally relates to a technique for determining an impact of a failure in a mobile communication network.
  • the present disclosure is in the field of Fault Management (FM).
  • a typical approach for analyzing failure impacts in communication networks is to provide a designated function for a designated purpose, focusing on error cases. Examples related to mobile communications for analyzing handover failures are described in US 10,051,532. Aiming to be more generic, CN 207743973 U describes an architecture including an intelligent terminal and a component configured to analyze failure information.
  • US 5,872,911 focuses on assessing network failures in terms of an absolute impact on a type of aggregated service indicator, like total call traffic over the network.
  • if the system, like the mobile communication network, is designed to provide various levels of redundancy, such indicators alone might not even change, while the faults still cause degradation in other related metrics.
  • FM systems receive node and network alarms from the different network elements, e.g., network nodes of a mobile communication network. Alarms are related to node failures and not directly related to service usage and, therefore, the operator has no information about the impact on services and the number and type of impacted subscribers.
  • FM alarms are prioritized without properly considering the service or subscriber impact. Therefore, technical resources working on the solution of tickets generated by FM alarms are not allocated in an optimum way. It can happen that alarms which have no or little impact on services or subscribers are solved with high priority, and alarms having large impacts are down-prioritized.
  • the number of raised alarms in an FM system is high, usually much higher than the number the operator can analyze or solve.
  • Alarms are prioritized based on rules and best technical knowledge. The operator does not have exact information on which alarms can be neglected. There is a high risk that an operator neglects or down-prioritizes important alarms.
  • a device for determining an impact of a failure in a mobile communication network comprises at least one processor and memory comprising instructions executable by the at least one processor.
  • the device is operable to receive fault management, FM, data informing of the failure in the mobile communication network.
  • the fault management data comprises information on a time of the failure and information on a failed cell of the mobile communication network.
  • the device is further operable to determine a set of possibly impacted cells, based on the information on the failed cell, obtain subscriber activity data for subscribers in the set of possibly impacted cells, calculate, based on the subscriber activity data, a network level impact value indicating an impact of the failure to the mobile communication network, and store the network level impact value in a fault register, in association with the failure.
  • the mobile communication network may operate according to a 3rd Generation Partnership Project (3GPP) standard, such as a 4G standard (e.g., LTE) or a 5G standard.
  • the mobile communication network is not limited to 3GPP mobile communication standards.
  • the present technique may also be employed in other mobile communication networks such as cdma2000 systems standardized by 3GPP2.
  • the mobile communication network is not limited to a particular radio access network (RAN) and may comprise a corresponding core network.
  • the mobile communication network may comprise a plurality of different radio access networks (RANs), such as LTE RAN and 5G RAN.
  • the failure in the mobile communication network may refer to any failure in any node of the above-described mobile communication network, which might have an impact on other nodes and/or devices of the mobile communication network.
  • the failure might have an impact on mobile devices (e.g., user equipment, UE) connected to a base station (e.g., eNodeB or 5GNB) of a corresponding radio access network.
  • the device may be integrated in a computer system of the mobile communication network.
  • the device may be part of an operations support system (OSS).
  • the device is capable of receiving FM data that is indicative of a fault having occurred in the mobile communication network.
  • the device may either be part of the mobile communication network or be communicatively coupled to the mobile communication network.
  • the device may be implemented in a cloud computing environment.
  • the at least one processor may comprise a plurality of processors at different sites.
  • the FM data may trigger an FM alarm.
  • the FM data may comprise further details of the failure, such as a type of the failure (planned or unplanned).
  • the information on the time of failure may be represented by a timestamp of the failure (indicating, e.g., date and time of the failure).
  • the information on the failed cell may be represented by an ID of a failed network node or a failed cell, such as a cell ID.
  • the failed cell may be part of the set of possibly impacted cells.
  • the possibly impacted cells may be regarded as neighboring cells of the failed cell.
  • the possibly impacted cells may use the same Radio Access Technology (RAT) or a different RAT with regard to that of the failed cell.
  • the subscriber activity data may describe details and/or statistics of a subscriber activity for the subscribers in the set of possibly impacted cells.
  • the subscriber activity may refer to activity before and/or after the time of the failure (indicated, e.g., by a timestamp). For example, for each of the subscribers of the mobile communication network, an activity may be recorded and saved as subscriber activity data together with a timestamp of the activity.
  • the subscriber activity data may indicate which subscribers were active in a particular cell at a particular time.
  • the subscriber activity data may comprise entries for a plurality of times (e.g., at fixed intervals). The network level impact value is calculated based on the subscriber activity data.
  • a larger impact value may be calculated in a case where more subscribers are affected by the fault, whereas a smaller impact value may be calculated in a case where fewer subscribers are affected by the fault.
  • a larger impact value may be calculated in a case where subscribers using services that are more important are affected by the fault, whereas a smaller impact value may be calculated in a case where subscribers using services that are less important are affected by the fault.
  • the network level impact value may be stored in the fault register, in association with the failure, such that an impact of the failure may be put into context with other failures that have occurred or will occur.
  • a significance or importance of the fault can be derived, such that the fault can be correctly prioritized, if necessary.
  • a list of faults can be output, ordered by their network level impact value (i.e., ordered by their priority).
  • the device may be further configured to determine the set of possibly impacted cells based on the information on the failed cell and based on handover relation data indicative of previous handover processes between the failed cell and neighboring cells.
  • Handover relation data may include information on previous handover processes between cells of the mobile communication network. When handover processes involving the failed cell are considered, it is possible to identify neighboring cells. In other words, neighboring cells may be identified as cells from and/or to which previous handover processes have been performed involving the failed cell. These neighboring cells may be identified as the possibly impacted cells.
  • the device may be further configured to determine the set of possibly impacted cells based on the information on the failed cell and based on geo-location data indicative of geographic locations of a plurality of cells including the failed cell.
  • the device may consider geo-location data.
  • the geo-location data may be part of Configuration Management (CM) data.
  • the geo-location data may indicate geographical locations (e.g., latitude and longitude coordinates) for each cell of the mobile communication network. Based on this information, with regard to the failed cell, neighboring cells or cells within a predefined distance from the failed cell can be identified as belonging to the group of possibly impacted cells.
  • the handover relation data may be used to make a pre-selection of neighboring cells and, in a second step, the geo-location data may be used for selecting the possibly impacted cells out of this pre-selection, wherein only cells within a predefined distance from the failed cell are considered.
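The two-step selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names, the representation of handover relation data as (source, target) pairs, the haversine distance, and the `max_km` threshold are all hypothetical choices made for the example.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def possibly_impacted_cells(failed_cell, handovers, locations, max_km):
    """Hypothetical helper: handovers is an iterable of (source, target) cell IDs,
    locations maps cell ID -> (lat, lon), max_km is the predefined distance."""
    # Step 1: pre-select neighbors from previous handovers to or from the failed cell.
    neighbors = set()
    for src, dst in handovers:
        if src == failed_cell:
            neighbors.add(dst)
        elif dst == failed_cell:
            neighbors.add(src)
    # Step 2: keep only pre-selected cells within the predefined distance.
    origin = locations[failed_cell]
    near = {c for c in neighbors
            if c in locations and haversine_km(origin, locations[c]) <= max_km}
    # The failed cell may itself be part of the set of possibly impacted cells.
    return near | {failed_cell}
```

A cell that never exchanged handovers with the failed cell is excluded in step 1 even if it is geographically close, which matches the pre-selection order described in the text.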
  • the subscriber activity data may comprise subscriber activity data for subscribers in the failed cell.
  • the subscriber activity data may comprise at least one data set for each subscriber present in the failed cell and in the possibly impacted cells, the data set defining a type and/or identification number of the respective subscriber, wherein the subscriber activity data is indicative of a number of the subscribers present in the failed cell and in the possibly impacted cells.
  • the type of the respective subscriber may indicate a model type of a mobile device used by the subscriber.
  • the identification number of the respective subscriber may be an international mobile subscriber identity (IMSI) or an international mobile equipment identity (IMEI). Based on the plurality of data sets that are part of the subscriber activity data, the number of subscribers present in the failed cell and the possibly impacted cells can be derived. Further, this number may also be explicitly indicated in the subscriber activity data (e.g., number of active subscribers for each cell, for each time).
  • At least one of the data sets may comprise a communication service used by the respective subscriber and a key performance indicator, KPI, value for the communication service.
  • the communication service may refer to a particular type of communication service used by the corresponding subscriber at a particular time.
  • Examples of communication services are, e.g., video streaming, audio streaming, data download, telephony, video telephony, etc.
  • the corresponding KPI may be a KPI suitable for describing a performance of the corresponding communication service. For example, "stall ratio" may be used for the "video" communication service and "throughput" may be used for the "data download" communication service.
  • the communication service may be at least one of the list including non-encrypted TCP, encrypted TCP, VoLTE, non-encrypted video, and non-encrypted web.
  • Suitable KPIs for the communication services encrypted TCP and non-encrypted TCP may be at least one of create session, create PDP, LTE Attach, Packet loss (all directions and network segments), RTT / delay, TCP throughput, and rtt_term_sum_avg.
  • Suitable KPIs for the communication service VoLTE may be at least one of IMS Session Setup Time & Session Setup Attempts, Session Establishments, Audio MOS and voice quality issues (muting, garbling, etc.), IMS Call Drops, Call Duration, Mean opinion score, soft drop, call setup success, packet loss.
  • Suitable KPIs for the communication service non-encrypted video may be at least one of video stall ratio, init time, and mean opinion score (MOS).
  • Suitable KPIs for the communication service non-encrypted web may be at least one of webpage access time, download time, and success ratio.
  • Each of the data sets may comprise a time stamp and the subscriber activity data may comprise data sets with time stamps indicating a time before the time of the failure and data sets with time stamps indicating a time after the time of the failure.
  • Subscriber activity data may be recorded and stored in a subscriber activity data storage, from which it is retrieved for each received FM data.
  • a time of recording the respective activity data may be stored in association with the corresponding data set in the form of a timestamp. For example, in case the subscriber activity data is continuously stored (i.e., at fixed or flexible time intervals), data sets exist having a timestamp indicating a time before the time of the failure and data sets exist having a timestamp indicating a time after the time of the failure. These data sets can be analyzed in order to calculate the network level impact value.
  • the device may be further configured to calculate the network level impact value based on a final impact value for subscribers in the set of possibly impacted cells that are active after the time of the failure.
  • the final impact value may also be regarded as a final impact value for all active users (as compared to users that have been lost because of the failure). If not indicated otherwise, the expressions "users" and "subscribers" are used as synonyms herein.
  • the final impact value may indicate an estimated KPI deterioration for the users that are active also after the fault (indicated by the FM data).
  • Calculating the final impact value may comprise determining, based on the subscriber activity data, a set of active subscribers that were active in the possibly impacted cells before and after the time of the failure. Calculating the final impact value may further comprise, for each subscriber of the set of active subscribers, for each communication service, and for each KPI type defined for the given communication service, computing a difference of the KPI values observed for the subscriber before the time of the failure and after the time of the failure, and weighting the differences according to a predefined KPI weight per communication service, to obtain a service impact value for the subscriber per each defined communication service.
  • Calculating the final impact value may further comprise weighting the individual service impact values for each subscriber with a predefined service weight value and summing the weighted service impact values to obtain a total subscriber impact value and summing the total subscriber impact values for each of the active subscribers to obtain the final impact value.
  • the KPI weights for the different KPIs may be stored in a table within a memory of the device.
  • the service weight values for the different communication services may be stored in a table within a memory of the device.
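The calculation of the final impact value described above can be sketched as follows. This is a hedged illustration only: the nested-dictionary layout, the function name, and the sign convention (difference taken as value before minus value after, with the sign of each KPI weight chosen so that a deterioration contributes positively) are assumptions not stated in the source.

```python
def final_impact_value(before, after, kpi_weights, service_weights):
    """before/after: {subscriber: {service: {kpi: value}}} (assumed layout).
    kpi_weights: {service: {kpi: weight}}, service_weights: {service: weight}."""
    total = 0.0
    # Only subscribers active both before and after the failure are considered.
    for sub in before.keys() & after.keys():
        subscriber_impact = 0.0
        for service, kpis_before in before[sub].items():
            kpis_after = after[sub].get(service, {})
            service_impact = 0.0
            for kpi, v_before in kpis_before.items():
                if kpi in kpis_after:
                    # Weighted KPI difference (before minus after, by assumption).
                    diff = v_before - kpis_after[kpi]
                    service_impact += kpi_weights[service][kpi] * diff
            # Weight the service impact value and add it to the subscriber total.
            subscriber_impact += service_weights[service] * service_impact
        total += subscriber_impact
    return total
```

With this convention, a KPI where lower is better (such as a stall ratio) would carry a negative KPI weight, while a KPI where higher is better (such as throughput) would carry a positive one.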
  • the device may be further configured to calculate the network level impact value based on a number of subscribers that are lost in the mobile communication network due to the failure.
  • the subscribers that are lost due to the failure may correspond, e.g., to subscribers that lose a wireless connection to the failed cell due to the failure.
  • in case the number of subscribers that are lost is larger, the network level impact value may be higher, and in case the number of subscribers that are lost is smaller, the network level impact value may also be smaller.
  • the device may further be configured to calculate the number of subscribers that are lost based on learnt cell traffic data, the learnt cell traffic data comprising an average estimate of a number of subscribers in a specific cell for a specific time.
  • the learnt cell traffic data may be derived from the subscriber activity data.
  • the learnt cell traffic data may represent a typical condition in the mobile communication network (i.e., without a fault).
  • the learnt cell traffic data may indicate, for a particular time of day, an average estimate of a number of subscribers in a specific cell.
  • the data may be specific for a particular weekday (e.g., typical condition on Tuesdays, 9:00 am). Averaging may also be carried out on the basis of other aspects (other than the weekday), e.g., a particular month, a holiday/no holiday, etc. Further, it may be averaged over a longer period (e.g., half a day, a full day, etc.).
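One way to derive learnt cell traffic data from recorded subscriber counts is sketched below. The grouping key (cell, weekday, hour of day) and the function name are assumptions chosen for illustration; the text also allows other averaging bases, such as month or holiday status.

```python
from collections import defaultdict
from datetime import datetime

def learn_cell_traffic(samples):
    """samples: iterable of (cell_id, timestamp, active_subscribers) recorded
    under fault-free conditions (assumed input format).
    Returns {(cell_id, weekday, hour): average number of subscribers}."""
    sums = defaultdict(lambda: [0, 0])  # key -> [sum of counts, number of samples]
    for cell, ts, count in samples:
        key = (cell, ts.weekday(), ts.hour)  # weekday(): Monday is 0
        sums[key][0] += count
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```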
  • the device may further be configured to calculate the number of subscribers that are lost based on a number of subscribers active before the time of the fault, derived from the subscriber activity data, a number of subscribers active after the time of the fault, derived from the subscriber activity data, an average estimated number of subscribers active before the time of the fault, derived from the learnt cell traffic data, and an average estimated number of subscribers active after the time of the fault, derived from the learnt cell traffic data.
  • the number of subscribers that are lost may correspond to a difference between the number of subscribers active before the time of the fault and the number of subscribers active after the time of the fault, wherein a typical percentage loss/increase is considered based on the average estimated number of subscribers active before/after the time of the fault. For example, when the average estimated number of subscribers active before/after the time of the fault indicates a typical loss at that time of 10%, then this loss is considered during the determination of the number of subscribers that are lost due to the fault. In other words, it is possible to consider only the users that are lost actually due to the fault and to ignore those who would have entered/exited the cell in any case (in the considered period).
  • the number of subscribers active before the time of the fault may be derived from data recorded as close to a timestamp of the fault as possible before the fault and the number of subscribers active after the time of the fault may be derived from data recorded as close to a timestamp of the fault as possible after the fault.
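The correction for typical traffic changes can be sketched as follows. The source does not give an explicit formula, so this is only one possible reading: the observed count before the fault is scaled by the typical ratio taken from the learnt cell traffic data, and the shortfall of the observed count after the fault against that expectation is treated as lost subscribers. The function and parameter names are hypothetical.

```python
def lost_subscribers(n_before, n_after, avg_before, avg_after):
    """n_before/n_after: observed active subscribers before/after the fault.
    avg_before/avg_after: learnt average estimates for the same two times."""
    if avg_before > 0:
        # Number of subscribers expected after the fault time if no fault had occurred.
        expected_after = n_before * avg_after / avg_before
    else:
        # No learnt baseline available: assume no typical change (assumption).
        expected_after = float(n_before)
    # Subscribers who would have left anyway are not counted as lost.
    return max(0.0, expected_after - n_after)
```

For instance, with 200 subscribers observed before the fault, 120 after, and a typical loss of 10% at that time (learnt averages 100 and 90), 180 subscribers would be expected without the fault, so 60 are attributed to it.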
  • the device may further be configured to calculate the network level impact value based on an average KPI impact value, wherein the average KPI impact value is calculated based on historical average KPI levels for each defined communication service, and based on the KPI values of the subscriber activity data after the time of the impact, using predefined KPI weights per communication service and predefined service weight values.
  • the KPI weights and the predefined service weight values may correspond to those mentioned above.
  • the average KPI impact value makes it possible to determine an impact on KPIs for users that have no corresponding entry in the subscriber activity data in the possibly impacted cells before the time of the fault.
  • the device may further be configured to calculate the network level impact value based on the final impact value, based on the number of subscribers that are lost, and based on the average KPI impact value.
  • Each of the parameters final impact value, number of subscribers that are lost, and average KPI impact value may be weighted with a predefined weight value. Further, the parameters may be normalized such that a single number may be output as network level impact value (e.g., a number between 0 and 10), depending on the impact.
  • the network level impact value NLIV may be calculated as:
  • NLIV = a*FIV + b*NOSTAL + c*AKIV, wherein FIV is the final impact value, NOSTAL is the number of subscribers that are lost, and AKIV is the average KPI impact value.
  • the parameters a, b, and c are predefined weighting and normalization factors.
  • the individual values for FIV, NOSTAL, and AKIV may also be stored in the fault register, in association with the failure. Thereby, a more detailed analysis regarding the impact of the failure can be performed. In other words, it may be possible to prioritize the individual faults on the basis of different criteria.
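The weighted combination and the fault register can be sketched as follows. The class and method names, the in-memory dictionary, and the example factor values used in the test are illustrative assumptions; the source only specifies that NLIV combines FIV, NOSTAL, and AKIV with predefined factors a, b, and c, and that the individual values may be stored with the failure.

```python
from dataclasses import dataclass

@dataclass
class FaultEntry:
    fault_id: str
    fiv: float     # final impact value
    nostal: float  # number of subscribers that are lost
    akiv: float    # average KPI impact value
    nliv: float    # network level impact value

def network_level_impact(fiv, nostal, akiv, a, b, c):
    # NLIV = a*FIV + b*NOSTAL + c*AKIV with predefined weighting factors.
    return a * fiv + b * nostal + c * akiv

class FaultRegister:
    """Hypothetical in-memory fault register keyed by fault ID."""
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
        self.entries = {}

    def store(self, fault_id, fiv, nostal, akiv):
        nliv = network_level_impact(fiv, nostal, akiv, self.a, self.b, self.c)
        # Keep the individual values too, so faults can be ranked by other criteria.
        self.entries[fault_id] = FaultEntry(fault_id, fiv, nostal, akiv, nliv)
        return nliv

    def by_priority(self):
        # Faults ordered by network level impact value, highest first.
        return sorted(self.entries.values(), key=lambda e: e.nliv, reverse=True)
```

Sorting on `e.nostal` or `e.fiv` instead of `e.nliv` would give the alternative prioritizations mentioned above.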
  • the device may further be configured to update the network level impact value stored in the fault register, in association with the failure.
  • the network level impact value may be updated in case at least one of the parameters, on the basis of which the previous network level impact value has been calculated, has changed.
  • the network level impact value may be updated in case the subscriber activity data has changed. More precisely, an update may be carried out when a timestamp T_after in the subscriber activity data of a relevant entry is newer than a timestamp T_lastupdate of the network level impact value stored in the fault register.
  • a new network level impact value may be calculated by considering the newer subscriber activity data.
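The update condition can be sketched as a simple timestamp comparison. The dictionary layout of the register entry and the `recompute` callback are hypothetical; only the comparison of T_after against T_lastupdate comes from the text.

```python
from datetime import datetime

def update_if_stale(entry, t_after, recompute):
    """entry: dict with keys 'nliv' and 't_lastupdate' (assumed register row).
    t_after: timestamp of the newest relevant subscriber activity entry.
    recompute: callable returning a freshly calculated network level impact value."""
    if t_after > entry["t_lastupdate"]:
        # Newer activity data exists: recalculate and record the update time.
        entry["nliv"] = recompute()
        entry["t_lastupdate"] = t_after
        return True
    return False
```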
  • a method for determining an impact of a failure in a mobile communication network comprises receiving fault management, FM, data informing of the failure in the mobile communication network.
  • the fault management data comprises information on a time of the failure and information on a failed cell of the mobile communication network.
  • the method further comprises determining a set of possibly impacted cells, based on the information on the failed cell, obtaining subscriber activity data for subscribers in the set of possibly impacted cells, calculating, based on the subscriber activity data, a network level impact value indicating an impact of the failure to the mobile communication network, and storing the network level impact value in a fault register, in association with the failure.
  • the details described above for the device of the first aspect may also apply to the method of the second aspect.
  • the device of the first aspect may be configured to perform the method of the second aspect.
  • Determining the set of possibly impacted cells based on the information on the failed cell may further be carried out based on handover relation data indicative of previous handover processes between the failed cell and neighboring cells.
  • Determining the set of possibly impacted cells based on the information on the failed cell may further be carried out based on geo-location data indicative of geographic locations of a plurality of cells including the failed cell.
  • the subscriber activity data may comprise subscriber activity data for subscribers in the failed cell.
  • the subscriber activity data may comprise at least one data set for each subscriber present in the failed cell and in the possibly impacted cells, the data set defining a type and/or identification number of the respective subscriber, wherein the subscriber activity data is indicative of a number of the subscribers present in the failed cell and in the possibly impacted cells.
  • At least one of the data sets may comprise a communication service used by the respective subscriber and a key performance indicator, KPI, value for the communication service.
  • Each of the data sets may comprise a time stamp and the subscriber activity data may comprise data sets with time stamps indicating a time before the time of the failure and data sets with time stamps indicating a time after the time of the failure.
  • Calculating the network level impact value may be based on a final impact value for subscribers in the set of possibly impacted cells that are active after the time of the failure.
  • the method may further comprise calculating the final impact value, wherein calculating the final impact value comprises determining, based on the subscriber activity data, a set of active subscribers that were active in the possibly impacted cells before and after the time of the failure, for each subscriber of the set of active subscribers, for each communication service, and for each KPI type defined for the given communication service, computing a difference of the KPI values observed for the subscriber before the time of the failure and after the time of the failure, and weighting the differences according to a predefined KPI weight per communication service, to obtain a service impact value for the subscriber per each defined communication service, weighting the individual service impact values for each subscriber with a predefined service weight value and summing the weighted service impact values to obtain a total subscriber impact value, and summing the total subscriber impact values for each of the active subscribers to obtain the final impact value.
  • Calculating the network level impact value may be based on a number of subscribers that are lost in the mobile communication network due to the failure.
  • the method may further comprise calculating the number of subscribers that are lost, wherein calculating the number of subscribers that are lost is based on learnt cell traffic data, the learnt cell traffic data comprising an average estimate of a number of subscribers in a specific cell for a specific time.
  • the method may further comprise calculating the number of subscribers that are lost, wherein calculating the number of subscribers that are lost is based on a number of subscribers active before the time of the fault, derived from the subscriber activity data, a number of subscribers active after the time of the fault, derived from the subscriber activity data, an average estimated number of subscribers active before the time of the fault, derived from the learnt cell traffic data, and an average estimated number of subscribers active after the time of the fault, derived from the learnt cell traffic data.
  • Calculating the network level impact value may be based on an average KPI impact value, wherein the average KPI impact value is calculated based on historical average KPI levels for each defined communication service, and based on the KPI values of the subscriber activity data after the time of the impact, using predefined KPI weights per communication service and predefined service weight values.
  • Calculating the network level impact value may be based on the final impact value, based on the number of subscribers that are lost, and based on the average KPI impact value.
  • the method may further comprise updating the network level impact value stored in the fault register, in association with the failure.
  • a computer program comprises instructions which, when the program is executed by a computer, cause the computer to carry out the method of the second aspect.
  • a computer-readable medium comprises instructions which, when executed by a computer, cause the computer to carry out the method of the second aspect.
  • the computer-readable medium may be a tangible or non-tangible computer-readable medium.
  • Examples of a computer-readable medium are, e.g., an optical recording medium, a magnetic recording medium, a solid-state recording medium, etc.
  • Fig. 1 shows the impact a failure in a failed cell has on different mobile devices
  • Fig. 2 shows an arrangement of a device according to the present disclosure in the architecture of a mobile communication network
  • Fig. 3 shows a schematic representation of a device for determining an impact of a failure in a mobile communication network according to the present disclosure
  • Fig. 4 shows a schematic representation of a method for determining an impact of a failure in a mobile communication network according to the present disclosure
  • Fig. 5 shows a further schematic representation of a method for determining an impact of a failure in a mobile communication network according to the present disclosure
  • Fig. 6 shows a further schematic representation of a technique for determining an impact of a failure in a mobile communication network according to the present disclosure.
  • when the expression base station is used in this disclosure, it generally refers to a base station of a mobile communication network, said base station being configured to provide a cell.
  • in LTE networks, the base station may be an evolved NodeB (eNodeB or eNB) and in 5G networks, the base station may be one of the base stations defined in the 5G standard (e.g., gNB, ng-eNB, herein summarized as 5GNB).
  • when the expression mobile device is used, it refers to a mobile device of a subscriber (or user) connected to a particular cell (e.g., a smartphone, a portable computer, a tablet, etc.). The mobile device may also be referred to as UE.
  • Fig. 1 shows a cell 10, in which a cell outage has occurred.
  • a technical fault may have occurred in the base station hosting the cell 10, which has led to the outage.
  • the cell 10 will be referred to as failed cell or faulty cell.
  • the failed cell 10 is surrounded by neighbor cells 12 (six in this example).
  • exemplary UEs are shown in Fig. 1 (as squares and triangles), which are located either within the failed cell 10 or within one of the neighbor cells 12.
  • the cell outage may have a different impact on each of the UEs. The impact is not necessarily predetermined only by the fact whether the respective UE is located within the failed cell 10 or not.
  • a neighbor cell 12 takes over the connection to the respective UE.
  • the following impacts on the UEs in the failed cell 10 and in a neighbor cell 12 are possible: lost in faulty cell, impacted in faulty cell, impacted in neighbor cell, no impact in faulty cell, and no impact in neighbor cell.
  • the impact of a cell failure a) is difficult to predict and b) must be considered in full (e.g., in each failed cell and in other possibly impacted cells). It is therefore an object of the present disclosure to provide a technique that considers situations like that shown in Fig. 1 and that outputs a reliable network level impact value with regard to a particular failure.
  • Fig. 2 shows a device for determining an impact of a failure in a mobile communication network, wherein said device is referred to as customer experience manager (CEM) 20.
  • the CEM 20 also has additional functions apart from those described below, wherein those additional functions are not relevant for the present invention and will therefore not be described herein.
  • the CEM 20 is part of an operations support system (OSS) 22.
  • the CEM 20 comprises a network and subscriber analyzer (impact analysis) 24 and a correlator 26.
  • the correlator 26 receives data from various devices of the mobile communication network and correlates this data in order to be further processed by the network and subscriber analyzer 24.
  • a fault management module FM, a performance management module PM, and a configuration management module CM are connected to the network and subscriber analyzer 24.
  • the OSS 22 monitors a core network, an LTE radio network, and a 5G radio network.
  • a plurality of UEs is wirelessly connected to each of the radio networks (LTE and 5G). More precisely, a plurality of UEs is connected to a corresponding base station (eNodeB and 5GNB) of the respective radio network.
  • the core network comprises a session management function (SMF), an access and mobility management function (AMF), and a user plane function (UPF).
  • the aforementioned elements of the radio networks and the core network each are configured to transmit data to the correlator 26 of the CEM 20.
  • Fig. 3 shows a schematic structure of a device 30 for determining an impact of a failure in a mobile communication network, according to the present disclosure.
  • the device 30 comprises a network interface 32 that is adapted to communicatively couple the device 30 to the mobile communication network.
  • the device 30 further comprises a processor 34 and a memory 36 containing instructions executable by the processor 34 to cause the device 30 to perform the method shown in Fig. 4.
  • the interface 32 may be configured to receive fault management (FM) data informing of a failure in the mobile communication network.
  • the fault register may also be part of the memory 36.
  • the device 30 may be the device 20 shown in Fig. 2.
  • Fig. 4 shows a flowchart of a method for determining an impact of a failure in a mobile communication network.
  • the method starts with a step of receiving 40 fault management (FM) data informing of the failure in the mobile communication network, the fault management data comprising information on a time of the failure and information on a failed cell of the mobile communication network.
  • a set of possibly impacted cells is determined 42, based on the information on the failed cell.
  • subscriber activity data is obtained 44 for subscribers in the set of possibly impacted cells.
  • a network level impact value indicating an impact of the failure to the mobile communication network is calculated 46.
  • the network level impact value is stored 48 in a fault register, in association with the failure.
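
The five steps of Fig. 4 can be sketched as a small pipeline. This is only an illustrative sketch: the names `FaultManagementData`, `FaultRegister`, and `determine_impact`, as well as the three callables passed in, are hypothetical and do not appear in the disclosure, which leaves the concrete implementation of each step open.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class FaultManagementData:
    failed_cell: str     # information on the failed cell, e.g. a cell ID
    failure_time: float  # information on the time of the failure


@dataclass
class FaultRegister:
    # Hypothetical storage: maps (failed cell, failure time) to the impact value.
    entries: Dict[Tuple[str, float], float] = field(default_factory=dict)

    def store(self, fm: FaultManagementData, impact: float) -> None:
        self.entries[(fm.failed_cell, fm.failure_time)] = impact


def determine_impact(
    fm: FaultManagementData,  # step 40: the received FM data
    find_cells: Callable[[str], Set[str]],
    get_activity: Callable[[Set[str], float], List[dict]],
    calc_impact: Callable[[List[dict]], float],
    register: FaultRegister,
) -> float:
    cells = find_cells(fm.failed_cell)               # step 42: possibly impacted cells
    activity = get_activity(cells, fm.failure_time)  # step 44: subscriber activity data
    impact = calc_impact(activity)                   # step 46: network level impact value
    register.store(fm, impact)                       # step 48: store with the failure
    return impact
```

Passing the individual steps in as callables keeps the sketch independent of how cells, activity data, and the impact value are actually obtained.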
  • Fig. 5 shows a different representation of a method for determining an impact of a failure in a mobile communication network according to the present disclosure.
  • the method described with regard to Fig. 4 may be considered a more generic representation of the method of Fig. 5.
  • an FM alarm is received.
  • the FM alarm may comprise a plurality of additional information, e.g., information on the failed cell (e.g., cell ID) and a timestamp of a time the failure was detected.
  • in a second step 52, geo-location data and neighbor cell relations of cells are obtained or refreshed. This information helps to identify the impacted cells in the following step 53.
  • in step 54, subscriber activity for the impacted cells before and after the failure is obtained. The subscriber activity may be recorded as a continuous process, e.g., at fixed time intervals, wherein each entry of subscriber activity data is provided with a corresponding timestamp.
  • the obtained data is correlated.
  • activity patterns are learned.
  • the activity patterns may help to obtain a typical activity in the failed cell and the possibly impacted cells, i.e., a hypothetical activity in those cells for the case that the failure would not have occurred.
  • an impact analysis is performed on the basis of the data received and processed in the previous steps. As a result, a network level impact value is calculated and output.
  • an impact of the alarm (in the form of the network level impact value) is sent to a fault management (FM) system.
  • Embodiments of the present disclosure correlate the customer experience information (service metrics), the FM/CM (fault management/configuration management) data, and the cell level reference data including geo-location and/or handover relation information. Then the impact of the cell outage is analyzed and a report is created.
  • the report comprises at least one of:
  • the list of impacted (serving and neighboring) cells
  • per-cell reports, i.e., a set of well-defined KPIs (RAN, traffic and service metrics, handover failure increase) computed for before and after the incident
  • a list of subscribers categorized into at least one of the following categories:
      o Lost in the faulty cell
      o Moved from the faulty cell to a neighboring cell
      o Connected to a neighboring cell but impacted by the outage of the faulty cell (by the temporary load increase because of the outage)
      o Connected to a neighboring cell and not impacted by the outage
  • a list of impacted services for each subscriber and cell
  • the calculated total impact of the outage, based on the individual service impacts and including spillover effects to neighboring cells (which helps alarm prioritization)
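
The four subscriber categories of the report can be illustrated with a small classifier. This is a sketch under stated assumptions, not the disclosed method: the function `categorize`, the `kpi_drop` input (a relative KPI deterioration), and the `threshold` value of 0.2 are hypothetical, and the sketch assumes that a subscriber seen in the faulty cell before the outage is either no longer visible afterwards or is served by a neighboring cell.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    LOST_IN_FAULTY_CELL = "lost in the faulty cell"
    MOVED_TO_NEIGHBOR = "moved from the faulty cell to a neighboring cell"
    NEIGHBOR_IMPACTED = "connected to a neighboring cell but impacted"
    NEIGHBOR_NOT_IMPACTED = "connected to a neighboring cell and not impacted"


def categorize(
    cell_before: str,
    cell_after: Optional[str],  # None if the subscriber is no longer seen
    faulty_cell: str,
    kpi_drop: float,            # hypothetical relative KPI deterioration, 0.0 to 1.0
    threshold: float = 0.2,     # hypothetical cut-off for "impacted"
) -> Category:
    if cell_before == faulty_cell:
        # Subscriber was served by the faulty cell before the outage.
        if cell_after is None:
            return Category.LOST_IN_FAULTY_CELL
        return Category.MOVED_TO_NEIGHBOR
    # Subscriber was served by a neighboring cell before the outage.
    if kpi_drop > threshold:
        return Category.NEIGHBOR_IMPACTED
    return Category.NEIGHBOR_NOT_IMPACTED
```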
  • learned historical patterns may be used, which include cell traffic and handover statistics.
  • service quality, expressed as service related KPIs and their weighted combination, is used to quantify the failure and monitor the recovery of the system.
  • the device for determining an impact of a failure in a mobile communication network may receive and process a plurality of data elements in order to output information on impacted cells and services and/or a subscriber impact report.
  • One or more of the following data elements may be processed by the device: KPI pre-processing, correlation (QoE measurements), see item 1.2.2 below; fault management (time, cell identifier), see item 1.2.1 below; configuration management (neighbor information), see item 1.2.5 below; cell reference data (name, location, etc.), see item 1.2.4 below; and customer information (group info, ARPU, etc.), see item 1.2.3 below.
  • the analytics system of the device automatically collects the following information from external systems in near real-time.
  • the input data used for the analysis may comprise at least one of the following items:
  • the analytics system subscribes for radio node and cell failure alarms in the network management FM systems.
  • An FM ticket contains - among other data - one or more of the following key fields:
  • Real-time E2E (end-to-end) customer experience information is obtained from a customer assurance system about active subscribers, including one or more of the following:
  • IMSI (International Mobile Subscriber Identity)
  • IMEI (International Mobile Equipment Identity)
  • Used service(s) i.e., communication service (e.g. Web browsing, Video streaming, etc.)
  • the impact analysis obtains one or more of the following customer related information from customer reference data:
  • IMSI (International Mobile Subscriber Identity)
  • IMEI (International Mobile Equipment Identity)
  • Base station at least one of:
  • Radio access technologies e.g. 3G, LTE, 5G
  • a learnt activity pattern may be considered by the analytics system of the device.
  • the analytics system may consider at least one of the following learnt patterns:
  • for each cell, the system maintains and stores historical traffic and QoE (Quality of Experience) data, at least one of:
      o Day of week and time of day
  • the system collects and maintains handover statistics that indicate the most frequent neighbors of each cell
  • the impact analysis process may receive and process at least one of the above data and/or learnt patterns.
  • Fig. 6 shows an overview of a method according to an embodiment of the present disclosure.
  • the impact analysis process is triggered by a new incoming FM alarm.
  • a geographical area for which the impact is calculated must be determined. This is done by determining a set of radio network cells in the network that are located near the faulty cell contained in the FM alarm (so-called set of possibly impacted cells).
  • the correlation phase creates all the necessary data from the above input data.
  • the determination of which cells to include in the impact analysis is based on at least one of the following data: Handover relations and geo-location data.
  • one of these data types may be sufficient for determining the set of possibly impacted cells.
  • the process may be repeated for the different radio access technologies, by utilizing the geo-location of the failed cell. For example, in case of a 4G cell alarm, first the neighboring 4G cells are included in the set of possibly impacted cells; then the 3G cells that are at the same geo-location as the problematic 4G cell are added to the set, followed by the neighbors of those 3G cells.
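
The determination of the set of possibly impacted cells can be sketched as follows. This is a minimal sketch, assuming that handover relations are available as a neighbor map and that geo-location is reduced to a site identifier per cell; the function name `possibly_impacted_cells` and both input structures are hypothetical. Including the failed cell itself in the returned set is also an assumption.

```python
from typing import Dict, Set


def possibly_impacted_cells(
    failed_cell: str,
    neighbors: Dict[str, Set[str]],  # handover relations: cell -> neighbor cells
    site_of: Dict[str, str],         # geo-location: cell -> site identifier
) -> Set[str]:
    # Cells at the same geo-location as the failed cell, possibly belonging
    # to other radio access technologies (e.g. a 3G cell at a 4G site).
    site = site_of.get(failed_cell)
    co_located = {c for c, s in site_of.items() if site is not None and s == site}
    co_located.add(failed_cell)

    # Add the handover neighbors of the failed cell and of each co-located cell.
    c_all = set(co_located)
    for cell in co_located:
        c_all |= neighbors.get(cell, set())
    return c_all
```

With a 4G cell `L1` and a 3G cell `U1` at the same site, the result contains `L1`, the 4G neighbors of `L1`, `U1`, and the neighbors of `U1`, which mirrors the 4G alarm example above.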
  • the FM alarm is enriched with all the details about the number and type of subscribers present in the actual cell (failed cell) and the previously determined potentially impacted cells, their activity types together with their perceived QoS (Quality of Service) per used communication service, one-by-one.
  • the data is generated both for the latest timestamp available in the E2E monitoring system before the alarm timestamp and for subsequent timestamps after the alarm timestamp.
  • the resulting data set of this correlation and enrichment is the input to the impact analysis, see the Impact Analysis section 1.4.3.
  • the resulting data set may also be referred to as subscriber activity data.
  • the resulting correlated data set is structured as follows, by example:
  • a goal of the technique proposed by the present disclosure is to properly assess the identified faults (coming as input from an external FM system), namely to estimate the total impact (or pain) each reported fault represents for the subscribers in the operator's network.
  • the estimation may include the spillover effects of faults in a cell to neighboring network nodes / cells. The larger the impact estimate, the more severe the problem is.
  • the initial result of the impact estimate (i.e., the network level impact value) is recorded in the central fault register as a new entry.
  • the faults stored in the central fault register are then periodically updated with the actual impact enabling the continuous ranking of all the problems present at a time in the operator's network.
  • the fault entry is finally removed from the central fault register.
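
The life cycle of an entry in the central fault register (record, periodic update, ranking, removal) can be sketched as below. The class `CentralFaultRegister`, its method names, and the use of a plain alarm identifier as key are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple


class CentralFaultRegister:
    """Illustrative in-memory register: alarm identifier -> latest impact value."""

    def __init__(self) -> None:
        self._impact: Dict[str, float] = {}

    def record(self, alarm_id: str, impact: float) -> None:
        # Initial impact estimate is stored as a new entry.
        self._impact[alarm_id] = impact

    def update(self, alarm_id: str, impact: float) -> None:
        # Periodic update with the latest impact estimate.
        if alarm_id not in self._impact:
            raise KeyError(alarm_id)
        self._impact[alarm_id] = impact

    def remove(self, alarm_id: str) -> None:
        # Called once the fault entry is no longer needed.
        self._impact.pop(alarm_id, None)

    def ranking(self) -> List[Tuple[str, float]]:
        # Largest impact first: the larger the estimate, the more severe the problem.
        return sorted(self._impact.items(), key=lambda kv: kv[1], reverse=True)
```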
  • Step 1 (New alarm) There is a new incoming FM alarm for cell C, with timestamp T. Current time is T_current.
  • Step 2. (Cell set) Determine the possibly impacted cell set (C_all) as described above (item 1.4.1).
  • Step 3. (T_before) Let T_before be the time for which there exists Customer Experience data (see item 1.2.2), depending on the resolution of the customer experience data collection process, such that T_before < T and (T - T_before) is minimal. In other words, T_before is the closest time before the alarm for which Customer Experience data exists.
  • Step 4a If T_after cannot be determined (i.e., no such customer data exist that fulfils the time conditions), wait/sleep for a given period (a few seconds or minutes, depending on the resolution of the customer experience data collection process), then repeat Step 4.
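
Selecting the data timestamps around the alarm time T can be sketched with a binary search. This sketch assumes the available customer experience timestamps are given as an ascending list; the helper names `closest_before` and `closest_after` are hypothetical. The definition of T_after as the first timestamp after T is an assumption made by analogy with T_before. A return value of `None` corresponds to the case in which the time cannot be determined yet, so that the caller would wait and retry.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def closest_before(timestamps: Sequence[float], t: float) -> Optional[float]:
    """Latest timestamp strictly before t, or None if there is none."""
    i = bisect_left(timestamps, t)  # index of first element >= t
    return timestamps[i - 1] if i > 0 else None


def closest_after(timestamps: Sequence[float], t: float) -> Optional[float]:
    """Earliest timestamp strictly after t, or None if there is none (assumed rule)."""
    i = bisect_right(timestamps, t)  # index of first element > t
    return timestamps[i] if i < len(timestamps) else None
```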
  • Step 5 Based on the customer experience data for time T_before and T_after, determine the following values and sets:
  • Step 6. (Historical estimate) Based on the learnt traffic patterns for cells and traffic (see item 1.3.1), determine the following values:
  • Let U_after_history be the historical average estimate of the number of users in the cell set C_all for the time T_after.
  • Step 7 Estimate the number of users that are lost in the network due to the fault based on the values U_before, U_after, U_before_history and U_after_history. For example, the number NL of users that are lost (i.e., the number of subscribers that are lost) can be calculated as NL = U_before * (U_after_history / U_before_history) - U_after.
  • the relation U_after_history / U_before_history indicates a typical relative loss/gain in the considered period. For example, in case it is known that the number of subscribers increases from 20 to 40 in the considered period, the above relation equals 2. This relation is considered as a factor in the above equation, such that it can be estimated how many subscribers would be active after the fault in case the fault had not occurred. From this estimated number, the real number U_after is subtracted in order to estimate the number of subscribers that are lost.
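
The estimate of lost subscribers described above can be written as a short function. The formula used here, NL = U_before * (U_after_history / U_before_history) - U_after, follows the textual description of the historical relation being applied as a factor and U_after being subtracted; the function name and the error handling for a zero historical value are illustrative assumptions.

```python
def estimate_lost_users(
    u_before: float,
    u_after: float,
    u_before_history: float,
    u_after_history: float,
) -> float:
    """Estimated number NL of subscribers lost due to the fault."""
    if u_before_history <= 0:
        raise ValueError("u_before_history must be positive")
    # Typical relative loss/gain in the considered period.
    ratio = u_after_history / u_before_history
    # Number of subscribers expected after the alarm had the fault not occurred.
    expected_after = u_before * ratio
    return expected_after - u_after
```

For the example in the text (historical increase from 20 to 40, i.e. a relation of 2), 25 subscribers before the alarm would be expected to grow to 50; if only 30 are observed afterwards, the estimate is 20 lost subscribers.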
  • Step 8 Estimate the possible KPI deterioration for the users that are active also after the FM alarm (i.e., they exist in Userset_before and Userset_after as well):
  • Step 9 (Average KPI impact) Based on the difference between the historical average KPI levels for each defined communication service in the impact model and the KPI levels at T_after, using the KPI weights and service weights (see item 1.7), estimate the generic KPI impact for cells in C_all. This estimate will enable the impact calculation extension to users who were not present in C_all at T_before.
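
One possible way to combine KPI weights and service weights into an average KPI impact is sketched below. The disclosure does not give the exact formula in this passage, so everything here is an assumption: the function name `average_kpi_impact`, the nested dictionary layout, the use of a relative drop against the historical level, and the convention that a higher KPI level is better.

```python
from typing import Dict

# Hypothetical layout: service name -> KPI name -> KPI level.
KpiLevels = Dict[str, Dict[str, float]]


def average_kpi_impact(
    historical: KpiLevels,
    current: KpiLevels,
    kpi_weights: Dict[str, float],
    service_weights: Dict[str, float],
) -> float:
    total = 0.0
    for service, hist_kpis in historical.items():
        service_impact = 0.0
        for kpi, hist_level in hist_kpis.items():
            if hist_level == 0:
                continue  # no meaningful relative drop can be computed
            # Relative deterioration; improvements are counted as zero impact.
            drop = max(0.0, (hist_level - current[service][kpi]) / hist_level)
            service_impact += kpi_weights.get(kpi, 0.0) * drop
        total += service_weights.get(service, 0.0) * service_impact
    return total
```

KPIs for which a lower value is better (e.g. delay or packet loss) would need the sign of the difference reversed; the sketch leaves that out for brevity.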
  • the parameters a, b, and c are predefined weighting and normalization factors.
  • the NLIV may be calculated based on only one or two of the parameters FIV, NOSTAL, and AKIV.
  • in that case, the other two or one parameters are either not considered or are not even determined.
  • NLIV = a*FIV + b*NOSTAL.
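
The weighted combination can be sketched as a single function in which parameters that are not considered default to zero. The two-term form a*FIV + b*NOSTAL is taken from the text; the three-term form with c*AKIV is an assumption inferred from the three factors a, b, and c. The function name and the default weights of 1.0 are hypothetical, and FIV, NOSTAL, and AKIV are treated as opaque numeric inputs.

```python
def network_level_impact(
    fiv: float = 0.0,
    nostal: float = 0.0,
    akiv: float = 0.0,
    a: float = 1.0,
    b: float = 1.0,
    c: float = 1.0,
) -> float:
    """Weighted sum of the impact parameters; omitted parameters contribute zero."""
    return a * fiv + b * nostal + c * akiv
```

Calling the function with only `nostal` and the default weight yields an impact value equal to that single parameter.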
  • the NLIV equals the number of subscribers that are lost:
  • the FM alarm entries stored in the central fault register are periodically updated.
  • the update frequency depends on the availability of the new customer experience data set (the resolution of this data).
  • the following procedure is performed when a given entry (C, T, C_all, T_lastupdate) is updated and a new <T_after, network_level_impact> entry is calculated for the existing alarm for the latest impact estimation. T_after is the closest time to the current time for which customer experience data exists and which is already past the time when the FM alarm impact was last updated (T_lastupdate).
  • Step 4a If T_after cannot be determined (i.e., no such customer data exist that fulfils the time conditions), wait/sleep for a given period (a few seconds or minutes, depending on the resolution of the customer experience data collection process), then repeat Step 1.
  • Step 2. (User numbers) Based on the customer experience data for time T_after, and the learned cell and traffic patterns, determine the following values:
  • Let U_after_history be the historical average estimate of the number of users in the cell set C_all for the time T_after.
  • Step 3 Estimate the number of users that are lost in the network due to the fault based on the values U_after and U_after_history.
  • Step 4. Average KPI impact
  • Step 5 (Impact calculation) Based on the estimated number of missing users (Step 3) and the average KPI impact (Step 4), compute the network_level_impact for the FM alarm, and update it in the central fault register. Add the entry (T_after, network_level_impact) as the latest computed <time, impact> value for the alarm, and change T_lastupdate to be equal to T_after.
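
The periodic update of a register entry can be sketched as follows. The entry fields mirror the tuple (C, T, C_all, T_lastupdate) from the text; the class `AlarmEntry`, the function `update_entry`, and the `impact_at` callable (standing in for the estimation of missing users and average KPI impact) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set, Tuple


@dataclass
class AlarmEntry:
    cell: str            # C
    alarm_time: float    # T
    c_all: Set[str]      # possibly impacted cell set
    t_lastupdate: float  # time of the last impact update
    history: List[Tuple[float, float]] = field(default_factory=list)  # <time, impact>


def update_entry(
    entry: AlarmEntry,
    data_timestamps: Sequence[float],
    now: float,
    impact_at: Callable[[float], float],
) -> bool:
    # T_after: closest data timestamp to the current time that lies past T_lastupdate.
    candidates = [t for t in data_timestamps if entry.t_lastupdate < t <= now]
    if not candidates:
        return False  # no new data yet; the caller waits and retries later
    t_after = max(candidates)
    entry.history.append((t_after, impact_at(t_after)))
    entry.t_lastupdate = t_after
    return True
```

Returning `False` when no suitable T_after exists leaves the waiting strategy to the caller, which depends on the resolution of the customer experience data.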
  • the proposed system generates a report on the impacted cells.
  • the generated report may include (but is not limited to) at least one of:
  • the list can be ordered by cost / services / service impact to support the different types of analyses and workflows at the operator.
  • Reports contain the most important infrastructure, traffic and service related KPIs, and show the degradation around the outage.
  • the report may include (but is not limited to) at least one of the following main categories and KPIs:
      o RAN KPIs to describe the radio environment and infrastructure (e.g. RSRP, RSRQ, CQI, RAN drops)
      o Traffic KPIs to describe the cell traffic and load (e.g. data volume, cell throughput, number of active users)
      o Service KPIs to describe the service experience of the different services: MBB (data volume and service usage, TCP level metrics: throughput, packet loss, delay)
  • the records in the fault register are continuously updated with a chosen frequency, e.g. every 5-10 minutes.
  • the records in the fault register are deleted whenever the original alarm is cleared.
  • the fault register has at least the following fields per entry:
  • Example for KPI weights Table 3: The above tables are stored in the device and may be adapted, if necessary. Similar to the above KPI weights, service weights are assigned to the individual communication services.
  • the technique for determining an impact of a failure in a mobile communication network described herein may have one or more of the following advantages (depending on the details of the considered embodiment).
  • the operator receives prompt feedback about the number of affected subscribers and services related to the failure of a radio node, or of several radio nodes in case of overlapping cells.
  • the system may provide detailed information on which services and which subscribers are affected and to what extent. This information is primarily useful for prioritization of fault management alarm tickets.
  • the operator can focus on solving issues which highly impact users and services. This information can also be used by customer care to improve customer relationship management processes, e.g., to offer compensation, or to send a notification acknowledging that there was a problem.
  • the solution also identifies if services and subscribers were not impacted by a radio node failure or outage.
  • the solution can identify alarms that can be cleared/ignored, thereby reducing the number of alarms the operator has to handle.
  • the solution may provide information about the robustness, redundancy of the radio network, may identify the affected neighbor cells, may point out areas, where neighbor cells can take over traffic without any major problem; and may identify areas, where outage can cause coverage or capacity issues.
  • the solution and system may identify the critical areas where network improvements are needed or capacity should be increased.
  • the system may distinguish two different failure types: planned outage when traffic is taken over in a planned way and sudden failure when fast processes try to solve the issue.
  • the system may determine the impacted subscribers and services lost in the faulty cell, the subscribers and traffic taken over by neighboring cells, the subscribers and services influenced by taking over subscribers and traffic from the faulty cell within a few minutes of the failure.
  • the system may also provide information on a longer time scale (e.g., a few hours) about the key performance indicators of the affected cells and area. Using the change of the neighbor relation and the handover KPIs among the neighbor cells, the system may estimate the remaining coverage hole due to cell or radio node failure, considering the redundancy of the radio network in the affected area.
  • the solution provides a generic way to estimate the impact of each radio network failure to support the network operator to focus on the most severe issues first, especially including the spillover effects of radio failures to neighboring cells in the estimation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

According to one aspect, the present disclosure relates to a device for determining an impact of a failure in a mobile communication network. The device comprises at least one processor and a memory comprising instructions executable by said processor. The device is operative to receive fault management (FM) data informing of the failure in the mobile communication network. The fault management data comprises information on a time of the failure and information on a failed cell of the mobile communication network. The device is further operative to determine a set of possibly impacted cells, based on the information on the failed cell, to obtain subscriber activity data for subscribers in the set of possibly impacted cells, to calculate, based on the subscriber activity data, a network level impact value indicating an impact of the failure to the mobile communication network, and to store the network level impact value in a fault register, in association with the failure. Further related aspects concern a method, a computer program, and a computer-readable medium.
PCT/EP2020/053789 2020-02-13 2020-02-13 Technique for determining an impact of a failure in a mobile communication network WO2021160270A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/053789 WO2021160270A1 (fr) 2020-02-13 2020-02-13 Technique for determining an impact of a failure in a mobile communication network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/053789 WO2021160270A1 (fr) 2020-02-13 2020-02-13 Technique for determining an impact of a failure in a mobile communication network

Publications (1)

Publication Number Publication Date
WO2021160270A1 true WO2021160270A1 (fr) 2021-08-19

Family

ID=69593675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/053789 WO2021160270A1 (fr) 2020-02-13 2020-02-13 Technique for determining an impact of a failure in a mobile communication network

Country Status (1)

Country Link
WO (1) WO2021160270A1 (fr)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20705667

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20705667

Country of ref document: EP

Kind code of ref document: A1