WO2015028714A1 - Fault anticipating service monitoring in a communications network - Google Patents

Fault anticipating service monitoring in a communications network

Info

Publication number
WO2015028714A1
Authority
WO
WIPO (PCT)
Prior art keywords
class
data
event
values
malfunction
Prior art date
Application number
PCT/FI2014/050651
Other languages
French (fr)
Inventor
Jukka Rantala
Markus TAKALA
Original Assignee
Elisa Oyj
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elisa Oyj
Publication of WO2015028714A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0781 Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5061 Network service management, e.g. ensuring proper service fulfilment according to agreements characterised by the interaction between service providers and their network customers, e.g. customer relationship management
    • H04L41/5067 Customer-centric QoS measurements

Definitions

  • the invention relates to a method, according to the preamble of Claim 1, for automatically predicting and classifying malfunctions in a communications network, with the aid of a computer system.
  • the invention relates particularly to the automatic prediction of malfunctions in a wireless mobile-telephone network and to classification based on the assumed effect of a malfunction.
  • the invention also relates to a computer system, according to the preamble of Claim 15, for a corresponding purpose.
  • the background art includes, among other things, various methods and systems for monitoring communications networks.
  • US 7,539,752 B1 discloses a proactive and predictive system for a communications network.
  • EP 2 408 143 A1 discloses a proactive resource-management method for virtual networks.
  • WO 2010/112855 A1 discloses an analysis method particularly for telecommunications networks.
  • WO 2012/143059 A1 discloses a method for restoring a communications network after several malfunctions.
  • US 2007/0222576 A1 discloses the dynamic prioritization of malfunction states in a communications network, on the basis of the deterioration of services.
  • the invention is intended to create a new technical solution for monitoring a communications network.
  • the invention is based on receiving event information from the communications network. The event information is classified according to various criteria, and the deviation data it contains, such as malfunction data, are given severity values.
  • the severity values are weighted with weighting factors corresponding to the class data and, on the basis of these values, a group of reference values are calculated, which are compared to corresponding limit values. On the basis of the comparison, decisions can be made on possible further actions.
  • the method according to the invention is characterized by what is stated in the characterizing portion of Claim 1.
  • the system according to the invention is, for its part, characterized by what is stated in the characterizing portion of Claim 15.
  • the invention provides, among other things, an effective way to collect, process, and filter essential malfunction information from even a large number of individual event data.
  • the further actions can be focused and dimensioned correctly, and repair operations or proactive measures can be started quickly.
  • when the collection and processing of information is done with the aid of the elements of the mobile network and/or the telephone network, the method and system can be implemented very cost-effectively and very comprehensive geographical coverage can be achieved by means of the system.
  • the monitoring technical system can thus also contain other technical systems than the elements of the actual communications network.
  • a system in which the parameters are set dynamically and automatically can be implemented to be self-learning.
  • the system's parameters are then initially given a default value and the system corrects the parameters on the basis of the received event information and possibly also of other information.
  • Such a system is able to weight events on the basis of previously collected information, in such a way that individual random events are filtered out and attention can be concentrated on cumulative problem points.
  • Figure 1 shows on a general level the system environment according to one embodiment.
  • Figure 2 shows in greater detail the information-processing system 2 shown in Figure 1, according to one embodiment.
  • Figure 3 shows processes according to embodiments.
  • Figure 4 shows the enrichment of an individual event, according to one embodiment.
  • Figure 5 shows the calculation of class-specific vertical triggers, according to one embodiment.
  • the system of Figure 1 comprises a data-processing system 2, which contains, among other things, given parameters 3 and a data store 4.
  • the system comprises an input interface 1, through which the data to be processed is received from data sources 0, as well as an output interface 5, through which the information enriched during processing is forwarded, e.g., to operation control and monitoring systems, or directly as action commands to operatives.
  • the data-processing system 2 is a data-processing system, such as a mediator system, contained in or relating to a telephone and communications network.
  • telephone and communications network refers, for example, to a mobile-telephone network and a landline telephone network.
  • a data-processing system contained in or relating to a telephone and communications network refers, for its part, to the fact that the data-processing system also participates in the performance of actions relating to the operation of the telephone and communications network. Such actions are, for example, the collection of event data from the network elements of the telephone and communications network, processing of the event data, such as aggregation, correlation, enrichment, and pricing.
  • the data-processing system contained in or relating to the telephone and communications network does not refer to a system that has only a communications link to the telephone and communications network but which has no functional connection to the measures performed by and services provided by the telephone and communications network.
  • the service management can be very effectively implemented by exploiting the powerful servers used for controlling and monitoring the telephone and communications network.
  • these servers can, in addition to their basic tasks, thus be programmed to provide value-added services according to the embodiments.
  • the data sources 0 can be, for example, network elements of the telephone and communications network, as well as the terminal devices connected to the telephone and communications network.
  • Figure 2 shows in greater detail the data-processing system 2 shown in Figure 1, according to one embodiment.
  • the data-processing system 2 of Figure 2 comprises, as one element, an online analysis module 21, which in turn includes a modelling engine 211 and an event analysis engine 212.
  • the data-processing system 2 of Figure 2 also comprises, as one element, a decision engine 23, which contains the parameters 3 and data store 4 already referred to above.
  • the decision engine 23 comprises a specific analysis engine for the classification data used in each embodiment.
  • the analysis engines are a location analysis engine 223, a service analysis engine 224, and a customer analysis engine 225.
  • the system can produce various event depictions, of which the figure shows horizontal, informative, proactive, and reactive event depictions. These event depictions can then be forwarded through the output interface 5.
  • the system of Figure 2 can produce different types of event depictions.
  • In one type of event depiction, the data of the original event are enriched with weighting values and additional data.
  • Another type of event depiction is a completely new event or event depiction, which is created on the basis of the event data obtained.
  • New events or event depictions can be created for the data of each classification, which can be, for example, Customer, Location, and Service.
  • Each new event or event depiction can also have a malfunction class, given on the basis of threshold values, which can be, for example, Informative, Proactive, and Reactive.
  • the malfunction class Informative is intended to inform the maintainer of the system.
  • the malfunction class Proactive is intended for proactive actions in order to prevent an actual malfunction disturbing the service.
  • the malfunction class Reactive is, in turn, intended to depict malfunctions disturbing the service, which demand immediate reaction.
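  • As a minimal sketch of the threshold-based assignment of malfunction classes described above, the following Python function maps a reference value to Informative, Proactive, or Reactive. The limit values 20, 40, and 60 and the names `LIMITS` and `malfunction_class` are illustrative assumptions, not values taken from the embodiments; a reference value below every limit receives no class in this sketch.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical limit values, ordered from the most to the least severe class.
LIMITS: Sequence[Tuple[str, float]] = (
    ("Reactive", 60.0),
    ("Proactive", 40.0),
    ("Informative", 20.0),
)


def malfunction_class(reference_value: float,
                      limits: Sequence[Tuple[str, float]] = LIMITS) -> Optional[str]:
    """Return the most severe class whose limit the reference value reaches."""
    for name, limit in limits:
        if reference_value >= limit:
            return name
    return None  # below every limit: no malfunction class is assigned
```

  • Checking the most severe class first means an event is never labelled Proactive when it already qualifies as Reactive.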
  • Figure 3 shows processes, which can be implemented with the aid of the system of Figure 2 between the input interface 1 and the output interface 5.
  • the actions are described with reference to the figure:
  • 301 Collect and store event - The event depictions are collected and recorded in the system.
  • the event depictions are received as records, which contain data on the event.
  • a check is made as to whether the event depicted in the event depiction is a known event.
  • a severity factor has also been defined for the deviation information contained in the record and this has been weighted with a weighting factor corresponding to the class Location, and thus a weighted class-specific severity value has been obtained for the event depicted by the record, relative to the class Location.
  • the class-specific severity value obtained is added to the counter of the class Location, the value of which can be later compared to the so-called vertical trigger of the class Location.
  • 311 Check Prestored Service parameter - Check whether there is reason to alter the event's weight value based on the present fixed Service base value. This stage corresponds to stage 307.
  • stage 312 Check Service Experience value - Perform actions corresponding to stage 308, in the case of class Service. If exceptionally many events begin to accumulate for some service, its weight value increases.
  • the weighting factor can be increased automatically or manually, as has been described, for example in connection with stage 308.
  • the actual definition of the value can be implemented, for example in stage 314, at least when automatic control is being used.
  • stage 313 Check new information for event - Check for possible additional data.
  • This stage corresponds to stage 309 described above and corresponding variations are also valid for stage 313.
  • additional data can also be, for example, changes of version and corresponding important data relating to changes in service.
  • stage 319 Read summary of all Experience values: Customer, Location, Service - The class-specific values of the class-specific counters are read.
  • Stage 319 thus gives the class-specific counter values of the classes Customer, Location, and Service as the initial values.
  • each class is processed separately. This is done after each event, i.e. when the original event has travelled through the whole weighting chain.
  • stage 309 is performed automatically as a batch run at predefined intervals of time, or always when a predefined number of new events have arrived at the buffer. At this point attention should be paid to the time window in which matters are examined.
  • Events can be monitored at intervals of, for example, a second, a minute, an hour, or a day, so that naturally a different number of values accumulate for examination. If, on the other hand, a specific number of values are collected, the time interval to be examined varies, which must be taken into account. In one embodiment, for example, 4 different values are obtained each time from each dimension, and also from each individual instance:
  • a class-specific event is created from each class, the class-specific vertical-trigger value of which is ascertained as having been exceeded in stage 320.
  • the class-specific event depicts the fact that a malfunction has occurred in the class in question.
  • the time windows in the examination can be of different lengths; the new event is created with the same weightings.
  • the same weighting factor as in the original event is used as the weighting in the case of each weighting (0 - 5).
  • a fixed value, for example 5, is used as the Severity value, so that possible values would be 0, 5, 10, 15, 20, and 25.
  • Severity is a Prestored parameter, which can be adjusted in the range 0 - 5, and must be set separately for each class.
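  • The arithmetic behind the listed values can be illustrated with a short Python sketch: a fixed Severity of 5 multiplied by each permitted weighting (0 - 5) yields 0, 5, 10, 15, 20, and 25. The names `FIXED_SEVERITY` and `class_event_value` are hypothetical and serve only this illustration.

```python
FIXED_SEVERITY = 5  # example fixed Severity value used for a class-specific event


def class_event_value(weight: int, severity: int = FIXED_SEVERITY) -> int:
    """Weighted value of a class-specific event: Severity multiplied by Weight."""
    if not 0 <= weight <= 5:
        raise ValueError("weight must lie in the range 0-5")
    return severity * weight


# One value per permitted weighting factor 0..5.
possible_values = [class_event_value(w) for w in range(6)]
```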
  • 324 Store value - The reference value is recorded in the database.
  • 325 Create Reactive event - A new event is created, in which there is the multiple type Reactive. All the weighting values obtained along the way can be included in the event. This can be an entirely new Location, Service, or Customer-based event, or an original event, which has arrived here through the additions of all the weightings and additional data. Correspondingly, a combination value depicting the joint effect of the event obtained from stage 332 can also be examined. The event created can then be sent, for example, to the operator personnel for further action. Because this concerns a Reactive-class event, the malfunction requires immediate action and the system is intended to trigger a suitable alarm in steps.
  • 326 Bigger than Proactive? - Check whether the reference value obtained from stage 321 is greater than or equal to the threshold value of the malfunction class Proactive. If it is, go to stages 327 and 328. If the reference value is less than the threshold value of the malfunction class Proactive, go to stage 329.
  • stage 327 Store value - The reference value is recorded in the database.
  • Enrich weighted event - A value is created depicting the combined effect of the classes, in which all the values Customer, Location, Service Experience are summed.
  • the number can be scaled on the same scale as the class-specific weighted values described above. According to one embodiment, for example, the values of each class, Severity * Weight (value 0 - 25), are summed and the result is divided by the number of classes, i.e. in this embodiment three.
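  • The combination value described here amounts to the mean of the class-specific weighted values, which keeps it on the same 0 - 25 scale. A minimal Python sketch follows; the function name `combination_value` and the sample values are assumptions made for illustration.

```python
from typing import Mapping


def combination_value(class_values: Mapping[str, float]) -> float:
    """Mean of the class-specific weighted values (each Severity * Weight, 0-25)."""
    if not class_values:
        raise ValueError("at least one class value is required")
    return sum(class_values.values()) / len(class_values)


# Sample values only: (15 + 10 + 20) / 3 classes.
example = combination_value({"Customer": 15, "Location": 10, "Service": 20})
```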
  • the event has thus gone through all the weightings and received a final group of weighted values.
  • possible additional data can be set from stages 309, 313, 317.
  • the combination record can be taken for comparison and classification to stage 323.
  • Other dimensions can also be used in service management.
  • the other dimensions could be, for example, in a technically orientated environment, Server, Database, Switch, etc.
  • the event can be, for example, an individual alarm, measurement variable, ticket, CDR, or call.
  • a malfunction is avoided by weighting events proactively in different dimensions.
  • the processing of an event of an individual customer, location, or service can be influenced by utilizing historical information.
  • New events are created on the basis of the values of the various dimensions.
  • the enrichment of an individual event is described with the aid of an example, referring to Figure 4.
  • the main dimensions are customer, location, and service, and the reaction sensitivity is divided into three time dimensions like the previous case, i.e. the dimensions are Informative, Proactive, and Reactive.
  • An individual event or event record, an alarm, measurement variable, ticket, CDR, or call can act as an impulse for performing the method.
  • the record relating to the event is taken through the system and is weighted with weighting factors of all three dimensions (customer, location, and service).
  • a severity value (Severity), which is defined permanently, i.e. is fixed, and is defined on technical grounds, is also recorded in the system for each malfunction. This fixed value is taken into account when processing the record.
  • the severity classification can contain the desired number of classes and the corresponding severity values. According to the classification, a severity value, for example, malfunction, serious malfunction, malfunction lowering the service level, etc., is thus added to the record corresponding to the event.
  • Weight is a dynamically changeable weighting factor, which can be given a value in the range 0 - 5.
  • dynamically changeable refers to the fact that the customer or service provider can also alter the weighting factor temporarily, in order to achieve the desired sensitivity.
  • a raw event containing a record arrives in the system, and is taken through the system as described in Figure 3.
  • the record is given a weighted class-specific severity value corresponding to the class in question for the event depicted by the record.
  • the class-specific severity value (Customer Experience, Location Experience, Service Experience) can be calculated, for example, by multiplying the severity value (Severity) of the event by the class-specific weighting factor (Customer Weight, Location Weight, Service Weight). The following calculation operations are then performed:
  • the horizontal trigger effectively depicts the magnitude of the detriment caused by an individual event and, in this example, it can be given a value in the range 0 - 75.
  • the horizontal trigger can now be given a severity class, in order to select the necessary further processing and to facilitate evaluation.
  • the horizontal trigger is classified according to the following limit values: If the horizontal trigger is less than 20, the severity class is Informative
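  • The enrichment of a single record can be sketched in Python as follows. The sketch assumes that the horizontal trigger is the sum of the three class-specific severity values, which is consistent with the stated range 0 - 75 (three values of at most 5 * 5 each) but is not spelled out in the text above. Only the limit value 20 for Informative comes from the text; the names `ClassWeights`, `experience`, `horizontal_trigger`, and `is_informative` are hypothetical.

```python
from dataclasses import dataclass

INFORMATIVE_LIMIT = 20  # from the text: below 20 the severity class is Informative


@dataclass(frozen=True)
class ClassWeights:
    customer: int
    location: int
    service: int


def experience(severity: int, weight: int) -> int:
    """Class-specific severity value: Severity multiplied by the class weight."""
    if not (0 <= severity <= 5 and 0 <= weight <= 5):
        raise ValueError("severity and weight must lie in the range 0-5")
    return severity * weight


def horizontal_trigger(severity: int, w: ClassWeights) -> int:
    """Assumed here to be the sum of the three class-specific values (0-75)."""
    return (experience(severity, w.customer)
            + experience(severity, w.location)
            + experience(severity, w.service))


def is_informative(trigger: int) -> bool:
    """True when the horizontal trigger falls below the Informative limit."""
    return trigger < INFORMATIVE_LIMIT
```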
  • Situations for use include, for example, an important public event arranged in a specific area, such as a trade fair, or some other event, such as a collision or official situation, due to which it is wished to monitor the service level in a specific area in greater detail than usual.
  • the Location Weight parameter can then be increased for the duration of the said event in the area in question and thus the operator's system can be made to react more sensitively to malfunction information obtained from the event area. When necessary, this facilitates the rapid allocation of resources to ensure the service level in that area.
  • Another corresponding situation for use can be, for example, customer-specific weighting.
  • One example that can be envisaged is the opening of a shop, when events relating to the customer in question can be emphasized and thus reacted to more quickly for a specific period of time.
  • a service-specific temporary increase in reaction sensitivity can also be made.
  • the reason for this can be, for example, changing a version in terminal devices, or similar.
  • Other reasons to increase the sensitivity of the system can be, for example, difficult weather conditions. No matter what the reason is, the system permits the sensitivity of the system to be adjusted by setting the weighting-factor values as desired.
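  • One way to picture such a temporary increase in sensitivity is a time-limited override on top of a base weighting factor. The Python sketch below is an illustration under assumptions: the `WeightOverride` structure, the `effective_weight` function, and the key "area-1" are hypothetical and do not appear in the embodiments.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Mapping


@dataclass(frozen=True)
class WeightOverride:
    key: str         # class datum, e.g. a location area, customer, or service
    weight: int      # temporary weighting factor in the range 0-5
    start: datetime  # override is valid from this instant (inclusive)
    end: datetime    # ... until this instant (exclusive)


def effective_weight(key: str,
                     base_weights: Mapping[str, int],
                     overrides: Iterable[WeightOverride],
                     now: datetime) -> int:
    """Return the override weight while it is active, else the base weight."""
    for o in overrides:
        if o.key == key and o.start <= now < o.end:
            return o.weight
    return base_weights[key]


base = {"area-1": 2}
fair = [WeightOverride("area-1", 5,
                       datetime(2024, 5, 1, 8, 0), datetime(2024, 5, 1, 18, 0))]
```

  • Once the override period has ended, the base weighting factor applies again without any further action.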
  • the actions described above in connection with Figure 4 can be performed, for example, in the system of Figure 2 and by performing the method shown in Figure 3.
  • the logical structure shown in Figure 4 is implemented in the decision engines 23 shown in Figure 2.
  • the records come to be processed to the structure of Figure 4 through the online analysis module 21.
  • the location analysis engine 223 calculates the class-specific severity value according to the location (Location Experience), the service analysis engine 224 calculates the class-specific severity value according to the service (Service Experience), and the customer analysis engine 225 calculates the class-specific severity value according to the customer (Customer Experience).
  • the system can produce various event depictions and forward them through the output interface 5.
  • Customer Experience Trigger: the sum of all the given Customer Experience values of the records that have gone through the system during the examination period.
  • the Customer Experience Trigger is calculated by customer, i.e. for each customer separately in the case of events concerning this customer.
  • Location Experience Trigger: the sum of all the given Location Experience values of the records that have gone through the system during the examination period.
  • the Location Experience Trigger is calculated by area, i.e. for each area examined separately, on the basis of events concerning this area.
  • Service Experience Trigger: the sum of all the given Service Experience values of the records that have gone through the system during the examination period. The Service Experience Trigger is calculated separately for each service examined with the aid of events concerning this service.
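  • The three vertical triggers can be sketched as per-key running sums. In the Python sketch below, the record layout (a dictionary mapping each class name to a pair of class datum and class-specific severity value) and the sample data are assumptions; the records passed in are taken to be those that went through the system during one examination period.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# A record is modelled as {class name: (class datum, class-specific severity value)}.
Record = Dict[str, Tuple[str, int]]


def vertical_triggers(records: Iterable[Record]) -> Dict[Tuple[str, str], int]:
    """Sum the class-specific values separately for every (class, datum) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in records:
        for class_name, (datum, value) in record.items():
            totals[(class_name, datum)] += value
    return dict(totals)


# Hypothetical records of one examination period.
period_records = [
    {"customer": ("acme", 15), "location": ("area-1", 10), "service": ("data", 20)},
    {"customer": ("beta", 5), "location": ("area-1", 25), "service": ("data", 10)},
]
triggers = vertical_triggers(period_records)
```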
  • the vertical triggers can be dimensioned according to reaction sensitivity, in order to achieve the desired sensitivity and further processing.
  • the triggers and examination time are selected as required by the application.
  • the example of Figure 5 depicts some possible triggers and further actions. For example, when the counter value examined in a one-hour time window exceeds 100 (informative), a notification is sent and when the counter value exceeds 200 (proactive) actions are taken in order to avoid the damage caused by a possible malfunction.
  • Such actions can be, for example, starting a standby system for a customer, or sending standby generators to specific areas.
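  • The one-hour example can be sketched with a counter that forgets values older than the time window and compares the remaining sum with the limits 100 (informative) and 200 (proactive) mentioned above. Treating the window as a sliding one, and the names `WindowCounter` and `action_for`, are assumptions of this sketch; timestamps are plain seconds.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class WindowCounter:
    """Sum of class-specific values received within the last `window` seconds."""

    def __init__(self, window: int = 3600) -> None:
        self.window = window
        self.items: Deque[Tuple[int, int]] = deque()  # (timestamp, value)
        self.total = 0

    def add(self, timestamp: int, value: int) -> int:
        """Record a value, drop expired ones, and return the current sum."""
        self.items.append((timestamp, value))
        self.total += value
        while self.items and self.items[0][0] <= timestamp - self.window:
            _, old = self.items.popleft()
            self.total -= old
        return self.total


def action_for(counter_value: int) -> Optional[str]:
    """Map a counter value to the further action of the example."""
    if counter_value > 200:
        return "proactive"    # act to avoid damage from a possible malfunction
    if counter_value > 100:
        return "informative"  # send a notification
    return None
```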
  • Vertical triggers can also be used, for example, to determine the basic cause of malfunction events and to determine the effect of malfunctions (customer, location, service).
  • Vertical triggers can also be used, for example, to focus notifications.
  • a sum value of a specific size can initiate, for example, a customer notification and, if the sum increases, the starting of the construction of a standby system for the customer or the sending of standby power to a specific area.
  • a malfunction ticket can also be created for an event and, at a second higher value, for example, a technician can be sent directly to the location.
  • the vertical trigger's Location Experience sum begins to increase, because in the same area several base stations are down, i.e. switched off. With the aid of the system it can be decided from this that there is an electrical fault in the area.
  • a malfunction report can be sent to the power company.
  • the functionality of the mobile-telephone network can be used to create an automatic report of power-network malfunctions and the extent of the area concerned. This permits an effective alarm system, which acts rapidly and helps the power company to locate the fault and plan repairs effectively.
  • so-called dynamic triggering is implemented.
  • the system can then be parameterized dynamically as required by the situation in all the main dimensions. Adjustment or setting of the parameters can be implemented by the action of personnel or the system can set parameters automatically according to the program.
  • the parameters are set dynamically and automatically.
  • the system parameters are then initially given their default values and the system corrects the parameters on the basis of the event information received and possibly also on the basis of other information.
  • Such a system is thus a self-learning system.
  • the procedure is that the number of events relating to each class is monitored and the corresponding class-specific weighting factors are altered on the basis of the number observed. This can be done, for example, by collecting number data over defined examination periods and comparing the number data with the numbers for previous examination periods.
  • the examination period can be, for example, 5 minutes, 15 minutes, one hour, or 4 hours.
  • a different examination period can, of course, be freely selected.
  • the examination period can be, for example, one week, or one month, or one minute.
  • the increase of the weighting factors can be done, for example, in such a way that the class-specific weighting factors are increased by one step when it is observed that the number of received events concerning the class in question is, for example 20 % greater than the corresponding number during the previous examination period.
  • the weighting factor is decreased by one step when it is observed that the number of received events concerning the class in question is, for example, 20 % smaller than the corresponding number during the previous examination period.
  • the number of the previous examination period can also be replaced with the sliding mean value of the numbers of, for example, the previous 2 - 10 periods.
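  • The automatic adjustment can be sketched as a one-step rule applied at the end of each examination period. The Python sketch below uses the 20 % example and the 0 - 5 weight range from the text; the function name `adjust_weight`, the use of "at least 20 %" as the comparison, and the clamping to the range are assumptions. Passing several previous counts gives the mean over those periods as the baseline.

```python
from typing import Sequence


def adjust_weight(weight: int,
                  current_count: int,
                  previous_counts: Sequence[int],
                  ratio: float = 0.20,
                  lowest: int = 0,
                  highest: int = 5) -> int:
    """Raise or lower the weighting factor by one step based on event counts."""
    if not previous_counts:
        return weight  # nothing to compare with yet: keep the default value
    # Baseline: the previous period, or the mean of the previous periods.
    baseline = sum(previous_counts) / len(previous_counts)
    if current_count >= baseline * (1 + ratio):
        weight += 1
    elif current_count <= baseline * (1 - ratio):
        weight -= 1
    return max(lowest, min(highest, weight))
```

  • Because the step is limited to one per period, a single burst of random events moves the weighting factor only slightly, while a sustained increase raises it period after period.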
  • the sensitivity of the system can be weighted dynamically, in such a way that problem situations that are arising and growing can be detected more sensitively.
  • the system itself increases its sensitivity in the case of accumulating problems and thus in a way filters individual malfunction notifications due to random factors. For example, if, in a specific area, more malfunction events than previously begin to arise, this can be a sign of the poor condition of the system, of wrong dimensioning, or some other problem, which should be dealt with quickly to avoid breaks in operation and damage.
  • the automatic control of the weighting factor does not, of course, exclude the possibility of giving the weighting factors new values manually.
  • a weighting factor can thus also be increased by the operatives, for example, due to some public event. After this action, the weighting factor once again begins to be controlled dynamically according to the automatic system.
  • the system can also be programmed in such a way that the event numbers of the examination periods are stored in the database and correlated and repeated patterns are retrieved from it. For example, it can be noted that in some location area there is regularly a greater number of events at a certain time of day, for example, from 8 to 9 o'clock.
  • the system can then adjust the weighting factors on the basis of the observed regularities; in the example, it can increase the weighting value by one step before 8 o'clock and reduce it again after 9 o'clock.
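  • The time-of-day example can be sketched as a simple schedule applied to the base weighting factor. The function name `scheduled_weight`, the exact window boundaries, and the upper limit of 5 are assumptions of this sketch; detecting the recurring pattern from stored event numbers is outside its scope.

```python
from datetime import time


def scheduled_weight(base_weight: int,
                     now: time,
                     start: time = time(8, 0),
                     end: time = time(9, 0),
                     step: int = 1,
                     highest: int = 5) -> int:
    """Raise the weight by one step inside the recurring busy window."""
    if start <= now < end:
        return min(highest, base_weight + step)
    return base_weight
```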
  • the weighting factor of a class is raised automatically in response to the number of received events in a class being observed as having increased compared to previous numbers.
  • the weighting factor of the class can also be correspondingly reduced automatically in response to the number of received events in the class being observed as having decreased compared to previous numbers.
  • Each event travelling through the system can thus also alter the weighting factors to be applied to future events. Irrespective of whether the change in the weighting factor is performed automatically or manually or in both ways, the change in the weighting factors helps to set the system's sensitivity to correspond better to a new situation. For example, in the case of trade fairs, the horizontal trigger can be made more sensitive by temporarily emphasizing the customer and location dimensions relating to trade fairs by raising their weighting factors. According to one example of use, it is noticed that the data-transfer service of a mobile network gives malfunction notifications in some area.
  • a method for automatically predicting and classifying malfunction states in a communications network is thus implemented with the aid of a computer system.
  • the technical system being monitored can thus be, for example, a mobile system or a telephone network.
  • the technical system being monitored can also be some such technical system that uses a communications network as an essential component.
  • An example of such a system is a payment-terminal system, which uses a mobile-telephone system for the communications relating to the payment event.
  • records, e.g., event records, which contain data on an event, are received.
  • Other data too are recorded and analysed and, with their aid, weightings and limit values are formed.
  • defined classification data are picked from the records, which contain class data for each class relating to a defined event. After this, the deviation information contained in each identified record is given a severity value and this severity value is weighted with the weighting factors corresponding to the picked class data. In this way, weighted class-specific severity values are formed.
  • the weighting factors can be altered dynamically, and even so as to give a different weighting to consecutive events with the same class data.
  • a group of reference values is calculated and the reference values are compared with corresponding limit values.
  • further actions are decided according to the action classification.
  • the records are received from the network elements of a telephone and/or data communications network.
  • the classification data comprise at least the following classes: customer, location, and service.
  • the class data individuate correspondingly at least the customer related to the event, the location or location area of the event, and the service in question.
  • the reference values contain a so-called horizontal trigger value calculated for each detected record, which is formed on the basis of the weighted class-specific severity values corresponding to the record.
  • the reference values contain so-called vertical triggers calculated for each class datum of a group of first class data, which are formed on the basis of all such detected records, which are received and/or processed during a specific period of time and which contain the class datum in question.
  • the group of class data comprises at least two class data in a first class and at least two class data in a second class differing from the first. In one further embodiment of the previous embodiment, the group of class data comprises at least two class data from each of the following classes: customer, location, and service.
  • the first group of class data is the same as the group of all class data.
  • the vertical trigger is formed on the basis of weighted class-specific severity values corresponding to the class data.
  • the limit values corresponding to the vertical triggers provide the vertical triggers with malfunction classes of from one to three classes. The lowest of these malfunction classes is intended to be informative to the maintainer of the system, the middle malfunction class is intended for proactive actions to prevent the malfunction from damaging the actual service, and the upper malfunction class is intended to depict malfunctions damaging the service, which require an immediate reaction.
  • at least one weighting factor is altered for a specific period of time in order to change the reaction sensitivity of the method for this period of time in the case of the class datum in question.
  • a system and computer software are implemented in order to implement the methods described above.
  • a computer system is implemented for the automatic prediction and classification of malfunction states in a communications network.
  • the system is part of a telephone and/or communications network and is arranged to receive records from network elements of the telephone and/or communications network and to implement prediction and classification in the server system relating to the telephone and/or communications network.
  • the system is arranged to provide a user interface, through which the weighting factors can be set as desired at each moment in time.
  • the user interface can provide selected and desired functionalities not only to the maintainer of the system, but also to a customer, so that the customer can, for example, alter the weighting factors relating to its own services.
  • the system is arranged to provide a user interface, which in turn is arranged to provide class-specific windows to display the class-specific operating state of the communications network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Adaptive service management taking user experience into account is based on event information being received from a communications network, which is classified according to various criteria, and the deviation data contained in which, such as malfunction data, are given severity values. The severity values are weighted using weighting factors corresponding to the class data and, on the basis of their values, a group of reference values is calculated, which are compared to corresponding limit values. On the basis of the comparison, a decision can be made on possible further actions. The invention provides, among other things, an effective way to collect, process, and filter essential malfunction information from even a large group of individual event data. Thus, possible further actions can be focused and dimensioned correctly and repair measures or preventive actions can be started quickly.

Description

FAULT ANTICIPATING SERVICE MONITORING IN A COMMUNICATIONS NETWORK
The invention relates to a method, according to the preamble of Claim 1, for automatically predicting and classifying malfunctions in a communications network, with the aid of a computer system. The invention relates particularly to the automatic prediction of malfunctions in a wireless mobile-telephone network and to classification based on the assumed effect of a malfunction.
The invention also relates to a computer system, according to the preamble of Claim 15, for a corresponding purpose.
Background Art
The background art includes, among other things, various methods and systems for monitoring communications networks.
US 7,539,752 B1 discloses a proactive and predictive system for a communications network.
EP 2 408 143 A1 discloses a proactive resource-management method for virtual networks.
WO 2010/112855 A1 discloses an analysis method particularly for telecommunications networks.
WO 2012/143059 A1 discloses a method for restoring a communications network after several malfunctions.
US 2007/0222576 A1 discloses the dynamic prioritization of malfunction states in a communications network, on the basis of the deterioration of services.
Disclosure of Invention
The invention is intended to create a new technical solution for monitoring a communications network. The invention is based on receiving, from the communications network, event information, which is classified according to various criteria, and in which the deviation data, such as malfunction data, are given severity values. The severity values are weighted with weighting factors corresponding to the class data and, on the basis of these values, a group of reference values is calculated, which are compared to corresponding limit values. On the basis of the comparison, decisions can be made on possible further actions.
More specifically, the method according to the invention is characterized by what is stated in the characterizing portion of Claim 1. The system according to the invention is, for its part, characterized by what is stated in the characterizing portion of Claim 15.
Considerable advantages are gained with the aid of the invention. The invention provides, among other things, an effective way to collect, process, and filter essential malfunction information from even a large number of individual event data. Thus, the further actions can be focused and dimensioned correctly, and repair operations or proactive measures can be started quickly.
In an embodiment, in which the collection and processing of information is done with the aid of the elements of the mobile network and/or the telephone network, the method and system can be implemented very cost-effectively and a very comprehensive geographical coverage can be achieved by means of the system. In the embodiments, the monitoring technical system can thus also contain other technical systems than the elements of the actual communications network. Thus, when monitoring a system external to the telecommunications network, the effective data-processing potentialities and extensive geographical coverage provided by the telecommunications network's systems can be exploited.
In an embodiment, in which the parameters are set dynamically and automatically, the system can be implemented to be self-learning. The system's parameters are then initially given a default value and the system corrects the parameters on the basis of the received event information and possibly also of other information. Such a system is able to weight events on the basis of previously collected information, in such a way that individual random events are filtered out and attention can be concentrated on cumulative problem points.
Brief Description of Drawings
In the following, the invention is examined with the aid of examples and with reference to the accompanying drawings.
Figure 1 shows on a general level the system environment according to one embodiment.
Figure 2 shows in greater detail the information-processing system 2 shown in Figure 1, according to one embodiment.
Figure 3 shows processes according to embodiments.
Figure 4 shows the enrichment of an individual event, according to one embodiment.
Figure 5 shows the calculation of class-specific vertical triggers, according to one embodiment.
Modes for Carrying Out the Invention
The system of Figure 1 comprises a data-processing system 2, which contains, among other things, given parameters 3 and a data store 4. In addition, the system comprises an input interface 1, through which the data to be processed is received from data sources 0, as well as an output interface 5, through which the information enriched during processing is forwarded, e.g., to operation control and monitoring systems, or directly as action commands to operatives.
According to a preferred embodiment, the data-processing system 2 is a data-processing system, such as a mediator system, contained in or relating to a telephone and communications network. Here, the term telephone and communications network refers, for example, to a mobile-telephone network and a landline telephone network. The term a data-processing system contained in or relating to a telephone and communications network refers, for its part, to the fact that the data-processing system also participates in the performance of actions relating to the operation of the telephone and communications network. Such actions are, for example, the collection of event data from the network elements of the telephone and communications network, processing of the event data, such as aggregation, correlation, enrichment, and pricing. Thus, in this embodiment the data-processing system contained in or relating to the telephone and communications network does not refer to a system that has only a communications link to the telephone and communications network but which has no functional connection to the measures performed by and services provided by the telephone and communications network. In such an embodiment, the service management can be very effectively implemented by exploiting the powerful servers used for controlling and monitoring the telephone and communications network. According to embodiments, these servers can, in addition to their basic tasks, thus be programmed to provide value-added services according to the embodiments.
According to the embodiments, the data sources 0 can be, for example, network elements of the telephone and communications network, as well as the terminal devices connected to the telephone and communications network.
Figure 2 shows in greater detail the data-processing system 2 shown in Figure 1, according to one embodiment. The data-processing system 2 of Figure 2 comprises, as one element, an online analysis module 21, which in turn includes a modelling engine 211 and an event analysis engine 212. The data-processing system 2 of Figure 2 also comprises, as one element, a decision engine 23, which contains the parameters 3 and data store 4 already referred to above. In addition, the decision engine 23 comprises a specific analysis engine for the classification data used in each embodiment. In the embodiment of Figure 2, the analysis engines are a location analysis engine 223, a service analysis engine 224, and a customer analysis engine 225. After the processing work carried out by the analysis engines, the system can produce various event depictions, of which the figure shows horizontal, informative, proactive, and reactive event depictions. These event depictions can then be forwarded through the output interface 5.
The system of Figure 2 can produce different types of event depictions. In one event depiction, the data of the original event are enriched with weighting values and additional data. Another type of event depiction is a completely new event or event depiction, which is created on the basis of the event data obtained. New events or event depictions can be created for the data of each classification, which can be, for example, Customer, Location, and Service. Each new event or event depiction can also have a malfunction class, given on the basis of threshold values, which can be, for example, Informative, Proactive, and Reactive. The malfunction class Informative is intended to inform the maintainer of the system. The malfunction class Proactive is intended for proactive actions in order to prevent an actual malfunction disturbing the service. The malfunction class Reactive is, in turn, intended to depict malfunctions disturbing the service, which demand immediate reaction.
Figure 3 shows processes, which can be implemented with the aid of the system of Figure 2 between the input interface 1 and the output interface 5. In the following, the actions are described with reference to the figure:
301 Collect and store event - In the first stage, the event depictions are collected and recorded in the system. The event depictions are received as records, which contain data on the event.
302 Known event? - In the second stage, a check is made as to whether the event depicted in the event depiction is a known event.
303 Parse all values (customer, location, services) - If the event was an event known by the system, the parameters relating to the event are picked out.
304 Values correct or missing? - The record created is examined for possible erroneous or missing data. In this way records are detected that contain defined classification data.
305 Create alert - Records containing sufficient data are taken to further processing.
306 Read event profile - At the start of further processing, the data relating to the event are read. At least the recognized records, which contain at least one deviation datum, which can be interpreted as representing a malfunction, are taken for further processing.
307 Check Prestored Location parameter - Check whether there is reason to change the weighting value of an event based on the preset fixed location base value.
308 Check Location Experience value - Calculate the location experience value, i.e. more specifically form a weighted class-specific severity value for the event depicted in the record, relative to the class Location. If some location begins to accumulate exceptionally many events, its weight value increases and the weighting factor is raised with the aid of automation. The weighting value can also be manually set higher, for example, due to a trade-fair event. Thus, in the method a check is made as to whether there is reason to change the weighting value of the event based on location. A new value can be defined in stage 310 at least when the matter concerns automatic control. Manual setting, on the other hand, can be implemented, for example, by altering the preset fixed location base value as desired.
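The automatic raising and lowering of a weighting factor can be illustrated with a minimal Python sketch. The text does not state how "exceptionally many" events are detected, so the comparison against the previous period's count, the `factor` of 2, the step of 1, and the function name `adjust_weight` are all assumptions made for illustration; only the 0 - 5 weight range comes from the embodiment of Figure 4.

```python
def adjust_weight(weight: int, current_count: int, previous_count: int,
                  factor: float = 2.0, step: int = 1,
                  lowest: int = 0, highest: int = 5) -> int:
    """Raise or lower a class datum's weighting factor from event counts.

    Assumed rule (not from the source): the weight goes up by `step` when the
    current period has more than `factor` times the previous period's events,
    and down by `step` when it has fewer than 1/`factor` of them.
    """
    if current_count > factor * previous_count:
        weight += step
    elif current_count * factor < previous_count:
        weight -= step
    # Keep the weight inside the 0 - 5 range used in the Figure 4 embodiment.
    return max(lowest, min(highest, weight))


# A location that jumps from 10 to 30 events gets a higher weight.
raised = adjust_weight(2, current_count=30, previous_count=10)
# A location that drops from 10 to 4 events gets a lower weight.
lowered = adjust_weight(2, current_count=4, previous_count=10)
```

A manual override, such as the trade-fair case, would simply set the stored weight directly instead of calling this function.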
309 Check new information for event - Check for possible new data. In this stage, a check is made as to whether some new information relating to this location has been set, in addition to the Location weight value, which there would be reason to add to the event's field. The additional information can be, for example, information of a public event taking place in the area. According to the embodiment, this stage can also be combined with stage 307.
310 Add value to Location Experience counter - Once this stage has been reached, the defined classification data, which contain class data for a class relating to each defined event, which in this embodiment are Customer, Location, and Service, have been picked from event information contained in the record. A severity factor has also been defined for the deviation information contained in the record and this has been weighted with a weighting factor corresponding to the class Location, and thus a weighted class-specific severity value has been obtained for the event depicted by the record, relative to the class Location. The class-specific severity value obtained is added to the counter of the class Location, the value of which can be later compared to the so-called vertical trigger of the class Location.
311 Check Prestored Service parameter - Check whether there is reason to alter the event's weight value based on the preset fixed Service base value. This stage corresponds to stage 307.
312 Check Service Experience value - Perform actions corresponding to stage 308, in the case of class Service. If exceptionally many events begin to accumulate for some service, its weight value increases. The weighting factor can be increased automatically or manually, as has been described, for example in connection with stage 308. The actual definition of the value can be implemented, for example in stage 314, at least when automatic control is being used.
313 Check new information for event - Check for possible additional data. This stage corresponds to stage 309 described above and corresponding variations are also valid for stage 313. In terms of services, additional data can also be, for example, changes of version and corresponding important data relating to changes in service.
314 Add value to Service Experience counter - In this stage, actions corresponding to stage 310 are performed relating to the class Service.
315 Check Prestored Customer parameter - This stage corresponds to the stages 307 and 311 described above and can be implemented correspondingly in relation to the class Customer. The aforementioned comments and variations are correspondingly valid.
316 Check Customer Experience value - This stage corresponds to the stages 308 and 312 described above and can be implemented correspondingly in relation to the class Customer. The aforementioned comments and variations are correspondingly valid. An example of a special situation is, for example, a campaign organized by the customer. The possible automatic control of the value is implemented in stage 318 in a corresponding manner to that described above.
317 Check new information for event - This stage corresponds to the stages 309 and 313 described above and can be implemented correspondingly in relation to the class Customer. The aforementioned comments and variations are correspondingly valid. An example of additional information can be, for example, the special markets of a shop.
318 Add value to Customer Experience counter - This stage corresponds to the stages 310 and 314 described above and can be implemented correspondingly in relation to the class Customer. The aforementioned comments and variations are correspondingly valid.
319 Read summary of all Experience values: Customer, Location, Service - In this stage, the class-specific values of the class-specific counters are read. Stage 319 thus gives the class-specific counter values of the classes Customer, Location, and Service as the initial values. Thus each class is processed separately. This is done after each event, i.e. when the original event has travelled through the whole weighting chain. According to a second embodiment, stage 319 is performed automatically as a batch run at predefined intervals of time, or always when a predefined number of new events have arrived at the buffer. At this point attention should be paid to the time window in which matters are examined. Events can be monitored at intervals of, for example, a second, a minute, an hour, or a day, so that naturally a different number of values accumulates for examination in each case. If, on the other hand, a specific number of values are collected, the time interval to be examined varies, which must be taken into account. In one embodiment, for example, 4 different values are obtained each time from each dimension, and also from each individual instance:
Location_Experience_sum_over_1_second
Location_Experience_sum_over_1_minute
Location_Experience_sum_over_1_hour
Location_Experience_sum_over_1_day
Service_Experience_sum_over_1_second
Service_Experience_sum_over_1_minute
Service_Experience_sum_over_1_hour
Service_Experience_sum_over_1_day
Customer_Experience_sum_over_1_second
Customer_Experience_sum_over_1_minute
Customer_Experience_sum_over_1_hour
Customer_Experience_sum_over_1_day
If the application has, for instance, 100 customers, 1000 locations, and 100 services, then a huge number of new values will arise in a day:
Customer_Experience_sum_xx:
- 100 items over_day
- 2400 items over_hour
- 144000 items over_minute
- 8640000 items over_second
Location_Experience_sum_xx:
- 1000 items over_day
- 24000 items over_hour
- 1440000 items over_minute
- 86400000 items over_second
Service_Experience_sum_xx:
- 100 items over_day
- 2400 items over_hour
- 144000 items over_minute
- 8640000 items over_second
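The item counts above follow from multiplying the number of instances in a class by the number of windows of each length in a day. A short Python sketch reproduces the arithmetic; the names `WINDOWS_PER_DAY` and `values_per_day` are illustrative and do not appear in the source.

```python
# Number of examination windows of each length that fit into one day.
WINDOWS_PER_DAY = {
    "day": 1,
    "hour": 24,
    "minute": 24 * 60,
    "second": 24 * 60 * 60,
}


def values_per_day(instances: int) -> dict[str, int]:
    """Count the window sums produced in one day for `instances` class data."""
    return {window: instances * count for window, count in WINDOWS_PER_DAY.items()}


customers = values_per_day(100)    # 100 customers
locations = values_per_day(1000)   # 1000 locations
services = values_per_day(100)     # 100 services

# Total number of new sums per day across all three classes.
total = sum(sum(counts.values()) for counts in (customers, locations, services))
```

With these figures the per-second sums dominate, which suggests why the choice of time window matters for how much data has to be examined.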
320 Threshold exceeded? - In this stage, the class-specific counter values obtained in stage 319 are compared with the corresponding class-specific vertical-trigger values. If the class-specific vertical-trigger values are exceeded, continue to stage 321. In the comparison, of course, the time window being examined is taken into account.
321 Create Location, Service or Customer event - In this stage, a class-specific event is created from each class, the class-specific vertical-trigger value of which is ascertained as having been exceeded in stage 320. The class-specific event depicts the fact that a malfunction has occurred in the class in question. Though the time windows in the examination can be of different lengths, the new event is created with the same weightings. In one embodiment, the same weighting factor as in the original event is used as the weighting in the case of each weighting (0 - 5).
In one embodiment, a fixed value, for example, 5, is used as the Severity value, so that possible values would be 0, 5, 10, 15, 20, and 25.
In another embodiment, Severity is a Prestored parameter, which can be adjusted in the range 0 - 5, and must be set separately for each class.
Of course, these numerical values are only examples. The numerical values can be selected as desired and as required by the application. The embodiment can easily be scaled, for example, to the values 0 - 75, when there will be no need for various limit values in stages 323, 326, and 329.
323 Bigger than Reactive? - In this stage, check whether the reference value obtained from stage 321 is greater than or equal to the threshold value of the malfunction class Reactive. If it is, go to stages 324 and 325. If the reference value is not greater than the threshold value of the malfunction class Reactive, go to stage 326.
324 Store value - The reference value is recorded in the database.
325 Create Reactive event - A new event is created, in which there is the multiple type Reactive. All the weighting values obtained along the way can be included in the event. This can be an entirely new Location, Service, or Customer-based event, or an original event, which has arrived here through the additions of all the weightings and additional data. Correspondingly, a combination value depicting the joint effect of the event obtained from stage 332 can also be examined. The event created can then be sent, for example, to the operator personnel for further action. Because this concerns a Reactive-class event, the malfunction requires immediate action and the system is intended to trigger a suitable alarm in steps.
326 Bigger than Proactive? - In this stage, check whether the reference value obtained from stage 321 is greater than or equal to the threshold value of the malfunction class Proactive. If it is, go to stages 327 and 328. If the reference value is not greater than the threshold value of the malfunction class Proactive, go to stage 329.
327 Store value - The reference value is recorded in the database.
328 Create Proactive event - Create a new event, in which there is the multiple type Proactive. This corresponds to stage 325 and differs only in its malfunction class.
329 Bigger than Informative? - In this stage, check whether the reference value obtained from stage 321 is greater than the threshold value of the malfunction class Informative. If it is, go to stages 330 and 331. If the reference value is not greater than the threshold value of the malfunction class Informative, performance of this branch of the method terminates.
330 Store value - The reference value is recorded in the database.
331 Create Informative event - A new event is created, in which there is the multiple type Informative. This corresponds to the stages 325 and 328. The only difference is the malfunction class.
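The cascade of stages 323, 326, and 329 can be sketched in Python as a chain of comparisons from the most severe class downwards. This is only an illustration: the function name `malfunction_class` and the example threshold values are assumptions, since the text gives no concrete vertical-trigger thresholds. The sketch follows the wording of the stages, using "greater than or equal to" for Reactive and Proactive and "greater than" for Informative.

```python
from typing import Optional


def malfunction_class(reference_value: float, reactive: float,
                      proactive: float, informative: float) -> Optional[str]:
    """Return the malfunction class for a reference value, or None if no
    threshold is reached (the branch of the method then terminates)."""
    if reference_value >= reactive:      # stage 323
        return "Reactive"
    if reference_value >= proactive:     # stage 326
        return "Proactive"
    if reference_value > informative:    # stage 329
        return "Informative"
    return None


# Hypothetical thresholds chosen only to exercise the cascade.
THRESHOLDS = {"reactive": 50, "proactive": 35, "informative": 20}

example = malfunction_class(60, **THRESHOLDS)
```

Checking the highest class first means each reference value produces at most one new event, matching the branch structure of the stages.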
332 Enrich weighted event - A value is created depicting the combined effect of the classes, in which all the values Customer, Location, Service Experience are summed. The number can be scaled on the same scale as the class-specific weighted values described above. According to one embodiment, for example, the values of each class are summed:
Severity * Weight (value 0-25)
and the result is divided by the number of classes, i.e. in this embodiment three. At this point, the event has thus gone through all the weightings and received a final group of weighted values. Thus, at this stage check the combined effect of the classes in the case of the event in question. At the same time, possible additional data can be set from stages 309, 313, 317. After this, the combination record can be taken for comparison and classification to stage 323. Thus, it is possible to examine whether the combined effect is informative, proactive, or reactive and on the basis of this create, if necessary, a new event, which includes weighting and additional data. With the aid of the embodiments, it is possible to process a large number of events and event records.
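As a rough illustration of stage 332, the following Python sketch sums the class-specific Severity * Weight values and divides by the number of classes. The function name `combined_effect` and the example weights are assumptions chosen for illustration; the fixed severity of 5 and the 0 - 5 weight range are taken from the embodiments described above.

```python
def combined_effect(severity: int, weights: dict[str, int]) -> float:
    """Average the class-specific Severity * Weight values over all classes."""
    experiences = [severity * weight for weight in weights.values()]
    return sum(experiences) / len(experiences)


# Hypothetical weights for one event, each in the range 0 - 5.
weights = {"customer": 3, "location": 5, "service": 1}

# (15 + 25 + 5) / 3 classes
combined = combined_effect(5, weights)
```

Dividing by the number of classes keeps the combined value on the same 0 - 25 scale as each class-specific value, so the same limit values can be reused in stage 323.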
In the embodiment of Figure 3, classification takes place according to the following dimensions:
1. Main dimensions: customer, location, and service.
2. Reaction sensitivity is divided into three time dimensions: Informative, Proactive, and Reactive.
Other dimensions can also be used in service management. The other dimensions could be, for example, in a technically orientated environment Server, Database, Switch, etc.
The event can be, for example, an individual alarm, measurement variable, ticket, CDR, or call.
The embodiment provides significant advantages compared to previous known solutions:
1. A malfunction is avoided by weighting events proactively in different dimensions.
2. Real events are emphasized, as the situation is reactive and its effect can be limited.
3. External parameters can be used to affect the processing of an individual customer, location, or service.
4. The processing of an event of an individual customer, location, or service can be influenced by utilizing historical information.
5. New events are created on the basis of the values of the various dimensions.
In the following, the enrichment of an individual event is described with the aid of an example, referring to Figure 4. In this embodiment, the main dimensions are customer, location, and service, and the reaction sensitivity is divided into three time dimensions like the previous case, i.e. the dimensions are Informative, Proactive, and Reactive.
An individual event or event record, an alarm, measurement variable, ticket, CDR, or call can act as an impulse for performing the method. The record relating to the event is taken through the system and is weighted with weighting factors of all three dimensions (customer, location, and service). A severity value (Severity), which is defined permanently, i.e. is fixed, and is defined on technical grounds, is also recorded in the system for each malfunction. This fixed value is taken into account when processing the record. The severity classification can contain the desired number of classes and the corresponding severity values. According to the classification, a severity value, for example, malfunction, serious malfunction, malfunction lowering the service level, etc., is thus added to the record corresponding to the event.
In the embodiment of Figure 4, Weight is a dynamically changeable weighting factor, which can be given a value in the range 0 - 5. Here, the term dynamically changeable refers to the fact that the customer or service provider can also alter the weighting factor temporarily, in order to achieve the desired sensitivity.
According to Figure 4, a raw event containing a record arrives in the system, and is taken through the system as described in Figure 3. In the method stages relating to each class, the record is given a weighted class-specific severity value corresponding to the class in question for the event depicted by the record. The class-specific severity value (Customer Experience, Location Experience, Service Experience) can be calculated, for example, by multiplying the severity value (Severity) of the event by the class-specific weighting factor (Customer Weight, Location Weight, Service Weight). The following calculation operations are then performed:
Customer Weight * Severity = Customer Experience
Location Weight * Severity = Location Experience
Service Weight * Severity = Service Experience
A so-called horizontal trigger (Horizontal Trigger) is calculated for the event depicted by the record, for example, by summing the aforementioned class-specific severity values. The following calculation is then performed:
Customer Experience + Location Experience + Service Experience = Horizontal Trigger
The horizontal trigger effectively depicts the magnitude of the detriment caused by an individual event and, in this example, it can be given a value in the range 0 - 75.
The horizontal trigger can now be given a severity class, in order to select the necessary further processing and to facilitate evaluation. In this example, the horizontal trigger is classified according to the following limit values:
• If the horizontal trigger is less than 20, the severity class is Informative
• If the horizontal trigger is less than 35, the severity class is Proactive
• If the horizontal trigger is at least 50, the severity class is Reactive
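The horizontal-trigger calculation and its classification can be sketched in Python as follows. The function names are illustrative assumptions. The limit values are applied exactly as quoted; because the quoted limits say nothing about values from 35 to 49, the sketch returns `None` for that range rather than guessing a class.

```python
from typing import Optional


def horizontal_trigger(severity: int, customer_weight: int,
                       location_weight: int, service_weight: int) -> int:
    """Sum the three class-specific severity values for a single event."""
    customer_experience = customer_weight * severity
    location_experience = location_weight * severity
    service_experience = service_weight * severity
    return customer_experience + location_experience + service_experience


def severity_class(trigger: int) -> Optional[str]:
    """Classify a horizontal trigger using the limit values quoted in the text."""
    if trigger < 20:
        return "Informative"
    if trigger < 35:
        return "Proactive"
    if trigger >= 50:
        return "Reactive"
    return None  # 35..49 is not covered by the quoted limit values


# Severity fixed at 5 and every weight at its maximum of 5 gives the top of
# the 0 - 75 range.
worst_case = horizontal_trigger(5, 5, 5, 5)
```

Temporarily raising one weight, for example Location Weight during a trade fair, moves the same raw event into a higher class without changing its technical severity.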
Examples of situations for use are, for example, an important public event arranged in a specific area, for example, a trade fair, or some other event, such as a collision or official situation, due to which it is wished to monitor the service level in a specific area in greater detail than usual. The Location Weight parameter can then be increased for the duration of the said event in the area in question and thus the operator's system can be made to react more sensitively to malfunction information obtained from the event area. When necessary, this facilitates the rapid allocation of resources to ensure the service level in that area.
Another corresponding situation for use can be, for example, customer-specific weighting. An example that can be envisaged is, for example, the opening of a shop, when events relating to the customer in question can be emphasized and thus reacted to more quickly for a specific period of time. Correspondingly, a service-specific temporary increase in reaction sensitivity can also be made. The reason for this can be, for example, changing a version in terminal devices, or similar. Other reasons to increase the sensitivity of the system can be, for example, difficult weather conditions. No matter what the reason is, the system permits the sensitivity of the system to be adjusted by setting the weighting-factor values as desired.
The actions described above in connection with Figure 4 can be performed, for example, in the system of Figure 2 and by performing the method shown in Figure 3. The logical structure shown in Figure 4 is implemented in the decision engines 23 shown in Figure 2. The records arrive for processing in the structure of Figure 4 through the online analysis module 21. The location analysis engine 223 calculates the class-specific severity value according to the location (Location Experience), the service analysis engine 224 calculates the class-specific severity value according to the service (Service Experience), and the customer analysis engine 225 calculates the class-specific severity value according to the customer (Customer Experience). After enrichment performed by the analysis engines, the system can produce various event depictions and forward them through the output interface 5.
In the following, the calculation of class-specific vertical triggers on the basis of received events is described with the aid of an example and referring to Figure 5. In other respects, the embodiment of Figure 5 corresponds to the embodiment of Figure 4 described above. The vertical triggers are formed by summing the class-specific severity values given to the events that have been received during a specific examination period. The calculation is done, for example, using the following formulae:
• Customer Experience Trigger = Sum of all the given Customer Experience values of the records that have gone through the system during the examination period. The Customer Experience Trigger is calculated by customer, i.e. for each customer separately in the case of events concerning this customer.
• Location Experience Trigger = Sum of all the given Location Experience values of the records that have gone through the system during the examination period. The Location Experience Trigger is calculated by area, i.e. for each area examined separately, on the basis of events concerning this area.
• Service Experience Trigger = Sum of all the given Service Experience values of the records that have gone through the system during the examination period. The Service Experience Trigger is calculated separately for each service examined with the aid of events concerning this service.
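The three sums above can be sketched in Python as follows. The Record type, its field names, and the use of numeric timestamps are assumptions made only for illustration. The sketch groups the class-specific severity values by customer, by location, and by service for the records that fall inside one examination period.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical enriched event record; the field names are assumptions."""
    timestamp: float      # e.g. seconds since some reference time
    customer: str
    location: str
    service: str
    customer_exp: float   # class-specific severity values of the event
    location_exp: float
    service_exp: float


def vertical_triggers(records, period_start, period_end):
    """Sum class-specific severity values per class datum within one period."""
    sums = {
        "customer": defaultdict(float),
        "location": defaultdict(float),
        "service": defaultdict(float),
    }
    for r in records:
        if period_start <= r.timestamp < period_end:
            sums["customer"][r.customer] += r.customer_exp
            sums["location"][r.location] += r.location_exp
            sums["service"][r.service] += r.service_exp
    return {cls: dict(values) for cls, values in sums.items()}
```

Each entry of the result is one vertical trigger, for example the Location Experience Trigger of a single area.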
The vertical triggers can be dimensioned according to reaction sensitivity, in order to achieve the desired sensitivity and further processing. The triggers and examination time are selected as required by the application. The example of Figure 5 depicts some possible triggers and further actions. For example, when the counter value examined in a one-hour time window exceeds 100 (informative), a notification is sent and when the counter value exceeds 200 (proactive) actions are taken in order to avoid the damage caused by a possible malfunction. Such actions can be, for example, starting a standby system for a customer, or sending standby generators to specific areas.
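The one-hour window and the two example limits (100 and 200) can be sketched as follows. The function names and the representation of events as (timestamp, value) pairs are assumptions. The sketch only names the class that has been reached and does not model the further actions themselves.

```python
WINDOW_SECONDS = 3600  # one-hour examination window, as in the example


def window_counter(events, now, window=WINDOW_SECONDS):
    """Sum the severity values of events whose timestamp lies in (now - window, now]."""
    return sum(value for ts, value in events if now - window < ts <= now)


def further_action(counter, informative_limit=100, proactive_limit=200):
    """Return the class reached by the counter, or None if no limit is exceeded."""
    if counter > proactive_limit:
        return "proactive"
    if counter > informative_limit:
        return "informative"
    return None
```

A counter value of exactly 100 does not trigger a notification in this sketch, because the example speaks of the value exceeding the limit.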
Vertical triggers can also be used, for example, to determine the basic cause of malfunction events and to determine the effect of malfunctions (customer, location, service).
Vertical triggers can also be used to focus notifications. For example, a sum value of a specific size can initiate a customer notification and, if the sum increases, the construction of a standby system for the customer or the sending of standby power to a specific area. At a specific sum value, a malfunction ticket can also be created for an event and, at a second, higher value, a technician can be sent directly to the location.
According to an example of the use of one embodiment, it is noticed that the vertical trigger's Location Experience sum begins to increase because several base stations in the same area are down, i.e. switched off. With the aid of the system, it can be concluded from this that there is an electrical fault in the area. Based on this observation, a malfunction report can be sent to the power company. With the aid of such an embodiment, the functionality of the mobile-telephone network can be used to create an automatic report of power-network malfunctions and the extent of the area concerned. This permits an effective alarm system, which acts rapidly and helps the power company to locate the fault and plan repairs effectively.
According to one embodiment, so-called dynamic triggering is implemented. The system can then be parameterized dynamically as required by the situation in all the main dimensions. Adjustment or setting of the parameters can be implemented by the action of personnel or the system can set parameters automatically according to the program.
According to one embodiment, the parameters are set dynamically and automatically. The system parameters are then initially given their default values and the system corrects the parameters on the basis of the event information received and possibly also on the basis of other information. Such a system is thus a self-learning system.
According to one embodiment, the procedure is that the number of events relating to each class is monitored and the corresponding class-specific weighting factors are altered on the basis of the number observed. This can be done, for example, by collecting number data over defined examination periods and comparing the number data with the numbers for previous examination periods. The examination period can be, for example, 5 minutes, 15 minutes, one hour, or 4 hours. A different examination period can, of course, be freely selected, for example, one week, one month, or one minute.
The adjustment of the weighting factors can be done, for example, in such a way that the class-specific weighting factor is increased by one step when it is observed that the number of received events concerning the class in question is, for example, 20 % greater than the corresponding number during the previous examination period. Correspondingly, the weighting factor is decreased by one step when it is observed that the number of received events concerning the class in question is, for example, 20 % smaller than the corresponding number during the previous examination period. In the comparison, the number of the previous examination period can also be replaced with the sliding mean value of the numbers of, for example, the previous 2 - 10 periods.
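The step rule described above can be sketched as follows. The function name, the step size, the lower bound on the weight, and the handling of a zero baseline are assumptions, because the text does not specify them. Setting window to 1 compares against the previous period only, and a value from 2 to 10 uses the sliding mean of that many previous periods.

```python
from statistics import mean


def adjust_weight(weight, current_count, previous_counts,
                  step=1.0, threshold=0.20, window=1, min_weight=0.0):
    """Raise or lower a class-specific weighting factor by one step.

    previous_counts holds the event counts of earlier examination periods,
    oldest first. The last `window` counts form the baseline.
    """
    if not previous_counts:
        return weight  # nothing to compare against yet
    baseline = mean(previous_counts[-window:])
    if baseline == 0:
        return weight  # assumption: no relative change is defined for an empty baseline
    change = (current_count - baseline) / baseline
    if change >= threshold:
        return weight + step
    if change <= -threshold:
        return max(min_weight, weight - step)  # assumption: weights do not go below min_weight
    return weight
```

With a previous count of 100, a current count of 130 raises the weight by one step, a count of 70 lowers it, and a count of 105 leaves it unchanged.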
By means of such an arrangement, the sensitivity of the system can be weighted dynamically, in such a way that problem situations that are arising and growing can be detected more sensitively. Thus, the system itself increases its sensitivity in the case of accumulating problems and thus in a way filters individual malfunction notifications due to random factors. For example, if, in a specific area, more malfunction events than previously begin to arise, this can be a sign of the poor condition of the system, of wrong dimensioning, or some other problem, which should be dealt with quickly to avoid breaks in operation and damage.
The dynamic and automatic setting of the weighting factors does not, of course, exclude the possibility of giving the weighting factors new values manually. A weighting factor can thus also be increased by the operatives, for example, due to some public event. After this action, the weighting factor once again begins to be controlled dynamically according to the automatic system.
The system can also be programmed in such a way that the event numbers of the examination periods are stored in the database and correlated, and repeated patterns are retrieved from them. For example, it can be noted that in some location area there is regularly a greater number of events at a certain time of day, for example, from 8 to 9 o'clock. The system can then adjust the weighting factors on the basis of the observed regularities, i.e. in this example, increase the weighting value by one step before 8 o'clock and reduce it again after 9 o'clock.
Thus, it is possible to implement a method in which the number of received events is monitored class-specifically and at least one weighting factor is automatically altered class-specifically on the basis of a change observed in the number of received events.
In one embodiment, the weighting factor of a class is raised automatically in response to the number of received events in a class being observed as having increased compared to previous numbers. The weighting factor of the class can also be correspondingly reduced automatically in response to the number of received events in the class being observed to have decreased compared to previous numbers.
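One possible way to look for a regular time-of-day pattern in stored event numbers is sketched below. The criterion used here (an hour counts as regularly busy when its mean count over the stored days exceeds 1.5 times the overall hourly mean) is purely an assumption for illustration, as are the function names. The text does not say how the regularities are detected.

```python
from statistics import mean


def busy_hours(counts_by_day, factor=1.5):
    """Return the hours (0-23) whose mean event count is regularly elevated.

    counts_by_day is a non-empty list of days, each a list of 24 hourly counts.
    """
    hourly_means = [mean(day[h] for day in counts_by_day) for h in range(24)]
    overall = mean(hourly_means)
    return {h for h, m in enumerate(hourly_means) if m > factor * overall}


def scheduled_weight(base_weight, hour, busy, step=1.0):
    """Raise the weighting factor by one step during a regularly busy hour."""
    return base_weight + step if hour in busy else base_weight
```

With stored counts that peak between 8 and 9 o'clock every day, busy_hours returns the set containing hour 8, and scheduled_weight raises the weight for that hour only.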
Each event travelling through the system can thus also alter the weighting factors to be applied to future events. Irrespective of whether the change in the weighting factor is performed automatically or manually or in both ways, the change in the weighting factors helps to set the system's sensitivity to correspond better to a new situation. For example, in the case of trade fairs, the horizontal trigger can be made more sensitive by temporarily emphasizing the customer and location dimensions relating to trade fairs by raising their weighting factors.
According to one example of use, it is noticed that the data-transfer service of a mobile network gives malfunction notifications in some area. It can be deduced from this, for example, that the payment-terminal service in the area in question will slow down, and notification of this can be sent, for example, to taxis operating in the area and, if desired, also to other owners of payment terminals used in the area.
According to one embodiment, a method for automatically predicting and classifying malfunction states in a communications network is thus implemented with the aid of a computer system. The technical system being monitored can thus be, for example, a mobile system or a telephone network. The technical system being monitored can also be a technical system that uses a communications network as an essential component. An example of such a system is a payment-terminal system, which uses a mobile-telephone system for the communications relating to the payment event.
In the method of the embodiment, records (e.g., event records) containing data on an event are received, and those records that contain defined classification data and at least one deviation datum, which can be interpreted as signifying a malfunction, are detected. Other data too are recorded and analysed and, with their aid, weightings and limit values are formed. Further, the defined classification data, which contain class data for each class relating to a defined event, are picked from the detected records. After this, the deviation information contained in each identified record is given a severity value and this severity value is weighted with a weighting factor corresponding to the picked class data. In this way, weighted class-specific severity values are formed.
As has already been stated above, the weighting factors can be altered dynamically, even so as to give a different weighting to consecutive events with the same class data. After this, a group of reference values is calculated on the basis of the weighted class-specific severity values, and the reference values are compared with corresponding limit values. On the basis of the comparisons, further actions are decided according to the action classification.
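The steps of the method can be traced for a single record with the following sketch. The record layout, the severity table, the weighting by multiplication, the default weight of 1.0, and the limit values with their action names are all assumptions made for illustration. Only the horizontal trigger is calculated here as a reference value.

```python
CLASSES = ("customer", "location", "service")


def process_record(record, severity_of, weights, limits):
    """Classify one record; return None if it is not an identified record.

    record      -- dict with class data and, possibly, a "deviation" datum
    severity_of -- hypothetical table mapping a deviation to a severity value
    weights     -- per class, a dict mapping class data to weighting factors
    limits      -- list of (limit value, action) pairs
    """
    # Identify records containing classification data and a deviation datum.
    if "deviation" not in record or any(c not in record for c in CLASSES):
        return None
    severity = severity_of[record["deviation"]]
    # Assumption: weighting is a multiplication, with a default factor of 1.0.
    weighted = {c: severity * weights[c].get(record[c], 1.0) for c in CLASSES}
    # Reference value: the horizontal trigger of this record.
    horizontal = sum(weighted.values())
    # Compare with the limit values; the highest limit reached decides the action.
    action = None
    for limit, name in sorted(limits):
        if horizontal >= limit:
            action = name
    return {"weighted": weighted, "horizontal": horizontal, "action": action}
```

Because the weights are looked up again for every record, a weighting factor altered between two consecutive records with the same class data gives them different weighted values.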
In one embodiment, the records are received from the network elements of a telephone and/or data communications network. In one embodiment, the classification data comprise at least the following classes: customer, location, and service. The class data individuate correspondingly at least the customer related to the event, the location or location area of the event, and the service in question. In one embodiment, the reference values contain a so-called horizontal trigger value calculated for each detected record, which is formed on the basis of the weighted class-specific severity values corresponding to the record.
In one embodiment, the reference values contain so-called vertical triggers calculated for each class datum of a first group of class data, which are formed on the basis of all such detected records, which are received and/or processed during a specific period of time and which contain the class datum in question.
In one further embodiment of the previous embodiment, the group of class data comprises at least two class data in a first class and at least two class data in a second class differing from the first. In one further embodiment of the previous embodiment, the group of class data comprises at least two class data from each of the following classes: customer, location, and service.
In one further embodiment of the previous embodiment, the first group of class data is the same as the group of all class data. In one embodiment, the vertical trigger is formed on the basis of weighted class-specific severity values corresponding to the class data.
In one further embodiment of the previous embodiment, the limit values corresponding to the vertical triggers assign the vertical triggers to one to three malfunction classes. The lowest of these malfunction classes is intended to be informative to the maintainer of the system, the middle malfunction class is intended for proactive actions to prevent the malfunction from damaging the actual service, and the upper malfunction class is intended to depict malfunctions damaging the service, which require an immediate reaction.
In one further embodiment of the previous embodiment, at least one weighting factor is altered for a specific period of time in order to change the reaction sensitivity of the method for this period of time in the case of the class datum in question.
According to one embodiment, a system and computer software are implemented in order to implement the methods described above.
According to one embodiment, a computer system is implemented for the automatic prediction and classification of malfunction states in a communications network.
According to one embodiment, the system is part of a telephone and/or communications network and is arranged to receive records from network elements of the telephone and/or communications network and to implement prediction and classification in the server system relating to the telephone and/or communications network.
According to one embodiment, the system is arranged to provide a user interface, through which the weighting factors can be set as desired at each moment in time. The user interface can provide selected and desired functionalities not only to the maintainer of the system, but also to a customer, so that the customer can, for example, alter the weighting factors relating to its own services.
According to one embodiment, the system is arranged to provide a user interface, which in turn is arranged to provide class-specific windows to display the class-specific operating state of the communications network.
On the basis of the examples described above, it is obvious that, within the scope of the invention, numerous solutions can be implemented that differ from the embodiments described above. Thus, the invention is not intended to be limited to the examples described above, but the patent protection should be examined to the full extent of the accompanying Claims.

Claims

1. A method for automatically predicting and classifying malfunction states in a communication network with the aid of a computer system, characterized by:
- receiving records from the network elements of the communications network, the records containing data on events;
- identifying records containing defined classification data and at least one deviation datum, which can be interpreted as a sign of a malfunction;
- from the identified records, picking out the defined classification data, which contain class data for each defined class relating to the event;
- giving a severity value for the deviation information contained in each identified record and weighting the severity value with a weighting factor corresponding to the class data picked out, to form class-specific severity values;
- calculating a group of reference values on the basis of the weighted class-specific severity values;
- comparing the reference values to corresponding limit values; and
- on the basis of the comparisons, deciding further actions according to an action classification.
2. The method according to Claim 1, characterized in that the communications network is a telephone network.
3. The method according to Claim 1 or 2, characterized in that
- the classification data comprise at least the following classes: customer, location, and service and
- the class data correspondingly individuate at least the customer relating to the event, the location or location area of the event, and the service in question.
4. The method according to any of Claims 1-3, characterized in that the reference values contain a so-called horizontal trigger calculated for each identified record, which is formed on the basis of the weighted class-specific severity values corresponding to the record.
5. The method according to any of Claims 1-4, characterized in that the reference values contain so-called vertical triggers, calculated for each class datum of the first group of class data, which are formed on the basis of all such known records, which have been received and/or processed during a specific period of time and which contain the class datum in question.
6. The method according to Claim 5, characterized in that the first group of class data comprises at least two class data in the first class and at least two class data in the second class differing from the first.
7. The method according to Claim 5 or 6, characterized in that the first group of class data comprises at least two class data from each of the following classes: customer, location, and service.
8. The method according to any of Claims 5-7, characterized in that the first group of class data is the same as the group of all class data.
9. The method according to any of Claims 5-8, characterized in that the vertical trigger is formed on the basis of the weighted class-specific severity values corresponding to the class data.
10. The method according to any of Claims 5-9, characterized in that the limit values corresponding to the vertical triggers give the vertical triggers a malfunction class of from one to three classes, the lowest malfunction class of which is intended to be informative to the maintainer of the system, the middle malfunction class is intended for proactive actions to prevent the malfunction from damaging the actual service, and the upper malfunction class is intended to depict malfunctions damaging to the service, which require an immediate reaction.
11. The method according to any of Claims 1-10, characterized in that at least one weighting factor is altered for a specific period of time, in order to alter the reaction sensitivity of the method for this period of time in the case of the class datum in question.
12. The method according to any of Claims 1-11, characterized in that the number of class-specific received events is monitored and at least one weighting factor is altered class-specifically on the basis of a change observed in the number of received events.
13. The method according to any of Claims 1-12, characterized in that the weighting factor of a class is increased automatically in response to the fact that the number of received events concerning a class is observed to have grown compared to previous numbers.
14. The method according to any of Claims 1-13, characterized in that the weighting factor of a class is reduced automatically in response to the fact that the number of received events concerning a class is observed to have decreased compared to previous numbers.
15. A computer system for automatically predicting and classifying malfunction states in a communication network, characterized in that the system is arranged to:
- receive records from the network elements of the communications network, which contain data on events;
- identify records, which contain defined classification data and at least one deviation datum, which can be interpreted as being a sign of a malfunction;
- pick out of the identified records the defined classification data, which contain class data for each defined class relating to an event;
- give a severity value to the deviation data contained in each identified record and to weight this severity value using a weighting factor corresponding to the picked class data, in order to form weighted class-specific severity values;
- calculate a group of reference values on the basis of the weighted class-specific severity values;
- compare the reference values to corresponding limit values; and
- decide, on the basis of the comparisons, on further actions according to an action classification.
16. The system according to Claim 15, characterized in that it is part of a telephone and/or communications network and is arranged to receive records from the network elements of the telephone and/or communications network and to implement prediction and classification in the server system relating to the telephone and/or communications network.
17. The system according to Claim 15 or 16, characterized in that it is arranged to implement a method that is defined in at least one of Claims 1-14.
18. The system according to any of Claims 15-17, characterized in that it is arranged to provide a user interface, through which the weighting factors can be set as desired at each moment in time.
19. The system according to any of Claims 15-18, characterized in that it is arranged to provide a user interface, which in turn is arranged to provide class-specific windows to display the class-specific operating state of the communications network.
PCT/FI2014/050651 2013-08-27 2014-08-27 Fault anticipating service monitoring in a communications network WO2015028714A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20135865 2013-08-27
FI20135865A FI125573B (en) 2013-08-27 2013-08-27 Adaptive management of services that take into account the disruptive effect

Publications (1)

Publication Number Publication Date
WO2015028714A1 (en) 2015-03-05

Family

ID=51862334

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2014/050651 WO2015028714A1 (en) 2013-08-27 2014-08-27 Fault anticipating service monitoring in a communications network

Country Status (2)

Country Link
FI (1) FI125573B (en)
WO (1) WO2015028714A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020002772A1 (en) * 2018-06-29 2020-01-02 Elisa Oyj Automated network monitoring and control
US11252066B2 (en) 2018-06-29 2022-02-15 Elisa Oyj Automated network monitoring and control
US11329868B2 (en) 2018-06-29 2022-05-10 Elisa Oyj Automated network monitoring and control

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060168170A1 (en) * 2004-10-25 2006-07-27 Korzeniowski Richard W System and method for analyzing information relating to network devices
EP2571195A1 (en) * 2010-05-14 2013-03-20 Telefónica, S.A. Method for calculating perception of the user experience of the quality of monitored integrated telecommunications operator services

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANTONIS M HADJIANTONIS ET AL: "Policy-based self-management of wireless ad hoc networks", INTEGRATED NETWORK MANAGEMENT, 2009. IM '09. IFIP/IEEE INTERNATIONAL SYMPOSIUM ON, IEEE, PISCATAWAY, NJ, USA, 30 June 2009 (2009-06-30), pages 796 - 802, XP031499173, ISBN: 978-1-4244-3486-2 *
ZHU WANG ET AL: "A Quantitative Evaluation Model of Group User Experience", COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION, 2008. PACIIA '08. PACIFIC-ASIA WORKSHOP ON, IEEE, PISCATAWAY, NJ, USA, 19 December 2008 (2008-12-19), pages 918 - 923, XP031409910, ISBN: 978-0-7695-3490-9 *

Also Published As

Publication number Publication date
FI125573B (en) 2015-11-30
FI20135865A (en) 2015-02-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14793599; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14793599; Country of ref document: EP; Kind code of ref document: A1)