AU2018214067B1 - Method and system for detecting anomalies in consumption of data and charging of data services - Google Patents



Publication number
AU2018214067B1
Authority
AU
Australia
Prior art keywords
data
session
volume
gap
parameter
Legal status
Ceased
Application number
AU2018214067A
Inventor
Maheedhar Bose Juvva
Sumudu Prasad Wijetunge
Current Assignee
Cellos Software Ltd
Original Assignee
CELLOS SOFTWARE Ltd
Application filed by CELLOS SOFTWARE Ltd
Priority to AU2018214067A
Publication of AU2018214067B1


Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A system and method for detecting anomalies in the recorded consumption of data volume and the charging of data services in a communication network are described. Data records for each session may be captured from multiple sources. The data records may comprise parameters indicating the usage volume pertaining to the services being consumed in each session. Further, the data records may be aggregated and reconciled to detect a volume gap in each session. Each session may be categorized into a session category based upon the detection of the volume gap. The data records may further be enriched by tagging each data record with its session category. The enriched data records may then be aggregated across the parameters. Finally, a root-cause parameter for the volume gap pertaining to each session may be identified by computing a total volume, a total volume gap and a probability of gap root-cause for each parameter using the aggregated data records. [To be published with Figure 1]

Description

METHOD AND SYSTEM FOR DETECTING ANOMALIES IN CONSUMPTION OF DATA AND CHARGING OF DATA SERVICES
TECHNICAL FIELD
The present disclosure relates, in general, to monitoring and analyzing large volumes of data packets from high-speed traffic flows in a data network, and more particularly to a method and system for detecting anomalies in the data volume captured pertaining to the consumption of services.
BACKGROUND
The subject matter discussed in the background section should not be assumed to be prior art merely because of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.
Telecom operators have a complex order-to-cash value chain cutting across multiple systems and processes. Telecommunications is one of the most highly digitized industries, with networks generating enormous volumes of data from multiple complex systems within a few minutes. Independent and comprehensive analysis of all this data is indispensable to ensure that all usage on the network is reported accurately in the charging systems. Such a comprehensive analysis of terabytes of data is possible neither through human surveillance nor with conventional computing mechanisms.
2018214067 09 Aug 2018
A typical telecom operation consists of a long and complex chain of interrelated operations that work together to deliver telecommunication services to customers, track the services delivered, and bill the customers for them. As the set of technologies and business processes grows bigger and more complex, the chance of failure increases at each of its connections. Staying competitive in the telecommunications industry requires delivering quality voice and data services, responding rapidly to market demands, and maximizing revenues without affecting the underlying network. To achieve these goals, however, a company must integrate large volumes of data in multiple formats from a wide range of systems, all while juggling technological changes such as 3G expansion and 4G/LTE network rollout and consolidation.
Revenue leakage, or data leakage, is a major challenge faced by telecom operators today. Revenue leakage caused by data discrepancies typically occurs when a telecom operator, for any of several reasons, is unable to bill correctly for a given service or to receive the correct payment. As the network grows, the probability of such leakages only increases. Although a few systems have been proposed for monitoring data and revenue leakage using different algorithms, such systems either work only in theory or require adding load to various network elements, which no network operator desires. Moreover, it is very difficult for a network administrator to trace a fraudulent user among the many users of the telecom network, or a faulty network node or policy, that may lead to data leakage.
Therefore, there is a long-standing need for a system and method that automatically capture subscriber usage data from the network independently, without interfering with the operator’s core network, and reconcile such data with the data recorded by various network nodes and charging systems to find the revenue gaps and their possible root causes, which will help in measuring and minimizing the leakage.
SUMMARY
This summary is provided to introduce concepts related to a method and system for detecting anomalies in recorded consumption of data in a communication network, and these concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In an example embodiment, a method for detecting anomalies in recorded consumption of data in a communication network is described. The method may comprise extracting, by a processor, a plurality of data records from a plurality of data sources for each session in the communication network. The plurality of data sources may include at least a data extraction platform communicatively coupled to one or more controlling interfaces of the communication network for extracting one or more of the plurality of data records. The plurality of data records may comprise one or more parameters indicating usage volume pertaining to one or more services being consumed in each session. The method may further comprise aggregating, by the processor, the usage volume captured in the plurality of data records in order to obtain an aggregate usage volume corresponding to each data source for each session. The method may further comprise reconciling, by the processor, the aggregate usage volumes corresponding to each data source for each session in order to determine either the presence or absence of a volume gap in each session. The method may further comprise categorizing, by the processor, each session into a session category based upon the determination of either the presence or absence of the volume gap. The method may further comprise tagging, by the processor, the data records of a session with the session category corresponding to the session in order to obtain enriched data records for each session. The method may further comprise aggregating, by the processor, the enriched data records from the multiple data sources across the multiple parameters for each session. The method may further comprise computing, by the processor, a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter, wherein the total volume and the probability of gap root-cause are computed for each parameter based upon the enriched data records aggregated for each session. The method may further comprise identifying at least one parameter amongst the multiple parameters as a root-cause parameter for the volume gap for each session based upon the probability of gap root-cause computed corresponding to each parameter of the multiple parameters.
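The final steps of the method (categorizing sessions, tagging records, aggregating across parameters, computing a total volume and a probability of gap root-cause, and identifying the root-cause parameter) can be illustrated with a minimal Python sketch. The disclosure does not give a formula for the probability of gap root-cause; this sketch assumes that each record carries a share of its session's reconciled gap in proportion to its volume, and that the probability for a parameter value is its total gap divided by its total volume. The function name `rank_root_causes`, the tuple layout and the `GAP`/`NO_GAP` labels are hypothetical.

```python
from collections import defaultdict


def rank_root_causes(probe_records, session_gap):
    """Rank (parameter, value) pairs by probability of gap root-cause.

    probe_records: iterable of (session_id, volume, params) tuples, where
        params maps parameter names to values, e.g. {"APN": "internet"}.
    session_gap: dict mapping session_id -> reconciled volume gap.

    Assumption (not from the patent): each record carries a share of its
    session's gap proportional to its volume, and the probability of gap
    root-cause is total gap / total volume for each parameter value.
    """
    records = list(probe_records)
    session_volume = defaultdict(int)
    for session_id, volume, _ in records:
        session_volume[session_id] += volume

    # Categorize each session and tag its records with that category.
    enriched = [
        (sid, vol, params, "GAP" if session_gap.get(sid, 0) > 0 else "NO_GAP")
        for sid, vol, params in records
    ]

    # Aggregate the enriched records across every (parameter, value) pair.
    total = defaultdict(float)
    total_gap = defaultdict(float)
    for sid, vol, params, category in enriched:
        if category == "GAP" and session_volume[sid]:
            share = session_gap[sid] / session_volume[sid]
        else:
            share = 0.0
        for key in params.items():
            total[key] += vol
            total_gap[key] += vol * share

    # Highest probability first; the top entry is the candidate root cause.
    ranking = [
        (key, total[key], total_gap[key],
         total_gap[key] / total[key] if total[key] else 0.0)
        for key in total
    ]
    ranking.sort(key=lambda row: row[3], reverse=True)
    return ranking
```

For example, with one clean session (APN `internet`) and one session losing 50 of 200 bytes (APN `ims`), both on LTE, the value `("APN", "ims")` ranks first with probability 0.25, while `("RAT", "LTE")`, shared by both sessions, follows at about 0.17. A parameter value that appears in every session is thus diluted, which is one reason a per-value ratio may be preferable to a raw gap total.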
In another implementation, a system for detecting anomalies in recorded consumption of data in a communication network is described. The system may comprise a processor and a memory coupled with the processor. The processor may execute a plurality of modules stored in the memory. The plurality of modules may comprise a data capturing module for capturing a plurality of data records from a plurality of data sources for each session in a communication network. The plurality of data sources may include at least a data extraction platform communicatively coupled to one or more controlling interfaces of the communication network. The plurality of data sources may be configured for extracting one or more of the plurality of data records, wherein the plurality of data records comprises one or more parameters indicating usage volume pertaining to one or more services being consumed in each session. The plurality of modules may further comprise a data aggregation and reconciliation module for aggregating the usage volume captured in the plurality of data records in order to obtain an aggregate usage volume corresponding to each data source for each session. The data aggregation and reconciliation module may further reconcile the aggregate usage volumes corresponding to each data source in order to determine either the presence or absence of a volume gap for each session. The plurality of modules may further comprise a data enrichment module for categorizing each session into a predefined session category based upon the determination of either the presence or absence of the volume gap. The data enrichment module may further tag the data records of a session with the predefined session category corresponding to the session in order to obtain enriched data records for each session. The plurality of modules may further comprise a data analytics module for aggregating the enriched data records from the multiple data sources across the multiple parameters for each session. The data analytics module may further compute a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter, wherein the total volume and the probability of gap root-cause are computed for each parameter based upon the enriched data records aggregated for each session. The data analytics module may further identify at least one parameter amongst the multiple parameters as a root-cause parameter for the volume gap for each session based upon the probability of gap root-cause computed corresponding to each parameter of the multiple parameters.
In yet another implementation, a non-transitory computer readable medium storing a program for detecting anomalies in recorded consumption of data in a communication network is described. The program may comprise instructions for extracting a plurality of data records from a plurality of data sources for each session in the communication network, wherein the plurality of data sources include at least a data extraction platform communicatively coupled to one or more controlling interfaces of the communication network for extracting one or more of the plurality of data records, and wherein the plurality of data records comprises one or more parameters indicating usage volume pertaining to one or more services being consumed in each session. The program may further comprise instructions for aggregating the usage volume captured in the plurality of data records in order to obtain an aggregate usage volume corresponding to each data source for each session. The program may further comprise instructions for reconciling the aggregate usage volumes corresponding to each data source in order to determine either the presence or absence of a volume gap for each session. The program may further comprise instructions for categorizing each session into a session category based upon the determination of either the presence or absence of the volume gap. The program may further comprise instructions for tagging the data records of a session with the session category corresponding to the session in order to obtain enriched data records for each session. The program may further comprise instructions for aggregating the enriched data records from the multiple data sources across the multiple parameters for each session. The program may further comprise instructions for computing a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter, wherein the total volume and the probability of gap root-cause are computed for each parameter based upon the enriched data records aggregated for each session. The program may further comprise instructions for identifying at least one parameter amongst the multiple parameters as a root-cause parameter for the volume gap for each session based upon the probability of gap root-cause computed corresponding to each parameter of the multiple parameters.
BRIEF DESCRIPTION OF DRAWINGS
The detailed description is provided with reference to the accompanying Figures. In the Figures, the left-most digit(s) of a reference number identifies the Figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Figure 1 illustrates a network implementation of system 100 for detecting anomalies in recorded consumption of data in a communication network 107, in accordance with an example embodiment of the present disclosure.
Figure 2 illustrates an analytical platform 109 and its components collectively enabling the detection of the anomalies in recorded consumption of data in a communication network, in accordance with an embodiment of the present disclosure.
Figure 3 illustrates a method 300 depicting steps performed by the analytical platform 109 for detecting the anomalies in recorded consumption of data in a communication network, in accordance with an embodiment of the present disclosure.
Figure 4 illustrates a flow diagram 400 depicting various data processing and data analytics stages implemented by the analytical platform 109, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
Method(s) and system(s) for detecting anomalies in recorded consumption of data in a communication network are described. An automated method and system are disclosed that capture subscriber usage data from the network independently and further reconcile the captured data with the usage data recorded by the operator’s network and charging systems in order to determine the data leakage, the revenue gaps, and the probable root causes of the gaps in the usage data, thereby enabling telecom operators to measure and minimize usage gaps.
The system may enable the telecom operator to capture a plurality of data records from multiple data sources for each session in a communication network. The multiple data sources may comprise one or more data extraction platforms (DEP) or telecom probes, one or more network nodes and one or more charging nodes. In one embodiment, the plurality of data records may comprise a plurality of call detail records (CDRs) that are extracted from one or more network nodes and one or more charging nodes. Further, the plurality of data records may comprise a plurality of usage detail records (UDRs) extracted by the one or more data extraction platforms. Hereinafter, the call detail records (CDRs) and the usage detail records (UDRs) will be in general referred to as “data records”. The data records may comprise parameters indicating usage volume consumed for each session along with various parameters for identifying a particular user’s data session.
In an embodiment, the data extraction platform may be a monitoring probe device communicatively coupled to one or more controlling interfaces of the communication network. In an aspect, the monitoring probe may be a probe as described in US Patent No. 9306818, assigned to the current assignee, which is incorporated by reference herein in its entirety. The data extraction platform is referred to hereinafter as a “probe” or a “telecom probe” interchangeably in the present disclosure. The probe may extract traffic flows from different controlling interfaces, including LTE interfaces such as the S11, S1-U, S1-MME, S3, S6a or S10 interfaces, over 10 Gbps optical links. The LTE interfaces are defined in the Third Generation Partnership Project (3GPP) Technical Specifications.
The captured data records are then aggregated for each data source and each session. The system may reconcile the aggregate usage volumes corresponding to each data source in order to determine either the presence or absence of a volume gap for each session. The aggregation and reconciliation process may enable the system to analyze and compute a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter.
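The aggregation and reconciliation described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: it assumes the telecom probe is the reference source, treats a source that has no record for a session as having recorded zero volume, and introduces a hypothetical relative tolerance `rel_tolerance` so that small counter differences are not flagged as gaps. The function name and the record tuple layout are also hypothetical.

```python
from collections import defaultdict


def reconcile_sessions(records, reference="probe", rel_tolerance=0.0):
    """Aggregate volumes per session and source, then flag volume gaps.

    records: iterable of (session_id, source, volume) tuples.
    Assumptions (not from the patent): `reference` (the probe) is trusted,
    the gap is the largest shortfall of any other source against it, and a
    session is a GAP only if that shortfall exceeds rel_tolerance * reference.
    """
    totals = defaultdict(lambda: defaultdict(int))  # session -> source -> volume
    for session_id, source, volume in records:
        totals[session_id][source] += volume
    all_sources = {src for per_source in totals.values() for src in per_source}

    result = {}
    for session_id, per_source in totals.items():
        ref = per_source.get(reference, 0)
        # A source with no record for this session counts as zero volume.
        shortfalls = [ref - per_source.get(src, 0)
                      for src in all_sources if src != reference]
        gap = max(max(shortfalls, default=0), 0)
        category = "GAP" if gap > rel_tolerance * ref else "NO_GAP"
        result[session_id] = {"volumes": dict(per_source), "gap": gap,
                              "category": category}
    return result
```

With `rel_tolerance=0.01`, a 5-byte shortfall on a 1000-byte session is categorized as `NO_GAP`, whereas a session whose charging record is missing entirely is categorized as `GAP` with the full probe volume as its gap.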
While aspects of the described system and method for detecting anomalies in recorded consumption of data in a communication network may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system. Moreover, although the present disclosure describes the method and system in a 3G network, it is to be understood that a similar approach is valid for other networks, such as 2G, 4G, 5G, and the like, without deviating from the essential steps described herein.
Referring to Figure 1, a network implementation of the system 100 for detecting anomalies in recorded consumption of data in a communication network 107 is illustrated, in accordance with an embodiment of the present subject matter. In one embodiment, the system may comprise a user device 101, which belongs to the client/customer and is the point of usage of data provided by the telecom operator. The user device 101 may be connected to the communication network 107 via a radio access network 102. Further, the system 100 may comprise data source nodes comprising a Serving GPRS Support Node 103, a Gateway GPRS Support Node 104, a charging node 105, a policy control rule function/node 106, and one or more data extraction platforms or DEPs (108a, 108b, 108c, and 108d). Further, the system 100 may comprise an analytical platform 109 for data extraction, aggregation, reconciliation and detection of the gap root cause, thereby generating one or more revenue assurance reports 110.
In an embodiment, the analytical platform 109 may extract a plurality of data records from a plurality of sources including the one or more data extraction platforms (108a, 108b, 108c, and 108d) (hereinafter referred to as a “probe 108”, “telecom probe 108” or “telecom probe node 108” interchangeably), the network nodes (103, 104) and the charging node 105. The telecom probe node 108 may be configured for extracting traffic from telecom network interfaces comprising Gn/Gp/S11/S1-U/S4/S5/S8/Gx/Gy/Gi for 2.5G-4G traffic. Similarly, multiple such data records are captured from the other data sources described above for each session in the communication network 107. The telecom operator(s) may manage multiple network and business machines to provide services and to charge customers based upon their consumption of the services. Furthermore, for charging data services, the telecom operators may employ additional systems, such as a Policy and Charging Enforcement Function (PCEF) to enforce policies and a Policy and Charging Rules Function (PCRF) to apply rules for specific policies. The data recorded in these systems is critical to assure and audit revenues.
The system may be capable of supporting multiple traffic interfaces between the multiple nodes in a 3G network. The multiple traffic interfaces include:
Gn - The interface between two GSNs (SGSN and GGSN) within the same public land mobile network (PLMN) in a GPRS/UMTS network.
Gp - The interface between two GSNs (roaming SGSN and GGSN) in different PLMNs. GTP is a protocol defined on the Gn/Gp interface. The GGSN is a network gateway that provides the network’s view of the usage. The GGSN provides interworking between the GPRS network and external packet-switched networks, such as the Internet and X.25 networks.
Gx - The online policy interface between the GGSN and the charging rules function (CRF). The Gx interface may be used for provisioning service data flows based on charging rules and uses the Diameter protocol.
Gy - The interface between the Online Charging System (OCS) and the PCEF/GGSN/PGW (Packet Data Network Gateway). The Gy interface allows online credit control for service-data-flow-based charging.
Gi - The IP-based interface between the GGSN and a public data network (PDN), either directly to the Internet or through a WAP gateway.
In an embodiment, the data records extracted from the telecom probe and the other sources of the operator’s systems may be converted to a predefined format by the analytical platform 109. The analytical platform 109 may further implement the processes of aggregation, reconciliation and gap root-cause analysis to generate revenue assurance reports, as explained in detail below.
Although the present subject matter is explained considering that the analytical platform 109 is implemented on a server, it may be understood that the analytical platform 109 may also be implemented in a variety of computing systems, such as a distributed system, a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server or a network server, and using a variety of database software, such as RDBMS (for example, Oracle or Postgres) and distributed file systems (for example, MapR). Examples of the user devices 101 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
In one implementation, the network may be a wireless network, a wired network or a combination thereof. The network can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
Referring now to Figure 2, the analytical platform 109 is illustrated in accordance with an embodiment of the present subject matter. In one embodiment, the analytical platform 109 may include at least one processor 201, an input/output (I/O) interface 202, and a memory 203. The at least one processor 201 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 201 is configured to fetch and execute computer-readable instructions stored in the memory 203.
The I/O interface 202 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 202 may allow the analytical platform 109 to interact with a network administrator or a revenue assurance analyst using one or more devices such as a laptop computer, personal computer, smartphone, and the like. Further, the I/O interface 202 may enable the analytical platform 109 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 202 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 202 may include one or more ports for connecting several devices to one another or to another server.
The memory 203 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 203 may include modules 204 and data 205.
The modules 204 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 204 may include a data capturing module 206, a data aggregation and reconciliation module 207, a data enrichment module 208, a data analytics module 209 and other modules. The other modules may include programs or coded instructions that supplement applications and functions of the analytical platform 109.
The data 205, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 204. The data 205 may include a data repository 210 and other data 211. The data repository 210 may include data records captured from multiple data sources for each session in a communication network. The other data 211 may include data generated as a result of the execution of one or more of the other modules. The telecom operator may manage multiple network and business machines to provide services and to charge the customers. The detailed functionality of the modules 204 is further described below with reference to Figures 2, 3 and 4.
DATA CAPTURING MODULE 206
In an embodiment, the data capturing module 206 may enable the analytical platform 109 to capture the plurality of data records from multiple data sources for each session in a communication network. The data capturing module 206 may extract the information for the plurality of usage detail records by probing various interfaces, such as the common interface (e.g., the S1-U interface) and the Gn interface. The data captured from the telecom probe is the source of data that may be compared with the data captured from other sources to identify session volume gaps. The analytical platform 109 may extract such data from multiple interfaces over 1 Gb/s copper and 10 Gb/s optical links. In accordance with embodiments of the present disclosure, the data records comprise parameters indicating usage volume pertaining to one or more services being consumed in each session. The multiple data sources may comprise one or more telecom probe nodes, one or more network nodes and one or more charging nodes. The charging node may belong to a telecom operator’s charging system. The charging node may be a signal controlling node and further may be used in a communication network (such as a 3G or 4G mobile communication system) that can receive, process, track, and rate (assign a value to) packet data service and content usage requests. The charging node may further identify and communicate with a Session Description Protocol (SDP), perform session control, identify the correct device and account numbers (number portability), and create call detail records. In an example embodiment, the data records may comprise a plurality of parameters including, but not limited to:
• Access Point Name (APN): Name of a gateway between a GSM, GPRS, 3G or 4G mobile network and another computer network, configured in the mobile handset.
• Domain name
• Proxy IP address: IP address of the intermediate server redirecting traffic from the end user
• Protocol: Protocol being used by the user, e.g. SIP
• Sub-protocol: Specific sub-protocol being used by the user, e.g. ichat, SSL
• Application: End-user application being accessed by the user, e.g. Facetime
• RAT type: Access technology (2.5G, UMTS, LTE, etc.) being used for data services; can change during a session
• TAC: Initial eight-digit portion of the 15-digit IMEI, identifying the handset model
• Network node: Specific site or network node in the operator's core packet network involved in the session
• VPLMN: PLMN on which the mobile subscriber has roamed after leaving their HPLMN (Home Public Land Mobile Network)
• Cell ID: Granular location identifier of mobile subscribers within a radio tower
• Time Window: Time hour or slab (peak, off-peak) related to the subscriber's data session
• Rating Product: Categorization of the service being used by the subscriber into products defined by the marketing team
• Destination IP address: Shortened domain name being accessed by the user, e.g. google.com, google.co.in
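One way to hold such a record in an analytics pipeline is a Python dataclass whose parameter fields are optional, since not every source populates every parameter. The class name `UsageRecord`, its field names and the `parameters` helper are hypothetical; only the parameter list itself and the TAC definition (the first eight digits of the 15-digit IMEI) come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


def tac_from_imei(imei: str) -> str:
    """Return the Type Allocation Code: the first 8 digits of a 15-digit IMEI."""
    if len(imei) != 15 or not imei.isdigit():
        raise ValueError("IMEI must be 15 digits")
    return imei[:8]


@dataclass
class UsageRecord:
    """Hypothetical layout for one UDR/CDR; parameter fields are optional."""
    session_id: str
    source: str                          # e.g. "probe", "network", "charging"
    volume_bytes: int
    apn: Optional[str] = None
    domain_name: Optional[str] = None
    proxy_ip: Optional[str] = None
    protocol: Optional[str] = None       # e.g. "SIP"
    sub_protocol: Optional[str] = None   # e.g. "SSL"
    application: Optional[str] = None
    rat_type: Optional[str] = None       # e.g. "UMTS", "LTE"; may change mid-session
    tac: Optional[str] = None
    network_node: Optional[str] = None
    vplmn: Optional[str] = None          # populated only when roaming
    cell_id: Optional[str] = None
    time_window: Optional[str] = None    # e.g. "peak", "off-peak"
    rating_product: Optional[str] = None
    destination: Optional[str] = None    # e.g. "google.com"

    def parameters(self) -> dict:
        """Return only the parameters this source actually populated."""
        skip = {"session_id", "source", "volume_bytes"}
        return {k: v for k, v in vars(self).items()
                if k not in skip and v is not None}
```

Keeping unpopulated parameters as `None` rather than empty strings makes it straightforward to tell which parameters a given source captures at all.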
Referring now to Figure 3, block 1 illustrates the data records captured from sources comprising the telecom probe node, the network node and the charging node. It is to be noted that the data records herein indicate multiple data records captured by multiple sources for multiple sessions. As illustrated in Figure 3, block 1 indicates the data records captured pertaining to two session IDs, viz. Session ID-1 and Session ID-2. The multiple data records captured may be stored in the data repository 210. The data repository 210 may comprise ‘n’ data records captured from multiple sources corresponding to n session IDs.
Figure 4 is a flow diagram 400 illustrating a data ingestion stage 401, a data normalization stage 402, a data aggregation stage 403, a data reconciliation stage 404 and a reporting stage 405 implemented by the analytical platform 109. The data extraction steps described above are performed in the data ingestion stage 401, as explained in detail below.
As shown in Figure 4, in the data ingestion stage 401, the analytical platform 109 may extract data records from the telecom probe node, the network node and the charging node. In one embodiment, the data capturing module 206 of the analytical platform 109 may extract information about data sessions, including MSISDN, Charging ID, UE IP, and volume, by probing the controlling interface (e.g. the S1-U interface for LTE) and the Gn interface. Furthermore, only the columns containing data relevant for further processing and analysis by the analytical platform 109 may be ingested into the data repository 210 from the telecom probe node (Data Extraction Platform), the charging node 105, and the network nodes (103, 104). In an embodiment, the data records are obtained from the operator’s systems, including the network nodes (103, 104) and Charging (CHG), for the same period during which the Data Extraction Platform (or the telecom probe) extracted the S1-U and Gn information. The data extracted from the telecom probe may be stored as usage detail records (UDRs), whereas the data records extracted from the network nodes (103, 104) and the charging node 105 are stored as call detail records (CDRs) in the data repository 210. In some examples, UDRs related to data consumed for email-related services, Hypertext Transfer Protocol (HTTP) related services, Session Initiation Protocol (SIP) related services, one or more web applications, and the like may be created in the data repository 210.
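The column projection and period alignment of the ingestion stage might look like the following Python sketch. The CSV input, the column names in `RELEVANT_COLUMNS` and the `ingest` function are hypothetical; the text names MSISDN, Charging ID, UE IP and volume as examples of ingested fields but does not specify a file format or schema.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the source only lists MSISDN, Charging ID,
# UE IP and volume as examples of relevant fields.
RELEVANT_COLUMNS = ("msisdn", "charging_id", "ue_ip", "start_time", "volume")


def ingest(csv_text, source, window_start, window_end):
    """Keep only relevant columns from rows inside [window_start, window_end).

    Probe output is labelled as a UDR; network and charging output as a CDR.
    """
    record_type = "UDR" if source == "probe" else "CDR"
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.fromisoformat(row["start_time"])
        if not (window_start <= start < window_end):
            continue  # outside the period the probe was capturing
        record = {col: row[col] for col in RELEVANT_COLUMNS}
        record["volume"] = int(record["volume"])
        record["record_type"] = record_type
        record["source"] = source
        kept.append(record)
    return kept
```

Filtering every source to the same window keeps the later reconciliation from reporting gaps that are merely caused by one source covering a longer period than another.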
Referring to Figure 4, in the data normalization stage 402, the analytical platform 109 may normalize the UDRs and CDRs using an extract, transform and load (ETL) framework. The data records extracted from the multiple sources are transferred to the analytical platform 109 in diverse formats and with different meanings. In some examples, the data may be in machine-readable formats such as ASN.1, binary and the like. Correlating and analyzing the data sources requires a common (ASCII) format and common definitions, and the transformation and loading functions of the ETL framework cater to this requirement.
The first step in the ETL framework is to use transformation functions to convert machine-readable data into a human-readable format. Decoder programs for the specific encoding syntax of each source may be employed to convert the data into ASCII format. Once the data is in ASCII format, transformation and enrichment rules may be applied to convert it into a common data definition. In one example, all timestamps may be normalized to the operator’s time zone in a predefined format for comparison. The data thus extracted and transformed into the common definition may then be loaded into an analytical data store (e.g. the data repository 210) for analysis.
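The following is an illustration only of the timestamp normalization rule described above. It is a minimal Python sketch that assumes the records have already been decoded to ASCII. The field names (`start_time`, `source_offset_min`), the raw `YYYYMMDDhhmmss` layout, the UTC+05:30 operator zone and the output format are all hypothetical choices. The decoder step itself is not shown.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical operator time zone (UTC+05:30) and common output format; both are assumptions.
OPERATOR_TZ = timezone(timedelta(hours=5, minutes=30))
COMMON_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_timestamp(raw: str, source_offset_min: int, raw_format: str = "%Y%m%d%H%M%S") -> str:
    """Interpret a source-local timestamp and re-express it in the operator's time zone."""
    source_tz = timezone(timedelta(minutes=source_offset_min))
    local = datetime.strptime(raw, raw_format).replace(tzinfo=source_tz)
    return local.astimezone(OPERATOR_TZ).strftime(COMMON_FORMAT)


def normalize_record(record: dict) -> dict:
    """Return a copy of a decoded record whose timestamp follows the common data definition."""
    # The per-source offset is dropped: after normalization every timestamp is operator-local.
    out = {k: v for k, v in record.items() if k != "source_offset_min"}
    out["start_time"] = normalize_timestamp(record["start_time"], record["source_offset_min"])
    return out
```

For example, a record stamped `20171201063630` by a source running on UTC is normalized to `2017-12-01 12:06:30`. It can then be compared directly with records from sources already on operator time.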
In an embodiment, the analytical platform 109 may normalize the UDRs and CDRs to generate one or more records such as normalized probe records, normalized network records, normalized charging records and the like. Such records may be stored in the data repository 210 in the form of a table or any other data structure that enables efficient processing and analysis.
In an embodiment, the CDRs and/or UDRs may comprise multiple parameters indicating the usage volume captured for each session. These parameters may be identified as common parameters or uncommon parameters. Common parameters are parameters captured by each of the multiple sources, whereas uncommon parameters are parameters captured by at least one of the sources but not by at least one other source.
In one embodiment, the parameters may be designated as common or uncommon based upon the parameters captured via the telecom probe node. That is, a parameter captured by the telecom probe node that is also captured by the other sources (viz. the network node and the charging node) may be categorized as a common parameter, whereas a parameter captured by the telecom probe node but not by at least one of the other sources may be categorized as an uncommon parameter. Table 1 below illustrates the commonality across different data sources in accordance with an embodiment of the present disclosure. As can be seen, Table 1 further divides the common and uncommon parameters into consistent and inconsistent parameters. Consistent parameters keep the same value throughout a session, whereas inconsistent parameters may change value during a session.
| Commonality across data sources | Consistency within session: consistent parameter(s) | Consistency within session: inconsistent parameter(s) |
|---|---|---|
| Common parameter(s) | Access point name, VPLMN, Charging characteristic, Network node | RAT type, Time window, Cell ID |
| Uncommon parameter(s) | IMEI TAC (Gn, CHG, optional in CHG) | Domain name (Gn), Application (Gn), Protocol (Gn), Sub-protocol (Gn), Proxy IP (Gn), Destination server IP (Gn), Destination port (Gn), Rating product (NET, CHG) |

Table 1: Commonality amongst parameters captured via different sources
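The classification rule above can be sketched in Python as follows. Commonality is decided relative to the probe's parameter set. Consistency is approximated here by checking whether a parameter takes more than one observed value within a session's records. The parameter sets below are hypothetical and only loosely follow Table 1. Treating blank or "-" values as "not captured" is also an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple


def classify_commonality(probe: Set[str], others: Iterable[Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split probe parameters into common (captured by every other source) and uncommon."""
    others = list(others)
    common = {p for p in probe if all(p in src for src in others)}
    # Parameters absent from the probe (e.g. a charging-only field) are in neither set.
    return common, probe - common


def classify_consistency(records: List[Dict[str, str]], params: Iterable[str]) -> Dict[str, bool]:
    """Mark a parameter consistent if it shows at most one distinct value within the session."""
    result = {}
    for p in params:
        values = {r[p] for r in records if r.get(p) not in (None, "", "-")}
        result[p] = len(values) <= 1
    return result


# Hypothetical parameter sets per source, loosely following Table 1.
PROBE = {"APN", "VPLMN", "RAT_TYPE", "CELL_ID", "DOMAIN_NAME", "PROTOCOL"}
NETWORK = {"APN", "VPLMN", "RAT_TYPE", "CELL_ID"}
CHARGING = {"APN", "VPLMN", "RAT_TYPE", "CELL_ID", "RATING_PRODUCT"}
```

With these sets, `classify_commonality(PROBE, [NETWORK, CHARGING])` returns APN, VPLMN, RAT_TYPE and CELL_ID as common, and DOMAIN_NAME and PROTOCOL as uncommon.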
In an example, Tables 2-4 below illustrate sample data (UDRs/CDRs) captured from the telecom probe and the charging node for different sessions.
Session-I

| Data record attribute/parameter | Probe UDR-1 | Probe UDR-2 | Charging CDR-1 |
|---|---|---|---|
| MSISDN | +911234567890 | +911234567890 | +911234567890 |
| CHARGING ID | 98765 | 98765 | 98765 |
| SESSION START TIME | 12:06:30 | 12:06:30 | 12:06:30 |
| DATA RECORD START TIME | 12:06:30 | 12:06:30 | 12:06:30 |
| DATA RECORD END TIME | 12:08:30 | 12:08:30 | 12:08:30 |
| SESSION END TIME | - | 12:08:30 | 12:08:30 |
| APN | wap.telco.com | wap.telco.com | wap.telco.com |
| GGSN IP ADDRESS | 51.61.71.81 | 51.61.71.81 | 51.61.71.81 |
| IMEI | 987654321012345 | 987654321012345 | 987654321012345 |
| TAC | 98765432 | 98765432 | 98765432 |
| VPLMN | 12345 | 12345 | 12345 |
| CHARGING CHARACTERISTIC | 400 | 400 | 400 |
| RAT TYPE | 3G | 3G | 3G |
| PROTOCOL | HTTP | GOOGLE TALK | HTTP |
| SUB PROTOCOL | VIDEO | GENERIC | |
| APPLICATION | GMAIL | FACEBOOK | |
| DOMAIN NAME | gmail.com | 0.facebook.com | |
| PROXY SERVER IP ADDRESS | 191.291.391.491 | 191.291.391.491 | |
| PROXY IP RANGE | 191.291. | 191.291. | |
| LAC | 78 | 78 | |
| CELL ID | 45678 | 45678 | |
| RATING PRODUCT | | | 1 |
| UPLINK VOLUME | 3145728 | 1048576 | |
| DOWNLINK VOLUME | 7340032 | 9437184 | |
| TOTAL VOLUME | 10485760 | 10485760 | 10484711 |

Table 2: Data records captured from telecom probe and charging node for session I
Session-II

| Data record attribute/parameter | Probe UDR-3 | Probe UDR-4 | Charging CDR-2 |
|---|---|---|---|
| MSISDN | +911234567891 | +911234567891 | +911234567891 |
| CHARGING ID | 54321 | 54321 | 54321 |
| SESSION START TIME | 12:10:30 | 12:10:30 | 12:10:30 |
| DATA RECORD START TIME | 12:10:30 | 12:10:30 | 12:11:30 |
| DATA RECORD END TIME | 12:12:30 | 12:12:30 | 12:12:30 |
| SESSION END TIME | - | 12:12:30 | 12:12:30 |
| APN | internet.telco.com | internet.telco.com | internet.telco.com |
| GGSN IP ADDRESS | 51.61.71.81 | 51.61.71.81 | 51.61.71.81 |
| IMEI | 987654321012333 | 987654321012333 | 987654321012333 |
| TAC | 98765432 | 98765432 | 98765432 |
| VPLMN | 12345 | 12345 | 12345 |
| CHARGING CHARACTERISTIC | 400 | 400 | 400 |
| RAT TYPE | 4G | 4G | 4G |
| PROTOCOL | HTTP | HTTP | |
| SUB PROTOCOL | GENERIC | GENERIC | |
| APPLICATION | FACEBOOK | FACEBOOK | |
| DOMAIN NAME | facebook.com | 0.facebook.com | |
| PROXY SERVER IP ADDRESS | - | - | |
| PROXY IP RANGE | - | - | |
| LAC | 97 | 97 | |
| CELL ID | 1234 | 1234 | |
| RATING PRODUCT | | | 1 |
| UPLINK VOLUME | 3145728 | 1048576 | |
| DOWNLINK VOLUME | 17825792 | 7340032 | |
| TOTAL VOLUME | 20971520 | 8388608 | 20971516 |

Table 3: Data records captured from telecom probe and charging node for Session II
Session-III

| Data record attribute/parameter | Probe UDR-5 | Probe UDR-6 | Charging CDR-3 |
|---|---|---|---|
| MSISDN | +911234598765 | +911234598765 | +911234598765 |
| CHARGING ID | 98765 | 98765 | 98765 |
| SESSION START TIME | 12:16:20 | 12:16:20 | 12:17:20 |
| DATA RECORD START TIME | 12:16:20 | 12:16:20 | 12:17:20 |
| DATA RECORD END TIME | 12:19:25 | 12:19:25 | 12:18:25 |
| SESSION END TIME | - | 12:19:25 | 12:18:25 |
| APN | wap.telco.com | wap.telco.com | wap.telco.com |
| GGSN IP ADDRESS | 51.61.71.81 | 51.61.71.81 | 51.61.71.81 |
| IMEI | 987654321012444 | 987654321012444 | 987654321012444 |
| TAC | 98765432 | 98765432 | 98765432 |
| VPLMN | 12345 | 12345 | 12345 |
| CHARGING CHARACTERISTIC | 400 | 400 | 400 |
| RAT TYPE | 3G | 4G | 3G |
| PROTOCOL | HTTP | GOOGLE TALK | HTTP |
| SUB PROTOCOL | VIDEO | GENERIC | |
| APPLICATION | GMAIL | FACEBOOK | |
| DOMAIN NAME | gmail.com | facebook.com | |
| PROXY SERVER IP ADDRESS | - | 191.291.391.491 | |
| PROXY IP RANGE | - | 191.291. | |
| LAC | 97 | 78 | |
| CELL ID | 1234 | 45678 | |
| RATING PRODUCT | | | 1 |
| UPLINK VOLUME | 3145728 | 4194304 | |
| DOWNLINK VOLUME | 12582912 | 22020096 | |
| TOTAL VOLUME | 15728640 | 26214400 | 41943040 |

Table 4: Data records captured from telecom probe and charging node for session III
The data records, along with their parameters categorized as common and uncommon, may be further processed by the data aggregation and reconciliation module 207 (shown in Figure 2), as explained below.
DATA AGGREGATION AND RECONCILIATION MODULE 207
Referring now to Figure 2, the data aggregation and reconciliation module 207 may further enable the analytical platform 109 to aggregate the usage volume captured via the UDRs/CDRs in order to obtain an aggregate usage volume corresponding to each data source for each session. The data recorded by the operator’s systems (e.g. the network nodes and the charging node) facilitates monitoring network usage and charging for that usage, both of which are key to revenue assurance. Data sessions, unlike voice calls, may last much longer (up to weeks) in an always-on mode. The operator’s network and charging systems may generate CDRs according to their individual system definitions and optimizations, resulting in multiple partial CDRs for a single session.
Aggregation is the process of adding up multiple related UDRs/CDRs from each source (the network node, the charging node and the telecom probe node) to prepare source-wise summary records. The aggregation may be executed at several levels by using an appropriate session identifier. These aggregation levels may comprise the IP-CAN session level, the bearer level and the service data flow level. The aggregation and reconciliation are described in further detail with reference to the data aggregation stage 403 and the data reconciliation stage 404 illustrated in Figure 4.
Referring now to Figure 4, in the data aggregation stage 403, the analytical platform 109, via the data aggregation and reconciliation module 207, may aggregate the normalized data records pertaining to the various sources. The normalized data records are aggregated session-wise on a periodic basis. Such periodic aggregation may be executed daily, weekly or as per the specific requirements of the system. In order to avoid computational performance issues, the aggregation of an entire day’s data may be performed in two steps.
Step 1 -> Pre-aggregation: data is aggregated at predetermined intervals to generate multiple intra-day aggregates.
Step 2 -> End-of-day aggregation: all the pre-aggregated data for the day is aggregated at the end of the day.
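The two-step aggregation can be sketched in Python as follows. It assumes hourly pre-aggregation intervals, a session key of (MSISDN, Charging ID), and hypothetical field names (`start_s` for seconds since midnight, `volume` in bytes). The point of the design is that the end-of-day step only merges a small number of intra-day aggregates. It never rescans the raw records, yet it yields the same daily totals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SessionKey = Tuple[str, str]  # (MSISDN, Charging ID)


def pre_aggregate(records: Iterable[dict], interval_s: int = 3600) -> Dict[int, Dict[SessionKey, int]]:
    """Step 1: sum volumes per session within fixed intra-day intervals."""
    buckets: Dict[int, Dict[SessionKey, int]] = defaultdict(lambda: defaultdict(int))
    for r in records:
        bucket = r["start_s"] // interval_s  # interval index within the day
        buckets[bucket][(r["msisdn"], r["charging_id"])] += r["volume"]
    return buckets


def end_of_day(pre_aggregates: Dict[int, Dict[SessionKey, int]]) -> Dict[SessionKey, int]:
    """Step 2: merge all intra-day aggregates into one daily total per session."""
    daily: Dict[SessionKey, int] = defaultdict(int)
    for per_interval in pre_aggregates.values():
        for key, volume in per_interval.items():
            daily[key] += volume
    return dict(daily)
```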
In an embodiment, all the partial CDRs and the UDRs from the normalized data records associated with a single data session may be aggregated into aggregated data records corresponding to the data extraction platforms, the network elements, and the charging nodes.
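The following sketch shows how partial records might be collapsed into one source-wise summary record per session. It is illustrative only. The `PartialRecord` and `SummaryRecord` types and their fields are hypothetical. Times are assumed to be normalized same-day `HH:MM:SS` strings, so plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class PartialRecord:
    source: str        # "probe", "network" or "charging" (hypothetical labels)
    msisdn: str
    charging_id: str
    start: str         # normalized "HH:MM:SS"; zero-padded strings sort chronologically
    end: str
    volume: int


@dataclass
class SummaryRecord:
    records: int
    start: str
    end: str
    volume: int


def summarize(partials: Iterable[PartialRecord]) -> Dict[Tuple[str, str, str], SummaryRecord]:
    """Collapse partial CDRs/UDRs into one summary record per (source, MSISDN, Charging ID)."""
    out: Dict[Tuple[str, str, str], SummaryRecord] = {}
    for p in partials:
        key = (p.source, p.msisdn, p.charging_id)
        s = out.get(key)
        if s is None:
            out[key] = SummaryRecord(1, p.start, p.end, p.volume)
        else:
            s.records += 1
            s.start = min(s.start, p.start)  # earliest start across the partials
            s.end = max(s.end, p.end)        # latest end across the partials
            s.volume += p.volume
    return out
```

Applied to the two probe UDRs of Session I in Table 2, this yields one probe summary record covering 12:06:30 to 12:08:30 with a volume of 20971520.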
Referring to Figure 4, in the data reconciliation stage 404, the analytical platform 109, via the data aggregation and reconciliation module 207, may reconcile the data pertaining to the plurality of CDRs/UDRs belonging to each session. The reconciliation process is explained in detail below.
The data aggregation and reconciliation module 207 may use the aggregation records to reconcile data records of the same data session across all the sources (i.e. telecom probes, network nodes and charging nodes). These aggregation records are the aggregate data records corresponding to the data extraction platforms, the network elements and the charging nodes. The key for reconciling the same session across these aggregation tables is the session’s MSISDN together with the Charging ID (for 2G and 3G), or the session’s MSISDN together with the UE IP (for LTE). The result of this reconciliation stage is a three-way reconciliation record, in which the totals across all the sources for each MSISDN and Charging ID combination are reconciled. Based upon the reconciliation output, the analytical platform 109 may calculate the missing, under-reported and overcharged data in the operator’s charging system. Specifically, the data aggregation and reconciliation module 207 may provide the reconciled output to the data analytics module 209 for identifying the data gaps/discrepancies, as explained in the subsequent paragraphs.
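The choice of reconciliation key described above can be sketched as a small Python function. The record field names (`rat_type`, `msisdn`, `charging_id`, `ue_ip`) and the RAT labels are hypothetical. Rejecting unknown RAT types, rather than guessing a key, is a design assumption.

```python
from typing import Tuple


def reconciliation_key(record: dict) -> Tuple[str, str]:
    """Key used to match the same session across probe, network and charging records.

    2G/3G sessions are keyed on (MSISDN, Charging ID); LTE sessions on (MSISDN, UE IP).
    """
    if record["rat_type"] in ("2G", "3G"):
        return record["msisdn"], record["charging_id"]
    if record["rat_type"] in ("4G", "LTE"):
        return record["msisdn"], record["ue_ip"]
    raise ValueError(f"unsupported RAT type: {record['rat_type']!r}")
```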
In one embodiment, the data aggregation and reconciliation module 207 may implement a three-way reconciliation process for reconciling the data records. The telecom probe is the primary, reliable and independent source of data records. The aggregated data from the Data Extraction Platform (i.e. the telecom probe) is therefore reconciled with the operator’s charging CDR data and with the aggregated CDR data obtained from the network nodes. The aggregated data of the Data Extraction Platform, the network nodes and the charging nodes for the derived event date is reconciled based on MSISDN and Charging ID. The selection criterion for this first-level reconciliation is all the aggregated telecom probe, network node and charging node data for a specified Event Date. The matching criteria are the Event Date, Charging ID and MSISDN. The results of this three-way reconciliation are stored in a table within the data repository 210.
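A minimal Python sketch of the three-way reconciliation follows. It performs a full outer join of the three per-session aggregates on (Event Date, Charging ID, MSISDN). The dictionary-based inputs are an assumption; a production system would more likely run this as a database join. A session absent from a source is kept as `None` rather than 0, so that a missing session remains distinguishable from one that recorded zero volume.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (event date, Charging ID, MSISDN)


def three_way_reconcile(
    probe: Dict[Key, int], network: Dict[Key, int], charging: Dict[Key, int]
) -> Dict[Key, Dict[str, Optional[int]]]:
    """Full outer join of probe, network and charging aggregates on the matching key."""
    rows: Dict[Key, Dict[str, Optional[int]]] = {}
    for key in probe.keys() | network.keys() | charging.keys():
        rows[key] = {
            "probe": probe.get(key),        # None means the source has no record of the session
            "network": network.get(key),
            "charging": charging.get(key),
        }
    return rows
```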
Based upon the reconciled data obtained for each parameter of each session, the data aggregation and reconciliation module 207 determines the presence or absence of a volume gap for each session. For example, consider the parameter APN illustrated in Figure 3. If the reconciled data shows a gap in the usage volume captured by one of the sources, the data aggregation and reconciliation module 207 may determine that a volume gap is present for the session corresponding to that session ID. Similarly, the aggregation and reconciliation process is applied to each of the parameters captured from the multiple sources. The presence of a volume gap in a session is confirmed from the volume gaps identified in the reconciled usage data of each parameter. The reconciliation results are then provided to the data enrichment module 208 and the data analytics module 209 for analyzing the anomalies detected in the usage volume and their root causes, as explained below.
DATA ENRICHMENT MODULE 208
Referring now to Figure 2, the data enrichment module 208 may further enable the analytical platform 109 to categorize each session into a predefined session category based upon the determination of the presence or absence of the volume gap. In one embodiment, the predefined session category is one of a matched session, a missing session, an under-reported session or an over-reported session.
- A matched session is present in the data records of each of the data sources (the Data Extraction Platform, the network node and the charging node), and the volumes captured at these sources match.
- A missing session is present in the data records of the Data Extraction Platform but missing from the data records of the network node or the charging node (or both).
- An under-reported session is present in the data records of all the sources, but the volume reported in the data records of the network node or the charging node (or both) is lower than the volume reported in the data records of the Data Extraction Platform.
- An over-reported session is present in the data records of all the sources, but the volume reported in the data records of the network node or the charging node (or both) is higher than the volume reported in the data records of the Data Extraction Platform.
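The four categories can be sketched as a Python function over the reconciled probe, network and charging volumes of one session. `None` stands for a source with no record of the session. The probe volume is the reference, and a session is over-reported when a downstream source records more than the probe. The `tolerance` parameter and the precedence rule are assumptions, not part of the description. The precedence rule states that a session where one source is low and another is high is reported as under-reported.

```python
from typing import Optional


def categorize(
    probe: Optional[int], network: Optional[int], charging: Optional[int], tolerance: int = 0
) -> str:
    """Assign a session to one of the predefined categories, using the probe as reference."""
    if probe is None:
        raise ValueError("categories are defined only for sessions seen by the probe")
    if network is None or charging is None:
        return "missing"
    diffs = [network - probe, charging - probe]
    if all(abs(d) <= tolerance for d in diffs):
        return "matched"
    if any(d < -tolerance for d in diffs):
        return "under-reported"  # takes precedence when sources disagree in both directions
    return "over-reported"
```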
As illustrated at block 3 of Figure 3, each session is categorized into a session category depending on whether a volume gap is present for at least one of its parameters, based upon the reconciled usage data obtained for that parameter. For example, referring to Figure 3, a volume gap is identified for the session with session ID-1 only for the parameter APN, and not for its other parameters. The session is nevertheless categorized as an under-reported session or a missing session, depending on whether under-reported or missing volume was reported by at least one of the sources for the parameter APN. Further, the data enrichment module 208 may tag the data records of each session with the predefined session category of that session in order to obtain enriched data records for each session. For example, referring to Figure 3, at blocks 8 and 9 the data records captured for each session are tagged with that session’s category to obtain enriched data records for the session.
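The tagging step can be sketched in Python as follows. It assumes records are dictionaries keyed by `msisdn` and `charging_id`, and that session categories have already been computed per (MSISDN, Charging ID). The `session_tag` field name and the `"unreconciled"` default for sessions without a category are hypothetical. Copies are returned so that the original records stay untouched.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (MSISDN, Charging ID)


def enrich(records: Iterable[dict], categories: Dict[Key, str]) -> List[dict]:
    """Return copies of the records, each tagged with its session's category."""
    enriched = []
    for r in records:
        tagged = dict(r)
        tagged["session_tag"] = categories.get((r["msisdn"], r["charging_id"]), "unreconciled")
        enriched.append(tagged)
    return enriched
```

The key includes the MSISDN, so Sessions I and III, which share Charging ID 98765 in Tables 2 and 4, are tagged independently.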
In one example, Table 5 below illustrates the results of aggregating, reconciling and tagging the data records depicted in Tables 2-4 for sessions I, II and III. As can be seen from Table 5, Session-I is tagged as “under reported”. The volumes reported by the aggregated probe data records and the aggregated charging data records for Session-I are 20971520 and 10484711 data units, respectively, giving a volume gap of 10486809 data units. Similarly, Session II and Session III are tagged as “under reported” and “matching”, respectively, based on the aggregation and reconciliation process.
| Data record attribute/parameter | Session-I | Session-II | Session-III |
|---|---|---|---|
| Session Date | 2017-12-01 | 2017-12-01 | 2017-12-01 |
| MSISDN | +911234567890 | +911234567891 | +911234598765 |
| CHARGING ID | 98765 | 54321 | 98765 |
| Exists in Probe | Yes | Yes | Yes |
| Exists in Charging | Yes | Yes | Yes |
| APN | wap.telco.com | internet.telco.com | wap.telco.com |
| GGSN IP ADDRESS | 51.61.71.81 | 52.62.72.82 | 51.61.71.81 |
| IMEI | 987654321012345 | 887456321054321 | 900654321068891 |
| TAC | 9876543 | 8874563 | 9006543 |
| VPLMN | 12345 | 34567 | 12345 |
| CHARGING CHARACTERISTIC | 400 | 500 | 400 |
| TOTAL VOLUME PROBE | 20971520 | 29360128 | 41943040 |
| TOTAL VOLUME CHARGING | 10484711 | 20971516 | 41943040 |
| VOLUME GAP | 10486809 | 8388612 | 0 |
| SESSION TAG | Under reported | Under reported | Matching |

Table 5: Tagging of sessions I, II and III as matched, missing, under-reported or over-reported based on the aggregation and reconciliation results for the data records of Tables 2-4.
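The volume gaps in Table 5 follow directly from the per-record totals in Tables 2-4. The gap is taken as the aggregated probe volume minus the aggregated charging volume; the network-node records are not shown in these tables.

```latex
\begin{aligned}
V^{\text{probe}}_{\text{I}}   &= 10485760 + 10485760 = 20971520, &\quad G_{\text{I}}   &= 20971520 - 10484711 = 10486809,\\
V^{\text{probe}}_{\text{II}}  &= 20971520 + 8388608  = 29360128, &\quad G_{\text{II}}  &= 29360128 - 20971516 = 8388612,\\
V^{\text{probe}}_{\text{III}} &= 15728640 + 26214400 = 41943040, &\quad G_{\text{III}} &= 41943040 - 41943040 = 0.
\end{aligned}
```

A positive gap with the session present at both sources gives the "under reported" tag, and a zero gap gives "matching".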
The volume gap, in terms of under-reported, over-reported or missing sessions, is determined by performing the parameter-wise aggregation and reconciliation analysis illustrated in Figure 3. As shown, block 2 aggregates the volumes captured by each source for each parameter to obtain an aggregate volume for that parameter. Further, at block 3, the aggregate volume of each parameter is reconciled across the multiple sources. In one example, as shown at blocks 2-3, the usage volume of session ID-1 for the parameter APN is aggregated and reconciled across the different sources to obtain reconciled usage data for the parameter APN. The aggregation and reconciliation process described above applies equally to this parameter-wise analysis. The enriched data records may then be processed by the data analytics module 209 to perform data analytics, as explained below.
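The parameter-wise aggregation and reconciliation of blocks 2-3 can be sketched in Python as follows. For simplicity, it reconciles only two sources: the probe as reference and the charging node. The flat record layout with `source`, `session_id` and `volume` fields is hypothetical. Records from any other source are ignored in this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def parameter_wise_gaps(
    records: Iterable[dict], parameter: str, reference: str = "probe", other: str = "charging"
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Aggregate volume per (session, parameter value) for each source, then reconcile.

    For every (session_id, value) pair, returns both aggregated volumes and
    gap = reference volume - other volume.
    """
    totals: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(lambda: {reference: 0, other: 0})
    for r in records:
        if r["source"] in (reference, other):  # block 2: per-source aggregation
            totals[(r["session_id"], r[parameter])][r["source"]] += r["volume"]
    # Block 3: reconcile the two aggregates for each parameter value.
    return {key: {**vols, "gap": vols[reference] - vols[other]} for key, vols in totals.items()}
```

With the Session I records from Table 2 grouped by APN, the entry for `("ID-1", "wap.telco.com")` shows 20971520 on the probe side, 10484711 on the charging side and a gap of 10486809.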
DATA ANALYTICS MODULE 209
Referring now to Figure 2, the data analytics module 209 may further enable the analytical platform 109 to determine the actual volume gap and the probability of gap. This determination is based upon the aggregation and reconciliation results obtained for each of the parameters across the multiple sources. The data analysis performed by the data analytics module 209 is two-fold. The first-level analysis concerns the common parameters, as illustrated in blocks 4-7 of Figure 3. The second-level analysis concerns all parameters (common or uncommon) and uses the enriched data records to determine the volume gap and the probability of gap.
In the first-level analysis, the data analytics module 209 may initially determine a total gap summary for each session. This summary is based upon the volume gap identified from the reconciled data of each parameter across the multiple sources. In one example, as shown at block 4 of Figure 3, the data analytics module 209 may determine the gap summary for session ID-1 and session ID-2. It uses the reconciled data obtained at block 3 for each parameter across the multiple sources. Further, at block 5, the data analytics module 209 may generate date-wise gap reports identifying the total gap recorded for each session. It may then determine the total potential leakage in each session based upon that total gap.
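A date-wise gap report could be summarized as in the following Python sketch, which works from reconciled session rows. It counts only positive gaps (probe volume above the downstream volume) towards potential leakage, on the assumption that over-reported volume is not lost revenue. Leakage is expressed in volume terms, because converting it to revenue would need a tariff that is not specified here. The `date` and `gap` field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def date_wise_gap_report(reconciled: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Per date: number of sessions, sessions with a positive gap, and total positive gap."""
    report: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"sessions": 0, "gap_sessions": 0, "total_gap": 0}
    )
    for row in reconciled:
        day = report[row["date"]]
        day["sessions"] += 1
        if row["gap"] > 0:  # only under-recorded volume counts as potential leakage
            day["gap_sessions"] += 1
            day["total_gap"] += row["gap"]
    return dict(report)
```

For the three sessions of Table 5, this gives 3 sessions on 2017-12-01, 2 of them with a gap, and a total gap of 18875421 data units.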
Further, in the first-level analysis, the data analytics module 209 may analyze the volume gap and the probability of gap for each parameter categorized as a common parameter. This parameter-level analysis determines the root-cause parameter for the detected volume gap and the probability of root cause for that parameter. It is performed as gap profiling of the common parameters/dimensions, as shown at block 6. Further, at block 7, root-cause analysis is performed for each common parameter, computing its total volume, total gap and the probability of root cause for the gap. The probability of gap for each common parameter is computed by dividing its total gap by its total volume, both taken from the reconciled data obtained at block 3.
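The first-level probability of gap (total gap divided by total volume) can be sketched for one common parameter in Python as follows. The sketch assumes that each reconciled row carries the probe volume as the "total volume" and a signed gap. Only positive gaps are counted, so over-reported sessions do not offset missing volume. The field names `probe_volume` and `gap` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gap_probability_by_value(rows: Iterable[dict], parameter: str) -> Dict[str, Tuple[int, int, float]]:
    """For each value of a common parameter: (total volume, total gap, total gap / total volume)."""
    volume: Dict[str, int] = defaultdict(int)
    gap: Dict[str, int] = defaultdict(int)
    for row in rows:
        volume[row[parameter]] += row["probe_volume"]
        gap[row[parameter]] += max(row["gap"], 0)  # ignore over-reporting (negative gaps)
    return {
        value: (volume[value], gap[value], gap[value] / volume[value] if volume[value] else 0.0)
        for value in volume
    }
```

Using the Table 5 sessions grouped by APN, `wap.telco.com` gives 10486809 / 62914560, or roughly 0.167. `internet.telco.com` gives 8388612 / 29360128, or roughly 0.286.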
In the second-level analysis, the data analytics module 209 may aggregate the enriched data records of the multiple sources across the multiple parameters for each session. As illustrated at block 10 of Figure 3, the enriched data records (tagged with the session category) are aggregated across the multiple sources. Further, at block 11, the aggregated enriched data records are analyzed across all parameters/dimensions, both common and uncommon. This analysis determines the volume gap and the root-cause parameters for the volume gaps recorded in the multiple sessions.
Based upon the aggregated data records for each session, the data analytics module 209 may further compute a total volume and a probability of gap root-cause for each parameter. These values are used to detect an anomaly in the data volume captured for that parameter. The data analytics module 209 may then identify at least one of the multiple parameters as a root-cause parameter for the volume gap of each session, based upon the probability of gap root-cause computed for each parameter.
Reconciliation may be performed at the session level to detect volume gaps among the multiple sources. It is further essential to identify the root cause of a gap in order to plug the leakage. A limiting factor in this step is that the same sensitive parameters are not always available in all the data sources. Therefore, the data analytics module 209 performs the gap root-cause analysis by profiling the volume gap against critical parameters captured in the telecom probe data records. A gap probability is calculated for individual parameters and for combinations of parameters to guide the root-cause analysis. The higher the probability, the more likely it is that a parameter is causing the gap.
In one embodiment, based upon the second-level analysis across all the parameters, the data analytics module 209 may compute four values for each parameter: the total volume, the volume from missing sessions, the volume from under-reported sessions and the probability of gap. The probability of gap root-cause may be calculated and reported for individual parameters and for combinations of parameters as:
Probability of gap root-cause = (Volume from missing sessions + Volume from gap sessions)/Total volume
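The formula above can be sketched in Python for individual parameters and for combinations of parameters. The sketch rests on several assumptions:
- "gap sessions" refers to the under-reported sessions named in the preceding paragraph;
- each session contributes its probe-measured volume;
- only combinations of up to two parameters are evaluated.

The tie-break in `most_likely_root_cause`, which prefers the smaller combination when probabilities are equal, is a design choice of this sketch rather than part of the description.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple

Combo = Tuple[Tuple[str, str], ...]  # e.g. (("apn", "wap.telco.com"), ("protocol", "HTTP"))


def root_cause_probabilities(
    sessions: Iterable[dict], parameters: Sequence[str], max_combo: int = 2
) -> Dict[Combo, float]:
    """(volume from missing sessions + volume from under-reported sessions) / total volume."""
    total: Dict[Combo, int] = defaultdict(int)
    flagged: Dict[Combo, int] = defaultdict(int)
    for s in sessions:
        for size in range(1, max_combo + 1):
            for combo in combinations(parameters, size):
                key = tuple((p, s[p]) for p in combo)
                total[key] += s["volume"]
                if s["tag"] in ("missing", "under-reported"):
                    flagged[key] += s["volume"]
    return {key: flagged[key] / total[key] for key in total if total[key] > 0}


def most_likely_root_cause(probs: Dict[Combo, float]) -> Combo:
    """Highest-probability parameter (combination); ties go to the smaller combination."""
    return max(probs, key=lambda k: (probs[k], -len(k)))
```

Matched and over-reported sessions contribute only to the denominator. They therefore dilute the probability of parameters that are also seen in correctly charged traffic.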
Table 6 depicts the parameters considered when detecting the gap root-cause for the aggregated data records of a session.
| Reconciliation object: session | Reconciliation object: volume gap | Reconciliation object: category | Telecom probe UDRs | Gap profile parameters | Probability of gap |
|---|---|---|---|---|---|
| Session A | 109568 | Under reported | UDRs flagged with reconciliation category | Parameter (Y): APN, Domain name, Proxy IP address, Protocol, Sub-protocol, Application, RAT type, TAC, Network node, VPLMN, Cell ID | Parameter (Y): total volume, volume from missing sessions, volume from under-reported sessions, probability % of parameter causing gap |

Table 6: Parameters considered for gap profiling and determining probability of gap
As shown in Figure 3, three volumes are determined for each parameter (viz. APN, URL, etc.): the total volume captured across the multiple sessions, the volume from missing sessions and the volume from under-reported sessions. The probability of gap root-cause is then computed for each parameter to identify the root-cause parameters responsible for the volume gap in the multiple sessions.
Referring to Figure 4, in the reporting stage 405, the analytical platform 109 may generate one or more revenue assurance reports 110. These reports are displayed to the users for identifying data discrepancies and thereby determining revenue leakage. The data is stored in a format that can be readily consumed to generate the reports using various visualization tools and graphical user interfaces. The revenue assurance reports 110 may provide visualized information on the volume discrepancies, number of sessions and number of subscribers in each of the above categories (i.e. missing, matching, under-reported and over-reported). This information may be further analyzed for revenue assurance purposes. In an aspect, the revenue assurance report may include information related to one or more of the following attributes:
- a volume gap corresponding to one or more sessions, the total volume gap, the total data volume captured and the total number of sessions;
- the estimated revenue leakage, the total number of subscribers contributing to revenue leakage, and the top Access Point Names, Uniform Resource Locators and applications contributing to revenue leakage;
- the hourly trend of network traffic and the hourly trend of revenue leakage;
- protocol, proxy IP addresses, user location, destination IP address and charging characteristic;
- the total revenue leakage corresponding to the at least one parameter identified as the root-cause parameter, and the like.
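Two of the report attributes listed above can be sketched in Python:
- the per-category counts of sessions, subscribers and volume gap;
- the top contributors (e.g. APNs) ranked by the volume gap they account for.

The session dictionary layout (`tag`, `msisdn`, `gap` and the attribute field) is hypothetical. Ranking by gap volume rather than by revenue is an assumption made because no tariff is given.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def category_summary(sessions: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Per session category: number of sessions, distinct subscribers and total volume gap."""
    summary: Dict[str, Dict[str, int]] = {}
    subscribers: Dict[str, Set[str]] = {}
    for s in sessions:
        row = summary.setdefault(s["tag"], {"sessions": 0, "subscribers": 0, "volume_gap": 0})
        row["sessions"] += 1
        row["volume_gap"] += s["gap"]
        subscribers.setdefault(s["tag"], set()).add(s["msisdn"])
        row["subscribers"] = len(subscribers[s["tag"]])
    return summary


def top_contributors(sessions: Iterable[dict], attribute: str, n: int = 20) -> List[Tuple[str, int]]:
    """Top-n values of an attribute (e.g. APN) ranked by the positive volume gap they account for."""
    gaps: Counter = Counter()
    for s in sessions:
        if s["gap"] > 0:
            gaps[s[attribute]] += s["gap"]
    return gaps.most_common(n)
```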
In an embodiment, the deep packet inspection (DPI) capability of the analytical platform 109 may enable additional business insights to be generated. These include but are not limited to the total data volume captured, total number of sessions, total number of subscribers, top 20 APNs, top 20 URLs, top 20 applications used in the operator’s network, and the hourly trend of network traffic.
In an example implementation of the described methods and systems, a Deep Packet Inspection (DPI) probe was set up to independently validate the completeness of the charging of data sessions in a Charging Control Node (CCN). A reconciliation between the DPI probe and CCN data records was performed for data transactions over a predetermined time interval (e.g. an entire day). The reconciliation revealed that the data volume captured via the DPI probe was greater than that captured via the CCN. Further analysis showed that certain subscribers had fraudulently changed the domain names of the URLs they accessed. The new domain was one that the telecom operator had not configured to trigger the CCN for airtime depletion, so these subscribers could browse the internet for free. This fraudulent bypass allowed the subscribers to surf the internet without a corresponding depletion of their main or dedicated account balance, leading to a loss of revenue. In a single instance, the DPI probe captured 4.2 GB of cumulative downloads and uploads by one subscriber, against a charged volume in the CCN of 2.1 KB, clearly indicating fraudulent activity.
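A check of the kind that surfaced this case can be sketched in Python. The sketch flags subscribers whose charged volume is a tiny fraction of the volume the DPI probe observed. The thresholds are hypothetical and untuned: a minimum of 100 MiB seen by the probe and a charged-to-observed ratio of at most 1%. Binary units for GB/KB are also an assumption.

```python
from typing import Dict, List

KB = 1024
GB = 1024 ** 3


def flag_charging_bypass(
    probe_volume: Dict[str, int],
    charged_volume: Dict[str, int],
    min_probe_volume: int = 100 * 1024 ** 2,   # hypothetical: ignore subscribers below 100 MiB
    max_charged_ratio: float = 0.01,           # hypothetical: charged/observed at or below 1%
) -> List[str]:
    """Return subscribers whose charged volume is a tiny fraction of the volume the probe saw."""
    flagged = []
    for msisdn, seen in probe_volume.items():
        if seen < min_probe_volume:
            continue  # light users are skipped to limit false positives
        charged = charged_volume.get(msisdn, 0)  # absent from charging counts as zero
        if charged / seen <= max_charged_ratio:
            flagged.append(msisdn)
    return sorted(flagged)
```

A subscriber with 4.2 GB observed against 2.1 KB charged has a ratio of roughly 5e-7 and is flagged. A subscriber charged for everything the probe saw is not flagged.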
In accordance with various aspects of the present disclosure, the described system and method for detecting anomalies in recorded consumption of data in a communication network may be applied to the following use cases, without being limited to them.
In one exemplary embodiment, the described system may be used to detect abuse of policy services by corporate users. For example, corporate users may use personal, banned or restricted services (such as torrent or pornographic websites) on discounted enterprise APNs, thereby abusing the service contract. The system of the present disclosure may flag the usage of such services as bandwidth reselling and thereby alert the relevant personnel. In another example of policy-service abuse, the system may enable detection of restricted operator portal gateway IPs being used for other domains. For example, the system may locate and/or identify the reverse proxy IP of a WAP portal being used for external websites and for external application usage.
In another exemplary embodiment, the system of the present disclosure may enable VPN bypass detection. For example, third-party applications such as Psiphon (accessed through Opera Mini or the Psiphon browser) may allow VPN-based routing of traffic, and settings in the policy control functions may allow this usage to bypass the charging system. The present system may enable detection of such bypass of the charging system.
In yet another exemplary embodiment, the system may enable outlier usage profiling, wherein abnormal or high-risk usage is profiled to ascertain the risk associated with the MSISDN. For example, if DNS traffic exceeds 40% of the total usage, the system may associate the specific pattern of protocols being used with SIM box numbers.
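The DNS-share rule above can be sketched in Python as follows. It assumes per-MSISDN byte counts broken down by protocol label, with DNS traffic labelled `"DNS"`; both the layout and the label are assumptions. The 40% threshold is taken from the text and is applied as a strict "greater than".

```python
from typing import Dict, List


def dns_heavy_msisdns(usage: Dict[str, Dict[str, int]], threshold: float = 0.40) -> List[str]:
    """MSISDNs whose DNS traffic exceeds `threshold` of their total usage (candidate SIM boxes)."""
    flagged = []
    for msisdn, by_protocol in usage.items():
        total = sum(by_protocol.values())
        if total and by_protocol.get("DNS", 0) / total > threshold:  # skip MSISDNs with no usage
            flagged.append(msisdn)
    return sorted(flagged)
```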
In yet another exemplary embodiment, the system may independently validate usage of certain protocols and/or applications, controlled by DPI systems in Policy and Charging Control (PCC) systems, which otherwise cannot be detected.
In yet another exemplary embodiment, the system may enable profiling of gaps. Specifically, the system may profile the network-to-charging volume difference to determine a specific protocol, application, proxy address or domain name that may cause the gap. For example, the system may profile usage over the '127.0.0.1' proxy as having an 80% probability of not being charged.
In yet another exemplary embodiment, the system may enable detection of CDR failure/suppression in the network. Specifically, the system may validate suppression/bypass rules at network nodes by independently probing the traffic recorded in GGSN/EPG CDRs. For example, the system may validate the suppression/bypass rules under which CDRs are not generated for M2M traffic over the corp. ABC APN. In another example, the system may use an independent DPI probe to detect the root cause of a volume difference between the GGSN and charging. Such a difference may be caused by mishandling at the Policy and Charging Enforcement Function (PCEF) gateway.
In yet another exemplary embodiment, the system may use the DPI probe to validate the recording and rounding-off of volumes captured in the communication network. The volume recorded in the GGSN is considered the standard for network billing purposes.
In yet another exemplary embodiment, the system may enable validation of a SIM box for abnormal usage. Specifically, the system may analyze the MSISDNs of the SIM box for abnormal, machine-like usage. Conventional methods/systems, by contrast, restrict SIM box analysis to detecting circuit-switched usage. For example, the system may enable analysis of a SIM box configured with automated browsing of URL-xyz and protocol ‘abc’ only.
In still another exemplary embodiment, the system may enable monitoring of data usage by blacklisted users/subscribers. Specifically, the system may thoroughly analyze the parameters depicting usage by suspicious subscribers. For example, the system may enable monitoring/tracking of a destination server IP/PABX being accessed for a brute-force attack.
Although implementations of the method and system for detecting anomalies in recorded consumption of data in a communication network have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for detecting anomalies in recorded consumption of data in a communication network.

Claims (14)

  1. A method for detecting anomalies in recorded consumption of data in a communication network 107, the method comprising:
    extracting, by a processor 201, a plurality of data records from a plurality of data sources for each session in the communication network 107, wherein the plurality of data sources include at least a data extraction platform 108 communicatively coupled to one or more controlling interfaces of the communication network 107 for extracting a data record of the plurality of data records, and wherein the plurality of data records comprise one or more parameters indicating usage volume pertaining to one or more services being consumed for each session;
    aggregating, by the processor 201, the usage volume captured in the plurality of data records in order to obtain an aggregate usage volume corresponding to each data source for each session;
    reconciling, by the processor 201, the aggregate usage volumes corresponding to each data source for each session in order to determine either presence or absence of a volume gap in each session;
    categorizing, by the processor 201, each session into a session category based upon the determination of either presence or absence of the volume gap;
    tagging, by the processor 201, the data records of a session with a session category corresponding to the session in order to obtain enriched data records for each session;
    aggregating, by the processor 201, the enriched data records from the multiple data sources across the multiple parameters for each session;
    computing, by the processor 201, a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter, wherein the total volume and the probability of gap root-cause are computed for each parameter based upon the enriched data records aggregated for each session; and identifying at least one parameter amongst the multiple parameters as a root-cause parameter for the volume gap for each session based upon the probability of the gap root-cause computed corresponding to each parameter of the multiple parameters.
  2. The method of claim 1, wherein the plurality of data sources further comprises one or more network nodes (103, 104), a charging node 105 and a policy controlling node 106, and wherein the one or more controlling interfaces comprise traffic interfaces or gateways.
  3. The method of claim 2, further comprising normalizing, by the processor 201, the plurality of data records to generate normalized data records using an extract, transform and load (ETL) framework.
  4. The method of claim 1, wherein the usage volume of sessions is aggregated at a level selected from at least one of an IP-CAN session level, a bearer level and a service data flow level.
  5. The method of claim 1, wherein the session category is one of a matching session, a missing session, an under-reported session and an over-reported session.
  6. The method of claim 1, wherein the multiple parameters comprise Access Point Name, domain name, proxy IP address, Protocol, Sub-protocol, Application, radio access technology type, type allocation code, network node, visited public land mobile network and cell ID.
  7. The method of claim 1, wherein the probability of gap root-cause for a parameter is computed based upon the total volume and the volume gap, computed corresponding to the said parameter, from one or more sessions categorized as a missing session, an under-reported session or an over-reported session.
  8. The method of claim 7, wherein the volume gap for one or more sessions is at least one of a missing volume, an under-reported volume and an over-reported volume.
  9. The method of claim 1, further comprising generating a report depicting one or more of a volume gap corresponding to one or more sessions, a total volume gap, a total data volume captured, a total number of sessions, an estimated revenue leakage, a total number of subscribers contributing to revenue leakage, top Access Point Names contributing to revenue leakage, top Uniform Resource Locators contributing to revenue leakage, top Applications contributing to revenue leakage, an hourly trend of network traffic, an hourly trend of revenue leakage, Protocol, Proxy IP addresses, User location, Destination IP address, Charging characteristic, and a total revenue leakage corresponding to the at least one parameter identified as the root-cause parameter.
  10. A system 100 for detecting anomalies in recorded consumption of data in a communication network 107, the system comprising:
    a processor 201; and a memory 203 coupled with the processor 201, wherein the processor 201 executes a plurality of modules 204 stored in the memory 203, the plurality of modules 204 comprising:
    a data capturing module 206 for capturing a plurality of data records from a plurality of data sources for each session in a communication network 107, wherein the plurality of data sources include at least a data extraction platform 108 communicatively coupled to one or more controlling interfaces of the communication network 107 for extracting a data record of the plurality of data records, and wherein the plurality of data records comprise one or more parameters indicating usage volume pertaining to one or more services being consumed for each session;
    a data aggregation and reconciliation module 207 for aggregating the usage volume captured in the plurality of data records in order to obtain an aggregate usage volume corresponding to each data source for each session, and reconciling the aggregate usage volumes corresponding to each data source in order to determine either presence or absence of a volume gap for each session;
    a data enrichment module 208 for categorizing each session into a predefined session category based upon the determination of either presence or absence of the volume gap, and tagging the data records of a session with the predefined session category corresponding to the session in order to obtain enriched data records for each session; and
    a data analytics module 209 for aggregating the enriched data records from the multiple data sources across the multiple parameters for each session, computing a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter, wherein the total volume and the probability of gap root-cause are computed for each parameter based upon the enriched data records aggregated for each session, and identifying at least one parameter amongst the multiple parameters as a root-cause parameter for the volume gap for each session based upon the probability of the gap root-cause computed corresponding to each parameter of the multiple parameters.
  11. The system 100 of claim 10, wherein the plurality of data sources further comprises one or more network nodes (103, 104), a charging node 105 and a policy controlling node 106, and wherein the one or more controlling interfaces comprise traffic interfaces or gateways.
  12. The system 100 of claim 11, wherein the data records extracted using the data extraction platform, the network nodes (103, 104) and the charging node 105 are normalized using an extract, transform and load (ETL) framework.
  13. The system 100 of claim 9, wherein the data analytics module 209 is configured for generating a report depicting one or more of a volume gap corresponding to one or more sessions, a total volume gap, a total data volume captured, a total number of sessions, an estimated revenue leakage, a total number of subscribers contributing to revenue leakage, top Access Point Names contributing to revenue leakage, top Uniform Resource Locators contributing to revenue leakage, top Applications contributing to revenue leakage, an hourly trend of network traffic, an hourly trend of revenue leakage, Protocol, Proxy IP addresses, User location, Destination IP address, Charging characteristic, and a total revenue leakage corresponding to the at least one parameter identified as the root-cause parameter.
  14. A non-transitory computer readable medium storing a program for detecting anomalies in recorded consumption of data in a communication network, the program comprising instructions for:
    extracting a plurality of data records from a plurality of data sources for each session in the communication network, wherein the plurality of data sources include at least a data extraction platform communicatively coupled to one or more controlling interfaces of the communication network for extracting a data record of the plurality of data records, and wherein the plurality of data records comprise one or more parameters indicating usage volume pertaining to one or more services being consumed for each session;
    aggregating the usage volume captured in the plurality of data records in order to obtain an aggregate usage volume corresponding to each data source for each session;
    reconciling the aggregate usage volumes corresponding to each data source in order to determine either presence or absence of a volume gap for each session;
    categorizing each session into a session category based upon the determination of either presence or absence of the volume gap;
    tagging the data records of a session with the predefined session category corresponding to the session in order to obtain enriched data records for each session;
    aggregating the enriched data records from the multiple data sources across the multiple parameters for each session;
    computing a total volume and a probability of gap root-cause corresponding to each parameter for detecting an anomaly in the data volume captured by each parameter, wherein the total volume and the probability of gap root-cause are computed for each parameter based upon the enriched data records aggregated for each session; and identifying at least one parameter amongst the multiple parameters as a root-cause parameter for the volume gap for each session based upon the probability of the gap root-cause computed corresponding to each parameter of the multiple parameters.
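The aggregation, reconciliation, categorization and tagging steps of claims 1 and 5 can be sketched for two data sources. This is a minimal illustration under stated assumptions, not the claimed implementation: the source labels `"network"` and `"charging"`, the 1% matching tolerance, the record fields and the rule that a session with no charging record at all is "missing" are all hypothetical.

```python
from collections import defaultdict


def aggregate(records):
    """Sum usage volume per (data source, session)."""
    totals = defaultdict(int)
    for r in records:
        totals[(r["source"], r["session_id"])] += r["bytes"]
    return totals


def categorize(network_bytes, charged_bytes, tolerance=0.01):
    """Map one session's reconciled volumes to a session category."""
    if charged_bytes is None:
        return "missing"  # assumed: no charging record at all for the session
    if abs(network_bytes - charged_bytes) <= tolerance * network_bytes:
        return "matching"
    return "under-reported" if charged_bytes < network_bytes else "over-reported"


def reconcile_and_tag(records, tolerance=0.01):
    """Return copies of the records enriched with their session category."""
    totals = aggregate(records)
    category = {}
    for sid in {sid for (_, sid) in totals}:
        network_bytes = totals.get(("network", sid), 0)
        charged_bytes = totals.get(("charging", sid))  # None if never charged
        category[sid] = categorize(network_bytes, charged_bytes, tolerance)
    return [dict(r, category=category[r["session_id"]]) for r in records]


records = [
    {"source": "network", "session_id": "A", "bytes": 600},
    {"source": "network", "session_id": "A", "bytes": 400},
    {"source": "charging", "session_id": "A", "bytes": 1000},
    {"source": "network", "session_id": "B", "bytes": 1000},
    {"source": "charging", "session_id": "B", "bytes": 700},
    {"source": "network", "session_id": "C", "bytes": 500},
    {"source": "network", "session_id": "D", "bytes": 100},
    {"source": "charging", "session_id": "D", "bytes": 150},
]
tagged = reconcile_and_tag(records)
categories = {r["session_id"]: r["category"] for r in tagged}
print(categories)
```

The tagged records are the input the later steps would aggregate per parameter to compute total volume and the probability of gap root-cause.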
AU2018214067A 2018-08-09 2018-08-09 Method and system for detecting anomalies in consumption of data and charging of data services Ceased AU2018214067B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018214067A AU2018214067B1 (en) 2018-08-09 2018-08-09 Method and system for detecting anomalies in consumption of data and charging of data services

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2018214067A AU2018214067B1 (en) 2018-08-09 2018-08-09 Method and system for detecting anomalies in consumption of data and charging of data services

Publications (1)

Publication Number Publication Date
AU2018214067B1 (en) 2019-09-12

Family

ID=67844874

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018214067A Ceased AU2018214067B1 (en) 2018-08-09 2018-08-09 Method and system for detecting anomalies in consumption of data and charging of data services

Country Status (1)

Country Link
AU (1) AU2018214067B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242667A (en) * 2022-06-24 2022-10-25 浪潮通信技术有限公司 Data acquisition analysis system and method combining 5G cloud side end

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130169216A1 (en) * 2011-12-28 2013-07-04 Fujitsu Limited Charging system, base-station apparatus, data relay apparatus, charging-information processing apparatus, charging-information generating apparatus, and charging-information modifying method
US9749840B1 (en) * 2015-11-19 2017-08-29 Cisco Technology, Inc. Generating and analyzing call detail records for various uses of mobile network resources
US20180183939A1 (en) * 2016-12-23 2018-06-28 Cellos Software Limited Method and system for detecting anomalies in consumption of data and charging of data services

Similar Documents

Publication Publication Date Title
US10079943B2 (en) Method and system for detecting anomalies in consumption of data and charging of data services
US11916760B2 (en) Data usage analysis and reporting
US9853867B2 (en) Method and apparatus to determine network quality
US10192261B2 (en) System and method for performing offline revenue assurance of data usage
US9699676B2 (en) Policy controller based network statistics generation
US10015676B2 (en) Detecting fraudulent traffic in a telecommunications system
US9191523B1 (en) Cost allocation for derived data usage
US8761757B2 (en) Identification of communication devices in telecommunication networks
US10271244B2 (en) System and method for managing traffic detection
US10264139B2 (en) Cost allocation for derived data usage
CN105453613A (en) System, smart device and method for apportioning smart device operations and costs
EP2611074A1 (en) Confidence intervals for key performance indicators in communication networks
JP2009541872A (en) Method, communication system, and collection controller for allowing third party influence on provision of service to user station
CN106911523A (en) The method and system that mobile interchange network users are positioned by LTE indulging in the internet
Kivi Mobile Internet usage measurements: case Finland
AU2018214067B1 (en) Method and system for detecting anomalies in consumption of data and charging of data services
US10348910B2 (en) Method and system for providing a personalized product catalog enabling rating of communication events within a user device
Alsadi et al. Study to use NEO4J to analysis and detection SIM-BOX fraud
EP3050334B1 (en) Managing roaming information in communications
US10609224B2 (en) Method and system for dynamically allocating operator specific billing rules for data exchange by an application on a user equipment
US20140335842A1 (en) Customizable task execution flow
US9392124B2 (en) Method to determine the jurisdiction of CMRS traffic via cell site location and rate center
CN102395117A (en) Method and device for identifying content type
Khan et al. Automatic Monitoring & Detection System (AMDS) for Grey Traffic
CA2885035A1 (en) Data usage analysis and reporting

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired