WO2016091294A1

WO2016091294A1 - Estimating data traffic composition of a communication network through extrapolation

Info

Publication number: WO2016091294A1
Application number: PCT/EP2014/077162
Authority: WO
Inventors: Rodrigo Alvarez Dominguez; Miguel Angel Munoz De La Torre Alonso; Alfonso De Jesus Perez Martinez
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2014-12-10
Filing date: 2014-12-10
Publication date: 2016-06-16

Abstract

The present invention relates to a method and node for estimating a data traffic composition in a policy and charging system of a communication network in order to improve the analysis of data traffic to better support the detection and/or prediction of traffic behavior. The method comprises analyzing data traffic of communication users regarding data traffic types by a node of said policy and charging system; identifying from the analyzed data traffic a communication user of a reference group, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of communication users, by using usage patterns of the communication users; and examining the data traffic of the specific type of the communication user of the reference group in order to extrapolate the result of the examination of the data traffic of the specific type to communication users of the larger group.

Description

ESTIMATING DATA TRAFFIC COMPOSITION OF A COMMUNICATION NETWORK THROUGH EXTRAPOLATION

TECHNICAL FIELD

The present invention relates to a method and a node for estimating a data traffic composition of a communication network as well as to a corresponding system and computer program, and in particular to a method and node for estimating a data traffic composition in a policy and charging system of a communication network by analysing data traffic of communication users.

BACKGROUND

In communication networks, such as telecommunication networks, a call or a service often involves, on the one hand, a control plane or signalling plane and, on the other hand, a user plane or media plane. The control plane or signalling plane is in charge of establishing and managing a connection between two points of the network. The user plane or media plane is in charge of transporting user data or service data.

Network operators have the desire to define and enforce a set of rules in the network. A set of rules constitutes policies. A policy framework for managing and enforcing these policies usually includes at least three elements or functions: a policy repository for storing policy rules, which may be user-specific, a policy decision element or function and a policy enforcement element or function. The purpose of a policy framework includes controlling subscriber access to networks and services as well as the kind of access, i.e. its characteristics.

A policy framework notably addresses the decisions as to whether the subscriber is entitled, or authorized, to enjoy a service, and whether the network can provide the service to the subscriber, and particularly whether the network can provide the service to the subscriber with the desired Quality of Service (QoS).

Policy and charging control architectures, such as, but not limited to, the architecture described in 3GPP TS 23.203 version 11.1.0 (2011-03), Technical Specification Group Services and System Aspects; Policy and charging control architecture (release 11) (available on http://www.3gpp.org/ftp/Specs/2011-03/Rel-11/23_series/), integrate policy and charging control.

One aim of a policy framework is to set up and enforce rules dependent on subscribers and/or desired services to ensure efficient usage of network resources among all subscribers.

An architecture that integrates and supports Policy and Charging Control (PCC) functionality, i.e. a PCC architecture, is depicted in figure 1, which is taken from 3GPP TS 23.203 specifying the PCC functionality for the Evolved 3GPP Packet Switched domain, including both 3GPP accesses (GERAN/UTRAN/E-UTRAN) and Non-3GPP accesses.

The Policy and Charging Rules Function (PCRF) 110 is a functional element that encompasses policy control decision and flow based charging control functionalities. The PCRF provides network control regarding Service Data Flow (SDF) detection, gating, QoS and flow based charging (except credit management) towards the Policy and Charging Enforcement Function (PCEF) 120. The PCRF receives session and media related information from the Application Function (AF) 140 and informs the AF of traffic plane events. The PCRF 110 is also coupled to a Subscriber Profile Repository (SPR) 150. Commonly, the PCRF functionality is implemented by stand-alone node(s). The PCRF shall provision PCC Rules to the PCEF via the Gx reference point and may provision QoS Rules to the Bearer Binding and Event Reporting Function (BBERF) 130 via the S7x reference point. In the architecture 100 of Figure 1, the PCRF shall inform the PCEF through the use of PCC Rules on the treatment of each service data flow that is under PCC control, in accordance with the PCRF policy decision(s).

The SPR 150 is a functional entity that contains all subscriber/subscription related information needed by the PCRF for subscription-based policies and IP-CAN bearer level PCC/ADC rules. The SPR functionality can be implemented by a typical subscriber database node, such as a Home Location Register (HLR) or Home Subscriber Server (HSS) node, which in turn, for the sake of storage capacity and reliability, can be distributed across a plurality of database nodes (e.g. comprising replicas and/or distributing the data).

The Gx reference point is defined in 3GPP TS 29.212 "Policy and charging control over Gx reference point", and lies between the PCRF and the PCEF. The Gx reference point is used for provisioning and removal of PCC Rules from the PCRF to the PCEF and the transmission of traffic plane events from the PCEF to the PCRF. The Gx reference point can be used for charging control, policy control or both.

The Rx reference point is defined in 3GPP TS 29.214 "Policy and charging control over Rx reference point" and is used to exchange application level session information between the PCRF and the AF. An example of a PCRF is the Ericsson Service-Aware Policy Controller (SAPC), see for example F. Castro et al., "SAPC: Ericsson's Convergent Policy Controller", Ericsson review No. 1, 2010, pp. 4 - 9. An example of an AF is the IP Multi-Media Subsystem (IMS) Proxy Call Session Control Function (P-CSCF).

The Sd reference point in Figure 1 is defined in 3GPP TS 29.212 and lies between the Policy and Charging Rules Function (PCRF) and the Traffic Detection Function (TDF).

Both Gx and Rx reference points may be based on Diameter, see for example P. Calhoun et al., "RFC 3588: Diameter Base Protocol", IETF, September 2003.

The PCEF encompasses service data flow detection, policy enforcement and flow based charging functionalities. A node implementing the PCEF can further encompass application traffic detection and reporting capabilities (e.g. in case it incorporates the functionalities of a TDF, as described below). The nodes implementing PCEF functionality are commonly located within the path taken by the IP data packets sent and received to/from user terminals of communication users.

In particular, the PCEF enforces policy decisions received from the PCRF and also provides the PCRF with user- and access-specific information over the Gx reference point. The node including the PCEF or another Bearer Binding Function (BBF) encompasses SDF detection based on filter definitions included in the PCC Rules, as well as online and offline charging interactions (not described here) and policy enforcement. Since the PCEF is usually the one handling bearers, it is where the QoS is enforced for the bearer according to the QoS information coming from the PCRF. This functional entity, i.e. the PCEF, can be located at a Gateway, e.g. in the Gateway GPRS Support Node (GGSN) in the GPRS case. For all the cases where there is Proxy Mobile IP (PMIP) or Dual-Stack Mobile IP (DSMIP) in the core network, the bearer control is performed in the BBERF instead.

The Application Function (AF) 140 is an element offering applications in which service is delivered in a different layer, e.g. transport layer, from the one in which the service has been requested, e.g. signalling layer. One example of a network node including an AF 140 is the P-CSCF (Proxy-Call Session Control Function) of the IP multi-media (IM) core network (CN) sub-system. The AF 140 may communicate with the PCRF 110 to transfer dynamic session information, i.e. a description of the multi-media to be delivered in the transport layer. This communication is performed using the above-described Rx interface or Rx reference point, which is placed between the PCRF 110 and AF 140. Information in the Rx interface may be derived from the session information or service session information in the P-CSCF and it mainly includes what is called media components. Another example of a network node including an AF 140 is a streaming server.

Further, the PCC architecture in figure 1 depicts an Online Charging System (OCS) 180 and an Offline Charging System (OFCS) 190. The OCS 180 in figure 1 performs credit control based on service data flow as known in the art and the OFCS 190 is also known in the art.

Upon reception of the PCC/QoS rules from the PCRF, a Bearer Binding Function (BBF), either the PCEF 120 or the BBERF 130 depending on the deployment scenario, performs the bearer binding, i.e. associates the provided rule to an IP-CAN bearer within an IP-CAN (Internet Protocol Connectivity Access Network) session. The BBF will use the QoS parameters provided by the PCRF to create the bearer binding for the rule.

Next, PCC support to applications is described. When an application requires dynamic policy and/or charging control over the IP-CAN user plane to start a service session, the AF will communicate with the PCRF to transfer the dynamic session information required for the PCRF to take the appropriate actions on the IP-CAN network. The PCRF will authorize the session information, create the corresponding PCC/QoS rules and install them in the PCEF/BBERF. The PCEF/BBERF will encompass SDF detection, policy enforcement (gate and QoS enforcement) and flow based charging functionalities. As described, the applicable bearer will be initiated/modified and, if required, resources will be reserved for that application.

The Traffic Detection Function (TDF) 160 is a functional entity that performs application detection and reporting of the detected application or service and its service data flow description to the PCRF. The TDF can act in solicited mode, i.e. upon request from the PCRF, or unsolicited mode, i.e. without any request from the PCRF. For example, the TDF is a stand-alone kind of node introduced in 3GPP release 11 for the purpose of traffic detection. The TDF preferably applies Deep Packet Inspection (DPI) techniques, sometimes combined with heuristic analysis, to accomplish its detection functionality. According to 3GPP TS 23.203 (chapter 6.2.9), its functionality can be co-located within a node implementing a PCEF functionality.

As in the case of the PCEF, nodes implementing TDF functionality are usually located within the path taken by the IP data packets sent and received to/from user terminals. As noted above, 3GPP release 11 has introduced the TDF in the PCC architecture, which is a DPI box that monitors the payload and detects when an application is initiated/terminated. This functionality can also reside in the PCEF. In detail, DPI technology supports packet inspection and service classification, in which IP packets are classified according to a configured tree of rules so that they are assigned to a particular service session. DPI has been standardized in 3GPP release 11 as the mentioned so-called Traffic Detection Function (TDF), which can be either stand-alone or collocated with the PCEF and which is discussed in detail in 3GPP TR 23.813.

DPI technology offers two types of analysis:

· Shallow packet inspection: extracts basic protocol information such as IP addresses (source, destination) and other low-level connection states. This information typically resides in the packet header itself and consequently reveals the principal communication intent.

· Deep Packet Inspection (DPI): provides application awareness. This is achieved by analyzing the content in both the packet header and the payload over a series of packet transactions. There are several possible methods of analysis used to identify and classify applications and protocols that are grouped into signatures. One of them is heuristic signatures, which relate to the behavioral analysis of the user traffic.
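For illustration only, the shallow packet inspection described above may be sketched as follows. The sketch reads the fixed 20-byte IPv4 header and never touches the payload, which is the part a DPI engine would additionally analyze. The function name shallow_inspect and the returned field names are assumptions made for this example and are not part of any 3GPP specification.

```python
import socket
import struct

# Well-known IP protocol numbers; anything else is reported by number only.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def shallow_inspect(packet: bytes) -> dict:
    """Extract basic IPv4 header fields (hypothetical helper, header only)."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    # Fixed IPv4 header: version/IHL, TOS, total length, identification,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (version_ihl, _tos, total_length, _ident, _frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return {
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "header_length": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
    }
```

The payload following the header is deliberately ignored here; application awareness would require inspecting it over a series of packets.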

DPI enables, among other things, collecting statistical information and searching for protocol non-compliance, viruses, spam, intrusions, etc. and/or deciding whether a packet may pass. Thus, DPI enables advanced network management, user service, security functions and internet data mining.

Smartphones have created a new type of challenge to the mobile network. With the generalized use of smartphones the traffic has exploded and forced the operators to abandon flat-rate service plans. Data-intensive new features allow users to consume more wireless data through application downloads and video streaming. The operators need a new business model in which QoS policies (that govern, among other things, the assigned end-user bandwidth by means of e.g. controlling the assigned uplink or downlink maximum and/or guaranteed bit rates) are applied in order to compensate for the network infrastructure cost.

Characterizing and dimensioning the network according to the services that are being used by end users is a key factor for both telecommunication network operators and content providers. The ability to detect and predict traffic behavior in order for the telecommunication network to react accordingly is also very important.

Additionally, smartphones tend to run applications that are constantly connected to the telecommunication network (e.g. via the radio or fixed/wired access provided by a state-of-the-art mobile telecommunication network). These applications may apply proprietary protocols or attempt to mimic other well-known protocols. On many occasions, these applications use encrypted traffic for security reasons. These factors make packet analysis and classification using DPI not only difficult but also very costly from a system resources perspective.

DPI technology uses heuristic analyzers that detect and identify these protocols based on binary signature patterns, metrics or connectivity patterns. The difficulty of correctly identifying this type of traffic means that the protocol identification accuracy cannot be guaranteed. The higher the percentage of encrypted packets the lower the detection rate.

In addition to the difficulties in the predictability of future data traffic types and amounts, these techniques facilitate fraud, as traffic can be encrypted in an attempt to avoid correct classification and evade the customer's charging model. In all these cases, a PCEF enhanced with application detection functionalities (PCEF-TDF), or a stand-alone TDF, commonly applies heuristic analysis based on a set of empirical patterns characteristic of a particular protocol or application. Each time a user is connected to the network and generates traffic, the PCEF-TDF/TDF tries to analyze and inspect each packet, searching for each possible protocol. This has the following drawbacks:

The number of new protocols and applications increases every year; consequently, the current protocol detection mechanisms should change dynamically according to the state of the art of the internet protocols.

The probability of incorrect protocol detection increases as the number of new protocols and applications grows every year.

The packet analysis made by the PCEF-TDF/TDF node is currently performed on the data packets of each and every data connection established by each and every one of the user terminals (e.g. smartphones) that are currently attached to the telecommunication network managed by a network operator.

The heuristic traffic analyzer makes a best guess classification but identification accuracy is not guaranteed to be 100%. This limitation is inherent in the heuristic approach. This type of analysis, which takes the behavior of the packets into account, consumes a large amount of CPU resources because more than one packet is taken into account for the analysis. Therefore, heuristic analysis is not suitable for charging applications.

Moreover, having to analyze data packets in respect to each and every user terminal currently connected so as to determine e.g. the QoS rules that should be individually applied to the data packets of their respective packet data flows is also a cumbersome task and, thus, highly consuming CPU resources on the corresponding servers (e.g. PCEF-TDF/TDF nodes).

On the other hand, although PCEFs and TDFs from some vendors have a high detection rate for some protocols and applications, the number of supported protocols or applications is really limited. Internet applications change frequently. There are new tendencies, fashions and websites that are changing very rapidly. Even the same applications or protocols are not popular in all countries. Many countries have locally popular applications (Tencent QQ in China, Windows Live Messaging in Spain or Skype in Germany) with proprietary protocols.

The effort needed to update these protocols and to collect new popular applications is very costly and inefficient. In some cases, due to the high number of existing applications, as for example applications downloaded from Apple Store or Android Market, it is almost unfeasible to include them as supported applications in any PCEF. Also, it has to be considered that the PCEF/TDF is the single point of analysis and classification of all services and applications of all users in the core network. Those activities consume large CPU resources. Moreover, the PCEF/TDF has to interoperate with other entities like the PCRF, so it has become a bottleneck in the PCC architecture.

For all these reasons, PCC functions are limited for these applications and it cannot be guaranteed that they apply for a single user and a specific application. The operator has to assume that many PCC functions will apply in best effort or will never be able to be applied. Examples of these applications are Skype, instant messaging, TV online (Zattoo, Pandora, Slacker), iPhone or Android applications, P2P applications (emule, BitTorrent), SIP, RTSP and online games.

However, there are several reasons why mobile network operators desire to establish privileged access to new Internet tendencies, videos, fashion applications or websites or TV over IP channels. There is an urgent and critical need to improve the bandwidth and access to these services. The current PCC solution allows detecting and classifying applications, improving their QoS or charging for them based on heuristic analysis, but it is limited and inefficient.

There is the general problem of having to inspect (e.g. by deep packet inspection and heuristic analysis) a large number of or all data packets of each and every data connection established by each and every one of the user terminals (e.g. smartphones) that are currently attached to the telecommunication network managed by a network operator.

It is thus desirable to provide methods, nodes, systems and computer programs to improve the analysis of data traffic to better support the detection and/or prediction of traffic behavior.

SUMMARY

A suitable method, node, system and computer program are defined in the independent claims. Advantageous embodiments are defined in the dependent claims.

In one embodiment, a method for estimating a data traffic composition in a policy and charging system of a computer network is provided. This method comprises the step of analyzing data traffic of communication users regarding data traffic types by a node of the policy and charging system. Further, the method comprises identifying from the analyzed data traffic a communication user of a reference group, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of communication users, by using usage patterns of the communication users. Then, the data traffic of the specific type of the communication user of the reference group is examined in order to extrapolate the result of the examination of the data traffic of the specific type to communication users of the larger group. Accordingly, by providing an improved analysis and examination of data traffic, the behavior of data traffic may be better detected and preferably predicted. In particular, a better estimation of the data traffic composition, e.g. the different types of data traffic and their amounts, in a network, can be supported so as to characterize and dimension the network accordingly.

In one embodiment, a node is provided for estimating a data traffic composition in a policy and charging system of a communication network. The node comprises an analyzer configured to analyze data traffic of communication users regarding data traffic types. Further, the node comprises an identifier configured to identify from the analyzed data traffic a communication user of a reference group, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of communication users, by using usage patterns of the communication users. Still further, the node comprises an examiner configured to examine the data traffic of the specific type of the communication user of the reference group in order to extrapolate the result of the examination of the data traffic of the specific type to communication users of the larger group. Accordingly, by providing an improved analysis and examination of data traffic, the behavior of data traffic may be better detected and preferably predicted. In particular, a better estimation of the data traffic composition, e.g. the different types of data traffic and their amounts, in a network, can be supported so as to characterize and dimension the network accordingly.

In another embodiment, a system is provided comprising the above-described node and its elements, namely the analyzer, the identifier and the examiner, as well as another node having a Policy and Charging Rules Function (PCRF), wherein the other node is configured to provide to said node a default classification rule to be installed in said node.

In another embodiment, a computer program is provided which includes instructions configured, when executed on a data processor, to cause the data processor to carry out the above-described method.

Further, advantageous embodiments of the invention are disclosed in the dependent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates an exemplary PCC architecture to assist the reader in understanding an exemplary context, in which embodiments of the invention may be applied.

Figure 2 illustrates operations of a method for estimating data traffic composition according to an embodiment.

Figure 3 illustrates a specific example of identifying similarities in data traffic.

Figure 4 illustrates an exemplary flow diagram for explaining an embodiment of the present invention in detail.

Figure 5 illustrates an exemplary flow diagram for explaining the estimation of data traffic composition according to a specific embodiment.

Figure 6 illustrates elements of a node for estimating data traffic composition in a policy and charging system according to an embodiment.

Figure 7 illustrates elements of a system including a modification of the node of figure 6.

DESCRIPTION OF THE EMBODIMENTS

Further embodiments of the invention are described with reference to the figures. It is noted that the following description contains examples only and should not be construed as limiting the invention.

In the following, similar or same reference signs indicate similar or same elements or operations.

Figure 2 illustrates a flowchart of a method for estimating data traffic composition in a policy and charging system of a communication network, e.g. a telecommunication network of the 3rd or 4th generation, such as UMTS, LTE, LTE-Advanced, etc. The operations, also referred to as steps in the following, of the method may be carried out by at least one node of a policy and charging system, preferably the node comprising the Traffic Detection Function (TDF). For assisting the reader in understanding embodiments of the invention, examples are described by referring to the previously described PCC architecture of figure 1. As mentioned above, the functions described in figure 1 are functional elements which may be placed in nodes of the policy and charging system and may be realized by suitable servers, routers, computers, a combination thereof, etc. as known in the art. For example, the policy and charging system of the communication network comprises the above-mentioned node including the TDF collocated with a node including the above-mentioned Policy and Charging Enforcement Function (PCEF) or a standalone TDF node. Further, the policy and charging system usually comprises a node including the PCRF connected to the node comprising the TDF.

As can be seen in figure 2, the method comprises a step 210 in which data traffic of communication users is analyzed regarding data traffic types. This analysis may be carried out by the above-mentioned node including the TDF collocated with a PCEF or a standalone TDF. In particular, the method is carried out to estimate a data traffic composition in the policy and charging system, i.e. the composition of data traffic of different traffic types, wherein particularly a specific type, such as unknown data traffic, can be further broken down into subtypes defined by characteristics, such as consumed volume, traffic peaks, connection time and duration as well as bandwidths needed to ensure good service quality.

For example, when analyzing the data traffic, regular traffic inspection for unknown data traffic may be performed, and possibly Deep Packet Inspection (DPI) may be carried out for known data traffic, such as traffic from applications like Facebook, Twitter, WhatsApp, etc. Analyzing data traffic may include classification of the traffic, in particular classifying the data traffic according to active PCC rules, e.g. by using the DPI function. Further, an Analytics function may calculate and store the usage pattern of a user and/or an average usage pattern for several users.

In step 220 a communication user of a reference group is identified from the analyzed data traffic, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of users. In particular, if the data traffic of the specific type of the user is similar to data traffic of the specific type of a larger group of users, which may be determined by using usage patterns of the users, the communication user is considered a user of the reference group. Data traffic of a specific type may be unknown data traffic which is classified into a default classification rule. This rule associated with a policy applies when the traffic does not match any classification rule for any known application/service, such as specific classification rules (PCC rules) relating to Facebook traffic and Twitter traffic, for example. That is, the above-mentioned known data traffic types, such as traffic from Facebook and Twitter, are each associated with a PCC rule (specific classification rules), and the unknown data traffic (default) is associated with a different PCC rule, namely this data traffic is data traffic of a specific type classified into a default classification rule indicating data traffic not matching a specific classification rule associated with known data traffic.
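For illustration only, the relation between specific classification rules and the default classification rule may be sketched as follows. The sketch assumes that each flow carries a server name by which known applications can be recognized; the types Flow and ClassificationRule, the matching on host names and the rule names are assumptions made for this example and do not reflect the format of actual PCC rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Flow:
    server_name: str   # assumed attribute used to recognize known applications
    volume_bytes: int


@dataclass(frozen=True)
class ClassificationRule:
    name: str
    matches: Callable[[Flow], bool]


# Specific classification rules for known data traffic (hypothetical matchers).
SPECIFIC_RULES: List[ClassificationRule] = [
    ClassificationRule("facebook", lambda f: f.server_name.endswith("facebook.com")),
    ClassificationRule("twitter", lambda f: f.server_name.endswith("twitter.com")),
]
DEFAULT_RULE = "default"  # unknown data traffic ends up here


def classify(flow: Flow) -> str:
    """Return the first matching specific rule, else the default rule."""
    for rule in SPECIFIC_RULES:
        if rule.matches(flow):
            return rule.name
    return DEFAULT_RULE


def volume_per_rule(flows: Iterable[Flow]) -> Dict[str, int]:
    """Sum the traffic volume classified into each rule."""
    totals: Dict[str, int] = {}
    for flow in flows:
        name = classify(flow)
        totals[name] = totals.get(name, 0) + flow.volume_bytes
    return totals
```

The volume accumulated under the default rule corresponds to the data traffic of the specific type, i.e. the unknown data traffic, that is examined further for a user of the reference group.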

In particular, the default classification category is where all traffic that does not match other classification rules is classified. It is not dependent on the analysis engine, such as DPI, shallow packet inspection or heuristic schemes. For clarification, the rough differences are basically the following: DPI inspects layer 7 traffic and shallow packet inspection inspects layer 4 traffic, whereas heuristic schemes usually inspect several fields in one packet or several packets in a communication.

The user may be any user using communication services, such as mobile telecommunication or other wireless communication services, but is not limited thereto. Users of a state-of-the-art telecommunications network, e.g. the 3GPP network envisaged by the embodiments, do not have to use exclusively mobile devices, e.g. mobile phones using radio access, but may also utilize fixed terminals, such as PCs using wired accesses. The case wherein a communication user accesses a communication network via non-radio access is envisaged, for example, in 3GPP by 3GPP TS 23.139, see e.g. the "BBF device" cited by chapter 5.1.2, as well as by 3GPP TS 23.203, in particular Annex P. That is, communication users may use any type of communication device or terminal to generate the herein discussed data traffic.

Further, in step 230 the data traffic of the specific type, i.e. the unknown data traffic of the communication user of a reference group, is examined. This examination may be regarded as more deeply analyzing the data traffic of the specific type by using light or full DPI. This examination may be performed by distinguishing more characteristics of the data traffic of the specific type. As mentioned above, these characteristics may be associated with subtypes and may relate to consumed volume, traffic peaks, connection time, etc. The result of the examination of the data traffic of the specific type may then be extrapolated to communication users of a larger group, the data traffic of which has not been examined in detail yet.

In the following, a specific example of how a communication user of a reference group can be identified in step 220 from the analyzed data traffic by using usage patterns of several communication users is described by referring to figure 3. Figure 3 illustrates a specific example of identifying similarities in data traffic. Specifically, the detailed diagram shows an example in which the identifying step comprises correlating a usage pattern for a communication user with an average usage pattern of several communication users. In this example, the usage pattern is depicted as volume usage per application, e.g. data traffic in megabytes per application, over time. Therefore, in this example, the usage pattern is a pattern that can be illustrated by a graph as depicted in figure 3.
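For illustration only, the extrapolation of step 230 described above may be sketched as follows. The description does not prescribe a particular extrapolation rule; the sketch assumes that the subtype shares found in the unknown data traffic of the reference user are applied proportionally to the unknown traffic volume of each user of the larger group. The function names subtype_shares and extrapolate and the subtype labels are assumptions made for this example.

```python
from typing import Dict


def subtype_shares(reference_subtype_volumes: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the reference user's unknown traffic per examined subtype."""
    total = sum(reference_subtype_volumes.values())
    if total <= 0:
        raise ValueError("reference user has no examined traffic")
    return {name: vol / total for name, vol in reference_subtype_volumes.items()}


def extrapolate(reference_subtype_volumes: Dict[str, float],
                group_unknown_volumes: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Estimate subtype volumes for users whose traffic was not examined.

    Assumption: proportional scaling of the reference user's shares.
    """
    shares = subtype_shares(reference_subtype_volumes)
    return {
        user: {name: share * volume for name, share in shares.items()}
        for user, volume in group_unknown_volumes.items()
    }
```

In this sketch only the reference user's traffic is examined in detail, while the users of the larger group contribute only their total unknown traffic volume.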

In detail, two graphs are shown in figure 3, the first graph for user 1 and the second graph for a group of users. User 1, indicated by reference numeral 3101, may be a user of the group 320 of up to N users. The group may be defined as a group in which all users have the same Access Point Name (APN), location, connection time and/or subscriber type. For example, an APN identifies within a mobile telecommunication network, e.g. a 3G or 4G network, a point of connection to an external network (e.g. the Internet, an intranet, etc.) to which the user terminal connects through the telecommunication network. In particular, an APN can translate into the IP address of a gateway of the telecommunication network, e.g. a gateway such as a GGSN, through which said network connects to the Internet.

First, the data traffic is classified by analyzing the data traffic of a first user regarding its composition including different data traffic types. Then, the usage pattern for user 1 is calculated and stored. For each new user, e.g. user 2, user 3, etc., the Analytics function calculates and stores the average usage pattern representing the usage pattern for all existing users or for the users matching certain criteria, e.g. APN, location or subscriber type. Up to N users may contribute to the average usage pattern.
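For illustration only, the calculation of the average usage pattern may be sketched as follows. The sketch assumes that a usage pattern is a list of traffic volumes, one per time slot, and that the average is updated incrementally as each new user is added, with at most N contributing users. The class AverageUsagePattern, the helper update_group_average and the use of a criteria tuple such as (APN, subscriber type) as a group key are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Sequence


class AverageUsagePattern:
    """Running mean of per-time-slot volumes over up to max_users users."""

    def __init__(self, slots: int, max_users: int) -> None:
        self.mean: List[float] = [0.0] * slots
        self.count = 0
        self.max_users = max_users

    def add_user(self, pattern: Sequence[float]) -> bool:
        """Fold one user's pattern into the mean; False once N users are in."""
        if len(pattern) != len(self.mean):
            raise ValueError("pattern length does not match number of slots")
        if self.count >= self.max_users:
            return False
        self.count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / count
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, pattern)]
        return True


def update_group_average(averages: Dict[Hashable, AverageUsagePattern],
                         criteria: Hashable,
                         pattern: Sequence[float],
                         max_users: int) -> bool:
    """Keep one average per group of users matching the same criteria."""
    if criteria not in averages:
        averages[criteria] = AverageUsagePattern(len(pattern), max_users)
    return averages[criteria].add_user(pattern)
```

Keeping one running mean per criteria key avoids storing every individual usage pattern once it has contributed to the average.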

In the example of figure 3 the usage pattern of user 1 is compared to the average usage pattern of a large group of users, wherein the average usage pattern may have been obtained in advance by the above-mentioned steps of classifying, calculating and storing. In more detail, in the identifying step 330 the usage pattern of user 1 is compared to the average usage pattern by correlating the usage pattern of the individual communication user 1 with an average usage pattern of several communication users. As can be seen in figure 3, if the correlation is high, the user is selected as a user of the reference group in step 350, and if the correlation is low, the user is selected as a user of the larger group. If the user is selected as a user of the larger group, it will be checked with other users (see step 340) whether another user may serve as reference user. For example, a correlation threshold may be selected: if the correlation coefficient is higher than or equal to the correlation threshold, a high correlation is assumed and the user is selected as a user of the reference group, and if the correlation coefficient is below the correlation threshold, the user is selected as a user of the larger group. For example, the correlation threshold could be chosen as 0.9, and it should be understood that a correlation coefficient of approximately 1 indicates that the correlation between the usage patterns is high.
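
The threshold-based selection may be illustrated by the following sketch. The text does not prescribe a particular correlation measure; the Pearson correlation coefficient is assumed here purely for illustration, and the function names `pearson` and `select_group` are hypothetical. The threshold of 0.9 is the example value mentioned above.

```python
import math
from typing import Sequence

CORRELATION_THRESHOLD = 0.9  # example value from the description


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long patterns (assumed measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("patterns must have the same length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a completely flat pattern is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_group(user_pattern: Sequence[float],
                 average_pattern: Sequence[float],
                 threshold: float = CORRELATION_THRESHOLD) -> str:
    """Return 'reference' for a high correlation, otherwise 'larger'."""
    return "reference" if pearson(user_pattern, average_pattern) >= threshold else "larger"


print(select_group([1, 2, 3, 4], [2, 4, 6, 8]))   # same shape as the average
print(select_group([4, 3, 2, 1], [2, 4, 6, 8]))   # opposite trend
```

Because the coefficient is insensitive to scaling, a user whose graph has the same shape as the average graph at a different overall volume is still selected, which is consistent with the notion of a scaled (average) graph used in this description.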

In the data traffic analysis before correlation is performed, data traffic is classified and the usage pattern is calculated. More specifically, the data traffic is classified according to Policy and Charging Control (PCC) rules. As known in the art, a PCC rule is associated with a policy and provided, usually from a PCRF node, to a PCEF node or TDF node to be enforced.

Once the correlation is performed, the result of the correlation may be sent from the PCEF node or TDF node to the PCRF node. The result may be sent in a message indicating whether or not the identified communication user is a communication user of the reference group.

After it is determined that the user is a user of the reference group, the data traffic of this user is analyzed in more detail. In particular, the data traffic of the specific type, which cannot be attributed to known applications/services, such as Facebook, Twitter, etc., for which specific classification rules exist, is examined. In this examination more characteristics, such as consumed volume, traffic peaks and connection time, may be distinguished. By assuming that the data traffic of the specific type is similarly composed for the user of the reference group and the users of the larger group, such as the group of users 1 discussed in figure 3, it is possible to extrapolate the results of the examination of the data traffic of the specific type to the users of the larger group. As mentioned above with respect to group 320 of users 1 of figure 3, the user of the reference group and the users of the larger group may have at least one of the following criteria in common: an APN, a location, a connection time and a subscriber type. The subscriber type may be a subscriber plan (profile or tariff), subscriber's mobile device (smartphone or other phone) or subscriber age.
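
One possible way to picture the extrapolation is sketched below, assuming that the examination of the reference user yields a breakdown of the unknown data traffic into volumes per subtype, and that the same relative shares are applied to the unknown traffic volume of a user of the larger group. The subtype names and all function names are hypothetical examples and are not taken from the description.

```python
from typing import Dict


def composition_shares(reference_breakdown_mb: Dict[str, float]) -> Dict[str, float]:
    """Convert the reference user's per-subtype volumes into relative shares."""
    total = sum(reference_breakdown_mb.values())
    if total <= 0:
        raise ValueError("reference user has no unknown traffic to examine")
    return {subtype: mb / total for subtype, mb in reference_breakdown_mb.items()}


def extrapolate(shares: Dict[str, float], unknown_volume_mb: float) -> Dict[str, float]:
    """Estimate the composition of another user's unknown traffic from the shares."""
    return {subtype: share * unknown_volume_mb for subtype, share in shares.items()}


# Hypothetical result of the detailed examination of one reference user (in MB).
reference = {"p2p": 60.0, "voip": 30.0, "other": 10.0}
shares = composition_shares(reference)

# A user of the larger group with 200 MB of unexamined unknown traffic.
estimate = extrapolate(shares, 200.0)
print(estimate)
```

The estimate is only as good as the assumption of similar composition, which is why the reference user is chosen by correlation with the group beforehand.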

Figure 4 illustrates an exemplary flow diagram for explaining a detailed embodiment of the present invention. The detailed flow diagram of figure 4 covers several steps of a possible implementation, in which the Analytics functions described herein reside in a PCEF node which also implements a TDF functionality. However, other implementations are possible, for example the Analytics functions may reside in a standalone TDF node or in a specialized node that receives information from the PCEF node and/or TDF node. In the flow diagram of figure 4 the user equipment (UE) of user 1, e.g. a smartphone, is identified by reference numeral 4101 and the user equipment (UE) of user N by 410N. Further, the PCEF 420 including the herein discussed analytics, the PCRF 430, the SPR 440 and the internet 450 are shown in figure 4, and the signaling between these entities will be described in detail below.

In step 1 of figure 4, at IP-CAN session establishment for user 1 (UE 1) (PDP context creation in the figure in case of 3G networks), the PCEF node, e.g. included in a Gateway GPRS Support Node (GGSN), conveys to the PCRF node 430 the subscriber identity, such as IMSI (International Mobile Subscriber Identity) and/or MSISDN (Mobile Station Integrated Services Digital Network Identifier) in the Gx initial Credit Control Request (CCR) message.

In steps 2 and 3, the PCRF node 430 retrieves the subscriber's profile from the SPR database 440 based on the received subscriber identity, which in this case indicates this is not a reference user. In this example, it is at least not yet a reference user, since no average usage pattern has been stored yet.

In step 4, the PCRF node 430 installs in the PCEF node 420 default PCC rules indicating regular traffic inspection, as this is not a reference user. This user also pertains to the larger group of users mentioned above, which may be based on certain criteria such as APN, location, etc.

In step 5, as the UE 4101 of the user runs different services/applications, e.g. HTTP browsing, Whatsapp, Facebook, the PCEF node 420 including both DPI and Analytics functions performs the following actions. The DPI function classifies the data traffic according to active PCC rules associated with the known services/applications (e.g. HTTP browsing, Whatsapp, Facebook) in step 5, and shallow packet inspection may be carried out for other data traffic, such as unknown traffic, which may be classified according to a default PCC rule. As this is the first user, the Analytics function calculates and stores the usage pattern for user 1.

At IP-CAN session establishment for user N (also pertaining to the larger group of users mentioned above) in step 6, the PCEF node 420 conveys to PCRF node 430 the subscriber identity (IMSI and/or MSISDN) in the Gx initial CCR message.

In steps 7 and 8, the PCRF node 430 retrieves the subscriber's profile from the SPR database 440 based on the retrieved subscriber identity, which in this case indicates that it is not a reference user.

In step 9, the PCRF node 430 installs in the PCEF node 420 the default PCC rules indicating regular traffic inspection, as this is not a reference user.

In step 10, while the user N runs different services/applications on his/her UE, e.g. HTTP browsing, Whatsapp, Facebook, the PCEF node 420 including both DPI and Analytics functions performs the following actions. The DPI function classifies the traffic according to active PCC rules and the Analytics function performs the steps which have been described in detail with respect to figure 3. In particular, the usage pattern for user N is calculated and stored. Further, the average usage pattern is calculated and stored by the Analytics function, wherein the average usage pattern represents the usage pattern for all users or for a group of users matching certain criteria, e.g. the same APN, location or subscriber type. The Analytics function then correlates the usage pattern for user N with the average usage pattern and if the resulting correlation is high, the user N is selected as new reference user, i.e. a user of the reference group.

Several options for selecting a group of users exist. For example, the above-mentioned approach may not be optimal from a CPU resources perspective, so that it may be more favorable to create the usage pattern only for the traffic classified into the default category and to compare the traffic volume patterns of this traffic, although it is not known what the unknown traffic is (what service or application). Further, it is possible to create subgroups of users in order to average the usage patterns for these subgroups and to identify the reference users for these subgroups. Subgroups may be characterized by the same APN, location, etc. Furthermore, it is possible to monitor a group of subscribers or users that have some similarities based on a statistical study, such as connection times, age, mobile device type, subscriber profile or tariff.

In step 11 of figure 4, the PCEF node 420 triggers a Gx CCR update message to the PCRF node 430 including a new event trigger (REFERENCE_USER) to indicate that user N is a reference user.

In steps 12, 13 and 14, the PCRF node 430 stores in the subscriber's profile in the SPR database that this subscriber is a reference user. This serves as permanent storage of this attribute for future IP-CAN session establishments for user N.

Steps 15 and 16 show that the PCRF node 430 installs the PCC rules including extra traffic analysis for this reference user. However, it is also possible that the PCEF locally selects the PCC rule or rules for the reference user, e.g. when there is no PCRF node or when the PCRF node does not support the desired functionality.

In step 17, the PCEF node 420 applies extra traffic analysis for the reference user. The Analytics function in the PCEF node 420 uses the results obtained for the reference user and extrapolates them to the rest of the users without the need for further analysis.

As noted above, the procedure described with respect to figure 4 applies to an embodiment considering a PCEF node with TDF capabilities and enhanced according to the herein described inventive schemes. In case of a standalone TDF enhanced with the herein discussed inventive schemes, similar procedures apply with the differences that the Analytics functionality would then reside within the TDF and that the messages exchanged between the TDF and the PCRF would be messages via the "Sd" interface, i.e. TDF to PCRF and PCRF to TDF, instead of via the "Gx" interface, i.e. PCEF to PCRF and PCRF to PCEF, as illustrated by figure 4.

The result of the above procedure is that, only for this reference user, traffic is going to be analyzed more deeply than for the rest of the users, i.e. users of the larger group. With these new rules (analysis patterns), the PCEF is going to distinguish, in particular for the unknown traffic, more characteristics of the traffic. These results can then be extrapolated to the rest of the users, e.g. the users of the larger group. These characteristics can be, for example, consumed volume, possible traffic peaks, connection time and duration, possible bandwidths needed to ensure a good service quality, etc. All these characteristics can help operators, for example, to set up QoS to be enforced with respect to traffic classified as "unknown" for a group of users, as well as to predict and anticipate the network demands for user service and even to find out possible charging failures if there is a big discrepancy between reference user and any other user.
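
The discrepancy check mentioned at the end of the preceding paragraph could, for instance, be sketched as follows. The description does not define how a "big discrepancy" is measured; a relative deviation of the unknown traffic volume from that of the reference user, with a hypothetical tolerance of 50 %, is assumed here for illustration only, and the function name `flag_discrepancies` is likewise hypothetical.

```python
from typing import Dict, List


def flag_discrepancies(reference_mb: float,
                       users_mb: Dict[str, float],
                       tolerance: float = 0.5) -> List[str]:
    """Return the users whose unknown traffic volume deviates from the
    reference user's volume by more than `tolerance` (relative deviation).
    Both the measure and the tolerance are illustrative assumptions."""
    if reference_mb <= 0:
        raise ValueError("reference volume must be positive")
    return sorted(
        user_id
        for user_id, mb in users_mb.items()
        if abs(mb - reference_mb) / reference_mb > tolerance
    )


# Hypothetical unknown-traffic volumes (MB) for three users of the larger group.
suspects = flag_discrepancies(100.0, {"u1": 110.0, "u2": 400.0, "u3": 20.0})
print(suspects)
```

Users flagged in this way would merely be candidates for a closer look by the operator; the check itself does not establish that a charging failure has occurred.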

One main application of the above procedure is to run just a light DPI, e.g. just inspect some of the IP packet's contents beyond the IP packet header, but not perform heuristic analysis on all the content of the IP packet, for most of the users (basically to detect the minimum set of services/applications for QoS or charging purposes). One example would be to detect only video in order to prioritize it from a QoS perspective and general Internet browsing (in case of a browsing package to be charged as Rating Group 1). In this case, a large amount of traffic (all non-video and non-Internet browsing traffic) is classified into the default traffic category referred to as data traffic of the specific type and unknown data traffic above. It would be very useful for the operator to identify which other services the users are running, without the need to run costly full DPI resources for each and every user.

In the following, another exemplary flow diagram for explaining the estimation of data traffic composition according to a specific embodiment is discussed with respect to figure 5.

As mentioned above, it is desired to identify users whose unknown data traffic follows the same or similar graph, e.g. volume usage per application versus time, of unknown traffic of a selected group of users.

In step 510 of figure 5, a PCEF TDF node, i.e. either a PCEF node with TDF functionality or a standalone TDF node, analyzes and classifies traffic of all users or of a specific group of users, e.g. the group of users 1 discussed in figure 3, that are to be monitored according to certain operator criteria, such as APN, location, subscriber profile, charging profile, etc., which could indicate that these users share some similarities. As a part of the classification rules for this group of users, the unknown data traffic is directly classified into a default service category without any extra analysis and classification effort.

In step 520, for each user in the above group, a graph of the usage data traffic for unknown traffic, i.e. traffic classified into the default service category, is obtained. Based on the information obtained in step 520, in step 530, the average graph of usage data traffic (usage traffic volume) of unknown data traffic for the above group of users is obtained. Then, in step 540 the graphs are correlated for all users in the group in the Analytics function which resides either in the PCEF TDF node or in an external specialized/devoted Analytics node.

In step 550 a user (or users) in the group whose correlation coefficient is near 1 is obtained and this user (users) is defined as a reference user, i.e. a user of the reference group.

Then in step 560, the PCEF TDF analyzes more deeply only the data traffic for the reference user (users), e.g. by loading more rules, patterns, etc., than for the rest of the users, i.e. the users of the larger group.

In step 570 the results of the analysis of the data traffic of the reference user (users) are extrapolated to the whole group, i.e. to the users of the larger group.

In summary, the above-mentioned solutions comprise inspecting deeply the traffic of the users of the reference group, in particular the traffic pattern with respect to "unknown data traffic" which is matched to an average traffic pattern of users of a larger group, and extrapolating the results to the users of the larger group whose traffic pattern matches. Thereby, the classification work carried out by the PCEF TDF or TDF nodes can be reduced, since these nodes can apply the same classification rules extracted from the first group of users (reference users) to the second group of users (larger group of users).
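
Steps 520 to 550 of figure 5 can be pictured with the following minimal sketch. It assumes that each graph of unknown traffic is a list of volumes per time slot of equal length, that the average graph is the element-wise mean, and that the Pearson coefficient with a threshold of 0.9 stands in for "a correlation coefficient near 1"; these choices and all names used are illustrative assumptions, not a definition of the claimed method.

```python
import math
from typing import Dict, List, Sequence


def _corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed correlation measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def average_graph(graphs: Dict[str, List[float]]) -> List[float]:
    """Step 530: element-wise average of the unknown-traffic graphs of the group."""
    rows = list(graphs.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def reference_users(graphs: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Steps 540 and 550: correlate each user with the average and keep the best matches."""
    avg = average_graph(graphs)
    return [user for user, graph in graphs.items() if _corr(graph, avg) >= threshold]


# Step 520: hypothetical unknown-traffic graphs (MB per time slot) for four users.
graphs = {
    "u1": [10, 20, 30, 40],
    "u2": [20, 40, 60, 80],
    "u3": [11, 19, 31, 39],
    "u4": [30, 25, 20, 28],
}
refs = reference_users(graphs)
larger_group = [u for u in graphs if u not in refs]
print(refs, larger_group)
```

Only the users returned by `reference_users` would then be subjected to the deeper analysis of step 560, and the results would be extrapolated to the remaining users in step 570.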

In particular, the above schemes provide a mechanism to identify and characterize the behavior of those users that are relevant from the operator's point of view because their behavior is similar to the global behavior of the network. In short, the users are identified whose traffic follows the same scaled (average) graph of the data traffic of a selected larger group of users. This larger group can be either the total number of users in the operator's network or a subgroup of them, e.g. users in a certain location or under a certain APN, etc. By analyzing the unknown traffic of a small group of selected identified users in more detail, the results of the analysis can be extrapolated to other users, i.e. the larger group of users, which allows reducing the amount of data traffic analysis required to be performed by traffic inspection nodes, such as PCEF-TDF/TDF nodes. Additionally, with the analysis of the identified users, i.e. the small group of users, the traffic behavior of the whole group of users, i.e. the larger group of users, can be predicted.

Accordingly, it is possible to identify and further analyze in more detail the unknown data traffic and particularly do this only for selected users, namely the users of the reference group. As described above, unknown data traffic is the traffic which is classified into the default classification rule, which applies when the data traffic does not match any classification rule for any known application/service. For identification of those users whose behavior regarding the unknown data traffic is similar to the larger group of users, different correlation techniques have been proposed above, and once those selected users are identified, a more detailed analysis, such as deep packet inspection or heuristic analysis, can be carried out only for these users, wherein the examination results, e.g. in terms of what kind of traffic is detected and which QoS is to be applied, can be applied to the larger group of users. In particular, the results of the detailed analysis (examination) for the selected group of users can then be extrapolated to the users of the larger group.

Before describing some nodes of the PCC architecture, the concept of the above-mentioned default classification rule is explained as follows: an operator configures "per service classification rules" in the PCEF for the services the operator is interested in (e.g. Whatsapp, Facebook, HTTP browsing, etc., because those services are treated, e.g. charged, differently, each one with a different Rating Group). The traffic that is not classified into those "per service classification rules" is classified into a "generic default classification rule" (this rule will match all the traffic corresponding to the services the operator is not particularly interested in, e.g. Tencent, BitTorrent, etc., as all those services have a generic treatment, e.g. charged with a default Rating Group).
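
The relationship between the "per service classification rules" and the "generic default classification rule" can be sketched as a simple lookup with a fallback. The rule table, the Rating Group numbers (apart from Rating Group 1 for browsing, which is mentioned above as an example) and the function name `classify` are hypothetical and serve only to illustrate the concept; they do not reflect an actual PCC rule format.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-service classification rules: detected service -> Rating Group.
PER_SERVICE_RULES: Dict[str, int] = {
    "http_browsing": 1,
    "whatsapp": 10,
    "facebook": 20,
}
DEFAULT_RULE_NAME = "default"
DEFAULT_RATING_GROUP = 0  # hypothetical default Rating Group


def classify(detected_service: Optional[str]) -> Tuple[str, int]:
    """Return (rule name, Rating Group) for a flow.

    Flows whose service is not covered by a per-service rule, or could not
    be detected at all (None), fall into the generic default rule, i.e. they
    become "unknown data traffic" in the sense of this description.
    """
    if detected_service in PER_SERVICE_RULES:
        return detected_service, PER_SERVICE_RULES[detected_service]
    return DEFAULT_RULE_NAME, DEFAULT_RATING_GROUP


print(classify("facebook"))
print(classify("bittorrent"))
print(classify(None))
```

The traffic ending up in the default rule is exactly the traffic whose usage pattern is correlated across users and examined more deeply for the reference users.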

In the following, reference is made to figures 6 and 7, which illustrate elements of a node, such as a PCEF node including a TDF or a standalone TDF node, and elements of a system including the node, respectively.

In detail, figure 6 illustrates elements of a node 600 for estimating data traffic composition in a policy and charging system of a communication network according to an embodiment. The node 600 may be configured to implement a PCEF and/or TDF and thus may constitute or be included in the above-described network nodes 160, 120 or 420. The node 600 may be a server comprising a processor to carry out at least some of the above-described functions, specifically the functions described with respect to figures 2 and 5.

As can be seen in figure 6, the node 600 comprises an analyzer 610, an identifier 620 and an examiner 630, which may be tangible elements or software functions running on a processor.

The analyzer 610 is configured to analyze the data traffic of one or more communication users regarding data traffic types, wherein details about this analysis and the data traffic types have been described above.

The identifier 620 is configured to identify from the analyzed data traffic a communication user or users of a reference group, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of communication users, by using usage patterns of the communication users. Again, it is referred to the above discussion for more details to avoid unnecessary repetition.

The examiner 630 is configured to examine the data traffic of the specific type of the communication user or users of the reference group in order to extrapolate the result of the examination of the data traffic of the specific type to communication users of the larger group. Details of the functions of the examiner 630 have been discussed above.

In particular, these described elements 610, 620 and 630 allow for carrying out the functions previously described with respect to figures 2, 3, 4 and 5, so that reference is made to the description of these figures for further details. It is further clear that at least the functions of the analyzer and examiner may be combined in one functional element, e.g. an analyzing unit configured to analyze and to examine.

Accordingly, the same advantages which are achieved with the above-described methods can also be achieved by the node 600.

In one embodiment, the identifier 620 of the node 600 may be configured to correlate a usage pattern for a communication user with an average usage pattern of several communication users. Additionally or alternatively, the analyzer 610 of node 600 may be further configured to carry out at least one function of classifying data traffic according to PCC rules and calculating a usage pattern for a communication user. In one embodiment, the node 600 may further comprise an extrapolator (not shown) configured to extrapolate the results of the examination of the data traffic of the specific type to the communication users of the larger group by assuming that the data traffic of the specific type is composed similarly for the communication user of the reference group and communication users of the larger group.

In another embodiment, the node 600 may comprise a sender configured to send from the node 600 to another node a message indicating that the identified communication user is a communication user of the reference group. An example of a node including a sender is shown in figure 7, in which sender 725 is included in node 700, which is a modified node 600.

In more detail, figure 7 shows a system 900 comprising a node 800 and the node 700, which can be the same as node 600 or may comprise the same functions as the node 600 as well as additional functions, such as the functions of the sender 725. The node 800 of the system 900 has PCRF functionality and is configured to provide to the node 700 a default classification rule to be installed in said node 700.

For example, a PCC rule can be provided by the provider 810 of node 800. The PCC rule may include additional data traffic examination for the communication user of the reference group. Therefore, node 800 may be a PCRF node. Node 800 may be further configured to receive a message indicating that the identified communication user is a mobile communication user of the reference group and configured to store this information in a database, for example a database of the SPR 150. To receive such a message, the receiver 820 of node 800 may be provided.

It is understood that node 600 or 700 may be implemented by including a bus, a processing unit, a main memory, a ROM, a storage device, an I/O interface consisting of an input device and an output device, and a communication interface. The bus may include a path that permits communication among the components of the node.

The processing unit may include a processor, a microprocessor or other processing logic that interprets and executes instructions. The main memory may include a RAM or other type of dynamic storage device that may store information and instructions for execution by the processing unit. For example, the analyzer, identifier and examiner discussed above with respect to figures 6 and 7 may be realized by the processing unit. The ROM may include a ROM device or another type of static storage device that may store static information and instructions for use by the processing unit.

As mentioned above, the node 600 may perform certain operations or processes described herein. The node 600 may perform these operations in response to the processing unit executing software instructions contained in a computer-readable medium, such as the main memory, ROM and/or storage device. A computer-readable medium may be defined as a physical or a logical memory device. For example, a logical memory device may include memories within a single physical memory device or distributed across multiple physical memory devices. Each of the main memory, ROM and storage device may include computer-readable media with instructions as program code. The software instructions may be read into the main memory from another computer-readable medium, such as a storage device, or from another device via the communication interface.

The software instructions contained in the main memory may cause the processing unit including a data processor, when executed on the processing unit, to cause the data processor to perform operations or processes described herein. Alternatively, hard-wired circuitry may be used in place of or in combination with the software instructions to implement processes and/or operations described herein. Thus, implementations described herein are not limited to any specific combination of hardware and software.

The physical entities according to the different embodiments of the invention, including the elements, nodes and systems, may comprise or store computer programs including software instructions such that, when the computer programs are executed on the physical entities, steps and operations according to the embodiments of the invention are carried out, i.e. cause data processing means to carry out the operations. In particular, embodiments of the invention also relate to computer programs for carrying out the operations/steps according to the embodiments of the invention, and to any computer-readable medium storing the computer programs for carrying out the above-mentioned methods.

Where the terms analyzer, identifier, examiner, sender, provider and receiver are used, no restrictions are made regarding how distributed these elements may be and regarding how gathered these elements may be. That is, the constituent elements of the nodes and systems may be distributed in different software and hardware components or other devices for bringing about the intended function. A plurality of distinct elements may also be gathered for providing the intended functionality. For example, the elements/functions of the node may be realized by a microprocessor and a memory similar to the above node including a bus, a processing unit, a main memory, ROM, etc. The microprocessor may be programmed such that the above-mentioned operations, which may be stored as instructions in the memory, are carried out. Further, the elements of the nodes or systems may be implemented in hardware, software, Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), firmware or the like.

From the above discussion, the skilled person becomes aware that operators, by using the above schemes, may better understand in advance how the network is going to act by analyzing the behavior of some specific users. Operators can identify the services/applications classified initially as unknown traffic, wherein this traffic is considered initially as unknown because it may be very costly and time-consuming to determine what service/application it is for a large number of users. However, by means of identifying and analyzing the unknown data traffic of reference users, the operator can save resources by avoiding a detailed analysis of the unknown data traffic for all users in the network.

The operator can also predict the growth of new services by identifying the unknown data traffic and the corresponding evolution in the near future. The early identification of these services is relevant for the operator as it allows the operator to identify emerging services in their networks for which they could create service add-ons.

Further, knowing in more detail which services have been accessed by the users can allow operators to reach agreements with service providers. The operator can guarantee service quality for these over-the-top (OTT) services or offer sponsored data services connectivity.

Accordingly, the above presents a mechanism to identify reference users whose unknown data traffic follows the same or similar pattern of traffic of a larger group of users by analyzing in detail the traffic for these reference users and extrapolating the results to a larger group.

It will be apparent to those skilled in the art that various modifications and variations can be made in the entities and methods of this invention as well as in the construction of this invention without departing from the scope or spirit of the invention. The invention has been described in relation to particular embodiments and examples which are intended in all aspects to be illustrative rather than restrictive. Those skilled in the art will appreciate that many different combinations of hardware, software and/or firmware will be suitable for practicing the present invention.

Moreover, other implementations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and the examples be considered as exemplary only, wherein abbreviations used in the above examples are listed below. To this end, it is to be understood that inventive aspects lie in less than all features of a single foregoing disclosed implementation or configuration. Thus, the true scope and spirit of the invention is indicated by the following claims.

ABBREVIATIONS

APN Access Point Name

CCA Credit Control Answer

CCR Credit Control Request

CN Core Network

DPI Deep Packet Inspection

IP Internet Protocol

PCC Policy and Charging Control

PCEF Policy and Charging Enforcement Function

PCRF Policy and Charging Rules Function

SDN Software Defined Network

TDF Traffic Detection Function

UE User Equipment (communications equipment utilized by a user, such as e.g. a smartphone).

Claims

1. Method for estimating data traffic composition in a policy and charging system of a communication network, comprising the steps of analyzing (210) data traffic of communication users regarding data traffic types by a node of said policy and charging system; identifying (220) from the analyzed data traffic a communication user of a reference group, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of communication users, by using usage patterns of the communication users; and examining (230) the data traffic of the specific type of the communication user of the reference group in order to extrapolate the result of the examination of the data traffic of the specific type to communication users of the larger group.

2. Method of claim 1, wherein the identifying step comprises correlating (330) a usage pattern for a communication user with an average usage pattern of several communication users.

3. Method of claim 2, wherein if the correlation is high, the communication user is selected (350) as a communication user of the reference group and if correlation is low, the communication user is selected (340) as a communication user of the larger group.

4. Method of one of claims 1 to 3, wherein the analyzing step comprises at least one of classifying data traffic according to Policy and Charging Control, PCC, rules and calculating a usage pattern for a communication user.

5. Method of one of claims 1 to 4, wherein a usage pattern comprises usage data traffic over time.

6. Method of one of claims 1 to 5, wherein data traffic of a specific type is data traffic which is classified into a default classification rule.

7. Method of claim 6, wherein the default classification rule indicates traffic not matching a specific classification rule.

8. Method of one of claims 1 to 7, further comprising sending from said node to another node a message indicating that the identified communication user is a communication user of the reference group.

9. Method of one of claims 1 to 8, further comprising extrapolating the result of the examination of the data traffic of the specific type to the communication users of the larger group by assuming that the data traffic of the specific type is composed similarly for the communication user of the reference group and communication users of the larger group.

10. Method of one of claims 1 to 9, wherein the communication user of the reference group and the communication users of the larger group have at least one of the following criteria in common an access point network, a location, connection time and a subscriber type.

11. Node for estimating data traffic composition in a policy and charging system of a communication network, comprising an analyzer (610) configured to analyze data traffic of communication users regarding data traffic types; an identifier (620) configured to identify from the analyzed data traffic a communication user of a reference group, for which data traffic of a specific type is similar to data traffic of a specific type of a larger group of communication users, by using usage patterns of the communication users; and an examiner (630) configured to examine the data traffic of the specific type of the communication user of the reference group in order to extrapolate the result of the examination of the data traffic of the specific type to communication users of the larger group.

12. Node of claim 11, wherein the identifier (620) is configured to correlate a usage pattern for a communication user with an average usage pattern of several communication users.

13. Node of claim 12, wherein if the correlation is high, the communication user is selected as a communication user of the reference group and if correlation is low, the communication user is selected as a communication user of the larger group.

14. Node of one of claims 11 to 13, wherein the analyzer (610) is further configured to carry out at least one function of classifying data traffic according to Policy and Charging Control, PCC, rules and calculating a usage pattern for a communication user.

15. Node of one of claims 11 to 14, wherein a usage pattern comprises usage data traffic over time.

16. Node of one of claims 11 to 15, wherein data traffic of a specific type is data traffic which is classified into a default classification rule.

17. Node of claim 16, wherein the default classification rule indicates traffic not matching a specific classification rule.

18. Node of one of claims 11 to 17, further comprising a sender (725) configured to send from said node (700) to another node (800) a message indicating that the identified communication user is a communication user of the reference group.

19. Node of one of claims 11 to 18, further comprising an extrapolator configured to extrapolate the result of the examination of the data traffic of the specific type to the communication users of the larger group by assuming that the data traffic of the specific type is composed similarly for the communication user of the reference group and communication users of the larger group.

20. Node of one of claims 11 to 19, wherein the communication user of the reference group and the communication users of the larger group have at least one of the following criteria in common: an access point network, a location, connection time and a subscriber type.

21. System comprising the node (600, 700) of one of claims 11 to 20, as well as another node (800) having a Policy and Charging Rules Function, PCRF, and being configured to provide to said node a default classification rule to be installed in said node.

22. System of claim 21, wherein said other node (800) is further configured to receive a message indicating that the identified communication user is a communication user of the reference group and to store this information in a database.

23. System of one of claims 21 and 22, wherein said other node is further configured to provide to said node a Policy and Charging Control, PCC, rule including additional data traffic examination for said communication user of the reference group.

24. Computer program including instructions configured, when executed on a data processor, to cause the data processor to execute the method of one of claims 1 to 10.
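
The correlation-based selection of claims 12 and 13 and the extrapolation of claim 19 can be illustrated with a minimal sketch. It is not the claimed implementation: the use of Pearson correlation, the threshold value 0.9, the function names (pearson, average_pattern, split_groups, extrapolate) and the sample data shapes are all assumptions made for illustration. A usage pattern is assumed to be a list of traffic volumes per time slot, and the examination result is assumed to be a per-application breakdown of the default-rule traffic of a reference user.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (assumed similarity measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat pattern carries no correlation information
    return cov / sqrt(var_x * var_y)


def average_pattern(patterns):
    """Average usage pattern over several users, one value per time slot."""
    rows = list(patterns.values())
    return [sum(slot) / len(rows) for slot in zip(*rows)]


def split_groups(patterns, threshold=0.9):
    """High correlation with the average -> reference group, otherwise larger group.

    The threshold is a hypothetical value; the claims only speak of
    'high' and 'low' correlation.
    """
    avg = average_pattern(patterns)
    reference, larger = [], []
    for user, pattern in patterns.items():
        if pearson(pattern, avg) >= threshold:
            reference.append(user)
        else:
            larger.append(user)
    return reference, larger


def extrapolate(reference_breakdown, default_volumes):
    """Apply the composition found for a reference user to other users.

    reference_breakdown: volume per application found by examining the
        default-rule traffic of a reference user (hypothetical format).
    default_volumes: default-rule traffic volume per user of the larger group.
    Assumes the default-rule traffic is composed similarly for both groups.
    """
    total = sum(reference_breakdown.values())
    shares = {app: vol / total for app, vol in reference_breakdown.items()}
    return {
        user: {app: volume * share for app, share in shares.items()}
        for user, volume in default_volumes.items()
    }


if __name__ == "__main__":
    patterns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
    reference, larger = split_groups(patterns)
    print("reference group:", reference)
    print("larger group:", larger)
    print(extrapolate({"video": 60, "p2p": 40}, {"c": 200}))
```

In this sketch only the reference user's default-rule traffic needs the additional examination; the resulting shares are then scaled by each other user's default-rule volume, which mirrors the similarity assumption stated in claims 9 and 19.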