WO2015189380A1 - Method and apparatus for detecting and filtering undesirable phone calls - Google Patents

Method and apparatus for detecting and filtering undesirable phone calls

Info

Publication number
WO2015189380A1
Authority
WO
WIPO (PCT)
Prior art keywords
phone
local
call
spam
assortativity
Application number
PCT/EP2015/063155
Other languages
French (fr)
Inventor
Christoph Neumann
Olivier Heen
Erwan Le Merrer
Original Assignee
Thomson Licensing
Application filed by Thomson Licensing

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/436 Arrangements for screening incoming calls, i.e. evaluating the characteristics of a call before deciding whether to answer it
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M2203/55 Aspects of automatic or semi-automatic exchanges related to network data storage and management
    • H04M2203/551 Call history

Definitions

  • This invention relates to a method and an apparatus for detecting and filtering undesirable incoming phone calls.
  • The present principles provide a method for detecting spam numbers, comprising: selecting a plurality of phone numbers; generating a local call graph for a respective one of the plurality of phone numbers; aggregating the local call graphs to generate a call graph; determining a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph; and determining whether the phone number is a spam number responsive to the parameter as described below.
  • The present principles also provide an apparatus for performing these steps.
  • The present principles also provide a method for detecting spam numbers, comprising: selecting a plurality of phone numbers; generating a local call graph for a respective one of the plurality of phone numbers; aggregating the local call graphs to generate a call graph; determining at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph; and determining the phone number to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) local assortativity coefficient and the (in,in) local assortativity coefficient are smaller than zero as described below.
  • The present principles also provide an apparatus for performing these steps.
  • The present principles also provide a computer readable storage medium having stored thereon instructions for detecting spam numbers, according to the methods described above.
  • FIG. 1 illustrates an exemplary method for detecting and filtering undesired phone calls, in accordance with one embodiment of the present principles.
  • FIG. 2 illustrates a pictorial example depicting a local call graph, in accordance with one embodiment of the present principles.
  • FIG. 3 illustrates an exemplary apparatus for detecting and filtering undesired phone calls, in accordance with one embodiment of the present principles.
  • FIG. 4 illustrates an exemplary system wherein phone spam detection and filtering can be used.
  • Tellows associates a "trustworthiness score" with each referenced phone number and provides statistics on how many times a number has been searched for or commented on through its service. Tellows also provides a mobile phone application (Android, iOS) that automatically displays the trustworthiness score for each incoming phone call.
  • Voice call spam in the context of internet telephony is called "Spam over Internet Telephony" (SPIT).
  • SPIT mechanisms are often limited to Internet Telephony and consider specific features of the Internet Telephony protocol.
  • CallRank, as described in an article by Vijay Balasubramaniyan et al., titled "CallRank: Combating SPIT Using Call Duration, Social Networks and Global Reputation," CEAS 2007 Fourth Conference on Email and AntiSpam, relies on a social network and a reputation mechanism to differentiate between legitimate and spam callers in IP telephony systems.
  • Nanavati analyzes a mobile phone call graph (where people are the nodes and calls are the edges), in which a call graph G is defined as a pair <V(G), E(G)>, V(G) being a set of vertices representing the mobile phone users, and E(G) being the set of directed vertex-pairs from V(G) representing mobile phone calls.
  • In Nanavati, edges are directed but not weighted.
  • Nanavati in particular analyzes the in- and out-degree correlation in the mobile phone call graph of a large mobile telecom operator, and shows that there is a strong correlation between the in-degree and the average out-degree of a node, up to an in-degree of approximately 100. Above that threshold, Nanavati highlights nodes that call many numbers (i.e., having a high out-degree while having a low in-degree) and attributes these calls to sales activity. Above an in-degree of 100, Nanavati also observes many numbers with a very low out-degree and attributes them to customer service numbers or small businesses with advertised phone numbers.
  • In the present application, a call graph G is defined as a pair <V(G), E(G)>, V(G) being a set of vertices representing the phone numbers, and E(G) being a set of directed vertex-pairs (edges) between V(G) representing phone calls.
  • In the present application, edges are directed but not weighted.
  • FIG. 1 illustrates an exemplary method 100 for detecting phone call spam according to an embodiment of the present principles.
  • Method 100 starts at step 105.
  • At step 110, it performs initialization; for example, a telecom operator may randomly select or sample N users, and a respective local call graph may be built from the perspective of each of these N users. For ease of notation, these N chosen users are denoted as "sample users" in the present application.
  • At step 120, a call graph is accessed. In one embodiment, the telecom operator continuously maintains a call graph over a sliding window, e.g., of 1 day or 1 month.
  • In another embodiment, the call graph may be built from past activities, and method 100 is used to analyze previous phone spam activities.
  • The complete call graph G of a telecom operator, i.e., the call graph comprising all users and phone calls of a telecom operator, is generally quite large, and computations on it can be expensive. Therefore, we can consider a sub-graph G' comprising only the N sample users, the numbers they called or were called by, and the corresponding phone calls.
  • G' can be constructed as the union of all local call graphs of the N sample users.
  • At step 130, (in,in), (in,out), and (out,in) local assortativity coefficients are determined for individual nodes in the sub-graph.
  • At step 140, it classifies a phone call as spam, for example, based on the local assortativity coefficients, the out-degrees, a whitelist, and/or a blacklist. More generally, instead of the local assortativity coefficients, we may compute other metrics that measure the correlation between degrees (for example, in-degrees or out-degrees) of linked nodes, for example, centrality measures of a node in the call graph such as betweenness centrality or PageRank.
  • At step 150, it applies a specific treatment to the calls classified as spam.
  • The treatment may be chosen from, for example, but not limited to, blocking the call, forwarding it to voice mail, or notifying the end user of the classification result (e.g., prefixing the caller ID on the phone display with 'Spam?', using a different ring tone, or displaying information on another device such as a smartphone or TV).
  • Method 100 ends at step 199.
  • In the following, the steps of building a call graph, calculating local assortativity coefficients, and detecting phone spam are discussed in further detail.
  • The present principles may be implemented in a telecom operator network or in a third-party device.
  • In one embodiment, the operator continuously maintains the call graph over a sliding window, e.g., of 1 day or 1 month.
  • A third party, which has access to the operator's phone call log, can also be used to generate the call graph.
  • The call graph may be obtained by different methods, for example, by analyzing the operator's internal logs, by active agents running on smartphones, or by active agents running on home gateways.
  • An example of a phone call log is provided below:
  • 2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
    2013-09-27T20:03:50Z, Outgoing, +XX987654321, 213427959, 14, Finished
    2013-09-18T21:14:58Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
    2013-09-19T20:15:47Z, Outgoing, +XX987654321, 765321, 82, Finished
  • 2013-09-23T21:08:20Z, Incoming, 12345678, +XX987654321, 83, Finished
    2013-09-28T21:15:06Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
    2013-09-27T20:04:45Z, Outgoing, +XX987654321, 765321, 58, Finished
  • A call graph may be generated using at least the following steps:
  • The created edges are non-weighted.
  • The in-degree of a vertex is defined as the number of incoming edges, and the out-degree of a vertex is defined as the number of outgoing edges.
  • A complete call graph G of a telecom operator is generally quite large, and computations on it can be expensive. Therefore, the operator may sample only N users, generate a local view of the call graph for each sample user, and aggregate these views into a sub-graph G'.
  • An agent, for example, an application running on a smartphone, may collect the local call log of a sample user and generate a local call graph from the perspective of that sample user. Combining and aggregating the logs of several agents, or the local call graphs, allows the graph G' to be built.
  • $G' = \bigcup_{i=1}^{N} G_i$, where $G_i$ is the local call graph of sample user $u_i$.
  • The local call graph $G_i$ is composed of the set of vertices $V(G_i)$ and the set of edges $E(G_i)$, wherein $V(G_i)$ contains $u_i$ and the set of phone numbers that $u_i$ called or was called by, and $E(G_i)$ represents phone calls between $u_i$ and the phone numbers in $V(G_i)$.
  • A directed unweighted edge exists in $E(G_i)$ if there has been at least one call from one phone number towards the other.
  • A graph is considered assortative if nodes with a high degree tend to be connected to other nodes with a high degree. Conversely, a graph is considered disassortative if high-degree nodes are mostly connected to low-degree nodes.
  • Social networks, co-author graphs, and co-actor graphs are typically assortative, while the Internet topology and the graph of World Wide Web hyperlinks are typically disassortative.
  • The global assortativity coefficient r can be measured, as described in an article by Mark E. J. Newman, titled "Assortative mixing in networks," Physical Review Letters 89.20 (2002): 208701, by calculating the Pearson correlation coefficient of degrees between pairs of linked nodes:
  • For directed graphs, let $j_i^{\alpha}$ and $k_i^{\beta}$ be the α-degree of the source node and the β-degree of the target node of edge i (α, β ∈ {in, out} index the degree type), and let $\sigma^{\alpha}$, $\sigma^{\beta}$ be the corresponding standard deviations, for the (α, β) assortativity coefficient.
  • The (α, β) local assortativity coefficient can be defined using the Pearson correlation:
  • r(in,in) measures the tendency of a node to connect with other nodes that have similar in-degrees.
  • r(out,out) measures the tendency of a node to connect with other nodes that have similar out-degrees.
  • r(in,out) measures the tendency of a node with a given in-degree to connect with other nodes that have a similar out-degree.
  • r(out,in) measures the tendency of a node with a given out-degree to connect with other nodes that have a similar in-degree.
  • This classifier classifies a phone number as a spam number if
  • the phone number calls many users with high in-degrees.
  • Since the call graph is built as an aggregation of local call graphs of sample users, the majority of users with high in-degrees are usually sample users. That is, this condition states that the phone number calls many sample users, and
  • the phone number is never or rarely called by numbers that have either many incoming or many outgoing calls.
  • Since the call graph is built as an aggregation of local call graphs of sample users, the majority of users with high out-degrees are usually sample users. This condition simply states that spam numbers are, in general, never or rarely called by anyone. Even sample users that call many other numbers never or rarely call the spam numbers.
  • FIG. 2 illustrates the graphical representation of a call graph G' composed of the local call graphs of 146 sample users of a telecom operator. Overall, G' has 4699 vertices (phone numbers) and 6621 directed unweighted edges (at least one phone call between two vertices).
  • The implementations of the proposed phone spam detection do not require the content of the phone call to be analyzed, and thus allow fast reaction and may be used for real-time applications.
  • The present implementations may also be used a posteriori, on an older call graph, to seek evidence of former spam activity.
  • The implementations only require light computations.
  • The asymptotic complexity with respect to graph G' is less than quadratic.
  • The present principles are not restricted to Internet Telephony and can be applied more broadly to any telephony system.
  • The present principles may be implemented from the point of view of a telecom operator, which generally has a broader view and valuable information for filtering out phone spam.
  • FIG. 3 depicts a block diagram of an exemplary phone spam detection system 300 where phone call spam can be detected and filtered.
  • System 300 includes call history logging module 310, spam detector 320, and call filter 330.
  • Call history logging module 310 records information about the phone calls.
  • Spam detector 320 analyzes the call graph, for example, using method 100. If a call is detected as spam, the call may be blocked or the user notified. Otherwise, the phone call is directed to the user.
  • All the modules of phone spam detection system 300 may be located at one device, or may be distributed over different devices.
  • Call history logging module 310 may be located centrally in the operator's network or distributed over several smartphones.
  • The spam detector may be located within the telecom operator network.
  • The call filter may be located within the telecom operator network or on a phone.
  • Such phone spam detection and filtering service may be offered by the telecom operator to its users.
  • The offered service is then similar to an e-mail spam filter, but applied in the context of phone calls.
  • The service can also be offered by a third party that collects the phone call logs from several smartphones, e.g., by distributing an app that provides both the logging module and the call filter module.
  • FIG. 4 shows an exemplary system 400 where phone spam detection system 300 can be used.
  • Different types of phones or devices providing phone service, for example, mobile phones 410, network devices 420, and landline phones 430, are connected to a telecom operator 470 through wireless network 440, Internet 450, and telephone network 460, respectively.
  • The spam detection system may be embodied as a standalone device within the telecom operator's core network, for example, incorporated within local handsets or base stations at the customer location or servers at the telecom operator locations, or running as a service on some device/computer connected to the Internet.
  • The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • The methods may be implemented in, for example, an apparatus such as a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
  • Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
  • Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
  • Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • Receiving is, as with "accessing", intended to be a broad term.
  • Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
  • "Receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • Implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • The information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • A signal may be formatted to carry the bitstream of a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • The information that the signal carries may be, for example, analog or digital information.
  • The signal may be transmitted over a variety of different wired or wireless links, as is known.
  • The signal may be stored on a processor-readable medium.
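The following is a minimal sketch of how detection system 300 might connect call history logging module 310, spam detector 320 and call filter 330, with the step-150 treatments available as options. All class and function names are hypothetical. The spam classifier is injected as a plain callable, so any of the detection rules described here, or a simple whitelist/blacklist lookup, can be plugged in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Treatment(Enum):
    """Possible step-150 treatments (hypothetical encoding)."""
    CONNECT = "connect"      # not spam: direct the call to the user
    BLOCK = "block"
    VOICEMAIL = "voicemail"
    NOTIFY = "notify"        # e.g. prefix the caller ID with 'Spam?'


@dataclass
class CallHistoryLogger:
    """Stand-in for call history logging module 310."""
    records: list[tuple[str, str]] = field(default_factory=list)

    def log(self, caller: str, callee: str) -> None:
        self.records.append((caller, callee))


@dataclass
class SpamDetector:
    """Stand-in for spam detector 320; the classification rule is injected."""
    classify: Callable[[str], bool]

    def is_spam(self, number: str) -> bool:
        return self.classify(number)


@dataclass
class CallFilter:
    """Stand-in for call filter 330: picks a treatment for each incoming call."""
    detector: SpamDetector
    spam_treatment: Treatment = Treatment.NOTIFY

    def route(self, caller: str) -> Treatment:
        if self.detector.is_spam(caller):
            return self.spam_treatment
        return Treatment.CONNECT


def handle_incoming(logger: CallHistoryLogger, call_filter: CallFilter,
                    caller: str, callee: str) -> Treatment:
    """Log the call first, then let the filter decide how to treat it."""
    logger.log(caller, callee)
    return call_filter.route(caller)
```

Since the modules are separate objects, the logger could run on a smartphone and the detector could run in the operator network, which matches the distributed deployments listed above.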

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

From a directed call graph, local (out, in), (in, in), and (in, out) assortativity coefficients may be calculated. If a phone number's (a) local (out, in) assortativity coefficient > 0; and (b) local (in, in) and (in, out) assortativity coefficients ≤ 0, we may classify the phone number as a spam number. Other parameters measuring the correlation of degrees between linked nodes can be used instead of local assortativity coefficients for phone spam detection. We may use additional conditions in phone spam detection, for example, whether the out-degree of a node corresponding to a phone number is greater than a threshold. We can also use other mechanisms, such as whitelist and blacklist, in phone spam detection.

Description

Method and Apparatus for Detecting and Filtering Undesirable Phone Calls
TECHNICAL FIELD
[1] This invention relates to a method and an apparatus for detecting and filtering undesirable incoming phone calls.
BACKGROUND
[2] It is generally considered desirable to be able to detect and filter undesirable phone calls. We may consider an incoming phone call to be undesirable/annoying, for example, but not limited to, when the phone call belongs to at least one of the following categories: telemarketing, survey, market research, phone evangelism, and one-ring phone scam (pinging a range of phone numbers sequentially to solicit return calls).
SUMMARY
[3] The present principles provide a method for detecting spam numbers, comprising: selecting a plurality of phone numbers; generating a local call graph for a respective one of the plurality of phone numbers; aggregating the local call graphs to generate a call graph; determining a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph; and determining whether the phone number is a spam number responsive to the parameter as described below. The present principles also provide an apparatus for performing these steps.
[4] The present principles also provide a method for detecting spam numbers, comprising: selecting a plurality of phone numbers; generating a local call graph for a respective one of the plurality of phone numbers; aggregating the local call graphs to generate a call graph; determining at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph; and determining the phone number to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) local assortativity coefficient and the (in,in) local assortativity coefficient are smaller than zero as described below. The present principles also provide an apparatus for performing these steps.
[5] The present principles also provide a computer readable storage medium having stored thereon instructions for detecting spam numbers, according to the methods described above.
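The decision rule of [4] can be written as a small predicate. This is an illustrative sketch under stated assumptions, not the claimed implementation:

- The three coefficients are assumed to be already computed.
- The optional out-degree threshold, whitelist and blacklist come from the additional mechanisms mentioned in the Abstract.
- Letting the whitelist take precedence over the blacklist is an assumption.

The sketch follows the strict inequalities of [4]. The Abstract states "≤ 0" for the (in,in) and (in,out) coefficients.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LocalCoefficients:
    """Local assortativity coefficients of one phone number (node)."""
    out_in: float
    in_out: float
    in_in: float


def is_spam(
    number: str,
    coeffs: LocalCoefficients,
    *,
    out_degree: int = 0,
    out_degree_threshold: int | None = None,
    whitelist: frozenset[str] = frozenset(),
    blacklist: frozenset[str] = frozenset(),
) -> bool:
    """Classify a phone number; list handling and ordering are assumptions."""
    if number in whitelist:  # assumption: whitelist wins over blacklist
        return False
    if number in blacklist:
        return True
    # Optional extra condition: the out-degree must exceed the threshold.
    if out_degree_threshold is not None and out_degree <= out_degree_threshold:
        return False
    # Rule of [4]: (out,in) > 0 while (in,out) and (in,in) are both < 0.
    return coeffs.out_in > 0 and coeffs.in_out < 0 and coeffs.in_in < 0
```

Keyword-only options keep the core rule readable: `is_spam(n, c)` applies the assortativity test alone.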
BRIEF DESCRIPTION OF THE DRAWINGS
[6] FIG. 1 illustrates an exemplary method for detecting and filtering undesired phone calls, in accordance with one embodiment of the present principles.
[7] FIG. 2 illustrates a pictorial example depicting a local call graph, in accordance with one embodiment of the present principles.
[8] FIG. 3 illustrates an exemplary apparatus for detecting and filtering undesired phone calls, in accordance with one embodiment of the present principles.
[9] FIG. 4 illustrates an exemplary system wherein phone spam detection and filtering can be used.
DETAILED DESCRIPTION
[10] Some work exists on detecting undesirable phone calls. For example, several internet sites provide community-sourced databases of undesirable phone numbers. Such internet sites include, but are not limited to, http://callerr.com, http://whocalled.us/, http://www.numbercop.com/, http://www.whocallsmefrom.com/, http://whocallsu.com, http://800notes.com/, http://SpamPages.info/, http://www.chamada.pt/, http://whocallsme.com/, http://www.callercomplaints.com/, http://www.tellows.com/. In particular, Tellows associates a "trustworthiness score" with each referenced phone number and provides statistics on how many times a number has been searched for or commented on through its service. Tellows also provides a mobile phone application (Android, iOS) that automatically displays the trustworthiness score for each incoming phone call.
Community-based approaches require community-sourced information, and it may take some time before a new spam number is referenced.
[11] Voice call spam in the context of internet telephony is called "Spam over Internet Telephony" (SPIT). SPIT mechanisms are often limited to Internet Telephony and consider specific features of the Internet Telephony protocol. CallRank, as described in an article by Vijay Balasubramaniyan et al., titled "CallRank: Combating SPIT Using Call Duration, Social Networks and Global Reputation," CEAS 2007 Fourth Conference on Email and AntiSpam, August 23, 2007, Mountain View, California USA, relies on a social network and a reputation mechanism to differentiate between legitimate and spam callers in IP telephony systems. CallRank uses the call duration to differentiate between a legitimate caller and a spammer: callers provide call recipients with "call credentials," which attest that the call recipient stayed a certain amount of time in phone calls. The intuition is that legitimate callers generally receive and send calls that last a significant duration. In contrast, a spammer or telemarketer generally does not receive any phone calls at all, or receives only a very small number. A spammer also tries to keep calls brief, in order to reach as many victims as possible in a short amount of time.
[12] Dantu et al. ("Detecting spam in VoIP networks," Proc. SRUTI '05: Steps to Reducing Unwanted Traffic on the Internet Workshop, pp. 31-37, Cambridge, MA, July 2005) adopt the point of view of a single end-user/VoIP terminal that locally tries to filter out incoming SPIT. Call filtering is based on the following stages: (i) Presence: synchronization with the end-user's individual calendar; for example, calls are blocked if the end-user is in a meeting; (ii) Rate limiting: thresholds on the velocity and acceleration values of the number of arriving calls from a given user/host/domain; (iii) Black and white lists that are continuously updated by the end-user by classifying incoming calls; (iv) Bayesian inference: based on the end-user's feedback (black/white listing), a Bayesian learning module automatically infers the category of an incoming call. Bayesian learning is based on the Session Initiation Protocol (SIP) fields "From," "To," "Via," "Record Route," "Contact Info," etc.; (v) Social network and reputation using the Bayesian inference of trusted peers.
[13] Nassar et al. ("VoIP Honeypot Architecture," IM'07, 10th IFIP/IEEE International Symposium on Integrated Network Management, pp. 109-118, Munich, 2007) use honeypots for detecting spam calls. For example, some telecom operators use fake phone numbers to detect spam calls.
[14] There is also research that focuses on malicious outgoing phone calls/SMS, for example, on international phone calls triggered by malicious smartphone applications, and on scam phone numbers.
[15] Nanavati et al. ("On the structural properties of massive telecom call graphs: findings and implications," ACM International Conference on Information and Knowledge Management, CIKM'06, Arlington, Virginia, November 2006, hereinafter "Nanavati") analyze a mobile phone call graph (where people are the nodes and calls are the edges), where a call graph G is defined as a pair <V(G), E(G)>, V(G) being a set of vertices representing the mobile phone users, and E(G) being the set of directed vertex-pairs from V(G) representing mobile phone calls. In Nanavati, edges are directed but not weighted.
[16] Nanavati in particular analyzes the in- and out-degree correlation in the mobile phone call graph of a large mobile telecom operator, and shows that there is a strong correlation between the in-degree and the average out-degree of a node, up to an in-degree of approximately 100. Above that threshold, Nanavati highlights nodes that call many numbers (i.e., having a high out-degree while having a low in-degree) and attributes these calls to sales activity. Above an in-degree of 100, Nanavati also observes many numbers with a very low out-degree and attributes them to customer service numbers or small businesses with advertised phone numbers.
[17] In the present application, we use the terms "undesirable phone call" and "phone call spam" interchangeably, and the terms "vertex" and "node" interchangeably. In the following, we assume without loss of generality that a phone user has only one phone number; the present principles can also be applied when a phone user has more than one phone number. In the present application, a call graph G is defined as a pair <V(G), E(G)>, V(G) being a set of vertices representing the phone numbers, and E(G) being a set of directed vertex-pairs (edges) between V(G) representing phone calls. In the present application, edges are directed but not weighted. The present principles are directed to automatic detection and filtering of annoying and undesirable incoming phone calls.
FIG. 1 illustrates an exemplary method 100 for detecting phone call spam according to an embodiment of the present principles. Method 100 starts at step 105. At step 110, it performs initialization; for example, a telecom operator may randomly select or sample N users, and a respective local call graph may be built from the perspective of each of these N users. For ease of notation, these N chosen users are denoted as "sample users" in the present application. At step 120, a call graph is accessed. In one embodiment, the telecom operator continuously maintains a call graph over a sliding window, e.g., of 1 day or 1 month. In another embodiment, the call graph may be built from past activities, and method 100 is used to analyze previous phone spam activities. The complete call graph G of a telecom operator, i.e., the call graph comprising all users and phone calls of a telecom operator, is generally quite large, and computations on it can be expensive. Therefore, we can consider a sub-graph G' comprising only the N sample users, the numbers they called or were called by, and the corresponding phone calls.
We denote the call graph from the perspective of one sample user as a local call graph. It comprises only that sample user, the numbers he or she called or was called by, and the corresponding phone calls. G' can be constructed as the union of all local call graphs of the N sample users.
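Steps 110 and 120 can be sketched as follows. The sketch assumes each sample user's call history is available as a list of (caller, callee) pairs, and the helper names are hypothetical. Edges are directed and unweighted, so storing them in a Python set collapses repeated calls between the same two numbers into a single edge. G' is then simply the union of the local edge sets.

```python
from __future__ import annotations

import random

Edge = tuple[str, str]  # (caller, callee): directed, unweighted


def sample_users(all_users: list[str], n: int, seed: int | None = None) -> list[str]:
    """Step 110: choose N sample users uniformly at random."""
    return random.Random(seed).sample(all_users, n)


def local_call_graph(user: str, calls: list[Edge]) -> set[Edge]:
    """Local call graph G_i: calls placed or received by sample user u_i.

    Storing edges in a set collapses repeated calls into one unweighted edge.
    """
    return {(src, dst) for src, dst in calls if user in (src, dst)}


def aggregate(local_graphs: list[set[Edge]]) -> set[Edge]:
    """G' as the union of the local call graphs G_1, ..., G_N."""
    merged: set[Edge] = set()
    for graph in local_graphs:
        merged |= graph
    return merged


def degrees(edges: set[Edge]) -> tuple[dict[str, int], dict[str, int]]:
    """In-degree and out-degree of every vertex in the graph."""
    indeg: dict[str, int] = {}
    outdeg: dict[str, int] = {}
    for src, dst in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
        indeg.setdefault(src, 0)   # every vertex appears in both maps
        outdeg.setdefault(dst, 0)
    return indeg, outdeg
```

The resulting degree maps are the inputs that the assortativity computations of step 130 need.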
[18] At step 130, (in,in), (in,out), and (out,in) local assortativity coefficients are determined for individual nodes in the sub-graph. At step 140, it classifies a phone call as spam, for example, based on the local assortativity coefficients, the out-degrees, a whitelist, and/or a blacklist. More generally, instead of the local assortativity coefficients, we may compute other metrics that measure the correlation between degrees (for example, in-degrees or out-degrees) of linked nodes, for example, centrality measures of a node in the call graph such as betweenness centrality or PageRank.
[19] At step 150, it applies a specific treatment to the calls classified as spam. The treatment may be chosen from, for example, but not limited to, blocking the call, forwarding it to voice mail, or notifying the end user of the classification result (e.g., prefixing the caller ID on the phone display with 'Spam?', using a different ring tone, or displaying information on another device such as a smartphone or TV). Method 100 ends at step 199.
[20] In the following, the steps of building a call graph, calculating local assortativity coefficients, and detecting phone spam are discussed in further detail.
[21] Building Call Graph
[22] The present principles may be implemented in a telecom operator network or in a third party device. In one embodiment, the operator continuously maintains the call graph over a sliding window, e.g., of 1 day or 1 month. A third party, which has access to the operator's phone call log, can also be used to generate the call graph.
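One way to maintain the call graph over a sliding window is sketched below in Python. It is a minimal sketch under stated assumptions, not the operator's implementation: the class `SlidingWindowCallGraph` and its methods are hypothetical, calls are assumed to arrive in timestamp order, and each edge keeps a count of the calls still inside the window so that the (unweighted) edge disappears only when its last call expires.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class SlidingWindowCallGraph:
    """Hypothetical helper: directed, unweighted call graph over the last `window`.

    Assumes calls are fed in non-decreasing timestamp order.
    """

    def __init__(self, window: timedelta):
        self.window = window
        self._calls = deque()         # (timestamp, caller, callee), oldest first
        self._edge_calls = Counter()  # (caller, callee) -> calls still in the window

    def add_call(self, ts: datetime, caller: str, callee: str) -> None:
        self._calls.append((ts, caller, callee))
        self._edge_calls[(caller, callee)] += 1
        self._expire(ts)

    def _expire(self, now: datetime) -> None:
        # Drop calls that fell out of the window; an edge disappears only
        # when no call between the two numbers remains in the window.
        while self._calls and self._calls[0][0] <= now - self.window:
            _, caller, callee = self._calls.popleft()
            self._edge_calls[(caller, callee)] -= 1
            if self._edge_calls[(caller, callee)] == 0:
                del self._edge_calls[(caller, callee)]

    def edges(self) -> set:
        return set(self._edge_calls)


t0 = datetime(2013, 9, 27, 20, 0)
g = SlidingWindowCallGraph(timedelta(days=1))
g.add_call(t0, "A", "B")
g.add_call(t0 + timedelta(hours=2), "A", "C")
g.add_call(t0 + timedelta(hours=25), "B", "C")  # the A -> B call has now expired
print(sorted(g.edges()))  # [('A', 'C'), ('B', 'C')]
```

Counting calls per edge, rather than storing only the edge set, is what allows expiry without rescanning the whole window.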
[23] The call graph may be obtained by different methods, for example, by analyzing the operator's internal logs, by active agents running on smartphones, or by active agents running on home gateways. An example of a phone call log is provided below:
Date, In/Out, Caller, Callee, Duration, Status
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-27T20:03:50Z, Outgoing, +XX987654321, 213427959, 14, Finished
2013-09-18T21:14:58Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-19T20:15:47Z, Outgoing, +XX987654321, 765321, 82, Finished
2013-09-28T21:15:06Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-13T13:38:11Z, Incoming, 33344455566, +XX987654321, 0, AbortedbyCalled
2013-09-12T09:56:26Z, Incoming, , +XX987654321, 0, AbortedbyCalled
2013-09-27T21:27:13Z, Incoming, 12345678, +XX987654321, 122, Finished
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, None
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, None
2013-09-27T20:04:11Z, Outgoing, +XX987654321, , 0, AbortedbyCalled
2013-09-23T21:08:20Z, Incoming, 12345678, +XX987654321, 83, Finished
2013-09-28T21:15:06Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-27T20:04:45Z, Outgoing, +XX987654321, 765321, 58, Finished
2013-09-27T21:26:50Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-24T21:22:09Z, Incoming, 12345678, +XX987654321, 97, Finished
2013-09-13T09:31:06Z, Incoming, , +XX987654321, 0, AbortedbyCalled
2013-09-20T17:14:42Z, Incoming, , +XX987654321, 0, AbortedbyCalled
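A minimal Python sketch of turning such a log into caller/callee pairs follows. It assumes the six comma-separated fields Date, In/Out, Caller, Callee, Duration, Status, and it skips entries whose caller or callee is empty (e.g., a withheld number); both choices, and the helper name `parse_call_log`, are illustrative assumptions. Every call, answered or aborted, is kept, since an edge only requires at least one call.

```python
import csv
import io

# A few entries from the example log, one call per line.
SAMPLE_LOG = """Date, In/Out, Caller, Callee, Duration, Status
2013-09-27T20:03:50Z, Outgoing, +XX987654321, 213427959, 14, Finished
2013-09-12T09:56:26Z, Incoming, , +XX987654321, 0, AbortedbyCalled
2013-09-27T21:27:13Z, Incoming, 12345678, +XX987654321, 122, Finished
2013-09-13T13:38:11Z, Incoming, 33344455566, +XX987654321, 0, AbortedbyCalled
"""


def parse_call_log(text: str) -> list:
    """Return (caller, callee) pairs; rows with a hidden or missing number are skipped."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    pairs = []
    for row in reader:
        if len(row) != 6:
            continue  # malformed or blank line
        _date, _direction, caller, callee, _duration, _status = row
        if caller and callee:
            pairs.append((caller, callee))
    return pairs


print(parse_call_log(SAMPLE_LOG))
# [('+XX987654321', '213427959'), ('12345678', '+XX987654321'), ('33344455566', '+XX987654321')]
```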
[24] In one embodiment, based on the operator's call logs, a call graph may be generated using at least the following steps:
Create a vertex for every number in the call log;
Create a directed edge from the vertex representing a caller to the vertex representing the callee if there is at least one call from the caller to the callee.
The created edges are unweighted. The in-degree of a vertex is defined as its number of incoming edges, and the out-degree as its number of outgoing edges.
[25] As discussed before, the complete call graph G of a telecom operator is generally quite large, and computations on it can be expensive. Therefore, the operator may sample only N users, generate a local view of the call graph for each sample user, and aggregate these views into a sub-graph G'. For example, an agent (for example, an application running on a smartphone) may collect the local call log of a sample user and generate a local call graph from the perspective of that sample user. Combining and aggregating the logs of several agents, or their local call graphs, allows building the graph G'.
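The construction steps of [24] can be sketched in Python as follows; the function name `build_call_graph` is hypothetical. Storing edges in a set is what makes the graph unweighted: repeated calls between the same caller and callee collapse into one edge.

```python
def build_call_graph(calls):
    """Build a directed, unweighted call graph from (caller, callee) pairs.

    Returns (vertices, edges, in_degree, out_degree).
    """
    vertices, edges = set(), set()
    for caller, callee in calls:
        vertices.update((caller, callee))  # one vertex per phone number
        edges.add((caller, callee))        # repeated calls collapse into one edge
    in_degree = {v: 0 for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for caller, callee in edges:
        out_degree[caller] += 1
        in_degree[callee] += 1
    return vertices, edges, in_degree, out_degree


calls = [("s", "u1"), ("s", "u1"), ("s", "u2"), ("u1", "f1")]
V, E, indeg, outdeg = build_call_graph(calls)
print(len(E), outdeg["s"], indeg["u1"], indeg["s"])  # 3 2 1 0
```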
[26] Mathematically,
$$G' = \bigcup_{i=1}^{N} G_i,$$
where $G_i$ is the local call graph of sample user $u_i$. The local call graph $G_i$ is composed of the set of vertices $V(G_i)$ and the set of edges $E(G_i)$, wherein $V(G_i)$ contains $u_i$ and the set of phone numbers that $u_i$ called or has been called by, and $E(G_i)$ represents the phone calls between $u_i$ and the phone numbers in $V(G_i)$. $E(G_i)$ contains a directed, unweighted edge from one phone number to another if there has been at least one call from the first number to the second.
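Under the assumption that each agent sees only the calls in which its own sample user takes part, the union of the local call graphs can be simulated as below. This is a sketch, not the patent's implementation: the helpers `pick_sample_users`, `local_call_graph` and `aggregate` are hypothetical, and an operator-wide call list stands in for the per-agent logs.

```python
import random


def pick_sample_users(subscribers, n, seed=None):
    """Step 110 (hypothetical helper): draw N sample users uniformly at random."""
    return random.Random(seed).sample(sorted(subscribers), n)


def local_call_graph(user, calls):
    """G_i: the user, every number it called or was called by, and those calls."""
    edges = {(a, b) for a, b in calls if user in (a, b)}
    vertices = {user} | {v for edge in edges for v in edge}
    return vertices, edges


def aggregate(sample_users, calls):
    """G' as the union of the local call graphs G_i of the sample users."""
    vertices, edges = set(), set()
    for user in sample_users:
        v_i, e_i = local_call_graph(user, calls)
        vertices |= v_i
        edges |= e_i
    return vertices, edges


calls = [("s", "u1"), ("s", "u2"), ("x", "y"), ("u1", "f1")]
V, E = aggregate(["u1", "u2"], calls)
print(sorted(E))  # [('s', 'u1'), ('s', 'u2'), ('u1', 'f1')]
```

The call between x and y is absent from G' because neither number is a sample user, which is why G' stays much smaller than G.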
[27] Calculating Local Assortativity Coefficients
[28] An undirected graph is considered assortative if nodes with a high degree tend to be connected to other nodes with a high degree. Conversely, a graph is considered disassortative if high-degree nodes are mostly connected to low-degree nodes. Social networks, co-author graphs, and co-actor graphs are typically assortative, while the Internet topology and the graph of World Wide Web hyperlinks are typically disassortative.
[29] In an undirected graph, the global assortativity coefficient r can be measured, as described in an article by Mark E. J. Newman, titled "Assortative mixing in networks," Physical Review Letters 89.20 (2002): 208701, by calculating the Pearson correlation coefficient of degrees between pairs of linked nodes:
$$r = \frac{\sum_{jk} jk\,(e_{jk} - q_j q_k)}{\sigma_q^2}$$
where j and k are the (remaining) degrees of linked nodes; $q_j$ and $q_k$ are values of the distribution q of the remaining degree, i.e., the number of edges leaving a node other than the one that connects the pair; $e_{jk}$ is the joint probability distribution of the remaining degrees of the two nodes of an edge; and $\sigma_q$ is the standard deviation of the remaining-degree distribution q in the graph. A graph is assortative when r > 0, and disassortative when r < 0.
[30] We can also calculate the local assortativity coefficient ρ as the contribution that a particular node makes to the graph assortativity:
$$\rho = \frac{j(j+1)(\bar{k} - \mu_q)}{2M\sigma_q^2}$$
where j is the remaining degree of the particular node, $\bar{k}$ is the average remaining degree of its neighbors (a neighbor is a node to which the particular node has an edge), $\mu_q$ is the mean of the remaining degree q in the graph, $\sigma_q$ is the standard deviation of q, and M is the number of edges in the graph.
[31] In a directed graph, we can calculate the (in,in), (in,out), (out,in), and (out,out) local assortativity coefficients as described in an article by Jacob G. Foster et al., "Edge direction and the structure of networks," Proceedings of the National Academy of Sciences 107, no. 24 (2010): 10815-10820. Instead of the undirected degrees j and k, we consider directed degrees: let $j_i^\alpha$ and $k_i^\beta$ be the α-degree of the source node and the β-degree of the target node of edge i, where α, β ∈ {in, out} index the degree type, and let $\sigma_\alpha$ and $\sigma_\beta$ be the corresponding standard deviations for the (α,β) assortativity coefficient. The (α,β) local assortativity coefficient can be defined using the Pearson correlation:
$$r(\alpha,\beta) = \frac{\sum_i (j_i^\alpha - \bar{j}^\alpha)(k_i^\beta - \bar{k}^\beta)}{M\,\sigma_\alpha\,\sigma_\beta}$$
where $\bar{j}^\alpha = \sum_i j_i^\alpha / M$ and $\bar{k}^\beta = \sum_i k_i^\beta / M$, the sums running over all M edges. Generally, r(in,in) measures the tendency of a node to connect to other nodes that have similar in-degrees, r(out,out) measures the tendency of a node to connect to other nodes that have similar out-degrees, r(in,out) measures the tendency of a node with a given in-degree to connect to other nodes that have a similar out-degree, and r(out,in) measures the tendency of a node with a given out-degree to connect to other nodes that have a similar in-degree.
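The directed coefficient can be computed edge by edge, as in the Python sketch below. The text does not spell out how a per-node (local) value is derived from r(α,β); as an assumption, the sketch attributes each edge's term to the edge's source node (the caller), so that the per-node values sum to r(α,β). The function name `directed_assortativity` and the toy graph are hypothetical.

```python
import math
from collections import Counter


def directed_assortativity(edges, alpha, beta):
    """(alpha, beta) assortativity of a directed, unweighted graph (Foster et al.).

    Returns (r, local): r is the Pearson correlation, over all M edges, between the
    alpha-degree of the source and the beta-degree of the target; local[v] sums the
    terms of the edges whose source is v (an assumed per-node split).
    """
    edges = list(edges)
    m = len(edges)
    degree = {"out": Counter(s for s, _ in edges), "in": Counter(t for _, t in edges)}
    j = [degree[alpha][s] for s, _ in edges]  # alpha-degree of each source
    k = [degree[beta][t] for _, t in edges]   # beta-degree of each target
    j_bar, k_bar = sum(j) / m, sum(k) / m
    sigma_a = math.sqrt(sum((x - j_bar) ** 2 for x in j) / m)
    sigma_b = math.sqrt(sum((y - k_bar) ** 2 for y in k) / m)
    if sigma_a == 0 or sigma_b == 0:
        raise ValueError("zero degree variance: coefficient undefined")
    local = {}
    for (s, _), x, y in zip(edges, j, k):
        local[s] = local.get(s, 0.0) + (x - j_bar) * (y - k_bar) / (m * sigma_a * sigma_b)
    return sum(local.values()), local


# Toy G': sample users u1..u3 exchange calls with friends; "s" calls all three.
edges = [("s", "u1"), ("s", "u2"), ("s", "u3")]
for n, u in enumerate(("u1", "u2", "u3")):
    f, g = f"f{2 * n + 1}", f"f{2 * n + 2}"
    edges += [(u, f), (f, u), (u, g)]
for pair in [("out", "in"), ("in", "in"), ("in", "out")]:
    _, local = directed_assortativity(edges, *pair)
    print(pair, round(local["s"], 3))
# ('out', 'in') 0.354
# ('in', 'in') -0.377
# ('in', 'out') -0.341
```

In the toy graph, the number s, which only calls the three sample users and is never called, obtains a positive (out,in) value and negative (in,in) and (in,out) values.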
[32] Classifying a Phone Call
[33] Upon calculating the (in,in), (in,out) and (out,in) local assortativity coefficients of individual nodes in G' , the operator classifies a phone number as a spam number (or any outgoing call from this number as a spam call) if:
The calling phone number's (out,in) local assortativity > 0, and
The calling phone number's (in,in) and (in,out) local assortativity < 0.
[34] Intuitively, this classifier classifies a phone number as a spam number if:
the phone number calls many users with high in-degrees. In particular, when the call graph is built as an aggregation of the local call graphs of sample users, the majority of users with high in-degrees are usually sample users. That is, this condition states that the phone number calls many sample users; and
the phone number is never or rarely called by numbers that have many incoming or outgoing calls. In particular, when the call graph is built as an aggregation of the local call graphs of sample users, the majority of users with high out-degrees are usually sample users. This condition simply states that spam numbers are, in general, never or rarely called by anyone: even sample users that call many other numbers never or rarely call the spam numbers.
[35] As discussed before, other metrics that measure the correlation between the degrees of linked nodes can also be used instead of local assortativity coefficients. Additional conditions may be added to the above classifier. For example, we can require that the out-degree of a node be greater than a threshold, for example, 2, for the node to be classified as spam. We may also use whitelist or blacklist mechanisms.
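The classification rule of [33], with the optional out-degree threshold and the whitelist/blacklist of [35], can be sketched as follows. The function name `classify_spam`, the data layout, and the precedence chosen here (the whitelist overrides both the rule and the blacklist) are assumptions for illustration.

```python
def classify_spam(local, out_degree, min_out_degree=None, whitelist=(), blacklist=()):
    """Return the set of numbers classified as spam (hypothetical helper).

    local[number] maps ("out", "in"), ("in", "in") and ("in", "out") to that
    number's local assortativity coefficients.
    """
    spam = set(blacklist)
    for number, rho in local.items():
        rule = rho[("out", "in")] > 0 and rho[("in", "in")] < 0 and rho[("in", "out")] < 0
        if min_out_degree is not None:
            # Optional extra condition: out-degree strictly above the threshold.
            rule = rule and out_degree.get(number, 0) > min_out_degree
        if rule:
            spam.add(number)
    return spam - set(whitelist)  # assumed: the whitelist wins


local = {
    "s": {("out", "in"): 0.354, ("in", "in"): -0.377, ("in", "out"): -0.341},
    "u1": {("out", "in"): 0.0, ("in", "in"): 0.1, ("in", "out"): -0.2},
}
out_degree = {"s": 3, "u1": 2}
print(classify_spam(local, out_degree, min_out_degree=2))  # {'s'}
```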
[36] When a user receives a call, the caller's number is checked against the phone numbers classified as spam. When a call is classified as spam, the operator may show a notification of the classification on the end-user's TV (if it is switched on and using IPTV services at that moment) or on the phone display (e.g., by prefixing the caller's ID on the phone display with 'Spam?'). The operator may also automatically redirect the call to the user's voicemail.
[37] FIG. 2 illustrates the graphical representation of a call graph G' composed of the local call graphs of 146 sample users of a telecom operator. Overall, G' has 4699 vertices (phone numbers) and 6621 directed unweighted edges (an edge indicates at least one phone call between two vertices).
[38] In order to evaluate the performance of the classifier, we collect ground truth from community-based phone number rating sites, for example, Tellows. We consider a call as spam if the number of searches for the calling number on Tellows is greater than 0. We also checked some "suspicious" numbers manually on other community-based sites and updated the classification accordingly. Using this method, we label 231 phone numbers as spam numbers and 1873 calls as spam calls, and 4468 numbers and 31820 calls as not spam.
[39] Applying our classifier we obtain the following results:
TABLE 1(a): results without the condition that out-degree > 2 for a spam number [table image not available]
[40] The above results show that the method detects spam with relatively good detection rates, even though the sample size (N) is small. A value of N that optimizes the detection rates may be determined heuristically.
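The text does not state which detection rates Table 1(a) reports. As an assumption, the sketch below computes the true-positive rate over the numbers labeled as spam and the false-positive rate over the numbers labeled as not spam; `detection_rates` is a hypothetical helper, and the inputs are toy sets, not the evaluation data.

```python
def detection_rates(predicted_spam, labeled_spam, labeled_not_spam):
    """True-positive and false-positive rates of a set of flagged numbers."""
    tp = len(predicted_spam & labeled_spam)      # spam numbers correctly flagged
    fp = len(predicted_spam & labeled_not_spam)  # legitimate numbers wrongly flagged
    return tp / len(labeled_spam), fp / len(labeled_not_spam)


print(detection_rates({"a", "b", "x"}, {"a", "b", "c", "d"}, {"x", "y"}))  # (0.5, 0.5)
```

Evaluating these rates for graphs built with several values of N is one way to choose N heuristically.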
[41] The implementations of the proposed phone spam detection do not require the content of the phone call to be analyzed, and thus allow fast reaction and may be used for real-time applications. The present implementations may also be used a posteriori, on an older call graph, to seek evidence of former spam activity. The implementations require only light computations; in particular, the asymptotic complexity with respect to graph G' is less than quadratic.
[42] In addition, the present principles are not restricted to Internet telephony and can be applied more broadly to any telephony system. The present principles may be implemented from the point of view of a telecom operator, which generally has a broader view and more valuable information for filtering out phone spam.
[43] FIG. 3 depicts a block diagram of an exemplary phone spam detection system 300 in which phone call spam can be detected and filtered. System 300 includes call history logging module 310, spam detector 320, and call filter 330. Call history logging module 310 records information about the phone calls. Given the phone call logs, spam detector 320 analyzes the call graph, for example, using method 100. If a call is detected as spam, the call may be blocked or the user notified. Otherwise, the phone call is directed to the user.
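The call filter 330 step can be sketched as a lookup against the set of numbers classified as spam, followed by one of the treatments listed for step 150. The `Treatment` names and the `filter_incoming_call` function are hypothetical; only the 'Spam?' prefix comes from the text.

```python
from enum import Enum


class Treatment(Enum):
    CONNECT = "connect"      # not spam: direct the call to the user
    NOTIFY = "notify"        # ring, but flag the caller ID
    VOICEMAIL = "voicemail"  # redirect to the user's voice mailbox
    BLOCK = "block"          # drop the call


def filter_incoming_call(caller_id, spam_numbers, policy=Treatment.NOTIFY):
    """Return (treatment, caller ID to display) for an incoming call."""
    if caller_id not in spam_numbers:
        return Treatment.CONNECT, caller_id
    if policy is Treatment.NOTIFY:
        return policy, "Spam? " + caller_id  # prefix on the phone display
    return policy, caller_id


spam_numbers = {"s"}
print(filter_incoming_call("s", spam_numbers))
print(filter_incoming_call("s", spam_numbers, policy=Treatment.VOICEMAIL))
```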
[44] All the modules of phone spam detection system 300 may be located in one device or distributed over different devices. For example, call history logging module 310 may be located centrally in the operator's network or distributed over several smartphones, spam detector 320 may be located within the telecom operator network, and call filter 330 may be located within the telecom operator network or on a phone.
[45] Such a phone spam detection and filtering service may be offered by the telecom operator to its users. The offered service is then similar to an e-mail spam filter, but applied to phone calls. The service can also be offered by a third party that collects the phone call logs from several smartphones, e.g., by distributing an app that provides both the logging module and the call filter module.
[46] FIG. 4 shows an exemplary system 400 in which phone spam detection system 300 can be used. In FIG. 4, different types of phones or devices providing phone service, for example, mobile phones 410, network devices 420, and landline phones 430, are connected to a telecom operator 470 through wireless network 440, Internet 450, and telephone network 460, respectively. The spam detection system may be embodied as a standalone device within the telecom operator's core network, may be incorporated, for example, within local handsets or base stations at the customer location or within servers at the telecom operator locations, or may run as a service on some device or computer connected to the Internet.
[47] The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
[48] Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
[49] Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
[50] Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the
information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
[51] Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
[52] As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

Claims

1. A method for detecting spam numbers, comprising:
selecting a plurality of phone numbers from a set of phone numbers stored in at least one memory;
generating a local call graph for a respective one of the plurality of phone numbers;
aggregating (120) the local call graphs to generate a call graph;
determining (130) a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph using a spam detector, the node corresponding to a phone number included in the generated call graph; and
determining (140) whether the phone number is a spam number responsive to the parameter.
2. The method of claim 1, wherein the determining a parameter comprises determining at least one of a local assortativity coefficient, betweenness centrality and PageRank.
3. The method of claim 2, wherein the determining a local assortativity coefficient comprises:
determining at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number.
4. The method of claim 3, wherein the phone number is determined to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) and (in,in) local assortativity coefficients are smaller than zero.
5. The method of claim 1, further comprising:
determining an out-degree of the node responsive to the generated call graph, wherein the determining the phone number to be a spam number is further responsive to the out-degree exceeding a threshold.
6. The method of claim 1, further comprising at least one of indicating a phone call from the spam number as a spam call to a receiver of the phone call; and blocking the phone call from the receiver.
7. The method of claim 1, wherein the local call graph includes a node representing the respective one of the plurality of users, edges representing incoming and outgoing calls of the respective one of the plurality of users, and nodes corresponding to callers of the incoming calls and callees of the outgoing calls.
8. An apparatus for detecting spam numbers, comprising:
a call history logging module (310) configured to access a set of phone numbers stored in at least one memory; and
a spam detector (320) configured to
select a plurality of phone numbers from the accessed set of phone numbers,
generate a local call graph for a respective one of the plurality of phone numbers,
aggregate the local call graphs to generate a call graph,
determine a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph, and
determine whether the phone number is a spam number responsive to the parameter.
9. The apparatus of claim 8, wherein the spam detector is configured to determine at least one of a local assortativity coefficient, betweenness centrality and PageRank.
10. The apparatus of claim 9, wherein the spam detector is configured to determine at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number.
11. The apparatus of claim 10, wherein the phone number is determined to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) and (in,in) local assortativity coefficients are smaller than zero.
12. The apparatus of claim 8, wherein the spam detector is configured to determine an out-degree of the node responsive to the generated call graph, and to determine the phone number to be the spam number further responsive to the out-degree exceeding a threshold.
13. The apparatus of claim 8, further comprising a call filter (330) configured to perform at least one of indicate a phone call from the spam number as a spam call to a receiver of the phone call; and block the phone call from the receiver.
14. The apparatus of claim 8, wherein the local call graph includes a node representing the respective one of the plurality of users, edges representing incoming and outgoing calls of the respective one of the plurality of users, and nodes corresponding to callers of the incoming calls and callees of the outgoing calls.
Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP14305904 2014-06-13
EP14305904.6 2014-06-13
EP14306683 2014-10-23
EP14306683.5 2014-10-23

Publications (1)

Publication Number Publication Date
WO2015189380A1 (en) 2015-12-17

Family

ID=53366046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/063155 WO2015189380A1 (en) 2014-06-13 2015-06-12 Method and apparatus for detecting and filtering undesirable phone calls

Country Status (1)

Country Link
WO (1) WO2015189380A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080292077A1 (en) * 2007-05-25 2008-11-27 Alcatel Lucent Detection of spam/telemarketing phone campaigns with impersonated caller identities in converged networks
US20100124916A1 (en) * 2008-11-20 2010-05-20 Samsung Electronics Co., Ltd. Apparatus and method for managing spam number in mobile communication terminal
WO2011149846A1 (en) * 2010-05-26 2011-12-01 Google Inc. Apparatus and method for identification of spam

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017148146A1 (en) * 2016-03-01 2017-09-08 华为技术有限公司 Method and device for preventing nuisance calls
US10708416B2 (en) 2017-03-28 2020-07-07 AVAST Software s.r.o. Identifying spam callers in call records
US20220329663A1 (en) * 2021-04-12 2022-10-13 Rakuten Mobile, Inc. Managing a software application
US11736578B2 (en) * 2021-04-12 2023-08-22 Rakuten Mobile, Inc. Managing a software application
RU2762389C2 (en) * 2021-05-07 2021-12-20 Общество с ограниченной ответственностью "Алгоритм" Method for recognizing a subscriber making unwanted calls and a method for handling an unwanted call

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15727686

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15727686

Country of ref document: EP

Kind code of ref document: A1