WO2015189380A1 - Method and apparatus for detecting and filtering undesirable phone calls - Google Patents
- Publication number
- WO2015189380A1 (application PCT/EP2015/063155)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- phone
- local
- call
- spam
- assortativity
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/436—Arrangements for screening incoming calls, i.e. evaluating the characteristics of a call before deciding whether to answer it
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/55—Aspects of automatic or semi-automatic exchanges related to network data storage and management
- H04M2203/551—Call history
Definitions
- For example, one agent (for example, an application running on a smartphone) may collect the local call log of a sample user and can generate a local call graph from the perspective of the sample user. Combining and aggregating the logs of several agents, or the local call graphs, allows building the graph G'.
- $G' = \bigcup_{i=1}^{N} G_i$, where $G_i$ is the local call graph of sample user $u_i$.
- The local call graph $G_i$ is composed of the set of vertices $V(G_i)$ and the set of edges $E(G_i)$, wherein $V(G_i)$ contains $u_i$ and the set of phone numbers that $u_i$ called or has been called by, and $E(G_i)$ represents phone calls between $u_i$ and the phone numbers in $V(G_i)$.
- There exist directed unweighted edges in $E(G_i)$ if there has been at least one call from one phone number towards another one.
- the graph is considered assortative if the nodes with a high degree tend to be connected with other nodes with a high degree. Inversely, a graph is considered disassortative if high-degree nodes are mostly connected to low-degree nodes.
- Social networks, co-author graphs, co-actor graphs are typically assortative, while the Internet topology or the graph of world wide web hyperlinks are typically disassortative.
- the global assortativity coefficient r can be measured, as described in an article by Mark E. J. Newman, titled "Assortative mixing in networks," Physical review letters 89.20 (2002): 208701, by calculating the Pearson correlation coefficient of degrees between pairs of linked nodes:
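- Let the directed degrees $j_i^{\alpha}$, $k_i^{\beta}$ be the α- and β-degree of the source node and target node for edge $i$ ($\alpha, \beta \in \{\mathrm{in}, \mathrm{out}\}$ index the degree type), and let $\sigma^{\alpha}$, $\sigma^{\beta}$ be the corresponding standard deviations, for the (α, β) assortativity coefficient.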
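The equation this passage refers to did not survive extraction. As a hedged reconstruction, assuming the standard Pearson form for directed degree correlations and the notation $j_i^{\alpha}$ (α-degree of the source of edge $i$), $k_i^{\beta}$ (β-degree of its target) and $E$ (number of edges), the global (α, β) assortativity coefficient would read as follows. The symbol $E$ and the bar notation for means are assumptions introduced here, not taken from the source.

```latex
r(\alpha,\beta) \;=\;
\frac{E^{-1}\sum_{i=1}^{E}\left(j_i^{\alpha}-\overline{j^{\alpha}}\right)\left(k_i^{\beta}-\overline{k^{\beta}}\right)}
     {\sigma^{\alpha}\,\sigma^{\beta}},
\qquad \alpha,\beta \in \{\mathrm{in},\mathrm{out}\},

\overline{j^{\alpha}} = E^{-1}\sum_{i=1}^{E} j_i^{\alpha},
\qquad
\sigma^{\alpha} = \sqrt{E^{-1}\sum_{i=1}^{E}\left(j_i^{\alpha}-\overline{j^{\alpha}}\right)^{2}},
```

with $\overline{k^{\beta}}$ and $\sigma^{\beta}$ defined in the same way over the target-node degrees.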
- The (α, β) local assortativity coefficient can be defined using the Pearson correlation:
- r(in,in) measures the tendencies of a node to connect with other nodes that have similar in-degrees
- r(out,out) measures the tendencies of a node to connect with other nodes that have similar out-degrees
- r(in,out) measures the tendencies of a node with a given in-degree to connect with other nodes that have a similar out-degree
- r(out,in) measures the tendencies of a node with a given out-degree to connect with other nodes that have a similar in-degree.
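To make the four degree-type combinations concrete, the following Python sketch computes a global (α, β) coefficient as the Pearson correlation, taken over all edges, between the α-degree of each edge's source and the β-degree of its target. It is an illustration under assumptions: the function name `assortativity` is hypothetical, and the per-node local decomposition used by the method is not reproduced in this excerpt, so it is not attempted here.

```python
import math


def assortativity(edges, alpha, beta):
    """Global (alpha, beta) degree correlation over directed edges.

    edges: iterable of (source, target) pairs, one per directed edge.
    alpha, beta: "in" or "out"; alpha applies to the source node,
    beta to the target node. Returns NaN when undefined.
    """
    edges = list(edges)
    in_deg, out_deg = {}, {}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
    deg = {"in": in_deg, "out": out_deg}

    # One (x, y) sample per edge: source alpha-degree, target beta-degree.
    xs = [deg[alpha].get(src, 0) for src, _ in edges]
    ys = [deg[beta].get(dst, 0) for _, dst in edges]
    n = len(edges)
    if n == 0:
        return float("nan")

    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        return float("nan")  # no variation, correlation undefined
    return cov / (std_x * std_y)
```

For example, `assortativity(edges, "out", "in")` corresponds to r(out,in) in the list above.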
- This classifier classifies a phone number as a spam number if both of the following conditions hold:
- (1) The phone number calls many users with high in-degrees. Because the call graph is built as an aggregation of local call graphs of sample users, usually the majority of users having high in-degrees are sample users. That is, this condition states that the phone number calls many sample users.
- (2) The phone number is never or rarely called by numbers that have either many incoming or outgoing calls. Because the call graph is built as an aggregation of local call graphs of sample users, usually the majority of users having high out-degrees are sample users. This condition simply states that spam numbers are in general not or rarely called by anyone. Even sample users that call a lot of other numbers do not or rarely call the spam numbers.
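The decision rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the three local coefficients have already been computed for the phone number; the function name `is_spam` and the optional `min_out_degree` threshold parameter are hypothetical. The comparison follows the wording of the abstract (coefficients "≤ 0"), whereas the summary says "smaller than zero", so the boundary case is an assumption.

```python
def is_spam(r_out_in, r_in_in, r_in_out, out_degree=None, min_out_degree=None):
    """Classify a phone number from its local assortativity coefficients.

    Rule from the abstract: (out,in) > 0 and (in,in), (in,out) <= 0.
    The out-degree check is applied only when both out_degree and
    min_out_degree are supplied (an illustrative additional condition).
    """
    flagged = r_out_in > 0 and r_in_in <= 0 and r_in_out <= 0
    if out_degree is not None and min_out_degree is not None:
        flagged = flagged and out_degree > min_out_degree
    return flagged
```

A whitelist or blacklist lookup, which the source also mentions, could be applied before or after this rule; that ordering is not specified in this excerpt.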
- FIG. 2 illustrates the graphical representation of a call graph G' composed of the local call graphs of 146 sample users of a telecom operator. Overall G' has 4699 vertices (phone numbers) and 6621 directed unweighted edges (at least one phone call between two vertices).
- The implementations of the proposed phone spam detection do not require the content of the phone call to be analyzed, and thus allow fast reaction and may be used for real-time applications.
- The present implementations may also be used a posteriori, on an older call graph, to seek evidence of former spam activity.
- The implementations only require light computations.
- The asymptotic complexity with respect to graph G' is less than quadratic.
- the present principles are not restricted to Internet Telephony and can be applied more broadly to any telephony system.
- the present principles may be implemented from the point of view of a telecom operator which generally has a broader view and valuable information to filter out phone spam.
- FIG. 3 depicts a block diagram of an exemplary phone spam detection system 300 where phone call spam can be detected and filtered.
- System 300 includes call history logging module 310, spam detector 320, and call filter 330.
- Call history logging module 310 records information about the phone calls.
- spam detector 320 analyzes the call graph, for example, using the method described in method 100. If a call is detected as spam, the call may be blocked or the user is notified. Otherwise, if the call is not detected as spam, the phone call is directed to the user.
- All the modules of phone spam detection system 300 may be located at one device, or may be distributed over different devices.
- Call history logging module 310 may be located centrally in the operator's network or in a distributed manner on several smartphones.
- Spam detector 320 may be located within the telecom operator network.
- Call filter 330 may be located within the telecom operator network or on a phone.
- Such phone spam detection and filtering service may be offered by the telecom operator to its users.
- the offered service is then similar to an e-mail spam- filter but applied in the context of phone calls.
- the service can also be offered by a third party that collects the phone call logs from several smartphones, e.g., by distributing an app that provides both the logging module and the call filter module.
- FIG. 4 shows an exemplary system 400 where phone spam detection system 300 can be used.
- different types of phones or devices providing phone service for example, mobile phones 410, network devices 420, and landline phones 430, are connected to a telecom operator 470, through wireless network 440, Internet 450, telephone network 460, respectively.
- The spam detection system may be embodied as a standalone device within the telecom operator's core network, for example, incorporated within local handsets or base stations at the customer location or servers at the telecom operator locations, or running as a service on some device/computer connected to the internet.
- the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
- An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
- the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
- Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
- Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
- Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the
- Receiving is, as with “accessing”, intended to be a broad term.
- Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
- “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
- implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
- the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
- a signal may be formatted to carry the bitstream of a described embodiment.
- Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
- the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
- the information that the signal carries may be, for example, analog or digital information.
- the signal may be transmitted over a variety of different wired or wireless links, as is known.
- the signal may be stored on a processor-readable medium.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
From a directed call graph, local (out, in), (in, in), and (in, out) assortativity coefficients may be calculated. If a phone number's (a) local (out, in) assortativity coefficient > 0; and (b) local (in, in) and (in, out) assortativity coefficients ≤ 0, we may classify the phone number as a spam number. Other parameters measuring the correlation of degrees between linked nodes can be used instead of local assortativity coefficients for phone spam detection. We may use additional conditions, for example, whether the out-degree of a node corresponding to a phone number is greater than a threshold, in phone spam detection. We can also use other mechanisms, such as whitelist and blacklist, in phone spam detection.
Description
Method and Apparatus for Detecting and Filtering Undesirable Phone Calls
TECHNICAL FIELD
[1] This invention relates to a method and an apparatus for detecting and filtering undesirable incoming phone calls.
BACKGROUND
[2] It is generally considered desirable to be able to detect and filter undesirable phone calls. We may consider an incoming phone call to be undesirable/annoying, for example, but not limited to, when the phone call belongs to at least one of the following categories: telemarketing, survey, market research, phone evangelism, and one ring phone scam (pinging a range of phone numbers sequentially to solicit return calls).
SUMMARY
[3] The present principles provide a method for detecting spam numbers, comprising: selecting a plurality of phone numbers; generating a local call graph for a respective one of the plurality of phone numbers; aggregating the local call graphs to generate a call graph; determining a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph; and determining whether the phone number is a spam number responsive to the parameter as described below. The present principles also provide an apparatus for performing these steps.
[4] The present principles also provide a method for detecting spam numbers, comprising: selecting a plurality of phone numbers; generating a local call graph for a respective one of the plurality of phone numbers; aggregating the local call graphs to generate a call graph; determining at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph; and determining the phone number to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) local assortativity coefficient and the (in,in) local assortativity coefficient are smaller than zero as described below. The present principles also provide an apparatus for performing these steps.
[5] The present principles also provide a computer readable storage medium having stored thereon instructions for detecting spam numbers, according to the methods described above.
BRIEF DESCRIPTION OF THE DRAWINGS
[6] FIG. 1 illustrates an exemplary method for detecting and filtering undesired phone calls, in accordance with one embodiment of the present principles.
[7] FIG. 2 illustrates a pictorial example depicting a local call graph, in accordance with one embodiment of the present principles.
[8] FIG. 3 illustrates an exemplary apparatus for detecting and filtering undesired phone calls, in accordance with one embodiment of the present principles.
[9] FIG. 4 illustrates an exemplary system wherein phone spam detection and filtering can be used.
DETAILED DESCRIPTION
[10] There exists some work on detecting undesirable phone calls. For example, several internet sites provide community-sourced databases of undesirable phone numbers. Such internet sites include, but are not limited to, http://callerr.com, http://whocalled.us/, http://www.numbercop.com/, http://www.whocallsmefrom.com/, http://whocallsu.com, http://800notes.com/, http://SpamPages.info/, http://www.chamada.pt/, http://whocallsme.com/, http://www.callercomplaints.com/, http://www.tellows.com/. In particular, Tellows associates a "trustworthiness score" with each referenced phone number and provides statistics on how many times a number has been searched for or commented on for its service. Tellows also provides a mobile phone application (Android, iOS) that automatically displays the trustworthiness score for each incoming phone call. Community-based approaches require community-sourced information, and it may take some time before a new spam number is referenced.
[11] Voice call spam in the context of internet telephony is called "Spam over Internet Telephony" (SPIT). SPIT mechanisms are often limited to Internet Telephony and consider specific features of the Internet Telephony protocol. CallRank, as described in an article by Vijay Balasubramaniyan et al., titled "CallRank: Combating SPIT Using Call Duration, Social Networks and Global Reputation," CEAS 2007 Fourth Conference on Email and AntiSpam, August 23, 2007, Mountain View, California USA, relies on a social network and a reputation mechanism to differentiate between legitimate and spam callers in IP telephony systems. CallRank uses the call duration to differentiate between a legitimate caller and a spammer: callers provide call recipients with "call credentials," which attest that the call recipient stayed a certain amount of time in phone calls. The intuition is that legitimate callers generally receive and send calls that last some significant duration. In contrast, a spammer or telemarketer generally does not receive any phone calls at all, or only receives a very small number. A spammer also tries to make calls brief, in order to reach as many victims as possible in a short amount of time.
[12] Dantu et al. ("Detecting spam in VoIP networks," Proc. SRUTI '05: Steps to Reducing Unwanted Traffic on the Internet Workshop, pp. 31-37, Cambridge, MA, July 2005) adopt the point of view of a single end-user/VoIP terminal, locally trying to filter out an incoming SPIT. Call filtering is based on the following stages:
(i) Presence: synchronization with an end-user's individual calendar. For example, calls are blocked if the end-user is in a meeting;
(ii) Rate limiting: thresholds on the velocity and acceleration values of the number of arriving calls from a given user/host/domain;
(iii) Black and white lists that are continuously updated by the end-user by classifying incoming calls;
(iv) Bayesian inference: based on the end-user's feedback (black/white listing), a Bayesian learning module automatically infers the category of an incoming call. Bayesian learning is based on the Session Initiation Protocol (SIP) fields "From," "To," "Via," "Record Route," "Contact Info," etc.;
(v) Social network and reputation using the Bayesian inference of trusted peers.
[13] Nassar et al. ("VoIP Honeypot Architecture," IM '07, 10th IFIP/IEEE International Symposium on Integrated Network Management, pp. 109-118, Munich, 2007) use honeypots for detecting spam calls. For example, some telecom operators use fake phone numbers to detect spam calls.
[14] There also exists some research work that focuses on malicious outgoing phone calls/SMS, for example, on international phone calls triggered by malicious smartphone applications, and on scam phone numbers.
[15] Nanavati et al. ("On the structural properties of massive telecom call graphs: findings and implications," ACM International Conference on Information and Knowledge Management, CIKM'06, Arlington, Virginia, November, 2006, hereinafter "Nanavati") analyze a mobile phone call graph (where people are the nodes and calls are the edges), where a call graph G is defined as a pair <V(G), E(G)>, V(G) being a set of vertices representing the mobile phone users, and E(G) being the set of directed vertex-pairs from V(G) representing mobile phone calls. In Nanavati, edges are directed but not weighted.
[16] Nanavati in particular analyzes the in- and out-degree correlation in the mobile phone call graph of a large mobile telecom operator, and shows that there is a strong correlation between the in-degree and the average out-degree of a node, up to an in-degree of approximately 100. Over that threshold, Nanavati highlights that there exist nodes which call a lot of numbers (i.e., having a high out-degree, while having a low in-degree) and attributes these calls to some salesmen activity. Over an in-degree of 100, Nanavati also observes a lot of numbers with a very low out-degree and attributes these calls to some customer service numbers or small businesses with advertised phone numbers.
[17] In the present application, we use the terms "undesirable phone call" and "phone call spam" interchangeably, and use the terms "vertex" and "node" interchangeably. In the following, we assume a phone user has only one phone number without loss of generality. The present principles can also be applied when a phone user has more than one phone number. In the present application, a call graph G is defined as a pair <V(G), E(G)>, V(G) being a set of vertices representing the phone numbers, and E(G) being a set of directed vertex-pairs (edges) between V(G) representing phone calls. In the present application, edges are directed but not weighted. The present principles are directed to automatic detection and filtering of annoying and undesirable incoming phone calls.
FIG. 1 illustrates an exemplary method 100 for detecting phone call spam according to an embodiment of the present principles. Method 100 starts at step 105. At step 110, it performs initialization. For example, a telecom operator may select or sample N users randomly, wherein a respective local call graph may be built from the perspective of each of these N users. For ease of notation, these N chosen users are denoted as "sample users" in the present application. At step 120, a call graph is accessed. In one embodiment, the telecom operator continuously maintains a call graph over a sliding window, e.g., of 1 day or 1 month. In another embodiment, the call graph may be built from past activities and method 100 is used to analyze previous phone spam activities.
The complete call graph G of a telecom operator, i.e., the call graph comprising all users and phone calls of a telecom operator, is generally quite large and computations can be expensive. Therefore, we can consider a sub-graph G' comprising only the N sample users, the numbers they called and have been called from, and the corresponding phone calls. We denote the call graph from the perspective of one sample user as a local call graph, which only comprises the one sample user, the numbers he called and has been called from, and the corresponding phone calls. G' can be constructed as the union of all local call graphs of the N sample users.
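The construction of G' as a union of local call graphs can be sketched as follows. This is a minimal illustration, assuming a graph is represented as a pair of Python sets (vertices, directed edges); the function names `local_call_graph` and `aggregate` are hypothetical.

```python
def local_call_graph(sample_user, edges):
    """Local call graph of one sample user.

    Keeps only edges that start or end at sample_user, plus the
    vertices touched by those edges (and the sample user itself).
    """
    local_edges = {(src, dst) for src, dst in edges
                   if src == sample_user or dst == sample_user}
    vertices = {sample_user} | {n for edge in local_edges for n in edge}
    return vertices, local_edges


def aggregate(local_graphs):
    """Union of local call graphs, giving the sub-graph G'."""
    vertices, edges = set(), set()
    for local_vertices, local_edges in local_graphs:
        vertices |= local_vertices
        edges |= local_edges
    return vertices, edges
```

Calls between two numbers that are both outside the sample (for instance an edge from "x" to "z" when neither is a sample user) never enter G', which is why G' is smaller than the complete graph G.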
[18] At step 130, the (in,in), (in,out), and (out,in) local assortativity coefficients are determined for individual nodes in the sub-graph. At step 140, it classifies a phone call as spam, for example, based on the local assortativity coefficients, the out-degrees, whitelist, and/or blacklist. More generally, we may compute other metrics that measure the correlation between degrees (for example, in-degrees or out-degrees) of linked nodes instead of the local assortativity coefficients, for example, the centrality measures of a node in the call graph, such as the betweenness centrality or PageRank.
[19] At step 150, it applies a specific treatment to the calls classified as spam. The treatment may be chosen from, for example, but not limited to: blocking the call, forwarding it to a voice mail box, or notifying the end-user of the classification result (e.g., prefixing the caller's ID on the phone display with 'Spam?', using a different ring tone, or displaying information on some device, e.g., smartphone, TV). Method 100 ends at step 199.
[20] In the following, the steps of building a call graph, calculating local assortativity coefficients, and detecting phone spam are discussed in further detail.
[21] Building Call Graph
[22] The present principles may be implemented in a telecom operator network or in a third party device. In one embodiment, the operator continuously maintains the call graph over a sliding window, e.g., of 1 day or 1 month. A third party, which has access to the operator's phone call log, can also be used to generate the call graph.
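One simple way to realize the sliding window is to rebuild the edge set from only those calls whose timestamp falls inside the window. The sketch below is an illustration under assumptions: calls are given as (timestamp, caller, callee) tuples with timezone-aware timestamps, and the function name `edges_in_window` is hypothetical. An operator would more likely update the graph incrementally; that is not shown.

```python
from datetime import datetime, timedelta, timezone


def edges_in_window(calls, now, window=timedelta(days=1)):
    """Directed edges for calls placed within [now - window, now].

    calls: iterable of (when, caller, callee) tuples.
    Records with an empty caller or callee are skipped (assumption).
    """
    start = now - window
    return {(caller, callee)
            for when, caller, callee in calls
            if start <= when <= now and caller and callee}
```

Choosing `window=timedelta(days=30)` would approximate the one-month window mentioned above.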
[23] The call graph may be obtained by different methods, for example, by analyzing an operator's internal logs, by active agents running on smartphones, or by active agents running on home gateways. An example of a phone call log is provided below:
Date, In/Out, Caller, Callee, Duration, Status
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-27T20:03:50Z, Outgoing, +XX987654321, 213427959, 14, Finished
2013-09-18T21:14:58Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-19T20:15:47Z, Outgoing, +XX987654321, 765321, 82, Finished
2013-09-28T21:15:06Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-13T13:38:11Z, Incoming, 33344455566, +XX987654321, 0, AbortedbyCalled
2013-09-12T09:56:26Z, Incoming, , +XX987654321, 0, AbortedbyCalled
2013-09-27T21:27:13Z, Incoming, 12345678, +XX987654321, 122, Finished
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, None
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, None
2013-09-27T20:04:11Z, Outgoing, +XX987654321, , 0, AbortedbyCalled
2013-09-23T21:08:20Z, Incoming, 12345678, +XX987654321, 83, Finished
2013-09-28T21:15:06Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-27T20:04:45Z, Outgoing, +XX987654321, 765321, 58, Finished
2013-09-27T21:26:50Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-24T21:22:09Z, Incoming, 12345678, +XX987654321, 97, Finished
2013-09-13T09:31:06Z, Incoming, , +XX987654321, 0, AbortedbyCalled
2013-09-20T17:14:42Z, Incoming, , +XX987654321, 0, AbortedbyCalled
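A log of this kind could be read into records as sketched below. This is only an illustration: it assumes the log is available as comma-separated lines with the six fields of the header, and the field names in FIELDS and the function parse_call_log are hypothetical, not part of the described method.

```python
import csv
from io import StringIO

# Hypothetical field names, following the header of the example log.
FIELDS = ["date", "direction", "caller", "callee", "duration", "status"]

def parse_call_log(text):
    """Parse comma-separated call-log lines into a list of dictionaries."""
    records = []
    for row in csv.reader(StringIO(text), skipinitialspace=True):
        row = [cell.strip() for cell in row]
        # Skip blank lines, malformed lines, and the header line.
        if len(row) != len(FIELDS) or not row[4].isdigit():
            continue
        record = dict(zip(FIELDS, row))
        record["duration"] = int(record["duration"])
        # An empty caller or callee is kept as "" (number not available).
        records.append(record)
    return records

SAMPLE = """\
Date, In/Out, Caller, Callee, Duration, Status
2013-09-29T21:19:16Z, Incoming, 12345678, +XX987654321, 0, AbortedbyCalled
2013-09-27T20:03:50Z, Outgoing, +XX987654321, 213427959, 14, Finished
2013-09-12T09:56:26Z, Incoming, , +XX987654321, 0, AbortedbyCalled
"""

records = parse_call_log(SAMPLE)
```

Only the caller and callee fields are needed to build the call graph; the other fields are retained here in case a deployment wants them for additional conditions.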
[24] In one embodiment, based on the operator's call logs, a call graph may be generated using at least the following steps:
Create a vertex for every number in the call log;
Create a directed edge between the vertex representing a caller and the vertex representing the callee if there is at least one call from the caller to the callee.
The created edges are non-weighted. The in-degree of a vertex is defined as the number of incoming edges, and the out-degree of a vertex is defined as the number of outgoing edges.
[25] As discussed before, a complete call graph G of a telecom operator is generally quite large and computations can be expensive. Therefore, the operator may sample only N users, generate a local view of the call graph for each sample user, and aggregate these views into a sub-graph G'. For example, one agent (for example, an application running on a smartphone) may collect the local call log of a sample user and generate a local call graph from the perspective of the sample user. Combining and aggregating the logs of several agents, or the local call graphs, allows building the graph G'.
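The two graph-construction steps of paragraph [24] and the degree definitions can be sketched as follows. The function names build_call_graph and degrees are hypothetical, and skipping records with a missing caller or callee is an assumption, since the text does not say how such records are treated.

```python
from collections import defaultdict

def build_call_graph(calls):
    """Return (vertices, edges) for an iterable of (caller, callee) pairs.

    Edges are directed and unweighted: repeated calls between the same
    pair collapse into a single edge.
    """
    vertices, edges = set(), set()
    for caller, callee in calls:
        if not caller or not callee:
            continue  # assumption: ignore calls with an unknown number
        vertices.update((caller, callee))
        edges.add((caller, callee))
    return vertices, edges

def degrees(edges):
    """In-degree and out-degree of every vertex that appears in an edge."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
    return in_deg, out_deg

calls = [
    ("12345678", "+XX987654321"),
    ("12345678", "+XX987654321"),   # repeated call, same edge
    ("+XX987654321", "765321"),
    ("+XX987654321", "213427959"),
    ("", "+XX987654321"),           # unknown caller, skipped
]
vertices, edges = build_call_graph(calls)
in_deg, out_deg = degrees(edges)
```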
[26] Mathematically, $G' = \bigcup_{i=1}^{N} G_i$, where $G_i$ is the local call graph of sample user $u_i$. The local call graph $G_i$ is composed of the set of vertices $V(G_i)$ and the set of edges $E(G_i)$, wherein $V(G_i)$ contains $u_i$ and the set of phone numbers that $u_i$ called or has been called by, and $E(G_i)$ represents phone calls between $u_i$ and the phone numbers in $V(G_i)$. A directed unweighted edge exists in $E(G_i)$ if there has been at least one call from one phone number towards another one.
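As a minimal sketch of this union, the local call graph of each sample user can be represented as a pair of sets and the sub-graph G' as the element-wise union of those pairs. The names local_call_graph and aggregate, and the toy numbers, are hypothetical.

```python
def local_call_graph(user, calls):
    """Local call graph of one sample user as (vertices, edges).

    Only calls that involve the sample user are kept.
    """
    edges = {(a, b) for a, b in calls if a and b and user in (a, b)}
    vertices = {user} | {number for edge in edges for number in edge}
    return vertices, edges

def aggregate(local_graphs):
    """Sub-graph G' as the union of the vertex and edge sets of local graphs."""
    vertices, edges = set(), set()
    for local_vertices, local_edges in local_graphs:
        vertices |= local_vertices
        edges |= local_edges
    return vertices, edges

calls = [("spam", "u1"), ("spam", "u2"), ("u1", "x"), ("y", "z"), ("u2", "u1")]
sample_users = ["u1", "u2"]
g_prime = aggregate(local_call_graph(u, calls) for u in sample_users)
```

The call from y to z involves no sample user, so it does not appear in G', which is what makes G' cheaper to analyze than the complete graph G.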
[27] Calculating Local Assortativity Coefficients
[28] In an undirected graph, the graph is considered assortative if the nodes with a high degree tend to be connected with other nodes with a high degree. Inversely, a graph is considered disassortative if high-degree nodes are mostly connected to low-degree nodes. Social networks, co-author graphs, co-actor graphs are typically assortative, while the Internet topology or the graph of world wide web hyperlinks are typically disassortative.
[29] In an undirected graph, the global assortativity coefficient r can be measured, as described in an article by Mark E. J. Newman, titled "Assortative mixing in networks," Physical review letters 89.20 (2002): 208701, by calculating the Pearson correlation coefficient of degrees between pairs of linked nodes:
$$ r = \frac{\sum_{jk} jk\,(e_{jk} - q_j q_k)}{\sigma_q^2} $$
where j and k are the degrees of linked nodes, $q_j$ and $q_k$ are the distributions of the remaining degree, which represents the number of edges leaving the node other than the one that connects the pair, $e_{jk}$ is the joint probability distribution of the remaining degrees, and $\sigma_q$ is the standard deviation of the distribution of the remaining degree q in the graph. A graph is assortative when r > 0, and disassortative when r < 0.
[30] We can also calculate the local assortativity coefficient ρ as the contribution that a particular node makes to the graph assortativity:
where j is the degree of the particular node, $\bar{k}$ is the average degree of its neighbors (a neighbor is a node to which the particular node has an edge), $\mu_q$ is the mean of the remaining degree q in the graph, and M is the number of edges in the graph.
[31] In a directed graph, we can calculate the (in,in), (in,out), (out,in) and (out,out) local assortativity coefficients as described in an article by Jacob G. Foster et al., "Edge direction and the structure of networks," Proceedings of the National Academy of Sciences 107, no. 24 (2010): 10815-10820. Instead of the undirected degrees j and k, we consider the directed degrees $j_i^\alpha$ and $k_i^\beta$, the α- and β-degree of the source node and target node for edge i (α, β ∈ {in, out} index the degree type), and the standard deviations $\sigma^\alpha$ and $\sigma^\beta$ for the (α,β) assortativity coefficient. The (α,β) local assortativity coefficient can be defined using the Pearson correlation:
where $\bar{j^\alpha} = \sum_i j_i^\alpha / M$ and $\bar{k^\beta} = \sum_i k_i^\beta / M$. Generally, r(in,in) measures the tendency of a node to connect with other nodes that have similar in-degrees, r(out,out) measures the tendency of a node to connect with other nodes that have similar out-degrees, r(in,out) measures the tendency of a node with a given in-degree to connect with other nodes that have a similar out-degree, and r(out,in) measures the tendency of a node with a given out-degree to connect with other nodes that have a similar in-degree.
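The following sketch shows one way such per-node coefficients might be computed. It starts from the global (α,β) coefficient of Foster et al., r(α,β) = (1/M) Σ_i (j_i^α − mean j^α)(k_i^β − mean k^β) / (σ^α σ^β), and assigns to each node the sum of the terms of the edges that leave it. That per-node split is an assumption made for illustration and is not necessarily the exact definition intended here; the function name local_assortativity and the toy graph are also hypothetical.

```python
from collections import defaultdict
from math import sqrt

def local_assortativity(edges, alpha, beta):
    """Per-node contribution to the (alpha, beta) assortativity.

    alpha and beta are "in" or "out". Each edge term is credited to the
    source node of the edge (an assumption of this sketch).
    """
    edges = sorted(set(edges))
    m = len(edges)
    if m == 0:
        return {}
    deg = {"in": defaultdict(int), "out": defaultdict(int)}
    for src, dst in edges:
        deg["out"][src] += 1
        deg["in"][dst] += 1
    j = [deg[alpha][src] for src, _ in edges]  # alpha-degree of each source
    k = [deg[beta][dst] for _, dst in edges]   # beta-degree of each target
    j_mean, k_mean = sum(j) / m, sum(k) / m
    sigma_j = sqrt(sum((x - j_mean) ** 2 for x in j) / m)
    sigma_k = sqrt(sum((x - k_mean) ** 2 for x in k) / m)
    rho = defaultdict(float)
    if sigma_j == 0 or sigma_k == 0:
        return dict(rho)  # correlation undefined: no contributions reported
    for (src, _), j_i, k_i in zip(edges, j, k):
        rho[src] += (j_i - j_mean) * (k_i - k_mean) / (m * sigma_j * sigma_k)
    return dict(rho)

# Toy graph: S calls three sample users a, b, c; they exchange calls with friends.
edges = [("S", "a"), ("S", "b"), ("S", "c"), ("f1", "a"), ("f2", "b"),
         ("a", "f1"), ("b", "f2"), ("c", "f3")]
rho_out_in = local_assortativity(edges, "out", "in")
rho_in_in = local_assortativity(edges, "in", "in")
rho_in_out = local_assortativity(edges, "in", "out")
```

With this split, the per-node values of one (α,β) pair sum to the global coefficient r(α,β), so each value can be read as the share that a node contributes to the assortativity of the whole graph.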
[32] Classifying a Phone Call
[33] Upon calculating the (in,in), (in,out) and (out,in) local assortativity coefficients of individual nodes in G' , the operator classifies a phone number as a spam number (or any outgoing call from this number as a spam call) if:
The calling phone number's (out,in) local assortativity > 0, and
The calling phone number's (in,in) and (in,out) local assortativity < 0.
[34] Intuitively, this classifier classifies a phone number as a spam number, if
the phone number calls many users with high in-degrees. In particular, when the call graph is built as an aggregation of local call graphs of sample users, the majority of users having high in-degrees are usually sample users. That is, this condition states that the phone number calls many sample users, and
the phone number is never or rarely called by numbers that have either many incoming or outgoing calls. In particular, when the call graph is built as an aggregation of local call graphs of sample users, the majority of users having high out-degrees are usually sample users. This condition simply states that spam numbers are in general not or rarely called by anyone. Even sample users that call a lot of other numbers do not or rarely call the spam numbers.
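The sign test of paragraph [33] can be written as a short predicate. In this sketch the coefficients are assumed to be available as dictionaries keyed by phone number; the function name is_spam and the numeric values are made up for illustration, and a number without a stored coefficient is treated as having coefficient 0 (an assumption).

```python
def is_spam(number, rho_out_in, rho_in_in, rho_in_out):
    """Apply the rule: (out,in) > 0 and both (in,in) and (in,out) < 0."""
    return (rho_out_in.get(number, 0.0) > 0
            and rho_in_in.get(number, 0.0) < 0
            and rho_in_out.get(number, 0.0) < 0)

# Made-up coefficient values, for illustration only.
rho_out_in = {"S": 0.12, "a": 0.07}
rho_in_in = {"S": -0.09, "a": -0.11}
rho_in_out = {"S": -0.18, "a": 0.08}

spam_numbers = {n for n in rho_out_in
                if is_spam(n, rho_out_in, rho_in_in, rho_in_out)}
```

Number "a" fails the rule because its (in,out) coefficient is positive, so only "S" is flagged.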
[35] As discussed before, instead of local assortativity coefficients, other metrics that measure the correlation between degrees of linked nodes can also be used. Additional conditions may be added to the above classifier. For example, we can add the condition that the out-degree of a node must be greater than a threshold, for example, 2, to be classified as spam. We may also use whitelist or blacklist mechanisms.
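One possible way to combine these additional conditions with the assortativity rule is sketched below. The function name classify, its parameters, and the order of precedence (whitelist first, then blacklist, then the rule with the out-degree threshold) are assumptions; the text does not fix how the mechanisms interact.

```python
def classify(number, passes_rule, out_degree,
             whitelist=frozenset(), blacklist=frozenset(), min_out_degree=2):
    """Return True if the number is treated as spam.

    passes_rule is the outcome of the assortativity-based test.
    """
    if number in whitelist:
        return False          # assumption: whitelist overrides everything
    if number in blacklist:
        return True
    return passes_rule and out_degree > min_out_degree

verdicts = {
    "busy_caller": classify("busy_caller", True, 40),
    "quiet_caller": classify("quiet_caller", True, 1),
    "pharmacy": classify("pharmacy", True, 40, whitelist={"pharmacy"}),
    "known_bad": classify("known_bad", False, 0, blacklist={"known_bad"}),
}
```

The out-degree threshold removes numbers that satisfy the sign test by chance while having placed very few calls.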
[36] When a user receives a call, the number of the caller is checked against the phone numbers classified as spam. When a call is classified as spam, the operator may show a notification of the classification on the end-user's TV (if it is switched on and using IPTV services at that moment) or on the phone display (e.g., by prefixing the caller's ID on the phone display with 'Spam?'). The operator may also automatically redirect the call to the voicemail of the user.
[37] FIG. 2 illustrates the graphical representation of a call graph G' composed of the local call graphs of 146 sample users of a telecom operator. Overall, G' has 4699 vertices (phone numbers) and 6621 directed unweighted edges (at least one phone call between two vertices).
[38] In order to evaluate the performance of the classifier, we collect ground truth from community-based phone number rating sites, for example, Tellows. We consider a call as spam if the number of searches for the calling number on Tellows is greater than 0. We also checked some "suspicious" numbers manually on other community-based sites and updated the classification accordingly. Using this method, we label 231 phone numbers as spam numbers and 1873 calls as spam calls. We label 4468 numbers as not spam and 31820 calls as not spam.
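Given such labels, detection rates can be computed by comparing the classifier output with the labelled sets. The sketch below assumes that the rates of interest are the true-positive rate and the false-positive rate over phone numbers; the function name detection_rates and the toy sets are hypothetical and do not reproduce the reported results.

```python
def detection_rates(predicted_spam, labelled_spam, all_numbers):
    """Return (true-positive rate, false-positive rate) over phone numbers."""
    labelled_ham = all_numbers - labelled_spam
    true_pos = len(predicted_spam & labelled_spam)
    false_pos = len(predicted_spam & labelled_ham)
    tpr = true_pos / len(labelled_spam) if labelled_spam else 0.0
    fpr = false_pos / len(labelled_ham) if labelled_ham else 0.0
    return tpr, fpr

# Toy data, for illustration only.
all_numbers = {"a", "b", "c", "d", "e"}
labelled_spam = {"a", "b"}
predicted_spam = {"a", "c"}
tpr, fpr = detection_rates(predicted_spam, labelled_spam, all_numbers)
```

The same function can be applied per call instead of per number by passing sets of call identifiers.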
[39] Applying our classifier we obtain the following results:
TABLE 1(a) without the condition that out-degree > 2 for a spam number
[40] The above results show that the method allows for detecting spam with relatively good detection rates, even though the sampling size (N) is small. A value for N that optimizes the detection rates may be determined heuristically.
[41] The implementations of the proposed phone spam detection do not require the content of the phone call to be analyzed, and thus allow fast reaction and may be used for real-time applications. The present implementations may also be used a posteriori, on an older call graph, to seek evidence of former spam activity. The implementations require only light computations. In particular, the asymptotic complexity with respect to graph G' is less than quadratic.
[42] In addition, the present principles are not restricted to Internet telephony and can be applied more broadly to any telephony system. The present principles may be implemented from the point of view of a telecom operator, which generally has a broader view and valuable information to filter out phone spam.
[43] FIG. 3 depicts a block diagram of an exemplary phone spam detection system 300 where phone call spam can be detected and filtered. System 300 includes call history logging module 310, spam detector 320, and call filter 330. Call history logging module 310 records information about the phone calls. Given the phone call logs, spam detector 320 analyzes the call graph, for example, using method 100. If a call is detected as spam, the call may be blocked or the user may be notified. Otherwise, if the call is not detected as spam, the phone call is directed to the user.
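The interplay of the three modules can be sketched as below. The class names, the action values "block", "voicemail" and "notify", and the returned strings are hypothetical; the detector is reduced to a lookup in a precomputed set of spam numbers, whereas a real detector would derive that set from the call graph.

```python
from dataclasses import dataclass, field

@dataclass
class CallHistoryLogger:            # stands in for module 310
    calls: list = field(default_factory=list)

    def record(self, caller, callee):
        self.calls.append((caller, callee))

@dataclass
class SpamDetector:                 # stands in for module 320
    spam_numbers: set = field(default_factory=set)

    def is_spam(self, caller):
        return caller in self.spam_numbers

@dataclass
class CallFilter:                   # stands in for module 330
    action: str = "notify"

    def handle(self, caller, spam):
        if not spam:
            return f"ring: {caller}"
        if self.action == "block":
            return "blocked"
        if self.action == "voicemail":
            return "forwarded to voicemail"
        return f"ring: Spam? {caller}"  # notify by prefixing the caller ID

def on_incoming_call(caller, callee, logger, detector, call_filter):
    """Log the call, look up the caller, and apply the configured treatment."""
    logger.record(caller, callee)
    return call_filter.handle(caller, detector.is_spam(caller))

logger = CallHistoryLogger()
detector = SpamDetector(spam_numbers={"12345678"})
result_spam = on_incoming_call("12345678", "+XX987654321", logger, detector,
                               CallFilter(action="notify"))
result_ok = on_incoming_call("765321", "+XX987654321", logger, detector,
                             CallFilter(action="block"))
```

Keeping the filter separate from the detector mirrors the possibility that the two run on different devices, for example the detector in the operator network and the filter on a phone.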
[44] All the modules of phone spam detection system 300 may be located at one device, or may be distributed over different devices. For example, call history logging module 310 may be located centrally in the operator's network or in a distributed manner on several smartphones, spam detector 320 may be located within the telecom operator network, and call filter 330 may be located within the telecom operator network or on a phone.
[45] Such a phone spam detection and filtering service may be offered by the telecom operator to its users. The offered service is then similar to an e-mail spam filter but applied in the context of phone calls. The service can also be offered by a third party that collects the phone call logs from several smartphones, e.g., by distributing an app that provides both the logging module and the call filter module.
[46] FIG. 4 shows an exemplary system 400 where phone spam detection system 300 can be used. In FIG. 4, different types of phones or devices providing phone service, for example, mobile phones 410, network devices 420, and landline phones 430, are connected to a telecom operator 470 through wireless network 440, Internet 450, and telephone network 460, respectively. The spam detection system may be embodied as a standalone device within the telecom operator's core network, may be incorporated, for example, within local handsets or base stations at the customer location or servers at the telecom operator locations, or may run as a service on some device/computer connected to the Internet.
[47] The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
[48] Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
[49] Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
[50] Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
[51] Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
[52] As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
Claims
1. A method for detecting spam numbers, comprising:
selecting a plurality of phone numbers from a set of phone numbers stored in at least one memory;
generating a local call graph for a respective one of the plurality of phone numbers;
aggregating (120) the local call graphs to generate a call graph;
determining (130) a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph using a spam detector, the node corresponding to a phone number included in the generated call graph; and
determining (140) whether the phone number is a spam number responsive to the parameter.
2. The method of claim 1, wherein the determining a parameter comprises determining at least one of a local assortativity coefficient, betweenness centrality and PageRank.
3. The method of claim 2, wherein the determining a local assortativity coefficient comprises:
determining at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number.
4. The method of claim 3, wherein the phone number is determined to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) and (in,in) local assortativity coefficients are smaller than zero.
5. The method of claim 1, further comprising:
determining an out-degree of the node responsive to the generated call graph, wherein the determining the phone number to be a spam number is further responsive to the out-degree exceeding a threshold.
6. The method of claim 1, further comprising at least one of indicating a phone call from the spam number as a spam call to a receiver of the phone call; and blocking the phone call from the receiver.
7. The method of claim 1, wherein the local call graph includes a node representing the respective one of the plurality of users, edges representing incoming and outgoing calls of the respective one of the plurality of users, and nodes corresponding to callers of the incoming calls and callees of the outgoing calls.
8. An apparatus for detecting spam numbers, comprising
a call history logging module (310) configured to access a set of phone numbers stored in at least one memory; and
a spam detector (320) configured to
select a plurality of phone numbers from the accessed set of phone numbers,
generate a local call graph for a respective one of the plurality of phone numbers,
aggregate the local call graphs to generate a call graph,
determine a parameter, indicative of correlation between a degree of a node and degrees of linked nodes, responsive to the generated call graph, the node corresponding to a phone number included in the generated call graph, and
determine whether the phone number is a spam number responsive to the parameter.
9. The apparatus of claim 8, wherein the spam detector is configured to determine at least one of a local assortativity coefficient, betweenness centrality and PageRank.
10. The apparatus of claim 9, wherein the spam detector is configured to determine at least one of an (out,in) local assortativity coefficient, an (in,out) local assortativity coefficient, and an (in,in) local assortativity coefficient for the phone number.
11. The apparatus of claim 10, wherein the phone number is determined to be a spam number when the (out,in) local assortativity coefficient is greater than zero, and the (in,out) and (in,in) local assortativity coefficients are smaller than zero.
12. The apparatus of claim 8, wherein the spam detector is configured to determine an out-degree of the node responsive to the generated call graph, and to determine the phone number to be the spam number further responsive to the out-degree exceeding a threshold.
13. The apparatus of claim 8, further comprising a call filter (330) configured to perform at least one of indicate a phone call from the spam number as a spam call to a receiver of the phone call; and block the phone call from the receiver.
14. The apparatus of claim 8, wherein the local call graph includes a node representing the respective one of the plurality of users, edges representing incoming and outgoing calls of the respective one of the plurality of users, and nodes corresponding to callers of the incoming calls and callees of the outgoing calls.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14305904 | 2014-06-13 | ||
EP14305904.6 | 2014-06-13 | ||
EP14306683 | 2014-10-23 | ||
EP14306683.5 | 2014-10-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015189380A1 (en) | 2015-12-17 |
Family
ID=53366046
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2015/063155 WO2015189380A1 (en) | 2014-06-13 | 2015-06-12 | Method and apparatus for detecting and filtering undesirable phone calls |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2015189380A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080292077A1 (en) * | 2007-05-25 | 2008-11-27 | Alcatel Lucent | Detection of spam/telemarketing phone campaigns with impersonated caller identities in converged networks |
US20100124916A1 (en) * | 2008-11-20 | 2010-05-20 | Samsung Electronics Co., Ltd. | Apparatus and method for managing spam number in mobile communication terminal |
WO2011149846A1 (en) * | 2010-05-26 | 2011-12-01 | Google Inc. | Apparatus and method for identification of spam |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017148146A1 (en) * | 2016-03-01 | 2017-09-08 | 华为技术有限公司 | Method and device for preventing nuisance calls |
US10708416B2 (en) | 2017-03-28 | 2020-07-07 | AVAST Software s.r.o. | Identifying spam callers in call records |
US20220329663A1 (en) * | 2021-04-12 | 2022-10-13 | Rakuten Mobile, Inc. | Managing a software application |
US11736578B2 (en) * | 2021-04-12 | 2023-08-22 | Rakuten Mobile, Inc. | Managing a software application |
RU2762389C2 (en) * | 2021-05-07 | 2021-12-20 | Общество с ограниченной ответственностью "Алгоритм" | Method for recognizing a subscriber making unwanted calls and a method for handling an unwanted call |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15727686 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15727686 Country of ref document: EP Kind code of ref document: A1 |