WO2013189725A1

WO2013189725A1 - Method and system for spam detection and mitigation

Info

Publication number: WO2013189725A1
Application number: PCT/EP2013/061371
Authority: WO
Inventors: Joaquín GÓMEZ GONZALO; Antonio Agustín Pastor Perales; David Prieto Marques; Mª Antonia LÓPEZ AJENJO
Original assignee: Telefonica, S.A.
Priority date: 2012-06-21
Filing date: 2013-06-03
Publication date: 2013-12-27

Abstract

The method comprises computing means for capturing suspicious data traffic through a plurality of access nodes in a communication network, characterized in that it comprises: a) detecting, by a detector module, said suspicious data traffic passing through said plurality of access nodes in the communication network; and b) receiving and analysing, by a mitigation module, said suspicious data traffic detected, in order to block it in case said suspicious data traffic is infected, in real time at the origin of the network access node in said communication network. The system is arranged for implementing the method of the present invention.

Description

Method and system for SPAM detection and mitigation

Field of the art

The present invention generally relates, in a first aspect, to a method for SPAM detection and mitigation, and more specifically to a method for optimizing the performance of detecting and mitigating SPAM in a network.

A second aspect of the invention relates to a system arranged for implementing the method of the first aspect.

Prior State of the Art

SPAM: SPAM is the practice of sending unsolicited bulk messages indiscriminately. Currently, the most widespread way of distributing those messages is via e-mail.

Frequently the SPAM has a commercial content and is sent to an indiscriminate set of recipients that did not request those e-mails. This is why it is considered junk mail.

SPAM is growing exponentially year after year and it is estimated that today it composes around 85% of all the e-mails in the world. This happens because SPAM is economically profitable: spammers have no operating costs apart from managing the mailing lists.

A large percentage of the costs generated by the SPAM are borne by the ISPs, which have been forced to add extra capacity in their networks to manage the "extra" traffic. Apart from that, another major disadvantage for ISPs hosting spammers is the loss of reputation.

Currently, most of the SPAM in the world is generated by "zombie networks" or botnets.

BotNets: Robot networks are a set of infected PCs controlled centrally to carry out, in an orchestrated way, one (or many) of the following actions:

• Send SPAM.

• Carry out DDoS attacks.

• Get private/economic information from the victims.

When a PC is infected it becomes part of the botnet and, from that moment, it is waiting for the orders of the controller. Meanwhile, the legitimate owner of the PC does not realize what is really happening while he is doing his usual tasks. The main advantage of sending SPAM using a botnet is that the sender is not a unique PC, but hundreds or thousands of them, so it is very difficult to detect and mitigate it.

Anti-SPAM tools currently existing:

Currently there are several tools to fight SPAM. The most important are the following ones:

• Detection at the destination: When a user receives an email, the client application analyzes it to determine if it is SPAM or a legitimate mail.

• Detection at the mail server: The mail server analyzes all the emails it receives before sending them to the destination.

• Detection at the core of the network: A sample of the traffic that crosses the ISP network is analyzed looking for large amounts of traffic sent to port TCP/25.

Problems with existing solutions:

As described in the previous section there are several solutions to fight SPAM, but all of them have several problems to solve:

Performance limitation: When SPAM detection is carried out at a central point, performance or scaling problems usually arise. This is the case for the following two solutions:

• Detection at the mail server: All the mails sent to a domain must be analyzed at the same point: the mail server. This can lead to performance/scaling problems.

• Detection at the core of the network: The detection is carried out using equipment located in the core of the ISP network, so the amount of traffic that must be analyzed is very large. This can also lead to performance/scaling problems.

ISP bandwidth and network processing resource consumption: The farther from the source the detection/mitigation is performed, the more bandwidth and network processing resources are consumed by SPAM. When detection is performed at the destination, processing resources and even storage capacity in the client are consumed as well.

In all three existing solutions the detection/mitigation is carried out once the SPAM has already consumed some bandwidth and processing resources of the ISP core network, which can result in the need to increase the ISP's resources.

ISP reputation loss: When an ISP hosts a large number of spammers who send emails to external domains, the ISP suffers a significant loss of reputation.

The described solutions:

• Detection at the destination

• Detection at the mail server

allow the SPAM to be received by the destination so, from an external point of view, the ISP in which the sender is hosted is allowing the SPAM. This leads to an ISP reputation loss.

Inaccuracy: The solution based on detection at the core of the network samples the traffic across the network, so it does not analyze the whole information to detect the SPAM, but only a small part of it. This can lead to an inaccurate detection mechanism.

In addition, this solution identifies a user by his/her IP address, which can be very imprecise in those ISPs in which private addressing is used, since the address does not identify the user univocally.

Ineffective or nonexistent mitigation measures: The mitigation measures applied by the described solutions are the following ones:

• Detection at the destination / Detection at the mail server: Usually the mitigation consists of storing the SPAM e-mail in a SPAM folder.

• Detection at the core of the network: In ISPs in which this solution is deployed, the mitigation usually consists of dropping the spammer's traffic to port TCP/25.

Description of the Invention

It is necessary to offer an alternative to the state of the art which covers the gaps found therein, particularly related to the lack of proposals which really allow the detection and mitigation of SPAM in a network.

To that end, the present invention provides, in a first aspect, a method for SPAM detection and mitigation, performed in a user-centric Network Anomaly Detection System, comprising means for capturing suspicious data traffic through a plurality of access nodes in a communication network.

Contrary to the known proposals, the method of the first aspect of the invention comprises:

a) detecting, by a detector module, the suspicious data traffic passing through the plurality of access nodes in the communication network; and

b) receiving and analysing, by a mitigation module, the suspicious data traffic detected, in order to block it in case the suspicious data traffic is infected,

the steps a) and b) being performed in real time at the origin of the network access node in the communication network.

In a preferred embodiment, the method analyses a plurality of SMTP commands and a plurality of DNS MX packets of the suspicious data traffic.

In another preferred embodiment, the detection of the suspicious data traffic doesn't depend on the existence of a previously calculated behavior.

Other embodiments of the method of the first aspect of the invention are described according to appended claims 2 to 8, and in a subsequent section related to the detailed description of several embodiments.

A second aspect of the present invention generally comprises a system for SPAM detection and mitigation, performed in a user-centric Network Anomaly Detection System, comprising means for detecting suspicious data traffic through a plurality of access nodes in a communication network.

Contrary to the known proposals, the system of the second aspect of the present invention comprises:

- a detector module arranged to perform the detecting of suspicious data traffic from the communication network; and

- a mitigation module arranged to receive and analyze the suspicious data traffic detected, and in charge of blocking it in case the suspicious data traffic is infected.

In a preferred embodiment, the system has a probe module arranged to capture the suspicious data traffic from the communication network and to prepare the suspicious data traffic for the detection, and a detector library and a mitigator library used by the detector module and the mitigator module, respectively, in order to perform said detection and mitigation of the suspicious data traffic.

In another preferred embodiment, the system is integrated in a specific card or as a plug-in within the access node.

The system of the second aspect of the present invention is arranged to implement the method of the first aspect.

Other embodiments of the second aspect of the invention are described according to appended claims 9 to 15, and in a subsequent section related to the detailed description of several embodiments.

Brief Description of the Drawings

The previous and other advantages and features will be more fully understood from the following detailed description of embodiments, with reference to the attached drawings, which must be considered in an illustrative and non-limiting manner, in which:

Figure 1 shows an example of the SPAM-DMSON architecture proposed in the present invention.

Figure 2 shows an example of the detector and detector library components.

Figure 3 shows an example of the mitigator and mitigator library components.

Figure 4 shows the traffic analysis done by the SPAM-DMSON detection algorithm, according to an embodiment of the present invention.

Figure 5 shows the SPAM-DMSON counter evaluation performed by the SPAM-DMSON detection algorithm, according to an embodiment of the present invention.

Figure 6 shows an example of the SPAM-DMSON mitigation algorithm implemented as part of the mitigation library.

Detailed Description of Several Embodiments

The invention proposes a SPAM Detection and Mitigation System On-Net (SPAM-DMSON), comprising hardware and software equipment to be included in the network access nodes, for example, integrated in a specific card within the node.

This system will enable:

• Detecting infected customers generating SPAM at origin, since detection takes place at the network access nodes.

• Performing the traffic analysis in a lightweight manner in real time, since the system only inspects SMTP commands and DNS MX packets passing through the access/ISP network nodes.

• Mitigating suspected spammers, so that blocking legitimate customers and/or traffic is avoided. Like SPAM detection, mitigation is performed in real time, which implies a dynamic mitigation according to the traffic passing through the access node.

In a preferred embodiment, the proposed invention is based on a specific implementation of a Network Anomaly Detection System (NADS), specifically a user-centric NADS. The invention adds new functional modules for the detection and mitigation of suspected spammers with a specific SPAM detection algorithm, analyzing several kinds of traffic (SMTP and DNS). As a differential characteristic compared to a typical user-centric NADS, the detection algorithm doesn't need a training period and it doesn't depend on the existence of a previously calculated base behavior for the users. Likewise, mitigation is adapted to the traffic/detection situation at each moment.

The invention defines functions that currently are not provided by the network access/ISP nodes: SPAM detection and mitigation at origin in real time by a lightweight traffic analysis.

The SPAM Detection and Mitigation System (SPAM-DMSON) concerns a hardware and software system which implements a lightweight detection and mitigation algorithm in real time based on the inspection of the SMTP commands and DNS MX requests going through the access/ISP nodes, such as BRAS and GGSN, in the ISP network. This system will be included in these nodes integrated in a specific card or as a plug-in which can be added to an existing node.

The SPAM-DMSON architecture, including its components and the interaction between them, is depicted in Figure 1. The modules defined in the system are:

• PROBE: monitoring point, providing a copy of the network traffic to the DETECTOR.

• DETECTOR: receives the traffic from the PROBE module, being responsible for the invocation of the DETECTOR LIBRARY.

• DETECTOR LIBRARY: the library that implements the SPAM-DMSON Detection Algorithm to detect suspicious spammers.

• MITIGATOR: receives the SMTP traffic passing through the access node, applying mitigation according to the SPAM-DMSON Mitigation Algorithm, in charge of blocking illegitimate traffic.

• MITIGATOR LIBRARY: library that implements the SPAM-DMSON Mitigation Algorithm.

PROBE:

Since the detection carried out by the SPAM-DMSON needs a copy of the traffic passing through the access node, this copy will be taken on a per-software basis, using a unique physical access to the network traffic. The PROBE component is defined to capture those packets from the physical media and prepare them for the detection algorithm.

DETECTOR and DETECTOR LIBRARY:

Once the network traffic is aggregated into flows, it is ready to be analyzed using the SPAM-DMSON detection algorithm, implemented by the DETECTOR LIBRARY. The DETECTOR module, using the detection mode, forwards the flows to the DETECTOR LIBRARY. Furthermore, the detection algorithm implemented as part of the DETECTOR LIBRARY is able to store the results of this detection.

The relationship between the DETECTOR and the DETECTOR LIBRARY includes the possibility of sending alerts to logging facilities.

MITIGATOR and MITIGATOR LIBRARY:

Once the DETECTOR has fulfilled its function of detecting suspicious spammer users, the associated SMTP traffic will be mitigated, for example by being blocked, using the component called MITIGATOR. This element analyses the traffic passing through the access node and invokes the MITIGATOR LIBRARY, which implements the SPAM-DMSON Mitigation Algorithm. The MITIGATOR LIBRARY needs the information previously stored as a result of the detection algorithm to allow or block the suspected SMTP traffic. Besides, the MITIGATOR LIBRARY can be used to dynamically change the mitigation algorithm and its settings. Figure 3 shows the MITIGATOR and MITIGATOR LIBRARY components.

1) SPAM-DMSON Detection Algorithm:

As said before, the SPAM-DMSON detection algorithm is implemented as part of the DETECTOR LIBRARY. The flows received by the DETECTOR from the PROBE, to be processed by the DETECTOR LIBRARY, follow the DETECTOR interface. This includes the availability, for each flow, of fields such as source and destination IP addresses, source and destination TCP or UDP ports, or certain application header data, among others, which make possible the analysis performed by the detection algorithm.

The objective of this algorithm is to detect suspicious spammers in a lightweight manner by monitoring SMTP commands and DNS MX requests. User equipment controlled by a SPAM bot could act as an SMTP server trying to send a lot of mails to configured legitimate SMTP servers over a period of time, so this algorithm is based on detecting this SMTP server-to-server communication within the access/ISP network. In order to determine whether an SMTP server belongs to the access/ISP network node that analyses its traffic, the network node applying the detection algorithm should request this information from the appropriate internal ISP platforms.

Although the SPAM bot behavior could come in different flavours and the SPAM mail stream might not be continuous, with long silent intervals in which nothing is sent, the SPAM-DMSON detection algorithm will be able to detect these behaviors since it evaluates several counters within an interval of time in order to establish the list of malicious users. Some of these detection counters/features to evaluate include:

• Monitored server ID, which constitutes the identity of the potential spammer. In general, this identity will be composed of an IP address but, depending on the case, it may be based on other communication features.

• Number of sent mails.

• Number of requested domains.

• Number of SMTP error responses with code 553 ("Requested action not taken: mailbox name not allowed").

• Number of SMTP error responses with code 421 ("Service not available, closing transmission channel").

• List with the origin domains the server seemed to deal with.
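The counters listed above could be held in a small per-ID record. The following is only a minimal sketch: the class name, the field names and the choice of a set for the origin domains are illustrative assumptions and are not taken from the described embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionCounters:
    """Counters kept for one monitored ID during one detection interval.

    All names here are hypothetical; the text only lists the quantities.
    """
    monitored_id: str           # identity of the potential spammer (often an IP address)
    sent_mails: int = 0         # number of sent mails
    requested_domains: int = 0  # number of DNS MX domain requests observed
    err_553: int = 0            # SMTP 553 error responses
    err_421: int = 0            # SMTP 421 error responses
    origin_domains: set = field(default_factory=set)  # origin domains the server dealt with

    def reset(self) -> None:
        """Clear every counter while keeping the monitored ID."""
        self.sent_mails = 0
        self.requested_domains = 0
        self.err_553 = 0
        self.err_421 = 0
        self.origin_domains.clear()
```

Keeping the origin domains as a set means that the number of distinct origin domains is simply the size of the set.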

As said before, a general suspected spammer identity, called "ID", will be used in this algorithm. Depending on how the identification is established, different issues may arise; for example, when the ID is based only on the user IP address, issues related to dynamic IP assignment appear. Consequently, this ID should be defined univocally at the beginning of the algorithm according to the transmission/communication characteristics handled by the access/ISP node applying this algorithm. Among the features that can be used to define the user ID are the following (without limitation): the MAC address; the user IP address; a combination of the user IP address and the transmission timestamp, which requires a request to an external system such as RADIUS; a combination of the IP address and other transmission features within the access node for that traffic, such as the sub-interface or the port used for that communication, etc.
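As an illustration of how such an ID might be composed from whichever features the access node has available, consider the following sketch. The function name, its parameters, the string format and the order of preference among the features are all assumptions made for this example; the text does not prescribe any of them.

```python
from typing import Optional


def build_user_id(ip: str,
                  mac: Optional[str] = None,
                  session_start: Optional[int] = None,
                  port: Optional[str] = None) -> str:
    """Compose an identifier from the features available at the access node.

    Hypothetical preference order: MAC address, then IP plus a session
    timestamp (for example obtained from RADIUS), then IP plus the
    sub-interface or port, and finally the bare IP address.
    """
    if mac:
        return f"mac:{mac.lower()}"
    if session_start is not None:
        return f"ip:{ip}@{session_start}"
    if port:
        return f"ip:{ip}/port:{port}"
    return f"ip:{ip}"
```

With an ID based on the IP address alone, two customers who receive the same dynamic address in succession would share one identity, which is the issue that the combined forms are meant to avoid.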

Traffic analysis done by the algorithm is depicted in figure 4.

Taking the figure as a basis, the data flow (1) is analyzed to obtain the above counters as follows:

• UDP traffic to destination port 53 (2), to detect the number of requested domains through the detection of DNS MX domain requests. This populates the above list with the related counter associated with a user ID within the access network.

• SMTP traffic originated at port 25 (3), that is, the origin of the packet is an SMTP server answering an SMTP client/server. In this case, SMTP error response codes (553 and 421, (4) and (5) respectively) sent by that SMTP server are added to their corresponding counters in the above list. These counters indicate several failed attempts to send an e-mail, so when this happens continuously during an interval, it can be assumed that the origin is trying to forge the transmission.

• SMTP traffic to destination port 25 (6), to allow the classification of the origin as client or server.

When the SMTP transfer includes an authentication phase (AUTH tag) or a TLS communication (STARTTLS tag) between both nodes, it is assumed that the origin constitutes an SMTP client (7). In this case, if the SMTP destination server ID belongs to the access network (8), the algorithm obtains the domains it serves, by analyzing the sender (a client) domain from the SMTP command "MAIL (FROM)", and stores these data in a separate list: the Server Domain List/Database. The Server Domain List will be taken into account in the mitigation phase.

Otherwise, the transfer will be treated as an SMTP server-to-server communication (9). Similarly to the client-to-server communication through port 25, if the SMTP destination server ID belongs to the access network (10), the algorithm stores the domains it serves in the Server Domain List. In this case, the domains served by the destination server are inferred from the SMTP commands "RCPT (TO)". Besides, if the origin ID belongs to the access network (11), the algorithm analyses the origin domains served by the SMTP origin server (12), extracted from the SMTP command "MAIL (FROM)", and the number of mail destinations (13), from the SMTP command "RCPT (TO)", to add both data to the related counters for that origin server ID (List with Origin Domains and Number of Sent Mails, respectively).
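The traffic analysis described above can be sketched as a single classification function. This is a simplified illustration under several assumptions: the Flow record and its field names are hypothetical, the SMTP commands and domains are taken as already parsed, and error responses are charged to the party that received them (the text leaves open which ID the 553/421 counters are attached to). The numbers in the comments refer to the steps named in the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional

SMTP_PORT, DNS_PORT = 25, 53


@dataclass
class Flow:
    """Hypothetical, already-parsed flow record (illustrative field names)."""
    src_id: str
    dst_id: str
    proto: str                        # "tcp" or "udp"
    src_port: int
    dst_port: int
    is_mx_query: bool = False         # DNS query of type MX
    reply_code: Optional[int] = None  # SMTP reply code of a server response
    commands: List[str] = field(default_factory=list)      # SMTP verbs seen
    mail_from_domain: str = ""        # domain from MAIL (FROM)
    rcpt_domains: List[str] = field(default_factory=list)  # one per RCPT (TO)


def new_counters() -> dict:
    return {"sent_mails": 0, "requested_domains": 0,
            "err_553": 0, "err_421": 0, "origin_domains": set()}


def analyze(flow, local_ids, counters, server_domains) -> None:
    """Update the detection counters and the Server Domain List from one flow."""
    if flow.proto == "udp" and flow.dst_port == DNS_PORT:
        if flow.is_mx_query:                                   # (2)
            counters[flow.src_id]["requested_domains"] += 1
    elif flow.proto == "tcp" and flow.src_port == SMTP_PORT:   # (3) server reply
        # Assumption: the error counts against the receiver of the reply.
        if flow.reply_code == 553:                             # (4)
            counters[flow.dst_id]["err_553"] += 1
        elif flow.reply_code == 421:                           # (5)
            counters[flow.dst_id]["err_421"] += 1
    elif flow.proto == "tcp" and flow.dst_port == SMTP_PORT:   # (6)
        if "AUTH" in flow.commands or "STARTTLS" in flow.commands:  # (7) client
            if flow.dst_id in local_ids and flow.mail_from_domain:  # (8)
                server_domains[flow.dst_id].add(flow.mail_from_domain)
        else:                                                  # (9) server-to-server
            if flow.dst_id in local_ids:                       # (10)
                server_domains[flow.dst_id].update(flow.rcpt_domains)
            if flow.src_id in local_ids:                       # (11)
                c = counters[flow.src_id]
                c["origin_domains"].add(flow.mail_from_domain)  # (12)
                c["sent_mails"] += len(flow.rcpt_domains)       # (13)


counters = defaultdict(new_counters)   # per-ID detection counters
server_domains = defaultdict(set)      # Server Domain List
```

Here local_ids stands for the set of IDs that belong to the access network, which per the text would be obtained from the internal ISP platforms.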

At each configured interval, the detection counters are evaluated using the corresponding thresholds set for the algorithm execution. Thus, when the user SMTP traffic and DNS MX requests have increased substantially, exceeding the configured thresholds, it indicates that the user has probably been infected by a bot. In this case, the user ID will be added to the list of suspected spammers and stored in the Suspected Spammers Database, as shown in figure 5. In this figure, the thresholds corresponding to the counter tuple (Number of Sent Mails, Number of Requested Domains, Number of Err Codes 553, Number of Err Codes 421, Number of Origin Domains) are represented by the tuple (SM0, RD0, EC553, EC421, OD0). The detection counters, after evaluation, are reset.
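The periodic evaluation can be illustrated as follows. This is a sketch only: the way the five comparisons are combined is defined by figure 5, which is not reproduced here, so the rule used below (an ID is flagged as soon as any single counter exceeds its threshold) is an assumption, as are the function name and the dictionary layout of the counters.

```python
from typing import Dict, Set, Tuple

# Thresholds in the order used in the text: (SM0, RD0, EC553, EC421, OD0)
Thresholds = Tuple[int, int, int, int, int]


def evaluate_interval(counters: Dict[str, dict],
                      thresholds: Thresholds,
                      suspected: Set[str]) -> None:
    """Compare each ID's counters with the thresholds, then reset the counters."""
    for user_id, c in counters.items():
        values = (c["sent_mails"], c["requested_domains"],
                  c["err_553"], c["err_421"], len(c["origin_domains"]))
        # Assumed combination rule: exceeding any one threshold is enough.
        if any(v > t for v, t in zip(values, thresholds)):
            suspected.add(user_id)   # stored in the Suspected Spammers Database
    counters.clear()                 # detection counters are reset after evaluation
```

A stricter variant would replace any with all, so that an ID is flagged only when every counter exceeds its threshold; which of the two better matches the embodiment cannot be determined from the text alone.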

2) SPAM-DMSON Mitigation Algorithm:

In an embodiment, the SPAM-DMSON mitigation algorithm, implemented as part of the MITIGATOR LIBRARY, is described in figure 6.

Mitigation consists of dropping the SMTP traffic from a suspicious ID detected by the SPAM-DMSON detection algorithm. When SMTP traffic is detected (1) at the access node, if the origin user ID belongs to a malicious spammer (2) and it has no corresponding assigned server domain (3), that is, no associated domain previously stored in the Server Domain DB, then that SMTP flow will be dropped. In case this user ID has an assigned domain (4) but it does not match the one used in the origin mail (5), extracted from the SMTP command "MAIL (FROM)" and constituting the "Origin Domain", the associated SMTP flow will also be dropped. Otherwise, the SMTP traffic will be forwarded.

In the proposed algorithm, the mitigation action executed in the node is dropping suspected SMTP transmissions; however, other mitigation actions could be performed, such as redirecting this SMTP traffic to a specific server where it can be analyzed thoroughly.
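The decision taken for each SMTP flow can be sketched as a small function. The function name, the "drop"/"forward" return values and the representation of the Server Domain DB as a mapping from ID to a set of domains are assumptions made for illustration; the numbers in the comments refer to the steps named in the text.

```python
from typing import Dict, Set


def mitigate(origin_id: str,
             origin_domain: str,
             suspected: Set[str],
             server_domain_db: Dict[str, Set[str]]) -> str:
    """Return "drop" or "forward" for one SMTP flow seen at the access node."""
    if origin_id not in suspected:               # (2) not a suspected spammer
        return "forward"
    assigned = server_domain_db.get(origin_id)   # (3) assigned server domains, if any
    if not assigned:
        return "drop"                            # suspected and no assigned domain
    if origin_domain not in assigned:            # (4)(5) MAIL (FROM) domain mismatch
        return "drop"
    return "forward"                             # suspected, but sending for its own domain
```

An alternative action such as redirection to an analysis server could be modelled by returning a third value in place of "drop".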

It is worth pointing out that, at each detection interval, mitigation adapts dynamically to the new SPAM situation.

Advantages of the Invention:

Including the SPAM Detection and Mitigation System in the ISP network at the access nodes will provide the following advantages:

• SPAM detection and mitigation at the network edge, a functionality that currently is not provided by any equipment at this point.

• Quick and lightweight detection and mitigation in real time, to react as soon as possible and to avoid overloading the network node that will process the traffic.

• Mail content is not analyzed beyond the SMTP command level (HELO/EHLO, AUTH, STARTTLS, MAIL (FROM), RCPT (TO), etc.), which represents a substantial difference from other inventions based on SPAM content analysis. Furthermore, it also protects the privacy of user messages.

• Efficiency through processing distribution, since the MITIGATOR can easily work in a different physical device within the access node, different from the device/s used by the rest of the components in the architecture.

• Since detection is performed at origin in the network access nodes, suspected spammers belonging to the ISP network can be detected avoiding NAT (use of private IP addressing) related problems.

• Avoiding the blocking of legitimate customers and/or traffic (all traffic to port 25), improving customer satisfaction.

• Allowing that victim users affected by SPAM botnets can maintain their email service.

• Undesirable traffic will be mitigated, optimizing network resources and, obviously, impacting on the network dimensioning.

• Diminishing ISP costs related to customer tickets caused by the blocking of legitimate traffic.

• Providing an infrastructure protection against attacks.

• Improving ISP reputation.

A person skilled in the art could introduce changes and modifications in the embodiments described without departing from the scope of the invention as it is defined in the attached claims.

ACRONYMS

BRAS Broadband Remote Access Server

DB Database

DDoS Distributed Denial of Service

DNS Domain Name System

GGSN Gateway GPRS Support Node

ISP Internet Service Provider

MAC address Media Access Control address

NADS Network Anomaly Detection System

NAT Network Address Translation

RADIUS Remote Authentication Dial-In User Service

SMTP Simple Mail Transfer Protocol

SPAM Junk mail

SPAM-DMSON SPAM Detection and Mitigation System On-Net

TCP Transmission Control Protocol

UDP User Datagram Protocol

Claims

1. A method for SPAM detection and mitigation, performed in a user-centric Network Anomaly Detection System, said method comprising computing means for capturing suspicious data traffic through a plurality of access nodes in a communication network characterized in that it comprises:

a) detecting, by a detector module, said suspicious data traffic passing through said plurality of access nodes in the communication network; and

b) receiving and analysing, by a mitigation module, said suspicious data traffic detected, in order to block it in case said suspicious data traffic is infected, wherein said steps a) and b) are performed in real time at the origin of the network access node in said communication network.

2. The method according to claim 1, characterized in that it comprises in said step b) analyzing a plurality of SMTP commands and a plurality of DNS MX packets of said suspicious data traffic.

3. The method according to claim 2, wherein mail content beyond an SMTP command level is not analyzed.

4. The method according to claim 1 or 2, comprising storing said plurality of SMTP commands and DNS MX packets detected in order to subsequently allow them, or block them in case they are infected.

5. The method according to claim 1, characterized in that said suspicious data traffic is generated by a spammer.

6. The method according to claim 5, comprising performing said steps a) and b) without depending on previously calculated behavior of the spammer.

7. The method according to claim 1, characterized in that it further comprises adapting said detection to discontinuous streaming of suspicious data traffic and/or silent streaming periods of time by means of evaluating several counters within said periods of time.

8. The method according to claim 7, comprising keeping said several counters evaluated in a computer database, in order to establish a malicious identity spammer database.

9. A system for SPAM detection and mitigation, performed in a user-centric Network Anomaly Detection System, comprising means for detecting suspicious data traffic through a plurality of access nodes in a communication network characterized in that it comprises:

- a detector module arranged to perform said detecting of suspicious data traffic from the communication network; and

- a mitigation module arranged to receive and analyse said suspicious data traffic detected, and in charge of blocking it in case said suspicious data traffic is infected.

10. The system according to claim 9, comprising a probe module arranged to capture said suspicious data traffic from said communication network and to prepare said suspicious data traffic for said detection.

11. The system according to claim 10, wherein a detector library is arranged to said detector module in order to perform said detection of the suspicious data traffic.

12. The system according to claim 11, wherein a mitigator library is arranged to said mitigator module in order to perform said analysis of the suspicious data traffic.

13. The system according to claim 9, wherein said mitigator module is arranged in a same physical device within an access node.

14. The system according to claim 9, wherein said mitigator module is arranged in a different physical device within an access node.

15. The system according to claim 9, characterized in that it is integrated in a specific card or as a plug-in within said access node.