WO2011131228A1

WO2011131228A1 - Method and system for forming a network for spam free e-mail communication

Info

Publication number: WO2011131228A1
Application number: PCT/EP2010/055183
Authority: WO
Inventors: Pan Hui; Sufian Hameed
Original assignee: Deutsche Telekom Ag; Technische Universität Berlin; Georg-August-Universität Göttingen
Priority date: 2010-04-20
Filing date: 2010-04-20
Publication date: 2011-10-27
Also published as: DE112010005508T5

Abstract

A method and a system of forming a network for communication between a particular user who is a member of a particular community which preferably comprises two levels, with users outside of this particular community, comprises the features of: selecting at least one trusted user who is a member of another community outside of the particular community, wherein this other community preferably comprises two levels of users, and using the trusted user to vouch for communication between the particular user in the particular community and any user in this other community.

Description

METHOD AND SYSTEM FOR FORMING A NETWORK FOR SPAM FREE E-MAIL COMMUNICATION

The present invention relates to a method of forming a network for spam free communication between a particular user within a particular community and another user outside of this particular community. The present invention further relates to a system for forming such a network for spam free communication. The present invention further relates to a method and system for e-mail communication using such a network.

Collaboration (scientific, business or merely social) without e-mail is almost impossible today. However, the explosive growth of unsolicited e-mail (spam) in the past decade has made it impossible for e-mail communication to function without spam protection/filtering. Currently, spam e-mails largely outnumber legitimate ones, having increased from 65% in 2005 to 81% (200 billion spam messages daily) in 2009. Although researchers and practitioners have developed and deployed a broad variety of systems intended to prevent spam, spam remains a pressing large-scale problem. Existing approaches to combat spam fall roughly into four broad categories: a) content-based filtering, b) sender authentication approaches, c) header-based approaches and d) social network and trust based approaches. Any existing anti-spam system may comprise one or several of these approaches.

Content-based filtering is the most popular spam protection technique and is widely available in most free and commercial implementations. Content-based filtering uses heuristics and machine learning methods, based on filters and keywords, for spam recognition. Unfortunately, content-based filtering exhibits several problems which limit its usage. These problems include the intrinsic cost of initialization and continuous adaptation of the filters, as well as false positives and false negatives in the results.

Sender authentication techniques are used as a protection against forged sender or from addresses by the spammer. Sender authentication is an intrinsic requirement for all white list approaches, since otherwise a spammer can simply guess well-known e-mail addresses for the from: field. Under the umbrella of sender authentication techniques, a number of domain-based authentication systems have been developed to validate that a received e-mail actually came from the domain it claims to come from. According to one solution, the mail recipient requests a real-time challenge-based authentication protocol to validate the binding between individual domain names and legitimate mail sources for those domains.

The header-based approach examines the header of an e-mail to detect spam. This approach can be categorized into white lists and black lists. White list schemes collect a list of all the e-mail addresses that are trusted to be non-spammers. Any e-mail sourced from the addresses in the white list is delivered directly to the inbox. White listing is highly vulnerable to from address forgery, therefore it must be used together with source authentication schemes. Black list schemes, in contrast, store the IP addresses of all the spammers (e-mail addresses are easily forgeable and are ineffective 95% of the time) and refuse to accept e-mails from them. Manually generated lists have proved to be highly efficient but put quite a burden on an e-mail user to maintain them.

In recent years different approaches have been explored for spam detection using social networks and trust reputation systems. Boykin et al. [1] create a social network of friends in cyberspace based on the e-mails exchanged between them. The e-mails are classified using the local clustering properties of the social network. For spammers the clustering coefficient is very low, as they do not exchange e-mails with each other. In contrast, the clustering coefficient of a legitimate user is rather high. Their method is able to classify 53% of all the e-mails as spam or non-spam with 100% accuracy. However, the method is limited to offline analysis, and the remaining 47% of e-mails are left for other filtering techniques.

Mislove et al. [2] explore the use of trust relationships to thwart unwanted communication. They use the number of trust relationships a user has to limit the amount of unwanted communication that user can produce. Their system relies on existing trust networks to connect senders and receivers via chains of pair-wise trust relationships and uses a pair-wise, link-based credit scheme to impose a cost on the originator of unwanted communication. The scalability of this system is still not certain if it maintains a per-link credit scheme. Although it can be decentralized by introducing a central tracker component, it is not clear how scalable the system will be, since this part has not been evaluated. Also, this method would not work for functional IDs that exist only to receive data (e.g. scholarships@eurosys.org for student travel grants). If a user is bad at responding to or classifying e-mails (i.e. he does not care about sending either), then the sender is stuck waiting for a response.

Garriss et al. [3] describe the use of a white list of friends and an automatic white list of Friends-of-Friends (FoF) to increase the communication chance of white-listed senders only. By using this protocol, their system can accept almost 75% of received e-mails and prevent up to 88% of the false positives produced by existing spam filters. With the FoF protocol there is a 10% increase in accepted e-mails. When receiving e-mails from an unknown sender, i.e. from users other than friends or FoF, this system falls back on other existing schemes. Further, this system requires a lot of structural overhead. Each user needs to maintain a public/private key pair, which requires maintaining a system-wide PKI (public key infrastructure) and an authentication server for each domain. It would have been easier to use existing protocols like digitally signed e-mails than to go through all this architectural complexity.

In trust and reputation systems, network users try to calculate the reliability and trustworthiness of other users based on their own experiences and those of others. Boykin et al. [1] propose an automatic e-mail ranking system based on trust and reputation algorithms. Reputation algorithms provide a rating for each e-mail address, which can subsequently be used to sort the incoming e-mails. These ratings can be gained in two ways, globally and personally/locally. In the global scheme people share their personal ratings in such a way that a single global rating or reputation can be inferred for each e-mail address. In the personalized scheme, by contrast, the ratings (considered as trust) are different for each e-mail user and depend on the individual's personal social network. Chirita et al. [4] propose a spam detection system which is based on a trust and reputation scheme to classify e-mail addresses (apart from ranking e-mails as done in [1]) into spammer addresses and non-spammer addresses. It additionally determines the relative rank of an e-mail address with respect to other e-mail addresses. Trust and reputation systems are inherently subject to attacks like identity spoofing, false accusation and collusion. These attacks are independent of a particular trust and reputation calculation metric and are primarily due to the lack of authentication and non-repudiation in standard trust and reputation solutions.

Each of the above approaches has certain disadvantages. In short, the spam protection systems used today only filter spam from the user's inbox (i.e. at the recipient's edge), but the spam has already travelled the network and imposes a non-negligible cost on network operators in terms of bandwidth and infrastructure. On the other hand, content-based filtering, one of the most widely adopted defense mechanisms, has turned the spam problem into a problem of false positives and false negatives. In consequence, this makes e-mail delivery unreliable. A false negative occurs when spam is classified as legitimate and placed into the inbox. A false positive can cause a very serious problem, as an important and legitimate e-mail may be misclassified and may not be received on time.

The above proposals [1, 2, 3] use social networks (from now on, social network, circle and community are used interchangeably) to fight spam. Their services are limited to the social network of an e-mail user.

The object of the present invention is to provide a method and a system for forming a network for communication, in particular e-mail communication which improves protection against unsolicited e-mail like spam.

This object is achieved with the features of the independent claims. The dependent claims relate to further aspects of the present invention.

The present invention is based on the concept of letting users select trusted users, in the following called Gatekeepers (GKs), from outside their social circle and within pre-defined social distances. Unless a Gatekeeper vouches for the e-mails of potential senders from outside the social circle of a particular recipient, those e-mails are prevented from transmission. In this way, the present invention drastically reduces the consumption of internet bandwidth by spam to control messages only. By using publicly available online social network data sets, it can be demonstrated that reliable e-mail delivery from millions of potential users is possible using Gatekeepers in the order of hundreds.

The spam protection system according to the present invention leverages an anti-social networking paradigm based on an underlying trust infrastructure both to extend spam protection beyond a user's social circle and to fundamentally prevent the transmission of spam across a network in the first place. For any particular user within a particular community, the present invention may handle e-mails in two different ways based on their origins, i.e., sent from another user within the particular community or sent by a user outside the particular community.

All the e-mails generated within the user's particular community are authorized and trusted to be legitimate messages, and will pass directly into the inbox. Preferably, the social components of a community consist of two levels, i.e. the friends of the user and its friends-of-friends (FoF). A user A can add user B as a friend. A friend addition roughly corresponds to the notion that "User A trusts user B not to send him spam and vice versa". The addition of FoF into a social community is also useful for spam fighting in the case where a sender and a recipient are not already friends, but instead share a common friend. Suppose A and B are friends and B and C are friends; on this basis A may conclude that C is unlikely to be a spammer as well. The intra-community communication according to the present invention is a kind of white listing application.

In order to extend spam-free e-mails beyond a recipient's social network, the present invention introduces a process called anti-social networking for selecting trusted users, i.e., Gatekeepers (GKs). Subtracting a user's social network from the overall social graph reveals its anti-social network. In particular, the term anti-social networking is used to describe the process that a user performs to select trusted GKs outside its social network. GKs are selected to be socially separated from the user by predefined or user-specified social distances. Once selected, the GKs are used to vouch for all incoming e-mails from outside a recipient's social network. The selection process of GKs assures that they are not malicious users and that legitimate unknown senders can reach a recipient with the help of its GKs. According to an aspect of the present invention, the number of GKs can be optimized to select a minimum number of GKs for a maximum outside reachability.

The present invention can be integrated into current e-mail systems with popular e-mail clients and SMTP servers.

Furthermore, the present invention can be combined with other existing protection systems. In particular, the present invention may also use sender authentication to prevent from address forgery. A solution for a source authentication can be similar to the system "DKIM" described by Allman et al. [5] or the system "SPF" described by M. W. Wong [6].

With the help of hundreds of GKs, a recipient can be possibly reached by millions of users. The solution can be scalably extended to users with larger social distances by iterative GK selection.

The invention will be further described by way of example with reference to preferred embodiments and the drawings, in which:

Fig. 1 is a block diagram of an example how to implement a spam protection system according to the present invention into an existing system,

Fig. 2 shows a community structure of a user 1,

Fig. 3 shows an example for the community formation of user 1,

Fig. 4 shows a community structure of a recipient node and Gatekeeper coverage,

Fig. 5 is a diagram of an example for a Gatekeeper authentication and certification,

Fig. 6 shows the diagram of an example for extension of a Gatekeeper selection procedure beyond adjacent communities,

Fig. 7 shows an example of an e-mail processing model,

Fig. 8 shows a diagram of a Diffie-Hellman key exchange protocol,

Fig. 9 shows an example of a modified Station-to-Station (STS) protocol,

Fig. 10 shows an example for an e-mail processing model according to the present invention,

Fig. 11 shows two diagrams showing the results of experiments on Facebook samples, and

Fig. 12 shows two diagrams showing the results of experiments on Flickr samples.

The method of forming a network according to the present invention is described with reference to the following preferred embodiments.

The first part of this embodiment consists of the formation of a user's social network, also termed as community formation. All the emails generated within the community are authorized and trusted as legitimate messages and are passed directly to the inbox. In order to receive legitimate messages from outside the community a user will do anti-social networking to select Gatekeepers (GKs), which constitutes the second part of this embodiment.

In order to be robust, secure and efficient, the network is preferably formed taking into account one or several of the following design principles:

1. Simple and efficient design: The simpler the system is in terms of usability and design the more efficiently the system can be used and adopted. When fighting against spam, it is not required to have a very complex and robust cryptographic solution with huge infrastructure cost.

2. Decentralized Solution: It is difficult to scale a centralized solution. Every user works out individually to form its own community and select its GKs.

3. Knowledge of the network: A user cannot obtain full information about the global properties of the whole social network, such as the network diameter, central nodes and node degree distribution.

4. Sender address non-forgeability: Basic SMTP does not provide any from address authentication. Therefore white listing is vulnerable to spam attacks using arbitrary from addresses. The present invention utilizes standard sender authentication techniques to robustly verify that the from address in the received email is not forged.

5. Privacy of community list information: The privacy of the community list of each user must always be protected against any external (outside of community) threats, and the lists should not be exchanged freely at the time of community formation. An individual node is not allowed to possess unnecessarily much information about the network, which may induce privacy and security black holes.

6. Incrementally deployable: The present invention can be integrated easily into current SMTP servers. Inevitably, when deployed, some users will adopt the present invention before others. The deployment of the present invention does not worsen the spam problem for those who have not adopted it. Until every user is familiar with the present invention, it is better to run it complementarily with the existing spam filter. Users who have fully adopted the present invention will get its full benefits; others will benefit to the extent of their adoption.

Fig. 1 shows a block diagram of an existing protection system into which the protection system according to the present invention is integrated. Block 1, which receives any e-mail, comprises means for source authentication. This source authentication can be performed by using a known system, for example "SPF" as described by M. W. Wong [6] or "DKIM" as described by Allman et al. [5]. In case the sender of the incoming e-mail cannot be authenticated, the e-mail is forwarded to the trash. Otherwise, the e-mail follows a default path and reaches block 2, which represents a system for spam protection realized according to the present invention. Depending on the analysis, the incoming e-mail is either directly forwarded to the inbox of a recipient or forwarded along a default path to an existing spam filter realized in block 3. This spam filter may be a content filter, or may comprise black lists or other tools for protecting against unsolicited e-mail. Depending on the outcome of this filtering, the e-mail is forwarded to the inbox of the recipient or sent to the trash.
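The three-block flow of Fig. 1 can be illustrated with a minimal sketch. The function name, the enum and the three boolean inputs are hypothetical stand-ins for the outcomes of block 1 (source authentication), block 2 (community/GK check) and block 3 (existing spam filter); none of them is defined in the description.

```python
from enum import Enum


class Verdict(Enum):
    INBOX = "inbox"
    TRASH = "trash"


def route_email(sender_authenticated: bool,
                vouched_by_community_or_gk: bool,
                legacy_filter_says_spam: bool) -> Verdict:
    """Hypothetical routing of one e-mail through blocks 1 to 3 of Fig. 1."""
    # Block 1: source authentication (e.g. SPF or DKIM); failures go to trash.
    if not sender_authenticated:
        return Verdict.TRASH
    # Block 2: sender is in the recipient's community or is vouched for
    # by one of the recipient's Gatekeepers, so deliver directly.
    if vouched_by_community_or_gk:
        return Verdict.INBOX
    # Block 3: default path through the existing spam filter.
    return Verdict.TRASH if legacy_filter_says_spam else Verdict.INBOX
```

In this sketch an unauthenticated sender is rejected before any community information is consulted, which mirrors the order of the blocks in the figure.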

Figure 2 depicts the community structure for user 1. The community of user 1 is indicated by a dashed line. Direct connections from user 1 to neighboring nodes 2, 3, 7, 9 and 11 represent connections to friends of user 1. The community further comprises the nodes of level 2, each of which represents a friend-of-friend (FoF). The nodes at the boundary of the community are also called boundary nodes in the following. Each boundary node has connections to nodes outside of the community, wherein any node outside of the community may belong to another community. According to an example of the present invention, user 1, i.e. a particular user, can receive all the messages from users within its particular community directly into its inbox. The formation of a social community (which also serves as white list/CommList) is a simple two-step process. Figure 3 depicts both steps of the community formation process.

1. Adding Friends: The first step starts with the initiation of a friend request. Anyone can request friendship with anyone else. Adding a friend is a very basic yet extremely crucial step. It is assumed that only two nodes having mutual trust in each other will enter into a friend relationship (as in MSN, Skype or Facebook). System security and defense against attacks from malicious users depend on the fact that a friend relationship is always formed between two legitimate users having a proven record of social interaction.

2. Adding FoF: The idea of FoF addition is that there will be no exchange of friend lists among the friends. Instead, any user can suggest to its friends (mutually exclusive) that they add each other into their communities as FoF. For instance, in Figure 3, user 2 has two mutually exclusive friends, so user 2 will suggest to both 3 and 1 that they join an FoF relationship. If both 1 and 3 accept the suggestion, they will add each other into their communities.

At the end of step 2, a community structure for all the nodes with friends and FoFs is formed. All the communities preferably consist of only 2 levels of social components which are considerably close. However, according to alternative embodiments the different communities, i.e. a particular community, any adjacent community and any further community, may comprise a different number of levels than another community, i.e. 1, 2, 3 or more levels. For example, one adjacent community may consist of 2 levels, whereas other adjacent communities may consist of 3 levels, 4 levels or more, and a further community may consist of 1 level.
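The two-step community formation can be sketched as follows. This is only an illustration under stated assumptions: the class `User`, the functions `add_friend` and `suggest_fofs`, and the `accept` callback are hypothetical names, and "mutually exclusive friends" is interpreted here as friends of the introducer who are not yet friends with each other.

```python
from itertools import combinations


class User:
    """Hypothetical per-user state; the CommList is friends plus FoFs."""

    def __init__(self, uid):
        self.uid = uid
        self.friends = set()
        self.fofs = set()

    def community(self):
        return self.friends | self.fofs


def add_friend(a, b):
    # Step 1: a friend relationship is mutual.
    a.friends.add(b.uid)
    b.friends.add(a.uid)
    # A user promoted to friend no longer needs an FoF entry.
    a.fofs.discard(b.uid)
    b.fofs.discard(a.uid)


def suggest_fofs(introducer, users, accept=lambda me, other: True):
    # Step 2: the introducer suggests pairs of its friends to each other.
    # No friend list is exchanged; each pair only learns about one user.
    for x, y in combinations(sorted(introducer.friends), 2):
        ux, uy = users[x], users[y]
        if y in ux.friends:
            continue  # already friends, nothing to suggest
        # Both sides must accept before either community changes.
        if accept(ux, uy) and accept(uy, ux):
            ux.fofs.add(y)
            uy.fofs.add(x)
```

With users 1, 2 and 3 where user 2 is a friend of both 1 and 3 (the situation of Figure 3), calling `suggest_fofs` for user 2 adds 1 and 3 to each other's communities, provided both accept.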

During the entire process of community formation only local information of direct neighbors is used, and the process is carried out in a decentralized manner at each individual user level. Furthermore, to protect the privacy of each user, there is no exchange of friend lists among the users without consensus. By design, community formation is a selective process and may require some human involvement to prevent any unnecessary additions to communities and to preserve a high level of privacy.

In order to counter spam and receive legitimate messages beyond a user's social circle, the present invention performs anti-social networking to select Gatekeepers (GKs). The term anti-social networking is used to describe the process of a user (recipient) selecting GKs outside its social network. A GK is selected to be socially separated from the user on the social graph. GKs are legitimate and authenticated email users lying outside the social community of the recipient. The role of a GK is to vouch for legitimate users outside the community of the recipient for communication. Any email from outside the recipient's community can reach its inbox only if one of its authenticated GKs vouches for it. To maintain a reliable trust structure, a GK is only authorized to vouch for the nodes in its own community. Since a recipient node can only be reached from outside if its GK has vouched for the communication, it is necessary to find enough GKs to make the recipient node highly reachable from outside. In order to keep the system effective and scalable, the goal of the GK selection process is to select a minimum number of GKs for maximum coverage. In the email system, a GK-enabled user does not have to vouch manually for any other user; instead, the mail server hosting it will realize the concept of the GK and the trust it inherits, and will perform its duties transparently.

All the email users may be considered as a connected network and visualized as a graph G = (V, E), with email users as vertices (V) and their relationships (i.e. friends) as edges (E). For every recipient node in G, one needs to find a subset S of V (i.e. GK nodes) such that nearly every vertex not in the social community of the recipient node lies within at least one of the communities of the members of S. The size of S should be as small as possible.
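To make the coverage requirement on S concrete, the following sketch measures which fraction of the vertices outside a recipient's community lies in the community of at least one GK. It rests on simplifying assumptions that are not part of the description: the graph is an adjacency dictionary, a community is approximated by the plain two-hop neighborhood (in the described system FoF membership requires consent), and a GK counts as covering itself. The function names are hypothetical.

```python
def community(graph, node):
    """Two-level community approximated as the two-hop neighborhood."""
    friends = set(graph[node])
    fofs = set()
    for f in friends:
        fofs |= set(graph[f])
    return (friends | fofs) - {node}


def coverage(graph, recipient, gatekeepers):
    """Fraction of outside vertices covered by the GK set S."""
    inside = community(graph, recipient) | {recipient}
    outside = set(graph) - inside
    if not outside:
        return 1.0  # nothing to cover
    covered = set()
    for gk in gatekeepers:
        # Assumption: a GK vouches for its community and for itself.
        covered |= community(graph, gk) | {gk}
    return len(covered & outside) / len(outside)


# Path graph 1-2-3-4-5-6: the community of node 1 is {2, 3}.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(coverage(path, 1, [4]))
```

On the path graph, the single GK at node 4 covers all three outside nodes 4, 5 and 6, so the printed fraction is 1.0.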

There are more than 1.4 billion email users today. Finding a smallest subset S with maximum coverage therefore raises the question of scalability. The problem is also similar to the minimum dominating set problem, which is a classical NP-complete problem in computational complexity theory; the only difference is that a GK is connected to its community instead of its direct neighbors. The discussion above raises the following questions:

1. Is it a good practice to figure out the optimal (smallest size but biggest coverage) S of GK for the entire network at any single instance?

2. What is the chance that a particular email user has to be reached by all the other users of the entire email network (for instance a blacksmith in Africa trying to reach an Inuit in the arctic region)?

The answers to the above questions are both negative. Instead of working on a global provisioning of an optimal GK subset of V for the entire email network, the system uses a scalable approximation.

One of the design goals of the system is that a user cannot obtain information about the global properties of the social network. The best approach here is to restrict a user to its personal community information only. The GK selection procedure of the embodiment of the present invention consists of three stages as follows:

Stage 1: GK selection in adjacent communities

For any given recipient node, the GK selection process starts from the adjacent communities right outside the recipient node's social circle. The process is described below:

1. Request: A recipient node will use its FoFs (also known as the boundary nodes, as they are at the edge of the community) to help it find the locally optimal GKs outside its community (Figure 4). A recipient node will simply request all boundary nodes of its community to send their suggestions for good GKs.

2. Suggestion: Each boundary node will suggest to the recipient, as a GK, the user among its friends with the largest community (outside the recipient's community). It will also inform the suggested user about the recipient. Figure 4 depicts the selection of GKs by two boundary nodes of the recipient. Once the GKs pass the authentication step (next step), they will be able to vouch for all the users in their communities for communicating with the recipient. With the addition of the selected adjacent nodes as GKs, the reachability of the recipient now covers level 5. According to the small world property of social networks, any two users can be connected with a small number of hops. This suggests that if the email network exhibits social network behavior, the recipient node would be very highly reachable throughout.

3. Authentication: Once the boundary node suggests a GK to the recipient, the recipient will start a handshake with the GK for mutual authentication, i.e. verification that both are legitimate users, and to establish a secret key SK. Once the GK node has an SK, it will use it to issue signatures to all of its community members, and they will use these signatures if they need to communicate with the recipient node (see Figure 5). All the users within a social radius (level or hops) of 5 would be able to send emails to the recipient with an assurance of being free from spam. Distant users having a social distance greater than 5 are covered in stage 2 of the GK selection process.
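The request and suggestion steps of stage 1 can be sketched as a purely local computation. The sketch assumes a global adjacency dictionary only for brevity (in the described system each boundary node knows just its own neighborhood), approximates a community by the two-hop neighborhood, and uses hypothetical function names. Authentication is not modelled here.

```python
def two_level_community(graph, node):
    """Friends and friends-of-friends of node (two-hop approximation)."""
    friends = set(graph[node])
    fofs = set()
    for f in friends:
        fofs |= set(graph[f])
    return (friends | fofs) - {node}


def stage1_suggestions(graph, recipient):
    """Map each boundary node to the GK candidate it would suggest."""
    friends = set(graph[recipient])
    own = two_level_community(graph, recipient) | {recipient}
    boundary = own - friends - {recipient}  # the FoF level
    suggestions = {}
    for b in sorted(boundary):
        # Only friends of the boundary node outside the recipient's
        # community qualify as Gatekeepers.
        candidates = sorted(f for f in graph[b] if f not in own)
        if not candidates:
            continue
        # Pick the candidate whose community adds the most outside users.
        suggestions[b] = max(
            candidates,
            key=lambda c: len(two_level_community(graph, c) - own),
        )
    return suggestions


graph = {1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3, 6, 7],
         5: [3, 8], 6: [4], 7: [4], 8: [5]}
print(stage1_suggestions(graph, 1))
```

In this example node 3 is the only boundary node of recipient 1. Candidate 4 would bring three new users (5, 6, 7) while candidate 5 would bring two (4, 8), so node 3 suggests node 4.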

Stage 2: GK selection beyond adjacent communities

In order to provide reachability to other distant users in the email network, the GK selection procedure of this embodiment according to the present invention can be easily extended to select GKs in distant (beyond adjacent) communities. The process is very simple (see Figure 6). Once a recipient finalizes the selection of GKs in the adjacent communities (stage 1), it will send a request to the selected GKs to help it look for further GKs from their adjacent communities. As a result of this request, the GKs will use their boundary nodes to find new locally optimal GKs in a distant community and send their suggestions back to the recipient. Finally, the recipient will authenticate the new set of GKs from social level 6 and extend its reachability to level 8. Using the same procedure, the extension of GK selection is possible at any further levels. Of course, these extensions do not come for free; how far to go is solely a design choice of the users of this system. They can specify how far they want to extend GK selection.
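The level numbers quoted for stages 1 and 2 (GKs at level 6, reachability up to level 8) follow from simple arithmetic if every community has the same depth and each new GK is a direct friend of a boundary node. The small sketch below states this best-case arithmetic; the function names and the uniform-depth assumption are illustrative only.

```python
def gk_level(stage: int, depth: int = 2) -> int:
    """Social level of the GKs selected in the given stage.

    Assumes each GK sits one hop beyond the boundary of the
    previously covered region and all communities have `depth` levels.
    """
    return stage * (depth + 1)


def reach(stage: int, depth: int = 2) -> int:
    """Largest social level covered after the given stage (best case)."""
    return gk_level(stage, depth) + depth


for stage in range(3):
    print(stage, gk_level(stage), reach(stage))
```

For two-level communities this yields a reach of 2 for the recipient's own community, 5 after stage 1 and 8 after stage 2, with the stage 2 GKs at level 6, in agreement with the figures given in the text.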

Garriss et al. [3] received 85% of the email correctly utilizing just its social network levels (friends and FoF). With anti-social networking using GKs, this embodiment according to the present invention extends reliable and spam free delivery of emails beyond the social network. The successful formation of social communities and continuous extension of the anti-social network between the email users will increase the efficiency of the system. Users having a larger social community would benefit more than isolated and less socially connected users.

Stage 3: Spontaneous GK selection to accommodate network dynamics

This embodiment according to the present invention, up to this point, has covered most of the aspects of communication to protect against spamming. There would rarely be any communication outside the social levels covered in stages 1 and 2, but the possibility always exists. For example, new email users may join the network, or a really distant user may want to start a new collaboration. This embodiment according to the present invention provides spam free email communication to distant and new users as follows. Instead of extending GK selection to the entire network, this example restricts it to the social levels covered in stages 1 and 2. If a user wants to send an email to a recipient who is not only outside its community but for whom there is also no GK within its community, the user will perform the following two steps:

1. Announcement: A further user in a further community will announce itself to the recipient, i.e. a particular user in a particular community that it wants to communicate and will start the authentication process.

2. Authentication: The further user will start the mutual authentication process to prove that it is a legitimate user and not a spammer. As a result of this process, the user will establish a secret key with the recipient and the recipient will add the user as its GK. The user will further use the secret key to issue signatures to its entire community nodes within its further community and they will be able to use these signatures as well to communicate with the recipient node.

This process is performed only once, at the start. After the user is authenticated to be a GK of the recipient, not only the further user but its entire further community can send email to the recipient. In this way, instead of having GKs for the entire network, GKs can be selected on the fly after stage 2 if any communication need arises.
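The issuance and checking of signatures derived from the shared secret key SK can be sketched as follows. The description does not fix a signature primitive at this point, so HMAC-SHA256 from the Python standard library is used purely as an assumed stand-in; the function names, the `|` separator and the dictionary standing in for the SKList are likewise hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_signature(sk: bytes, user_id: str, gk_id: str) -> str:
    """GK side: bind a community member's ID to the GK under the key SK."""
    # Assumption: IDs do not contain the separator character.
    message = f"{user_id}|{gk_id}".encode()
    return hmac.new(sk, message, hashlib.sha256).hexdigest()


def verify_signature(sk_list: dict, user_id: str, gk_id: str,
                     signature: str) -> bool:
    """Recipient side: recompute the tag with the SK stored for this GK."""
    sk = sk_list.get(gk_id)
    if sk is None:
        return False  # sender names a GK the recipient never authenticated
    expected = issue_signature(sk, user_id, gk_id)
    # Constant-time comparison avoids leaking how many characters match.
    return hmac.compare_digest(expected, signature)


# Hypothetical usage: the SK comes out of the mutual authentication.
sk = secrets.token_bytes(32)
sk_list = {"gk@example.org": sk}
sig = issue_signature(sk, "alice@example.org", "gk@example.org")
print(verify_signature(sk_list, "alice@example.org", "gk@example.org", sig))
```

Because the tag depends on the sender's ID, a signature issued to one community member cannot be reused by a different sender in this sketch, and a GK unknown to the recipient is rejected outright.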

In the present invention the classical dominating set or distributed dominating set approximation is not used to select the GKs, for two main reasons. First, there is preferably no common set of nodes to serve as GKs for the whole population. The reason is that these common GKs would have too much information about everyone in the network and would become privacy and security weak points. The second reason is that, considering the communication patterns, one does not expect everyone on the planet to communicate randomly with each other (an example is a blacksmith in Africa and an Inuit in Alaska). The probability of communication actually decreases with an increase in the social distance. Hence, it is not necessary to have a dominating set for the whole population.

Figure 7 depicts the overall flow of an email, i.e. from message creation through transport to delivery. The mail user agent (MUA), the sender's email client, submits the email to its mail server (MSA) using SMTP. The sender's MSA will look up the destination's mail exchanger record (MX) in the DNS server. The DNS server finds the highest preference mail server for the recipient and reports the name of the mail server by returning an MX resource record. After this point, a TCP connection is established between the sender's and the receiver's MSAs, and the sender's MSA sends the MAIL FROM command to the receiver. With successful acknowledgement from the receiver side, the complete email (header and body) is sent, and the TCP connection is released. The mail delivery agent (MDA) delivers the accepted email to a server for local mail delivery. Once delivered to the local mail server, the email is stored for batch retrieval by authenticated mail clients (MUAs) using IMAP or POP.

It is actually the mail servers (MSAs) that execute the protocol according to the present invention on behalf of the email users. Each MSA may serve hundreds or thousands of email users depending on the size of the organization. Each email user is only responsible for making decisions regarding its own community, i.e. adding friends and FoF. The community information of each user is stored in a CommList, and it can only be accessed by that particular user or its MSA. All the remaining functionalities of the present invention are handled by the MSA transparently to the email users, which include:

• Executing the GK selection protocol, including the mutual authentication of the recipients and G s and signature issuance by the G s to the users of their communities.

• Maintenance of CommList (list of User_1Ds of all the users in the community), SKList (list of GKID, RecipientiDs and shared secret keys SKs established between them) and SignList (list of signature issues by the GK to its community user to communicate with the recipient i.e. list of Sign[(User_1D)_SK, GKID]).

• Email filtering based on CommList and SignList. The functionalities mentioned above can be integrated with Sendmail (MTA) or Mail Avenger SMTP server and large email providers can also implement them on their email servers. All the legitimate MS As must have a valid certificate issued from a Trusted Authority. These certificates are later used to sign messages proving that they are originated from a legitimate MSA. Any MSA with non-valid certificate is assumed to be malicious and all the communication requests associated with those certificates are ignored. Furthermore, it is also safe to assume that it is very hard, if not impossible, for bots or malicious users to reside in a valid and legitimate MSA. The reason is that the addition of email users are very strictly moderated in companies, private institutes and universities ...etc. However, anyone can create a large number of account on webmail providers like gmail, yahoo, hotmail and gmx etc. The previous assumption, that the entire users within any certified MSA are considered to be legitimate, might raise question of human base spamming; since a human spammer will be able to create dummy accounts on webmail providers without any financial cost. But in reality it is not like that due to the following reasons:

• Creating and running a spamming account on Yahoo, Hotmail, Gmail etc. requires human effort, which incurs a cost that runs against the spamming model.

• Almost all webmail and Internet service providers impose an email sending limit.

Exceeding the limit results in the email account being blocked for a certain amount of time. Table 1 lists the email sending limits of some of the major Internet service and webmail providers. Apart from imposing limits on sending emails, webmail providers also block an email account for a certain time if its emails contain a large number of non-existent or broken addresses that bounce back on failed delivery.

EarthLink 1000 recipients per day

Cablevision/Optimum (OOL) 50 recipients at one time

Road Runner 1,000 recipients per day per IP

AT&T Yahoo 100 recipients per email message

Charter 50 recipients / emails per hour

Table 1: Email sending limits of major webmail and Internet service providers

The mutual authentication protocol is one of the most significant parts of the GK selection process. It ensures that both the GK and the recipient are legitimate users and helps to establish a shared secret key SK between them. The GK uses these SKs to issue signatures to its community members as a vouching mechanism for sending emails to the recipient. As mentioned earlier, the MSAs of the GK and the recipient carry out this protocol transparently to the email users, just as an MSA searches for the destination mail exchanger (MX) record in the DNS while an email is being sent. With the successful completion of this protocol, the email users establish SKs with all their legitimate GKs, and the members of the GKs' communities are able to send emails to the recipient using their GK's signature.

Two variations of the protocol are proposed herein, depending on where the recipient and its GK are hosted. For spam protection, verifying the legitimacy of a user is sufficient; running costly protocols to authenticate the identity of each user is not required.

If the recipient and its GK are hosted by the same MSA, the system only needs to establish a shared secret key SK between the recipient and its GK. This is based on the fact that any MSA certified by a Trusted Authority is assumed to host and serve only non-malicious users and to take care of all malicious activities itself.

The present embodiment uses the classical Diffie-Hellman key exchange protocol (D-H), because it can establish a shared secret key between two parties that have no prior knowledge of each other. In this case all communication stays within the MSA, i.e. it does not cross an insecure communication channel, so D-H's vulnerability to man-in-the-middle attacks is not a concern. The following is a general description of the protocol with reference to Figure 8 (see [7] for details). The D-H protocol uses the multiplicative group of integers modulo q, where q is a prime and p is a primitive root mod q.

• Recipient and its GK agree on a finite cyclic group G and a generating element p in G.

• GK picks a random natural number x and sends p^x to recipient.

• Recipient picks a random natural number y and sends p^y to GK.

• GK computes (p^y)^x.

• Recipient computes (p^x)^y.

Finally, both GK and recipient are in possession of p^(xy), which can be used as SK. The security of D-H is based on the discrete logarithm problem. In the present embodiment the GK always initiates the protocol.
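The exchange can be sketched in a few lines. This is only an illustration under stated assumptions: the modulus q = 23 and the primitive root p = 5 are toy values chosen for readability and are far too small for real use, and the function names are hypothetical.

```python
import secrets

Q = 23  # toy prime modulus q (illustrative only, far too small in practice)
P = 5   # a primitive root p modulo q


def dh_public(secret_exponent: int) -> int:
    """Compute the value sent to the peer: p^secret mod q."""
    return pow(P, secret_exponent, Q)


def dh_shared(peer_public: int, secret_exponent: int) -> int:
    """Raise the peer's value to the own secret: (p^peer)^secret mod q."""
    return pow(peer_public, secret_exponent, Q)


# GK initiates with x, recipient answers with y; both lie in 1..q-2.
x = secrets.randbelow(Q - 2) + 1
y = secrets.randbelow(Q - 2) + 1
gk_value = dh_public(x)         # p^x, sent to the recipient
recipient_value = dh_public(y)  # p^y, sent to the GK

sk_at_gk = dh_shared(recipient_value, x)   # (p^y)^x
sk_at_recipient = dh_shared(gk_value, y)   # (p^x)^y
assert sk_at_gk == sk_at_recipient         # both hold p^(xy) mod q
```

With the fixed exponents x = 4 and y = 3, both sides arrive at the shared value 18.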

If the recipient and its GK are hosted by different MSAs, the present embodiment runs an authenticated shared key establishment protocol to establish SK. In 1992 Whitfield Diffie, Paul C. van Oorschot and Michael J. Wiener presented the Station-to-Station (STS) protocol [8]. STS is based on the classical D-H protocol and provides entity authentication along with SK.

In the present embodiment the STS protocol is modified. Since the recipient and the GK belong to different MSAs, the system performs authentication at the server level to verify that both the recipient's and the GK's servers are legitimate and untampered. SK establishment is performed at the user level, the users being the recipient and the GK. The benefit of server-based authentication is that there is no need to authenticate at the user level using public key certificates; a lot of complexity is thus avoided because no system-wide PKI is needed. Authentication of the servers is enough to assume that the users hosted on them are also valid and legitimate, since, as far as the present application scenario is concerned, establishing the legitimacy of users is sufficient to fight spam and the identity of every user need not be authenticated. The following is a brief description of the modified protocol with reference to Figure 9 (see [8] for the actual STS protocol in detail). The protocol uses the multiplicative group of integers modulo q, where q is a prime and p is a primitive root mod q.

• Recipient and its GK agree on a finite cyclic group G and a generating element p in G.

• GK picks a random natural number x and sends p^x to recipient.

• Recipient generates a random natural number y and computes p^y. It further computes the shared secret key SK = (p^x)^y mod q. After that it concatenates the exponentials (p^y; p^x) (order is important), signs them using the private signature key of its MSA, and then encrypts them with SK (since the protocol is executed by the MSAs themselves, access to the private signature key is available). Finally it sends the cipher text along with its own exponential p^y and its MSA's certificate issued by the Trusted Authority to GK.

• GK computes the shared secret key SK = (p^y)^x mod q and decrypts and verifies recipient's signature.

• GK concatenates the exponentials (p^x; p^y) (order is important), signs them using its MSA's private signature key and then encrypts them with SK. Finally it sends the cipher text, concatenated with its MSA's certificate issued by the Trusted Authority, to the recipient.

• Recipient decrypts and verifies GK's signature.

The MSAs of GK and recipient are now mutually authenticated, and GK and recipient are trusted to be legitimate and share a secret key SK. Once the secret key SK is established, each MSA makes an entry in the SKList of its respective user (GK or recipient). Each SKList entry consists of the GK_ID, the Recipient_ID and the shared SK established between them.
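The message order of the modified protocol can be sketched as follows. Everything concrete in this sketch is an assumption made for illustration: the toy group (q = 23, p = 5), the textbook RSA key pairs standing in for each MSA's certified signature key, the XOR keystream standing in for encryption with SK, and all function names. Validation of the Trusted Authority certificates is omitted; the peer's public key is simply taken as trusted.

```python
import hashlib

Q, P = 23, 5  # toy D-H group: prime q and primitive root p (illustrative only)

# Toy textbook-RSA keys standing in for each MSA's certified signature key.
# (n, e) is the public part, d the private part. Not secure, no padding.
GK_MSA = {"n": 3233, "e": 17, "d": 2753}
RCPT_MSA = {"n": 2773, "e": 17, "d": 157}


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, key: dict) -> int:
    """Sign with the MSA's private exponent d."""
    return pow(digest_int(data, key["n"]), key["d"], key["n"])


def verify(data: bytes, sig: int, key: dict) -> bool:
    """Verify using only the public part (n, e)."""
    return pow(sig, key["e"], key["n"]) == digest_int(data, key["n"])


def xor_with_sk(sk: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream derived from SK."""
    stream = hashlib.sha256(str(sk).encode()).digest()
    return bytes(a ^ b for a, b in zip(data, stream))


def exponentials(first: int, second: int) -> bytes:
    return f"{first};{second}".encode()  # the order is significant


def run_handshake(x: int, y: int) -> int:
    """Run both sides with secrets x (GK) and y (recipient); return SK."""
    px = pow(P, x, Q)                      # GK -> recipient: p^x
    py = pow(P, y, Q)                      # recipient computes p^y
    sk_rcpt = pow(px, y, Q)                # SK = (p^x)^y mod q
    sig_r = sign(exponentials(py, px), RCPT_MSA)
    cipher_r = xor_with_sk(sk_rcpt, sig_r.to_bytes(2, "big"))

    sk_gk = pow(py, x, Q)                  # GK: SK = (p^y)^x mod q
    seen_r = int.from_bytes(xor_with_sk(sk_gk, cipher_r), "big")
    if not verify(exponentials(py, px), seen_r, RCPT_MSA):
        raise ValueError("recipient MSA signature rejected")

    sig_g = sign(exponentials(px, py), GK_MSA)
    cipher_g = xor_with_sk(sk_gk, sig_g.to_bytes(2, "big"))
    seen_g = int.from_bytes(xor_with_sk(sk_rcpt, cipher_g), "big")
    if not verify(exponentials(px, py), seen_g, GK_MSA):
        raise ValueError("GK MSA signature rejected")
    return sk_gk


print(run_handshake(4, 3))
```

Because each signature covers both exponentials and is encrypted under SK, a party that cannot derive SK or does not hold a certified signing key fails the verification step.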

Since addresses are not authenticated in SMTP, it may be easy for a spammer to launch a spam attack with forged From addresses that appear to come from the recipient's community. To solve this problem, the present embodiment uses the standard sender authentication techniques (for instance SPF [6] and DKIM [5]) which are already in use in the existing email system.

Once the GK has the SK, its MSA issues a signature Sign[(User_ID)_SK, GK_ID] to each user of the GK's community. These signatures are added to the SignList of the users, to be used later for communication with the recipient.
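Signature issuance can be sketched as below. The description does not name the primitive behind Sign[(User_ID)_SK, GK_ID], so this sketch assumes an HMAC-SHA256 tag computed over the User_ID with SK as the key, paired with the GK_ID. The function names, the example addresses and the layout of the SignList (a mapping from recipient to signature per user) are likewise assumptions.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# (tag over User_ID keyed with SK, GK_ID); the HMAC choice is an assumption.
Signature = Tuple[bytes, str]


def issue_signature(user_id: str, sk: bytes, gk_id: str) -> Signature:
    """Build one signature for a community member."""
    tag = hmac.new(sk, user_id.encode(), hashlib.sha256).digest()
    return (tag, gk_id)


def issue_to_community(
    community: List[str],
    sk: bytes,
    gk_id: str,
    recipient_id: str,
    sign_lists: Dict[str, Dict[str, Signature]],
) -> None:
    """Add a signature for recipient_id to the SignList of every member."""
    for user_id in community:
        member_list = sign_lists.setdefault(user_id, {})
        member_list[recipient_id] = issue_signature(user_id, sk, gk_id)


sign_lists: Dict[str, Dict[str, Signature]] = {}
issue_to_community(
    ["alice@a.example", "bob@a.example"],
    sk=b"\x01" * 32,
    gk_id="gk@a.example",
    recipient_id="rcpt@b.example",
    sign_lists=sign_lists,
)
print(sorted(sign_lists))
```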

The operation of the present embodiment is transparent to the users, like that of any other spam filter. It is now assumed that the social communities are defined and the GKs are selected. Each user stores its CommList, SKList and SignList at the local mail server. Two types of messages are sent and received, i.e. messages within the social community and messages to users outside it. The processing of both types is explained as follows with reference to Figure 10.

1. Message within community: If a message is sent to any recipient within the community, the message flows all the way to the receiver's MSA. At the MSA, the sender is verified against the recipient's CommList and the message is placed into the recipient's mailbox.

2. Message outside community with GK: If a message is sent to a receiver outside its community, the sender's MSA will bind a signature, issued by an authorized GK to communicate with the recipient, along with the message. When the message arrives at the recipient's MSA, it is verified using the SK of the GK listed in the signature. On successful verification it will place the message in the recipient's mailbox.

3. Message outside community without GK: If a message is intended for a recipient outside the sender's community and with no signature issued by any GK, the sender's MSA will hold the message and start a GK selection procedure (stage 3). The sender will be announced as a potential GK for its community and the GK authentication procedure will be carried out as discussed earlier. On successful completion the sender will be selected as a GK for the receiver and the MSA will now bind its signature with the withheld message and send it out. When the message arrives at the receiver's MSA, it will be verified using the SK of the GK listed in the signature and on successful verification it will place the message in the receiver's mailbox.
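The three cases above amount to a simple decision at the sender's MSA, sketched here. The enum, the function name and the representation of CommList and SignList as a set and a dictionary are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class Route(Enum):
    WITHIN_COMMUNITY = 1      # case 1: send as is
    SIGNED_BY_GK = 2          # case 2: bind the stored GK signature
    NEEDS_GK_SELECTION = 3    # case 3: hold message, start GK selection


def classify_outgoing(
    recipient_id: str,
    comm_list: Set[str],
    sign_list: Dict[str, Tuple[bytes, str]],
) -> Route:
    """Decide how the sender's MSA handles a message to recipient_id.

    comm_list is the sender's CommList; sign_list maps a recipient to the
    signature a GK issued for reaching that recipient.
    """
    if recipient_id in comm_list:
        return Route.WITHIN_COMMUNITY
    if recipient_id in sign_list:
        return Route.SIGNED_BY_GK
    return Route.NEEDS_GK_SELECTION


comm_list = {"friend@a.example"}
sign_list = {"rcpt@b.example": (b"tag", "gk@a.example")}
for target in ("friend@a.example", "rcpt@b.example", "stranger@c.example"):
    print(target, classify_outgoing(target, comm_list, sign_list).name)
```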

One of the main contributions of the present invention is that it prevents the transmission of spam across the network in the first place, which saves maintenance and infrastructure costs for the network operators. The flow of an email message from the sender's client to the receiver's inbox has been described above. To prevent spam transmission in the first place, the system adds a minor verification step. It is assumed that the sender's and receiver's MSAs have already established a TCP connection. When the sender's MSA sends the MAIL FROM command and the recipient is not in the sender's community, it appends a signature issued by a GK that is authorized to communicate with the recipient. At the recipient's end, the MSA verifies whether the sender in the MAIL FROM command is a community member. If the sender is a member, the receiver's MSA sends back an acknowledgement and the process continues. If the sender is not in the receiver's community, the MSA checks for a valid signature of a GK. Failure to present a valid signature results in termination of the TCP connection by the receiver, and the transmission of the email (header and body) does not take place.
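The receiver-side check at the MAIL FROM stage can be sketched as follows. The sketch assumes that the appended signature is an HMAC-SHA256 tag over the sender's ID keyed with the SK of the named GK; the description does not fix the primitive, and the function name and data layout are hypothetical. A return value of False corresponds to terminating the TCP connection before header and body are transferred.

```python
import hashlib
import hmac
from typing import Dict, Optional, Set, Tuple


def accept_mail_from(
    sender_id: str,
    comm_list: Set[str],
    sk_list: Dict[str, bytes],
    signature: Optional[Tuple[bytes, str]] = None,
) -> bool:
    """Return True to acknowledge MAIL FROM, False to drop the connection.

    comm_list is the recipient's CommList, sk_list maps a GK_ID to the SK
    shared with that GK, signature is the (tag, GK_ID) pair appended by the
    sender's MSA, if any.
    """
    if sender_id in comm_list:
        return True                     # community member: acknowledge
    if signature is None:
        return False                    # outsider without a signature
    tag, gk_id = signature
    sk = sk_list.get(gk_id)
    if sk is None:
        return False                    # signature names an unknown GK
    expected = hmac.new(sk, sender_id.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


sk = b"\x02" * 32
good = (hmac.new(sk, b"alice@a.example", hashlib.sha256).digest(), "gk@a.example")
print(accept_mail_from("alice@a.example", set(), {"gk@a.example": sk}, good))
```

Using a constant-time comparison for the tag is a design choice of this sketch, not something the description prescribes.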

The operation of the present invention is transparent to the users; it is actually the mail server (MSA) that executes the protocol on behalf of the email users. This means that the MSA manages the lists (SKList and SignList) for the email users. If a user (who is also a GK) is compromised, this has only a temporary, local effect within the community. The effect lasts until the victimized user announces the incident using his other IDs (possibly through friends or word of mouth) or reclaims ownership of the account from the email service provider. If the victimized user is unable to reclaim ownership of the ID, the user can always ask its community to abandon the compromised ID, and the MSA will remove all data associated with the compromised ID from the SKList and SignList. Hence, an attacker is not able to harm the system on a large scale, because the SKs and signatures are handled internally by the MSA.
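The lists kept by the MSA for one user, and the removal of a compromised ID, might be modelled as in the following sketch. The class, its field layout (dictionaries keyed by peer ID) and the function name are assumptions; the description only states which items each list contains.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class UserLists:
    """Per-user state held by the MSA (layout is illustrative)."""
    comm_list: Set[str] = field(default_factory=set)
    # peer ID (GK or recipient) -> shared secret key SK
    sk_list: Dict[str, bytes] = field(default_factory=dict)
    # recipient ID -> (signature tag, GK_ID of the issuing GK)
    sign_list: Dict[str, Tuple[bytes, str]] = field(default_factory=dict)


def purge_compromised(lists: UserLists, compromised_id: str) -> None:
    """Remove all SKList and SignList data tied to a compromised ID."""
    lists.sk_list.pop(compromised_id, None)
    lists.sign_list = {
        recipient: sig
        for recipient, sig in lists.sign_list.items()
        if recipient != compromised_id and sig[1] != compromised_id
    }


lists = UserLists(
    comm_list={"friend@a.example"},
    sk_list={"gk@b.example": b"k1", "gk@c.example": b"k2"},
    sign_list={
        "rcpt@d.example": (b"t1", "gk@b.example"),
        "rcpt@e.example": (b"t2", "gk@c.example"),
    },
)
purge_compromised(lists, "gk@b.example")
print(sorted(lists.sk_list), sorted(lists.sign_list))
```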

EXPERIMENTS AND RESULTS

In order to verify the feasibility and scalability of the present embodiment, two large-scale online social network datasets, i.e. Facebook and Flickr, have been used. Data samples of Facebook and Flickr are good choices for evaluating the present embodiment because they represent real social connections or close online equivalents of them. Because of the limited dataset size, only the GK selection procedure at stage 1 (GK selection in the adjacent communities) has been evaluated. Although both datasets contain millions of users, the average path lengths are no more than 5 hops. For the results, the following two quantities are of interest:

1. Number of GKs for receiving messages: As a result of the GK selection procedure, each recipient ends up authenticating a certain number of GKs outside its social network to vouch for legitimate users. The feasibility and scalability of the present embodiment depend on the number of GKs selected for a particular recipient lying within a small range. The more GKs are required, the larger the number of SKs that need to be established and maintained.

2. Reachability of recipient via GKs: The success of the present embodiment also depends on how many legitimate users outside the community can reach a recipient with a certain number of GKs. Ideally, a maximum number of legitimate users is covered by a minimum number of GKs.
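The second quantity can be computed on a friendship graph as sketched below. The sketch assumes that a community consists of a user's friends and friends-of-friends, and that a user outside the recipient's community counts as reaching the recipient if it is a selected GK or belongs to a GK's community. The tiny graph and all names are invented for illustration and are not taken from the datasets.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # undirected friendship graph as adjacency sets


def community(graph: Graph, user: str) -> Set[str]:
    """Friends and friends-of-friends of user, excluding the user itself."""
    friends = graph.get(user, set())
    members = set(friends)
    for friend in friends:
        members |= graph.get(friend, set())
    members.discard(user)
    return members


def reachability(graph: Graph, recipient: str, gatekeepers: Iterable[str]) -> int:
    """Count users outside the recipient's community covered by the GKs."""
    own = community(graph, recipient) | {recipient}
    covered: Set[str] = set()
    for gk in gatekeepers:
        covered |= community(graph, gk) | {gk}
    return len(covered - own)


# A chain a - b - c - d - e - f; recipient a, gatekeeper d.
graph: Graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"},
}
print(reachability(graph, "a", ["d"]))
```

In the chain example the recipient's own community is {b, c}, and the single gatekeeper d makes d, e and f reachable.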

Table 2 presents the high-level statistics of the Facebook and Flickr datasets gathered and used in [9, 10]. Facebook is the largest social network in the world and the number one photo sharing site on the internet. It is a "pure" social network, in the sense that its primary purpose is finding and connecting to other users. The present data sample of Facebook consists of 3.1 million users with over 23 million edges and an average of 15.2 friends per user. Flickr, on the other hand, is not a pure social network; it is intended primarily for publishing, organizing and locating content. The Flickr dataset consists of 1.7 million users with over 15 million edges and an average of 18.1 friends per user.

Around 3000 nodes within the graph of the Facebook sample are randomly selected and tested for GK selection. The nodes are selected randomly with the constraints that the community size should be between 100 and 650 (a reasonable range for an average email user) and that the number of friends of any given node should be greater than 25. The results for Facebook are discussed in the following.

Table 2: High-level statistics of our Facebook and Flickr datasets

Figure 11 presents the results for the number of GKs selected for a recipient to receive messages from outside its community. The number of required GKs is reasonable, ranging from 58 to 420, and most of the time it is less than half of the community size. The number of GKs shows a near-linear relationship with the number of boundary nodes. An increase in the number of boundary nodes generally brings a corresponding increase in the number of GKs, but this is not always the case: several times a higher number of boundary nodes was observed to result in a smaller number of GKs. The number of GKs is lower when the GKs are selected from a region where the nodes have a high clustering coefficient, which results in the same GK being suggested by a number of boundary nodes. Each SKList entry consists of three fields: the ID of the GK, the ID of the recipient and the SK itself. Currently, about 99% of email addresses are on average 22 characters long (i.e. 22 bytes in terms of space). With a key size of 256 bits, i.e. 32 bytes, a single entry in the SKList costs only 76 bytes (22 + 22 + 32). In the worst case of 420 GKs, the SKList size is still only 31.17 Kbytes (420 × 76 = 31,920 bytes), which is small relative to current storage capacities.
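The storage estimate can be reproduced with a short calculation. The sketch assumes two 22-byte IDs plus one 32-byte key per entry, as stated above, and takes 1 Kbyte as 1024 bytes; the constant and function names are illustrative.

```python
ADDRESS_BYTES = 22   # average email address length stated in the text
KEY_BYTES = 32       # 256-bit shared secret key


def sklist_entry_bytes() -> int:
    """One entry holds the GK ID, the recipient ID and the SK."""
    return 2 * ADDRESS_BYTES + KEY_BYTES


def sklist_kbytes(num_gks: int) -> float:
    """Size of an SKList with num_gks entries, with 1 Kbyte = 1024 bytes."""
    return num_gks * sklist_entry_bytes() / 1024


print(sklist_entry_bytes())            # 76
print(420 * sklist_entry_bytes())      # 31920
print(round(sklist_kbytes(420), 2))    # 31.17
```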

Figure 11 also shows the number of users that can reach a particular recipient with the help of GKs. With a minimum number of GKs, the reachability of the recipient ranges from 760K to 1.45 million users, i.e. 24 to 47% of the total network, and most of the time it remains above 35%. All of this is achieved by executing only stage 1 of the GK selection process. Based on these results, one can reasonably assume that in reality messages would rarely be sent to the recipient by a sender not covered by the GKs. Nevertheless, recursive iterations of stage 2 and the use of stage 3 of the GK selection process cover nearly every remaining case of a legitimate message that would otherwise not be handled.

In the Flickr case, more than 500 nodes within the graph of Flickr sample were randomly selected and tested for GK selection. The nodes have a community size between 100 and 500 and an average number of friends greater than 25 (same setting as for Facebook).

Figure 12 shows the number of GKs needed for a recipient to receive messages from outside its community. The resulting number of GKs ranges from 23 to 153, and most of the time it is less than 30% of the community size. The numbers are reasonably small. A single entry in the SKList costs only 76 bytes, and even in the worst case of 153 GKs the SKList size is only 11.35 Kbytes. Figure 12 also shows the number of users that can reach a particular recipient with the help of GKs. With the GKs selected above, the reachability of the recipient ranges from 643K to 854K users, i.e. 38 to 50% of the total network, and it mostly remains above 45%. Flickr is not a pure social network and is intended primarily for publishing, organizing and locating content. It contains a large number of strongly connected cores of very high degree nodes. As a result, most of the boundary nodes end up suggesting the same node as GK, so that a smaller number of GKs covers a large number of users. The case of Facebook is different, as it is a pure social network whose primary purpose is to find and connect to new users.

The MSA maintains a SignList for each user, containing the signatures of the GKs that allow emails to be sent to recipients outside the user's community. A single entry in the SignList occupies only 44 bytes of space. Even a SignList containing a million entries occupies only 42 Mbytes. Nowadays, webmail providers allow 20 Mbytes of attachments for a single email and allocate multiple GBs of space to a single user. Therefore, a SignList that occupies a couple of Mbytes does not create any scalability issue.

Based on the results presented in this section, one can conclude that the present invention is scalable in terms of the number of required GKs and the reachability. With the help of only hundreds of GKs, a recipient can be reached by millions of users, and the solution can be extended to users at even greater social distance by further GK selection. An increase in the size of a recipient's community has a direct impact on its reachability. Users with a larger social community benefit more from the present invention than isolated and less socially connected users.

The present invention has now been described with reference to several embodiments thereof. The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. It will be apparent to those skilled in the art that many changes can be made in the embodiments described without departing from the scope of the present invention. In particular, although features and elements of the present invention are described in the preferred embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the preferred embodiments or in various combinations with or without other features and elements of the invention. Therefore, the scope of the present invention should not be limited to the methods and systems described herein.

References

[1] P. Oscar Boykin and Vwani Roychowdhury. Personal email networks: An effective anti-spam tool. IEEE Computer, 38:61, 2004.

[2] A. Mislove, A. Post, P. Druschel, and K. P. Gummadi. Ostra: Leveraging trust to thwart unwanted communication. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI'08), San Francisco, CA, USA, April 2008.

[3] S. Garriss, M. Kaminsky, M. J. Freedman, B. Karp, D. Mazières, and H. Yu. Re: Reliable email. In Proceedings of the 3rd Symposium on Networked Systems Design and Implementation (NSDI'06), San Jose, CA, May 2006.

[4] Paul-Alexandru Chirita, Jörg Diederich, and Wolfgang Nejdl. MailRank: Using ranking for spam detection. In CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 373-380, New York, NY, USA, 2005. ACM.

[5] E. Allman, J. Callas, M. Delany, M. Libbey, J. Fenton, and M. Thomas. DomainKeys Identified Mail (DKIM). RFC 4871.

[6] M. W. Wong. Sender authentication: What to do. http://spf.pobox.com/whitepaper.pdf, July 2005.

[7] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22:644-654, 1976.

[8] W. Diffie, P. C. van Oorschot, and M. J. Wiener. Authentication and authenticated key exchanges. Designs, Codes and Cryptography, 2:107-125, 1992.

[9] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC'07), San Diego, CA, October 2007.

[10] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Y. Zhao. User interactions in social networks and their implications. In EuroSys '09: Proceedings of the 4th ACM European Conference on Computer Systems, pages 205-218, New York, NY, USA, 2009. ACM.

Claims

1. Method of forming a network for communication between a particular user who is a member of a particular community which preferably comprises two levels, with users outside of this particular community, comprising the steps of:

selecting at least one trusted user who is a member of another community outside of the particular community, wherein this other community preferably comprises two levels of users, and

using the trusted user to vouch for communication between the particular user in the particular community and any user in this other community.

2. Method according to claim 1, further comprising the steps of:

selecting a plurality of trusted users, each trusted user corresponding to one of a plurality of other communities,

checking the number of users within each of the plurality of other communities and

reducing a number of trusted users from the selected trusted users on the basis of the number of users within its respective community, preferably such that a minimum of trusted users are used for vouching communication between the particular user in the particular community with a maximum number of users outside of the particular community.

3. Method according to claim 1 or 2, wherein the step of selecting a trusted user comprises the following further steps:

sending on behalf of the particular user a request for a proposal of a trusted user to at least one of the users at boundary nodes of the particular community,

receiving a proposal for a trusted user in an adjacent community from said at least one user at a boundary node of the particular community and wherein said at least one user at the boundary node of the particular community informs the proposed trusted user in the adjacent community of its proposal,

performing a mutual authentication of the particular user with the proposed trusted user in the adjacent community and

upon this authentication, adding this trusted user to the list of trusted users of the particular user.

4. Method according to claim 3, further comprising the following steps:

sending on behalf of the particular user a request for a proposal of a trusted user from a trusted user within an adjacent community to at least one user at a boundary node of this adjacent community,

receiving a proposal for a trusted user in a distant community from said at least one user at a boundary node of the adjacent community, wherein said at least one user at the boundary node of the adjacent community informs the proposed trusted user in the distant community of its proposal,

forwarding the proposal from the trusted user in the adjacent community to the particular user in the particular community, and

performing a mutual authentication between the particular user in the particular community and the proposed trusted user in the distant community and

5. Method according to any of the preceding claims, further comprising the following steps: receiving an announcement from a further user who is a member of a further community, and

performing a mutual authentication between the particular user in the particular community and the other user in the further community, and

upon this authentication, adding said further user as a trusted user to the list of trusted users of the particular user.

6. Method according to any of the preceding claims, further comprising the following steps: verifying the legitimacy of a selected trusted user and the particular user and

upon verification of each other's legitimacy, establishing a secret key, preferably a common shared secret key and

creating a signature on the basis of the secret key and sending the created signature to the users in the community of said trusted user.

7. Method according to any of the preceding claims, wherein the two levels of the particular community are friends of the particular user and its friends-of-friends (FoF) and wherein the two levels of the other community are friends of the trusted user and its friends-of-friends.

8. Method according to any of the preceding claims, comprising the steps: storing a user ID of each user in the particular community of said particular user in the form of a CommList in a first storage of a mail server,

storing the user ID for each trusted user in one of the other communities for said particular user in the form of a GKList in a second storage of the mail server,

storing a secret key and the corresponding signature created by a trusted user in the form of a SignList in a third storage of said mail server.

9. Method for E-mail communication using a network which is formed by a method according to any of the preceding claims, wherein:

the trusted user of another community vouches for any communication being sent from a user within this other community to the particular user in the particular community.

10. Method according to claim 9, further comprising the steps:

binding a signature to a message from said user in this other community, wherein said signature is issued by the trusted user of this other community by using a secret key, preferably a common shared secret key, wherein said secret key is established upon verification of each other's legitimacy of the particular user and the trusted user,

verifying the message with the signature by using said corresponding secret key, and upon successful verification, placing the message in the mail box of the particular user.

11. Method according to claim 10, further comprising the steps:

upon establishing a TCP connection between a sender's mail server and a receiver's mail server for sending the message from said user in this other community to the particular user in the particular community as a recipient, the sender's mail server appends the signature to the Mail From command,

verifying at the receiver's mail server whether the Mail From command belongs to a member of the particular community or not,

if the sender is a member of the particular community, the receiver's mail server sends back an acknowledgement and the process continues,

and

if the sender is not a member of the particular community, the receiver's mail server checks for a valid signature of a trusted user and upon successful verification, the receiver's mail server sends back an acknowledgement and the process continues, or in case upon failure in verification of the signature, the TCP connection is terminated by the receiver's mail server and the transmission of e-mail will not take place.

12. System for forming a network for communication between a particular user who is a member of a particular community which preferably comprises two levels, with users outside of this particular community, comprising:

means for automatically selecting at least one trusted user who is a member of another community outside of the particular community, wherein this other community preferably comprises two levels of users, and

means for using the trusted user to vouch for communication between the particular user in the particular community and any user in this other community.

13. System according to claim 12, further comprising:

means for automatically selecting a plurality of trusted users, each trusted user corresponding to one of a plurality of other communities,

means for automatically checking the number of users within each of the plurality of other communities and

means for automatically reducing a number of trusted users from the selected trusted users on the basis of the number of users within its respective community, preferably such that a minimum of trusted users are used for vouching communication between the particular user in the particular community with a maximum number of users outside of the particular community.

14. System according to claim 12 or 13, wherein the means for selecting a trusted user comprises:

means for automatically sending on behalf of the particular user a request for a proposal of a trusted user to at least one of the users at boundary nodes of the particular community,

means for automatically receiving a proposal for a trusted user in an adjacent community from said at least one user at a boundary node of the particular community and wherein said at least one user at the boundary node of the particular community informs the proposed trusted user in the adjacent community of its proposal,

means for automatically performing a mutual authentication of the particular user with the proposed trusted user in the adjacent community and

means for automatically adding this trusted user to the list of trusted users of the particular user upon this authentication.

15. System according to claim 14, further comprising:

means for automatically sending on behalf of the particular user a request for a proposal of a trusted user from a trusted user within an adjacent community to at least one user at a boundary node of this adjacent community,

means for automatically receiving a proposal for a trusted user in a distant community from said at least one user at a boundary node of the adjacent community, wherein said at least one user at the boundary node of the adjacent community informs the proposed trusted user in the distant community of its proposal,

means for automatically forwarding the proposal from the trusted user in the adjacent community to the particular user in the particular community, and

means for automatically performing a mutual authentication between the particular user in the particular community and the proposed trusted user in the distant community and

16. System according to any of claims 12 to 15, further comprising:

means for automatically receiving an announcement from a further user who is a member of a further community, and

means for automatically performing a mutual authentication between the particular user in the particular community and said further user in the further community, and

means for automatically adding said further user as a trusted user to the list of trusted users of the particular user upon this authentication.

17. System according to any of claims 12 to 16, further comprising:

means for automatically verifying the legitimacy of a selected trusted user and the particular user and

means for automatically establishing a secret key, preferably a common shared secret key upon verification of each other's legitimacy, and

means for automatically creating a signature on the basis of the secret key and means for automatically sending the created signature to the users in the community of said trusted user.
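
The key establishment and signature creation of claim 17 can be sketched as follows. The claim does not fix a primitive, so the use of HMAC-SHA256, the 32-byte random key, and the signed payload (the two user IDs joined by `|`) are assumptions made for illustration; all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def establish_shared_key() -> bytes:
    """Stand-in for the key agreed upon after mutual verification (assumed 32 bytes)."""
    return secrets.token_bytes(32)


def create_signature(shared_key: bytes, trusted_user_id: str,
                     particular_user_id: str) -> str:
    """Create a keyed signature binding the trusted user to the particular user."""
    payload = f"{trusted_user_id}|{particular_user_id}".encode("utf-8")
    return hmac.new(shared_key, payload, hashlib.sha256).hexdigest()


def verify_signature(shared_key: bytes, trusted_user_id: str,
                     particular_user_id: str, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = create_signature(shared_key, trusted_user_id, particular_user_id)
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8"))


key = establish_shared_key()
sig = create_signature(key, "trusted@b.example", "alice@a.example")
```

The signature `sig` is what the trusted user would distribute to the users in its community; only a holder of the same secret key can verify it.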

18. System according to any of claims 12 to 17, wherein the two levels of the particular community are friends of the particular user and its friends-of-friends (FoF) and wherein the two levels of the other community are friends of the trusted user and its friends-of-friends.

19. System according to any of claims 12 to 18, comprising:

means for automatically storing a user ID of each user in the particular community of said particular user in the form of a CommList in a first storage of a mail server,

20. System for E-mail communication using a network according to any of claims 12 to 19, comprising:

means for vouching for any communication being sent from a user within another community to the particular user in the particular community.

21. System according to claim 20, further comprising:

means for automatically binding a signature to a message from said user in this other community, wherein said signature is issued by the trusted user of this other community by using a secret key, preferably a common shared secret key, wherein said secret key is established upon verification of each other's legitimacy of the particular user and the trusted user,

means for automatically verifying the message with the signature by using said corresponding secret key, and

means for automatically placing the message in the mail box of the particular user upon successful verification.

22. System according to claim 21, wherein upon establishing a TCP connection between a sender's mail server and a receiver's mail server for sending the message from said user in this other community to the particular user in the particular community as a recipient,

the sender's mail server comprises means for automatically appending the signature to the Mail From command,

the receiver's mail server comprises means for automatically verifying whether the Mail From command belongs to a member of the particular community or not, wherein, if the sender is a member of the particular community, the receiver's mail server is adapted to send back an acknowledgement and the process continues,

and

if the sender is not a member of the particular community, the receiver's mail server is adapted to check for a valid signature of a trusted user and, upon successful verification, the receiver's mail server is adapted to send back an acknowledgement and the process continues, or, upon failure in verification of the signature, the receiver's mail server is adapted to terminate the TCP connection and the transmission of the e-mail will not take place.
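
The decision made by the receiver's mail server in claim 22 can be sketched as a small function. This is a simplified illustration, not the claimed implementation: the token format `trusted_id:hexdigest` appended to the Mail From command, the use of HMAC-SHA256 over the sender address, and the names `handle_mail_from`, `sign_sender`, `comm_list` and `trusted_keys` are all assumptions. No real SMTP library is used; the return values `"ACK"` and `"CLOSE"` stand for sending back an acknowledgement and terminating the TCP connection.

```python
import hashlib
import hmac


def sign_sender(key: bytes, sender: str) -> str:
    """Signature a trusted user would issue for a sender (assumed HMAC-SHA256)."""
    return hmac.new(key, sender.encode("utf-8"), hashlib.sha256).hexdigest()


def handle_mail_from(sender, token, comm_list, trusted_keys):
    """Decide how the receiver's mail server reacts to a Mail From command.

    sender:       address given in the Mail From command
    token:        signature appended by the sender's server, or None
                  (hypothetical format "trusted_id:hexdigest")
    comm_list:    set of user IDs in the recipient's community (CommList)
    trusted_keys: dict mapping trusted user ID to the shared secret key
    """
    # Case 1: the sender is a member of the particular community.
    if sender in comm_list:
        return "ACK"
    # Case 2: an outside sender needs a valid signature of a trusted user.
    if token:
        trusted_id, _, digest = token.rpartition(":")
        key = trusted_keys.get(trusted_id)
        if key is not None and hmac.compare_digest(
                sign_sender(key, sender).encode("utf-8"),
                digest.encode("utf-8")):
            return "ACK"
    # Verification failed: terminate the connection, no e-mail is transferred.
    return "CLOSE"


comm_list = {"friend@a.example"}
trusted_keys = {"trusted@b.example": b"k" * 32}
good_token = "trusted@b.example:" + sign_sender(trusted_keys["trusted@b.example"],
                                                "stranger@b.example")
```

Because the check happens at the Mail From stage, a rejected message is dropped before its body is transmitted.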