WO2008059237A1 - Electronic mail filter - Google Patents

Electronic mail filter

Info

Publication number
WO2008059237A1
WO2008059237A1 PCT/GB2007/004341 GB2007004341W WO2008059237A1 WO 2008059237 A1 WO2008059237 A1 WO 2008059237A1 GB 2007004341 W GB2007004341 W GB 2007004341W WO 2008059237 A1 WO2008059237 A1 WO 2008059237A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
message
rejectable
filter
library
Prior art date
Application number
PCT/GB2007/004341
Other languages
French (fr)
Inventor
Lawrence James Robert Keable
Original Assignee
Keycorp Limited
Priority date
Filing date
Publication date
Application filed by Keycorp Limited
Publication of WO2008059237A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking

Definitions

  • the present invention relates to an electronic mail filter and method of electronic mail filtering for use particularly with regard to emails.
  • Use of electronic communication, and in particular electronic mail in the form of emails exchanged between computers, PDAs and mobile phones, is an increasingly important form of communication.
  • the nature of such ease of communication means that so called "spam" messages are forwarded to enable surreptitious advertising and promotion of activities as well as for simple nuisance and possibly malicious intentions.
  • a broad definition of spam relates to obscuring or hiding the contents of an email such that it cannot be easily filtered through typical prior filtering techniques.
  • spam will be mixed with legitimate and desired electronic mail messages which the user will want to receive, but a requirement to sift through a large number of spam and erroneous emails will be detrimental both to the user's convenience with respect to electronic mail delivery and to the acceptability of the electronic mail system.
  • the amount of spam electronic mail is rising each year and can be considered to have reached epidemic proportions with the result that it is undermining the convenience of the electronic mail system.
  • Originators of spam and illicit electronic mail typically use relatively simple techniques such that the spam messages can be delivered through filter mechanisms and methods.
  • Original spam recognition systems recognised sender email addresses, groups of words and styles of wording.
  • these techniques are increasingly less effective and the originators of spam messages are increasingly sending their messages in the form of text embedded in graphical images.
  • the text appears the same; however, anti-spam systems cannot recognise the graphic images.
  • the graphic images themselves can be created slightly differently each time to ensure that anti spam systems which depend upon a simple checksum to identify graphical image similarity cannot be utilised.
  • an electronic mail filter comprising:
  • a) A receiver to receive an electronic message and parse that message as a received image; b) A rejectable message library recorded in or convertible to an image form consistent with the received image; c) An image window specifier to select a part of an image from the rejectable image library; and, d) A comparer to compare sequentially the part of the image from the rejectable image library with the received image to identify, as an overlay, matches indicative of consistency between the electronic message and one of the library of rejectable messages.
  • the image form is optical and as seen.
  • the image form comprises text and/or HTML messaging.
  • the image window is fixed.
  • the image window is fixed in terms of size and/or shape.
  • the image window is variable.
  • the image window is variable in terms of size and shape and orientation relative to the image formed.
  • the image window is arranged to span portions of text characters on one or more lines. Potentially, the image window is adjustable for consideration of magnitude and size comparison between the image of the received electronic message and one of the images in the rejectable message library.
  • a background filter is provided to eliminate background colour and scenery for comparison between the image window of the received message and the image window taken from the rejectable message library.
  • comparison between the image window of the rejectable message library and the image of the received electronic message is performed sequentially in a raster arrangement across the image of the electronic message.
  • the filter or method incorporates a statistical element to indicate the number of instances of consistency between the image of the electronic message and one of the rejectable message library images.
  • a rejection mechanism is provided whereby, if the number of instances of consistency exceeds a threshold level, the received electronic message is rejected.
  • an indicator is provided for the electronic message indicative of the likelihood of that electronic message being substantially a rejectable message as provided in the rejectable message library.
  • the rejectable message library is augmented by storing rejectable messages identified by other means and uploaded by users of the electronic mail filter or the method of filtering electronic mail.
  • Fig. 1 is an illustration of a typical electronic message incorporating text and an image
  • Fig. 2 illustrates two alternative forms of electronic message arranged to evade typical electronic mail filter arrangements
  • Fig. 3 illustrates two alternative electronic mail messages incorporating images arranged to avoid conventional electronic mail filters and methods
  • Fig. 4 is a schematic illustration of an image window utilised in accordance with aspects of the present invention in order to provide electronic mail filtering;
  • Fig. 5 illustrates alternative image windows for utilisation in accordance with aspects of the present invention.
  • HTML is a simple markup language that is used to describe web pages.
  • an HTML email is basically a simple single page website and contains the text of the message plus any pictures to be displayed and links to external websites.
  • HTML messaging renders it even more difficult to utilise text based recognition systems in order to act as a filter with regard to illicit messages. These illicit messages can be described as rejectable messages which a user does not wish to view and therefore are an irritant.
  • an HTML electronic message is first parsed or rendered, that is to say turned into a received image as it would normally be viewed as an email. This received image is then compared with known rejectable messages, commonly referred to as spam, using various graphical image recognition techniques. In essence, aspects of the present invention utilise a pictorial representation of the message rather than the text within the message for filtering.
  • aspects of the present invention take an image window which is a portion of a received electronic message and compare it with similar portions of rejectable messages already identified.
  • aspects of the present invention check for illicit messages by matching parts, that is to say image windows of a graphical image against a database of such graphical images in the same image format located within a rejectable message library.
  • the rejectable message library is built by uploading illicit messages or spam received by others. It will be appreciated that current techniques with regard to illicit message filtering in addition to a basic list of addressees, keyword and styles of words also depend upon a network of operatives identifying spam and then uploading that spam to an existing library. Aspects of the present invention use such uploading from a user group in receipt of illicit electronic messages in order to create the rejectable message library.
  • the rejectable message library as indicated, is produced and parsed/rendered in an image form consistent with the image form of the electronic message as presented or converted appropriately. In such circumstances, a like for like pictorial overlay comparison can be achieved and flagged if appropriate.
  • the pictorial image comparison in accordance with aspects of the present invention essentially comprises comparing pixels or dot representations of the graphic within an image window taken as a segment of the received image and a segment of the images stored in the rejectable image library. For comparison at a basic level a like for like overlay approach is taken such that if there is consistency between part of the image of an illicit rejectable message and part of the image of the received message this will be highlighted.
  • the size and positioning of the image window is important.
  • if the image window is of a large size, including several lines of text or graphical images, the likelihood of an exact comparison resulting in a consistency indicator is limited.
  • if the image window is small and is positioned such that it simply retains a single word in a common text format, such as the word "the" or "and", in all likelihood legitimate messages will also incorporate those words in that format and therefore these messages will also be highlighted as illicit when this is untrue.
  • various techniques are utilised in accordance with aspects of the present invention in order to avoid such rejection of legitimate messages.
  • Electronic mail filters and methods of electronic mail filtering in accordance with aspects of the present invention utilise relatively high powered computer processors such that high volumes of graphic image comparisons can be achieved in a relatively short time period.
  • the respective image windows from the received message in the form of a received image and image windows taken from the rejectable image library are compared like for like in a standard overlay. If the image window from the rejectable image library is consistent with parts of the receivable image then this will be flagged as a consistency.
  • the number of consistencies allowable within a legitimate message will be set as a threshold such that if the number of consistencies between the received message and rejectable message is below that threshold then the message will be deemed legitimate, whilst above that threshold the message will be deemed illicit and therefore rejectable.
  • various grades of potential suspicion may be established by thresholds and each level of potential legitimacy or illegitimacy flagged by an appropriate signal associated with the electronic message in the user's reader.
  • the image window taken from the rejectable messages library may be compared with all parts of the received image from the electronic messages but this approach may be slow.
  • the whole received image may be compared with a whole image in the same format held within the rejectable message library.
  • image windows at those points may be compared.
  • this approach assumes that the illicit message has substantially the same format which with regard to persons attempting to forward illicit electronic mail may not always be the same. Nevertheless, as a first check such an approach will provide an indication whether subsequent testing should be performed.
  • Further means of reducing the time for testing include separating graphic images within an electronic message from text to enable review of just the graphic images such that conventional techniques can be applied to the text or rendering the text of the email in a different colour to allow hard edges of images to stand out. For example, by simply changing text in an email to black prior to rendering of images, it will be easier to test for comparison. It will be understood that one relatively easy approach with respect to avoiding conventional electronic mail filtering regimes is to present text in different colours which, by their nature, are difficult to compare with known text using conventional techniques.
  • Fig. 1 provides an illustration of one illicit message in which a block of text 1 is provided with an image 2 included within the electronic message as a graphic file.
  • the text within the graphic image 2 can be coloured such that by converting all this text to a known colour, that is to say black, comparison in an image to image fashion can be more readily achieved.
  • Change of the text colours as well as background colour 3 for the text are conventional techniques used by originators of illicit electronic messages to avoid filtering techniques. In such circumstances as part of the parsing or rendering of an electronic message and rejectable messages into a consistent image format there will be removal of coloured text as well as filtering out of background colouring 3 to a consistent colour, that is to say white in order to create a standard block for comparison testing.
  • the full electronic message is converted to a graphic image and image windows taken in the form of samples which will be matched with reciprocal image windows taken from the rejectable message library such that consistency will provide a score with respect to the likelihood of the received electronic message being illicit.
  • the method and filter of aspects of the present invention may be taken to create an image window for the block "you've” even though this section is within the graphic image 2.
  • aspects of the present invention compare pictorially image windows or blocks from the rejectable message library and a part of the whole graphic of the electronic message.
  • Fig. 2 illustrates a common situation with regard to electronic mail in the form of text which is designed to avoid basic electronic email filters and methods as known.
  • the text between respective messages 2a, 2b is substantially the same although there are a number of marginal differences.
  • the fourth word thinner in Fig. 2a is spelt thin+ner whilst in Fig. 2b this word is spelt thin-ner.
  • appetite is presented with an asterisk in Fig. 2a line 4 and with a gap in Fig. 2b at line 4.
  • between Fig. 2a and Fig. 2b there are a number of such marginal differences in the text, which are designed to fool existing electronic message and mail filters.
  • the electronic messages in Fig. 2 are just text, it will be appreciated that simple techniques with regard to comparisons utilised for electronic message filtering are difficult to utilise as each message has changed in terms of the text each time.
  • graphic pictorial image blocks are compared block for block to determine whether there is a successful consistent match indicative of an illicit message.
  • the image window utilised for comparison may comprise one or more whole or part words in one or more lines of the text and so by iterative sequential comparison which may compare the whole text progressively between each message or by using selected random or specific focus points upon which the image windows are taken it will be understood that illicit message identification and filtering can be achieved.
  • an image window 20 will be stored within a rejectable message library such that, for example, if the message in Fig. 2a is taken as a rejectable message converted into a consistent image format, the image window 20 can be taken for comparison with a received electronic message depicted in Fig. 2b.
  • there is consistency between the image window 20 and image window 21 in the text of Fig. 2b. This consistency will be highlighted and is indicative of a rejectable illicit message in accordance with aspects of the present invention.
  • image window 22 in Fig. 2a can be found consistent with an image window 23 in Fig. 2b and again consistency found.
  • these windows 20, 22 may be compared by raster or iterative means, by simple overlay one upon the other, to establish consistency.
  • a datum or anchor point 24 may be established in the text and then the window 23 for comparison with the window 22 centred about a similar datum or anchor point 25 in the text of Fig. 2b.
  • fewer comparison operations will be required to establish probable consistency between the text of Fig. 2a and Fig. 2b, and therefore an indicator as to the validity of the electronic mail or message.
  • a rejectable message as illustrated with regard to Fig. 2a will typically be established by an initial first match through a human review of the text such that this rejectable message can then be uploaded to the rejectable message library.
  • the rejectable message library may record the electronic message in any appropriate form and incorporate appropriate converters to convert the electronic message into a consistent image format with that of any received message for comparison. It is important with regard to aspects of the present invention that there is consistency in the image format as the method and filter depend upon graphic pictorial comparison with the as seen form.
  • the rejectable message library may store whole images or parts of images for comparison.
  • an originator of an illicit electronic message or mail may modify the text as well as an attached image received in the form of an HTML element.
  • Fig. 3 illustrates two respective messages which are substantially consistent but which would present difficulties with regard to filtering with conventional filters and methods as known prior to the present application.
  • the text 31 in Fig. 3a is different to the text 32 in Fig. 3b.
  • the graphic image 33 in Fig. 3a is sufficiently different to the graphic image 34 in Fig. 3b such that a simple text mechanism for comparison would not be able to establish consistency.
  • the dots and dashes 35, 36 in the graphic image 33 are different to the dots and dashes 37, 38 in graphic image 34 and therefore as compared by a checksum approach these images are different and therefore would not be matched.
  • the coloured text in the graphic images 33, 34 would be rendered as black text to allow comparison by an image window comprising a proportion of a rejectable message in the form of a consistent format image for comparison with the same format image in a received electronic mail.
  • a possible image window would be as depicted as box 39 and this can be compared with box 40 in the graphic image 34 depicted in Fig. 3b.
  • This rendering to a consistent block image can be considered part of the achievement of a consistent image format for comparison.
  • an image window which comprises a graphic portion of an image of a rejectable message for comparison with the whole or selected parts dependent upon datum or anchor points in a received image of an electronic message for processing.
  • Fig. 4 illustrates two potential image windows to allow comparison in accordance with aspects of the present invention.
  • Fig. 4a is generally consistent with the approach depicted with regard to Figs. 2 and 3 above.
  • the image window 41 takes whole text or letters/numbers for comparison.
  • the image window 41 may be anchored about a spatial datum point within a graphic image for consistency between the rejectable message image and the received image of the electronic message to be filtered.
  • the image window 41 may be moved about a received message randomly or iteratively or as a raster process in order to establish by overlay consistency between the window 41 and the received message in the consistent image format.
  • consistency between the image windows of the rejectable message and the received message will provide an indication as to the acceptability of that received message.
  • a particular advantage with regard to aspects of the present invention is that the image window need not provide or consider whole words or provide a simple rectangular comparison as will be described later with regard to Fig. 5.
  • an image window 43 extends over part of the letters.
  • This window 43 will be compared with the received message to determine consistency between the image window taken from an image in a consistent format of the rejectable message stored in a rejectable message library. Again the comparison may be based upon a datum point (not shown) or raster overlay comparisons in order to find a consistency within the received message.
  • the image windows essentially comprise a dot or pixel matrix for comparison between the received message and image of the rejectable message as seen.
  • using graphic comparison software, the percentage similarity between the image windows can be identified, so such minor changes will be less effective in fooling a filter into considering the received image as different from existing rejectable messages.
  • Fig. 5 illustrates four potential approaches with respect to altering the image window to take account of evasive measures taken by originators of illicit messages.
  • in Fig. 5a a simple rectangular image window is depicted, consistent with those utilised with regard to Figs. 2 to 4.
  • This window 51 will be positioned appropriately to reflect a part of an image taken from a rejectable message for overlay comparison with parts of a received image from an electronic message to be processed.
  • Each electronic email filter or method may have a different sized window 51 electively determined by a user. It will be appreciated that a balance must be drawn between having too small a window 51 such that individual letters or short two or three letter combinations are encompassed or common words such that the filtering effect is compromised by a large number of consistencies which would be executed in legitimate messages as well as illegitimate messages.
  • if the image window 51 is rendered too large, then the likelihood of a high percentage consistency may be diminished by the greater detail, that is to say the large number of words or graphics which must be consistent between the received image and the image of the rejectable message utilised for filtering taken from the rejectable message library.
  • a significant factor with regard to deciding the size of the image window 51 will be the processing power available, the speed with which filtering must be performed and the availability of time for filtering.
  • a rectangular image window 51 will generally be the preferred approach as it is geometrically simple and therefore easy to compare selected image windows in the received message image and images of rejectable messages in accordance with aspects of the present invention.
  • an alternative which may be more difficult for an originator of illicit messages to overcome is to provide an image window 52 of a different geometric shape and size, as depicted in Fig. 5b.
  • the example shown in Fig. 5b is of a triangle which could span two or more lines of text and take parts of alphanumeric characters for comparison between the received image and stored or converted images for rejectable messages in accordance with aspects of the present invention.
  • a further alternative would be to create a cross shaped image window as depicted in Fig. 5c.
  • the image window 53 extends to provide alphanumeric or other graphic comparison elements between a received image and images or converted images of rejectable messages.
  • originators of text in addition to changing the colour and marginally changing the text may change the size of the characters and text in a graphic image attached to an electronic message.
  • aspects of the present invention may utilise a scaling approach such that an image window 54 can be scaled for comparison between the received message and the converted or stored image of the rejectable message in the rejectable message library in accordance with aspects of the present invention.
  • this scaling would ensure that an exemplary image window 54a, at a certain size and distribution of alphanumeric characters or symbols, would be identified as consistent if scaled to image windows of size 54b or 54c in comparison between the received message and the rejectable message.
  • Such an approach would again inhibit the activities of originators of illicit messages.
  • aspects of the present invention utilise direct graphic to graphic comparison in the form of image windows between a received message and rejectable messages. In such circumstances the comparison is more akin to a human comparison than machine readable text comparisons and checks on comparisons.
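
The non-rectangular image windows of Fig. 5 and the percentage similarity mentioned above can be illustrated with a mask that marks which pixels belong to the window. The following is a minimal sketch under stated assumptions, not the patented implementation: images are assumed to be already rendered to small black-and-white bitmaps (1 = ink, 0 = background), and the function name `masked_similarity` and the cross-shaped mask are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def masked_similarity(window: Bitmap, patch: Bitmap, mask: Bitmap) -> float:
    """Return the fraction of pixels under `mask` on which `window`
    (from the rejectable message library) and `patch` (from the
    received image) agree. Pixels where the mask is 0 are ignored."""
    considered = 0
    agreed = 0
    for w_row, p_row, m_row in zip(window, patch, mask):
        for w, p, m in zip(w_row, p_row, m_row):
            if m:
                considered += 1
                if w == p:
                    agreed += 1
    if considered == 0:
        raise ValueError("mask selects no pixels")
    return agreed / considered


# Hypothetical cross-shaped window in the spirit of Fig. 5c.
cross_mask = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
library_window = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
received_patch = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
similarity = masked_similarity(library_window, received_patch, cross_mask)
print(f"{similarity:.0%} of the pixels under the mask agree")
```

Because only masked pixels count, the differing corner pixels in the example do not lower the score. A filter could treat a similarity above some chosen percentage as a consistency rather than demanding an exact overlay.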

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An electronic mail filter comprising: a. a receiver to receive an electronic message and parse that message as a received image; b. a rejectable message library recorded in or convertible to an image form consistent with the received image; c. an image window specifier to select a part of an image from the rejectable image library; and, d. a comparer to compare sequentially the part of the image from the rejectable image library with the received image to identify, as an overlay, matches indicative of consistency between the electronic message and one of the library of rejectable messages.

Description

Electronic Mail Filter
The present invention relates to an electronic mail filter and method of electronic mail filtering for use particularly with regard to emails.
Use of electronic communication, and in particular electronic mail in the form of emails exchanged between computers, PDAs and mobile phones, is an increasingly important form of communication. Unfortunately, the nature of such ease of communication means that so called "spam" messages are forwarded to enable surreptitious advertising and promotion of activities as well as for simple nuisance and possibly malicious intentions. It will be appreciated that a broad definition of spam relates to obscuring or hiding the contents of an email such that it cannot be easily filtered through typical prior filtering techniques. It will be appreciated that spam will be mixed with legitimate and desired electronic mail messages which the user will want to receive, but a requirement to sift through a large number of spam and erroneous emails will be detrimental both to the user's convenience with respect to electronic mail delivery and to the acceptability of the electronic mail system. The amount of spam electronic mail is rising each year and can be considered to have reached epidemic proportions with the result that it is undermining the convenience of the electronic mail system.
Originators of spam and illicit electronic mail typically use relatively simple techniques such that the spam messages can be delivered through filter mechanisms and methods. Original spam recognition systems recognised sender email addresses, groups of words and styles of wording. However, these techniques are increasingly less effective and the originators of spam messages are increasingly sending their messages in the form of text embedded in graphical images. The text appears the same; however, anti-spam systems cannot recognise the graphic images. It will also be appreciated that the graphic images themselves can be created slightly differently each time to ensure that anti-spam systems which depend upon a simple checksum to identify graphical image similarity cannot be utilised. Essentially, however complex a text based recognition system becomes, it will fail increasingly with regard to spam or illicit electronic mail messages in which the text is embedded in graphic images.
In accordance with aspects of the present invention there is provided an electronic mail filter comprising:
a) A receiver to receive an electronic message and parse that message as a received image;
b) A rejectable message library recorded in or convertible to an image form consistent with the received image;
c) An image window specifier to select a part of an image from the rejectable image library; and,
d) A comparer to compare sequentially the part of the image from the rejectable image library with the received image to identify, as an overlay, matches indicative of consistency between the electronic message and one of the library of rejectable messages.
Also in accordance with aspects of the present invention there is provided a method of filtering electronic mail comprising:
a) Establishing a rejectable message library in an image form or convertible into an image form;
b) Establishing an image window comprising part of the image form;
c) Receiving an electronic message and parsing the message to the image form; and,
d) Comparing the image window of the received message with an image window from the rejectable message library in the image form to identify any overlay matches indicative of consistency between the electronic message and one of the rejectable messages in the rejectable message library.
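The four method steps can be sketched end to end on toy data. This is an illustrative sketch only: it assumes every message has already been rendered to a small black-and-white bitmap (rendering HTML to pixels is outside its scope), and the names `crop`, `contains`, `matching_library_entries` and the library keys are hypothetical. The window box is assumed to lie inside each library image.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]         # rows of pixels: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # top, left, height, width


def crop(image: Bitmap, box: Box) -> Bitmap:
    """Step b: cut the image window described by `box` out of `image`."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def contains(image: Bitmap, window: Bitmap) -> bool:
    """Step d: report whether `window` overlays exactly anywhere in `image`."""
    win_h, win_w = len(window), len(window[0])
    for top in range(len(image) - win_h + 1):
        for left in range(len(image[0]) - win_w + 1):
            if crop(image, (top, left, win_h, win_w)) == window:
                return True
    return False


def matching_library_entries(
    received: Bitmap, library: Dict[str, Bitmap], box: Box
) -> List[str]:
    """Return the names of library images whose window appears in `received`."""
    return [
        name
        for name, stored in library.items()
        if contains(received, crop(stored, box))
    ]


# Step a: a toy rejectable message library already held in image form.
library = {
    "spam-1": [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    "spam-2": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
}
# Step c: a received message, assumed already rendered to the same form.
received = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
hits = matching_library_entries(received, library, (0, 0, 2, 2))
print(hits)  # ['spam-1']
```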
Generally, the image form is optical and as seen. Generally, the image form comprises text and/or HTML messaging.
Generally, the image window is fixed. Typically, the image window is fixed in terms of size and/or shape.
Alternatively, the image window is variable. Typically, the image window is variable in terms of size and shape and orientation relative to the image formed.
Advantageously, the image window is arranged to span portions of text characters on one or more lines. Potentially, the image window is adjustable for consideration of magnitude and size comparison between the image of the received electronic message and one of the images in the rejectable message library.
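One way to picture the size adjustment is to enlarge a library window by several candidate factors and test each against a patch of the received image. The sketch below makes simplifying assumptions that are not in the source: bitmaps are black-and-white, scaling is by whole-number factors using pixel replication, and a match must be exact. The names `scale_window` and `matching_scale` are hypothetical.

```python
from typing import List, Optional, Sequence

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def scale_window(window: Bitmap, factor: int) -> Bitmap:
    """Enlarge `window` by an integer factor, repeating each pixel
    `factor` times horizontally and each row `factor` times vertically."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in window
        for _ in range(factor)
    ]


def matching_scale(
    window: Bitmap, patch: Bitmap, factors: Sequence[int] = (1, 2, 3)
) -> Optional[int]:
    """Return the first factor at which the scaled window equals `patch`,
    or None if no candidate factor produces an exact overlay."""
    for factor in factors:
        if scale_window(window, factor) == patch:
            return factor
    return None


library_window = [
    [1, 0],
    [1, 1],
]
# The same shape drawn at twice the size in a received message.
received_patch = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
print(matching_scale(library_window, received_patch))  # 2
```

Rendered text rarely scales by clean integer factors, so a practical filter would probably need interpolation and a similarity tolerance rather than the exact equality used here.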
Possibly, a background filter is provided to eliminate background colour and scenery for comparison between the image window of the received message and the image window taken from the rejectable message library.
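A background filter of this kind might be approximated by reducing every image to two values before comparison. The sketch below rests on an assumption that is not in the source: the most frequent colour in the image is taken to be the background, and every other colour is treated as ink. The name `normalise_colours` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Colour = Tuple[int, int, int]  # red, green, blue


def normalise_colours(pixels: List[List[Colour]]) -> List[List[int]]:
    """Map the most common colour (assumed to be the background) to 0
    and every other colour to 1, so that differently coloured text and
    backgrounds reduce to the same two-level image."""
    counts = Counter(colour for row in pixels for colour in row)
    if not counts:
        return []
    background, _ = counts.most_common(1)[0]
    return [[0 if colour == background else 1 for colour in row] for row in pixels]


yellow, red, blue = (255, 255, 0), (255, 0, 0), (0, 0, 255)
coloured = [
    [yellow, red, yellow],
    [yellow, blue, yellow],
]
print(normalise_colours(coloured))  # [[0, 1, 0], [0, 1, 0]]
```

Applying the same normalisation to both the received image and the library images keeps the two in a consistent form for overlay comparison.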
Typically, comparison between the image window of the rejectable message library and the image of the received electronic message is performed sequentially in a raster arrangement across the image of the electronic message.
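A raster comparison can be sketched as sliding the library window over every position of the received image, row by row, and recording where the two overlay exactly. This is a minimal illustration that assumes both images are black-and-white bitmaps in the same form; the function name `find_overlay_matches` is hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def find_overlay_matches(window: Bitmap, image: Bitmap) -> List[Tuple[int, int]]:
    """Slide `window` across `image` in raster order (left to right,
    top to bottom) and return the (row, column) of the top-left corner
    of every position where all pixels coincide."""
    win_h, win_w = len(window), len(window[0])
    img_h, img_w = len(image), len(image[0])
    matches = []
    for top in range(img_h - win_h + 1):
        for left in range(img_w - win_w + 1):
            if all(
                image[top + r][left:left + win_w] == window[r]
                for r in range(win_h)
            ):
                matches.append((top, left))
    return matches


received = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
library_window = [
    [1, 1],
    [1, 0],
]
positions = find_overlay_matches(library_window, received)
print(positions)  # [(1, 1), (3, 3)]
```

The cost grows with the image area multiplied by the window area, which is why anchoring the window at a known position instead of scanning every position can reduce the work.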
Generally, the filter or method incorporates a statistical element to indicate the number of instances of consistency between the image of the electronic message and one of the rejectable message library images. Generally, a rejection mechanism is provided whereby, if the number of instances of consistency exceeds a threshold level, the received electronic message is rejected. Possibly, an indicator is provided for the electronic message indicative of the likelihood of that electronic message being substantially a rejectable message as provided in the rejectable message library.
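The counting and threshold logic can be sketched as follows. The threshold values, the intermediate "suspect" grade and the function names are assumptions chosen for illustration; the source does not specify particular numbers.

```python
from typing import Iterable


def count_consistencies(window_hits: Iterable[bool]) -> int:
    """Count how many compared image windows were found consistent."""
    return sum(1 for hit in window_hits if hit)


def grade_message(
    consistencies: int, suspect_above: int = 1, reject_above: int = 3
) -> str:
    """Grade a message by its consistency count.

    A count that exceeds `reject_above` rejects the message, a count
    that exceeds `suspect_above` flags it as suspect, and anything
    lower is accepted. Both thresholds are illustrative values."""
    if suspect_above > reject_above:
        raise ValueError("suspect_above must not exceed reject_above")
    if consistencies > reject_above:
        return "reject"
    if consistencies > suspect_above:
        return "suspect"
    return "accept"


# One boolean per image window compared against the received message.
hits = [True, False, True, True, True]
total = count_consistencies(hits)
print(total, grade_message(total))  # 4 reject
```

The returned grade could serve as the indicator shown alongside the message, with the thresholds tuned so that a few coincidental matches on common words do not reject legitimate mail.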
Generally, the rejectable message library is augmented by storing rejectable messages identified by other means and uploaded by users of the electronic mail filter or the method of filtering electronic mail.
Embodiments of the present invention will now be described by way of example and with reference to the accompanying drawings in which:-
Fig. 1 is an illustration of a typical electronic message incorporating text and an image;
Fig. 2 illustrates two alternative forms of electronic message arranged to form typical electronic mail filter arrangements;
Fig. 3 illustrates two alternative electronic mail messages incorporating images arranged to avoid conventional electronic mail filters and methods;
Fig. 4 is a schematic illustration of an image window utilised in accordance with aspects of the present invention in order to provide electronic mail filtering; and,
Fig. 5 illustrates alternative image windows for utilisation in accordance with aspects of the present invention.
As indicated above, previous filters and methods of filtering electronic mail depend principally upon text based recognition. However, increasingly emails are sent as HTML rather than just plain text. HTML is a simple markup language that is used to describe web pages. An HTML email is basically a simple single page website and contains the text of the message plus any pictures to be displayed and links to external websites. Such HTML messaging renders it even more difficult to utilise text based recognition systems in order to act as a filter with regard to illicit messages. These illicit messages can be described as rejectable messages which a user does not wish to view and which are therefore an irritant.
In accordance with aspects of the present invention an HTML electronic message is first parsed or rendered, that is to say turned into a received image that would normally be viewed as an email. This received image is then compared with known rejectable messages, commonly referred to as spam, using various graphical image recognition techniques. In such circumstances aspects of the present invention essentially utilise a pictorial representation of the message rather than the text within the message for filtering.
Aspects of the present invention take an image window which is a portion of a received electronic message and compare it with similar portions of rejectable messages already identified. Thus, aspects of the present invention check for illicit messages by matching parts, that is to say image windows of a graphical image against a database of such graphical images in the same image format located within a rejectable message library.
The rejectable message library is built by uploading illicit messages or spam received by others. It will be appreciated that current techniques with regard to illicit message filtering in addition to a basic list of addressees, keyword and styles of words also depend upon a network of operatives identifying spam and then uploading that spam to an existing library. Aspects of the present invention use such uploading from a user group in receipt of illicit electronic messages in order to create the rejectable message library. The rejectable message library, as indicated, is produced and parsed/rendered in an image form consistent with the image form of the electronic message as presented or converted appropriately. In such circumstances, a like for like pictorial overlay comparison can be achieved and flagged if appropriate.
It is important that the same image format is utilised as it will be understood that different parsing and rendering techniques for electronic messages will produce slightly different graphic images. The pictorial image comparison in accordance with aspects of the present invention essentially comprises comparing pixels or dot representations of the graphic within an image window taken as a segment of the received image and a segment of the images stored in the rejectable message library. For comparison at a basic level a like for like overlay approach is taken such that if there is consistency between part of the image of an illicit rejectable message and part of the image of the received message this will be highlighted.
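The like for like overlay described above can be sketched as follows. This is a minimal illustration only, assuming that both images have already been rendered into the same format and reduced to a matrix of 1 (dark) and 0 (light) pixels; the names `crop` and `overlay_match` are hypothetical and do not appear in the specification.

```python
from typing import List

Image = List[List[int]]  # assumed representation: 1 = dark pixel, 0 = light pixel


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut an image window (a rectangular segment) out of a rendered image."""
    return [row[left:left + width] for row in image[top:top + height]]


def overlay_match(window_a: Image, window_b: Image) -> float:
    """Overlay two equal-sized windows and return the fraction of agreeing pixels."""
    if len(window_a) != len(window_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(window_a, window_b)
    ):
        raise ValueError("windows must have the same shape")
    total = sum(len(row) for row in window_a)
    if total == 0:
        raise ValueError("windows must not be empty")
    same = sum(
        pixel_a == pixel_b
        for row_a, row_b in zip(window_a, window_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )
    return same / total
```

A result of 1.0 corresponds to exact consistency between the two windows; lower values indicate partial consistency.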
Clearly, the size and positioning of the image window is important. Thus, if the image window is of a large size including several lines of text or graphical images the likelihood of an exact comparison resulting in a consistency indicator is limited. Alternatively, if the image window is small and is positioned such that it simply retains a single word in a common text format such as the word "the" or "and" in all likelihood legitimate messages will also incorporate those words in that format and therefore these messages will also be highlighted as illicit when this is untrue. As outlined below, various techniques are utilised in accordance with aspects of the present invention in order to avoid such rejection of legitimate messages.
Electronic mail filters and methods of electronic mail filtering in accordance with aspects of the present invention utilise relatively high powered computer processors such that high volumes of graphic image comparisons can be achieved in a relatively short time period. The respective image windows from the received message in the form of a received image and image windows taken from the rejectable message library are compared like for like in a standard overlay. If the image window from the rejectable message library is consistent with parts of the received image then this will be flagged as a consistency. Generally, the number of consistencies allowable within a legitimate message will be set as a threshold such that if the number of consistencies between the received message and rejectable message is below that threshold then the message will be deemed legitimate, whilst above that threshold the message will be deemed illicit and therefore rejectable. It will also be understood that various grades of potential suspicion may be established by thresholds and each level of potential legitimacy or illegitimacy flagged by an appropriate signal associated with the electronic message in the user's reader.
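The threshold and grading scheme described above might be expressed as in the following sketch. The two threshold parameters, the three grade labels and the function name `grade_message` are assumptions made for illustration; the specification does not fix the number of grades, nor how a count exactly equal to a threshold is treated (here it falls into the lower grade).

```python
def grade_message(consistencies: int, suspect_threshold: int, reject_threshold: int) -> str:
    """Grade a received message by its number of flagged consistencies.

    A count must exceed a threshold to move into the higher grade
    (an assumption; the source only distinguishes "below" and "above").
    """
    if suspect_threshold > reject_threshold:
        raise ValueError("suspect_threshold must not exceed reject_threshold")
    if consistencies > reject_threshold:
        return "rejectable"
    if consistencies > suspect_threshold:
        return "suspect"
    return "legitimate"
```

For example, with thresholds of 2 and 5, a message with 3 consistencies would be flagged as suspect in the user's reader rather than rejected outright.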
It will be appreciated that the image window taken from the rejectable message library may be compared with all parts of the received image from the electronic message, but this approach may be slow. Alternatively, the whole received image may be compared with a whole image in the same format held within the rejectable message library. In such circumstances, by picking a number of points at random in the comparably sized received image of the electronic message and the image held within the rejectable message library, image windows at those points may be compared. However, this approach assumes that the illicit message has substantially the same format each time, which may not always be the case with regard to persons attempting to forward illicit electronic mail. Nevertheless, as a first check such an approach will provide an indication whether subsequent testing should be performed. Further means of reducing the time for testing include separating graphic images within an electronic message from text, so that just the graphic images are reviewed and conventional techniques can be applied to the text, or rendering the text of the email in a different colour to allow hard edges of images to stand out. For example, by simply changing text in an email to black prior to rendering of images, it will be easier to test for comparison. It will be understood that one relatively easy approach with respect to avoiding conventional electronic mail filtering regimes is to present text in different colours which, by their nature, are difficult to compare with known text using conventional techniques.
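The random point first check described above can be sketched as follows, assuming two same-sized images already reduced to 0/1 pixel matrices. The function name `sampled_consistency`, the exact-match criterion per window and the use of a seeded random generator are illustrative assumptions rather than details taken from the specification.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = dark pixel, 0 = light pixel


def sampled_consistency(
    received: Image,
    library: Image,
    window_size: Tuple[int, int],
    samples: int,
    rng: random.Random,
) -> float:
    """Compare windows at random points of two same-sized images.

    Returns the fraction of sampled windows that are pixel-for-pixel identical.
    """
    height, width = len(received), len(received[0])
    if len(library) != height or len(library[0]) != width:
        raise ValueError("images must be the same size for this first check")
    win_h, win_w = window_size
    if win_h > height or win_w > width or samples < 1:
        raise ValueError("window must fit in the image and samples must be >= 1")
    hits = 0
    for _ in range(samples):
        top = rng.randrange(height - win_h + 1)
        left = rng.randrange(width - win_w + 1)
        if all(
            received[top + dy][left:left + win_w] == library[top + dy][left:left + win_w]
            for dy in range(win_h)
        ):
            hits += 1
    return hits / samples
```

A high fraction on this inexpensive check would suggest that fuller testing, such as a raster comparison, is worthwhile.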
Fig. 1 provides an illustration of one illicit message in which a block of text 1 is provided with an image 2 included within the electronic message as a graphic file. The text within the graphic image 2 can be coloured such that by converting all this text to a known colour, that is to say black, comparison in an image to image fashion can be more readily achieved.
Change of the text colours as well as background colour 3 for the text are conventional techniques used by originators of illicit electronic messages to avoid filtering techniques. In such circumstances as part of the parsing or rendering of an electronic message and rejectable messages into a consistent image format there will be removal of coloured text as well as filtering out of background colouring 3 to a consistent colour, that is to say white in order to create a standard block for comparison testing.
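The colour normalisation described above might look like the following sketch. It assumes pixels are given as RGB tuples and, as a simplifying assumption not stated in the specification, treats the most frequent colour as the background; every other colour is taken to be text or graphic content and rendered black. The function name `normalise` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Colour = Tuple[int, int, int]


def normalise(pixels: List[List[Colour]]) -> List[List[int]]:
    """Map the background colour to white (0) and all other colours to black (1).

    The background is assumed to be the most common colour in the image.
    """
    counts = Counter(p for row in pixels for p in row)
    if not counts:
        raise ValueError("image must not be empty")
    background = counts.most_common(1)[0][0]
    return [[0 if p == background else 1 for p in row] for row in pixels]
```

Two messages that differ only in text and background colour then reduce to the same standard block for comparison testing.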
In accordance with aspects of the present invention the full electronic message is converted to a graphic image and image windows taken in the form of samples which will be matched with reciprocal image windows taken from the rejectable message library such that consistency will provide a score with respect to the likelihood of the received electronic message being illicit.
With regard to Fig. 1, the method and filter of aspects of the present invention may be taken to create an image window for the block "you've" even though this section is within the graphic image 2. This is because aspects of the present invention compare pictorially image windows or blocks from the rejectable message library and a part of the whole graphic of the electronic message. By such an approach it will also be appreciated that as the whole of the message is converted to an image the technique with regard to image window comparison can also be applied to the existing graphic version of the standard text 2 such as with regard to the word phrase "enjoyment highly".
As long as we take a number of sample matches, that is to say comparisons between image windows taken from the received image of the electronic message and similar parts of images in the rejectable message library, if we find a high percentage match or even a hundred percent consistency, we can identify the putative received electronic message through its received image as an illicit message where the text and graphics might be slightly changed each time received.
Fig. 2 illustrates a common situation with regard to electronic mail in the form of text which is designed to avoid basic electronic email filters and methods as known. As can be seen, the text between respective messages 2a, 2b is substantially the same although there are a number of marginal differences. For example, it will be noted that the fourth word thinner in Fig. 2a is spelt thin+ner whilst in Fig. 2b this word is spelt thin-ner. Similarly, appetite is presented with an asterisk in Fig. 2a line 4 and with a gap in Fig. 2b at line 4. There are a number of such marginal differences between the text messages which are designed to fool existing electronic message and mail filters.
Although the electronic messages in Fig. 2 are just text, it will be appreciated that simple techniques with regard to comparisons utilised for electronic message filtering are difficult to utilise as each message has changed in terms of the text each time. However, in accordance with aspects of the present invention, as indicated, graphic pictorial image blocks are compared block for block to determine whether there is a successful consistent match indicative of an illicit message. Thus, the image window utilised for comparison may comprise one or more whole or part words in one or more lines of the text and so by iterative sequential comparison which may compare the whole text progressively between each message or by using selected random or specific focus points upon which the image windows are taken it will be understood that illicit message identification and filtering can be achieved. In Fig. 2 this can be shown by image window 20 which will be stored within a rejectable message library such that for example if the message in Fig. 2a is taken as a rejectable message converted into a consistent image format an image window 20 can be taken for comparison with a received electronic message depicted in Fig. 2b. In such circumstances it will be seen that there is consistency between the image window 20 and image window 21 in the text of Fig. 2b. This consistency will be highlighted and is indicative of a rejectable illicit message in accordance with aspects of the present invention. Similarly, image window 22 in Fig. 2a can be found consistent with an image window 23 in Fig. 2b. It will be appreciated that these windows 20, 22 may be compared in a raster or iterative manner by simple overlay one upon the other to establish consistency.
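The raster overlay of a stored window across a received image can be sketched as follows. This is a minimal illustration assuming 0/1 pixel matrices and an exact-match criterion; the function name `raster_matches` is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = dark pixel, 0 = light pixel


def raster_matches(image: Image, window: Image) -> List[Tuple[int, int]]:
    """Slide the window across the image row by row, left to right.

    Returns the (top, left) positions at which the window overlays the
    image exactly. An empty list means no consistency was found.
    """
    img_h, img_w = len(image), len(image[0])
    win_h, win_w = len(window), len(window[0])
    hits = []
    for top in range(img_h - win_h + 1):
        for left in range(img_w - win_w + 1):
            if all(
                image[top + dy][left:left + win_w] == window[dy]
                for dy in range(win_h)
            ):
                hits.append((top, left))
    return hits
```

The number of positions visited grows with the product of the image dimensions, which is why the specification notes that comparing against all parts of the received image may be slow.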
Alternatively, as depicted with regard to image window 22 a datum or anchor point 24 may be established in the text and then the window 23 for comparison with the window 22 centred about a similar datum or anchor point 25 in the text of Fig. 2b. In such circumstances fewer comparison operations will be required to establish probable consistency between the text of Fig. 2a and Fig. 2b and therefore an indicator as to validity of the electronic mail or message.
As indicated above, generally a rejectable message as illustrated with regard to Fig. 2a will typically be established by an initial first match through a human review of the text such that this rejectable message can then be uploaded to the rejectable message library. The rejectable message library may record the electronic message in any appropriate form and incorporate appropriate converters to convert the electronic message into a consistent image format with that of any received message for comparison. It is important with regard to aspects of the present invention that there is consistency in the image format as the method and filter depend upon graphic pictorial comparison with the as seen form. The rejectable message library may store whole images or parts of images for comparison.
As indicated above, typically an originator of an illicit electronic message or mail may modify the text as well as an attached image received in the form of an HTML element.
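An anchored comparison of this kind might be sketched as follows. The specification does not say how a datum or anchor point is chosen; purely for illustration, this sketch assumes the anchor is the first dark pixel met in raster order, and positions each window by an offset from that anchor rather than centring it. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed representation: 1 = dark pixel, 0 = light pixel
Point = Tuple[int, int]


def first_ink(image: Image) -> Optional[Point]:
    """Return the (row, column) of the first dark pixel in raster order, if any."""
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                return (y, x)
    return None


def window_at(image: Image, anchor: Point, offset: Point, size: Point) -> Optional[Image]:
    """Cut a window of the given size at anchor + offset, or None if it does not fit."""
    top, left = anchor[0] + offset[0], anchor[1] + offset[1]
    height, width = size
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        return None
    return [row[left:left + width] for row in image[top:top + height]]


def anchored_match(library: Image, received: Image, offset: Point, size: Point) -> bool:
    """Compare one window per image, each positioned relative to its own anchor."""
    anchor_lib, anchor_rec = first_ink(library), first_ink(received)
    if anchor_lib is None or anchor_rec is None:
        return False
    win_lib = window_at(library, anchor_lib, offset, size)
    win_rec = window_at(received, anchor_rec, offset, size)
    return win_lib is not None and win_lib == win_rec
```

Only one window comparison is made per pair of images, in contrast to a full raster scan, at the cost of depending on the anchors being found consistently in both images.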
Fig. 3 illustrates two respective messages which are substantially consistent but which would present difficulties with regard to filtering with conventional filters and methods as known prior to the present application. As can be seen in Figs. 3a and 3b the text 31 in Fig. 3a is different to the text 32 in Fig. 3b. Furthermore, the graphic image 33 in Fig. 3a is sufficiently different to the graphic image 34 in Fig. 3b such that a simple text mechanism for comparison would not be able to establish consistency. It will be noted with regard to the graphic images the dots and dashes 35, 36 in the graphic image 33 are different to the dots and dashes 37, 38 in graphic image 34 and therefore as compared by a checksum approach these images are different and therefore would not be matched.
By aspects of the present invention, as indicated, the coloured text in the graphic images 33, 34 would be rendered as black text to allow comparison by an image window comprising a proportion of a rejectable message in the form of a consistent format image for comparison with the same format image in a received electronic mail. With regard to Fig. 3 a possible image window would be as depicted as box 39 and this can be compared with box 40 in the graphic image 34 depicted in Fig. 3b. This rendering to a consistent block image can be considered part of the achievement of a consistent image format for comparison.
As indicated above aspects of the present invention utilise an image window which comprises a graphic portion of an image of a rejectable message for comparison with the whole or selected parts dependent upon datum or anchor points in a received image of an electronic message for processing. Fig. 4 illustrates two potential image windows to allow comparison in accordance with aspects of the present invention.
Fig. 4a is generally consistent with the approach depicted with regard to Figs. 2 and 3 above. Thus, an image window 41 takes whole text or letters/numbers for comparison. The image window 41 may be anchored about a spatial datum point within a graphic image for consistency between the rejectable message image and the received image of the electronic message to be filtered. Alternatively, the image window 41 may be moved about a received message randomly or iteratively or as a raster process in order to establish by overlay consistency between the window 41 and the received message in the consistent image format. Clearly, consistency between the image windows of the rejectable message and the received message will provide an indication as to the acceptability of that received message.
A particular advantage with regard to aspects of the present invention is that the image window need not provide or consider whole words or provide a simple rectangular comparison as will be described later with regard to Fig. 5. Thus, with regard to Fig. 4b it will be noted that an image window 43 extends over part of the letters. This window 43 as indicated previously, will be compared with the received message to determine consistency between the image window taken from an image in a consistent format of the rejectable message stored in a rejectable message library. Again the comparison may be based upon a datum point (not shown) or raster overlay comparisons in order to find a consistency within the received message.
It will be appreciated that by use of a graphic or pictorial image window for comparison, utilisation of embedded images within an electronic mail message are less likely to avoid filtering. The image windows essentially comprise a dot or pixel matrix for comparison between the received message and image of the rejectable message as seen. In such circumstances minor alteration as used by an originator of illicit messages will be less likely to go undetected and in any event, by graphic comparison software the percentage similarity between the image windows can be identified and such minor changes will therefore be less conclusive with regard to fooling a filter into considering the received image as different from existing rejectable messages.
As originators of illicit messages become more aware of aspects of the present invention, these originators may attempt to anticipate the size, shape and distribution of image windows. In such circumstances it is also envisaged within the scope of aspects of the present invention that the image window taken from the consistent image format of the received electronic message and the rejectable messages may be changed. Fig. 5 illustrates four potential approaches with respect to altering the image window to take account of evasive measures taken by originators of illicit messages.
In Fig. 5a a simple rectangular image window is depicted consistent with those utilised with regard to Figs. 2 to 4. This window 51 will be positioned appropriately to reflect a part of an image taken from a rejectable message for overlay comparison with parts of a received image from an electronic message to be processed. Each electronic email filter or method may have a different sized window 51 electively determined by a user. It will be appreciated that a balance must be drawn between having too small a window 51 such that individual letters or short two or three letter combinations are encompassed or common words such that the filtering effect is compromised by a large number of consistencies which would be executed in legitimate messages as well as illegitimate messages. Alternatively, if the image window is rendered too large then clearly the likelihood of a high percentage consistency may be diminished by the greater detail, that is to say a large number of words or graphics which must be consistent between the received image and the image of the rejectable message utilised for filtering taken from the rejectable message library. A significant factor with regard to deciding the size of the image window 51 will be the processing power available, the speed with which filtering must be performed and the availability of time for filtering. A rectangular image window 51 will generally be the preferred approach as it is geometrically simple and therefore easy to compare selected image windows in the received message image and images of rejectable messages in accordance with aspects of the present invention.
An alternative which may be more difficult for an originator of illicit messages to overcome is to provide a different geometric shape and sized image window 52 as depicted in Fig. 5b. The example shown in Fig. 5b is of a triangle which could span two or more lines of text and take parts of alphanumeric characters for comparison between the received image and stored or converted images for rejectable messages in accordance with aspects of the present invention. A further alternative would be to create a cross shaped image window as depicted in Fig. 5c. Thus, again the image window 53 extends to provide alphanumeric or other graphic comparison elements between a received image and images or converted images of rejectable messages.
By having different shapes for the image windows it will be appreciated that it will be more difficult for an originator of illicit messages to avoid consistencies between their illicit rejectable spam messages whilst still achieving delivery of the underlying message necessitated by that message such as an advert or dialogue.
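Non-rectangular image windows such as the triangle and cross described above can be modelled as a mask laid over a rectangular block, with only the pixels under the mask being compared. The following is a sketch under that assumption; the mask constructions and the names `triangle_mask`, `cross_mask` and `masked_match` are illustrative only.

```python
from typing import List

Image = List[List[int]]  # assumed representation: 1 = dark pixel, 0 = light pixel


def triangle_mask(height: int) -> Image:
    """Upward-pointing triangle: row y covers 2*y + 1 pixels about the centre column."""
    width = 2 * height - 1
    return [[1 if abs(x - (height - 1)) <= y else 0 for x in range(width)]
            for y in range(height)]


def cross_mask(size: int) -> Image:
    """Cross made of the middle row and middle column of a size x size block."""
    mid = size // 2
    return [[1 if x == mid or y == mid else 0 for x in range(size)]
            for y in range(size)]


def masked_match(block_a: Image, block_b: Image, mask: Image) -> float:
    """Fraction of agreeing pixels, counting only positions where the mask is 1."""
    compared = same = 0
    for row_a, row_b, row_m in zip(block_a, block_b, mask):
        for pixel_a, pixel_b, in_window in zip(row_a, row_b, row_m):
            if in_window:
                compared += 1
                same += pixel_a == pixel_b
    if compared == 0:
        raise ValueError("mask selects no pixels")
    return same / compared
```

Because pixels outside the mask are ignored, alterations made by an originator in those regions do not affect the result, and the originator cannot readily know which regions those are.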
It will also be understood that originators of illicit messages, in addition to changing the colour and marginally changing the text, may change the size of the characters and text in a graphic image attached to an electronic message. Thus, as illustrated in Fig. 5d aspects of the present invention may utilise a scaling approach such that an image window 54 can be scaled for comparison between the received message and the converted or stored image of the rejectable message in the rejectable message library in accordance with aspects of the present invention. As illustrated in Fig. 5d this scaling would ensure that an image window for an exemplary window 54a at a certain size and distribution of alphanumeric characters or symbols would be identified as consistent if scaled to have image windows of a size 54b or 54c in comparison between the received message and the rejectable message. Such an approach would again inhibit the activities of originators of illicit messages.
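The scaling approach can be sketched as follows. The specification does not name a scaling method; this illustration assumes nearest-neighbour enlargement by whole-number factors and an exact-match criterion, and the names `scale_nearest` and `matching_scale` are hypothetical.

```python
from typing import List, Optional, Sequence

Image = List[List[int]]  # assumed representation: 1 = dark pixel, 0 = light pixel


def scale_nearest(window: Image, factor: int) -> Image:
    """Enlarge a window by a whole-number factor, repeating each pixel."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in window
        for _ in range(factor)
    ]


def matching_scale(
    library_window: Image, received_window: Image, factors: Sequence[int] = (1, 2, 3)
) -> Optional[int]:
    """Return the first factor at which the scaled library window equals the
    received window, or None if no tried factor gives consistency."""
    for factor in factors:
        if scale_nearest(library_window, factor) == received_window:
            return factor
    return None
```

In practice a rendered enlargement would rarely match pixel for pixel, so a percentage similarity measure would probably replace the strict equality used here.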
Aspects of the present invention utilise direct graphic to graphic comparison in the form of image windows between a received message and rejectable messages. In such circumstances the comparison is more akin to a human comparison than machine readable text comparisons and checks on comparisons.
Whilst endeavouring in the foregoing specification to draw attention to those features of the invention believed to be of particular importance it should be understood that the Applicant claims protection in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not particular emphasis has been placed thereon.

Claims
1. An electronic mail filter comprising:
a. a receiver to receive an electronic message and parse that message as a received image;
b. a rejectable message library recorded in or convertible to an image form consistent with the received image;
c. an image window specifier to select a part of an image from the rejectable image library; and,
d. a comparer to compare sequentially the part of the image from the rejectable image library with the received image to identify any overlay matches indicative of consistency between the electronic message and one of the library of rejectable messages.
2. A filter as claimed in claim 1 wherein the image form is optical and as seen.
3. A filter as claimed in claim 1 or claim 2 wherein the image form comprises text and/or HTML messaging.
4. A filter as claimed in any of claims 1 to 3 wherein the image window is fixed.
5. A filter as claimed in claim 4 wherein the image window is fixed in terms of size and/or shape.
6. A filter as claimed in any of claims 1 to 3 wherein the image window is variable.
7. A filter as claimed in claim 6 wherein the image window is variable in terms of size and shape and orientation relative to the image formed.
8. A filter as claimed in any preceding claim wherein the image window is arranged to span portions of text characters on one or more lines.
9. A filter as claimed in any preceding claim wherein the image window is adjustable for consideration of magnitude and size comparison between the image of the received electronic message and one of the images in the rejectable message library.
10. A filter as claimed in any preceding claim wherein a background filter is provided to eliminate background colour and scenery for comparison between the image window of the received message and the image window taken from the rejectable message library.
11. A filter as claimed in any preceding claim wherein comparison between the image window of the rejectable message library and the image of the received electronic message is compared sequentially in a raster arrangement across the image of the electronic message.
12. A filter as claimed in any preceding claim wherein the filter or method incorporates a statistical element to indicate the number of instances of consistency between the image of the electronic message and one of the rejectable message library images.
13. A filter as claimed in any preceding claim wherein a rejection mechanism is provided whereby if the number of instances of consistency exceeds a threshold level then rejection of the received electronic message is provided.
14. A filter as claimed in any preceding claim wherein an indicator is provided for the electronic message indicative of the likelihood of that electronic message being substantially a rejectable message as provided in the rejectable message library.
15. A filter as claimed in any preceding claim wherein the rejectable message library is augmented by storing rejectable messages identified by other means and uploaded by users of the electronic mail filter.
16. An electronic mail filter substantially as hereinbefore described with reference to the accompanying drawings.
17. A method of filtering electronic mail comprising:
a. establishing a rejectable message library in an image form or convertible into an image form;
b. establishing an image window comprising part of the image form;
c. receiving an electronic message and parsing the message to the image form; and,
d. comparing the image window of the received message with an image window from the rejectable message library in the image form to identify any overlay matches indicative of consistency between the electronic message and one of the rejectable messages in the rejectable message library.
18. A method as claimed in claim 17 wherein the image form is optical and as seen.
19. A method as claimed in claim 17 or claim 18 wherein the image form comprises text and/or HTML messaging.
20. A method as claimed in any of claims 17 to 19 wherein the image window is fixed.
21. A method as claimed in claim 20 wherein the image window is fixed in terms of size and/or shape.
22. A method as claimed in any of claims 17 to 19 wherein the image window is variable.
23. A method as claimed in claim 22 wherein the image window is variable in terms of size and shape and orientation relative to the image formed.
24. A method as claimed in any of claims 17 to 23 wherein the image window is arranged to span portions of text characters on one or more lines.
25. A method as claimed in any of claims 17 to 24 wherein the image window is adjustable for consideration of magnitude and size comparison between the image of the received electronic message and one of the images in the rejectable message library.
26. A method as claimed in any of claims 17 to 25 wherein a background filter is provided to eliminate background colour and scenery for comparison between the image window of the received message and the image window taken from rejectable message library.
27. A method as claimed in any of claims 17 to 26 wherein comparison between the image window of the rejectable message library and the image of the received electronic message is compared sequentially in a raster arrangement across the image of the electronic message.
28. A method as claimed in any of claims 17 to 27 wherein the filter or method incorporates a statistical element to indicate the number of instances of consistency between the image of the electronic message and one of the rejectable message library images.
29. A method as claimed in any of claims 17 to 28 wherein a rejection mechanism is provided whereby if the number of instances of consistency exceeds a threshold level then rejection of the received electronic message is provided.
30. A method as claimed in any of claims 17 to 29 wherein an indicator is provided for the electronic message indicative of the likelihood of that electronic message being substantially a rejectable message as provided in the rejectable message library.
31. A method as claimed in any of claims 17 to 30 wherein the rejectable message library is augmented by storing rejectable messages identified by other means and uploaded by users of the electronic mail filter or the method of filtering electronic mail.
32. A method of filtering electronic mail substantially as hereinbefore described with reference to the accompanying drawings.
33. An electronic storage device incorporating an electronic mail filter as claimed in any of claims 1 to 16.
34. An electronic storage device incorporating process commands for a method as claimed in any of claims 17 to 31.
35. Any novel subject matter or combination including novel subject matter disclosed herein, whether or not within the scope of or relating to the same invention as any of the preceding claims.
PCT/GB2007/004341 2006-11-14 2007-11-14 Electronic mail filter WO2008059237A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0622621.1 2006-11-14
GB0622621A GB2443873B (en) 2006-11-14 2006-11-14 Electronic mail filter

Publications (1)

Publication Number Publication Date
WO2008059237A1 true WO2008059237A1 (en) 2008-05-22

Family

ID=37594849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2007/004341 WO2008059237A1 (en) 2006-11-14 2007-11-14 Electronic mail filter

Country Status (2)

Country Link
GB (1) GB2443873B (en)
WO (1) WO2008059237A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10666659B2 (en) 2015-08-24 2020-05-26 Bravatek Solutions, Inc. System and method for protecting against E-mail-based cyberattacks
CN106817297B (en) * 2017-01-19 2019-11-26 华云数据(厦门)网络有限公司 A method of spam is identified by html tag

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050216564A1 (en) * 2004-03-11 2005-09-29 Myers Gregory K Method and apparatus for analysis of electronic communications containing imagery
WO2006088914A1 (en) * 2005-02-14 2006-08-24 Inboxer, Inc. Statistical categorization of electronic messages based on an analysis of accompanying images
WO2006117575A1 (en) * 2005-05-04 2006-11-09 I-Sieve Technologies Ltd. Method for probabilistic information fusion to filter multi-lingual, semi-structured and multimedia electronic content

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050150A1 (en) * 2003-08-29 2005-03-03 Sam Dinkin Filter, system and method for filtering an electronic mail message

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110048936A (en) * 2019-04-18 2019-07-23 合肥天毅网络传媒有限公司 A kind of method that semantic association word judges spam
CN110048936B (en) * 2019-04-18 2021-09-10 宁波青年优品信息科技有限公司 Method for judging junk mail by semantic associated words

Also Published As

Publication number Publication date
GB2443873A (en) 2008-05-21
GB2443873B (en) 2011-06-08
GB2443873A8 (en) 1900-01-01
GB0622621D0 (en) 2006-12-20

Similar Documents

Publication Publication Date Title
US7706614B2 (en) System and method for identifying text-based SPAM in rasterized images
Attar et al. A survey of image spamming and filtering techniques
US7882187B2 (en) Method and system for detecting undesired email containing image-based messages
Al-Shatnawi A new method in image steganography with improved image quality
US8045808B2 (en) Pure adversarial approach for identifying text content in images
US20050216564A1 (en) Method and apparatus for analysis of electronic communications containing imagery
US9489452B2 (en) Image based spam blocking
US8098939B2 (en) Adversarial approach for identifying inappropriate text content in images
US7711192B1 (en) System and method for identifying text-based SPAM in images using grey-scale transformation
US20050050150A1 (en) Filter, system and method for filtering an electronic mail message
EP1803267B1 (en) Method and system for sending electronic mail over a network
KR20050000309A (en) Advanced spam detection techniques
US7430720B2 (en) System and method for preventing screen-scrapers from extracting user screen names
US11978020B2 (en) Email security analysis
Singh et al. A survey on text based steganography
US7596270B2 (en) Method of shuffling text in an Asian document image
WO2008059237A1 (en) Electronic mail filter
Liu et al. Fighting unicode-obfuscated spam
Changder et al. A new approach to Hindi text steganography by shifting matra
US8180152B1 (en) System, method, and computer program product for determining whether text within an image includes unwanted data, utilizing a matrix
Dhavale Advanced image-based spam detection and filtering techniques
EP2275972B1 (en) System and method for identifying text-based spam in images
He et al. A simple method for filtering image spam
Shahreza Verifying Spam SMS by Arabic CAPTCHA
KR100460420B1 (en) method for filtering spam mail by X-code

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 07824566; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 07824566; Country of ref document: EP; Kind code of ref document: A1)