DE102008016667B3 - Method for the detection of almost identical content or identical picture messages and its use for the suppression of unwanted picture messages


Info

Publication number
DE102008016667B3
Authority
DE
Germany
Prior art keywords
image
comparison
picture
overall
search
Legal status
Expired - Fee Related
Application number
DE102008016667A
Other languages
German (de)
Inventor
Ben St John
Current Assignee
Siemens AG
Original Assignee
Siemens AG
Priority date
2008-04-01
Filing date
2008-04-01
Publication date
2009-07-23
Application filed by Siemens AG
Priority to DE102008016667A
Application granted
Publication of DE102008016667B3
Application status is Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
    • G06Q10/107 Computer aided management of electronic mail
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 Arrangements for user-to-user messaging in packet-switching networks, e.g. e-mail or instant messages
    • H04L51/12 Arrangements for user-to-user messaging in packet-switching networks, e.g. e-mail or instant messages with filtering and selective blocking capabilities

Abstract

The invention essentially consists in creating a normalized overall search image and a normalized overall comparison image and comparing them with each other pixel by pixel. The normalization is performed, for example, by overall image generation, color/grayscale conversion, histogram normalization, downscaling, edge detection and a two-dimensional Fourier transform, so that altered picture messages with nearly identical or even identical content can be detected and, if necessary, filtered out.

Description

The invention relates to a method for detecting picture messages with nearly identical or identical content, in which a respective comparison image and a search image are compared by comparing their corresponding pixels, and nearly identical or identical content is detected once a predetermined proportion of the pixels match.
A number of methods are already known for suppressing unwanted, widely distributed commercial e-mails (spam mails). For example, positive address lists (whitelists) or negative address lists (blacklists) are used to distinguish wanted from unwanted senders and to filter out the latter. The size and type of the distribution list are sometimes also used to detect such unwanted messages.
There are also statistical filters that examine the content of the messages themselves, for example checking for the presence of certain words. Such filters, however, are increasingly circumvented, for example by slightly altered spellings of words, by appending trivial text, or by rendering words or text as images.
Converting pictorial representations of words or text into the corresponding ASCII characters with OCR (optical character recognition) programs is known per se.
The publication Wang, Josephson, Lv, Charikar, Li: Filtering Image Spam with Near-Duplicate Detection, Conference on Email and Anti-Spam (CEAS 2007), http://www.cs.princeton.edu/cass/papers/spam_ceas07.pdf, discloses, for example, a method based on detecting near-duplicates of already known spam images.
The publication Aradhye, Myers, Herson: Image Analysis for Efficient Categorization of Image-Based Spam E-Mail, in: Proceedings Conference on Document Analysis and Recognition (ICDAR'05), DOI: 10.1109/ICDAR.2005.135, discloses a corresponding method in which text superimposed on an image, particular image regions containing text, and color properties are detected.
The publication Fumera, Pillai, Roli, Biggio: Image Spam Filtering using Textual and Visual Information, 2007, http://lxxi.org/files/spamcon07/fumera_biggio06.pdf, discloses, for example, a method that relies on OCR and text categorization techniques and detects content obfuscation techniques.
Similar methods and further details are also known from the patent documents and applications US 6,865,302 B2, GB 2 440 375 A, WO 01/71652 A1, GB 2 443 469 A (post-published) and US 2008/0208987 A1 (post-published).
The object of the invention is to specify a method for detecting picture messages with nearly identical or identical content that avoids the disadvantages mentioned above as far as possible and achieves the best possible detection despite alteration of the text or of its visual representation.
According to the invention, this object is achieved by the features of claim 1. The further claims relate to preferred embodiments of the invention and to a preferred use of the method according to the invention.
Essentially, the invention consists in creating a normalized overall search image and a respective normalized overall comparison image and comparing them with each other pixel by pixel. The normalization is performed, for example, by overall image generation, color/grayscale conversion, histogram normalization, downscaling, edge detection and a two-dimensional Fourier transform, so that altered picture messages with nearly identical or even identical content can be detected and, if necessary, filtered out.
A preferred embodiment of the invention is explained in more detail below with reference to the drawing.
First, at least one normalized overall search image NSB is created from at least one search picture message SN by normalization N. Afterwards, or in parallel, a respective normalized overall comparison image NVB is created from a respective comparison picture message VN by the same normalization N or at least by a similar second normalization N'. A comparison V is then made between the respective normalized overall comparison image NVB and the at least one normalized overall search image NSB: the corresponding pixels of the two images NSB and NVB are compared, and nearly identical or identical content I is determined once a predetermined proportion or percentage of the pixels match.
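The comparison V can be sketched as follows, assuming both normalized images already have the same size and are stored as lists of rows of integer gray values. The patent does not state the threshold or whether pixels must match exactly, so the `min_ratio` and `tolerance` parameters (and the function names) are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of gray values


def match_ratio(nsb: Image, nvb: Image, tolerance: int = 0) -> float:
    """Return the fraction of corresponding pixels that differ by at most `tolerance`."""
    if len(nsb) != len(nvb) or any(len(a) != len(b) for a, b in zip(nsb, nvb)):
        raise ValueError("normalized images must have identical dimensions")
    total = sum(len(row) for row in nsb)
    if total == 0:
        raise ValueError("cannot compare empty images")
    matches = sum(
        1
        for row_s, row_v in zip(nsb, nvb)
        for p_s, p_v in zip(row_s, row_v)
        if abs(p_s - p_v) <= tolerance
    )
    return matches / total


def same_content(nsb: Image, nvb: Image, min_ratio: float = 0.9, tolerance: int = 0) -> bool:
    """Report (nearly) identical content I once the match ratio reaches `min_ratio`."""
    return match_ratio(nsb, nvb, tolerance) >= min_ratio


search = [[0, 0], [0, 0]]
comparison = [[0, 0], [0, 255]]
print(match_ratio(search, comparison))   # 0.75
print(same_content(search, comparison))  # False with the example threshold of 0.9
```

A nonzero `tolerance` would let small residual differences left over after normalization still count as matches.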
In its most advantageous embodiment, the normalization N consists of the following normalization steps, performed one after the other:
Normalization step 1: Overall image generation
Composition information is obtained from the search picture message or the comparison picture message, indicating whether the overall image consists of a single image or of several sub-images. If the overall image consists of several sub-images, it is first assembled into an overall search image or overall comparison image with the aid of this composition information.
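Assembling sub-images into one overall image might look like the sketch below. The patent does not say how the composition information is encoded; here it is assumed, hypothetically, to be a list of `(x, y, tile)` placements giving each sub-image's top-left offset on a shared canvas, with uncovered pixels set to a background value.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]
Placement = Tuple[int, int, Image]  # hypothetical format: (x offset, y offset, sub-image)


def assemble(placements: Sequence[Placement], background: int = 0) -> Image:
    """Paste sub-images onto one canvas; later placements overwrite earlier ones."""
    if not placements:
        raise ValueError("composition information lists no sub-images")
    width = max(x + len(tile[0]) for x, _, tile in placements)
    height = max(y + len(tile) for _, y, tile in placements)
    canvas = [[background] * width for _ in range(height)]
    for x, y, tile in placements:
        for dy, row in enumerate(tile):
            for dx, value in enumerate(row):
                canvas[y + dy][x + dx] = value
    return canvas


# A picture delivered as a left half and a right half is reassembled into one image.
print(assemble([(0, 0, [[1, 2], [5, 6]]), (2, 0, [[3, 4], [7, 8]])]))
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```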
Normalization step 2: Color/grayscale conversion
A colored overall search image or overall comparison image is converted into a corresponding grayscale image.
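The patent does not name a conversion formula. A common choice, assumed here, is the ITU-R BT.601 luma weighting of the red, green and blue channels of 8-bit pixels; the function name `to_gray` is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Convert 8-bit RGB pixels to 8-bit gray values using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


print(to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))  # [[255, 0, 76]]
```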
Normalization step 3: Histogram normalization
This step adjusts the frequency distribution of the gray values of the overall search image and of the overall comparison image, producing a correspondingly contrast-normalized image.
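Histogram equalization is one standard way to perform this contrast normalization; the patent does not fix the variant, so the sketch below is an assumption. Each gray value is mapped through the image's cumulative histogram, so two images whose gray values differ only by a strictly increasing mapping (for example a contrast stretch) end up identical.

```python
from typing import List

Image = List[List[int]]


def equalize(image: Image, levels: int = 256) -> Image:
    """Histogram equalization: remap gray values via the cumulative histogram."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # count of the darkest occurring gray value
    total = len(pixels)
    if total == cdf_min:  # only one gray value present: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    lut = [round((c - cdf_min) * scale) for c in cdf]  # lookup table per gray value
    return [[lut[p] for p in row] for row in image]


low_contrast = [[100, 101], [102, 103]]
high_contrast = [[10, 20], [30, 40]]
print(equalize(low_contrast))                                # [[0, 85], [170, 255]]
print(equalize(low_contrast) == equalize(high_contrast))     # True
```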
Normalization step 4: Downscaling
Image shrinking is used to generate a correspondingly lower-resolution normalized image from a relatively high-resolution overall search image or overall comparison image.
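A pixel-by-pixel comparison needs both images at the same size, so one plausible reading of this step, assumed here, is shrinking every image to a fixed target size. The sketch uses area averaging (each output pixel is the mean of the input block it covers); the function name `shrink` is hypothetical, and other shrinking filters would serve as well.

```python
from typing import List

Image = List[List[int]]


def shrink(image: Image, out_h: int, out_w: int) -> Image:
    """Shrink to out_h x out_w by averaging the input block behind each output pixel."""
    in_h, in_w = len(image), len(image[0])
    if not (0 < out_h <= in_h and 0 < out_w <= in_w):
        raise ValueError("target size must be positive and no larger than the input")
    result = []
    for oy in range(out_h):
        y0, y1 = oy * in_h // out_h, (oy + 1) * in_h // out_h
        row = []
        for ox in range(out_w):
            x0, x1 = ox * in_w // out_w, (ox + 1) * in_w // out_w
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(round(sum(block) / len(block)))
        result.append(row)
    return result


small = [[0, 8], [4, 2]]
# The same picture at twice the resolution (each pixel repeated 2 x 2).
large = [[v for v in row for _ in range(2)] for row in small for _ in range(2)]
print(shrink(large, 2, 2) == small)  # True
```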
Normalization step 5: Edge detection
In this step, a known edge detection method, for example a Sobel filter, a Roberts filter or Canny edge detection, is used to generate a corresponding edge-normalized image from an overall search image or overall comparison image. In such an edge detection method, each pixel is replaced by a calculated pixel: in general and simplified terms, a matrix consisting of the pixel value and the values of its neighboring pixels is multiplied element by element by a coefficient matrix and the products are summed.
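A Sobel filter, one of the methods named above, can be sketched like this: each pixel's 3 x 3 neighborhood is multiplied element by element with a horizontal and a vertical coefficient matrix, the products are summed, and the two sums are combined into a gradient magnitude. Replicating border pixels at the image edges is an assumption; the patent does not address borders.

```python
import math
from typing import List

Image = List[List[int]]

# Sobel coefficient matrices for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image: Image) -> List[List[float]]:
    """Replace each pixel by the Sobel gradient magnitude of its 3 x 3 neighborhood."""
    h, w = len(image), len(image[0])

    def pixel(y: int, x: int) -> int:
        # Clamp coordinates so border pixels are replicated outside the image.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    edges = []
    for y in range(h):
        row = []
        for x in range(w):
            # Kernels are applied without flipping (correlation); flipping would
            # only negate gx and gy, leaving the magnitude unchanged.
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = pixel(y + j - 1, x + i - 1)
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            row.append(math.hypot(gx, gy))
        edges.append(row)
    return edges


step = [[0, 0, 10, 10]] * 4  # vertical edge between columns 1 and 2
brighter = [[v + 50 for v in row] for row in step]
print(sobel(step)[1])                  # [0.0, 40.0, 40.0, 0.0]
print(sobel(step) == sobel(brighter))  # True: a uniform brightness shift leaves edges unchanged
```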
Normalization step 6: Two-dimensional Fourier transform
Since, in a two-dimensional Fourier transform, a translation or rotation in the spatial domain leads only to phase shifts in the spatial frequency domain, while the spatial frequencies typical of the image content remain unchanged, this step generates from an overall search image or overall comparison image a corresponding "spatial frequency image" that is normalized with respect to translation and rotation.
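The sketch below computes the magnitude of a two-dimensional discrete Fourier transform directly from its definition (a naive O(N^4) loop, adequate only for small downscaled images; a practical implementation would use an FFT). Discarding the phase makes the result invariant to circular (wrap-around) translations. This exact invariance holds for circular shifts only: a rotation in the spatial domain rotates the magnitude spectrum rather than merely shifting its phase, so rotation invariance would need an additional step (for example resampling the spectrum in polar coordinates), which is not shown. The function names are hypothetical.

```python
import cmath
from typing import List

Grid = List[List[float]]


def dft2_magnitude(image: Grid) -> Grid:
    """Magnitude |F(u, v)| of the 2-D discrete Fourier transform, from the definition."""
    h, w = len(image), len(image[0])
    spectrum = []
    for u in range(h):
        row = []
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * cmath.pi * (u * y / h + v * x / w)
                    total += image[y][x] * cmath.exp(1j * angle)
            row.append(abs(total))  # drop the phase, keep the magnitude
        spectrum.append(row)
    return spectrum


def circular_shift(image: Grid, dy: int, dx: int) -> Grid:
    """Translate an image by (dy, dx) pixels with wrap-around at the borders."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def close(a: Grid, b: Grid, eps: float = 1e-9) -> bool:
    """Compare two grids element-wise within a floating-point tolerance."""
    return all(abs(p - q) < eps for ra, rb in zip(a, b) for p, q in zip(ra, rb))


picture = [[0, 1, 0, 0], [1, 3, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
moved = circular_shift(picture, 1, 2)
print(close(dft2_magnitude(picture), dft2_magnitude(moved)))  # True
```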
Depending on the type of picture messages, one or more of these normalization steps may be omitted, for example in favor of a higher processing speed.
The method described above can advantageously be used to filter out or suppress widely distributed unwanted picture messages. The unwanted picture message corresponds to the at least one search picture message SN, and a filter F filters out or suppresses the comparison picture messages VN when nearly identical or identical content I is detected.
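Putting the pieces together, the sketch below chains a configurable list of normalization steps (so that steps can be left out, as noted above) and drops every incoming picture that matches a known unwanted picture. It rests on several assumptions: each message is reduced to a single gray-value image, the names `make_normalizer`, `filter_messages` and `binarize` are hypothetical, the 0.9 threshold is an arbitrary example, and the one-step `binarize` normalizer merely stands in for normalization steps 1 to 6.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]
Step = Callable[[Image], Image]


def make_normalizer(steps: Sequence[Step]) -> Step:
    """Chain normalization steps; leaving steps out trades accuracy for speed."""
    def normalize(image: Image) -> Image:
        for step in steps:
            image = step(image)
        return image
    return normalize


def match_ratio(a: Image, b: Image) -> float:
    """Fraction of identical pixels; images of different shape never match."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    same = sum(p == q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return same / total if total else 0.0


def filter_messages(messages: Sequence[Image], unwanted: Sequence[Image],
                    normalize: Step, min_ratio: float = 0.9) -> List[Image]:
    """Filter F: keep only messages whose normalized image matches no unwanted one."""
    search_images = [normalize(img) for img in unwanted]  # NSB
    kept = []
    for msg in messages:
        comparison = normalize(msg)  # NVB
        if any(match_ratio(nsb, comparison) >= min_ratio for nsb in search_images):
            continue  # nearly identical content detected: suppress the message
        kept.append(msg)
    return kept


def binarize(image: Image) -> Image:
    """Hypothetical stand-in for the normalization steps: threshold at 128."""
    return [[1 if p >= 128 else 0 for p in row] for row in image]


spam = [[200, 10], [10, 200]]
altered_spam = [[190, 30], [20, 220]]  # same picture with slightly changed gray values
ham = [[10, 10], [10, 10]]
print(filter_messages([altered_spam, ham], [spam], make_normalizer([binarize])))
# [[[10, 10], [10, 10]]]
```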

Claims (8)

  1. Method for detecting picture messages with nearly identical or identical content, - in which first at least one normalized overall search image (NSB) is created from at least one search picture message (SN) by normalization (N), - in which then a respective normalized overall comparison image (NVB) is created from a respective comparison picture message (VN) by the normalization (N'), and - in which a comparison (V) between the respective normalized overall comparison image and the at least one normalized overall search image takes place such that the corresponding pixels of both images (NSB, NVB) are compared and nearly identical or identical content (I) is detected from a predetermined proportion of matching pixels.
  2. Method according to claim 1, in which a first normalization step (1) is performed such that - composition information is obtained from the search picture message or the comparison picture message as to whether the overall image consists of a single image or of several sub-images, and - if the overall image consists of several sub-images, the overall image is first assembled into an overall search image or overall comparison image with the aid of the composition information.
  3. Method according to claim 1 or 2, in which a second normalization step (2) is performed such that a corresponding grayscale image is generated from a colored overall search image or overall comparison image.
  4. Method according to one of claims 1 to 3, in which a third normalization step (3) is performed such that a corresponding contrast-normalized image is generated from an overall search image or overall comparison image with the aid of histogram normalization.
  5. Method according to one of claims 1 to 4, in which a fourth normalization step (4) is performed such that a corresponding resolution-normalized image is generated from a high-resolution overall search image or overall comparison image with the aid of image shrinking.
  6. Method according to one of claims 1 to 5, in which a fifth normalization step (5) is performed such that a corresponding edge-normalized image is generated from an overall search image or overall comparison image with the aid of an edge detection method.
  7. Method according to one of claims 1 to 6, in which a sixth normalization step (6) is performed such that a corresponding image normalized with respect to translation and rotation is generated from an overall search image or overall comparison image with the aid of a two-dimensional Fourier transform.
  8. Use of a method according to any one of the preceding claims for filtering out or suppressing widely distributed unwanted picture messages, wherein the unwanted picture message corresponds to the at least one search picture message (SN), and the comparison picture messages (VN) are filtered out or suppressed (F) when nearly identical or identical content is detected.

Priority Applications (1)

Application Number Priority Date Filing Date Title
DE102008016667A DE102008016667B3 (en) 2008-04-01 2008-04-01 Method for the detection of almost identical content or identical picture messages and its use for the suppression of unwanted picture messages

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102008016667A DE102008016667B3 (en) 2008-04-01 2008-04-01 Method for the detection of almost identical content or identical picture messages and its use for the suppression of unwanted picture messages
PCT/EP2009/052830 WO2009121694A2 (en) 2008-04-01 2009-03-11 Method for detecting picture messages with the same or nearly the same contents, and use of the method for suppressing unwanted picture messages

Publications (1)

Publication Number Publication Date
DE102008016667B3 2009-07-23

Family

ID=40786144

Family Applications (1)

Application Number Title Priority Date Filing Date
DE102008016667A Expired - Fee Related DE102008016667B3 (en) 2008-04-01 2008-04-01 Method for the detection of almost identical content or identical picture messages and its use for the suppression of unwanted picture messages

Country Status (2)

Country Link
DE (1) DE102008016667B3 (en)
WO (1) WO2009121694A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699843A (en) * 2013-12-30 2014-04-02 珠海市君天电子科技有限公司 Malicious activity detection method and device


Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7937480B2 (en) * 2005-06-02 2011-05-03 Mcafee, Inc. Aggregation of reputation data
US20030229643A1 (en) * 2002-05-29 2003-12-11 Digimarc Corporation Creating a footprint of a computer file

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
WO2001071652A1 (en) * 2000-03-16 2001-09-27 The Regents Of The University Of California Perception-based image retrieval
US6865302B2 (en) * 2000-03-16 2005-03-08 The Regents Of The University Of California Perception-based image retrieval
GB2440375A (en) * 2006-07-21 2008-01-30 Clearswift Ltd Method for detecting matches between previous and current image files, for files that produce visually identical images yet are different
GB2443469A (en) * 2006-11-03 2008-05-07 Messagelabs Ltd Detection of image spam
US20080208987A1 (en) * 2007-02-26 2008-08-28 Red Hat, Inc. Graphical spam detection and filtering

Non-Patent Citations (4)

Title
ARADHYE, MYERS, HERSON: Image Analysis for Efficient Categorization of Image-Based Spam E-Mail. In: Proceedings Conference on Document Analysis and Recognition (ICDAR'05), 2005, DOI: 10.1109/ICDAR.2005.135 *
FUMERA, PILLAI, ROLI, BIGGIO: Image Spam Filtering using Textual and Visual Information, 2007, http://lxxi.org/files/spamcon07/fumera_giggio06.pdf *
WANG, JOSEPHSON, LV, CHARIKAR, LI: Filtering Image Spam with Near-Duplicate Detection, Conference on Email and Anti-Spam (CEAS 2007), http://www.cs.princeton.edu/cass/papers/spam_ceas07.pdf *


Also Published As

Publication number Publication date
WO2009121694A3 (en) 2009-11-26
WO2009121694A2 (en) 2009-10-08

Similar Documents

Publication Publication Date Title
CA2467869C (en) Origination/destination features and lists for spam prevention
US7899866B1 (en) Using message features and sender identity for email spam filtering
Drimbarean et al. Image processing techniques to detect and filter objectionable images based on skin tone and shape recognition
Trier et al. Evaluation of binarization methods for document images
US8533270B2 (en) Advanced spam detection techniques
KR100938072B1 (en) Framework to enable integration of anti-spam technologies
US8977072B1 (en) Method and system for detecting and recognizing text in images
US20070168430A1 (en) Content-based dynamic email prioritizer
JP2016517587A (en) Classification of objects in digital images captured using mobile devices
US20040008884A1 (en) System and method for scanned image bleedthrough processing
EP1376427A2 (en) SPAM detector with challenges
US9064316B2 (en) Methods of content-based image identification
US7627641B2 (en) Method and system for recognizing desired email
JP4516778B2 (en) Data processing system
Palumbo et al. Document image binarization: Evaluation of algorithms
CN100446027C (en) Low resolution optical character recognition for camera acquired documents
Fumera et al. Spam filtering based on the analysis of text information embedded into images
US7664812B2 (en) Phonetic filtering of undesired email messages
US20050091321A1 (en) Identifying undesired email messages having attachments
Leedham et al. Comparison of Some Thresholding Algorithms for Text/Background Segmentation in Difficult Document Images.
US20060123083A1 (en) Adaptive spam message detector
Aradhye et al. Image analysis for efficient categorization of image-based spam e-mail
US9355312B2 (en) Systems and methods for classifying objects in digital images captured using mobile devices
US7245780B2 (en) Group average filter algorithm for digital image processing
Fan et al. Marginal noise removal of document images

Legal Events

Date Code Title Description
8364 No opposition during term of opposition
R119 Application deemed withdrawn, or IP right lapsed, due to non-payment of renewal fee (effective date: 2014-11-01)