TWI287720B - Junk mail filtering systems and methods based on abnormal features in e-mails - Google Patents

Junk mail filtering systems and methods based on abnormal features in e-mails

Publication number
TWI287720B
Authority
TW
Taiwan
Prior art keywords
string
mail
abnormal
abnormal feature
email
Prior art date
Application number
TW94125105A
Other languages
Chinese (zh)
Other versions
TW200705215A (en)
Inventor
Wen-Chih Chen
Po-Chang Huang
Chen-Yi Lin
Chia-Hsin Liao
Original Assignee
Inst Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inst Information Industry
Priority to TW94125105A
Publication of TW200705215A
Application granted
Publication of TWI287720B

Abstract

A junk mail filtering method based on abnormal features in e-mails is provided. An e-mail is received. Multiple first abnormal feature strings are extracted from the e-mail using an abnormal feature extraction rule. Multiple groups associated with the first abnormal feature strings are located. Whether the e-mail is junk mail is determined according to the degree of similarity between the first abnormal feature strings and the multiple second abnormal feature strings associated with each group.

Description

[Technical Field of the Invention]

The invention relates to junk mail filtering, and in particular to junk mail filtering systems and methods based on abnormal features.

[Prior Art]

As network infrastructure has matured, many convenient network services have emerged, and with them many problems. Among these, the large volume of junk mail (unsolicited bulk e-mail) that sellers generate for marketing is a considerable nuisance to users. Marketing by e-mail is far cheaper than marketing through conventional media, and a single mailing of advertising e-mail can reach as many as a million messages.

Because of the trouble junk mail causes, many solutions for handling it already exist. They fall broadly into two kinds: server-side filters and user-side mail filters. Conventional filtering techniques include IP filtering and illegal-term filtering.

A filter that uses IP filtering first builds a large IP blacklist, adding the IP addresses that, in its experience, send junk mail. This technique has some drawbacks. First, advertisers usually send mail from forged (pretended) IP addresses, which are therefore hard to add to the blacklist. In addition, blocking by IP address also blocks legitimate e-mail sent from blacklisted addresses, so e-mail that should be received is kept out.

To avoid these drawbacks, another kind of filter uses illegal-term filtering, which does not look at IP addresses but filters mail according to its content. First, a large set of training e-mails is built, each classified as normal or junk. Then, for each incoming e-mail, a decision method or rule, such as Bayesian classification, decides whether the e-mail is junk mail according to how similar it is to the normal and junk training e-mails. However, many junk e-mails now add a lot of odd content in places other than the body text in order to skew the decision method or rule and avoid being classified as junk mail.

Therefore, a junk mail filtering system and method based on abnormal features is needed to improve the accuracy of mail filtering.

0213-A40531TW(N2);ACI93003TW;SNOWBALL.ptd
[Summary of the Invention]

In view of this, an object of the invention is to provide junk mail filtering systems and methods based on abnormal features, so as to improve the accuracy of mail filtering.

To this end, an embodiment of the invention discloses a junk mail filtering method based on abnormal features, comprising: receiving an e-mail; extracting multiple first abnormal feature strings from the e-mail using a mail abnormal feature extraction rule; locating, according to the first abnormal feature strings, multiple groups associated with the first abnormal feature strings; and determining whether the e-mail is junk mail according to the similarity between the first abnormal feature strings and the multiple second abnormal feature strings associated with each of the groups.

In some cases, the step of determining whether the e-mail is junk mail may comprise: obtaining the second abnormal feature strings associated with each group; calculating a similarity value from the similarity between the first abnormal feature strings and the second abnormal feature strings associated with each group; determining whether one of the similarity values exceeds a first initial setting; and, when one of the similarity values exceeds the first initial setting, determining that the received e-mail is junk mail. When a similarity value exceeds the first initial setting, the method may further comprise: obtaining the cumulative count associated with the group whose similarity value exceeds the first initial setting, where the cumulative count represents the number of e-mails associated with that group; incrementing the cumulative count by one; determining whether the cumulative count exceeds a second initial setting; when the cumulative count exceeds the second initial setting, performing a junk mail processing procedure so that the user cannot receive the e-mail; and when the cumulative count does not exceed the second initial setting, performing a normal mail processing procedure so that the user can receive the e-mail.

In some cases, the step of locating the groups may use a hash table, a hash function, and a collision processing method to find the groups associated with the first abnormal feature strings.

An embodiment of the invention also discloses a computer-readable storage medium storing a computer program. When loaded into a computer system, the computer program causes the computer system to perform the junk mail filtering method based on abnormal features described above.

An embodiment of the invention further discloses a junk mail filtering system based on abnormal features, comprising a communication device and a processing unit. The processing unit is coupled to the communication device, receives an e-mail through the communication device, and extracts multiple first abnormal feature strings from the e-mail using a mail abnormal feature extraction rule. The processing unit locates, according to the first abnormal feature strings, multiple groups associated with the first abnormal feature strings, and determines whether the e-mail is junk mail according to the similarity between the first abnormal feature strings and the multiple second abnormal feature strings associated with each of the groups.

In the system, the processing unit may further obtain the second abnormal feature strings associated with each group, calculate a similarity value from the similarity between the first abnormal feature strings and the second abnormal feature strings associated with each group, determine whether one of the similarity values exceeds the first initial setting, and, when it does, determine that the received e-mail is junk mail. When a similarity value exceeds the first initial setting, the processing unit may further obtain the cumulative count associated with the group whose similarity value exceeds the first initial setting, increment the cumulative count by one, and determine whether the cumulative count exceeds the second initial setting. When the cumulative count exceeds the second initial setting, the junk mail processing procedure is performed so that the user cannot receive the e-mail; when it does not, the normal mail processing procedure is performed so that the user can receive the e-mail.

The e-mail may contain a sending server name, a sender e-mail mailbox, a recipient e-mail mailbox, a CC recipient e-mail mailbox, a BCC recipient e-mail mailbox, and a mail body.

The mail abnormal feature extraction rule may be at least one of the following rules:
(1) extracting the string corresponding to the sender e-mail mailbox as a first abnormal feature string;
(2) extracting the strings corresponding to the recipient, CC recipient, or BCC recipient e-mail mailboxes as first abnormal feature strings;
(3) extracting the string corresponding to the sending server name as a first abnormal feature string;
(4) extracting the strings in the mail body that correspond to hyperlinks as first abnormal feature strings;
(5) extracting the strings in the mail body whose foreground color is the same as the background color as first abnormal feature strings;
(6) extracting the strings in the mail body that cannot be found in a lexicon as first abnormal feature strings;
(7) extracting the strings in the mail body that are neither Chinese nor English as first abnormal feature strings;
(8) extracting the strings in the mail body that correspond to text attribute values contained in HTML tags that display graphics as first abnormal feature strings;
(9) extracting the strings in the mail body that have special text effects as first abnormal feature strings; and
(10) extracting the strings in the mail body that correspond to junk mail semantics as first abnormal feature strings.

Each group may be associated with multiple previously received e-mails, and the second abnormal feature strings of the e-mails associated with a group are more similar to one another than to the abnormal feature strings of other e-mails.
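The two-stage decision summarized above (a similarity threshold followed by a per-group counter) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Group` class, the `classify` function, the default threshold values, and the use of Jaccard overlap as the similarity measure are all hypothetical stand-ins, since the summary does not fix a particular similarity formula.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A group of second abnormal feature strings plus its cumulative count."""
    features: set[str]
    count: int = 0  # number of e-mails matched to this group so far


def similarity(a: set[str], b: set[str]) -> float:
    # Assumption: Jaccard overlap stands in for the unspecified similarity value.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(mail_features: set[str], groups: list[Group],
             first_threshold: float = 0.5, second_threshold: int = 3) -> str:
    """Return "junk" or "normal" for one e-mail's first abnormal feature strings."""
    # Pick the group with the greatest similarity to the e-mail.
    best = max(groups, key=lambda g: similarity(mail_features, g.features),
               default=None)
    # No group exceeds the first initial setting: normal mail processing.
    if best is None or similarity(mail_features, best.features) <= first_threshold:
        return "normal"
    # Otherwise add one to the group's cumulative count ...
    best.count += 1
    # ... and treat the e-mail as junk only once the count exceeds the second setting.
    return "junk" if best.count > second_threshold else "normal"
```

With `second_threshold=2`, the first two e-mails that match a group are delivered and the third is treated as junk.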

[Embodiments]

FIG. 1 is a hardware architecture diagram of a junk mail filtering system 10 based on abnormal features according to an embodiment of the invention. The system includes a processing unit 11, a memory 12, a storage device 13, an output device 14, an input device 15, and a communication device 16, connected together by a bus 17. Those skilled in the art may also implement the system on other computer system configurations, for example hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network computers, minicomputers, mainframes, and similar equipment. The processing unit 11 may comprise a single central processing unit (CPU) or multiple parallel processing units associated with a parallel processing environment. The memory 12 comprises read-only memory (ROM), flash ROM, and/or random access memory (RAM), and stores the program modules and data that the processing unit 11 executes. In general, program modules include routines, programs, objects, components, and the like, which perform the junk mail filtering functions based on mail features.

The invention may also be implemented in a distributed computing environment, in which computing tasks are performed by remote processing devices connected through a communication network. In a distributed environment, the functions of the junk mail filtering system 10 may be carried out jointly by a local computer system and multiple remote computer systems. The storage device 13 comprises a hard disk device, a floppy disk device, an optical disc device, or a flash drive device, used to access the program modules and/or data stored on hard disks, floppy disks, optical discs, or flash drives. The communication device 16 may be a wired network card or a device conforming to GPRS or 802-series specifications.

An e-mail contains a sending server name, a sender e-mail mailbox, a recipient e-mail mailbox, a CC recipient e-mail mailbox, a BCC recipient e-mail mailbox, a subject, a mail body, attached files, and so on. The mail body may contain various script instructions conforming to the hypertext markup language (HTML) that display text content, provide a background or hyperlinks for the text, add sound, and so on.

FIG. 2 is a flowchart of a junk mail filtering method based on abnormal features according to an embodiment of the invention.
First, in step S21, an e-mail is obtained through the communication device 16. In step S23, multiple first abnormal feature strings are extracted from the e-mail using the mail abnormal feature extraction rules; the detailed rules are described in the following paragraphs. In step S25, the groups associated with the first abnormal feature strings are located according to the first abnormal feature strings. In step S27, whether the e-mail is junk mail is determined according to the similarity between the first abnormal feature strings and the second abnormal feature strings associated with each of the groups.

FIG. 3 is a flowchart of a junk mail filtering method based on abnormal features according to an embodiment of the invention.

First, in step S311, an e-mail is obtained through the communication device 16. In step S313, all strings in the e-mail that satisfy the mail abnormal feature extraction rules are extracted (such strings are called abnormal feature strings below). This step may use at least one of the following rules to extract abnormal feature strings:

Rule 1 - extract the string corresponding to the sender e-mail mailbox as an abnormal feature string;
Rule 2 - extract the strings corresponding to the recipient, CC recipient, or BCC recipient e-mail mailboxes as abnormal feature strings;
Rule 3 - extract the string corresponding to the sending server as an abnormal feature string;
Rule 4 - extract the strings in the mail body that correspond to hyperlinks as abnormal feature strings;
Rule 5 - extract the strings in the mail body whose foreground color is the same as the background color, also called invisible ink, as abnormal feature strings;
Rule 6 - extract the strings in the mail body that cannot be found in a lexicon, also called word salad, as abnormal feature strings;
Rule 7 - extract the strings in the mail body that are neither Chinese nor English as abnormal feature strings;
Rule 8 - extract the strings in the mail body that correspond to text attribute values contained in HTML tags that display graphics as abnormal feature strings; for example, if a graphic is displayed by the HTML tag `<img src="image1.gif" text="advertisement"/>`, the string corresponding to the text attribute value is "advertisement";
Rule 9 - extract the strings in the mail body that have special text effects, for example enlarged fonts or blinking text, as abnormal feature strings; and
Rule 10 - extract the strings in the mail body that correspond to junk mail semantics as abnormal feature strings.

Steps S321 to S325 form a loop that obtains the groups corresponding to all the extracted abnormal feature strings. The storage device 13 stores multiple groups, each containing multiple abnormal feature strings. The abnormal feature strings of a group are obtained from at least one similar e-mail, so that e-mails belonging to the same group have more similar sets of abnormal feature strings than e-mails in other groups. In addition, each group has a cumulative count value representing the number of received e-mails whose abnormal feature strings are similar to those of the group.
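A few of the extraction rules listed above (Rules 1, 4, 5, and 8) can be sketched with Python's standard `html.parser` module. This is an illustrative sketch only: the class and function names are hypothetical, Rule 5 is simplified to compare a `<font color>` against the `<body bgcolor>` and ignores CSS, and the `text` attribute on `<img>` follows the example in the description rather than standard HTML.

```python
from html.parser import HTMLParser


class AbnormalFeatureExtractor(HTMLParser):
    """Collects strings matching simplified versions of Rules 4, 5 and 8."""

    def __init__(self):
        super().__init__()
        self.features = []       # extracted abnormal feature strings, in order
        self.background = None   # page background color from <body bgcolor=...>
        self._font_colors = []   # stack of colors of the currently open <font> tags

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "body":
            self.background = (attr.get("bgcolor") or "").lower() or None
        elif tag == "a" and attr.get("href"):
            self.features.append(attr["href"])   # Rule 4: hyperlink string
        elif tag == "img" and attr.get("text"):
            self.features.append(attr["text"])   # Rule 8: text attribute of a graphic tag
        elif tag == "font":
            self._font_colors.append((attr.get("color") or "").lower())

    def handle_endtag(self, tag):
        if tag == "font" and self._font_colors:
            self._font_colors.pop()

    def handle_data(self, data):
        text = data.strip()
        # Rule 5: text whose foreground color equals the background ("invisible ink").
        if (text and self.background and self._font_colors
                and self._font_colors[-1] == self.background):
            self.features.append(text)


def extract_features(sender: str, html_body: str) -> list[str]:
    """Return the sender mailbox (Rule 1) followed by features found in the body."""
    parser = AbnormalFeatureExtractor()
    parser.feed(html_body)
    parser.close()
    return [sender] + parser.features
```

Rules 6, 7, 9, and 10 would need a lexicon, language detection, or semantic resources that the description does not specify, so they are left out of this sketch.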
The loop is detailed as follows. In step S321, the next extracted abnormal feature string is obtained. In step S323, all groups associated with this abnormal feature string are retrieved. To speed up processing, a hash table and a hash function may be used for the retrieval. The hash table is stored in the storage device 13 and comprises a hash area and a collision area, each containing multiple records. Each record stores an abnormal feature string and the groups associated with that string. The actual storage address in the hash area of a given abnormal feature string is computed by the hash function from the character codes of the string (for example ASCII, BIG5, or GB2312 codes). In detail, step S323 first computes the storage address from the obtained abnormal feature string and checks whether the string stored at that address matches. If it matches, the groups stored at that address are obtained; if not, collision processing is performed, and the storage space of the collision area is searched for the matching abnormal feature string. The hash table, hash function, and collision processing may all be suitably modified when applied. In step S325, it is determined whether all extracted abnormal feature strings have been processed; if so, the process proceeds to step S331; otherwise it returns to step S321.

In step S331, the abnormal feature strings of the e-mail are compared for similarity with the abnormal feature strings of each retrieved group, giving a similarity value between the e-mail and each retrieved group. In step S341, it is determined whether the abnormal feature strings of the e-mail are similar to the abnormal feature strings of one of the groups; if so, the process proceeds to step S351; otherwise it proceeds to step S343. For steps S331 and S341, this embodiment may use Bayesian classification. The Bayesian classification method takes as input the abnormal feature strings extracted from the e-mail and the conditional probability values for a group, and from them calculates the similarity value between the e-mail and the retrieved group.
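The group retrieval of step S323 can be sketched as a small table with a hash area and a collision area. This is a toy model under assumptions, not the patented structure: the class name `FeatureIndex`, the table size, and the hash function (sum of the UTF-8 byte values modulo the table size) are hypothetical choices, and the collision area is a plain list that is scanned linearly.

```python
class FeatureIndex:
    """Maps an abnormal feature string to the ids of the groups that contain it."""

    def __init__(self, size=8):
        self.hash_area = [None] * size   # one record (string, group ids) per address
        self.collision_area = []         # overflow records for colliding strings

    def _address(self, feature):
        # Hypothetical hash function: sum of the character (byte) codes modulo the size.
        return sum(feature.encode("utf-8")) % len(self.hash_area)

    def add(self, feature, group_id):
        addr = self._address(feature)
        slot = self.hash_area[addr]
        if slot is None:
            self.hash_area[addr] = (feature, {group_id})
        elif slot[0] == feature:
            slot[1].add(group_id)
        else:
            # Address already holds a different string: use the collision area.
            for stored, groups in self.collision_area:
                if stored == feature:
                    groups.add(group_id)
                    return
            self.collision_area.append((feature, {group_id}))

    def groups_for(self, feature):
        slot = self.hash_area[self._address(feature)]
        if slot is not None and slot[0] == feature:
            return set(slot[1])          # match found in the hash area
        for stored, groups in self.collision_area:
            if stored == feature:
                return set(groups)       # match found in the collision area
        return set()                     # string belongs to no known group
```

Calling `groups_for` once per extracted string and taking the union of the results gives the candidate groups that steps S331 and S341 then score.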
Next, the similarity values of the groups are compared, and the group that satisfies the initial setting condition and has the greatest similarity is taken as the similar group. If no group's similarity value satisfies the initial setting condition, the e-mail is not similar to any group. Those skilled in the art understand that, besides Bayesian classification, any of a variety of similarity analysis methods, together with a criterion for judging similarity, may be used to implement steps S331 and S341.

In step S343, the normal e-mail processing procedure is performed. In this step, the e-mail may be left on the receiving server for the user to collect, or it may be delivered directly to the user through various channels (for example short messages or instant messages). In step S351, the cumulative count of the group similar to the e-mail is incremented by one. In step S353, it is determined whether the cumulative count of the similar group exceeds an initial setting; if so, the process proceeds to step S355; otherwise it proceeds to step S343. In step S355, the junk e-mail processing procedure is performed, so that the user cannot receive the e-mail.

FIG. 4 is a schematic diagram of a computer-readable storage medium for junk mail filtering based on abnormal features according to an embodiment of the invention. The storage medium 40 stores a computer program 420 that implements the junk mail filtering method based on abnormal features described above.

The methods and systems of the invention, or certain aspects or portions thereof, may take the form of program code embodied in tangible media, such as floppy disks, optical discs, hard disks, or any other machine-readable (for example, computer-readable) storage medium, wherein, when the program code is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the invention may also take the form of program code transmitted over some transmission medium, such as electrical wire or cable, optical fiber, or any other form of transmission, wherein, when the program code is received, loaded into, and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processing unit, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits.

Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a hardware architecture diagram of a junk mail filtering system based on abnormal features according to an embodiment of the invention;
FIGS. 2 and 3 are flowcharts of junk mail filtering methods based on abnormal features according to embodiments of the invention;
FIG. 4 is a schematic diagram of a computer-readable storage medium for junk mail filtering based on abnormal features according to an embodiment of the invention.

[Description of Main Reference Numerals]

10 ~ junk mail filtering system based on abnormal features;
11 ~ processing unit;
12 ~ memory;
13 ~ storage device;
14 ~ output device;
15 ~ input device;
16 ~ communication device;
17 ~ bus;
S21, S23, S25, S27 ~ operation steps;
S311, S313, ..., S353, S355 ~ operation steps;
40 ~ storage medium;
420 ~ junk mail filtering computer program based on abnormal features.

Claims (1)

1. A junk mail filtering method based on abnormal features, executed by an electronic device having a central processing unit, the method comprising the following steps:
receiving an e-mail;
extracting multiple first abnormal feature strings from the e-mail using a mail abnormal feature extraction rule;
locating, according to the first abnormal feature strings, multiple groups associated with the first abnormal feature strings; and
determining whether the e-mail is junk mail according to the similarity between the first abnormal feature strings and multiple second abnormal feature strings associated with each of the groups.

2. The junk mail filtering method based on abnormal features as claimed in claim 1, wherein the e-mail contains a sending server name, a sender e-mail mailbox, a recipient e-mail mailbox, a CC recipient e-mail mailbox, a BCC recipient e-mail mailbox, and a mail body, and the mail abnormal feature extraction rule is at least one of the following rules:
extracting the string corresponding to the sender e-mail mailbox as a first abnormal feature string;
extracting the strings corresponding to the recipient, CC recipient, or BCC recipient e-mail mailboxes as first abnormal feature strings;
extracting the string corresponding to the sending server name as a first abnormal feature string;
extracting the strings in the mail body that correspond to hyperlinks as first abnormal feature strings;
extracting the strings in the mail body whose foreground color is the same as the background color as first abnormal feature strings;
extracting the strings in the mail body that cannot be found in a lexicon as first abnormal feature strings;
extracting the strings in the mail body that are neither Chinese nor English as first abnormal feature strings;
extracting the strings in the mail body that correspond to text attribute values contained in HTML tags that display graphics as first abnormal feature strings;
extracting the strings in the mail body that have special text effects as first abnormal feature strings; and
extracting the strings in the mail body that correspond to junk mail semantics as first abnormal feature strings.

3. The junk mail filtering method based on abnormal features as claimed in claim 1, wherein each of the groups is associated with multiple previously received e-mails, and the second abnormal feature strings of the associated e-mails are more similar to one another than to the abnormal feature strings of other e-mails.

4. The junk mail filtering method based on abnormal features as claimed in claim 3, wherein the step of determining whether the e-mail is junk mail comprises:
obtaining the second abnormal feature strings associated with each of the groups;
calculating a similarity value from the similarity between the first abnormal feature strings and the second abnormal feature strings associated with each of the groups;
determining whether one of the similarity values exceeds a first initial setting; and
when one of the similarity values exceeds the first initial setting, determining that the received e-mail is junk mail.

5. The junk mail filtering method based on abnormal features as claimed in claim 4, comprising, when the similarity value exceeds the first initial setting:
obtaining a cumulative count associated with the group whose similarity value exceeds the first initial setting, the cumulative count representing the number of e-mails associated with that group;
incrementing the cumulative count by one;
determining whether the cumulative count exceeds a second initial setting; and
when the cumulative count exceeds the second initial setting, performing a junk mail processing procedure so that the user cannot receive the e-mail.

6. The junk mail filtering method based on abnormal features as claimed in claim 5, comprising, when the similarity value exceeds the first initial setting:
when the cumulative count does not exceed the second initial setting, performing a normal mail processing procedure so that the user can receive the e-mail.

7. The junk mail filtering method based on abnormal features as claimed in claim 1, wherein the step of locating the groups comprises:
using a hash table, a hash function, and a collision processing method to find the groups corresponding to the first abnormal feature strings.

8. A computer-readable storage medium storing a computer program, the computer program being loaded into a computer system and causing the computer system to perform a junk mail filtering method based on abnormal features, the method comprising the following steps:
receiving an e-mail;
extracting multiple first abnormal feature strings from the e-mail using a mail abnormal feature extraction rule;
locating, according to the first abnormal feature strings, multiple groups associated with the first abnormal feature strings; and
determining whether the e-mail is junk mail according to the similarity between the first abnormal feature strings and multiple second abnormal feature strings associated with each of the groups.

9. A junk mail filtering system based on abnormal features, comprising:
a communication device; and
a processing unit, coupled to the communication device, which receives an e-mail through the communication device, extracts multiple first abnormal feature strings from the e-mail using a mail abnormal feature extraction rule, locates, according to the first abnormal feature strings, multiple groups associated with the first abnormal feature strings, and determines whether the e-mail is junk mail according to the similarity between the first abnormal feature strings and multiple second abnormal feature strings associated with each of the groups.

10. The junk mail filtering system based on abnormal features as claimed in claim 9, wherein the e-mail contains a sending server name, a sender e-mail mailbox, a recipient e-mail mailbox, a
8· A computer readable storage medium for storing a computer program, page 21 〇213-A40531TW(N2); ACI93003TW; SNOWBAa.r 1287720 · , VI, patent application scope " brain program for loading to a computer system and causing the computer system to perform an abnormal feature-based spam filtering method, the method package receiving an email; using a mail anomaly feature capture rule to retrieve a plurality of the emails An abnormal feature string; searching for a plurality of groups associated with the first different feature string according to the first abnormal feature string; and, according to the first abnormal feature string and the plurality of groups Every == more! The subtraction between the second abnormal feature strings determines whether the above email is spam. Included······························································································· The extracting rule is used to retrieve the string according to the similarity between the plurality of strings of the first constant feature string and the plurality of group strings. 
The abnormal feature is based on one of the mail server name email addresses, and the processing unit is coupled to the above-mentioned device to receive an email, and uses a plurality of first abnormal abnormal feature strings in the email to search for Associated with the group, and determining whether the email is 1 0 according to a plurality of second degrees of heterogeneity associated with each of the first abnormalities, and the spam filtering system of claim 9 , the above postal name, a sender's email address, one 1287720 · 六、申請專利範圍 副本收件人電 箱以及 則之至 擷 述第一 擷 電子郵 擷 一異常 擷 述第一 擷 色之字 擷 當作上 擷 上述第 擷 之相應 串; 擷 作上述 擷 當作上 一郵件 少一者 取相應 異常特 取相應 件信箱 取相應 特徵字 取上述 異常特 取上述 串,當 取上述 述第一 取上述 一異常 取上述 於文字 取上述 第一異 取上述 述第一 子郵件## 本文,I、,+、叔一密件副本收件人電子郵件信 : ^郵件異常特徵擷取規則為下述規 :二:寄件人電子郵件信箱之字,,當作上 ΐΐί收:d本收件人/密件副本收件人 於卜、+、&作上述第一異常特徵字串; 串,寄件祠服器名稱之字_,當作上述第 中之相應於超連結之字· ’當作上 m文中之具有與背景 作上述第一異常特徵字串; 別豕顏 ::t文中之無法在詞庫裡面找到之字串, 異常特徵字串; 丁甲, 郵件本文中之非Φ令十π 特徵字串 屬中文或央文之字串,當 2本文中被包含於顯*圖形之HTML標記 ,值之字串,當作上述第一異常特徵字 作 ΐ件本文中之具有特殊文字效果之字串 韦特徵字串;以及 本文中之相應於垃圾郵件語意之字串, 異〶特徵字串。 ,當1287720 · VI. Apply for a copy of the patent scope to the recipient's electrical box and then to the first electronic mail, an abnormal statement of the first color is taken as the corresponding string of the above-mentioned third paragraph;撷 撷 上 上 上 上 上 上 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应 相应The first sub-mail ## This article, I,, +, uncle one copy of the recipient email letter: ^ mail anomaly feature extraction rules for the following rules: two: the sender's e-mail address, when Ϊ́ΐ 收 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : Corresponding to the word of hyperlinks. 
'As the first abnormal feature string with the background in the m text; 别豕颜:: The string in the t text that cannot be found in the lexicon, the abnormal feature string; A, the mail is not in this article Φ Let the ten π characteristic string be a string of Chinese or Central text, when 2 is included in the HTML mark of the *graphic, the string of values, as the first abnormal feature word as the first part of the article The special character effect string Wei character string; and the string corresponding to the spam semantics in this article, the different character string. , when 0213-A40531TW(N2) ;ACI93003TW; SNOWBALL. ptd 第23頁 .12877200213-A40531TW(N2) ; ACI93003TW; SNOWBALL. ptd Page 23 .1287720 垃圾u件m利範圍第9項所述之以異常特徵為基礎之 存上包括—儲存裝*’上述儲存裝置儲 之電子郵件母一上述群組關聯於多封以前所接收 二異常特徵二ϋΐ聯之每一電子料間所擁有之上述第 來得相似串會較其他電子郵件所擁有之異常特徵字串The deposit based on the abnormal features described in item 9 of the garbage component includes: storage device*'s storage device stored in the above-mentioned storage device, and the above group is associated with multiple previously received two abnormal features. The above-mentioned first-come-like strings owned by each of the electronic materials will be compared with the abnormal feature strings owned by other e-mails. 之垃2郵如件申過請a利範圍第"項所述之以異常特徵為基礎 組所:聛濾系統中上述處理單元取得每-上述群 徵字串述第一異常特徵字串,依據上述第—異常特 之二组;關聯之上r二異常特徵字串間 上Ξ第土:初始設定值,以及,#其中之-相似度值超過 郵件。子刀始&定值時,決定上述接收之電子郵件為垃圾 如申請專利範圍第12項所述之以異常特徵為基礎 值超:m渡系統…上述處理單元於當上述相似度 11、'*第一初始設定值時,更取得其相似度值超過上 a第一初始設定值之一群組所關聯之一累計數量,其中之 上述累計數量代表特定群組中所關聯之電子郵目, 上巧計數量加一,決定上述累計數量是否超過一第二: 始"又定值’以及’當上述累計數量超過上述第二初始設定 值時,執行一垃圾郵件處理程序,使得使用者無法接收上 述電子郵件。 η·如申請專利範圍第13項所述之以異常特徵為基礎According to the anomaly feature, the above-mentioned processing unit obtains the first abnormal feature string for each of the above-mentioned group syndrome words. 
According to the above-mentioned first-exception special two groups; the upper two relations between the two abnormal feature strings are associated with the initial set value, and the #-the similarity value exceeds the mail. When the sub-tool starts & the value is determined, the above-mentioned received e-mail is determined to be garbage. The abnormal feature-based value is as described in item 12 of the patent application scope: the m-transfer system...the above processing unit is when the above similarity is 11, ' * When the first initial setting value is obtained, a cumulative quantity whose one of the similarity values exceeds one of the first initial setting values of the upper a is obtained, wherein the accumulated quantity represents the electronic postal number associated with the specific group, The upper count is incremented by one to determine whether the accumulated quantity exceeds a second: start "set value' and 'when the accumulated quantity exceeds the second initial set value, a spam processing program is executed, so that the user cannot receive The above email. η·Based on the abnormal characteristics as described in item 13 of the patent application scope 1287720 · 六、申請專利範圍 之垃圾郵件過濾系統,其中上述處理 、、 值超過上迷第一初始設定值時,並且,舍 二又 使得#用ί 一時,執打一正常郵件處理程序, 便侍使用者可接收上述電子郵件。 斤 1 5 ·如申請專利範圍第9項所述之以異常特徵為基礎 垃圾郵件過濾系統,上述處理單元搭配使用雜湊表、雜凑 函數以及碰撞處理方法來尋找相應於上述第一異常特徵二 串之上述多個群組。 予 ❹1287720 · Sixth, the application of the scope of the spam filtering system, in which the above processing, value exceeds the first initial set value, and the second two make # use ί, then hit a normal mail processing program, wait The user can receive the above email.斤1 5 · According to the abnormal feature-based spam filtering system described in claim 9 of the patent scope, the processing unit uses a hash table, a hash function, and a collision processing method to find two strings corresponding to the first abnormal feature. The above plurality of groups. 
Give 0213-A40531TW(N2);ACI93003TW;SNOWBALL.ptd 第25頁0213-A40531TW(N2); ACI93003TW;SNOWBALL.ptd第25页
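The flow recited in claims 1 and 4 to 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `JunkFilter`, `Group`, `extract_features`, `add_group` and `jaccard` are hypothetical, as are the two threshold values. The claims do not name a similarity measure, so Jaccard similarity is swapped in here as an assumption. Only two of the claimed extraction rules (sender address and hyperlinks) are shown, and treating an e-mail that matches no group as normal is likewise an assumption.

```python
import re
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.5  # stands in for the "first initial set value" (assumed)
COUNT_THRESHOLD = 3         # stands in for the "second initial set value" (assumed)


@dataclass
class Group:
    features: set   # second abnormal feature strings shared by earlier e-mails
    count: int = 0  # accumulated number of e-mails associated with the group


def extract_features(sender: str, body: str) -> set:
    """Apply two of the claimed rules: sender address and hyperlink strings."""
    features = {sender.lower()}
    features.update(u.lower() for u in re.findall(r"https?://[^\s\"'<>]+", body))
    return features


def jaccard(a: set, b: set) -> float:
    """Assumed similarity measure: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class JunkFilter:
    def __init__(self):
        # A dict is a hash table; Python resolves hash collisions internally.
        self.index = {}  # feature string -> list of groups containing it

    def add_group(self, features) -> Group:
        group = Group(set(features))
        for f in group.features:
            self.index.setdefault(f, []).append(group)
        return group

    def classify(self, sender: str, body: str) -> str:
        features = extract_features(sender, body)
        # Locate the groups associated with any extracted feature string.
        candidates = []
        for f in features:
            for g in self.index.get(f, []):
                if not any(g is c for c in candidates):
                    candidates.append(g)
        for g in candidates:
            if jaccard(features, g.features) > SIMILARITY_THRESHOLD:
                g.count += 1                 # accumulate for the matching group
                if g.count > COUNT_THRESHOLD:
                    return "junk"            # junk mail handling procedure
        return "normal"                      # normal mail handling procedure
```

In this sketch a similar e-mail is delivered until the matching group's count passes the second threshold, which mirrors the distinction between claims 5 and 6. Creating a new group for unmatched mail is deliberately left out.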

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94125105A TWI287720B (en) 2005-07-25 2005-07-25 Junk mail filtering systems and methods based on abnormal features in e-mails

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94125105A TWI287720B (en) 2005-07-25 2005-07-25 Junk mail filtering systems and methods based on abnormal features in e-mails

Publications (2)

Publication Number Publication Date
TW200705215A TW200705215A (en) 2007-02-01
TWI287720B TWI287720B (en) 2007-10-01

Family

ID=39201749

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94125105A TWI287720B (en) 2005-07-25 2005-07-25 Junk mail filtering systems and methods based on abnormal features in e-mails

Country Status (1)

Country Link
TW (1) TWI287720B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI505112B (en) * 2014-01-06 2015-10-21 Openfind Information Technology Inc E-mail server-side profile filtering method

Also Published As

Publication number Publication date
TW200705215A (en) 2007-02-01

Similar Documents

Publication Publication Date Title
US9819634B2 (en) Organizing messages in a messaging system using social network information
US10387559B1 (en) Template-based identification of user interest
US9071560B2 (en) Tagging email and providing tag clouds
CN104982011B (en) Use the document classification of multiple dimensioned text fingerprints
US7657603B1 (en) Methods and systems of electronic message derivation
US10262080B2 (en) Enhanced search suggestion for personal information services
Toolan et al. Feature selection for spam and phishing detection
US9906539B2 (en) Suspicious message processing and incident response
CN104067567B (en) System and method for carrying out spam detection using character histogram
US20100131523A1 (en) Mechanism for associating document with email based on relevant context
US7895515B1 (en) Detecting indicators of misleading content in markup language coded documents using the formatting of the document
WO2007143223A2 (en) System and method for entity based information categorization
US9667737B2 (en) Publisher-assisted, broker-based caching in a publish-subscription environment
Woitaszek et al. Identifying junk electronic mail in Microsoft outlook with a support vector machine
Sethi et al. Spam email detection using machine learning and neural networks
US8843574B2 (en) Electronic mail system, user terminal apparatus, information providing apparatus, and computer readable medium
US20120215858A1 (en) Caching potentially repetitive message data in a publish-subscription environment
TWI287720B (en) Junk mail filtering systems and methods based on abnormal features in e-mails
Patidar et al. A novel technique of email classification for spam detection
Chen et al. Email visualization correlation analysis forensics research
Islam et al. Machine learning approaches for modeling spammer behavior
Kolcz et al. The challenges of service-side personalized spam filtering: scalability and beyond
Sagar et al. An Effective Spam Classification Filter As A Web Application Using Naïve Bayes Classifier
Smirnov Clustering and classification methods for spam analysis
JP4334210B2 (en) Message providing system

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees