TW201125331A - Method, system and device for junk message recognition - Google Patents

Method, system and device for junk message recognition

Publication number
TW201125331A
Authority
TW
Taiwan
Prior art keywords
message
communication message
communication
sender
content
Application number
TW99100272A
Other languages
Chinese (zh)
Inventor
Li-ming Zhang
Po Wen
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to TW99100272A
Publication of TW201125331A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The present invention discloses a method, system and device for junk message recognition. The method comprises: extracting sender information from a communication message; judging, according to the extracted sender information, whether the sender of the communication message is an expected sender; and, if not, recognizing the communication message as a junk message or continuing to perform recognition on the communication message. Because junk messages are recognized by judging whether the sender of a communication message is an expected user, the rate of missed or wrong judgments can be reduced, recognition accuracy can be improved, and the message filtering effect can be strengthened.

Description

201125331 六、發明說明: 【發明所屬之技術領域】 本發明係有關網路通信技術領域,且特別有關一種垃 圾消息的識別方法、裝置和系統。 【先前技術】 隨著通信技術的發展,各種通訊系統的發明使用大大 方便了用戶之間的相互交流。兩個或兩個以上的用戶端可 以透過通訊網絡而進行連接,以即時地傳遞文字、檔案、 語音和視頻資訊。 然而’在大量的通訊消息中存在著相當一部分無用的 、甚至帶有欺騙性質的垃圾消息,不僅給通訊系統中的伺 服器增加了額外的負擔,也給處於用戶端的用戶造成了極 其惡劣的影響。用戶接收到垃圾消息後,需要花費大量的 時間用來確認接收到的消息是否有用以及刪除確認無用的 垃圾消息,甚至還會由於大量的垃圾消息的聚集而錯過正 常消息的接收,嚴重干擾了用戶之間的正常交流》同時, 大量的垃圾消息也會影響即時通訊系統運營商的信譽,給 網路監管帶來困難,甚至會使運營商的生產環境短時間內 癱瘓,使運營商蒙受巨額的經濟損失。 現有技術中,通常採用預定的關鍵字對接收到的通訊 消息進行過濾,以識別垃圾消息,具體步驟包括:首先, 透過對大量的垃圾消息的分析,總結垃圾消息中常用到的 一些關鍵字,組成關鍵字庫,放置到即時通訊系統伺服器 -5- 201125331 或用戶端中,關鍵字可以是“中獎”、"服裝大拍賣”、 “匯款”、“轉讓”等詞語,在很大程度上表明該資訊是 以宣傳、廣告、詐騙等爲目的的垃圾資訊;其次,接收到 通訊消息後,提取該通訊消息的消息內容,以供查驗;最 後,對照關鍵字庫,査驗接收到的通訊消息的資訊內容中 是否含有關鍵字庫內儲存的關鍵字,如果該消息內容中含 有關鍵字,則判定該通訊消息爲垃圾消息,對該通訊消息 進行丟棄;如果該消息內容中不含有關鍵字,則判定該通 訊消息爲正常消息。 另外,還採用預定的正則運算式(regular expression )對接收到的通訊消息的消息內容中的某種格式進行匹配 ,以識別垃圾消息。正則運算式描述了一種字串匹配的模 式,可以用來檢査一個字串是否含有某種子字串、將匹配 的子字串做替換或者從某個字串中取出符合某個條件的子 字串等。正則運算式判斷消息內容中是否存在匹配的關鍵 特徵,如網址、電話號碼、即時通訊聯繫號碼等資訊,如 果消息內容中存在匹配的關鍵特徵,則判定該通訊消息爲 垃圾消息。 現有技術也可以對上述兩種識別垃圾消息的方法進行 組合,綜合使用預定的關鍵字和正則運算式,過濾接收到 的通訊消息,以識別垃圾消息。 在實現本發明的過程中,發明人發現現有技術至少存 在以下問題: 現有技術中,透過對消息內容進行規則匹配識別垃圾 -6- 201125331 消息,識別的準確率取決於關鍵字和正則運算式的設定是 否合適。關鍵字和正則運算式一般是根據經驗設定或者從 已經標記爲垃圾消息的消息中選取,具有一定的隨意性, 無法識別不在關鍵字範圍內或關鍵字出現頻率較低的垃圾 消息》而符合垃圾消息的部分特徵、但不屬於垃圾消息的 消息,卻容易被誤判爲垃圾消息。例如,某些垃圾消息中 包含“視頻聊天”的詞語,如果把“視頻聊天”設定爲關 鍵字來識別垃圾消息,則用戶希望邀請其他人進行視頻聊 天的、包含“視頻聊天”的正常消息就會被誤判爲垃圾消 息。因此,現有的垃圾消息的識別方法,僅僅對消息內容 進行機械式的識別,沒有考慮到消息發送的場景,對垃圾 消息的識別結果存在很大的誤判率和漏判率。 【發明內容】 本發明提供一種垃圾消息的識別方法、裝置和系統, 提高了識別垃圾消息的準確率。 本發明提供一種垃圾消息的識別方法,包括·· 提取通訊消息中的發送方資訊; 根據所述提取的發送方資訊,判斷所述通訊消息的發 送方是否爲所預期的發送方,如果所述通訊消息的發送方 不是該所預期的發送方’則識別所述通訊消息爲垃圾消息 ,或繼續對所述通訊消息進行識別。 本發明還提供一種通訊設備’包括: 提取模組,用以提取通訊消息中的發送方資訊; 201125331 第一判斷模組,用以根據所述提取模組提取的發送方 資訊,判斷所述通訊消息的發送方是否爲所預期的發送方 t 識別模組,用以在所述第一判斷模組判斷所述通訊消 息的發送方不是該所預期的發送方時’識別所述通訊消息 爲垃圾消息,或繼續對所述通訊消息進行識別。 本發明還提供一種通訊系統,包括: 第一通訊設備,用以提取通訊消息中的發送方資訊’ 根據所述提取的發送方資訊,判斷所述通訊消息的發送方 是否爲所預期的發送方,在所述通訊消息的發送方不是該 所預期的發送方時,將所述通訊消息標記爲待識別消息, 向第二通訊設備轉發標記後的通訊消息; 第二通訊設備,用以根據以下內容中的至少一者,對 接收到的來自所述第一通訊設備的通訊消息進行識別: 預定的關鍵字列表、預定的正則運算式和預定的垃圾 消息識別選項。 
本發明包括以下優點,由於透過判斷通訊消息的發送 方是否爲所預期的用戶,以進行垃圾消息的識別,降低了 對垃圾消息的漏判率和誤判率,提高了識別垃圾消息的準 確率’進而增強了資訊過濾的效果。當然,實施本發明的 任一產品並不一定需要同時達到以上所述的所有優點。 【實施方式】 本發明的主要思想包括,提取通訊消息中的發送方資 -8 - 201125331 訊;根據提取的發送方資訊,判斷通訊消息的發送方是否 爲所預期的發送方;如果通訊消息的發送方是所預期的發 送方,則識別該通訊消息爲正常消息;如果通訊消息的發 送方不是所預期的發送方,則識別該通訊消息爲垃圾消息 ,或繼續對接收到的通訊消息進行識別。本發明實施例中 ,可以由發送用戶端執行上述對垃圾消息的識別方法,對 待發送的通訊消息進行識別;也可以由系統伺服器執行上 述對垃圾消息的識別方法,對中轉的通訊消息進行識別; 還可以由接收用戶端執行上述對垃圾消息的識別方法,對 接收到的通訊消息進行識別。不論在發送用戶端、系統伺 服器或接收用戶端執行上述對垃圾消息的識別方法,對垃 圾消息的識別效果相同,均可以達到本發明的發明目的。 下面將結合本發明實施例中的附圖,對本發明中的技 術方案進行清楚、完整地描述,顯然,所描述的實施例僅 僅是本發明的一部分實施例,而不是全部的實施例。基於 本發明中的實施例,本領域普通技術人員在沒有做出創造 性勞動前提下所獲得的所有其他實施例,都屬於本發明保 護的範圍。 如圖1所示,爲本發明實施例一中的一種垃圾消息的 識別方法流程圖,包括以下步驟: 步驟101,提取通訊消息中的發送方資訊。 本發明實施例中的通訊消息可以爲IM (即時通訊)系 統中的即時消息、SMS (簡訊業務)消息、MMS (多媒體 簡訊業務)消息或E-mail (電子郵件)等,通訊消息本身 -9 - 201125331 可以包括發送時間、發送方資訊、接收方資訊和消息內容 等部分。通訊消息由發送用戶端發送,經系統伺服器轉發 到接收用戶端。其中,接收方資訊包括接收方名稱、接收 方ID (身份標識號碼)和接收方位址等內容。 系統伺服器或接收用戶端可以在接收到通訊消息後, 提取通訊消息中的發送方資訊,發送方資訊可以包括發送 方名稱、發送方ID和發送方位址等內容。 步驟1 02,根據提取的發送方資訊,判斷通訊消息的 發送方是否爲所預期的發送方。 如果通訊消息的發送方是所預期的發送方,則執行步 驟1 03 ;如果通訊消息的發送方不是所預期的發送方,則 執行步驟104。 所預期的發送方包括以下用戶中的至少一種:系統用 戶、通訊消息的接收方的好友用戶和通訊消息的接收方主 動聯繫過的非好友用戶。 判斷通訊消息的發送方是否爲所預期的發送方,包括 :獲取系統用戶名單、通信消息的接收方的好友用戶名單 和通訊消息接收方主動聯繫過的非好友用戶名單;如果通 訊消息的發送方資訊記錄在系統用戶名單、通信消息的接 收方的好友用戶名單和通訊消息接收方主動聯繫過的非好 友用戶名單的任一者中,則判斷通訊消息的發送方是所預 期的發送方。 其中,系統用戶爲發送通訊消息的第三方用戶,可以 包括通訊服務提供商。系統用戶向接收方發送的消息通常 -10- 201125331 以通知或提醒的形式出現,接收方可以將系統用戶作爲所 預期的發送方。系統用戶名單可以儲存在系統伺服器中, 用戶端識別垃圾消息時,可以向系統伺服器查詢系統用戶 名單,也可以接收來自系統伺服器的系統用戶名單。 通訊消息的接收方的好友用戶在向接收方發送通訊消 息之前,與接收方建立好友關係,並透過接收方的身份認 證,接收方可以將自身的好友用戶作爲所預期的發送方。 通信消息的接收方的好友用戶名單可以儲存在接收方用戶 端,系統伺服器識別垃圾消息時,可以向接收方用戶端查 詢接收方的好友用戶名單,也可以接收來自接收方用戶端 的接收方的好友用戶名單。 通訊消息的接收方主動聯繫過的非好友用戶在向接收 方發送通訊消息之前,未與接收方建立好友關係,但曾接 收到該接收方發送的消息。通訊消息接收方主動聯繫過的 非好友用戶名單可以儲存在接收方用戶端,接收方用戶端 可以週期性地或事件觸發性地將自身的好友用戶資訊和自 身主動聯繫過的非好友用戶資訊上傳到系統伺服器,也可 以接受系統伺服器或其他用戶端的查詢,供系統伺服器和 其他用戶端識別垃圾消息。 步驟1 03,識別通訊消息爲正常消息。 如果通訊消息的發送方是所預期的發送方,則識別該 通訊消息爲正常消息,並按照正常的流程處理該通訊消息 。系統伺服器識別接收到的通訊消息爲正常消息後,可以 對該通訊消息進行正常的轉發;接收用戶端識別接收到的 -11 - 201125331 通訊消息爲正常消息後,可以按照該通訊消息進行相對應 的操作,並將該通訊消息的發送方加入到白名單。 步驟1 
04,識別通訊消息爲垃圾消息,或繼續對該逋 訊消息進行識別。 如果通訊消息的發送方不是所預期的發送方’則可以 識別該通訊消息爲垃圾消息’對該通訊消息進行丟棄處理 ,並將該通訊消息的發送方加入到黑名單;也可以繼續對 該通訊消息進行識別,識別方法可以包括使用預定的關鍵 字列表、預定的正則運算式和預定的垃圾消息識別選項中 的至少一者進行識別。 需要說明的是,本發明實施例可以根據實際需要對各 個步驟順序進行調整。上述使用預定的關鍵字列表、預定 的正則運算式和預定的垃圾消息識別選項中的至少一者識 別垃圾消息的步驟,也可以在使用發送方資訊識別垃圾消 息的步驟之前執行。 本發明包括以下優點,由於透過判斷通訊消息的發送 方是否爲所預期的用戶’以進行垃圾消息的識別,降低了 對垃圾消息的漏判率和誤判率,提高了識別垃圾消息的準 確率’進而增強了資訊過濾的效果。當然,實施本發明的 任一產品並不一定需要同時達到以上所述的所有優點。 如圖2所示,爲本發明實施例二中的一種垃圾消息的 識別方法流程圖,包括以下步驟: 步驟2〇1,發送用戶端提取待發送的通訊消息中的發 送方資訊。 -12- 201125331 發送用戶端獲取待發送的通訊消息後’可以不立即將 該通訊消息發送到指定的用戶端,而是提取該通訊消息中 的發送方資訊,以進行垃圾消息的識別。發送方資訊可以 包括發送方名稱、發送方ID和發送方位址等內容。 步驟202,發送用戶端根據提取的發送方資訊’判斷 待發送的通訊消息的發送方是否爲所預期的發送方° 如果待發送的通訊消息的發送方是所預期的發送方’ 則執行步驟2〇3 ;如果待發送的通訊消息的發送方不是所 預期的發送方’則執行步驟205。 所預期的發送方包括以下用戶中的至少一者:系統用 戶、通訊消息的接收方的好友用戶和通訊消息的接收方主 動聯繫過的非好友用戶。 系統中的用戶端也可以將自身的好友用戶資訊和自身 主動聯繫過的非好友用戶資訊,週期性地或事件觸發性地 上傳到系統伺服器。系統伺服器也可以將上述從用戶端接 收到的資訊,週期性地或事件觸發性地發送到系統中的其 他用戶端,或者接受其他用戶端的査詢,供其他用戶端進 行垃圾消息的識別。發送用戶端可以從待發送的通訊消息 中提取接收方資訊,並根據接收方資訊從系統伺服器査詢 接收方的好友用戶資訊和接收方主動聯繫過的非好友用戶 資訊,進而判斷待發送的通訊資訊的發送方是否爲接收方 的所預期的發送方。 步驟203,發送用戶端識別待發送的通訊消息爲正常 消息,將該通訊消息發送到系統伺服器。 -13- 201125331 如果通訊消息的發送方是所預期的發送方’發送用戶 端識別該通訊消息爲正常消息’並將該通訊消息發送到系 統伺服器。 步驟204,系統伺服器將接收到的通訊消息轉發給接 收用戶端,或對接收到的通訊消息進行識別。 系統伺服器接收到發送方用戶端發送的通訊消息後’ 可以提取該通訊消息中的接收方資訊,並根據該接收方資 訊將該通訊消息轉發給接收方用戶端;也可以繼續對接收 到的通訊消息進行識別,識別方法可以包括使用預定的關 鍵字列表、正則運算式和垃圾消息識別選項中的至少一者 來進行識別。 步驟205,發送用戶端判斷待發送的通訊消息的消息 內容是否與預定的關鍵字列表匹配。 如果待發送的通訊消息的消息內容與預定的關鍵字列 表匹配,則執行步驟206 ;如果待發送的通訊消息的消息 內容與預定的關鍵字列表不匹配,則執行步驟207。 關鍵字列表可以包含各種用於宣傳廣告資訊、惡意傳 播流言資訊和不文明資訊的垃圾消息中常用的關鍵字,例 如,“服裝大拍賣”、“轉讓門面店”、“中獎”、“請 匯款”等用詞,還可以包含一些常用的英文廣告詞、英文 不文明用語等。關鍵字列表可以由用戶個性化設定,也可 以由系統伺服器下發到各個用戶端。 發送用戶端對待發送的通訊消息提取消息內容,對照 預定的關鍵字列表,查驗該消息內容中是否包含關鍵字列 -14- 201125331 表中相對應的關鍵字,如果査驗到該消息內容中包含相對 應的關鍵字,則該消息內容與預定的關鍵字列表匹配;如 果查驗到該消息內容中不包含相對應的關鍵字,則該消息 內容與預定的關鍵字列表不匹配。 發送用戶端還可以對提取的消息內容進行格式轉換, 將消息內容轉換爲統一的格式,再進行查驗,如全部轉換 爲小寫、半形格式,防止垃圾消息發送者對一些關鍵字進 行大、小寫或全形、半形變換來規避查驗。 步驟206,發送用戶端識別待發送的通訊消息爲垃圾 消息。 如果待發送的通訊消息的消息內容與預定的關鍵字列 表匹配,亦即,該消息內容中包含相對應的關鍵字,則發 送用戶端識別待發送的通訊消息爲垃圾消息,並對該待發 送的通訊消息進行丟棄處理。 步驟207,發送用戶端判斷待發送的通訊消息的消息 內容是否符合預定的垃圾消息識別選項。 
如果待發送的通訊消息的消息內容符合預定的垃圾消 息識別選項,則執行步驟203 ;如果待發送的通訊消息的 消息內容不符合預定的垃圾消息識別選項,則執行步驟 206 ° 如果待發送的通訊消息的消息內容與預定的關鍵字列 表不匹配,亦即,該消息內容中不包含相對應的關鍵字, 則發送用戶端可以提取待發送的通訊消息的消息內容,並 判斷提取到的消息內容是否符合預定的垃圾消息識別選項 -15- 201125331 預定的垃圾消息識別選項可以包括以下內容中的至少 一者:a、不允許出現電話號碼;b、不允許出現網路鏈結 ;c '不允許出現IM號碼;d、不允許出現圖片。用戶可以 根據自身的需求,個性化設定上述垃圾消息識別選項。 需要說明的是,本發明實施例可以根據實際需要而對 各個步驟順序進行調整。上述使用預定的垃圾消息識別選 項識別垃圾消息的步驟,可以在使用預定的關鍵字列表識 別垃圾消息的步驟之前執行;上述使用預定的關鍵字列表 和預定的垃圾消息識別選項識別垃圾消息的步驟,也可以 在使用發送方資訊識別垃圾消息的步驟之前執行。當上述 使用預定的關鍵字列表或預定的垃圾消息識別選項識別垃 圾消息的步驟,在使用發送方資訊識別垃圾消息的步驟之 前執行時,若通訊消息的消息內容與預定的關鍵字列表匹 配或消息內容不符合預定的垃圾消息識別選項,則暫時識 別該通訊消息爲垃圾消息,然後再透過判斷該通訊消息的 發送方是否是所預期的用戶進行進一步確認。亦即,若該 通訊消息的發送方不是所預期的用戶’則確認該通訊消息 爲垃圾消息,若該通訊消息的發送方是所預期的用戶,則 更改該通訊消息爲正常消息。作爲本發明實施例的替換方 案,若通訊消息的消息內容與預定的關鍵字列表匹配或消 息內容不符合預定的垃圾消息識別選項,也可以直接識別 該通訊消息爲垃圾消息’並對該通訊消息作丟棄處理。 本發明包括以下優點,透過發送用戶端主動判斷通訊 -16- 201125331 消息的發送方是否爲接收方的所預期的用戶,以及根據預 定的關鍵字列表和垃圾消息識別選項進行垃圾消息的識別 ,降低了對垃圾消息的漏判率和誤判率,提高了識別垃圾 消息的準確率,進而增強了資訊過濾的效果。當然,實施 本發明的任一產品並不一定需要同時達到以上所述的所有 優點。 本發明的以上實施方式中,由發送用戶端判斷待發送 的通訊消息的發送方是否爲接收方的所預期的用戶,並結 合預定的關鍵字列表和垃圾消息識別選項進行垃圾消息的 識別。本發明實施例中,還可以由系統伺服器判斷接收到 的通訊消息的發送方是否爲接收方的所預期的用戶,並由 接收用戶端根據預定的關鍵字列表和正則運算式進行垃圾 消息的識別。以下透過具體實施例來進行詳細描述。 如圖3所示,爲本發明實施例三中的一種垃圾消息的 識別方法流程圖,包括以下步驟: 步驟301,系統伺服器接收發送用戶端發送的通訊消 息。 通訊消息由發送用戶端發送,經系統伺服器轉發到接 收用戶端,可以包括發送時間、發送方資訊、接收方資訊 和消息內容等部分。 步驟3 02,系統伺服器提取接收到的通訊消息中的發 送方資訊。 系統伺服器接收到發送用戶端發送的通訊消息後,可 以不立即將該通訊消息轉發到指定的用戶端,而是提取該 -17- 201125331 通訊消息中的發送方資訊,以進行垃圾消息的識別。發送 方資訊可以包括發送方名稱、發送方ID和發送方位址等內 容。 步驟303,系統伺服器根據提取的發送方資訊,判斷 接收到的通訊消息的發送方是否爲所預期的發送方。 如果接收到的通訊消息的發送方是所預期的發送方, 則執行步驟3 04 ;如果接收到的通訊消息的發送方不是所 預期的發送方,則執行步驟3 06。. 
所預期的發送方包括以下用戶中的至少一者:系統用 戶、通訊消息的接收方的好友用戶和通訊消息的接收方主 動聯繫過的非好友用戶。 系統中的用戶端也可以將自身的好友用戶資訊和自身 主動聯繫過的非好友用戶資訊,週期性地或事件觸發性地 上傳到系統伺服器。系統伺服器也可以將上述從用戶端接 收到的信息,週期性地或事件觸發性地向系統中的其他用 戶端公佈,供其他用戶端進行垃圾消息的識別。系統伺服 器可以從接收到的通訊消息中提取接收方資訊,並根據接 收方資訊查詢接收方的好友用戶資訊和接收方主動聯繫過 的非好友用戶資訊,進而判斷接收到的通訊資訊的發送方 是否爲接收方的所預期的發送方。 步驟3 04,系統伺服器將接收到的通訊消息標記爲正 常消息,並向接收用戶端轉發標記後的通訊消息。 如果接收到的通訊消息的發送方是所預期的發送方, 系統伺服器識別該通訊消息爲正常消息,並提取該通訊消 -18- 201125331 息中的接收方資訊,根據該接收方資訊將該通訊消息轉發 給接收方用戶端。接收用戶端可以按照接收到的通訊消息 進行相對應的操作,不再對該通訊消息進行識別。 步驟3 05,系統伺服器將接收到的通訊消息標記爲待 識別消息,並向接收用戶端轉發標記後的通訊消息。 如果接收到的通訊消息的發送方不是所預期的發送方 ,系統伺服器識別該通訊消息爲待識別消息,並提取該通 訊消息中的接收方資訊,根據該接收方資訊而將該通訊消 息轉發給接收方用戶端,由接收用戶端繼續對該通訊消息 進行識別。接收用戶端的識別方法可以包括使用預定的關 鍵字列表、正則運算式和垃圾消息識別選項中的至少一者 來進行識別。 步驟3 06,接收用戶端使用預定的關鍵字列表,對接 收到的通訊消息的消息內容進行匹配,並獲取與消息內容 匹配的關鍵字的分値。 關鍵字列表用以判斷通訊消息中是否包含垃圾消息的 特徵詞語,可以由用戶個性化設定,也可以由系統伺服器 下發到各個用戶端。每個關鍵字對應預定的分値,用於標 示該關鍵字出現在垃圾消息中的可能性,不同關鍵字的分 値可以相同或不同。 接收用戶端對接收到的通訊消息提取消息內容,對照 預定的關鍵字列表,查驗該消息內容中是否包含關鍵字列 表中相對應的關鍵字,如果查驗到該消息內容中包含相對 應的關鍵字,則判斷該消息內容與預定的關鍵字列表匹配 -19- 201125331 ,並獲取與該消息內容匹配的關鍵字的分値。與消息內容 匹配的正則運算式爲一個以上時,接收用戶端可以獲取所 有與消息內容匹配的正則運算式的分値。 接收用戶端還可以對提取的消息內容進行格式轉換, 將消息內容轉換爲統一的格式,再進行查驗,如全部轉換 爲小寫、半形格式,防止垃圾消息發送者對一些關鍵字進 行大、小寫或全形、半形變換來規避查驗。 步驟3 07,接收用戶端使用預定的正則運算式,對接 收到的通訊消息的消息內容進行匹配,並獲取與消息內容 匹配的正則運算式的分値。 正則運算式用於從消息內容中辨別某些關鍵特徵,例 如電話號碼、網路鏈結或IM號碼等。不同的正則運算式對 應不同的關鍵特徵,接收用戶端可以透過特定的正則運算 式,判斷接收到的通訊消息的消息內容中是否包含特定的 關鍵特徵,如果該消息內容中包含該關鍵特徵’則該消息 內容與該關鍵特徵對應的正則運算式匹配。 正則運算式可以由用戶個性化設定’也可以由系統伺 服器下發到各個用戶端。每個正則運算式對應預定的分値 ,用以標示該正則運算式從消息內容中辨別的關鍵特徵出 現在垃圾消息中的可能性,不同關鍵字的分値可以相同或 不同。與消息內容匹配的正則運算式爲一個以上時’接收 用戶端可以獲取所有與消息內容匹配的正則運算式的分値 〇 需要說明的是,本步驟與步驟3〇6的執行順序沒有先 -20- 201125331 後之分,亦即,接收用戶端可以在使用預定的關鍵字列表 而對接收到的通訊消息的消息內容進行匹配之前或之後, 使用預定的正則運算式,對接收到的通訊消息的消息內容 進行匹配。 步驟3 08,接收用戶端根據與消息內容匹配的關鍵字 和正則運算式的分値,以獲取該消息內容的匹配總分値。 接收用戶端將所有與消息內容匹配的關鍵字的分値和 正則運算式的分値相加,即可獲取該消息內容的匹配總分 値。 步驟3 09,接收用戶端判斷消息內容的匹配總分値是 否大於或等於預定的閾値。 如果消息內容的匹配總分値大於或等於預定的閩値, 則執行步驟3 1 0 ;如果消息內容的匹配總分値小於預定的 閩値,則執行步驟3 1 1。 上述閩値可以設定爲固定値,也可以根據通訊消息的 長度而動態地設定,亦即,不同長度的通訊消息對應不同 的閩値。 步驟310,接收用戶端識別接收到的通訊消息爲垃圾 消息。 如果消息內容的匹配總分値大於或等於預定的閩値, 則接收用戶端識別接收到的通訊消息爲垃圾消息,並對該 通訊消息進行丟棄處理。 步驟3 
1 1,接收用戶端識別接收到的通訊消息爲正常 消息。 -21 - 201125331 如果消息內容的匹配總分値小於預定的閩値,則接收 用戶端識別接收到的通訊消息爲正常消息,並按照該通訊 消息進行相對應的操作。 需要說明的是,本發明實施例可以根據實際需要而對 各個步驟順序進行調整。上述使用預定的關鍵字列表和預 定的正則運算式識別垃圾消息的步驟,也可以在使用發送 方資訊識別垃圾消息的步驟之前執行。當上述使用預定的 關鍵字列表和預定的正則運算式識別垃圾消息的步驟,在 使用發送方資訊識別垃圾消息的步驟之前執行時,若通訊 消息的消息內容與預定的關鍵字列表匹配或消息內容的匹 配總分値大於或等於預定的閾値,則暫時識別該通訊消息 爲垃圾消息,然後再透過判斷該通訊消息的發送方是否是 所預期的用戶進行進一步確認。亦即,若該通訊消息的發 送方不是所預期的用戶,則確認該通訊消息爲垃圾消息’ 若該通訊消息的發送方是所預期的用戶,則更改該通訊消 息爲正常消息。作爲本發明實施例的替換方案,若通訊消 息的消息內容與預定的關鍵字列表匹配或消息內容的匹配 總分値大於或等於預定的閾値,也可以直接識別該通訊消 息爲垃圾消息,並對該通訊消息作丟棄處理。 本發明包括以下優點,透過系統伺服器判斷通訊消息 的發送方是否爲接收方的所預期的用戶,並由接收用戶端 根據預定的關鍵字列表和正則運算式進行垃圾消息的識別 ,降低了對垃圾消息的漏判率和誤判率,提高了識別垃圾 消息的準確率,進而增強了資訊過濾的效果。當然,實施 -22- 201125331 本發明的任一產品並不一定需要同時達到以上所述的所有 優點。 如圖4所示’爲本發明實施例四中的一種垃圾消息的 識別方法流程圖,包括以下步驟: 步驟401 ’用戶端獲取自身的所預期的發送方資訊。 用戶端運行後’可以從本地或者系統伺服器獲取自身 的所預期的發送方資訊。用戶端的所預期的發送方包括系 統用戶、用戶端的好友用戶和用戶端主動聯繫過的非好友 用戶中的至少一者。 步驟402,用戶端根據自身發送的通訊消息中的接收 方資訊,更新自身的所預期的發送方資訊。 用戶端發送通訊消息時,可以根據步驟401獲取的所 預期的發送方資訊,確定該通訊消息的接收方的屬性。如 果該通訊消息的接收方不是該用戶端的所預期的發送方時 ,該用戶端將該通訊消息的接收方資訊添加到自身的所預 期的發送方資訊中。 具體地說,用戶端可以將該通訊消息的接收方設定爲 自身主動聯繫過的非好友用戶,並記錄該通訊消息的接收 方資訊,該接收方資訊包括:接收方名稱、接收方ID、接 收方地址和最新聯繫時間等內容。其中,最新聯繫時間爲 用戶端向該接收方發送通訊消息的時間。 步驟403,用戶端提取接收到的通訊消息中的發送方 資訊。 用戶端接收到通訊消息後,可以提取該通訊消息中的 -23- 201125331 發送方資訊,以進行垃圾消息的識別。發送方資訊可以包 括發送方名稱、發送方ID和發送方位址等內容。 步驟404,用戶端根據提取的發送方資訊,判斷接收 到的通訊消息的發送方是否爲所預期的發送方。 如果接收到的通訊消息的發送方是所預期的發送方, 則執行步驟405 ;如果接收到的通訊消息的發送方不是所 預期的發送方,則執行步驟406。 用戶端可以將自身的好友用戶資訊和自身主動聯繫過 的非好友用戶資訊,週期性地或事件觸發性地上傳到系統 伺服器。系統伺服器也可以將上述從用戶端接收到的信息 ,週期性地或事件觸發性地向系統中的其他用戶端公佈, 以供其他用戶端進行垃圾消息的識別。 用戶端可以首先判斷該通訊消息的發送方是否爲該用 戶端的好友用戶,如果該發送方是該用戶端的好友用戶, 則判斷該發送方爲所預期的發送方;如果該發送方不是該 用戶端的好友用戶,則繼續判斷該發送方是否爲系統用戶 〇 如果該發送方是系統用戶,則判斷該發送方爲所預期 的發送方;如果該發送方不是系統用戶,則繼續判斷該發 送方是否爲該用戶端主動聯繫過的非好友用戶。 如果該發送方不是該用戶端主動聯繫過的非好友用戶 ,則判斷該發送方不是所預期的發送方;如果該發送方是 該用戶端主動聯繫過的非好友用戶,則査詢該用戶端與該 發送方的最新聯繫時間,判斷該發送方是否爲該用戶端在 -24- 201125331 設定時間內主動聯繫過的非好友用戶,亦即,該發送方與 該用戶端的最新聯繫時間與目前時間之間的時間間隔是否 超過預定的時間間隔Tmax。 如果該發送方與該用戶端的最新聯繫時間與目前時間 之間的時間間隔超過Tm ax,則判斷該發送方不是所預期的 發送方;如果該發送方與該用戶端的最新聯繫時間與目前 時間之間的時間間隔不超過Tm ax,則判斷該發送方是所預 期的發送方。 需要說明的是,本步驟可以根據實際需要而對各個判 斷順序進行調整。 
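The decision order of step 404 can be illustrated with a minimal sketch. It assumes hypothetical in-memory sets of friend and system-user IDs and a dictionary of latest contact times; none of these names come from the patent. A sender counts as expected if it is a friend, a system user, or a non-friend that this client actively contacted no longer than Tmax ago. The helper `record_outgoing` mirrors step 402.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def record_outgoing(recipient_id: str,
                    friends: Set[str],
                    system_users: Set[str],
                    last_contacted: Dict[str, datetime],
                    now: datetime) -> None:
    """Step 402 (sketch): remember a non-friend recipient we actively contacted."""
    if recipient_id not in friends and recipient_id not in system_users:
        # Store (or refresh) the latest contact time for this non-friend user.
        last_contacted[recipient_id] = now


def is_expected_sender(sender_id: str,
                       friends: Set[str],
                       system_users: Set[str],
                       last_contacted: Dict[str, datetime],
                       now: datetime,
                       t_max: timedelta) -> bool:
    """Step 404 (sketch): friend, then system user, then recent non-friend contact."""
    if sender_id in friends:
        return True
    if sender_id in system_users:
        return True
    last = last_contacted.get(sender_id)
    if last is None:
        # Never actively contacted: not an expected sender.
        return False
    # Expected only if the interval since the latest contact does not exceed Tmax.
    return now - last <= t_max
```

The order of the three checks is a design choice only; as the text notes, it may be adjusted as needed.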
步驟405,用戶端識別接收到的通訊消息爲正常消息 〇 如果接收到的通訊消息的發送方是所預期的發送方, 用戶端識別該通訊消息爲正常消息,並按照該通訊消息進 行相對應的操作。 步驟406,用戶端判斷接收到的通訊消息的消息內容 是否與預定的關鍵字列表匹配。 如果接收到的通訊消息的消息內容與預定的關鍵字列 表匹配,則執行步驟407 ;如果接收到的通訊消息的消息 內容與預定的關鍵字列表不匹配,則執行步驟408。 用戶端對接收到的通訊消息提取消息內容,對照預定 的關鍵字列表,查驗該消息內容中是否包含關鍵字列表中 相對應的關鍵字,如果查驗到該消息內容中包含相對應的 關鍵字,則該消息內容與預定的關鍵字列表匹配;如果查 -25- 201125331 驗到該消息內容中不包含相對應的關鍵字,則該消息內容 與預定的關鍵字列表不匹配。 用戶端還可以對提取的消息內容進行格式轉換,將消 息內容轉換爲統一的格式,再進行查驗,如全部轉換爲小 寫、半形格式,防止垃圾消息發送者對一些關鍵字進行大 、小寫或全形、半形變換來規避査驗。 步驟4〇7,用戶端識別接收到的通訊消息爲垃圾消息 〇 如果接收到的通訊消息的消息內容與預定的關鍵字列 表匹配,亦即,該消息內容中包含相對應的關鍵字,則用 戶端識別接收到的通訊消息爲垃圾消息,並對該接收到的 通訊消息進行丟棄處理。 步驟40 8,用戶端判斷接收到的通訊消息的消息內容 是否符合預定的垃圾消息識別選項。 如果接收到的通訊消息的消息內容符合預定的垃圾消 息識別選項,則執行步驟405 ;如果接收到的通訊消息的 消息內容不符合預定的垃圾消息識別選項,則執行步驟 407 〇 如果接收到的通訊消息的消息內容與預定的關鍵字列 表不匹配,亦即,該消息內容中不包含相對應的關鍵字, 則用戶端可以提取接收到的通訊消息的消息內容,並判斷 提取到的消息內容是否符合預定的垃圾消息識別選項。 需要說明的是,本發明實施例可以根據實際需要而對 各個步驟順序進行調整。上述使用預定的垃圾消息識別選 -26- 201125331 項識別垃圾消息的步驟,可以在使用預定的關鍵字列表識 別垃圾消息的步驟之前執行;上述使用預定的關鍵字列表 和預定的垃圾消息識別選項識別垃圾消息的步驟,也可以 在使用發送方資訊識別垃圾消息的步驟之前執行。 本發明包括以下優點,透過用戶端根據發送的通訊消 息中的接收方資訊,更新自身的所預期的發送方資訊,並 根據更新後的所預期的發送方資訊判斷通訊消息的發送方 是否爲接收方的所預期的用戶,以及根據預定的關鍵字列 表和垃圾消息識別選項進行垃圾消息的識別,降低了對垃 圾消息的漏判率和誤判率,提高了識別垃圾消息的準確率 ,進而增強了資訊過濾的效果。當然,實施本發明的任一 產品並不一定需要同時達到以上所述的所有優點。 需要說明的是,上述根據預定的關鍵字列表、正則運 算式和垃圾消息識別選項進行垃圾消息的識別的流程,可 以由系統伺服器執行,也可以由系統伺服器和用戶端組合 執行。上述根據預定的關鍵字列表、正則運算式和垃圾消 息識別選項進行垃圾消息的識別的流程,與根據發送方資 訊進行垃圾消息的識別的流程之間,沒有先後順序之分。 如圖5所示’爲本發明實施例五中的一種垃圾消息的 識別方法流程圖,包括以下步驟: 步驟5 0 1 ’系統伺服器設定黑名單列表和白名單列表 〇 系統伺服器識別垃圾消息的過程中,可以將被識別出 的垃圾消息的發送方資訊添加到黑名單列表中,還可以將 -27- 201125331 被識別出的正常消息的發送方資訊添加到白名單列表中。 系統伺服器可以週期性地或事件觸發性地向系統中的用戶 端公佈黑名單列表和白名單列表,供用戶端識別垃圾消息 。黑名單列表中的發送方資訊包括垃圾消息發送方的名稱 、ID和位址等內容,白名單列表中的發送方資訊包括正常 消息發送方的名稱、ID和位址等內容。 用戶端識別垃圾消息的過程中,也可以將被識別出的 垃圾消息的發送方資訊添加到黑名單列表中,將被識別出 的正常消息的發送方資訊添加到白名單列表中,並將上述 黑名單列表和白名單列表週期性地或事件觸發性地上傳到 系統伺服器,以供系統伺服器識別垃圾消息。 需要說明的是,上述黑名單列表和白名單列表也可以 單獨設定,亦即,系統伺服器可以只設定黑名單列表或只 設定白名單列表。系統伺服器只設定黑名單列表時,可以 只根據黑名單列表來識別垃圾消息;系統伺服器只設定白 名單列·表時,可以只根據白名單列表來識別垃圾消息。 步驟5 02,系統伺服器接收發送用戶端發送的通訊消 息。 通訊消息由發送用戶端發送,經系統伺服器而被轉發 到接收用戶端,可以包括發送時間、發送方資訊、接收方 資訊和消息內容等部分。 步驟503,系統伺服器提取接收到的通訊消息中的發 送方資訊。 
系統伺服器接收到發送用戶端發送的通訊消息後,可 -28- 201125331 以不立即將該通訊消息轉發到指定的用戶端,而是提取該 通訊消息中的發送方資訊,以進行垃圾消息的識別。發送 方資訊可以包括發送方名稱、發送方ID和發送方位址等內 容。 步驟504,系統伺服器判斷提取到的發送方資訊是否 記錄在白名單列表中。 如果提取到的發送方資訊被記錄在白名單列表中,則 執行步驟505;如果提取到的發送方資訊沒有被記錄在白 名單列表中,則執行步驟506。 系統伺服器提取接收到的通訊消息中的發送方資訊後 ,可以從本地或用戶端獲取白名單列表,並判斷提取到的 發送方資訊是否被記錄白名單列表中。 步驟5 05,系統伺服器識別接收到的通訊消息爲正常 消息。 如果提取到的發送方資訊被記錄在白名單列表中,則 系統伺服器判斷接收到的通訊消息的發送方是所預期的發 送方,識別接收到的通訊消息爲正常消息,並將該識別爲 正常消息的通訊消息轉發給接收用戶端。 接收用戶端可以按照接收到的通訊消息進行相對應的 操作,不再對該通訊消息進行識別;也可以繼續對該通訊 消息進行識別。接收用戶端的識別方法可以包括使用預定 的關鍵字列表、正則運算式和垃圾消息識別選項中的至少 一者進行識別。 步驟506,系統伺服器判斷提取到的發送方資訊是否 ~ 29 - 201125331 被記錄在黑名單列表中。 如果提取到的發送方資訊被記錄在黑名單列表中,則 執行步驟507 ;如果提取到的發送方資訊沒有被記錄在黑 名單列表中,則執行步驟508。 系統伺服器提取接收到的通訊消息中的發送方資訊後 ,可以從本地或用戶端獲取黑名單列表,並判斷提取到的 發送方資訊是否被記錄黑名單列表中。 步驟507,系統伺服器識別接收到的通訊消息爲垃圾 消息。 如果提取到的發送方資訊被記錄在黑名單列表中,則 系統伺服器識別接收到的通訊消息爲垃圾消息,並將該識 別爲垃圾消息的通訊消息進行丟棄處理。 步驟50 8,系統伺服器使用預定的關鍵字列表而對接 收到的通訊消息的消息內容進行匹配,並獲取與消息內容 匹配的關鍵字的分値。 系統伺服器對接收到的通訊消息提取消息內容,對照 預定的關鍵字列表,查驗該消息內容中是否包含關鍵字列 表中相對應的關鍵字,如果查驗到該消息內容中包含相對 應的關鍵字,則判斷該消息內容與預定的關鍵字列表匹配 ,並獲取與該消息內容匹配的關鍵字的分値。與消息內容 匹配的正則運算式爲一個以上時,系統伺服器可以獲取所 有與消息內容匹配的正則運算式的分値。 系統伺服器還可以對提取的消息內容進行格式轉換’ 將消息內容轉換爲統一的格式,再進行查驗’如全部轉換 -30- 201125331 爲小寫、半形格式,防止垃圾消息發送者對一些關鍵字進 行大、小寫或全形、半形變換來規避查驗。 步驟509,系統伺服器使用預定的正則運算式,對接 收到的通訊消息的消息內容進行匹配,並獲取與消息內容 匹配的正則運算式的分値。 系統伺服器可以透過特定的正則運算式’判斷接收到 的通訊消息的消息內容中是否包含特定的關鍵特徵’如果 該消息內容中包含該關鍵特徵,則該消息內容與該關鍵特 徵對應的正則運算式匹配。 正則運算式可以由用戶個性化設定’也可以由系統伺 服器下發到各個用戶端。每個正則運算式對應預定的分値 ,用以標示該正則運算式從消息內容中辨別的關鍵特徵出 現在垃圾消息中的可能性,不同關鍵字的分値可以相同或 不同。與消息內容匹配的正則運算式爲一個以上時,系統 伺服器可以獲取所有與消息內容匹配的正則運算式的分値 〇 需要說明的是,本步驟與步驟508的執行順序沒有先 後之分,亦即,系統伺服器可以在使用預定的關鍵字列表 而對接收到的通訊消息的消息內容進行匹配之前或之後’ 使用預定的正則運算式,對接收到的通訊消息的消息內容 進行匹配。 步驟5 1 0,系統伺服器根據與消息內容匹配的關鍵字 和正則運算式的分値,以獲取該消息內容的匹配總分値。 系統伺服器將所有與消息內容匹配的關鍵字的分値和 -31 - 201125331 正則運算式的分値相加’即可獲取該消息內容的匹配總分 値。 步驟5 1 1,系統伺服器判斷消息內容的匹配總分値是 否大於或等於預定的閾値。 如果消息內容的匹配總分値大於或等於預定的閩値’ 則執行步驟507 ;如果消息內容的匹配總分値小於預定的 閾値,則執行步驟5 05。 上述閾値可以設定爲固定値,也可以根據通訊消息的 長度而動態地設定,亦即,不同長度的通訊消息對應不同 的閾値。 需要說明的是,本發明實施例可以根據實際需要而對 各個步驟順序進行調整。上述使用預定的關鍵字列表和預 定的正則運算式來識別垃圾消息的步驟,也可以在使用黑 名單列表和白名單列表來識別垃圾消息的步驟之前執行。 本發明包括以下優點,由系統伺服器根據設定的黑名 
單列表和白名單列表’以及預定的關鍵字列表和正則運算 式進行垃圾消息的識別’降低了對垃圾消息的漏判率和誤 判率’提高了識別垃圾消息的準確率,進而增強了資訊過 濾的效果。當然’實施本發明的任一產品並不一定需要同 時達到以上所述的所有優點。 如圖ό所示’爲本發明實施例六中的一種垃圾消息的 識別方法流程圖,包括以下步驟: 步驟601 ’發送用戶端判斷待發送的通訊消息的消息 內容是否與預定的關鍵字列表匹配。 -32· 201125331 如果待發送的通訊消息的消息內容與預定的關鍵字歹u 表匹配,則執行步驟602 ;如果待發送的通訊消息的消息 內容與預定的關鍵字列表不匹配,則執行步驟603。 發送用戶端對待發送的通訊消息提取消息內容,對照 預定的關鍵字列表,查驗該消息內容中是否包含關鍵字列 表中相對應的關鍵字,如果査驗到該消息內容中包含相對 應的關鍵字,則該消息內容與預定的關鍵字列表匹配;如 果查驗到該消息內容中不包含相對應的關鍵字,則該消息 內容與預定的關鍵字列表不匹配。 發送用戶端還可以對提取的消息內容進行格式轉換, 將消息內容轉換爲統一的格式,再進行査驗,如全部轉換 爲小寫、半形格式,防止垃圾消息發送者對一些關鍵字進 行大、小寫或全形、半形變換來規避查驗。 步驟602,發送用戶端暫時識別待發送的通訊消息爲 垃圾消息。 如果待發送的通訊消息的消息內容與預定的關鍵字列 表匹配,亦即,該消息內容中包含相對應的關鍵字,則發 送用戶端暫時識別待發送的通訊消息爲垃圾消息,並將該 通訊消息發送到系統伺服器,由系統伺服器透過判斷該通 訊消息的發送方是否是所預期的用戶來進行進一步確認。 做爲步驟602的替代方案,如果待發送的通訊消息的 消息內容與預定的關鍵字列表匹配,亦即,該消息內容中 包含相對應的關鍵字,則發送用戶端可以直接識別待發送 的通訊消息爲垃圾消息,並對該通訊消息進行丟棄處理° -33- 201125331 步驟6 03,發送用戶端將待發送的通訊消息發送給系 統伺服器。 如果待發送的通訊消息的消息內容與預定的關鍵字列 表不匹配,亦即,該消息內容中不包含相對應的關鍵字, 則發送用戶端將該待發送的通訊消息發送給系統伺服器, 由系統伺服器繼續判斷該通訊消息是否爲垃圾消息。 步驟604,系統伺服器判斷接收到的通訊消息的消息 內容是否符合預定的垃圾消息識別選項。 如果接收到的通訊消息的消息內容符合預定的垃圾消 息識別選項,則執行步驟606 ;如果接收到的通訊消息的 消息內容不符合預定的垃圾消息識別選項,則執行步驟 605 » 系統伺服器接收到發送用戶端發送的通訊消息後,可 以提取接收到的通訊消息的消息內容,並判斷提取到的消 息內容是否符合預定的垃圾消息識別選項。 步驟605,系統伺服器暫時識別接收到的通訊消息爲 垃圾消息。 如果接收到的通訊消息的消息內容不符合預定的垃圾 消息識別選項,則系統伺服器暫時識別接收到的通訊消息 爲垃圾消息,並將該通訊消息發送到接收用戶端,由接收 用戶端透過判斷該通訊消息的發送方是否是所預期的用戶 來進行進一步確認。 做爲步驟605的替代方案,如果接收到的通訊消息的 消息內容不符合預定的垃圾消息識別選項,系統伺服器可 -34- 201125331 以直接識別接收到的通訊消息爲垃圾消息,並對該垃圾消 息做丟棄處理。 步驟606,系統伺服器將接收到的通訊消息轉發給接 收用戶端。 如果接收到的通訊消息的消息內容符合預定的垃圾消 息識別選項,則系統伺服器將該接收到的通訊消息轉發給 接收用戶端,由接收用戶端繼續識別該通訊消息是否爲垃 圾消息。 步驟607,接收用戶端提取接收到的通訊消息中的發 送方資訊。 接收用戶端接收到系統伺服器轉發的通訊消息後,可 以提取該通訊消息中的發送方資訊,以進行垃圾消息的識 別。發送方資訊可以包括發送方名稱、發送方ID和發送方 位址等內容。 步驟608,接收用_戶端根據提取的發送方資訊,判斷 接收到的通訊消息的發送方是否爲所預期的發送方。 如果接收到的通訊消息的發送方是所預期的發送方, 則執行步驟609 ;如果接收到的通訊消息的發送方不是所 預期的發送方,則執行步驟6 1 0。 系統中的用戶端也可以將自身的好友用戶資訊和自身 主動聯繫過的非好友用戶資訊,週期性地或事件觸發性地 上傳到系統伺服器。系統伺服器也可以將上述從用戶端接 收到的信息,週期性地或事件觸發性地向系統中的其他用 戶端公佈,以供其他用戶端進行垃圾消息的識別。 -35- 201125331 步驟609,接收用戶端識5 消息。 如果通訊消息的發送方是 端識別該通訊消息爲正常消息 對應的操作。 步驟610,接收用戶端識5 消息。 如果通訊消息的發送方不 用戶端識別該通訊消息爲垃圾 丟棄處理》 需要說明的是,本發明實 各個步驟順序進行調整。上述 
選項識別垃圾消息的步驟,可 來識別垃圾消息的步驟之前執ί 本發明包括以下優點,先 訊消息的發送方是否爲接收方 服器根據預定的關鍵字列表, 消息識別選項來進行垃圾消息 的漏判率和誤判率,提高了識 增強了資訊過濾的效果。當然 不一定需要同時達成以上所述ί 如圖7所示,爲本發明實;ί 構示意圖,該通訊設備7〇〇,包 提取模組701,用以提取通 列接收到的通訊消息爲正常 所預期的發送方,接收用戶 ,並按照該通訊消息進行相 和接收到的通訊消息爲垃圾 是所預期的發送方,則接收 消息,並對該通訊消息進行 施例可以根據實際需要而對 使用預定的垃圾消息來識別 以在使用預定的關鍵字列表 'Ί ° 後由發送用戶端透過判斷通 的所預期的用戶、由系統伺 以及由接收用戶端根據垃圾 的識別,降低了對垃圾消息 別垃圾消息的準確率,進而 ,實施本發明的任一產品並 β所有優點》 ®例七中的一種通訊設備結 括: 訊消息中的發送方資訊。 -36- 201125331 本發明實施例中的通訊消息可以爲系統中的即時消 息、S M S消息、Μ M S消息或E - m a i 1等,通訊消息本身可以 包括發送時間、發送方資訊、接收方資訊和消息內容等部 分。通訊消息由發送用戶端發送’經系統伺服器轉發到接 收用戶端。其中,接收方資訊包括接收方名稱、接收方ID 和接收方位址等內容,發送方資訊可以包括發送方名稱、 發送方ID和發送方位址等內容。 本發明實施例中的通訊設備爲發送用戶端時’提取模 組701提取待發送的通訊消息中的發送方資訊;通訊設備 爲系統伺服器或接收用戶端時,提取模組70 1提取接收到 的通訊消息中的發送方資訊。 上述提取模組701是以上所述通訊設備700中負責提取 通訊消息中的發送方資訊的部分’可以是軟體、硬體或兩 者的結合。 第一判斷模組702,用以根據提取模組7〇1提取的發送 方資訊,判斷通訊消息的發送方是否爲所預期的發送方。 所預期的發送方包括以下用戶中的至少一者:系統用 戶、通訊消息的接收方的好友用戶和通訊消息的接收方主 動聯繫過的非好友用戶。 第一判斷模組702還用以獲取系統用戶名單、通信消 息的接收方的好友用戶名單和通訊消息接收方主動聯繫過 的非好友用戶名單;如果通訊消息的發送方資訊係記錄在 系統用戶名單、通信消息的接收方的好友用戶名單和通訊 消息接收方主動聯繫過的非好友用戶名單的任一者中’則 -37- 201125331 判斷通訊消息的發送方是所預期的發送方。 其中,系統用戶爲發送通訊消息的第三方用戶,可以 包括通訊服務提供商。系統用戶向接收方發送的消息通常 以通知或提醒的形式出現,接收方可以將系統用戶當作爲 所預期的發送方。 通訊消息的接收方的好友用戶在向接收方發送通訊消 息之前,與接收方建立好友關係,並透過接收方的身份認 證,接收方可以將自身的好友用戶當作爲所預期的發送方 0 通訊消息的接收方主動聯繫過的非好友用戶在向接收 方發送通訊消息之前,未與接收方建立好友關係,但曾接 收到該接收方發送的消息。系統中的用戶端也可以週期性 地或事件觸發性地將自身的好友用戶資訊和自身主動聯繫 過的非好友用戶資訊上傳到系統伺服器,以供系統伺服器 識別垃圾消息。 上述第一判斷模組702是以上所述通訊設備700中負責 根據提取的發送方資訊,判斷通訊消息的發送方是否爲所 預期的發送方的部分,可以是軟體、硬體或兩者的結合。 識別模組703,用以在第一判斷模組702判斷通訊消息 的發送方不是所預期的發送方時,識別該通訊消息爲垃圾 消息,或繼續對該通訊消息進行識別。 識別模組703可以在通訊消息的發送方不是所預期的 發送方時,識別該通訊消息爲垃圾消息,對該通訊消息進 行丟棄處理,並將該通訊消息的發送方加入到黑名單;也 -38- 201125331 可以繼續對該通訊消息進行識別,識別方法可以包括使用 預定的關鍵字列表、預定的正則運算式和預定的垃圾消息 識別選項中的至少一者來進行識別。 上述識別模組703是以上所述通訊設備7〇〇中負責在通 訊消息的發送方不是所預期的發送方時,識別該通訊消息 爲垃圾消息,或繼續對該通訊消息進行識別的部分,可以 是軟體、硬體或兩者的結合。 上述通訊設備700還包括·· 更新模組704,用以根據發送的通訊消息中的接收方 資訊,更新所預期的發送方資訊。 更新模組704可以確定發送的通訊消息的接收方的屬 性,並在該通訊消息的接收方不是所預期的發送方時,將 該通訊消息的接收方資訊添加到所預期的發送方資訊中》 具體地說,更新模組704可以將該通訊消息的接收方 設定爲主動聯繫過的非好友用戶,並記錄該通訊消息的接 收方資訊,該接收方資訊包括:接收方名稱、接收方ID、 接收方地址和最新聯繫時間等內容。其中,最新聯繫時間 爲用戶端向該接收方發送通訊消息的時間。 
上述更新模組704是以上所述通訊設備700中負責根據 發送的通訊消息中的接收方資訊,更新所預期的發送方資 訊的部分,可以是軟體、硬體或兩者的結合。 設定模組705,用以設定黑名單列表和/或白名單列表 ,該黑名單列表中包括被識別出的垃圾消息的發送方資訊 ,該白名單列表中包括被識別出的正常消息的發送方資訊 -39- 201125331 設定模組705可以將被識別出的垃圾消息的發 訊添加到黑名單列表中,還可以將被識別出的正常 發送方資訊添加到白名單列表中。黑名單列表中的 資訊包括垃圾消息發送方的名稱' ID和位址等內容 單列表中的發送方資訊包括正常消息發送方的名稱 位址等內容。 上述設定模組705是以上所述通訊設備7〇〇中負 黑名單列表和/或白名單列表的部分’可以是軟體 或兩者的結合。 上述第一判斷模組702,還用以在通訊消息的 資訊記錄在黑名單列表中時,判斷通訊消息的發送 所預期的發送方;和/或 在通訊消息的發送方資訊記錄在白名單列表中 斷通訊消息的發送方是所預期的發送方° 上述通訊設備700,還包括: 第二判斷模組706,用以判斷通訊消息的消息 否與預定的關鍵字列表匹配。 關鍵字列表可以包含各種用於宣傳廣告資訊、 播流言資訊和不文明資訊的垃圾消息中常用的關鍵 如,“服裝大拍賣”、“轉讓門面店”、“中獎” 匯款”等用詞,還可以包含一些常用的英文廣告詞 不文明用語等。關鍵字列表可以由用戶個性化《設定 以由系統伺服器下發到各個用戶端。 送方資 消息的 發送方 ,白名 、ID和 責設定 、硬體 發送方 方不是 時,判 內容是 惡意傳 字,例 、“請 、英文 ,也可 -40- 201125331 第二判斷模組7 06將通訊消息中的消息內容,對照預 定的關鍵字列表,查驗該消息內容中是否包含關鍵字列表 中相對應的關鍵字,如果查驗到該消息內容中包含相對應 的關鍵字,則判斷該消息內容與預定的關鍵字列表匹配; 如果查驗到該消息內容中不包含相對應的關鍵字,則判斷 該消息內容與預定的關鍵字列表不匹配。 第二判斷模組706還可以對提取的消息內容進行格式 轉換,將消息內容轉換爲統一的格式,再進行查驗,如全 部轉換爲小寫、半形格式,防止垃圾消息發送者對一些關 鍵字進行大、小寫或全形、半形變換來規避查驗。 上述第二判斷模組706是以上所述通訊設備700中負責 判斷通訊消息的消息內容是否與預定的關鍵字列表匹配的 部分,可以是軟體、硬體或兩者的結合。 上述識別模組703,還用以在第二判斷模組706判斷通 訊消息的消息內容與預定的關鍵字列表匹配時,識別該通 訊消息爲垃圾消息。 上述通訊設備700,還包括: 第三判斷模組707,用以判斷通訊消息的消息內容是 否符合預定的垃圾消息識別選項。 預定的垃圾消息識別選項可以包括以下內容中的至少 一項:a、不允許出現電話號碼:b、不允許出現網路鏈結 ;c、不允許出現IM號碼;d、不允許出現圖片。用戶可以 根據自身的需求,個性化設定上述垃圾消息識別選項。 上述第三判斷模組是以上所述通訊設備7〇〇中負責 -41 - 201125331 判斷通訊消息的消息內容是否符合預定的垃圾消息識別選 項的部分,可以是軟體、硬體或兩者的結合。 上述識別模組703,還用以在第三判斷模組707判斷通 訊消息的消息內容不符合預定的垃圾消息識別選項時,識 別該通訊消息爲垃圾消息。 上述通訊設備700,還包括: 第一匹配模組708,用以使用預定的關鍵字列表,對 通訊消息的消息內容進行匹配,獲取與該消息內容匹配的 關鍵字的分値。 關鍵字列表用以判斷通訊消息中是否包含垃圾消息的 特徵詞語,可以由用戶個性化設定,也可以由系統伺服器 下發到各個用戶端。每個關鍵字對應預定的分値,用於標 示該關鍵字出現在垃圾消息中的可能性,不同關鍵字的分 値可以相同或不同。 上述第一匹配模組708是以上所述通訊設備700中負責 使用預定的關鍵字列表,對通訊消息的消息內容進行匹配 ,獲取與該消息內容匹配的關鍵字的分値的部分,可以是 軟體、硬體或兩者的結合。 第二匹配模組709,用以使用預定的正則運算式,對 通訊消息的消息內容進行匹配,獲取與該消息內容匹配的 正則運算式的分値。 正則運算式用於從消息內容中辨別某些關鍵特徵,例 如電話號碼、網路鏈結或號碼等。不同的正則運算式對 應不同的關鍵特徵,第二匹配模組709可以透過特定的正 -42- 201125331 則運算式,判斷接收到的通訊消息的消息內容中是否包含 特定的關鍵特徵,如果該消息內容中包含該關鍵特徵,則 該消息內容與該關鍵特徵對應的正則運算式匹配。 正則運算式可以由用戶個性化設定,也可以由系統伺 服器下發到各個用戶端。每個正則運算式對應預定的分値 ,用以標示該正則運算式從消息內容中辨別的關鍵特徵出 
現在垃圾消息中的可能性,不同關鍵字的分値可以相同或 不同。與消息內容匹配的正則運算式爲一個以上時,第二 匹配模組709可以獲取所有與消息內容匹配的正則運算式 的分値。 上述第二匹配模組709是以上所述通訊設備700中負責 使用預定的正則運算式,對通訊消息的消息內容進行匹配 ,獲取與該消息內容匹配的正則運算式的分値的部分,可 以是軟體、硬體或兩者的結合。 獲取模組710,用以根據第一匹配模組7〇8和第二匹配 模組709獲取的與消息內容匹配的關鍵字和正則運算式的 分値,獲取該消息內容的匹配總分値。 獲取模組7 1 0將所有與消息內容匹配的關鍵字的分値 和正則運算式的分値相加,即可獲取該消息內容的匹配總 分値。 上述獲取模組710是以上所述通訊設備7〇〇中負責根據 與消息內容匹配的關鍵字和正則運算式的分値’獲取該消 息內容的匹配總分値的部分,可以是軟體、硬體或兩者的 結合。 -43- 201125331 第四判斷模組7 1 1,用以判斷獲取模組7 1 0獲取的消息 內容的匹配總分値是否大於或等於預定的閾値。 上述閩値可以設定爲固定値,也可以根據通訊消息的 長度而動態地設定’亦即,不同長度的通訊消息對應不同 的閾値。 上述第四判斷模組7 1 1是以上所述通訊設備700中負責 判斷消息內容的匹配總分値是否大於或等於預定的閩値的 部分,可以是軟體 '硬體或兩者的結合。 上述識別模組7〇3,還用以在第四判斷模組7 1 1判斷消 息內容的匹配總分値大於或等於預定的閾値時,識別該通 訊消息爲垃圾消息。 上述模組可以分佈於一個裝置,也可以分佈於多個裝 置。上述模組可以合倂爲一個模組,也可以進一步拆分成 多個子模組。 本發明包括以下優點,透過判斷通訊消息的發送方是 否爲所預期的用戶,進行垃圾消息的識別,降低了對垃圾 消息的漏判率和誤判率,提高了識別垃圾消息的準確率, 進而增強了資訊過濾的效果。當然,實施本發明的任一產 品並不一定需要同時達成以上所述的所有優點。 如圖8所示,爲本發明實施例八中的一種通訊系統結 構示意圖,包括: 第一通訊設備810,用以提取通訊消息中的發送方資 訊,根據提取的發送方資訊,判斷通訊消息的發送方是否 爲所預期的發送方,在該通訊消息的發送方不是所預期的 -44- 201125331 發送方時’將該通訊消息標記爲待識別消息,透過網路向 第二通訊設備820轉發標記後的通訊消息。 通訊消息中的發送方資訊可以包括發送方名稱、發送 方ID和發送方位址等內容,所預期的發送方包括以下用戶 中的至少一者:系統用戶、通訊消息的接收方的好友用戶 和通訊消息的接收方主動聯繫過的非好友用戶。 第二通訊設備8 20,用以根據以下內容中的至少一者 ,對接收到的來自第一通訊設備8 1 0的通訊消息進行識別 預定的關鍵字列表、預定的正則運算式和預定的垃圾 消息識別選項。 關鍵字列表可以包含各種用於宣傳廣告資訊、惡意傳 播流言資訊和不文明資訊的垃圾消息中常用的關鍵字,可 以由用戶個性化設定,也可以由系統伺服器下發到各個用 戶端。 正則運算式用以從消息內容中辨別某些關鍵特徵,例 如電話號碼、網路鏈結或IM號碼等。不同的正則運算式對 應不同的關鍵特徵,可以透過特定的正則運算式,判斷接 收到的通訊消息的消息內容中是否包含特定的關鍵特徵, 如果該消息內容中包含該關鍵特徵,則該消息內容與該關 鍵特徵對應的正則運算式匹配。正則運算式可以由用戶個 性化設定,也可以由系統伺服器下發到各個用戶端。每個 正則運算式對應預定的分値,用以標示該正則運算式從消 息內容中辨別的關鍵特徵出現在垃圾消息中的可能性,不 -45- 201125331 同關鍵字的分値可以相同或不同。與消息內容匹配的正則 運算式爲一個以上時,接收用戶端可以獲取所有與消息內 容匹配的正則運算式的分値。 預定的垃圾消息識別選項可以包括以下內容中的至少 •一項:a、不允許出現電話號碼;b、不允許出現網路鏈結 ;c、不允許出現IM號碼;d、不允許出現圖片。用戶可以 根據自身的需求,個性化設定上述垃圾消息識別選項。 上述第一通訊設備810,還用以在通訊消息的發送方 是所預期的發送方時,將該通訊消息標記爲正常消息,透 過網路向該第二通訊設備轉發標記後的通訊消息。 上述第一通訊設備810、第二通訊設備820可以分別爲 發送用戶端和系統伺服器,也可以分別爲系統伺服器和接 收用戶端。 上述第二通訊設備820,還用以將識別後的通訊消息 標記爲待識別消息,透過網路向第三通訊設備830轉發標 記後的通訊消息。 上述通訊系統,還包括: 第三通訊設備830,用以對接收到的來自第二通訊設 備82 0的通訊消息採用預定的關鍵字列表、預定的正則運 算式和預定的垃圾消息識別選項中的至少一者來進行識別 〇 上述第一通訊設備810、第二通訊設備8 
20和第三通訊設備830,可以分別爲發送用戶端、系統伺服器和接收用戶端。

本發明包括以下優點,透過判斷通訊消息的發送方是否爲所預期的用戶,以進行垃圾消息的識別,降低了對垃圾消息的漏判率和誤判率,提高了識別垃圾消息的準確率,進而增強了資訊過濾的效果。當然,實施本發明的任一產品並不一定需要同時達成以上所述的所有優點。

爲了描述的方便,以上所述通訊設備的各部分以功能分爲各種模組分別描述。當然,在實施本發明時可以把各模組的功能在同一個或多個軟體或硬體中實現。

透過以上的實施方式的描述,本領域的技術人員可以清楚地瞭解到本發明可以透過硬體實現,也可以借助軟體加必要的通用硬體平臺的方式來實現。基於這樣的理解,本發明的技術方案可以以軟體產品的形式而體現出來,該軟體產品可以儲存在一個非揮發性儲存媒體(可以是CD-ROM,U碟,移動式硬碟等)中,包括許多指令用以使得一台電腦設備(可以是個人電腦,伺服器,或者網路設備等)執行本發明各個實施例所述的方法。

本領域技術人員可以理解附圖只是一個較佳實施例的示意圖,附圖中的模組或流程並不一定是實施本發明所必須的。

本領域技術人員可以理解實施例中的裝置中的模組可以按照實施例描述進行分佈於實施例的裝置中,也可以進行相應變化位於不同於本實施例的一個或多個裝置中。上述實施例的模組可以合倂爲一個模組,也可以進一步拆分成多個子模組。

上述本發明實施例序號僅僅爲了描述,不代表實施例的優劣。

以上所揭示者僅爲本發明的幾個具體實施例,但是,本發明並非局限於此,任何本領域的技術人員能思之的變化都應落入本發明的申請專利範圍中。

【圖式簡單說明】

爲了更清楚地說明本發明或現有技術的技術方案,下面將對本發明或現有技術描述中所需要使用的附圖作簡單地介紹,顯而易見地,下面描述中的附圖僅僅是本發明的一些實施例,對於本領域普通技術人員來講,在不付出創造性勞動的前提下,還可以根據這些附圖獲得其他的附圖。

圖1爲本發明實施例一中的一種垃圾消息的識別方法流程圖;
圖2爲本發明實施例二中的一種垃圾消息的識別方法流程圖;
圖3爲本發明實施例三中的一種垃圾消息的識別方法流程圖;
圖4爲本發明實施例四中的一種垃圾消息的識別方法流程圖;
圖5爲本發明實施例五中的一種垃圾消息的識別方法流程圖;
圖6爲本發明實施例六中的一種垃圾消息的識別方法流程圖;
圖7爲本發明實施例七中的一種通訊設備結構示意圖;
圖8爲本發明實施例八中的一種通訊系統結構示意圖。

【主要元件符號說明】

700:通訊設備
701:提取模組
702:第一判斷模組
703:識別模組
704:更新模組
705:設定模組
706:第二判斷模組
707:第三判斷模組
708:第一匹配模組
709:第二匹配模組
710:獲取模組
711:第四判斷模組
810:第一通訊設備
820:第二通訊設備
830:第三通訊設備

VI. Description of the Invention:

[Technical Field]

The present invention relates to the field of network communication technologies, and in particular to a method, device and system for identifying spam messages.

[Prior Art]

With the development of communication technology, the invention and use of various communication systems has greatly facilitated communication between users. Two or more clients can connect through a communication network to deliver text, files, voice and video information in real time.
However, a considerable portion of the large volume of communication messages consists of useless or even fraudulent spam messages, which not only place an extra burden on the servers in the communication system but also have an extremely bad effect on users at the client side. After receiving spam messages, a user must spend a lot of time confirming whether each received message is useful and deleting those confirmed to be useless, and may even miss normal messages because of the accumulation of a large number of spam messages, which seriously interferes with normal communication between users. At the same time, a large volume of spam messages also damages the reputation of instant messaging system operators, makes network supervision difficult, and can even paralyze an operator's production environment within a short time, causing the operator huge economic losses.

In the prior art, received communication messages are usually filtered using predetermined keywords to identify spam messages. The specific steps are as follows. First, by analyzing a large number of spam messages, keywords commonly used in spam messages are collected into a keyword library, which is placed on the instant messaging system server or on the client; the keywords may be words such as "winning a prize", "big clothing sale", "remittance" and "transfer", which largely indicate that the information is spam intended for publicity, advertising, fraud and the like. Second, after a communication message is received, its message content is extracted for inspection. Finally, the message content of the received communication message is checked against the keyword library to see whether it contains any keyword stored in the library: if the message content contains a keyword, the communication message is determined to be a spam message and is discarded; if the message content contains no keyword, the communication message is determined to be a normal message.

In addition, a predetermined regular expression may be used to match a certain format in the message content of the received communication message in order to identify spam messages. A regular expression describes a string-matching pattern that can be used to check whether a string contains a certain substring, to replace a matched substring, or to extract from a string a substring that satisfies a certain condition. The regular expression determines whether the message content contains matching key features, such as a web address, a telephone number or an instant messaging contact number; if the message content contains a matching key feature, the communication message is determined to be a spam message.

The prior art may also combine the above two methods, using predetermined keywords and regular expressions together to filter received communication messages and identify spam messages.

In the course of implementing the present invention, the inventors found that the prior art has at least the following problems.

In the prior art, spam messages are identified by rule matching on the message content, and the accuracy of identification depends on whether the keywords and regular expressions are set appropriately. Keywords and regular expressions are generally set based on experience or selected from messages already marked as spam, which is somewhat arbitrary, so spam messages that fall outside the keyword range, or whose keywords occur infrequently, cannot be identified.
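The prior-art content filter described above can be sketched roughly as follows. The keyword library and the two patterns are hypothetical placeholders chosen for illustration, not values given in the patent; real pattern sets would be tuned to the deployment.

```python
import re

# Hypothetical keyword library; the text only gives examples of such words.
KEYWORD_LIBRARY = ["winning a prize", "big clothing sale", "remittance", "transfer"]

# Hypothetical patterns for "key features" such as web addresses and phone numbers.
FEATURE_PATTERNS = [
    re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE),  # web address
    re.compile(r"\b\d{7,11}\b"),                          # phone-like run of digits
]


def is_spam_by_content(content: str) -> bool:
    """Return True if the content matches any keyword or any feature pattern."""
    text = content.lower()
    if any(keyword in text for keyword in KEYWORD_LIBRARY):
        return True
    return any(pattern.search(content) for pattern in FEATURE_PATTERNS)
```

A message such as "Cheap watches, reply for details" passes this filter because it contains neither a listed keyword nor a matching feature, which is the missed-detection weakness the text points out.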
Meanwhile, messages that match some features of spam but are not spam are easily misjudged as spam messages. For example, some spam messages contain the words "video chat"; if "video chat" is set as a keyword for identifying spam, a normal message in which a user invites someone to a video chat, and which therefore contains "video chat", will be misjudged as a spam message. Existing spam identification methods thus only recognize message content mechanically, without considering the scenario in which the message is sent, so their results suffer from high false-positive and missed-detection rates.

[Summary of the Invention]

The present invention provides a method, device and system for identifying spam messages that improve the accuracy of spam identification.

The present invention provides a method for identifying spam messages, comprising:

extracting sender information from a communication message; and

determining, according to the extracted sender information, whether the sender of the communication message is an expected sender, and, if the sender of the communication message is not the expected sender, identifying the communication message as a spam message or continuing to identify the communication message.

The present invention further provides a communication device, comprising:

an extraction module, configured to extract sender information from a communication message;

a first determining module, configured to determine, according to the sender information extracted by the extraction module, whether the sender of the communication message is an expected sender; and

an identification module, configured to identify the communication message as a spam message, or to continue identifying the communication message, when the first determining module determines that the sender of the communication message is not the expected sender.
The present invention further provides a communication system, including:
a first communication device, configured to extract sender information from a communication message, determine according to the extracted sender information whether the sender of the communication message is the intended sender, and, when the sender is not the intended sender, mark the communication message as a to-be-identified message and forward the marked communication message to a second communication device;
the second communication device, configured to identify the communication message received from the first communication device according to at least one of the following: a predetermined keyword list, a predetermined regular expression, and a predetermined spam identification option.
The present invention includes the following advantages: spam messages are identified by judging whether the sender of a communication message is the intended user, which reduces the missed-detection rate and the false-positive rate for spam messages, improves the accuracy of spam identification, and thereby enhances the effect of message filtering. Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.
[Embodiment]
The main idea of the present invention includes: extracting the sender information from a communication message; determining, according to the extracted sender information, whether the sender of the communication message is the intended sender; if the sender is the intended sender, identifying the communication message as a normal message; if the sender is not the intended sender, identifying the communication message as a spam message, or continuing to identify the received communication message.
In the embodiments of the present invention, the sending client may perform the above method to identify a communication message that is about to be sent; the system server may perform the above method to identify a communication message that it relays; or the receiving client may perform the above method to identify a communication message that it has received. Whether the method is performed on the sending client, the system server, or the receiving client, the object of the present invention is achieved, with the same effect on spam identification.
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative work fall within the scope of protection of the present invention.
As shown in FIG. 1, a flowchart of a method for identifying a spam message according to a first embodiment of the present invention includes the following steps:
Step 101: Extract the sender information from a communication message.
The communication message in the embodiments of the present invention may be an instant message in an IM (instant messaging) system, an SMS (short message service) message, an MMS (multimedia messaging service) message, or an e-mail. The communication message itself may include the sending time, the sender information, the recipient information, and the message content. The communication message is sent by the sending client and forwarded to the receiving client by the system server.
The recipient information includes the recipient name, the recipient ID (identity), and the receiving address. After receiving the communication message, the system server or the receiving client may extract the sender information from it; the sender information may include the sender name, the sender ID, and the sending address.
Step 102: Determine, according to the extracted sender information, whether the sender of the communication message is the intended sender.
If the sender of the communication message is the intended sender, step 103 is performed; if the sender of the communication message is not the intended sender, step 104 is performed.
The intended sender includes at least one of the following users: system users, friend users of the recipient of the communication message, and non-friend users whom the recipient of the communication message has actively contacted.
Determining whether the sender of the communication message is the intended sender includes: obtaining the system user list, the friend user list of the recipient, and the list of non-friend users whom the recipient has actively contacted; if the sender information is recorded in any of these lists, the sender of the communication message is determined to be the intended sender.
A system user is a third-party user that sends communication messages and may include the communication service provider. Messages sent by a system user to the recipient are usually notifications or reminders, so the recipient can treat system users as intended senders. The system user list can be stored on the system server.
When a client identifies spam messages, it can query the system server for the system user list or receive the system user list from the system server.
A friend user of the recipient has established a friend relationship with the recipient before sending the communication message and has passed the recipient's identity authentication, so the recipient can treat its friend users as intended senders. The recipient's friend user list may be stored on the receiving client. When the system server identifies spam messages, it may query the receiving client for the recipient's friend user list or receive that list from the receiving client.
A non-friend user whom the recipient has actively contacted has not established a friend relationship with the recipient before sending the communication message, but has previously received a message sent by the recipient. The list of such non-friend users may be stored on the receiving client. The receiving client may upload its friend user information and the information on non-friend users it has actively contacted to the system server, periodically or on an event trigger, and may also accept queries from the system server or other clients, so that the system server and other clients can identify spam messages.
Step 103: Identify the communication message as a normal message.
If the sender of the communication message is the intended sender, the communication message is identified as a normal message and is processed according to the normal procedure.
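The three-list check of step 102 can be illustrated with a minimal Python sketch. This is not the patent's implementation: the class name `ExpectedSenders`, the message layout `message["sender"]["id"]`, and the labels `"normal"` and `"suspect"` are all assumptions introduced here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExpectedSenders:
    """The three lists consulted in step 102 (names are illustrative)."""
    system_users: set[str] = field(default_factory=set)
    friend_users: set[str] = field(default_factory=set)
    contacted_non_friends: set[str] = field(default_factory=set)

    def is_intended(self, sender_id: str) -> bool:
        # A sender recorded in any one of the three lists is an intended sender.
        return (
            sender_id in self.system_users
            or sender_id in self.friend_users
            or sender_id in self.contacted_non_friends
        )


def classify_by_sender(message: dict, expected: ExpectedSenders) -> str:
    """Steps 101-104: extract the sender, then branch on the list lookup."""
    sender_id = message["sender"]["id"]  # step 101 (hypothetical message layout)
    if expected.is_intended(sender_id):  # step 102
        return "normal"                  # step 103
    # Step 104: discard as spam or hand over to further content checks.
    return "suspect"
```

Returning `"suspect"` rather than `"spam"` leaves the caller free to choose between the two outcomes that step 104 allows.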
After the system server identifies a received communication message as a normal message, it can forward the communication message normally; after the receiving client identifies a received communication message as a normal message, it can perform the operation corresponding to the communication message and add the sender of the communication message to the whitelist.
Step 104: Identify the communication message as a spam message, or continue to identify the communication message.
If the sender of the communication message is not the intended sender, the communication message may be identified as a spam message, discarded, and its sender added to the blacklist; alternatively, identification of the communication message may continue, using at least one of a predetermined keyword list, a predetermined regular expression, and a predetermined spam identification option.
It should be noted that, in the embodiments of the present invention, the order of the steps may be adjusted according to actual needs. The step of identifying spam using at least one of the predetermined keyword list, the predetermined regular expression, and the predetermined spam identification option may also be performed before the step of identifying spam using the sender information.
The present invention includes the following advantages: spam messages are identified by judging whether the sender of a communication message is the intended user, which reduces the missed-detection rate and the false-positive rate for spam messages, improves the accuracy of spam identification, and thereby enhances the effect of message filtering. Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.
As shown in FIG. 2, a flowchart of a method for identifying a spam message according to a second embodiment of the present invention includes the following steps:
Step 201: The sending client extracts the sender information from the communication message to be sent.
After the sending client obtains the communication message to be sent, it need not send the message to the designated client immediately; instead, it extracts the sender information from the communication message in order to identify spam. The sender information may include the sender name, the sender ID, and the sending address.
Step 202: The sending client determines, according to the extracted sender information, whether the sender of the communication message to be sent is the intended sender.
If the sender of the communication message to be sent is the intended sender, step 203 is performed; if the sender of the communication message to be sent is not the intended sender, step 205 is performed.
The intended sender includes at least one of the following users: system users, friend users of the recipient of the communication message, and non-friend users whom the recipient of the communication message has actively contacted.
A client in the system can upload its own friend user information and the information on non-friend users it has actively contacted to the system server, periodically or on an event trigger. The system server can in turn send the information received from that client to other clients in the system, periodically or on an event trigger, or accept queries from other clients, so that other clients can identify spam messages.
The sending client may extract the recipient information from the communication message to be sent and, according to the recipient information, query the system server for the recipient's friend user information and the information on non-friend users whom the recipient has actively contacted, thereby determining whether the sender of the communication message to be sent is an intended sender of the recipient.
Step 203: The sending client identifies the communication message to be sent as a normal message and sends it to the system server.
If the sender of the communication message is the intended sender, the sending client identifies the communication message as a normal message and sends it to the system server.
Step 204: The system server forwards the received communication message to the receiving client, or identifies the received communication message.
After receiving the communication message sent by the sending client, the system server may extract the recipient information from it and forward the communication message to the receiving client according to the recipient information; alternatively, it may continue to identify the received communication message, using at least one of a predetermined keyword list, a regular expression, and a spam identification option.
Step 205: The sending client determines whether the message content of the communication message to be sent matches the predetermined keyword list.
If the message content of the communication message to be sent matches the predetermined keyword list, step 206 is performed; if it does not match the predetermined keyword list, step 207 is performed.
The keyword list can contain keywords commonly used in spam messages that push advertisements, maliciously spread rumors, or carry vulgar content, such as "big clothing sale", "storefront for transfer", "prize winning", and "please remit money"; it can also include commonly used English advertising words and English vulgar terms. The keyword list can be personalized by the user or delivered to each client by the system server.
The sending client extracts the message content of the communication message to be sent and, against the predetermined keyword list, checks whether the message content contains a keyword in the list. If the message content is found to contain such a keyword, the message content matches the predetermined keyword list; if it is found not to contain any such keyword, the message content does not match the predetermined keyword list.
The sending client can also convert the extracted message content into a unified format before checking it, for example converting everything to lowercase and half-width characters, so that spammers cannot evade the check by switching keywords between uppercase and lowercase or between full-width and half-width characters.
Step 206: The sending client identifies the communication message to be sent as a spam message.
If the message content of the communication message to be sent matches the predetermined keyword list, that is, the message content contains a keyword in the list, the sending client identifies the communication message to be sent as a spam message and discards it.
Step 207: The sending client determines whether the message content of the communication message to be sent meets the predetermined spam identification options.
If the message content of the communication message to be sent meets the predetermined spam identification options, step 203 is performed; if it does not meet the predetermined spam identification options, step 206 is performed.
If the message content of the communication message to be sent does not match the predetermined keyword list, that is, the message content contains no keyword in the list, the sending client can extract the message content of the communication message to be sent and determine whether the extracted message content meets the predetermined spam identification options.
The predetermined spam identification options may include at least one of the following:
a. no phone number is allowed to appear;
b. no network link is allowed to appear;
c. no IM number is allowed to appear;
d. no picture is allowed to appear.
Users can personalize the above spam identification options according to their needs.
It should be noted that, in the embodiments of the present invention, the order of the steps may be adjusted according to actual needs. The step of identifying spam using the predetermined spam identification options may be performed before the step of identifying spam using the predetermined keyword list; the steps of identifying spam using the predetermined keyword list and the predetermined spam identification options may also be performed before the step of identifying spam using the sender information.
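Steps 205 to 207 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the keyword list, the option names, and the regular expressions used to detect phone numbers, links, and IM numbers are hypothetical and are not given in the source. Unicode NFKC normalization is used here as one way to fold full-width characters to half-width; the source does not name a specific technique.

```python
import re
import unicodedata

# Hypothetical keyword list; the source only gives examples of this kind.
KEYWORDS = ("big clothing sale", "please remit money")

# Hypothetical detection patterns for identification options a to c.
OPTION_PATTERNS = {
    "no_phone_number": re.compile(r"\d{7,}"),
    "no_network_link": re.compile(r"https?://|www\."),
    "no_im_number": re.compile(r"\bim\s*[:#]?\s*\d{5,}"),
}


def normalize(text: str) -> str:
    # NFKC folds full-width forms to half-width; casefold removes case differences.
    return unicodedata.normalize("NFKC", text).casefold()


def matches_keyword_list(content: str, keywords=KEYWORDS) -> bool:
    """Step 205: does the normalized content contain any listed keyword?"""
    normalized = normalize(content)
    return any(normalize(keyword) in normalized for keyword in keywords)


def meets_options(content: str, has_picture: bool, enabled) -> bool:
    """Step 207: True when none of the enabled prohibitions is violated."""
    normalized = normalize(content)
    for option in enabled:
        if option == "no_picture":
            if has_picture:
                return False
        elif OPTION_PATTERNS[option].search(normalized):
            return False
    return True


def check_content(content: str, has_picture: bool = False,
                  enabled=("no_phone_number", "no_network_link")) -> str:
    if matches_keyword_list(content):                     # step 205 -> 206
        return "spam"
    if not meets_options(content, has_picture, enabled):  # step 207 -> 206
        return "spam"
    return "normal"                                       # step 207 -> 203
```

The `enabled` tuple stands in for the user's personalized choice of options; which options are switched on by default is an assumption of this sketch.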
When the step of identifying spam using the predetermined keyword list or the predetermined spam identification options is executed before the step of identifying spam using the sender information, a communication message whose content matches the predetermined keyword list, or whose content does not meet the predetermined spam identification options, is provisionally identified as a spam message and then confirmed by determining whether its sender is the intended user. That is, if the sender of the communication message is not the intended user, the communication message is confirmed as a spam message; if the sender of the communication message is the intended user, the communication message is changed to a normal message. As an alternative in the embodiments of the present invention, if the message content of the communication message matches the predetermined keyword list or does not meet the predetermined spam identification options, the communication message may be directly identified as a spam message and discarded.
The present invention includes the following advantages: the sending client actively determines whether the sender of the communication message is an intended user of the recipient and identifies spam according to the predetermined keyword list and spam identification options, which reduces the missed-detection rate and the false-positive rate for spam messages, improves the accuracy of spam identification, and thereby enhances the effect of message filtering. Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.
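The alternative ordering described above, in which the content check runs first and the sender check then confirms or overturns the provisional result, can be sketched as below. The function name and parameters are hypothetical. The source does not say how a message is treated when the content check raises no flag; this sketch assumes it is treated as normal.

```python
def classify_content_first(content_flagged: bool,
                           sender_is_intended: bool,
                           discard_directly: bool = False) -> str:
    """Content check first, sender check as confirmation (illustrative only)."""
    if not content_flagged:
        # Assumption: the source does not state this branch explicitly.
        return "normal"
    if discard_directly:
        # The alternative: flagged content is treated as spam without confirmation.
        return "spam"
    # Provisional spam verdict, confirmed or overturned by the sender check.
    return "normal" if sender_is_intended else "spam"
```

With `discard_directly=False`, a friend's message that happens to contain a listed keyword is restored to normal, which is the false-positive case the sender check is meant to address.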
In the above embodiment of the present invention, the sending client determines whether the sender of the communication message to be sent is an intended user of the recipient and combines this with the predetermined keyword list and spam identification options to identify spam messages. In an embodiment of the present invention, the system server may instead determine whether the sender of a received communication message is an intended user of the recipient, and the receiving client may identify spam according to a predetermined keyword list and regular expressions. This is described in detail below through a specific embodiment.
As shown in FIG. 3, a flowchart of a method for identifying a spam message according to a third embodiment of the present invention includes the following steps:
Step 301: The system server receives a communication message sent by the sending client.
The communication message is sent by the sending client and forwarded to the receiving client by the system server, and may include the sending time, the sender information, the recipient information, and the message content.
Step 302: The system server extracts the sender information from the received communication message.
After receiving the communication message sent by the sending client, the system server need not forward it to the designated client immediately; instead, it extracts the sender information from the communication message in order to identify spam. The sender information may include the sender name, the sender ID, and the sending address.
Step 303: The system server determines, according to the extracted sender information, whether the sender of the received communication message is the intended sender.
If the sender of the received communication message is the intended sender, step 304 is performed; if the sender of the received communication message is not the intended sender, step 305 is performed.
The intended sender includes at least one of the following users: system users, friend users of the recipient of the communication message, and non-friend users whom the recipient of the communication message has actively contacted.
A client in the system can upload its own friend user information and the information on non-friend users it has actively contacted to the system server, periodically or on an event trigger. The system server can in turn send the information received from that client to other clients in the system, periodically or on an event trigger, so that other clients can identify spam messages. The system server can extract the recipient information from the received communication message and, according to the recipient information, look up the recipient's friend user information and the information on non-friend users whom the recipient has actively contacted, thereby determining whether the sender of the received communication message is an intended sender of the recipient.
Step 304: The system server marks the received communication message as a normal message and forwards the marked communication message to the receiving client.
If the sender of the received communication message is the intended sender, the system server identifies the communication message as a normal message, extracts the recipient information from the communication message, and forwards the communication message to the receiving client according to the recipient information. The receiving client can perform the corresponding operation according to the received communication message and does not identify it again.
Step 305: The system server marks the received communication message as a to-be-identified message and forwards the marked communication message to the receiving client.
If the sender of the received communication message is not the intended sender, the system server identifies the communication message as a to-be-identified message, extracts the recipient information from the communication message, and forwards the communication message to the receiving client according to the recipient information, so that the receiving client continues to identify it. The receiving client may identify the message using at least one of a predetermined keyword list, a regular expression, and a spam identification option.
Step 306: The receiving client uses the predetermined keyword list to match the message content of the received communication message and obtains the scores of the keywords that match the message content.
The keyword list is used to determine whether the communication message contains words characteristic of spam. It can be personalized by the user or delivered to each client by the system server. Each keyword corresponds to a predetermined score that indicates how likely the keyword is to appear in spam; the scores of different keywords may be the same or different.
The receiving client extracts the message content of the received communication message and, against the predetermined keyword list, checks whether the message content contains a keyword in the list. If the message content is found to contain such a keyword, the message content is determined to match the predetermined keyword list, and the score of the matching keyword is obtained. When more than one keyword matches the message content, the receiving client can obtain the scores of all the matching keywords.
The receiving client can also perform format conversion on the extracted message content.
The message content is converted into a unified format before it is checked, for example by converting everything to lowercase and half-width characters, so that spammers cannot evade the check by switching keywords between uppercase and lowercase or between full-width and half-width characters.
Step 307: The receiving client uses the predetermined regular expressions to match the message content of the received communication message and obtains the scores of the regular expressions that match the message content.
Regular expressions are used to identify certain key features in the message content, such as a phone number, a network link, or an IM number. Different regular expressions correspond to different key features. Using a specific regular expression, the receiving client can determine whether the message content of the received communication message contains a specific key feature; if it does, the message content matches the regular expression corresponding to that key feature.
Regular expressions can be personalized by the user or delivered to each client by the system server. Each regular expression corresponds to a predetermined score that indicates how likely the key feature identified by that regular expression is to appear in spam; the scores of different regular expressions may be the same or different. When more than one regular expression matches the message content, the receiving client can obtain the scores of all the matching regular expressions.
It should be noted that this step and step 306 need not be executed in any particular order; that is, the receiving client can match the message content of the received communication message with the predetermined regular expressions either before or after matching it with the predetermined keyword list.
Step 308: The receiving client obtains the total matching score of the message content from the scores of the keywords and regular expressions that match the message content.
The receiving client adds together the scores of all the keywords and all the regular expressions that match the message content to obtain the total matching score of the message content.
Step 309: The receiving client determines whether the total matching score of the message content is greater than or equal to a predetermined threshold.
If the total matching score of the message content is greater than or equal to the predetermined threshold, step 310 is performed; if the total matching score of the message content is less than the predetermined threshold, step 311 is performed.
The threshold can be set to a fixed value, or it can be set dynamically according to the length of the communication message; that is, communication messages of different lengths correspond to different thresholds.
Step 310: The receiving client identifies the received communication message as a spam message.
If the total matching score of the message content is greater than or equal to the predetermined threshold, the receiving client identifies the received communication message as a spam message and discards it.
Step 311: The receiving client identifies the received communication message as a normal message.
If the total matching score of the message content is less than the predetermined threshold, the receiving client identifies the received communication message as a normal message and performs the corresponding operation according to the communication message.
It should be noted that, in the embodiments of the present invention, the order of the steps may be adjusted according to actual needs.
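The scoring scheme of steps 306 to 311 can be sketched in Python as follows. The keywords, patterns, scores, and thresholds are all hypothetical values chosen for illustration; the source specifies only that each keyword and regular expression carries a predetermined score, that the scores are summed, and that the threshold may be fixed or depend on message length.

```python
import re

# Hypothetical keyword scores (step 306).
KEYWORD_SCORES = {"prize": 3, "remit": 4}

# Hypothetical regular expressions with scores (step 307).
REGEX_SCORES = [
    (re.compile(r"https?://\S+"), 2),  # network link
    (re.compile(r"\d{7,}"), 2),        # phone-number-like digit run
]


def total_matching_score(content: str) -> int:
    """Step 308: sum the scores of all matching keywords and regular expressions."""
    text = content.lower()
    score = sum(s for keyword, s in KEYWORD_SCORES.items() if keyword in text)
    score += sum(s for pattern, s in REGEX_SCORES if pattern.search(text))
    return score


def threshold_for(content: str, fixed=None) -> int:
    """Fixed threshold if given, otherwise one chosen by message length."""
    if fixed is not None:
        return fixed
    return 4 if len(content) < 50 else 6  # hypothetical length rule


def classify_by_score(content: str, fixed_threshold=None) -> str:
    """Steps 309-311: compare the total score with the threshold."""
    if total_matching_score(content) >= threshold_for(content, fixed_threshold):
        return "spam"    # step 310
    return "normal"      # step 311
```

Summing scores lets a single weak signal, such as one low-scoring keyword, pass through while several signals together cross the threshold, which is a softer rule than the all-or-nothing keyword match of the second embodiment.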
The above steps of identifying spam using the predetermined keyword list and the predetermined regular expressions may also be performed before the step of identifying spam using the sender information. In that case, if the message content of the communication message matches the predetermined keyword list, or the total matching score of the message content is greater than or equal to the predetermined threshold, the communication message is provisionally identified as a spam message and then confirmed by determining whether its sender is the intended user. That is, if the sender of the communication message is not the intended user, the communication message is confirmed as a spam message; if the sender of the communication message is the intended user, the communication message is changed to a normal message. As an alternative in the embodiments of the present invention, if the message content of the communication message matches the predetermined keyword list, or the total matching score of the message content is greater than or equal to the predetermined threshold, the communication message may be directly identified as a spam message and discarded.
The present invention includes the following advantages: the system server determines whether the sender of the communication message is an intended user of the recipient, and the receiving client identifies spam according to the predetermined keyword list and regular expressions, which reduces the missed-detection rate and the false-positive rate for spam messages and improves the accuracy of spam identification. This enhances the effectiveness of information filtering.
Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.
As shown in FIG. 4, a flowchart of a method for identifying a spam message according to a fourth embodiment of the present invention includes the following steps:
Step 401: The client obtains its own intended-sender information.
After the client starts running, it can obtain its own intended-sender information locally or from the system server. The intended senders of the client include at least one of system users, friend users of the client, and non-friend users whom the client has actively contacted.
Step 402: The client updates its own intended-sender information according to the recipient information in the communication messages it sends.
When the client sends a communication message, it can determine the attributes of the recipient of the communication message according to the intended-sender information obtained in step 401. If the recipient of the communication message is not an intended sender of the client, the client adds the recipient information of the communication message to its intended-sender information.
Specifically, the client can set the recipient of the communication message as a non-friend user whom it has actively contacted and record the recipient information of the communication message, which includes the recipient name, the recipient ID, the recipient address, and the latest contact time. The latest contact time is the time at which the client sent the communication message to the recipient.
Step 403: The client extracts the sender information from a received communication message.
After the client receives a communication message, it can extract the sender information from the communication message in order to identify spam.
The sender information can include the sender name, the sender ID, the sending address, and other content.
Step 404: The client determines, according to the extracted sender information, whether the sender of the received communication message is the intended sender.
If the sender of the received communication message is the intended sender, step 405 is performed; if the sender of the received communication message is not the intended sender, step 406 is performed.
The client can upload its own friend user information and the information on non-friend users it has actively contacted to the system server, periodically or on an event trigger. The system server can in turn send the information received from the client to other clients in the system, periodically or on an event trigger, so that other clients can identify spam messages.
The client may first determine whether the sender of the communication message is one of its friend users. If the sender is a friend user of the client, the sender is determined to be the intended sender; if not, the client goes on to determine whether the sender is a system user.
If the sender is a system user, the sender is determined to be the intended sender; if not, the client determines whether the sender is a non-friend user whom the client has actively contacted.
If the sender is not a non-friend user whom the client has actively contacted, the sender is determined not to be the intended sender. If the sender is such a non-friend user, the client looks up the latest contact time between itself and the sender and determines whether the sender is a non-friend user whom it actively contacted within a set period, that is, whether the interval between the latest contact time and the current time exceeds a predetermined time interval Tmax.
If the interval between the sender's latest contact time and the current time exceeds Tmax, the sender is determined not to be an intended sender; if the interval between the sender's latest contact time and the current time does not exceed Tmax, the sender is determined to be an intended sender.
It should be noted that the order of the individual judgments in this step can be adjusted according to actual needs.
Step 405: The client identifies the received communication message as a normal message.
If the sender of the received communication message is an intended sender, the client identifies the communication message as a normal message and performs the corresponding operation according to the communication message.
Step 406: The client determines whether the message content of the received communication message matches the predetermined keyword list.
If the message content of the received communication message matches the predetermined keyword list, step 407 is performed; if the message content of the received communication message does not match the predetermined keyword list, step 408 is performed.
The client extracts the message content of the received communication message and checks, against the predetermined keyword list, whether the message content contains a keyword from the keyword list. If the check finds that the message content contains such a keyword, the message content matches the predetermined keyword list; if the check finds that the message content does not contain such a keyword, the message content does not match the predetermined keyword list.
The client can also perform format conversion on the extracted message content.
The message content is converted to a uniform format before the check is carried out, for example all lowercase and half-width characters, which prevents spammers from evading the check by applying uppercase/lowercase or full-width/half-width transformations to keywords.
Step 407: The client identifies the received communication message as a spam message.
If the message content of the received communication message matches the predetermined keyword list, that is, the message content contains a corresponding keyword, the client identifies the received communication message as a spam message and discards it.
Step 408: The client determines whether the message content of the received communication message meets the predetermined spam identification options.
If the message content of the received communication message meets the predetermined spam identification options, step 405 is performed; if the message content of the received communication message does not meet the predetermined spam identification options, step 407 is performed.
If the message content of the received communication message does not match the predetermined keyword list, that is, the message content does not contain a corresponding keyword, the client can extract the message content of the received communication message and determine whether the extracted message content meets the predetermined spam identification options.
It should be noted that the order of the steps in this embodiment can be adjusted according to actual needs.
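The client-side flow of steps 402 and 404 to 408 can be illustrated with a minimal Python sketch. It is not taken from the patent: the class and function names, the use of plain IDs to represent users, the `options_ok` callback and the value chosen for Tmax are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed value; the text only names the interval Tmax without fixing it.
T_MAX_SECONDS = 7 * 24 * 3600


@dataclass
class IntendedSenders:
    """Hypothetical container for a client's intended sender information."""
    system_users: set = field(default_factory=set)
    friends: set = field(default_factory=set)
    # Non-friend users the client contacted: user ID -> latest contact time.
    contacted: dict = field(default_factory=dict)

    def record_outgoing(self, recipient_id, now):
        """Step 402: remember a non-friend recipient and the latest contact time."""
        if recipient_id not in self.friends and recipient_id not in self.system_users:
            self.contacted[recipient_id] = now

    def is_intended(self, sender_id, now):
        """Step 404: friend, system user, or contacted within Tmax."""
        if sender_id in self.friends or sender_id in self.system_users:
            return True
        last = self.contacted.get(sender_id)
        return last is not None and (now - last) <= T_MAX_SECONDS


def classify(sender_id, content, senders, keywords, options_ok, now):
    """Steps 404 to 408 in the order given in this embodiment."""
    if senders.is_intended(sender_id, now):
        return "normal"                      # step 405
    if any(k in content for k in keywords):
        return "spam"                        # step 407
    # Step 408: options_ok is a caller-supplied check of the identification options.
    return "normal" if options_ok(content) else "spam"
```

Because the embodiment allows the judgments to be reordered, the three checks in `classify` could be rearranged without changing the individual tests.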
The above step of identifying spam messages using the predetermined spam identification options may be performed before the step of identifying spam messages using the predetermined keyword list; the above steps of identifying spam messages using the predetermined keyword list and the predetermined spam identification options may also be performed before the step of identifying spam messages using the sender information.
The present invention includes the following advantages: the client updates its own intended sender information according to the recipient information in the communication messages it sends, determines according to the updated intended sender information whether the sender of a communication message is an intended user of the recipient, and identifies spam messages according to the predetermined keyword list and spam identification options. This reduces the missed-detection rate and false-positive rate for spam messages, improves the accuracy of identifying spam messages, and enhances the effect of message filtering. Of course, a product implementing the present invention does not necessarily need to achieve all of the advantages described above at the same time.
It should be noted that the process of identifying spam messages according to the predetermined keyword list, regular expressions and spam identification options can be executed by the system server, or by the system server and the client in combination. There is no fixed order between the process of identifying spam messages according to the predetermined keyword list, regular expressions and spam identification options and the process of identifying spam messages according to the sender information.
FIG. 5 is a flowchart of a method for identifying spam messages according to Embodiment 5 of the present invention.
The method includes the following steps:
Step 501: The system server sets a blacklist and a whitelist.
While identifying spam messages, the system server can add the sender information of an identified spam message to the blacklist, and can also add the sender information of an identified normal message to the whitelist.
The system server can deliver the blacklist and whitelist to the clients in the system, periodically or when triggered by an event, so that the clients can identify spam messages. The sender information in the blacklist includes the name, ID, address and other content of the spam sender; the sender information in the whitelist includes the name, ID, address and other content of the normal message sender.
While identifying spam messages, a client can likewise add the sender information of an identified spam message to the blacklist and the sender information of an identified normal message to the whitelist, and upload the blacklist and whitelist to the system server, periodically or when triggered by an event, so that the system server can identify spam messages.
It should be noted that the blacklist and whitelist can also be set separately, that is, the system server can set only a blacklist or only a whitelist. When the system server sets only the blacklist, it can identify spam messages based only on the blacklist; when the system server sets only the whitelist, it can identify spam messages based only on the whitelist.
Step 502: The system server receives the communication message sent by the sending client.
The communication message is sent by the sending client and forwarded to the receiving client via the system server. It can include the sending time, sender information, recipient information, message content and other parts.
Step 503: The system server extracts the sender information from the received communication message.
After the system server receives the communication message sent by the sending client, it does not immediately forward the communication message to the specified client. Instead, it extracts the sender information from the communication message in order to identify spam messages. The sender information can include the sender name, sender ID, sender address and other content.
Step 504: The system server determines whether the extracted sender information is recorded in the whitelist.
If the extracted sender information is recorded in the whitelist, step 505 is performed; if the extracted sender information is not recorded in the whitelist, step 506 is performed.
After the system server extracts the sender information from the received communication message, it can obtain the whitelist locally or from a client and determine whether the extracted sender information is recorded in the whitelist.
Step 505: The system server identifies the received communication message as a normal message.
If the extracted sender information is recorded in the whitelist, the system server determines that the sender of the received communication message is an intended sender, identifies the received communication message as a normal message, and forwards the communication message identified as a normal message to the receiving client.
The receiving client can perform the corresponding operation according to the received communication message without identifying it further, or it can identify the communication message again. The identification performed by the receiving client may use at least one of a predetermined keyword list, regular expressions and spam identification options.
Step 506: The system server determines whether the extracted sender information is recorded in the blacklist.
If the extracted sender information is recorded in the blacklist, step 507 is performed; if the extracted sender information is not recorded in the blacklist, step 508 is performed.
After the system server extracts the sender information from the received communication message, it can obtain the blacklist locally or from a client and determine whether the extracted sender information is recorded in the blacklist.
Step 507: The system server identifies the received communication message as a spam message.
If the extracted sender information is recorded in the blacklist, the system server identifies the received communication message as a spam message and discards the communication message identified as a spam message.
Step 508: The system server uses the predetermined keyword list to match the message content of the received communication message and obtains the scores of the keywords that match the message content.
The system server extracts the message content from the received communication message and checks, against the predetermined keyword list, whether the message content contains a keyword from the keyword list. If the check finds that the message content contains such a keyword, the system server determines that the message content matches the predetermined keyword list and obtains the score of the keyword that matches the message content. When more than one keyword matches the message content, the system server can obtain the scores of all the keywords that match the message content.
The system server can also perform format conversion on the extracted message content, converting it to a uniform format before carrying out the check, for example all lowercase and half-width characters, which prevents spammers from evading the check by applying uppercase/lowercase or full-width/half-width transformations to keywords.
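The format conversion and keyword scoring of step 508 can be sketched in Python as follows. This is only an illustration: the function names, the sample keywords and their scores are assumptions, and Unicode NFKC normalization is used here as one convenient way to fold full-width letters and digits to half-width (it also folds other compatibility characters, which the patent does not discuss).

```python
import unicodedata


def normalize(text: str) -> str:
    """Convert text to a uniform format: half-width characters, lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def matched_keyword_scores(content: str, keyword_table: dict) -> dict:
    """Return {keyword: score} for every keyword found in the normalized content."""
    norm = normalize(content)
    return {kw: score for kw, score in keyword_table.items() if normalize(kw) in norm}


# Hypothetical keyword list with per-keyword scores.
KEYWORDS = {"prize": 3, "remit": 5, "auction": 2}
hits = matched_keyword_scores("ＷＩＮ a ＰＲＩＺＥ, please Remit", KEYWORDS)
```

Here `hits` contains the entries for "prize" and "remit" even though the message wrote them in full-width and mixed case, which is the evasion the format conversion is meant to defeat.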
Step 509: The system server uses predetermined regular expressions to match the message content of the received communication message and obtains the scores of the regular expressions that match the message content.
The system server can use a specific regular expression to determine whether the message content of the received communication message contains a specific key feature. If the message content contains the key feature, the message content matches the regular expression corresponding to that key feature.
Regular expressions can be personalized by the user, and can also be delivered by the system server to each client. Each regular expression corresponds to a predetermined score, which indicates the likelihood that the key feature identified from the message content by that regular expression appears in spam messages. The scores of different regular expressions may be the same or different. When more than one regular expression matches the message content, the system server can obtain the scores of all the regular expressions that match the message content.
It should be noted that there is no fixed execution order between this step and step 508, that is, the system server may use the predetermined regular expressions to match the message content of the received communication message either before or after it uses the predetermined keyword list to match the message content.
Step 510: The system server obtains the total matching score of the message content according to the scores of the keywords and regular expressions that match the message content.
The system server adds the scores of all the keywords that match the message content to the scores of all the regular expressions that match the message content to obtain the total matching score of the message content.
Step 511: The system server determines whether the total matching score of the message content is greater than or equal to a predetermined threshold.
If the total matching score of the message content is greater than or equal to the predetermined threshold, step 507 is performed; if the total matching score of the message content is less than the predetermined threshold, step 505 is performed.
The threshold can be set to a fixed value, or it can be set dynamically according to the length of the communication message, that is, communication messages of different lengths correspond to different thresholds.
It should be noted that the order of the steps in this embodiment can be adjusted according to actual needs. The above steps of identifying spam messages using the predetermined keyword list and the predetermined regular expressions may also be performed before the steps of identifying spam messages using the blacklist and whitelist.
The present invention includes the following advantages: the system server identifies spam messages according to the blacklist and whitelist it has set and according to the predetermined keyword list and regular expressions, which reduces the missed-detection rate and false-positive rate for spam messages, improves the accuracy of identifying spam messages, and enhances the effect of message filtering. Of course, a product implementing the present invention does not necessarily need to achieve all of the advantages described above at the same time.
FIG. 6 is a flowchart of a method for identifying spam messages according to Embodiment 6 of the present invention. The method includes the following steps:
Step 601: The sending client determines whether the message content of the communication message to be sent matches the predetermined keyword list.
If the message content of the communication message to be sent matches the predetermined keyword list, step 602 is performed; if the message content of the communication message to be sent does not match the predetermined keyword list, step 603 is performed.
The sending client extracts the message content of the communication message to be sent and checks, against the predetermined keyword list, whether the message content contains a keyword from the keyword list. If the check finds that the message content contains such a keyword, the message content matches the predetermined keyword list; if the check finds that the message content does not contain such a keyword, the message content does not match the predetermined keyword list.
The sending client can also perform format conversion on the extracted message content, converting it to a uniform format before carrying out the check, for example all lowercase and half-width characters, which prevents spammers from evading the check by applying uppercase/lowercase or full-width/half-width transformations to keywords.
Step 602: The sending client provisionally identifies the communication message to be sent as a spam message.
If the message content of the communication message to be sent matches the predetermined keyword list, that is, the message content contains a corresponding keyword, the sending client provisionally identifies the communication message to be sent as a spam message and sends the communication message to the system server. The system server then confirms the result further by determining whether the sender of the communication message is an intended user.
As an alternative to step 602, if the message content of the communication message to be sent matches the predetermined keyword list, that is, the message content contains a corresponding keyword, the sending client can directly identify the communication message to be sent as a spam message and discard it.
Step 603: The sending client sends the communication message to be sent to the system server.
If the message content of the communication message to be sent does not match the predetermined keyword list, that is, the message content does not contain a corresponding keyword, the sending client sends the communication message to be sent to the system server, and the system server continues to determine whether the communication message is a spam message.
Step 604: The system server determines whether the message content of the received communication message meets the predetermined spam identification options.
If the message content of the received communication message meets the predetermined spam identification options, step 606 is performed; if the message content of the received communication message does not meet the predetermined spam identification options, step 605 is performed.
After the system server receives the communication message sent by the sending client, it can extract the message content of the received communication message and determine whether the extracted message content meets the predetermined spam identification options.
Step 605: The system server provisionally identifies the received communication message as a spam message.
If the message content of the received communication message does not meet the predetermined spam identification options, the system server provisionally identifies the received communication message as a spam message and sends the communication message to the receiving client. The receiving client then confirms the result further by determining whether the sender of the communication message is an intended user.
As an alternative to step 605, if the message content of the received communication message does not meet the predetermined spam identification options, the system server can directly identify the received communication message as a spam message and discard it.
Step 606: The system server forwards the received communication message to the receiving client.
If the message content of the received communication message meets the predetermined spam identification options, the system server forwards the received communication message to the receiving client, and the receiving client continues to identify whether the communication message is a spam message.
Step 607: The receiving client extracts the sender information from the received communication message.
After receiving the communication message forwarded by the system server, the receiving client can extract the sender information from the communication message in order to identify spam messages. The sender information can include the sender name, sender ID, sender address and other content.
Step 608: According to the extracted sender information, the receiving client determines whether the sender of the received communication message is an intended sender.
If the sender of the received communication message is an intended sender, step 609 is performed; if the sender of the received communication message is not an intended sender, step 610 is performed.
The clients in the system can also upload their own friend user information and the information on non-friend users they have actively contacted to the system server, periodically or when triggered by an event. The system server can also deliver the above information received from a client to other clients in the system, periodically or when triggered by an event, so that the other clients can identify spam messages.
Step 609: The receiving client identifies the received communication message as a normal message.
If the sender of the communication message is an intended sender, the receiving client identifies the communication message as a normal message and performs the corresponding operation according to the communication message.
Step 610: The receiving client identifies the received communication message as a spam message.
If the sender of the communication message is not an intended sender, the receiving client identifies the communication message as a spam message and discards it.
It should be noted that the order of the steps in this embodiment can be adjusted according to actual needs. The above steps of identifying spam messages using the predetermined keyword list and the predetermined spam identification options may also be performed after the step of identifying spam messages by determining whether the sender is an intended user.
The present invention includes the following advantages: the sending client, the system server and the receiving client respectively identify spam messages according to the predetermined keyword list, the spam identification options, and whether the sender of the communication message is an intended user, which reduces the missed-detection rate and false-positive rate for spam messages, improves the accuracy of identifying spam messages, and enhances the effect of message filtering. Of course, a product implementing the present invention does not necessarily need to achieve all of the advantages described above at the same time.
FIG. 7 is a structural diagram of a communication device according to Embodiment 7 of the present invention. The communication device 700 includes:
The extraction module 701, used to extract the sender information from the communication message.
The communication message in the embodiments of the present invention may be an instant message in the system, an SMS message, an MMS message, an e-mail, etc. The communication message itself can include the sending time, sender information, recipient information, message content and other parts. The communication message is sent by the sending client and forwarded to the receiving client via the system server. The recipient information includes the recipient name, recipient ID, recipient address and other content.
The sender information can include the sender name, sender ID, sender address and other content.
When the communication device in the embodiments of the present invention is the sending client, the extraction module 701 extracts the sender information from the communication message to be sent; when the communication device is the system server or the receiving client, the extraction module 701 extracts the sender information from the received communication message.
The extraction module 701 is the part of the communication device 700 that is responsible for extracting the sender information from the communication message, and may be software, hardware, or a combination of both.
The first determining module 702, used to determine, according to the sender information extracted by the extraction module 701, whether the sender of the communication message is an intended sender.
The intended sender includes at least one of the following users: a system user, a friend user of the recipient of the communication message, and a non-friend user whom the recipient of the communication message has actively contacted.
The first determining module 702 is further used to obtain a system user list, a list of friend users of the recipient of the communication message, and a list of non-friend users whom the recipient of the communication message has actively contacted. If the sender information of the communication message is recorded in the system user list, the list of friend users of the recipient, or the list of non-friend users whom the recipient has actively contacted, the sender of the communication message is determined to be an intended sender.
The system user is a third-party user that sends communication messages and can include a communication service provider. Messages sent by system users to recipients usually take the form of notifications or reminders, and the recipient can treat a system user as an intended sender.
A friend user of the recipient of the communication message has, before sending a communication message to the recipient, established a friend relationship with the recipient and been authenticated by the recipient; the recipient can treat its own friend users as intended senders.
A non-friend user whom the recipient of the communication message has actively contacted has not established a friend relationship with the recipient before sending a communication message to the recipient, but has received a message from the recipient. The clients in the system can also upload their own friend user information and the information on non-friend users they have actively contacted to the system server, periodically or when triggered by an event, so that the system server can identify spam messages.
The first determining module 702 is the part of the communication device 700 that is responsible for determining, according to the extracted sender information, whether the sender of the communication message is an intended sender, and may be software, hardware, or a combination of both.
The identification module 703, used to identify the communication message as a spam message, or to continue identifying the communication message, when the first determining module 702 determines that the sender of the communication message is not an intended sender.
When the sender of the communication message is not an intended sender, the identification module 703 can identify the communication message as a spam message, discard the communication message, and add the sender of the communication message to the blacklist; it can also continue to identify the communication message, where the identification can use at least one of a predetermined keyword list, predetermined regular expressions and predetermined spam identification options.
The identification module 703 is the part of the communication device 700 that is responsible for identifying the communication message as a spam message, or continuing to identify the communication message, when the sender of the communication message is not an intended sender, and may be software, hardware, or a combination of both.
The communication device 700 further includes an update module 704, used to update the intended sender information according to the recipient information in the communication messages that are sent.
The update module 704 can determine the attributes of the recipient of a sent communication message and, when the recipient of the communication message is not an intended sender, add the recipient information of the communication message to the intended sender information. Specifically, the update module 704 can set the recipient of the communication message as a non-friend user who has been actively contacted, and record the recipient information of the communication message. The recipient information includes the recipient name, recipient ID, recipient address and latest contact time, where the latest contact time is the time at which the client sent a communication message to the recipient.
The update module 704 is the part of the communication device 700 that is responsible for updating the intended sender information according to the recipient information in the communication messages that are sent, and may be software, hardware, or a combination of both.
The setting module 705, used to set a blacklist and/or a whitelist, where the blacklist includes the sender information of identified spam messages and the whitelist includes the sender information of identified normal messages.
The setting module 705 can add the sender information of an identified spam message to the blacklist, and can also add the sender information of an identified normal message to the whitelist. The sender information in the blacklist includes the name, ID, address and other content of the spam sender; the sender information in the whitelist includes the name, ID, address and other content of the normal message sender.
The setting module 705 is the part of the communication device 700 that is responsible for setting the blacklist and/or whitelist, and may be software, hardware, or a combination of both.
The first determining module 702 is also used to determine that the sender of the communication message is not an intended sender when the sender information of the communication message is recorded in the blacklist, and/or to determine that the sender of the communication message is an intended sender when the sender information of the communication message is recorded in the whitelist.
The communication device 700 also includes:
The second determining module 706, used to determine whether the message content of the communication message matches the predetermined keyword list.
The keyword list can contain keywords commonly used in spam messages that carry promotional information, malicious communications, rumors and uncivilized information, for example "clothing auction", "storefront transfer", "winning a prize" and "please remit". It can also contain some common English advertising words, uncivilized terms and so on. The keyword list can be personalized by the user, and can also be delivered by the system server to each client.
The second determining module 706 extracts the message content of the communication message and checks, against the predetermined keyword list, whether the message content contains a keyword from the keyword list. If the check finds that the message content contains such a keyword, the message content is determined to match the predetermined keyword list; if the check finds that the message content does not contain such a keyword, the message content is determined not to match the predetermined keyword list.
The second determining module 706 can also perform format conversion on the extracted message content, converting it to a uniform format before carrying out the check, for example all lowercase and half-width characters, which prevents spammers from evading the check by applying uppercase/lowercase or full-width/half-width transformations to keywords.
The second determining module 706 is the part of the communication device 700 that is responsible for determining whether the message content of the communication message matches the predetermined keyword list, and may be software, hardware, or a combination of both.
The identification module 703 is also used to identify the communication message as a spam message when the second determining module 706 determines that the message content of the communication message matches the predetermined keyword list.
The communication device 700 also includes:
The third determining module 707, used to determine whether the message content of the communication message meets the predetermined spam identification options.
The predetermined spam identification options can include at least one of the following: a. phone numbers are not allowed; b. network links are not allowed; c. instant messaging numbers are not allowed; d. images are not allowed. Users can personalize the above spam identification options according to their own needs.
The third determining module 707 is the part of the communication device 700 that is responsible for determining whether the message content of the communication message meets the predetermined spam identification options, and may be software, hardware, or a combination of both.
The identification module 703 is also used to identify the communication message as a spam message when the third determining module 707 determines that the message content of the communication message does not meet the predetermined spam identification options.
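The check performed by the third determining module 707 can be sketched in Python as below. Everything specific in the sketch is an assumption: the `SpamOptions` fields, the `has_image` flag, and especially the three regular expressions, which are deliberately simple placeholders rather than patterns given by the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class SpamOptions:
    """Hypothetical user settings; False means the feature is not allowed."""
    allow_phone: bool = True
    allow_link: bool = True
    allow_im_number: bool = True
    allow_image: bool = True


# Placeholder patterns, chosen only for illustration.
PHONE_RE = re.compile(r"\b\d{3,4}[- ]?\d{7,8}\b")
LINK_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
IM_RE = re.compile(r"\b(?:qq|im)\s*[:：]?\s*\d{5,12}\b", re.IGNORECASE)


def meets_options(content: str, has_image: bool, opts: SpamOptions) -> bool:
    """Return True when the message respects every option the user has set."""
    if not opts.allow_phone and PHONE_RE.search(content):
        return False
    if not opts.allow_link and LINK_RE.search(content):
        return False
    if not opts.allow_im_number and IM_RE.search(content):
        return False
    if not opts.allow_image and has_image:
        return False
    return True
```

A message for which `meets_options` returns False would be the case in which the identification module 703 identifies the communication message as a spam message.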
The communication device 700 further includes:

A first matching module 708, configured to match the message content of the communication message against a predetermined keyword list and obtain the scores of the keywords that match the message content.

The keyword list is used to determine whether the communication message contains feature words of spam messages. It may be personalized by the user, or delivered by the system server to each client. Each keyword corresponds to a predetermined score indicating the likelihood that the keyword appears in a spam message. The scores of different keywords may be the same or different.

The first matching module 708 is the part of the communication device 700 responsible for matching the message content of the communication message against the predetermined keyword list and obtaining the scores of the keywords that match the message content; it may be software, hardware or a combination of the two.

A second matching module 709, configured to match the message content of the communication message against predetermined regular expressions and obtain the scores of the regular expressions that match the message content.

Regular expressions are used to identify certain key features in the message content, such as a phone number, a network link or an IM number. Different regular expressions correspond to different key features. The second matching module 709 can use a specific regular expression to determine whether the message content of the received communication message contains a specific key feature; if the message content contains that key feature, the message content matches the regular expression corresponding to the key feature. The regular expressions may be personalized by the user, or delivered by the system server to each client.
Each regular expression corresponds to a predetermined score indicating the likelihood that the key feature it identifies in the message content appears in a spam message. The scores of different regular expressions may be the same or different. When more than one regular expression matches the message content, the second matching module 709 obtains the scores of all regular expressions that match the message content.

The second matching module 709 is the part of the communication device 700 responsible for matching the message content of the communication message against the predetermined regular expressions and obtaining the scores of the regular expressions that match the message content; it may be software, hardware or a combination of the two.

An obtaining module 710, configured to obtain the total matching score of the message content according to the scores of the keywords and regular expressions matching the message content that were obtained by the first matching module 708 and the second matching module 709.

The obtaining module 710 adds together all the scores of the keywords matching the message content and all the scores of the regular expressions matching the message content to obtain the total matching score of the message content.

The obtaining module 710 is the part of the communication device 700 responsible for obtaining the total matching score of the message content according to the scores of the matching keywords and regular expressions; it may be software, hardware or a combination of the two.

A fourth determining module 711, configured to determine whether the total matching score of the message content obtained by the obtaining module 710 is greater than or equal to a predetermined threshold.

The threshold may be set to a fixed value, or set dynamically according to the length of the communication message; that is, communication messages of different lengths correspond to different thresholds.
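The scoring scheme of modules 708 to 711 can be illustrated with a short sketch. The keywords, patterns, score values and the length-dependent threshold formula are all assumptions chosen for the example; the description only states that scores are predetermined and that the threshold may be fixed or depend on message length. The sketch also assumes each keyword or regular expression contributes its score once, however many times it occurs.

```python
import re

# Hypothetical keyword scores and regular-expression scores.
KEYWORD_SCORES = {"winning": 3, "remittance": 4, "apparel auction": 5}
REGEX_SCORES = [
    (re.compile(r"\d{7,}"), 2),         # phone-number-like digit run
    (re.compile(r"https?://\S+"), 3),   # network link
]


def total_match_score(content: str) -> int:
    """Sum the scores of all matching keywords and regular expressions."""
    keyword_total = sum(score for keyword, score in KEYWORD_SCORES.items()
                        if keyword in content)
    regex_total = sum(score for pattern, score in REGEX_SCORES
                      if pattern.search(content))
    return keyword_total + regex_total


def threshold_for(content: str, base: int = 5, per_100_chars: int = 1) -> int:
    """Assumed dynamic threshold: longer messages need a higher total score."""
    return base + per_100_chars * (len(content) // 100)


def is_spam(content: str) -> bool:
    """Spam when the total matching score reaches the threshold."""
    return total_match_score(content) >= threshold_for(content)


if __name__ == "__main__":
    print(is_spam("winning prize, remittance to http://example.com"))
    print(is_spam("are we winning the match tonight?"))
```

Summing scores rather than rejecting on a single hit is what lets a lone ambiguous keyword stay below the threshold while several spam features together exceed it.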
The fourth determining module 711 is the part of the communication device 700 responsible for determining whether the total matching score of the message content is greater than or equal to the predetermined threshold; it may be software, hardware or a combination of the two.

The identification module 703 is further configured to identify the communication message as a spam message when the fourth determining module 711 determines that the total matching score of the message content is greater than or equal to the predetermined threshold.

The above modules may be located in one device or distributed across multiple devices. The above modules may be combined into one module, or further split into multiple sub-modules.

The present invention has the following advantages: by determining whether the sender of a communication message is an expected user in order to identify spam messages, the rates of missed and false identification of spam messages are reduced, the accuracy of identifying spam messages is improved, and the effect of message filtering is thereby enhanced. Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.

FIG. 8 is a schematic structural diagram of a communication system in Embodiment 8 of the present invention. The system includes:

A first communication device 810, configured to extract the sender information from a communication message, determine according to the extracted sender information whether the sender of the communication message is the expected sender, and, when the sender of the communication message is not the expected sender, mark the communication message as a to-be-identified message and forward the marked communication message to the second communication device 820 over the network.
The sender information in the communication message may include the sender name, the sender ID, the sending address and so on. The expected sender includes at least one of the following users: system users, friend users of the recipient of the communication message, and non-friend users whom the recipient of the communication message has actively contacted.

A second communication device 820, configured to identify the communication message received from the first communication device 810 according to at least one of the following: a predetermined keyword list, predetermined regular expressions and predetermined spam identification options.

The keyword list may contain keywords commonly used in spam messages that carry promotional information, maliciously spread rumors or contain uncivilized content. It may be personalized by the user, or delivered by the system server to each client.

Regular expressions are used to identify certain key features in the message content, such as a phone number, a network link or an IM number. Different regular expressions correspond to different key features. A specific regular expression can be used to determine whether the message content of the received communication message contains a specific key feature; if the message content contains that key feature, the message content matches the regular expression corresponding to the key feature. The regular expressions may be personalized by the user, or delivered by the system server to each client. Each regular expression corresponds to a predetermined score indicating the likelihood that the key feature it identifies in the message content appears in a spam message. The scores of different regular expressions may be the same or different.
When more than one regular expression matches the message content, the receiving client can obtain the scores of all regular expressions that match the message content.

The predetermined spam identification options may include at least one of the following: a. phone numbers are not allowed; b. network links are not allowed; c. IM numbers are not allowed; d. images are not allowed. Users may personalize these options according to their own needs.

The first communication device 810 is further configured to mark the communication message as a normal message when the sender of the communication message is the expected sender, and to forward the marked communication message to the second communication device 820 over the network.

The first communication device 810 and the second communication device 820 may be the sending client and the system server, respectively, or the system server and the receiving client, respectively.

The second communication device 820 is further configured to mark the identified communication message as a to-be-identified message and forward the marked communication message to the third communication device 830 over the network.

The communication system further includes:

A third communication device 830, configured to identify the communication message received from the second communication device 820 using at least one of a predetermined keyword list, predetermined regular expressions and predetermined spam identification options.

The first communication device 810, the second communication device 820 and the third communication device 830 may be the sending client, the system server and the receiving client, respectively.
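The division of work between the communication devices of this embodiment can be sketched as a staged pipeline. The class, function and label names below are hypothetical, and the sketch assumes that a message already marked normal passes through the later devices unchecked, which the description does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Message:
    sender_id: str
    content: str
    label: str = "unmarked"  # becomes "normal", "to_be_identified" or "spam"


def first_device(msg: Message, system_users: Set[str],
                 friends: Set[str], contacted: Set[str]) -> Message:
    """Mark the message according to whether its sender is an expected sender."""
    expected = msg.sender_id in (system_users | friends | contacted)
    msg.label = "normal" if expected else "to_be_identified"
    return msg


def content_device(msg: Message,
                   looks_like_spam: Callable[[str], bool]) -> Message:
    """Second or third device: apply a content check to to-be-identified messages.

    A message that passes the check keeps the to-be-identified mark so that
    the next device can examine it with its own rules.
    """
    if msg.label == "to_be_identified" and looks_like_spam(msg.content):
        msg.label = "spam"
    return msg


if __name__ == "__main__":
    system_users, friends, contacted = {"system"}, {"alice"}, {"carol"}
    has_link = lambda text: "http://" in text  # stand-in content check
    msg = first_device(Message("mallory", "visit http://example.com"),
                       system_users, friends, contacted)
    print(content_device(msg, has_link).label)
```

Chaining `content_device` twice with different checks corresponds to the arrangement in which the second communication device forwards still-undecided messages to the third communication device.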
The present invention has the following advantages: by determining whether the sender of a communication message is an expected user in order to identify spam messages, the rates of missed and false identification of spam messages are reduced, the accuracy of identifying spam messages is improved, and the effect of message filtering is thereby enhanced. Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.

For convenience of description, the parts of the above communication device are divided by function into various modules and described separately. Of course, when the present invention is implemented, the functions of the modules may be implemented in one or more pieces of software or hardware.

From the description of the above embodiments, those skilled in the art will clearly understand that the present invention can be implemented by hardware, or by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product. The software product can be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server or a network device) to perform the methods described in the embodiments of the present invention.

Those skilled in the art will appreciate that the drawings are only schematic representations of preferred embodiments, and that the modules or processes in the drawings are not necessarily required to practice the present invention.

Those skilled in the art will also appreciate that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in that embodiment, or may be changed correspondingly and located in one or more apparatuses different from that of the embodiment.
The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

The above disclosure presents only a few specific embodiments of the present invention. However, the present invention is not limited to them, and any variation that those skilled in the art can conceive of shall fall within the scope of the patent application of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings used in the description of the present invention or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for identifying a spam message in Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a method for identifying a spam message in Embodiment 2 of the present invention;
FIG. 3 is a flowchart of a method for identifying a spam message in Embodiment 3 of the present invention;
FIG. 4 is a flowchart of a method for identifying a spam message in Embodiment 4 of the present invention;
FIG. 5 is a flowchart of a method for identifying a spam message in Embodiment 5 of the present invention;
FIG. 6 is a flowchart of a method for identifying a spam message in Embodiment 6 of the present invention;
FIG. 7 is a schematic structural diagram of a communication device in Embodiment 7 of the present invention;
FIG. 8 is a schematic structural diagram of a communication system in Embodiment 8 of the present invention.

[Description of Main Component Symbols]

700: Communication device
701: Extraction module
702: First determining module
703: Identification module
704: Update module
705: Setting module
706: Second determining module
707: Third determining module
708: First matching module
709: Second matching module
710: Obtaining module
711: Fourth determining module
810: First communication device
820: Second communication device
830: Third communication device

Claims (1)

VII. Scope of the Patent Application:

1. A method for identifying a spam message, characterized by comprising: extracting sender information from a communication message; and determining, according to the extracted sender information, whether the sender of the communication message is an expected sender, and if the sender of the communication message is not the expected sender, identifying the communication message as a spam message, or continuing to identify the communication message.

2. The method of claim 1, wherein, after determining whether the sender of the communication message is the expected sender, the method further comprises: if the sender of the communication message is the expected sender, identifying the communication message as a normal message.

3. The method of claim 1 or 2, wherein the expected sender includes at least one of the following users: system users, friend users of the recipient of the communication message, and non-friend users whom the recipient of the communication message has actively contacted.

4. The method of claim 3, wherein determining whether the sender of the communication message is the expected sender comprises: obtaining a system user list, a friend user list of the recipient of the communication message, and a list of non-friend users whom the recipient of the communication message has actively contacted; and if the sender information of the communication message is recorded in any one of the system user list, the friend user list of the recipient and the list of non-friend users whom the recipient has actively contacted, determining that the sender of the communication message is the expected sender.

5. The method of claim 3, wherein the non-friend users whom the recipient of the communication message has actively contacted are non-friend users whom the recipient of the communication message has actively contacted within a set time.

6. The method of claim 3, further comprising: updating one's own expected sender information according to the recipient information in communication messages sent by oneself.

7. The method of claim 1, wherein determining whether the sender of the communication message is the expected sender comprises: setting a blacklist and/or a whitelist, the blacklist including sender information of identified spam messages and the whitelist including sender information of identified normal messages; if the sender information of the communication message is recorded in the blacklist, determining that the sender of the communication message is not the expected sender; and/or if the sender information of the communication message is recorded in the whitelist, determining that the sender of the communication message is the expected sender.

8. The method of claim 1, wherein continuing to identify the communication message comprises: determining whether the message content of the communication message matches a predetermined keyword list; and if the message content of the communication message matches the predetermined keyword list, identifying the communication message as a spam message.

9. The method of claim 1, wherein, before extracting the sender information from the communication message, the method further comprises: determining whether the message content of the communication message matches a predetermined keyword list; and if the message content of the communication message matches the predetermined keyword list, provisionally identifying the communication message as a spam message, or directly identifying the communication message as a spam message.

10. The method of claim 8 or 9, wherein, after matching the message content of the communication message against the predetermined keyword list, the method further comprises: if the message content of the communication message does not match the predetermined keyword list, identifying the communication message as a normal message, or continuing to identify the communication message.

11. The method of claim 1, wherein continuing to identify the communication message comprises: determining whether the message content of the communication message conforms to predetermined spam identification options; and if the message content of the communication message does not conform to the predetermined spam identification options, identifying the communication message as a spam message.

12. The method of claim 1, wherein, before extracting the sender information from the communication message, the method further comprises: determining whether the message content of the communication message conforms to predetermined spam identification options; and if the message content of the communication message does not conform to the predetermined spam identification options, provisionally identifying the communication message as a spam message, or directly identifying the communication message as a spam message.

13. The method of claim 11 or 12, wherein the spam identification options include at least one of the following: phone numbers are not allowed, network links are not allowed, IM (instant messaging) numbers are not allowed, and images are not allowed.

14. The method of claim 11 or 12, wherein, after determining whether the message content of the communication message conforms to the predetermined spam identification options, the method further comprises: if the message content of the communication message conforms to the predetermined spam identification options, identifying the communication message as a normal message, or continuing to identify the communication message.

15. The method of claim 1, wherein continuing to identify the communication message comprises: matching the message content of the communication message against a predetermined keyword list to obtain the scores of the keywords that match the message content; matching the message content of the communication message against predetermined regular expressions to obtain the scores of the regular expressions that match the message content; obtaining the total matching score of the message content according to the scores of the keywords and regular expressions that match the message content; determining whether the total matching score of the message content is greater than or equal to a predetermined threshold; and if the total matching score of the message content is greater than or equal to the predetermined threshold, identifying the communication message as a spam message.

16. The method of claim 1, wherein, before extracting the sender information from the communication message, the method further comprises: matching the message content of the communication message against a predetermined keyword list to obtain the scores of the keywords that match the message content; matching the message content of the communication message against predetermined regular expressions to obtain the scores of the regular expressions that match the message content; obtaining the total matching score of the message content according to the scores of the keywords and regular expressions that match the message content; determining whether the total matching score of the message content is greater than or equal to a predetermined threshold; and if the total matching score of the message content is greater than or equal to the predetermined threshold, provisionally identifying the communication message as a spam message, or directly identifying the communication message as a spam message.

17. The method of claim 15 or 16, wherein, after determining whether the total matching score of the message content is greater than or equal to the predetermined threshold, the method further comprises: if the total matching score of the message content is less than the predetermined threshold, identifying the communication message as a normal message, or continuing to identify the communication message.

18. A communication device, characterized by comprising: an extraction module, configured to extract sender information from a communication message; a first determining module, configured to determine, according to the sender information extracted by the extraction module, whether the sender of the communication message is an expected sender; and an identification module, configured to identify the communication message as a spam message, or continue to identify the communication message, when the first determining module determines that the sender of the communication message is not the expected sender.

19. The communication device of claim 18, wherein the first determining module is further configured to obtain a system user list, a friend user list of the recipient of the communication message, and a list of non-friend users whom the recipient of the communication message has actively contacted; and, if the sender information of the communication message is recorded in any one of the system user list, the friend user list of the recipient and the list of non-friend users whom the recipient has actively contacted, to determine that the sender of the communication message is the expected sender.

20. The communication device of claim 18, further comprising: an update module, configured to update the expected sender information according to the recipient information in sent communication messages.

21. The communication device of claim 18, further comprising: a setting module, configured to set a blacklist and/or a whitelist, the blacklist including sender information of identified spam messages and the whitelist including sender information of identified normal messages; wherein the first determining module is further configured to determine that the sender of the communication message is not the expected sender when the sender information of the communication message is recorded in the blacklist; and/or to determine that the sender of the communication message is the expected sender when the sender information of the communication message is recorded in the whitelist.

22. The communication device of claim 18, further comprising: a second determining module, configured to determine whether the message content of the communication message matches a predetermined keyword list; wherein the identification module is further configured to identify the communication message as a spam message when the second determining module determines that the message content of the communication message matches the predetermined keyword list.

23. The communication device of claim 18, further comprising: a third determining module, configured to determine whether the message content of the communication message conforms to predetermined spam identification options; wherein the identification module is further configured to identify the communication message as a spam message when the third determining module determines that the message content of the communication message does not conform to the predetermined spam identification options.

24. The communication device of claim 18, further comprising: a first matching module, configured to match the message content of the communication message against a predetermined keyword list to obtain the scores of the keywords that match the message content; a second matching module, configured to match the message content of the communication message against preset regular expressions to obtain the scores of the regular expressions that match the message content; an obtaining module, configured to obtain the total matching score of the message content according to the scores of the keywords and regular expressions matching the message content that were obtained by the first matching module and the second matching module; and a fourth determining module, configured to determine whether the total matching score of the message content obtained by the obtaining module is greater than or equal to a predetermined threshold; wherein the identification module is further configured to identify the communication message as a spam message when the fourth determining module determines that the total matching score of the message content is greater than or equal to the preset threshold.

25. A communication system, characterized by comprising: a first communication device, configured to extract sender information from a communication message, determine according to the extracted sender information whether the sender of the communication message is an expected sender, and, when the sender of the communication message is not the expected sender, mark the communication message as a to-be-identified message and forward the marked communication message to a second communication device; and the second communication device, configured to identify the communication message received from the first communication device according to at least one of the following: a predetermined keyword list, predetermined regular expressions and predetermined spam identification options.

26. The communication system of claim 25, wherein the first communication device is further configured to mark the communication message as a normal message when the sender of the communication message is the expected sender, and to forward the marked communication message to the second communication device.

27. The communication system of claim 25, further comprising: a third communication device, configured to identify the communication message received from the second communication device using at least one of a predetermined keyword list, predetermined regular expressions and predetermined spam identification options; wherein the second communication device is further configured to mark the identified communication message as a to-be-identified message and forward the marked communication message to the third communication device.
TW99100272A 2010-01-07 2010-01-07 Method, system and device for junk message recognition TW201125331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99100272A TW201125331A (en) 2010-01-07 2010-01-07 Method, system and device for junk message recognition

Publications (1)

Publication Number Publication Date
TW201125331A (en) 2011-07-16

Family

ID=45047400

Family Applications (1)

Application Number Priority Date Filing Date Title
TW99100272A TW201125331A (en) 2010-01-07 2010-01-07 Method, system and device for junk message recognition

Country Status (1)

Country Link
TW (1) TW201125331A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105207881A (en) * 2014-06-10 2015-12-30 阿里巴巴集团控股有限公司 Message sending method and equipment

Similar Documents

Publication Title
US10185479B2 (en) Declassifying of suspicious messages
US20050160144A1 (en) System and method for filtering network messages
KR100627500B1 (en) Mobilephone sms service system and the method
JP2006060811A (en) Method for filtering spam mail for mobile communication apparatus
US11966684B2 (en) Methods, systems, and apparatus for email to persistent messaging
WO2011153744A1 (en) Method and system for monitoring spam short message
KR20180118732A (en) Managing multiple profiles for a single account in an asynchronous messaging system
US10623357B2 (en) Peer-assisted mail thread management
WO2012034539A1 (en) Method and device for inviting friends in social networking site throuth mobile communication terminal
US20100287244A1 (en) Data communication using disposable contact information
RU2438171C2 (en) Method, device and system for identifying service
JP5363342B2 (en) System and method for filtering cellular telephone messages
CN105376137A (en) Instant messaging method and device
US7103372B1 (en) System and method for preventing delivery of unsolicited messages on an SMS network
WO2013075430A1 (en) Information filtering method, and method, device, and system for processing forwarded information
US8249560B2 (en) Sending method, receiving method, and system for email transfer by short message
WO2008015669A2 (en) Communication authenticator
TW201125331A (en) Method, system and device for junk message recognition
CN101184262A (en) Mobile message receive and reject method using mobile message receive system
CN101711013A (en) Method for processing multimedia message and device thereof
KR101600864B1 (en) A selective receiving method of e-mail
JP4185462B2 (en) Mail device
KR100643639B1 (en) Method and system of informing mobile instant messaging using message call
KR100746049B1 (en) System and method for managing spam message and mobile communication terminal therefor
TWI238635B (en) Method enabling local area network to securely transmit wireless e-mail