CN1801855A - Unwanted message (spam) detection based on message content - Google Patents

Unwanted message (spam) detection based on message content

Info

Publication number
CN1801855A
Authority
CN
China
Prior art keywords
message
rubbish
attribute
upper limit
rubbish message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2005101377059A
Other languages
English (en)
Other versions
CN1801855B (zh)
Inventor
蔡亦钢
S·瑟瑞尔·库图布
艾洛克·沙玛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Origin Asset Group Co ltd
Nokia of America Corp
Original Assignee
Lucent Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lucent Technologies Inc
Publication of CN1801855A
Application granted
Publication of CN1801855B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/60 Business processes related to postal services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12 Messaging; Mailboxes; Announcements

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

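In a telecommunications network, a method for detecting unwanted messages (spam). The content of a suspected spam message is analyzed to determine the weighted properties of the message and whether the weighted sum of the properties exceeds a threshold. If the weighted sum exceeds the threshold, the message is treated as spam and subjected to human analysis in order to improve the quality of the weighting factors and properties used in the analysis.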

Description

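Unwanted message (spam) detection based on message content
Technical Field
The present invention relates to methods of detecting spam messages based on the content of the message.
Background
With the advent of the Internet, it has become easy for a sender to send messages to a large number of destinations at very low or no cost. These messages include short messages of the short message service. They include unsolicited and unwanted messages (spam), which annoy recipients, who must clear them out and determine whether they are important. They are also a nuisance to the carriers of the telecommunications networks used to transmit them, not only because they create customer relations problems with customers angered by the flood of spam, but also because these messages, which usually bring in no revenue, consume network resources. Two statistics illustrate the severity of the problem. In China in 2003, two trillion short message service (SMS) messages were sent over the Chinese telecommunications network; an estimated three quarters of these were spam. The second statistic is that an estimated 85-90% of e-mail in the United States is spam.
Numerous arrangements have been proposed, and many have been implemented, to cut down the number of spam messages delivered. Various arrangements have been proposed for analyzing messages before they are delivered. According to one arrangement, a message is blocked if the calling party is not a member of a pre-selected group specified by the called party. Spam can also be intercepted by allowing the called party to specify that messages addressed to more than N destinations are not to be delivered.
A called party may refuse to publicize his or her telephone number or e-mail address. Apart from the obvious disadvantage of preventing callers from looking up the called party's telephone number or e-mail address, such arrangements are likely to be ineffective. A sophisticated hacker can detect an unlisted e-mail address from the IP network, for example by monitoring message headers at a router. An unlisted called number simply invites the caller to send messages to all 10,000 telephone numbers of an office code; as mentioned above, present arrangements make it very easy to send messages to many destinations.
Among the more subtle spam messages are objectionable messages that serve pornographic purposes or bring useless advertising to the recipient. Usually these can be intercepted only by examining the content of the message, because the sender may send many harmless messages from the same source. A major problem of spam detection is therefore the problem of detecting spam based on message content.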
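Summary of the Invention
The above problem is alleviated, and an advance is made over the prior art, in accordance with Applicants' invention, wherein suspect messages are analyzed for the presence of certain properties, such as keywords, and for the frequency of occurrence of these properties; each property is assigned an appropriate spam index, which is an almost static quantity that is predetermined and provisioned, and a dynamically changing weighting factor that depends on traffic and on the message/content type. The message is examined for any property whose frequency of use exceeds a threshold, for any predetermined combination of properties whose combined use exceeds a threshold, and for whether the combined use of all properties exceeds a threshold. In accordance with one feature of Applicants' invention, the weighting factor of each property can be adjusted dynamically by analysts to match the results of examining suspect messages. Advantageously, through the use of analysts, the detection process can learn.
Brief Description of the Drawings
Fig. 1 illustrates the operation of Applicants' invention; and
Fig. 2 is a flow chart illustrating Applicants' invention.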
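Detailed Description
Fig. 1 illustrates the operation of Applicants' invention. Source 1 wishes to send a message to destination 2. The message is sent to network 3, which considers that the message may be spam but must analyze the message content in order to decide. Network 3 passes the message to message analyzer 10. If the message analyzer concludes that the message is not spam, the message is sent to destination 2 via network 4.
Message analyzer 10 includes list data 14 of properties, a severity index for each property, a weighting factor for each severity index, and severity level thresholds for the properties.
A spam property is a word, phrase, sentence, image, or video clip that is a possible indicator of spam. The word "madam" is an example. For each property that appears in a message, the product of the property's number of occurrences, its severity index, and its weighting factor can be computed to give a severity level. The severity level is used to decide whether to treat the message as spam.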
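The product described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the patent's implementation: the `SpamProperty` class, the whole-word, case-insensitive matching rule, and every numeric value are hypothetical, and only text properties (not images or video clips) are handled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamProperty:
    """One provisioned text property; all numeric values are made up."""
    text: str              # word or phrase to look for
    severity_index: float  # near-static, provisioned value
    weight: float          # dynamically adjustable weighting factor


def count_occurrences(message: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of text in message."""
    pattern = r"\b" + re.escape(text) + r"\b"
    return len(re.findall(pattern, message, flags=re.IGNORECASE))


def severity_levels(message: str, properties: list) -> dict:
    """Return occurrences * severity_index * weight for each property found."""
    levels = {}
    for prop in properties:
        n = count_occurrences(message, prop.text)
        if n > 0:
            levels[prop.text] = n * prop.severity_index * prop.weight
    return levels


# Hypothetical property list; the index and weight values are illustrative.
PROPERTIES = [
    SpamProperty("madam", severity_index=2.0, weight=1.5),
    SpamProperty("lovers", severity_index=3.0, weight=1.0),
]

levels = severity_levels("Dear Madam, madam, meet lovers today", PROPERTIES)
print(levels)
```

Here "madam" occurs twice, so its severity level is 2 × 2.0 × 1.5, while "lovers" occurs once and contributes 1 × 3.0 × 1.0. Properties that do not occur are left out of the result rather than stored as zero.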
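The severity indexes and severity thresholds remain relatively constant, but the weighting factors can be changed according to messages from spam service bureau 15, in response to the bureau detecting a particular problem area (to increase a weighting factor) or an area with little spam activity (to decrease a weighting factor).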
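One way to picture this adjustment is the following Python sketch. The text does not specify the format of the bureau's messages or how a weight is changed, so the `BureauUpdate` record and the multiplicative `scale` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BureauUpdate:
    """Hypothetical message from the service bureau about one property."""
    prop: str     # property whose weighting factor should change
    scale: float  # >1 raises the weight (problem area), <1 lowers it


def apply_updates(weights: dict, updates: list) -> dict:
    """Return a new weight table with each update applied in order.

    Severity indexes and thresholds are deliberately left alone, since
    the text says they stay relatively constant.
    """
    new_weights = dict(weights)
    for update in updates:
        if update.scale <= 0:
            raise ValueError("scale must be positive")
        if update.prop in new_weights:
            new_weights[update.prop] *= update.scale
    return new_weights


weights = {"madam": 1.0, "lovers": 1.0}
updated = apply_updates(
    weights,
    [BureauUpdate("madam", 2.0), BureauUpdate("lovers", 0.5)],
)
print(updated)
```

Returning a new table instead of mutating the old one is a design choice of this sketch; it lets an analyzer keep using the previous weights until the update is complete.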
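The message analyzer takes the content of the message and searches it for pre-stored properties, such as the words "madam" and "lovers". For each pre-stored property there is a weighting factor indicating how much weight that property carries in reaching a given severity level. Messages whose severity level exceeds a predetermined threshold are blocked and may be stored for further human analysis.
Fig. 2 is a flow chart illustrating the operation of Applicants' spam check. An incoming message is received and buffered for spam analysis (action block 201). Spam list data is obtained in order to compute the spam severity index of the message's properties (action block 203). The spam analysis returns the spam severity index of the message's properties (action block 205). Service logic populates a spreadsheet with the severity index of each property and derives a distributed spam severity index profile pattern (action block 207). Test 209 checks whether any single property's severity index exceeds the threshold for that property. If any property's severity index exceeds its limit, action block 221 (described below) is entered. Otherwise, test 211 checks whether any pattern of severity indexes exceeds a threshold. If any pattern exceeds the threshold for that pattern, action block 221 is entered. Otherwise, an aggregate spam severity index is computed using all properties, or all properties whose severity index exceeds a threshold (action block 213). If the aggregate index exceeds an upper threshold (test 215), the message is black. If it is below a lower threshold (test 216), the message is white. For all other messages, test 217 determines whether the message should be analyzed manually. If not, the message is relayed to its destination (action block 223). If the message has been selected for manual analysis, it is sent to the service bureau (action block 218). The result of the manual check (test 219) determines whether the result is satisfactory, in which case the message is forwarded (action block 223), or unsatisfactory, in which case the message is treated as spam and the functions of action block 221 are performed.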
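The sequence of tests in Fig. 2 can be sketched in Python as below. Several points are assumptions that the text leaves open: a pattern is modeled as a tuple of property names whose severity levels are summed, a black message is routed to the spam handling of action block 221, a white message is relayed, and the choice made in test 217 is delegated to a caller-supplied function. All names and numbers are hypothetical.

```python
from collections.abc import Callable
from enum import Enum


class Verdict(Enum):
    SPAM = "spam"        # handled by action block 221
    DELIVER = "deliver"  # relayed to the destination, action block 223
    MANUAL = "manual"    # sent to the service bureau, action block 218


def classify(
    levels: dict,
    property_limits: dict,
    pattern_limits: dict,
    upper: float,
    lower: float,
    wants_manual_review: Callable[[float], bool],
) -> Verdict:
    """Apply tests 209, 211, 215, 216 and 217 in order."""
    # Test 209: does any single property exceed its own limit?
    for prop, level in levels.items():
        if level > property_limits.get(prop, float("inf")):
            return Verdict.SPAM
    # Test 211: does any pattern (assumed: summed levels) exceed its limit?
    for pattern, limit in pattern_limits.items():
        if sum(levels.get(p, 0.0) for p in pattern) > limit:
            return Verdict.SPAM
    # Action block 213: aggregate index over all properties found.
    aggregate = sum(levels.values())
    if aggregate > upper:  # test 215: "black" (assumed to mean spam)
        return Verdict.SPAM
    if aggregate < lower:  # test 216: "white" (assumed to mean deliver)
        return Verdict.DELIVER
    # Test 217: grey zone, optionally escalate to manual analysis.
    if wants_manual_review(aggregate):
        return Verdict.MANUAL
    return Verdict.DELIVER


# Illustrative inputs only.
levels = {"madam": 6.0, "lovers": 3.0}
limits = {"madam": 10.0, "lovers": 10.0}
patterns = {("madam", "lovers"): 20.0}

verdict = classify(levels, limits, patterns, 15.0, 5.0, lambda agg: True)
print(verdict)
```

With these inputs no single property or pattern limit is exceeded and the aggregate of 9.0 falls between the lower and upper thresholds, so the outcome depends on test 217. The manual check of test 219 and the bookkeeping of action block 221 are outside this sketch.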
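Action block 221 stores the spam message, stores where necessary the updated spam filter and rule service database obtained through the manual check, and updates the spam severity weighting factors and index upper limits, adding new distributed spam patterns where necessary.
The above describes one preferred embodiment of Applicants' invention. Other embodiments will be apparent to those of ordinary skill in the art without departing from the scope of the invention. The invention is limited only by the attached claims.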

Claims (10)

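1. In a telecommunications network, a method of detecting unwanted messages (spam), comprising the steps of:
storing a weighting factor, an index, and a limit for each property of potential messages;
storing a suspected spam message;
deriving the properties of the stored spam message;
computing, for each property, the product of its number of occurrences, its weighting factor, and its index;
forming a distributed spam profile from the products; and
determining whether the distributed spam profile meets criteria for classifying the message as spam.
2. The method of claim 1, wherein if any product exceeds the upper limit for that product's property, the associated message is determined to be spam.
3. The method of claim 1, further comprising the steps of:
storing, for a plurality of patterns of properties, an upper limit for each pattern; and
if the upper limit of any pattern is exceeded, determining that the message is spam.
4. The method of claim 1, wherein if the sum of all products for the message exceeds a predetermined upper threshold, the message is treated as spam.
5. The method of claim 1, wherein the weighting factor or upper limit of a property can be changed in response to a message from a service bureau.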
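6. In a telecommunications network, apparatus for detecting unwanted messages (spam), comprising:
means for storing a weighting factor, an index, and a limit for each property of potential messages;
means for storing a suspected spam message;
means for deriving the properties of the stored spam message;
means for computing, for each property, the product of its number of occurrences, its weighting factor, and its index;
means for forming a distributed spam profile from the products; and
means for determining whether the distributed spam profile meets criteria for classifying the message as spam.
7. The apparatus of claim 6, further comprising means for treating the associated message as spam if any product exceeds the upper limit for that product's property.
8. The apparatus of claim 6, further comprising:
means for storing, for a plurality of patterns of properties, an upper limit for each pattern; and
means for treating the message as spam if the upper limit of any pattern is exceeded.
9. The apparatus of claim 6, further comprising means for treating the message as spam if the sum of all products for the message exceeds a predetermined upper threshold.
10. The apparatus of claim 6, further comprising means for changing the weighting factor or upper limit of a property in response to a message from a service bureau.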
CN2005101377059A 2004-12-21 2005-12-20 Unwanted message (spam) detection based on message content Expired - Fee Related CN1801855B (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/018,270 2004-12-21
US11/018,270 US20060168032A1 (en) 2004-12-21 2004-12-21 Unwanted message (spam) detection based on message content

Publications (2)

Publication Number Publication Date
CN1801855A (zh) 2006-07-12
CN1801855B (zh) 2011-04-06

Family

ID=35954109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2005101377059A Expired - Fee Related CN1801855B (zh) 2004-12-21 2005-12-20 Unwanted message (spam) detection based on message content

Country Status (6)

Country Link
US (1) US20060168032A1 (zh)
EP (1) EP1675330B1 (zh)
JP (1) JP4827518B2 (zh)
KR (1) KR101170562B1 (zh)
CN (1) CN1801855B (zh)
DE (1) DE602005001046T2 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
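WO2010148711A1 (zh) * 2009-12-08 2010-12-29 ZTE Corporation Method and device for processing multimedia messages
CN103368914A (zh) * 2012-03-31 2013-10-23 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for intercepting messages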

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
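CN101166159B (zh) * 2006-10-18 2010-07-28 Alibaba Group Holding Limited Method and system for determining spam information
KR100851595B1 (ko) * 2006-11-28 2008-08-12 KT Freetel Co., Ltd. Method for restricting the sending of spam messages and apparatus therefor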
US8745056B1 (en) 2008-03-31 2014-06-03 Google Inc. Spam detection for user-generated multimedia items based on concept clustering
US8752184B1 (en) * 2008-01-17 2014-06-10 Google Inc. Spam detection for user-generated multimedia items based on keyword stuffing
US8171020B1 (en) 2008-03-31 2012-05-01 Google Inc. Spam detection for user-generated multimedia items based on appearance in popular queries
US8291054B2 (en) 2008-05-27 2012-10-16 International Business Machines Corporation Information processing system, method and program for classifying network nodes
US8291024B1 (en) * 2008-07-31 2012-10-16 Trend Micro Incorporated Statistical spamming behavior analysis on mail clusters
US8572496B2 (en) * 2010-04-27 2013-10-29 Go Daddy Operating Company, LLC Embedding variable fields in individual email messages sent via a web-based graphical user interface
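CN102315953B (zh) * 2010-06-29 2016-08-03 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for detecting spam posts based on the occurrence patterns of posts
CN102480705B (zh) * 2010-11-26 2015-11-25 Aspire Digital Technologies (Shenzhen) Co., Ltd. Method and system for filtering spam short messages based on a number relationship graph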
EP2646964A4 (en) 2010-12-01 2015-06-03 Google Inc RECOMMENDATIONS BASED ON TOPICAL CLUSTERS
US10037569B2 (en) * 2011-10-21 2018-07-31 Intercontinental Exchange Holdings, Inc. Systems and methods to implement an exchange messaging policy
US8955127B1 (en) * 2012-07-24 2015-02-10 Symantec Corporation Systems and methods for detecting illegitimate messages on social networking platforms
US9083729B1 (en) 2013-01-15 2015-07-14 Symantec Corporation Systems and methods for determining that uniform resource locators are malicious
RU2013144681A (ru) 2013-10-03 2015-04-10 Yandex LLC System for processing an electronic message to determine its classification
CN104572646B (zh) * 2013-10-11 2017-10-17 Fujitsu Limited Abnormal information determination apparatus and method, and electronic device
US9357362B2 (en) 2014-05-02 2016-05-31 At&T Intellectual Property I, L.P. System and method for fast and accurate detection of SMS spam numbers via monitoring grey phone space
US9565147B2 (en) 2014-06-30 2017-02-07 Go Daddy Operating Company, LLC System and methods for multiple email services having a common domain
US10229219B2 (en) * 2015-05-01 2019-03-12 Facebook, Inc. Systems and methods for demotion of content items in a feed
CN105100366B (zh) 2015-07-13 2018-03-20 Xiaomi Inc. Method, apparatus and system for determining harassing telephone numbers
EP3200136A1 (en) 2016-01-28 2017-08-02 Institut Mines-Telecom / Telecom Sudparis Method for detecting spam reviews written on websites
KR101870789B1 (ko) * 2017-09-12 2018-06-25 주식회사 에바인 Smart reception blocking method and apparatus

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000026795A1 (en) * 1998-10-30 2000-05-11 Justsystem Pittsburgh Research Center, Inc. Method for content-based filtering of messages by analyzing term characteristics within a message
US6654787B1 (en) * 1998-12-31 2003-11-25 Brightmail, Incorporated Method and apparatus for filtering e-mail
GB2347053A (en) * 1999-02-17 2000-08-23 Argo Interactive Limited Proxy server filters unwanted email
KR100452910B1 (ko) 2002-02-22 2004-10-14 Neowiz Corporation Method and apparatus for filtering spam mail based on identification of bulk mail
US20030195937A1 (en) * 2002-04-16 2003-10-16 Kontact Software Inc. Intelligent message screening
WO2004061698A1 (en) * 2002-12-30 2004-07-22 Activestate Corporation Method and system for feature extraction from outgoing messages for use in categorization of incoming messages
US7366761B2 (en) * 2003-10-09 2008-04-29 Abaca Technology Corporation Method for creating a whitelist for processing e-mails
US7272853B2 (en) * 2003-06-04 2007-09-18 Microsoft Corporation Origination/destination features and lists for spam prevention

Also Published As

Publication number Publication date
EP1675330A1 (en) 2006-06-28
DE602005001046T2 (de) 2008-01-03
JP2006178998A (ja) 2006-07-06
KR101170562B1 (ko) 2012-08-01
CN1801855B (zh) 2011-04-06
EP1675330B1 (en) 2007-05-02
KR20060071361A (ko) 2006-06-26
DE602005001046D1 (de) 2007-06-14
JP4827518B2 (ja) 2011-11-30
US20060168032A1 (en) 2006-07-27

Similar Documents

Publication Publication Date Title
CN1801855B (zh) Unwanted message (spam) detection based on message content
EP1675333B1 (en) Detection of unwanted messages (spam)
US8930480B2 (en) Degrees of separation for filtering communications
US7949759B2 (en) Degrees of separation for handling communications
CN103198123B (zh) System and method for filtering spam messages based on user reputation
US6779021B1 (en) Method and system for predicting and managing undesirable electronic mail
US6421709B1 (en) E-mail filter and method thereof
EP1376420A1 (en) Method and system for classifying electronic documents
US20050015626A1 (en) System and method for identifying and filtering junk e-mail messages or spam based on URL content
CA2654796A1 (en) Systems and methods for identifying potentially malicious messages
US20060224621A1 (en) E-mail response system
WO2003071753A1 (fr) Method and device for processing electronic mail undesirable to the user
KR20060071362A (ko) Spam blocking method and spam blocking apparatus
US20090106065A1 (en) Process for automatically handling electronic requests for notification of unsolicited commercial email and other service disruptions
CN105635080A (zh) E-mail security management system and method based on content filtering
EP1721429A1 (en) A method and apparatus to use a statistical model to classify electronic communications
CN100456755C (zh) Message filtering method and apparatus
KR100857124B1 (ko) Harmful message filtering system, filtering method thereof, and recording medium recording the same
WO2005001649A2 (en) Defending against unwanted communications by striking back against the beneficiaries
JP2006245813A (ja) Filtering system, filter creation engine, filtering method, and program
Karagiannis et al. Email information flow in large-scale enterprises
KR20050078311A (ko) Method and system for detecting and managing spam mail on multiple mail servers
JP2006059313A (ja) Filtering device for removing junk mail
Yamakawa et al. Analysis of spam mail sent to Japanese mail addresses in the long term
CN113839950A (zh) Mail approval method and system based on the terminal mail SMTP protocol

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: New Jersey, USA

Patentee after: ALCATEL-LUCENT USA Inc.

Address before: New Jersey, USA

Patentee before: Lucent Technologies Inc.

TR01 Transfer of patent right

Effective date of registration: 20190529

Address after: New York, USA

Patentee after: Origin Asset Group Co.,Ltd.

Address before: New Jersey, USA

Patentee before: ALCATEL-LUCENT USA Inc.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110406

Termination date: 20181220