WO2015032123A1 - Method and device for extracting a number from an email - Google Patents

Method and device for extracting a number from an email

Info

Publication number
WO2015032123A1
WO2015032123A1 PCT/CN2013/086174 CN2013086174W WO2015032123A1 WO 2015032123 A1 WO2015032123 A1 WO 2015032123A1 CN 2013086174 W CN2013086174 W CN 2013086174W WO 2015032123 A1 WO2015032123 A1 WO 2015032123A1
Authority
WO
WIPO (PCT)
Prior art keywords
byte
symbol
double
pure
email
Prior art date
Application number
PCT/CN2013/086174
Other languages
English (en)
Chinese (zh)
Inventor
陈颖棠
叶远鹏
Original Assignee
盈世信息科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 盈世信息科技(北京)有限公司
Publication of WO2015032123A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking


Definitions

  • The present invention relates to the field of electronic mail technologies, and in particular to a method and an apparatus for extracting numbers from an email.
  • Background: e-mail is among the most commonly used tools for office work and communication.
  • E-mail is a widely used basic application. Users can conveniently send information to one another by e-mail, but this convenience also gives rise to the problem of junk e-mail.
  • Spam email refers to any email that is forcibly sent to a user's email address without the permission of the user (the recipient).
  • The content of spam email includes promotional advertisements, adult advertisements, money-making offers, or computer viruses.
  • In some cases the computer system of the recipient is compromised.
  • Spam email causes problems for mailbox users and degrades their experience. All major mail providers therefore treat strengthening their anti-spam systems as an important part of improving the user experience.
  • The object of the present invention is to overcome the deficiencies of the prior art.
  • The present invention provides a method and an apparatus for extracting numbers from an email, which can reduce the difficulty of number extraction and reduce resource consumption.
  • The present invention provides a method for extracting a number from an email, the method comprising: identifying a single symbol in the email to obtain a recognition result; performing a classification determination on the recognition result to obtain a determination result; and converting the determination result to obtain a pure-digit number string.
  • The step of identifying a single symbol in the email and obtaining the recognition result comprises: identifying, according to the character encoding, whether the symbol is a single-byte symbol or a double-byte symbol.
  • The step of performing a classification determination on the recognition result and obtaining the determination result comprises: if the symbol is a single-byte symbol, determining according to the character encoding whether it is a single-byte pure digit or a single-byte separator; if the symbol is a double-byte symbol, determining according to the character encoding whether it is a double-byte symbol number or a double-byte separator.
  • The step of converting the determination result to obtain a pure-digit number string comprises: if the determination result is a single-byte pure digit, recording the digit directly; if the determination result is a double-byte character, converting it to a single-byte character, that is, to a pure digit.
  • The method further comprises: inspecting and recording the pure-digit number string.
  • The present invention further provides an apparatus for extracting numbers from an email, the apparatus comprising:
  • an identification module configured to identify a single symbol in the email and obtain a recognition result;
  • a determination module configured to perform a classification determination on the recognition result obtained by the identification module and obtain a determination result;
  • a conversion module configured to convert the determination result obtained by the determination module to obtain a pure-digit number string.
  • The identification module is configured to identify, according to the character encoding, whether the symbol is a single-byte symbol or a double-byte symbol.
  • The determination module is further configured to: when the symbol is determined to be a single-byte symbol, determine according to the character encoding whether it is a single-byte pure digit or a single-byte separator; and when the symbol is determined to be a double-byte symbol, determine according to the character encoding whether it is a double-byte symbol number or a double-byte separator.
  • The conversion module is configured to record the digit directly if the determination result is a single-byte pure digit and, if the determination result is a double-byte character, to convert it to a single-byte character, that is, to a pure digit.
  • The apparatus further includes an inspection record module configured to inspect and record the pure-digit number string.
  • Embodiments of the present invention can identify delimited numbers and symbol numbers in the subject or content of an email and convert such mixed numbers into a pure-digit number string, which reduces the difficulty of number extraction and reduces resource consumption. This also facilitates analysis and rule application by the anti-spam module, so that spam can be identified quickly, which is convenient for users.
  • FIG. 1 is a schematic flowchart of a method for extracting numbers in an email according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a device for extracting numbers in an email according to an embodiment of the present invention.
  • The main function of the anti-spam module in an e-mail system is to analyze e-mail, record and count features, and determine whether a message is junk e-mail. A traditional anti-spam module, however, cannot recognize that "400-235-335" and "400-235335" mean the same thing, namely "400235335"; the system can only determine that the two sets of digits are different. A unified number representation is therefore needed so that the email system can recognize such numbers and avoid the interference caused by differences in symbols.
  • FIG. 1 is a schematic flowchart of a method for extracting numbers in an email according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.
  • Identification: the symbol is identified as a single-byte symbol or a double-byte symbol according to the character encoding. Whether the extracted symbol is single-byte or double-byte is recognized from a characteristic of the character encoding, namely whether the highest bit is 1. If the symbol is a single-byte symbol, one byte of content is taken; if the symbol is a double-byte symbol, two bytes of content are taken.
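  • The identification step can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `split_symbols` is hypothetical, and the input is assumed to use an encoding such as GB2312, in which ASCII bytes have the highest bit clear and double-byte symbols begin with a byte whose highest bit is set. The source does not name the encoding.

```python
from typing import List


def split_symbols(data: bytes) -> List[bytes]:
    """Split a byte string into single-byte and double-byte symbols.

    Assumption: a byte with its highest bit set starts a double-byte symbol;
    any other byte is a complete single-byte symbol.
    """
    symbols = []
    i = 0
    while i < len(data):
        # Highest bit is 1 -> take two bytes; otherwise take one byte.
        width = 2 if data[i] & 0x80 else 1
        # Slicing never overruns: a truncated trailing byte is kept as-is.
        symbols.append(data[i:i + width])
        i += width
    return symbols


if __name__ == "__main__":
    # 0xA2 0xE1 is the double-byte symbol quoted in the description.
    print(split_symbols(b"400-\xa2\xe1"))
```

  • Each element of the returned list is one symbol, ready to be passed to the determination step.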
  • Determination: when the symbol is a single-byte symbol, it is determined according to the character encoding whether it is a single-byte pure digit or a single-byte separator; when the symbol is a double-byte symbol, it is determined according to the character encoding whether it is a double-byte symbol number or a double-byte separator.
  • Specifically, if the symbol is a single-byte symbol, it is determined from the content of the character encoding whether it is a single-byte pure digit "0-9" or a single-byte separator; if the symbol is a double-byte symbol, it is determined from the content of the character encoding whether it is a symbol number (such as "⑨", whose encoding is 0xA2, 0xE1) or a double-byte separator.
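  • A minimal sketch of the determination step follows. The separator sets, the category names, and the choice of GB2312 as the encoding are assumptions made for illustration; the source does not list which separators are recognized. Double-byte symbol numbers are detected here with the Unicode digit property from Python's standard `unicodedata` module rather than with a hand-written table.

```python
import unicodedata

# Assumed separator sets; the source does not enumerate them.
SINGLE_BYTE_SEPARATORS = b"-. ()"
DOUBLE_BYTE_SEPARATORS = {"\uff0d", "\u3000"}  # fullwidth hyphen-minus, ideographic space


def classify(symbol: bytes, encoding: str = "gb2312") -> str:
    """Return the category of one symbol produced by the identification step."""
    if len(symbol) == 1:
        if b"0" <= symbol <= b"9":
            return "single_byte_digit"
        if symbol in SINGLE_BYTE_SEPARATORS:
            return "single_byte_separator"
        return "other"
    try:
        char = symbol.decode(encoding)
    except UnicodeDecodeError:
        return "other"
    if len(char) != 1:
        return "other"
    # unicodedata.digit returns the default (None) for non-digit characters.
    if unicodedata.digit(char, None) is not None:
        return "double_byte_symbol_number"
    if char in DOUBLE_BYTE_SEPARATORS:
        return "double_byte_separator"
    return "other"
```

  • With these assumptions, the bytes 0xA2, 0xE1 quoted above are classified as a double-byte symbol number, and anything that is neither a digit nor a listed separator falls into the catch-all category.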
  • The pure-digit number string can also be inspected, including checking whether it consists purely of digits, whether its length meets the requirements, and whether it needs to be recorded.
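  • The inspection and recording step might look like the sketch below. The length bounds and the use of a counter as the record are assumptions; the source only says that the length must "meet the requirements" and that recording may be needed.

```python
from collections import Counter

# Assumed limits; the source does not state the required length.
MIN_LENGTH = 7
MAX_LENGTH = 20

# Hypothetical record: how often each accepted number has been seen.
seen_numbers: Counter = Counter()


def inspect_and_record(number: str) -> bool:
    """Record the number and return True only if it passes both checks."""
    is_pure_digits = number.isascii() and number.isdigit()
    has_valid_length = MIN_LENGTH <= len(number) <= MAX_LENGTH
    if not (is_pure_digits and has_valid_length):
        return False
    seen_numbers[number] += 1
    return True
```

  • A counter is used so that a later anti-spam rule could, for example, look at how often the same number recurs across messages; that use is a suggestion, not something the source specifies.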
  • Embodiments of the method of the present invention can identify delimited numbers and symbol numbers in the subject or content of an email and convert such mixed numbers into a pure-digit number string, which reduces the difficulty of number extraction and reduces resource consumption. This also facilitates analysis and rule application by the anti-spam module, so that spam can be identified quickly, which is convenient for users.
  • An embodiment of the present invention further provides an apparatus for extracting numbers in an email.
  • The apparatus includes: an identification module 1 configured to identify a single symbol in an email and obtain a recognition result;
  • a determination module 2 configured to perform a classification determination on the recognition result obtained by the identification module 1 and obtain a determination result;
  • a conversion module 3 configured to convert the determination result obtained by the determination module 2 to obtain a pure-digit number string.
  • The identification module 1 is configured to identify the symbol as a single-byte symbol or a double-byte symbol according to the character encoding.
  • Specifically, the extracted symbol is identified as a single-byte symbol or a double-byte symbol from a characteristic of the character encoding, namely whether the highest bit is 1. If the symbol is a single-byte symbol, one byte of content is taken; if the symbol is a double-byte symbol, two bytes of content are taken.
  • The determination module 2 is further configured to: when the symbol is determined to be a single-byte symbol, determine according to the character encoding whether it is a single-byte pure digit or a single-byte separator; and when the symbol is determined to be a double-byte symbol, determine according to the character encoding whether it is a double-byte symbol number or a double-byte separator.
  • Specifically, when the symbol is a single-byte symbol, the determination module 2 determines from the content of the character encoding whether it is a single-byte pure digit "0-9" or a single-byte separator; when the symbol is a double-byte symbol, the determination module 2 determines from the content of the character encoding whether it is a symbol number (such as "⑨", whose encoding is 0xA2, 0xE1) or a double-byte separator.
  • The conversion module 3 is further configured to record the digit directly if the determination result is a single-byte pure digit and, if the determination result is a double-byte character, to convert it to a single-byte character, that is, to a pure digit.
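  • The conversion performed by the conversion module can be sketched as below. The function name is hypothetical, GB2312 is an assumed encoding, and the input is assumed to have already been determined to be a digit (single-byte) or a symbol number (double-byte).

```python
import unicodedata


def to_single_byte_digit(symbol: bytes, encoding: str = "gb2312") -> str:
    """Convert one digit symbol to its ASCII digit.

    Assumption: the caller has already determined that the symbol is a
    single-byte digit or a double-byte symbol number.
    """
    if len(symbol) == 1:
        # Single-byte pure digit: record it directly.
        return symbol.decode("ascii")
    # Double-byte symbol number: map it to the digit it represents.
    # unicodedata.digit raises ValueError if the character is not a digit.
    char = symbol.decode(encoding)
    return str(unicodedata.digit(char))


if __name__ == "__main__":
    digits = [b"4", b"0", b"\xa2\xe1"]
    print("".join(to_single_byte_digit(d) for d in digits))
```

  • Separators are not passed to this function; dropping them while joining the converted digits yields the pure-digit number string.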
  • The apparatus may further include an inspection record module (not shown) configured to inspect and record the pure-digit number string, including checking whether it consists purely of digits, whether the length of the number meets the requirements, and whether it needs to be recorded.
  • Embodiments of the apparatus of the present invention can identify delimited numbers and symbol numbers in the subject or content of an e-mail and convert such mixed numbers into a pure-digit number string, which reduces the difficulty of number extraction and reduces resource consumption. This also facilitates analysis and rule application by the anti-spam module, so that spam can be identified quickly, which is convenient for users.
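  • As an end-to-end illustration, the three modules can be chained as in the sketch below. All function names, the separator set, the GB2312 default, and the rule that any unrelated symbol ends the current number are assumptions for the purpose of the example, not details taken from the source.

```python
import unicodedata
from typing import List, Optional

# Assumed separators (ASCII and double-byte); not listed in the source.
SEPARATORS = {"-", " ", ".", "\uff0d", "\u3000"}


def identify(data: bytes) -> List[bytes]:
    """Module 1: split into one- or two-byte symbols by the highest bit."""
    out, i = [], 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1
        out.append(data[i:i + width])
        i += width
    return out


def determine(symbol: bytes, encoding: str) -> Optional[str]:
    """Module 2: return the decoded character if it is a digit, a symbol
    number, or a separator; return None for any other symbol."""
    try:
        char = symbol.decode(encoding)
    except UnicodeDecodeError:
        return None
    if len(char) != 1:
        return None
    if unicodedata.digit(char, None) is not None or char in SEPARATORS:
        return char
    return None


def convert(char: str) -> str:
    """Module 3: digits become ASCII digits; separators become nothing."""
    value = unicodedata.digit(char, None)
    return "" if value is None else str(value)


def extract_numbers(data: bytes, encoding: str = "gb2312") -> List[str]:
    """Run the three modules over the data and collect pure-digit strings."""
    numbers, current = [], ""
    for symbol in identify(data):
        char = determine(symbol, encoding)
        if char is None:
            # Assumed rule: an unrelated symbol ends the current number.
            if current:
                numbers.append(current)
            current = ""
        else:
            current += convert(char)
    if current:
        numbers.append(current)
    return numbers
```

  • Under these assumptions, both spellings from the description, `b"400-235-335"` and `b"400-235335"`, produce the same string, so a downstream rule can compare them directly.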

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method and a device for extracting a number from an email. The method comprises: recognizing a single symbol in an email and obtaining a recognition result; performing a classification determination on the recognition result and obtaining a determination result; and converting the determination result to obtain a pure-digit number string. Embodiments of the present invention can recognize a number containing separators and a symbol number in the subject or content of an email, and convert a mixed number into a pure-digit number string, thereby simplifying number extraction and reducing resource consumption. In addition, because analysis by an anti-spam module and the application of rules to the email are made possible, the email can quickly be recognized as spam, which improves the user experience.
PCT/CN2013/086174 2013-09-04 2013-10-29 Method and device for extracting a number from an email WO2015032123A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310397191.5 2013-09-04
CN201310397191.5A CN103490980B (zh) 2013-09-04 2013-09-04 Method and apparatus for extracting numbers from an email

Publications (1)

Publication Number Publication Date
WO2015032123A1 (fr) 2015-03-12

Family

ID=49830951

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/086174 WO2015032123A1 (fr) 2013-09-04 2013-10-29 Method and device for extracting a number from an email

Country Status (2)

Country Link
CN (1) CN103490980B (fr)
WO (1) WO2015032123A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020366B (zh) * 2017-12-07 2021-06-15 北大方正集团有限公司 Mailbox information extraction method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
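CN101304589A (zh) * 2008-04-14 2008-11-12 中国联合通信有限公司 Method and system for monitoring and filtering spam SMS messages sent via an SMS gateway
CN101784022A (zh) * 2009-01-16 2010-07-21 北京炎黄新星网络科技有限公司 SMS message filtering and classification method and system
CN102088697A (zh) * 2010-12-17 2011-06-08 北京华中融合科技有限公司 Method and system for processing spam SMS messages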
US20120005589A1 (en) * 2010-07-05 2012-01-05 Seohyun Han Mobile terminal and method for controlling the operation of the mobile terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
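CN101087259A (zh) * 2006-06-07 2007-12-12 深圳市都护网络科技有限公司 System for filtering spam email on the Internet and method for implementing the same
CN102078984A (zh) * 2010-11-26 2011-06-01 西南铝业(集团)有限责任公司 Machining method and machining system for the working band of the upper-die core head of a porthole die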

Also Published As

Publication number Publication date
CN103490980A (zh) 2014-01-01
CN103490980B (zh) 2017-07-28

Similar Documents

Publication Publication Date Title
CN104509041B (zh) Method and apparatus for detecting forgotten attachments
CN103546446B (zh) Method, apparatus and terminal for detecting phishing websites
US20170289082A1 (en) Method and device for identifying spam mail
US10387460B2 (en) Method and apparatus for processing text information
GB2483358A (en) Markov parsing of email message using annotations
CN1691631A (zh) Method for managing electronic business cards
CN112487149B (zh) Text review method, model, device and storage medium
CN114157502B (zh) Terminal identification method and apparatus, electronic device and storage medium
WO2016000545A1 (fr) Method, apparatus and electronic device for identifying junk image files
US20160088106A1 (en) Method and apparatus of processing a doi (digital object unique identifier) in interaction information
CN113114707B (zh) Rule filtering method for an Ethernet controller of a power chip
CN112307369A (zh) Short link processing method, apparatus, terminal and storage medium
US11010687B2 (en) Detecting abusive language using character N-gram features
WO2021114634A1 (fr) Text annotation method, device, and storage medium
US8955127B1 (en) Systems and methods for detecting illegitimate messages on social networking platforms
US20200304448A1 (en) System and Method for Detecting and Predicting Level of Importance of Electronic Mail Messages
US9672819B2 (en) Linguistic model database for linguistic recognition, linguistic recognition device and linguistic recognition method, and linguistic recognition system
WO2015032123A1 (fr) Method and device for extracting a number from an email
CN104376304A (zh) Method and apparatus for recognizing text advertisement images
CN116055067A (zh) Weak password detection method and apparatus, electronic device and medium
CN103853784B (zh) Web page matching method, apparatus and system for a mobile terminal
CN115774762A (zh) Instant messaging information processing method, apparatus, device and storage medium
CN106294292B (zh) Chapter catalog screening method and apparatus
CN113220949B (zh) Method and apparatus for constructing a private data identification system
CN113420549A (zh) Abnormal character string recognition method and apparatus

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13892956

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 13892956

Country of ref document: EP

Kind code of ref document: A1