CN103490980A - Method for extracting numbers in E-mail and device thereof - Google Patents
Method for extracting numbers in E-mail and device thereof
- Publication number
- CN103490980A
- Authority
- CN
- China
- Prior art keywords
- byte
- symbol
- tal
- determination
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/561—Virus type analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Virology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The embodiments of the invention disclose a method and a device for extracting numbers from an e-mail. The method comprises: identifying a single symbol in the e-mail to obtain an identification result; classifying the identification result to obtain a determination result; and converting the determination result to obtain a pure-digit number string. With the method and the device, numbers containing separators and symbolic digits can be identified in the subject or body of an e-mail, and such mixed numbers are converted into a pure-digit number string, which reduces the difficulty of number extraction and the consumption of resources. This facilitates analysis and rule application by the anti-spam module of the e-mail system, so that spam can be identified quickly, which is convenient for the user.
Description
Technical field
The present invention relates to the field of e-mail technology, and in particular to a method and a device for extracting numbers from an e-mail.
Background
With the development of mobile terminal technology, mobile devices such as mobile phones, palmtop computers, tablets and notebooks have become an indispensable part of people's work and life, and e-mail is one of the functions people use most for office work and communication. Among the various applications available to Internet users, e-mail is a commonly used basic application: a user can conveniently pass information to another party by sending an e-mail. At the same time, however, e-mail has given rise to the problem of spam.
Spam refers to any e-mail that is forcibly sent to a user's mailbox without the permission of the user (the recipient). Spam content includes promotional advertising, adult advertising and money-making schemes, and may also carry computer viruses that compromise the recipient's computer system. Spam troubles mailbox users and degrades their experience, so every major mail provider treats improving the effectiveness of its anti-spam system as a key means of improving the user experience.
The prior art includes a way of identifying whether an e-mail is spam by extracting the numbers it contains. Numbers are extracted mainly from the subject and the body of the e-mail and are used as additional features of the e-mail in the anti-spam field. For example, for spam that leaves contact details, the extracted number can be compared with the entries in a database of known spam numbers to identify whether the e-mail is spam. Existing number extraction techniques take one of two approaches: most of them search directly for strings consisting entirely of digits, and the others use regular expressions to extract numbers.
Searching directly for all-digit strings has narrow applicability: it only works for consecutive digits and cannot identify numbers that contain separators. Regular expressions are designed to recognize and extract valid strings, but because they are so powerful, they are difficult to write, test and verify, and they consume relatively many resources. Moreover, the numbers extracted by both methods are raw character strings that cannot be converted into a uniform pure-digit string, which hinders analysis and rule application by the anti-spam module.
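The limitation of the all-digit search can be illustrated with a short sketch. The function name `find_digit_runs` and the sample text are illustrative assumptions and are not taken from any prior-art system.

```python
import re
from typing import List


def find_digit_runs(text: str) -> List[str]:
    """Return every run of consecutive ASCII digits found in the text."""
    return re.findall(r"[0-9]+", text)


# A number written with separators comes back as unrelated fragments
# instead of one number.
print(find_digit_runs("Tel: 400-235-335"))
```

Here the single telephone number is returned as the three fragments "400", "235" and "335", so it cannot be matched against a database entry for the whole number.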
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a method and a device for extracting numbers from an e-mail that reduce the difficulty of number extraction and reduce resource consumption.
To solve the above problems, the present invention proposes a method for extracting numbers from an e-mail, the method comprising:
identifying a single symbol in the e-mail to obtain an identification result;
classifying the identification result to obtain a determination result;
converting the determination result to obtain a pure-digit number string.
Preferably, the step of identifying a single symbol in the e-mail to obtain an identification result comprises:
identifying, according to the character encoding, whether the symbol is a single-byte symbol or a double-byte symbol.
Preferably, the step of classifying the identification result to obtain a determination result comprises:
when the symbol is determined to be a single-byte symbol, determining according to the character encoding whether it is a single-byte digit or a single-byte separator;
when the symbol is determined to be a double-byte symbol, determining according to the character encoding whether it is a double-byte symbolic digit or a double-byte separator.
Preferably, the step of converting the determination result to obtain a pure-digit number string comprises:
if the symbol is determined to be a single-byte digit, recording the digit directly;
if the symbol is determined to be a double-byte character, converting it to the corresponding single-byte character and thereby to a pure digit.
Preferably, the method further comprises: checking and recording the pure-digit number string.
Correspondingly, the present invention also provides a device for extracting numbers from an e-mail, the device comprising:
an identification module, configured to identify a single symbol in the e-mail and obtain an identification result;
a determination module, configured to classify the identification result obtained by the identification module and obtain a determination result;
a conversion module, configured to convert the determination result obtained by the determination module and obtain a pure-digit number string.
Preferably, the identification module is configured to identify, according to the character encoding, whether the symbol is a single-byte symbol or a double-byte symbol.
Preferably, the determination module is further configured to determine, when the symbol is determined to be a single-byte symbol, whether it is a single-byte digit or a single-byte separator according to the character encoding; and to determine, when the symbol is determined to be a double-byte symbol, whether it is a double-byte symbolic digit or a double-byte separator according to the character encoding.
Preferably, the conversion module is configured to record the digit directly when the determination result is a single-byte digit; and, when the determination result is a double-byte character, to convert it to the corresponding single-byte character and thereby to a pure digit.
Preferably, the device further comprises: a check-and-record module, configured to check and record the pure-digit number string.
By implementing the embodiments of the present invention, numbers containing separators and symbolic digits can be identified in the subject or body of an e-mail, and such mixed numbers are converted into a pure-digit number string. This reduces the difficulty of number extraction and the consumption of resources. It also facilitates analysis and rule application by the anti-spam module of the e-mail system, so that spam can be identified quickly, which is convenient for the user.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for extracting numbers from an e-mail according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the device for extracting numbers from an e-mail according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The main role of the anti-spam module in an e-mail system is to analyze e-mails, record and count their features, and determine whether they are spam. A traditional anti-spam module cannot recognize that "400-235-335" and "400-235335" mean the same thing, namely "400235335"; the system can only conclude that the two strings are different. A unified number representation is therefore needed so that the e-mail system can recognize such numbers and avoid the interference caused by differences in symbols.
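The need for a unified representation can be sketched as follows. The blacklist `SPAM_NUMBERS` and the helper names are hypothetical, and `canonical` is a deliberate simplification that keeps only ASCII digits rather than applying the symbol-by-symbol method of the embodiment.

```python
# Hypothetical blacklist that stores numbers in their pure-digit form.
SPAM_NUMBERS = {"400235335"}


def canonical(number: str) -> str:
    """Keep only the ASCII digits of a number written with separators."""
    return "".join(ch for ch in number if "0" <= ch <= "9")


def is_blacklisted(raw: str) -> bool:
    """Look the number up by its canonical form, not by its raw spelling."""
    return canonical(raw) in SPAM_NUMBERS


for raw in ("400-235-335", "400-235335"):
    # The raw spelling misses the blacklist; the canonical form matches.
    print(raw, raw in SPAM_NUMBERS, is_blacklisted(raw))
```

Storing and comparing only the canonical form means that a single database entry covers every spelling of the same number.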
Fig. 1 is a schematic flowchart of the method for extracting numbers from an e-mail according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S101: identifying a single symbol in the e-mail to obtain an identification result;
S102: classifying the identification result to obtain a determination result;
S103: converting the determination result to obtain a pure-digit number string.
In S101, whether a symbol is a single-byte symbol or a double-byte symbol is identified according to the character encoding. Specifically, a property of the character encoding (whether the highest bit is 1) is used to identify whether the symbol being extracted is a single-byte symbol or a double-byte symbol. If the symbol is a single-byte symbol, one byte is taken; if it is a double-byte symbol, two bytes are taken.
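A minimal sketch of the identification in S101 is given here. It assumes a GB2312-style encoding in which a byte with its highest bit set starts a double-byte symbol; the description does not name the encoding, and the function name `split_symbols` is illustrative.

```python
from typing import List


def split_symbols(data: bytes) -> List[bytes]:
    """Split raw bytes into single-byte and double-byte symbols."""
    symbols = []
    i = 0
    while i < len(data):
        # Highest bit set: assumed to be the lead byte of a double-byte symbol.
        width = 2 if data[i] & 0x80 else 1
        symbols.append(data[i:i + width])
        i += width
    return symbols


# "4", the double-byte symbol 0xA2 0xE1 from the description, then "0".
print(split_symbols(b"4\xa2\xe10"))
```

A truncated double-byte symbol at the end of the input is returned as a single byte in this sketch; a production implementation would need to decide how to treat such input.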
In S102, when the symbol is determined to be a single-byte symbol, it is determined according to the character encoding whether it is a single-byte digit or a single-byte separator; when the symbol is determined to be a double-byte symbol, it is determined according to the character encoding whether it is a double-byte symbolic digit or a double-byte separator.
In a specific implementation, if the symbol is a single-byte symbol, the content of its character encoding is used to determine whether it is a single-byte digit ("0" to "9") or a single-byte separator; if the symbol is a double-byte symbol, the content of its character encoding is used to determine whether it is a symbolic digit (such as "⑨", which is encoded as 0xA2 0xE1) or a double-byte separator.
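The classification in S102 might look like the sketch below. The separator sets and the byte range for symbolic digits are assumptions: the description gives only the example bytes 0xA2 0xE1, and the range 0xD9 to 0xE1 is inferred from the statement that these encodings are contiguous. The names `Kind` and `classify` are illustrative.

```python
from enum import Enum


class Kind(Enum):
    DIGIT = "single-byte digit"
    SEPARATOR = "single-byte separator"
    SYMBOLIC_DIGIT = "double-byte symbolic digit"
    WIDE_SEPARATOR = "double-byte separator"
    OTHER = "other"


# Assumed separator sets; the description does not enumerate them.
SINGLE_BYTE_SEPARATORS = {b"-", b" "}
DOUBLE_BYTE_SEPARATORS = {b"\xa3\xad"}  # assumed: full-width hyphen in GB2312


def classify(symbol: bytes) -> Kind:
    """Classify one symbol by the content of its encoding."""
    if len(symbol) == 1:
        if symbol.isdigit():
            return Kind.DIGIT
        if symbol in SINGLE_BYTE_SEPARATORS:
            return Kind.SEPARATOR
    elif len(symbol) == 2:
        lead, trail = symbol
        # Assumed contiguous range of symbolic digits 1 to 9.
        if lead == 0xA2 and 0xD9 <= trail <= 0xE1:
            return Kind.SYMBOLIC_DIGIT
        if symbol in DOUBLE_BYTE_SEPARATORS:
            return Kind.WIDE_SEPARATOR
    return Kind.OTHER
```

Because each test is a comparison on one or two byte values, the cost per symbol is constant and no pattern matching engine is involved.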
In S103, if the symbol is determined to be a single-byte digit, the digit is recorded directly; if it is determined to be a double-byte character, it is converted to the corresponding single-byte character and thereby to a pure digit.
In a specific implementation: if the symbol is a single-byte digit, it is recorded directly; if it is a separator, it is skipped and processing continues with the next symbol; if it is a double-byte character, it is converted to the corresponding single-byte character (because the encodings of this class of symbols are contiguous, subtracting the starting code gives the digit to convert to; for "⑨", 0xE1 - 0xA8 = 0x39, and the digit "9" is encoded as 0x39); otherwise, extraction of the current number ends, and the number is checked, for example whether it needs to be recorded and whether its length is acceptable.
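The subtraction in S103 can be written out as a small sketch. The offset 0xA8 and the example bytes 0xA2 0xE1 come from the description; the accepted range 0xD9 to 0xE1 (symbolic digits 1 to 9) and the function name `symbolic_to_ascii` are assumptions.

```python
START_OFFSET = 0xA8  # from the description: 0xE1 - 0xA8 = 0x39


def symbolic_to_ascii(symbol: bytes) -> str:
    """Convert a double-byte symbolic digit to its single-byte digit."""
    lead, trail = symbol  # raises ValueError unless exactly two bytes
    # Assumed contiguous range of symbolic digits 1 to 9.
    if lead != 0xA2 or not 0xD9 <= trail <= 0xE1:
        raise ValueError("not a supported symbolic digit")
    return chr(trail - START_OFFSET)


print(symbolic_to_ascii(b"\xa2\xe1"))  # prints 9
```

The conversion needs no lookup table: one subtraction on the second byte yields the ASCII code of the digit.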
Further, after the pure-digit number string is obtained, it can also be checked and recorded; the check covers whether the string is a pure-digit number, whether its length meets the requirements, whether it needs to be recorded, and so on.
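One possible form of this check is sketched below. The length bounds of 5 and 15 digits and the function name `should_record` are hypothetical; the description does not state concrete requirements.

```python
def should_record(number: str, min_len: int = 5, max_len: int = 15) -> bool:
    """Decide whether an extracted number string is worth recording."""
    # Only ASCII digits are allowed in a pure-digit number string.
    is_pure_digits = number.isascii() and number.isdigit()
    # Assumed policy: very short or very long strings are not contact numbers.
    has_valid_length = min_len <= len(number) <= max_len
    return is_pure_digits and has_valid_length
```

Short digit runs such as prices or dates are filtered out by the lower bound, so only plausible contact numbers reach the anti-spam rules.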
By implementing the method embodiment of the present invention, numbers containing separators and symbolic digits can be identified in the subject or body of an e-mail, and such mixed numbers are converted into a pure-digit number string. This reduces the difficulty of number extraction and the consumption of resources. It also facilitates analysis and rule application by the anti-spam module of the e-mail system, so that spam can be identified quickly, which is convenient for the user.
An embodiment of the present invention also provides a device for extracting numbers from an e-mail. As shown in Fig. 2, the device comprises:
an identification module 1, configured to identify a single symbol in the e-mail and obtain an identification result;
a determination module 2, configured to classify the identification result obtained by the identification module 1 and obtain a determination result;
a conversion module 3, configured to convert the determination result obtained by the determination module 2 and obtain a pure-digit number string.
The identification module 1 is configured to identify, according to the character encoding, whether a symbol is a single-byte symbol or a double-byte symbol. Specifically, a property of the character encoding (whether the highest bit is 1) is used to identify whether the symbol being extracted is a single-byte symbol or a double-byte symbol. If the symbol is a single-byte symbol, one byte is taken; if it is a double-byte symbol, two bytes are taken.
The determination module 2 is further configured to determine, when the symbol is determined to be a single-byte symbol, whether it is a single-byte digit or a single-byte separator according to the character encoding; and to determine, when the symbol is determined to be a double-byte symbol, whether it is a double-byte symbolic digit or a double-byte separator according to the character encoding.
In a specific implementation, if the symbol is a single-byte symbol, the determination module 2 uses the content of its character encoding to determine whether it is a single-byte digit ("0" to "9") or a single-byte separator; if the symbol is a double-byte symbol, the determination module 2 uses the content of its character encoding to determine whether it is a symbolic digit (such as "⑨", which is encoded as 0xA2 0xE1) or a double-byte separator.
In addition, the conversion module 3 is configured to record the digit directly when the determination result is a single-byte digit; and, when the determination result is a double-byte character, to convert it to the corresponding single-byte character and thereby to a pure digit. In a specific implementation: if the symbol is a single-byte digit, it is recorded directly; if it is a separator, it is skipped and processing continues with the next symbol; if it is a double-byte character, it is converted to the corresponding single-byte character (because the encodings of this class of symbols are contiguous, subtracting the starting code gives the digit to convert to; for "⑨", 0xE1 - 0xA8 = 0x39, and the digit "9" is encoded as 0x39); otherwise, extraction of the current number ends, and the number is checked, for example whether it needs to be recorded and whether its length is acceptable.
Further, the device may also comprise a check-and-record module (not shown), configured to check and record the pure-digit number string; the check covers whether the string is a pure-digit number, whether its length meets the requirements, whether it needs to be recorded, and so on.
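The cooperation of the three modules can be sketched end to end as follows. This is an illustrative arrangement under stated assumptions, not the patented implementation: the class names, the separator sets, the GB2312-style byte values and the symbolic-digit range 0xD9 to 0xE1 are all assumptions beyond the bytes 0xA2 0xE1 and the offset 0xA8 given in the description.

```python
from typing import Iterator, List, Optional

ASCII_SEPARATORS = {b"-"}        # assumed single-byte separator set
WIDE_SEPARATORS = {b"\xa3\xad"}  # assumed: full-width hyphen in GB2312


class IdentificationModule:
    """Splits raw bytes into single-byte and double-byte symbols."""

    def identify(self, data: bytes) -> Iterator[bytes]:
        i = 0
        while i < len(data):
            width = 2 if data[i] & 0x80 else 1  # highest bit set: two bytes
            yield data[i:i + width]
            i += width


class DeterminationModule:
    """Labels each symbol as digit, symbolic digit, separator or other."""

    def determine(self, symbol: bytes) -> str:
        if len(symbol) == 1 and symbol.isdigit():
            return "digit"
        if len(symbol) == 2 and symbol[0] == 0xA2 and 0xD9 <= symbol[1] <= 0xE1:
            return "symbolic_digit"
        if symbol in ASCII_SEPARATORS or symbol in WIDE_SEPARATORS:
            return "separator"
        return "other"


class ConversionModule:
    """Turns a labelled symbol into an ASCII digit, or None if it has none."""

    def convert(self, symbol: bytes, label: str) -> Optional[str]:
        if label == "digit":
            return chr(symbol[0])
        if label == "symbolic_digit":
            return chr(symbol[1] - 0xA8)  # offset from the description
        return None


def extract_numbers(data: bytes) -> List[str]:
    """Run the three modules over the data and collect pure-digit strings."""
    identifier = IdentificationModule()
    determiner = DeterminationModule()
    converter = ConversionModule()
    numbers: List[str] = []
    current: List[str] = []
    for symbol in identifier.identify(data):
        label = determiner.determine(symbol)
        if label == "separator":
            continue  # skip separators and keep building the current number
        digit = converter.convert(symbol, label)
        if digit is not None:
            current.append(digit)
        elif current:
            numbers.append("".join(current))  # any other symbol ends the number
            current = []
    if current:
        numbers.append("".join(current))
    return numbers
```

With these assumptions, `extract_numbers(b"400-235-335")` and `extract_numbers(b"400-235335")` both return `["400235335"]`, which is the unified representation the embodiment aims for.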
By implementing the device embodiment of the present invention, numbers containing separators and symbolic digits can be identified in the subject or body of an e-mail, and such mixed numbers are converted into a pure-digit number string. This reduces the difficulty of number extraction and the consumption of resources. It also facilitates analysis and rule application by the anti-spam module of the e-mail system, so that spam can be identified quickly, which is convenient for the user.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The method and device for extracting numbers from an e-mail provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this description should not be construed as limiting the present invention.
Claims (10)
1. A method for extracting numbers from an e-mail, characterized in that the method comprises:
identifying a single symbol in the e-mail to obtain an identification result;
classifying the identification result to obtain a determination result;
converting the determination result to obtain a pure-digit number string.
2. The method for extracting numbers from an e-mail according to claim 1, characterized in that the step of identifying a single symbol in the e-mail to obtain an identification result comprises:
identifying, according to the character encoding, whether the symbol is a single-byte symbol or a double-byte symbol.
3. The method for extracting numbers from an e-mail according to claim 2, characterized in that the step of classifying the identification result to obtain a determination result comprises:
when the symbol is determined to be a single-byte symbol, determining according to the character encoding whether it is a single-byte digit or a single-byte separator;
when the symbol is determined to be a double-byte symbol, determining according to the character encoding whether it is a double-byte symbolic digit or a double-byte separator.
4. The method for extracting numbers from an e-mail according to claim 3, characterized in that the step of converting the determination result to obtain a pure-digit number string comprises:
if the symbol is determined to be a single-byte digit, recording the digit directly;
if the symbol is determined to be a double-byte character, converting it to the corresponding single-byte character and thereby to a pure digit.
5. The method for extracting numbers from an e-mail according to any one of claims 1 to 4, characterized in that the method further comprises: checking and recording the pure-digit number string.
6. A device for extracting numbers from an e-mail, characterized in that the device comprises:
an identification module, configured to identify a single symbol in the e-mail and obtain an identification result;
a determination module, configured to classify the identification result obtained by the identification module and obtain a determination result;
a conversion module, configured to convert the determination result obtained by the determination module and obtain a pure-digit number string.
7. The device for extracting numbers from an e-mail according to claim 6, characterized in that the identification module is configured to identify, according to the character encoding, whether the symbol is a single-byte symbol or a double-byte symbol.
8. The device for extracting numbers from an e-mail according to claim 7, characterized in that the determination module is further configured to determine, when the symbol is determined to be a single-byte symbol, whether it is a single-byte digit or a single-byte separator according to the character encoding; and to determine, when the symbol is determined to be a double-byte symbol, whether it is a double-byte symbolic digit or a double-byte separator according to the character encoding.
9. The device for extracting numbers from an e-mail according to claim 8, characterized in that the conversion module is configured to record the digit directly when the determination result is a single-byte digit; and, when the determination result is a double-byte character, to convert it to the corresponding single-byte character and thereby to a pure digit.
10. The device for extracting numbers from an e-mail according to any one of claims 6 to 9, characterized in that the device further comprises: a check-and-record module, configured to check and record the pure-digit number string.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310397191.5A CN103490980B (en) | 2013-09-04 | 2013-09-04 | The extracting method and its device of number in a kind of Email |
PCT/CN2013/086174 WO2015032123A1 (en) | 2013-09-04 | 2013-10-29 | Method and device for extracting number from e-mail |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310397191.5A CN103490980B (en) | 2013-09-04 | 2013-09-04 | The extracting method and its device of number in a kind of Email |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103490980A true CN103490980A (en) | 2014-01-01 |
CN103490980B CN103490980B (en) | 2017-07-28 |
Family
ID=49830951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310397191.5A Active CN103490980B (en) | 2013-09-04 | 2013-09-04 | The extracting method and its device of number in a kind of Email |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103490980B (en) |
WO (1) | WO2015032123A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110020366A (en) * | 2017-12-07 | 2019-07-16 | 北大方正集团有限公司 | Mailbox message abstracting method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101087259A (en) * | 2006-06-07 | 2007-12-12 | 深圳市都护网络科技有限公司 | A system for filtering spam in Internet and its implementation method |
CN102078984A (en) * | 2010-11-26 | 2011-06-01 | 西南铝业(集团)有限责任公司 | Method and system for processing core head working tapes of divergent die upper die |
CN102088697A (en) * | 2010-12-17 | 2011-06-08 | 北京华中融合科技有限公司 | Method and system for processing spam |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101304589A (en) * | 2008-04-14 | 2008-11-12 | 中国联合通信有限公司 | Method and system for monitoring and filtering garbage short message transmitted by short message gateway |
CN101784022A (en) * | 2009-01-16 | 2010-07-21 | 北京炎黄新星网络科技有限公司 | Method and system for filtering and classifying short messages |
KR101735613B1 (en) * | 2010-07-05 | 2017-05-24 | 엘지전자 주식회사 | Mobile terminal and operation control method thereof |
- 2013-09-04: CN application CN201310397191.5A, patent CN103490980B, status Active
- 2013-10-29: WO application PCT/CN2013/086174, publication WO2015032123A1, status Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2015032123A1 (en) | 2015-03-12 |
CN103490980B (en) | 2017-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112487149B (en) | Text auditing method, model, equipment and storage medium | |
CN110083805B (en) | Method and system for converting Word file into EPUB file | |
US8843815B2 (en) | System and method for automatically extracting metadata from unstructured electronic documents | |
CN102243699B (en) | Malicious code detection method and system | |
CN103218363B (en) | Information processing method and device | |
JP2005524892A5 (en) | ||
CN1691631A (en) | Method for management of vcards | |
WO2004072780A3 (en) | Method for automatic and semi-automatic classification and clustering of non-deterministic texts | |
CN103294953B (en) | A kind of mobile phone malicious code detecting method and system | |
CN105094824B (en) | A kind of notification message methods of exhibiting on smartwatch and a kind of smartwatch | |
WO2015032124A1 (en) | E-mail classification method and device thereof | |
CN102541948A (en) | Method and device for extracting document structure | |
CN102467653A (en) | Image-text recognition method and system thereof | |
CN105787047A (en) | Extraction, analysis and conversion method of resume information | |
CN106874448B (en) | Method and device for mining earthquake subject term from microblog | |
CN102291369A (en) | Control method and corresponding control device for verifying junk information settings | |
CN102855244A (en) | Method and device for file catalogue processing | |
CN101794378A (en) | Rubbish image filtering method based on image encoding | |
CN103902564A (en) | File showing method and device | |
CN107506407B (en) | File classification and calling method and device | |
Xu et al. | An approach to image spam filtering based on base64 encoding and N-Gram feature extraction | |
CN105320691A (en) | Account information recognition method and device | |
CN105721292A (en) | Information reading method, device and terminal | |
CN101702835A (en) | Method for realizing to handwrite messages and mobile terminal | |
CN103365934A (en) | Extracting method and device of complex named entity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20180110. Address after: 510000 Guangdong city of Guangzhou province Panyu District Xiaoguwei Street Mingzhi University City Street No. 1 Building 5 building 503 room information hub. Patentee after: Critics of science and Technology (Guangzhou) Co. Ltd. Address before: 100080 room 8003, cyber building, No. 19 South Haidian Road, Beijing, Haidian District. Patentee before: MaiMailtech (Beijing) Co., Ltd. |