CN116150752A - Mail attachment virus identification method, system, equipment and storable medium - Google Patents
Mail attachment virus identification method, system, equipment and storable medium
- Publication number
- CN116150752A (application number CN202211728728.7A)
- Authority
- CN
- China
- Prior art keywords
- mail attachment
- attachment
- file
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/107—Computer-aided management of electronic mailing [e-mailing]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02W—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
- Y02W90/00—Enabling technologies or technologies with a potential or indirect contribution to greenhouse gas [GHG] emissions mitigation
Abstract
The invention discloses a mail attachment virus identification method, system, device and storage medium. The method judges whether a mail file includes a mail attachment file header; if so, the mail text content and the mail attachment are extracted, and otherwise only the mail text content is extracted. File header features are then extracted from the mail attachment, the format of the mail attachment is identified according to those features, and it is identified whether the mail attachment is a risk mail attachment. This can improve both the efficiency and the accuracy of mail attachment virus identification. In addition, extracting, disassembling and storing the text content of the mail attachment preserves risk mail attachment samples, which makes it convenient for administrators to collect samples.
Description
Technical Field
The present invention relates to the field of mail attachment virus identification technologies, and in particular, to a mail attachment virus identification method, system, device, and storable medium.
Background
Mail is indispensable for communication in daily work, so a great many attacks are carried by mail. Malicious information can be sent by mail to induce users to perform certain operations, malicious network links can be sent to obtain user names, passwords and the like, and malicious programs can be delivered, usually in the form of attachments. Virus identification for mail attachments is therefore very important. The existing mail attachment identification method parses the mail to obtain the mail attachments and scans the attachments with antivirus software. However, this prior art is based on feature codes, and its identification accuracy is low.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, a device and a storage medium for identifying mail attachment viruses, which can solve the defect of low identification accuracy in the prior art.
The technical scheme of the invention is realized as follows:
A mail attachment virus identification method specifically comprises the following steps:
acquiring a mail file;
judging whether the mail file comprises a mail attachment file header; if yes, extracting the mail text content and the mail attachment; otherwise, extracting only the mail text content;
extracting file header features of the mail attachment;
and performing format identification on the mail attachment according to the file header features to identify whether the mail attachment is a risk mail attachment; if so, extracting, disassembling and storing the text content of the mail attachment, thereby realizing identification of mail attachment viruses.
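The first two steps can be sketched as follows. This is a minimal illustration that uses Python's standard `email` package, not the patent's implementation; the function name `split_mail` is hypothetical, and treating "includes a mail attachment file header" as "the parsed message has attachment parts" is an assumption.

```python
from email import message_from_bytes, policy


def split_mail(raw: bytes):
    """Return (body_text, [(filename, payload_bytes), ...]) for a raw mail file."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Mail text content: prefer a text/plain body, fall back to text/html.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # Attachments: parts the standard library does not treat as body candidates.
    # An empty list corresponds to the "otherwise" branch (text content only).
    attachments = [
        (part.get_filename() or "", part.get_payload(decode=True) or b"")
        for part in msg.iter_attachments()
    ]
    return body, attachments
```

A mail without attachments yields an empty list, so the later header-feature steps simply have nothing to process.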
As a further alternative to the mail attachment virus identification method, the file header features of the mail attachment include plain text format, document format, audio video format, picture format, application format, and other formats.
As a further alternative of the mail attachment virus identification method, performing format identification on the mail attachment according to the file header features and identifying whether the mail attachment is a risk mail attachment specifically includes:
identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio video format, picture format and application format; if so, judging the mail attachment to be a normal mail attachment; otherwise, judging it to be a risk mail attachment.
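The format check can be illustrated with a small table of well-known leading byte signatures. This is a sketch under stated assumptions: the signature table, the category labels, and the printable-ASCII heuristic for plain text are illustrative choices, and the names `SIGNATURES`, `classify_header` and `is_risk_attachment` are hypothetical. The patent does not list the concrete header features it uses.

```python
# (magic prefix, category); the table is illustrative, not exhaustive.
SIGNATURES = [
    (b"%PDF", "document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "document"),  # OLE2 (legacy .doc/.xls)
    (b"PK\x03\x04", "document"),   # ZIP container (.docx/.xlsx, also plain .zip)
    (b"\x89PNG\r\n\x1a\n", "picture"),
    (b"\xff\xd8\xff", "picture"),  # JPEG
    (b"GIF87a", "picture"),
    (b"GIF89a", "picture"),
    (b"ID3", "audio_video"),       # MP3 with an ID3v2 tag
    (b"RIFF", "audio_video"),      # RIFF container (.wav/.avi)
    (b"MZ", "application"),        # DOS/Windows executable
    (b"\x7fELF", "application"),   # ELF executable
]


def classify_header(data: bytes) -> str:
    """Map the leading bytes of an attachment to a format category."""
    if not data:
        return "other"
    for magic, category in SIGNATURES:
        if data.startswith(magic):
            return category
    # Assumption: plain text has no magic number, so treat a printable-ASCII
    # prefix as plain text.
    sample = data[:512]
    if all(b in b"\t\n\r" or 32 <= b < 127 for b in sample):
        return "plain_text"
    return "other"


def is_risk_attachment(data: bytes) -> bool:
    # Per the rule above: only the "other formats" category is treated as risky.
    return classify_header(data) == "other"
```

Note that, following the rule as stated, an executable with a recognized header falls into the application format and is therefore judged normal; only unrecognized headers are flagged.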
As a further alternative of the mail attachment virus identification method, extracting the text content of the mail attachment specifically includes:
converting the mail attachment into plain text;
identifying person names, goods, enterprises, addresses, containers, customs notes and contact phone numbers in the plain text based on natural language processing techniques.
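The two sub-steps can be sketched as below. The patent does not name a natural language processing model, so this example swaps in plain regular expressions as a stand-in and covers only two of the listed fields. The patterns are assumptions: a container number is taken to be four capital letters ending in U, J or Z followed by seven digits, and a contact phone is taken to be an 11-digit mainland China mobile number. The names `to_plain_text` and `extract_fields` are hypothetical.

```python
import re

# Assumed patterns, for illustration only.
CONTAINER_RE = re.compile(r"\b[A-Z]{3}[UJZ]\d{7}\b")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def to_plain_text(data: bytes) -> str:
    """Convert attachment bytes to text; undecodable bytes are replaced."""
    return data.decode("utf-8", errors="replace")


def extract_fields(text: str) -> dict[str, list[str]]:
    """Pull container numbers and contact phones out of plain text."""
    return {
        "containers": CONTAINER_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }
```

Names, goods, enterprises and addresses generally need a trained named-entity recognizer rather than patterns, which is outside what this sketch shows.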
As a further alternative of the mail attachment virus identification method, disassembling and storing the mail attachment specifically includes:
acquiring file storage rules and file storage positions;
disassembling the mail attachment according to the file storage rule;
and storing the disassembled mail attachments according to the file storage positions.
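The patent leaves the file storage rule and the file storage position unspecified. The sketch below assumes one possible reading: the storage position is a root directory, the storage rule names each sample directory by the SHA-256 digest of the attachment, and "disassembling" means splitting the sample into raw bytes, extracted text and metadata. The function name `store_sample` and the file names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def store_sample(data: bytes, text: str, filename: str, root: Path) -> Path:
    """Split a risk attachment into parts and write them under root."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumed storage rule: <root>/<first two hex chars>/<full digest>/
    target = root / digest[:2] / digest
    target.mkdir(parents=True, exist_ok=True)

    (target / "sample.bin").write_bytes(data)                   # raw attachment
    (target / "content.txt").write_text(text, encoding="utf-8")  # extracted text
    meta = {"filename": filename, "sha256": digest, "size": len(data)}
    (target / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
    return target
```

Keying the directory on the content digest means the same attachment received twice lands in the same place, which keeps the sample store free of duplicates.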
A mail attachment virus identification system comprising:
the first acquisition module is used for acquiring mail files;
the judging module is used for judging whether the mail file comprises a mail attachment file header or not;
the first extraction module is used for extracting the mail text content and the mail attachment or extracting the mail text content;
the second extraction module is used for extracting the file header characteristics of the mail attachment;
the identification module is used for carrying out format identification on the mail attachment according to the file header characteristics and identifying whether the mail attachment is a risk mail attachment or not;
the third extraction module is used for extracting the text content of the mail attachment;
and the disassembly storage module is used for carrying out disassembly storage on the text content of the mail attachment.
As a further alternative to the mail attachment virus identification system, the identification module includes:
the file header feature identification module is used for identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio video format, picture format and application format;
and the judging module is used for judging whether the mail attachment is a risk mail attachment according to the result of the file header feature identification.
As a further alternative of the mail attachment virus identification system, the third extraction module includes a conversion module and a processing module, and the disassembly storage module includes a second acquisition module, a disassembly module and a storage module, where:
the conversion module is used for converting the mail attachment into a plain text;
the processing module is used for identifying names, goods, enterprises, addresses, containers, customs notes and contact phones in the plain text based on natural language processing technology;
the second acquisition module is used for acquiring file storage rules and file storage positions;
the disassembly module is used for disassembling the mail attachment according to the file storage rule;
and the storage module is used for storing the disassembled mail attachments according to the file storage position.
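How the modules listed above might be wired together can be sketched as a small pipeline object. This is an illustrative composition only: the class name `AttachmentPipeline`, its field names, and the use of the string "other" to mark the risk category are assumptions, and each module is represented by a plain callable supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AttachmentPipeline:
    # judging module + first extraction module: raw mail -> (body, attachments)
    extract: Callable[[bytes], Tuple[str, List[bytes]]]
    # second extraction module + identification module: bytes -> format category
    classify: Callable[[bytes], str]
    # third extraction module: attachment bytes -> text content
    extract_text: Callable[[bytes], str]
    # disassembly storage module: persist the attachment and its text
    store: Callable[[bytes, str], None]

    def run(self, raw_mail: bytes) -> List[bytes]:
        """Process one mail file and return the attachments judged risky."""
        _body, attachments = self.extract(raw_mail)
        risky = []
        for data in attachments:
            if self.classify(data) == "other":
                self.store(data, self.extract_text(data))
                risky.append(data)
        return risky
```

Passing the modules in as callables keeps each one replaceable, for example swapping a different format classifier in without touching the rest of the flow.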
A computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any of the mail attachment virus identification methods described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the mail attachment virus identification methods described above.
The beneficial effects of the invention are as follows: the method judges whether the mail file comprises a mail attachment file header; if yes, the mail text content and the mail attachment are extracted, otherwise only the mail text content is extracted. File header features are then extracted from the mail attachment, the format of the mail attachment is identified according to those features, and it is identified whether the mail attachment is a risk mail attachment, which can improve both the efficiency and the accuracy of mail attachment virus identification.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a mail attachment virus identification method of the present invention;
FIG. 2 is a schematic diagram of a mail attachment virus identification system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to FIGS. 1-2, a mail attachment virus identification method specifically includes:
acquiring a mail file;
judging whether the mail file comprises a mail attachment file header; if yes, extracting the mail text content and the mail attachment; otherwise, extracting only the mail text content;
extracting file header features of the mail attachment;
and performing format identification on the mail attachment according to the file header features to identify whether the mail attachment is a risk mail attachment; if so, extracting, disassembling and storing the text content of the mail attachment, thereby realizing identification of mail attachment viruses.
In this embodiment, it is first determined whether the mail file includes a mail attachment file header; if yes, the mail text content and the mail attachment are extracted, otherwise only the mail text content is extracted. File header features are then extracted from the mail attachment, the format of the mail attachment is identified according to those features, and it is identified whether the mail attachment is a risk mail attachment. This can improve both the efficiency and the accuracy of mail attachment virus identification. In addition, extracting, disassembling and storing the text content of the mail attachment preserves risk mail attachment samples, which makes it convenient for administrators to collect samples.
Preferably, the file header feature of the mail attachment includes plain text format, document format, audio video format, picture format, application format, and other formats.
Preferably, performing format identification on the mail attachment according to the file header features includes:
identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio video format, picture format and application format; if so, judging the mail attachment to be a normal mail attachment; otherwise, judging it to be a risk mail attachment.
In this embodiment, if the file header feature of the mail attachment falls under the other formats, the mail attachment is considered to be a risk mail attachment.
Preferably, the extracting text content of the mail attachment includes:
converting the mail attachment into plain text;
identifying person names, goods, enterprises, addresses, containers, customs notes and contact phone numbers in the plain text based on natural language processing techniques.
Preferably, the disassembling and storing the mail attachment specifically includes:
acquiring file storage rules and file storage positions;
disassembling the mail attachment according to the file storage rule;
and storing the disassembled mail attachments according to the file storage positions.
In this embodiment, by acquiring the file storage rule and the file storage position, the mail attachment can be accurately disassembled, and the disassembled mail attachment can be accurately stored.
A mail attachment virus identification system comprising:
the first acquisition module is used for acquiring mail files;
the judging module is used for judging whether the mail file comprises a mail attachment file header or not;
the first extraction module is used for extracting the mail text content and the mail attachment or extracting the mail text content;
the second extraction module is used for extracting the file header characteristics of the mail attachment;
the identification module is used for carrying out format identification on the mail attachment according to the file header characteristics and identifying whether the mail attachment is a risk mail attachment or not;
the third extraction module is used for extracting the text content of the mail attachment;
and the disassembly storage module is used for carrying out disassembly storage on the text content of the mail attachment.
In this embodiment, it is first determined whether the mail file includes a mail attachment file header; if yes, the mail text content and the mail attachment are extracted, otherwise only the mail text content is extracted. File header features are then extracted from the mail attachment, the format of the mail attachment is identified according to those features, and it is identified whether the mail attachment is a risk mail attachment. This can improve both the efficiency and the accuracy of mail attachment virus identification. In addition, extracting, disassembling and storing the text content of the mail attachment preserves risk mail attachment samples, which makes it convenient for administrators to collect samples.
Preferably, the identification module includes:
the file header feature identification module is used for identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio video format, picture format and application format;
and the judging module is used for judging whether the mail attachment is a risk mail attachment according to the result of the file header feature identification.
In this embodiment, if the file header feature of the mail attachment falls under the other formats, the mail attachment is considered to be a risk mail attachment.
Preferably, the third extracting module includes a converting module and a processing module, and the disassembling storage module includes a second obtaining module, a disassembling module and a storage module, where:
the conversion module is used for converting the mail attachment into a plain text;
the processing module is used for identifying names, goods, enterprises, addresses, containers, customs notes and contact phones in the plain text based on natural language processing technology;
the second acquisition module is used for acquiring file storage rules and file storage positions;
the disassembly module is used for disassembling the mail attachment according to the file storage rule;
and the storage module is used for storing the disassembled mail attachments according to the file storage position.
In this embodiment, by acquiring the file storage rule and the file storage position, the mail attachment can be accurately disassembled, and the disassembled mail attachment can be accurately stored.
A computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any of the mail attachment virus identification methods described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the mail attachment virus identification methods described above.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (10)
1. The mail attachment virus identification method is characterized by comprising the following steps:
acquiring a mail file;
judging whether the mail file comprises a mail attachment file header, if yes, extracting mail text content and mail attachment, otherwise extracting mail text content;
extracting file header characteristics of the mail attachment;
and performing format identification on the mail attachment according to the file header features to identify whether the mail attachment is a risk mail attachment; if so, extracting, disassembling and storing the text content of the mail attachment, thereby realizing identification of mail attachment viruses.
2. The mail attachment virus identification method of claim 1, wherein the file header characteristics of the mail attachment include plain text format, document format, audio video format, picture format, application format, and other formats.
3. The mail attachment virus identification method according to claim 2, wherein performing format identification on the mail attachment according to the file header features and identifying whether the mail attachment is a risk mail attachment specifically comprises:
identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio video format, picture format and application format; if so, judging the mail attachment to be a normal mail attachment; otherwise, judging it to be a risk mail attachment.
4. A method for identifying a mail attachment virus according to claim 3, wherein the extracting text content of the mail attachment comprises:
converting the mail attachment into plain text;
identifying person names, goods, enterprises, addresses, containers, customs notes and contact phone numbers in the plain text based on natural language processing techniques.
5. The method for identifying a mail attachment virus according to claim 4, wherein the disassembling and storing the mail attachment specifically comprises:
acquiring file storage rules and file storage positions;
disassembling the mail attachment according to the file storage rule;
and storing the disassembled mail attachments according to the file storage positions.
6. A mail attachment virus identification system, comprising:
the first acquisition module is used for acquiring mail files;
the judging module is used for judging whether the mail file comprises a mail attachment file header or not;
the first extraction module is used for extracting the mail text content and the mail attachment or extracting the mail text content;
the second extraction module is used for extracting the file header characteristics of the mail attachment;
the identification module is used for carrying out format identification on the mail attachment according to the file header characteristics and identifying whether the mail attachment is a risk mail attachment or not;
the third extraction module is used for extracting the text content of the mail attachment;
and the disassembly storage module is used for carrying out disassembly storage on the text content of the mail attachment.
7. The mail attachment virus identification system of claim 6, wherein the identification module comprises:
the file header feature identification module is used for identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio video format, picture format and application format;
and the judging module is used for judging whether the mail attachment is a risk mail attachment according to the result of the file header feature identification.
8. The mail attachment virus recognition system of claim 7, wherein the third extraction module comprises a conversion module and a processing module, and the disassemble storage module comprises a second acquisition module, a disassemble module, and a storage module, wherein:
the conversion module is used for converting the mail attachment into a plain text;
the processing module is used for identifying names, goods, enterprises, addresses, containers, customs notes and contact phones in the plain text based on natural language processing technology;
the second acquisition module is used for acquiring file storage rules and file storage positions;
the disassembly module is used for disassembling the mail attachment according to the file storage rule;
and the storage module is used for storing the disassembled mail attachments according to the file storage position.
9. A computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the mail attachment virus identification method of any one of claims 1-5 when the computer program is executed.
10. A computer readable storage medium, wherein a computer program is stored on said storage medium, said computer program when executed by a processor implementing the steps of the mail attachment virus identification method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211728728.7A CN116150752A (en) | 2022-12-30 | 2022-12-30 | Mail attachment virus identification method, system, equipment and storable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211728728.7A CN116150752A (en) | 2022-12-30 | 2022-12-30 | Mail attachment virus identification method, system, equipment and storable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116150752A | 2023-05-23 |
Family
ID=86340077
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211728728.7A (Pending; published as CN116150752A) | 2022-12-30 | 2022-12-30 | Mail attachment virus identification method, system, equipment and storable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116150752A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8443447B1 (en) * | 2009-08-06 | 2013-05-14 | Trend Micro Incorporated | Apparatus and method for detecting malware-infected electronic mail |
CN103546449A (en) * | 2012-12-24 | 2014-01-29 | 哈尔滨安天科技股份有限公司 | E-mail virus detection method and device based on attachment formats |
CN105991395A (en) * | 2015-01-30 | 2016-10-05 | 杭州迪普科技有限公司 | Attachment replacing method and attachment replacing device |
CN115134147A (en) * | 2022-06-29 | 2022-09-30 | 中国工商银行股份有限公司 | E-mail detection method and device |
Non-Patent Citations (2)
Title |
---|
张小磊: "计算机病毒诊断与防治", 中国环境科学出版社, pages: 316 * |
盖璇;: "基于聚类分析算法的垃圾邮件识别", 计算机与现代化, no. 10, pages 21 - 26 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||