CN116150752A - Mail attachment virus identification method, system, equipment and storable medium - Google Patents

Info

Publication number
CN116150752A
Authority
CN
China
Prior art keywords
mail attachment
mail
attachment
file
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211728728.7A
Other languages
Chinese (zh)
Inventor
邹潜亨
钟伟彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Sunrun Networks Technology Co ltd
Original Assignee
Guangzhou Sunrun Networks Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-12-30
Filing date
2022-12-30
Publication date
2023-05-23
Application filed by Guangzhou Sunrun Networks Technology Co ltd
Priority to CN202211728728.7A
Publication of CN116150752A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02W CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W90/00 Enabling technologies or technologies with a potential or indirect contribution to greenhouse gas [GHG] emissions mitigation

Abstract

The invention discloses a mail attachment virus identification method, system, device, and storage medium. The method judges whether a mail file includes a mail attachment file header; if so, it extracts the mail text content and the mail attachment, and otherwise it extracts only the mail text content. It then extracts file header features from the mail attachment, identifies the format of the mail attachment from those features, and determines whether the mail attachment is a risk mail attachment. This improves both the efficiency and the accuracy of mail attachment virus identification. In addition, extracting, disassembling, and storing the text content of a risk mail attachment preserves the risk sample effectively and makes it convenient for administrators to sample.

Description

Mail attachment virus identification method, system, equipment and storable medium
Technical Field
The present invention relates to the field of mail attachment virus identification technologies, and in particular, to a mail attachment virus identification method, system, device, and storable medium.
Background
Mail is an essential means of communication in daily work, so attacks carried by mail are very common. Malicious information can be sent by mail to induce users to perform certain operations, malicious network links can be sent to obtain user names, passwords, and the like, and malicious programs can be distributed. Such malicious programs usually take the form of attachments, so virus identification of mail attachments is very important. The existing mail attachment identification method parses the mail to obtain its attachments and scans the attachments with antivirus software. However, this prior art relies on feature codes (signatures), and its identification accuracy is low.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, a device and a storage medium for identifying mail attachment viruses, which can solve the defect of low identification accuracy in the prior art.
The technical scheme of the invention is realized as follows:
A mail attachment virus identification method specifically comprises the following steps:
acquiring a mail file;
judging whether the mail file includes a mail attachment file header; if so, extracting the mail text content and the mail attachment; otherwise, extracting only the mail text content;
extracting file header features from the mail attachment;
and identifying the format of the mail attachment according to the file header features and determining whether the mail attachment is a risk mail attachment; if so, extracting the text content of the mail attachment and disassembling and storing it, thereby realizing identification of mail attachment viruses.
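The judging and extraction steps above can be sketched with Python's standard `email` package. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the mail file is a raw RFC 5322 message, and it treats every MIME part that `iter_attachments()` yields (parts other than the message body) as a mail attachment. The function name `split_mail` is hypothetical.

```python
from email import message_from_bytes, policy


def split_mail(raw: bytes) -> tuple[str, list[tuple[str, bytes]]]:
    """Return (body_text, attachments) for a raw mail file.

    Assumption: "mail attachment file header" is read here as the MIME
    structure that marks a part as something other than the message body.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Mail text content: prefer the plain-text body, fall back to HTML.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body_text = body_part.get_content() if body_part is not None else ""

    # Attachment branch: the loop body runs only if the message has attachments;
    # for a mail without attachments the list simply stays empty.
    attachments: list[tuple[str, bytes]] = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""  # undo base64 / quoted-printable
        attachments.append((part.get_filename() or "unnamed", payload))

    return body_text, attachments
```

A mail without attachments yields an empty attachment list, which corresponds to the "otherwise" branch in which only the mail text content is extracted.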
As a further alternative of the mail attachment virus identification method, the file header features of the mail attachment include a plain text format, a document format, an audio/video format, a picture format, an application format, and other formats.
As a further alternative of the mail attachment virus identification method, identifying the format of the mail attachment according to the file header features and determining whether the mail attachment is a risk mail attachment specifically includes:
identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio/video format, picture format, and application format; if so, judging the mail attachment to be a normal mail attachment; otherwise, judging it to be a risk mail attachment.
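One way to picture the format identification step is a lookup of well-known leading byte signatures ("magic numbers"). The sketch below is an illustration under assumptions: the source does not list concrete signatures, so the `SIGNATURES` table, the UTF-8 text heuristic, and the names `classify_header` and `is_risk_attachment` are all hypothetical. Following the rule above, only a header that fits none of the five recognised categories is treated as a risk.

```python
import codecs

# Illustrative signature table (an assumption; not taken from the patent).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "document"),                              # PDF
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "document"),   # legacy OLE2 Office files
    (b"\x89PNG\r\n\x1a\n", "picture"),                   # PNG
    (b"\xff\xd8\xff", "picture"),                        # JPEG
    (b"GIF87a", "picture"),
    (b"GIF89a", "picture"),
    (b"ID3", "audio_video"),                             # MP3 with an ID3 tag
    (b"MZ", "application"),                              # Windows executable
    (b"\x7fELF", "application"),                         # ELF executable
]


def _looks_like_text(head: bytes) -> bool:
    """Heuristic: the sampled bytes decode as UTF-8 and contain no control characters."""
    try:
        # final=False tolerates a multi-byte character cut off at the sample boundary.
        text = codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return all(ch.isprintable() or ch in "\t\r\n" for ch in text)


def classify_header(data: bytes) -> str:
    """Map the leading bytes of an attachment to a format category."""
    for magic, category in SIGNATURES:
        if data.startswith(magic):
            return category
    if data and _looks_like_text(data[:512]):
        return "plain_text"
    return "other"


def is_risk_attachment(data: bytes) -> bool:
    """An attachment is a risk attachment when its header fits no recognised category."""
    return classify_header(data) == "other"
```

Because the decision depends only on the first bytes of the attachment, it can be made without scanning the whole file, which is consistent with the efficiency benefit the source claims.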
As a further alternative of the mail attachment virus identification method, extracting the text content of the mail attachment specifically includes:
converting the mail attachment into plain text;
and identifying personal names, goods, enterprises, addresses, containers, customs notes, and contact phone numbers in the plain text based on natural language processing techniques.
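The text extraction step can be illustrated with a deliberately simplified stand-in. The source calls for natural language processing but names no model or library, so the sketch below swaps in plain regular expressions and covers only two of the listed fields. Everything in it is an assumption for illustration: the function names, the ISO 6346 style container number pattern, and the mainland China mobile number pattern.

```python
import re

# Container number in ISO 6346 style: 3-letter owner code, category letter
# (U, J or Z), then 7 digits. The check digit is not validated here.
CONTAINER_RE = re.compile(r"\b[A-Z]{3}[UJZ]\d{7}\b")

# Assumed phone shape: 11-digit mainland China mobile number.
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def to_plain_text(data: bytes) -> str:
    """Convert attachment bytes to text; undecodable bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


def extract_fields(text: str) -> dict[str, list[str]]:
    """Regex stand-in for the NLP step: pull out containers and contact phones."""
    return {
        "containers": CONTAINER_RE.findall(text),
        "contact_phones": PHONE_RE.findall(text),
    }
```

Fields such as personal names, enterprises, and addresses generally need a trained named-entity recognizer rather than patterns, which is presumably why the source specifies natural language processing.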
As a further alternative of the mail attachment virus identification method, disassembling and storing the mail attachment specifically includes:
acquiring a file storage rule and a file storage position;
disassembling the mail attachment according to the file storage rule;
and storing the disassembled mail attachment at the file storage position.
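The source does not define what a file storage rule contains or what "disassembling" produces, so the following sketch makes its own assumptions: the rule is a hypothetical `StorageRule` that names one sub-directory per component, the storage position is a root directory, and disassembly splits a risk attachment into its raw sample and its extracted text.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StorageRule:
    """Hypothetical storage rule: which sub-directory each component goes to."""
    sample_dir: str = "samples"
    text_dir: str = "text"


def disassemble_and_store(data: bytes, text: str, root: Path, rule: StorageRule) -> dict[str, Path]:
    """Split a risk attachment into raw sample and extracted text, then store both."""
    # Name both files by content digest so identical samples map to the same paths.
    digest = hashlib.sha256(data).hexdigest()
    targets = {
        "sample": root / rule.sample_dir / f"{digest}.bin",
        "text": root / rule.text_dir / f"{digest}.txt",
    }
    for path in targets.values():
        path.parent.mkdir(parents=True, exist_ok=True)

    targets["sample"].write_bytes(data)
    targets["text"].write_text(text, encoding="utf-8")
    return targets
```

Storing the sample under a neutral `.bin` name rather than its original file name is a design choice of this sketch; it reduces the chance that a stored risk sample is opened by accident.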
A mail attachment virus identification system comprising:
the first acquisition module is used for acquiring mail files;
the judging module is used for judging whether the mail file comprises a mail attachment file header or not;
the first extraction module is used for extracting the mail text content and the mail attachment or extracting the mail text content;
the second extraction module is used for extracting the file header characteristics of the mail attachment;
the identification module is used for carrying out format identification on the mail attachment according to the file header characteristics and identifying whether the mail attachment is a risk mail attachment or not;
the third extraction module is used for extracting the text content of the mail attachment;
and the disassembly storage module is used for carrying out disassembly storage on the text content of the mail attachment.
As a further alternative to the mail attachment virus identification system, the identification module includes:
the file header feature identification module is used for identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio/video format, picture format, and application format;
and the judging module is used for judging, according to the file header feature identification result, whether the mail attachment is a risk mail attachment.
As a further alternative of the mail attachment virus identification system, the third extraction module includes a conversion module and a processing module, and the disassembly storage module includes a second acquisition module, a disassembly module and a storage module, where:
the conversion module is used for converting the mail attachment into a plain text;
the processing module is used for identifying personal names, goods, enterprises, addresses, containers, customs notes, and contact phone numbers in the plain text based on natural language processing techniques;
the second acquisition module is used for acquiring file storage rules and file storage positions;
the disassembly module is used for disassembling the mail attachment according to the file storage rule;
and the storage module is used for storing the disassembled mail attachments according to the file storage position.
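The module decomposition above maps naturally onto a small pipeline object. The sketch below is hypothetical: the class name `IdentificationSystem`, its field names, and the callable signatures are assumptions chosen for illustration, with each field standing in for one module of the system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IdentificationSystem:
    # Judging module + first extraction module: raw mail -> (body text, attachments)
    split_mail: Callable[[bytes], tuple[str, list[bytes]]]
    # Second extraction module: attachment -> file header bytes
    extract_header: Callable[[bytes], bytes]
    # Identification module: header -> True if the attachment is a risk attachment
    is_risk: Callable[[bytes], bool]
    # Third extraction module: attachment -> text content
    extract_text: Callable[[bytes], str]
    # Disassembly storage module: (attachment, text) -> stored as a side effect
    store: Callable[[bytes, str], None]

    def run(self, raw_mail: bytes) -> list[bool]:
        """Return one risk verdict per attachment; risk attachments are stored."""
        _body, attachments = self.split_mail(raw_mail)
        verdicts: list[bool] = []
        for data in attachments:
            risky = self.is_risk(self.extract_header(data))
            if risky:
                self.store(data, self.extract_text(data))
            verdicts.append(risky)
        return verdicts
```

Passing the modules in as callables keeps each one independently replaceable, for example to swap the text extraction step without touching the identification logic.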
A computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any of the mail attachment virus identification methods described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the mail attachment virus identification methods described above.
The beneficial effects of the invention are as follows: the method judges whether the mail file includes a mail attachment file header; if so, it extracts the mail text content and the mail attachment, and otherwise it extracts only the mail text content. It then extracts file header features from the mail attachment, identifies the format of the mail attachment from those features, and determines whether the mail attachment is a risk mail attachment, which improves both the efficiency and the accuracy of mail attachment virus identification.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a mail attachment virus identification method of the present invention;
FIG. 2 is a schematic diagram of a mail attachment virus identification system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to FIGS. 1 and 2, a mail attachment virus identification method specifically includes:
acquiring a mail file;
judging whether the mail file includes a mail attachment file header; if so, extracting the mail text content and the mail attachment; otherwise, extracting only the mail text content;
extracting file header features from the mail attachment;
and identifying the format of the mail attachment according to the file header features and determining whether the mail attachment is a risk mail attachment; if so, extracting the text content of the mail attachment and disassembling and storing it, thereby realizing identification of mail attachment viruses.
In this embodiment, it is first judged whether the mail file includes a mail attachment file header; if so, the mail text content and the mail attachment are extracted, and otherwise only the mail text content is extracted. File header features are then extracted from the mail attachment, the format of the mail attachment is identified from those features, and it is determined whether the mail attachment is a risk mail attachment. This improves both the efficiency and the accuracy of mail attachment virus identification. In addition, extracting, disassembling, and storing the text content of a risk mail attachment preserves the risk sample effectively and makes it convenient for administrators to sample.
Preferably, the file header features of the mail attachment include a plain text format, a document format, an audio/video format, a picture format, an application format, and other formats.
Preferably, identifying the format of the mail attachment according to the file header features includes:
identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio/video format, picture format, and application format; if so, judging the mail attachment to be a normal mail attachment; otherwise, judging it to be a risk mail attachment.
In this embodiment, if the file header features of the mail attachment indicate another format, the mail attachment is considered to be a risk mail attachment.
Preferably, extracting the text content of the mail attachment includes:
converting the mail attachment into plain text;
and identifying personal names, goods, enterprises, addresses, containers, customs notes, and contact phone numbers in the plain text based on natural language processing techniques.
Preferably, the disassembling and storing the mail attachment specifically includes:
acquiring file storage rules and file storage positions;
disassembling the mail attachment according to the file storage rule;
and storing the disassembled mail attachments according to the file storage positions.
In this embodiment, by acquiring the file storage rule and the file storage position, the mail attachment can be accurately disassembled, and the disassembled mail attachment can be accurately stored.
A mail attachment virus identification system comprising:
the first acquisition module is used for acquiring mail files;
the judging module is used for judging whether the mail file comprises a mail attachment file header or not;
the first extraction module is used for extracting the mail text content and the mail attachment or extracting the mail text content;
the second extraction module is used for extracting the file header characteristics of the mail attachment;
the identification module is used for carrying out format identification on the mail attachment according to the file header characteristics and identifying whether the mail attachment is a risk mail attachment or not;
the third extraction module is used for extracting the text content of the mail attachment;
and the disassembly storage module is used for carrying out disassembly storage on the text content of the mail attachment.
In this embodiment, it is first judged whether the mail file includes a mail attachment file header; if so, the mail text content and the mail attachment are extracted, and otherwise only the mail text content is extracted. File header features are then extracted from the mail attachment, the format of the mail attachment is identified from those features, and it is determined whether the mail attachment is a risk mail attachment. This improves both the efficiency and the accuracy of mail attachment virus identification. In addition, extracting, disassembling, and storing the text content of a risk mail attachment preserves the risk sample effectively and makes it convenient for administrators to sample.
Preferably, the identification module includes:
the file header feature identification module is used for identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio/video format, picture format, and application format;
and the judging module is used for judging, according to the file header feature identification result, whether the mail attachment is a risk mail attachment.
In this embodiment, if the file header features of the mail attachment indicate another format, the mail attachment is considered to be a risk mail attachment.
Preferably, the third extracting module includes a converting module and a processing module, and the disassembling storage module includes a second obtaining module, a disassembling module and a storage module, where:
the conversion module is used for converting the mail attachment into a plain text;
the processing module is used for identifying personal names, goods, enterprises, addresses, containers, customs notes, and contact phone numbers in the plain text based on natural language processing techniques;
the second acquisition module is used for acquiring file storage rules and file storage positions;
the disassembly module is used for disassembling the mail attachment according to the file storage rule;
and the storage module is used for storing the disassembled mail attachments according to the file storage position.
In this embodiment, by acquiring the file storage rule and the file storage position, the mail attachment can be accurately disassembled, and the disassembled mail attachment can be accurately stored.
A computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any of the mail attachment virus identification methods described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the mail attachment virus identification methods described above.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. The mail attachment virus identification method is characterized by comprising the following steps:
acquiring a mail file;
judging whether the mail file includes a mail attachment file header; if so, extracting the mail text content and the mail attachment; otherwise, extracting only the mail text content;
extracting file header features from the mail attachment;
and identifying the format of the mail attachment according to the file header features and determining whether the mail attachment is a risk mail attachment; if so, extracting the text content of the mail attachment and disassembling and storing it, thereby realizing identification of mail attachment viruses.
2. The mail attachment virus identification method of claim 1, wherein the file header features of the mail attachment include a plain text format, a document format, an audio/video format, a picture format, an application format, and other formats.
3. The mail attachment virus identification method according to claim 2, wherein identifying the format of the mail attachment according to the file header features and determining whether the mail attachment is a risk mail attachment specifically comprises:
identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio/video format, picture format, and application format; if so, judging the mail attachment to be a normal mail attachment; otherwise, judging it to be a risk mail attachment.
4. The mail attachment virus identification method according to claim 3, wherein extracting the text content of the mail attachment comprises:
converting the mail attachment into plain text;
and identifying personal names, goods, enterprises, addresses, containers, customs notes, and contact phone numbers in the plain text based on natural language processing techniques.
5. The method for identifying a mail attachment virus according to claim 4, wherein the disassembling and storing the mail attachment specifically comprises:
acquiring file storage rules and file storage positions;
disassembling the mail attachment according to the file storage rule;
and storing the disassembled mail attachments according to the file storage positions.
6. A mail attachment virus identification system, comprising:
the first acquisition module is used for acquiring mail files;
the judging module is used for judging whether the mail file comprises a mail attachment file header or not;
the first extraction module is used for extracting the mail text content and the mail attachment or extracting the mail text content;
the second extraction module is used for extracting the file header characteristics of the mail attachment;
the identification module is used for carrying out format identification on the mail attachment according to the file header characteristics and identifying whether the mail attachment is a risk mail attachment or not;
the third extraction module is used for extracting the text content of the mail attachment;
and the disassembly storage module is used for carrying out disassembly storage on the text content of the mail attachment.
7. The mail attachment virus identification system of claim 6, wherein the identification module comprises:
the file header feature identification module is used for identifying the file header features of the mail attachment and determining whether they belong to any one of the plain text format, document format, audio/video format, picture format, and application format;
and the judging module is used for judging, according to the file header feature identification result, whether the mail attachment is a risk mail attachment.
8. The mail attachment virus recognition system of claim 7, wherein the third extraction module comprises a conversion module and a processing module, and the disassemble storage module comprises a second acquisition module, a disassemble module, and a storage module, wherein:
the conversion module is used for converting the mail attachment into a plain text;
the processing module is used for identifying personal names, goods, enterprises, addresses, containers, customs notes, and contact phone numbers in the plain text based on natural language processing techniques;
the second acquisition module is used for acquiring file storage rules and file storage positions;
the disassembly module is used for disassembling the mail attachment according to the file storage rule;
and the storage module is used for storing the disassembled mail attachments according to the file storage position.
9. A computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the mail attachment virus identification method of any one of claims 1-5 when the computer program is executed.
10. A computer readable storage medium, wherein a computer program is stored on said storage medium, said computer program when executed by a processor implementing the steps of the mail attachment virus identification method of any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211728728.7A CN116150752A (en) 2022-12-30 2022-12-30 Mail attachment virus identification method, system, equipment and storable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211728728.7A CN116150752A (en) 2022-12-30 2022-12-30 Mail attachment virus identification method, system, equipment and storable medium

Publications (1)

Publication Number Publication Date
CN116150752A (en) 2023-05-23

Family

ID=86340077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211728728.7A Pending CN116150752A (en) 2022-12-30 2022-12-30 Mail attachment virus identification method, system, equipment and storable medium

Country Status (1)

Country Link
CN (1) CN116150752A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8443447B1 (en) * 2009-08-06 2013-05-14 Trend Micro Incorporated Apparatus and method for detecting malware-infected electronic mail
CN103546449A (en) * 2012-12-24 2014-01-29 哈尔滨安天科技股份有限公司 E-mail virus detection method and device based on attachment formats
CN105991395A (en) * 2015-01-30 2016-10-05 杭州迪普科技有限公司 Attachment replacing method and attachment replacing device
CN115134147A (en) * 2022-06-29 2022-09-30 中国工商银行股份有限公司 E-mail detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张小磊: "计算机病毒诊断与防治", 中国环境科学出版社, pages: 316 *
盖璇;: "基于聚类分析算法的垃圾邮件识别", 计算机与现代化, no. 10, pages 21 - 26 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination