CN114036264A - E-mail author identity attribution identification method based on small sample learning - Google Patents

Publication number
CN114036264A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111383946.7A
Other languages
Chinese (zh)
Other versions
CN114036264B (en)
Inventor
许益家
方勇
刘中临
杨悦
郭文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-19
Filing date
2021-11-19
Publication date
2022-02-11
Application filed by Sichuan University
Priority to CN202111383946.7A
Publication of CN114036264A
Application granted
Publication of CN114036264B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G06F16/35 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/146 Tracing the source of attacks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a method for attributing the authorship of e-mail, in which the detected object is an e-mail message. The method is applied to e-mail owner identification. For the e-mail header, valuable header fields are screened out and the features of those fields are computed with statistical algorithms. For the e-mail body, a word-level text representation is constructed with the Word2Vec algorithm, a character-level text representation is constructed with a CNN, and the writing-habit features of the author are captured with a BiLSTM and a self-attention mechanism. The three kinds of features are fused into a new representation, a class vector for each author identity is constructed with a dynamic routing algorithm, the similarity between an anonymous e-mail and each author class vector is computed with a neural tensor network, and a label is assigned to the anonymous e-mail sample according to the similarity score, thereby identifying the author.

Description

E-mail author identity attribution identification method based on small sample learning
Technical Field
The invention relates to the field of e-mail identity recognition. A large e-mail data set is collected, three groups of extracted features are fused using natural language processing methods and the BiLSTM algorithm, a detection model based on the Induction Network is trained, and e-mail attribution is finally achieved under the condition of insufficient samples.
Background
E-mail is a common means of communication in work and daily life, and it is often exploited by attackers. E-mail forensics also faces many difficulties, one of which is determining the real author of a message. An attacker can impersonate another person by stealing a user's credentials or by directly deceiving the e-mail server. The security mechanisms of the mail transfer protocols alone cannot fully resist these attacks.
Currently, e-mail is an important carrier of advanced persistent threat (APT) attacks and phishing attacks. To make victims easier to deceive, attackers steal other people's accounts or masquerade as people the victim trusts, such as colleagues and friends. Attackers typically use the following two means: 1) the attacker steals the victim's login credentials through phishing e-mails or mail-system vulnerabilities such as cross-site scripting (XSS), and then attacks again using the stolen credentials; 2) the attacker deceives the mail server directly through a sender spoofing attack and forges the sender of the message as another person's e-mail address.
E-mail forensics creates more convenient conditions for trying and deciding many kinds of cases, but several difficulties remain in the forensic process: 1) although domestic e-mail service providers all require users to perform real-name authentication, e-mail is a communication mode built on open protocols, so users can choose foreign service providers or self-hosted mail servers to send anonymous e-mail; 2) criminals may steal other people's mailboxes, making it difficult to determine the true sender during forensics; 3) the protocols used by e-mail still have security problems: research presented at international conferences over the past three years has covered sender spoofing attacks, in which an attacker impersonates another person by attacking the e-mail server. These difficulties interfere with e-mail forensics.
In existing research on e-mail authorship attribution, researchers generally extract features of the e-mail body, either manually or with deep learning algorithms, to represent the identity of the author. These features generally reflect the author's writing habits. After capturing the different features, a model is constructed using different algorithms. However, current research has some limitations:
first, researchers usually keep only the information in the e-mail body and ignore the features of the e-mail header;
second, researchers generally build their models on the assumption that a sufficient data set is available, ignoring the fact that in practice e-mail data is difficult to collect and the data set available for building the model is small.
Disclosure of Invention
The invention discloses an e-mail authorship attribution method based on small sample learning. It aims to identify the owner of an e-mail accurately when samples are insufficient, and to attribute anonymous attack e-mails.
The invention provides an e-mail attribution method based on small samples. It builds a comprehensive semantic representation by fusing the feature information of the e-mail header and the e-mail body, and then identifies the e-mail owner using an Induction Network. The main content of the invention is divided into three parts:
(1) Mail encoding module: in feature selection, existing e-mail owner identification methods either leave out header information or consider only word-level text features and so lose character-level features, which makes them perform poorly in actual detection.
(2) Author identity representation module: a dynamic routing algorithm maps the samples into a space in which samples of the same class are represented by a single class vector, which solves the sample projection problem in metric-based meta-learning. Because a capsule network can robustly learn invariance in part-whole relationships, the class vectors obtained by dynamic routing are more effective than an ordinary weighted average of the samples.
(3) Relation query module: to better measure the spatial distance between different samples, the invention adopts a neural tensor network that scores the correlation between a sample and each class vector, thereby achieving accurate judgment of e-mail attribution.
Compared with traditional e-mail identity verification methods, the invention introduces meta-learning from the small-sample field to address the low identification accuracy caused by the small size of the e-mail data sets obtainable in practice, and provides support for tracing anonymous e-mail attacks.
Drawings
The objects, implementations, advantages and features of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is a method hierarchy framework of the invention;
FIG. 2 is a block diagram of a data preparation module;
FIG. 3 is a mail encoding module overall framework;
FIG. 4 is a CNN-based character level characterization flow diagram;
FIG. 5 is an author identity representation module flow diagram;
FIG. 6 is a relational query module flow diagram.
Detailed Description
The method identifies the authorship of e-mail based on small sample learning under the condition of insufficient samples. First, a multi-feature fusion technique produces a fused representation of the header features, the word-level features and the character-level features: the three weighted groups of features jointly form the vector encoding of an e-mail, and each vector encoding represents one e-mail sample. Then a class vector that represents the identity of an e-mail author is constructed with a dynamic routing algorithm. Finally, the similarity between an anonymous e-mail and each author vector is computed with a neural tensor model, a label is assigned to the anonymous e-mail sample according to the similarity score, and the owner of the e-mail is determined.
The overall framework of the invention comprises four modules: a data preparation module, a mail encoding module, an author identity representation module and a relation query module, as shown in FIG. 1. The framework is hierarchical: data flows from bottom to top, and the output of each lower layer serves as the input of the layer above it.
Data preparation module. The data preparation module acquires and processes the original data set, organizes it, and passes it to the mail encoding module; the overall framework is shown in FIG. 2. Its functions are data deduplication, data cleaning and data segmentation. Data deduplication removes duplicate data. Data cleaning removes forwarded and quoted content from the e-mail. When an e-mail is forwarded or replied to, its body may contain text written by other people or text the author wrote earlier. To keep this content from affecting the author's current writing-style features, it must be deleted. After the e-mail is cleaned, the body text is segmented both character by character and word by word.
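The three data preparation steps can be illustrated with a minimal Python sketch. The quote and forward markers below are hypothetical examples of what a cleaning rule might look for; the patent does not specify them, and real mail clients vary widely.

```python
import re

# Hypothetical markers for quoted/forwarded content (an assumption, not from the patent).
QUOTE_MARKERS = (
    re.compile(r"^-+\s*Original Message\s*-+", re.IGNORECASE),
    re.compile(r"^-+\s*Forwarded (message|by)\b", re.IGNORECASE),
    re.compile(r"^On .+ wrote:\s*$"),
)


def clean_body(body: str) -> str:
    """Drop '>'-quoted lines and everything after a forward/reply marker."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if any(p.match(stripped) for p in QUOTE_MARKERS):
            break  # the rest was written earlier or by someone else
        if stripped.startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def deduplicate(bodies):
    """Remove exact duplicates while keeping first-seen order."""
    seen = set()
    unique = []
    for body in bodies:
        if body not in seen:
            seen.add(body)
            unique.append(body)
    return unique


def segment(body: str):
    """Return (word_tokens, char_tokens); case is preserved at this stage."""
    words = re.findall(r"\w+|[^\w\s]", body)
    chars = list(body)
    return words, chars
```

Case is deliberately left untouched here, because the later character-level representation depends on it.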
Mail encoding module. For an e-mail to be classified, the header features and body features are extracted first. To obtain the header features, the valuable header fields are screened and the features of each field are mined. To obtain the body features, the model captures the author's writing-habit features at the word level and the character level of the body text through a BiLSTM and a self-attention mechanism; the overall flow is shown in FIG. 3. After the body features are captured, the mapping between the body features and the e-mail author is strengthened through a nonlinear activation function. Finally, the header features and body features are fused, and the multi-feature fused sample encoding is output through Softmax. Header feature selection and body feature embedding are described in detail below.
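As a rough illustration of weighted feature fusion, the sketch below scales three feature vectors by softmax-normalized weights and concatenates them. This is only one plausible reading of the fusion step: the function name `fuse`, the per-part scalar logits and the use of concatenation are assumptions, and in a trained model the logits would be learned parameters rather than constants.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(header_vec, word_vec, char_vec, logits=(0.0, 0.0, 0.0)):
    """Scale each feature group by its softmax weight, then concatenate.

    `logits` stands in for learnable weights (hypothetical parameterization).
    """
    weights = softmax(list(logits))
    fused = []
    for weight, part in zip(weights, (header_vec, word_vec, char_vec)):
        fused.extend(weight * x for x in part)
    return fused


# Equal logits give each feature group a weight of 1/3.
sample_code = fuse([1.0], [2.0, 2.0], [3.0])
```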
Header feature selection. When sending an e-mail, the sender fills in required and optional information in a WebMail page or a mail client, including the recipient, the subject and the e-mail content. Of this information, the recipient and subject belong to the e-mail header fields, and the e-mail content belongs to the e-mail body. The mail headers of different service providers also differ, so five common header fields are selected:
1. Date: the sending time of the e-mail, normalized to the UTC time zone, is used as the Date feature;
2. From: the sender address; each e-mail has only one sender;
3. To: the recipient addresses; there may be one or more, typically separated by semicolons when there are multiple recipients;
4. Subject: the subject of the e-mail may be long or short, follows no strict writing rule, and is influenced by personal writing habits. The Subject features are: the number of words in the subject, the number of characters in the subject, the number of capital letters in the subject, the number of digits in the subject, and the number of punctuation marks in the subject;
5. Cc: carbon copy means that when an e-mail is sent, the same content is also sent to the mailboxes of the Cc recipients; when there are several Cc recipients, they are separated by semicolons. The Cc features are: the number of mailboxes in the recipient (To) field, the number of mailboxes in the Cc field, the number of recipients whose domain name is the same as that of the sender's mailbox, and the number of Cc recipients whose domain name is the same as that of the sender's mailbox.
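The statistical header features listed above can be computed with Python's standard `email` package, as in the following sketch. The feature names in the returned dictionary are illustrative choices, not names from the patent. Note that RFC 5322 separates addresses with commas, so the sketch maps semicolons to commas before parsing, which is a simplification.

```python
import string
from datetime import timezone
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def subject_features(subject: str) -> dict:
    """Counts of words, characters, capitals, digits and punctuation."""
    return {
        "n_words": len(subject.split()),
        "n_chars": len(subject),
        "n_upper": sum(ch.isupper() for ch in subject),
        "n_digits": sum(ch.isdigit() for ch in subject),
        "n_punct": sum(ch in string.punctuation for ch in subject),
    }


def _addresses(msg, field):
    # Map ';' separators to ',' so getaddresses can split the list.
    values = [v.replace(";", ",") for v in msg.get_all(field, [])]
    return [addr for _, addr in getaddresses(values) if addr]


def _domain(addr: str) -> str:
    return addr.rsplit("@", 1)[-1].lower() if "@" in addr else ""


def header_features(raw_message: str) -> dict:
    msg = message_from_string(raw_message)
    senders = _addresses(msg, "From")
    sender_domain = _domain(senders[0]) if senders else ""
    to_addrs = _addresses(msg, "To")
    cc_addrs = _addresses(msg, "Cc")

    def same_domain(addrs):
        return sum(1 for a in addrs if sender_domain and _domain(a) == sender_domain)

    feats = subject_features(msg.get("Subject", ""))
    feats.update({
        "n_to": len(to_addrs),
        "n_cc": len(cc_addrs),
        "n_to_same_domain": same_domain(to_addrs),
        "n_cc_same_domain": same_domain(cc_addrs),
    })
    date = msg.get("Date")
    if date:
        # Normalize the sending time to UTC.
        feats["date_utc"] = parsedate_to_datetime(date).astimezone(timezone.utc).isoformat()
    return feats
```

How the UTC timestamp is turned into numeric features (hour of day, weekday, and so on) is left open, since the text does not say.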
Body feature embedding. The e-mail body contains rich semantic information, so feature information is extracted from it. Character-level embedding and word-level embedding are performed on the body separately. In the text character level embedding process, the text is divided into individual characters and then encoded using Word2Vec. In ordinary word-level embedding, the letters of each word are first converted to lower case and the words are then mapped into the vector space; this conversion is acceptable because lowercasing has little influence on semantics. In authorship attribution, however, converting words to lower case loses features of the author's writing habits, while keeping words with upper-case letters makes the word vector space explode: for the word "email", for example, 2^5 = 32 word vector representations can exist. Therefore, when the word-level representation of the e-mail body is constructed with Word2Vec, the letters in the words are unified to lower case, and, to compensate for the information lost in that representation, a character-level representation of the e-mail body is constructed with a CNN; the representation flow is shown in FIG. 4.
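The vocabulary explosion and the case-preserving character input can be made concrete with a short sketch. The alphabet and the fixed `max_len` below are assumptions chosen for illustration; the one-hot matrix is the kind of input a character-level CNN would consume.

```python
import string
from itertools import product


def case_variants(word: str) -> set:
    """All upper/lower-case spellings of a word."""
    options = [(ch.lower(), ch.upper()) if ch.isalpha() else (ch,) for ch in word]
    return {"".join(combo) for combo in product(*options)}


# Hypothetical character alphabet; upper and lower case get separate slots.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def one_hot_chars(text: str, alphabet: str = ALPHABET, max_len: int = 16):
    """Encode text as a max_len x len(alphabet) one-hot matrix (zero padded)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text[:max_len]:
        row = [0] * len(alphabet)
        if ch in index:  # unknown characters stay all-zero
            row[index[ch]] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(alphabet))
    return rows


print(len(case_variants("email")))  # 32
```

Lowercasing collapses those 32 spellings into one word-level token, while the one-hot rows still distinguish "Email" from "email".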
Author identity representation module. The Induction Network is a metric-based meta-learning algorithm. Its core idea is to map sample vectors into a space and then choose a suitable metric to compute the spatial distance between samples: the smaller the distance, the higher the similarity and the more likely the samples belong to the same class. How to map samples into the space and represent samples of the same class by a single vector is therefore very important. To accomplish this, the Induction Network builds the class vector representation of the samples with a dynamic routing algorithm; the construction process is shown in FIG. 5. In the author identity representation module, the e-mail sample encodings are mapped into the space through the dynamic routing algorithm, and samples of the same class are represented by the same class vector.
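A minimal pure-Python sketch of dynamic routing over one author's samples is given below. It follows the commonly published routing scheme (squash, softmax coupling, agreement update) and omits the learned transformation matrix that a full model would apply to each sample first; that omission and the iteration count are simplifying assumptions.

```python
import math


def squash(vec):
    """Capsule squashing: keeps direction, maps the norm into [0, 1)."""
    norm_sq = sum(x * x for x in vec)
    if norm_sq == 0.0:
        return [0.0] * len(vec)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in vec]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def class_vector(samples, iterations=3):
    """Aggregate the sample encodings of one author into a class vector.

    Returns (class_vec, coupling), where coupling[i] is the final routing
    weight of sample i.
    """
    preds = [squash(s) for s in samples]
    dim = len(preds[0])
    logits = [0.0] * len(preds)
    class_vec = [0.0] * dim
    for _ in range(iterations):
        coupling = softmax(logits)
        weighted = [sum(c * p[j] for c, p in zip(coupling, preds)) for j in range(dim)]
        class_vec = squash(weighted)
        # Samples that agree with the class vector gain routing weight.
        logits = [
            b + sum(pj * cj for pj, cj in zip(p, class_vec))
            for b, p in zip(logits, preds)
        ]
    return class_vec, softmax(logits)
```

Compared with a plain average, a sample pointing away from the others ends up with a smaller coupling weight, which is the behaviour the text attributes to routing-based class vectors.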
Relation query module. After the representation of each author class has been constructed, the class of an unknown sample must be determined, as shown in FIG. 6. The relation query module uses a neural tensor network to compute a "spatial distance" between the sample to be queried and each author class vector as the basis for judging similarity: if the sample to be queried matches the class, the similarity is 1; otherwise, the similarity is 0.
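The relation score can be sketched as a neural tensor layer followed by a sigmoid output. The parameters below are randomly initialized placeholders (in practice they would be learned), the slice count is arbitrary, and reading the 1/0 similarities as training targets for a score in (0, 1) is an interpretation rather than something the text states.

```python
import math
import random


def make_ntn(dim, slices, seed=0):
    """Randomly initialized neural tensor parameters (placeholders)."""
    rng = random.Random(seed)
    tensor = [
        [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        for _ in range(slices)
    ]
    out_w = [rng.uniform(-0.5, 0.5) for _ in range(slices)]
    return {"tensor": tensor, "out_w": out_w, "out_b": 0.0}


def relation_score(class_vec, query_vec, params):
    """sigmoid(w . relu(c^T M_k q) + b), one bilinear form per tensor slice."""
    hidden = []
    for matrix in params["tensor"]:
        bilinear = sum(
            class_vec[i] * matrix[i][j] * query_vec[j]
            for i in range(len(class_vec))
            for j in range(len(query_vec))
        )
        hidden.append(max(0.0, bilinear))  # ReLU
    z = sum(w * h for w, h in zip(params["out_w"], hidden)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-z))


def attribute(query_vec, class_vecs, params):
    """Return the author whose class vector scores highest, plus all scores."""
    scores = {
        author: relation_score(vec, query_vec, params)
        for author, vec in class_vecs.items()
    }
    return max(scores, key=scores.get), scores
```

With untrained parameters the scores are meaningless; the sketch only shows the shape of the computation and how a label would be picked from the scores.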
As described above, the invention identifies the authorship of e-mail under small-sample conditions with high accuracy and practicality. Compared with prior detection methods, the invention has the following innovations:
first, the author's writing-habit features are extracted from the e-mail body: a word-level representation and a character-level representation of the body are constructed, the writing-habit features are captured through a BiLSTM and a self-attention mechanism, and these body features are then fused with the e-mail header features to build comprehensive semantic information for the e-mail;
second, e-mail authorship attribution is based on small sample learning, so the e-mail owner can be identified accurately even when samples are insufficient.
Although the preferred embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims (3)

1. An e-mail identity attribution identification method based on small sample learning, characterized by comprising the following steps:
A. in the mail encoding module, in order to extract from the e-mail, more comprehensively, the features that represent the identity of its author, the features and information of the e-mail header and body are extracted and fused to generate a new representation of the e-mail;
B. in the author identity representation module, samples of the same class are aggregated using a dynamic routing algorithm to generate a class vector representation;
C. in the relation query module, the similarity between the sample to be detected and the different class vectors is computed through a neural tensor model, so as to determine the class of the sample to be detected and finally determine the identity of the author of the e-mail.
2. The e-mail identity attribution identification method based on small sample learning as claimed in claim 1, wherein in the mail encoding process, the header features of the e-mail are extracted first, comprising five header fields that the sender can control, namely Date, From, To, Subject and Cc, and the statistical features of each field; feature embedding is then performed at the word level of the e-mail body: the body text is segmented into words, a vocabulary of the segmented words is constructed, and a word-level vector representation of the body is generated through the Word2Vec algorithm; at the same time, character-level features of the body are embedded: the e-mail is vectorized by one-hot encoding, and a character-level vector representation of the body is output by a convolutional neural network; the author's writing-style features are then extracted from the character-level and word-level features of the body using a BiLSTM and a self-attention mechanism; finally, the header and body features are concatenated, a fused representation is produced by a weight network, and a new representation of the e-mail is output, completing the e-mail feature fusion.
3. The method as claimed in claim 1, wherein in the relation query process, the encoding of the e-mail to be queried is input to the detection model, the "spatial distance" between the sample to be queried and each author class vector representation is then computed as the similarity through the neural tensor network, a similarity of 1 indicating that the query sample matches the class and any other value indicating that it does not, and the attribution class of the e-mail is finally obtained, thereby completing the author identification.
CN202111383946.7A 2021-11-19 2021-11-19 Email authorship attribution identification method based on small sample learning Active CN114036264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111383946.7A CN114036264B (en) 2021-11-19 2021-11-19 Email authorship attribution identification method based on small sample learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111383946.7A CN114036264B (en) 2021-11-19 2021-11-19 Email authorship attribution identification method based on small sample learning

Publications (2)

Publication Number Publication Date
CN114036264A true CN114036264A (en) 2022-02-11
CN114036264B CN114036264B (en) 2023-06-16

Family

ID=80145046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111383946.7A Active CN114036264B (en) 2021-11-19 2021-11-19 Email authorship attribution identification method based on small sample learning

Country Status (1)

Country Link
CN (1) CN114036264B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040024769A1 (en) * 2002-08-02 2004-02-05 Forman George H. System and method for inducing a top-down hierarchical categorizer
US20090276377A1 (en) * 2008-04-30 2009-11-05 Cisco Technology, Inc. Network data mining to determine user interest
US20120047014A1 (en) * 2010-08-23 2012-02-23 Yahoo! Inc. Method and system for using email receipts for targeted advertising
CN102663001A (en) * 2012-03-15 2012-09-12 华南理工大学 Automatic blog writer interest and character identifying method based on support vector machine
CN104138260A (en) * 2014-07-02 2014-11-12 中山大学 Sleeping posture multi-classifying identifying method utilizing SVM classifier
US20150256499A1 (en) * 2013-10-08 2015-09-10 Socialmail LLC Ranking, collection, organization, and management of non-subscription electronic messages
CN109102025A (en) * 2018-08-15 2018-12-28 电子科技大学 Pedestrian based on deep learning combined optimization recognition methods again
CN110263346A (en) * 2019-06-27 2019-09-20 卓尔智联(武汉)研究院有限公司 Lexical analysis method, electronic equipment and storage medium based on small-sample learning
CN112287156A (en) * 2019-07-22 2021-01-29 奥多比公司 Automatically selecting user-requested objects in an image using multiple object detection models
CN112818109A (en) * 2021-02-25 2021-05-18 网易(杭州)网络有限公司 Intelligent reply method, medium, device and computing equipment for mail
CN113326347A (en) * 2021-05-21 2021-08-31 四川省人工智能研究院(宜宾) Syntactic information perception author attribution method
CN113408660A (en) * 2021-07-15 2021-09-17 北京百度网讯科技有限公司 Book clustering method, device, equipment and storage medium
CN113630302A (en) * 2020-05-09 2021-11-09 阿里巴巴集团控股有限公司 Junk mail identification method and device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
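单成海: "反垃圾邮件系统研究及实现" [Research and Implementation of an Anti-Spam System], pages 139 - 234 *
周城金: "基于MAML算法的小样本文本分类研究" [Research on Few-Shot Text Classification Based on the MAML Algorithm], pages 138 - 672 *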

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115580594A (en) * 2022-12-12 2023-01-06 四川大学 E-mail processing and transmitting method, system and storage medium
CN115580594B (en) * 2022-12-12 2023-05-09 四川大学 E-mail processing and transmitting method, system and storage medium

Also Published As

Publication number Publication date
CN114036264B (en) 2023-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant