CN110569357A - Method and device for constructing mail classification model, terminal equipment and medium

Info

Publication number
CN110569357A
Authority
CN
China
Prior art keywords
data set
script
corpus
classification model
url link
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910767882.7A
Other languages
Chinese (zh)
Inventor
陈磊华
潘文辉
朱南皓
杨芸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
On Keke Science And Technology (guangzhou) Co Ltd
Original Assignee
On Keke Science And Technology (guangzhou) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by On Keke Science And Technology (guangzhou) Co Ltd
Priority to CN201910767882.7A
Publication of CN110569357A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 - URL specific, e.g. using aliases, detecting broken or misspelled links
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/958 - Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, terminal equipment and a medium for constructing a mail classification model. The method comprises the following steps: constructing a target data set and a corpus from a sample mail data set; using the corpus to train word2vec models in one-to-one correspondence with the text data set, the URL link data set and the script data set, and using the word2vec models to convert those three data sets into feature vectors; constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain the corresponding classification models; using the fusion data set to train the classification models to obtain decision weights for the various kinds of data in the fusion data set; and, according to the decision weights, using a test mail data set to perform metric-based evaluation, verification and optimization on the classification models. The invention can establish a mail classification model covering the various kinds of data in a mail, so that mails can be detected along multiple dimensions and classified efficiently.

Description

Method and device for constructing mail classification model, terminal equipment and medium
Technical Field
The invention relates to the field of information security, in particular to a method, a device, terminal equipment and a medium for constructing a mail classification model.
Background
Email is now widely used for social, business, financial and other purposes, and spam has proliferated along with it. In 2018, spam accounted for more than 50% of mail traffic. Spam not only consumes a large amount of network traffic and wastes recipients' time, energy and money; the malicious links, malicious scripts and trojan-laden attachments carried by many spam messages can also leak user information and directly cause various losses.
With the rapid development of the internet, spam has evolved from containing a single type of content to containing multiple types, such as text, images, URL links, attachments and JavaScript scripts. Traditional content-based spam detection systems work along a single dimension: they build machine learning models for images or text only, and consider neither URL detection for promotional or malicious links nor detection of script-driven redirect links in the mail body. Such detection falls short on spam that fuses several types of features, cannot achieve good detection efficiency, and is therefore limited.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method, an apparatus, a terminal device and a medium for constructing a mail classification model, which can establish a mail classification model for various data in a mail, so that the mail can be subjected to multidimensional detection through the mail classification model, and efficient classification of the mail is realized.
In order to solve the technical problem, the invention provides a method for constructing a mail classification model, which comprises the following steps:
constructing a target data set and a corpus by using a sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
using the corpus to train word2vec models in one-to-one correspondence with the text data set, the URL link data set and the script data set, and converting the text data set, the URL link data set and the script data set into feature vectors by using the word2vec models;
constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models;
using the fusion data set to train the classification models to obtain decision weights of the various kinds of data in the fusion data set;
and, according to the decision weights, performing metric-based evaluation, verification and optimization on the classification models by using a test mail data set.
Further, the text corpus, the URL link corpus and the script corpus are respectively constructed according to the text data set, the URL link data set and the script data set.
Further, the classification model is a deep learning model.
Further, the corresponding classification model comprises:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
The invention also provides a device for constructing the mail classification model, which comprises the following components:
The data acquisition module is used for constructing a target data set and a corpus by utilizing the sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
The vector conversion module is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and converting the text data set, the URL link data set and the script data set into feature vectors by using the word2vec models;
The model pre-building module is used for constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models;
The weight acquisition module is used for using the fusion data set to train the classification models to obtain decision weights of the various kinds of data in the fusion data set;
The model optimization module is used for performing metric-based evaluation, verification and optimization on the classification models by using the test mail data set according to the decision weights.
Further, the text corpus, the URL link corpus and the script corpus are respectively constructed according to the text data set, the URL link data set and the script data set.
Further, the classification model is a deep learning model.
Further, the corresponding classification model comprises:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention can establish a mail classification model aiming at various data in the mails, so that the mails can be subjected to multi-dimensional detection through the mail classification model, and the high-efficiency classification of the mails is realized.
The invention also provides a terminal device for constructing a mail classification model, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the memory is coupled with the processor, and the processor implements the method for constructing the mail classification model when executing the computer program.
The invention also provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the method for constructing the mail classification model.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a mail classification model according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for constructing a mail classification model according to a second embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
It should be noted that, the step numbers in the text are only for convenience of explanation of the specific embodiments, and do not serve to limit the execution sequence of the steps. The method provided by the embodiment can be executed by the relevant server, and the server is taken as an example for explanation below.
A first embodiment. Please refer to fig. 1.
As shown in FIG. 1, the method for constructing a mail classification model according to the first embodiment includes steps S1 to S5:
S1, constructing a target data set and a corpus by using the sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus.
S2, the corpus is used for training word2vec models corresponding to the text data set, the URL link data set and the script data set one by one, and the text data set, the URL link data set and the script data set are converted into feature vectors by using the word2vec models.
S3, constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models.
S4, using the fusion data set to train the classification models to obtain decision weights of the various kinds of data in the fusion data set.
S5, according to the decision weights, performing metric-based evaluation, verification and optimization on the classification models by using a test mail data set.
It should be noted that the sample mail data set includes normal mail data and spam mail data.
In a specific embodiment, the normal mails and the junk mails can be obtained from a mail sending and receiving system, an anti-malware and anti-spam mail system, user labels, expert labels and the like.
It is understood that step S1 constructs the text data set and the text corpus from the text data in the sample mail data set; constructs the URL link data set and the URL link corpus from the URL link data in the sample mail data set; constructs the script data set and the script corpus from the script data in the sample mail data set; constructs the image data set from the image data in the sample mail data set; and constructs the different fusion data sets from the various combinations of the text data, the URL link data, the script data and the image data in the sample mail data set.
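As an illustration of how step S1 might separate one sample mail into the four kinds of data, the following is a minimal sketch using only the Python standard library. The patent does not specify an extraction procedure; the names `MailPartExtractor`, `split_mail` and `SAMPLE_MAIL` are hypothetical, and the sketch assumes an undecoded HTML body and ignores attachments and transfer encodings.

```python
from email import message_from_string
from html.parser import HTMLParser


class MailPartExtractor(HTMLParser):
    """Collects visible text, link URLs, script bodies and image sources."""

    def __init__(self):
        super().__init__()
        self.text, self.urls, self.scripts, self.images = [], [], [], []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        data = data.strip()
        if data:
            # Script bodies go to the script data, everything else to the text data.
            (self.scripts if self._in_script else self.text).append(data)


def split_mail(raw_mail):
    """Return the text, URL link, script and image data found in one mail."""
    extractor = MailPartExtractor()
    for part in message_from_string(raw_mail).walk():
        if part.get_content_type() == "text/html":
            extractor.feed(part.get_payload())
    extractor.close()
    return {"text": extractor.text, "urls": extractor.urls,
            "scripts": extractor.scripts, "images": extractor.images}


# Hypothetical sample mail, used only for illustration.
SAMPLE_MAIL = (
    "Subject: offer\n"
    "Content-Type: text/html\n"
    "\n"
    '<p>Hello <a href="http://example.com/a-b">click</a></p>'
    "<script>var a = 1;</script>"
    '<img src="cid:1">\n'
)
parts = split_mail(SAMPLE_MAIL)
```

A fusion sample in the sense of step S1 would then be any mail for which more than one of these lists is non-empty.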
Step S2 trains the word2vec model corresponding to the text data set by using the text corpus, so that this word2vec model converts the text data set into feature vectors; trains the word2vec model corresponding to the URL link data set by using the URL link corpus, so that this word2vec model converts the URL link data set into feature vectors; and trains the word2vec model corresponding to the script data set by using the script corpus, so that this word2vec model converts the script data set into feature vectors.
Each word2vec model is trained with CBOW or skip-gram so that it converts the corresponding data into vectors that a computer can process. Converting the text data set, the URL link data set and the script data set into such vectors prevents processing from being interrupted because the computer cannot recognize the raw data.
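To make the difference between the two training modes concrete, the sketch below builds the training examples each mode would see for one token sequence. It covers only example generation, not the neural network that word2vec trains on these examples; the function names, the `window` parameter and the sample tokens are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center token predicts every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context tokens jointly predict the center token."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, target))
    return examples


# Hypothetical tokens from a URL link corpus.
tokens = ["http", "example", "com", "login"]
pairs = skipgram_pairs(tokens, window=1)
cbow = cbow_examples(tokens, window=1)
```

In practice an existing word2vec implementation would normally be used; the point here is only that both modes derive their training signal from the same sliding window over a tokenized corpus.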
Step S3 constructs a classifier from the text data set converted into feature vectors and trains it to obtain the classification model corresponding to the text data set; constructs a classifier from the URL link data set converted into feature vectors and trains it to obtain the classification model corresponding to the URL link data set; constructs a classifier from the script data set converted into feature vectors and trains it to obtain the classification model corresponding to the script data set; and constructs a classifier from the image data set and trains it to obtain the classification model corresponding to the image data set. Each of these classification models is a single-dimensional classification model that classifies according to only one type of data.
Step S4 trains the classification model corresponding to the text data set by using the fusion data set to obtain the decision weight of the text data in the fusion data set; trains the classification model corresponding to the URL link data set by using the fusion data set to obtain the decision weight of the URL link data in the fusion data set; trains the classification model corresponding to the script data set by using the fusion data set to obtain the decision weight of the script data in the fusion data set; and trains the classification model corresponding to the image data set by using the fusion data set to obtain the decision weight of the image data in the fusion data set.
In step S5, according to the decision weights of the text data, the URL link data, the script data and the image data, a test mail data set is used to perform metric-based evaluation, verification and optimization on the classification models corresponding to the text data set, the URL link data set, the script data set and the image data set.
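The patent does not name the evaluation metrics used in step S5. The sketch below assumes the common choice of precision, recall, F1 and accuracy for a binary spam/normal classifier; the function name `evaluate` and the sample labels are hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical test labels: 1 = spam, 0 = normal mail.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
```

Tracking recall separately from precision matters for spam filtering because a normal mail wrongly flagged as spam and a spam mail that slips through carry different costs.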
In this embodiment, the corresponding single-dimensional classification models are constructed according to the text data, the URL link data, the script data and the image data in the sample mails, and the decision weights of the different kinds of data are used to fuse the single-dimensional classification models, so that the multi-dimensional classification model is obtained.
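The patent does not give the fusion formula. One plausible reading, sketched below, is a weighted average of the per-model spam probabilities, renormalized over the kinds of data actually present in a mail. The function name `fuse_scores`, the threshold and all weight and score values are assumptions chosen for illustration, not values from the patent.

```python
def fuse_scores(scores, weights, threshold=0.5):
    """Combine single-dimensional model outputs into one decision.

    scores:  kind of data -> spam probability from that kind's model
             (only the kinds present in the mail).
    weights: kind of data -> decision weight.
    """
    present = [kind for kind in scores if kind in weights]
    total = sum(weights[kind] for kind in present)
    if total == 0:
        raise ValueError("no weighted kind of data present in this mail")
    fused = sum(weights[kind] * scores[kind] for kind in present) / total
    return fused, fused >= threshold


# Hypothetical decision weights and per-model scores for a mail without images.
weights = {"text": 0.4, "url": 0.3, "script": 0.2, "image": 0.1}
scores = {"text": 0.2, "url": 0.9, "script": 0.8}
fused, is_spam = fuse_scores(scores, weights)
```

Here the text model alone would pass the mail, but the URL and script models outweigh it, which is the multi-dimensional behavior the embodiment describes.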
Similarly, if mails are to be classified according to other data in the mail, a corresponding single-dimensional model may be added for that data, and the decision weight of that data is added to the multi-dimensional classification model of this embodiment.
In a specific embodiment, the text corpus, the URL link corpus, and the script corpus are constructed according to the text dataset, the URL link dataset, and the script dataset, respectively.
It is to be understood that the text corpus is constructed from the text data set, the URL link corpus from the URL link data set, and the script corpus from the script data set.
In this embodiment, the text corpus is constructed by segmenting the text data set with a Chinese word segmentation tool and a Chinese stop-word list; the URL link corpus is constructed by splitting the URL link data set on symbols common in URL links, such as "-" and "/"; and the script corpus is constructed by parsing the script data set into abstract syntax trees. For a JavaScript script, for example, Esprima.js parses the JavaScript code into an abstract syntax tree, from which the JavaScript script corpus is built.
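The URL splitting described above can be sketched as follows. The patent names only "-" and "/" explicitly; the remaining separators in `URL_SEPARATORS`, the lowercasing and the sample URL are assumptions.

```python
import re

# "-" and "/" come from the description; the other symbols are assumed
# to be among the "common symbols" it alludes to.
URL_SEPARATORS = re.compile(r"[-/.:?=&_#%@+]+")


def tokenize_url(url):
    """Split a URL into lowercase tokens for the URL link corpus."""
    return [tok.lower() for tok in URL_SEPARATORS.split(url) if tok]


# Hypothetical URL taken from a sample mail.
url_tokens = tokenize_url("http://Example.com/free-gift/claim?id=42")
# -> ['http', 'example', 'com', 'free', 'gift', 'claim', 'id', '42']
```

Each token list then serves as one "sentence" of the URL link corpus on which the corresponding word2vec model is trained.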
In a specific embodiment, the classification model is a deep learning model.
It can be understood that adopting a deep learning model as the classification model helps improve classification accuracy. A deep learning model can also automatically extract low-level features and combine them into high-level features for learning, so no additional manual feature extraction is needed, which helps improve classification efficiency.
In a specific embodiment, the corresponding classification model in step S3 includes: the classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
It is understood that, by using a CNN model as a classification model of the text data set/the image data set, local features in the text data set/the image data set can be effectively identified; an RNN model is adopted as a classification model of the URL link data set, so that time series characteristics in the URL link data set can be effectively identified; and by adopting the LSTM model as the classification model of the script data set, the context code association characteristics in the script data set can be effectively identified.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention can establish a mail classification model aiming at various data in the mails, so that the mails can be subjected to multi-dimensional detection through the mail classification model, and the high-efficiency classification of the mails is realized.
A second embodiment. Please refer to fig. 2.
As shown in FIG. 2, a second embodiment provides an apparatus for constructing a mail classification model, including: a data obtaining module 21, configured to construct a target data set and a corpus by using a sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus; a vector conversion module 22, configured to use the corpus to train word2vec models in one-to-one correspondence with the text data set, the URL link data set and the script data set, and convert the text data set, the URL link data set and the script data set into feature vectors by using the word2vec models; a model pre-building module 23, configured to construct classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and train the classifiers to obtain corresponding classification models; a weight obtaining module 24, configured to use the fusion data set to train the classification models to obtain decision weights of the various kinds of data in the fusion data set; and a model optimization module 25, configured to perform metric-based evaluation, verification and optimization on the classification models by using the test mail data set according to the decision weights.
It should be noted that the sample mail data set includes normal mail data and spam mail data.
In a specific embodiment, the normal mails and the junk mails can be obtained by a mail receiving and sending system, a mail anti-malicious anti-spam system, user marks, expert marks and the like.
It is understood that the data obtaining module 21 constructs the text data set and the text corpus from the text data in the sample mail data set; constructs the URL link data set and the URL link corpus from the URL link data in the sample mail data set; constructs the script data set and the script corpus from the script data in the sample mail data set; constructs the image data set from the image data in the sample mail data set; and constructs the different fusion data sets from the various combinations of the text data, the URL link data, the script data and the image data in the sample mail data set.
The vector conversion module 22 trains the word2vec model corresponding to the text data set by using the text corpus, so that this word2vec model converts the text data set into feature vectors; trains the word2vec model corresponding to the URL link data set by using the URL link corpus, so that this word2vec model converts the URL link data set into feature vectors; and trains the word2vec model corresponding to the script data set by using the script corpus, so that this word2vec model converts the script data set into feature vectors.
Each word2vec model is trained with CBOW or skip-gram so that it converts the corresponding data into vectors that a computer can process. Converting the text data set, the URL link data set and the script data set into such vectors prevents processing from being interrupted because the computer cannot recognize the raw data.
The model pre-building module 23 is configured to build the classifier according to the text data set converted into the feature vector, and train the classifier to obtain the classification model corresponding to the text data set; constructing the classifier according to the URL link data set converted into the feature vector, and training the classifier to obtain the classification model corresponding to the URL link data set; constructing the classifier according to the script data set converted into the feature vector, and training the classifier to obtain the classification model corresponding to the script data set; and constructing the classifier according to the image data set, and training the classifier to obtain the classification model corresponding to the image data set. The classification models are all single-dimensional classification models and are only used for classifying according to one type of data.
The weight obtaining module 24 is configured to train the classification model corresponding to the text data set by using the fused data set, so as to obtain a decision weight of the text data in the fused data set; training the classification model corresponding to the URL link data set by using the fusion data set to obtain decision weight of the URL link data in the fusion data set; training the classification model corresponding to the script data set by using the fusion data set to obtain a decision weight of the script data in the fusion data set; training the classification model corresponding to the image data set by using the fusion data set to obtain decision weight of the image data in the fusion data set.
The model optimization module 25 performs index evaluation verification and optimization on the classification models corresponding to the text data set, the URL link data set, the script data set, and the image data set one by one using a test mail data set according to decision weights of the text data, the URL link data, the script data, and the image data.
In the embodiment, the corresponding single-dimensional classification models are constructed according to the text data, the URL link data, the script data and the image data in the sample mails, and the decision weights of different data are utilized to fuse the single-dimensional classification models, so that the multi-dimensional classification model is obtained.
Similarly, if mails are to be classified according to other data in the mail, a corresponding single-dimensional model may be added for that data, and the decision weight of that data is added to the multi-dimensional classification model of this embodiment.
In a specific embodiment, the text corpus, the URL link corpus, and the script corpus are constructed according to the text data set, the URL link data set, and the script data set, respectively.
It is to be understood that the text corpus is constructed from the text data sets; constructing the URL link corpus according to the URL link data set; and constructing the script corpus according to the script data set.
In this embodiment, the text corpus is constructed by segmenting the text data set with a Chinese word segmentation tool and a Chinese stop-word list; the URL link corpus is constructed by splitting the URL link data set on symbols common in URL links, such as "-" and "/"; and the script corpus is constructed by parsing the script data set into abstract syntax trees. For a JavaScript script, for example, Esprima.js parses the JavaScript code into an abstract syntax tree, from which the JavaScript script corpus is built.
In a specific embodiment, the classification model is a deep learning model.
It can be understood that adopting a deep learning model as the classification model helps improve classification accuracy. A deep learning model can also automatically extract low-level features and combine them into high-level features for learning, so no additional manual feature extraction is needed, which helps improve classification efficiency.
In a specific embodiment, the corresponding classification model includes: the classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
It is understood that, by using a CNN model as a classification model of the text data set/the image data set, local features in the text data set/the image data set can be effectively identified; an RNN model is adopted as a classification model of the URL link data set, so that time series characteristics in the URL link data set can be effectively identified; and by adopting the LSTM model as the classification model of the script data set, the context code association characteristics in the script data set can be effectively identified.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention can establish a mail classification model aiming at various data in the mails, so that the mails can be subjected to multi-dimensional detection through the mail classification model, and the high-efficiency classification of the mails is realized.
A third embodiment.
A third embodiment provides a terminal device for constructing a mail classification model, which includes a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the memory is coupled to the processor, and the processor executes the computer program to implement the method for constructing a mail classification model as described above, and has the same beneficial effects as the method for constructing a mail classification model.
A fourth embodiment.
A fourth embodiment provides a computer-readable storage medium, which includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the method for constructing a mail classification model as described above, and has the same beneficial effects as the method for constructing the mail classification model.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (10)

1. A method for constructing a mail classification model, characterized by comprising the following steps:
Constructing a target data set and a corpus by using a sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
The corpus is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and the text data set, the URL link data set and the script data set are converted into feature vectors by utilizing the word2vec models;
Constructing classifiers which correspond to the data sets in the target data set except the fused data set one by one, and training the classifiers to obtain corresponding classification models;
Using the fusion data set to train the classification model to obtain decision weights of various data in the fusion data set;
And according to the decision weight, performing index evaluation verification and optimization on the classification model by using a test mail data set.
2. the method for constructing the mail classification model according to claim 1, wherein the text corpus, the URL link corpus and the script corpus are constructed from the text dataset, the URL link dataset and the script dataset, respectively.
3. The method of constructing a mail classification model according to claim 1, wherein the classification model is a deep learning model.
4. The method of constructing a mail classification model according to claim 1, wherein the corresponding classification model comprises:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
5. An apparatus for constructing a mail classification model, comprising:
The data acquisition module is used for constructing a target data set and a corpus by utilizing the sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
The vector conversion module is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and converting the text data set, the URL link data set and the script data set into feature vectors by using the word2vec models;
The model pre-building module is used for building classifiers which correspond to the data sets in the target data set except the fusion data set one by one, and training the classifiers to obtain corresponding classification models;
The weight acquisition module is used for using the fusion data set to train the classification model to obtain decision weights of various data in the fusion data set;
The model optimization module is used for evaluating the classification model against metrics, and verifying and optimizing it, using the test mail data set according to the decision weights.
6. The apparatus for constructing a mail classification model according to claim 5, wherein the text corpus, the URL link corpus, and the script corpus are constructed from the text dataset, the URL link dataset, and the script dataset, respectively.
7. The apparatus for constructing a mail classification model according to claim 5, wherein the classification model is a deep learning model.
8. The apparatus for constructing a mail classification model according to claim 5, wherein the corresponding classification model comprises:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
9. A terminal device for constructing a mail classification model, characterized in that it comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the memory being coupled to the processor, and the processor implementing the method for constructing a mail classification model according to claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of constructing a mail classification model according to claims 1 to 4.
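As an illustration of the decision-weight step recited in claims 1 and 5, the following minimal Python sketch shows one way the outputs of the per-data-set classification models might be combined. It is not the claimed implementation: the claims do not state how the decision weights are computed, so the accuracy-proportional weighting, the 0.5 threshold, and the names `MODALITIES`, `decision_weights` and `fused_score` are all assumptions introduced here. Each sample is assumed to be a mapping from data type to the spam probability produced by that data type's trained classification model, with `None` where a mail contains no data of that type.

```python
from typing import Dict, List, Optional

# Hypothetical names for the four data types named in the claims.
MODALITIES = ("text", "url", "script", "image")

Sample = Dict[str, Optional[float]]  # data type -> spam probability, or None if absent


def decision_weights(samples: List[Sample], labels: List[int],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Assumed scheme: weight each data type by its accuracy on the fusion data set."""
    correct = {m: 0 for m in MODALITIES}
    seen = {m: 0 for m in MODALITIES}
    for sample, label in zip(samples, labels):
        for m in MODALITIES:
            p = sample.get(m)
            if p is None:  # this mail has no data of this type
                continue
            seen[m] += 1
            if int(p >= threshold) == label:
                correct[m] += 1
    accuracy = {m: (correct[m] / seen[m] if seen[m] else 0.0) for m in MODALITIES}
    total = sum(accuracy.values())
    if total == 0:
        # Fall back to uniform weights when no data type is ever correct.
        return {m: 1.0 / len(MODALITIES) for m in MODALITIES}
    return {m: accuracy[m] / total for m in MODALITIES}


def fused_score(sample: Sample, weights: Dict[str, float]) -> float:
    """Weighted average of the available per-type probabilities."""
    present = [m for m in MODALITIES if sample.get(m) is not None]
    norm = sum(weights[m] for m in present)
    if norm == 0:
        return 0.0
    return sum(weights[m] * sample[m] for m in present) / norm
```

Under these assumptions, `decision_weights` would be called once on the fusion data set, and `fused_score` would then be applied to each test mail, renormalizing over whichever data types that mail actually contains.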
CN201910767882.7A 2019-08-19 2019-08-19 method and device for constructing mail classification model, terminal equipment and medium Pending CN110569357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910767882.7A CN110569357A (en) 2019-08-19 2019-08-19 method and device for constructing mail classification model, terminal equipment and medium

Publications (1)

Publication Number Publication Date
CN110569357A (en) 2019-12-13

Family

ID=68773963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910767882.7A Pending CN110569357A (en) 2019-08-19 2019-08-19 method and device for constructing mail classification model, terminal equipment and medium

Country Status (1)

Country Link
CN (1) CN110569357A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221970A (en) * 2019-12-31 2020-06-02 论客科技(广州)有限公司 Mail classification method and device based on behavior structure and semantic content joint analysis
CN115424278A (en) * 2022-08-12 2022-12-02 中国电信股份有限公司 Mail detection method and device and electronic equipment
CN115424278B (en) * 2022-08-12 2024-05-03 中国电信股份有限公司 Mail detection method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070065003A1 (en) * 2005-09-21 2007-03-22 Lockheed Martin Corporation Real-time recognition of mixed source text
JP2008250437A (en) * 2007-03-29 2008-10-16 Mitsubishi Space Software Kk Mail data sorting apparatus, mail data sorting program, mail data sorting method, e-mail data hierarchy localization device, e-mail data hierarchy localization program, and e-mail data hierarchy localization method
CN108199951A (en) * 2018-01-04 2018-06-22 焦点科技股份有限公司 A kind of rubbish mail filtering method based on more algorithm fusion models
CN109800852A (en) * 2018-11-29 2019-05-24 电子科技大学 A kind of multi-modal spam filtering method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
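Gao Yang (高扬): "Intelligent Summarization and Deep Learning" (《人工智能与机器人现金技术丛书 智能摘要与深度学习》), 31 July 2019, Beijing Institute of Technology Press Co., Ltd. (北京理工大学出版社有限责任公司) *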

Similar Documents

Publication Publication Date Title
CN104067567B (en) System and method for carrying out spam detection using character histogram
CN110602113B (en) Hierarchical phishing website detection method based on deep learning
CN109873810B (en) Network fishing detection method based on goblet sea squirt group algorithm support vector machine
CN108259415A (en) A kind of method and device of mail-detection
GB2600028A (en) Detection of phishing campaigns
CN106528642A (en) TF-IDF feature extraction based short text classification method
CN107239512B (en) A kind of microblogging comment spam recognition methods of combination comment relational network figure
CN109889436B (en) Method for discovering spammer in social network
CN107294834A (en) A kind of method and apparatus for recognizing spam
CN105893484A (en) Microblog Spammer recognition method based on text characteristics and behavior characteristics
CN113055386A (en) Method and device for identifying and analyzing attack organization
CN112199606B (en) Social media-oriented rumor detection system based on hierarchical user representation
WO2014029318A1 (en) Method and apparatus for identifying webpage type
WO2020082763A1 (en) Decision trees-based method and apparatus for detecting phishing website, and computer device
CN110321707A (en) A kind of SQL injection detection method based on big data algorithm
CN114465780A (en) Fishing mail detection method and system based on feature extraction
CN115757991A (en) Webpage identification method and device, electronic equipment and storage medium
CN115099239A (en) Resource identification method, device, equipment and storage medium
CN110569357A (en) method and device for constructing mail classification model, terminal equipment and medium
JPWO2018150472A1 (en) Interactive attack simulation device, interactive attack simulation method, and interactive attack simulation program
CN107533574A (en) Email relationship finger system based on random index pattern match
CN109213858B (en) Automatic identification method and system for network water army
CN116127079B (en) Text classification method
CN110516125B (en) Method, device and equipment for identifying abnormal character string and readable storage medium
WO2023065640A1 (en) Model parameter adjustment method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191213)