CN110569357A - method and device for constructing mail classification model, terminal equipment and medium - Google Patents
- Publication number: CN110569357A (application number CN201910767882.7A)
- Authority: CN (China)
- Prior art keywords: data set, script, corpus, classification model, URL link
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, a device, a terminal device and a medium for constructing a mail classification model. The method comprises the following steps: constructing a target data set and a corpus from a sample mail data set; using the corpus to train word2vec models in one-to-one correspondence with the text data set, the URL link data set and the script data set, and converting these three data sets into feature vectors with the word2vec models; constructing one classifier for each data set in the target data set other than the fusion data set, and training the classifiers to obtain the corresponding classification models; training the classification models on the fusion data set to obtain a decision weight for each type of data in the fusion data set; and, according to the decision weights, performing metric-based evaluation, verification and optimization of the classification models on a test mail data set. The invention can build a mail classification model covering the various types of data in a mail, so that mails can be examined along multiple dimensions and classified efficiently.
Description
Technical Field
The invention relates to the field of information security, in particular to a method, a device, terminal equipment and a medium for constructing a mail classification model.
Background
Email is now widely used in social, business, financial and other settings, and spam has flooded in along with it. In 2018, spam accounted for more than 50% of mail traffic. Spam not only consumes large amounts of network bandwidth and costs recipients a great deal of time, effort and money; the malicious links, malicious scripts and Trojan-laden attachments carried by much of it can also leak users' information and cause direct losses of various kinds.
With the rapid development of the Internet, spam has evolved from containing a single type of content to containing multiple types, such as text, images, URL links, attachments and JavaScript scripts. Traditional content-based spam detection systems work along a single dimension: they build machine-learning detectors for images or text only, and consider neither URL detection of promotional or malicious links nor detection of jump links in scripts embedded in the mail body. Such methods fall short on spam that combines several types of features, cannot achieve good detection efficiency, and are therefore limited.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method, an apparatus, a terminal device and a medium for constructing a mail classification model, which can establish a mail classification model for various data in a mail, so that the mail can be subjected to multidimensional detection through the mail classification model, and efficient classification of the mail is realized.
In order to solve the technical problem, the invention provides a method for constructing a mail classification model, which comprises the following steps:
constructing a target data set and a corpus by using a sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
the corpus is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and the text data set, the URL link data set and the script data set are converted into feature vectors by utilizing the word2vec models;
constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models;
using the fusion data set to train the classification models to obtain a decision weight for each type of data in the fusion data set;
and, according to the decision weights, performing metric-based evaluation, verification and optimization of the classification models on a test mail data set.
Further, the text corpus, the URL link corpus and the script corpus are constructed from the text data set, the URL link data set and the script data set, respectively.
Further, the classification model is a deep learning model.
Further, the corresponding classification model comprises:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
The invention also provides a device for constructing the mail classification model, which comprises the following components:
The data acquisition module is used for constructing a target data set and a corpus by utilizing the sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
The vector conversion module is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and converting the text data set, the URL link data set and the script data set into feature vectors by using the word2vec models;
The model pre-building module is used for building classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models;
The weight acquisition module is used for training the classification models with the fusion data set to obtain a decision weight for each type of data in the fusion data set;
And the model optimization module is used for performing metric-based evaluation, verification and optimization of the classification models on the test mail data set according to the decision weights.
Further, the text corpus, the URL link corpus and the script corpus are respectively constructed according to the text data set, the URL link data set and the script data set.
Further, the classification model is a deep learning model.
Further, the corresponding classification model comprises:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention can establish a mail classification model aiming at various data in the mails, so that the mails can be subjected to multi-dimensional detection through the mail classification model, and the high-efficiency classification of the mails is realized.
The invention also provides a terminal device for constructing a mail classification model, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the memory is coupled with the processor, and the processor implements the method for constructing the mail classification model when executing the computer program.
The invention also provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the method for constructing the mail classification model.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a mail classification model according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an apparatus for constructing a mail classification model according to a second embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
It should be noted that the step numbers in the text are only for convenience in explaining the specific embodiments and do not limit the order in which the steps are executed. The method provided by this embodiment can be executed by a relevant server, and a server is used as the example in the following explanation.
A first embodiment. Please refer to fig. 1.
As shown in fig. 1, a method for constructing a mail classification model according to the first embodiment includes steps S1 to S5:
S1, constructing a target data set and a corpus by using the sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus.
S2, the corpus is used for training word2vec models corresponding to the text data set, the URL link data set and the script data set one by one, and the text data set, the URL link data set and the script data set are converted into feature vectors by using the word2vec models.
S3, constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models.
S4, using the fusion data set to train the classification models to obtain a decision weight for each type of data in the fusion data set.
S5, according to the decision weights, performing metric-based evaluation, verification and optimization of the classification models on a test mail data set.
It should be noted that the sample mail data set includes normal mail data and spam mail data.
In a specific embodiment, the normal mails and the spam mails can be obtained from a mail sending and receiving system, an anti-malware and anti-spam mail system, user labels, expert labels and the like.
It can be understood that in step S1, the text data set and the text corpus are constructed from the text data in the sample mail data set; the URL link data set and the URL link corpus are constructed from the URL link data in the sample mail data set; the script data set and the script corpus are constructed from the script data in the sample mail data set; the image data set is constructed from the image data in the sample mail data set; and the various fusion data sets are constructed from different combinations of the text data, the URL link data, the script data and the image data in the sample mail data set.
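As a minimal sketch of step S1, assuming a simple in-memory mail record (the `Mail` class, its fields and `build_target_datasets` are hypothetical, not taken from the patent) and assuming that one fusion data set is formed for every combination of two or more data types, the split might look like this:

```python
from dataclasses import dataclass, field
from itertools import combinations

MODALITIES = ("text", "urls", "scripts", "images")


@dataclass
class Mail:
    """Hypothetical sample-mail record; the patent does not define a schema."""
    label: int  # 1 = spam, 0 = normal mail
    text: str = ""
    urls: list = field(default_factory=list)
    scripts: list = field(default_factory=list)
    images: list = field(default_factory=list)


def has(mail, modality):
    """True if the mail carries non-empty data of the given type."""
    return bool(getattr(mail, modality))


def build_target_datasets(mails):
    """Return (single-type data sets, fusion data sets) built from sample mails."""
    # One (data, label) list per data type, keeping only mails that contain it.
    single = {
        m: [(getattr(mail, m), mail.label) for mail in mails if has(mail, m)]
        for m in MODALITIES
    }
    # Assumption: one fusion data set per combination of two or more data
    # types, holding the mails that contain every type in the combination.
    fusion = {}
    for k in range(2, len(MODALITIES) + 1):
        for combo in combinations(MODALITIES, k):
            fusion[combo] = [mail for mail in mails if all(has(mail, m) for m in combo)]
    return single, fusion


mails = [
    Mail(1, text="win a prize", urls=["http://a.example/win"]),
    Mail(0, text="meeting at 10"),
]
single, fusion = build_target_datasets(mails)
print(len(single["text"]), len(single["urls"]), len(fusion))  # 2 1 11
```

With four data types there are 6 + 4 + 1 = 11 combinations of size two or more, so eleven fusion data sets are produced even when some of them are empty.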
In step S2, the word2vec model corresponding to the text data set is trained with the text corpus and then converts the text data set into feature vectors; the word2vec model corresponding to the URL link data set is trained with the URL link corpus and then converts the URL link data set into feature vectors; and the word2vec model corresponding to the script data set is trained with the script corpus and then converts the script data set into feature vectors.
Each word2vec model is trained with CBOW (continuous bag-of-words) or skip-gram and converts the corresponding data into vectors that a computer can process. Converting the text data set, the URL link data set and the script data set into such vectors prevents downstream processing from failing on raw inputs that the computer cannot interpret.
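CBOW and skip-gram differ in how training examples are formed from a token sequence: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its whole context. The sketch below shows only that step, plus a hypothetical `to_vector_sequence` helper that maps a tokenized sample to a sequence of word vectors for the downstream models. The training itself would normally be delegated to a library; for example, gensim's `Word2Vec` selects skip-gram with `sg=1` and CBOW with `sg=0`.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram examples: (center word, one context word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW examples: (all context words, center word) pairs."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


def to_vector_sequence(tokens, embeddings, dim):
    """Hypothetical helper: map tokens to learned vectors, zeros for unknown tokens."""
    return [list(embeddings.get(t, [0.0] * dim)) for t in tokens]


tokens = ["login", "verify", "account"]
print(skipgram_pairs(tokens, window=1))
# [('login', 'verify'), ('verify', 'login'), ('verify', 'account'), ('account', 'verify')]
print(cbow_examples(tokens, window=1))
# [(['verify'], 'login'), (['login', 'account'], 'verify'), (['verify'], 'account')]
```

Keeping the output as a sequence of vectors, rather than averaging it into one vector, preserves the token order that the CNN, RNN and LSTM classifiers described later can exploit.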
In step S3, a classifier is constructed on the text data set converted into feature vectors and trained to obtain the classification model corresponding to the text data set; a classifier is constructed on the URL link data set converted into feature vectors and trained to obtain the classification model corresponding to the URL link data set; a classifier is constructed on the script data set converted into feature vectors and trained to obtain the classification model corresponding to the script data set; and a classifier is constructed on the image data set and trained to obtain the classification model corresponding to the image data set. Each of these classification models is single-dimensional, that is, it classifies on the basis of one type of data only.
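To make the notion of a single-dimensional model concrete, the sketch below trains one classifier per data type on that type's feature vectors only. A nearest-centroid rule is swapped in purely for brevity; it is not the CNN, RNN or LSTM models the patent names, and the `spam_score` convention (closer to the spam centroid gives a higher score) is an assumption.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class NearestCentroid:
    """Stand-in single-dimensional classifier: it sees one data type's vectors only."""

    def fit(self, X, y):
        # Assumes labels 0 (normal) and 1 (spam) both occur in y.
        self.centroids = {
            c: centroid([x for x, label in zip(X, y) if label == c]) for c in (0, 1)
        }
        return self

    def spam_score(self, x):
        """Score in [0, 1]; higher means closer to the spam centroid."""
        d_spam = math.dist(x, self.centroids[1])
        d_normal = math.dist(x, self.centroids[0])
        total = d_spam + d_normal
        return 0.5 if total == 0 else d_normal / total

    def predict(self, x, threshold=0.5):
        return int(self.spam_score(x) >= threshold)


# One model per data type, e.g. built from the single-type data sets of step S1.
feature_sets = {
    "urls": ([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], [0, 0, 1, 1]),
}
models = {m: NearestCentroid().fit(X, y) for m, (X, y) in feature_sets.items()}
print(models["urls"].predict([9.0, 9.5]))  # 1
```

Exposing a score rather than only a hard label is a design choice that makes the later weighted fusion straightforward.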
In step S4, the classification model corresponding to the text data set is trained with the fusion data set to obtain the decision weight of the text data in the fusion data set; likewise, the classification models corresponding to the URL link data set, the script data set and the image data set are trained with the fusion data set to obtain the decision weights of the URL link data, the script data and the image data in the fusion data set, respectively.
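The patent does not state how the decision weights are computed. One plausible reading, assumed here, is to score each data type's model on the fusion data, weight each type by its accuracy there, and combine a mail's per-type spam scores as a weighted average over the data types the mail actually contains. Both `decision_weights` and `fused_score` are hypothetical names.

```python
def decision_weights(scores, labels, threshold=0.5):
    """Assumed rule: weight = accuracy of each data type's model on the fusion data, normalized.

    scores: {data_type: [spam score per fusion mail]}; labels: [0/1 per fusion mail].
    """
    accuracy = {}
    for data_type, type_scores in scores.items():
        correct = sum(int(s >= threshold) == y for s, y in zip(type_scores, labels))
        accuracy[data_type] = correct / len(labels)
    total = sum(accuracy.values())
    if total == 0:  # every model scored zero accuracy: fall back to equal weights
        return {t: 1 / len(accuracy) for t in accuracy}
    return {t: a / total for t, a in accuracy.items()}


def fused_score(mail_scores, weights):
    """Weighted average of the spam scores of the data types present in one mail."""
    present = {t: s for t, s in mail_scores.items() if t in weights}
    norm = sum(weights[t] for t in present)
    if norm == 0:
        return 0.5  # no usable data type: undecided
    return sum(weights[t] * s for t, s in present.items()) / norm


labels = [1, 0, 1, 0]
scores = {
    "text": [0.9, 0.1, 0.8, 0.2],  # 4/4 correct at threshold 0.5
    "urls": [0.9, 0.9, 0.1, 0.1],  # 2/4 correct at threshold 0.5
}
weights = decision_weights(scores, labels)
print(weights)  # text ≈ 0.667, urls ≈ 0.333
print(fused_score({"text": 0.7, "urls": 0.4}, weights))  # ≈ 0.6
```

Renormalizing over the data types present means a mail with no URLs or scripts is still scored sensibly from its remaining data, which also fits the patent's remark that further data types can be added with their own weights.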
In step S5, according to the decision weights of the text data, the URL link data, the script data and the image data, a test mail data set is used to perform metric-based evaluation, verification and optimization of the classification models corresponding to the text data set, the URL link data set, the script data set and the image data set.
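The metrics used for the evaluation in step S5 are not named in the patent. Precision, recall, F1 and accuracy are common choices for spam filtering and are assumed below, together with a simple threshold search on the fused score as one possible form of optimization; `evaluate` and `best_threshold` are hypothetical names.

```python
def evaluate(y_true, y_pred):
    """Confusion-matrix metrics, treating spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


def best_threshold(fused_scores, y_true, candidates):
    """Pick the decision threshold on the fused score that maximizes F1."""
    def f1_at(th):
        return evaluate(y_true, [int(s >= th) for s in fused_scores])["f1"]
    return max(candidates, key=f1_at)


test_scores = [0.9, 0.6, 0.4, 0.1]
test_labels = [1, 1, 0, 0]
print(best_threshold(test_scores, test_labels, [0.3, 0.5, 0.7]))  # 0.5
print(evaluate(test_labels, [1, 0, 1, 0]))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'accuracy': 0.5}
```

Selecting the threshold on a separate validation split, rather than on the test mails themselves, would keep the reported test metrics unbiased.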
In this embodiment, single-dimensional classification models are constructed for the text data, the URL link data, the script data and the image data in the sample mails, and these models are fused using the decision weights of the different data types to obtain a multi-dimensional classification model.
Similarly, if mails are to be classified according to other data in the mail, a corresponding single-dimensional model can be added for that data, and the decision weight of that data is added to the multi-dimensional classification model of this embodiment.
In a specific embodiment, the text corpus, the URL link corpus, and the script corpus are constructed according to the text dataset, the URL link dataset, and the script dataset, respectively.
It is to be understood that the text corpus is constructed from the text data sets; constructing the URL link corpus according to the URL link data set; and constructing the script corpus according to the script data set.
In this embodiment, the text corpus is built by segmenting the text data set with a Chinese word segmentation tool and removing Chinese stop words; the URL link corpus is built by splitting the URL link data set on common URL delimiters such as "-" and "/"; and the script corpus is built by parsing the scripts into abstract syntax trees. For example, for JavaScript, Esprima.js is used to parse the JavaScript code into an abstract syntax tree, from which the JavaScript corpus is constructed.
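The three corpus-building rules above can be sketched as follows. The delimiter set beyond "-" and "/", the use of ESTree node types (the JSON tree format that Esprima produces) as script tokens, and all function names are assumptions. Chinese segmentation itself (for example with a library such as jieba) is not shown, so `text_tokens` starts from already-segmented words.

```python
import re

# Assumed delimiter set: "-" and "/" as in the text, plus other common URL symbols.
URL_DELIMITERS = re.compile(r"[-/.:?=&_#]+")


def url_tokens(url):
    """Split a URL into word-like tokens on common link symbols."""
    return [t for t in URL_DELIMITERS.split(url.lower()) if t]


def text_tokens(segmented_words, stop_words):
    """Drop stop words and blank tokens from already-segmented Chinese text."""
    return [w for w in segmented_words if w.strip() and w not in stop_words]


def ast_tokens(node):
    """Pre-order list of ESTree node types (the JSON tree that Esprima produces)."""
    tokens = []
    if isinstance(node, dict):
        if "type" in node:
            tokens.append(node["type"])
        for key, value in node.items():
            if key != "type":
                tokens.extend(ast_tokens(value))
    elif isinstance(node, list):
        for item in node:
            tokens.extend(ast_tokens(item))
    return tokens


print(url_tokens("http://evil-site.example/login?id=1"))
# ['http', 'evil', 'site', 'example', 'login', 'id', '1']
print(text_tokens(["免费", "的", "奖品"], {"的"}))
# ['免费', '奖品']

# Tree for the JavaScript statement `eval(x)`, written out by hand in ESTree form.
tree = {"type": "Program", "body": [{"type": "ExpressionStatement", "expression": {
    "type": "CallExpression", "callee": {"type": "Identifier", "name": "eval"},
    "arguments": [{"type": "Identifier", "name": "x"}]}}], "sourceType": "script"}
print(ast_tokens(tree))
# ['Program', 'ExpressionStatement', 'CallExpression', 'Identifier', 'Identifier']
```

Using node types rather than raw source text makes the script corpus insensitive to renamed variables and reformatting, a common trait of obfuscated malicious scripts; keeping identifier names such as `eval` as extra tokens would be a natural variation.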
In a specific embodiment, the classification model is a deep learning model.
It can be understood that using a deep learning model as the classification model helps improve classification accuracy. In addition, a deep learning model can automatically extract low-level features and combine them into high-level features for learning, so no additional manual feature engineering is required, which helps improve classification efficiency.
In a specific embodiment, the corresponding classification models in step S3 are as follows: the classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model, respectively.
It can be understood that using a CNN model as the classification model for the text data set and the image data set allows local features in those data sets to be identified effectively; using an RNN model for the URL link data set allows sequential features in the URL links to be identified effectively; and using an LSTM model for the script data set allows contextual associations across the script code to be identified effectively.
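The toy functions below are illustrations, not the patent's architecture. They show the two properties the paragraph relies on: a convolution filter with max pooling responds to a local pattern wherever it occurs in a sequence of word vectors, while a recurrent update (a scalar Elman-style RNN here, not an LSTM) depends on the order of its inputs. Both function names are hypothetical.

```python
import math


def conv1d_max(seq, kernel):
    """One text-CNN-style filter with ReLU and max pooling over positions.

    seq: word vectors (n x d); kernel: k weight vectors (k x d), k <= n.
    """
    k = len(kernel)
    responses = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        dot = sum(w * x for wv, xv in zip(kernel, window) for w, x in zip(wv, xv))
        responses.append(max(0.0, dot))  # ReLU
    return max(responses)  # the strongest local match anywhere in the sequence


def rnn_last_state(seq, w_in, w_rec):
    """Scalar Elman-style recurrence h_t = tanh(w_in . x_t + w_rec * h_{t-1})."""
    h = 0.0
    for x in seq:
        h = math.tanh(sum(a * b for a, b in zip(w_in, x)) + w_rec * h)
    return h


a, b = [1.0, 0.0], [0.0, 1.0]
pattern = [a, b]  # the local pattern the filter is tuned to
print(conv1d_max([b, a, b, b], pattern), conv1d_max([a, b, b, b], pattern))  # 2.0 2.0
print(rnn_last_state([a, b], [1.0, -1.0], 0.5) == rnn_last_state([b, a], [1.0, -1.0], 0.5))  # False
```

The filter gives the same maximum response whether the pattern appears at the start or in the middle, while the recurrent state changes when the same two inputs arrive in the opposite order.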
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention can establish a mail classification model aiming at various data in the mails, so that the mails can be subjected to multi-dimensional detection through the mail classification model, and the high-efficiency classification of the mails is realized.
A second embodiment. Please refer to fig. 2.
As shown in fig. 2, a second embodiment provides an apparatus for constructing a mail classification model, including: a data obtaining module 21, configured to construct a target data set and a corpus from a sample mail data set, where the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus; a vector conversion module 22, configured to use the corpus to train word2vec models in one-to-one correspondence with the text data set, the URL link data set and the script data set, and to convert these data sets into feature vectors with the word2vec models; a model pre-building module 23, configured to construct classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and to train the classifiers to obtain the corresponding classification models; a weight obtaining module 24, configured to train the classification models with the fusion data set to obtain a decision weight for each type of data in the fusion data set; and a model optimization module 25, configured to perform metric-based evaluation, verification and optimization of the classification models on a test mail data set according to the decision weights.
It should be noted that the sample mail data set includes normal mail data and spam mail data.
In a specific embodiment, the normal mails and the spam mails can be obtained from a mail sending and receiving system, an anti-malware and anti-spam mail system, user labels, expert labels and the like.
It can be understood that the data obtaining module 21 constructs the text data set and the text corpus from the text data in the sample mail data set; the URL link data set and the URL link corpus from the URL link data in the sample mail data set; the script data set and the script corpus from the script data in the sample mail data set; the image data set from the image data in the sample mail data set; and the various fusion data sets from different combinations of the text data, the URL link data, the script data and the image data in the sample mail data set.
The vector conversion module 22 trains the word2vec model corresponding to the text data set with the text corpus, so that the model converts the text data set into feature vectors; trains the word2vec model corresponding to the URL link data set with the URL link corpus, so that the model converts the URL link data set into feature vectors; and trains the word2vec model corresponding to the script data set with the script corpus, so that the model converts the script data set into feature vectors.
Each word2vec model is trained with CBOW (continuous bag-of-words) or skip-gram and converts the corresponding data into vectors that a computer can process. Converting the text data set, the URL link data set and the script data set into such vectors prevents downstream processing from failing on raw inputs that the computer cannot interpret.
The model pre-building module 23 is configured to build the classifier according to the text data set converted into the feature vector, and train the classifier to obtain the classification model corresponding to the text data set; constructing the classifier according to the URL link data set converted into the feature vector, and training the classifier to obtain the classification model corresponding to the URL link data set; constructing the classifier according to the script data set converted into the feature vector, and training the classifier to obtain the classification model corresponding to the script data set; and constructing the classifier according to the image data set, and training the classifier to obtain the classification model corresponding to the image data set. The classification models are all single-dimensional classification models and are only used for classifying according to one type of data.
The weight obtaining module 24 is configured to train the classification model corresponding to the text data set by using the fused data set, so as to obtain a decision weight of the text data in the fused data set; training the classification model corresponding to the URL link data set by using the fusion data set to obtain decision weight of the URL link data in the fusion data set; training the classification model corresponding to the script data set by using the fusion data set to obtain a decision weight of the script data in the fusion data set; training the classification model corresponding to the image data set by using the fusion data set to obtain decision weight of the image data in the fusion data set.
The model optimization module 25 performs metric-based evaluation, verification and optimization of the classification models corresponding to the text data set, the URL link data set, the script data set and the image data set on a test mail data set, according to the decision weights of the text data, the URL link data, the script data and the image data.
In the embodiment, the corresponding single-dimensional classification models are constructed according to the text data, the URL link data, the script data and the image data in the sample mails, and the decision weights of different data are utilized to fuse the single-dimensional classification models, so that the multi-dimensional classification model is obtained.
Similarly, if mails are to be classified according to other data in the mail, a corresponding single-dimensional model can be added for that data, and the decision weight of that data is added to the multi-dimensional classification model of this embodiment.
In a specific embodiment, the text corpus, the URL link corpus and the script corpus are constructed from the text data set, the URL link data set and the script data set, respectively.
It is to be understood that the text corpus is constructed from the text data sets; constructing the URL link corpus according to the URL link data set; and constructing the script corpus according to the script data set.
In this embodiment, the text corpus is built by segmenting the text data set with a Chinese word segmentation tool and removing Chinese stop words; the URL link corpus is built by splitting the URL link data set on common URL delimiters such as "-" and "/"; and the script corpus is built by parsing the scripts into abstract syntax trees. For example, for JavaScript, Esprima.js is used to parse the JavaScript code into an abstract syntax tree, from which the JavaScript corpus is constructed.
In a specific embodiment, the classification model is a deep learning model.
It can be understood that using a deep learning model as the classification model helps improve classification accuracy. In addition, a deep learning model can automatically extract low-level features and combine them into high-level features for learning, so no additional manual feature engineering is required, which helps improve classification efficiency.
In a specific embodiment, the corresponding classification model includes: the classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
It can be understood that using a CNN model as the classification model for the text data set and the image data set allows local features in those data sets to be identified effectively; using an RNN model for the URL link data set allows sequential features in the URL links to be identified effectively; and using an LSTM model for the script data set allows contextual associations across the script code to be identified effectively.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention can establish a mail classification model aiming at various data in the mails, so that the mails can be subjected to multi-dimensional detection through the mail classification model, and the high-efficiency classification of the mails is realized.
A third embodiment.
A third embodiment provides a terminal device for constructing a mail classification model, which includes a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the memory is coupled to the processor, and the processor executes the computer program to implement the method for constructing a mail classification model as described above, and has the same beneficial effects as the method for constructing a mail classification model.
A fourth embodiment.
A fourth embodiment provides a computer-readable storage medium, which includes a stored computer program; when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the method for constructing a mail classification model described above, with the same beneficial effects as that method.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Claims (10)
1. A method for constructing a mail classification model, characterized by comprising the following steps:
Constructing a target data set and a corpus by using a sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
The corpus is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and the text data set, the URL link data set and the script data set are converted into feature vectors by utilizing the word2vec models;
Constructing classifiers in one-to-one correspondence with the data sets in the target data set other than the fusion data set, and training the classifiers to obtain corresponding classification models;
Using the fusion data set to train the classification models to obtain a decision weight for each type of data in the fusion data set;
And according to the decision weights, performing metric-based evaluation, verification and optimization of the classification models on a test mail data set.
2. The method for constructing the mail classification model according to claim 1, wherein the text corpus, the URL link corpus and the script corpus are constructed from the text data set, the URL link data set and the script data set, respectively.
3. The method of constructing a mail classification model according to claim 1, wherein the classification model is a deep learning model.
4. The method of constructing a mail classification model according to claim 1, wherein the corresponding classification models are as follows:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
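The CNN, RNN and LSTM models of claim 4 consume the word2vec output. One common way to feed them, assumed here rather than stated in the patent, is to map each token sequence to a fixed-size matrix of word vectors, padding short sequences and truncating long ones. The `embeddings` dictionary below is a hypothetical stand-in for vectors exported from a trained word2vec model; out-of-vocabulary tokens map to zero vectors.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def to_feature_matrix(tokens: Sequence[str], embeddings: Dict[str, Vector],
                      max_len: int, dim: int) -> List[Vector]:
    """Turn a token sequence into a max_len x dim matrix of word vectors.

    `embeddings` stands in for vectors exported from a trained word2vec model
    (hypothetical here). Out-of-vocabulary tokens and padding rows are zero
    vectors; sequences longer than max_len are truncated.
    """
    zero = [0.0] * dim
    rows = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


def to_batch(sequences: Sequence[Sequence[str]], embeddings: Dict[str, Vector],
             max_len: int, dim: int) -> List[List[Vector]]:
    """Stack several mails into a batch of shape (n, max_len, dim) for a CNN/RNN/LSTM."""
    return [to_feature_matrix(seq, embeddings, max_len, dim) for seq in sequences]
```

A fixed `max_len` lets mails of different lengths share one batch tensor; it trades coverage of long mails against memory and compute.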
5. An apparatus for constructing a mail classification model, comprising:
The data acquisition module is used for constructing a target data set and a corpus by utilizing the sample mail data set; the target data set comprises a text data set, a URL link data set, a script data set, an image data set and a fusion data set, the fusion data set comprises data sets of various combinations of the text data, the URL link data, the script data and the image data, and the corpus comprises a text corpus, a URL link corpus and a script corpus;
The vector conversion module is used for training word2vec models which are in one-to-one correspondence with the text data set, the URL link data set and the script data set, and converting the text data set, the URL link data set and the script data set into feature vectors by using the word2vec models;
The model pre-building module is used for building classifiers which correspond to the data sets in the target data set except the fusion data set one by one, and training the classifiers to obtain corresponding classification models;
The weight acquisition module is used for training the classification models with the fusion data set to obtain decision weights for each type of data in the fusion data set;
And the model optimization module is used for performing metric-based evaluation, verification and optimization of the classification models using the test mail data set according to the decision weights.
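The claims leave open how the decision weights are computed and which metrics the optimization step uses. The sketch below assumes one simple scheme: each modality's weight is proportional to its classifier's accuracy on the fusion data set, the fused spam score is the weight-normalized average over the modalities present in a mail, and evaluation uses accuracy, precision, recall and F1, with "optimization" reduced to choosing a decision threshold. Classifiers are plain callables returning P(spam), so any trained per-modality model could be plugged in; a stacked meta-classifier would be an equally plausible reading of the patent.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[List[float]], float]               # returns P(spam) for one modality
FusionSample = Tuple[Dict[str, List[float]], int]    # per-modality features, label (1 = spam)


def decision_weights(models: Dict[str, Model], fusion: List[FusionSample]) -> Dict[str, float]:
    """Assumed scheme: weight each modality by its accuracy on the fusion set, normalized to sum to 1."""
    acc = {}
    for name, model in models.items():
        hits = [(model(feats[name]) >= 0.5) == (label == 1)
                for feats, label in fusion if name in feats]
        acc[name] = sum(hits) / len(hits) if hits else 0.0
    total = sum(acc.values())
    if total == 0:
        return {name: 1.0 / len(models) for name in models}
    return {name: a / total for name, a in acc.items()}


def fused_score(models: Dict[str, Model], weights: Dict[str, float],
                feats: Dict[str, List[float]]) -> float:
    """Weighted average of P(spam) over the modalities present in one mail."""
    present = [n for n in models if n in feats]
    norm = sum(weights[n] for n in present)
    if norm == 0:
        return 0.5  # no usable evidence
    return sum(weights[n] * models[n](feats[n]) for n in present) / norm


def evaluate(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the spam class at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    correct = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return {"accuracy": correct / len(labels), "precision": precision, "recall": recall, "f1": f1}


def best_threshold(scores: Sequence[float], labels: Sequence[int], candidates: Sequence[float]) -> float:
    """Hypothetical optimization step: keep the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: evaluate(scores, labels, t)["f1"])
```

Choosing the threshold on the same test mails that are used to report metrics biases those metrics upward; a separate validation split is the usual safeguard.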
6. The apparatus for constructing a mail classification model according to claim 5, wherein the text corpus, the URL link corpus, and the script corpus are constructed from the text data set, the URL link data set, and the script data set, respectively.
7. The apparatus for constructing a mail classification model according to claim 5, wherein the classification model is a deep learning model.
8. The apparatus for constructing a mail classification model according to claim 5, wherein the corresponding classification models are as follows:
The classification models of the text data set, the URL link data set, the script data set and the image data set are a CNN model, an RNN model, an LSTM model and a CNN model respectively.
9. A terminal device for constructing a mail classification model, characterized by comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the memory is coupled to the processor, and the processor, when executing the computer program, implements the method for constructing a mail classification model according to any one of claims 1 to 4.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls a device on which the computer-readable storage medium is located to perform the method for constructing a mail classification model according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910767882.7A CN110569357A | 2019-08-19 | 2019-08-19 | Method and device for constructing mail classification model, terminal equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910767882.7A CN110569357A | 2019-08-19 | 2019-08-19 | Method and device for constructing mail classification model, terminal equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110569357A | 2019-12-13 |
Family ID: 68773963
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910767882.7A (CN110569357A, pending) | 2019-08-19 | 2019-08-19 | Method and device for constructing mail classification model, terminal equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110569357A |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111221970A * | 2019-12-31 | 2020-06-02 | 论客科技(广州)有限公司 | Mail classification method and device based on behavior structure and semantic content joint analysis |
CN115424278A * | 2022-08-12 | 2022-12-02 | 中国电信股份有限公司 | Mail detection method and device and electronic equipment |
CN115424278B * | 2022-08-12 | 2024-05-03 | 中国电信股份有限公司 | Mail detection method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070065003A1 * | 2005-09-21 | 2007-03-22 | Lockheed Martin Corporation | Real-time recognition of mixed source text |
JP2008250437A * | 2007-03-29 | 2008-10-16 | Mitsubishi Space Software Kk | Mail data sorting apparatus, mail data sorting program, mail data sorting method, e-mail data hierarchy localization device, e-mail data hierarchy localization program, and e-mail data hierarchy localization method |
CN108199951A * | 2018-01-04 | 2018-06-22 | 焦点科技股份有限公司 | Spam filtering method based on a multi-algorithm fusion model |
CN109800852A * | 2018-11-29 | 2019-05-24 | 电子科技大学 | Multi-modal spam filtering method |
2019-08-19: CN application CN201910767882.7A (CN110569357A), active, pending
Non-Patent Citations (1)
Title |
---|
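Gao Yang (高扬): "Intelligent Summarization and Deep Learning" (《智能摘要与深度学习》), Artificial Intelligence and Robotics Advanced Technology Series (人工智能与机器人先进技术丛书), 31 July 2019, Beijing Institute of Technology Press Co., Ltd. (北京理工大学出版社有限责任公司) * |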
Similar Documents
Publication | Title |
---|---|
CN104067567B | System and method for spam detection using character histograms |
CN110602113B | Hierarchical phishing website detection method based on deep learning |
CN109873810B | Phishing detection method based on a salp swarm algorithm and support vector machine |
CN108259415A | Mail detection method and device |
GB2600028A | Detection of phishing campaigns |
CN106528642A | Short text classification method based on TF-IDF feature extraction |
CN107239512B | Microblog comment spam recognition method combining a comment relationship network graph |
CN109889436B | Method for discovering spammers in social networks |
CN107294834A | Method and apparatus for recognizing spam |
CN105893484A | Microblog spammer recognition method based on text features and behavior features |
CN113055386A | Method and device for identifying and analyzing attack organizations |
CN112199606B | Social media-oriented rumor detection system based on hierarchical user representation |
WO2014029318A1 | Method and apparatus for identifying webpage type |
WO2020082763A1 | Decision trees-based method and apparatus for detecting phishing website, and computer device |
CN110321707A | SQL injection detection method based on a big data algorithm |
CN114465780A | Phishing email detection method and system based on feature extraction |
CN115757991A | Webpage identification method and device, electronic equipment and storage medium |
CN115099239A | Resource identification method, device, equipment and storage medium |
CN110569357A | Method and device for constructing mail classification model, terminal equipment and medium |
JPWO2018150472A1 | Interactive attack simulation device, interactive attack simulation method, and interactive attack simulation program |
CN107533574A | Email relationship finger system based on random index pattern match |
CN109213858B | Automatic identification method and system for the online "water army" (paid posters) |
CN116127079B | Text classification method |
CN110516125B | Method, device and equipment for identifying abnormal character string and readable storage medium |
WO2023065640A1 | Model parameter adjustment method and apparatus, electronic device and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191213 |